Tutorial :SQL Query Theory Question



Question:

I have a large historical transaction table (15-20 million rows MANY columns) and a table with one row one column. The table with one row contains a date (last processing date) which will be used to pull the data in the trasaction table ('process_date').

Question: Should I inner join the 'process_date' table to the transaction table or the transaction table to the 'process_date' table?


Solution:1

This is how I would do it

SELECT <<list only columns you need>>   FROM large_historical_transaction_table t  WHERE EXISTS (SELECT 1 FROM OneRowTable o                 WHERE o.last_processing_date = t.process_date)  


Solution:2

An Inner join is a symmetrical, bi-directional relationship, in general it doesn't matter, but in this case I would suggest not joining at all, read the threshold date into a variable and pass it to the other select query as a parameter...


Solution:3

For readability I would inner join from the transaction table to explicitly indicate that the second table with the date just acts like a filter.


Solution:4

When joining tables, the query optimizer takes a quick sniff of both to determine the most appropriate join implementation. Logically the inner join is symmetric, but the implementation may favor one side over the other for improved performance.


Solution:5

A cross join will do it:

SELECT t.col1, t.col2, p.process_date  FROM Transactions t, Process p;  

Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »