Tutorial :SQL Query Theory Question


I have a large historical transaction table (15-20 million rows MANY columns) and a table with one row one column. The table with one row contains a date (last processing date) which will be used to pull the data in the trasaction table ('process_date').

Question: Should I inner join the 'process_date' table to the transaction table or the transaction table to the 'process_date' table?


This is how I would do it

SELECT <<list only columns you need>>   FROM large_historical_transaction_table t  WHERE EXISTS (SELECT 1 FROM OneRowTable o                 WHERE o.last_processing_date = t.process_date)  


An Inner join is a symmetrical, bi-directional relationship, in general it doesn't matter, but in this case I would suggest not joining at all, read the threshold date into a variable and pass it to the other select query as a parameter...


For readability I would inner join from the transaction table to explicitly indicate that the second table with the date just acts like a filter.


When joining tables, the query optimizer takes a quick sniff of both to determine the most appropriate join implementation. Logically the inner join is symmetric, but the implementation may favor one side over the other for improved performance.


A cross join will do it:

SELECT t.col1, t.col2, p.process_date  FROM Transactions t, Process p;  

Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Next Post »