Tutorial :Performance of inner join compared to cross join



Question:

The effect of issuing an inner join is the same as stating a cross join with the join condition in the WHERE-clause. I noticed that many people in my company use cross joins, where I would use inner joins. I didn't notice any significant performance gain after changing some of these queries and was wondering if it was just a coincidence or if the DBMS optimizes such issues transparently (MySql in our case). And here a concrete example for discussion:

SELECT User.*  FROM User, Address  WHERE User.addressId = Address.id;    SELECT User.*  FROM User  INNER JOIN Address ON (User.addressId = Address.id);  


Solution:1

Cross Joins produce results that consist of every combination of rows from two or more tables. That means if table A has 6 rows and table B has 3 rows, a cross join will result in 18 rows. There is no relationship established between the two tables â€" you literally just produce every possible combination.

With an inner join, column values from one row of a table are combined with column values from another row of another (or the same) table to form a single row of data.

If a WHERE clause is added to a cross join, it behaves as an inner join as the WHERE imposes a limiting factor.

As long as your queries abide by common sense and vendor specific performance guidelines, I like to think of the decision on which type of join to use to be a simple matter of taste.


Solution:2

There is no difference other than the inner join is a lot clearer because it defines the join, leaving the where clause to be the actual limiting condition.


Solution:3

Use EXPLAIN to view the query plan for both queries, and see if there's any difference. Quite possibly MySQL will use the same execution plan in both cases. I use the INNER JOIN syntax mainly because it's a lot clearer.


Solution:4

I find that work-places that allow the first syntax (comma separated tables) tend to have significant time taken up debugging cases where more rows are returned than intended. Unintentional cross joins are the bane of a system, and can bring even the most well-tuned database to it's knees. It has brought our pre-prod system to a screeching halt on at least two occasions in the last year.

The second syntax (join syntax) forces the writer to think about how the tables are joined together first, and then only return the interesting rows. It is impossible to accidentally do a cross join using this syntax, and thus the danger of accidental poorly performing queries is reduced.

However, that issue aside, I have never noticed any speed difference between the two syntaxes in any systems I have had.


Solution:5

The first example is functionally the same as the second example. However, this syntax should be avoided for several reasons. First it is much easier to accidentally get a cross join when using this syntax especially when there are mulitple joins in the table. If you see a lot of this type of query with the keyword distinct, you probably have someone who is trying to fix the cross joins.

Next, the left and right join syntax using the older style is deprecated and will no longer be supported. Further, it doesn't work correctly now anyway. Sometimes it misinterprets the outer join and sends back the wrong results set. So any queries you have using = or = in the where clause should immediately be replaced.

Third, ANSI standard joins are easier to understand and maintain. An understanding of joins is one of the most critical basic skills that anyone querying any relational database needs to have. It has been my experience that some people who use the older style don't really understand joins and how they work and thus write queries that do not actually do what they intended.


Solution:6

The order in which you join tables or you put your ON / WHERE conditions should not matter.

The query optimizer should optimize and use the best order anyway (and chosing how to best filter the data, where to start, etc)

As many others though, I suggest using the INNER JOIN syntax, since it makes things much more readable, it is more transparent with the syntax of LEFT or FULL joins as well.

There's a somewhat more dwelling text about it here: http://linus.brimstedt.se/?/article/articleview/SQL Syntax

/B


Solution:7

One additional benefit of the first syntax is you can be more general in your limiting condition. Not just equality.

But if you are using equality, why trust the optimizer? Make sure it won't first generate the cross join and then eliminate rows. Use the second one.


Solution:8

SQL Server said "When a WHERE turns a Cross Join into an Inner Join", so there are not difference. http://msdn.microsoft.com/en-us/library/ms190690.aspx

I did SQL server "Execution plan" the Performance is same.


Solution:9

Explaining both queries gives same output

mysql> explain select * from t T1, t T2 where T1.ID=T2.ID;  +----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+  | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                          |  +----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+  |  1 | SIMPLE      | T1    | ALL  | PRIMARY       | NULL | NULL    | NULL |    3 |                                |  |  1 | SIMPLE      | T2    | ALL  | PRIMARY       | NULL | NULL    | NULL |    3 | Using where; Using join buffer |  +----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+  2 rows in set (0.00 sec)    mysql> explain select * from t T1  join t T2 on T1.ID=T2.ID;  +----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+  | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                          |  +----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+  |  1 | SIMPLE      | T1    | ALL  | PRIMARY       | NULL | NULL    | NULL |    3 |                                |  |  1 | SIMPLE      | T2    | ALL  | PRIMARY       | NULL | NULL    | NULL |    3 | Using where; Using join buffer |  +----+-------------+-------+------+---------------+------+---------+------+------+--------------------------------+  2 rows in set (0.00 sec)  

But using inner join syntax is preferable as its more clearer and more precise. Mysql may internally tune Left and Right join queries to select less data as compared to Cross Join.


Solution:10

Since the beginning of time optimizers have being built around classic restrict-project-cartesian product syntax. Virtually all the vendors copied the design pioneered by System R. Then, grudgingly, vendors adopted "the latest-and-greatest" ANSI syntax and retrofitted their SQL execution engines. Contrary to what marketing brochure can tell you ("use the latest syntax"), not much on physical implementation level has been changed: it is still [indexed] nested loops, or hash or sort-merge join. Therefore, there is no basis to assume superiority of one syntax over the other.

To my personal taste, the new syntax is redundant, noisy, and inconsistent. As to being sanctioned by the committee, "walk into any park in every city and you'll find no statue of committee".


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »