Inner Join and Cross Join yielding the same result - mysql

use classicmodels;
select Orders.OrderNumber,
Customers.CustomerName, Orders.Status, orders.shippeddate,
Customers.Country
from Customers **cross join/inner join** Orders
on Customers.CustomerNumber = Orders.customerNumber
order by 1 asc
Hi all, I'm really confused as to why inner join in my query isn't really any different from the result of the cross join? I thought that cross join would result of a Cartesian product but both joins are giving me 326 rows. I've also seen somewhere that I shouldn't use non-unique data?

From the MySQL JOIN docs:
In MySQL, JOIN, CROSS JOIN, and INNER JOIN are syntactic equivalents
(they can replace each other). In standard SQL, they are not
equivalent. INNER JOIN is used with an ON clause, CROSS JOIN is used
otherwise.

Related

ON clause in SQL join-does order matter?

Are these two same?
- SELECT p.name,o.order_time FROM ORDERS o INNER JOIN PRODUCTS p on **o.product_id=p.id**;
- SELECT p.name,o.order_time FROM ORDERS o INNER JOIN PRODUCTS p on **p.id=o.product_id**;
And are these two the same thing:
- SELECT p.name,o.order_time FROM ORDERS o LEFT JOIN PRODUCTS p on **o.product_id=p.id;**
- SELECT p.name,o.order_time FROM ORDERS o LEFT JOIN PRODUCTS p on **p.id=o.product_id;**
Yes, first 2 queries are exactly same in terms of performance and result. In addition to that, first 2 queries is same as below query,
SELECT p.name,o.order_time FROM PRODUCTS p INNER JOIN ORDERS o on p.id=o.product_id;
Which in turn is same as,
SELECT p.name,o.order_time FROM PRODUCTS p INNER JOIN ORDERS o on o.product_id=p.id;
Note: the order of inner join tables is swapped as well.
3rd and 4th are same as well in terms of performance and result. You can't swap the order of table here, unless dataset guarantees same result.
Based on the dataset, first 2 queries can yield same or different result than last 2 queries.
Reason is = is commutative, it means a=b is same as b=a. Extending it further, just INNER JOIN is also commutative.

What is the difference between JOIN and simple SELECT in MySQL?

I have two mySQL statements. First is:
SELECT o.OrderID, c.CustomerName, o.OrderDate
FROM Customers AS c, Orders AS o
WHERE c.CustomerID=o.CustomerID;
The second is:
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers
ON Orders.CustomerID=Customers.CustomerID;
Both produce the same result, but second doesn't contain reference on Customers table in FROM request.
My question is - what is the difference between these two sql statements? In which cases should I use JOIN and in which cases should I use simple SELECT from two tables?
They are the same except the second is easier to read, so you should use that one.
Those JOIN are different, although the result are the same.
The First one is CROSS JOIN and adds the condition in where, which is implicit CROSS JOIN
The second one is INNER JOIN
If you want to connect two tables I would use INNER JOIN instead of CROSS JOIN Because the intention of the inner join table is clearer

is it possible to UNION and INNER JOIN for sql

im using the w3schools sql database to learn and i have come across a question.
is it possible to sort the results according to CustomerName after i have done a INNER JOIN?
this is the INNER JOIN statement
SELECT Orders.OrderID, Customers.CustomerName, Customers.ContactName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
The UNION command in SQL is used to combine the results of your first query with the entire results of a second query that has the same columns. It is not used for ordering the first query. With your INNER JOIN you can sort the results just by adding ORDER BY:
SELECT Orders.OrderID, Customers.CustomerName, Customers.ContactName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID
ORDER BY Customers.CustomerName;

What is the difference between these 2 JOINs?

I have the following 2 SQL SELECT statements and am not able to wrap my head around how they're different:
SELECT DISTINCT product.maker
FROM product, pc
WHERE pc.model = product.model AND
product.maker NOT IN
(SELECT DISTINCT product.maker
FROM product, laptop
WHERE product.model = laptop.model)
and
SELECT DISTINCT p.maker
FROM Product p INNER JOIN
PC ON p.model = PC.model
WHERE p.maker NOT IN (SELECT ip.maker
FROM Laptop il INNER JOIN
Product ip ON il.model = ip.model
);
EDIT: The database schema is here - http://www.sql-ex.ru/help/select13.php#db_1
So first obvious difference is absence of DISTINCT in second query's subquery.
The other difference that the second one uses the keyword inner join.
Now, the first way of writing the query is the classical way when join keyword was non-existent.
Using join keyword helps when you have multiple types of join to do such as left join, etc.
Usually the query processor will generate the same database operations, so the performance would be same.

Optimize SQL: Customers that haven't ordered for x days

I have created this SQL in order to find customers that haven't ordered for X days.
It is returning a result set, so this post is mainly just to get a second opinion on it, and possible optimizations.
SELECT o.order_id,
o.order_status,
o.order_created,
o.user_id,
i.identity_firstname,
i.identity_email,
(SELECT COUNT(*)
FROM orders o2
WHERE o2.user_id=o.user_id
AND o2.order_status=1) AS order_count,
(SELECT o4.order_created
FROM orders o4
WHERE o4.user_id=o.user_id
AND o4.order_status=1
ORDER BY o4.order_created DESC LIMIT 1) AS last_order
FROM orders o
INNER JOIN user_identities ui ON o.user_id=ui.user_id
INNER JOIN identities i ON ui.identity_id=i.identity_id
AND i.identity_email!=''
INNER JOIN subscribers s ON i.identity_id=s.identity_id
AND s.subscriber_status=1
AND s.subsriber_type=e
AND s.subscription_id=1
WHERE DATE(o.order_created) = "2013-12-14"
AND o.order_status=1
AND o.user_id NOT IN
(SELECT o3.user_id
FROM orders o3
WHERE o3.user_id=o.user_id
AND o3.order_status=1
AND DATE(o3.order_created) > "2013-12-14")
Can you guys find any potential problems with this SQL? Dates are dynamically inserted.
The final SQL that I put in production, will basically only include o.order_id, i.identity_id and o.order_count - this order_count will need to be correct. The other selected fields and 'last_order' subquery will not be included, it's only for testing.
This should give me a list of users that have their last order on that particular day, and is a newsletter subscriber. I am particular in doubt about correctness of the NOT IN part in the WHERE clause, and the order_count subquery.
There are several problems:
A. Using functions on indexable columns
You are searching for orders by comparing DATE(order_created) with some constant. This is a terrible idea, because a) the DATE() function is executed for every row (CPU) and b) the database can't use an index on the column (assuming one existed)
B. Using WHERE ID NOT IN (...)
Using a NOT IN (...) is almost always a bad idea, because optimizers usually have trouble with this construct, and often get the plan wrong. You can almost always express it as an outer join with a WHERE condition that filters for misses using an IS NULL condition for a joined column (and adds the side benefit of not needing DISTINCT, because there's only ever one miss returned)
C. Leaving joins that filtering out of large portions of rows too late
The earlier you can mask off rows by not making joins the better. You can do this by joining less likely to match tables earlier in the joined table list, and by putting non-key conditions into join rather than the where clause to get the rows excluded as early as possible. Some optimizers to this anyway, but I've often found they don't
D. Avoid correlated subqueries like the plague!
You have several correlated subqueries - ones that are executed for every row of the main table. That's really an incredibly bad idea. Again sometimes the optimizer can craft them into a join, but why rely (hope) on that. Most correlated subqueries can be expressed as a join; you examples are no exception.
With the above in mind, there are some specific changes:
o2 and o4 are the same join, so o4 may be dispensed with entirely - just use o2 after conversion to a join
DATE(order_created) = "2013-12-14" should be written as order_created between "2013-12-14 00:00:00" and "2013-12-14 23:59:59"
This query should be what you want:
SELECT
o.order_id,
o.order_status,
o.order_created,
o.user_id,
i.identity_firstname,
i.identity_email,
count(o2.user_id) AS order_count,
max(o2.order_created) AS last_order
FROM orders o
LEFT JOIN orders o2 ON o2.user_id = o.user_id AND o2.order_status=1
LEFT JOIN orders o3 ON o3.user_id = o.user_id
AND o3.order_status=1
AND o3.order_created >= "2013-12-15 00:00:00"
JOIN user_identities ui ON o.user_id=ui.user_id
JOIN identities i ON ui.identity_id=i.identity_id AND i.identity_email != ''
JOIN subscribers s ON i.identity_id=s.identity_id
AND s.subscriber_status=1
AND s.subsriber_type=e
AND s.subscription_id=1
WHERE o.order_created between "2013-12-14 00:00:00" and "2013-12-14 23:59:59"
AND o.order_status=1
AND o3.order_created IS NULL -- This gets only missed joins on o3
GROUP BY
o.order_id,
o.order_status,
o.order_created,
o.user_id,
i.identity_firstname,
i.identity_email;
The last line is how you achieve the same as NOT IN (...) using a LEFT JOIN
Disclaimer: Not tested.
Can't really comment on the results as you have not posted any table declares or example data, but your query has 3 correlated sub queries which is likely to make it perform poorly (OK, one of those is for last_order and is only for testing).
Eliminating the correlated sub queries and replacing them with joins would give something like this:-
SELECT o.order_id,
o.order_status,
o.order_created,
o.user_id,
i.identity_firstname,
i.identity_email,
Sub1.order_count,
Sub2.last_order
FROM orders o
INNER JOIN user_identities ui ON o.user_id=ui.user_id
INNER JOIN identities i ON ui.identity_id=i.identity_id
AND i.identity_email!=''
INNER JOIN subscribers s ON i.identity_id=s.identity_id
AND s.subscriber_status=1
AND s.subsriber_type=e
AND s.subscription_id=1
LEFT OUTER JOIN
(
SELECT user_id, COUNT(*) AS order_count
FROM orders
WHERE order_status=1
GROUP BY user_id
) Sub1
ON o.user_id = Sub1.user_id
LEFT OUTER JOIN
(
SELECT user_id, MAX(order_created) as last_order
FROM orders
WHERE order_status=1
GROUP BY user_id
) AS Sub2
ON o.user_id = Sub2.user_id
LEFT OUTER JOIN
(
SELECT DISTINCT user_id
FROM orders
WHERE order_status=1
AND DATE(order_created) > "2013-12-14"
) Sub3
ON o.user_id = Sub3.user_id
WHERE DATE(o.order_created) = "2013-12-14"
AND o.order_status=1
AND Sub3.user_id IS NULL