I need help regarding JOIN query in mysql - mysql

I have started learning MySQL and I'm having a problem with JOIN.
I have two tables: purchase and sales
purchase
--------------
p_id date p_cost p_quantity
---------------------------------------
1 2014-03-21 100 5
2 2014-03-21 20 2
sales
--------------
s_id date s_cost s_quantity
---------------------------------------
1 2014-03-21 90 9
2 2014-03-22 20 2
I want these two tables to be joined where purchase.date=sales.date to get one of the following results:
Option 1:
p_id date p_cost p_quantity s_id date s_cost s_quantity
------------------------------------------------------------------------------
1 2014-03-21 100 5 1 2014-03-21 90 9
2 2014-03-21 20 2 NULL NULL NULL NULL
NULL NULL NULL NULL 2 2014-03-22 20 2
Option 2:
p_id date p_cost p_quantity s_id date s_cost s_quantity
------------------------------------------------------------------------------
1 2014-03-21 100 5 NULL NULL NULL NULL
2 2014-03-21 20 2 1 2014-03-21 90 9
NULL NULL NULL NULL 2 2014-03-22 20 2
the main problem lies in the 2nd row of the first result. I don't want the values
2014-03-21, 90, 9 again in row 2... I want NULL instead.
I don't know whether it is possible to do this. It would be kind enough if anyone helps me out.
I tried using left join
SELECT *
FROM sales
LEFT JOIN purchase ON sales.date = purchase.date
output:
s_id date s_cost s_quantity p_id date p_cost p_quantity
1 2014-03-21 90 9 1 2014-03-21 100 5
1 2014-03-21 90 9 2 2014-03-21 20 2
2 2014-03-22 20 2 NULL NULL NULL NULL
but I want 1st 4 values of 2nd row to be NULL

Since there are no common table expressions or full outer joins to work with, the query will have some duplication and instead need to use a left join unioned with a right join;
SELECT p_id, p.date p_date, p_cost, p_quantity,
s_id, s.date s_date, s_cost, s_quantity
FROM (
SELECT *,(SELECT COUNT(*) FROM purchase p1
WHERE p1.date=p.date AND p1.p_id<p.p_id) rn FROM purchase p
) p LEFT JOIN (
SELECT *,(SELECT COUNT(*) FROM sales s1
WHERE s1.date=s.date AND s1.s_id<s.s_id) rn FROM sales s
) s
ON s.date=p.date AND s.rn=p.rn
UNION
SELECT p_id, p.date p_date, p_cost, p_quantity,
s_id, s.date s_date, s_cost, s_quantity
FROM (
SELECT *,(SELECT COUNT(*) FROM purchase p1
WHERE p1.date=p.date AND p1.p_id<p.p_id) rn FROM purchase p
) p RIGHT JOIN (
SELECT *,(SELECT COUNT(*) FROM sales s1
WHERE s1.date=s.date AND s1.s_id<s.s_id) rn FROM sales s
) s
ON s.date=p.date AND s.rn=p.rn
An SQLfiddle to test with.

In a general sense, what you're looking for is called a FULL OUTER JOIN, which is not directly available in MySQL. Instead you only get LEFT JOIN and RIGHT JOIN, which you can UNION together to get essentially the same result. For a very thorough discussion on this subject, see Full Outer Join in MySQL.
If you need help understanding the different ways to JOIN a table, I recommend A Visual Explanation of SQL Joins.
The way this is different from a regular FULL OUTER JOIN is that you're only including any particular row from either table at most once in the JOIN result. The problem being, if you have one purchase record and two sales records on a particular day, which sales record is the purchase record associated with? What is the relationship you're trying to represent between these two tables?
It doesn't sound like there's any particular relationship between purchase and sales records, except that some of them happened to take place on the same day. In which case, you're using the wrong tool for the job. If all you want to do is display these tables side by side and line the rows up by date, you don't need a JOIN at all. Instead, you should SELECT each table separately and do your formatting with some other tool (or manually).

Here's another way to get the same result, but the EXPLAIN for this is horrendous; and performance with large sets is going to be atrocious.
This is essentially two queries UNIONed together. The first query is essentially "purchase LEFT JOIN sales", the second query is essentially "sales ANTI JOIN purchase".
Because there is no foreign key relationship between the two tables, other than rows matching on date, we have to "invent" a key we can join on; we use user variables to assign ascending integer values to each row within a given date, so we can match row 1 from purchase to row 1 from sales, etc.
I wouldn't normally generate this type of result using SQL; it's not a typical JOIN operation, in the sense of how we traditionally join tables.
But, if I had to produce the specified resultset using MySQL, I would do it like this:
SELECT p.p_id
, p.p_date
, p.p_cost
, p.p_quantity
, s.s_id
, s.s_date
, s.s_cost
, s.s_quantity
FROM ( SELECT #pl_i := IF(pl.date = #pl_prev_date,#pl_i+1,1) AS i
, #pl_prev_date := pl.date AS p_date
, pl.p_id
, pl.p_cost
, pl.p_quantity
FROM purchase pl
JOIN ( SELECT #pl_i := 0, #pl_prev_date := NULL ) pld
ORDER BY pl.date, pl.p_id
) p
LEFT
JOIN ( SELECT #sr_i := IF(sr.date = #sr_prev_date,#sr_i+1,1) AS i
, #sr_prev_date := sr.date AS s_date
, sr.s_id
, sr.s_cost
, sr.s_quantity
FROM sales sr
JOIN ( SELECT #sr_i := 0, #sr_prev_date := NULL ) srd
ORDER BY sr.date, sr.s_id
) s
ON s.s_date = p.p_date
AND s.i = p.i
UNION ALL
SELECT p.p_id
, p.p_date
, p.p_cost
, p.p_quantity
, s.s_id
, s.s_date
, s.s_cost
, s.s_quantity
FROM ( SELECT #sl_i := IF(sl.date = #sl_prev_date,#sl_i+1,1) AS i
, #sl_prev_date := sl.date AS s_date
, sl.s_id
, sl.s_cost
, sl.s_quantity
FROM sales sl
JOIN ( SELECT #sl_i := 0, #sl_prev_date := NULL ) sld
ORDER BY sl.date, sl.s_id
) s
LEFT
JOIN ( SELECT #pr_i := IF(pr.date = #pr_prev_date,#pr_i+1,1) AS i
, #pr_prev_date := pr.date AS p_date
, pr.p_id
, pr.p_cost
, pr.p_quantity
FROM purchase pr
JOIN ( SELECT #pr_i := 0, #pr_prev_date := NULL ) prd
ORDER BY pr.date, pr.p_id
) p
ON p.p_date = s.s_date
AND p.i = s.i
WHERE p.p_date IS NULL
ORDER BY COALESCE(p_date,s_date),COALESCE(p_id,s_id)

Related

Group_Concat with multiple joined tables

I have two main tables that comprise bookings for events.
A Registrants table (Bookings) R and an Events table E.
There are also two connected tables, Field_Values V and Event_Categories C
This diagram shows the relationship
What I am trying to do is create an Invoice query that mirrors the user's shopping cart. Often a user will book multiple events in one transaction, so my invoice should have columns for the common items e.g. User Name, User Email, Booking Date, Transaction ID and aggregated columns for the invoice line item values e.g. Quantity "1,2" Description "Desc1, Desc2" Price "10.00, 20.00" where there are two line items in the shopping cart.
The Transaction ID (dcea4_eb_registrant.transaction_id) is unique per Invoice and repeated per line item in that sale.
I have the following query which produces rows for each line item
SELECT
R.id as ID,
E.event_date as ServiceDate,
E.event_date - INTERVAL 1 DAY as DueDate,
Concat('Ad-Hoc Booking:',E.title) as ItemProductService,
Concat(R.first_name, ' ',R.last_name) as Customer,
R.first_name as FirstName,
R.last_name as LastName,
R.email,
R.register_date as InvoiceDate,
R.amount as ItemAmount,
R.comment,
R.number_registrants as ItemQuantity,
R.transaction_id as InvoiceNo,
R.published as Status,
E.event_date AS SERVICEDATE,
Concat('Ad-Hoc Booking:',E.title) AS DESCRIPTION,
R.number_registrants AS QUANTITY,
FORMAT(R.amount / R.number_registrants,2) AS RATE,
R.amount AS AMOUNT,
C.category_id as CLASS,
Concat(Group_Concat(V.field_value SEPARATOR ', '),'. ',R.comment) as Memo
FROM dcea4_eb_events E
LEFT JOIN dcea4_eb_registrants R ON R.event_id = E.id
LEFT JOIN dcea4_eb_field_values V ON V.registrant_id = R.id
LEFT JOIN dcea4_eb_event_categories C ON C.event_id = R.event_id
WHERE 1=1
AND V.field_id IN(14,26,27,15)
AND R.published <> 2 /*Including this line omits Cancelled Invoices */
AND R.published IS NOT NULL
AND (R.published = 1 OR R.payment_method = "os_offline")
AND (R.register_date >= CURDATE() - INTERVAL 14 DAY)
GROUP BY E.event_date, E.title, R.id, R.first_name, R.last_name, R.email,R.register_date, R.amount, R.comment
ORDER BY R.register_date DESC, R.transaction_id
This produces output like this
I'm using the following query to try to group together the rows with a common transaction_ID (rows two and three in the last picture) - I add group_concat on the columns I want to aggregate and change the Group By to be the transaction_id
SELECT
R.id as ID,
E.event_date as ServiceDate,
E.event_date - INTERVAL 1 DAY as DueDate,
Concat('Ad-Hoc Booking:',E.title) as ItemProductService,
Concat(R.first_name, ' ',R.last_name) as Customer,
R.first_name as FirstName,
R.last_name as LastName,
R.email,
R.register_date as InvoiceDate,
R.amount as ItemAmount,
R.comment,
R.number_registrants as ItemQuantity,
R.transaction_id as InvoiceNo,
R.published as Status,
Group_ConCat( E.event_date) AS SERVICEDATE,
Group_ConCat( Concat('Ad-Hoc Booking:',E.title)) AS DESCRIPTION,
Group_ConCat( R.number_registrants) AS QUANTITY,
Group_ConCat( FORMAT(R.amount / R.number_registrants,2)) AS RATE2,
Group_ConCat( R.amount) AS AMOUNT,
Group_ConCat( C.category_id) as CLASS,
Concat(Group_Concat(V.field_value SEPARATOR ', '),'. ',R.comment) as Memo
FROM dcea4_eb_events E
LEFT JOIN dcea4_eb_registrants R ON R.event_id = E.id
LEFT JOIN dcea4_eb_field_values V ON V.registrant_id = R.id
LEFT JOIN dcea4_eb_event_categories C ON C.event_id = R.event_id
WHERE 1=1
AND V.field_id IN(14,26,27,15)
AND R.published <> 2 /*Including this line omits Cancelled Invoices */
AND R.published IS NOT NULL
AND (R.published = 1 OR R.payment_method = "os_offline")
AND (R.register_date >= CURDATE() - INTERVAL 14 DAY)
GROUP BY R.transaction_id
ORDER BY R.register_date DESC, R.transaction_id
But this produces this output
It seems to be multiplying the rows. The Quantity column in the first row should just be 1 and in the second row it should be 2,1 .
I've tried using Group_Concat with DISTINCT but this doesn't work because often the values being concatenated are the same (e.g. the price for two events being booked are both the same) and the query only returns one value e.g. 10 and not 10, 10. The latter being what I need.
I'm guessing the issue is around the way the tables are joined but I'm struggling to work out how to get what I need.
Pointers in the right direction most appreciated.
You seem determined to go in what seems to me to be the wrong direction, so here's a gentle nudge down that hill...
Consider the following...
CREATE TABLE users
(user_id SERIAL PRIMARY KEY
,username VARCHAR(12) UNIQUE
);
INSERT INTO users VALUES
(101,'John'),(102,'Paul'),(103,'George'),(104,'Ringo');
DROP TABLE IF EXISTS sales;
CREATE TABLE sales
(sale_id SERIAL PRIMARY KEY
,purchaser_id INT NOT NULL
,item_code CHAR(1) NOT NULL
,quantity INT NOT NULL
);
INSERT INTO sales VALUES
( 1,101,'A',1),
( 2,103,'A',2),
( 3,103,'A',3),
( 4,104,'A',1),
( 5,104,'A',2),
( 6,104,'A',3),
( 7,103,'B',2),
( 8,103,'B',2),
( 9,104,'B',3),
(10,103,'B',2),
(11,104,'B',2),
(12,104,'B',1);
SELECT u.*
, x.sale_ids
, x.item_codes
, x.quantities
FROM users u
LEFT
JOIN
( SELECT purchaser_id
, GROUP_CONCAT(sale_id ORDER BY sale_id) sale_ids
, GROUP_CONCAT(item_code ORDER BY sale_id) item_codes
, GROUP_CONCAT(quantity ORDER BY sale_id) quantities
FROM sales
GROUP
BY purchaser_id
) x
ON x.purchaser_id = u.user_id;
+---------+----------+---------------+-------------+-------------+
| user_id | username | sale_ids | item_codes | quantities |
+---------+----------+---------------+-------------+-------------+
| 101 | John | 1 | A | 1 |
| 102 | Paul | NULL | NULL | NULL |
| 103 | George | 2,3,7,8,10 | A,A,B,B,B | 2,3,2,2,2 |
| 104 | Ringo | 4,5,6,9,11,12 | A,A,A,B,B,B | 1,2,3,3,2,1 |
+---------+----------+---------------+-------------+-------------+

How to get all entries with SQL query with join

kon
id | name
1 alex
2 peter
3 john
ticket
id | amount | kon_id | package
122 13 1 234
123 12 1 234
124 20 2 NULL
125 23 2 235
126 19 1 236
I would like to get a list of all contacts with the sum of the amount, except tickets, where the package entry is NULL.
My problem is, that I only get the contacts which have a ticket, because of the WHERE clause.
SELECT
kon.id,
kon.name,
SUM(ticket.amount)
FROM kon LEFT JOIN ticket ON kon.id = ticket.kon_id
WHERE ticket.package IS NOT NULL
GROUP BY kon.id
At the moment, the output looks like this
1 alex 44
2 peter 23
but it should look like this
1 alex 44
3 john NULL
2 peter 23
I use a MySQL Server.
Is it possible to solve this?
Replace Where with AND
SELECT
kon.id,
kon.name,
SUM(ticket.amount)
FROM kon LEFT JOIN ticket ON kon.id = ticket.kon_id AND ticket.package IS NOT NULL
GROUP BY kon.id
Check This.
SELECT
k.id,
k.name ,
coalesce (SUM(t.amount) ,0)
FROM kon k LEFT JOIN
( select id,amount,kon_id,package from ticket where package is not null ) t
ON k.id = t.kon_id
GROUP BY k.id, k.name
OutPut :
Begin Tran
Create Table #Kon (id INt , name Nvarchar(255))
Insert into #Kon
Select 1,'alex' UNION ALL
Select 2,'peter' UNION ALL
Select 3,'john'
Create Table #Ticket (id int,amount int,Kon_Id Int,Package Int)
INSERT INTO #Ticket
SELECT 122,13,1,234 UNION ALL
SELECT 123,12,1,234 UNION ALL
SELECT 124,20,2,NULL UNION ALL
SELECT 125,23,2,235 UNION ALL
SELECT 126,19,1,236
SELECT K.id, Name,SUM(amount) amount
FROM #Kon k
LEFT JOIN #Ticket T ON K.id=T.Kon_Id
GROUP BY K.id,Name
RollBAck Tran
Generally, "ticket.package IS NOT NULL" is wrong condition: your query becomes inner join from left join. If ticket.package should be NOT NULL to add from amount, it should be not in condition, but inside SUM agregate function.
working example for MS SQL
SELECT
kon.id,
min(kon.name),
SUM(case when package is NULL then 0 else ticket.amount end)
FROM #kon kon LEFT JOIN #ticket ticket ON kon.id = ticket.kon_id
GROUP BY kon.id
Answer from Mr. Bhosale is right too, but for big tables will have worse performance (the reason is subquery)
the following query return your expected result
SELECT
kon.id,
kon.name,
SUM(ticket.amount) as 'amount'
FROM kon LEFT JOIN ticket ON kon.id = ticket.kon_id
GROUP BY kon.id, kon.name
attached image shows the result
I figured out the fastest way to solve the problem. It takes about 0.2s compared to the other solutions (2s - 2min). The CAST is important, otherwise the summation of double variables is wrong (float string problem).
SELECT
kon1,
kon2,
SUM(CAST(kon3 AS DECIMAL(7,2)))
FROM (
SELECT k.id kon1, k.name kon2, t.amount kon3 FROM kon as k
LEFT JOIN ticket t ON k.id = t.ticket_kon
WHERE t.package IS NOT NULL
UNION ALL
SELECT k.id kon1, k.name kon2, NULL kon3 FROM kon k WHERE) t1
GROUP BY kon1, kon2

Compare data using the year in a query

I have a data and they are recorded by each year, I am trying to compare two years( the past year and the current year) data within one mysql query
Below are my tables
Cost Items
| cid | items |
| 1 | A |
| 2 | B |
Cost
| cid | amount | year |
| 1 | 10 | 1 |
| 1 | 20 | 2 |
| 1 | 30 | 1 |
This is the result I am expecting when i want to compare the year 1 and year 2. Year 1 is the past year and year 2 is the current year
Results
items | pastCost | currentCost |
A | 10 | 20 |
A | 30 | 0 |
However the below query is what i used by gives a strange answer.
SELECT
IFNULL(ps.`amount`, '0') as pastCost
IFNULL(cs.`amount`, '0') as currentCost
FROM
`Cost Items` b
LEFT JOIN
`Cost` ps
ON
b.cID=ps.cID
AND
ps.Year = 1
LEFT JOIN
`Cost` cu
ON
b.cID=cu.cID
AND
cu.Year =2
This is the result i get from my query
items | pastCost | currentCost |
A | 10 | 20 |
A | 30 | 20 |
Please what am i doing wrong? Thanks for helping.
I'm missing something about your query; the SQL text shown can't produce that result.
There is no source for the items column in the SELECT list, and there is no table aliased as cs. (Looks like the expression in the SELECT list would need to be cu.amount
Aside from that, the results being returned look exactly like what we'd expect. Each row returned from year=2 is being matched with each row returned from year=1. If there were three rows for year=1 and two rows for year=2, we'd get six rows back... each row for year=1 "matched" with each row for year=2.
If (cid, year) tuple was UNIQUE in Cost, then this query would return a result similar to what you expect.
SELECT b.items
, IFNULL(ps.amount, '0') AS pastCost
, IFNULL(cu.amount, '0') AS currentCost
FROM `Cost Items` b
LEFT
JOIN `Cost` ps
ON ps.cid = b.cid
AND ps.Year = 1
LEFT
JOIN `Cost` cu
ON cu.cid = b.cid
AND cu.Year = 2
Since (cid, year) is not unique, you need some additional column to "match" a single row for year=1 with a single row for year=2.
Without some other column in the table, we could use an inline view to generate a value. I can illustrate how we can make MySQL return a resultset like the one you show, one way that could be done, but I don't think this is really the solution to whatever problem you are trying to solve:
SELECT b.items
, IFNULL(MAX(IF(a.year=1,a.amount,NULL)),0) AS pastCost
, IFNULL(MAX(IF(a.year=2,a.amount,NULL)),0) AS currentCost
FROM `Cost Items` b
LEFT
JOIN ( SELECT #rn := IF(c.cid=#p_cid AND c.year=#p_year,#rn+1,1) AS `rn`
, #p_cid := c.cid AS `cid`
, #p_year := c.year AS `year`
, c.amount
FROM (SELECT #p_cid := NULL, #p_year := NULL, #rn := 0) i
JOIN `Cost` c
ON c.year IN (1,2)
ORDER BY c.cid, c.year, c.amount
) a
ON a.cid = b.cid
GROUP
BY b.cid
, a.rn
A query something like that would return a resultset that looks like the one you are expecting. But again, I strongly suspect that this is not really the resultset you are really looking for.
EDIT
OP leaves comment with vaguely nebulous report of observed behavior: "the above solution doesnt work"
Well then, let's check it out... create a SQL Fiddle with some tables so we can test the query...
SQL Fiddle here http://sqlfiddle.com/#!9/e3d7e/1
create table `Cost Items` (cid int unsigned, items varchar(5));
insert into `Cost Items` (cid, items) values (1,'A'),(2,'B');
create table `Cost` (cid int unsigned, amount int, year int);
insert into `Cost` (cid, amount, year) VALUES (1,10,1),(1,20,2),(1,30,1);
And when we run the query, we get a syntax error. There's closing paren missing in the expressions in the SELECT list, easy enough to fix.
SELECT b.items
, IFNULL(MAX(IF(a.year=1,a.amount,NULL)),0) AS pastCost
, IFNULL(MAX(IF(a.year=2,a.amount,NULL)),0) AS currentCost
FROM `Cost Items` b
LEFT
JOIN ( SELECT #rn := IF(c.cid=#p_cid AND c.year=#p_year,#rn+1,1) AS `rn`
, #p_cid := c.cid AS `cid`
, #p_year := c.year AS `year`
, c.amount
FROM (SELECT #p_cid := NULL, #p_year := NULL, #rn := 0) i
JOIN `Cost` c
ON c.year IN (1,2)
ORDER BY c.cid, c.year, c.amount
) a
ON a.cid = b.cid
GROUP
BY b.cid
, a.rn
Returns:
items pastCost currentCost
------ -------- -----------
A 10 20
A 30 0
B 0 0

What to do with Full Outer Join

I need a Full Outer Join in mysql. I found a solution here: Full Outer Join in MySQL My problem is that t1 and t2 are subqueries themselves. So resulting query looks like a monster.
What to do in this situation? Should I use views instead of subqueries?
Edit:
I'll try to explain a bit more. I have orders and payments. One payment can cower multiple orders, and one order can be cowered by multiple payments. That is why I have tables orders, payments, and paymentitems. Each order has field company (which made this order) and manager (which accepted this order). Now I need to group orders and payments by company and manager and count money. So I want to get something like this:
company1 | managerA | 200 | 200 | 0
company1 | managerB | Null | 100 | 100
company1 | managerC | 300 | Null | -300
company2 | managerA | 150 | Null | -150
company2 | managerB | 100 | 350 | 250
The query, I managed to create:
SELECT coalesce(o.o_company, p.o_company)
, coalesce(o.o_manager, p.o_manager)
, o.orderstotal
, p.paymentstotal
, (coalesce(p.paymentstotal, 0) - coalesce(o.orderstotal, 0)) AS balance
FROM
(((/*Subquery A*/SELECT orders.o_company
, orders.o_manager
, sum(o_money) AS orderstotal
FROM
orders
WHERE
(o_date >= #startdate)
AND (o_date <= #enddate)
GROUP BY
o_company
, o_manager) AS o
LEFT JOIN (/*Subquery B*/SELECT orders.o_company
, orders.o_manager
, sum(paymentitems.p_money) AS paymentstotal
FROM
((payments
INNER JOIN paymentitems
ON payments.p_id = paymentitems.p_id)
INNER JOIN orders
ON paymentitems.p_oid = orders.o_id)
WHERE
(payments.p_date >= #startdate)
AND (payments.p_date <= #enddate)
GROUP BY
orders.o_company
, orders.o_manager) AS p
ON (o.o_company = p.o_company) and (o.o_manager = p.o_manager))
union
(/*Subquery A*/
right join /*Subquery B*/
ON (o.o_company = p.o_company) and (o.o_manager = p.o_manager)))
This is simplified version of my query. Real query is much more complex, that is why I want to keep it as simple as it can be. Maybe even split in to views, or may be there are other options I am not aware of.
I think the clue is in "group orders and payments by company". Break the outer join into a query on orders and another query on payments, then add up the type of money (orders or payments) for each company.
If you are trying to do a full outer join and the relationship is 1-1, then you can accomplish the same thing with a union and aggreagation.
Here is an example, pulling one column from two different tables:
select id, max(col1) as col1, max(col2) as col2
from ((select t1.id, t1.col1, NULL as col2
from t1
) union all
(select t23.id, NULL as col1, t2.col2
from t2
)
) t
group by id

Why does this WHERE clause make my query 180 times slower?

the following query executes in 1.6 seconds
SET #num :=0, #current_shop_id := NULL, #current_product_id := NULL;
#this query limits the results of the query within it by row number (so that only 250 products get displayed per store)
SELECT * FROM (
#this query adds row numbers to the query within it
SELECT *, #num := IF( #current_shop_id = shop_id, IF(#current_product_id=product_id,#num,#num+1), 0) AS row_number, #current_shop_id := shop_id AS shop_dummy, #current_product_id := product_id AS product_dummy FROM (
SELECT shop, shops.shop_id AS
shop_id, p1.product_id AS
product_id
FROM products p1 LEFT JOIN #this LEFT JOIN gets the favorites count for each product
(
SELECT fav3.product_id AS product_id, SUM(CASE
WHEN fav3.current = 1 AND fav3.closeted = 1 THEN 1
WHEN fav3.current = 1 AND fav3.closeted = 0 THEN -1
ELSE 0
END) AS favorites_count
FROM favorites fav3
GROUP BY fav3.product_id
) AS fav4 ON p1.product_id=fav4.product_id
INNER JOIN sex ON sex.product_id=p1.product_id AND
sex.sex=0 AND
sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
INNER JOIN shops ON shops.shop_id = p1.shop_id
ORDER BY shop, sex.DATE, product_id
) AS testtable
) AS rowed_results WHERE
rowed_results.row_number>=0 AND
rowed_results.row_number<(7)
adding AND shops.shop_id=86 to the final WHERE clause causes the query to execute in 292 seconds:
SET #num :=0, #current_shop_id := NULL, #current_product_id := NULL;
#this query limits the results of the query within it by row number (so that only 250 products get displayed per store)
SELECT * FROM (
#this query adds row numbers to the query within it
SELECT *, #num := IF( #current_shop_id = shop_id, IF(#current_product_id=product_id,#num,#num+1), 0) AS row_number, #current_shop_id := shop_id AS shop_dummy, #current_product_id := product_id AS product_dummy FROM (
SELECT shop, shops.shop_id AS
shop_id, p1.product_id AS
product_id
FROM products p1 LEFT JOIN #this LEFT JOIN gets the favorites count for each product
(
SELECT fav3.product_id AS product_id, SUM(CASE
WHEN fav3.current = 1 AND fav3.closeted = 1 THEN 1
WHEN fav3.current = 1 AND fav3.closeted = 0 THEN -1
ELSE 0
END) AS favorites_count
FROM favorites fav3
GROUP BY fav3.product_id
) AS fav4 ON p1.product_id=fav4.product_id
INNER JOIN sex ON sex.product_id=p1.product_id AND
sex.sex=0 AND
sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
INNER JOIN shops ON shops.shop_id = p1.shop_id AND
shops.shop_id=86
ORDER BY shop, sex.DATE, product_id
) AS testtable
) AS rowed_results WHERE
rowed_results.row_number>=0 AND
rowed_results.row_number<(7)
I would have thought limiting the shops table with AND shops.shop_id=86 would reduce execution time. Instead, execution time appears to depend upon the number of rows in the products table with products.shop_id equal to the specified shops.shop_id. There are about 34K rows in the products table with products.shop_id=86, and execution time is 292 seconds. For products.shop_id=50, there are about 28K rows, and execution time is 210 seconds. For products.shop_id=175, there are about 2K rows, and execution time is 2.8 seconds. What is going on?
EXPLAIN EXTENDED for the 1.6 second query is:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 1203 100.00 Using where
2 DERIVED <derived3> ALL NULL NULL NULL NULL 1203 100.00
3 DERIVED sex ALL product_id_2,product_id NULL NULL NULL 526846 75.00 Using where; Using temporary; Using filesort
3 DERIVED p1 eq_ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 PRIMARY 4 mydatabase.sex.product_id 1 100.00
3 DERIVED <derived4> ALL NULL NULL NULL NULL 14752 100.00
3 DERIVED shops eq_ref PRIMARY PRIMARY 4 mydatabase.p1.shop_id 1 100.00
4 DERIVED fav3 ALL NULL NULL NULL NULL 15356 100.00 Using temporary; Using filesort
SHOW WARNINGS for this EXPLAIN EXTENDED is
-----+
| Note | 1003 | select `rowed_results`.`shop` AS `shop`,`rowed_results`.`shop_id` AS `shop_id`,`rowed_results`.`product_id` AS `product_id`,`rowed_results`.`row_number` AS `row_number`,`rowed_results`.`shop_dummy` AS `shop_dummy`,`rowed_results`.`product_dummy` AS `product_dummy` from (select `testtable`.`shop` AS `shop`,`testtable`.`shop_id` AS `shop_id`,`testtable`.`product_id` AS `product_id`,(#num:=if(((#current_shop_id) = `testtable`.`shop_id`),if(((#current_product_id) = `testtable`.`product_id`),(#num),((#num) + 1)),0)) AS `row_number`,(#current_shop_id:=`testtable`.`shop_id`) AS `shop_dummy`,(#current_product_id:=`testtable`.`product_id`) AS `product_dummy` from (select `mydatabase`.`shops`.`shop` AS `shop`,`mydatabase`.`shops`.`shop_id` AS `shop_id`,`mydatabase`.`p1`.`product_id` AS `product_id` from `mydatabase`.`products` `p1` left join (select `mydatabase`.`fav3`.`product_id` AS `product_id`,sum((case when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 1)) then 1 when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 0)) then -(1) else 0 end)) AS `favorites_count` from `mydatabase`.`favorites` `fav3` group by `mydatabase`.`fav3`.`product_id`) `fav4` on(((`mydatabase`.`p1`.`product_id` = `mydatabase`.`sex`.`product_id`) and (`fav4`.`product_id` = `mydatabase`.`sex`.`product_id`))) join `mydatabase`.`sex` join `mydatabase`.`shops` where ((`mydatabase`.`sex`.`sex` = 0) and (`mydatabase`.`p1`.`product_id` = `mydatabase`.`sex`.`product_id`) and (`mydatabase`.`shops`.`shop_id` = `mydatabase`.`p1`.`shop_id`) and (`mydatabase`.`sex`.`date` >= (now() - interval 1 day))) order by `mydatabase`.`shops`.`shop`,`mydatabase`.`sex`.`date`,`mydatabase`.`p1`.`product_id`) `testtable`) `rowed_results` where ((`rowed_results`.`row_number` >= 0) and (`rowed_results`.`row_number` < 7)) |
+------
EXPLAIN EXTENDED for the 292 second query is:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 36 100.00 Using where
2 DERIVED <derived3> ALL NULL NULL NULL NULL 36 100.00
3 DERIVED shops const PRIMARY PRIMARY 4 1 100.00 Using temporary; Using filesort
3 DERIVED p1 ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 shop_id 4 11799 100.00
3 DERIVED <derived4> ALL NULL NULL NULL NULL 14752 100.00
3 DERIVED sex eq_ref product_id_2,product_id product_id_2 5 mydatabase.p1.product_id 1 100.00 Using where
4 DERIVED fav3 ALL NULL NULL NULL NULL 15356 100.00 Using temporary; Using filesort
SHOW WARNINGS for this EXPLAIN EXTENDED is
----+
| Note | 1003 | select `rowed_results`.`shop` AS `shop`,`rowed_results`.`shop_id` AS `shop_id`,`rowed_results`.`product_id` AS `product_id`,`rowed_results`.`row_number` AS `row_number`,`rowed_results`.`shop_dummy` AS `shop_dummy`,`rowed_results`.`product_dummy` AS `product_dummy` from (select `testtable`.`shop` AS `shop`,`testtable`.`shop_id` AS `shop_id`,`testtable`.`product_id` AS `product_id`,(#num:=if(((#current_shop_id) = `testtable`.`shop_id`),if(((#current_product_id) = `testtable`.`product_id`),(#num),((#num) + 1)),0)) AS `row_number`,(#current_shop_id:=`testtable`.`shop_id`) AS `shop_dummy`,(#current_product_id:=`testtable`.`product_id`) AS `product_dummy` from (select 'shop.nordstrom.com' AS `shop`,'86' AS `shop_id`,`mydatabase`.`p1`.`product_id` AS `product_id` from `mydatabase`.`products` `p1` left join (select `mydatabase`.`fav3`.`product_id` AS `product_id`,sum((case when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 1)) then 1 when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 0)) then -(1) else 0 end)) AS `favorites_count` from `mydatabase`.`favorites` `fav3` group by `mydatabase`.`fav3`.`product_id`) `fav4` on(((`fav4`.`product_id` = `mydatabase`.`p1`.`product_id`) and (`mydatabase`.`sex`.`product_id` = `mydatabase`.`p1`.`product_id`))) join `mydatabase`.`sex` join `mydatabase`.`shops` where ((`mydatabase`.`sex`.`sex` = 0) and (`mydatabase`.`sex`.`product_id` = `mydatabase`.`p1`.`product_id`) and (`mydatabase`.`p1`.`shop_id` = 86) and (`mydatabase`.`sex`.`date` >= (now() - interval 1 day))) order by 'shop.nordstrom.com',`mydatabase`.`sex`.`date`,`mydatabase`.`p1`.`product_id`) `testtable`) `rowed_results` where ((`rowed_results`.`row_number` >= 0) and (`rowed_results`.`row_number` < 7)) |
+-----
I am running MySQL client version: 5.1.56. The shops table has a primary index on shop_id:
Action Keyname Type Unique Packed Column Cardinality Collation Null Comment
Edit Drop PRIMARY BTREE Yes No shop_id 163 A
I have analyzed the shop table but this did not help.
I notice that if I remove the LEFT JOIN the difference in execution times drops to 0.12 seconds versus 0.28 seconds.
Cez's solution, namely to use the 1.6-second version of the query and remove irrelevant results by adding rowed_results.shop_dummy=86 to the outer query (as below), executes in 1.7 seconds. This circumvents the problem, but the mystery remains why 292-second query is so slow.
SET #num :=0, #current_shop_id := NULL, #current_product_id := NULL;
#this query limits the results of the query within it by row number (so that only 250 products get displayed per store)
SELECT * FROM (
#this query adds row numbers to the query within it
SELECT *, #num := IF( #current_shop_id = shop_id, IF(#current_product_id=product_id,#num,#num+1), 0) AS row_number, #current_shop_id := shop_id AS shop_dummy, #current_product_id := product_id AS product_dummy FROM (
SELECT shop, shops.shop_id AS
shop_id, p1.product_id AS
product_id
FROM products p1 LEFT JOIN #this LEFT JOIN gets the favorites count for each product
(
SELECT fav3.product_id AS product_id, SUM(CASE
WHEN fav3.current = 1 AND fav3.closeted = 1 THEN 1
WHEN fav3.current = 1 AND fav3.closeted = 0 THEN -1
ELSE 0
END) AS favorites_count
FROM favorites fav3
GROUP BY fav3.product_id
) AS fav4 ON p1.product_id=fav4.product_id
INNER JOIN sex ON sex.product_id=p1.product_id AND sex.sex=0
INNER JOIN shops ON shops.shop_id = p1.shop_id
WHERE sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
ORDER BY shop, sex.DATE, product_id
) AS testtable
) AS rowed_results WHERE
rowed_results.row_number>=0 AND
rowed_results.row_number<(7) AND
rowed_results.shop_dummy=86;
After the chat room, and actually creating tables/columns to match the query, I've come up with the following query.
I have started my inner-most query to be on the sex, product (for shop_id) and favorites table. Since you described that ProductX at ShopA = Product ID = 1 but same ProductX at ShopB = Product ID = 2 (example only), each product is ALWAYS unique per shop and never duplicated. That said, I can get the product and shop_id WITH the count of favorites (if any) at this query, yet group on just the product_id .. as shop_id won't change per product I am using MAX(). Since you are always looking by a date of "yesterday" and gender (sex=0 female), I would have the SEX table indexed on ( date, sex, product_id )... I would guess you are not adding 1000's of items every day... Products obviously would have an index on product_id (primary key), and favorites SHOULD have an index on product_id.
From that result (alias "sxFav") we can then do a direct join to the sex and products table by that "Product_ID" to get any additional information you may want, such as name of shop, date product added, product description, etc. This result is then ordered by the shop_id the product is being sold from, date and finally product ID (but you may consider grabbing a description column at inner query and using that as sort-by). This results in alias "PreQuery".
With the order being all proper by shop, we can now add the #MySQLVariable references to get each product assigned a row number similar to how you originally attempted. However, only reset back to 1 when a shop ID changes.
SELECT
PreQuery.*,
#num := IF( #current_shop_id = PreQuery.shop_id, #num +1, 1 ) AS RowPerShop,
#current_shop_id := PreQuery.shop_id AS shop_dummy
from
( SELECT
sxFav.product_id,
sxFav.shop_id,
sxFav.Favorites_Count
from
( SELECT
sex.product_id,
MAX( p.shop_id ) shop_id,
SUM( CASE WHEN F.current = 1 AND F.closeted = 1 THEN 1
WHEN F.current = 1 AND F.closeted = 0 THEN -1
ELSE 0 END ) AS favorites_count
from
sex
JOIN products p
ON sex.Product_ID = p.Product_ID
LEFT JOIN Favorites F
ON sex.product_id = F.product_ID
where
sex.date >= subdate( now(), interval 1 day)
and sex.sex = 0
group by
sex.product_id ) sxFav
JOIN sex
ON sxFav.Product_ID = sex.Product_ID
JOIN products p
ON sxFav.Product_ID = p.Product_ID
order by
sxFav.shop_id,
sex.date,
sxFav.product_id ) PreQuery,
( select #num :=0,
#current_shop_id := 0 ) as SQLVars
Now, if you are looking for specific "paging" information (such as 7 entries per shop), wrap the ENTIRE query above into something like...
select * from ( entire query above ) where RowPerShop between 1 and 7
(or between 8 and 14, 15 and 21, etc as needed)
or even
RowPerShop between RowsPerPage*PageYouAreShowing and RowsPerPage*(PageYouAreShowing +1)
You should move the shops.shop_id=86 to the JOIN condition for shops. No reason to put it outside the JOIN, you run the risk of MySQL JOINing first, then filtering. A JOIN can do the same job the a WHERE clause does, especially if you are not referencing other tables.
....
INNER JOIN shops ON shops.shop_id = p1.shop_id AND shops.shop_id=86
....
Same thing with the sex join:
...
INNER JOIN shops ON shops.shop_id = p1.shop_id
AND sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
...
Derived tables are great, but they have no indexes on them. Usually this doesn't matter since they are generally in RAM. But between filtering and sorting with no indexes, things can add up.
Note that in the second query that take much longer, the table processing order changes. The shop table is at the top in the slow query and the p1 table retrieves 11799 rows instead of 1 row in the fast query. It also doesn't use the primary key any more. That's likely where your problem is.
3 DERIVED p1 eq_ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 PRIMARY 4 mydatabase.sex.product_id 1 100.00
3 DERIVED p1 ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 shop_id 4 11799 100.00
Judging by the discussion, the query planner is performing badly when specifying the shop at a lower level.
Add rowed_results.shop_dummy=86 to the outer query to get the results that you are looking for.