I have a table "products" and a table "links". Every product can have multiple links, and each link can have many sales, clicks and impressions, but doesn't necessarily have all of them. I want to get a list of the links of a certain product that match some criteria, with the data grouped per day, campaign and link banner size.
The following query works correctly, but it's much slower than it could be. The problem is that the subqueries fetch the data for all link ids, and it is only filtered at the end. The overall query would become much faster if the subqueries included something like
where link_id IN (...)
but I only know the link_ids from the main query, not beforehand.
If I try to add
where link_id = l.id
it's obviously an unknown column, because the subquery doesn't have access to the main query's results.
How can I make the subqueries only look up data for the link_ids that the main query found? I could split it up into two completely separate queries, but is this possible in one query?
select p.id, p.name, l.id, l.banner_size,
coalesce(sum(case when t1.col = 'sales' then ct else 0 end), 0) as total_sales,
coalesce(sum(case when t1.col = 'clicks' then ct else 0 end), 0) as total_clicks,
coalesce(sum(case when t1.col = 'impressions' then ct else 0 end), 0) as total_impressions,
t1.dt
from links l
inner join products p
on p.id = l.product_id
left join
(
select count(1) as ct, link_id, date(clicked) dt, 'sales' as col
from sales
where clicked >= '2020-01-01 00:00:00' and clicked <= '2020-01-31 00:00:00'
group by date(clicked), link_id
union all
select count(1) as ct, link_id, date(created) dt, 'clicks'
from clicks_source1
where created >= '2020-01-01 00:00:00' and created <= '2020-01-31 00:00:00'
group by date(created), link_id
union all
select count(1) as ct, link_id, date(time) dt, 'clicks'
from clicks_source2
where time >= '2020-01-01 00:00:00' and time <= '2020-01-31 00:00:00'
group by date(time), link_id
union all
select count(1) as ct, link_id, date(created) dt, 'impressions'
from impression_source1
where created > '2020-01-01 00:00:00' and created <= '2020-01-31 00:00:00'
group by date(created), link_id
union all
select count(1) as ct, link_id, date(time) dt, 'impressions'
from impression_source2
where time > '2020-01-01 00:00:00' and time <= '2020-01-31 00:00:00'
group by date(time), link_id
) t1 on t1.link_id = l.id
where l.agent_id = 300
and p.id = 3454
and l.banner_size = 48
and p.company NOT IN (13592, 28189)
group by l.id, l.banner_size, t1.dt
having (total_sales + total_clicks + total_impressions) > 0
order by dt DESC, p.id ASC, l.banner_size ASC
You can add an inner join with links to each of the subqueries and repeat the link filters there:
select count(1) as ct, s.link_id, date(s.clicked) dt, 'sales' as col
from sales s
join links l1 on l1.id = s.link_id
where s.clicked >= '2020-01-01 00:00:00' and s.clicked <= '2020-01-31 00:00:00'
and l1.agent_id = 300 and l1.product_id = 3454 and l1.banner_size = 48
group by date(s.clicked), s.link_id
union all etc.
Then each subquery only aggregates rows for the matching links, and the query should run faster.
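A small sketch of that idea, run through Python's built-in sqlite3 module as a stand-in for MySQL. The table layouts and sample rows are assumptions made up for illustration; only the shape of the subquery comes from the answer above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumed schema and data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (id INTEGER PRIMARY KEY, product_id INTEGER,
                        agent_id INTEGER, banner_size INTEGER);
    CREATE TABLE sales (id INTEGER PRIMARY KEY, link_id INTEGER, clicked TEXT);
    INSERT INTO links VALUES (1, 3454, 300, 48), (2, 3454, 300, 90), (3, 9999, 300, 48);
    INSERT INTO sales (link_id, clicked) VALUES
        (1, '2020-01-05 10:00:00'),
        (1, '2020-01-05 11:00:00'),
        (2, '2020-01-05 12:00:00'),
        (3, '2020-01-06 09:00:00');
""")

# The subquery joins links itself, so only sales of matching links are grouped.
filtered_sales = conn.execute("""
    SELECT COUNT(1) AS ct, s.link_id, date(s.clicked) AS dt, 'sales' AS col
    FROM sales s
    JOIN links l1 ON l1.id = s.link_id
    WHERE s.clicked >= '2020-01-01 00:00:00' AND s.clicked <= '2020-01-31 00:00:00'
      AND l1.agent_id = 300 AND l1.product_id = 3454 AND l1.banner_size = 48
    GROUP BY date(s.clicked), s.link_id
""").fetchall()

print(filtered_sales)
```

Only link 1 satisfies all three link filters here, so the sales of links 2 and 3 never reach the GROUP BY.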
I'm concerned about performance.
Is it possible to optimize the following MySQL query?
SELECT u.name, t2.transactions, o2.orders FROM users AS u
LEFT JOIN (
SELECT t.aid AS tuid, SUM( IF( t.status = 1, t.amount, 0 ) ) AS transactions
FROM transactions AS t
WHERE ( t.datetime BETWEEN ('2018-01-01 00:00:00') AND ( '2020-01-01 23:59:59' ) ) GROUP BY t.aid
) AS t2 ON tuid = u.id
LEFT JOIN (
SELECT o.aid AS ouid, SUM(o.net) AS orders FROM orders AS o
WHERE ( o.date BETWEEN ('2018-01-01 00:00:00') AND ( '2020-01-01 23:59:59' ) ) GROUP BY o.aid
) AS o2 ON ouid = u.id
WHERE u.status = 1
ORDER BY t2.transactions DESC
Basically, I need to sum users' data from multiple tables (and be able to order by the sums).
There's no obvious query-performance antipattern in your query. Performance pretty much depends on the performance of the two subqueries with group by clauses.
Let's take a look at one of them to find some improvements.
SELECT t.aid AS tuid,
SUM( IF( t.status = 1, t.amount, 0 ) ) AS transactions
FROM transactions AS t
WHERE t.datetime BETWEEN '2018-01-01 00:00:00' AND '2020-01-01 23:59:59'
GROUP BY t.aid
This will be OK if you have an index on transactions.datetime.
But the whole subquery can be rewritten:
SELECT t.aid AS tuid,
SUM( t.amount ) AS transactions
FROM transactions AS t
WHERE t.datetime BETWEEN '2018-01-01 00:00:00' AND '2020-01-01 23:59:59'
AND t.status = 1
GROUP BY t.aid
This query will take advantage of a compound index on (status, datetime). If you have many rows with status values not equal to 1, and you have the compound index, the rewritten query will be faster.
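The two forms of the subquery can be compared on a few rows. This is only a sketch: it uses Python's sqlite3 module in place of MySQL, with CASE standing in for IF(), and the sample rows are invented for illustration.

```python
import sqlite3

# Assumed sample data; SQLite is used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (aid INTEGER, status INTEGER, amount REAL, datetime TEXT);
    CREATE INDEX idx_status_datetime ON transactions (status, datetime);
    INSERT INTO transactions VALUES
        (1, 1, 10.0, '2019-03-01 12:00:00'),
        (1, 0, 99.0, '2019-03-02 12:00:00'),
        (2, 0, 50.0, '2019-04-01 12:00:00');
""")

# Original form: every row in the date range is read, status is tested per row.
conditional = conn.execute("""
    SELECT aid, SUM(CASE WHEN status = 1 THEN amount ELSE 0 END)
    FROM transactions
    WHERE datetime BETWEEN '2018-01-01 00:00:00' AND '2020-01-01 23:59:59'
    GROUP BY aid
    ORDER BY aid
""").fetchall()

# Rewritten form: status = 1 is part of the WHERE clause, so the index can narrow the scan.
filtered = conn.execute("""
    SELECT aid, SUM(amount)
    FROM transactions
    WHERE datetime BETWEEN '2018-01-01 00:00:00' AND '2020-01-01 23:59:59'
      AND status = 1
    GROUP BY aid
    ORDER BY aid
""").fetchall()

print(conditional)
print(filtered)
```

One difference to keep in mind: a user whose rows all have a status other than 1 (aid 2 here) shows up with 0 in the first form but is missing from the second, so after the outer LEFT JOIN that user gets NULL instead of 0. Wrapping the outer column in COALESCE(..., 0) restores the 0 if that matters.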
Pro tip: BETWEEN for datetime values is generally a poor choice, because, well, 59:59. Try using < rather than BETWEEN's <= for the end of the range.
WHERE t.datetime >= '2018-01-01'
AND t.datetime < '2020-01-02' /* notice, it's the day after the range */
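A quick way to see the 59:59 problem is a timestamp with a fractional second. The sketch below uses Python's sqlite3 module, where the values are compared as text; the table and rows are assumptions for illustration, and a MySQL DATETIME(3) column would be compared by value instead, with the same outcome for these rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, datetime TEXT);
    INSERT INTO transactions (datetime) VALUES
        ('2018-01-01 00:00:00'),
        ('2020-01-01 23:59:59'),
        ('2020-01-01 23:59:59.500'),  -- falls in the last second of the range
        ('2020-01-02 00:00:00');      -- first instant after the range
""")

# Closed range: the row at 23:59:59.500 is later than the upper bound and is missed.
between_count = conn.execute("""
    SELECT COUNT(*) FROM transactions
    WHERE datetime BETWEEN '2018-01-01 00:00:00' AND '2020-01-01 23:59:59'
""").fetchone()[0]

# Half-open range: everything before the next day is included, nothing after.
half_open_count = conn.execute("""
    SELECT COUNT(*) FROM transactions
    WHERE datetime >= '2018-01-01' AND datetime < '2020-01-02'
""").fetchone()[0]

print(between_count, half_open_count)
```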
Multiple JOIN ( SELECT ... ) used to be a performance killer (pre 5.6). Now it may be a performance problem.
The alternative is
SELECT u.name,
( SELECT ... WHERE ...=u.id ) AS transactions,
( SELECT ... WHERE ...=u.id ) AS orders
FROM users AS u
WHERE u.status = 1
ORDER BY transactions DESC
The first subquery is a correlated subquery and it looks like
( SELECT SUM( IF(status = 1, amount, 0) )
FROM transactions
WHERE aid = u.id
AND datetime >= '2018-01-01'
AND datetime < '2018-01-01' + INTERVAL 2 YEAR
) AS transactions
(The other one is similar.)
Indexes:
users: INDEX(status, name, id) -- "covering"
transactions: INDEX(aid, datetime)
orders: INDEX(aid, date) or INDEX(aid, date, net)
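Put together, the correlated-subquery version and its indexes might look like the sketch below. It runs on Python's sqlite3 module rather than MySQL, so CASE replaces IF() and the upper bound is written as a literal (the day after the question's range) instead of INTERVAL arithmetic. The index names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, status INTEGER);
    CREATE TABLE transactions (aid INTEGER, status INTEGER, amount INTEGER, datetime TEXT);
    CREATE TABLE orders (aid INTEGER, net INTEGER, date TEXT);
    -- Indexes suggested above (names are made up).
    CREATE INDEX idx_users ON users (status, name, id);
    CREATE INDEX idx_tx ON transactions (aid, datetime);
    CREATE INDEX idx_orders ON orders (aid, date, net);
    INSERT INTO users VALUES (1, 'ann', 1), (2, 'bob', 1), (3, 'cat', 0);
    INSERT INTO transactions VALUES
        (1, 1, 10, '2018-06-01 00:00:00'),
        (1, 0, 5,  '2018-06-02 00:00:00'),
        (2, 1, 30, '2019-01-01 00:00:00'),
        (2, 1, 7,  '2021-01-01 00:00:00');
    INSERT INTO orders VALUES
        (1, 100, '2018-07-01 00:00:00'),
        (2, 40,  '2019-02-01 00:00:00');
""")

# Each subquery is evaluated per user row and can seek on (aid, datetime) / (aid, date).
rows = conn.execute("""
    SELECT u.name,
           (SELECT SUM(CASE WHEN t.status = 1 THEN t.amount ELSE 0 END)
              FROM transactions t
             WHERE t.aid = u.id
               AND t.datetime >= '2018-01-01'
               AND t.datetime <  '2020-01-02') AS transactions,
           (SELECT SUM(o.net)
              FROM orders o
             WHERE o.aid = u.id
               AND o.date >= '2018-01-01'
               AND o.date <  '2020-01-02') AS orders
    FROM users u
    WHERE u.status = 1
    ORDER BY transactions DESC
""").fetchall()

print(rows)
```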
I have a query with subqueries for a timeline widget of participants, leads and customers.
For example with 15k rows in the table but only 2k in this date range (January 1st to January 28th) this takes about 40 seconds!
SELECT created_at as date,
(
SELECT COUNT(id)
FROM participant
WHERE created_at <= date
) as participants,
(
SELECT COUNT(DISTINCT id)
FROM participant
WHERE participant_type = "lead"
AND created_at <= date
) as leads,
(
SELECT COUNT(DISTINCT id)
FROM participant
WHERE participant_type = "customer"
AND created_at <= date
) as customer
FROM participant
WHERE created_at >= '2016-01-01 00:00:00'
AND created_at <= '2016-01-28 23:59:59'
GROUP BY date(date)
How can I improve the performance?
The table fields are declared as follows:
id => primary_key, INT 10, auto increment
participant_type => ENUM "lead,customer", NULLABLE, utf8_unicode_ci
created_at => TIMESTAMP, default '0000-00-00 00:00:00'
Possibly try using conditions within the counts (or sums) to get the values you want, having cross joined things:-
SELECT a.created_at as date,
SUM(IF(b.created_at <= a.created_at, 1, 0)) AS participants,
COUNT(DISTINCT IF(b.participant_type = "lead" AND b.created_at <= a.created_at, b.id, NULL)) AS leads,
COUNT(DISTINCT IF(b.participant_type = "customer" AND b.created_at <= a.created_at, b.id, NULL)) AS customer
FROM participant a
CROSS JOIN participant b
WHERE a.created_at >= '2016-01-01 00:00:00'
AND a.created_at <= '2016-01-28 23:59:59'
GROUP BY date(date)
or maybe move the date check into the join
SELECT a.created_at as date,
COUNT(b.id) AS participants,
COUNT(DISTINCT IF(b.participant_type = "lead", b.id, NULL)) AS leads,
COUNT(DISTINCT IF(b.participant_type = "customer", b.id, NULL)) AS customer
FROM participant a
LEFT OUTER JOIN participant b
ON b.created_at <= a.created_at
WHERE a.created_at >= '2016-01-01 00:00:00'
AND a.created_at <= '2016-01-28 23:59:59'
GROUP BY date(date)
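A variation on the join approach, sketched with Python's sqlite3 module as a stand-in for MySQL. It is not the exact query above: it first reduces the date range to one row per day, so each participant is counted once per day even when several rows share a day. The sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE participant (id INTEGER PRIMARY KEY, participant_type TEXT, created_at TEXT);
    INSERT INTO participant (participant_type, created_at) VALUES
        ('lead',     '2015-12-30 09:00:00'),
        ('customer', '2016-01-01 10:00:00'),
        (NULL,       '2016-01-01 15:00:00'),
        ('lead',     '2016-01-02 08:00:00');
""")

# d: one row per day in the requested range.
# b: every participant created on or before that day (running totals).
running = conn.execute("""
    SELECT d.day,
           COUNT(b.id) AS participants,
           COUNT(CASE WHEN b.participant_type = 'lead' THEN b.id END) AS leads,
           COUNT(CASE WHEN b.participant_type = 'customer' THEN b.id END) AS customer
    FROM (SELECT DISTINCT date(created_at) AS day
            FROM participant
           WHERE created_at >= '2016-01-01' AND created_at < '2016-01-29') d
    LEFT JOIN participant b ON date(b.created_at) <= d.day
    GROUP BY d.day
    ORDER BY d.day
""").fetchall()

print(running)
```

The participant created in December 2015 is outside the date range but still counts toward the running totals, which matches the intent of the original subqueries.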
I don't fully understand what you want to do with this query, but I may be able to suggest a way to optimize it.
Try this one:
SELECT
participants.day as day,
participants.total_count,
leads.lead_count,
customer.customer_count
FROM
(
SELECT DATE(created_at) as day, COUNT(id) as total_count
FROM participant
WHERE created_at BETWEEN '2016-01-01 00:00:00' AND '2016-01-28 23:59:59'
GROUP BY day
) as participants
LEFT JOIN
(
SELECT DATE(created_at) as day, COUNT(DISTINCT id) as lead_count
FROM participant
WHERE participant_type = "lead"
AND created_at BETWEEN '2016-01-01 00:00:00' AND '2016-01-28 23:59:59'
GROUP BY day
) as leads ON (participants.day = leads.day)
LEFT JOIN
(
SELECT DATE(created_at) as day, COUNT(DISTINCT id) as customer_count
FROM participant
WHERE participant_type = "customer"
AND created_at BETWEEN '2016-01-01 00:00:00' AND '2016-01-28 23:59:59'
GROUP BY day
) as customer ON (participants.day = customer.day)
Add an index that supports the query. You can run EXPLAIN on the query to check whether it is used.
With the help of EXPLAIN, you can see where you should add indexes to tables so that the statement executes faster by using indexes to find rows.
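As a rough illustration of checking a plan, the sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. MySQL's EXPLAIN prints a different, tabular output, so this only shows the workflow; the index name is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE participant (id INTEGER PRIMARY KEY, participant_type TEXT, created_at TEXT);
    -- Hypothetical index on the column used in the range filter.
    CREATE INDEX idx_participant_created ON participant (created_at);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT COUNT(id)
    FROM participant
    WHERE created_at BETWEEN '2016-01-01 00:00:00' AND '2016-01-28 23:59:59'
""").fetchall()

# The last column of each plan row is a human-readable description of the step.
plan_text = " ".join(str(row[-1]) for row in plan)
uses_index = "idx_participant_created" in plan_text

print(plan_text)
```

If the index name does not appear in the plan, the query is scanning the whole table and the index (or the query) needs another look.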
I have a booking system that has multiple simultaneous bookings with a count number for each. I need to get the minimum and maximum for a specified date-range (for a day, in this case). I found some good code here, which works great in the test. But in my implementation, it fails in this particular instance:
It does not count bookings that start prior to the query-range and end within the query-range.
How do I fix this?
Here is an example:
This booking exists with these properties:
listings (an ID that multiple bookings can have, but only one in this case): 2f23f23f
date_start: 2016-01-15 08:00:00
date_end: 2016-01-17 08:00:00
state: active
count: 1
Result:
min_count: 0
max_count: 0
It should return:
min_count: 0
max_count: 1
If we query the very same, but with date range 2016-01-16 00:00:00 - 2016-01-16 23:59:59, it returns the correct answer:
min_count: 1
max_count: 1
Here is the MySQL:
SELECT
MAX(simultaneous) AS max_count,
MIN(simultaneous) AS min_count
FROM (
SELECT IFNULL(SUM(
(
CASE WHEN (
listings = '2f23f23f'
AND
(state = 'active')
)
THEN count
ELSE 0
END
)
),0) AS simultaneous
FROM bookings RIGHT JOIN (
SELECT date_start AS boundary
FROM bookings
WHERE date_start BETWEEN '2016-01-17 00:00:00' AND '2016-01-17 23:59:59'
UNION
SELECT date_end
FROM bookings
WHERE date_end BETWEEN '2016-01-17 00:00:00' AND '2016-01-17 23:59:59'
UNION
SELECT MAX(boundary)
FROM (
SELECT MAX(date_start) AS boundary
FROM bookings
WHERE date_start <= '2016-01-17 00:00:00'
UNION ALL
SELECT MAX(date_end)
FROM bookings
WHERE date_end <= '2016-01-17 23:59:59'
) t
) t ON date_start <= boundary AND boundary < date_end
LEFT OUTER JOIN cart ON cart_bookings = id
GROUP BY boundary
) t
Whew, ok, here's the answer. The original evidently wasn't complete. It needed to include the start/end dates of the time-range being requested.
UNION
SELECT :date_start
UNION
SELECT :date_end
Complete code:
SELECT
MAX(simultaneous) AS max_count,
MIN(simultaneous) AS min_count
FROM (
SELECT IFNULL(SUM(
(
CASE WHEN (
listings = '2f23f23f'
AND
(state = 'active')
)
THEN count
ELSE 0
END
)
),0) AS simultaneous
FROM bookings RIGHT JOIN (
SELECT date_start AS boundary
FROM bookings
WHERE date_start BETWEEN '2016-01-17 00:00:00' AND '2016-01-17 23:59:59'
UNION
SELECT date_end
FROM bookings
WHERE date_end BETWEEN '2016-01-17 00:00:00' AND '2016-01-17 23:59:59'
UNION
SELECT MAX(boundary)
FROM (
SELECT MAX(date_start) AS boundary
FROM bookings
WHERE date_start <= '2016-01-17 00:00:00'
UNION ALL
SELECT MAX(date_end)
FROM bookings
WHERE date_end <= '2016-01-17 23:59:59'
) t
UNION
SELECT :date_start
UNION
SELECT :date_end
) t ON date_start <= boundary AND boundary < date_end
LEFT OUTER JOIN cart ON cart_bookings = id
GROUP BY boundary
) t
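The effect of the two extra boundaries can be reproduced with the example booking from the question. This is a simplified sketch on Python's sqlite3 module: the RIGHT JOIN is written as a LEFT JOIN from the boundary list, the MAX(boundary) branch and the cart join are left out, and the helper name min_max is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookings (id INTEGER PRIMARY KEY, listings TEXT, date_start TEXT,
                           date_end TEXT, state TEXT, "count" INTEGER);
    INSERT INTO bookings (listings, date_start, date_end, state, "count")
    VALUES ('2f23f23f', '2016-01-15 08:00:00', '2016-01-17 08:00:00', 'active', 1);
""")

def min_max(range_start, range_end, include_range_edges):
    """Return (max_count, min_count) over the boundaries inside the range."""
    # Optionally add the start and end of the requested range as boundaries.
    edges = "UNION SELECT :s UNION SELECT :e" if include_range_edges else ""
    sql = f"""
        SELECT MAX(simultaneous), MIN(simultaneous) FROM (
            SELECT IFNULL(SUM(CASE WHEN b.listings = '2f23f23f' AND b.state = 'active'
                                   THEN b."count" ELSE 0 END), 0) AS simultaneous
            FROM (
                SELECT date_start AS boundary FROM bookings
                 WHERE date_start BETWEEN :s AND :e
                UNION
                SELECT date_end FROM bookings
                 WHERE date_end BETWEEN :s AND :e
                {edges}
            ) t
            LEFT JOIN bookings b
                   ON b.date_start <= t.boundary AND t.boundary < b.date_end
            GROUP BY t.boundary
        )
    """
    return conn.execute(sql, {"s": range_start, "e": range_end}).fetchone()

without_edges = min_max('2016-01-17 00:00:00', '2016-01-17 23:59:59', False)
with_edges = min_max('2016-01-17 00:00:00', '2016-01-17 23:59:59', True)
print(without_edges, with_edges)
```

Without the range edges the only boundary is the booking's own end time, at which the booking no longer counts, so the maximum stays 0. With the edges added, the boundary at 00:00:00 falls inside the booking and the maximum becomes 1.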
I appear to be having a little spazzy moment today and can't seem to come up with a better way of doing sub select counts on a table.
Basically what I need to do is for each distinct supplier I then need 2 counts (from the same table) one for the total records assigned to that supplier and one for disputed records for that supplier. At the moment I have the query below which is technically correct but slow as hell on a table with 1 mill + records. I'm sure there's a better way of doing it but I can't for the life of me work it out this morning.
SELECT
DISTINCT tblsuppliers.SupplierName,
(SELECT COUNT(*) FROM tblmovements a WHERE m.Supplier=a.Supplier AND a.TicketStatus IN(0,1) AND a.DateRequired>='2013-09-01 00:00:00' AND a.DateRequired<='2013-11-30 23:59:59') as 'Total Tickets',
(SELECT COUNT(*) FROM tblmovements b WHERE m.Supplier=b.Supplier AND b.TicketStatus IN(0,1) AND b.DateRequired>='2013-09-01 00:00:00' AND b.DateRequired<='2013-11-30 23:59:59' AND (b.SuppIsDisputed=1 OR b.SuppDisputeClearedBy>0)) as 'Total Disputed'
FROM tblmovements m
INNER JOIN tblsuppliers ON m.Supplier=tblsuppliers.ID
ORDER BY tblsuppliers.SupplierName ASC
The joined table is just to give me a supplier name as opposed to the supplier ID which is stored in the movements table. Any suggestions are greatly appreciated.
Try this:
SELECT s.SupplierName,
SUM(CASE WHEN m.TicketStatus IN(0,1) AND m.DateRequired>='2013-09-01 00:00:00' AND
m.DateRequired<='2013-11-30 23:59:59' THEN 1 ELSE 0
END) AS 'Total Tickets',
SUM(CASE WHEN m.TicketStatus IN(0,1) AND m.DateRequired>='2013-09-01 00:00:00' AND
m.DateRequired<='2013-11-30 23:59:59' AND
(m.SuppIsDisputed=1 OR m.SuppDisputeClearedBy>0) THEN 1 ELSE 0
END) AS 'Total Disputed'
FROM tblmovements m
INNER JOIN tblsuppliers s ON m.Supplier=s.ID
GROUP BY s.ID
ORDER BY s.SupplierName ASC
SELECT
tblsuppliers.SupplierName,
SUM(IF(TicketStatus IN(0,1) AND DateRequired>='2013-09-01 00:00:00' AND DateRequired<='2013-11-30 23:59:59', 1, 0)) as 'Total Tickets',
SUM(IF(TicketStatus IN(0,1) AND DateRequired>='2013-09-01 00:00:00' AND DateRequired<='2013-11-30 23:59:59' AND (SuppIsDisputed=1 OR SuppDisputeClearedBy>0), 1, 0)) as 'Total Disputed'
FROM tblmovements m
INNER JOIN tblsuppliers ON m.Supplier=tblsuppliers.ID
GROUP BY tblsuppliers.SupplierName
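Both answers use the same technique: a single pass over tblmovements with conditional sums, instead of two correlated subqueries per row. A small check of that pattern, using Python's sqlite3 module as a stand-in for MySQL, with invented sample rows and a half-open date range in place of the 23:59:59 bound:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblsuppliers (ID INTEGER PRIMARY KEY, SupplierName TEXT);
    CREATE TABLE tblmovements (ID INTEGER PRIMARY KEY, Supplier INTEGER,
                               TicketStatus INTEGER, DateRequired TEXT,
                               SuppIsDisputed INTEGER, SuppDisputeClearedBy INTEGER);
    INSERT INTO tblsuppliers VALUES (1, 'Acme'), (2, 'Bolt');
    INSERT INTO tblmovements
        (Supplier, TicketStatus, DateRequired, SuppIsDisputed, SuppDisputeClearedBy) VALUES
        (1, 0, '2013-09-10 00:00:00', 0, 0),
        (1, 1, '2013-10-01 00:00:00', 1, 0),
        (1, 2, '2013-10-02 00:00:00', 0, 0),  -- status not in (0,1)
        (2, 1, '2013-11-15 00:00:00', 0, 7),  -- dispute cleared
        (2, 1, '2014-01-01 00:00:00', 1, 0);  -- outside the date range
""")

# One scan of tblmovements; each SUM counts only the rows matching its condition.
totals = conn.execute("""
    SELECT s.SupplierName,
           SUM(CASE WHEN m.TicketStatus IN (0,1)
                     AND m.DateRequired >= '2013-09-01'
                     AND m.DateRequired <  '2013-12-01'
                    THEN 1 ELSE 0 END) AS total_tickets,
           SUM(CASE WHEN m.TicketStatus IN (0,1)
                     AND m.DateRequired >= '2013-09-01'
                     AND m.DateRequired <  '2013-12-01'
                     AND (m.SuppIsDisputed = 1 OR m.SuppDisputeClearedBy > 0)
                    THEN 1 ELSE 0 END) AS total_disputed
    FROM tblmovements m
    JOIN tblsuppliers s ON m.Supplier = s.ID
    GROUP BY s.ID, s.SupplierName
    ORDER BY s.SupplierName
""").fetchall()

print(totals)
```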
I have the following query that gets some results I would like to group by the attribute 'state'.
I tried different subqueries, but they didn't work and I'm a bit stuck.
The SQL is:
SELECT state, id_candidate_basic, MAX( DATE ) FROM `candidate_state`
WHERE `date` <= '2013-09-06 00:00:00' GROUP BY id_candidate_basic
ORDER BY `candidate_state`.`id_candidate_basic` DESC
This currently returns:
I would like to get a count(*) for each state. Example:
F, 14
I, 10
O, 9
SELECT candidate_state.state,
id_candidate_basic,
MAX( DATE ),
COALESCE(totalCount, 0) totalCount
FROM `candidate_state`
LEFT JOIN
(
SELECT state, COUNT(*) totalCount
FROM candidate_state
WHERE `date` <= '2013-09-06 00:00:00'
GROUP BY state
) b ON candidate_state.state = b.state
WHERE `date` <= '2013-09-06 00:00:00'
GROUP BY id_candidate_basic
ORDER BY `candidate_state`.`id_candidate_basic` DESC