I have a query with subqueries for a timeline widget of participants, leads and customers.
For example, with 15k rows in the table but only 2k in this date range (January 1st to January 28th), it takes about 40 seconds!
SELECT created_at as date,
(
SELECT COUNT(id)
FROM participant
WHERE created_at <= date
) as participants,
(
SELECT COUNT(DISTINCT id)
FROM participant
WHERE participant_type = "lead"
AND created_at <= date
) as leads,
(
SELECT COUNT(DISTINCT id)
FROM participant
WHERE participant_type = "customer"
AND created_at <= date
) as customer
FROM participant
WHERE created_at >= '2016-01-01 00:00:00'
AND created_at <= '2016-01-28 23:59:59'
GROUP BY date(date)
How can I improve the performance?
The table fields are declared as follows:
id => primary_key, INT 10, auto increment
participant_type => ENUM "lead,customer", NULLABLE, utf8_unicode_ci
created_at => TIMESTAMP, default '0000-00-00 00:00:00'
You could try putting the conditions inside the counts (or sums) after cross joining the table to itself:
SELECT a.created_at as date,
SUM(IF(b.created_at <= a.created_at, 1, 0)) AS participants,
COUNT(DISTINCT IF(b.participant_type = "lead" AND b.created_at <= a.created_at, b.id, NULL)) AS leads,
COUNT(DISTINCT IF(b.participant_type = "customer" AND b.created_at <= a.created_at, b.id, NULL)) AS customer
FROM participant a
CROSS JOIN participant b
WHERE a.created_at >= '2016-01-01 00:00:00'
AND a.created_at <= '2016-01-28 23:59:59'
GROUP BY date(date)
Or maybe move the date check into the join:
SELECT a.created_at as date,
COUNT(b.id) AS participants,
COUNT(DISTINCT IF(b.participant_type = "lead", b.id, NULL)) AS leads,
COUNT(DISTINCT IF(b.participant_type = "customer", b.id, NULL)) AS customer
FROM participant a
LEFT OUTER JOIN participant b
ON b.created_at <= a.created_at
WHERE a.created_at >= '2016-01-01 00:00:00'
AND a.created_at <= '2016-01-28 23:59:59'
GROUP BY date(date)
I don't fully understand what you want to do with this query, but maybe I can suggest a way to optimize it.
Try this one:
SELECT
participants.day as day,
participants.total_count,
leads.lead_count,
customer.customer_count
FROM
(
SELECT created_at as day, COUNT(id) as total_count
FROM participant
WHERE created_at BETWEEN '2016-01-01 00:00:00' AND '2016-01-28 23:59:59'
GROUP BY day
) as participants
LEFT JOIN
(
SELECT created_at as day, COUNT(DISTINCT id) as lead_count
FROM participant
WHERE participant_type = "lead"
AND created_at BETWEEN '2016-01-01 00:00:00' AND '2016-01-28 23:59:59'
GROUP BY day
) as leads ON (participants.day = leads.day)
LEFT JOIN
(
SELECT created_at as day, COUNT(DISTINCT id) as customer_count
FROM participant
WHERE participant_type = "customer"
AND created_at BETWEEN '2016-01-01 00:00:00' AND '2016-01-28 23:59:59'
GROUP BY day
) as customer ON (participants.day = customer.day)
Add an index to the table. You can run EXPLAIN on this query.
With the help of EXPLAIN, you can see where you should add indexes to tables so that the statement executes faster by using indexes to find rows.
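A minimal sketch of that advice for the participant table from the question. The index name and the column order are assumptions made for illustration, not something the original poster confirmed:

```sql
-- Hypothetical index: the name idx_participant_created_type is made up.
-- created_at comes first because every subquery in the question filters on it.
ALTER TABLE participant
  ADD INDEX idx_participant_created_type (created_at, participant_type);

-- Prefix a query with EXPLAIN to see which index, if any, each table access uses.
EXPLAIN
SELECT COUNT(id)
FROM participant
WHERE participant_type = 'lead'
  AND created_at <= '2016-01-28 23:59:59';
```

In the EXPLAIN output, a row with type ALL and key NULL means that table is being read with a full scan.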
Related
I need to get the number of distinct parent_ids that satisfy one of the conditions below, grouped by day:
parent_ids that have both status = pending & processing
OR
parent_ids who have both status = canceled and processing.
I've tried something similar to:
SELECT count(parent_id) as pencan, created_at, DATE_FORMAT(a.created_at, '%Y') AS year_key, DATE_FORMAT(a.created_at, '%m-%d') as day_key
FROM sales_flat_order_status_history
where created_at BETWEEN '2010-01-01 00:00:00' AND '2013-04-30 23:59:59'
GROUP BY created_at ,parent_id
HAVING SUM(status = 'processing')
AND SUM(status IN ('pending', 'cancelling'))
I think you just need to fix the group by:
SELECT DATE(created_at), count(parent_id) as pencan
FROM sales_flat_order_status_history
where created_at >= '2010-01-01' AND
created_at < '2013-05-01'
GROUP BY DATE(created_at) , parent_id
HAVING SUM(status = 'processing') AND
SUM(status IN ('pending', 'cancelling'))
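The query above returns one row per day and parent_id. If the goal is a single number per day, one possible sketch is to wrap it in an outer query that counts those rows. The aliases day_key and per_parent are placeholders, and the status values are copied unchanged from the query above:

```sql
-- Inner query: one row per (day, parent_id) that meets the status conditions.
-- Outer query: collapse those rows into one count per day.
SELECT day_key, COUNT(*) AS pencan
FROM (
    SELECT DATE(created_at) AS day_key, parent_id
    FROM sales_flat_order_status_history
    WHERE created_at >= '2010-01-01'
      AND created_at <  '2013-05-01'
    GROUP BY DATE(created_at), parent_id
    HAVING SUM(status = 'processing') > 0
       AND SUM(status IN ('pending', 'cancelling')) > 0
) AS per_parent
GROUP BY day_key
ORDER BY day_key;
```

Because the inner query groups by day and parent_id, a parent_id is only counted on days where both kinds of status rows were created on that same day.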
I have a table "products" and a table "links". Every product can have multiple links, and each link can have many sales, clicks and impressions, but doesn't necessarily have all of them. I want to get a list of links of a certain product matching some criteria. I want to get this data grouped per day, campaign and link banner size.
The following query works correctly, but it's much slower than it could be. The problem is that the subqueries fetch the data for all link ids, and the result is only filtered at the end. The overall query would become much faster if the subqueries included something like
where link_id IN (...) but I only know the link_ids from the main query, not before.
If I try to add
where link_id = l.id
it's obviously an unknown column, because the subquery doesn't have access to the main query's results.
How can I make the subqueries only look up data for the link_ids that the main query found? I could split it up into two completely separate queries, but is this possible in one query?
select p.id, p.name, l.id, l.banner_size,
coalesce(sum(case when t1.col = 'sales' then ct else 0 end), 0) as total_sales,
coalesce(sum(case when t1.col = 'clicks' then ct else 0 end), 0) as total_clicks,
coalesce(sum(case when t1.col = 'impressions' then ct else 0 end), 0) as total_impressions,
t1.dt
from links l
inner join products p
on p.id = l.product_id
left join
(
select count(1) as ct, link_id, date(clicked) dt, 'sales' as col
from sales
where clicked >= '2020-01-01 00:00:00' and clicked <= '2020-01-31 00:00:00'
group by date(clicked), link_id
union all
select count(1) as ct, link_id, date(created) dt, 'clicks'
from clicks_source1
where created >= '2020-01-01 00:00:00' and created <= '2020-01-31 00:00:00'
group by date(created), link_id
union all
select count(1) as ct, link_id, date(time) dt, 'clicks'
from clicks_source2
where time >= '2020-01-01 00:00:00' and time <= '2020-01-31 00:00:00'
group by date(time), link_id
union all
select count(1) as ct, link_id, date(created) dt, 'impressions'
from impression_source1
where created > '2020-01-01 00:00:00' and created <= '2020-01-31 00:00:00'
group by date(created), link_id
union all
select count(1) as ct, link_id, date(time) dt, 'impressions'
from impression_source2
where time > '2020-01-01 00:00:00' and time <= '2020-01-31 00:00:00'
group by date(time), link_id
) t1 on t1.link_id = l.id
where l.agent_id = 300
and p.id = 3454
and l.banner_size = 48
and p.company NOT IN (13592, 28189)
group by c.id, l.banner_size, t1.dt
having (total_sales + total_clicks + total_impressions) > 0
order by dt DESC, p.id ASC, l.banner_size ASC
You can just add inner joins with links to all the subqueries:
select count(1) as ct, s.link_id, date(s.clicked) dt, 'sales' as col
from sales s
join links l1 on l1.id = s.link_id
where s.clicked >= '2020-01-01 00:00:00' and s.clicked <= '2020-01-31 00:00:00'
group by date(s.clicked), s.link_id
union all etc.
Then you'll only get the rows with a match, and the query should run faster.
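Note that a join on l1.id alone only drops rows whose link_id has no matching link. As a sketch, assuming the goal is to restrict each branch to the links the outer query keeps, the outer filters on links can be repeated inside the subquery (p.id = 3454 together with p.id = l.product_id implies the product_id condition):

```sql
-- One branch of the UNION ALL, limited to the links selected by the outer WHERE.
-- The l1 conditions are copied from the outer query in the question.
SELECT COUNT(1) AS ct, s.link_id, DATE(s.clicked) AS dt, 'sales' AS col
FROM sales s
JOIN links l1 ON l1.id = s.link_id
WHERE l1.agent_id = 300
  AND l1.product_id = 3454
  AND l1.banner_size = 48
  AND s.clicked >= '2020-01-01 00:00:00'
  AND s.clicked <= '2020-01-31 00:00:00'
GROUP BY DATE(s.clicked), s.link_id
```

The same join and filters would go into each of the other four branches.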
I'm concerned about performance.
Is it possible to optimize the following MySQL query?
SELECT u.name, t2.transactions, o2.orders FROM users AS u
LEFT JOIN (
SELECT t.aid AS tuid, SUM( IF( t.status = 1, t.amount, 0 ) ) AS transactions
FROM transactions AS t
WHERE ( t.datetime BETWEEN ('2018-01-01 00:00:00') AND ( '2020-01-01 23:59:59' ) ) GROUP BY t.aid
) AS t2 ON tuid = u.id
LEFT JOIN (
SELECT o.aid AS ouid, SUM(o.net) AS orders FROM orders AS o
WHERE ( o.date BETWEEN ('2018-01-01 00:00:00') AND ( '2020-01-01 23:59:59' ) ) GROUP BY o.aid
) AS o2 ON ouid = u.id
WHERE u.status = 1
ORDER BY t2.transactions DESC
Basically, I need to sum users' data from multiple tables (and be able to order by the results).
There's no obvious query-performance antipattern in your query. Performance pretty much depends on the performance of the two subqueries with group by clauses.
Let's take a look at one of them to find some improvements.
SELECT t.aid AS tuid,
SUM( IF( t.status = 1, t.amount, 0 ) ) AS transactions
FROM afs_transactions AS t
WHERE t.datetime BETWEEN '2018-01-01 00:00:00' AND '2020-01-01 23:59:59'
GROUP BY t.aid
This will be OK if you have an index on afs_transactions.datetime.
But the whole subquery can be rewritten:
SELECT t.aid AS tuid,
SUM( t.amount ) AS transactions
FROM afs_transactions AS t
WHERE t.datetime BETWEEN '2018-01-01 00:00:00' AND '2020-01-01 23:59:59'
AND t.status = 1
GROUP BY t.aid
This query will take advantage of a compound index on (status, datetime). If you have many rows with status values not equal to 1, and you have the compound index, the rewritten query will be faster.
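Pro tip: BETWEEN for datetime values is generally a poor choice because of the 23:59:59 end point: it is easy to get wrong, and it can miss rows with fractional seconds after it. Try using < with the start of the following day rather than BETWEEN's <= for the end of the range.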
WHERE t.datetime >= '2018-01-01'
AND t.datetime < '2020-01-02' /* notice, it's the day after the range */
Multiple JOIN ( SELECT ... ) used to be a performance killer (pre 5.6). Now it may be a performance problem.
The alternative is
SELECT u.name,
( SELECT ... WHERE ...=u.id ) AS transactions,
( SELECT ... WHERE ...=u.id ) AS orders
FROM users AS u
WHERE u.status = 1
ORDER BY transactions DESC
The first subquery is a correlated subquery and it looks like
( SELECT SUM( IF(status = 1, amount, 0) )
FROM transactions
WHERE aid = u.id
AND datetime >= '2018-01-01'
AND datetime < '2018-01-01' + INTERVAL 2 YEAR
) AS transactions
(The other one is similar.)
Indexes:
users: INDEX(status, name, id) -- "covering"
transactions: INDEX(aid, datetime)
orders: INDEX(aid, date) or INDEX(aid, date, net)
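As a sketch, those indexes could be created as follows. The index names are hypothetical, and the table and column names are taken from the question:

```sql
-- Hypothetical index names; column order follows the list above.
ALTER TABLE users
  ADD INDEX idx_users_status_name_id (status, name, id);

-- Equality column (aid) first, then the range column (datetime).
ALTER TABLE transactions
  ADD INDEX idx_transactions_aid_datetime (aid, datetime);

-- Including net lets the SUM(net) subquery be answered from the index alone.
ALTER TABLE orders
  ADD INDEX idx_orders_aid_date_net (aid, date, net);
```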
I have the following query:
SELECT
t.aff_id,
Ifnull(t.campaign_name, "direct") AS campaign_name,
Count(t.uuid) AS visits,
Count(DISTINCT l.trader_id) AS leads
FROM trackings AS t
LEFT JOIN (SELECT uuid,
trader_id
FROM leads
WHERE aff_id = "1"
AND created_at BETWEEN "2015-05-01 00:00:00" AND
"2015-05-20 23:59:59") AS l
ON l.uuid = t.uuid
WHERE `t`.`created_at` BETWEEN '2015-05-01 00:00:00' AND '2015-05-20 23:59:59'
AND `t`.`aff_id` = '1'
GROUP BY `t`.`campaign_name`
The date range on the leads table has no effect. What would be the correct structure for such a query?
http://sqlfiddle.com/#!9/5d125/2
I think the desired result can be reached with a plain join:
SELECT
t.aff_id,
IFNULL(t.campaign_name, 'direct') AS campaign_name,
COUNT(t.uuid) AS visits,
COUNT(DISTINCT l.trader_id) AS leads
FROM trackings AS t
LEFT JOIN leads AS l
ON l.uuid = t.uuid
AND l.created_at BETWEEN '2015-05-01 00:00:00' AND '2015-05-20 23:59:59'
AND l.aff_id = '1'
WHERE
t.created_at BETWEEN '2015-05-01 00:00:00' AND '2015-05-20 23:59:59'
AND t.aff_id = '1'
GROUP BY
campaign_name
;
I have the following query that gets some results I would like to group by attribute 'state'.
I tried different subqueries, but they didn't work and I'm a bit stuck.
The SQL is:
SELECT state, id_candidate_basic, MAX( DATE ) FROM `candidate_state`
WHERE `date` <= '2013-09-06 00:00:00' GROUP BY id_candidate_basic
ORDER BY `candidate_state`.`id_candidate_basic` DESC
This returns currently:
I would like to get a count(*) for each state. Example:
F, 14
I, 10
O, 9
SELECT candidate_state.state,
id_candidate_basic,
MAX(`date`),
COALESCE(totalCount, 0) totalCount
FROM `candidate_state`
LEFT JOIN
(
SELECT state, COUNT(*) totalCount
FROM candidate_state
WHERE `date` <= '2013-09-06 00:00:00'
GROUP BY state
) b ON candidate_state.state = b.state
WHERE `date` <= '2013-09-06 00:00:00'
GROUP BY id_candidate_basic
ORDER BY `candidate_state`.`id_candidate_basic` DESC
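Another possible reading of the question is that each candidate should be counted once, under its most recent state on or before the cutoff. A sketch under that assumption (the aliases cs, latest, max_date and total are placeholders):

```sql
-- Derived table: the latest date per candidate up to the cutoff.
-- Joining back on (candidate, date) picks the state row for that date.
-- If a candidate has two rows with the same latest date, both are counted.
SELECT cs.state, COUNT(*) AS total
FROM candidate_state cs
JOIN (
    SELECT id_candidate_basic, MAX(`date`) AS max_date
    FROM candidate_state
    WHERE `date` <= '2013-09-06 00:00:00'
    GROUP BY id_candidate_basic
) latest
  ON latest.id_candidate_basic = cs.id_candidate_basic
 AND latest.max_date = cs.`date`
GROUP BY cs.state
ORDER BY cs.state;
```

This returns one row per state with its count, in the shape of the F, I, O example in the question.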