I'm concerned about performance.
Is it possible to optimize the following MySQL query?
SELECT u.name, t2.transactions, o2.orders FROM users AS u
LEFT JOIN (
SELECT t.aid AS tuid, SUM( IF( t.status = 1, t.amount, 0 ) ) AS transactions
FROM transactions AS t
WHERE ( t.datetime BETWEEN ('2018-01-01 00:00:00') AND ( '2020-01-01 23:59:59' ) ) GROUP BY t.aid
) AS t2 ON tuid = u.id
LEFT JOIN (
SELECT o.aid AS ouid, SUM(o.net) AS orders FROM orders AS o
WHERE ( o.date BETWEEN ('2018-01-01 00:00:00') AND ( '2020-01-01 23:59:59' ) ) GROUP BY o.aid
) AS o2 ON ouid = u.id
WHERE u.status = 1
ORDER BY t2.transactions DESC
Basically I need to sum users' data from multiple tables (and be able to order the results by those sums).
There's no obvious query-performance antipattern in your query. Performance pretty much depends on the performance of the two subqueries with group by clauses.
Let's take a look at one of them to find some improvements.
SELECT t.aid AS tuid,
SUM( IF( t.status = 1, t.amount, 0 ) ) AS transactions
FROM transactions AS t
WHERE t.datetime BETWEEN '2018-01-01 00:00:00' AND '2020-01-01 23:59:59'
GROUP BY t.aid
This will be OK if you have an index on transactions.datetime.
But the whole subquery can be rewritten
SELECT t.aid AS tuid,
SUM( t.amount ) AS transactions
FROM transactions AS t
WHERE t.datetime BETWEEN '2018-01-01 00:00:00' AND '2020-01-01 23:59:59'
AND t.status = 1
GROUP BY t.aid
This query will take advantage of a compound index on (status, datetime). If you have many rows with status values not equal to 1, and you have the compound index, the rewritten query will be faster.
Pro tip: BETWEEN for datetime values is generally a poor choice, because, well, 59:59. Try using < rather than BETWEEN's <= for the end of the range.
WHERE t.datetime >= '2018-01-01'
AND t.datetime < '2020-01-02' /* notice, it's the day after the range */
Multiple JOIN ( SELECT ... ) used to be a performance killer (pre-5.6); now it may still be a performance problem.
The alternative is
SELECT u.name,
( SELECT ... WHERE ...=u.id ) AS transactions,
( SELECT ... WHERE ...=u.id ) AS orders
FROM users AS u
WHERE u.status = 1
ORDER BY transactions DESC
The first subquery is a correlated subquery and it looks like
( SELECT SUM( IF(status = 1, amount, 0) )
    FROM transactions
    WHERE aid = u.id
      AND datetime >= '2018-01-01'
      AND datetime < '2018-01-01' + INTERVAL 2 YEAR
) AS transactions
(The other one is similar.)
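Putting the pieces together, the whole alternative would look roughly like this (a sketch; it reuses the table and column names from the question and the same two-year range):
SELECT u.name,
       ( SELECT SUM( IF(t.status = 1, t.amount, 0) )
           FROM transactions AS t
           WHERE t.aid = u.id
             AND t.datetime >= '2018-01-01'
             AND t.datetime < '2018-01-01' + INTERVAL 2 YEAR
       ) AS transactions,
       ( SELECT SUM(o.net)
           FROM orders AS o
           WHERE o.aid = u.id
             AND o.date >= '2018-01-01'
             AND o.date < '2018-01-01' + INTERVAL 2 YEAR
       ) AS orders
FROM users AS u
WHERE u.status = 1
ORDER BY transactions DESC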
Indexes:
users: INDEX(status, name, id) -- "covering"
transactions: INDEX(aid, datetime)
orders: INDEX(aid, date) or INDEX(aid, date, net)
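In MySQL, those could be created with something like this (a sketch; the index names are just placeholders, and the covering (aid, date, net) variant is shown for orders):
-- index names below are placeholders
ALTER TABLE users        ADD INDEX idx_status_name_id (status, name, id);
ALTER TABLE transactions ADD INDEX idx_aid_datetime   (aid, datetime);
ALTER TABLE orders       ADD INDEX idx_aid_date_net   (aid, date, net);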
I have a table "products" and a table "links". Every product can have multiple links; each link can have many sales, clicks and impressions, but doesn't necessarily have all of them. I want to get a list of links of a certain product matching some criteria, with the data grouped per day, campaign and link banner size.
The following query works correctly, but it's much slower than it could be. The problem is that the subqueries get the data for all link ids and the result is only filtered at the end. The overall query would become much faster if the subqueries included something like
where link_id IN (...)
but I only know the link_ids from the main query, not beforehand. If I try to add
where link_id = l.id
it's obviously an unknown column, because the subquery doesn't have access to the main query's results.
How can I let the subqueries only look up data for the link_ids that the main query found? I could split it up into two completely separate queries, but is this possible in one query?
select p.id, p.name, l.id, l.banner_size,
coalesce(sum(case when t1.col = 'sales' then ct else 0 end), 0) as total_sales,
coalesce(sum(case when t1.col = 'clicks' then ct else 0 end), 0) as total_clicks,
coalesce(sum(case when t1.col = 'impressions' then ct else 0 end), 0) as total_impressions,
t1.dt
from links l
inner join products p
on p.id = l.product_id
left join
(
select count(1) as ct, link_id, date(clicked) dt, 'sales' as col
from sales
where clicked >= '2020-01-01 00:00:00' and clicked <= '2020-01-31 00:00:00'
group by date(clicked), link_id
union all
select count(1) as ct, link_id, date(created) dt, 'clicks'
from clicks_source1
where created >= '2020-01-01 00:00:00' and created <= '2020-01-31 00:00:00'
group by date(created), link_id
union all
select count(1) as ct, link_id, date(time) dt, 'clicks'
from clicks_source2
where time >= '2020-01-01 00:00:00' and time <= '2020-01-31 00:00:00'
group by date(time), link_id
union all
select count(1) as ct, link_id, date(created) dt, 'impressions'
from impression_source1
where created > '2020-01-01 00:00:00' and created <= '2020-01-31 00:00:00'
group by date(created), link_id
union all
select count(1) as ct, link_id, date(time) dt, 'impressions'
from impression_source2
where time > '2020-01-01 00:00:00' and time <= '2020-01-31 00:00:00'
group by date(time), link_id
) t1 on t1.link_id = l.id
where l.agent_id = 300
and p.id = 3454
and l.banner_size = 48
and p.company NOT IN (13592, 28189)
group by p.id, l.id, l.banner_size, t1.dt
having (total_sales + total_clicks + total_impressions) > 0
order by dt DESC, p.id ASC, l.banner_size ASC
You can just add inner joins with links to all the subqueries:
select count(1) as ct, s.link_id, date(s.clicked) dt, 'sales' as col
from sales s
join links l1 on l1.id = s.link_id
where s.clicked >= '2020-01-01 00:00:00' and s.clicked <= '2020-01-31 00:00:00'
group by date(s.clicked), s.link_id
union all etc.
Then you'll only get the rows with a matching link, and the query should run faster.
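If the link filters in the outer query (agent, product, banner size) are what actually narrow things down, they can be repeated inside those joins as well, so each subquery only aggregates rows for links the outer query can keep. A sketch for the first subquery, reusing the question's columns and literals:
select count(1) as ct, s.link_id, date(s.clicked) dt, 'sales' as col
from sales s
join links l1 on l1.id = s.link_id
             and l1.agent_id = 300
             and l1.banner_size = 48
             and l1.product_id = 3454
where s.clicked >= '2020-01-01 00:00:00' and s.clicked <= '2020-01-31 00:00:00'
group by date(s.clicked), s.link_id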
Here are 4 tables....
tbl_std_working_hour
tbl_attendance
tbl_holiday
tbl_leave
I want to find out employees' absentee reports with this query, but it takes a long time when I apply it for many employees. Is there any way to simplify this query?
SELECT date
FROM tbl_std_working_hour
WHERE date NOT IN (SELECT date FROM tbl_attendance WHERE emp_id = '$emp_id')
AND date NOT IN (SELECT date FROM tbl_holiday)
AND date NOT IN (SELECT date FROM tbl_leave WHERE emp_id = '$emp_id')
AND total_hour <> '00:00:00'
AND date >= '$start'
AND date <= '$end'
AND emp_id = '$emp_id'
First, I would rewrite using NOT EXISTS:
SELECT wh.date
FROM tbl_std_working_hour wh
WHERE NOT EXISTS (SELECT 1
                  FROM tbl_attendance a
                  WHERE a.date = wh.date AND a.emp_id = wh.emp_id
                 ) AND
      NOT EXISTS (SELECT 1
                  FROM tbl_holiday h
                  WHERE h.date = wh.date
                 ) AND
      NOT EXISTS (SELECT 1
                  FROM tbl_leave l
                  WHERE l.emp_id = wh.emp_id AND l.date = wh.date
                 ) AND
      wh.total_hour <> '00:00:00' AND
      wh.date >= '$start' AND
      wh.date <= '$end' AND
      wh.emp_id = '$emp_id';
Then add the following composite (multi-column) indexes:
tbl_std_working_hour(emp_id, date, total_hour)
tbl_attendance(emp_id, date)
tbl_holiday(date) (might already exist if date is the primary key or unique)
tbl_leave(emp_id) (might already exist if emp_id is the primary key or unique)
Note that I changed the subqueries to refer to the emp_id in the outer query. This makes it easier to change the emp_id. In addition, your query should be using parameters for the values in the WHERE clause.
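For the parameter point, here is a minimal sketch using MySQL server-side prepared statements (only the outer filters are shown for brevity; the NOT EXISTS parts stay exactly as above, and the example values are made up):
SET @sql := 'SELECT wh.date
             FROM tbl_std_working_hour wh
             WHERE wh.emp_id = ? AND wh.date >= ? AND wh.date <= ?';
PREPARE absent_stmt FROM @sql;
SET @emp := 'E001', @start := '2020-01-01', @end := '2020-01-31';  -- example values only
EXECUTE absent_stmt USING @emp, @start, @end;
DEALLOCATE PREPARE absent_stmt;
In application code (PHP, judging by the $emp_id placeholders), the equivalent is a prepared statement with bound parameters instead of interpolating the values into the SQL string.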
Another approach that can also work well is to put the three lookups in a single NOT EXISTS using UNION ALL:
SELECT date
FROM tbl_std_working_hour AS tswh
WHERE NOT EXISTS(
SELECT date FROM tbl_attendance ta WHERE ta.date = tswh.date AND ta.emp_id = tswh.emp_id
UNION ALL
SELECT date FROM tbl_holiday th WHERE th.date = tswh.date
UNION ALL
SELECT date FROM tbl_leave tl WHERE tl.date = tswh.date AND tl.emp_id = tswh.emp_id)
AND total_hour <> '00:00:00'
AND date >= '$start'
AND date <= '$end'
AND emp_id = '$emp_id'
I have a query for 1 particular customer_id. How can I execute this query for each customer_id in the table?
SELECT *
FROM table
WHERE date <= '2015-12-31 23:59:59' AND customer_id = 100
ORDER BY date DESC
LIMIT 1
You can use NOT EXISTS():
SELECT * FROM YourTable t
WHERE t.date <= '2015-12-31 23:59:59'
AND NOT EXISTS(SELECT 1 FROM YourTable s
WHERE t.customer_id = s.customer_id
AND t.date < s.date)
This selects only those records (after the date filter) for which no other record with the same customer_id and a later date exists. It's basically the same as LIMIT 1, but applied per customer.
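For performance, this correlated NOT EXISTS benefits from a composite index, so each probe becomes a quick lookup. A sketch against the placeholder table name from above (the index name is made up):
-- hypothetical index name
ALTER TABLE YourTable ADD INDEX idx_customer_date (customer_id, date);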
You can use an INNER JOIN against a derived table that picks the latest date per customer within the range:
SELECT t.*
FROM YourTable t
INNER JOIN (SELECT customer_id, MAX(date) AS max_date
            FROM YourTable
            WHERE date <= '2015-12-31 23:59:59'
            GROUP BY customer_id) s
  ON t.customer_id = s.customer_id AND t.date = s.max_date
ORDER BY t.date DESC
I have a query with subqueries for a timeline widget of participants, leads and customers.
For example, with 15k rows in the table but only 2k in this date range (January 1st to January 28th), this takes about 40 seconds!
SELECT created_at as date,
(
SELECT COUNT(id)
FROM participant
WHERE created_at <= date
) as participants,
(
SELECT COUNT(DISTINCT id)
FROM participant
WHERE participant_type = "lead"
AND created_at <= date
) as leads,
(
SELECT COUNT(DISTINCT id)
FROM participant
WHERE participant_type = "customer"
AND created_at <= date
) as customer
FROM participant
WHERE created_at >= '2016-01-01 00:00:00'
AND created_at <= '2016-01-28 23:59:59'
GROUP BY date(date)
How can I improve the performance?
The table fields are declared as follows:
id => primary_key, INT 10, auto increment
participant_type => ENUM "lead,customer", NULLABLE, utf8_unicode_ci
created_at => TIMESTAMP, default '0000-00-00 00:00:00'
Possibly try using conditions within the counts (or sums) to get the values you want, having cross joined the table with itself:
SELECT a.created_at as date,
SUM(IF(b.created_at <= a.created_at, 1, 0)) AS participants,
COUNT(DISTINCT IF(b.participant_type = "lead" AND b.created_at <= a.created_at, b.id, NULL)) AS leads,
COUNT(DISTINCT IF(b.participant_type = "customer" AND b.created_at <= a.created_at, b.id, NULL)) AS customer
FROM participant a
CROSS JOIN participant b
WHERE a.created_at >= '2016-01-01 00:00:00'
AND a.created_at <= '2016-01-28 23:59:59'
GROUP BY date(date)
Or maybe move the date check into the join:
SELECT a.created_at as date,
COUNT(b.id) AS participants,
COUNT(DISTINCT IF(b.participant_type = "lead", b.id, NULL)) AS leads,
COUNT(DISTINCT IF(b.participant_type = "customer", b.id, NULL)) AS customer
FROM participant a
LEFT OUTER JOIN participant b
ON b.created_at <= a.created_at
WHERE a.created_at >= '2016-01-01 00:00:00'
AND a.created_at <= '2016-01-28 23:59:59'
GROUP BY date(date)
I'm not clearly understanding what you want to do with this query, but maybe I can suggest a way to optimize it.
Try this one:
SELECT
participants.day as day,
participants.total_count,
leads.lead_count,
customer.customer_count
FROM
(
SELECT created_at as day, COUNT(id) as total_count
FROM participant
WHERE created_at BETWEEN '2016-01-01 00:00:00' AND '2016-01-28 23:59:59'
GROUP BY day
) as participants
LEFT JOIN
(
SELECT created_at as day, COUNT(DISTINCT id) as lead_count
FROM participant
WHERE participant_type = "lead"
AND created_at BETWEEN '2016-01-01 00:00:00' AND '2016-01-28 23:59:59'
GROUP BY day
) as leads ON (participants.day = leads.day)
LEFT JOIN
(
SELECT created_at as day, COUNT(DISTINCT id) as customer_count
FROM participant
WHERE participant_type = "customer"
AND created_at BETWEEN '2016-01-01 00:00:00' AND '2016-01-28 23:59:59'
GROUP BY day
) as customer ON (participants.day = customer.day)
Also add indexes. You can run EXPLAIN on this query; it shows you where adding indexes to the tables would let the statement execute faster by using the indexes to find rows.
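For example (a sketch; the index name is a placeholder, and whether (participant_type, created_at) or the reverse column order works better depends on the data, which EXPLAIN will show):
-- see how the first subquery is executed
EXPLAIN
SELECT created_at as day, COUNT(id) as total_count
FROM participant
WHERE created_at BETWEEN '2016-01-01 00:00:00' AND '2016-01-28 23:59:59'
GROUP BY day;

-- one candidate index for the type and date filters (name is a placeholder)
ALTER TABLE participant ADD INDEX idx_type_created (participant_type, created_at);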
I have the following query that gets some results I would like to group by the attribute 'state'.
I tried different subqueries but they didn't work and I'm a bit stuck.
The SQL is:
SELECT state, id_candidate_basic, MAX( DATE ) FROM `candidate_state`
WHERE `date` <= '2013-09-06 00:00:00' GROUP BY id_candidate_basic
ORDER BY `candidate_state`.`id_candidate_basic` DESC
This currently returns one row per candidate, with a state and the latest date.
What I would like is a count(*) for each state. Example:
F, 14
I, 10
O, 9
SELECT candidate_state.state,
id_candidate_basic,
MAX( DATE ),
COALESCE(totalCount, 0) totalCount
FROM `candidate_state`
LEFT JOIN
(
SELECT state, COUNT(*) totalCount
FROM candidate_state
WHERE `date` <= '2013-09-06 00:00:00'
GROUP BY state
) b ON candidate_state.state = b.state
WHERE `date` <= '2013-09-06 00:00:00'
GROUP BY id_candidate_basic
ORDER BY `candidate_state`.`id_candidate_basic` DESC