I have a query I made:
SELECT DISTINCT player_1 AS player,
(SELECT COUNT(*) FROM results WHERE player_1=player OR player_2=player) AS since_start_matches,
(SELECT COUNT(*) FROM results WHERE (player_1=player OR player_2=player) AND ht_total_goals=0) AS since_start_ht_0,
(SELECT COUNT(*) FROM results WHERE (player_1=player OR player_2=player) AND ht_total_goals=1) AS since_start_ht_1,
(SELECT COUNT(*) FROM results WHERE (player_1=player OR player_2=player) AND ht_total_goals=2) AS since_start_ht_2,
(SELECT COUNT(*) FROM results WHERE (player_1=player OR player_2=player) AND ht_total_goals=3) AS since_start_ht_3,
(SELECT COUNT(*) FROM results WHERE (player_1=player OR player_2=player) AND ht_total_goals=4) AS since_start_ht_4,
(SELECT COUNT(*) FROM results WHERE (player_1=player OR player_2=player) AND ht_total_goals>=5) AS since_start_ht_5_plus
FROM results ORDER BY player
The results table has 25000 entries and it takes around 7 seconds to do this query, which is far too long. The query is incredibly inefficient as each column I'm creating is searching again on the same table but with different conditions.
I tried indexing the columns of interest in my where clause. This knocks off a couple of seconds. But it's still too slow.
What is the best approach to handle this kind of query?
I'm using MariaDB 10.2
Unpivot the data, then aggregate:
SELECT player,
COUNT(*) AS since_start_matches,
SUM(ht_total_goals=0) AS since_start_ht_0,
SUM(ht_total_goals=1) AS since_start_ht_1,
SUM(ht_total_goals=2) AS since_start_ht_2,
SUM(ht_total_goals=3) AS since_start_ht_3,
SUM(ht_total_goals=4) AS since_start_ht_4,
SUM(ht_total_goals>=5) AS since_start_ht_5_plus
FROM ((SELECT player_1 as player, ht_total_goals
FROM results
) UNION ALL
(SELECT player_2 as player, ht_total_goals
FROM results
)
) p
GROUP BY player
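To see the unpivot-then-aggregate idea in action, here is a minimal sketch that runs a shortened version of the query through Python's sqlite3 module. SQLite stands in for MariaDB here, and the four sample rows and player names are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical miniature of the `results` table; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (player_1 TEXT, player_2 TEXT, ht_total_goals INTEGER)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [("ann", "bob", 0), ("bob", "cid", 2), ("ann", "cid", 5), ("cid", "ann", 0)],
)

# Unpivot: one row per (player, match), then a single aggregation per player.
# A comparison evaluates to 1 or 0, so SUM(condition) counts matching rows.
rows = conn.execute("""
    SELECT player,
           COUNT(*)                 AS since_start_matches,
           SUM(ht_total_goals = 0)  AS since_start_ht_0,
           SUM(ht_total_goals >= 5) AS since_start_ht_5_plus
    FROM (SELECT player_1 AS player, ht_total_goals FROM results
          UNION ALL
          SELECT player_2 AS player, ht_total_goals FROM results) p
    GROUP BY player
    ORDER BY player
""").fetchall()

for row in rows:
    print(row)
```

The table is read twice (once per branch of the UNION ALL) instead of once per output column per player. Note that, unlike the original query, this also lists players who only ever appear as player_2.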
You can use a LEFT JOIN and conditional aggregation as follows:
SELECT T1.player,
COUNT(T2.player_1) AS since_start_matches,
SUM(CASE WHEN T2.ht_total_goals=0 THEN 1 ELSE 0 END) AS since_start_ht_0,
SUM(CASE WHEN T2.ht_total_goals=1 THEN 1 ELSE 0 END) AS since_start_ht_1,
SUM(CASE WHEN T2.ht_total_goals=2 THEN 1 ELSE 0 END) AS since_start_ht_2,
SUM(CASE WHEN T2.ht_total_goals=3 THEN 1 ELSE 0 END) AS since_start_ht_3,
SUM(CASE WHEN T2.ht_total_goals=4 THEN 1 ELSE 0 END) AS since_start_ht_4,
SUM(CASE WHEN T2.ht_total_goals>=5 THEN 1 ELSE 0 END) AS since_start_ht_5_plus
-- one row per player in T1, so the join does not multiply the counts
FROM (SELECT DISTINCT player_1 AS player FROM results) T1
LEFT JOIN results T2
ON (T2.player_1=T1.player OR T2.player_2=T1.player)
GROUP BY T1.player
ORDER BY T1.player
Related
I have this table called transactions, where agents can give certain amounts to other agents. It has 2 agent columns: agent_from is the agent that gave the amount and agent_to is the one receiving it.
For example, in the row with id 1, agent2 gives an amount of 300 to agent8.
The report I would like to produce is a sum grouped by agent_from and by agent_to.
Right now I am able to run the queries separately, like this:
SELECT agent_from,
SUM(amount) as from_transaccions
FROM `transactions` GROUP BY agent_from;
This would give me this result:
This returns the sum of all the amounts given by each agent_from.
Now I can repeat this query, changing the column name from agent_from to agent_to, to get the sum of all the amounts received by each agent_to. That will look like this:
For example, agent8 received 2 transactions (300 + 450) = 750.
Now what I want to do is combine these 2 queries into one, with a result that looks like this:
Refer to the query below:
with data_cte as (
(select agent_from agent, amount, 'af' flag from transactions) union all
(select agent_to agent, amount, 'at' flag from transactions)
)
select agent,
sum(case when flag='af' then amount else 0 end) from_sum,
sum(case when flag='at' then amount else 0 end) to_sum
from data_cte
group by agent
union all
select 'total' as col1,
sum(case when flag='af' then amount else 0 end) from_sum,
sum(case when flag='at' then amount else 0 end) to_sum
from data_cte
group by col1
order by agent
Use UNION ALL to split each row of the table into 2 rows, one per agent, and then aggregate:
SELECT COALESCE(agent, 'total') agent,
SUM(`from`) `from`,
SUM(`to`) `to`
FROM (
SELECT agent_from agent, amount `from`, 0 `to` FROM `transactions`
UNION ALL
SELECT agent_to, 0 `from`, amount `to` FROM `transactions`
) t
GROUP BY agent WITH ROLLUP
ORDER BY GROUPING(agent);
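The row-splitting step can be tried out with a small sketch in Python's sqlite3 module. The assumptions: SQLite stands in for MySQL, the column aliases from_amount and to_amount are hypothetical, and only the two transfers to agent8 (300 and 450) come from the question while the third row is invented. SQLite has no WITH ROLLUP, so the total row is built in Python here instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(id INTEGER, agent_from TEXT, agent_to TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (1, "agent2", "agent8", 300),  # from the question
        (2, "agent1", "agent8", 450),  # sender is an assumption
        (3, "agent8", "agent2", 100),  # invented row
    ],
)

# Each transaction becomes two rows: one crediting the sender's "from"
# column and one crediting the receiver's "to" column.
per_agent = conn.execute("""
    SELECT agent, SUM(from_amount) AS from_sum, SUM(to_amount) AS to_sum
    FROM (SELECT agent_from AS agent, amount AS from_amount, 0 AS to_amount
          FROM transactions
          UNION ALL
          SELECT agent_to, 0, amount
          FROM transactions) t
    GROUP BY agent
    ORDER BY agent
""").fetchall()

# Stand-in for the ROLLUP row: sum each column over all agents.
total = ("total", sum(r[1] for r in per_agent), sum(r[2] for r in per_agent))

for row in per_agent + [total]:
    print(row)
```

Both totals are necessarily equal, because every amount is counted once as given and once as received.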
I've got a table that has id, date, ad_id, ad_network, ad_event columns.
In my database there are millions of distinct ad_id each has a few events associated with them.
When I try to use GROUP BY on ad_id to count each event, the query takes so long that I get a 503 error.
I need to count distinct AdClickThru and AdImpression so that I can calculate the CTR.
The problem is that one user can click many times, so I must count only one AdClickThru.
The query is below:
SELECT
`ad_network`,
`ad_id`,
SUM(DISTINCT CASE WHEN `ad_event` = "AdImpression" THEN 1 ELSE 0 END) as AdImpression,
SUM(DISTINCT CASE WHEN `ad_event` = "AdClickThru" THEN 1 ELSE 0 END) as AdClickThru
FROM `ads`
WHERE 1
AND `ad_event` IN ("AdImpression", "AdClickThru")
AND SUBSTR(`date`, 1, 7) = "2020-08"
GROUP BY `ad_id`
I have indexes on ad_id and ad_event + date but it does not help much.
How can I optimize this query?
The database will grow to billions of entries and more.
#edit
I forgot to mention that the code above is the inner part of an outer query:
SELECT
`ad_network`,
SUM(`AdImpression`) as cnt_AdImpression,
SUM(`AdClickThru`) as cnt_AdClickThru,
100 * SUM(`AdClickThru`) / SUM(`AdImpression`) as ctr
FROM (
SELECT
`ad_network`,
`ad_id`,
SUM(DISTINCT CASE WHEN `ad_event` = "AdImpression" THEN 1 ELSE 0 END) as AdImpression,
SUM(DISTINCT CASE WHEN `ad_event` = "AdClickThru" THEN 1 ELSE 0 END) as AdClickThru
FROM `ads`
WHERE 1
AND `ad_event` IN ("AdImpression", "AdClickThru")
AND SUBSTR(`date`, 1, 7) = "2020-08" -- better performance
GROUP BY `ad_id`
) a
GROUP BY `ad_network`
ORDER BY ctr DESC
The problem is that one user can click many times, so I must count only one AdClickThru.
Then use MAX(), not SUM(DISTINCT ...). This gives the same result as your expression, and is much more efficient. I would also recommend rewriting the date filter so it is index-friendly:
SELECT
`ad_network`,
`ad_id`,
MAX(`ad_event` = 'AdImpression') as AdImpression,
MAX(`ad_event` = 'AdClickThru') as AdClickThru
FROM `ads`
WHERE 1
AND `ad_event` IN ('AdImpression', 'AdClickThru')
AND `date` >= '2020-08-01'
AND `date` < '2020-09-01'
GROUP BY `ad_id`
Notes:
the presence of ad_network in the select clause bothers me: if there are several values per ad_id, it is undefined which one will be picked. Either put this column in the group by clause as well, or use an aggregate function in the select clause (such as MAX(ad_network)); if you are OK with an arbitrary value, then be explicit about it with any_value()
use single quotes for literal strings rather than double quotes (this is the SQL standard)
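The MAX() approach and the range filter on the date can be checked with a small sketch using Python's sqlite3 module. This assumes SQLite as a stand-in for MySQL, stores the date as text, and uses invented sample rows and network names; MAX(ad_network) is used so that every selected column is either grouped or aggregated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ads (id INTEGER PRIMARY KEY, date TEXT, ad_id INTEGER, "
    "ad_network TEXT, ad_event TEXT)"
)
conn.executemany(
    "INSERT INTO ads (date, ad_id, ad_network, ad_event) VALUES (?, ?, ?, ?)",
    [
        ("2020-08-01 10:00:00", 1, "net_a", "AdImpression"),
        ("2020-08-01 10:00:05", 1, "net_a", "AdClickThru"),
        ("2020-08-01 10:00:09", 1, "net_a", "AdClickThru"),   # repeated click
        ("2020-08-15 12:00:00", 2, "net_a", "AdImpression"),  # never clicked
        ("2020-09-01 00:00:00", 3, "net_b", "AdImpression"),  # outside August
    ],
)

# MAX(condition) is 1 if at least one row in the group satisfies the
# condition and 0 otherwise, so repeated clicks still count once per ad_id.
# The half-open date range compares the raw column, which lets an index help.
rows = conn.execute("""
    SELECT MAX(ad_network) AS ad_network,
           ad_id,
           MAX(ad_event = 'AdImpression') AS AdImpression,
           MAX(ad_event = 'AdClickThru')  AS AdClickThru
    FROM ads
    WHERE ad_event IN ('AdImpression', 'AdClickThru')
      AND date >= '2020-08-01'
      AND date <  '2020-09-01'
    GROUP BY ad_id
    ORDER BY ad_id
""").fetchall()

for row in rows:
    print(row)
```

Ad 1 is reported with a single click despite two AdClickThru rows, and the September row is excluded by the range filter.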
There is no need for 2 separate aggregations in the main query and the subquery.
You want to count the distinct ad_ids for each of the 2 cases:
SELECT ad_network,
COUNT(DISTINCT CASE WHEN ad_event = 'AdImpression' THEN ad_id END) AS cnt_AdImpression,
COUNT(DISTINCT CASE WHEN ad_event = 'AdClickThru' THEN ad_id END) AS cnt_AdClickThru,
100 *
COUNT(DISTINCT CASE WHEN ad_event = 'AdClickThru' THEN ad_id END) /
COUNT(DISTINCT CASE WHEN ad_event = 'AdImpression' THEN ad_id END) AS ctr
FROM ads
WHERE ad_event IN ('AdImpression', 'AdClickThru') AND SUBSTR(date, 1, 7) = '2020-08'
GROUP BY ad_network
ORDER BY ctr DESC
The problem here is that you have to repeat the expressions for cnt_AdImpression and cnt_AdClickThru.
You can calculate these expressions in a subquery:
SELECT ad_network, cnt_AdImpression, cnt_AdClickThru,
100 * cnt_AdClickThru / cnt_AdImpression AS ctr
FROM (
SELECT ad_network,
COUNT(DISTINCT CASE WHEN ad_event = 'AdImpression' THEN ad_id END) AS cnt_AdImpression,
COUNT(DISTINCT CASE WHEN ad_event = 'AdClickThru' THEN ad_id END) AS cnt_AdClickThru
FROM ads
WHERE ad_event IN ('AdImpression', 'AdClickThru') AND SUBSTR(date, 1, 7) = '2020-08'
GROUP BY ad_network
) t
ORDER BY ctr DESC
I have a MySQL table called push_message_info that I want to filter and then calculate some results. The query to get the results is:
SELECT p.pending, s.success, t.total
FROM
(SELECT count(*) AS pending FROM push_message_info WHERE served < total) AS p,
(SELECT count(*) AS success FROM push_message_info WHERE served = total) AS s,
(SELECT count(*) AS total FROM push_message_info) AS t
;
And the thing is I would like to count only the newest entries. Since it has a datetime field called submit_date I've thought something like this would work:
SELECT p.pending, s.success, t.total
FROM
(SELECT * FROM push_message_info WHERE DATE(submit_date) > '2017-05-14') AS filtered,
(SELECT count(*) AS pending FROM filtered WHERE served < total) AS p,
(SELECT count(*) AS success FROM filtered WHERE served = total) AS s,
(SELECT count(*) AS total FROM filtered) AS t
;
But it throws me the error:
ERROR 1146 (42S02): Table 'unifiedpush.filtered' doesn't exist
How can I filter/limit the original table so that I can apply other queries on that new table?
Your query isn't good in general.
Instead of making all these subqueries, you can do that with one simple query.
SELECT
SUM(CASE WHEN served < total THEN 1 ELSE 0 END) AS pending,
SUM(CASE WHEN served = total THEN 1 ELSE 0 END) AS success,
COUNT(*) as total
FROM push_message_info
WHERE DATE(submit_date) > '2017-05-14';
I've replaced the counts (except total) with SUM(CASE WHEN ... THEN 1 ELSE 0 END), which lets you do conditional counts.
The problem in your query is the scope of the filtered alias: it is not visible from the other subqueries.
If you really really want to have it, you can create a view:
create view filtered as SELECT * FROM push_message_info WHERE DATE(submit_date) > '2017-05-14';
SELECT p.pending, s.success, t.total
FROM
(SELECT count(*) AS pending FROM filtered WHERE served < total) AS p,
(SELECT count(*) AS success FROM filtered WHERE served = total) AS s,
(SELECT count(*) AS total FROM filtered) AS t;
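If the server supports common table expressions (an assumption: MySQL 8.0 or MariaDB 10.2 and later), a WITH clause is another way to make a filtered set visible to several subqueries without creating a permanent view. The sketch below is only an illustration: it runs in SQLite through Python's sqlite3 module, the sample rows are invented, and it also checks that the single-pass conditional sum returns the same numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE push_message_info (served INTEGER, total INTEGER, submit_date TEXT)"
)
conn.executemany(
    "INSERT INTO push_message_info VALUES (?, ?, ?)",
    [
        (5, 10, "2017-05-10 09:00:00"),   # too old, filtered out
        (3, 10, "2017-05-15 09:00:00"),   # pending
        (10, 10, "2017-05-16 09:00:00"),  # success
        (7, 7, "2017-05-17 09:00:00"),    # success
    ],
)

# A CTE, unlike a derived-table alias, can be referenced by every subquery
# of the statement that follows it.
cte_row = conn.execute("""
    WITH filtered AS (
        SELECT * FROM push_message_info
        WHERE DATE(submit_date) > '2017-05-14'
    )
    SELECT p.pending, s.success, t.total
    FROM (SELECT COUNT(*) AS pending FROM filtered WHERE served < total) AS p,
         (SELECT COUNT(*) AS success FROM filtered WHERE served = total) AS s,
         (SELECT COUNT(*) AS total   FROM filtered) AS t
""").fetchone()

# The single-pass version: one scan, conditional sums instead of subqueries.
single_pass_row = conn.execute("""
    SELECT SUM(CASE WHEN served < total THEN 1 ELSE 0 END) AS pending,
           SUM(CASE WHEN served = total THEN 1 ELSE 0 END) AS success,
           COUNT(*) AS total
    FROM push_message_info
    WHERE DATE(submit_date) > '2017-05-14'
""").fetchone()

print(cte_row, single_pass_row)
```

Both forms agree on the sample data; the single-pass form is usually the simpler choice when all the counts come from the same filtered rows.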
Well, I can tell you how I have accomplished something like this. Try:
SELECT SUM(IF(served < total, 1, 0)) as pending,
SUM(IF(served = total,1 ,0)) AS success,
COUNT(*) as total
FROM push_message_info
WHERE DATE(submit_date) > '2017-05-14';
That should give you what you need...
I'm trying to join two count queries
SELECT COUNT(*) AS total FROM clients WHERE addedby = 1
UNION
SELECT COUNT(*) AS converts FROM clients WHERE addedby = 1 AND status = '6'
What this returns is
total
4
0
This is the correct data, but what I was expecting was this:
total converts
4 0
You don't need a UNION query to do this. SELECT A UNION SELECT B returns the rows of A followed by the rows of B (deduplicated; if you want all rows from both datasets, use UNION ALL).
What you want is something like this:
select
(select count(*) from clients where addedby=1) as total,
(select count(*) from clients where addedby=1 and status='6') as converts
Another way to do this is to use a case ... end expression that returns 1 if status='6':
select
count(*) as total,
sum(case when status='6' then 1 else 0 end) as converts
from clients
where addedby=1
No UNION needed, do it in one pass.
SELECT COUNT(*) as total,
SUM(CASE status WHEN '6' THEN 1 ELSE 0 END) as converts
FROM clients
WHERE addedby = 1;
The simplest way to write this query is as a conditional aggregation:
select count(*) as total, sum(status = '6') as converts
from clients
where addedby = 1;
MySQL treats booleans as integers, with 1 being true and 0 being false. You can just sum the values to get a count.
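A quick way to convince yourself of the boolean-as-integer behaviour is the sketch below, run through Python's sqlite3 module. SQLite happens to evaluate comparisons to 1 or 0 as well, so it serves as a stand-in for MySQL here; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients (id INTEGER PRIMARY KEY, addedby INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO clients (addedby, status) VALUES (?, ?)",
    [
        (1, "1"),
        (1, "3"),
        (1, "6"),   # a convert
        (1, "6"),   # a convert
        (2, "6"),   # added by someone else, filtered out
        (1, None),  # NULL status: the comparison is NULL and SUM skips it
    ],
)

# status = '6' yields 1, 0, or NULL per row; SUM adds up the 1s.
total, converts = conn.execute("""
    SELECT COUNT(*) AS total, SUM(status = '6') AS converts
    FROM clients
    WHERE addedby = 1
""").fetchone()

print(total, converts)
```

Both figures come back in one row, which is the total/converts layout the question asks for.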
I got an error: #1242 - Subquery returns more than 1 row when I run this SQL.
CREATE VIEW test
AS
SELECT cc_name,
COUNT(*) AS total,
(SELECT COUNT(*)
FROM bed
WHERE respatient_id > 0
GROUP BY cc_name) AS occupied_beds,
(SELECT COUNT(*)
FROM bed
WHERE respatient_id IS NULL
GROUP BY cc_name) AS free_beds
FROM bed
GROUP BY cc_name;
The problem is that your subselects are returning more than one row, i.e.:
SELECT ...
(SELECT COUNT(*)
FROM bed
WHERE respatient_id IS NULL
GROUP BY cc_name) AS free_beds,
...
...will return a row for each cc_name, but SQL doesn't support compacting the resultset for the subselect - hence the error.
Don't need the subselects, this can be done using a single pass over the table using:
SELECT b.cc_name,
COUNT(*) AS total,
SUM(CASE
WHEN b.respatient_id > 0 THEN 1
ELSE 0
END) AS occupied_beds,
SUM(CASE
WHEN b.respatient_id IS NULL THEN 1
ELSE 0
END) AS free_beds
FROM bed b
GROUP BY b.cc_name
This is because your subqueries (the SELECT bits that are inside parentheses) are returning multiple rows for each outer row. The problem is with the GROUP BY; if you want to use subqueries for this, then you need to correlate them to the outer query, by specifying that they refer to the same cc_name as the outer query:
CREATE VIEW test
AS
SELECT cc_name,
COUNT(*) AS total,
(SELECT COUNT(*)
FROM bed
WHERE cc_name = bed_outer.cc_name
AND respatient_id > 0) AS occupied_beds,
(SELECT COUNT(*)
FROM bed
WHERE cc_name = bed_outer.cc_name
AND respatient_id IS NULL) AS free_beds
FROM bed AS bed_outer
GROUP BY cc_name;
(See http://en.wikipedia.org/wiki/Correlated_subquery for information about correlated subqueries.)
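To illustrate that the correlated subqueries and the single-pass conditional aggregation give the same answer, here is a minimal sketch using Python's sqlite3 module. SQLite is a stand-in for MySQL, and the bed rows and the cc_name values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bed (cc_name TEXT, respatient_id INTEGER)")
conn.executemany(
    "INSERT INTO bed VALUES (?, ?)",
    [("icu", 11), ("icu", None), ("icu", 12), ("ward", None), ("ward", None)],
)

# Correlated version: each subquery is tied to the outer row's cc_name,
# so it returns exactly one value per outer group.
correlated = conn.execute("""
    SELECT cc_name,
           COUNT(*) AS total,
           (SELECT COUNT(*) FROM bed
            WHERE cc_name = bed_outer.cc_name
              AND respatient_id > 0) AS occupied_beds,
           (SELECT COUNT(*) FROM bed
            WHERE cc_name = bed_outer.cc_name
              AND respatient_id IS NULL) AS free_beds
    FROM bed AS bed_outer
    GROUP BY cc_name
    ORDER BY cc_name
""").fetchall()

# Single-pass version: one scan of the table with conditional sums.
single_pass = conn.execute("""
    SELECT cc_name,
           COUNT(*) AS total,
           SUM(CASE WHEN respatient_id > 0 THEN 1 ELSE 0 END) AS occupied_beds,
           SUM(CASE WHEN respatient_id IS NULL THEN 1 ELSE 0 END) AS free_beds
    FROM bed
    GROUP BY cc_name
    ORDER BY cc_name
""").fetchall()

print(correlated)
print(single_pass)
```

The two result sets match on this data; the single-pass form avoids running two extra subqueries per group.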
But, as OMG Ponies and a1ex07 say, you don't actually need to use subqueries for this if you don't want to.
Your subqueries return more than 1 row. I think you need something like:
SELECT cc_name,
COUNT(*) AS total,
COUNT(CASE WHEN respatient_id > 0 THEN 1 END) AS occupied_beds,
COUNT(CASE WHEN respatient_id IS NULL THEN 1 END) AS free_beds
FROM bed
GROUP BY cc_name
You can also try WITH ROLLUP plus pivoting (mostly for learning purposes; it's a much longer query):
SELECT cc_name,
MAX(CASE
WHEN num_1 = 1 THEN tot_num END) AS occupied_beds,
MAX(CASE
WHEN num_1 = 2 THEN tot_num END) AS free_beds,
MAX(CASE
WHEN num_1 IS NULL THEN tot_num END) AS total
FROM
(SELECT cc_name, CASE
WHEN respatient_id > 0 THEN 1
WHEN respatient_id IS NULL THEN 2
ELSE 3 END as num_1,
COUNT(*) as tot_num
FROM bed
WHERE
CASE
WHEN respatient_id > 0 THEN 1
WHEN respatient_id IS NULL THEN 2
ELSE 3 END != 3
GROUP BY cc_name,
num_1 WITH ROLLUP)A
GROUP BY cc_name
SELECT COUNT(*)
FROM bed
WHERE respatient_id > 0
GROUP BY cc_name
You need to remove the group-by in the sub query, so possibly something like
SELECT COUNT(*)
FROM bed
WHERE respatient_id > 0
or possibly -- depending on what your application logic is....
SELECT COUNT(*) from (
select count(*),cc_name FROM bed
WHERE respatient_id > 0
GROUP BY cc_name) filterview