The following query takes 18 minutes to complete. How can I optimize it to execute faster?
Basically, for every citizen my query joins the row from citizens_static and the row from citizens_dynamic whose update_id_to column is highest.
INSERT INTO latest_tmp (...)
SELECT cs1.*, cd1.*
FROM citizens c
JOIN citizens_static cs1 ON c.id = cs1.citizen_id
JOIN citizens_dynamic cd1 ON c.id = cd1.citizen_id
JOIN (
    SELECT citizen_id, MAX(update_id_to) AS update_id_to
    FROM citizens_static
    GROUP BY citizen_id
) AS cs2 ON c.id = cs2.citizen_id AND cs1.update_id_to = cs2.update_id_to
JOIN (
    SELECT citizen_id, MAX(update_id_to) AS update_id_to
    FROM citizens_dynamic
    GROUP BY citizen_id
) AS cd2 ON c.id = cd2.citizen_id AND cd1.update_id_to = cd2.update_id_to;
latest_tmp is a MyISAM table with indexes disabled during the import. Disabling them improved execution time from 20 minutes to 18 minutes, so they are not the biggest problem.
I also benchmarked a LEFT JOIN approach with WHERE t2.column IS NULL. It takes several hours compared to the INNER JOIN approach I'm using.
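For reference, that LEFT JOIN variant followed the usual anti-join pattern; this is a reconstructed sketch, not the exact query I ran:

SELECT cs1.*, cd1.*
FROM citizens c
JOIN citizens_static cs1 ON c.id = cs1.citizen_id
LEFT JOIN citizens_static cs2
    ON cs2.citizen_id = cs1.citizen_id
    AND cs2.update_id_to > cs1.update_id_to    -- any newer row disqualifies cs1
JOIN citizens_dynamic cd1 ON c.id = cd1.citizen_id
LEFT JOIN citizens_dynamic cd2
    ON cd2.citizen_id = cd1.citizen_id
    AND cd2.update_id_to > cd1.update_id_to
WHERE cs2.citizen_id IS NULL    -- keep cs1/cd1 only when no newer row exists
  AND cd2.citizen_id IS NULL;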
EXPLAIN output for the query is below. It seems to be using indexes.
citizens_dynamic and citizens_static have a primary key on (citizen_id, update_id_to) and a secondary key named "id" on (update_id_to, citizen_id).
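For reference, a minimal reconstruction of that layout (column types are assumptions):

-- Reconstruction of the described tables; real column types and payload columns may differ
CREATE TABLE citizens_static (
    citizen_id   INT NOT NULL,
    update_id_to INT NOT NULL,
    -- ...payload columns...
    PRIMARY KEY (citizen_id, update_id_to),
    KEY id (update_id_to, citizen_id)
);
-- citizens_dynamic is keyed the same way.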
Could you explain, in English, what you want?
Then see Groupwise Max and edit the following as needed:
SELECT
    province, n, city, population
FROM
    ( SELECT @prev := '', @n := 0 ) init
JOIN
    ( SELECT @n := if(province != @prev, 1, @n + 1) AS n,
             @prev := province,
             province, city, population
        FROM Canada
        ORDER BY
            province,
            population DESC
    ) x
WHERE n <= 3
ORDER BY province, n;
Regardless of the ASC/DESC on the inner ORDER BY, there will be a full table scan and a 'filesort'.
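Applied to your schema, the template might look like this sketch for citizens_dynamic (untested; it keeps only the newest row per citizen, and the citizens_static version is analogous):

-- Sketch: top 1 row per citizen_id, ordered by update_id_to DESC
SELECT citizen_id, update_id_to
FROM ( SELECT @prev := 0, @n := 0 ) init
JOIN ( SELECT @n := if(citizen_id != @prev, 1, @n + 1) AS n,
              @prev := citizen_id,
              citizen_id, update_id_to
         FROM citizens_dynamic
         ORDER BY citizen_id, update_id_to DESC
     ) x
WHERE n = 1;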
I'm not familiar enough with MySQL to predict whether this will run any better, but I would suggest giving this a try:
SELECT cs1.*, cd1.*
FROM citizens c
JOIN citizens_static cs1 ON c.id = cs1.citizen_id
    AND NOT EXISTS ( SELECT *
                     FROM citizens_static cs2
                     WHERE cs2.citizen_id = cs1.citizen_id
                       AND cs2.update_id_to > cs1.update_id_to )
JOIN citizens_dynamic cd1 ON c.id = cd1.citizen_id
    AND NOT EXISTS ( SELECT *
                     FROM citizens_dynamic cd2
                     WHERE cd2.citizen_id = cd1.citizen_id
                       AND cd2.update_id_to > cd1.update_id_to );
PS: Please comment with the running time (if it returns within the hour =), so that I might learn whether (or not) to propose this construction again in the future.
Let's assume I have two tables. One contains car manufacturers' names and their IDs; the second contains information about car models. I need to select a few manufacturers from the first table, but order them by the quantity of linked data from the second table.
Currently, my query looks like this:
SELECT DISTINCT `manufacturers`.`name`,
`manufacturers`.`cars_link`,
`manufacturers`.`slug`
FROM `manufacturers`
JOIN `cars`
ON manufacturers.cars_link = cars.manufacturer
WHERE ( NOT ( `manufacturers`.`cars_link` IS NULL ) )
AND ( `cars`.`class` = 'sedan' )
ORDER BY (SELECT Count(*)
FROM `cars`
WHERE `manufacturers`.cars_link = `cars`.manufacturer) DESC
It was working OK for my scooters table, which is a few dozen MB in size. But now I need to do the same thing for the cars table, which is a few hundred megabytes. The problem is that the query takes a very long time, sometimes even causing an nginx timeout. Also, I think I have all the necessary database indexes. Is there an alternative to the query above?
Let's try using a subquery for your count instead.
select * from (
    select distinct m.name, m.cars_link, m.slug, t1.ct
    from manufacturers m
    join cars c on m.cars_link = c.manufacturer
    left join
        (select count(1) ct, c1.manufacturer
         from cars c1
         where c1.class = 'sedan'
         group by c1.manufacturer) as t1
        on t1.manufacturer = c.manufacturer
    where coalesce(m.cars_link, '') != '' and c.class = 'sedan'
) t2
order by ct desc
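Alternatively, if you only need the count for ordering, a single GROUP BY can produce it directly; a sketch against the tables shown (untested against your data):

-- one aggregation pass; ct = number of sedans per manufacturer
select m.name, m.cars_link, m.slug, count(*) as ct
from manufacturers m
join cars c on m.cars_link = c.manufacturer
where m.cars_link is not null
  and c.class = 'sedan'
group by m.name, m.cars_link, m.slug
order by ct desc;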
I have a database that stores player kills in CS:GO, and I am trying to write a query that can show each player's KD (kill/death ratio).
I've written a query that will show each player's kills and deaths using aliases.
SELECT
`Name`,
`SteamID` as PlayerID,
count(`EventType`) as kills,
(SELECT count(`EventType`)
FROM `logdata`
WHERE (`EventVariable` = PlayerID AND `EventType` = 'killed')
GROUP BY `EventVariable`
ORDER BY count(`EventType`) DESC) as deaths
FROM `logdata`
WHERE `EventType` = 'killed'
GROUP BY `EventType`, `Name`
ORDER BY kills DESC
(Results limited to just bots; I didn't want to openly advertise my friends' SteamIDs.)
To work out KD I just need to divide kills / deaths, but you can't do that with aliases. I read that I should be able to wrap the aliases, e.g. (SELECT kills) / (SELECT deaths) AS KD, but that doesn't work.
The table looks like this: (Limited to bots again)
I am currently working out KD in PHP using the result of my query, but that isn't a great way of doing it. (I am unable to query who has the highest KD, for example.)
So, my question is: how would I go about calculating the KD if I am unable to make calculations using aliases?
I might just write your query using two completely separate subqueries which compute the kills and deaths counts:
SELECT
n.Name,
COALESCE(t1.kill_cnt, 0) AS kills,
COALESCE(t2.death_cnt, 0) AS deaths,
CASE WHEN t2.death_cnt > 0
THEN CAST(t1.kill_cnt / t2.death_cnt AS CHAR(50))
ELSE 'NA' END AS ratio
FROM
( SELECT DISTINCT Name FROM logdata ) n
LEFT JOIN
(
SELECT Name, COUNT(*) AS kill_cnt
FROM logdata
WHERE EventType = 'killed'
GROUP BY Name
) t1
ON
n.Name = t1.Name
LEFT JOIN
(
SELECT EventVariable AS Name, COUNT(*) AS death_cnt
FROM logdata
WHERE EventType = 'killed'
GROUP BY EventVariable
) t2
ON
n.Name = t2.Name
Note that the subquery above which I have aliased as n is just intended to generate a complete list of all users in your database. Ideally, there should be a dedicated user table somewhere. If not, and you don't like my approach, then you will have to come up with some other way to obtain a list of all users.
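If you also want to sort or filter on the ratio itself (e.g. to find the highest KD), one option is to wrap the counts in a derived table so the aliases become real columns you can compute on. A sketch under the same logdata schema, using a numeric kd instead of the CHAR ratio above:

SELECT q.Name, q.kills, q.deaths,
       q.kills / NULLIF(q.deaths, 0) AS kd    -- NULLIF avoids division by zero
FROM (
    SELECT n.Name,
           COALESCE(t1.kill_cnt, 0) AS kills,
           COALESCE(t2.death_cnt, 0) AS deaths
    FROM (SELECT DISTINCT Name FROM logdata) n
    LEFT JOIN (SELECT Name, COUNT(*) AS kill_cnt
               FROM logdata
               WHERE EventType = 'killed'
               GROUP BY Name) t1 ON n.Name = t1.Name
    LEFT JOIN (SELECT EventVariable AS Name, COUNT(*) AS death_cnt
               FROM logdata
               WHERE EventType = 'killed'
               GROUP BY EventVariable) t2 ON n.Name = t2.Name
) AS q
ORDER BY kd DESC;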
Thanks to Tim for pointing me in the right direction and providing a query. I have made some changes to get the result I want and I wanted to post the final result.
SELECT
n.SteamID,
COALESCE(t1.kill_cnt, 0) AS kills,
COALESCE(t2.death_cnt, 0) AS deaths,
CASE WHEN t2.death_cnt > 0 THEN CAST(t1.kill_cnt / t2.death_cnt AS CHAR(50))
WHEN t1.kill_cnt = 0 THEN '0'
ELSE 'Infinite' END AS ratio
FROM
( SELECT DISTINCT SteamID FROM logdata ) n
LEFT JOIN
(
SELECT SteamID, COUNT(*) AS kill_cnt
FROM logdata
WHERE EventType = 'killed'
GROUP BY SteamID
) t1
ON
n.SteamID = t1.SteamID
LEFT JOIN
(
SELECT EventVariable AS SteamID, COUNT(*) AS death_cnt
FROM logdata
WHERE EventType = 'killed'
GROUP BY EventVariable
) t2
ON
n.SteamID = t2.SteamID
WHERE t1.kill_cnt > 0 or t2.death_cnt > 0
ORDER BY `ratio` DESC
I attempted to get a KD of 0 to show as such, but that is not all that important at the end of the day; NULL is easy to work with.
I want to take the maximum value from a series of returned values, but I can't figure out a simple way to do it. My query returns all rows, so I'm halfway there. I can filter it down with PHP, but I'd like to do it all in SQL. I tried with a MAX subquery, but that still returned all results.
DDL:
create table matrix (
    count int(4),
    date date,
    product int(4)
);

create table products (
    id int(4),
    section int(4)
);
DML:
select max(magic_count), section, id
from (
select sum(count) as magic_count, p.section, p.id
from matrix as m
join products as p on m.product = p.id
group by m.product
) as faketable
group by id, section
Demo with my current try.
Only ids 1 and 3 should be returned from the sample data because they have the highest cumulative count for each of the sections.
Here's a second SQL fiddle that demonstrates the same issue.
Here you go:
select a.id,
a.section,
a.magic_count
from (
select p.id,
p.section,
magic_count
from (
select m.product, sum(count) as magic_count
from matrix m
group by m.product
) sm
join products p on sm.product = p.id
) a
left join (
select p.id,
p.section,
magic_count
from (
select m.product, sum(count) as magic_count
from matrix m
group by m.product
) sm
join products p on sm.product = p.id
) b on a.section = b.section and a.magic_count < b.magic_count
where b.id is null
See a simplified example (and other methods) in the manual entry for The Rows Holding the Group-wise Maximum of a Certain Column.
See it working live here.
Here is a solution without JOINs; it performs better than the other answer, which uses a lot of JOINs:
select @rn := 1, @sectionLag := 0;

select id, section, count from (
    select id,
           case when @sectionLag = section then @rn := @rn + 1 else @rn := 1 end rn,
           @sectionLag := section,
           section,
           count
    from (
        select id, section, sum(count) count
        from matrix m
        join products p on m.product = p.id
        group by id, section
    ) a order by section, count desc
) a where rn = 1
The variables at the beginning are used to imitate window functions (LAG and ROW_NUMBER), which are available in MySQL 8.0 or higher (if you are on such a version, let me know and I will also give you a solution with window functions).
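For reference, on MySQL 8.0+ the window-function version might look like this sketch (ROW_NUMBER replaces the user-variable bookkeeping):

-- MySQL 8.0+ sketch: pick the row with the highest summed count per section
select id, section, count
from (
    select id, section, count,
           row_number() over (partition by section order by count desc) as rn
    from (
        select id, section, sum(count) count
        from matrix m
        join products p on m.product = p.id
        group by id, section
    ) a
) b
where rn = 1;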
DEMO
Another demo, where you can compare the performance of my query and the other one. It contains ~20K rows, and my query tends to be almost 2 times faster.
I have a MySQL table containing player points for several categories (p1, p2, etc.) and a player id (pid).
I have a query that counts the SUM of points for each category, puts the sums in aliases, and groups them by player id (pid).
SELECT *,
SUM(p1) as p1,
SUM(p2) as p2,
SUM(p3) as p3,
SUM(p4) as p4,
SUM(p6) as p6,
SUM(p13) as p13,
SUM(p14) as p14,
SUM(p15) as p15,
SUM(p16) as p16,
SUM(p17) as p17,
SUM(p18) as p18,
SUM(p19) as p19,
SUM(p20) as p20,
SUM(p21) as p21
FROM results GROUP BY pid
Further on, I loop over the results and update another table with these alias values.
Now I need to count only the top 5 or 12 (depending on the category) values for each group. I don't know where to start. I found similar questions, but none of them addresses putting the value in an alias, so that I don't have to change further code.
Can someone help me and write an example query for at least two categories, so I can understand the principle of doing this right?
Thank you in advance!
As we need to sum the top n records, we need to use something like this:
SELECT pid, sum(p1)
FROM (SELECT p.*,
             (@pn := if(@p = pid, @pn + 1,
                        if(@p := pid, 1, 1)
                       )
             ) as seqnum
      FROM results p CROSS JOIN
           (SELECT @p := 0, @pn := 0) as p1
      ORDER BY pid, p1 DESC
     ) p
WHERE seqnum <= 1
GROUP BY pid;
Here, we can modify the seqnum <= 1 condition as per the number of records needed; e.g. if we want 5 records, we write seqnum <= 5.
Please note that this will only calculate the top-n sum for a particular field. If we want multiple fields, we need to repeat the query.
Here is the SQL Fiddle example to play around with.
Building on the answer by @DarshanMehta, you can do repeated subqueries like that. Note that the variable names in each subquery need to be different.
Something like this, assuming you have a table of players:
SELECT players.pid,
       suba1.p1sum,
       suba2.p2sum
FROM players
LEFT OUTER JOIN
(
    SELECT pid, SUM(p1) AS p1sum
    FROM (SELECT r.pid,
                 r.p1,
                 @p1n := if(@p1 = pid, @p1n + 1, 1) AS seqnum,
                 @p1 := pid
          FROM results r
          CROSS JOIN (SELECT @p1 := 0, @p1n := 0) as p1
          ORDER BY r.pid, r.p1 DESC
         ) sub1
    WHERE seqnum <= 5
    GROUP BY pid
) suba1
ON players.pid = suba1.pid
LEFT OUTER JOIN
(
    SELECT pid, SUM(p2) AS p2sum
    FROM (SELECT r.pid,
                 r.p2,
                 @p2n := if(@p2 = pid, @p2n + 1, 1) AS seqnum,
                 @p2 := pid
          FROM results r
          CROSS JOIN (SELECT @p2 := 0, @p2n := 0) as p2
          ORDER BY r.pid, r.p2 DESC
         ) sub2
    WHERE seqnum <= 5
    GROUP BY pid
) suba2
ON players.pid = suba2.pid
You can build a table with all that SUM information and use this one:
SELECT * from newTable ORDER BY p1 DESC LIMIT 5;
You can get all the info you want by changing the field p1 and the LIMIT 5.
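For example, a sketch of that idea (results_sums is a hypothetical name, and you would need to rebuild or refresh it whenever results changes):

-- hypothetical summary table materializing the per-player sums
CREATE TABLE results_sums AS
SELECT pid,
       SUM(p1) AS p1,
       SUM(p2) AS p2
       -- ...and so on for the remaining categories
FROM results
GROUP BY pid;

SELECT pid, p1 FROM results_sums ORDER BY p1 DESC LIMIT 5;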
I have the following query, but as users put more and more items in the ci_falsepositives table, it gets really slow.
The ci_falsepositives table contains a reference field from ci_address_book and another reference field from ci_matched_sanctions.
How can I create a new query while still being able to sort on each field?
For example, I should still be able to sort on "hits" or "matches".
SELECT *, matches - falsepositives AS hits
FROM (SELECT c.*, IFNULL(p.total, 0) AS matches,
(SELECT COUNT(*)
FROM ci_falsepositives n
WHERE n.addressbook_id = c.reference
AND n.sanction_key IN
(SELECT sanction_key FROM ci_matched_sanctions)
) AS falsepositives
FROM ci_address_book c
LEFT JOIN
(SELECT addressbook_id, COUNT(match_id) AS total
FROM ci_matched_sanctions
GROUP BY addressbook_id) AS p
ON c.id = p.addressbook_id
) S
ORDER BY folder asc, wholename ASC
LIMIT 0,15
The problem has to be the SELECT COUNT(*) FROM ci_falsepositives sub-query. That sub-query can be written using an inner join between ci_falsepositives and ci_matched_sanctions, but the optimizer might do that for you anyway. What I think you need to do, though, is make that sub-query into a separate query in the FROM clause of the 'next query out' (that is, SELECT c.*, ...). Probably, that query is being evaluated multiple times - and that's what's hurting you when people add records to ci_falsepositives. You should study the query plan carefully.
Maybe this query will be better:
SELECT *, matches - falsepositives AS hits
FROM (SELECT c.*, IFNULL(p.total, 0) AS matches,
             IFNULL(f.falsepositives, 0) AS falsepositives
      FROM ci_address_book AS c
      LEFT JOIN (SELECT n.addressbook_id, COUNT(*) AS falsepositives
                 FROM ci_falsepositives AS n
                 JOIN ci_matched_sanctions AS m
                   ON n.sanction_key = m.sanction_key
                 GROUP BY n.addressbook_id
                ) AS f
        ON c.reference = f.addressbook_id
      LEFT JOIN
           (SELECT addressbook_id, COUNT(match_id) AS total
            FROM ci_matched_sanctions
            GROUP BY addressbook_id) AS p
        ON c.id = p.addressbook_id
     ) AS s
ORDER BY folder ASC, wholename ASC
LIMIT 0, 15
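If it is still slow after this rewrite, it may be worth verifying with EXPLAIN that the join columns are indexed; hypothetical statements, since the exact schema isn't shown:

-- hypothetical indexes covering the join/group columns used above
CREATE INDEX ix_fp_sanction ON ci_falsepositives (sanction_key, addressbook_id);
CREATE INDEX ix_ms_addressbook ON ci_matched_sanctions (addressbook_id);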