Highscores on multiple columns, efficient query, right approach - mysql

Let's say we've got high scores table with columns app_id, best_score, best_time, most_drops, longest_something and couple more.
I'd like to collect top three results ON EACH CATEGORY grouped by app_id?
For now I'm using separate rank queries on each category in a loop:
SELECT app_id, best_something1,
FIND_IN_SET( best_something1,
(SELECT GROUP_CONCAT( best_something1
ORDER BY best_something1 DESC)
FROM highscores )) AS rank
FROM highscores
ORDER BY best_something1 DESC LIMIT 3;
Two things worth to add:
All columns for specific app are being updated at the same time (can consider creating a helper table).
the result of prospective "turbo query" might be requested quite often - as often as updating the values.
I'm quite basic with SQL and suspect that it has many more commands that combined together could do the magic?
What I'd expect from this post is that some wise owl would at least point the direction where to go or how to go.
The sample table:
http://sqlfiddle.com/#!2/eef053/1
Here is sample result too (already in json format, sry):
{"total_blocks":[["13","174","1"],["9","153","2"],["10","26","3"]],"total_games":[["13","15","1"],["9","12","2"],["10","2","3"]],"total_score":[["13","410","1"],["9","332","2"],["11","88","3"]],"aver_pps":[["11","4.34011","1"],["13","2.64521","2"],["12","2.60623","3"]],"aver_drop_per_game":[["11","20","1"],["10","13","2"],["9","12.75","3"]],"aver_drop_val":[["11","4.4","1"],["13","2.35632","2"],["9","2.16993","3"]],"aver_score":[["11","88","1"],["9","27.6667","2"],["13","27.3333","3"]],"best_pps":[["13","4.9527","1"],["11","4.34011","2"],["9","4.13076","3"]],"most_drops":[["11","20","1"],["9","16","2"],["13","16","2"]],"longest_drop":[["9","3","1"],["13","2","2"],["11","2","2"]],"best_drop":[["11","42","1"],["13","36","2"],["9","30","3"]],"best_score":[["11","88","1"],["13","78","2"],["9","58","3"]]}

When I encounter this scenario, I prefer to employ the UNION clause, and combine the queries tailored to each ORDERing and LIMIT.
http://dev.mysql.com/doc/refman/5.1/en/union.html
UNION combines the result rows vertically (top 3 rows for 5 sort categories yields 15 rows).
For your specific purpose, you might then pivot them as sub-SELECTs, rolling them up with GROUP_CONCAT GROUPed on user so that each has the delimited list.

I'd test something like this query, to see if the performance is any better or not. I think this comes pretty close to satisfying the specification:
( SELECT 99 AS seq_
, a.category
, CONVERT(a.val,DOUBLE) AS val
, FIND_IN_SET(a.val,r.highest_vals) AS rank
, a.user_id
FROM ( SELECT 'total_blocks' AS category
, b.`total_blocks` AS val
, b.user_id
FROM app b
ORDER BY b.`total_blocks` DESC
LIMIT 3
) a
CROSS
JOIN ( SELECT GROUP_CONCAT(s.val ORDER BY s.val DESC) AS highest_vals
FROM ( SELECT t.`total_blocks` AS val
FROM app t
ORDER BY t.`total_blocks` DESC
LIMIT 3
) s
) r
ORDER BY a.val DESC
)
UNION ALL
( SELECT 97 AS seq_
, a.category
, CONVERT(a.val,DOUBLE) AS val
, FIND_IN_SET(a.val,r.highest_vals) AS rank
, a.user_id
FROM ( SELECT 'XXX' AS category
, b.`XXX` AS val
, b.user_id
FROM app b
ORDER BY b.`XXX` DESC
LIMIT 3
) a
CROSS
JOIN ( SELECT GROUP_CONCAT(s.val ORDER BY s.val DESC) AS highest_vals
FROM ( SELECT t.`XXX` AS val
FROM app t
ORDER BY t.`XXX` DESC
LIMIT 3
) s
) r
ORDER BY a.val DESC
)
ORDER BY seq_ DESC, val DESC
To unpack this a little bit... this is essentially separate queries that are combined with UNION ALL set operator.
Each of the queries returns a literal value to allow for ordering. (In this case, I've given the column a rather anonymous name seq_ (sequence)... if the specific order isn't important, then this could be removed.
Each query is also returning a literal value that tells which "category" the row is for.
Because some of the values returned are INTEGER, and others are FLOAT, I'd cast all of those values to floating point, so the datatypes of each query line up.
For the FLOAT (floating point) type values, there can be a problem with comparison. So I'd go with casting those to decimal and stringing them together into a list using GROUP_CONCAT (as the original query does).
Since we are returning only three rows from each query, we only need to concatenate together the three largest values. (If there's a two way "tie" for first place, we'll return rank values of 1, 1, 3.)
Suitable indexes for each query will improve performance for large sets.
... ON app (total_blocks, user_id)
... ON app (best_pps,user_id)
... ON app (XXX,user_id)

Related

SQL query:Having number=max(number) doesn't work

I have two tables,Writer and Books. A writer can pruduce many books. I want to get the all writers who produce maximal number of books.
Firstly, my sql query is like:
SELECT Name FROM(
SELECT Writer.Name,COUNT(Book.ID) AS NUMBER FROM Writer,Book
WHERE
Writer.ID=Book.ID
GROUP BY Writer.Name
)
WHERE NUMBER=(SELECT MAX(NUMBER) FROM
(SELECT Writer.Name,COUNT(Book.ID) AS NUMBER FROM Writer,Book
WHERE Writer.ID=Book.ID
GROUP BY Writer.Name
)
It works. However I think this query is too long and there exists some duplications. I want to make this query shorter. So I try another query like this:
SELECT Name FROM(
SELECT Writer.Name,COUNT(Book.ID) AS NUMBER FROM Writer,Book
WHERE
Writer.ID=Book.ID
GROUP BY Writer.Name
HAVING NUMBER = MAX(NUMBER)
)
However, this HAVING clause doesn't work and my sqlite says its an error.
I don't know why. Can anyone explain to me ? Thank you!
The HAVING clause provides filtering on the final set (typically after a group by) and does not provide additional grouping functionality. Think of it just like a WHERE clause, but can be applied after a GROUP BY.
Your query with the HAVING NUMBER = MAX(NUMBER) implies grouping of the set of NUMBER values across all records and doesn't make sense in this example (even though we all get what you want it to do).
Each query provides you with one level of aggregation, so you cannot use Max on COUNT in the same query. You need a sub-query like you did in your first query.
However, your first query can be simplified on MySQL to:
SELECT Writer.Name
FROM Writer, Book
WHERE Writer.ID = Book.ID
GROUP BY Writer.Name
HAVING COUNT(Book.ID) = (SELECT COUNT(Book.ID) AS n
FROM Writer, Book
WHERE Writer.ID = Book.ID
GROUP BY Writer.Name
ORDER BY n DESC
LIMIT 1)
In MySQL (but not SQLite), you can use variables to reduce the amount of work and make a simpler query. However, there are nuances there, because variables with group by require an extra level of subqueries:
SELECT name
FROM (SELECT t.*, (#m := if(#m = 0, NUMBER, #m)) as maxn
FROM (SELECT w.Name, COUNT(b.ID) AS NUMBER
FROM Writer w JOIN
Book b
ON w.ID = b.ID
GROUP BY w.Name
) t CROSS JOIN
(SELECT #m := 0) params
ORDER BY NUMBER desc
) t
WHERE maxn = number;
It looks like you are nesting aggregate functions, which is not allowed.
HAVING NUMBER = MAX(NUMBER) is like HAVING COUNT(Book.ID) = MAX(COUNT(Book.ID))
Nesting COUNT inside MAX seems to be the issue here

Incorrect group by and order by merge

I have couple tables joined in MySQL - one has many others.
And try to select items from one, ordered by min values from another table.
Without grouping in seems to be like this:
Code:
select `catalog_products`.id
, `catalog_products`.alias
, `tmpKits`.`minPrice`
from `catalog_products`
left join `product_kits` on `product_kits`.`product_id` = `catalog_products`.`id`
left join (
SELECT MIN(new_price) AS minPrice, id FROM product_kits GROUP BY id
) AS tmpKits on `tmpKits`.`id` = `product_kits`.`id`
where `category_id` in ('62')
order by product_kits.new_price ASC
Result:
But when I add group by, I get this:
Code:
select `catalog_products`.id
, `catalog_products`.alias
, `tmpKits`.`minPrice`
from `catalog_products`
left join `product_kits` on `product_kits`.`product_id` = `catalog_products`.`id`
left join (
SELECT MIN(new_price) AS minPrice, id FROM product_kits GROUP BY id
) AS tmpKits on `tmpKits`.`id` = `product_kits`.`id`
where `category_id` in ('62')
group by `catalog_products`.`id`
order by product_kits.new_price ASC
Result:
And this is incorrect sorting!
Somehow when I group this results, I get id 280 before 281!
But I need to get:
281|1600.00
280|2340.00
So, grouping breaks existing ordering!
For one, when you apply the GROUP BY to only one column, there is no guarantee that the values in the other columns will be consistently correct. Unfortunately, MySQL allows this type of SELECT/GROUPing to happen other products don't. Two, the syntax of using an ORDER BY in a subquery while allowed in MySQL is not allowed in other database products including SQL Server. You should use a solution that will return the proper result each time it is executed.
So the query will be:
For one, when you apply the GROUP BY to only one column, there is no guarantee that the values in the other columns will be consistently correct. Unfortunately, MySQL allows this type of SELECT/GROUPing to happen other products don't. Two, the syntax of using an ORDER BY in a subquery while allowed in MySQL is not allowed in other database products including SQL Server. You should use a solution that will return the proper result each time it is executed.
So the query will be:
select CP.`id`, CP.`alias`, TK.`minPrice`
from catalog_products CP
left join `product_kits` PK on PK.`product_id` = CP.`id`
left join (
SELECT MIN(`new_price`) AS "minPrice", `id` FROM product_kits GROUP BY `id`
) AS TK on TK.`id` = PK.`id`
where CP.`category_id` IN ('62')
order by PK.`new_price` ASC
group by CP.`id`
The thing is that group by does not recognize order by in MySQL.
Actually, what I was doing is really bad practice.
In this case you should use distinct and by catalog_products.*
In my opinion, group by is really useful when you need group result of agregated functions.
Otherwise you should not use it to get unique values.

How to sort result by highest value, using 2 columns?

I have a lot of users with websites and I want to select all websites and sort them by visitor amount. The users can specify the visitor amount in 2 ways. Either they can input it manually as a string that is stored in fb.visitor in the query below.
The second way is that he user install a Javascript Tracking Code on their site that then adds entries to the table tracking_visits and the total amount of visits is count(tv.id) below.
I want to be able to sort this result in 2 ways.
1) I want to get the highest result on top and lowest at bottom, using both columns. Example the Result should be:
99'947 ( COUNT(tv.id) )
75'412 ( COUNT(tv.id) )
40'000 ( fb.visitors )
37'482 ( COUNT(tv.id) )
30'000 ( fb.visitors )
2) Second sort I would like to be able to get all COUNT(tv.id) on top, highest first, and then get fb.visitors with highest first below. Example:
99'947 ( COUNT(tv.id) )
75'412 ( COUNT(tv.id) )
37'482 ( COUNT(tv.id) )
40'000 ( fb.visitors )
30'000 ( fb.visitors )
My current Query looks like this:
SELECT cs.userid, fb.visitors, COUNT( tv.id )
FROM campaigns_signups cs
INNER JOIN fe_blogs fb ON cs.userid = fb.userid
INNER JOIN tracking_visits tv ON tv.blogid = cs.userid
WHERE tv.visitdate
BETWEEN "2013-09-04"
AND "2013-10-04"
AND cs.campaignid = "97"
AND cs.status < "4"
GROUP BY tv.blogid
ORDER BY COUNT( tv.id ) , fb.visitors DESC
Note that the Dates and Integers in the Query is just examples.
The problem with this query is that it only selects the result that has entries in tracking_visits. I want to select a result where I get BOTH bloggers who have visitor amount in tracking_visits AND blogs who have visitor amount in fb.visitors.
For your first task, you can use ORDER BY GREATEST(COUNT(tv.id), fb.visitors) DESC. Documentation on GREATEST. For your second, you will want to use UNION. Documentation on UNION.
If for your first task you want each site to yield two rows (one for the greatest of the two values and the other for the least), you can again achieve this using UNION.
You are looking for greatest
select greatest(ifnull(fb.visitors,0),count(tv_id)) from.... order by 1
select greatest(ifnull(fb.visitors,0),count(tv_id)) from....
order by
case when greatest(ifnull(fb.visitors,0),count(tv_id))=fb.bisitors then 2 else 1 end, greatest(ifnull(fb.visitors,0),count(tv_id))
the second order by case orders by source of value and then by value size
For the second option of selecting the COUNT(tv.id) first, I was able to accomplish this by the following query:
SELECT *, tv.tracked_visits
FROM campaigns_signups cs
INNER JOIN fe_blogs fb ON cs.userid = fb.userid
LEFT JOIN (
SELECT blogid, COUNT( id ) AS "tracked_visits"
FROM tracking_visits
WHERE visitdate
BETWEEN "2013-09-04"
AND "2013-10-04"
GROUP BY blogid
) AS tv ON tv.blogid = cs.userid
WHERE cs.campaignid = :campaignid
AND cs.status < :status
ORDER BY tv.tracked_visits DESC , fb.visitors DESC

How to use actual row count (COUNT(*)) in WHERE clause without writing the same query as subquery?

I have something like this:
SELECT id, fruit, pip
FROM plant
WHERE COUNT(*) = 2;
This weird query is self explanatory I guess. COUNT(*) here means the number of rows in plant table. My requirement is that I need to retrieve values from specified fields only if total number of rows in table = 2. This doesn't work but: invalid use of aggregate function COUNT.
I cannot do this:
SELECT COUNT(*) as cnt, id, fruit, pip
FROM plant
WHERE cnt = 2;
for one, it limits the number of rows outputted to 1, and two, it gives the same error: invalid use of aggregate function.
What I can do is instead:
SELECT id, fruit, pip
FROM plant
WHERE (
SELECT COUNT(*)
FROM plant
) = 2;
But then that subquery is the main query re-run. I'm presenting here a small example of the larger part of the problem, though I know an additional COUNT(*) subquery in the given example isn't that big an overhead.
Edit: I do not know why the question is downvoted. The COUNT(*) I'm trying to get is from a view (a temporary table) in the query which is a large query with 5 to 6 joins and additional where clauses. To re-run the query as a subquery to get the count is inefficient, and I can see the bottleneck as well.
Here is the actual query:
SELECT U.UserName, E.Title, AE.Mode, AE.AttemptNo,
IF(AE.Completed = 1, 'Completed', 'Incomplete'),
(
SELECT COUNT(DISTINCT(FK_QId))
FROM attempt_question AS AQ
WHERE FK_ExcAttemptId = #excAttemptId
) AS Inst_Count,
(
SELECT COUNT(DISTINCT(AQ.FK_QId))
FROM attempt_question AS AQ
JOIN `question` AS Q
ON Q.PK_Id = AQ.FK_QId
LEFT JOIN actions AS A
ON A.FK_QId = AQ.FK_QId
WHERE AQ.FK_ExcAttemptId = #excAttemptId
AND (
Q.Type = #descQtn
OR Q.Type = #actQtn
AND A.type = 'CTVI.NotImplemented'
AND A.IsDelete = #status
AND (
SELECT COUNT(*)
FROM actions
WHERE FK_QId = A.FK_QId
AND type != 'CTVI.NotImplemented'
AND IsDelete = #status
) = 0
)
) AS NotEvalInst_Count,
(
SELECT COUNT(DISTINCT(FK_QId))
FROM attempt_question AS AQ
WHERE FK_ExcAttemptId = #excAttemptId
AND Mark = #mark
) AS CorrectAns_Count,
E.AllottedTime, AE.TimeTaken
FROM attempt_exercise AS AE
JOIN ctvi_exercise_tblexercise AS E
ON AE.FK_EId = E.PK_EId
JOIN ctvi_user_table AS U
ON AE.FK_UId = U.PK_Id
JOIN ctvi_grade AS G
ON AE.FK_GId = G.PK_GId
WHERE AE.PK_Id = #excAttemptId
-- AND COUNT(AE.*) = #number --the portion in contention.
Kindly ignore the above query and guide me to right direction from the small example query I posted, thanks.
In MySQL, you can only do what you tried:
SELECT id, fruit, pip
FROM plant
WHERE (
SELECT COUNT(*)
FROM plant
) = 2;
or this variation:
SELECT id, fruit, pip
FROM plant
JOIN
(
SELECT COUNT(*) AS cnt
FROM plant
) AS c
ON c.cnt = 2;
Whether the 1st or the 2nd is more efficient, depends on the version of MySQL (and the optimizer). I would bet on the 2nd one, on most versions.
In other DBMSs, that have window functions, you can also do the first query that #Andomar suggests.
Here is a suggestion to avoid the bottleneck of calculating the derived table twice, once to get the rows and once more to get the count. If the derived table is expensive to be calculated, and its rows are thousands or millions, calculating them twice only to throw them away, is a problem, indeed. This may improve efficiency as it will limit the intermediately (twice) calculated rows to 3:
SELECT p.*
FROM
( SELECT id, fruit, pip
FROM plant
LIMIT 3
) AS p
JOIN
( SELECT COUNT(*) AS cnt
FROM
( SELECT 1
FROM plant
LIMIT 3
) AS tmp
) AS c
ON c.cnt = 2 ;
After re-reading your question, you're trying to return rows only if there are 2 rows in the entire table. In that case I think your own example query is already the best.
On another DBMS, you could use a Windowing function:
select *
from (
select *
, count(*) over () as cnt
from plant
) as SubQueryAlias
where cnt = 2
But the over clause is not supported on MySQL.
old wrong anser below
The where clause works before grouping. It works on single rows, not groups of rows, so you can't use aggregates like count or max in the where clause.
To set filters that work on groups of rows, use the having clause. It works after grouping and can be used to filter with aggregates:
SELECT id, fruit, pip
FROM plant
GROUP BY
id, fruit, pip
HAVING COUNT(*) = 2;
The other answers do not fulfill the original question which was to filter the results "without using a subquery".
You can actually do this by using a variable in 2 consecutive MySQL statements:
SET #count=0;
SELECT * FROM
(
SELECT id, fruit, pip, #count:=#count+1 AS count
FROM plant
WHERE
) tmp
WHERE #count = 2;

How to limiting subquery requests to one?

I was thinking a way to using one query with a subquery instead of using two seperate queries.
But turns out using a subquery is causing multiple requests for each row in result set. Is there a way to limit that count subquery result only one with in a combined query ?
SELECT `ad_general`.`id`,
( SELECT count(`ad_general`.`id`) AS count
FROM (`ad_general`)
WHERE `city` = 708 ) AS count,
FROM (`ad_general`)
WHERE `ad_general`.`city` = '708'
ORDER BY `ad_general`.`id` DESC
LIMIT 15
May be using a join can solve the problem but dunno how ?
SELECT ad_general.id, stats.cnt
FROM ad_general
JOIN (
SELECT count(*) as cnt
FROM ad_general
WHERE city = 708
) AS stats
WHERE ad_general.city = 708
ORDER BY ad_general.id DESC
LIMIT 15;
The explicit table names aren't required, but are used both for clarity and maintainability (the explicit table names will prevent any imbiguities should the schema for ad_general or the generated table ever change).
You can self-join (join the table to itself table) and apply aggregate function to the second.
SELECT `adgen`.`id`, COUNT(`adgen_count`.`id`) AS `count`
FROM `ad_general` AS `adgen`
JOIN `ad_general` AS `adgen_count` ON `adgen_count`.city = 708
WHERE `adgen`.`city` = 708
GROUP BY `adgen`.`id`
ORDER BY `adgen`.`id` DESC
LIMIT 15
However, it's impossible to say what the appropriate grouping is without knowing the structure of the table.