SQL query: HAVING number = MAX(number) doesn't work - mysql

I have two tables, Writer and Book. A writer can produce many books. I want to get all the writers who have produced the maximal number of books.
First, my SQL query looks like this:
SELECT Name FROM(
SELECT Writer.Name,COUNT(Book.ID) AS NUMBER FROM Writer,Book
WHERE
Writer.ID=Book.ID
GROUP BY Writer.Name
)
WHERE NUMBER=(SELECT MAX(NUMBER) FROM
(SELECT Writer.Name,COUNT(Book.ID) AS NUMBER FROM Writer,Book
WHERE Writer.ID=Book.ID
GROUP BY Writer.Name
))
It works. However, I think this query is too long and contains duplication. I want to make it shorter, so I tried another query:
SELECT Name FROM(
SELECT Writer.Name,COUNT(Book.ID) AS NUMBER FROM Writer,Book
WHERE
Writer.ID=Book.ID
GROUP BY Writer.Name
HAVING NUMBER = MAX(NUMBER)
)
However, this HAVING clause doesn't work, and SQLite reports an error.
I don't know why. Can anyone explain it to me? Thank you!

The HAVING clause filters the final set (typically after a GROUP BY) and does not provide additional grouping functionality. Think of it as a WHERE clause that is applied after the GROUP BY.
Your query with the HAVING NUMBER = MAX(NUMBER) implies grouping of the set of NUMBER values across all records and doesn't make sense in this example (even though we all get what you want it to do).
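To see HAVING acting as a plain filter on finished groups, here is a minimal sketch using Python's built-in sqlite3 module. The schema is an assumption made for illustration only: in particular, the WriterID column on Book is hypothetical and is not taken from the question, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Writer (ID INTEGER PRIMARY KEY, Name TEXT);
    -- WriterID is a hypothetical foreign key, assumed for this sketch
    CREATE TABLE Book (ID INTEGER PRIMARY KEY, WriterID INTEGER);
    INSERT INTO Writer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO Book VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 3);
""")

# HAVING is evaluated once per group, after GROUP BY has built the groups.
# It compares each group's COUNT with a constant; it does not regroup anything.
rows = conn.execute("""
    SELECT w.Name, COUNT(b.ID) AS n
    FROM Writer w JOIN Book b ON b.WriterID = w.ID
    GROUP BY w.Name
    HAVING COUNT(b.ID) >= 2
    ORDER BY w.Name
""").fetchall()
print(rows)
```

The value on the right-hand side of HAVING has to be something that is known per group (or a constant, or a scalar subquery), which is why a "maximum over all groups" cannot be computed there directly.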

Each query gives you one level of aggregation, so you cannot apply MAX to COUNT in the same query. You need a subquery, as in your first query.
However, your first query can be simplified in MySQL to:
SELECT Writer.Name
FROM Writer, Book
WHERE Writer.ID = Book.ID
GROUP BY Writer.Name
HAVING COUNT(Book.ID) = (SELECT COUNT(Book.ID) AS n
FROM Writer, Book
WHERE Writer.ID = Book.ID
GROUP BY Writer.Name
ORDER BY n DESC
LIMIT 1)
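As a rough check, the same shape of query also runs in SQLite. The sketch below uses Python's sqlite3 module; the WriterID column and the sample rows are assumptions for illustration and do not come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Writer (ID INTEGER PRIMARY KEY, Name TEXT);
    -- WriterID is a hypothetical foreign key, assumed for this sketch
    CREATE TABLE Book (ID INTEGER PRIMARY KEY, WriterID INTEGER);
    INSERT INTO Writer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO Book VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 3);
""")

# The scalar subquery returns the largest per-writer count (ORDER BY ... LIMIT 1);
# HAVING then keeps every writer whose own count equals it, ties included.
top_writers = [name for (name,) in conn.execute("""
    SELECT w.Name
    FROM Writer w JOIN Book b ON b.WriterID = w.ID
    GROUP BY w.Name
    HAVING COUNT(b.ID) = (SELECT COUNT(b2.ID) AS n
                          FROM Writer w2 JOIN Book b2 ON b2.WriterID = w2.ID
                          GROUP BY w2.Name
                          ORDER BY n DESC
                          LIMIT 1)
    ORDER BY w.Name
""")]
print(top_writers)
```

With this made-up data, Ann and Bob are tied at two books each, so both are returned.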

In MySQL (but not SQLite), you can use variables to reduce the amount of work and make a simpler query. However, there are nuances there, because variables with group by require an extra level of subqueries:
SELECT name
FROM (SELECT t.*, (@m := if(@m = 0, NUMBER, @m)) as maxn
FROM (SELECT w.Name, COUNT(b.ID) AS NUMBER
FROM Writer w JOIN
Book b
ON w.ID = b.ID
GROUP BY w.Name
) t CROSS JOIN
(SELECT @m := 0) params
ORDER BY NUMBER desc
) t
WHERE maxn = number;

It looks like you are nesting aggregate functions, which is not allowed.
HAVING NUMBER = MAX(NUMBER) is like HAVING COUNT(Book.ID) = MAX(COUNT(Book.ID)).
Nesting COUNT inside MAX seems to be the issue here.
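A small sketch of this in SQLite (via Python's sqlite3 module): the nested form is rejected, while computing the counts once and taking the maximum in a second step works. The second query uses a common table expression, which is my own substitution to avoid writing the grouped query twice, not something proposed in the answers here. The WriterID column and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Writer (ID INTEGER PRIMARY KEY, Name TEXT);
    -- WriterID is a hypothetical foreign key, assumed for this sketch
    CREATE TABLE Book (ID INTEGER PRIMARY KEY, WriterID INTEGER);
    INSERT INTO Writer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO Book VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 3);
""")

# Nested aggregate: COUNT inside MAX within a single level of grouping.
nested = """
    SELECT w.Name
    FROM Writer w JOIN Book b ON b.WriterID = w.ID
    GROUP BY w.Name
    HAVING COUNT(b.ID) = MAX(COUNT(b.ID))
"""
try:
    conn.execute(nested)
    nested_error = None
except sqlite3.OperationalError as exc:
    nested_error = str(exc)
print("nested aggregate error:", nested_error)

# Two levels of aggregation: the CTE counts per writer, the outer query takes MAX.
cte = """
    WITH counts AS (
        SELECT w.Name, COUNT(b.ID) AS n
        FROM Writer w JOIN Book b ON b.WriterID = w.ID
        GROUP BY w.Name
    )
    SELECT Name FROM counts
    WHERE n = (SELECT MAX(n) FROM counts)
    ORDER BY Name
"""
top_writers = [name for (name,) in conn.execute(cte)]
print(top_writers)
```

The CTE is still referenced twice, but the grouped query is only written once, which addresses the duplication the question complains about.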

Related

How to show last data (max data) of each group by on mysql

I have query like below:
SELECT kd.id_karir, kd.nama, kd.kelamin,
(YEAR(NOW())-YEAR(tanggal)) usia, MAX(pf.jenis), pf.jenis,
pf.nama AS pendidikan, pf.jurusan, kd.alamat, kd.telepon,
kd.handphone, kd.email, kd.tempat AS tempat_lahir,
kd.tanggal AS tanggal_lahir
FROM keadaan_diri AS kd
LEFT OUTER JOIN pendidikan_formal AS pf ON (kd.id_karir = pf.id_karir)
WHERE kd.id_karir = 'P1409047'
GROUP BY kd.id_karir
ORDER BY kd.nama ASC, pf.jenis DESC
I want to return the last row from the table pendidikan_formal using MAX and GROUP BY, but the query doesn't work.
First of all, you can (or, depending on the MySQL configuration, must) select and order by only columns that are part of your GROUP BY clause. For all other columns, you have to specify an aggregate function. For example, say you have two records of humans with the same name but different ages. When you group by name, you have to choose one of the two age values (max, min, average, ...). If you don't care which, you could turn off the ONLY_FULL_GROUP_BY SQL mode. I wouldn't suggest that, however.
In order to get the one record with some maximum value however, group by is not the right approach. Take a look at these examples:
Subselect:
SELECT name, age, ...
FROM humans
WHERE age=(SELECT MAX(age) FROM humans);
Order by and limit:
SELECT name, age, ...
FROM humans
ORDER BY age DESC
LIMIT 1;
Left join:
SELECT name, age, ...
FROM humans h1
LEFT JOIN humans h2 ON h1.age < h2.age
WHERE h2.age IS NULL;
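The three approaches differ when several rows share the maximum. A minimal sketch with Python's sqlite3 module, using a made-up humans table where two people are tied for the highest age:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE humans (name TEXT, age INTEGER);
    -- Sample data is invented; Bob and Cid are tied for the maximum age
    INSERT INTO humans VALUES ('Ann', 30), ('Bob', 42), ('Cid', 42);
""")

def names(sql):
    """Run a query that returns a single name column and sort the result."""
    return sorted(name for (name,) in conn.execute(sql))

# Subselect: keeps every row whose age equals the overall maximum.
subselect = names(
    "SELECT name FROM humans WHERE age = (SELECT MAX(age) FROM humans)")

# ORDER BY + LIMIT 1: always exactly one row, even when there is a tie.
limit_one = names("SELECT name FROM humans ORDER BY age DESC LIMIT 1")

# Left join: keeps rows for which no other row has a strictly greater age.
left_join = names("""
    SELECT h1.name
    FROM humans h1
    LEFT JOIN humans h2 ON h1.age < h2.age
    WHERE h2.age IS NULL
""")

print(subselect, limit_one, left_join)
```

The subselect and left join variants return both tied rows, while ORDER BY ... LIMIT 1 picks one of them arbitrarily.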
Now if you want all maximum rows per group, check one of these answers with tag greatest-n-per-group.
You can use a correlated subquery. Your question is a bit vague; I assume that id_karir is the group and tanggal is the date.
If I understand correctly, this would apply to your query as:
SELECT kd.id_karir, kd.nama, kd.kelamin,
(YEAR(NOW())-YEAR(tanggal)) usia, pf.jenis, pf.jenis,
pf.nama AS pendidikan, pf.jurusan, kd.alamat, kd.telepon,
kd.handphone, kd.email, kd.tempat AS tempat_lahir,
kd.tanggal AS tanggal_lahir
FROM keadaan_diri kd LEFT OUTER JOIN
pendidikan_formal pf
ON kd.id_karir = pf.id_karir AND
pf.tanggal = (SELECT MAX(pf2.tanggal) FROM pendidikan_formal pf2 WHERE pf2.id_karir = pf.id_karir)
This is not an aggregation query. This is a filtering query.
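The same pattern on a reduced example may make the idea easier to follow. This is a sketch with Python's sqlite3 module; the person and education tables and their columns are hypothetical stand-ins, not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for keadaan_diri and pendidikan_formal
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE education (person_id INTEGER, level TEXT, finished TEXT);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO education VALUES
        (1, 'BSc', '2010-06-01'),
        (1, 'MSc', '2012-06-01');
""")

# The correlated subquery picks, per person, the latest "finished" date;
# the LEFT JOIN keeps people who have no education rows at all.
rows = conn.execute("""
    SELECT p.name, e.level
    FROM person p
    LEFT JOIN education e
           ON e.person_id = p.id
          AND e.finished = (SELECT MAX(e2.finished)
                            FROM education e2
                            WHERE e2.person_id = e.person_id)
    ORDER BY p.id
""").fetchall()
print(rows)
```

No GROUP BY is needed in the outer query: each person row is simply matched against the one education row that passes the filter.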

SQL query needs optimization

SELECT LM.user_id,LM.users_lineup_id, min( LM.total_score ) AS total_score
FROM vi_lineup_master LM JOIN
vi_contest AS C
ON C.contest_unique_id = LM.contest_unique_id join
(SELECT min( total_score ) as total_score
FROM vi_lineup_master
GROUP BY group_unique_id
) as preq
ON LM.total_score = preq.total_score
WHERE LM.contest_unique_id = 'iledhSBDO' AND
C.league_contest_type = 1
GROUP BY group_unique_id
The above query finds the loser in each game group. It returns accurate results, but it does not respond on large data sets. How can I optimize it?
You can try moving your JOINs into subqueries. Also, pay attention to the "wrong" GROUP BY usage in the outer query. In MySQL you can group by some columns and select others that are not in the GROUP BY clause without any aggregate function, but the database can't guarantee which values it will return. For the sake of consistency in your application, wrap them in an aggregate function.
Check if this one helps:
SELECT
MIN(LM.user_id) AS user_id,
MIN(LM.users_lineup_id) AS users_lineup_id,
MIN(LM.total_score) AS total_score
FROM vi_lineup_master LM
WHERE 1=1
-- check if this "contest_unique_id" is equals
-- to 'iledhSBDO' for a "league_contest_type" valued 1
AND LM.contest_unique_id IN
(
SELECT C.contest_unique_id
FROM vi_contest AS C
WHERE 1=1
AND C.contest_unique_id = 'iledhSBDO'
AND C.league_contest_type = 1
)
-- check if this "total_score" is one of the
-- "min(total_score)" from each "group_unique_id"
AND LM.total_score IN
(
SELECT MIN(total_score)
FROM vi_lineup_master
GROUP BY group_unique_id
)
GROUP BY LM.group_unique_id
;
Also, some pieces of this query may seem redundant, but it's because I did not want to change the filters you wrote, just moved them.
Also, your query logic seems a bit strange to me, based on the tables/columns names and how you wrote it... please, check the comments in my query which reflects what I understood of your implementation.
Hope it helps.

Highscores on multiple columns, efficient query, right approach

Let's say we've got a high scores table with columns app_id, best_score, best_time, most_drops, longest_something and a couple more.
I'd like to collect the top three results IN EACH CATEGORY, grouped by app_id.
For now I'm using separate rank queries on each category in a loop:
SELECT app_id, best_something1,
FIND_IN_SET( best_something1,
(SELECT GROUP_CONCAT( best_something1
ORDER BY best_something1 DESC)
FROM highscores )) AS rank
FROM highscores
ORDER BY best_something1 DESC LIMIT 3;
Two things worth adding:
All columns for a specific app are updated at the same time (I can consider creating a helper table).
The result of the prospective "turbo query" might be requested quite often, as often as the values are updated.
I'm quite basic with SQL and suspect that it has many more commands that combined together could do the magic?
What I'd expect from this post is that some wise owl would at least point the direction where to go or how to go.
The sample table:
http://sqlfiddle.com/#!2/eef053/1
Here is sample result too (already in json format, sry):
{"total_blocks":[["13","174","1"],["9","153","2"],["10","26","3"]],"total_games":[["13","15","1"],["9","12","2"],["10","2","3"]],"total_score":[["13","410","1"],["9","332","2"],["11","88","3"]],"aver_pps":[["11","4.34011","1"],["13","2.64521","2"],["12","2.60623","3"]],"aver_drop_per_game":[["11","20","1"],["10","13","2"],["9","12.75","3"]],"aver_drop_val":[["11","4.4","1"],["13","2.35632","2"],["9","2.16993","3"]],"aver_score":[["11","88","1"],["9","27.6667","2"],["13","27.3333","3"]],"best_pps":[["13","4.9527","1"],["11","4.34011","2"],["9","4.13076","3"]],"most_drops":[["11","20","1"],["9","16","2"],["13","16","2"]],"longest_drop":[["9","3","1"],["13","2","2"],["11","2","2"]],"best_drop":[["11","42","1"],["13","36","2"],["9","30","3"]],"best_score":[["11","88","1"],["13","78","2"],["9","58","3"]]}
When I encounter this scenario, I prefer to employ the UNION clause, and combine the queries tailored to each ORDERing and LIMIT.
http://dev.mysql.com/doc/refman/5.1/en/union.html
UNION combines the result rows vertically (top 3 rows for 5 sort categories yields 15 rows).
For your specific purpose, you might then pivot them as sub-SELECTs, rolling them up with GROUP_CONCAT GROUPed on user so that each has the delimited list.
I'd test something like this query, to see if the performance is any better or not. I think this comes pretty close to satisfying the specification:
( SELECT 99 AS seq_
, a.category
, CONVERT(a.val,DOUBLE) AS val
, FIND_IN_SET(a.val,r.highest_vals) AS rank
, a.user_id
FROM ( SELECT 'total_blocks' AS category
, b.`total_blocks` AS val
, b.user_id
FROM app b
ORDER BY b.`total_blocks` DESC
LIMIT 3
) a
CROSS
JOIN ( SELECT GROUP_CONCAT(s.val ORDER BY s.val DESC) AS highest_vals
FROM ( SELECT t.`total_blocks` AS val
FROM app t
ORDER BY t.`total_blocks` DESC
LIMIT 3
) s
) r
ORDER BY a.val DESC
)
UNION ALL
( SELECT 97 AS seq_
, a.category
, CONVERT(a.val,DOUBLE) AS val
, FIND_IN_SET(a.val,r.highest_vals) AS rank
, a.user_id
FROM ( SELECT 'XXX' AS category
, b.`XXX` AS val
, b.user_id
FROM app b
ORDER BY b.`XXX` DESC
LIMIT 3
) a
CROSS
JOIN ( SELECT GROUP_CONCAT(s.val ORDER BY s.val DESC) AS highest_vals
FROM ( SELECT t.`XXX` AS val
FROM app t
ORDER BY t.`XXX` DESC
LIMIT 3
) s
) r
ORDER BY a.val DESC
)
ORDER BY seq_ DESC, val DESC
To unpack this a little bit... this is essentially separate queries that are combined with UNION ALL set operator.
Each of the queries returns a literal value to allow for ordering. (In this case, I've given the column a rather anonymous name, seq_ (sequence). If the specific order isn't important, then this could be removed.)
Each query is also returning a literal value that tells which "category" the row is for.
Because some of the values returned are INTEGER, and others are FLOAT, I'd cast all of those values to floating point, so the datatypes of each query line up.
For the FLOAT (floating point) type values, there can be a problem with comparison. So I'd go with casting those to decimal and stringing them together into a list using GROUP_CONCAT (as the original query does).
Since we are returning only three rows from each query, we only need to concatenate together the three largest values. (If there's a two way "tie" for first place, we'll return rank values of 1, 1, 3.)
Suitable indexes for each query will improve performance for large sets.
... ON app (total_blocks, user_id)
... ON app (best_pps,user_id)
... ON app (XXX,user_id)
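A stripped-down sketch of the UNION ALL idea, without the rank column, using Python's sqlite3 module. The app table and its values are made up. Note that SQLite does not accept parenthesized SELECTs in a compound query the way MySQL does, so each ORDER BY ... LIMIT 3 branch is wrapped in a derived table here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample data with two score categories
    CREATE TABLE app (user_id INTEGER, total_blocks INTEGER, total_score INTEGER);
    INSERT INTO app VALUES (1, 50, 7), (2, 40, 9), (3, 30, 8), (4, 20, 6);
""")

# One branch per category; each branch returns its own top three rows,
# tagged with a literal category name so the combined rows can be told apart.
rows = conn.execute("""
    SELECT category, user_id, val
    FROM (SELECT 'total_blocks' AS category, user_id, total_blocks AS val
          FROM app ORDER BY total_blocks DESC LIMIT 3)
    UNION ALL
    SELECT category, user_id, val
    FROM (SELECT 'total_score' AS category, user_id, total_score AS val
          FROM app ORDER BY total_score DESC LIMIT 3)
    ORDER BY category, val DESC
""").fetchall()

for row in rows:
    print(row)
```

Two categories with three rows each give six rows in total; adding a category means adding one more branch.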

Alternative to mysql WHERE IN SELECT GROUP BY when wanting max value in group by

I have the following query, which was developed from a hint found online because of a problem with a GROUP BY returning the maximum value; but it's running really slowly.
Having looked online I'm seeing that WHERE IN (SELECT.... GROUP BY) is probably the issue, but, to be honest, I'm struggling to find a way around this:
SELECT *
FROM tbl_berths a
JOIN tbl_active_trains b on a.train_uid=b.train_uid
WHERE (a.train_id, a.TimeStamp) in (
SELECT a.train_id, max(a.TimeStamp)
FROM tbl_berths a
GROUP BY a.train_id
)
I'm thinking I possibly need a derived table, but my experience in this area is zero and it's just not working out!
You can move that into a derived table (a subquery in the FROM clause) and also select only the required columns instead of all (*):
SELECT a.train_uid
FROM tbl_berths a
JOIN tbl_active_trains b on a.train_uid=b.train_uid
JOIN (SELECT a.train_id, max(a.TimeStamp) as TimeStamp
FROM tbl_berths a
GROUP BY a.train_id )T
on a.train_id = T.train_id
and a.TimeStamp = T.TimeStamp
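Here is a minimal sketch of the derived-table join on a cut-down table, using Python's sqlite3 module. The berths table, its columns and its rows are hypothetical; the join to tbl_active_trains is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified version of tbl_berths
    CREATE TABLE berths (train_id TEXT, ts INTEGER, berth TEXT);
    INSERT INTO berths VALUES ('T1', 1, 'A'), ('T1', 2, 'B'), ('T2', 5, 'C');
""")

# The derived table t holds one row per train with its latest timestamp;
# joining back on (train_id, ts) keeps only the matching berth rows.
latest = conn.execute("""
    SELECT a.train_id, a.berth
    FROM berths a
    JOIN (SELECT train_id, MAX(ts) AS ts
          FROM berths
          GROUP BY train_id) t
      ON a.train_id = t.train_id AND a.ts = t.ts
    ORDER BY a.train_id
""").fetchall()
print(latest)
```

On a real table, an index covering (train_id, TimeStamp) would likely help both the grouped subquery and the join back, though that is worth confirming with EXPLAIN.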

How to use actual row count (COUNT(*)) in WHERE clause without writing the same query as subquery?

I have something like this:
SELECT id, fruit, pip
FROM plant
WHERE COUNT(*) = 2;
This weird query is self-explanatory, I guess. COUNT(*) here means the number of rows in the plant table. My requirement is to retrieve values from the specified fields only if the total number of rows in the table is 2. This doesn't work, though; it fails with: invalid use of aggregate function COUNT.
I cannot do this:
SELECT COUNT(*) as cnt, id, fruit, pip
FROM plant
WHERE cnt = 2;
for one, it limits the number of rows outputted to 1, and two, it gives the same error: invalid use of aggregate function.
What I can do is instead:
SELECT id, fruit, pip
FROM plant
WHERE (
SELECT COUNT(*)
FROM plant
) = 2;
But then that subquery is the main query re-run. I'm presenting here a small example of the larger part of the problem, though I know an additional COUNT(*) subquery in the given example isn't that big an overhead.
Edit: I do not know why the question is downvoted. The COUNT(*) I'm trying to get is from a view (a temporary table) in the query which is a large query with 5 to 6 joins and additional where clauses. To re-run the query as a subquery to get the count is inefficient, and I can see the bottleneck as well.
Here is the actual query:
SELECT U.UserName, E.Title, AE.Mode, AE.AttemptNo,
IF(AE.Completed = 1, 'Completed', 'Incomplete'),
(
SELECT COUNT(DISTINCT(FK_QId))
FROM attempt_question AS AQ
WHERE FK_ExcAttemptId = @excAttemptId
) AS Inst_Count,
(
SELECT COUNT(DISTINCT(AQ.FK_QId))
FROM attempt_question AS AQ
JOIN `question` AS Q
ON Q.PK_Id = AQ.FK_QId
LEFT JOIN actions AS A
ON A.FK_QId = AQ.FK_QId
WHERE AQ.FK_ExcAttemptId = @excAttemptId
AND (
Q.Type = @descQtn
OR Q.Type = @actQtn
AND A.type = 'CTVI.NotImplemented'
AND A.IsDelete = @status
AND (
SELECT COUNT(*)
FROM actions
WHERE FK_QId = A.FK_QId
AND type != 'CTVI.NotImplemented'
AND IsDelete = @status
) = 0
)
) AS NotEvalInst_Count,
(
SELECT COUNT(DISTINCT(FK_QId))
FROM attempt_question AS AQ
WHERE FK_ExcAttemptId = @excAttemptId
AND Mark = @mark
) AS CorrectAns_Count,
E.AllottedTime, AE.TimeTaken
FROM attempt_exercise AS AE
JOIN ctvi_exercise_tblexercise AS E
ON AE.FK_EId = E.PK_EId
JOIN ctvi_user_table AS U
ON AE.FK_UId = U.PK_Id
JOIN ctvi_grade AS G
ON AE.FK_GId = G.PK_GId
WHERE AE.PK_Id = @excAttemptId
-- AND COUNT(AE.*) = @number --the portion in contention.
Kindly ignore the above query and guide me to right direction from the small example query I posted, thanks.
In MySQL, you can only do what you tried:
SELECT id, fruit, pip
FROM plant
WHERE (
SELECT COUNT(*)
FROM plant
) = 2;
or this variation:
SELECT id, fruit, pip
FROM plant
JOIN
(
SELECT COUNT(*) AS cnt
FROM plant
) AS c
ON c.cnt = 2;
Whether the 1st or the 2nd is more efficient, depends on the version of MySQL (and the optimizer). I would bet on the 2nd one, on most versions.
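To illustrate how the join variant behaves, here is a small sketch with Python's sqlite3 module (SQLite rather than MySQL, so it says nothing about which form MySQL optimizes better). The plant rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plant (id INTEGER, fruit TEXT, pip TEXT);
    INSERT INTO plant VALUES (1, 'apple', 'brown'), (2, 'pear', 'black');
""")

# The derived table c always has exactly one row (the total row count).
# The join condition only looks at c.cnt, so either every plant row
# is returned (count is 2) or none is.
query = """
    SELECT p.id, p.fruit, p.pip
    FROM plant p
    JOIN (SELECT COUNT(*) AS cnt FROM plant) c ON c.cnt = 2
    ORDER BY p.id
"""

with_two = conn.execute(query).fetchall()

conn.execute("INSERT INTO plant VALUES (3, 'plum', 'tan')")
with_three = conn.execute(query).fetchall()

print(with_two)
print(with_three)
```

With two rows in the table both come back; after a third row is inserted the same query returns nothing.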
In other DBMSs that have window functions, you can also do the first query that @Andomar suggests.
Here is a suggestion to avoid the bottleneck of calculating the derived table twice, once to get the rows and once more to get the count. If the derived table is expensive to be calculated, and its rows are thousands or millions, calculating them twice only to throw them away, is a problem, indeed. This may improve efficiency as it will limit the intermediately (twice) calculated rows to 3:
SELECT p.*
FROM
( SELECT id, fruit, pip
FROM plant
LIMIT 3
) AS p
JOIN
( SELECT COUNT(*) AS cnt
FROM
( SELECT 1
FROM plant
LIMIT 3
) AS tmp
) AS c
ON c.cnt = 2 ;
After re-reading your question, you're trying to return rows only if there are 2 rows in the entire table. In that case I think your own example query is already the best.
On another DBMS, you could use a Windowing function:
select *
from (
select *
, count(*) over () as cnt
from plant
) as SubQueryAlias
where cnt = 2
But the OVER clause is not supported on MySQL before version 8.0.
Old, wrong answer below:
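For reference, the windowed form can be tried in SQLite, which has supported window functions since version 3.25. The sketch below uses Python's sqlite3 module and skips the query when the bundled SQLite library is older; the plant rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plant (id INTEGER, fruit TEXT, pip TEXT);
    INSERT INTO plant VALUES (1, 'apple', 'brown'), (2, 'pear', 'black');
""")

rows = []
# Window functions need SQLite 3.25 or newer; otherwise leave rows empty.
if sqlite3.sqlite_version_info >= (3, 25, 0):
    # COUNT(*) OVER () attaches the total row count to every row in one pass,
    # so the outer WHERE can filter on it without a second scan being written out.
    rows = conn.execute("""
        SELECT id, fruit, pip
        FROM (SELECT *, COUNT(*) OVER () AS cnt FROM plant) AS sub
        WHERE cnt = 2
        ORDER BY id
    """).fetchall()
print(rows)
```

Whether this is actually cheaper than the scalar subquery depends on the engine and on how expensive the underlying view is.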
The where clause works before grouping. It works on single rows, not groups of rows, so you can't use aggregates like count or max in the where clause.
To set filters that work on groups of rows, use the having clause. It works after grouping and can be used to filter with aggregates:
SELECT id, fruit, pip
FROM plant
GROUP BY
id, fruit, pip
HAVING COUNT(*) = 2;
The other answers do not fulfill the original question which was to filter the results "without using a subquery".
You can actually do this by using a variable in 2 consecutive MySQL statements:
SET @count=0;
SELECT * FROM
(
SELECT id, fruit, pip, @count:=@count+1 AS count
FROM plant
-- add your WHERE conditions here, if any
) tmp
WHERE @count = 2;