Using LIMIT in a subquery based on another field in MySQL - mysql

Is it possible to use LIMIT based on another column inside a subquery in MySQL? Here is a working query of what I mean.
SELECT id, name,
(SELECT AVG(value) FROM t2 WHERE t1id = t1.id ORDER BY value DESC LIMIT 4) as average
FROM t1
However, I'd like to replace the "4" with a field in t1.
Something like this where table t1 has fields id, name, size:
SELECT id, name,
(SELECT AVG(value) FROM t2 WHERE t1id = t1.id ORDER BY value DESC LIMIT t1.size) as average
FROM t1
I could join t1 and t2, but I'm not sure that works for this. Does it?
Edit:
Here's some sample data to show what I mean:
Table t1
| id | name | Size |
|----|------|------|
| 1 | Bob | 4 |
| 2 | Joe | 3 |
| 3 | Sam | 4 |
Table t2
| t1id | value |
|------|-------|
| 1 | 16 |
| 1 | 14 |
| 1 | 12 |
| 1 | 10 |
| 1 | 8 |
| 2 | 10 |
| 2 | 8 |
| 2 | 6 |
| 2 | 4 |
| 3 | 20 |
| 3 | 15 |
| 3 | 10 |
| 3 | 5 |
| 3 | 2 |
Expected result:
| id | name | avg |
|----|------|------|
| 1 | Bob | 13 |
| 2 | Joe | 8 |
| 3 | Sam | 12.5 |
Notice that the average is the average of only the top t1.size values. For example the average for Bob is 13 and not 12 (based on 4 values and not 5) and the average for Joe is 8 and not 7 (based on 3 values and not 4).

In MySQL, you have little choice other than a LEFT JOIN and aggregation (ROW_NUMBER() requires MySQL 8.0 or later):
SELECT t1.id, t1.name, AVG(t2.value) AS average
FROM t1 LEFT JOIN
     (SELECT t2.*,
             ROW_NUMBER() OVER (PARTITION BY t1id ORDER BY value DESC) AS seqnum
      FROM t2
     ) t2
     ON t2.t1id = t1.id AND t2.seqnum <= t1.size
GROUP BY t1.id, t1.name;
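To check the row-numbering join against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database. SQLite is only a stand-in for MySQL 8.0+ here (an assumption; it happens to accept the same window-function syntax), and the alias `r` and the variable names are illustrative.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL 8.0+ (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, name TEXT, size INTEGER);
    CREATE TABLE t2 (t1id INTEGER, value INTEGER);
    INSERT INTO t1 VALUES (1, 'Bob', 4), (2, 'Joe', 3), (3, 'Sam', 4);
    INSERT INTO t2 VALUES
        (1, 16), (1, 14), (1, 12), (1, 10), (1, 8),
        (2, 10), (2, 8), (2, 6), (2, 4),
        (3, 20), (3, 15), (3, 10), (3, 5), (3, 2);
""")

# Number each t2 row within its t1id group (highest value first), then keep
# only the rows whose number does not exceed the owning t1.size.
top_n_averages = conn.execute("""
    SELECT t1.id, t1.name, AVG(r.value) AS average
    FROM t1
    LEFT JOIN (
        SELECT t1id, value,
               ROW_NUMBER() OVER (PARTITION BY t1id ORDER BY value DESC) AS seqnum
        FROM t2
    ) AS r
      ON r.t1id = t1.id AND r.seqnum <= t1.size
    GROUP BY t1.id, t1.name
    ORDER BY t1.id
""").fetchall()

for row in top_n_averages:
    print(row)
```

The per-row limit lives in the join condition (`seqnum <= t1.size`), which is why no LIMIT clause is needed at all.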

No, you cannot use a column reference in a LIMIT clause.
https://dev.mysql.com/doc/refman/8.0/en/select.html has detailed documentation about MySQL's SELECT statement including all its clauses.
It says:
The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants, with these exceptions:
Within prepared statements, LIMIT parameters can be specified using ? placeholder markers.
Within stored programs, LIMIT parameters can be specified using integer-valued routine parameters or local variables.
Expressions, including subqueries, are not listed as legal arguments to the LIMIT clause.
A simple solution is to do the task in two queries: the first gets the size, and the second uses that value as a constant in its LIMIT clause.
Not every task needs to be done in a single SQL statement.
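The two-query approach could look roughly like the sketch below. It uses Python's sqlite3 module as a stand-in for a MySQL client (an assumption), and the helper name `top_n_average` is hypothetical. The limit is bound as a `?` placeholder, which mirrors the prepared-statement exception quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, name TEXT, size INTEGER);
    CREATE TABLE t2 (t1id INTEGER, value INTEGER);
    INSERT INTO t1 VALUES (1, 'Bob', 4), (2, 'Joe', 3), (3, 'Sam', 4);
    INSERT INTO t2 VALUES
        (1, 16), (1, 14), (1, 12), (1, 10), (1, 8),
        (2, 10), (2, 8), (2, 6), (2, 4),
        (3, 20), (3, 15), (3, 10), (3, 5), (3, 2);
""")


def top_n_average(conn, t1_id):
    """Hypothetical helper: average of the top t1.size values for one t1 row."""
    # Query 1: fetch the per-row limit.
    (size,) = conn.execute(
        "SELECT size FROM t1 WHERE id = ?", (t1_id,)
    ).fetchone()
    # Query 2: use it as the LIMIT value. The LIMIT sits in a derived table so
    # that it trims the rows *before* AVG() is computed.
    (average,) = conn.execute(
        """
        SELECT AVG(value)
        FROM (SELECT value FROM t2
              WHERE t1id = ?
              ORDER BY value DESC
              LIMIT ?) AS top_values
        """,
        (t1_id, size),
    ).fetchone()
    return average


ids = [row[0] for row in conn.execute("SELECT id FROM t1 ORDER BY id")]
averages = {t1_id: top_n_average(conn, t1_id) for t1_id in ids}
print(averages)
```

This issues one extra round trip per t1 row, so it suits small result sets better than bulk reporting.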

Related

Seek rows with incorrect dates in historic data

I have a table that is a historic log. I recently fixed a bug that was writing incorrect dates to it: within each entity the dates should be in chronological order, but in some cases a row has a date much older than the previous one.
How can I get all the rows that are out of order for each entity_id? In the example below I should get rows 5 and 10.
The table has millions of rows and thousands of different entities. I was thinking of comparing the results of ordering by date and by id, but that is a lot of manual work.
| id | entity_id | time_stamp |
|--------|-------------|---------------|
| 1 | 7 | 2019-01-22 |
| 2 | 9 | 2019-01-05 |
| 3 | 6 | 2019-03-14 |
| 4 | 9 | 2019-04-20 |
| 5 | 6 | 2015-10-04 | WRONG
| 6 | 9 | 2019-07-15 |
| 7 | 3 | 2019-07-04 |
| 8 | 7 | 2019-06-01 |
| 9 | 6 | 2019-11-04 |
| 10 | 7 | 2019-03-04 | WRONG
Is there a function to compare a row's date with the previous date for the same entity_id? I'm completely lost here and not sure how to clean the data. The database is MySQL, by the way.
If you are running MySQL 8.0, you can use lag(); the idea is to order records by id within groups having the same entity_id, and then to filter on records where the current timestamp is smaller than the previous one:
select t.*
from (
select t.*, lag(time_stamp) over(partition by entity_id order by id) lag_time_stamp
from mytable t
) t
where time_stamp < lag_time_stamp
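As a quick check of the lag() idea against the sample rows from the question, the sketch below runs the same query shape in an in-memory SQLite database (a stand-in for MySQL 8.0, which is an assumption). Dates are stored as ISO strings so that string comparison matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, entity_id INTEGER, time_stamp TEXT);
    INSERT INTO mytable VALUES
        (1, 7, '2019-01-22'), (2, 9, '2019-01-05'), (3, 6, '2019-03-14'),
        (4, 9, '2019-04-20'), (5, 6, '2015-10-04'), (6, 9, '2019-07-15'),
        (7, 3, '2019-07-04'), (8, 7, '2019-06-01'), (9, 6, '2019-11-04'),
        (10, 7, '2019-03-04');
""")

# For each row, look up the timestamp of the previous row (by id) of the same
# entity, then keep the rows that are older than their predecessor.
lag_flagged_ids = [
    row[0]
    for row in conn.execute("""
        SELECT t.id
        FROM (
            SELECT m.*,
                   LAG(time_stamp) OVER (PARTITION BY entity_id ORDER BY id)
                       AS lag_time_stamp
            FROM mytable m
        ) AS t
        WHERE t.time_stamp < t.lag_time_stamp
        ORDER BY t.id
    """)
]
print(lag_flagged_ids)
```

The first row of each entity has no predecessor, so its lag value is NULL and the comparison silently excludes it.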
In earlier versions, one option is to use a correlated subquery to get the previous timestamp:
select t.*
from mytable t
where time_stamp < (
select time_stamp
from mytable t1
where t1.entity_id = t.entity_id and t1.id < t.id
order by id desc
limit 1
)
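The correlated-subquery variant can be exercised the same way. This is a sketch on the question's sample rows, again using SQLite in place of an older MySQL server (an assumption); the alias `prev` is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, entity_id INTEGER, time_stamp TEXT);
    INSERT INTO mytable VALUES
        (1, 7, '2019-01-22'), (2, 9, '2019-01-05'), (3, 6, '2019-03-14'),
        (4, 9, '2019-04-20'), (5, 6, '2015-10-04'), (6, 9, '2019-07-15'),
        (7, 3, '2019-07-04'), (8, 7, '2019-06-01'), (9, 6, '2019-11-04'),
        (10, 7, '2019-03-04');
""")

# The scalar subquery returns the timestamp of the closest earlier row (by id)
# for the same entity; it yields NULL when there is no earlier row.
subquery_flagged_ids = [
    row[0]
    for row in conn.execute("""
        SELECT t.id
        FROM mytable t
        WHERE t.time_stamp < (
            SELECT prev.time_stamp
            FROM mytable prev
            WHERE prev.entity_id = t.entity_id AND prev.id < t.id
            ORDER BY prev.id DESC
            LIMIT 1
        )
        ORDER BY t.id
    """)
]
print(subquery_flagged_ids)
```

On a table with millions of rows the subquery runs once per row, so an index that covers entity_id and id is likely to matter a great deal.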
Another option is an EXISTS test that flags a row when an earlier row of the same entity has a later timestamp:
SELECT s1.*
FROM sourcetable s1
WHERE EXISTS ( SELECT NULL
               FROM sourcetable s2
               WHERE s2.id < s1.id
                 AND s2.entity_id = s1.entity_id
                 AND s2.time_stamp > s1.time_stamp )
An index on (entity_id, id, time_stamp) or (entity_id, time_stamp, id) will improve performance.

Selecting best row in each group based on two columns [duplicate]

Suppose we have the following table, where each row represents a submission a user made during a programming contest, id is an auto-increment primary key, probid identifies the problem the submission was made to, score is the number of points the submission earned for the problem, and date is the timestamp when the submission was made. Each user can submit as many times as they want to the same problem:
+----+----------+--------+-------+------------+
| id | username | probid | score | date |
+----+----------+--------+-------+------------+
| 1 | brian | 1 | 5 | 1542766686 |
| 2 | alex | 1 | 10 | 1542766686 |
| 3 | alex | 2 | 5 | 1542766901 |
| 4 | brian | 1 | 10 | 1542766944 |
| 5 | jacob | 2 | 10 | 1542766983 |
| 6 | jacob | 1 | 10 | 1542767053 |
| 7 | brian | 2 | 8 | 1542767271 |
| 8 | jacob | 2 | 10 | 1542767456 |
| 9 | brian | 2 | 7 | 1542767522 |
+----+----------+--------+-------+------------+
In order to rank the contestants, we need to determine the best submission each user made to each problem. The "best" submission is the one with the highest score, with ties broken by submission ID (i.e., if the user got the same score on the same problem twice, we only care about the earlier of the two submissions). This would yield a table like the following:
+----------+--------+----+-------+------------+
| username | probid | id | score | date |
+----------+--------+----+-------+------------+
| alex | 1 | 2 | 10 | 1542766686 |
| alex | 2 | 3 | 5 | 1542766901 |
| brian | 1 | 4 | 10 | 1542766944 |
| brian | 2 | 7 | 8 | 1542767271 |
| jacob | 1 | 6 | 10 | 1542767053 |
| jacob | 2 | 5 | 10 | 1542766983 |
+----------+--------+----+-------+------------+
How can I write a query to accomplish this?
SELECT username , probid , id , score , `date`
FROM tableName
ORDER BY username, score DESC, ID
Using MySQL 8.0 or MariaDB 10.2 or later (id is added to the ORDER BY so that ties on score go to the earlier submission):
SELECT username, probid, id, score, `date`
FROM (
    SELECT username, probid, id, score, `date`,
           ROW_NUMBER() OVER (
               PARTITION BY username, probid
               ORDER BY score DESC, id) AS `rank`
    FROM tablename
) AS tmp
WHERE tmp.`rank` = 1
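Here is a small runnable sketch of the window-function approach on the question's data, using SQLite as a stand-in for MySQL 8.0 or MariaDB 10.2 (an assumption). The table name `submissions` and the alias `rn` are illustrative, and the tie-break on id follows the rule stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submissions (
        id INTEGER PRIMARY KEY, username TEXT, probid INTEGER,
        score INTEGER, date INTEGER
    );
    INSERT INTO submissions VALUES
        (1, 'brian', 1, 5, 1542766686), (2, 'alex', 1, 10, 1542766686),
        (3, 'alex', 2, 5, 1542766901), (4, 'brian', 1, 10, 1542766944),
        (5, 'jacob', 2, 10, 1542766983), (6, 'jacob', 1, 10, 1542767053),
        (7, 'brian', 2, 8, 1542767271), (8, 'jacob', 2, 10, 1542767456),
        (9, 'brian', 2, 7, 1542767522);
""")

# Row 1 in each (username, probid) partition is the highest score; equal
# scores are ordered by id so the earlier submission wins.
best_rows = conn.execute("""
    SELECT username, probid, id, score, date
    FROM (
        SELECT s.*,
               ROW_NUMBER() OVER (
                   PARTITION BY username, probid
                   ORDER BY score DESC, id ASC
               ) AS rn
        FROM submissions s
    ) AS ranked
    WHERE rn = 1
    ORDER BY username, probid
""").fetchall()

for row in best_rows:
    print(row)
```

Without the `id ASC` tie-break, jacob's two 10-point submissions to problem 2 could come back in either order.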
This query will work on versions of MySQL prior to 8.0 as well. The LEFT JOIN removes duplicate scores, ensuring that equal scores only have the lowest date in the result set for a given score. Then the WHERE clause ensures that we have the maximum score for a given user/problem combination:
SELECT t1.username, t1.probid, t1.id, t1.score, t1.date
FROM tablename t1
LEFT JOIN tablename t2
ON t2.username = t1.username AND
t2.probid = t1.probid AND
t2.score = t1.score AND
t2.date < t1.date
WHERE t2.id IS NULL AND
t1.score = (SELECT MAX(score) FROM tablename t3 WHERE t3.username = t1.username AND t3.probid = t1.probid)
ORDER BY t1.username, t1.probid
Update
It's almost certainly more efficient to JOIN the table to a list of maximum scores per user per problem first rather than computing the MAX value for each row in the result table. This query does that instead:
SELECT t1.username, t1.probid, t1.id, t1.score, t1.date
FROM tablename t1
JOIN (SELECT username, probid, MAX(score) AS score
FROM tablename
GROUP BY username, probid) t2
ON t2.username = t1.username AND
t2.probid = t1.probid AND
t2.score = t1.score
LEFT JOIN tablename t3
ON t3.username = t1.username AND
t3.probid = t1.probid AND
t3.score = t1.score AND
t3.date < t1.date
WHERE t3.id IS NULL
ORDER BY t1.username, t1.probid
Output (for both queries):
username probid id score date
alex 1 2 10 1542766686
alex 2 3 5 1542766901
brian 1 4 10 1542766944
brian 2 7 8 1542767271
jacob 1 6 10 1542767053
jacob 2 5 10 1542766983
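The join-based approach can be checked without window functions. The sketch below is one way to do it in SQLite (standing in for a pre-8.0 MySQL server, which is an assumption). It breaks ties on id rather than date, following the rule stated in the question; the table name `submissions` and the aliases are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submissions (
        id INTEGER PRIMARY KEY, username TEXT, probid INTEGER,
        score INTEGER, date INTEGER
    );
    INSERT INTO submissions VALUES
        (1, 'brian', 1, 5, 1542766686), (2, 'alex', 1, 10, 1542766686),
        (3, 'alex', 2, 5, 1542766901), (4, 'brian', 1, 10, 1542766944),
        (5, 'jacob', 2, 10, 1542766983), (6, 'jacob', 1, 10, 1542767053),
        (7, 'brian', 2, 8, 1542767271), (8, 'jacob', 2, 10, 1542767456),
        (9, 'brian', 2, 7, 1542767522);
""")

# Step 1 (inner join): keep only rows that carry the maximum score for their
# (username, probid) pair.
# Step 2 (left join + IS NULL): drop a row if an earlier row with the same
# score exists, which leaves one row per pair.
best_by_join = conn.execute("""
    SELECT s.username, s.probid, s.id, s.score, s.date
    FROM submissions s
    JOIN (SELECT username, probid, MAX(score) AS score
          FROM submissions
          GROUP BY username, probid) AS m
      ON m.username = s.username AND m.probid = s.probid AND m.score = s.score
    LEFT JOIN submissions earlier
      ON earlier.username = s.username AND earlier.probid = s.probid
     AND earlier.score = s.score AND earlier.id < s.id
    WHERE earlier.id IS NULL
    ORDER BY s.username, s.probid
""").fetchall()

for row in best_by_join:
    print(row)
```

Comparing on id instead of date also avoids returning two rows when two equal-score submissions share the same timestamp.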
In MySQL versions before 8.0.2, we can emulate ROW_NUMBER() using user-defined variables. In this technique, we first fetch the data in a particular order (which depends on the problem at hand).
In your case, within a partition of username and probid, we need to rank scores in descending order, with the row having the lower timestamp given higher priority (to break ties). So we ORDER BY username, probid, score DESC, date ASC.
Now we can use this result set as a derived table and determine the row number. It works like a loop in application code (e.g. PHP): we store the previous row's values in user-defined variables, use a CASE .. WHEN expression to check the current row's values against the previous row, and assign the row number accordingly.
Finally, we keep only the rows where the row number is 1 and, if required, sort them by username and probid.
Query
SELECT dt2.username,
       dt2.probid,
       dt2.id,
       dt2.score,
       dt2.date
FROM (SELECT @rn := CASE
                      WHEN @un = dt1.username
                           AND @pid = dt1.probid THEN @rn + 1
                      ELSE 1
                    END AS row_no,
             @un := dt1.username AS username,
             @pid := dt1.probid AS probid,
             dt1.id,
             dt1.score,
             dt1.date
      FROM (SELECT id,
                   username,
                   probid,
                   score,
                   date
            FROM your_table
            ORDER BY username,
                     probid,
                     score DESC,
                     date ASC) AS dt1
      CROSS JOIN (SELECT @un := '',
                         @pid := 0,
                         @rn := 0) AS user_init_vars) AS dt2
WHERE dt2.row_no = 1
ORDER BY dt2.username, dt2.probid;
Result
| username | probid | id | score | date |
| -------- | ------ | --- | ----- | ---------- |
| alex | 1 | 2 | 10 | 1542766686 |
| alex | 2 | 3 | 5 | 1542766901 |
| brian | 1 | 4 | 10 | 1542766944 |
| brian | 2 | 7 | 8 | 1542767271 |
| jacob | 1 | 6 | 10 | 1542767053 |
| jacob | 2 | 5 | 10 | 1542766983 |
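The answer compares the user-variable technique to a loop in application code. The sketch below spells out that loop in plain Python on the question's rows; it is an illustration of the logic, not of how MySQL evaluates the query, and the variable names are made up for the example.

```python
# Each tuple is (id, username, probid, score, date), copied from the question.
submissions = [
    (1, "brian", 1, 5, 1542766686), (2, "alex", 1, 10, 1542766686),
    (3, "alex", 2, 5, 1542766901), (4, "brian", 1, 10, 1542766944),
    (5, "jacob", 2, 10, 1542766983), (6, "jacob", 1, 10, 1542767053),
    (7, "brian", 2, 8, 1542767271), (8, "jacob", 2, 10, 1542767456),
    (9, "brian", 2, 7, 1542767522),
]

# Same ordering as the derived table: username, probid, score DESC, date ASC.
ordered = sorted(submissions, key=lambda s: (s[1], s[2], -s[3], s[4]))

best = []
prev_key = None   # plays the role of the "previous username/probid" variables
row_no = 0        # plays the role of the row-number variable

for sub in ordered:
    key = (sub[1], sub[2])
    # Same partition as the previous row: increment. New partition: restart at 1.
    row_no = row_no + 1 if key == prev_key else 1
    prev_key = key
    if row_no == 1:
        best.append(sub)

for sub in best:
    print(sub)
```

Because the input is sorted first, the first row seen in each partition is already the best one, which is exactly what `row_no = 1` selects.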

select count only showing 1 result and the wrong one

I want to search TABLE1 and count which number_id has the most 5's in experience column.
TABLE1
+-------------+------------+
| number_id | experience |
+-------------+------------+
| 20 | 5 |
| 20 | 5 |
| 19 | 1 |
| 18 | 2 |
| 15 | 3 |
| 13 | 1 |
| 10 | 5 |
+-------------+------------+
So in this case it would be number_id=20
Then do an inner join on TABLE2 and map the number that matches the number_id in TABLE1.
TABLE2
+-------------+------------+
| id | number |
+-------------+------------+
| 20 | 000000000 |
| 29 | 012345678 |
| 19 | 123456789 |
| 18 | 223456789 |
| 15 | 345678910 |
| 13 | 123457898 |
| 10 | 545678910 |
+-------------+------------+
So the result would be:
000000000 (2 results of 5)
545678910 (1 result of 5)
So far I have:
SELECT number, experience, number_id, COUNT(*) AS SUM FROM TABLE1
INNER JOIN TABLE2 ON TABLE1.number_id = TABLE2.id
WHERE experience = '5' order by SUM LIMIT 10
But it's returning just
545678910
How can I get it to return both results and by order of number of instances of 5 in the experience column?
Thanks
This query will give you the results that you want. The subquery fetches all the number_id that have experience values of 5. The SUM(experience=5) works because MySQL uses a value of 1 for true and 0 for false. The results of the subquery are then joined to table2 to give the number field. Finally the results are ordered by the number of experience=5:
SELECT t2.number, t1.num_fives
FROM (SELECT number_id, SUM(experience = 5) AS num_fives
FROM table1
WHERE experience = 5
GROUP BY number_id) t1
JOIN table2 t2
ON t2.id = t1.number_id
ORDER BY num_fives DESC
Output:
number num_fives
000000000 2
545678910 1
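To see the conditional-sum query run end to end, here is a minimal sketch on the question's tables using SQLite in place of MySQL (an assumption; SQLite also evaluates a comparison to 1 or 0). The number column is stored as text so the leading zeros survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (number_id INTEGER, experience INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, number TEXT);
    INSERT INTO table1 VALUES
        (20, 5), (20, 5), (19, 1), (18, 2), (15, 3), (13, 1), (10, 5);
    INSERT INTO table2 VALUES
        (20, '000000000'), (29, '012345678'), (19, '123456789'),
        (18, '223456789'), (15, '345678910'), (13, '123457898'),
        (10, '545678910');
""")

# The derived table counts the fives per number_id; the outer join maps each
# number_id to its number and sorts by that count, largest first.
fives_per_number = conn.execute("""
    SELECT t2.number, t1.num_fives
    FROM (SELECT number_id, SUM(experience = 5) AS num_fives
          FROM table1
          WHERE experience = 5
          GROUP BY number_id) AS t1
    JOIN table2 t2 ON t2.id = t1.number_id
    ORDER BY t1.num_fives DESC
""").fetchall()

for row in fives_per_number:
    print(row)
```

With the WHERE clause in place, `COUNT(*)` would give the same numbers; `SUM(experience = 5)` becomes useful when the filter is dropped and other ratings are counted in the same pass.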
Add a group by clause:
SELECT number, experience, number_id, COUNT(*) AS SUM
FROM TABLE1
JOIN TABLE2 ON TABLE1.number_id = TABLE2.id
WHERE experience = '5'
GROUP BY 1, 2, 3 -- <<< Added this clause
ORDER BY SUM DESC
LIMIT 10

I need to get the average for every 3 records in one table and update column in separate table

Table Mytable1
Id | Actual
1 | 10020
2 | 12203
3 | 12312
4 | 12453
5 | 13211
6 | 12838
7 | 10129
Using the following syntax:
SELECT AVG(Actual), CEIL((@rank:=@rank+1)/3) AS rank FROM mytable1 GROUP BY rank;
Produces the following type of result:
| AVG(Actual) | rank |
+-------------+------+
| 12835.5455 | 1 |
| 12523.1818 | 2 |
| 12343.3636 | 3 |
I would like to take AVG(Actual) column and UPDATE a second existing table Mytable2
Id | Predict |
1 | 11133
2 | 12312
3 | 13221
I would like to get the following where the Actual value matches the ID as RANK
Id | Predict | Actual
1 | 11133 | 12835.5455
2 | 12312 | 12523.1818
3 | 13221 | 12343.3636
IMPORTANT REQUIREMENT
I need to set an offset much like the following syntax:
SELECT @rank := @rank + 1 AS Id, Mytable2.Actual FROM Mytable LIMIT 3 OFFSET 4;
PLEASE NOTE THE AVERAGE NUMBER ARE MADE UP IN EXAMPLES
You can join your existing query in the UPDATE statement:
UPDATE Mytable2 T2
JOIN (
    SELECT AVG(Actual) AS AverageValue,
           CEIL((@rank := @rank + 1) / 3) AS rank
    FROM Mytable1, (SELECT @rank := 0) t
    GROUP BY rank
) T1
    ON T2.Id = T1.rank
SET T2.Actual = T1.AverageValue
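The grouping arithmetic can be tried out with the sketch below. Two things differ from the MySQL answer and are assumptions of this example: SQLite has no user variables or UPDATE ... JOIN, so the group number comes from ROW_NUMBER() with integer division and the update uses a correlated subquery. The seventh Actual value is garbled in the question and is assumed here to be 10129.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable1 (Id INTEGER PRIMARY KEY, Actual INTEGER);
    CREATE TABLE mytable2 (Id INTEGER PRIMARY KEY, Predict INTEGER, Actual REAL);
    -- The value for Id 7 is an assumed reading of the garbled sample row.
    INSERT INTO mytable1 VALUES
        (1, 10020), (2, 12203), (3, 12312), (4, 12453),
        (5, 13211), (6, 12838), (7, 10129);
    INSERT INTO mytable2 (Id, Predict) VALUES (1, 11133), (2, 12312), (3, 13221);
""")

# (row_number + 2) / 3 with integer division equals CEIL(row_number / 3):
# rows 1-3 fall in group 1, rows 4-6 in group 2, row 7 in group 3.
conn.execute("""
    UPDATE mytable2
    SET Actual = (
        SELECT AVG(n.Actual)
        FROM (SELECT Actual,
                     (ROW_NUMBER() OVER (ORDER BY Id) + 2) / 3 AS grp
              FROM mytable1) AS n
        WHERE n.grp = mytable2.Id
    )
""")

updated_rows = conn.execute(
    "SELECT Id, Predict, Actual FROM mytable2 ORDER BY Id"
).fetchall()

for row in updated_rows:
    print(row)
```

A trailing group with fewer than three rows is still averaged over the rows it has, which may or may not be what the prediction table should store.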

mysql select ordernumber by group

I'm trying to do something like 'select groupwise maximum', but I'm looking for groupwise order number.
so with a table like this
briefs
----------
id_brief | id_case | date
1 | 1 | 06/07/2010
2 | 1 | 04/07/2010
3 | 1 | 03/07/2010
4 | 2 | 18/05/2010
5 | 2 | 17/05/2010
6 | 2 | 19/05/2010
I want a result like this
briefs result
----------
id_brief | id_case | dateOrder
1 | 1 | 3
2 | 1 | 2
3 | 1 | 1
4 | 2 | 2
5 | 2 | 1
6 | 2 | 3
I think I want to do something like described here MySQL - Get row number on select, but I don't know how I would reset the variable for each id_case.
This counts, for each row, how many records share its id_case value and have a date less than or equal to its date.
SELECT t1.id_brief,
       t1.id_case,
       COUNT(t2.id_brief) AS dateOrder
FROM yourtable AS t1
LEFT JOIN yourtable AS t2 ON t2.id_case = t1.id_case AND t2.date <= t1.date
GROUP BY t1.id_brief
MySQL is permissive about which columns can be selected in a query that uses GROUP BY. With a stricter DBMS you may need GROUP BY t1.id_brief, t1.id_case.
I strongly advise you to have the right indexes on the table:
CREATE INDEX filter1 ON yourtable (id_case, date)
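The self-join count can be verified on the sample rows with the sketch below, which uses SQLite as a stand-in for MySQL (an assumption). The dates are rewritten in ISO form (YYYY-MM-DD) because the `<=` comparison only orders correctly on a real date type or on ISO-formatted strings; the table name `briefs` comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE briefs (id_brief INTEGER PRIMARY KEY, id_case INTEGER, date TEXT);
    INSERT INTO briefs VALUES
        (1, 1, '2010-07-06'), (2, 1, '2010-07-04'), (3, 1, '2010-07-03'),
        (4, 2, '2010-05-18'), (5, 2, '2010-05-17'), (6, 2, '2010-05-19');
    -- Index suggested in the answer: lets the join seek by case, then by date.
    CREATE INDEX filter1 ON briefs (id_case, date);
""")

# For each brief, count the briefs of the same case dated on or before it.
# That count is the brief's position in date order within its case.
date_order = conn.execute("""
    SELECT t1.id_brief, t1.id_case, COUNT(t2.id_brief) AS dateOrder
    FROM briefs AS t1
    LEFT JOIN briefs AS t2
      ON t2.id_case = t1.id_case AND t2.date <= t1.date
    GROUP BY t1.id_brief, t1.id_case
    ORDER BY t1.id_brief
""").fetchall()

for row in date_order:
    print(row)
```

If two briefs of the same case share a date, both receive the same (higher) order number, so this behaves like a rank with ties rather than a strict row number.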