I was wondering if it's possible to use a subquery inside a LIMIT.
I'd like to use this to return the top 20% (one fifth) of customers by revenue.
For instance (though this clearly doesn't work):
SELECT id, revenue
FROM customers
ORDER BY revenue DESC
LIMIT (SELECT (COUNT(*) / 5) FROM customers)
Is there a way to use a subquery in a LIMIT, or to return 20% in a different way?
A typical way of doing this using ANSI SQL is with window functions:
SELECT id, revenue
FROM (SELECT c.*,
ROW_NUMBER() OVER (ORDER BY revenue DESC) as seqnum,
COUNT(*) OVER () as cnt
FROM customers c
) c
WHERE seqnum <= cnt * 0.2
ORDER BY revenue DESC;
Most databases support these functions.
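To try the window-function approach without a MySQL server, the sketch below runs an equivalent query against an in-memory SQLite database from Python. The table contents are hypothetical, and SQLite 3.25 or later is assumed, since that is where window functions were added.

```python
import sqlite3

# In-memory SQLite stands in for a database with window-function support.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, revenue INTEGER)")
# Hypothetical sample data: 10 customers with revenue 100, 200, ..., 1000.
conn.executemany(
    "INSERT INTO customers (id, revenue) VALUES (?, ?)",
    [(i, i * 100) for i in range(1, 11)],
)

TOP_20_PERCENT = """
SELECT id, revenue
FROM (SELECT c.*,
             ROW_NUMBER() OVER (ORDER BY revenue DESC) AS seqnum,
             COUNT(*) OVER () AS cnt
      FROM customers c
     ) c
WHERE seqnum <= cnt * 0.2
ORDER BY revenue DESC
"""

top = conn.execute(TOP_20_PERCENT).fetchall()
print(top)  # [(10, 1000), (9, 900)]
```

With 10 rows, cnt * 0.2 is 2, so only the two highest-revenue customers pass the filter.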
Before version 8.0, MySQL was one of the few databases that did not support window functions. In those versions, you can use user variables:
SELECT id, revenue
FROM (SELECT c.*, (@rn := @rn + 1) as rn
FROM customers c CROSS JOIN
(SELECT @rn := 0) params
ORDER BY c.revenue DESC
) c
WHERE rn <= @rn / 5; -- The subquery runs first, so @rn should hold the total count here.
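A different way to return 20%, not used by the answers here, is to run the count as its own statement in application code and bind the result as the LIMIT value; MySQL prepared statements accept ? placeholders in LIMIT. The sketch below shows the idea with Python's sqlite3 module, a hypothetical customers table, and an illustrative helper name, top_20_percent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, revenue INTEGER)")
# Hypothetical sample data: 12 customers with revenue 10, 20, ..., 120.
conn.executemany(
    "INSERT INTO customers (id, revenue) VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 13)],
)


def top_20_percent(conn):
    """Return the best-buying fifth of customers using two statements."""
    # Step 1: count the rows; integer division rounds the fifth down.
    (total,) = conn.execute("SELECT COUNT(*) FROM customers").fetchone()
    limit = total // 5
    # Step 2: pass the computed number as a bound LIMIT parameter.
    return conn.execute(
        "SELECT id, revenue FROM customers ORDER BY revenue DESC LIMIT ?",
        (limit,),
    ).fetchall()


best = top_20_percent(conn)
print(best)  # [(12, 120), (11, 110)]
```

This costs two round trips. If rows can be inserted between the two statements, running both inside one transaction helps keep the count and the LIMIT consistent.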
I am trying to get a random sample of 2% of the records.
SELECT * FROM Orders
ORDER BY RAND()
LIMIT (SELECT CEIL(0.02 * (SELECT COUNT(*) FROM Orders)));
This gives a syntax error on line 3 (the LIMIT clause). Is there anything I am doing wrong?
Or is there a better way to get n% of the records?
If you are using MySQL 8+, then ROW_NUMBER() provides one option:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY RAND()) rn,
COUNT(*) OVER () cnt
FROM Orders
)
SELECT *
FROM cte
WHERE 1.0*rn / cnt <= 0.02;
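One way to check that the ROW_NUMBER() filter returns the expected share of rows is to run it on SQLite from Python, sketched below with a hypothetical 500-row Orders table. SQLite names the random function RANDOM() where MySQL uses RAND(), and SQLite 3.25+ is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (id INTEGER PRIMARY KEY, amount INTEGER)")
# Hypothetical data: 500 orders with arbitrary amounts.
conn.executemany(
    "INSERT INTO Orders (id, amount) VALUES (?, ?)",
    [(i, i % 37) for i in range(1, 501)],
)

# Same shape as the MySQL 8 query, with RANDOM() in place of RAND().
SAMPLE_2_PERCENT = """
WITH cte AS (
    SELECT o.*,
           ROW_NUMBER() OVER (ORDER BY RANDOM()) AS rn,
           COUNT(*) OVER () AS cnt
    FROM Orders o
)
SELECT id, amount
FROM cte
WHERE 1.0 * rn / cnt <= 0.02
"""

sample = conn.execute(SAMPLE_2_PERCENT).fetchall()
print(len(sample))  # 10, i.e. 2% of 500
```

Because rn / cnt <= 0.02 rounds down, a table with fewer than 50 rows yields an empty sample; comparing rn against a rounded-up count is one option if at least one row is always wanted.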
On MySQL 5.7 and earlier, we can simulate row number:
SELECT *
FROM
(
SELECT *, (@rn := @rn + 1) AS rn
FROM Orders, (SELECT @rn := 0) AS x
ORDER BY RAND()
) t
CROSS JOIN (SELECT COUNT(*) AS cnt FROM Orders) o
WHERE 1.0*rn / cnt <= 0.02;
I'm trying to generate a random sample of half of a table (or some other percentage). The table is small enough that I can use the ORDER BY RAND() LIMIT x approach. I'd like the code to sample 50% of recipients as the table changes size over time. Below is my first attempt, but you can't put a subquery in a LIMIT clause. Any ideas?
SELECT
recipient_id
FROM
recipient
ORDER BY RAND()
LIMIT (
/* Find out how many recipients are on half the list */
SELECT
COUNT(*) / 2
FROM
recipient
);
If you are running MySQL 8.0, you can use window functions:
select *
from (select t.*, ntile(2) over(order by rand()) nt from mytable t) t
where nt = 1;
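The sketch below runs the NTILE(2) idea on SQLite from Python to show how an odd row count gets split. RANDOM() replaces MySQL's RAND(), the recipient data is hypothetical, and SQLite 3.25+ is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recipient (recipient_id INTEGER PRIMARY KEY)")
# Hypothetical data: 7 recipients, an odd count to show how NTILE splits it.
conn.executemany(
    "INSERT INTO recipient (recipient_id) VALUES (?)",
    [(i,) for i in range(1, 8)],
)

HALF_SAMPLE = """
SELECT recipient_id
FROM (SELECT r.recipient_id,
             NTILE(2) OVER (ORDER BY RANDOM()) AS nt
      FROM recipient r) t
WHERE nt = 1
"""

half = [row[0] for row in conn.execute(HALF_SAMPLE)]
print(len(half))  # 4: with an odd count, NTILE puts the extra row in bucket 1
```

NTILE fills larger buckets first, so with an odd count the nt = 1 half is the bigger one; filtering on nt = 2 instead would round down.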
In earlier versions, one approach uses user variables:
select t.*
from (
select t.*, @rn := @rn + 1 rn
from (select * from mytable order by rand()) t
cross join (select @rn := 0) x
) t
inner join (select count(*) cnt from mytable) c on t.rn <= c.cnt / 2
I have an issue with a query. This is my old question, which was previously solved:
mysql get ranking position grouped with joined table
The problem is that when two players have the same score, the query returns the same ranking position, like 1-1-2-3, etc. How can I fix this?
The players table also has player_tickets (the number of games played) and player_date (a timestamp).
I'd like to rank by player_score first, then player_tickets, and finally player_date.
This is my old query:
SELECT *,
(SELECT 1 + Count(*)
FROM players p2
WHERE p2.`player_game` = p.`player_game`
AND p2.player_score > p.player_score
AND p2.player_status = 0) AS ranking
FROM players p
ORDER BY `player_game`,
player_score DESC
You can simply add more columns to the order by of the window function:
select p.*,
rank() over (
partition by player_game
order by player_score desc, player_tickets desc, player_date
) as ranking
from players p
order by player_game, ranking;
If you really want to avoid having the same rank twice, you can use row_number() instead, which guarantees it: when there are ties, row_number() assigns distinct numbers, in an undefined order among the tied rows.
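The difference between rank() and row_number() shows up clearly on a few tied rows. The sketch below runs both over the same window on SQLite 3.25+ from Python; the players rows are hypothetical, and the alias rnk is used because RANK is a reserved word in MySQL 8.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players (player_id INTEGER PRIMARY KEY, player_game INTEGER, "
    "player_score INTEGER, player_tickets INTEGER, player_date TEXT)"
)
# Hypothetical rows: players 1 and 2 tie on score only; players 3 and 4 tie on
# score, tickets and date, so only row_number() can tell them apart.
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 100, 5, "2020-01-01 10:00:00"),
        (2, 1, 100, 7, "2020-01-02 10:00:00"),
        (3, 1, 90, 3, "2020-01-03 10:00:00"),
        (4, 1, 90, 3, "2020-01-03 10:00:00"),
    ],
)

RANKED = """
SELECT player_id,
       RANK() OVER (PARTITION BY player_game
                    ORDER BY player_score DESC, player_tickets DESC,
                             player_date) AS rnk,
       ROW_NUMBER() OVER (PARTITION BY player_game
                          ORDER BY player_score DESC, player_tickets DESC,
                                   player_date) AS rn
FROM players
ORDER BY rn
"""

rows = conn.execute(RANKED).fetchall()
for row in rows:
    print(row)  # (player_id, rnk, rn)
```

Players 3 and 4 both get rnk 3, while rn gives them 3 and 4 in an unspecified order.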
Just add the tie-breaking criteria to the subquery's WHERE clause:
SELECT *,
(
SELECT 1 + COUNT(*)
FROM players p2
WHERE p2.player_game = p.player_game
AND p2.player_status = 0
AND
(
(p2.player_score > p.player_score) OR
(p2.player_score = p.player_score AND p2.player_tickets > p.player_tickets) OR
(p2.player_score = p.player_score AND p2.player_tickets = p.player_tickets AND p2.player_date > p.player_date) OR
(p2.player_score = p.player_score AND p2.player_tickets = p.player_tickets AND p2.player_date = p.player_date AND p2.player_id > p.player_id)
)
) AS ranking
FROM players p
ORDER BY player_game, ranking;
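To see that the extra comparisons make every ranking unique, the sketch below runs the same correlated-subquery logic on SQLite from Python with hypothetical players rows. Dates are stored as ISO-8601 text so that string comparison matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players (player_id INTEGER PRIMARY KEY, player_game INTEGER, "
    "player_score INTEGER, player_tickets INTEGER, player_date TEXT)"
)
# Hypothetical rows; ISO-8601 text dates compare correctly as strings.
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 100, 5, "2020-01-01 10:00:00"),
        (2, 1, 100, 7, "2020-01-02 10:00:00"),
        (3, 1, 90, 3, "2020-01-03 10:00:00"),
        (4, 1, 90, 3, "2020-01-03 10:00:00"),
        (5, 2, 50, 1, "2020-01-01 10:00:00"),
    ],
)

# 1 + the number of players in the same game that beat this one on
# score, then tickets, then date, then player_id as a last resort.
RANKING = """
SELECT p.player_id,
       (SELECT 1 + COUNT(*)
        FROM players p2
        WHERE p2.player_game = p.player_game
          AND (p2.player_score > p.player_score
               OR (p2.player_score = p.player_score
                   AND p2.player_tickets > p.player_tickets)
               OR (p2.player_score = p.player_score
                   AND p2.player_tickets = p.player_tickets
                   AND p2.player_date > p.player_date)
               OR (p2.player_score = p.player_score
                   AND p2.player_tickets = p.player_tickets
                   AND p2.player_date = p.player_date
                   AND p2.player_id > p.player_id))) AS ranking
FROM players p
ORDER BY p.player_id
"""

ranking = dict(conn.execute(RANKING).fetchall())
print(ranking)  # {1: 2, 2: 1, 3: 4, 4: 3, 5: 1}
```

This answer ranks a later player_date higher (p2.player_date > p.player_date), whereas the rank() answer sorts player_date ascending, so the two disagree on that tie-breaker.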
You can add additional comparisons in the subquery or, more simply, use variables:
select p.*,
@rn := if(@g = player_game, @rn + 1,
if(@g := player_game, 1, 1)
) as ranking
from (select p.*
from players p
order by player_game, player_score desc, player_tickets desc, player_date desc
) p cross join
(select @rn := 0, @g := 0) as params;
In MySQL 8.0 and later, you would just use row_number() if you don't want ties.
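The user-variable expression is easier to follow written out as a plain loop. The Python sketch below mirrors what the query intends, not how MySQL executes it: rows must already be sorted by group, and the counter resets whenever the group changes. The function name and sample rows are hypothetical. MySQL documents the evaluation order of expressions involving user variables as undefined, which is one more reason to prefer row_number() when it is available.

```python
def number_within_groups(rows):
    """Mimic the @rn/@g user-variable idiom over rows already sorted by group.

    Each input row is a (player_game, player_id) tuple. The counter restarts
    at 1 whenever player_game changes, like
    @rn := if(@g = player_game, @rn + 1, if(@g := player_game, 1, 1)).
    """
    # Plays the role of (select @rn := 0, @g := 0); None instead of 0 so a
    # real group id of 0 still starts a new group.
    rn, g = 0, None
    numbered = []
    for game, player_id in rows:
        if g == game:
            rn += 1  # same group as the previous row: keep counting
        else:
            g = game  # new group: remember it and restart at 1
            rn = 1
        numbered.append((game, player_id, rn))
    return numbered


# Hypothetical rows, already in the derived table's ORDER BY order.
sorted_rows = [(1, 2), (1, 1), (1, 4), (1, 3), (2, 5)]
print(number_within_groups(sorted_rows))
# [(1, 2, 1), (1, 1, 2), (1, 4, 3), (1, 3, 4), (2, 5, 1)]
```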
I have a list of IDs from a Store table and need to fetch the 10 latest files for each store.
SELECT *
FROM tblFiles
WHERE storeId in (IDs)
ORDER BY createdDate DESC
LIMIT 10
But this limits the whole result set, not each store. I found an answer to a similar SO question, but it recommends looping over each ID, which results in multiple DB hits.
Another option is to fetch all records and group them in code, but this will be heavy if there are a large number of records.
It would be nice if this could be handled at the query level. Any help would be appreciated.
NB: The tables used here are dummy ones.
Pre-MySQL 8.0, the simplest method is probably variables:
select f.*
from (select f.*,
(@rn := if(@s = storeId, @rn + 1,
if(@s := storeId, 1, 1)
)
) as rn
from (select f.*
from tblfiles f
where storeId in (IDs)
order by storeId, createdDate desc
) f cross join
(select @s := 0, @rn := 0) params
) f
where rn <= 10;
In MySQL 8+ or MariaDB 10.2+, you would simply use window functions:
select f.*
from (select f.*,
row_number() over (partition by storeid order by createdDate desc) as seqnum
from tblfiles f
where storeId in (IDs)
) f
where seqnum <= 10;
For the variables version, the innermost subquery may not be needed in older versions of MySQL and MariaDB.
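A sketch of the window-function approach, run on SQLite 3.25+ from Python with hypothetical data, is below. The store IDs are passed as bound parameters, one ? per ID, in place of the literal (IDs) list; the helper name latest_files_per_store is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblFiles (id INTEGER PRIMARY KEY, storeId INTEGER, createdDate TEXT)"
)
# Hypothetical data: store 1 has 12 files, store 2 has 3, store 3 has 5.
file_rows = []
for store_id, n_files in [(1, 12), (2, 3), (3, 5)]:
    for day in range(1, n_files + 1):
        file_rows.append((store_id, f"2021-01-{day:02d}"))
conn.executemany(
    "INSERT INTO tblFiles (storeId, createdDate) VALUES (?, ?)", file_rows
)


def latest_files_per_store(conn, store_ids, per_store=10):
    """Return up to per_store newest files for each requested store."""
    placeholders = ", ".join("?" for _ in store_ids)  # one ? per store ID
    sql = f"""
        SELECT id, storeId, createdDate
        FROM (SELECT f.*,
                     ROW_NUMBER() OVER (PARTITION BY storeId
                                        ORDER BY createdDate DESC) AS seqnum
              FROM tblFiles f
              WHERE storeId IN ({placeholders})) f
        WHERE seqnum <= ?
        ORDER BY storeId, createdDate DESC
    """
    return conn.execute(sql, [*store_ids, per_store]).fetchall()


result = latest_files_per_store(conn, [1, 2])
for row in result:
    print(row)
```

Store 1 contributes only its 10 newest files, store 2 all 3 of its files, and store 3 nothing because it was not requested.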
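Use a subquery in the WHERE clause (MySQL does not allow LIMIT directly inside an IN subquery, so it is wrapped in a derived table):
SELECT * FROM tblFiles WHERE storeId IN (SELECT id FROM (SELECT id FROM store ORDER BY datefield/id field DESC LIMIT 10) s)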
You could work around it with a UNION query, where each subquery searches for a particular ID and applies its own LIMIT clause, like:
(SELECT *
FROM tblFiles
WHERE storeId = ?
ORDER BY createdDate DESC
LIMIT 10)
UNION ALL
(SELECT *
FROM tblFiles
WHERE storeId = ?
ORDER BY createdDate DESC
LIMIT 10)
...
With this solution only one DB hit happens, and you are guaranteed to get the LIMIT on a per-ID basis. Such SQL can easily be generated from PHP code.
NB: the maximum number of UNIONs allowed in a MySQL query is 61.
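Generating the UNION query is mostly string joining. The sketch below does it in Python rather than PHP, against SQLite with hypothetical data. SQLite does not accept the parenthesized "(SELECT ... LIMIT 10)" branches that MySQL uses, so each branch wraps its ORDER BY and LIMIT in a derived table instead; the helper name latest_files is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblFiles (id INTEGER PRIMARY KEY, storeId INTEGER, createdDate TEXT)"
)
# Hypothetical data: store 1 has 12 files, store 2 has 3.
file_rows = [(1, f"2021-01-{d:02d}") for d in range(1, 13)]
file_rows += [(2, f"2021-02-{d:02d}") for d in range(1, 4)]
conn.executemany(
    "INSERT INTO tblFiles (storeId, createdDate) VALUES (?, ?)", file_rows
)

# One branch per store ID; the derived table keeps ORDER BY/LIMIT per branch.
BRANCH = (
    "SELECT * FROM (SELECT * FROM tblFiles WHERE storeId = ? "
    "ORDER BY createdDate DESC LIMIT ?)"
)


def latest_files(conn, store_ids, per_store=10):
    """Build and run one UNION ALL statement with a LIMITed branch per store."""
    if not store_ids:
        return []
    sql = " UNION ALL ".join([BRANCH] * len(store_ids))
    params = []
    for store_id in store_ids:
        params += [store_id, per_store]  # values for this branch's two ?s
    return conn.execute(sql, params).fetchall()


rows = latest_files(conn, [1, 2])
print(len(rows))  # 13: 10 files from store 1 plus all 3 from store 2
```

Binding the IDs as parameters, rather than pasting them into the SQL string, also keeps the generated query safe from injection.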
I want to get the latest results for my patients. The following SQL returns 69,000 results after 87 seconds in MySQL Workbench. I have indexed both the 'date' and 'patientid' columns.
select Max(date) as MaxDate, PatientID
from assessment
group by PatientID
I think my table has approximately 440,000 rows in total. Is it slow because my table is 'large'?
Is there a way to speed up this query? I will have to embed it inside other queries, for example:
select aa.patientID, assessment.Date, assessment.result
from assessment
inner join
(select Max(date) as MaxDate, PatientID
from assessment
group by PatientID) as aa
on aa.patientID = assessment.patientID and aa.MaxDate = assessment.Date
The above gives me the latest assessment results for each patient. I will then embed this code in further queries, so I really need to speed things up. Can anyone help?
I wonder if this version would have better performance with the right indexes:
select a.patientID, a.Date, a.result
from assessment a
where a.date = (select aa.date
from assessment aa
where aa.patientID = a.patientID
order by aa.date desc
limit 1
);
Then you want an index on assessment(patientID, date).
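The correlated-subquery version, together with the suggested composite index, can be tried on SQLite from Python as sketched below. The assessment rows and the index name are hypothetical; dates are ISO-8601 text so that ORDER BY date DESC is chronological.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assessment (patientID INTEGER, date TEXT, result TEXT)")
# Composite index so each per-patient lookup can seek straight to the newest date.
conn.execute(
    "CREATE INDEX idx_assessment_patient_date ON assessment (patientID, date)"
)
# Hypothetical data: several assessments per patient.
conn.executemany(
    "INSERT INTO assessment VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "A"),
        (1, "2020-03-01", "B"),
        (2, "2020-02-01", "C"),
        (2, "2020-01-15", "D"),
        (3, "2020-05-05", "E"),
    ],
)

LATEST = """
SELECT a.patientID, a.date, a.result
FROM assessment a
WHERE a.date = (SELECT aa.date
                FROM assessment aa
                WHERE aa.patientID = a.patientID
                ORDER BY aa.date DESC
                LIMIT 1)
ORDER BY a.patientID
"""

latest = conn.execute(LATEST).fetchall()
print(latest)
# [(1, '2020-03-01', 'B'), (2, '2020-02-01', 'C'), (3, '2020-05-05', 'E')]
```

If a patient has two assessments on the same latest date, this query returns both of them.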
EDIT:
Another approach uses user variables with an index on assessment(patientID, date, result):
select a.*
from (select a.patientID, a.date, a.result,
(@rn := if(@p = a.patientID, @rn + 1,
if(@p := a.patientID, 1, 1)
)
) as rn
from assessment a cross join
(select @p := -1, @rn := 0) params
order by a.patientID desc, a.date desc
) a
where rn = 1;
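On MySQL 8.0 and later, the same "rn = 1" idea can be written with ROW_NUMBER() instead of user variables. The sketch below runs such a query on SQLite 3.25+ from Python; the data, including a deliberate tie for patient 3, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assessment (patientID INTEGER, date TEXT, result TEXT)")
# Covering index matching the answer's suggestion (patientID, date, result).
conn.execute(
    "CREATE INDEX idx_assessment_pdr ON assessment (patientID, date, result)"
)
# Hypothetical data; patient 3 has two assessments on the same latest date.
conn.executemany(
    "INSERT INTO assessment VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "A"),
        (1, "2020-03-01", "B"),
        (2, "2020-02-01", "C"),
        (2, "2020-01-15", "D"),
        (3, "2020-05-05", "E"),
        (3, "2020-05-05", "F"),
    ],
)

# Number each patient's rows from newest to oldest and keep the first.
LATEST = """
SELECT patientID, date, result
FROM (SELECT a.patientID, a.date, a.result,
             ROW_NUMBER() OVER (PARTITION BY a.patientID
                                ORDER BY a.date DESC) AS rn
      FROM assessment a) a
WHERE rn = 1
ORDER BY patientID
"""

latest = conn.execute(LATEST).fetchall()
for row in latest:
    print(row)
```

Unlike the correlated-subquery version, this returns exactly one row per patient even when the latest date is tied; which of the tied rows (E or F here) is kept is unspecified unless another column is added to the ORDER BY.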