Randomly select half of records - mysql

I'm trying to generate a random sample of half of a table (or some other percentage). The table is small enough that I can use the ORDER BY RAND() LIMIT x approach. I'd like the code to sample 50% of recipients as the table changes size over time. Below was my first attempt but you can't put a subquery in a LIMIT clause. Any ideas?
SELECT
recipient_id
FROM
recipient
ORDER BY RAND()
LIMIT (
/* Find out how many recipients are on half the list */
SELECT
COUNT(*) / 2
FROM
recipient
);

If you are running MySQL 8.0, you can use window functions:
select *
from (select t.*, ntile(2) over(order by rand()) nt from mytable t) t
where nt = 1
In earlier versions, one approach uses user variables:
select t.*
from (
select t.*, @rn := @rn + 1 rn
from (select * from mytable order by rand()) t
cross join (select @rn := 0) x
) t
inner join (select count(*) cnt from mytable) c on t.rn <= c.cnt / 2
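A different route, not taken from the answers above, is to compute the sample size in application code and bind it as an ordinary LIMIT parameter, which sidesteps the subquery restriction. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for a MySQL connection (so the random function is RANDOM() rather than MySQL's RAND()), and the helper name sample_fraction is hypothetical.

```python
import sqlite3


def sample_fraction(conn, fraction):
    """Return a random sample of recipient_id values covering `fraction` of the table."""
    # Step 1: work out the sample size in application code.
    (total,) = conn.execute("SELECT COUNT(*) FROM recipient").fetchone()
    sample_size = int(total * fraction)  # rounds down
    # Step 2: bind the computed size as a plain LIMIT parameter.
    # SQLite spells the function RANDOM(); MySQL would use RAND().
    rows = conn.execute(
        "SELECT recipient_id FROM recipient ORDER BY RANDOM() LIMIT ?",
        (sample_size,),
    ).fetchall()
    return [row[0] for row in rows]


# Assumed demo data: ten recipients with ids 1..10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recipient (recipient_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO recipient VALUES (?)", [(i,) for i in range(1, 11)])

sample = sample_fraction(conn, 0.5)
print(len(sample))
```

The count and the sample run as two separate statements, so if the table changes between them the sample may be slightly off from exactly half.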

Related

MySQL get 2% of the record

I am trying to get a random sample of 2% of the records.
SELECT * FROM Orders
ORDER BY RAND()
LIMIT (SELECT CEIL(0.02 * (SELECT COUNT(*) FROM Orders)));
This one gives a syntax error due to line 3. Is there anything I am doing wrong?
Or is there a better way to get n % of records?
If you are using MySQL 8+, then ROW_NUMBER() provides one option:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY RAND()) rn,
COUNT(*) OVER () cnt
FROM Orders
)
SELECT *
FROM cte
WHERE 1.0*rn / cnt <= 0.02;
On MySQL 5.7 and earlier, we can simulate row number:
SELECT *
FROM
(
SELECT *, (@rn := @rn + 1) AS rn
FROM Orders, (SELECT @rn := 0) AS x
ORDER BY RAND()
) t
CROSS JOIN (SELECT COUNT(*) AS cnt FROM Orders) o
WHERE 1.0*rn / cnt <= 0.02;
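One detail worth checking is that the filter rn / cnt <= 0.02 rounds the sample size down, whereas the CEIL(0.02 * COUNT(*)) in the question rounds it up. The sketch below compares the two using integer arithmetic only; the function names are hypothetical and the table sizes are made-up examples.

```python
def rows_kept_by_ratio_filter(cnt, pct):
    """Count row numbers rn in 1..cnt that satisfy rn / cnt <= pct / 100."""
    # Multiply through by 100 * cnt so the comparison stays in integers.
    return sum(1 for rn in range(1, cnt + 1) if rn * 100 <= pct * cnt)


def rows_requested_by_ceil(cnt, pct):
    """Sample size given by CEIL(pct / 100 * cnt), computed without floats."""
    return (pct * cnt + 99) // 100


for cnt in (30, 100, 120):
    print(cnt, rows_kept_by_ratio_filter(cnt, 2), rows_requested_by_ceil(cnt, 2))
```

For a table with fewer than 50 rows the ratio filter keeps nothing at 2%, so if at least one row is always wanted, the comparison would need to be against a rounded-up threshold instead.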

MySQL convert duplicate field entries to duplicate incrementing value

I currently have a table with a parent_id field that is shared by multiple entries, i.e. the same ID number for 4 entries (as per the screenshot below). I would like to convert the parent_id values to run from 9000 upwards, so in the screenshot 000004 would become 9000 in 4 entries, 000007 would become 9001 in 4 entries, and so on (as shown in the example outcome). Does anyone know of an easy way to implement this? I'd rather not have to manually change 2224 entries!
Thanks in advance guys!
Table screenshot:
Example outcome:
You seem to want an update. If so:
update t join
(select t.*, row_number() over (order by id) as seqnum
from t
) tt
on t.id = tt.id
set t.parent_id = 9000 + floor( (seqnum - 1) / 4);
Note that this ignores the current parent_id, assigning the same value to groups of 4 rows based on the id.
EDIT:
In older versions of MySQL:
update t join
(select t.*, (@rn := @rn + 1) as seqnum
from (select t.* from t order by id) t cross join
(select @rn := 0) params
) tt
on t.id = tt.id
set t.parent_id = 9000 + floor( (seqnum - 1) / 4);
If you are running MySQL 8.0, you can use dense_rank() for this:
select
t.*,
8999 + dense_rank() over(order by parent_id) new_parent_id
from mytable t
On earlier versions, one (less efficient) option uses a correlated subquery:
select
t.*,
9000 + (select count(distinct t1.parent_id) from mytable t1 where t1.parent_id < t.parent_id) new_parent_id
from mytable t
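To see what the dense-rank renumbering produces, here is a small sketch of the same mapping in plain Python. The function name and the sample rows are assumptions for illustration; only the values 000004, 000007, 9000 and 9001 come from the question.

```python
def renumber_parent_ids(parent_ids, start=9000):
    """Map each distinct parent_id to start, start + 1, ... in sorted order."""
    # Dense rank: equal values share a rank and the ranks have no gaps.
    # enumerate() is zero-based, so the smallest parent_id maps to `start`.
    dense_rank = {pid: rank for rank, pid in enumerate(sorted(set(parent_ids)))}
    return [start + dense_rank[pid] for pid in parent_ids]


# Hypothetical rows modelled on the question: two parents, four entries each.
old_ids = ["000004"] * 4 + ["000007"] * 4
new_ids = renumber_parent_ids(old_ids)
print(new_ids)
```

Unlike the row_number() approach, this does not assume every parent has exactly 4 entries; groups of any size keep a single new value.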

Trying to put a variable into limit to find a median

I am trying to use MySQL to solve the following problem:
https://www.hackerrank.com/challenges/weather-observation-station-20/problem
I understand that a variable cannot be used in a LIMIT clause.
My approach:
declare a new variable to record row IDs, and use the row ID to retrieve the record in the middle.
However, the row ID does not seem to be working well.
Could anyone give me some advice?
SELECT ROUND(COUNT(LAT_N)/2,0) FROM STATION into @count;
SELECT ROUND(a.LAT_N,4) FROM (
SELECT *,@row := @row + 1 FROM STATION s, (SELECT @row := 0) r
WHERE @row <=@count
ORDER BY s.LAT_N ASC) a
ORDER BY a.LAT_N DESC LIMIT 1;
If you are running MySQL 8.0, this is simpler with window functions:
select round(avg(lat_n), 4) median_lat_n
from (
select s.*, row_number() over(order by lat_n) rn, count(*) over() cnt
from station s
where lat_n is not null
) s
where rn * 2 in (cnt, cnt + 1, cnt + 2)
In earlier versions, variables make it a bit tricky; we need one more level of nesting to make it safe:
select round(avg(lat_n), 4) median_lat_n
from (
select s.*, @rn := @rn + 1 rn
from (select * from station where lat_n is not null order by lat_n) s
cross join (select @rn := 0) p
) s
cross join (select count(*) cnt from station where lat_n is not null) c
where rn * 2 in (cnt, cnt + 1, cnt + 2)
The logic is as follows: first enumerate the rows, ordered by lat_n. If the row count is odd, we pick the middle row; if it is even, we take the average of the two middle values.
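The enumerate-then-pick-the-middle logic can be checked outside the database. The sketch below is one way to express it, assuming 1-based row numbers rn and a total row count cnt: a row is a middle row when 2 * rn equals cnt, cnt + 1 or cnt + 2. The function name is hypothetical.

```python
def median_by_row_number(values):
    """Median computed by numbering the sorted rows and averaging the middle one(s)."""
    ordered = sorted(v for v in values if v is not None)  # skip NULL-like entries
    cnt = len(ordered)
    if cnt == 0:
        raise ValueError("median of an empty set is undefined")
    # Odd cnt: only rn = (cnt + 1) / 2 matches (2 * rn == cnt + 1).
    # Even cnt: rn = cnt / 2 and rn = cnt / 2 + 1 match (2 * rn == cnt or cnt + 2).
    middle = [v for rn, v in enumerate(ordered, start=1)
              if rn * 2 in (cnt, cnt + 1, cnt + 2)]
    return sum(middle) / len(middle)


print(median_by_row_number([5, 1, 3, 2, 4]))
print(median_by_row_number([4, 1, 3, 2]))
```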

MYSQL subquery in a LIMIT

I was wondering if it's possible to use a subquery inside a LIMIT.
The reason why I'd like to use this, is to return 20% (1/5th) of the best buying customers.
For instance (though this clearly doesn't work):
SELECT id, revenue
FROM customers
ORDER BY revenue DESC
LIMIT (SELECT (COUNT(*) / 5) FROM customers)
Is there a way to make a subquery in a limit, or return 20% in a different way?
A typical way of doing this using ANSI SQL is with window functions:
SELECT id, revenue
FROM (SELECT c.*,
ROW_NUMBER() OVER (ORDER BY revenue DESC) as seqnum,
COUNT(*) OVER () as cnt
FROM customers c
) c
WHERE seqnum <= cnt * 0.2
ORDER BY revenue DESC;
Most databases support these functions.
MySQL versions before 8.0 do not support window functions. There you can use variables:
SELECT id, revenue
FROM (SELECT c.*, (@rn := @rn + 1) as rn
FROM customers c CROSS JOIN
(SELECT @rn := 0) params
ORDER BY c.revenue DESC
) c
WHERE rn <= @rn / 5; -- The subquery runs first so @rn should have the total count here.

How to select certain numbers of groups in MySQL?

I have the table with data:
And for this table I need to create pagination by the productId column. I know about LIMIT N,M, but it works with rows and not with groups. For example, for my table with pagination = 2 I expect to retrieve all 9 records with productId = 1 and 2 (the number of groups is 2).
So how can I create pagination by number of groups?
I will be very thankful for answers with an example.
One way to do pagination by groups is to assign a product sequence to the query. Using variables, this requires a subquery:
select t.*
from (select t.*,
-- same product: keep the group number; new product: remember it and move to the next group
(@rn := if(@p = productid, @rn,
if(@p := productid, @rn + 1, @rn + 1)
)
) as rn
from table t cross join
(select @rn := 0, @p := -1) vars
order by t.productid
) t
where rn between X and Y;
With an index on t(productid), you can also do this with a subquery. The condition can then go in a having clause:
select t.*,
(select count(distinct productid)
from t t2
where t2.productid <= t.productid
) as pno
from t
having pno between X and Y;
Try this:
select * from
(select * from <your table> where <your condition> group by <with your group>) as g
LIMIT number;
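Another way to page by groups, sketched here as an assumption rather than taken from the answers, is to apply LIMIT/OFFSET to the distinct productId values in a derived table and join back to fetch every row of those groups. The example uses Python's built-in sqlite3 module as a stand-in for MySQL; the table name items, the id column, and the sample data are hypothetical.

```python
import sqlite3


def fetch_group_page(conn, page, groups_per_page):
    """Return all rows whose productId falls on the given 1-based page of groups."""
    offset = (page - 1) * groups_per_page
    return conn.execute(
        """
        SELECT t.id, t.productId
        FROM items t
        JOIN (SELECT DISTINCT productId
              FROM items
              ORDER BY productId
              LIMIT ? OFFSET ?) p ON p.productId = t.productId
        ORDER BY t.productId, t.id
        """,
        (groups_per_page, offset),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, productId INTEGER)")
# Hypothetical data: product 1 has 5 rows, product 2 has 4 rows, product 3 has 2 rows.
product_ids = [1] * 5 + [2] * 4 + [3] * 2
conn.executemany("INSERT INTO items (productId) VALUES (?)", [(p,) for p in product_ids])

page1 = fetch_group_page(conn, 1, 2)
page2 = fetch_group_page(conn, 2, 2)
print(len(page1), len(page2))
```

The LIMIT sits in a derived table rather than in an IN (...) subquery because MySQL has historically rejected LIMIT inside IN subqueries, while a join to a derived table is accepted.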