Take random row for GROUP BY - mysql

I need one row for each textid. But it has to be chosen random, not the first in the MySQL table.
Shuffling like this does not work. It still takes the first row.
SELECT * FROM (
SELECT * FROM textelements ORDER BY rand()
) AS z
GROUP BY z.textid;
Read the comments for my solution. I added LIMIT 1000000 to the subquery.

WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY textid ORDER BY RAND()) rn
FROM textelements
)
SELECT *
FROM cte
WHERE rn = 1

Related

get the most common value for each column

I'm attempting to create an SQL query that retrieves the total_cost for every row in a table. Alongside that, I also need to collect the most dominant value for both columnA and columnB, with their respective values.
For example, with the following table contents:
cost
columnA
columnB
target
250
Foo
Bar
XYZ
200
Foo
Bar
XYZ
150
Bar
Bar
ABC
250
Foo
Bar
ABC
The result would need to be:
total_cost
columnA_dominant
columnB_dominant
columnA_value
columnB_value
850
Foo
Bar
250
400
Now I can get as far as calculating the total cost - that's no issue. I can also get the most dominant value for columnA using this answer. But after this, I'm not sure how to also get the dominant value for columnB and the values too.
This is my current SQL:
SELECT
SUM(`cost`) AS `total_cost`,
COUNT(`columnA`) AS `columnA_dominant`
FROM `table`
GROUP BY `columnA_dominant`
ORDER BY `columnA_dominant` DESC
WHERE `target` = "ABC"
UPDATE: Thanks to #Barmar for the idea of using a subquery, I managed to get the dominant values for columnA and columnB:
SELECT
-- Retrieve total cost.
SUM(`cost`) AS `total_cost`,
-- Get dominant values.
(
SELECT `columnA`
FROM `table`
GROUP BY `columnA`
ORDER BY COUNT(*) DESC
LIMIT 1
) AS `columnA_dominant`,
(
SELECT `columnB`
FROM `table`
GROUP BY `columnB`
ORDER BY COUNT(*) DESC
LIMIT 1
) AS `columnB_dominant`
FROM `table`
WHERE `target` = "XYZ"
However, I'm still having issues figuring out how to calculate the respective values.
You might get close, if we want to get percentage values we can try to add COUNT(*) at subquery to get max count by columnA and columnB then do division by total count
SELECT
SUM(cost),
(
SELECT tt.columnA
FROM T tt
GROUP BY tt.columnA
ORDER BY COUNT(*) DESC
LIMIT 1
) AS columnA_dominant,
(
SELECT tt.columnB
FROM T tt
GROUP BY tt.columnB
ORDER BY COUNT(*) DESC
LIMIT 1
) AS columnB_dominant,
(
SELECT COUNT(*)
FROM T tt
GROUP BY tt.columnA
ORDER BY COUNT(*) DESC
LIMIT 1
) / COUNT(*) AS columnA_percentage,
(
SELECT COUNT(*)
FROM T tt
GROUP BY tt.columnB
ORDER BY COUNT(*) DESC
LIMIT 1
) / COUNT(*) AS columnB_percentage
FROM T t1
If your MySQL version supports the window function, there is another way which reduce table scan might get better performance than a correlated subquery
SELECT SUM(cost) OVER(),
FIRST_VALUE(columnA) OVER (ORDER BY counter1 DESC) columnA_dominant,
FIRST_VALUE(columnB) OVER (ORDER BY counter2 DESC) columnB_dominant,
FIRST_VALUE(counter1) OVER (ORDER BY counter1 DESC) / COUNT(*) OVER() columnA_percentage,
FIRST_VALUE(counter2) OVER (ORDER BY counter2 DESC) / COUNT(*) OVER() columnB_percentage
FROM (
SELECT *,
COUNT(*) OVER (PARTITION BY columnA) counter1,
COUNT(*) OVER (PARTITION BY columnB) counter2
FROM T
) t1
LIMIT 1
sqlfiddle
try this query
select sum(cost) as total_cost,p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage
from get_common,(
select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc
)p,
(select top 1 columnB,columnB_percentage
from (
select columnB,count(columnB) as count_columnB, cast(count(columnB) as float)/(select count(columnB) from get_common) as columnB_percentage
from get_common
group by columnB) t
order by count_columnB desc)q
group by p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage
so if you want to get the percent and dominant value you must make their own query like this
select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc
then you can join with the sum query to get all value you want
hope this can help you

SQL query to return random selection but split evently against a value

Let's say i have a table that looks like the below, at the moment I have a simple query that returns a random selection like this
Select * from table order by rand() limit 3
But what sort of query would I need to ensure that get a selection from each AREA from the table below to avoid the chance of just one area being return
Name
Area
A
AreaA
B
AreaA
C
AreaA
D
AreaB
E
AreaB
F
AreaC
G
AreaC
H
AreaC
I
AreaC
J
AreaC
K
AreaC
Aa a joke (but workable nevertheless):
WITH
cte1 AS ( SELECT *
FROM table
ORDER BY RAND() LIMIT 2 ),
cte2 AS ( SELECT *
FROM table
WHERE ( Area NOT IN ( SELECT Area
FROM cte1 )
OR 2 = ( SELECT COUNT(DISTINCT Area)
FROM cte1 ) )
AND Name NOT IN ( SELECT Name
FROM cte1 )
ORDER BY RAND() LIMIT 1 )
SELECT * FROM cte1
UNION ALL
SELECT * FROM cte2;
For MySql 8.0+ you can use ROW_NUMBER() window function with ORDER BY RAND() in the OVER clause and pick the 1st row of each group:
SELECT Name, Area
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Area ORDER BY RAND()) rn
FROM tablename
) t
WHERE rn = 1
If you have only these 2 columns in the table it is easier with FIRST_VALUE() window function:
SELECT DISTINCT
FIRST_VALUE(Name) OVER (PARTITION BY Area ORDER BY RAND()) Name,
Area
FROM tablename
See the demo.

Randomly select half of records

I'm trying to generate a random sample of half of a table (or some other percentage). The table is small enough that I can use the ORDER BY RAND() LIMIT x approach. I'd like the code to sample 50% of recipients as the table changes size over time. Below was my first attempt but you can't put a subquery in a LIMIT clause. Any ideas?
SELECT
recipient_id
FROM
recipient
ORDER BY RAND()
LIMIT (
/* Find out how many recipients are on half the list */
SELECT
COUNT(*) / 2
FROM
recipient
);
If you are running MysQL 8.0, you can use window functions:
select *
from (select t.*, ntile(2) over(order by random()) nt from mytable t) t
where nt = 1
In earlier versions, one approach uses user variables:
select t.*
from (
select t.*, #rn := #rn + 1 rn
from (select * from mytable order by random()) t
cross join (select #rn := 0) x
) t
inner join (select count(*) cnt from mytable) c on t.rn <= c.cnt / 2

How to write sql query to get items from range

I would like to get values without the smallest and the biggest ones, so without entry with 2 and 29 in column NumberOfRepeating.
My query is:
SELECT Note, COUNT(*) as 'NumberOfRepeating'
WHERE COUNT(*) <> MAX(COUNT(*))AND COUNT(*) <> MIN(COUNT(*))
FROM Note GROUP BY Note;
SELECT Note, COUNT(*) as 'NumberOfRepeating'
FROM Notes
GROUP BY Note
HAVING count(*) <
(
SELECT max(t.maxi)
FROM (select
Note, COUNT(Note) maxi FROM Notes
GROUP BY Note
) as t
)
AND
count(*) >
(
SELECT min(t.min)
FROM (select
Note, COUNT(Note) min FROM Notes
GROUP BY Note
) as t
)
try this code.
One method would use order by and limit, twice:
select t.*
from (select t.*
from t
order by NumberOfRepeating asc
limit 99999999 offset 1
) t
order by NumberOfRepeating desc
limit 99999999 offset 1;
Try this code,
Select * from Note where NumberOfRepeating < (select MAX(NumberOfRepeating) from Note ) AND NumberOfRepeating > (select MIN(NumberOfRepeating) from Note );
Here in the code, as in your table Note is the name of the table, and NumberOfRepeating is the column name, as in your table.
Try this. It should work
SELECT *
FROM ( SELECT Note, COUNT(*) as 'NumberOfRepeating'
FROM Notes
GROUP BY Note
ORDER BY NumberOfRepeating DESC
LIMIT 1, 2147483647
) T1
ORDER BY T1.NumberOfRepeating
LIMIT 1, 2147483647

MySQL Sorting using order by not working using unio

I'm using an union statement in mysql but i've some problems sorting the results. The ORDER statement doesn't works at all, the results comes out always sorted by the id field.
Here an example query:
SELECT a.* FROM ( ( select * from ticket_ticket AS t1 WHERE ticket_active=1 ORDER BY t1.ticket_date_last_modified DESC )
UNION ( select * from ticket_ticket AS t2 WHERE ticket_active=0 ORDER BY t2.ticket_date_last_modified DESC, t2.ticket_status_id DESC ) )
AS a LIMIT 0,20;
I want to order the results of the first SELECT by last_modified time, and the second SELECT by time and status. But the ORDER statement get just skipped. The results always come out ordered by the ticket_id ( the PRIMARY KEY ).
What's wrong in this query ?
Thanks!
Ok, i've fixed it writing the query this way:
SELECT a.*
FROM
(SELECT *
FROM ticket_ticket
WHERE ticket_active=1
ORDER BY ticket_date_last_modified DESC) AS a
UNION ALL
SELECT b.*
FROM
(SELECT *
FROM ticket_ticket
WHERE ticket_active=0
ORDER BY ticket_date_last_modified DESC, ticket_status_id DESC) AS b LIMIT 0,
20;
You are using a UNION query that will return distinct values, and the order of the returned rows is not guaranteed.
But you don't need an union query for this:
select *
from ticket_ticket AS t1
ORDER BY
ticket_active!=1,
ticket_date_last_modified DESC,
ticket_status_id DESC
LIMIT 0,20;