Very difficult MySQL query - random order on two tables

Consider this classical setup:
entry table:
id (int, PK)
title (varchar 255)
entry_category table:
entry_id (int)
category_id (int)
category table:
id (int, PK)
title (varchar 255)
Which basically means entries can be in one or more categories (the entry_category table is used as MM/join table)
Now I need to query 6 unique categories, each with 1 unique entry from that category, chosen at random.
EDIT: To clarify: the purpose of this is to display 6 random categories with 1 random entry per category.
A correct result set would look like this:
category_id entry_id
10 200
20 300
30 400
40 500
50 600
60 700
This would be incorrect as there are duplicates in the entry_id column:
category_id entry_id
10 300
20 300
...
And this is incorrect as there are duplicates in the category_id column:
category_id entry_id
20 300
20 400
...
How can I query this?
If I use this simple query with order by rand, the result contains duplicated rows:
select c.id, e.id
from category c
inner join entry_category ec on ec.category_id = c.id
inner join entry e on e.id = ec.entry_id
group by c.id
order by rand()
Performance is at the moment not the most important factor, but I would need a reliably working query for this, and the above is pretty much useless and does not do what I want at all.
EDIT: as an aside, the above query is no better when using select distinct ... and leaving out the group by. This includes duplicate rows as distinct only makes sure that the combinations of c.id and e.id are unique.
EDIT: one solution I found, but probably slow as hell on larger datasets:
select t1.e_id, t2.c_id
from (select e.id as e_id from entry e order by rand()) t1
inner join (select ec.entry_id as e_id, ec.category_id as c_id from entry_category ec group by e_id order by rand()) t2 on t2.e_id = t1.e_id
group by t2.c_id
order by rand()

SELECT  category_id, entity_id
FROM    (
        SELECT  category_id,
                @ce :=
                (
                SELECT  entity_id
                FROM    category_entity cei
                WHERE   cei.category_id = ced.category_id
                        AND NOT FIND_IN_SET(entity_id, @r)
                ORDER BY
                        RAND()
                LIMIT 1
                ) AS entity_id,
                (
                SELECT  @r := CAST(CONCAT_WS(',', @r, @ce) AS CHAR)
                )
        FROM    (
                SELECT  @r := ''
                ) vars,
                (
                SELECT  DISTINCT category_id
                FROM    category_entity
                ORDER BY
                        RAND()
                LIMIT 15
                ) ced
        ) q
WHERE   entity_id IS NOT NULL
LIMIT 6
This solution is not a piece of code I'd be proud of, since it relies on black magic of session variables in MySQL to keep the recursion stack. However, it works.
Also, it's not perfectly random and can in fact yield fewer than 6 rows (if entity_ids repeat across the categories too often). In that case, you can increase the value of 15 in the innermost query.
Create a unique index or a PRIMARY KEY on category_entity (category_id, entity_id) for this to work fast.

Seems to me that the good way to do this is to pick 6 distinct values from each set, shuffle each list of values (each list individually), and then glue the lists together into a two-column result.
To randomize which six you get, shuffle the entire list of each type of value, and grab the first six.
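The shuffle idea can also be sketched outside SQL. What follows is a minimal Python sketch, not the answerer's code: it shuffles the categories, then gives each one a random entry that has not been used yet, so that every pair is a real row of the join table (an added assumption; gluing two independently shuffled lists would not guarantee that). The function name `pick_pairs` and the sample data are hypothetical.

```python
import random

def pick_pairs(entry_category, n=6, rng=random):
    """Return up to n (category_id, entry_id) pairs with no repeated
    category and no repeated entry, chosen at random."""
    # Group the join-table rows (entry_id, category_id) by category.
    by_category = {}
    for entry_id, category_id in entry_category:
        by_category.setdefault(category_id, []).append(entry_id)

    categories = list(by_category)
    rng.shuffle(categories)  # random category order

    used_entries = set()
    pairs = []
    for category_id in categories:
        # Only entries not already assigned to another category qualify.
        candidates = [e for e in by_category[category_id] if e not in used_entries]
        if not candidates:
            continue  # greedy: skip a category with nothing left
        entry_id = rng.choice(candidates)
        used_entries.add(entry_id)
        pairs.append((category_id, entry_id))
        if len(pairs) == n:
            break
    return pairs

# Hypothetical data: 8 categories with 3 entries of their own each,
# plus entry 999 that belongs to every category.
rows = [(c * 10 + i, c) for c in range(1, 9) for i in range(3)]
rows += [(999, c) for c in range(1, 9)]
result = pick_pairs(rows)
print(result)
```

Because the selection is greedy, it can return fewer than n pairs when entries overlap heavily between categories.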

Related

Find Id of second item in group using MySQL Group By query

We are doing a simple grouping query to find the duplicates added by the next set of items inserted into our DB table.
SET @old_set_id = 71, @new_set_id = 72;
SELECT id,
       request_id,
       data_capture_id AS temp_id,
       COUNT(data_capture_id) AS item_count
FROM my_table
WHERE request_id = @old_set_id OR request_id = @new_set_id
GROUP BY data_capture_id
Will result in a table like,
id request_id temp_id item_count
----------------------------------------
3 71 2324345 1
4 71 6786867 2
8 72 5276345 1
For each duplicate we need the id of the second item in the group. That is, what is the id of the row in set 72 for the duplicate record 6786867? Currently, the query displays the id from the first set.
Try to start from this query:
select
    t1.id,
    t1.record_col_1,
    t2.request_id,
    count(t1.data_capture_id) as item_count
from (
    select id, request_id, data_capture_id, record_col_1
    from my_table
    order by request_id limit 123456789
) t1
inner join (
    select request_id, record_col_1
    from my_table
    order by request_id limit 123456789
) t2 ON t2.record_col_1 = t1.record_col_1 and t2.request_id > t1.request_id
group by t1.record_col_1
having item_count > 1
Our first sub-query ensures that the data is sorted by request_id before we group it. We need the junk limit 123456789 because by default MySQL ignores sorting in sub-queries (unless we use this hacky limit). The second sub-query, also sorted, fetches the higher request_id from the set with the same record_col_1. Finally, we collapse the data by record_col_1 and keep only the duplicates.
I'm not sure, if it will work, but try it.
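A simpler way to see the idea is a plain self-join between the old set and the new set. The sketch below uses Python's standard-library sqlite3 module as a stand-in for MySQL, with the table and columns from the question; the id 7 for the duplicate row in set 72 is an assumed value, since the question does not show it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, request_id INT, data_capture_id INT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (3, 71, 2324345),
        (4, 71, 6786867),
        (7, 72, 6786867),  # assumed id for the duplicate inserted by set 72
        (8, 72, 5276345),
    ],
)

# Pair each old-set row with the new-set row that has the same data_capture_id.
duplicates = conn.execute(
    """
    SELECT n.id AS new_id, o.id AS old_id, n.data_capture_id
    FROM my_table AS o
    JOIN my_table AS n ON n.data_capture_id = o.data_capture_id
    WHERE o.request_id = ? AND n.request_id = ?
    ORDER BY n.id
    """,
    (71, 72),
).fetchall()
print(duplicates)
```

The first column is the id of the second item in each duplicated group; the join needs no GROUP BY and no sorted sub-queries.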

Select items from distinct categories, including articles with no category

This seems like it would be pretty simple to do. I have a table of articles that has the following fields relevant to this question:
id - INTEGER(11) AUTO_INCREMENT
category_id - INTEGER(11) DEFAULT(-1)
When an article has a category, its ID goes in the category_id field. When it has no category, the column's value is -1.
What I want to do is to select three random articles of distinct categories from this articles table. This alone is pretty simple to do:
SELECT id FROM articles GROUP BY category_id ORDER BY RAND() LIMIT 3;
However, I don't want to group articles with no category into one single category, like the previous query would do. That is, I want to treat each article with a category_id of -1 as being in a separate category. How can I do this?
You can use union to create a derived table that contains
1 article id per non -1 category
All article ids for -1 category
And then select 3 random ids from that table
select id from (
select id from articles
where category_id <> -1
group by category_id
union all
select id from articles
where category_id = -1
) t order by rand() limit 3;
As pointed out in the comments, the query above will likely return the same article id per category id. If that's an issue you can try the query below but it might run slowly since it's ordering the tables by rand() twice.
select id from (
    select id from (
        select id, category_id from articles
        where category_id <> -1
        order by rand()
    ) t
    group by category_id
    union all
    select id from articles
    where category_id = -1
) t order by rand() limit 3;
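Where window functions are available (MySQL 8.0 and later), the same result can be had without relying on GROUP BY picking an arbitrary row. The sketch below runs on Python's standard-library sqlite3 module as a stand-in, so it uses RANDOM() where MySQL would use RAND(); the sample rows are hypothetical. Each uncategorised article gets its own partition, and every real category contributes one random article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, category_id INT DEFAULT -1)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [(1, 5), (2, 5), (3, 5), (4, 6), (5, 6), (6, -1), (7, -1), (8, -1)],
)

picked = conn.execute(
    """
    SELECT id, category_id
    FROM (
        SELECT id, category_id,
               ROW_NUMBER() OVER (
                   -- articles with category_id = -1 each form their own partition
                   PARTITION BY category_id,
                                CASE WHEN category_id = -1 THEN id END
                   ORDER BY RANDOM()
               ) AS rn
        FROM articles
    ) AS ranked
    WHERE rn = 1          -- one random article per partition
    ORDER BY RANDOM()
    LIMIT 3
    """
).fetchall()
print(picked)
```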

Mysql, get rows before specific row in a multi-column index

Say I have this table for high-scores:
id : primary key
username : string
score : int
User names and scores themselves can be repeating, only id is unique for each person. I also have an index to get high-scores fast:
UNIQUE scores ( score, username, id )
How can I get rows below the given person? By 'below' I mean they go before the given row in this index.
E.g. for ( 77, 'name7', 70 ) in format ( score, username, id ) I want to retrieve:
77, 'name7', 41
77, 'name5', 77
77, 'name5', 21
50, 'name9', 99
but not
77, 'name8', 88 or
77, 'name7', 82 or
80, 'name2', 34 ...
Here's one way to get the result:
SELECT t.score
, t.username
, t.id
FROM scores t
WHERE ( t.score < 77 )
OR ( t.score = 77 AND t.username < 'name7' )
OR ( t.score = 77 AND t.username = 'name7' AND t.id < 70 )
ORDER
BY t.score DESC
, t.username DESC
, t.id DESC
(NOTE: the ORDER BY clause may help MySQL decide to use the index to avoid a "Using filesort" operation. Your index is a "covering" index for the query, so we'd expect to see "Using index" in the EXPLAIN output.)
I ran a quick test, and in my environment, this does perform a range scan of the index and avoids a sort operation.
EXPLAIN OUTPUT
id select_type table type possible_keys key rows Extra
-- ----------- ----- ----- ------------------ ---------- ---- --------------------------
1 SIMPLE t range PRIMARY,scores_UX1 scores_UX1 3 Using where; Using index
(You may want to add a LIMIT n to that query, if you don't need to return ALL the rows that satisfy the criteria.)
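The three OR branches spell out a lexicographic comparison, which can also be written as a single row comparison. The sketch below checks that form against the data from the question, using Python's standard-library sqlite3 module as a stand-in for MySQL. MySQL accepts the same row-comparison syntax, but whether its optimizer uses the index for it is something to confirm with EXPLAIN rather than assume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, username TEXT, score INT)")
conn.execute("CREATE UNIQUE INDEX scores_UX1 ON scores (score, username, id)")
conn.executemany(
    "INSERT INTO scores (score, username, id) VALUES (?, ?, ?)",
    [
        (77, "name7", 70),  # the anchor row
        (77, "name7", 41),
        (77, "name5", 77),
        (77, "name5", 21),
        (50, "name9", 99),
        (77, "name8", 88),
        (77, "name7", 82),
        (80, "name2", 34),
    ],
)

# (a, b, c) < (x, y, z) compares column by column, left to right.
below = conn.execute(
    """
    SELECT score, username, id
    FROM scores
    WHERE (score, username, id) < (?, ?, ?)
    ORDER BY score DESC, username DESC, id DESC
    """,
    (77, "name7", 70),
).fetchall()
print(below)
```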
If you have the unique id of a row, you can avoid specifying its values in the query by doing a join.
Here we use a second reference to the same table to get the row with id=70, and then a join to get all the rows "lower" than it.
SELECT t.score
, t.username
, t.id
FROM scores k
JOIN scores t
ON ( t.score < k.score )
OR ( t.score = k.score AND t.username < k.username )
OR ( t.score = k.score AND t.username = k.username AND t.id < k.id )
WHERE k.id = 70
ORDER
BY t.score DESC
, t.username DESC
, t.id DESC
LIMIT 1000
The EXPLAIN for this query also shows MySQL using the covering index and avoiding a sort operation:
id select_type table type possible_keys key rows Extra
-- ----------- ----- ----- ------------------ ---------- ---- ------------------------
1 SIMPLE k const PRIMARY,scores_UX1 PRIMARY 1
1 SIMPLE t range PRIMARY,scores_UX1 scores_UX1 3 Using where; Using index
The concept of "below" for repeating scores is quite fuzzy: think of 11 users having the same score when you want the "10 below" a particular row. That said, you can do something like this (assuming you start with id=70):
SELECT score, username, id
FROM scores
WHERE score<=(SELECT score FROM scores WHERE id=70)
ORDER BY if(id=70,0,1), score DESC
-- you might also want e.g. username
LIMIT 5 -- you might want such a thing
;
Which will give you the rows in question inside this fuzzy factor, with the anchor row first.
Edit
Re-reading your question, you don't want the anchor row, so you need WHERE score<=(...) AND id<>70 and can drop the first part of the ORDER BY.
Edit 2
After your update to the question, I understand you want only those rows, that have one of
score < score in anchor row
score == score in anchor row AND name < name in anchor row
score == score in anchor row AND name == name in anchor row AND id < id in anchor row
We just have to put that into a query (again assuming your anchor row has id=70):
SELECT score, username, id
FROM scores, (
    SELECT
        @ascore:=score,
        @ausername:=username,
        @aid:=id
    FROM scores
    WHERE id=70
) AS seed
WHERE
    score<@ascore
    OR (score=@ascore AND username<@ausername)
    OR (score=@ascore AND username=@ausername AND id<@aid)
ORDER BY
    score DESC,
    username DESC,
    id DESC
-- limit 5 //You might want that
;
I think this is the query you want:
select s.*
from scores s
where s.score <= (select score
from scores
where id = 70
) and
s.id <> 70
order by s.score desc
limit 4;

Fetch 2nd Higest value from MySql DB with GROUP BY

I have a table tbl_patient and I want to fetch the last 2 visits of each patient in order to compare whether the patient's condition is improving or degrading.
tbl_patient
id | patient_ID | visit_ID | patient_result
1 | 1 | 1 | 5
2 | 2 | 1 | 6
3 | 2 | 3 | 7
4 | 1 | 2 | 3
5 | 2 | 3 | 2
6 | 1 | 3 | 9
I tried the query below to fetch the last visit of each patient as,
SELECT MAX(id), patient_result FROM `tbl_patient` GROUP BY `patient_ID`
Now I want to fetch the 2nd-to-last visit of each patient with the query below, but it gives me an error:
(#1242 - Subquery returns more than 1 row)
SELECT id, patient_result FROM `tbl_patient` WHERE id <(SELECT MAX(id) FROM `tbl_patient` GROUP BY `patient_ID`) GROUP BY `patient_ID`
Where am I going wrong?
select p1.patient_id, p2.maxid id1, max(p1.id) id2
from tbl_patient p1
join (select patient_id, max(id) maxid
from tbl_patient
group by patient_id) p2
on p1.patient_id = p2.patient_id and p1.id < p2.maxid
group by p1.patient_id
id1 is the ID of the last visit, id2 is the ID of the 2nd to last visit.
Your first query doesn't get the last visits, since it gives results 5 and 6 instead of 2 and 9.
You can try this query:
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
union
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
where id not in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
GROUP BY patient_ID)
order by 1,2
SELECT id, patient_result FROM `tbl_patient` t1
JOIN (SELECT MAX(id) as max, patient_ID FROM `tbl_patient` GROUP BY `patient_ID`) t2
ON t1.patient_ID = t2.patient_ID
WHERE id <max GROUP BY t1.`patient_ID`
There are a couple of approaches to getting the specified resultset returned in a single SQL statement.
Unfortunately, most of those approaches yield rather unwieldy statements.
The more elegant-looking statements tend to come with poor (or unbearable) performance when dealing with large sets, and the statements that perform better tend to look less elegant.
Three of the most common approaches make use of:
correlated subquery
inequality join (nearly a Cartesian product)
two passes over the data
Here's an approach that uses two passes over the data, using MySQL user variables, which basically emulates the analytic RANK() OVER(PARTITION ...) function available in other DBMS:
SELECT t.id
     , t.patient_id
     , t.visit_id
     , t.patient_result
  FROM (
         SELECT p.id
              , p.patient_id
              , p.visit_id
              , p.patient_result
              , @rn := if(@prev_patient_id = patient_id, @rn + 1, 1) AS rn
              , @prev_patient_id := patient_id AS prev_patient_id
           FROM tbl_patient p
           JOIN (SELECT @rn := 0, @prev_patient_id := NULL) i
          ORDER BY p.patient_id DESC, p.id DESC
       ) t
 WHERE t.rn <= 2
Note that this involves an inline view, which means there's going to be a pass over all the data in the table to create a "derived table". Then, the outer query will run against the derived table. So, this is essentially two passes over the data.
This query can be tweaked a bit to improve performance, by eliminating the duplicated value of the patient_id column returned by the inline view. But I show it as above, so we can better understand what is happening.
This approach can be rather expensive on large sets, but is generally MUCH more efficient than some of the other approaches.
Note also that this query will return a row for a patient_id if only one id value exists for that patient; it does not restrict the result to just those patients that have at least two rows.
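Since MySQL 8.0, the ranking that the user variables emulate can be written directly with a window function. The sketch below shows it with ROW_NUMBER() on the sample rows from the question, run through Python's standard-library sqlite3 module as a stand-in for MySQL; treat it as an illustration under that assumption rather than a tuned query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_patient "
    "(id INTEGER PRIMARY KEY, patient_ID INT, visit_ID INT, patient_result INT)"
)
conn.executemany(
    "INSERT INTO tbl_patient VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7), (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9)],
)

# Number each patient's rows from newest to oldest, then keep the first two.
last_two = conn.execute(
    """
    SELECT id, patient_ID, visit_ID, patient_result
    FROM (
        SELECT p.*,
               ROW_NUMBER() OVER (PARTITION BY patient_ID ORDER BY id DESC) AS rn
        FROM tbl_patient AS p
    ) AS ranked
    WHERE rn <= 2
    ORDER BY patient_ID, id DESC
    """
).fetchall()
print(last_two)
```

As with the user-variable version, a patient with a single visit still yields one row.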
It's also possible to get an equivalent resultset with a correlated subquery:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
WHERE ( SELECT COUNT(1) AS cnt
FROM tbl_patient p
WHERE p.patient_id = t.patient_id
AND p.id >= t.id
) <= 2
ORDER BY t.patient_id ASC, t.id ASC
Note that this is making use of a "dependent subquery", which basically means that for each row returned from t, MySQL is effectively running another query against the database. So, this will tend to be very expensive (in terms of elapsed time) on large sets.
As another approach, if there are relatively few id values for each patient, you might be able to get by with an inequality join:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
LEFT
JOIN tbl_patient p
ON p.patient_id = t.patient_id
AND t.id < p.id
GROUP
BY t.id
, t.patient_id
, t.visit_id
, t.patient_result
HAVING COUNT(p.id) < 2
Note that this will create a nearly Cartesian product for each patient. For a limited number of id values for each patient, this won't be too bad. But if a patient has hundreds of id values, the intermediate result can be huge, on the order of O(n**2).
Try this..
SELECT id, patient_result FROM tbl_patient AS tp WHERE id < ((SELECT MAX(id) FROM tbl_patient AS tp_max WHERE tp_max.patient_ID = tp.patient_ID) - 1) GROUP BY patient_ID
Why not use simply...
GROUP BY `patient_ID` DESC LIMIT 2
... and do the rest in the next step?

Get all the rows for the most recent 3 groups

I googled a bit and looked on SO but I didn't find anything that helped me.
I have a working MySQL query that selects some columns (across three tables, with two JOIN statements) and I am looking to do something extra on the result set.
I would like to SELECT all rows from the 3 most recent groups. (I can only assume I have to use a GROUP BY on that column) I'm having a hard time explaining this clearly so I'll use an example:
id | group
--------------
1 | 1
2 | 2
3 | 2
4 | 2
5 | 3
6 | 3
7 | 4
8 | 4
Of course, I dumbed it down a lot for the sake of simplicity (and my current query doesn't include an id column).
Right now my ideal query would return, in order (that's the id field):
8, 7, 6, 5, 4, 3, 2
If I were to add the following 9th element:
id | group
--------------
9 | 5
My ideal query would then return, in order:
9, 8, 7, 6, 5
Because these are all the rows from the 3 most recent groups. Also, when two rows have the same group (and are still in the result set), I would like to ORDER them BY another field (which I have not included in my dumbed down example).
In my search I only found how to do actions on elements of GROUPS (MAX of each, AVG of group elements, etc.) and not GROUPS themselves (first 3 groups ordered by a field).
Thank you in advance for your help!
Edit: Here is what my real query looks like.
SELECT t1.f1, t1.f2, t2.f1, t2.f2, t2.f3, t3.f1, t3.f2, t3.f3, t3.f4
FROM t1
LEFT JOIN t2 ON t2.f1=t1.f3
LEFT JOIN t3 ON t2.f1=t3.f5
WHERE t1.f4='some_constant' AND t2.f4='some_other_constant'
ORDER BY t1.f2 DESC
SELECT `table`.* FROM
(SELECT DISTINCT `group`
FROM `table`
ORDER BY `group` DESC LIMIT 3) t1
INNER JOIN `table` ON `table`.`group` = t1.`group`
the subquery should return the three groups with the largest value, the INNER JOIN will ensure no rows are included which do not have these group values.
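The join against the three largest group values can be checked on the sample data from the question. The sketch below uses Python's standard-library sqlite3 module as a stand-in for MySQL; the table name `items`, the column name `grp` (chosen to avoid quoting the reserved word `group`) and the helper `recent_ids` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, grp INT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 2), (4, 2), (5, 3), (6, 3), (7, 4), (8, 4)],
)

def recent_ids(connection):
    """Return the ids of all rows that belong to the 3 largest group values."""
    rows = connection.execute(
        """
        SELECT i.id
        FROM items AS i
        JOIN (
            SELECT DISTINCT grp
            FROM items
            ORDER BY grp DESC
            LIMIT 3
        ) AS latest ON latest.grp = i.grp
        ORDER BY i.grp DESC, i.id DESC
        """
    ).fetchall()
    return [row[0] for row in rows]

before = recent_ids(conn)                        # groups 4, 3 and 2
conn.execute("INSERT INTO items VALUES (9, 5)")  # the 9th element from the question
after = recent_ids(conn)                         # groups 5, 4 and 3
print(before, after)
```

The outer ORDER BY is where a second sort column would go for rows that share a group.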
Assuming t1.f2 is your group column:
SELECT a,b,c,d,e,f,g,h,i
FROM
(
SELECT t1.f1 as a, t1.f2 as b, t2.f1 as c, t2.f2 as d, t2.f3 as e, t3.f1 as f, t3.f2 as g, t3.f3 as h, t3.f4 as i
FROM t1
LEFT JOIN t2 ON t2.f1=t1.f3
LEFT JOIN t3 ON t2.f1=t3.f5
WHERE t1.f4='some_constant' AND t2.f4='some_other_constant'
ORDER BY t1.f2 DESC
) first_table
INNER JOIN
(
SELECT DISTINCT `f2`
FROM `t1`
ORDER BY `f2` DESC LIMIT 3
) second_table
ON first_table.b = second_table.f2
Note that this may be very inefficient depending on your table structure, but is the best I can do without more information.
How about this way? (I use groupId instead of 'group'.)
[QUERY] => something like (SELECT id, groupId from tables.....) (your query with 2 joins).
-- with this query you have the last three groups.
[QUERY2] => SELECT distinct(groupId) as groupId FROM ([QUERY]) ORDER BY groupId DESC LIMIT 0,3
and finally you will have:
SELECT id, groupId from tables----...... WHERE groupId in ([QUERY2]) order by groupId DESC, id DESC