I want to select N random rows from a table, but in all of these rows a specific value may only occur X times.
Table "reviews":
*--------------------*
| ID | CODE_REVIEWER |
*--------------------*
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
*--------------------*
Table "users" (I left out a lot of unimportant stuff):
*----*
| ID |
*----*
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
*----*
Example output:
For X = 3:
*-----------*
| REVIEWER |
*-----------*
| 4 |
| 1 |
| 5 |
*-----------*
For X = 2:
*-----------*
| REVIEWER |
*-----------*
| 1 |
| 5 |
| 3 |
*-----------*
For X = 1 (empty):
*-----------*
| REVIEWER |
*-----------*
So the result must be a ResultSet of a few IDs that differ from the given ID X, and each of these IDs may occur in "reviews" as a "code_reviewer" at most N times.
So everybody can be the "reviewer" for 3 people, and everybody can be reviewed by 3 people.
Thanks!
Edit:
This is what I got so far:
select newid from (select id, count(*) as num from (select * from users
where id != ?) as users group by id order by RAND() LIMIT ?) as sb
where num < 3 and newid not in (select code_reviewer from reviews where id = ?)
It works perfectly, apart from that it sometimes returns for example
*---*
| 2 |
| 1 |
| 2 |
*---*
(Contains the 2 twice, which shouldn't be so)
Unfortunately, I know MSSQL and not MySQL. I will try to answer using MSSQL, and hopefully that will lead you in the right direction.
I use variables to determine how many rows I should return, and then use a simple NEWID() to act as a randomizer. (It is my understanding that you would order by RAND() in MySQL instead of NEWID().)
declare @userId int
select @userId = 1
declare @existingReviewCount int
select @existingReviewCount = COUNT(*) from Reviews where Id = @userId
declare @requiredRowCount int
select @requiredRowCount = 3 - @existingReviewCount
select top (@requiredRowCount) Id from Users
where @userId != Id
order by NEWID()
With @userId set to 1, as above, this returns an empty set, because user 1 already has three reviewers.
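For anyone who wants to try the idea without an MSSQL server, here is a minimal sketch in Python using the standard sqlite3 module. SQLite is only a stand-in here (it uses RANDOM() and LIMIT where MySQL would use RAND() and LIMIT), and the function name pick_reviewers and the target parameter are my own, not part of the question. Like the query above, it does not yet exclude people who already review the user.

```python
import sqlite3

def pick_reviewers(conn, user_id, target=3):
    """Return up to (target - existing reviews) random user ids, excluding user_id."""
    # Step 1: how many reviewers does this user already have?
    existing = conn.execute(
        "SELECT COUNT(*) FROM reviews WHERE id = ?", (user_id,)
    ).fetchone()[0]
    needed = max(target - existing, 0)
    # Step 2: pick that many random users other than the user themselves.
    rows = conn.execute(
        "SELECT id FROM users WHERE id <> ? ORDER BY RANDOM() LIMIT ?",
        (user_id, needed),
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE reviews (id INTEGER, code_reviewer INTEGER);
    INSERT INTO users (id) VALUES (1), (2), (3), (4), (5);
    INSERT INTO reviews (id, code_reviewer) VALUES (1, 2), (1, 3), (1, 4);
""")

print(pick_reviewers(conn, 1))  # user 1 already has three reviewers
print(pick_reviewers(conn, 2))  # three random ids other than 2
```

Because each user id appears only once in users, the result cannot contain the same reviewer twice.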
This seems to be essentially a top-n-per-group problem, and there are a few ways to solve that. Here is a quick and dirty way that gives you a comma-separated list of the IDs you need. If you are happy to just explode that list in your code, you are good to go.
select u.*,
-- r_counts.cnt as reviews_count,
substring_index(
group_concat(u_rev.id order by rand()),
',',
greatest(3-r_counts.cnt,0)) as reviewers
from users u
join users u_rev on u.id != u_rev.id
left join (
select u.id, count(r.id) as cnt
from users u
left join reviews r on u.id = r.id
group by u.id
) r_counts on r_counts.id = u.id
left join (
select u.id, count(r.id) as cnt
from users u
left join reviews r on u.id = r.code_reviewer
group by u.id, r.code_reviewer
) as did_review_counts
on did_review_counts.id = u_rev.id
where u.id = 11
and did_review_counts.cnt < 3
group by u.id;
If you need the results in another form, google "top n per group mysql" and check out some of the solutions there.
Note: the 3 above is your review number target. Edit: this needs to be run for one user at a time, then rerun after each review is done.
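If you would rather get rows back than a comma-separated string, the same rules can be sketched as follows. This is Python with sqlite3 standing in for MySQL; the function name eligible_reviewers, the cap parameter and the two extra review rows (2, 3) and (4, 3) are made up for the demo so that reviewer 3 is already at the limit.

```python
import sqlite3

def eligible_reviewers(conn, user_id, cap=3):
    """Random reviewers for user_id, respecting both caps described in the question."""
    already = conn.execute(
        "SELECT COUNT(*) FROM reviews WHERE id = ?", (user_id,)
    ).fetchone()[0]
    needed = max(cap - already, 0)
    rows = conn.execute(
        """
        SELECT u.id
        FROM users u
        WHERE u.id <> :uid
          -- skip people who already review this user
          AND u.id NOT IN (SELECT code_reviewer FROM reviews WHERE id = :uid)
          -- skip people who already review `cap` users
          AND (SELECT COUNT(*) FROM reviews r WHERE r.code_reviewer = u.id) < :cap
        ORDER BY RANDOM()
        LIMIT :needed
        """,
        {"uid": user_id, "cap": cap, "needed": needed},
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE reviews (id INTEGER, code_reviewer INTEGER);
    INSERT INTO users (id) VALUES (1), (2), (3), (4), (5);
    -- (2, 3) and (4, 3) are hypothetical rows: reviewer 3 now has three assignments
    INSERT INTO reviews (id, code_reviewer)
    VALUES (1, 2), (1, 3), (1, 4), (2, 3), (4, 3);
""")

print(sorted(eligible_reviewers(conn, 5)))  # everyone except 5 itself and the saturated 3
print(eligible_reviewers(conn, 1))          # user 1 is already fully reviewed
```

As noted above, this still handles one user per call, so it has to be rerun after each assignment is stored.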
I have a table that holds the answers to a question which is asked at entry to the system, at review periods and then at closure. The client can be opened and closed multiple times during their life on the system.
I am trying to get the latest 'entry' result from the table which also has either an associated 'review' or 'close' result.
This is my table (I have just included 1 user but the actual table has thousands of users):
row | user_id | answer | type | date_entered |
----+---------+--------+--------+--------------+
1 | 12 | 3 | entry | 2016-03-13 |
2 | 12 | 1 | review | 2016-03-14 |
3 | 12 | 7 | review | 2016-03-16 |
4 | 12 | 7 | close | 2016-03-17 |
5 | 12 | 8 | entry | 2016-03-20 |
6 | 12 | 2 | review | 2016-03-21 |
7 | 12 | 3 | close | 2016-03-22 |
8 | 12 | 1 | entry | 2016-03-28 |
So for this table the query would return only row 5, because the 'entry' on row 8 doesn't have any 'review' or 'close' records after it.
Hopefully that makes sense.
SELECT a.*
FROM my_table a
JOIN
( SELECT x.user_id
, MAX(x.date_entered) date_entered
FROM my_table x
JOIN my_table y
ON y.user_id = x.user_id
AND y.date_entered > x.date_entered
AND y.type IN ('review','close')
WHERE x.type = 'entry'
GROUP
BY x.user_id
) b
ON b.user_id = a.user_id
AND b.date_entered = a.date_entered;
Basically you can separate your query into two sub-queries. The first query should get the latest record id (review or close). The second query should have row_id > found_id.
SELECT *
FROM my_table
WHERE type = 'entry'
AND row_id > (SELECT Max(row_id)
FROM my_table
WHERE ( type = 'review'
OR type = 'close' ))
Please be careful about that: the subquery may match no rows, in which case MAX() yields NULL and the outer comparison filters out every row.
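To illustrate the empty-subquery caveat, here is a small sketch in Python with sqlite3 (used as a stand-in for MySQL; MAX() over no rows is NULL in both). The two-row table is hypothetical and deliberately contains no 'review' or 'close' rows. Wrapping the subquery in COALESCE is one possible guard, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (row_id INTEGER PRIMARY KEY, type TEXT);
    -- hypothetical data: entries only, no review/close rows
    INSERT INTO my_table VALUES (1, 'entry'), (2, 'entry');
""")

# MAX() over zero rows is NULL, and "row_id > NULL" is never true.
unguarded = conn.execute("""
    SELECT row_id FROM my_table
    WHERE type = 'entry'
      AND row_id > (SELECT MAX(row_id) FROM my_table
                    WHERE type IN ('review', 'close'))
    ORDER BY row_id
""").fetchall()

# COALESCE substitutes 0 when the subquery finds nothing.
guarded = conn.execute("""
    SELECT row_id FROM my_table
    WHERE type = 'entry'
      AND row_id > COALESCE((SELECT MAX(row_id) FROM my_table
                             WHERE type IN ('review', 'close')), 0)
    ORDER BY row_id
""").fetchall()

print(unguarded)
print(guarded)
```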
I could think of several ways of doing it. But first a note: your date_entered field seems to be just a date. To tell which occurs "later" I'm going to use row because e.g. if both entry and review occurred on the same date, it's not possible to tell from the date_entered which one was later.
I just list a couple of solutions. The first one might be more efficient, but you should measure.
Here's a join against a subquery:
SELECT
m1.*
FROM
mytable m1
JOIN (SELECT
row, user_id
FROM
mytable
WHERE
type IN ('review', 'close') AND
user_id = 12
ORDER BY row DESC LIMIT 1) m2 ON m1.user_id = m2.user_id
WHERE
m1.user_id = 12 AND
m1.type = 'entry' AND
m1.row < m2.row
ORDER BY
m1.row DESC LIMIT 1
Here's a subquery for max:
SELECT
*
FROM
mytable
WHERE
row = (SELECT
MAX(m1.row)
FROM
mytable m1,
mytable m2
WHERE
m1.user_id = m2.user_id AND
m1.type = 'entry' AND
m2.type IN ('review', 'close') AND
m1.row < m2.row)
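To check the "latest entry that has a later review or close" logic against the sample data from the question, here is a runnable sketch in Python with sqlite3 as a stand-in for MySQL. The column name row_id is my substitute for the question's row column, and the query groups per user so it is not tied to user 12.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (
        row_id INTEGER PRIMARY KEY,  -- stands in for the "row" column
        user_id INTEGER, answer INTEGER, type TEXT, date_entered TEXT
    );
    INSERT INTO my_table VALUES
        (1, 12, 3, 'entry',  '2016-03-13'),
        (2, 12, 1, 'review', '2016-03-14'),
        (3, 12, 7, 'review', '2016-03-16'),
        (4, 12, 7, 'close',  '2016-03-17'),
        (5, 12, 8, 'entry',  '2016-03-20'),
        (6, 12, 2, 'review', '2016-03-21'),
        (7, 12, 3, 'close',  '2016-03-22'),
        (8, 12, 1, 'entry',  '2016-03-28');
""")

latest_followed_entry = conn.execute("""
    SELECT * FROM my_table
    WHERE row_id IN (
        -- per user: the newest entry that has a later review/close row
        SELECT MAX(e.row_id)
        FROM my_table e
        JOIN my_table f
          ON f.user_id = e.user_id
         AND f.row_id > e.row_id
         AND f.type IN ('review', 'close')
        WHERE e.type = 'entry'
        GROUP BY e.user_id
    )
""").fetchall()

print(latest_followed_entry)
```

Row 8 is skipped because nothing follows it, and row 1 loses to row 5 in the MAX().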
This question is based on: Select row from left join table where multiple conditions are true
I am now trying to select rows from Table 1, which do not have a connection in Table 2 to a certain property ID.
These are the tables:
Table 1
| ID | Name |
| 1 | test |
| 2 | hello |
Table 2
| ID | PropertyID |
| 1 | 3 |
| 1 | 6 |
| 1 | 7 |
| 2 | 6 |
| 2 | 1 |
I am using the following query (which is working with '='):
SELECT tab1ID
FROM table2
WHERE propertyID != 3 OR propertyID = 6
GROUP BY tab1ID
HAVING COUNT(*) = 2;
This query should return ID=2, but it returns zero rows. What am I doing wrong?
Any help is greatly appreciated!
Edit: I had given a MWE but this is my actual query:
SELECT transactionline.total FROM transactionline
LEFT JOIN product_variant ON product_variant.SKU = transactionline.SKU
LEFT JOIN product ON product_variant.productID = product.productID
LEFT JOIN connect_option_product ON connect_option_product.productID = product.productID
LEFT JOIN productattribute_option ON productattribute_option.optionID = connect_option_product.optionID
WHERE productattribute_option.optionID = 4 OR productattribute_option.optionID = 9
GROUP BY transactionline.lineID
HAVING COUNT(*) = 1
AND SUM(productattribute_option.optionID = 4) = 0
AND SUM(productattribute_option.optionID = 9) > 0
A product can have multiple connections to the optionID's. The goal of this query is to select the total amount where some filters are true or false.
Your grouping is correct. But you need to count how many times the value you do not want is in your group. That count must be zero.
SELECT tab1ID
FROM table2
GROUP BY tab1ID
HAVING sum(propertyID = 6) > 0
AND sum(propertyID = 3) = 0
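Here is the same conditional-count idea run against the sample rows from the question, as a sketch in Python with sqlite3 standing in for MySQL. I spelled the counts with CASE, which is the portable form of sum(propertyID = 6); the table and column names follow the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (tab1ID INTEGER, propertyID INTEGER);
    INSERT INTO table2 VALUES (1, 3), (1, 6), (1, 7), (2, 6), (2, 1);
""")

rows = conn.execute("""
    SELECT tab1ID
    FROM table2
    GROUP BY tab1ID
    -- the group must contain property 6 at least once ...
    HAVING SUM(CASE WHEN propertyID = 6 THEN 1 ELSE 0 END) > 0
    -- ... and property 3 not at all
       AND SUM(CASE WHEN propertyID = 3 THEN 1 ELSE 0 END) = 0
    ORDER BY tab1ID
""").fetchall()

print(rows)
```

The important difference to the original attempt is that no WHERE clause throws rows away before grouping, so the unwanted property is still visible when the group is counted.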
I'm trying to get the hang of NOT EXISTS and am having some trouble.
Say I have two tables.
Employees:
+------+------+
| eid | name |
+------+------+
| 1 | Bob |
| 2 | Alice|
| 3 | Jill |
+------+------+
Transactions:
+----------+----------+----------+-----------+
| tid | eid | type | amount |
+----------+----------+----------+-----------+
| 1 | 1 | Deposit | 50 |
| 2 | 1 | Open | 500 |
| 3 | 3 | Open | 200 |
| 4 | 2 | Withdraw | 25 |
| 5 | 2 | Open | 100 |
+----------+----------+----------+-----------+
Let's say I want to find the names of all the employees that have not opened any account with the amount of $250 or higher. This means that I only want the rows where an employee has opened an account of amount < $250.
Right now I have something like this...
SELECT name FROM Employees e
WHERE NOT EXISTS (
SELECT * FROM Transactions t
WHERE t.type <> 'Open' AND t.amount >= 250 AND t.eid = e.eid);
This is obviously wrong and I don't really understand why.
You need to combine an EXISTS with a NOT EXISTS since you "only want the rows where an employee has opened an account of amount < $250.":
SELECT name FROM Employees e
WHERE EXISTS (
SELECT 1 FROM Transactions t
WHERE t.amount < 250 AND t.type='Open' AND t.eid = e.eid)
AND NOT EXISTS (
SELECT 1 FROM Transactions t
WHERE t.amount >= 250 AND t.eid = e.eid);
You need the EXISTS to ensure that only employees who have opened an account with amount < 250 are returned at all. The NOT EXISTS is required to ensure that no employees are included who also have transactions with amount >= 250.
Here's a sql-fiddle demo
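In case the fiddle is not reachable, the combined EXISTS / NOT EXISTS query can be tried locally with this sketch (Python with sqlite3 as a stand-in for MySQL, loaded with the sample rows from the question). The ORDER BY is my addition so the output is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (eid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Transactions (
        tid INTEGER PRIMARY KEY, eid INTEGER, type TEXT, amount INTEGER
    );
    INSERT INTO Employees VALUES (1, 'Bob'), (2, 'Alice'), (3, 'Jill');
    INSERT INTO Transactions VALUES
        (1, 1, 'Deposit',  50),
        (2, 1, 'Open',     500),
        (3, 3, 'Open',     200),
        (4, 2, 'Withdraw', 25),
        (5, 2, 'Open',     100);
""")

names = [row[0] for row in conn.execute("""
    SELECT name FROM Employees e
    WHERE EXISTS (          -- has opened something below 250
        SELECT 1 FROM Transactions t
        WHERE t.amount < 250 AND t.type = 'Open' AND t.eid = e.eid)
      AND NOT EXISTS (      -- and has nothing at 250 or above
        SELECT 1 FROM Transactions t
        WHERE t.amount >= 250 AND t.eid = e.eid)
    ORDER BY e.eid
""")]

print(names)
```

Bob drops out because of his 500 opening, which is exactly what the NOT EXISTS part is for.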
The only issue I see is that you've used <> for the transaction type instead of =:
SELECT name FROM Employees e
WHERE NOT EXISTS (
SELECT null FROM Transactions t
WHERE t.type = 'Open' AND t.amount >= 250 AND t.eid = e.eid);
After you edited your question the answer would be:
SELECT name FROM Employees e
WHERE EXISTS (
SELECT null FROM Transactions t
WHERE t.type = 'Open' AND t.amount < 250 AND t.eid = e.eid);
I'd recommend using a LEFT JOIN instead of a sub select.
SELECT name FROM Employees e
LEFT JOIN Transactions t
ON e.eid = t.eid
WHERE t.tid IS NULL
OR t.type <> 'Open'
OR t.amount < 250;
This will join all transaction records, and then only include records where a transaction does not exist, the user has a non-open transaction, or the amount doesn't meet the required $250.
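A related way to phrase "has not opened any account of $250 or more" with a LEFT JOIN is the anti-join pattern: put the conditions describing the unwanted transaction into the ON clause and keep only employees for whom nothing matched. The sketch below is a variation on the query above, not the same query, and uses Python with sqlite3 as a stand-in for MySQL with the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (eid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Transactions (
        tid INTEGER PRIMARY KEY, eid INTEGER, type TEXT, amount INTEGER
    );
    INSERT INTO Employees VALUES (1, 'Bob'), (2, 'Alice'), (3, 'Jill');
    INSERT INTO Transactions VALUES
        (1, 1, 'Deposit',  50),
        (2, 1, 'Open',     500),
        (3, 3, 'Open',     200),
        (4, 2, 'Withdraw', 25),
        (5, 2, 'Open',     100);
""")

names = [row[0] for row in conn.execute("""
    SELECT e.name
    FROM Employees e
    LEFT JOIN Transactions t
           ON t.eid = e.eid
          AND t.type = 'Open'
          AND t.amount >= 250     -- the transaction we do NOT want to find
    WHERE t.tid IS NULL           -- keep employees with no such match
    ORDER BY e.eid
""")]

print(names)
```

Note that this form also returns employees who have no transactions at all, and it returns each employee at most once because only unmatched rows survive.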
I have two select statements joined by UNION ALL. In the first statement a where clause gathers only rows that have been shown previously to the user. The second statement gathers all rows that haven't been shown to the user, therefore I end up with the viewed results first and non-viewed results after.
Of course this could simply be achieved with the same select statement using a simple ORDER BY, however the reason for two separate selects is simple after you realize what I hope to accomplish.
Consider the following structure and data.
+----+------+-----+--------+------+
| id | from | to | viewed | data |
+----+------+-----+--------+------+
| 1 | 1 | 10 | true | .... |
| 2 | 10 | 1 | true | .... |
| 3 | 1 | 10 | true | .... |
| 4 | 6 | 8 | true | .... |
| 5 | 1 | 10 | true | .... |
| 6 | 10 | 1 | true | .... |
| 7 | 8 | 6 | true | .... |
| 8 | 10 | 1 | true | .... |
| 9 | 6 | 8 | true | .... |
| 10 | 2 | 3 | true | .... |
| 11 | 1 | 10 | true | .... |
| 12 | 8 | 6 | true | .... |
| 13 | 10 | 1 | false | .... |
| 14 | 1 | 10 | false | .... |
| 15 | 6 | 8 | false | .... |
| 16 | 10 | 1 | false | .... |
| 17 | 8 | 6 | false | .... |
| 18 | 3 | 2 | false | .... |
+----+------+-----+--------+------+
Basically I wish all non-viewed rows to be selected by the statement. That is accomplished by checking whether the viewed column is true or false, which is pretty simple and straightforward, nothing to worry about here.
However when it comes to the rows already viewed, meaning the column viewed is TRUE, for those records I only want 3 rows to be returned for each group.
The appropriate result in this instance should be the 3 most recent rows of each group.
+----+------+-----+--------+------+
| id | from | to | viewed | data |
+----+------+-----+--------+------+
| 6 | 10 | 1 | true | .... |
| 7 | 8 | 6 | true | .... |
| 8 | 10 | 1 | true | .... |
| 9 | 6 | 8 | true | .... |
| 10 | 2 | 3 | true | .... |
| 11 | 1 | 10 | true | .... |
| 12 | 8 | 6 | true | .... |
+----+------+-----+--------+------+
As you can see from the ideal result set, we have three groups. Therefore the desired query for the viewed results should show a maximum of 3 rows for each grouping it finds. In this case these groupings were 10 with 1 and 8 with 6, both of which had three rows to be shown, while the other group, 2 with 3, only had one row to be shown.
Please note that where from = x and to = y, makes the same grouping as if it was from = y and to = x. Therefore considering the first grouping (10 with 1), from = 10 and to = 1 is the same group if it was from = 1 and to = 10.
However, the whole table contains plenty of groups, and I only want the 3 most recent rows of each returned by the select statement. That's my problem: I'm not sure how to accomplish this in the most efficient way possible, considering the table will have hundreds if not thousands of records at some point.
Thanks for your help.
Note: The columns id, from, to and viewed are indexed, that should help with performance.
PS: I'm unsure on how to name this question exactly, if you have a better idea, be my guest and edit the title.
What a hairball! This gets progressively harder as you move from most recent, to second most recent, to third most recent.
Let's put this together by getting the list of IDs we need. Then we can pull the items from the table by ID.
This relatively easy query gets you the ids of your most recent items:
SELECT id FROM
(SELECT max(id) id, fromitem, toitem
FROM stuff
WHERE viewed = 'true'
GROUP BY fromitem, toitem
)a
Fiddle: http://sqlfiddle.com/#!2/f7045/27/0
Next, we need to get the ids of the second most recent items. To do this, we need a self-join style query. We need to do the same summary but on a virtual table that omits the most recent items.
select id from (
select max(b.id) id, b.fromitem, b.toitem
from stuff a
join
(select id, fromitem, toitem
from stuff
where viewed = 'true'
) b on ( a.fromitem = b.fromitem
and a.toitem = b.toitem
and b.id < a.id)
where a.viewed = 'true'
group by fromitem, toitem
)c
Fiddle: http://sqlfiddle.com/#!2/f7045/44/0
Finally, we need to get the ids of the third most recent items. Mercy! We need to join that query we just had, to the table again.
select id from
(
select max(d.id) id, d.fromitem, d.toitem
from stuff d
join
(
select max(b.id) id, b.fromitem, b.toitem
from stuff a
join
(
select id, fromitem, toitem
from stuff
where viewed = 'true'
) b on ( a.fromitem = b.fromitem
and a.toitem = b.toitem
and b.id < a.id)
where a.viewed = 'true'
group by fromitem, toitem
) c on ( d.fromitem = c.fromitem
and d.toitem = c.toitem
and d.id < c.id)
where d.viewed='true'
group by d.fromitem, d.toitem
) e
Fiddle: http://sqlfiddle.com/#!2/f7045/45/0
So, now we take the union of all those ids, and use them to grab the right rows from the table, and we're done.
SELECT *
FROM STUFF
WHERE ID IN
(
SELECT id FROM
(SELECT max(id) id, fromitem, toitem
FROM stuff
WHERE viewed = 'true'
GROUP BY fromitem, toitem
)a
UNION
select id from (
select max(b.id) id, b.fromitem, b.toitem
from stuff a
join
(select id, fromitem, toitem
from stuff
where viewed = 'true'
) b on ( a.fromitem = b.fromitem
and a.toitem = b.toitem
and b.id < a.id)
where a.viewed = 'true'
group by fromitem, toitem
)c
UNION
select id from
(
select max(d.id) id, d.fromitem, d.toitem
from stuff d
join
(
select max(b.id) id, b.fromitem, b.toitem
from stuff a
join
(
select id, fromitem, toitem
from stuff
where viewed = 'true'
) b on ( a.fromitem = b.fromitem
and a.toitem = b.toitem
and b.id < a.id)
where a.viewed = 'true'
group by fromitem, toitem
) c on ( d.fromitem = c.fromitem
and d.toitem = c.toitem
and d.id < c.id)
where d.viewed='true'
group by d.fromitem, d.toitem
) e
UNION
select id from stuff where viewed='false'
)
order by viewed desc, fromitem, toitem, id desc
Tee hee. Too much SQL. Fiddle: http://sqlfiddle.com/#!2/f7045/47/0
And now, we need to cope with your last requirement, the requirement that your graph is unordered. That is, that from=n to=m is the same as from=m to=n.
To do this we need a virtual table instead of the physical table. This will do the trick.
SELECT id, least(fromitem, toitem) fromitem, greatest(fromitem,toitem) toitem, data
FROM stuff
Now we need to use this virtual table, this view, everywhere the physical table used to appear. Let's use a view to do this.
CREATE VIEW STUFF_UNORDERED
AS
SELECT id,
LEAST(fromitem, toitem) fromitem,
GREATEST(fromitem, toitem) toitem,
viewed,
data
FROM stuff;
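To see what the pair normalisation buys you, here is a small sketch in Python with sqlite3 as a stand-in for MySQL. SQLite has no LEAST/GREATEST, so the two-argument scalar MIN(a, b) and MAX(a, b) are used instead; that substitution and the alias names lo and hi are mine. The rows are the sample data from the question (without the data column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stuff (id INTEGER PRIMARY KEY, fromitem INTEGER,
                        toitem INTEGER, viewed TEXT);
    INSERT INTO stuff VALUES
        (1, 1, 10, 'true'),  (2, 10, 1, 'true'),  (3, 1, 10, 'true'),
        (4, 6, 8, 'true'),   (5, 1, 10, 'true'),  (6, 10, 1, 'true'),
        (7, 8, 6, 'true'),   (8, 10, 1, 'true'),  (9, 6, 8, 'true'),
        (10, 2, 3, 'true'),  (11, 1, 10, 'true'), (12, 8, 6, 'true'),
        (13, 10, 1, 'false'), (14, 1, 10, 'false'), (15, 6, 8, 'false'),
        (16, 10, 1, 'false'), (17, 8, 6, 'false'),  (18, 3, 2, 'false');
""")

pair_counts = conn.execute("""
    SELECT lo, hi, COUNT(*) AS n
    FROM (
        -- smaller id first, larger id second: (10, 1) and (1, 10) coincide
        SELECT MIN(fromitem, toitem) AS lo,
               MAX(fromitem, toitem) AS hi
        FROM stuff
        WHERE viewed = 'true'
    ) AS unordered
    GROUP BY lo, hi
    ORDER BY lo, hi
""").fetchall()

print(pair_counts)
```

The viewed rows collapse into three groups, which matches the three groups described in the question.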
So, our ultimate query is:
SELECT *
FROM stuff
WHERE ID IN
(
SELECT id FROM
(SELECT max(id) id, fromitem, toitem
FROM STUFF_UNORDERED
WHERE viewed = 'true'
GROUP BY fromitem, toitem
)a
UNION
SELECT id FROM (
SELECT max(b.id) id, b.fromitem, b.toitem
FROM STUFF_UNORDERED a
JOIN
(SELECT id, fromitem, toitem
FROM STUFF_UNORDERED
WHERE viewed = 'true'
) b ON ( a.fromitem = b.fromitem
AND a.toitem = b.toitem
AND b.id < a.id)
WHERE a.viewed = 'true'
GROUP BY fromitem, toitem
)c
UNION
SELECT id FROM
(
SELECT max(d.id) id, d.fromitem, d.toitem
FROM STUFF_UNORDERED d
JOIN
(
SELECT max(b.id) id, b.fromitem, b.toitem
FROM STUFF_UNORDERED a
JOIN
(
SELECT id, fromitem, toitem
FROM STUFF_UNORDERED
WHERE viewed = 'true'
) b ON ( a.fromitem = b.fromitem
AND a.toitem = b.toitem
AND b.id < a.id)
WHERE a.viewed = 'true'
GROUP BY fromitem, toitem
) c ON ( d.fromitem = c.fromitem
AND d.toitem = c.toitem
AND d.id < c.id)
WHERE d.viewed='true'
GROUP BY d.fromitem, d.toitem
) e
UNION
SELECT id FROM STUFF_UNORDERED WHERE viewed='false'
)
ORDER BY viewed DESC,
least(fromitem, toitem),
greatest(fromitem, toitem),
id DESC
Fiddle: http://sqlfiddle.com/#!2/8c154/4/0
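If the nested unions get too unwieldy, a different technique (not the one used above) is a correlated count: keep a viewed row only if fewer than 3 newer viewed rows exist for the same unordered pair. Below is a minimal sketch in Python with sqlite3 standing in for MySQL; scalar MIN(a, b) / MAX(a, b) replace LEAST / GREATEST, and the data is the sample from the question. I have not measured how it performs on a large table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stuff (id INTEGER PRIMARY KEY, fromitem INTEGER,
                        toitem INTEGER, viewed TEXT);
    INSERT INTO stuff VALUES
        (1, 1, 10, 'true'),  (2, 10, 1, 'true'),  (3, 1, 10, 'true'),
        (4, 6, 8, 'true'),   (5, 1, 10, 'true'),  (6, 10, 1, 'true'),
        (7, 8, 6, 'true'),   (8, 10, 1, 'true'),  (9, 6, 8, 'true'),
        (10, 2, 3, 'true'),  (11, 1, 10, 'true'), (12, 8, 6, 'true'),
        (13, 10, 1, 'false'), (14, 1, 10, 'false'), (15, 6, 8, 'false'),
        (16, 10, 1, 'false'), (17, 8, 6, 'false'),  (18, 3, 2, 'false');
""")

ids = [row[0] for row in conn.execute("""
    SELECT s.id
    FROM stuff s
    WHERE s.viewed = 'false'      -- every unviewed row is kept
       OR (SELECT COUNT(*)        -- viewed rows: count newer rows of the pair
           FROM stuff n
           WHERE n.viewed = 'true'
             AND MIN(n.fromitem, n.toitem) = MIN(s.fromitem, s.toitem)
             AND MAX(n.fromitem, n.toitem) = MAX(s.fromitem, s.toitem)
             AND n.id > s.id) < 3
    ORDER BY s.id
""")]

viewed_ids = [i for i in ids if i <= 12]    # ids 1-12 are the viewed rows
unviewed_ids = [i for i in ids if i > 12]
print(viewed_ids)
print(unviewed_ids)
```

The viewed ids that come back are the same seven rows listed in the desired result in the question.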