I have a Table A as below
id (integer)
follow_up (integer, days under observation)
matched_id (integer)
id ; follow_up ; matched_id
1 ; 10 ; 19
1 ; 10 ; 20
1 ; 10 ; 21
2 ; 5 ; 22
2 ; 5 ; 23
2 ; 5 ; 24
2 ; 5 ; 19
2 ; 5 ; 20
3 ; 6 ; 25
3 ; 6 ; 26
3 ; 6 ; 27
4 ; 7 ; 19
4 ; 7 ; 28
4 ; 7 ; 29
I would like to limit the result to 2 records per id. The records should be picked at random, and each matched_id should be exclusive to one id. For example:
If matched_id 19 and 20 were given to id 1, then 19 and 20 should not be given to id 2.
If matched_id 19 was given to id 1, then 19 should not be given to id 4.
And so on for the rest of the table.
Required output:
id ; follow_up ; matched_id
1 ; 10 ; 19
1 ; 10 ; 20
2 ; 5 ; 22
2 ; 5 ; 23
3 ; 6 ; 25
3 ; 6 ; 26
4 ; 7 ; 28
4 ; 7 ; 29
Please help me. Thank you so much!
This is a very good SQL question.
You have a very challenging set of requirements:
1. No matched_id should appear more than once in the result set
2. No id should be given more than two matches
3. The matching should be random
We will stick to a pure SQL solution, assuming that you can't return, say, a larger result set and do some filtering using business logic in your implementation language.
First, let's tackle random assignment. Randomly ordering items inside of groups is a fun question. I decided to tackle it by ordering on a SHA1 hash of the data in the row (id, follow_up, matched_id), which will give a repeatable result with a feeling of randomness. (This would be best if there were a column that contained the date/time created or modified.)
SELECT * FROM
(
SELECT
a.id,
a.follow_up,
a.matched_id,
a.rank_hash,
count(*) rank
FROM
(SELECT *, SHA1(CONCAT(id, follow_up, matched_id)) rank_hash FROM TableA) a
JOIN
(SELECT *, SHA1(CONCAT(id, follow_up, matched_id)) rank_hash FROM TableA) b
ON a.rank_hash >= b.rank_hash
AND a.id = b.id
GROUP BY a.id, a.matched_id
ORDER BY a.id, rank
) groups
WHERE rank <= 2
GROUP BY matched_id
This might suffice for your use case if there are sufficient matched_id values for each id. But what if there is a hidden fourth requirement:
4. If possible, an ID should receive a match.
In other words, what if, as a result of random shuffling, a matched_id was assigned to an id that had several other matches, but further down the result set it was the only match for an id? An optimal solution in which every ID were matched with a matched_id was possible, but it never happened because all the matched_ids were used up earlier in the process?
For example:
CREATE TABLE TableA
(`id` int, `follow_up` int, `matched_id` varchar(1))
;
INSERT INTO TableA
(`id`, `follow_up`, `matched_id`)
VALUES
(1, 10, 'A'),
(1, 10, 'B'),
(1, 10, 'C'),
(2, 5, 'D'),
(2, 5, 'E'),
(2, 5, 'F'),
(3, 5, 'C')
;
In the above set, if IDs and their matches are assigned randomly, if ID 1 gets assigned matched_id C, then ID 3 will not get a matched_id at all.
What if we first find out how many times each matched_id occurs, and order by that frequency?
SELECT
a.*,
frequency
FROM TableA a
JOIN
( SELECT
matched_id,
count(*) frequency
FROM
TableA
GROUP BY matched_id
) b
ON a.matched_id = b.matched_id
GROUP BY a.matched_id
ORDER BY b.frequency
This is where a middleman programming language might come in handy to help limit the result set.
But note that we also lost our requirement of randomness! As you can see, a pure SQL solution might get pretty ugly. It is indeed possible by combining the techniques outlined above.
Hopefully this will get your imagination firing.
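The ideas above (shuffle within each id, let the ids with the fewest candidates choose first, never reuse a matched_id) can be sketched in a middleman language. The Python below is only an illustration under assumptions: the function name `assign_matches`, its `per_id` and `seed` parameters and the in-memory `TABLE_A` list are hypothetical and not part of the answer above, and the greedy order is a heuristic that does not guarantee the best possible assignment for every data set.

```python
import random
from collections import defaultdict

# Rows of Table A from the question: (id, follow_up, matched_id)
TABLE_A = [
    (1, 10, 19), (1, 10, 20), (1, 10, 21),
    (2, 5, 22), (2, 5, 23), (2, 5, 24), (2, 5, 19), (2, 5, 20),
    (3, 6, 25), (3, 6, 26), (3, 6, 27),
    (4, 7, 19), (4, 7, 28), (4, 7, 29),
]

def assign_matches(rows, per_id=2, seed=None):
    """Greedy sketch: give each id up to per_id random, unused matched_ids."""
    rng = random.Random(seed)
    candidates = defaultdict(list)
    for id_, follow_up, matched_id in rows:
        candidates[id_].append((follow_up, matched_id))

    used = set()      # matched_ids already handed out
    result = []
    # Ids with the fewest candidates choose first (the "hidden" requirement 4).
    for id_ in sorted(candidates, key=lambda k: (len(candidates[k]), k)):
        options = candidates[id_][:]
        rng.shuffle(options)          # random order within the id
        taken = 0
        for follow_up, matched_id in options:
            if taken == per_id:
                break
            if matched_id in used:
                continue              # exclusive: skip matches given to another id
            used.add(matched_id)
            result.append((id_, follow_up, matched_id))
            taken += 1
    return sorted(result)

for row in assign_matches(TABLE_A, seed=7):
    print(row)
```

With the question's data every id ends up with two distinct matches, whichever seed is used, because each id has at least two candidates that no other id can take.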
You can achieve this with RAND() and MySQL user-defined variables:
SELECT
t.id,
t.follow_up,
t.matched_id
FROM
(
SELECT
randomTable.*,
IF(@sameID = id, @rn := @rn + 1,
IF(@sameID := id, @rn := 1, @rn := 1)
) AS rowNumber
FROM
(
SELECT
*
FROM tableA
ORDER BY id, RAND()
) AS randomTable
CROSS JOIN (SELECT @sameID := 0, @rn := 0) var
) AS t
WHERE t.rowNumber <= 2
ORDER BY t.id
See Demo
Here's a solution for the specific problem given. It does not scale!
SELECT *
FROM
( SELECT a.matched_id m1
, b.matched_id m2
, c.matched_id m3
, d.matched_id m4
FROM my_table a
JOIN my_table b
ON b.matched_id NOT IN(a.matched_id)
JOIN my_table c
ON c.matched_id NOT IN(a.matched_id,b.matched_id)
JOIN my_table d
ON d.matched_id NOT IN(a.matched_id,b.matched_id,c.matched_id)
WHERE a.id = 1
AND b.id = 2
AND c.id = 3
AND d.id = 4
) x
JOIN
( SELECT a.matched_id n1
, b.matched_id n2
, c.matched_id n3
, d.matched_id n4
FROM my_table a
JOIN my_table b
ON b.matched_id NOT IN(a.matched_id)
JOIN my_table c
ON c.matched_id NOT IN(a.matched_id,b.matched_id)
JOIN my_table d
ON d.matched_id NOT IN(a.matched_id,b.matched_id,c.matched_id)
WHERE a.id = 1
AND b.id = 2
AND c.id = 3
AND d.id = 4
) y
ON y.n1 NOT IN(x.m1,x.m2,x.m3,x.m4)
AND y.n2 NOT IN(x.m1,x.m2,x.m3,x.m4)
AND y.n3 NOT IN(x.m1,x.m2,x.m3,x.m4)
AND y.n4 NOT IN(x.m1,x.m2,x.m3,x.m4)
ORDER
BY RAND() LIMIT 1;
+----+----+----+----+----+----+----+----+
| m1 | m2 | m3 | m4 | n1 | n2 | n3 | n4 |
+----+----+----+----+----+----+----+----+
| 20 | 24 | 27 | 29 | 21 | 23 | 26 | 28 |
+----+----+----+----+----+----+----+----+
So, in this example, the pairs are:
id1: 20,21
id2: 24,23
id3: 27,26
id4: 29,28
Related
I want to fetch, for each contract, the eqp with the minimum distance, but once an eqp has been taken by one contract it should not be considered for another.
Table: T1
id | contract | eqp | distance
1 | 123 | A | 2
2 | 123 | B | 5
3 | 123 | C | 20
4 | 124 | A | 2
5 | 124 | B | 7
6 | 124 | C | 11
I used RANK(), but it gives rank 1 to the same eqp for two different contracts. I do not want an eqp that an earlier contract has already taken to be ranked again.
SELECT
id,contract,eqp,rk
FROM
(
SELECT id,contract,eqp,
RANK() OVER (PARTITION BY contract ORDER BY distance) AS rk
FROM t1
) a
WHERE rk=1
What I get is below:
id | contract | eqp | distance | rk
1 | 123 | A | 2 | 1
4 | 124 | A | 2 | 1
Expected Output:
id | contract | eqp | distance | rk
1 | 123 | A | 2 | 1
5 | 124 | B | 7 | 1
The task is iterative. It cannot be solved by a single query.
Possible solution:
CREATE PROCEDURE proc ()
BEGIN
CREATE TABLE tmp LIKE t1;
REPEAT
INSERT INTO tmp
SELECT t1.*
FROM t1
LEFT JOIN tmp t2 ON t1.contract = t2.contract -- for huge table
LEFT JOIN tmp t3 ON t1.eqp = t3.eqp -- use NOT EXISTS
WHERE t2.id IS NULL
AND t3.id IS NULL
ORDER BY distance LIMIT 1; -- adjust to needed priority
UNTIL NOT ROW_COUNT() END REPEAT;
SELECT * FROM tmp;
DROP TABLE tmp;
END
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=c9563e1d2e9884dc607a52f10ff401bb
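The loop in the procedure can also be read as a greedy scan: take rows in order of distance and keep a row only if neither its contract nor its eqp has been used yet. A minimal Python sketch of that reading follows; the name `assign_nearest` and the in-memory `T1` list are assumptions made for illustration, and ties on distance are broken by id here, whereas the procedure's `ORDER BY distance LIMIT 1` leaves ties unspecified.

```python
# Rows of T1 from the question: (id, contract, eqp, distance)
T1 = [
    (1, 123, "A", 2),
    (2, 123, "B", 5),
    (3, 123, "C", 20),
    (4, 124, "A", 2),
    (5, 124, "B", 7),
    (6, 124, "C", 11),
]

def assign_nearest(rows):
    """Greedy sketch: nearest free eqp per contract, each eqp used once."""
    taken_contracts = set()
    taken_eqp = set()
    picked = []
    # Smallest distance first; id breaks ties (an assumption, see lead-in).
    for row in sorted(rows, key=lambda r: (r[3], r[0])):
        _, contract, eqp, _ = row
        if contract in taken_contracts or eqp in taken_eqp:
            continue  # contract already served, or eqp already taken
        taken_contracts.add(contract)
        taken_eqp.add(eqp)
        picked.append(row)
    return picked

print(assign_nearest(T1))  # [(1, 123, 'A', 2), (5, 124, 'B', 7)]
```

Sorting once is enough because a row that is ineligible at some point never becomes eligible again; the sets of taken contracts and eqp only grow.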
I have this table called my_users
my_id | name | raffle_tickets
1 | Bob | 3
2 | Sam | 59
3 | Bill | 0
4 | Jane | 10
5 | Mike | 12
As you can see, Sam has 59 tickets, so he has the highest chance of winning.
Chance of winning:
Sam = 59/84
Bob = 3/84
Jane = 10/84
Bill = 0/84
Mike = 12/84
PS: 84 is the total number of tickets in the table (just so you know I didn't randomly pick 84)
Based on this, how can I randomly pick a winner while ensuring that those who have more raffle tickets have a higher chance of being picked? The winner who is picked should then have 1 ticket deducted from their total.
UPDATE my_users
SET raffle_tickets = raffle_tickets - 1
WHERE my_id = --- Then I get stuck here...
Server version: 5.7.30
For MySQL 8+
WITH
cte1 AS ( SELECT name, SUM(raffle_tickets) OVER (ORDER BY my_id) cum_sum
FROM my_users
WHERE raffle_tickets > 0 ), -- skip zero-ticket users so they cannot tie on cum_sum and win
cte2 AS ( SELECT SUM(raffle_tickets) * RAND() random_sum
FROM my_users )
SELECT name
FROM cte1
CROSS JOIN cte2
WHERE cum_sum >= random_sum
ORDER BY cum_sum LIMIT 1;
For MySQL 5+
SELECT cte1.name
FROM ( SELECT t2.my_id id, t2.name, SUM(t1.raffle_tickets) cum_sum
FROM my_users t1
JOIN my_users t2 ON t1.my_id <= t2.my_id
WHERE t1.raffle_tickets > 0
AND t2.raffle_tickets > 0 -- zero-ticket users must not be candidates
GROUP BY t2.my_id, t2.name ) cte1
CROSS JOIN ( SELECT RAND() * SUM(raffle_tickets) random_sum
FROM my_users ) cte2
WHERE cte1.cum_sum >= cte2.random_sum
ORDER BY cte1.cum_sum LIMIT 1;
fiddle
You want a weighted pull from a random sample. For this purpose, variables are probably the most efficient solution:
select u.*
from (select u.*, (@t := @t + raffle_tickets) as running_tickets
from my_users u cross join
(select @t := 0, @r := rand()) params
where raffle_tickets > 0
) u
where @r >= (running_tickets - raffle_tickets) / @t and
@r < (running_tickets / @t);
What this does is calculate the running sum of tickets and then divide by the total number of tickets to get a number between 0 and 1. For example, this might produce:
my_id | name | raffle_tickets | running_tickets | running_tickets / @t
1 | Bob | 3 | 3 | 0.03571428571428571
2 | Sam | 59 | 62 | 0.7380952380952381
4 | Jane | 10 | 72 | 0.8571428571428571
5 | Mike | 12 | 84 | 1
The ordering of the original rows doesn't matter -- which is why there is no ordering in the subquery.
The ratio is then used with rand() to select a particular row.
Note that in the outer query, @t is the total number of tickets.
Here is a db<>fiddle.
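The running-sum idea can be checked outside the database. The Python sketch below is an illustration only: `pick_winner`, its optional `r` parameter (a number in [0, 1) standing in for rand()) and the `MY_USERS` list are assumptions introduced here, not part of the answers above.

```python
import random

# Rows of my_users from the question: (my_id, name, raffle_tickets)
MY_USERS = [
    (1, "Bob", 3),
    (2, "Sam", 59),
    (3, "Bill", 0),
    (4, "Jane", 10),
    (5, "Mike", 12),
]

def pick_winner(users, r=None):
    """Weighted pick: each user owns a slice of [0, total) as wide as their tickets."""
    eligible = [u for u in users if u[2] > 0]   # zero-ticket users cannot win
    total = sum(u[2] for u in eligible)
    if total == 0:
        return None
    if r is None:
        r = random.random()                     # uniform in [0, 1)
    threshold = r * total
    running = 0
    for user in eligible:
        running += user[2]                      # running sum of tickets
        if threshold < running:
            return user
    return eligible[-1]                         # guard against float rounding

print(pick_winner(MY_USERS, r=0.5))  # (2, 'Sam', 59)
```

The `my_id` of the returned row is the value that would go into the `WHERE my_id = ...` clause of the UPDATE in the question.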
I need to pull the names of the students who placed second in each grade, from grade 1 to grade 12. Each grade has a separate database with a similar table structure.
I have the following data:
Set 1
uid marks
1 10
2 20
3 17
4 17
5 20
6 20
Set 2
uid marks
1 10
2 20
3 17
4 17
5 20
6 17
7 20
I need a query that reports uid 3 and 4 as second in set 1, and uid 3, 4 and 6 as second in set 2.
I need it in a single query because there are several sets of databases.
What could be a possible way?
I tried:
SELECT * FROM TBL WHERE marks != (SELECT MAX(marks) FROM tbl)
but it fetched all marks except the highest.
Try this out:
SELECT uid, marks FROM (
SELECT uid, marks, @rank := @rank + (@prevMarks != marks) rank, @prevMarks := marks
FROM t, (SELECT @rank := 0, @prevMarks := 0) init
ORDER BY marks DESC -- highest marks first, so rank 2 is the second-highest mark
) s
WHERE rank = 2
Fiddle here.
Another alternative without User Defined Variables:
SELECT t.uid, t.marks FROM t
JOIN (
SELECT DISTINCT marks FROM t
ORDER BY marks DESC -- second-highest mark, not second-lowest
LIMIT 1, 1
) s
ON t.marks = s.marks
Output:
| UID | MARKS |
|-----|-------|
| 3 | 17 |
| 4 | 17 |
Use LIMIT and ORDER BY
SELECT * FROM TBL ORDER BY marks DESC LIMIT 1,1
This orders all students by marks from high to low, then skips the first row (offsets start at 0) and returns only one record.
If you need all students with the second-highest mark, use a subquery:
SELECT * FROM TBL WHERE marks = (
SELECT marks FROM TBL GROUP BY marks ORDER BY marks DESC LIMIT 1,1
)
SELECT *
FROM table
WHERE mark = (
SELECT MAX(mark)
FROM table
WHERE mark <
(
SELECT MAX(mark)
FROM table
)
)
Try this
SELECT t.marks, t.uid, (
SELECT COUNT( DISTINCT marks ) +1
FROM tbl t1
WHERE t.marks < t1.marks
) AS rank
FROM tbl t
LIMIT 0 , 30
Counting DISTINCT higher marks lets tied students share a rank without leaving gaps. Now you can filter on the rank column with a small modification:
SELECT * from (
SELECT t.marks, t.uid, (
SELECT COUNT( DISTINCT marks ) +1
FROM tbl t1
WHERE t.marks < t1.marks
) AS rank
FROM tbl t
) alias WHERE rank = 2 -- use rank = n for the n-th position
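To check what "second position" means when marks are tied, the same logic can be written in a few lines of Python. This is a sketch under assumptions: `second_place`, `SET_1` and `SET_2` are names made up here, and the rows are the (uid, marks) pairs from the question.

```python
# (uid, marks) pairs from the question
SET_1 = [(1, 10), (2, 20), (3, 17), (4, 17), (5, 20), (6, 20)]
SET_2 = [(1, 10), (2, 20), (3, 17), (4, 17), (5, 20), (6, 17), (7, 20)]

def second_place(rows):
    """Return the uids whose marks equal the second-highest distinct mark."""
    distinct_marks = sorted({marks for _, marks in rows}, reverse=True)
    if len(distinct_marks) < 2:
        return []                     # no second place when all marks are equal
    second = distinct_marks[1]
    return [uid for uid, marks in rows if marks == second]

print(second_place(SET_1))  # [3, 4]
print(second_place(SET_2))  # [3, 4, 6]
```

Ranking over distinct marks is what makes the three students with 20 marks count as one first place, so 17 is second rather than fourth.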
I need to find the missing numbers between 0 and 16.
My table is like this:
CarId FromCity_Id ToCity_Id Ran_Date RunId
1001 0 2 01-08-2013 1
1001 5 9 02-08-2013 2
1001 11 16 03-08-2013 3
1002 0 11 02-08-2013 4
1002 11 16 08-08-2013 5
I need to find out:
For the past three months from now(), between which cities the car has not run.
For example, in the above records:
Car 1001 has not run between 02-05 and 09-11
Car 1002 has run fully (i.e. between 0-11 and 11-16)
Overall, I need a query which shows the sections in which the car has not run in the past 3 months, along with the last run date.
How can I write such a query? If a stored procedure is needed, please advise.
God help me. This uses a doubly-correlated subquery, a table that might not exist in your system, and too much caffeine. But hey, it works.
Right, here goes.
SELECT CarId, GROUP_CONCAT(DISTINCT missing) missing
FROM MyTable r,
(SELECT @a := @a + 1 missing
FROM mysql.help_relation, (SELECT @a := -1) t
WHERE @a < 16 ) y
WHERE NOT EXISTS
(SELECT r.CarID FROM MyTable m
WHERE y.missing BETWEEN FromCity_Id AND ToCity_Id
AND r.carid = m.carid)
GROUP BY CarID;
Produces (changing the first row for CarID 1002 to 0-9 to open up 10 and give us better test data):
+-------+---------+
| CarId | missing |
+-------+---------+
| 1001 | 3,4,10 |
| 1002 | 10 |
+-------+---------+
2 rows in set (0.00 sec)
And how does it all work?
Firstly...
The inner query gives us a list of numbers from 0 to 16:
(SELECT @a := @a + 1 missing
FROM mysql.help_relation, (SELECT @a := -1) t
WHERE @a < 16 ) y
It does that by starting at -1, and then displaying the result of adding 1 to that number for each row in some sacrificial table. I'm using mysql.help_relation because it's got over a thousand rows and most basic systems have it. YMMV.
Then we cross join that with MyTable:
SELECT CarId, ...
FROM MyTable r,
(...) y
This gives us every possible combination of rows, so we have each CarId and To/From IDs mixed with every number from 0-16.
Filtering...
This is where it gets interesting. We need to find rows that don't match the numbers, and we need to do so per CarID. This sort of thing would do it (as long as y.missing exists, which it will when we correlate the subquery):
SELECT m.CarID FROM MyTable m
WHERE y.missing BETWEEN FromCity_Id AND ToCity_Id
AND m.CarID = 1001;
Remember: y.missing is set to a number between 0-16, cross-joined with the rows in MyTable. This gives us a list of all numbers from 0-16 where CarID 1001 is busy. We can invert that set with a NOT EXISTS, and while we're at it, correlate (again) with CarId so we can get all such IDs.
Then it's an easy matter of filtering the rows that don't fit:
SELECT CarId, ...
FROM MyTable r,
(...) y
WHERE NOT EXISTS
(SELECT r.CarID FROM MyTable m
WHERE y.missing BETWEEN FromCity_Id AND ToCity_Id
AND r.carid = m.carid)
Output
To give a sensible result (attempt 1), we could then get distinct combinations. Here's that version:
SELECT DISTINCT CarId, missing
FROM MyTable r,
(SELECT @a := @a + 1 missing
FROM mysql.help_relation, (SELECT @a := -1) t
WHERE @a < 16 ) y
WHERE NOT EXISTS
(SELECT r.CarID FROM MyTable m
WHERE y.missing BETWEEN FromCity_Id AND ToCity_Id
AND r.carid = m.carid);
This gives:
+-------+---------+
| CarId | missing |
+-------+---------+
| 1001 | 3 |
| 1001 | 4 |
| 1001 | 10 |
| 1002 | 10 |
+-------+---------+
4 rows in set (0.01 sec)
The simple addition of a GROUP BY and a GROUP_CONCAT gives the pretty result you get at the top of this answer.
I apologise for the inconvenience.
select * from carstable where CarId not in
(select distinct CarId from ranRecordTable where DATEDIFF(NOW(), Ran_Date) <= 90)
Hope this helps.
Here is the idea. Create a list of all cars and all numbers. Then, return all combinations that are not covered by the data. This is hard because there is more than one row for each car.
Here is one method:
select cars.CarId, n.n
from (select distinct CarId from t) cars cross join
(select 0 as n union all select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6 union all select 7 union all
select 8 union all select 9 union all select 10 union all select 11 union all
select 12 union all select 13 union all select 14 union all select 15 union all
select 16
) n
where not exists (select 1
from t t2
where t2.ran_date >= now() - interval 90 day and
t2.CarId = cars.CarId and
n.n between t2.FromCity_id and t2.ToCity_id
);
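The "all cars crossed with all numbers, minus what the runs cover" idea can be verified with a short Python sketch. The names `missing_cities` and `RUNS` are assumptions for illustration, and the sketch assumes the rows have already been restricted to the last 90 days, so no date filter appears in it.

```python
from collections import defaultdict

# (CarId, FromCity_Id, ToCity_Id) from the question, dates omitted
RUNS = [
    (1001, 0, 2),
    (1001, 5, 9),
    (1001, 11, 16),
    (1002, 0, 11),
    (1002, 11, 16),
]

def missing_cities(runs, lo=0, hi=16):
    """For each car, list the city numbers in lo..hi that no run covers."""
    covered = defaultdict(set)
    for car, from_city, to_city in runs:
        # BETWEEN is inclusive on both ends, so include to_city.
        covered[car].update(range(from_city, to_city + 1))
    return {
        car: [n for n in range(lo, hi + 1) if n not in cities]
        for car, cities in covered.items()
    }

print(missing_cities(RUNS))  # {1001: [3, 4, 10], 1002: []}
```

Cities 2, 5, 9 and 11 count as covered because they are endpoints of a run, which is why car 1001 is reported as missing 3, 4 and 10 only.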
SQL Fiddle
MySQL 5.5.32 Schema Setup:
CREATE TABLE Table1
(`CarId` int, `FromCity_Id` int, `ToCity_Id` int, `Ran_Date` datetime, `RunId` int)
;
INSERT INTO Table1
(`CarId`, `FromCity_Id`, `ToCity_Id`, `Ran_Date`, `RunId`)
VALUES
(1001, 0, 2, '2013-08-01 00:00:00', 1),
(1001, 5, 9, '2013-08-02 00:00:00', 2),
(1001, 11, 16, '2013-08-03 00:00:00', 3),
(1002, 0, 11, '2013-08-02 00:00:00', 4),
(1002, 11, 16, '2013-08-08 00:00:00', 5)
;
Query 1:
SELECT r1.CarId,r1.ToCity_Id as Missing_From, r2.FromCity_Id as Missing_To,
max(t.Ran_Date) as Last_Run_Date
FROM (
SELECT @i1:=@i1+1 AS rownum, t.*
FROM Table1 as t, (SELECT @i1:=0) as foo
ORDER BY CarId, Ran_Date) as r1
INNER JOIN (
SELECT @i2:=@i2+1 AS rownum, t.*
FROM Table1 as t, (SELECT @i2:=0) as foo
ORDER BY CarId, Ran_Date) as r2 ON r1.CarId = r2.CarId AND
r1.ToCity_Id != r2.FromCity_Id AND
r2.rownum = (r1.rownum + 1)
INNER JOIN Table1 as t ON r1.CarId = t.CarId
WHERE r1.Ran_Date >= now() - interval 90 day
GROUP BY r1.CarId, r1.ToCity_Id, r2.FromCity_Id
Results:
| CARID | MISSING_FROM | MISSING_TO | LAST_RUN_DATE |
|-------|--------------|------------|-------------------------------|
| 1001 | 2 | 5 | August, 03 2013 00:00:00+0000 |
| 1001 | 9 | 11 | August, 03 2013 00:00:00+0000 |
I need help in finding the rows that correspond to the most recent date, the next most recent and the one after that, where some condition ABC is "Y" and group it by a column name XYZ ASC but XYZ can appear multiple times. So, say XYZ is 50, then for the rows in the three years, the XYZ will be 50. I have the following code that executes but returns only two rows out of thousands which is impossible. I tried executing just the date condition but it returned dates that were less than or equal to MAX(DATE)-3 as well. Don't know where I am going wrong.
select * from money.cash where DATE =(
select
MAX(DATE)
from
money.cash
where
DATE > (select MAX(DATE)-3 from money.cash)
)
GROUP BY XYZ ASC
having ABC = "Y";
The structure of the table is as follows (only a schematic, not the real thing).
Comp_ID DATE XYZ ABC $$$$ ....
1 2012-1-1 10 Y SOME-AMOUNT
2 2011-1-1 10 Y
3 2006-1-1 10 Y
4 2011-1-1 20 Y
5 2002-1-1 20 Y
6 2000-1-1 20 Y
7 1998-1-1 20 Y
The desired output would be the first three rows for XYZ=10 in ascending order and the most recent 3 dates for XYZ=20.
Last and important: this table's values keep changing as new data comes in, so the output (which will be in a new table) must reflect the changes in the original table above.
MySQL (before 8.0, which added window functions) doesn't have functionality that is friendly to greatest-n-per-group queries.
One option would be...
- Find the MAX(Date) per group (XYZ)
- Then use that result to find the MAX(Date) of all records before that date
- Then do it again for all records before that date
It's really inefficient, but MySQL hasn't got the functionality required to do this efficiently. Sorry...
CREATE TABLE yourTable
(
comp_id INT,
myDate DATE,
xyz INT,
abc VARCHAR(1)
)
;
INSERT INTO yourTable SELECT 1, '2012-01-01', 10, 'Y';
INSERT INTO yourTable SELECT 2, '2011-01-01', 10, 'Y';
INSERT INTO yourTable SELECT 3, '2006-01-01', 10, 'Y';
INSERT INTO yourTable SELECT 4, '2011-01-01', 20, 'Y';
INSERT INTO yourTable SELECT 5, '2002-01-01', 20, 'Y';
INSERT INTO yourTable SELECT 6, '2000-01-01', 20, 'Y';
INSERT INTO yourTable SELECT 7, '1998-01-01', 20, 'Y';
SELECT
yourTable.*
FROM
(
SELECT
lookup.XYZ,
COALESCE(MAX(yourTable.myDate), lookup.MaxDate) AS MaxDate
FROM
(
SELECT
lookup.XYZ,
COALESCE(MAX(yourTable.myDate), lookup.MaxDate) AS MaxDate
FROM
(
SELECT
yourTable.XYZ,
MAX(yourTable.myDate) AS MaxDate
FROM
yourTable
WHERE
yourTable.ABC = 'Y'
GROUP BY
yourTable.XYZ
)
AS lookup
LEFT JOIN
yourTable
ON yourTable.XYZ = lookup.XYZ
AND yourTable.myDate < lookup.MaxDate
AND yourTable.ABC = 'Y'
GROUP BY
lookup.XYZ,
lookup.MaxDate
)
AS lookup
LEFT JOIN
yourTable
ON yourTable.XYZ = lookup.XYZ
AND yourTable.myDate < lookup.MaxDate
AND yourTable.ABC = 'Y'
GROUP BY
lookup.XYZ,
lookup.MaxDate
)
AS lookup
INNER JOIN
yourTable
ON yourTable.XYZ = lookup.XYZ
AND yourTable.myDate >= lookup.MaxDate
WHERE
yourTable.ABC = 'Y'
ORDER BY
yourTable.comp_id
;
DROP TABLE yourTable;
There are other options, but they're all a bit hacky. Search SO for greatest-n-per-group mysql.
My results using your example data:
Comp_ID | DATE | XYZ | ABC
------------------------------
1 | 2012-1-1 | 10 | Y
2 | 2011-1-1 | 10 | Y
3 | 2006-1-1 | 10 | Y
4 | 2011-1-1 | 20 | Y
5 | 2002-1-1 | 20 | Y
6 | 2000-1-1 | 20 | Y
Here's another way, hopefully more efficient than Dems' answer.
Test it with an index on (abc, xyz, date):
SELECT m.xyz, m.date -- for all columns: SELECT m.*
FROM
( SELECT DISTINCT xyz
FROM money.cash
WHERE abc = 'Y'
) AS dm
JOIN
money.cash AS m
ON m.abc = 'Y'
AND m.xyz = dm.xyz
AND m.date >= COALESCE(
( SELECT im.date
FROM money.cash AS im
WHERE im.abc = 'Y'
AND im.xyz = dm.xyz
ORDER BY im.date DESC
LIMIT 1
OFFSET 2 -- to get 3 latest rows per xyz
), DATE('1000-01-01') ) ;
If you have more than one row with the same (abc, xyz, date), the query may return more than 3 rows per xyz (all rows tied for 3rd place will be shown).
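The cutoff logic of this query (find the date of the third-latest row per xyz, then keep everything on or after it) can be sketched in Python to see the tie behaviour. `latest_n_per_group` and `CASH` are hypothetical names, and dates are kept as ISO `YYYY-MM-DD` strings so that plain string comparison orders them correctly.

```python
from collections import defaultdict

# (comp_id, date, xyz, abc) from the question's schematic table
CASH = [
    (1, "2012-01-01", 10, "Y"),
    (2, "2011-01-01", 10, "Y"),
    (3, "2006-01-01", 10, "Y"),
    (4, "2011-01-01", 20, "Y"),
    (5, "2002-01-01", 20, "Y"),
    (6, "2000-01-01", 20, "Y"),
    (7, "1998-01-01", 20, "Y"),
]

def latest_n_per_group(rows, n=3):
    """Keep, per xyz, the rows dated on or after the n-th latest date."""
    by_xyz = defaultdict(list)
    for row in rows:
        if row[3] == "Y":                       # condition abc = 'Y'
            by_xyz[row[2]].append(row)
    result = []
    for xyz in sorted(by_xyz):
        group = sorted(by_xyz[xyz], key=lambda r: r[1], reverse=True)
        # Date of the n-th latest row, or of the oldest row if fewer than n exist.
        cutoff = group[min(n, len(group)) - 1][1]
        result.extend(r for r in group if r[1] >= cutoff)  # ties on the cutoff stay in
    return result

print([r[0] for r in latest_n_per_group(CASH)])  # [1, 2, 3, 4, 5, 6]
```

Falling back to the oldest row when a group has fewer than n rows plays the same role as the `COALESCE(..., DATE('1000-01-01'))` in the query.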