In MySQL, How to Select a Row From A Table Exactly Once to Populate Another Table? - mysql

I have a table of seven recipes, each of which needs to be assigned to a student. Each student can be assigned a maximum of one recipe, and there are more total students than total recipes, so some students will not receive any assignment.
In my table of assignments, I need to populate which recipe is assigned to which student. (In my business requirements, assignments must be a freestanding table; I cannot add a column to the recipes table).
Below is the script I am using (including for creating sample data).
I had hoped by using the NOT EXISTS clause, I could prevent a student from being assigned more than one recipe.... but this is not working because the same student is being assigned to every recipe. Any guidance on how to fix my script would be greatly appreciated. Thank you!
/* CREATE TABLE HAVING SEVEN RECIPES */
CREATE TABLE TempRecipes( Recipe VARCHAR(16) );
INSERT INTO TempRecipes VALUES ('Cake'), ('Pie'), ('Cookies'), ('Ice Cream'), ('Brownies'), ('Jello'), ('Popsicles');
/* CREATE TABLE HAVING TEN STUDENTS, i.e. MORE STUDENTS THAN AVAILABLE RECIPES */
CREATE TABLE TempStudents( Student VARCHAR(16) );
INSERT INTO TempStudents VALUES ('Ann'), ('Bob'), ('Charlie'), ('Daphne'), ('Earl'), ('Francine'), ('George'), ('Heather'), ('Ivan'), ('Janet');
/* CREATE TABLE TO STORE THE ASSIGNMENTS */
CREATE TABLE TempAssignments( Recipe VARCHAR(16), Student VARCHAR(16) );
INSERT INTO TempAssignments( Recipe, Student )
SELECT TempRecipes.Recipe, ( SELECT S1.Student FROM TempStudents S1 WHERE NOT EXISTS (SELECT TempAssignments.Student FROM TempAssignments WHERE TempAssignments.Student = S1.Student) LIMIT 1 ) Student
FROM TempRecipes;

One approach to consider is to write two separate queries, turn each into a derived table, and give each one a unique identifier that can be matched against the other. A row number can serve as that identifier.
This suggestion is for MySQL v8+, which supports the ROW_NUMBER() function (and, if I'm not mistaken, MariaDB v10.2+). You've already established these conditions:
1. Each student can be assigned a maximum of one recipe.
2. If there are more students than recipes, some students will not receive any recipe assignment.
Let's assume that there's an additional condition:
3. The recipe assigned will be random.
So, both tables will have basically the same query structure:
SELECT Student,
ROW_NUMBER() OVER (ORDER BY RAND()) AS Rn1
FROM TempStudents;
SELECT Recipe,
ROW_NUMBER() OVER (ORDER BY RAND()) AS Rn2
FROM TempRecipes;
In these queries, condition no. 3 ("random assignment") is implemented by the ORDER BY RAND() inside ROW_NUMBER(). If you run the queries as is, you'll almost certainly get a different row-number assignment every time. If you don't want that (say you prefer to order by student or recipe name, descending), just replace ORDER BY RAND() with, for example, ORDER BY Student DESC.
Next we'll make both queries as derived tables then join them by matching the row number like this:
SELECT *
FROM
(SELECT Student,
ROW_NUMBER() OVER (ORDER BY RAND()) AS Rn1
FROM TempStudents) a
LEFT JOIN
(SELECT Recipe,
ROW_NUMBER() OVER (ORDER BY RAND()) AS Rn2
FROM TempRecipes) b
ON a.Rn1=b.Rn2;
The reason I'm doing LEFT JOIN here is to show that there will be some student without recipe assignment. Here's the result:
Student  | Rn1 | Recipe    | Rn2
---------+-----+-----------+-----
Ann      | 1   | Cookies   | 1
Bob      | 2   | Jello     | 2
Charlie  | 3   | Pie       | 3
Daphne   | 4   | Brownies  | 4
Earl     | 5   | Popsicles | 5
Francine | 6   | Cake      | 6
George   | 7   | Ice Cream | 7
Heather  | 8   | NULL      | NULL
Ivan     | 9   | NULL      | NULL
Janet    | 10  | NULL      | NULL
If you use INNER JOIN instead, you won't see the last 3 rows of the result above, since there's no matching row number for them in the recipe table. Our last step is just adding the INSERT command to the query, like so:
INSERT INTO TempAssignments
SELECT Recipe, Student
FROM
....
Note that this example uses random ordering, so the rows that end up in TempAssignments after the insert may differ from the ones you saw while testing.
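For anyone who wants to try the whole flow end to end, here is a minimal sketch. It is an assumption-laden stand-in, not the MySQL script itself: it uses Python's built-in sqlite3 module (SQLite 3.25+ for window functions), so RAND() becomes SQLite's RANDOM() and VARCHAR becomes TEXT. It uses INNER JOIN so that students without a recipe are not inserted.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); RANDOM() replaces RAND().
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TempRecipes(Recipe TEXT);
INSERT INTO TempRecipes VALUES ('Cake'), ('Pie'), ('Cookies'), ('Ice Cream'),
                               ('Brownies'), ('Jello'), ('Popsicles');
CREATE TABLE TempStudents(Student TEXT);
INSERT INTO TempStudents VALUES ('Ann'), ('Bob'), ('Charlie'), ('Daphne'), ('Earl'),
                                ('Francine'), ('George'), ('Heather'), ('Ivan'), ('Janet');
CREATE TABLE TempAssignments(Recipe TEXT, Student TEXT);

-- Number both tables in random order, then pair rows whose numbers match.
-- INNER JOIN drops the students that are left without a recipe.
INSERT INTO TempAssignments (Recipe, Student)
SELECT b.Recipe, a.Student
FROM (SELECT Student, ROW_NUMBER() OVER (ORDER BY RANDOM()) AS Rn1
      FROM TempStudents) a
JOIN (SELECT Recipe, ROW_NUMBER() OVER (ORDER BY RANDOM()) AS Rn2
      FROM TempRecipes) b
  ON a.Rn1 = b.Rn2;
""")

assignments = conn.execute(
    "SELECT Recipe, Student FROM TempAssignments"
).fetchall()
for recipe, student in assignments:
    print(recipe, "->", student)
```

Which student gets which recipe changes from run to run, but every recipe is used once and no student appears twice.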

Related

Determine row number for entry in a sorted SQL table

Suppose I have a table storing say student information
id | first_name | last_name | dob | score
where score is some non-unique numeric assessment of their performance. (the schema isn't really relevant, I'm trying to go as generic as possible)
I'd like to know, for any given student, what their score-based overall ranking is. ROW_NUMBER() or any equivalent counter method doesn't really work since they're only accounting for returned entries, so if you're only interested in one particular student, it's always going to be 1.
Counting the number of students with scores greater than the current one won't work either since you can have multiple students with the same score. Would sorting additionally by a secondary field, such as dob work, or would it be too slow?
You should be able to JOIN into a subquery which will provide the ranks of each student across the entire population:
SELECT student.*, ranking.score_rank
FROM student
JOIN (
    -- RANK is a reserved word in MySQL 8.0+, so the alias is named score_rank
    SELECT id, RANK() OVER (ORDER BY score DESC) AS score_rank
    FROM student
) ranking ON student.id = ranking.id;
I suppose the scale of your data will be a key determinant of whether or not this is a realistic solution for your use case.
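To illustrate why ranking inside the subquery sidesteps the "always 1" problem, here is a small sketch using Python's sqlite3 module as a stand-in database (an assumption; SQLite 3.25+ is needed for RANK()). The sample rows and the helper name rank_of are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student(id INTEGER PRIMARY KEY, first_name TEXT, score INTEGER);
-- Made-up sample data with a tie at the top.
INSERT INTO student VALUES (1, 'Ann', 90), (2, 'Bob', 85), (3, 'Cy', 90), (4, 'Dee', 70);
""")

def rank_of(student_id):
    """Return the score-based rank of one student among ALL students."""
    row = conn.execute("""
        SELECT ranking.score_rank
        FROM student
        JOIN (SELECT id, RANK() OVER (ORDER BY score DESC) AS score_rank
              FROM student) AS ranking
          ON student.id = ranking.id
        WHERE student.id = ?
    """, (student_id,)).fetchone()
    return row[0]

print(rank_of(2))  # Bob is ranked behind the two students tied on 90
```

The window function runs over the whole table inside the derived table, and the filter on a single id is applied only afterwards. Tied scores share a rank, and the next rank skips ahead accordingly.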

MySQL query for multi-column distinct plus an ancillary column condition

Imagine a flat table that tracks game matches in which each game has three participants: an attacker, a defender and a bettor who is wagering on the outcome of the battle between players 1 and 2. The table includes the names of the players and the bettor of each game, as well as the date of the game, the scores of each player, the game venue and the name of the referee. I have included the CREATE sql for some sample data below.
DROP TABLE IF EXISTS `game`;
CREATE TABLE `game` (
`game_date` text,
`player_1` text,
`player_2` text,
`bettor` text,
`p1_score` double DEFAULT NULL,
`p2_score` double DEFAULT NULL,
`result` double DEFAULT NULL,
`venue` text,
`referee` text
);
INSERT INTO `game` VALUES ('2020-04-05','Bob','Kelly','Kevin',100,78,0.2,'TS1','Richard'),('2020-03-06','Jim','Bob','Dave',100,97,1.2,'TS2','Mike'),('2020-02-05','Jim','Bob','Kevin',100,86,0.9,'TS2','Mike'),('2020-01-06','Kelly','Bob','Jim',100,92,1.3,'TS2','Richard'),('2019-12-07','Kelly','Bob','Jim',100,98,1.7,'TS1','Mike'),('2019-11-07','Kelly','Bob','Kevin',78,100,2.1,'TS2','Mike'),('2019-10-08','Kelly','Bob','Kevin',97,100,1.5,'TS1','Mike'),('2019-09-08','Kelly','Jim','Dave',86,100,2.4,'TS1','Richard'),('2019-08-09','Kelly','Jim','Dave',92,100,2.8,'TS2','Mike'),('2019-07-10','Kelly','Jim','Dave',98,100,2.2,'TS2','Mike'),('2019-06-10','Kelly','Jim','Dave',100,78,1.9,'TS2','Richard'),('2019-05-11','Sarah','Jim','Kevin',100,97,2.1,'TS1','Mike'),('2019-04-11','Sarah','Jim','Kevin',100,86,2.1,'TS2','Mike'),('2019-03-12','Sarah','Jim','Kevin',100,92,2.8,'TS1','Mike'),('2019-02-10','Sarah','Jim','Kevin',100,98,1.8,'TS1','Richard');
I need a query that returns match info for each unique assembly of match participants... but only for the first match that the three participants ever played in all together, i.e., for the earliest game_date among the matches that all three participated in.
For example, a game where Bob was player 1, Kelly was player two and Kevin was the bettor would constitute a unique threesome. In the data, there is only one such pairing for this threesome so the query would return a row for that one match.
In the case of Sarah as player 1, Jim as player 2 and Kevin as bettor, there are four matches with that threesome and so the query would return only info for the earliest match, i.e., the one on 2/10/2019.
Note that in the sample data there are two matches with the threesome 'Kelly','Bob','Jim'. There are also two other matches with the threesome 'Kelly','Jim','Bob'. These are not the same because Bob and Jim swap places as player 2 and bettor. So the query would return one row for each of them, i.e., the matches dated '12/07/2019' and '08/09/2019', respectively.
Using DISTINCT, I can return a list of all of the unique player groupings.
SELECT DISTINCT player_1, player_2, bettor FROM game;
Using GROUP BY, I can return all of the game info for all of the matches the group played in.
SELECT * FROM game GROUP BY player_1, player_2, bettor;
But I can't figure out how to return all of the game info but only for the earliest game where all three participants played together and in distinct roles in the games.
I have tried sub-queries using MIN() for game_date but that's a loser. I suspect there is perhaps an INNER JOIN solution but I haven't found it yet.
I am grateful for any guidance you can provide.
One canonical approach uses a join to a subquery which identifies the earliest games for each trio:
SELECT g1.*
FROM game g1
INNER JOIN
(
    -- earliest game date for each (player_1, player_2, bettor) trio
    SELECT player_1, player_2, bettor,
           MIN(game_date) AS min_game_date
    FROM game
    GROUP BY player_1, player_2, bettor
) g2
    ON g2.player_1 = g1.player_1 AND
       g2.player_2 = g1.player_2 AND
       g2.bettor = g1.bettor AND
       g2.min_game_date = g1.game_date;
If you are running MySQL 8+, then the ROW_NUMBER analytic function provides another option:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY player_1, player_2, bettor
                                 ORDER BY game_date) rn
    FROM game
)
SELECT *
FROM cte
WHERE rn = 1;
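As a quick sanity check that both approaches pick the same rows, here is a sketch that runs them side by side on a few of the sample rows. It uses Python's sqlite3 module in place of MySQL (an assumption; SQLite 3.25+), keeps only the four columns that matter, and stores game_date as ISO text so that MIN() and ORDER BY sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game(game_date TEXT, player_1 TEXT, player_2 TEXT, bettor TEXT);
-- A subset of the question's sample rows.
INSERT INTO game VALUES
  ('2020-04-05', 'Bob',   'Kelly', 'Kevin'),
  ('2020-01-06', 'Kelly', 'Bob',   'Jim'),
  ('2019-12-07', 'Kelly', 'Bob',   'Jim'),
  ('2019-05-11', 'Sarah', 'Jim',   'Kevin'),
  ('2019-02-10', 'Sarah', 'Jim',   'Kevin');
""")

# Approach 1: join back to the per-trio minimum date.
via_join = conn.execute("""
    SELECT g1.game_date, g1.player_1, g1.player_2, g1.bettor
    FROM game g1
    JOIN (SELECT player_1, player_2, bettor, MIN(game_date) AS min_game_date
          FROM game
          GROUP BY player_1, player_2, bettor) g2
      ON g2.player_1 = g1.player_1
     AND g2.player_2 = g1.player_2
     AND g2.bettor = g1.bettor
     AND g2.min_game_date = g1.game_date
    ORDER BY g1.game_date
""").fetchall()

# Approach 2: number the games within each trio and keep the first.
via_row_number = conn.execute("""
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY player_1, player_2, bettor
                                     ORDER BY game_date) AS rn
        FROM game
    )
    SELECT game_date, player_1, player_2, bettor
    FROM cte
    WHERE rn = 1
    ORDER BY game_date
""").fetchall()

print(via_join)
print(via_row_number)
```

One difference worth keeping in mind: if a trio played twice on its earliest date, the join version returns both rows, while ROW_NUMBER() keeps just one of them.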

sql table design to fetch records with multiple inclusion and exclusion conditions

We want to select customers based on the following parameters:
specific cities, i.e. cityId=1,2,3...
specific customers should be excluded, i.e. customerId=33,2323,34534...
specific ages, i.e. 5 years, 7 years, 72 years...
These inclusion and exclusion lists can be arbitrarily long.
How should we design database for this:
Create separate table 'customerInclusionCities' for these inclusion cities and do like:
select * from customers where cityId in (select cityId from customerInclusionCities)
We do the same for age: create a table 'customerEligibleAge' with all the eligible ages:
i.e. select * from customers where age in (select age from customerEligibleAge)
and Create separate table 'customerIdToBeExcluded' for excluding customers:
i.e. select * from customers where customerId not in (select customerId from customerIdToBeExcluded)
OR
Create One table with Category and Ids.
i.e. Category1 for cities, Category2 for CustomerIds to be excluded.
Which approach is better, creating one table for these parameters OR creating separate tables for each list i.e. age, customerId, city?
IN ( SELECT ... ) can be very slow. Do your query as a single SELECT without subqueries. I assume all 3 columns are in the same table? (If not, that adds complexity.) The WHERE clause will probably have 3 IN ( constants ) clauses:
SELECT ...
FROM tbl
WHERE cityId IN (1,2,3...)
AND customerId NOT IN (33,2323,34534...)
AND age IN (5, 7, 72)
Have (at least):
INDEX(cityId),
INDEX(age)
(Negated things are unlikely to be able to use an index.)
The query will use one of the indexes; having both will give the Optimizer a choice of which it thinks is better.
Or...
SELECT c.*
FROM customers AS c
JOIN customerInclusionCities AS ci ON ci.cityId = c.cityId
JOIN customerEligibleAge AS ce ON ce.age = c.age
LEFT JOIN customerIdToBeExcluded AS ex ON ex.customerId = c.customerId
WHERE ex.customerId IS NULL;
Suggested indexes (probably as PRIMARY KEY):
customers: (cityId)
customerEligibleAge: (age)
customerIdToBeExcluded: (customerId)
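The join-based variant with separate list tables can be tried out with a sketch like the following. It runs on Python's sqlite3 module instead of MySQL (an assumption), uses the table names from the question, and the customer rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers(customerId INTEGER PRIMARY KEY, cityId INTEGER, age INTEGER);
CREATE TABLE customerInclusionCities(cityId INTEGER PRIMARY KEY);
CREATE TABLE customerEligibleAge(age INTEGER PRIMARY KEY);
CREATE TABLE customerIdToBeExcluded(customerId INTEGER PRIMARY KEY);

-- Invented sample data.
INSERT INTO customers VALUES (1, 1, 5), (2, 1, 7), (33, 2, 72), (4, 9, 5), (5, 3, 40);
INSERT INTO customerInclusionCities VALUES (1), (2), (3);
INSERT INTO customerEligibleAge VALUES (5), (7), (72);
INSERT INTO customerIdToBeExcluded VALUES (33);
""")

# Inner joins implement the inclusion lists; the LEFT JOIN ... IS NULL
# pattern (an anti-join) implements the exclusion list.
selected = [row[0] for row in conn.execute("""
    SELECT c.customerId
    FROM customers AS c
    JOIN customerInclusionCities AS ci ON ci.cityId = c.cityId
    JOIN customerEligibleAge AS ce ON ce.age = c.age
    LEFT JOIN customerIdToBeExcluded AS ex ON ex.customerId = c.customerId
    WHERE ex.customerId IS NULL
    ORDER BY c.customerId
""")]
print(selected)
```

Customer 33 matches on city and age but is filtered out by the exclusion table; customers 4 and 5 fail the city and age lists respectively.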
In order to discuss further, please provide SHOW CREATE TABLE for each table and EXPLAIN SELECT ... for any of the queries that actually work.
If you use the database only for that operation, I recommend the first solution. It is also very simple to deploy.
The second solution fills the DB with junk.

Clients with at least one call per vendor

I have this situation: I have clients and I have calls.
I want to know whether each client had at least one call per date and, if not, I want an array of the dates without a call.
client
id name
1 robert
2 nidia
Call
id date id_client
1 2015-01-01 2
2 2015-01-31 1
Client id 1 does not have a call on every day; at the least, 2015-01-01 is missing.
Does that make sense?
Firstly, there's going to have to be some mechanism for determining exactly which dates your query checks against. There are theoretically an infinite number of dates during which a given client may not have placed any call! You have to query against a subset of that infinity.
If you're ok with hard-coding in the query a small number of dates to check, then I think this is what you're looking for:
drop table if exists call;
drop table if exists client;
create table client (id int, name varchar(32), primary key (id) );
insert into client (id,name) values (1,'robert'), (2,'nidia');
create table call (id int, d date, client_id int references client(id), primary key (id) );
insert into call (id,d,client_id) values (1,'2015-01-01',(select id from client where name='nidia')), (2,'2015-01-31',(select id from client where name='robert'));
select
cl.name,
array_agg(ds.d) no_call_dates
from
client cl
cross join (select '2015-01-01'::date d union all select '2015-01-15' union all select '2015-01-31') ds
left join call ca on ca.client_id=cl.id and ca.d=ds.d
where
ca.id is null
group by
cl.name
;
Output:
name | no_call_dates
--------+-------------------------
nidia | {2015-01-31,2015-01-15}
robert | {2015-01-15,2015-01-01}
(2 rows)
I've hard-coded and unioned three dates into a single-column "table literal" (if you will) and cross-joined that with the client table, resulting in one row per-client-per-date. That result-set can then be left-joined with the call table on the client id and call date. You can then use the where clause to filter for only rows where the call table failed to join, which produces a result-set of no-call-days, still per-client-per-date. You can then group by the client and use the array_agg() aggregate function to construct the array of the dates on which the client did not have a call.
If you don't want to hard-code the dates, you can prepare a table of dates in advance and select from that in the cross join clause, or select all dates that are defined in the call table (select distinct d from call;), or use some more complex bit of logic to select the dates to check against. In all these cases, you would simply replace the "table literal" with the appropriate subquery.
Edit: Yes, that's very doable. You can use the generate_series() function to generate an integer series and add it to a fixed start date, which results in a date range. This can be done in the cross-join subquery, as I mentioned before. A good approach to avoid repetition is to use a CTE to set the start and end date:
select
cl.name,
array_agg(ds.d order by ds.d) no_call_dates
from
client cl
cross join (with dr as (select '2015-01-01'::date s, '2015-01-31'::date e) select s+generate_series(0,e-s,15) d from dr) ds
left join call ca on ca.client_id=cl.id and ca.d=ds.d
where
ca.id is null
group by
cl.name
;
Output:
name | no_call_dates
--------+-------------------------
nidia | {2015-01-16,2015-01-31}
robert | {2015-01-01,2015-01-16}
(2 rows)
In the above query, I generate three dates, 2015-01-01, 2015-01-16, and 2015-01-31, by using a date range from Jan 1 to Jan 31 with an increment of 15. Obviously for your case you'll probably want an increment of one, but I just used 15 for a nice simple example with only three dates.
Also, I added an order by clause in the array_agg() call, because it's nicer to get it ordered by date, rather than random.
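The same cross-join / left-join idea can be sketched outside PostgreSQL too. The version below is a swapped-in variant, not the query above: it uses Python's sqlite3 module, which has neither array_agg() nor (by default) generate_series(), so the date range is built in Python and loaded into a hypothetical helper table check_dates, and the per-client arrays are assembled in Python.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client(id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO client VALUES (1, 'robert'), (2, 'nidia');
CREATE TABLE call(id INTEGER PRIMARY KEY, d TEXT,
                  client_id INTEGER REFERENCES client(id));
INSERT INTO call VALUES (1, '2015-01-01', 2), (2, '2015-01-31', 1);
-- Hypothetical helper table holding the dates to check against.
CREATE TABLE check_dates(d TEXT PRIMARY KEY);
""")

# Build the date range in Python: Jan 1 to Jan 31 in steps of 15 days.
start, end, step = date(2015, 1, 1), date(2015, 1, 31), timedelta(days=15)
days = []
current = start
while current <= end:
    days.append((current.isoformat(),))
    current += step
conn.executemany("INSERT INTO check_dates VALUES (?)", days)

# One row per client per date, keeping only those with no matching call.
missing = {}
for name, day in conn.execute("""
    SELECT cl.name, ds.d
    FROM client cl
    CROSS JOIN check_dates ds
    LEFT JOIN call ca ON ca.client_id = cl.id AND ca.d = ds.d
    WHERE ca.id IS NULL
    ORDER BY cl.name, ds.d
"""):
    missing.setdefault(name, []).append(day)

print(missing)
```

With a step of one day instead of 15, the same code lists every day in the range on which a client placed no call.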

Mysql How to create columns name from select result

I want to select results from a different database based on a substring of a column value.
Here is my table student:
Original_student Other_student
1010173 1240240
1010173 1240249
The 3rd digit of the number identifies the database. For example, I want the query to be something like:
select original_student, Other_student, month
from student join database-(substring(other_student,3,1)).payment
My question is: how can I concatenate the substring to a database name or column name dynamically?
Thanks
Supposing you have a field to identify each student by a unique id (id_student), here is a cheap alternative:
CREATE OR REPLACE VIEW v_student_payment AS
SELECT 0 AS db, payment, id_student FROM database-0
UNION
SELECT 1 AS db, payment, id_student FROM database-1
UNION
SELECT 2 AS db, payment, id_student FROM database-2
UNION
SELECT 3 AS db, payment, id_student FROM database-3
/* here you have to add all databases you're using. There's a little maintenance cost, for if one day there's a new database to be created this view would have to be modified */
;
SELECT
original_student,
Other_student,
month,
v.payment
FROM
student s
JOIN v_student_payment v ON v.id_student = s.id_student AND v.db = SUBSTRING(other_student,3,1)
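Here is a rough, runnable sketch of the view idea, under several assumptions: it uses Python's sqlite3 module rather than MySQL, the per-database payment tables are simulated by two hypothetical tables (payment_db4 and payment_db5) in one database, and the sample rows are invented so that the two students resolve to different sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student(id_student INTEGER, original_student TEXT, other_student TEXT);
-- Invented rows: the 3rd digit of other_student is 4 for one and 5 for the other.
INSERT INTO student VALUES (1, '1010173', '1240240'), (2, '1010173', '1251249');

-- Hypothetical stand-ins for the payment table of "database 4" and "database 5".
CREATE TABLE payment_db4(id_student INTEGER, month TEXT, payment REAL);
CREATE TABLE payment_db5(id_student INTEGER, month TEXT, payment REAL);
INSERT INTO payment_db4 VALUES (1, '2020-01', 100.0), (2, '2020-01', 999.0);
INSERT INTO payment_db5 VALUES (2, '2020-02', 250.0);

-- The view tags every row with the source it came from.
CREATE VIEW v_student_payment AS
    SELECT 4 AS db, id_student, month, payment FROM payment_db4
    UNION ALL
    SELECT 5 AS db, id_student, month, payment FROM payment_db5;
""")

rows = conn.execute("""
    SELECT s.original_student, s.other_student, v.month, v.payment
    FROM student s
    JOIN v_student_payment v
      ON v.id_student = s.id_student
     AND v.db = CAST(substr(s.other_student, 3, 1) AS INTEGER)
    ORDER BY s.id_student
""").fetchall()
print(rows)
```

The join condition on the db tag is what replaces the dynamic database name: student 2 also has a row in payment_db4, but only the row from the source matching the 3rd digit is returned.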
Did you try using a CASE statement in the join condition? Try looking at this link:
Use Case Statement in Join