MYSQL "every row excluded, but only once" join - mysql

Imagine a dance party where we line up boys on one side and girls on the other. Each boy is placed into table A, once and only once. Likewise for the girls in table B.
We then want to match up the boys in A once and only once with a girl in B, and likewise the girls in B need to match with one and only one boy in A.
A would have an ID column as a primary key and a b_id that would hold the ID of the girl they were matched with, or NULL if not yet matched.
B would have the same for the girls.
Tables would look like this:
table A (boys)
---------
|id|b_id|
---------
| 1|NULL|
| 2|NULL|
| 3|NULL|
| 4|NULL|
| 5|NULL|
| 6|NULL|
---------
table B (girls)
---------
|id|a_id|
---------
| 1|NULL|
| 2|NULL|
| 3|NULL|
| 4|NULL|
| 5|NULL|
| 6|NULL|
---------
What type of query would match one and only one boy with one and only one girl that is not yet matched? Results would look something like this:
results
-------------------------
|a.id|a.b_id|b.id|b.a_id|
-------------------------
| 1| NULL | 1| NULL |
| 2| NULL | 2| NULL |
| 3| NULL | 3| NULL |
| 4| NULL | 4| NULL |
| 5| NULL | 5| NULL |
| 6| NULL | 6| NULL |
-------------------------
I can get these results by using a temp table with unique indexes, but I'm looking for a query to give me the results without a temp table as the real application of this query is going against millions of rows on both sides.

Number the unmatched rows on each side, then join the two sides on that number:
SELECT boyUnmatched.id AS aid
, NULL AS b_id -- you really don't need these two
, girlUnmatched.id AS bid
, NULL AS a_id -- columns, do you?
FROM
( SELECT @rownuma := @rownuma+1 AS rn -- "rank" is a reserved word in MySQL 8.0+
, id
FROM a
, (SELECT @rownuma :=0) AS dummy
WHERE b_id IS NULL
ORDER BY id
) AS boyUnmatched
JOIN
( SELECT @rownumb := @rownumb+1 AS rn
, id
FROM b
, (SELECT @rownumb :=0) AS dummy
WHERE a_id IS NULL
ORDER BY id
) AS girlUnmatched
ON boyUnmatched.rn = girlUnmatched.rn
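Where window functions are available (MySQL 8.0 and later), the same pairing can be written with ROW_NUMBER() instead of user variables. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL (assuming the bundled SQLite is 3.25 or newer), and the sample rows and the helper names build_pairing_demo and pair_unmatched are made up for the example.

```python
import sqlite3

# Number the unmatched rows on each side, then join on that number.
PAIR_UNMATCHED_SQL = """
SELECT boy.id AS a_id, girl.id AS b_id
FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
      FROM a WHERE b_id IS NULL) AS boy
JOIN (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
      FROM b WHERE a_id IS NULL) AS girl
  ON boy.rn = girl.rn
ORDER BY boy.rn
"""


def build_pairing_demo():
    """Create an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER);
        CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER);
        -- boy 1 and girl 9 are already matched with each other
        INSERT INTO a VALUES (1, 9), (2, NULL), (3, NULL), (5, NULL);
        INSERT INTO b VALUES (4, NULL), (7, NULL), (9, 1);
    """)
    return conn


def pair_unmatched(conn):
    """Return (boy_id, girl_id) pairs, each unmatched id used at most once."""
    return conn.execute(PAIR_UNMATCHED_SQL).fetchall()
```

With these sample rows, boys 2, 3 and 5 and girls 4 and 7 are unmatched, so the pairs are (2, 4) and (3, 7); boy 5 stays unpaired because no girl is left.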


How can I refine this query?

You might want to have a look at my previous question.
My database schema looks like this
               ---------------                         ---------------
               | candidate 1 |                         | candidate 2 |
               ---------------                         ---------------
              /               \                               |
      ---------               ---------                      etc
      | job 1 |               | job 2 |
      ---------               ---------
      /       \               /       \
-----------  ----------  -----------  ----------
| company |  | skills |  | company |  | skills |
-----------  ----------  -----------  ----------
Here's my database:
mysql> describe jobs;
+--------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------+------+-----+---------+----------------+
| job_id | int(11) | NO | PRI | NULL | auto_increment |
| candidate_id | int(11) | NO | MUL | NULL | |
| company_id | int(11) | NO | MUL | NULL | |
| start_date | date | NO | MUL | NULL | |
| end_date | date | NO | MUL | NULL | |
+--------------+---------+------+-----+---------+----------------+
mysql> describe candidates;
+----------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+----------+------+-----+---------+----------------+
| candidate_id | int(11) | NO | PRI | NULL | auto_increment |
| candidate_name | char(50) | NO | MUL | NULL | |
| home_city | char(50) | NO | MUL | NULL | |
+----------------+----------+------+-----+---------+----------------+
mysql> describe companies;
+-------------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------------+------+-----+---------+----------------+
| company_id | int(11) | NO | PRI | NULL | auto_increment |
| company_name | char(50) | NO | MUL | NULL | |
| company_city | char(50) | NO | MUL | NULL | |
| company_post_code | char(50) | NO | | NULL | |
| latitude | decimal(11,8) | NO | | NULL | |
| longitude | decimal(11,8) | NO | | NULL | |
+-------------------+---------------+------+-----+---------+----------------+
Note that I should probably call this skill_usage, as it indicates when a skill was used on a job.
mysql> describe skills;
+----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| skill_id | int(11) | NO | MUL | NULL | |
| job_id | int(11) | NO | MUL | NULL | |
+----------+---------+------+-----+---------+-------+
mysql> describe skill_names;
+------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+----------------+
| skill_id | int(11) | NO | PRI | NULL | auto_increment |
| skill_name | char(32) | NO | MUL | NULL | |
+------------+----------+------+-----+---------+----------------+
So far, my MySQL query looks like this:
SELECT DISTINCT can.candidate_id,
can.candidate_name,
can.candidate_city,
j.job_id,
j.company_id,
DATE_FORMAT(j.start_date, "%b %Y") AS start_date,
DATE_FORMAT(j.end_date, "%b %Y") AS end_date,
s.skill_id
FROM candidates AS can
INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id
INNER JOIN companies AS co ON j.company_id = co.company_id
INNER JOIN skills AS s ON s.job_id = j.job_id
INNER JOIN skill_names AS sn ON s.skill_id = s.skill_id
AND sn.skill_id = s.skill_id
ORDER by can.candidate_id, j.job_id
I am getting output like this, but am not satisfied with it
+--------------+----------------+---------------------+--------+------------+------------+------------+----------+
| candidate_id | candidate_name | candidate_city | job_id | company_id | start_date | end_date | skill_id |
+--------------+----------------+---------------------+--------+------------+------------+------------+----------+
| 1 | Pamela Brown | Cardiff | 1 | 3 | 2019-01-01 | 2019-08-31 | 1 |
| 1 | Pamela Brown | Cardiff | 1 | 3 | 2019-01-01 | 2019-08-31 | 2 |
| 1 | Pamela Brown | Cardiff | 1 | 3 | 2019-01-01 | 2019-08-31 | 1 |
| 1 | Pamela Brown | Cardiff | 2 | 2 | 2018-06-01 | 2019-01-31 | 3 |
| 1 | Pamela Brown | Cardiff | 3 | 1 | 2017-11-01 | 2018-06-30 | 4 |
| 1 | Pamela Brown | Cardiff | 3 | 1 | 2017-11-01 | 2018-06-30 | 5 |
| 1 | Pamela Brown | Cardiff | 3 | 1 | 2017-11-01 | 2018-06-30 | 6 |
| 1 | Pamela Brown | Cardiff | 4 | 3 | 2016-08-01 | 2017-11-30 | 1 |
| 2 | Christine Hill | Salisbury | 5 | 2 | 2018-02-01 | 2019-05-31 | 3 |
Now, I would like to restrict the search, by specifying "skill", like Python, C, C++, UML, etc and company names
The user will enter something like Python AND C++ into a skill search box (and/or Microsoft OR Google into a company name search box).
How do I feed that into my query? Please bear in mind that each skill ID has a job Id associated with it. Maybe I first need to convert the skill names from the search (in this case Python C++) into skill Ids? Even so, how do I include that in my query?
To make a few things clearer:
both the skills & company search box can be empty, which I will interpret as "return everything"
search terms can include the keywords AND and OR, with grouping brackets (NOT is not required). I am happy enough to parse that in PHP & turn it into a MySQL query term (my difficulty is only with SQL, not PHP)
It looks like I made a start, with that INNER JOIN skills AS s ON s.job_id = j.job_id, which I think will handle a search for a single skill, given its ... name ? ... Id?
I suppose my question would be how would that query look if, for example, I wanted to restrict the results to anyone who had worked at Microsoft OR Google and has the skills Python AND C++?
If I get an example for that, I can extrapolate, but, at this point, I am unsure whether I want more INNER JOINs or WHERE clauses.
I think that I want to extend that second-to-last line, AND sn.skill_id = s.skill_id, by parsing the skills search string (in my example Python AND C++) and generating some SQL along the lines of AND (s.skill_id = X), where X is the skill ID for Python. But I don't know how to handle Python AND C++, or something more complex, like Python AND (C OR C++).
Update
Just to be clear, the users are technical and expect to be able to enter complex searches. E.g for skills: (C AND kernel)OR (C++ AND realtime) OR (Doors AND (UML OR QT)).
Final update
The requirements just changed. The person that I am coding this for just told me that if a candidate matches the skill search on any job that he ever worked, then I ought to return ALL jobs for that candidate.
That sounds counter-intuitive to me, but he swears that that is what he wants. I am not sure it can even be done in a single query (I am considering multiple queries: a first to get the candidates with matching skills, then a second to get all of their jobs).
The first thing I'd say is that your original query probably needs an outer join on the skills table - as it stands, it only retrieves people whose job has a skill (which may not be all jobs). You say that "both the skills & company search box can be empty, which I will interpret as return everything" - this version of the query will not return everything.
Secondly, I'd rename your "skills" table to "job_skills", and your "skill_names" to "skills" - it's more consistent (your companies table is not called company_names).
The query you show has a duplication - AND sn.skill_id = s.skill_id duplicates the terms of your join. Is that intentional?
To answer your question: I would present the skills to your users in some kind of pre-defined list in your PHP, associated with a skill_id. You could have all skills listed with check boxes, or allow the user to start typing and use AJAX to search for skills matching the text. This solves a UI problem (what if the user tries to search for a skill that doesn't exist?), and makes the SQL slightly easier.
Your query then becomes:
SELECT DISTINCT can.candidate_id,
can.candidate_name,
can.candidate_city,
j.job_id,
j.company_id,
DATE_FORMAT(j.start_date, "%b %Y") AS start_date,
DATE_FORMAT(j.end_date, "%b %Y") AS end_date,
s.skill_id
FROM candidates AS can
INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id
INNER JOIN companies AS co ON j.company_id = co.company_id
INNER JOIN skills AS s ON s.job_id = j.job_id
INNER JOIN skill_names AS sn ON sn.skill_id = s.skill_id
AND (s.skill_id in (?, ?, ?)
OR s.skill_id in (?))
ORDER by can.candidate_id, j.job_id
You need to replace the question marks with the skill IDs your users have selected.
EDIT
The problem with allowing users to enter the skills as free text is that you then have to deal with case conversion, white space and typos. For instance, is "python " the same as "Python"? Your user probably intends it to be, but you can't do a simple comparison with skill_name. If you want to allow free text, one solution might be to add a "normalized" skill_name column in which you store the name in a consistent format (e.g. "all upper case, stripped of whitespace"), and you normalize your input values in the same way, then compare to that normalized column. In that case, the "in clause" becomes something like:
AND s.skill_id in (select skill_id from skill_names where skill_name_normalized in (?, ?, ?))
The boolean logic you mention - (C OR C++) AND (Agile) - gets pretty tricky. You end up writing a "visual query builder". You may want to Google this term - there are some good examples.
You've narrowed down your requirements somewhat (I may misunderstand). I believe your requirements are
I want to be able to specify zero or more filters.
A filter consists of one or more ANDed skill groups.
A skill group consists of one or more skills.
Filters are ORed together to create a query.
To make this concrete, let's use your example - (A and (B OR C)) OR (D AND (E OR F)). There are two filters: (A and (B OR C)) and (D AND (E OR F)). The first filter has two skill groups: A and (B OR C).
It's hard to explain the suggestion in text, but you could create a UI that allows users to specify individual "filters". Each "filter" would allow the user to specify one or more "in clauses", joined with an "and". You could then convert this into SQL - again, using your example, the SQL query becomes
SELECT DISTINCT can.candidate_id,
can.candidate_name,
can.candidate_city,
j.job_id,
j.company_id,
DATE_FORMAT(j.start_date, "%b %Y") AS start_date,
DATE_FORMAT(j.end_date, "%b %Y") AS end_date,
s.skill_id
FROM candidates AS can
INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id
INNER JOIN companies AS co ON j.company_id = co.company_id
INNER JOIN skills AS s ON s.job_id = j.job_id
INNER JOIN skill_names AS sn ON sn.skill_id = s.skill_id
AND (
(s.skill_id in (A) and s.skill_id in (B, C))
OR
(s.skill_id in (D) and s.skill_id in (E, F))
)
ORDER by can.candidate_id, j.job_id
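One thing to watch with a condition like the one above: each joined row carries exactly one skill_id, so a test of the form "skill A and skill B" can never be true for a single row. A common alternative is one EXISTS subquery per skill, so that every skill is checked against the job as a whole. The following is a minimal sketch of that idea, not a definitive implementation. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the nested-tuple filter format, the helper names and the sample data are assumptions made for illustration. It also covers the "return all jobs of a matching candidate" requirement from the final update.

```python
import sqlite3

# One EXISTS per skill: true when job j uses the named skill.
SKILL_EXISTS = (
    "EXISTS (SELECT 1 FROM skills AS s "
    "JOIN skill_names AS sn ON sn.skill_id = s.skill_id "
    "WHERE s.job_id = j.job_id AND sn.skill_name = ?)"
)


def skill_predicate(expr, params):
    """expr is a skill name, or ("AND" | "OR", expr, expr, ...)."""
    if isinstance(expr, str):
        params.append(expr)  # bound as a parameter, never pasted into the SQL
        return SKILL_EXISTS
    op, *operands = expr
    if op not in ("AND", "OR") or not operands:
        raise ValueError(f"bad filter node: {expr!r}")
    return "(" + f" {op} ".join(skill_predicate(e, params) for e in operands) + ")"


def jobs_for_matching_candidates(conn, expr):
    """All jobs of every candidate with at least one job matching expr."""
    params = []
    where = skill_predicate(expr, params) if expr is not None else "1 = 1"
    sql = (
        "SELECT aj.candidate_id, aj.job_id FROM jobs AS aj "
        "WHERE aj.candidate_id IN "
        "(SELECT j.candidate_id FROM jobs AS j WHERE " + where + ") "
        "ORDER BY aj.candidate_id, aj.job_id"
    )
    return conn.execute(sql, params).fetchall()


def build_skills_demo():
    """Hypothetical data: only candidate 1 used Python and C++ on one job."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, candidate_id INTEGER);
        CREATE TABLE skills (skill_id INTEGER, job_id INTEGER);
        CREATE TABLE skill_names (skill_id INTEGER PRIMARY KEY, skill_name TEXT);
        INSERT INTO skill_names VALUES (1, 'Python'), (2, 'C++'), (3, 'C');
        INSERT INTO jobs VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 3);
        INSERT INTO skills VALUES (1, 1), (2, 1), (3, 2), (1, 3), (2, 4), (3, 5);
    """)
    return conn
```

For example, jobs_for_matching_candidates(conn, ("AND", "Python", "C++")) returns both jobs of candidate 1 and nothing for candidate 2, who used the two skills on different jobs. Passing None as the filter returns every job, matching the "empty search box" rule in the question.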
Building a bit off previous comments and answers... if handling input like
(A and (B OR C)) OR (D AND (E OR F))
is the blocker you could try moving some of the conditional logic out of the joins and filter instead.
WHERE (
((sn.skill_id LIKE 'A') AND ((sn.skill_id LIKE ('B')) OR (sn.skill_id LIKE('C'))))
AND ((co.company_id IN (1,2,3)) AND ((can.city = 'Springfield') OR (j.city LIKE('Mordor'))))
)
You can build your query string from the user input: look up the IDs for the selected values, put them into the string, and conditionally add as many filters as you like. Think about setting up add_and_filter and add_or_filter functions to construct the <db>.<field> <CONDITION> <VALUE> statements.
$qs = "";
$qs .= "select val from table";
...
$qs .= " WHERE ";
if($userinput){ $qs .= add_and_filter($userinput); } // .= appends; += would do numeric addition
Alternatively, look at a map/reduce pattern rather than trying to do it all in SQL.
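For the parsing step that the question mentions, a small recursive-descent parser is usually enough. The sketch below is in Python rather than PHP and rests on a few assumptions that are not in the original: skill names contain no spaces or brackets, AND binds tighter than OR, and the result is a nested tuple such as ("AND", "Python", ("OR", "C", "C++")). An empty search box yields None, meaning "no filter". The function name parse_search is hypothetical.

```python
import re


def parse_search(text):
    """Parse e.g. 'Python AND (C OR C++)' into a nested tuple tree."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of search string")
        pos += 1
        return tokens[pos - 1]

    def is_keyword(tok, word):
        return tok is not None and tok.upper() == word

    def parse_or():
        # expr := term (OR term)*
        operands = [parse_and()]
        while is_keyword(peek(), "OR"):
            take()
            operands.append(parse_and())
        return operands[0] if len(operands) == 1 else ("OR", *operands)

    def parse_and():
        # term := atom (AND atom)*
        operands = [parse_atom()]
        while is_keyword(peek(), "AND"):
            take()
            operands.append(parse_atom())
        return operands[0] if len(operands) == 1 else ("AND", *operands)

    def parse_atom():
        # atom := NAME | "(" expr ")"
        tok = take()
        if tok == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok == ")" or tok.upper() in ("AND", "OR"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok

    if not tokens:
        return None  # empty search box: no filter at all
    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree
```

The resulting tree can then be walked to emit one SQL condition per skill name, with each name bound as a query parameter rather than concatenated into the string.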

How to calculate count of each value in MySQL JSON array?

I have a MySQL table with the following definition:
mysql> desc person;
+--------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| name | text | YES | | NULL | |
| fruits | json | YES | | NULL | |
+--------+---------+------+-----+---------+-------+
The table has some sample data as follows:
mysql> select * from person;
+----+------+----------------------------------+
| id | name | fruits |
+----+------+----------------------------------+
| 1 | Tom | ["apple", "orange"] |
| 2 | John | ["apple", "mango"] |
| 3 | Tony | ["apple", "mango", "strawberry"] |
+----+------+----------------------------------+
How can I calculate the total number of occurrences for each fruit? For example:
+------------+-------+
| fruit | count |
+------------+-------+
| apple | 3 |
| orange | 1 |
| mango | 2 |
| strawberry | 1 |
+------------+-------+
Some research shows that the JSON_LENGTH function can be used but I cannot find an example similar to my scenario.
You can use the JSON_EXTRACT() function to extract the value at each array position (up to three elements in the sample data), and then apply UNION ALL to combine the queries:
SELECT comp, count(*)
FROM
(
SELECT JSON_EXTRACT(fruits, '$[0]') as comp FROM person UNION ALL
SELECT JSON_EXTRACT(fruits, '$[1]') as comp FROM person UNION ALL
SELECT JSON_EXTRACT(fruits, '$[2]') as comp FROM person
) q
WHERE comp is not null
GROUP BY comp
If your MySQL version is 8.0 or later, you can also use the JSON_TABLE() function:
SELECT j.fruit, count(*)
FROM person p
JOIN JSON_TABLE(
p.fruits,
'$[*]' columns (fruit varchar(50) path '$')
) j
GROUP BY j.fruit;
You can't do it without first creating a table with one row per fruit.
CREATE TABLE allfruits (fruit VARCHAR(10) PRIMARY KEY);
INSERT INTO allfruits VALUES ('apple'), ('orange'), ('mango'), ('strawberry');
There is not a good way to generate this from the JSON.
Once you have that table, you can join it to the JSON and then use GROUP BY to count the occurrences.
SELECT fruit, COUNT(*) AS count
FROM allfruits
JOIN person ON JSON_SEARCH(person.fruits, 'one', fruit) IS NOT NULL
GROUP BY fruit;
Output:
+------------+-------+
| fruit | count |
+------------+-------+
| apple | 3 |
| mango | 2 |
| orange | 1 |
| strawberry | 1 |
+------------+-------+
Note that it will do a table-scan on the person table to find each fruit. This is pretty inefficient, and as your person table gets larger, it will become a performance problem.
If you want to optimize for this type of query, then you shouldn't use JSON to store an array of fruits. You should store data in a normalized way, representing the many-to-many relationship between persons and fruits with another table.
This is related to my answer to Is storing a delimited list in a database column really that bad?
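To make the normalized alternative concrete, here is a minimal sketch, assuming a fruit table plus a person_fruit link table (both names are hypothetical and not part of the original schema). It runs against Python's built-in sqlite3 module as a stand-in for MySQL; the counting query itself is plain SQL.

```python
import sqlite3


def build_fruit_demo():
    """Create the hypothetical normalized tables with the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fruit (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        -- one row per (person, fruit) pair replaces the JSON array
        CREATE TABLE person_fruit (
            person_id INTEGER NOT NULL REFERENCES person(id),
            fruit_id  INTEGER NOT NULL REFERENCES fruit(id),
            PRIMARY KEY (person_id, fruit_id)
        );
        INSERT INTO person VALUES (1, 'Tom'), (2, 'John'), (3, 'Tony');
        INSERT INTO fruit VALUES
            (1, 'apple'), (2, 'orange'), (3, 'mango'), (4, 'strawberry');
        INSERT INTO person_fruit VALUES
            (1, 1), (1, 2), (2, 1), (2, 3), (3, 1), (3, 3), (3, 4);
    """)
    return conn


def fruit_counts(conn):
    """Count how many persons have each fruit."""
    return conn.execute("""
        SELECT f.name, COUNT(*) AS cnt
        FROM person_fruit AS pf
        JOIN fruit AS f ON f.id = pf.fruit_id
        GROUP BY f.name
        ORDER BY cnt DESC, f.name
    """).fetchall()
```

With this layout the count is an ordinary join plus GROUP BY, and an index on person_fruit (fruit_id) can serve lookups by fruit.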
I think the simplest solution would be to use JSON_TABLE function.
The query you need is
select ft.fruit, count(ft.fruit) from person,
json_table(
fruits,
'$[*]' columns(
fruit varchar(128) path '$'
)
) as ft
group by ft.fruit
;
You can find a working example in this dbfiddle.

MySQL self join return all rows

Per the example data below, I need a query that returns every row, where if the 'contingent_on' field is NULL, it is returned as NULL, but if it is not NULL it is returned with the 'ticket_name' corresponding to the 'primary_key' value.
I tried self join queries but could only get them to return the not NULL rows.
example table data:
primary_key | ticket_name | contingent_on
1 | site preparation | NULL
2 | tender process | NULL
3 | construction | 1
All rows should be returned; in the 'construction' row, 'site preparation' should appear in place of '1' in the 'contingent_on' field.
You need a self left join:
select
t.primary_key,
t.ticket_name,
tt.ticket_name ticket_name2
from tablename t left join tablename tt
on tt.primary_key = t.contingent_on
order by t.primary_key
See the demo.
Results:
| primary_key | ticket_name | ticket_name2 |
| ----------- | ---------------- | ---------------- |
| 1 | site preparation | null |
| 2 | tender process | null |
| 3 | construction | site preparation |
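The self left join can be tried end to end with a small script. This is a sketch under assumptions: the table name tickets and the helper names are made up, and Python's built-in sqlite3 module stands in for MySQL.

```python
import sqlite3


def build_tickets_demo():
    """Create the example table from the question (table name is hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tickets (
            primary_key   INTEGER PRIMARY KEY,
            ticket_name   TEXT NOT NULL,
            contingent_on INTEGER
        );
        INSERT INTO tickets VALUES
            (1, 'site preparation', NULL),
            (2, 'tender process', NULL),
            (3, 'construction', 1);
    """)
    return conn


def tickets_with_dependency(conn):
    """Every ticket, with the name of the ticket it depends on (or None)."""
    return conn.execute("""
        SELECT t.primary_key, t.ticket_name, parent.ticket_name AS contingent_on
        FROM tickets AS t
        LEFT JOIN tickets AS parent
               ON parent.primary_key = t.contingent_on
        ORDER BY t.primary_key
    """).fetchall()
```

The LEFT JOIN keeps rows whose contingent_on is NULL; an INNER JOIN would drop them, which is why only the non-NULL rows came back in the earlier attempts.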
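A correlated subquery also works. Note that a CASE over the row's own columns would return that row's own ticket_name, so the name has to be looked up by key:
select
t.primary_key,
t.ticket_name,
(select tt.ticket_name from <<your_table>> tt where tt.primary_key = t.contingent_on) as contingent_on
from <<your_table>> t
order by t.primary_key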

MySQL re-order row priority field values and also make them sequential

Given the following table:
+--------+-------------------+-----------+
| ID | Name | Priority |
+--------+-------------------+-----------+
| 1 | Andy | 1 |
| 2 | Bob | 2 |
| 3 | David | 8 |
| 4 | Edward | 9 |
| 5 | CHARLES | 15 |
+--------+-------------------+-----------+
I would like to move CHARLES to between Bob and David by Priority value (ignore the alphabetical list, this is just to make the desired result obvious).
(Also note the Priority values may not be sequential)
To do this I need to change CHARLES' current Priority (15) to Bob's Priority+1, and update David and Edward's Priority to Priority+1.
I can DO this if I know two things, the id of CHARLES and the Priority value of the row he must be after (Bob):
UPDATE mytable SET Priority =
IF(ID = :charles_id, :bob_priority + 1,
IF(Priority >= :bob_priority,
Priority + 1, Priority))
The PROBLEM or at least question is, how could I compress the resulting values to 1,2,3,4,5 instead of 1,2,3,9,10 - and do it in one shot?
Oracle has a "pseudo field" which is the index of the row, but I don't know of anything equivalent in MySQL.
The first part of the problem is fairly trivial...
DROP TABLE IF EXISTS priorities;
CREATE TABLE priorities
(ID SERIAL PRIMARY KEY
,Name VARCHAR(12) NOT NULL
,Priority INT NOT NULL
,INDEX(priority)
);
INSERT INTO priorities VALUES
(101,'Andy',1),
(108,'Bob',2),
(113,'David',8),
(124,'Edward',9),
(155,'CHARLES',15);
UPDATE priorities a
JOIN
( SELECT x.id,x.name, @i:=@i+1 priority FROM priorities x, (SELECT @i:=0) vars ORDER BY id) b
ON b.id = a.id
SET a.priority = b.priority;
SELECT * FROM priorities;
+-----+---------+----------+
| ID | Name | Priority |
+-----+---------+----------+
| 101 | Andy | 1 |
| 108 | Bob | 2 |
| 113 | David | 3 |
| 124 | Edward | 4 |
| 155 | CHARLES | 5 |
+-----+---------+----------+
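The move and the renumbering can also be combined on the application side. The sketch below is not a single-statement SQL solution: it is a minimal illustration that reads the current order, moves one id, and writes back priorities 1..n inside one transaction. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the helper names move_after and build_priorities_demo are assumptions made for the example.

```python
import sqlite3


def build_priorities_demo():
    """Create the sample table from the answer above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE priorities (
            id       INTEGER PRIMARY KEY,
            name     TEXT NOT NULL,
            priority INTEGER NOT NULL
        );
        INSERT INTO priorities VALUES
            (101, 'Andy', 1), (108, 'Bob', 2), (113, 'David', 8),
            (124, 'Edward', 9), (155, 'CHARLES', 15);
    """)
    return conn


def move_after(conn, moved_id, after_id):
    """Place moved_id directly after after_id, then renumber 1..n."""
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM priorities ORDER BY priority, id")]
    ids.remove(moved_id)                           # take the row out of the order
    ids.insert(ids.index(after_id) + 1, moved_id)  # put it back after the target
    with conn:  # one transaction: all priorities change or none do
        conn.executemany(
            "UPDATE priorities SET priority = ? WHERE id = ?",
            [(rank, row_id) for rank, row_id in enumerate(ids, start=1)],
        )
```

Calling move_after(conn, 155, 108) on the sample data moves CHARLES to directly after Bob and leaves the priorities as 1 to 5 with no gaps. This assumes there is no unique index on priority, as in the table definition above.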

MySQL - Select everything from one table, but only first matching value in second table

I'm feeling a little rusty with creating queries in MySQL. I thought I could solve this, but I'm having no luck and searching around doesn't result in anything similar...
Basically, I have two tables. I want to select everything from one table and the matching row from the second table. However, I only want to have the first result from the second table. I hope that makes sense.
The rows in the daily_entries table are unique. There will be one row for each day, but maybe not every day. The second table, notes, contains many rows, each of which is associated with ONE row from daily_entries.
Below are examples of my tables;
Table One
mysql> desc daily_entries;
+----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+----------------+
| eid | int(11) | NO | PRI | NULL | auto_increment |
| date | date | NO | | NULL | |
| location | varchar(100) | NO | | NULL | |
+----------+--------------+------+-----+---------+----------------+
Table Two
mysql> desc notes;
+---------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------+------+-----+---------+----------------+
| task_id | int(11) | NO | PRI | NULL | auto_increment |
| eid | int(11) | NO | MUL | NULL | |
| notes | text | YES | | NULL | |
+---------+---------+------+-----+---------+----------------+
What I need to do, is select all entries from notes, with only one result from daily_entries.
Below is an example of how I want it to look:
+----------------------------------------------+---------+------------+----------+-----+
| notes | task_id | date | location | eid |
+----------------------------------------------+---------+------------+----------+-----+
| Another note | 3 | 2014-01-02 | Home | 2 |
| Enter a note. | 1 | 2014-01-01 | Away | 1 |
| This is a test note. To see what happens. | 2 | | Away | 1 |
| Testing another note | 4 | | Away | 1 |
+----------------------------------------------+---------+------------+----------+-----+
4 rows in set (0.00 sec)
Below is the query that I currently have:
SELECT notes.notes, notes.task_id, daily_entries.date, daily_entries.location, daily_entries.eid
FROM daily_entries
LEFT JOIN notes ON daily_entries.eid=notes.eid
ORDER BY daily_entries.date DESC
Below is an example of how it looks with my query:
+----------------------------------------------+---------+------------+----------+-----+
| notes | task_id | date | location | eid |
+----------------------------------------------+---------+------------+----------+-----+
| Another note | 3 | 2014-01-02 | Home | 2 |
| Enter a note. | 1 | 2014-01-01 | Away | 1 |
| This is a test note. To see what happens. | 2 | 2014-01-01 | Away | 1 |
| Testing another note | 4 | 2014-01-01 | Away | 1 |
+----------------------------------------------+---------+------------+----------+-----+
4 rows in set (0.00 sec)
At first I thought I could simply GROUP BY daily_entries.date, however that returned only the first row of each matching set. Can this even be done? I would greatly appreciate any help someone can offer. Using Limit at the end of my query obviously limited it to the value that I specified, but applied it to everything which was to be expected.
Basically, there's nothing wrong with your query. I believe it is exactly what you need, because it returns the data you want. Don't look at it as duplicating your daily_entries; look at it as returning every note together with its associated daily_entry.
Of course, you can achieve what you described in your question (there's already an answer that solves this), but think twice before you do it, because such nested queries can add noticeable performance overhead on your database server.
I'd recommend keeping your query as simple as possible, with a single LEFT JOIN (which is all you need), and then letting the consuming applications manipulate the data and present it the way they need to.
Use mysql's non-standard group by functionality:
SELECT n.notes, n.task_id, de.date, de.location, de.eid
FROM notes n
LEFT JOIN (select * from
(select * from daily_entries ORDER BY date DESC) x
group by eid) de ON de.eid = n.eid
You need to do these queries with explicit filtering for the first row. This example uses a join to do this:
SELECT n.notes, n.task_id, de.date, de.location, de.eid
FROM daily_entries de LEFT JOIN
notes n
ON de.eid = n.eid LEFT JOIN
(select n.eid, min(task_id) as min_task_id
from notes n
group by n.eid
) nmin
on n.task_id = nmin.min_task_id
ORDER BY de.date DESC;
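If the goal is the layout shown in the question, with the date printed only on the first note of each entry, one option is to keep the plain join and blank the date with a CASE expression. The sketch below is one possible reading of the requirement, not a definitive answer. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the helper names and the sample rows are assumptions based on the example output.

```python
import sqlite3

FIRST_NOTE_DATE_SQL = """
SELECT n.notes, n.task_id,
       -- show the date only on the first note (lowest task_id) of each entry,
       -- and on entries that have no notes at all
       CASE WHEN n.task_id IS NULL OR n.task_id = first_note.min_task_id
            THEN de.date END AS date,
       de.location, de.eid
FROM daily_entries AS de
LEFT JOIN notes AS n ON n.eid = de.eid
LEFT JOIN (SELECT eid, MIN(task_id) AS min_task_id
           FROM notes GROUP BY eid) AS first_note
       ON first_note.eid = de.eid
ORDER BY de.date DESC, n.task_id
"""


def build_notes_demo():
    """Create both tables with rows matching the example output."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE daily_entries (
            eid INTEGER PRIMARY KEY, date TEXT NOT NULL, location TEXT NOT NULL);
        CREATE TABLE notes (
            task_id INTEGER PRIMARY KEY, eid INTEGER NOT NULL, notes TEXT);
        INSERT INTO daily_entries VALUES
            (1, '2014-01-01', 'Away'), (2, '2014-01-02', 'Home');
        INSERT INTO notes VALUES
            (1, 1, 'Enter a note.'),
            (2, 1, 'This is a test note. To see what happens.'),
            (3, 2, 'Another note'),
            (4, 1, 'Testing another note');
    """)
    return conn


def notes_with_first_date(conn):
    """All notes; the entry date appears only on each entry's first note."""
    return conn.execute(FIRST_NOTE_DATE_SQL).fetchall()
```

Every note is still returned; only the repeated date values come back as NULL (None in Python), which the presentation layer can render as blank.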