Remove duplicate rows using MIN() function with GROUP BY and INNER JOIN - mysql

I am trying to list each person's minimum time, MIN(time), from the Results table, joined with the Athletes table, but I am getting duplicates.
Here is some sample data (I am running MySQL 5.7):
Results Table
+----------+-----------+---------+----------+-------+-------------+------------+
| resultID | athleteID | eventID | ageGroup | time | venue | date |
+----------+-----------+---------+----------+-------+-------------+------------+
| 1 | 10 | 1 | MS | 10.20 | Tokyo | 06-06-2019 |
| 2 | 11 | 1 | MS | 10.24 | London | 03-08-2019 |
| 3 | 10 | 1 | MS | 10.20 | Los Angeles | 01-11-2019 |
| 4 | 13 | 1 | MS | 10.29 | Glasgow | 28-10-2019 |
| 5 | 14 | 1 | MS | 10.32 | Oslo | 16-07-2019 |
| ... | ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... |
+----------+-----------+---------+----------+-------+-------------+------------+
Athletes Table
+-----------+-----------+----------+--------+-------------+
| athleteID | nameFirst | nameLast | gender | dateOfBirth |
+-----------+-----------+----------+--------+-------------+
| 10 | Bill | Smith | MS | 10-11-2000 |
| 11 | John | Brown | MS | 1-08-1999 |
| 12 | Steve | Jones | MS | 16-01-1997 |
| 13 | Alan | Green | MS | 21-07-2001 |
| 14 | Paul | Black | MS | 27-10-2000 |
| ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... |
+-----------+-----------+----------+--------+-------------+
I have tried the following query, which appears to return the correct result set but includes duplicates: Bill Smith ran 10.20 twice, and I only need to show one of those results.
I have tried adding DISTINCT to both SELECTs with no luck, so this is what I have:
SELECT *
FROM results
INNER JOIN (
SELECT athleteID, nameFirst, nameLast, MIN(time) as minTime
FROM results
INNER JOIN athletes USING(athleteID)
WHERE eventID = '1'
AND ageGroup IN('MS')
AND YEAR(results.date) = '2019'
GROUP BY athleteID
) AS child ON (results.athleteID = child.athleteID) AND (results.time = minTime)
HAVING YEAR(results.date) = '2019'
ORDER BY minTime ASC
I get this result:
+-------+-----------+----------+-------------+------------+
| time | nameFirst | nameLast | venue | date |
+-------+-----------+----------+-------------+------------+
| 10.20 | Bill | Smith | Tokyo | 06-06-2019 |
| 10.20 | Bill | Smith | Los Angeles | 01-11-2019 |
| 10.24 | John | Brown | London | 03-08-2019 |
| 10.29 | Steve | Jones | Glasgow | 28-10-2019 |
| 10.32 | Alan | Green | Oslo | 16-07-2019 |
| ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... |
+-------+-----------+----------+-------------+------------+
As you can see, the second result for Bill Smith (10.20, Los Angeles) also shows up. I need it omitted so that only one result per athlete is shown, as below:
Desired Result
+-------+-----------+----------+---------+------------+
| time | nameFirst | nameLast | venue | date |
+-------+-----------+----------+---------+------------+
| 10.20 | Bill | Smith | Tokyo | 06-06-2019 |
| 10.24 | John | Brown | London | 03-08-2019 |
| 10.29 | Steve | Jones | Glasgow | 28-10-2019 |
| 10.32 | Alan | Green | Oslo | 16-07-2019 |
+-------+-----------+----------+---------+------------+
Any suggestions as to what I could try?
Many thanks in advance.

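Your athlete has the same minimum time twice, and DISTINCT cannot remove those rows because they differ in venue and date. In this case you also need the minimum date in the outer SELECT, with a GROUP BY that collapses the tied rows into one: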
SELECT r.athleteID, child.nameFirst, child.nameLast, MIN(r.date) AS date, child.minTime
FROM results r
INNER JOIN (
SELECT athleteID, nameFirst, nameLast
, MIN(time) as minTime
FROM results
INNER JOIN athletes USING(athleteID)
WHERE eventID = '1'
AND ageGroup IN('MS')
AND YEAR(results.date) = '2019'
GROUP BY athleteID
) AS child ON (r.athleteID = child.athleteID) AND (r.time = minTime)
WHERE YEAR(r.date) = '2019'
AND r.eventID = '1'
AND r.ageGroup = 'MS'
GROUP BY r.athleteID, child.nameFirst, child.nameLast, child.minTime
ORDER BY minTime ASC
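If you also need the venue, a GROUP BY cannot pick it deterministically. A minimal sketch of a different technique: a correlated subquery chooses exactly one resultID per athlete (fastest time, then earliest date, then lowest resultID). It runs against the question's sample rows on SQLite through Python's sqlite3 module rather than MySQL; that substitution, the ISO `YYYY-MM-DD` date storage, and the helper name `best_per_athlete` are assumptions. The SQL uses no window functions, so it is intended to be usable on MySQL 5.7 with the `?` placeholders replaced by literal values.

```python
import sqlite3

# Sample rows from the question. Dates are stored as ISO 'YYYY-MM-DD' text
# (an assumption) so they compare and sort chronologically.
RESULTS = [
    (1, 10, 1, "MS", 10.20, "Tokyo", "2019-06-06"),
    (2, 11, 1, "MS", 10.24, "London", "2019-08-03"),
    (3, 10, 1, "MS", 10.20, "Los Angeles", "2019-11-01"),
    (4, 13, 1, "MS", 10.29, "Glasgow", "2019-10-28"),
    (5, 14, 1, "MS", 10.32, "Oslo", "2019-07-16"),
]
ATHLETES = [
    (10, "Bill", "Smith"),
    (11, "John", "Brown"),
    (12, "Steve", "Jones"),
    (13, "Alan", "Green"),
    (14, "Paul", "Black"),
]

# For every results row, the correlated subquery finds that athlete's single
# best qualifying result (fastest time, then earliest date, then lowest
# resultID); only the row whose resultID matches is kept, so ties collapse.
BEST_PER_ATHLETE = """
SELECT r.time, a.nameFirst, a.nameLast, r.venue, r.date
FROM results r
JOIN athletes a ON a.athleteID = r.athleteID
WHERE r.resultID = (
    SELECT r2.resultID
    FROM results r2
    WHERE r2.athleteID = r.athleteID
      AND r2.eventID = ?
      AND r2.ageGroup = ?
      AND r2.date BETWEEN ? AND ?
    ORDER BY r2.time, r2.date, r2.resultID
    LIMIT 1
)
ORDER BY r.time, r.date
"""


def best_per_athlete(conn, event_id=1, age_group="MS", year=2019):
    """Return (time, nameFirst, nameLast, venue, date), one row per athlete."""
    params = (event_id, age_group, f"{year}-01-01", f"{year}-12-31")
    return conn.execute(BEST_PER_ATHLETE, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (resultID INTEGER PRIMARY KEY, athleteID INTEGER, "
    "eventID INTEGER, ageGroup TEXT, time REAL, venue TEXT, date TEXT)"
)
conn.execute(
    "CREATE TABLE athletes (athleteID INTEGER PRIMARY KEY, nameFirst TEXT, nameLast TEXT)"
)
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?, ?, ?)", RESULTS)
conn.executemany("INSERT INTO athletes VALUES (?, ?, ?)", ATHLETES)

for row in best_per_athlete(conn):
    print(row)
```

With the sample rows as given, athleteID 13 and 14 belong to Alan Green and Paul Black, so those two names differ from the result tables in the question; Bill Smith's Tokyo result is kept because it has the earlier date.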


How can I enhance this query to use only one view?

I am learning SQL and databases. I need a query that finds the dates on which there was more than one car crash and lists the names of the people involved in those crashes.
person
| name | id_person |
|--------|------------|
| Oliver | 000000001 |
| Harry | 000000002 |
| Jacob | 000000003 |
| Maria | 000000004 |
| Jack | 000000005 |
participated
| id_person | num_crash | cost_damage |
|------------|-------------|---------------|
| 00000001 | 11111101 | 200 |
| 00000002 | 11111102 | 120 |
| 00000003 | 11111102 | 120 |
| 00000004 | 11111103 | 400 |
| 00000005 | 11111104 | 300 |
| 00000002 | 11111105 | 280 |
| 00000005 | 11111106 | 260 |
crash
| num_crash | date_crash | crash_scene |
|-------------|--------------|-------------|
| 11111101 | 2020/04/28 | bairro 4 |
| 11111102 | 2020/05/01 | bairro 1 |
| 11111103 | 2020/05/01 | bairro 2 |
| 11111104 | 2020/05/04 | bairro 3 |
| 11111105 | 2020/05/04 | bairro 1 |
| 11111106 | 2020/05/04 | bairro 3 |
Output example
| date_crash | num_crash | name |
|--------------|-------------|-------|
| 2020/05/04 | 11111104 | Jack |
| 2020/05/04 | 11111105 | Harry |
| 2020/05/04 | 11111106 | Jack |
| 2020/05/01 | 11111102 | Harry |
| 2020/05/01 | 11111102 | Jacob |
| 2020/05/01 | 11111103 | Maria |
This is my SQL:
CREATE VIEW vwfrequencedatecrash AS
SELECT date_crash, num_crash, crash_scene, ROW_NUMBER() OVER (PARTITION BY date_crash ORDER BY date_crash) AS frequence
FROM crash
ORDER BY frequence DESC;
CREATE VIEW vwmorefrequencedate AS
SELECT date_crash, num_crash, crash_scene, frequence
FROM vwfrequencedatecrash
WHERE frequence > 1;
SELECT vw.date_crash, pa.num_crash, p.name
FROM vwmorefrequencedate vw
JOIN crash c ON c.date_crash = vw.date_crash
JOIN participated pa ON c.num_crash = pa.num_crash
JOIN person p ON pa.id_person = p.id_person
ORDER BY vw.frequence DESC, c.date_crash;
How can I improve this query?
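A sketch of a single-view version, assuming the goal is the dates with more than one crash (which is what the output example shows). ROW_NUMBER() numbers the rows within each date rather than counting them, so `frequence > 1` keeps every row except the first one for each date, and joining those rows back to crash on date_crash multiplies the result. A GROUP BY with HAVING COUNT(*) > 1 gives one row per busy date together with its crash count. The example runs on SQLite through Python's sqlite3 module; the view name `vwbusydates` is hypothetical, and ids are stored as integers because the question's tables show them with different numbers of leading zeros.

```python
import sqlite3

PERSON = [(1, "Oliver"), (2, "Harry"), (3, "Jacob"), (4, "Maria"), (5, "Jack")]
PARTICIPATED = [
    (1, 11111101, 200), (2, 11111102, 120), (3, 11111102, 120),
    (4, 11111103, 400), (5, 11111104, 300), (2, 11111105, 280),
    (5, 11111106, 260),
]
CRASH = [
    (11111101, "2020/04/28", "bairro 4"), (11111102, "2020/05/01", "bairro 1"),
    (11111103, "2020/05/01", "bairro 2"), (11111104, "2020/05/04", "bairro 3"),
    (11111105, "2020/05/04", "bairro 1"), (11111106, "2020/05/04", "bairro 3"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id_person INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE participated (id_person INTEGER, num_crash INTEGER, cost_damage INTEGER);
CREATE TABLE crash (num_crash INTEGER PRIMARY KEY, date_crash TEXT, crash_scene TEXT);

-- The only view: one row per date with more than one crash, plus the count.
CREATE VIEW vwbusydates AS
SELECT date_crash, COUNT(*) AS crash_count
FROM crash
GROUP BY date_crash
HAVING COUNT(*) > 1;
""")
conn.executemany("INSERT INTO person VALUES (?, ?)", PERSON)
conn.executemany("INSERT INTO participated VALUES (?, ?, ?)", PARTICIPATED)
conn.executemany("INSERT INTO crash VALUES (?, ?, ?)", CRASH)

# Every crash on a busy date, joined to the people involved in it,
# busiest dates first.
rows = conn.execute("""
SELECT c.date_crash, c.num_crash, p.name
FROM vwbusydates vw
JOIN crash c ON c.date_crash = vw.date_crash
JOIN participated pa ON pa.num_crash = c.num_crash
JOIN person p ON p.id_person = pa.id_person
ORDER BY vw.crash_count DESC, c.date_crash DESC, c.num_crash, p.name
""").fetchall()

for row in rows:
    print(row)
```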

Multiple aggregations and group by in single query

I have an SQL table with roughly the following structure:
Employee | date | department | Country | Designation
What I would like is to get results with the following structure:
count_emp_per_department | count_emp_per_country | count_emp_per_designation |
Currently I am using UNION ALL, with a query similar to this one:
SELECT country, NULL, COUNT(1)
FROM employee
GROUP BY country
UNION ALL
SELECT NULL, designation, COUNT(1)
FROM employee
GROUP BY designation
Is this the most effective way to perform multiple aggregations and return all of them in a single result set in Hive?
Please share any approach that could improve performance.
I'm not sure whether this is a real requirement, as the output isn't that useful, but anyway, here is the structure and the query:
+-----------+------------+----------+
| col_name | data_type | comment |
+-----------+------------+----------+
| emp | int | |
| dt | date | |
| dept | string | |
| country | string | |
| desig | string | |
+-----------+------------+----------+
+--------+-------------+---------+------------+----------+
| t.emp | t.dt | t.dept | t.country | t.desig |
+--------+-------------+---------+------------+----------+
| 1 | 2020-02-02 | human | usa | hr |
| 2 | 2020-02-02 | dir | usa | hr |
| 3 | 2020-02-02 | dir | canada | it |
+--------+-------------+---------+------------+----------+
with q1 as (select dept,count(*) as deptcount from t group by dept),
q2 as (select country,count(*) as countrycount from t group by country),
q3 as (select desig,count(*) as desigcount from t group by desig)
select * from q1, q2, q3;
The output will look like this:
+----------+---------------+-------------+------------------+-----------+----------------+
| q1.dept | q1.deptcount | q2.country | q2.countrycount | q3.desig | q3.desigcount |
+----------+---------------+-------------+------------------+-----------+----------------+
| dir | 2 | canada | 1 | hr | 2 |
| dir | 2 | usa | 2 | hr | 2 |
| dir | 2 | canada | 1 | it | 1 |
| dir | 2 | usa | 2 | it | 1 |
| human | 1 | canada | 1 | hr | 2 |
| human | 1 | usa | 2 | hr | 2 |
| human | 1 | canada | 1 | it | 1 |
| human | 1 | usa | 2 | it | 1 |
+----------+---------------+-------------+------------------+-----------+----------------+
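The cross join above returns the product of the group counts (2 × 2 × 2 = 8 rows here), which grows quickly on real data. If the aim of the UNION ALL in the question is simply to get all three breakdowns in one result set, a long layout with a label column avoids that product. A minimal sketch against the sample table, run on SQLite through Python's sqlite3 module rather than Hive; the `dim`/`val`/`cnt` aliases are hypothetical. In Hive, `GROUP BY dept, country, desig GROUPING SETS (dept, country, desig)` should be able to compute the same counts in a single scan (not shown here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (emp INTEGER, dt TEXT, dept TEXT, country TEXT, desig TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-02-02", "human", "usa", "hr"),
        (2, "2020-02-02", "dir", "usa", "hr"),
        (3, "2020-02-02", "dir", "canada", "it"),
    ],
)

# UNION ALL stacks three independent GROUP BY results, each tagged with the
# dimension it belongs to, so the row count is the sum of the group counts
# rather than their product.
rows = conn.execute("""
SELECT 'country' AS dim, country AS val, COUNT(*) AS cnt FROM t GROUP BY country
UNION ALL
SELECT 'dept', dept, COUNT(*) FROM t GROUP BY dept
UNION ALL
SELECT 'desig', desig, COUNT(*) FROM t GROUP BY desig
ORDER BY dim, val
""").fetchall()

for row in rows:
    print(row)
```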

MySQL select statement missing some fields

I have the following statement, which selects some fields from a MySQL database:
select finance_budget_issue.budget_date, SUM(finance_budget_issue.amount) AS amount, finance_vote.office_id as vote_office_id, finance_office.office_head as head,
finance_office.office_name AS office_name,
finance_budget.ref_no, finance_budget_issue.view_status, tbl_signature.office_head as sign_office_head, tbl_signature.name AS name,
tbl_signature.post AS post, tbl_signature.sign_id
from finance_budget_issue
inner join finance_budget on finance_budget.budget_id=finance_budget_issue.budget_id
left join finance_vote on finance_budget_issue.vote_id=finance_vote.vote_id
left join finance_vote_description on finance_vote.description=finance_vote_description.vote_description_id
left join finance_office on finance_budget_issue.office=finance_office.office_id
left join tbl_signature on finance_office.office_id=tbl_signature.office_id
The statement runs fine, but it doesn't output the following fields:
tbl_signature.office_head as sign_office_head,
tbl_signature.name AS name,
tbl_signature.post AS post
What might be going wrong? I think I used the wrong joins. Can anyone help?
The tables are as follows:
finance_office
+----+-----------+-------------+------+
| id | office_id | office_name | head |
+----+-----------+-------------+------+
| 1 | 48 | A | SS |
| 2 | 69 | B | VV |
+----+-----------+-------------+------+
finance_vote
+---------+-----------+----------------+
| vote_id | office_id | vote |
+---------+-----------+----------------+
| 1 | 48 | 320-1-2-1-1001 |
| 2 | 48 | 320-2-2-2-2002 |
| 3 | 69 | 319-1-2-1-1001 |
| 4 | 69 | 319-1-2-2-1102 |
| 5 | 30 | 318-1-1-2-1101 |
+---------+-----------+----------------+
tbl_signature
+---------+-----------+---------+------------+-------------+
| sign_id | office_id | name | post | office_head |
+---------+-----------+---------+------------+-------------+
| 1 | 48 | Noel | Accountant | Manager |
| 2 | 69 | Jhon | Accountant | Manager |
| 3 | 30 | Micheal | Accountant | Manager |
+---------+-----------+---------+------------+-------------+
finance_budget
+-----------+--------+-------------+
| budget_id | ref_no | budget_date |
+-----------+--------+-------------+
| 1 | Acc/01 | 2020-01-20 |
| 2 | Acc/02 | 2020-01-22 |
+-----------+--------+-------------+
finance_budget_issue
+----+-----------+--------+---------------+-----------------+
| id | budget_id | amount | budget_status | transfer_status |
+----+-----------+--------+---------------+-----------------+
| 1 | 1 | 75000 | issues | Approved |
| 2 | 1 | 22000 | issues | Approved |
| 3 | 2 | 65000 | issues | Approved |
+----+-----------+--------+---------------+-----------------+
Desired Output
+--------+----------------+------+--------+------------------+------+------------+
| amount | vote_office_id | head | ref_no | sign_office_head | name | post |
+--------+----------------+------+--------+------------------+------+------------+
| 75000 | 48 | SS | Acc/01 | Manager | Noel | Accountant |
| 22000 | 48 | SS | Acc/01 | Manager | Noel | Accountant |
| 65000 | 69 | VV | Acc/02 | Manager | Jhon | Accountant |
+--------+----------------+------+--------+------------------+------+------------+
Generated Output (Incorrect)
+--------+----------------+------+--------+------------------+------+------+
| amount | vote_office_id | head | ref_no | sign_office_head | name | post |
+--------+----------------+------+--------+------------------+------+------+
| 75000 | 48 | SS | Acc/01 | | | |
| 22000 | 48 | SS | Acc/01 | | | |
| 65000 | 69 | VV | Acc/02 | | | |
+--------+----------------+------+--------+------------------+------+------+
This is easier to read:
SELECT i.budget_date
, SUM(i.amount) amount
, v.office_id vote_office_id
, o.office_head head
, o.office_name
, b.ref_no
, i.view_status
, s.office_head sign_office_head
, s.name
, s.post
, s.sign_id
FROM finance_budget_issue i
JOIN finance_budget b
ON b.budget_id = i.budget_id
LEFT
JOIN finance_vote v
ON v.vote_id = i.vote_id
LEFT
JOIN finance_vote_description d
ON d.vote_description_id = v.description
LEFT
JOIN finance_office o
ON i.office = o.office_id
LEFT
JOIN tbl_signature s
ON s.office_id = o.office_id
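You have an aggregate function (SUM) alongside non-aggregated columns but no GROUP BY clause; that's not going to work. With ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5) the query is rejected, and without it all the joined rows collapse into a single row with indeterminate values in the non-aggregated columns. You also have a LEFT JOINed table (finance_vote_description) from which you select no columns; that's pointless.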
For further help, see "Why should I provide an MCRE for what seems to me to be a very simple SQL query?"
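The GROUP BY point can be illustrated with only the two tables whose sample data the question shows in full, finance_budget and finance_budget_issue. A minimal sketch on SQLite through Python's sqlite3 module (not MySQL): an aggregate without GROUP BY always produces a single row, GROUP BY produces one row per group, and dropping the aggregate gives one row per issue, which is the shape of the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE finance_budget (budget_id INTEGER PRIMARY KEY, ref_no TEXT, budget_date TEXT);
CREATE TABLE finance_budget_issue (id INTEGER PRIMARY KEY, budget_id INTEGER,
    amount INTEGER, budget_status TEXT, transfer_status TEXT);
INSERT INTO finance_budget VALUES (1, 'Acc/01', '2020-01-20'), (2, 'Acc/02', '2020-01-22');
INSERT INTO finance_budget_issue VALUES
    (1, 1, 75000, 'issues', 'Approved'),
    (2, 1, 22000, 'issues', 'Approved'),
    (3, 2, 65000, 'issues', 'Approved');
""")

FROM_CLAUSE = "FROM finance_budget_issue i JOIN finance_budget b ON b.budget_id = i.budget_id"

# Aggregate without GROUP BY: all three joined rows collapse into one row.
collapsed = conn.execute(f"SELECT COUNT(*), SUM(i.amount) {FROM_CLAUSE}").fetchall()

# Aggregate with GROUP BY: one total per budget reference.
per_budget = conn.execute(
    f"SELECT b.ref_no, SUM(i.amount) {FROM_CLAUSE} GROUP BY b.ref_no ORDER BY b.ref_no"
).fetchall()

# No aggregate at all: one row per issue.
per_issue = conn.execute(f"SELECT i.amount, b.ref_no {FROM_CLAUSE} ORDER BY i.id").fetchall()

print(collapsed)   # [(3, 162000)]
print(per_budget)  # [('Acc/01', 97000), ('Acc/02', 65000)]
print(per_issue)   # [(75000, 'Acc/01'), (22000, 'Acc/01'), (65000, 'Acc/02')]
```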

Inequality in Mysql with count()

I have the following structure:
Table Author:
idAuthor,
Name
+----------+-------+
| idAuthor | Name |
+----------+-------+
| 1 | Renee |
| 2 | John |
| 3 | Bob |
| 4 | Bryan |
+----------+-------+
Table Publication:
idPublication,
Title,
Type,
Date,
Journal,
Conference
+---------------+--------------+------+-------------+------------+-----------+
| idPublication | Title | Date | Type | Conference | Journal |
+---------------+--------------+------+-------------+------------+-----------+
| 1 | Flower thing | 2008 | book | NULL | NULL |
| 2 | Bees | 2009 | article | NULL | Le Monde |
| 3 | Wasps | 2010 | inproceding | KDD | NULL |
| 4 | Whales | 2010 | inproceding | DPC | NULL |
| 5 | Lyon | 2011 | article | NULL | Le Figaro |
| 6 | Plants | 2012 | book | NULL | NULL |
| 7 | Walls | 2009 | proceeding | KDD | NULL |
| 8 | Juices | 2010 | proceeding | KDD | NULL |
| 9 | Fruits | 2010 | proceeding | DPC | NULL |
| 10 | Computers | 2010 | inproceding | DPC | NULL |
| 11 | Phones | 2010 | inproceding | DPC | NULL |
| 12 | Creams | 2010 | proceeding | DPC | NULL |
| 13 | Love | 2010 | proceeding | DPC | NULL |
+---------------+--------------+------+-------------+------------+-----------+
Table author_has_publication:
Author_idAuthor,
Publication_idPublication
+-----------------+---------------------------+
| Author_idAuthor | Publication_idPublication |
+-----------------+---------------------------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 1 | 5 |
| 2 | 5 |
| 3 | 5 |
| 3 | 6 |
| 4 | 7 |
| 4 | 8 |
| 4 | 9 |
| 4 | 10 |
| 3 | 11 |
| 3 | 12 |
| 2 | 13 |
+-----------------+---------------------------+
I want to obtain the list of all authors who have published at least twice at the DPC conference in 2010.
I managed to get the list of authors who have published something there, and the number of publications for each, but I can't get the 'at least 2' condition to work.
The following query
SELECT author.name, COUNT(name) FROM author INNER JOIN author_has_publication ON author.idAuthor=author_has_publication.Author_idAuthor INNER JOIN publication ON author_has_publication.Publication_idPublication=publication.idPublication AND publication.date=2010 AND publication.conference='DPC' GROUP BY author.name;
returns the following (correct) result:
+-------+-------------+
| name | COUNT(name) |
+-------+-------------+
| Bob | 2 |
| Bryan | 3 |
| John | 1 |
+-------+-------------+
but when I try to select only the authors with COUNT(name) >= 2, I get an error.
I tried this query:
SELECT author.name, COUNT(name) FROM author INNER JOIN author_has_publication ON author.idAuthor=author_has_publication.Author_idAuthor INNER JOIN publication ON author_has_publication.Publication_idPublication=publication.idPublication AND publication.date=2010 AND publication.conference='DPC' GROUP BY author.name WHERE COUNT(name)>=2;
When you use an aggregate function, filter on its result with the HAVING clause.
HAVING works on the grouped result of the query (so it can test aggregated values such as COUNT()), whereas WHERE works on the individual table rows before they are grouped. WHERE also has to come before GROUP BY, which is why the query above fails with a syntax error:
SELECT author.name, COUNT(name)
FROM author INNER JOIN author_has_publication
ON author.idAuthor=author_has_publication.Author_idAuthor
INNER JOIN publication
ON author_has_publication.Publication_idPublication=publication.idPublication
AND publication.date=2010 AND publication.conference='DPC'
GROUP BY author.name
HAVING COUNT(name)>=2;
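A runnable check of the HAVING query against the question's data, using SQLite through Python's sqlite3 module instead of MySQL (an assumption for self-containment; only the publication columns the query needs are loaded). The DPC/2010 conditions are moved from the ON clause to WHERE, which is equivalent for inner joins and makes the order visible: WHERE filters rows, GROUP BY groups them, HAVING filters groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (idAuthor INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE publication (idPublication INTEGER PRIMARY KEY, Date INTEGER, Conference TEXT);
CREATE TABLE author_has_publication (Author_idAuthor INTEGER, Publication_idPublication INTEGER);
""")
conn.executemany("INSERT INTO author VALUES (?, ?)",
                 [(1, "Renee"), (2, "John"), (3, "Bob"), (4, "Bryan")])
# (idPublication, Date, Conference) from the question's Publication table.
conn.executemany("INSERT INTO publication VALUES (?, ?, ?)", [
    (1, 2008, None), (2, 2009, None), (3, 2010, "KDD"), (4, 2010, "DPC"),
    (5, 2011, None), (6, 2012, None), (7, 2009, "KDD"), (8, 2010, "KDD"),
    (9, 2010, "DPC"), (10, 2010, "DPC"), (11, 2010, "DPC"), (12, 2010, "DPC"),
    (13, 2010, "DPC"),
])
conn.executemany("INSERT INTO author_has_publication VALUES (?, ?)", [
    (1, 1), (2, 2), (3, 3), (4, 4), (1, 5), (2, 5), (3, 5), (3, 6),
    (4, 7), (4, 8), (4, 9), (4, 10), (3, 11), (3, 12), (2, 13),
])

QUERY = """
SELECT author.name, COUNT(author.name) AS n
FROM author
JOIN author_has_publication ahp ON author.idAuthor = ahp.Author_idAuthor
JOIN publication p ON ahp.Publication_idPublication = p.idPublication
WHERE p.date = 2010 AND p.conference = 'DPC'   -- row filter, before grouping
GROUP BY author.name
HAVING COUNT(author.name) >= ?                 -- group filter, after grouping
ORDER BY author.name
"""

print(conn.execute(QUERY, (2,)).fetchall())  # [('Bob', 2), ('Bryan', 3)]
```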

Select/Join multiple fields from different tables with same column name

I have one key table, several data tables that share column names, and one users table. I am trying to select rows from the key table and join each of them, by the unique ID uaID, to selected columns from the data tables. Each key table row matches a row in only one of the data tables, not all of them, and some of those rows contain empty (NULL) values; such rows must not break anything or be omitted because of the NULLs. Finally, I join user data from the users table to each row, which will always have a match.
Let me "draw" a basic version of my tables so you can see.
keyTable
-----------------------------------------
| uaID | userID | key | appName |
|---------------------------------------|
| 1 | 7 | ABC01 | Physics |
| 2 | 9 | DEF20 | Geometry |
| 3 | 12 | XJG14 | Biology |
| 4 | 19 | DAF09 | Chemistry |
| 5 | 27 | KYT78 | Algebra |
| 6 | 29 | PLF43 | Statistics|
| 7 | 34 | COG89 | Geology |
| 8 | 45 | HYL72 | Art |
| 9 | 48 | TSK45 | History |
| 10 | 53 | BBS94 | GeoChem |
| 11 | 59 | DOD10 | BioChem |
| 12 | 27 | HKV62 | Music |
-----------------------------------------
dataTable01
-----------------------------------------------
| uaID | sector | subSector | topic |
|---------------------------------------------|
| 2 | circle | volumn | measure |
| 7 | triangle | hypotenuse |pythagoras |
| 8 | square | | |
| 11 | triangle | hypotenuse |pythagoras |
-----------------------------------------------
dataTable02
---------------------
| uaID | topic |
|-------------------|
| 1 | door |
| 3 | window |
| 9 | porch |
| 12 | |
---------------------
dataTable03
-----------------------------------------------
| uaID | sector | subSector | topic |
|---------------------------------------------|
| 4 | cat | feline | kitty |
| 5 | dog | canine | rover |
| 6 | kangaroo | marsupial | jack |
| 10 | bunny | leporidae | peter |
-----------------------------------------------
users
------------------------------------------------------------------------
| userID | Title | firstName | lastName | email |
|----------------------------------------------------------------------|
| 7 | Dr | Melissa | Smith | mel#email.com |
| 9 | Mr | Bob | Andrews | bob#email.com |
| 12 | Miss | Clare | Greco | clare#email.com |
| 19 | Mr | Dan | Fonseca | dan#email.com |
| 27 | Mr | Matt | Jones | matt#email.com |
| 29 | Mr | Chris | Nimmo | chris#email.com |
| 34 | Mrs | Lisa | Araujo | lisa#email.com |
| 45 | Miss | Raquel | Bailey | raquel#email.com |
| 48 | Dr | Steven | Dowd | steven#email.com |
| 53 | Prof | Roger | Hesp | roger#email.com |
| 59 | Prof | Sally | Bryce | sally#email.com |
| 65 | Mrs | Elena | Eraway | elena#email.com |
------------------------------------------------------------------------
And this is what I am trying to achieve as the end result:
-------------------------------------------------------------------------------------------------------------------------------
| uaID | key | appName | sector | subSector | topic | title | firstName | lastName | email |
|-----------------------------------------------------------------------------------------------------------------------------|
| 1 | ABC01 | Physics | | | door | Dr | Melissa | Smith | mel#email.com |
| 2 | DEF20 | Geometry | circle | volumn | measure | Mr | Bob | Andrews | bob#email.com |
| 3 | XJG14 | Biology | | | window | Miss | Clare | Greco | clare#email.com |
| 4 | DAF09 | Chemistry | cat | feline | kitty | Mr | Dan | Fonseca | dan#email.com |
| 5 | KYT78 | Algebra | dog | canine | rover | Mr | Matt | Jones | matt#email.com |
| 6 | PLF43 | Statistics| kangaroo | marsupial | jack | Mr | Chris | Nimmo | chris#email.com |
| 7 | COG89 | Geology | triangle | hypotenuse |pythagoras | Mrs | Lisa | Araujo | lisa#email.com |
| 8 | HYL72 | Art | square | | | Miss | Raquel | Bailey | raquel#email.com |
| 9 | TSK45 | History | | | porch | Dr | Steven | Dowd | steven#email.com |
| 10 | BBS94 | GeoChem | bunny | leporidae | peter | Prof | Roger | Hesp | roger#email.com |
| 11 | DOD10 | BioChem | triangle | hypotenuse |pythagoras | Prof | Sally | Bryce | sally#email.com |
| 12 | HKV62 | Music | | | | Mr | Matt | Jones | matt#email.com |
-------------------------------------------------------------------------------------------------------------------------------
I am attempting to achieve this by executing:
$sql = "SELECT keyTable.uaID, keyTable.userID, keyTable.key,
keyTable.appName, dataTable01.sector, dataTable01.subSector,
dataTable01.topic, dataTable02.topic, dataTable03.sector,
dataTable03.subSector, dataTable03.topic, users.title,
users.firstName, users.lastName, users.email
FROM keyTable
LEFT OUTER JOIN dataTable01 ON keyTable.uaID = dataTable01.uaID
LEFT OUTER JOIN dataTable02 ON keyTable.uaID = dataTable02.uaID
LEFT OUTER JOIN dataTable03 ON keyTable.uaID = dataTable03.uaID
LEFT OUTER JOIN users ON keyTable.userID = users.userID";
I get all the keyTable data, and all the users data joins up correctly. I also get all the dataTable03 data, but none of the data from dataTable01 or dataTable02 shows up in the result. If I omit dataTable03, the relevant dataTable02 data shows up, but still nothing from dataTable01. The users join is at the end and always works. So it seems to be an issue with the matching column names in the data tables. I get no errors and the query completes; the data is just missing. I've tried different joins (INNER JOIN, OUTER JOIN, LEFT OUTER JOIN). There must be a way to do this, but I can't find any references to this specific problem. Can someone tell me what I am doing wrong?
After joining, you can use COALESCE to get the non-null value from the table with a matching row.
$sql = "SELECT k.uaID, k.userID, k.key, k.appName,
COALESCE(d1.sector, d3.sector, '') AS sector,
COALESCE(d1.subSector, d3.subSector, '') AS subSector,
COALESCE(d1.topic, d2.topic, d3.topic, '') AS topic,
users.title, users.firstName, users.lastName, users.email
FROM keyTable AS k
LEFT OUTER JOIN dataTable01 AS d1 ON k.uaID = d1.uaID
LEFT OUTER JOIN dataTable02 AS d2 ON k.uaID = d2.uaID
LEFT OUTER JOIN dataTable03 AS d3 ON k.uaID = d3.uaID
LEFT OUTER JOIN users ON k.userID = users.userID
ORDER BY k.uaID";
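A trimmed-down, runnable version of the COALESCE approach against the question's sample data, using Python's sqlite3 module instead of PHP and MySQL (an assumption for self-containment). The users join and the key column are left out, and blank cells in the question's tables are assumed to be NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keyTable (uaID INTEGER PRIMARY KEY, appName TEXT);
CREATE TABLE dataTable01 (uaID INTEGER, sector TEXT, subSector TEXT, topic TEXT);
CREATE TABLE dataTable02 (uaID INTEGER, topic TEXT);
CREATE TABLE dataTable03 (uaID INTEGER, sector TEXT, subSector TEXT, topic TEXT);
""")
apps = ["Physics", "Geometry", "Biology", "Chemistry", "Algebra", "Statistics",
        "Geology", "Art", "History", "GeoChem", "BioChem", "Music"]
conn.executemany("INSERT INTO keyTable VALUES (?, ?)", list(enumerate(apps, start=1)))
conn.executemany("INSERT INTO dataTable01 VALUES (?, ?, ?, ?)", [
    (2, "circle", "volumn", "measure"), (7, "triangle", "hypotenuse", "pythagoras"),
    (8, "square", None, None), (11, "triangle", "hypotenuse", "pythagoras"),
])
conn.executemany("INSERT INTO dataTable02 VALUES (?, ?)",
                 [(1, "door"), (3, "window"), (9, "porch"), (12, None)])
conn.executemany("INSERT INTO dataTable03 VALUES (?, ?, ?, ?)", [
    (4, "cat", "feline", "kitty"), (5, "dog", "canine", "rover"),
    (6, "kangaroo", "marsupial", "jack"), (10, "bunny", "leporidae", "peter"),
])

# Each LEFT JOIN yields NULLs when its table has no row for the uaID;
# COALESCE takes the first non-NULL value across the tables, else ''.
rows = conn.execute("""
SELECT k.uaID, k.appName,
       COALESCE(d1.sector, d3.sector, '') AS sector,
       COALESCE(d1.subSector, d3.subSector, '') AS subSector,
       COALESCE(d1.topic, d2.topic, d3.topic, '') AS topic
FROM keyTable AS k
LEFT JOIN dataTable01 AS d1 ON k.uaID = d1.uaID
LEFT JOIN dataTable02 AS d2 ON k.uaID = d2.uaID
LEFT JOIN dataTable03 AS d3 ON k.uaID = d3.uaID
ORDER BY k.uaID
""").fetchall()

for row in rows:
    print(row)
```

Every key table row survives because all the data-table joins are LEFT joins, and each output column has a single, unambiguous name.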
Another way to merge the data from the dataTableNN tables into the same columns is to use UNION:
SELECT k.uaID, k.userID, k.key, k.appName, IFNULL(d.sector, '') AS sector, IFNULL(d.subSector, '') AS subSector, IFNULL(d.topic, '') AS topic,
u.title, u.firstName, u.lastName, u.email
FROM keyTable AS k
LEFT OUTER JOIN (
SELECT uaID, sector, subSector, topic
FROM dataTable01
UNION
SELECT uaID, NULL, NULL, topic
FROM dataTable02
UNION
SELECT uaID, sector, subSector, topic
FROM dataTable03) AS d
ON k.uaID = d.uaID
LEFT JOIN users AS u ON u.userID = k.userID
ORDER BY k.uaID
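You have to use column aliases. The result set contains several columns with the same name (topic, sector, subSector), and if the rows are fetched as associative arrays in PHP (for example with mysqli_fetch_assoc), the last column with a given name takes precedence, so the NULLs from dataTable03 hide the values from dataTable01 and dataTable02.
A similar issue and solution is here: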
php-mysql-how-to-resolve-ambiguous-column-names-in-join-operation
select * from common inner join (
(select link from table1)
union
(select link from table2)
) as unionT
on unionT.link = common.link
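The overwriting effect behind the missing columns can be reproduced outside PHP by building a name-to-value dict from each row, which is roughly what an associative fetch does. A minimal sketch using SQLite through Python's sqlite3 module; the helper `fetch_as_dict` is hypothetical, and the duplicate names are written out with explicit AS clauses because SQLite leaves the name of an un-aliased result column unspecified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataTable01 (uaID INTEGER, topic TEXT);
CREATE TABLE dataTable02 (uaID INTEGER, topic TEXT);
INSERT INTO dataTable01 VALUES (7, 'pythagoras');
INSERT INTO dataTable02 VALUES (7, NULL);
""")


def fetch_as_dict(sql):
    """Map column name -> value for each row; a repeated name keeps the last value."""
    cur = conn.execute(sql)
    names = [col[0] for col in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]


# Two result columns share the name "topic": the later NULL replaces 'pythagoras'.
ambiguous = fetch_as_dict(
    "SELECT d1.topic AS topic, d2.topic AS topic "
    "FROM dataTable01 d1 JOIN dataTable02 d2 ON d1.uaID = d2.uaID"
)
# Distinct aliases keep both values.
aliased = fetch_as_dict(
    "SELECT d1.topic AS topic1, d2.topic AS topic2 "
    "FROM dataTable01 d1 JOIN dataTable02 d2 ON d1.uaID = d2.uaID"
)

print(ambiguous)  # [{'topic': None}]
print(aliased)    # [{'topic1': 'pythagoras', 'topic2': None}]
```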