MySQL join query returns duplicate rows

SELECT m.*
, p.image_url
, r.acceptance_status
from playermessage m
join playerprofile p
on p.player_id = m.sender_id
join requesttempstorage r
on r.requester_id = m.sender_id
where m.player_id = 48
This query is acting strangely: it returns the same row twice. When I check the playermessage table there are no duplicate rows and only one such message exists, yet this query shows the same message to the user twice. Can anybody spot the mistake?
+-----------+--------------------------------------------------------+---------------------+-----------+--------------------+-------------------+
| player_id | player_message                                         | date_sent           | sender_id | image_url          | acceptance_status |
+-----------+--------------------------------------------------------+---------------------+-----------+--------------------+-------------------+
|        48 | imran wants to be a part of the pakistan cricket team  | 2018-05-17 18:58:08 |        50 | uploads/imran.jpg  |                 1 |
|        48 | fakhar wants to be a part of the pakistan cricket team | 2018-05-17 19:13:27 |        51 | uploads/fakhar.jpg |                 1 |
|        48 | shadab wants to be a part of the pakistan cricket team | 2018-05-18 11:09:49 |        52 | uploads/shadab.jpg |                 1 |
|        48 | asif wants to be a part of the pakistan cricket team   | 2018-05-18 11:20:51 |        53 | uploads/asif.jpeg  |                 0 |
|        48 | asif wants to be a part of the pakistan cricket team   | 2018-05-18 11:20:51 |        53 | uploads/asif.jpeg  |                 0 |
+-----------+--------------------------------------------------------+---------------------+-----------+--------------------+-------------------+
The problem lies in the last two rows (sender_id 53): the query returns that message twice, while every other message is returned only once. Why does it do this for the last message only?

One of these two queries is going to return two rows:
SELECT p.* FROM playerprofile p WHERE p.player_id = 53
or
SELECT r.* FROM requesttempstorage r WHERE r.requester_id = 53
The JOIN operation is finding all matching rows, and returning all of them.
If we have a row in playermessage with sender_id = 53, and
if there are two rows in playerprofile with player_id = 53, then we expect two rows to be returned.
If we had three rows in playerprofile with player_id = 53, then the join operation would return three rows.
If we have zero rows in playerprofile with player_id = 53, then we won't get any rows returned from playermessage with sender_id = 53.
If we also have two rows in requesttempstorage with requester_id = 53, that will also double the number of rows returned.
All of the columns from playermessage will be duplicated on each of those rows.
That's exactly how an inner join operation is designed to operate.
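The row multiplication described above can be reproduced in a small sketch. It uses Python's built-in sqlite3 module rather than MySQL (an assumption made only to keep the example self-contained), a trimmed-down version of the three tables, and hypothetical data in which requesttempstorage holds two rows for requester 53.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE playermessage (player_id INTEGER, player_message TEXT, sender_id INTEGER);
CREATE TABLE playerprofile (player_id INTEGER, image_url TEXT);
CREATE TABLE requesttempstorage (requester_id INTEGER, acceptance_status INTEGER);

INSERT INTO playermessage VALUES (48, 'asif wants to join', 53);
INSERT INTO playerprofile VALUES (53, 'uploads/asif.jpeg');
-- Hypothetical data: two request rows for the same requester.
INSERT INTO requesttempstorage VALUES (53, 0), (53, 0);
""")

# Same shape as the query in the question: one message, two matching request rows.
joined = conn.execute("""
    SELECT m.player_message, p.image_url, r.acceptance_status
    FROM playermessage m
    JOIN playerprofile p ON p.player_id = m.sender_id
    JOIN requesttempstorage r ON r.requester_id = m.sender_id
    WHERE m.player_id = 48
""").fetchall()

# Diagnostic: which join key matches more than one row?
duplicates = conn.execute("""
    SELECT requester_id, COUNT(*)
    FROM requesttempstorage
    GROUP BY requester_id
    HAVING COUNT(*) > 1
""").fetchall()

print(len(joined))
print(duplicates)
```

Running the same kind of GROUP BY ... HAVING COUNT(*) > 1 check against playerprofile and requesttempstorage on the real data should show which of the two tables carries the extra row.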

Related

MySQL - Retrieve the max value of an associated column within a LEFT JOIN with a different perimeter than the WHERE clause of the main query

I'm using MySQL 5.6 and have a SELECT query with a LEFT JOIN, but I need to retrieve the max of an associated column (email_nb) with a different "perimeter" of constraints.
Let's take an example: let me state that it is a mere example with only 5 rows but it should work also when I have thousands... (I'm stating this since there is a LIMIT clause in my query)
Table 'query_results'
+-----------------------------+------------+--------------+
| query_result_id | query_id | author |
+-----------------------------+------------+--------------+
| 2 | 1 | john |
| 3 | 1 | eric |
| 7 | 3 | martha |
| 9 | 4 | john |
| 10 | 1 | john |
+-----------------------------+------------+--------------+
Table 'customers_emails'
+-------------------+-----------------+--------------+-----------+-------------+------------------------
| customer_email_id | query_result_id | customer_id | author | email_nb | days_since_sending
+-------------------+-----------------+--------------+-----------+-------------+------------------------
| 5 | 2 | 12 | john | 2 | 150
| 12 | 3 | 7 | eric | 4 | 90
| 27 | 3 | 12 | eric | 2 | 86
| 40 | 9 | 15 | john | 9 | 87
| 42 | 2 | 12 | john | 7 | 23
| 51 | 10 | 12 | john | 3 | 89
+-------------------+-----------------+--------------+-----------+-------------+-----------------------
Notes:
you can have a query_result where the author appears in NO row at all in any of the customers_emails, hence the LEFT JOIN I'm using.
You can see author is by design kind of duplicated as it's both on the first table and the second table each time associated with a query_result_id. It's important to note.
email_nb is an integer between 0 and 10
there is a LIMIT clause as I need to retrieve a set number of records
Today my query retrieves query_results under a certain number of conditions. The specific requirement is that I only retrieve query_results whose author does not appear in any customers_emails row where days_since_sending is less than 60 days. That is, I check days_since_sending not only within the records for this query, but across all customers_emails, thanks to the NOT IN subquery (see below).
This is my current query for customer_id = 12 and query_id = 1
SELECT
qr.query_result_id,
qr.author
FROM
query_results qr
LEFT JOIN
customers_emails ce
ON
qr.author = ce.author
WHERE
qr.query_id = 1 AND
qr.author IS NOT NULL
AND qr.author NOT IN (
SELECT recipient
FROM customers_emails
WHERE
(
customer_id = 12 AND
( days_since_sending >= 60) )
)
# we don't take by coincidence/bad luck 2 query results with the same author
GROUP BY
qr.author
ORDER BY
qr.query_result_id ASC
LIMIT
20
This is the expected output:
+-----------------------------+------------+--------------+
| query_result_id | author | email_nb |
+-----------------------------+------------+--------------+
| 10 | john | 7 |
| 3 | eric | 2 |
+-----------------------------+------------+--------------+
My challenge/difficulty today:
Notice on the 2nd line Eric is tied to email_nb 2 and not the max of all Eric's emails which could have been 4 if we had taken the max of email_nb across ALL messages to author=eric. but we stay within the limit of customer_id = 12 so there's only one left with email_nb = 2
Also notice that on the first line, the email_nb associated with query_result = 10 is 7, and not 3, which could have been the case as 3 is what appears in table customers_emails on the last line.
Indeed, for emails to 'john' I had the choice between email_nb 2, 7 and 3, but I take the highest, so it's 7 (even if this email is from more than 60 days ago!). This is very important and is the part I don't know how to do: the perimeters are different. Today I retrieve all the query_results where the author has NOT been sent an email in the past 60 days (see the NOT IN subquery), BUT I need the column to hold the max email_nb sent to john by customer_id=12 and query_id=1 EVEN if it was sent more than 60 days ago. These are different perimeters, and I don't really know how to do this.
In other words, I don't want to find the max(email_nb) within the same WHERE clauses (such as days_since_sending >= 60) or within the same LIMIT and GROUP BY as my current query. What I need is the maximum value of email_nb for customer_id=12 AND query_id=1 sent to john across ALL records in the customers_emails table!
If there is no associated row in customers_emails at all (meaning no email has ever been sent by this customer for this query), then email_nb should be something like NULL.
This means I do NOT want this output:
+-----------------------------+------------+--------------+
| query_result_id | author | email_nb |
+-----------------------------+------------+--------------+
| 10 | john | 3 |
| 3 | eric | 2 |
+-----------------------------+------------+--------------+
How to achieve this in MySQL 5.6 ?
Since the question is a bit confusing, I came up with this.
select
max(q.query_result_id) as query_result_id,q.author,max(email_nb) as email_nb
from query_results q
left join customers_emails c on q.author=c.author
where customer_id=12 and query_id=1
group by q.author;
I think the best thing to do in a situation like this is break it down into smaller queries and then combine them together.
The first thing you want to do is this:
The specificity is that I make sure to retrieve query_results with an author who does not appear in any customer_email_id where the days_since_sending would be less than 60 days
This might look something like this:
-- Query A
SELECT DISTINCT q.author FROM query_results q
WHERE q.author NOT IN (
SELECT c.author FROM customers_emails c
WHERE c.days_since_sending < 60
)
AND q.query_id = 1
This will get you the list of authors (with duplicates removed) that haven't had an email in the last 60 days that appear for the given query ID. Your next requirement is the following:
I need to have in the column the max email_nb sent to john by customer_id=12 and query_id=1 EVEN if it was sent more than 60 days ago
This query could look like this:
-- Query B
SELECT c.query_result_id, c.author, MAX(c.email_nb) as max_email_nb
FROM customers_emails c
LEFT JOIN query_results q ON c.author = q.author
WHERE c.customer_id = 12
AND q.query_id = 1
GROUP BY c.query_result_id, c.author
That gets you the maximum email_nb for each author/query_result combination, not taking into consideration the date at all.
The only thing left to do is reduce the set of results from the second query down to only the authors that appear in the first query. There are a few different methods for doing that. For example, you could INNER JOIN the two queries by author:
SELECT b.* FROM (
-- Query B
SELECT c.query_result_id, c.author, MAX(c.email_nb) as max_email_nb
FROM customers_emails c
LEFT JOIN query_results q ON c.author = q.author
WHERE c.customer_id = 12
AND q.query_id = 1
GROUP BY c.query_result_id, c.author
) b INNER JOIN (
-- Query A
SELECT DISTINCT q.author FROM query_results q
WHERE q.author NOT IN (
SELECT c.author FROM customers_emails c
WHERE c.days_since_sending < 60
)
AND q.query_id = 1
) a ON a.author = b.author
You could also filter with an IN clause:
SELECT b.* FROM (
-- Query B
SELECT c.query_result_id, c.author, MAX(c.email_nb) as max_email_nb
FROM customers_emails c
LEFT JOIN query_results q ON c.author = q.author
WHERE c.customer_id = 12
AND q.query_id = 1
GROUP BY c.query_result_id, c.author
) b
WHERE b.author IN (
-- Query A
SELECT DISTINCT q.author FROM query_results q
WHERE q.author NOT IN (
SELECT c.author FROM customers_emails c
WHERE c.days_since_sending < 60
)
AND q.query_id = 1
)
There are most likely ways to speed this query up or shorten it, but if you need to do that, you now at least have a working query to compare the results against.
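The idea of keeping the two "perimeters" separate can be sketched as follows. This is one possible variant, not the query from the answer above: the maximum is computed in its own derived table (restricted only by customer_id), while the 60-day rule lives in a separate NOT IN filter. It runs on Python's built-in sqlite3 module instead of MySQL 5.6, which is an assumption made for the sake of a self-contained example; the rows are the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE query_results (query_result_id INTEGER, query_id INTEGER, author TEXT);
CREATE TABLE customers_emails (
    customer_email_id INTEGER, query_result_id INTEGER, customer_id INTEGER,
    author TEXT, email_nb INTEGER, days_since_sending INTEGER);
INSERT INTO query_results VALUES
    (2, 1, 'john'), (3, 1, 'eric'), (7, 3, 'martha'), (9, 4, 'john'), (10, 1, 'john');
INSERT INTO customers_emails VALUES
    (5, 2, 12, 'john', 2, 150), (12, 3, 7, 'eric', 4, 90), (27, 3, 12, 'eric', 2, 86),
    (40, 9, 15, 'john', 9, 87), (42, 2, 12, 'john', 7, 23), (51, 10, 12, 'john', 3, 89);
""")

# The derived table m ignores days_since_sending entirely: that is the
# "max email_nb" perimeter. The recency rule is plugged in separately.
BASE = """
SELECT MIN(qr.query_result_id), qr.author, m.max_email_nb
FROM query_results qr
LEFT JOIN (
    SELECT author, MAX(email_nb) AS max_email_nb
    FROM customers_emails
    WHERE customer_id = 12
    GROUP BY author
) m ON m.author = qr.author
WHERE qr.query_id = 1
  AND qr.author IS NOT NULL
  {recency_filter}
GROUP BY qr.author, m.max_email_nb
ORDER BY 1
LIMIT 20
"""

RECENT = """AND qr.author NOT IN (
    SELECT author FROM customers_emails
    WHERE customer_id = 12 AND days_since_sending < 60)"""

filtered = conn.execute(BASE.format(recency_filter=RECENT)).fetchall()
unfiltered = conn.execute(BASE.format(recency_filter="")).fetchall()

print(filtered)
print(unfiltered)
```

On these sample rows, john drops out of the filtered result because customer_email_id 42 was sent 23 days ago; without the recency filter his maximum for customer 12 is 7, regardless of when each email was sent. An author with no customers_emails row for this customer would get NULL from the LEFT JOIN.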

How to query for rows that link to another row with a specific ID in MySQL?

I am trying to query two different tables (CALL_HISTORY and HUB_DIRECTORY) to find all the call records that are made between a 'hub store' and a 'spoke store'. Each call has a CallID field and an entry is made with the id of the store that initiated the call and then a separate entry is made for each store that receives the call, and these all have the id of the store that receives them. So they all have the same CallID but the stores id (DID) is different for each.
The problem is that not every call is between a hub and its spoke, so I need to filter it out to find only these records.
Sample Call Data
RecordID | CallID | DID | CallDirection | StartTime
--------------------------------------------------------
1563486 | 255429 | 492 | Initiated | 1520870539
1563487 | 255429 | 849 | Received | 1520870539
1563484 | 255430 | 1098 | Initiated | 1520870562
1563485 | 255430 | 1098 | Received | 1520870562
1563482 | 255431 | 307 | Initiated | 1520870567
1563483 | 255431 | 1013 | Received | 1520870567
1563506 | 255432 | 1108 | Initiated | 1520870580
1563509 | 255432 | 1108 | Received | 1520870580
Here you see a sample of the calls, the CallID group highlighted is between a hub and its spoke and the rest are not. The hubs and spokes are linked together in the HUB_DIRECTORY like so:
HUB_DIRECTORY SAMPLE
HubStore | HubDID | SpokeStore | SpokeDID
-----------------------------------------
4 | 37 | Store0004 | 37
4 | 37 | Store0522 | 470
7 | 1083 | Store0007 | 1083
7 | 1083 | Store1000 | 714
7 | 1083 | Store1055 | 759
12 | 38 | Store0012 | 38
12 | 38 | Store1063 | 758
13 | 45 | Store0013 | 45
13 | 45 | Store0337 | 296
13 | 45 | Store1012 | 724
The HubDID and SpokeDID fields are the same as the DID in CALL_HISTORY. So I'm looking to query for calls where the initiated call DID exists in the HUB_DIRECTORY table, as either a HubDID or a SpokeDID, and its CallID also has a record with a DID that matches with the appropriate hub/spoke.
My end goal would look like this:
HUB | Spoke | Initiated | Received
-----------------------------------------------
Store.0004 | Store.0522 | 304 | 723
I believe I will need to use a UNION to get the row with the hub or spoke but I am just unable to wrap my head around how this would be done.
I think this query will give you the results you want. It works on the limited sample data you provided.
select h1.hubstore, h1.hubdid,
h1.spokestore, h1.spokedid,
count(distinct if(c2.recordid is null or c1.did!=h1.hubdid, null, c1.recordid)) as initiated,
count(distinct if(c2.did!=h1.hubdid, null, c2.recordid)) as received
from hub_directory h1
left join (select * from call_history where calldirection='Initiated') c1
on c1.did=h1.hubdid or c1.did=h1.spokedid
left join (select * from call_history where calldirection='Received') c2
on c2.callid = c1.callid and c2.did=if(c1.did=h1.hubdid, h1.spokedid, h1.hubdid)
group by h1.hubstore, h1.spokestore
Based on the new sample data at your fiddle, this query gives
hubstore spokestore initiated received
355 Store0355 0 0
355 Store0362 0 0
355 Store0655 0 0
357 Store0233 1 2
357 Store0357 0 0
360 Store0360 0 0
360 Store0868 0 0
360 Store1091 0 0
363 Store0363 0 0
363 Store1462 1 0
363 Store1507 1 0
363 Store2507 0 0
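The pairing logic (an initiated record and a received record sharing a CallID, with one end on the hub and the other on its spoke) can also be written with CASE expressions. The sketch below is an alternative formulation, not the query above; it uses Python's built-in sqlite3 module instead of MySQL, a reduced call_history table without StartTime, and hypothetical call rows, since the sample calls in the question do not match the sample directory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE call_history (RecordID INTEGER, CallID INTEGER, DID INTEGER, CallDirection TEXT);
CREATE TABLE hub_directory (HubStore INTEGER, HubDID INTEGER, SpokeStore TEXT, SpokeDID INTEGER);
INSERT INTO hub_directory VALUES
    (4, 37, 'Store0004', 37), (4, 37, 'Store0522', 470),
    (7, 1083, 'Store0007', 1083), (7, 1083, 'Store1000', 714);
-- Hypothetical calls, made up for this sketch:
INSERT INTO call_history VALUES
    (1, 100, 37, 'Initiated'),   (2, 100, 470, 'Received'),  -- hub 37 calls its spoke 470
    (3, 101, 470, 'Initiated'),  (4, 101, 37, 'Received'),   -- spoke 470 calls its hub 37
    (5, 102, 37, 'Initiated'),   (6, 102, 714, 'Received'),  -- hub 37 calls another hub's spoke
    (7, 103, 1083, 'Initiated'), (8, 103, 714, 'Received');  -- hub 1083 calls its spoke 714
""")

rows = conn.execute("""
    SELECT h.HubStore, h.SpokeStore,
           SUM(CASE WHEN i.DID = h.HubDID THEN 1 ELSE 0 END) AS initiated,
           SUM(CASE WHEN r.DID = h.HubDID THEN 1 ELSE 0 END) AS received
    FROM hub_directory h
    JOIN call_history i
      ON i.CallDirection = 'Initiated'
     AND i.DID IN (h.HubDID, h.SpokeDID)
    JOIN call_history r
      ON r.CallDirection = 'Received'
     AND r.CallID = i.CallID
     -- the receiving end must be the opposite side of the same hub/spoke pair
     AND r.DID = CASE WHEN i.DID = h.HubDID THEN h.SpokeDID ELSE h.HubDID END
    WHERE h.HubDID <> h.SpokeDID          -- skip the hub's own directory row
    GROUP BY h.HubStore, h.SpokeStore
    ORDER BY h.HubStore, h.SpokeStore
""").fetchall()

print(rows)
```

Here initiated and received are counted from the hub's point of view, and because inner joins are used, hub/spoke pairs with no calls between them are omitted rather than shown with zeros.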
As far as I understand, you want to know how many calls were received by a hub and how many calls were initiated by a hub.
select
a.hubDID, a.CallDirection, count(a.CallDirection) ,
hb.SpokeDID, ch.CallDirection, count(ch.CallDirection)
from CALL_HISTORY ch inner join
(
select RecordID, CallID, DID, CallDirection, StartTime, h.HubStore , h.hubDID
from CALL_HISTORY c inner join HUB_DIRECTORY h on (c.DID = h.HubDID)
)
as a
on ch.CallID = a.CallID
and a.RecordID <> ch.RecordID
inner join HUB_DIRECTORY hb on hb.SpokeDID = ch.DID
Grouped Data Query :
select
a.hubDID, a.CallDirection, count(a.CallDirection) ,
count(hb.SpokeDID),
ch.CallDirection, count(ch.CallDirection)
from CALL_HISTORY ch inner join
(
select distinct c.RecordID, c.CallID, c.DID, c.CallDirection, c.StartTime, c.DID as hubDID
from CALL_HISTORY c inner join HUB_DIRECTORY h on (c.DID = h.HubDID)
)
as a
on ch.CallID = a.CallID
and a.RecordID <> ch.RecordID
inner join HUB_DIRECTORY hb on hb.SpokeDID = ch.DID
group by a.hubDID, a.CallDirection,
ch.CallDirection
;
Without Group Data Query:
select
a.hubDID, a.CallDirection, count(a.CallDirection) ,
hb.SpokeDID,hb.SpokeDID,
ch.CallDirection, count(ch.CallDirection)
from CALL_HISTORY ch inner join
(
select distinct c.RecordID, c.CallID, c.DID, c.CallDirection, c.StartTime, c.DID as hubDID
from CALL_HISTORY c inner join HUB_DIRECTORY h on (c.DID = h.HubDID)
)
as a
on ch.CallID = a.CallID
and a.RecordID <> ch.RecordID
inner join HUB_DIRECTORY hb on hb.SpokeDID = ch.DID
group by a.hubDID, a.CallDirection,
hb.SpokeDID,
ch.CallDirection
;

Search one table and use result to search another table

I want to make a search on one table that returns a value to be used in search on different table.
I have this code, which looks for a team code in the club table:
SELECT Team, Teamcode
FROM epl.club
WHERE Teamcode =
(SELECT Teamcode
FROM epl.club
WHERE Team='Manchester City');
Now I want to use the resulting Teamcode for a select on the matches table.
I have this code that searches the matches table and finds all the matches with a given team code but I need it to get the code from the first search above.
Select *
from epl.matches
where HomeTeam = 35
or AwayTeam = 35
and FTR like "A"
or FTR like "H";
Another thing I don't understand: I want it to return a row only if HomeTeam = 35 and FTR is A or H, or if AwayTeam = 35 and FTR is A or H. But what the code does is return rows even if they do not contain 35 and only contain H or A in the FTR column.
You have to use parentheses in your boolean expression:
SELECT *
FROM epl.matches
WHERE (HomeTeam = 35 or AwayTeam = 35)
AND (FTR like "A" or FTR like "H")
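This is because AND has higher operator precedence than OR, so without parentheses the condition is evaluated as HomeTeam = 35 OR (AwayTeam = 35 AND FTR like "A") OR FTR like "H".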
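The effect of the parentheses can be checked with a short sketch. It uses Python's built-in sqlite3 module instead of MySQL (an assumption for the sake of a runnable example; both give AND higher precedence than OR) and the five sample match rows from the table further down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matches (HomeTeam INTEGER, AwayTeam INTEGER, FTR TEXT);
INSERT INTO matches VALUES
    (38, 39, 'A'), (38, 35, 'A'), (35, 39, 'H'), (38, 35, 'A'), (39, 38, 'H');
""")

# Without parentheses: HomeTeam = 35 OR (AwayTeam = 35 AND FTR = 'A') OR FTR = 'H'
loose = conn.execute("""
    SELECT * FROM matches
    WHERE HomeTeam = 35 OR AwayTeam = 35 AND FTR LIKE 'A' OR FTR LIKE 'H'
""").fetchall()

# With parentheses: team 35 must be involved AND the result must be A or H.
strict = conn.execute("""
    SELECT * FROM matches
    WHERE (HomeTeam = 35 OR AwayTeam = 35) AND (FTR LIKE 'A' OR FTR LIKE 'H')
""").fetchall()

print(len(loose), len(strict))
```

The unparenthesized version also returns the match (39, 38, 'H'), which does not involve team 35 at all.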
You can combine the queries with a join:
SELECT c.Team, c.Teamcode, m.* FROM epl.club c
INNER JOIN epl.matches m ON (m.HomeTeam = c.Teamcode or m.AwayTeam = c.Teamcode)
WHERE (c.Team = 'Manchester City')
AND (m.FTR like "A" or m.FTR like "H")
Additional info:
Here is a very simple explanation of how an INNER JOIN can be understood, in case you don't know it already. Suppose you have two tables:
{ Table: Club }----------------| { Table: Matches }----------|
| | | |
| Teamcode | Team | | HomeTeam | AwayTeam | FTR |
|----------+-------------------| |----------+----------+-----|
| 35 | Manchester City | | 38 | 39 | A |
| 38 | Arsenal London | | 38 | 35 | A |
| 39 | Leeds United | | 35 | 39 | H |
|----------+-------------------| | 38 | 35 | A |
| 39 | 38 | H |
|----------+----------+-----|
An INNER JOIN between the tables club and matches means that, of all row combinations of the two tables, only those for which the join condition m.HomeTeam = c.Teamcode or m.AwayTeam = c.Teamcode is met are included in the result. If you restrict club.Team to 'Manchester City', you get the following result for the join:
{ Table: Join Result }------|
| |
| HomeTeam | AwayTeam | FTR |
|----------+----------+-----|
| 38 | 35 | A |
| 35 | 39 | H |
| 38 | 35 | A |
|----------+----------+-----|
It takes some time to get used to the declarative style of the join syntax, but it helps you structure your queries (as opposed to multiple FROM tables and nested SELECT subqueries). Furthermore, the SQL query optimizer can handle an INNER JOIN better than nested subqueries in most cases.
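The join result shown above can be reproduced with a minimal sketch. It uses Python's built-in sqlite3 module instead of MySQL and plain table names club and matches without the epl schema prefix; both are assumptions made to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE club (Teamcode INTEGER, Team TEXT);
CREATE TABLE matches (HomeTeam INTEGER, AwayTeam INTEGER, FTR TEXT);
INSERT INTO club VALUES (35, 'Manchester City'), (38, 'Arsenal London'), (39, 'Leeds United');
INSERT INTO matches VALUES
    (38, 39, 'A'), (38, 35, 'A'), (35, 39, 'H'), (38, 35, 'A'), (39, 38, 'H');
""")

# The team name is resolved to its Teamcode by the join itself,
# so no hard-coded 35 is needed.
result = conn.execute("""
    SELECT m.HomeTeam, m.AwayTeam, m.FTR
    FROM club c
    INNER JOIN matches m
        ON (m.HomeTeam = c.Teamcode OR m.AwayTeam = c.Teamcode)
    WHERE c.Team = 'Manchester City'
      AND (m.FTR LIKE 'A' OR m.FTR LIKE 'H')
""").fetchall()

print(sorted(result))
```

The three rows returned are the same three matches listed in the join result table, although without an ORDER BY the database is free to return them in any order.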
First query could just be:
SELECT
Team,
Teamcode
FROM epl.club
WHERE Team='Manchester City';
Why a subquery on the same table when you can access directly the Team field?
Then you can do:
SELECT *
FROM epl.matches
WHERE (HomeTeam = (SELECT Teamcode FROM epl.club WHERE Team='Manchester City')
OR AwayTeam = (SELECT Teamcode FROM epl.club WHERE Team='Manchester City'))
AND (FTR like "A"
OR FTR like "H");

Query which creates missing rows based on another table

I have many forms that users fill out. Each form contains a list of questions. In this first table is the form id and the id's of the questions.
form_id | question_id
1 | 1
1 | 2
1 | 3
2 | 4
2 | 5
This table has two forms one which has 3 questions and the other 2. I have a second table which has the answers that the users have given for the questions.
user_id | form_id | question_id | answer
476 | 1 | 1 | "answer1"
476 | 1 | 3 | "answer2"
693 | 1 | 1 | "answer3"
693 | 1 | 2 | "answer4"
235 | 2 | 5 | "answer5"
In this example, 2 users have filled out form 1 and 1 user has filled in form 2. But none have filled in all the questions. Is it possible to write a query which combines the two tables and will give me the answers that the user have given including the questions that they didn't answer? I'd like the results to look like this.
user_id | form_id | question_id | answer
476 | 1 | 1 | "answer1"
476 | 1 | 2 | NULL
476 | 1 | 3 | "answer2"
693 | 1 | 1 | "answer3"
693 | 1 | 2 | "answer4"
693 | 1 | 3 | NULL
235 | 2 | 4 | NULL
235 | 2 | 5 | "answer5"
The problem that I have when I use a left join like this
select * from template t
left join answers a on a.template_id = t.template_id
AND a.question_id = t.question_id
AND t.template_id = t.template_id;
is that the row that results is missing user_id.
Yes, the specified result can be returned by a query.
One way to achieve this is a join to an inline view, and an "outer join" operation to the second table.
The "trick" is getting a distinct list of user_id and form_id from the second table, using a query, for example:
SELECT user_id, form_id
FROM second_table
GROUP BY user_id, form_id
Then use that query as an inline view (wrapping it in parens, assigning a table alias, and referencing it as if it were a table in an outer query).
All that's required after that is an "outer join" to the second table.
For example:
SELECT r.user_id
, q.form_id
, q.question_id
, a.answer
FROM first_table q
JOIN ( SELECT p.user_id, p.form_id
FROM second_table p
GROUP BY p.user_id, p.form_id
) r
ON r.form_id = q.form_id
LEFT
JOIN second_table a
ON a.user_id = r.user_id
AND a.form_id = r.form_id
AND a.question_id = q.question_id
ORDER
BY r.user_id
, q.form_id
, q.question_id
Note that the keyword "LEFT" specifies an outer join operation, returning all rows from the left side, along with matching rows from the right side. A typical "inner" join would exclude rows that didn't find a matching row from the table on the right side.
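The inline-view approach can be tried on the sample rows from the question. This sketch uses Python's built-in sqlite3 module instead of MySQL, and the table names form_questions and answers are hypothetical stand-ins for first_table and second_table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE form_questions (form_id INTEGER, question_id INTEGER);
CREATE TABLE answers (user_id INTEGER, form_id INTEGER, question_id INTEGER, answer TEXT);
INSERT INTO form_questions VALUES (1, 1), (1, 2), (1, 3), (2, 4), (2, 5);
INSERT INTO answers VALUES
    (476, 1, 1, 'answer1'), (476, 1, 3, 'answer2'),
    (693, 1, 1, 'answer3'), (693, 1, 2, 'answer4'),
    (235, 2, 5, 'answer5');
""")

rows = conn.execute("""
    SELECT r.user_id, q.form_id, q.question_id, a.answer
    FROM form_questions q
    -- inline view: one row per (user, form) that the user has touched
    JOIN ( SELECT p.user_id, p.form_id
           FROM answers p
           GROUP BY p.user_id, p.form_id
         ) r
      ON r.form_id = q.form_id
    -- outer join: keep every question, with NULL where no answer exists
    LEFT JOIN answers a
      ON a.user_id = r.user_id
     AND a.form_id = r.form_id
     AND a.question_id = q.question_id
    ORDER BY r.user_id, q.form_id, q.question_id
""").fetchall()

for row in rows:
    print(row)
```

Unanswered questions come back with None (SQL NULL) in the answer column, and user_id is filled in on every row because it is taken from the inline view rather than from the outer-joined table.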
Use a left join, something like:
select * from table1 left join table2 on table1.form_id = table2.form_id

Selecting two conditions simultaneously

Say I've got the next two tables, Doctors and Workdays:
DocNumbers | idNum
118 | 11
119 | 12
120 | 13
121 | 14
122 | 15
Notice: a doctor can work in several different workdays.
DocNum | Workday |AmountOfHours |
118 | 1 | 8 |
118 | 3 | 9 |
120 | 1 | 6 |
121 | 3 | 5 |
122 | 4 | 7 |
I want to create a new table containing the IDs of all doctors that work on both day 1 and day 3. That means I should get a table containing only 118.
So far i've got:
SELECT distinct Doctors.doctorNumber, idNum
FROM Doctors, Workdays
WHERE Workdays.dayInWeek in (1,3)
AND Workdays.doctorNumber=Doctors.doctorNumber
But I get irrelevant results such as 120 and 121.
So IN behaves more like an OR. I can't seem to find the equivalent for AND.
This is easy to do if you join the Workdays table twice, once for each day you want to check:
select Doctors.DocNumbers, Doctors.idNum
from Doctors
inner join Workdays as Workdays1 on Workdays1.DocNum = Doctors.DocNumbers and Workdays1.Workday = 1
inner join Workdays as Workdays3 on Workdays3.DocNum = Doctors.DocNumbers and Workdays3.Workday = 3;
http://www.sqlfiddle.com/#!2/4c530/3
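The double join can be checked against the sample rows, alongside a common alternative that filters with IN and then requires both days to be present via HAVING COUNT(DISTINCT ...). The sketch uses Python's built-in sqlite3 module instead of MySQL (an assumption for the sake of a runnable example); the second query is an addition for comparison, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Doctors (DocNumbers INTEGER, idNum INTEGER);
CREATE TABLE Workdays (DocNum INTEGER, Workday INTEGER, AmountOfHours INTEGER);
INSERT INTO Doctors VALUES (118, 11), (119, 12), (120, 13), (121, 14), (122, 15);
INSERT INTO Workdays VALUES (118, 1, 8), (118, 3, 9), (120, 1, 6), (121, 3, 5), (122, 4, 7);
""")

# Variant 1: join Workdays once per required day.
double_join = conn.execute("""
    SELECT d.DocNumbers, d.idNum
    FROM Doctors d
    INNER JOIN Workdays w1 ON w1.DocNum = d.DocNumbers AND w1.Workday = 1
    INNER JOIN Workdays w3 ON w3.DocNum = d.DocNumbers AND w3.Workday = 3
""").fetchall()

# Variant 2: IN keeps rows for either day; HAVING demands that both occur.
having_count = conn.execute("""
    SELECT d.DocNumbers, d.idNum
    FROM Doctors d
    INNER JOIN Workdays w ON w.DocNum = d.DocNumbers
    WHERE w.Workday IN (1, 3)
    GROUP BY d.DocNumbers, d.idNum
    HAVING COUNT(DISTINCT w.Workday) = 2
""").fetchall()

print(double_join, having_count)
```

The HAVING variant tends to scale more easily to longer lists of required days, since only the IN list and the expected count change.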
Try this with simple join
SELECT DISTINCT w.`DocNum`, d.idNum
FROM doctors d
LEFT JOIN Workdays w ON(d.`DocNumbers`=w.`DocNum`)
LEFT JOIN Workdays ww ON(d.`DocNumbers`=ww.`DocNum`)
WHERE w.`Workday` = 1 AND ww.`Workday` =3
Here is another way of doing the same (note that it only matches doctors who work on exactly days 1 and 3, and no other days):
SELECT w.`DocNum`, d.idNum
FROM doctors d
LEFT JOIN Workdays w ON(d.`DocNumbers`=w.`DocNum`)
GROUP BY d.`DocNumbers`
HAVING GROUP_CONCAT(w.`Workday` ORDER BY w.`Workday` SEPARATOR ',')= '1,3'