mysql returning duplicates on JOIN


I have two tables in a database.
The table clients looks like this:
----------------------------
|id | name | age | gender |
|---------------------------
|1 | CL1 | 22 | M |
|2 | CL2 | 23 | M |
|3 | CL3 | 24 | M |
|4 | CL4 | 25 | F |
|5 | CL5 | 26 | NA |
----------------------------
Now I have another table that relates to this clients table. Please note that the "id" in the table above is not AUTO_INCREMENT and is UNIQUE.
The second table is "images", which contains the clients' portfolio images and looks like this:
------------------------------
|id | client_id | url |
|------------------------------
|1 | 1 | img1_1.jpg |
|2 | 1 | img1_2.jpg |
|3 | 1 | img1_3.jpg |
|4 | 2 | img2_1.jpg |
|5 | 2 | img2_2.jpg |
-------------------------------
What I want is to pull all columns from the clients table (name, age, gender, etc.) together with only the first matching row from the images table. For example, if I query for CL1 in the clients table, the result should show img1_1.jpg from the images table.
For this I am doing something like this:
SELECT DISTINCT c.* , i.* FROM clients c LEFT JOIN images i ON i.client_id = c.id
This query returns results, but they contain duplicates. Either I am confused about what DISTINCT stands for, since it still returns the duplicates, or I am missing something.
Any help would be appreciated.
Best,
Ahsan

Here's one way to do it, using a correlated subquery:
SELECT c.*
, ( SELECT i.url
FROM images i
WHERE i.client_id = c.id
ORDER BY i.id
LIMIT 1
) AS url
FROM clients c
You don't really need to pull client_id from the images table; you already know its value. If you need to return the id value from the images table, you'd need to add another correlated subquery in the select list:
, ( SELECT i.id
FROM images i
WHERE i.client_id = c.id
ORDER BY i.id
LIMIT 1
) AS images_id
This approach can get expensive on large sets, but it performs reasonably for a limited number of rows returned from clients.
A more general query is of the form:
SELECT c.*
, i.*
FROM clients c
LEFT
JOIN ( SELECT m.client_id, MIN(m.id) as images_id
FROM images m
GROUP BY m.client_id
) n
ON n.client_id = c.id
LEFT
JOIN images i
ON i.id = n.images_id
The inline view aliased as n will get a single id value from the images table for each client_id, and then we can use that id value to join back to the images table, to retrieve the entire row.
Performance of this form can be better, but with large sets, materializing the inline view aliased as n can take some time. If you have a predicate on clients.id in the outer query, then for better performance that predicate can be repeated on m.client_id inside the inline view as well, to limit the number of rows it materializes.
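To see what the inline-view form returns, here is a small runnable sketch. It uses Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL (that substitution, and the reduced column list, are assumptions made only so the example is self-contained). The join between the inline view n and clients is written out as n.client_id = c.id.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE images  (id INTEGER PRIMARY KEY, client_id INTEGER, url TEXT);
    INSERT INTO clients VALUES (1, 'CL1'), (2, 'CL2'), (3, 'CL3');
    INSERT INTO images VALUES
        (1, 1, 'img1_1.jpg'), (2, 1, 'img1_2.jpg'), (3, 1, 'img1_3.jpg'),
        (4, 2, 'img2_1.jpg'), (5, 2, 'img2_2.jpg');
""")

# n holds the lowest image id per client; joining back to images
# fetches the full row for that one image.
first_images = conn.execute("""
    SELECT c.id, c.name, i.url
    FROM clients c
    LEFT JOIN ( SELECT m.client_id, MIN(m.id) AS images_id
                FROM images m
                GROUP BY m.client_id ) n
      ON n.client_id = c.id
    LEFT JOIN images i
      ON i.id = n.images_id
    ORDER BY c.id
""").fetchall()

for row in first_images:
    print(row)
```

Each client appears once, and a client without images (CL3 here) still shows up with a NULL url because both joins are LEFT joins.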

Assuming that by "first" you mean the record with the minimal images.id, you are after the groupwise minimum:
SELECT * FROM images NATURAL JOIN (
SELECT client_id, MIN(id) id
FROM images
GROUP BY client_id
) t JOIN clients ON clients.id = images.client_id

SELECT DISTINCT operates on a ROW basis. It checks all values in a row against all other rows. If even one value is different, then the row is not a duplicate and the whole thing will be output. If you want to force a single FIELD to be distinct, then you should GROUP BY that field instead.
Since you're doing a left join, you'll get all records from the clients table, and ANY matching records from the images table.
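The row-based behaviour of DISTINCT can be checked with a quick sketch. It again assumes Python's sqlite3 with an in-memory database in place of MySQL, and a trimmed-down clients table; both are choices made for the demo, not part of the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE images  (id INTEGER PRIMARY KEY, client_id INTEGER, url TEXT);
    INSERT INTO clients VALUES (1, 'CL1'), (2, 'CL2'), (3, 'CL3');
    INSERT INTO images VALUES
        (1, 1, 'img1_1.jpg'), (2, 1, 'img1_2.jpg'), (3, 1, 'img1_3.jpg'),
        (4, 2, 'img2_1.jpg'), (5, 2, 'img2_2.jpg');
""")

# DISTINCT compares whole rows. Every joined row differs in the image
# columns, so no row is removed.
distinct_rows = conn.execute("""
    SELECT DISTINCT c.*, i.*
    FROM clients c LEFT JOIN images i ON i.client_id = c.id
""").fetchall()

# With only client columns selected, the joined rows for one client are
# identical, so DISTINCT collapses them.
client_only = conn.execute("""
    SELECT DISTINCT c.id, c.name
    FROM clients c LEFT JOIN images i ON i.client_id = c.id
    ORDER BY c.id
""").fetchall()

print(len(distinct_rows))
print(client_only)
```

The first query keeps one row per client/image pair plus one row for the client with no images; the second returns each client once.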

Related

MySQL joining on specific result

I am trying to build a query to extract data from a form builder, where I can join on a field value only where another field value is equal to a specific value. My query returns no rows.
users
------------------
|id | name |
------------------
|40 | John |
|45 | Michael |
|47 | Bob |
------------------
data_table
----------------------------------------------------
|id | submission | field_type | field_value |
----------------------------------------------------
|1 | 12345 | user | 40 |
|2 | 12345 | score | 5 |
|3 | 12345 | completed | 1 |
|4 | 23456 | user | 45 |
|5 | 23456 | score | 3 |
|6 | 23456 | completed | 0 |
|7 | 45678 | user | 47 |
|8 | 45678 | score | 2 |
|9 | 45678 | completed | 1 |
----------------------------------------------------
Desired result
---------------
|Name | Score |
---------------
|John | 5 |
|Bob | 2 |
---------------
Select
u.name,
dt2.field_value as score
from
users u
left join
data_table dt on u.id=dt.field_value and dt.field_type='user'
left join
data_table dt2 on dt.submission=dt2.submission and dt2.field_type='score'
where
(dt.field_type='completed' and dt.field_value=1)
http://sqlfiddle.com/#!2/d6e72f/3

You could do something like this (if you really can't change your data structure, which looks... strange).
You'll need a subquery on data_table with two self joins on that table (as you need 3 different rows with conditions):
select u.name, s.score
from users u
join (
select dt.field_value as user_id, dt1.field_value as score
from data_table dt
join data_table dt1 on dt1.submission = dt.submission
join data_table dt2 on dt2.submission = dt1.submission
where dt.field_type='user' and
dt1.field_type = 'score' and
dt2.field_type='completed' and
dt2.field_value = 1
) s
on s.user_id = u.id

Working fiddle http://sqlfiddle.com/#!2/d6e72f/19/0
I've seen this a lot when users get to dynamically add additional attributes to an existing structure. You must first pivot the additional data (turn the attribute rows into columns), and then you can treat it as a normal table. Since MySQL has no PIVOT operator, I used the usual workaround of conditional aggregation.
Select u.name, score
from users u
INNER JOIN (
Select submission,
max(case when field_Type='user' then field_value end) as user,
max(case when field_Type='score' then field_value end) as score,
max(case when field_Type='completed' then field_value end) as completed
FROM data_table
group by submission) dt
on dt.user = u.id
and dt.completed = 1
This assumes that a given submission has at most one row per field_type. If there are several, only the maximum value is returned.
Basically, this pivots the data into a table structure that we can then join back to.
The reason we use MAX (or MIN) is so that we get one row back per submission instead of multiple rows. Again, this simply pivots the data and combines the rows into one, but it relies on the assumption that no field_type is duplicated within a submission.
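To make the intermediate result visible, here is a sketch that runs the conditional aggregation on the question's sample rows. It assumes Python's sqlite3 with an in-memory database instead of MySQL, stores field_value as an integer, and names the pivoted column user_id; all three are choices made for the demo.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE data_table (id INTEGER PRIMARY KEY, submission INTEGER,
                             field_type TEXT, field_value INTEGER);
    INSERT INTO users VALUES (40, 'John'), (45, 'Michael'), (47, 'Bob');
    INSERT INTO data_table VALUES
        (1, 12345, 'user', 40), (2, 12345, 'score', 5), (3, 12345, 'completed', 1),
        (4, 23456, 'user', 45), (5, 23456, 'score', 3), (6, 23456, 'completed', 0),
        (7, 45678, 'user', 47), (8, 45678, 'score', 2), (9, 45678, 'completed', 1);
""")

# One row per submission; each CASE picks the value of one field_type and
# MAX collapses the NULLs produced by the other rows.
PIVOT_SQL = """
    SELECT submission,
           MAX(CASE WHEN field_type = 'user'      THEN field_value END) AS user_id,
           MAX(CASE WHEN field_type = 'score'     THEN field_value END) AS score,
           MAX(CASE WHEN field_type = 'completed' THEN field_value END) AS completed
    FROM data_table
    GROUP BY submission
"""

pivoted = conn.execute(PIVOT_SQL + " ORDER BY submission").fetchall()

# The pivoted rows behave like an ordinary table and can be joined to users.
scores = conn.execute(f"""
    SELECT u.name, dt.score
    FROM users u
    JOIN ({PIVOT_SQL}) dt
      ON dt.user_id = u.id AND dt.completed = 1
    ORDER BY u.id
""").fetchall()

print(pivoted)
print(scores)
```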

Let's take a look at what you are actually doing so we can see what is going wrong:
Select
u.name,
dt2.field_value as score
from
users u
Get a list of users
left join
data_table dt on u.id=dt.field_value and dt.field_type='user'
Join only rows from data_table of type 'user'
left join
data_table dt2 on dt.submission=dt2.submission and dt2.field_type='score'
Join only rows from data_table of type 'score'.
Now your result set looks something like:
User, DT (data table rows for type 'USER'), DT2 (data table rows for type 'SCORE')
where
(dt.field_type='completed' and dt.field_value=1)
Filter the results to include only rows where dt.field_type (already restricted to 'user' by the join condition) is 'completed'.
A dt row cannot be of type 'user' and of type 'completed' at the same time, so your WHERE clause finds no matches. That is just an explanation of what is happening. On to your problem.
Looking at your schema, you have a few options. As much as I am not a fan of the design, here is how I would write your query:
SELECT U.name, SCORE_DT.field_value AS score
FROM users U
JOIN data_table DT ON DT.field_value=U.id AND DT.field_type='user'
JOIN data_table SCORE_DT ON SCORE_DT.submission=DT.submission AND SCORE_DT.field_type='score'
JOIN data_table COMPLETED_DT ON COMPLETED_DT.submission=DT.submission AND COMPLETED_DT.field_type='completed' AND COMPLETED_DT.field_value=1
Realistically, it would make your life much easier to change the table design, as this data structure requires you to build queries that perform pivot operations for every column you are interested in. For a small data set like this one it is doable, but as the number of columns in your form increases it will become incredibly tedious to work with.
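Both behaviours can be reproduced on the sample data. The sketch below assumes Python's sqlite3 with an in-memory database in place of MySQL and an integer field_value column; it runs the question's query and then a three-way self join with one alias per field type.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE data_table (id INTEGER PRIMARY KEY, submission INTEGER,
                             field_type TEXT, field_value INTEGER);
    INSERT INTO users VALUES (40, 'John'), (45, 'Michael'), (47, 'Bob');
    INSERT INTO data_table VALUES
        (1, 12345, 'user', 40), (2, 12345, 'score', 5), (3, 12345, 'completed', 1),
        (4, 23456, 'user', 45), (5, 23456, 'score', 3), (6, 23456, 'completed', 0),
        (7, 45678, 'user', 47), (8, 45678, 'score', 2), (9, 45678, 'completed', 1);
""")

# The question's query: dt is limited to 'user' rows by the join, so the
# WHERE test for 'completed' on the same alias can never be true.
original = conn.execute("""
    SELECT u.name, dt2.field_value AS score
    FROM users u
    LEFT JOIN data_table dt
           ON u.id = dt.field_value AND dt.field_type = 'user'
    LEFT JOIN data_table dt2
           ON dt.submission = dt2.submission AND dt2.field_type = 'score'
    WHERE dt.field_type = 'completed' AND dt.field_value = 1
""").fetchall()

# One alias per field type, all tied together by submission.
fixed = conn.execute("""
    SELECT u.name, score_dt.field_value AS score
    FROM users u
    JOIN data_table dt
      ON dt.field_value = u.id AND dt.field_type = 'user'
    JOIN data_table score_dt
      ON score_dt.submission = dt.submission AND score_dt.field_type = 'score'
    JOIN data_table completed_dt
      ON completed_dt.submission = dt.submission
     AND completed_dt.field_type = 'completed'
     AND completed_dt.field_value = 1
    ORDER BY u.id
""").fetchall()

print(original)
print(fixed)
```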

Another variation that works is ...
select x.name
, d.field_value
from data_table d
join (select u.name
, d2.submission
from users u
join data_table d2
on d2.field_value = u.id
and d2.field_type = 'user'
join data_table d3
on d2.submission = d3.submission
and d3.field_type = 'completed'
and d3.field_value = '1'
) x
on x.submission = d.submission
and d.field_type = 'score'
On your data, this query and xQbert's answer may perform differently, so I would give them both a try. Based on your data, try to get the innermost query to return the smallest data set possible. For example, if you know that only a small subset of the data_table records will have 'completed' = '1', then a third nested select might not be unreasonable if it results in a smaller result for MySQL to work with.

Join two tables using multiple rows in the join

I have two tables
Table: color_document
+----------+---------------------+
| color_id | document_id |
+----------+---------------------+
| 180907 | 4270851 |
| 180954 | 4270851 |
+----------+---------------------+
Table: color_group
+----------------+-----------+
| color_group_id | color_id |
+----------------+-----------+
| 3 | 180954 |
| 4 | 180907 |
| 11 | 180907 |
| 11 | 180984 |
| 12 | 180907 |
| 12 | 180954 |
+----------------+-----------+
Is it possible for a query to get a result that looks something like this using multiple color id's to join the two tables?
Result
+----------------+--------------+
| color_group_id | document_id |
+----------------+--------------+
| 12 | 4270851 |
+----------------+--------------+
Since Color Group 12 is the only group that has the exact same set of Colors that Document 4270851 has.
I've got some bad data that I'm being forced to work with, so I've had to manufacture the color groups by finding each unique set of color_ids associated with document_ids. I'm trying to then create a new relationship directly between my manufactured color groups and documents.
I know I could probably do something with a GROUP_CONCAT to make a pseudo key of concatenated color ids, but I'm trying to find a solution that would also work in, say, Oracle. Am I barking up the completely wrong tree with this logic?
My ultimate goal is to be able to have a single row in a table that would represent any number of Colors that are associated with a Document to be exported to a completely different system than the one I'm working with.
Any thoughts/comments/suggestions are greatly appreciated.
Thank you in advance for looking at my question.

Do a normal join of the two tables, and count the number of rows in each pairing. Then test whether this is the same as the number of times each of the items appears in the original tables. If all are the same, then all color IDs must match.
SELECT a.color_group_id, a.document_id
FROM (
SELECT color_group_id, document_id, COUNT(*) ct
FROM color_document d
JOIN color_group g ON d.color_id = g.color_id
GROUP BY color_group_id, document_id) a
JOIN (
SELECT color_group_id, COUNT(*) ct
FROM color_group
GROUP BY color_group_id) b
ON a.color_group_id = b.color_group_id and a.ct = b.ct
JOIN (
SELECT document_id, COUNT(*) ct
FROM color_document
GROUP BY document_id) c
ON a.document_id = c.document_id and a.ct = c.ct
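Here is a runnable sketch of the count-comparison query on the sample data, assuming Python's sqlite3 with an in-memory database in place of MySQL. It also assumes neither table contains duplicate rows, since duplicates would distort the counts.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE color_document (color_id INTEGER, document_id INTEGER);
    CREATE TABLE color_group (color_group_id INTEGER, color_id INTEGER);
    INSERT INTO color_document VALUES (180907, 4270851), (180954, 4270851);
    INSERT INTO color_group VALUES
        (3, 180954), (4, 180907), (11, 180907),
        (11, 180984), (12, 180907), (12, 180954);
""")

# a: colors shared by each (group, document) pair
# b: total colors per group
# c: total colors per document
# A pair matches exactly when all three counts agree.
matches = conn.execute("""
    SELECT a.color_group_id, a.document_id
    FROM ( SELECT g.color_group_id, d.document_id, COUNT(*) AS ct
           FROM color_document d
           JOIN color_group g ON d.color_id = g.color_id
           GROUP BY g.color_group_id, d.document_id ) a
    JOIN ( SELECT color_group_id, COUNT(*) AS ct
           FROM color_group
           GROUP BY color_group_id ) b
      ON a.color_group_id = b.color_group_id AND a.ct = b.ct
    JOIN ( SELECT document_id, COUNT(*) AS ct
           FROM color_document
           GROUP BY document_id ) c
      ON a.document_id = c.document_id AND a.ct = c.ct
""").fetchall()

print(matches)
```

Groups 3 and 4 drop out because the document has more colors than they do, and group 11 drops out because it contains a color the document lacks.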

If I understand your question correctly, you just have to join the two tables and then group the results by color_group_id and document_id.
select color_group_id, document_id
from
color_document cd join
color_group cg
on cd.color_id = cg.color_id
group by color_group_id, document_id
That query will give you this result set:
COLOR_GROUP_ID DOCUMENT_ID
3 4270851
4 4270851
11 4270851
12 4270851
Is that what you want?

MySQL selective GROUP BY, using the maximal value

I have the following (simplified) three tables:
user_reservations:
id | user_id |
1 | 3 |
1 | 3 |
user_kar:
id | user_id | szak_id |
1 | 3 | 1 |
2 | 3 | 2 |
szak:
id | name |
1 | A |
2 | B |
Now I would like to count the reservations of each user by 'szak' name, but I want every user counted for only one szak. In this case, user 3 has 2 'szak', and if I write a query something like:
SELECT sz.name, COUNT(*) FROM user_reservations r
LEFT JOIN user_kar k ON k.user_id = r.user_id
LEFT JOIN szak sz ON sz.id = k.szak_id
GROUP BY sz.name
It will return two rows:
A | 2 |
B | 2 |
However, I want every reservation counted for only one szak (let's say the one with the highest id). I tried MAX(k.id) with HAVING, but it seems ineffective.
I would like to know if there is a supported method for that in MySQL. Or should I first pick all the user IDs on the backend side, check their maximum kar.user_id, count only with those, removing them from the id list once the given szak is counted, and then build the data back together on the backend side?
Thanks for the help. I was googling around for about 2 hours, but so far I have found no solution, so maybe you could help me.

Something like this?
SELECT sz.name,
Count(*)
FROM (SELECT r.user_id,
Ifnull(Max(k.szak_id), -1) AS max_szak_id
FROM user_reservations r
LEFT OUTER JOIN user_kar k
ON k.user_id = r.user_id
GROUP BY r.user_id) t
LEFT OUTER JOIN szak sz
ON sz.id = t.max_szak_id
GROUP BY sz.name;
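One thing to be aware of: the derived table t has one row per user, so the COUNT(*) above counts users per szak. If reservations should be counted instead (my reading of the question, so treat it as an assumption), a possible variant joins each reservation to its user's highest szak_id. The sketch below runs both on the sample data, using Python's sqlite3 with an in-memory database in place of MySQL.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_reservations (id INTEGER, user_id INTEGER);
    CREATE TABLE user_kar (id INTEGER PRIMARY KEY, user_id INTEGER, szak_id INTEGER);
    CREATE TABLE szak (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO user_reservations VALUES (1, 3), (1, 3);
    INSERT INTO user_kar VALUES (1, 3, 1), (2, 3, 2);
    INSERT INTO szak VALUES (1, 'A'), (2, 'B');
""")

# The answer's query: one row per user in t, so this counts users.
users_per_szak = conn.execute("""
    SELECT sz.name, COUNT(*)
    FROM ( SELECT r.user_id, IFNULL(MAX(k.szak_id), -1) AS max_szak_id
           FROM user_reservations r
           LEFT OUTER JOIN user_kar k ON k.user_id = r.user_id
           GROUP BY r.user_id ) t
    LEFT OUTER JOIN szak sz ON sz.id = t.max_szak_id
    GROUP BY sz.name
""").fetchall()

# Variant: keep one row per reservation and attach the user's highest szak.
reservations_per_szak = conn.execute("""
    SELECT sz.name, COUNT(*) AS reservations
    FROM user_reservations r
    JOIN ( SELECT user_id, MAX(szak_id) AS max_szak_id
           FROM user_kar
           GROUP BY user_id ) k ON k.user_id = r.user_id
    JOIN szak sz ON sz.id = k.max_szak_id
    GROUP BY sz.name
""").fetchall()

print(users_per_szak)
print(reservations_per_szak)
```

The variant uses inner joins, so reservations of users without any user_kar row are left out; switch to LEFT JOINs if those should be reported under a NULL name.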

SQL complicated select statement

I am trying to create a SELECT statement, but I am not really sure how to accomplish it.
I have 2 tables, user and group. Each user has an id and each group has an ownerid that specifies who owns the group. Each group also has a name, and the user table has a column group designating which group that person belongs to (excuse the annoying structure, I did not create it). I am trying to find all rows in group where the owner of that group does not have group (in the user table) set to the name of that group. If this helps:
User
|-----------------------|
| id | username | group |
|----|----------|-------|
| 0 | Steve | night |
| 1 | Sally | night |
| 2 | Susan | sun |
| 3 | David | xray |
|-----------------------|
Group
|---------------------|
| ownerid | name |
|---------|-----------|
| 1 | night |
| 3 | bravo |
| 2 | sun |
|---------------------|
The SQL statement should return the group row for bravo, because bravo's owner does not have his group set to bravo.

This is a join back to the original table and then a comparison of the values:
select g.*
from `group` g join
`user` u
on g.ownerid = u.id
where g.name <> u.`group`;
If the values can be NULL, then the logic would need to take that into account.

An anti-join is a familiar pattern:
SELECT g.*
FROM `Group` g
LEFT
JOIN `User` u
ON u.group = g.name
AND u.id = g.ownerid
WHERE u.id IS NULL
Let's unpack that a bit. We're going to start with returning all rows from Group. Then, we're going to "match" each row in Group with a row (or rows) from User. To be considered a "match", the User.id has to match the Group.ownerid, and the User.group value has to match the Group.name.
The "trick" is to eliminate all rows where we found a match (that's what the WHERE clause does), and that leaves us with only those rows from Group that didn't have a match.
Another way to obtain an equivalent result is to use a NOT EXISTS predicate:
SELECT g.*
FROM `Group` g
WHERE NOT EXISTS
( SELECT 1
FROM `User` u
WHERE u.group = g.name
AND u.id = g.ownerid
)
This uses a correlated subquery; depending on the optimizer and the available indexes, it may not perform as fast as the join.
Note that these have the potential to return a slightly different result than the query from Gordon Linoff if Group has a row whose ownerid value isn't in the User table.
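That difference is easy to demonstrate. The sketch below assumes Python's sqlite3 with an in-memory database in place of MySQL, and adds one hypothetical group, 'ghost', whose ownerid 9 has no row in User. It compares an inner join with the anti-join.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for the demo).
# Backticks quote the reserved word GROUP, as they do in MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE `User` (id INTEGER PRIMARY KEY, username TEXT, `group` TEXT);
    CREATE TABLE `Group` (ownerid INTEGER, name TEXT);
    INSERT INTO `User` VALUES
        (0, 'Steve', 'night'), (1, 'Sally', 'night'),
        (2, 'Susan', 'sun'),   (3, 'David', 'xray');
    INSERT INTO `Group` VALUES (1, 'night'), (3, 'bravo'), (2, 'sun');
    -- Hypothetical extra row: the owner does not exist in User.
    INSERT INTO `Group` VALUES (9, 'ghost');
""")

# Inner join: only groups whose owner exists and belongs to another group.
inner_join = conn.execute("""
    SELECT g.ownerid, g.name
    FROM `Group` g
    JOIN `User` u ON g.ownerid = u.id
    WHERE g.name <> u.`group`
    ORDER BY g.name
""").fetchall()

# Anti-join: every group without an owner row that carries its name,
# which includes groups whose owner is missing altogether.
anti_join = conn.execute("""
    SELECT g.ownerid, g.name
    FROM `Group` g
    LEFT JOIN `User` u
           ON u.`group` = g.name AND u.id = g.ownerid
    WHERE u.id IS NULL
    ORDER BY g.name
""").fetchall()

print(inner_join)
print(anti_join)
```

Which result is right depends on whether a group with a missing owner should count as a mismatch.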

SELECT G.*
FROM `Group` AS G
WHERE G.Name NOT IN (SELECT DISTINCT U.`Group` FROM `User` AS U)

How to join three table to see all the rows

I have three tables:
(domains)
+----+-----+
| id |title|
+----+-----+
| 1 | com |
+----+-----+
| 2 | net |
+----+-----+
(slabs)
+----+-----+
| id |title|
+----+-----+
| 1 |str1 |
+----+-----+
| 2 |str2 |
+----+-----+
(prices)
+----+------+--------+
| id |slabid|domainId|
+----+------+--------+
empty
here is my query:
SELECT
prices.*
FROM
prices
RIGHT JOIN domains ON domains.id=prices.domainId
JOIN slabs ON slabs.id=prices.slabId
How should I write the query so that it lists every combination of domains and slabs rows?
The result should look like this:
+----+------+--------+
| id |slabid|domainId|
+----+------+--------+
| 1 |1 |1 |
+----+------+--------+
| 2 |2 |1 |
+----+------+--------+
| 3 |1 |2 |
+----+------+--------+
| 4 |2 |2 |
+----+------+--------+
but it doesn't.

An INNER JOIN will suffice. If one of the fields on table prices is nullable and you want to display all records from that table whether or not they have a matching row in the other table, use LEFT JOIN.
An INNER JOIN only displays a record if it has at least one match in each table. A LEFT JOIN, on the other hand, displays all rows from the left-hand table whether or not they have a match in the other tables.
SELECT a.*,
b.title domainTitle,
c.title slabsTitle
FROM prices a
INNER JOIN domains b
ON a.domainID = b.id
INNER JOIN slabs c
ON a.slabID = c.ID
To learn more about joins, see:
Visual Representation of SQL Joins
UPDATE 1
You need to use CROSS JOIN for this, because you need the product of the two tables. Assuming that you want to insert it into table prices and that ID is an identity or AUTO_INCREMENT column:
INSERT INTO prices (slabid, domainid)
SELECT b.ID as slabID, a.ID as domainID
FROM domains a
CROSS JOIN slabs b
If the column ID on table prices is not AUTO_INCREMENT and you are using MySQL, use a user variable to hold and increment the value:
SELECT @ID:=@ID+1 ID,
b.ID as slabID,
a.ID as domainID
FROM domains a
CROSS JOIN slabs b
CROSS JOIN (SELECT @ID:=0) s

Just cross join the DOMAINS table with the SLABS table. I'm not sure what role the PRICES table plays here, since you just want to pair every row in DOMAINS with every row in SLABS.
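As a quick check of the cross join idea, here is a sketch that fills an empty prices table from domains and slabs. It assumes Python's sqlite3 with an in-memory database in place of MySQL, with an INTEGER PRIMARY KEY playing the role of an AUTO_INCREMENT id; the ORDER BY is there only to make the generated ids predictable.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE domains (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE slabs   (id INTEGER PRIMARY KEY, title TEXT);
    -- INTEGER PRIMARY KEY assigns ids automatically, like AUTO_INCREMENT.
    CREATE TABLE prices  (id INTEGER PRIMARY KEY, slabid INTEGER, domainId INTEGER);
    INSERT INTO domains VALUES (1, 'com'), (2, 'net');
    INSERT INTO slabs   VALUES (1, 'str1'), (2, 'str2');
""")

# CROSS JOIN pairs every domain with every slab (2 x 2 = 4 rows).
conn.execute("""
    INSERT INTO prices (slabid, domainId)
    SELECT b.id, a.id
    FROM domains a
    CROSS JOIN slabs b
    ORDER BY a.id, b.id
""")

prices = conn.execute(
    "SELECT id, slabid, domainId FROM prices ORDER BY id"
).fetchall()

for row in prices:
    print(row)
```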
