MySQL joining on specific result - mysql

I am trying to build a query to extract data from a form builder, where I join on one field value only when another field in the same submission has a specific value. My query returns no rows.
users
------------------
|id | name |
------------------
|40 | John |
|45 | Michael |
|47 | Bob |
------------------
data_table
----------------------------------------------------
|id | submission | field_type | field_value |
----------------------------------------------------
|1 | 12345 | user | 40 |
|2 | 12345 | score | 5 |
|3 | 12345 | completed | 1 |
|4 | 23456 | user | 45 |
|5 | 23456 | score | 3 |
|6 | 23456 | completed | 0 |
|7 | 45678 | user | 47 |
|8 | 45678 | score | 2 |
|9 | 45678 | completed | 1 |
----------------------------------------------------
Desired result
---------------
|Name | Score |
---------------
|John | 5 |
|Bob | 2 |
---------------
Select
u.name,
dt2.field_value as score
from
users u
left join
data_table dt on u.id=dt.field_value and dt.field_type='user'
left join
data_table dt2 on dt.submission=dt2.submission and dt2.field_type='score'
where
(dt.field_type='completed' and dt.field_value=1)
http://sqlfiddle.com/#!2/d6e72f/3

You could do something like this (if you really can't change your data structure, which looks... strange).
You need a subquery on data_table with two self joins on that table, as you need three different rows, each with its own condition.
select u.name, s.score
from users u
join (
select dt.field_value as user_id, dt1.field_value as score
from data_table dt
join data_table dt1 on dt1.submission = dt.submission
join data_table dt2 on dt2.submission = dt1.submission
where dt.field_type='user' and
dt1.field_type = 'score' and
dt2.field_type='completed' and
dt2.field_value = 1
) s
on s.user_id = u.id

Working fiddle http://sqlfiddle.com/#!2/d6e72f/19/0
I've seen this a lot when users can dynamically add attributes to an existing structure. You must first pivot the additional data into columns, and then you can treat it as a normal table. Since MySQL has no built-in pivot operator, I used the usual workaround: conditional aggregation.
Select u.name, score
from users u
INNER JOIN (
Select submission,
max(case when field_Type='user' then field_value end) as user,
max(case when field_Type='score' then field_value end) as score,
max(case when field_Type='completed' then field_value end) as completed
FROM data_table
group by submission) dt
on dt.user = u.id
and dt.completed = 1
This assumes that a given submission has at most one row per field_type. If there are several, only the maximum value is returned.
Basically, this pivots the data into a conventional table structure that we can then join back to.
The reason we use MAX (MIN would work equally well) is to get one row per submission instead of several. Again, this simply pivots the data and collapses the rows into one, but it relies on the assumption that no field_type is duplicated within a submission.
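To make the intermediate result visible, here is a minimal sketch that runs the conditional-aggregation query against the sample data. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption made only so the example is runnable anywhere), stores field_value as an integer for simplicity, and names the pivoted column user_id instead of user.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the SQL used here is portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE data_table (
        id INTEGER PRIMARY KEY, submission INTEGER,
        field_type TEXT, field_value INTEGER);
    INSERT INTO users VALUES (40, 'John'), (45, 'Michael'), (47, 'Bob');
    INSERT INTO data_table VALUES
        (1, 12345, 'user', 40), (2, 12345, 'score', 5), (3, 12345, 'completed', 1),
        (4, 23456, 'user', 45), (5, 23456, 'score', 3), (6, 23456, 'completed', 0),
        (7, 45678, 'user', 47), (8, 45678, 'score', 2), (9, 45678, 'completed', 1);
""")

# One output row per submission; each CASE picks the value of one field_type.
PIVOT = """
    SELECT submission,
           MAX(CASE WHEN field_type = 'user' THEN field_value END) AS user_id,
           MAX(CASE WHEN field_type = 'score' THEN field_value END) AS score,
           MAX(CASE WHEN field_type = 'completed' THEN field_value END) AS completed
    FROM data_table
    GROUP BY submission
"""

# The pivoted rows on their own: (submission, user_id, score, completed).
pivoted = conn.execute(PIVOT + " ORDER BY submission").fetchall()

# The pivoted rows joined back to users, keeping completed submissions only.
result = conn.execute(f"""
    SELECT u.name, dt.score
    FROM users u
    JOIN ({PIVOT}) dt ON dt.user_id = u.id AND dt.completed = 1
    ORDER BY u.id
""").fetchall()

print(pivoted)
print(result)
```

The first print shows that each submission has become a single row with one column per field_type, which is why the final join can filter on completed like an ordinary column.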

Let's take a look at what you are actually doing so we can see what is going wrong:
Select
u.name,
dt2.field_value as score
from
users u
Get a list of users
left join
data_table dt on u.id=dt.field_value and dt.field_type='user'
Join only rows from data_table of type 'user'
left join
data_table dt2 on dt.submission=dt2.submission and dt2.field_type='score'
Join only rows from data_table of type 'score'.
Now your result set looks something like:
User, DT (data table rows for type 'USER'), DT2 (data table rows for type 'SCORE')
where
(dt.field_type='completed' and dt.field_value=1)
Filter the results to include only rows where dt.field_type is 'completed'. But dt was already restricted to type 'user' by the join condition, so dt.field_type can only be 'user' (or NULL when the left join found nothing).
Basically, your joins never bring in any 'completed' rows from data_table, so your WHERE clause finds no matches. That is just an explanation of what is happening. On to your problem.
Looking at your schema, you have a few options. As much as I am not a fan of the design, here is how I would write your query:
SELECT U.name, SCORE_DT.field_value AS score
FROM users U
JOIN data_table DT ON DT.field_value=U.id AND DT.field_type='user'
JOIN data_table SCORE_DT ON SCORE_DT.submission=DT.submission AND SCORE_DT.field_type='score'
JOIN data_table COMPLETED_DT ON COMPLETED_DT.submission=DT.submission AND COMPLETED_DT.field_type='completed' AND COMPLETED_DT.field_value=1
Realistically, it would make your life much easier to change the table design, as this data structure requires you to build queries that perform pivot operations for every column you are interested in. For a small data set like this one it is doable, but as the number of columns in your form increases it will become incredibly tedious to work with.
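The difference between the question's query and a three-join version can be checked on the sample data. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database in place of MySQL, stores field_value as an integer, and uses short aliases (d, s, c) of my own choosing.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the SQL used here is portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE data_table (
        id INTEGER PRIMARY KEY, submission INTEGER,
        field_type TEXT, field_value INTEGER);
    INSERT INTO users VALUES (40, 'John'), (45, 'Michael'), (47, 'Bob');
    INSERT INTO data_table VALUES
        (1, 12345, 'user', 40), (2, 12345, 'score', 5), (3, 12345, 'completed', 1),
        (4, 23456, 'user', 45), (5, 23456, 'score', 3), (6, 23456, 'completed', 0),
        (7, 45678, 'user', 47), (8, 45678, 'score', 2), (9, 45678, 'completed', 1);
""")

# The query from the question: dt can only hold 'user' rows (or NULLs),
# so the WHERE condition on dt.field_type = 'completed' is never true.
ORIGINAL = """
    SELECT u.name, dt2.field_value AS score
    FROM users u
    LEFT JOIN data_table dt
           ON u.id = dt.field_value AND dt.field_type = 'user'
    LEFT JOIN data_table dt2
           ON dt.submission = dt2.submission AND dt2.field_type = 'score'
    WHERE dt.field_type = 'completed' AND dt.field_value = 1
"""

# One join per field_type: the 'completed' row gets its own alias (c).
THREE_JOINS = """
    SELECT u.name, s.field_value AS score
    FROM users u
    JOIN data_table d ON d.field_value = u.id AND d.field_type = 'user'
    JOIN data_table s ON s.submission = d.submission AND s.field_type = 'score'
    JOIN data_table c ON c.submission = d.submission
                     AND c.field_type = 'completed' AND c.field_value = 1
    ORDER BY u.id
"""

original_rows = conn.execute(ORIGINAL).fetchall()
fixed_rows = conn.execute(THREE_JOINS).fetchall()

print(original_rows)  # no rows, as reported in the question
print(fixed_rows)
```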

Another variation that works is ...
select x.name
, d.field_value
from data_table d
join (select u.name
, d2.submission
from users u
join data_table d2
on d2.field_value = u.id
and d2.field_type = 'user'
join data_table d3
on d2.submission = d3.submission
and d3.field_type = 'completed'
and d3.field_value = '1'
) x
on x.submission = d.submission
and d.field_type = 'score'
For your set of data, you might find this or xQbert's answer to perform differently.
I would give them both a try. Based on your data, try to get the inner most query to return the smallest data set possible. For example, if you know that only a small subset of the data_table records will have 'completed' = '1', then a 3rd nested select might not be unreasonable if it results in a smaller result for MySql to work with.

Related

Merge based on "group by" groups

So I have a table called the Activities table that contains a schema of user_id, activity
There is a row for each user, activity combo.
Here is a what it might look like (empty rows added to make things easier to look at, please ignore):
| user_id | activity |
|---------|-----------|
| 1 | swimming | -- We want to match this
| 1 | running | -- person's activities
| | |
| 2 | swimming |
| 2 | running |
| 2 | rowing |
| | |
| 3 | swimming |
| | |
| 4 | skydiving |
| 4 | running |
| 4 | swimming |
I would like to basically find all other users with at least the same activities as a given input id so that I could recommend users with similar activities.
so in the table above, if I wanna find recommended users for user_id=1, the query would return user_id=2 and user_id=4 because they engage in both swimming, running (and more), but not user_id=3 because they only engage in swimming
So a result with a single column of:
| user_id |
|---------|
| 2 |
| 4 |
is what I would ideally be looking for
As far as what I've tried, I am kinda stuck at how to get a solid set of user_id=1's activities to match against. Basically I'm looking for something along the lines of:
SELECT user_id from Activities
GROUP BY user_id
HAVING input_user_activities in user_x_activities
where input_user_activities is just the set of our input user's activities. I can create that set using a WITH input_user_activities AS (...) at the beginning; what I'm stuck on is the user_x_activities part.
Any thoughts?
To get users with the same activities, you can use a self join. Let me assume that the rows are unique:
select a.user_id
from activities a1 join
activities a
on a1.activity = a.activity and
a1.user_id = @user_id and
a.user_id <> @user_id
group by a.user_id
having count(*) = (select count(*) from activities a2 where a2.user_id = @user_id);
The having clause answers your question: it keeps only the users who share every one of the given user's activities (they may have more). The a.user_id <> @user_id condition leaves the given user out of the result.
You can easily get all users ordered by similarity using a JOIN (that finds all common rows) and a GROUP BY (to summarize the similarity per user_id) and finally an ORDER BY to return the most similar users first.
SELECT b.user_id, COUNT(*) similarity
FROM activities a
JOIN activities b
ON a.activity = b.activity
WHERE a.user_id = 1 AND b.user_id != 1
GROUP BY b.user_id
ORDER BY COUNT(*) DESC
An SQLfiddle to test with.
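Both ideas above (keep only users who cover all of the input user's activities, or rank everyone by the number of shared activities) can be tried on the sample table. This is a rough sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the :uid named parameter is sqlite3's placeholder syntax, and the extra tie-breaking ORDER BY column is my addition to make the output deterministic.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the SQL used here is portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activities (user_id INTEGER, activity TEXT);
    INSERT INTO activities VALUES
        (1, 'swimming'), (1, 'running'),
        (2, 'swimming'), (2, 'running'), (2, 'rowing'),
        (3, 'swimming'),
        (4, 'skydiving'), (4, 'running'), (4, 'swimming');
""")

# Users whose activities include every activity of the input user.
# Assumes (user_id, activity) pairs are unique.
SUPERSET = """
    SELECT a.user_id
    FROM activities a1
    JOIN activities a ON a.activity = a1.activity
    WHERE a1.user_id = :uid AND a.user_id <> :uid
    GROUP BY a.user_id
    HAVING COUNT(*) = (SELECT COUNT(*) FROM activities WHERE user_id = :uid)
    ORDER BY a.user_id
"""

# All other users ranked by how many activities they share with the input user.
SIMILARITY = """
    SELECT b.user_id, COUNT(*) AS similarity
    FROM activities a
    JOIN activities b ON a.activity = b.activity
    WHERE a.user_id = :uid AND b.user_id <> :uid
    GROUP BY b.user_id
    ORDER BY COUNT(*) DESC, b.user_id
"""

superset_users = [row[0] for row in conn.execute(SUPERSET, {"uid": 1})]
ranked = conn.execute(SIMILARITY, {"uid": 1}).fetchall()

print(superset_users)
print(ranked)
```

The ranking query is the more forgiving of the two: user 3 still appears, just with a lower similarity, whereas the HAVING version drops them entirely.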

Get data from one table and count matching records from another

I'm not sure if this is possible. I have one table members and a second table transactions.
I need to get the name of the member from the members table, but also count the number of transactions that member has made from another table. Is this even possible in a JOIN statement, or do I need to write two statements?
SELECT
m.first_name,
m.last_name,
COUNT(t.giver_id),
COUNT(t.getter_id)
FROM
members AS m
JOIN
transactions AS t
ON
m.id = t.giver_id
WHERE
m.id = $i
I should add that it's possible a member has not made any transactions and would therefore not appear in the transactions table.
When I run this code, it returns all NULL columns. When I add the EXPLAIN statement, MySql says "Impossible WHERE noticed after reading const table..."
Is this possible? If so, then what am I doing wrong? Thanks in advance.
EDIT:
Sample data structure and expected output:
members
id | first_name | last_name
_______________________________
1 | Bill | Smith
2 | Joe | Jones
transactions table
id | giver_id | getter_id | status
________________________________________
1 | 1 | 2 | complete
2 | 1 | 2 | complete
So running my query should return:
1 | Bill | Smith | 2 | 0
2 | Joe | Jones | 0 | 2
Simple LEFT JOIN should suffice:
SELECT
m.id,
m.first_name,
m.last_name,
SUM(CASE WHEN m.id = t.giver_id THEN 1 ELSE 0 END) AS giver_count,
SUM(CASE WHEN m.id = t.getter_id THEN 1 ELSE 0 END) AS getter_count
FROM members AS m
LEFT JOIN transactions AS t ON m.id = t.giver_id OR m.id = t.getter_id
GROUP BY m.id, m.first_name, m.last_name
The ELSE 0 branches make a member with no matching transactions report 0 rather than NULL. Do not forget to add GROUP BY when using aggregate functions. Just because MySQL may let the query through without it does not mean it is advisable: MySQL then picks arbitrary row values for the unaggregated columns, which can be problematic. Avoid this anti-pattern.
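One detail worth checking on real data is what a member with no matching rows gets back: SUM over a CASE without an ELSE branch yields NULL, while ELSE 0 yields 0. The sketch below shows both, using Python's built-in sqlite3 module in place of MySQL. Member 3 (Ann Lee) is a hypothetical row added to represent a member with no transactions; it is not in the question's data.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the SQL used here is portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY, giver_id INTEGER, getter_id INTEGER, status TEXT);
    INSERT INTO members VALUES
        (1, 'Bill', 'Smith'), (2, 'Joe', 'Jones'),
        (3, 'Ann', 'Lee');  -- hypothetical member with no transactions
    INSERT INTO transactions VALUES
        (1, 1, 2, 'complete'), (2, 1, 2, 'complete');
""")


def count_query(else_branch: str) -> str:
    """Build the counting query; else_branch is '' or 'ELSE 0'."""
    return f"""
        SELECT m.id, m.first_name, m.last_name,
               SUM(CASE WHEN m.id = t.giver_id THEN 1 {else_branch} END) AS giver_count,
               SUM(CASE WHEN m.id = t.getter_id THEN 1 {else_branch} END) AS getter_count
        FROM members AS m
        LEFT JOIN transactions AS t
               ON m.id = t.giver_id OR m.id = t.getter_id
        GROUP BY m.id, m.first_name, m.last_name
        ORDER BY m.id
    """


without_else = conn.execute(count_query("")).fetchall()
with_else = conn.execute(count_query("ELSE 0")).fetchall()

print(without_else)  # counts that never matched come back as None (NULL)
print(with_else)     # the same counts come back as 0
```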

SQL complicated select statement

I am trying to create a SELECT statement, but I am not really sure how to accomplish it.
I have 2 tables, user and group. Each user has a userid and each group has a ownerid that specifies who owns the group. Each group also has a name and then inside the user table, there is a column group designating which group that person belongs to. (excuse the annoying structure, I did not create it). I am trying to find all rows in group where the ownerid of that group does not have group (inside the user table) set to the name of that group. If this helps:
User
|-----------------------|
| id | username | group |
|----|----------|-------|
| 0 | Steve | night |
| 1 | Sally | night |
| 2 | Susan | sun |
| 3 | David | xray |
|-----------------------|
Group
|---------------------|
| ownerid | name |
|---------|-----------|
| 1 | night |
| 3 | bravo |
| 2 | sun |
|---------------------|
Where the SQL statement would return the group row for bravo because bravo's owner does not have his group set to bravo.
This is a join back to the original table and then a comparison of the values:
select g.*
from `group` g join
user u
on g.ownerid = u.id
where g.name <> u.group;
If the values can be NULL, then the logic would need to take that into account.
An anti-join is a familiar pattern:
SELECT g.*
FROM `Group` g
LEFT
JOIN `User` u
ON u.group = g.name
AND u.id = g.ownerid
WHERE u.id IS NULL
Let's unpack that a bit. We're going to start with returning all rows from Group. Then, we're going to "match" each row in Group with a row (or rows) from User. To be considered a "match", the User.id has to match the Group.ownerid, and the User.group value has to match the Group.name.
The "trick" is to eliminate all rows where we found a match (that's what the WHERE clause does), and that leaves us with only those rows from Group that didn't have a match.
Another way to obtain an equivalent result using a NOT EXISTS predicate
SELECT g.*
FROM `Group` g
WHERE NOT EXISTS
( SELECT 1
FROM `User` u
WHERE u.group = g.name
AND u.id = g.ownerid
)
This uses a correlated subquery; depending on the MySQL version and the available indexes, it may not perform as fast as a join.
Note that these have the potential to return a slightly different result than the query from Gordon Linoff if Group contains a row whose ownerid value is not in the User table: the anti-join and NOT EXISTS forms return that row, while the inner join does not.
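That difference is easy to see with one extra row. The sketch below runs the inner-join, anti-join and NOT EXISTS forms with Python's built-in sqlite3 module standing in for MySQL (SQLite also accepts backtick-quoted identifiers). The group named ghost with ownerid 9 is a hypothetical row added to represent an owner that is missing from User.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; backticks quote the reserved word GROUP.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE `User` (id INTEGER PRIMARY KEY, username TEXT, `group` TEXT);
    CREATE TABLE `Group` (ownerid INTEGER, name TEXT);
    INSERT INTO `User` VALUES
        (0, 'Steve', 'night'), (1, 'Sally', 'night'),
        (2, 'Susan', 'sun'), (3, 'David', 'xray');
    INSERT INTO `Group` VALUES
        (1, 'night'), (3, 'bravo'), (2, 'sun'),
        (9, 'ghost');  -- hypothetical group whose owner is not in User
""")

# Inner join plus comparison: needs the owner row to exist.
INNER_JOIN = """
    SELECT g.ownerid, g.name
    FROM `Group` g
    JOIN `User` u ON g.ownerid = u.id
    WHERE g.name <> u.`group`
    ORDER BY g.name
"""

# Anti-join: keep groups for which no matching owner row was found.
ANTI_JOIN = """
    SELECT g.ownerid, g.name
    FROM `Group` g
    LEFT JOIN `User` u ON u.`group` = g.name AND u.id = g.ownerid
    WHERE u.id IS NULL
    ORDER BY g.name
"""

# NOT EXISTS: same condition expressed as a correlated subquery.
NOT_EXISTS = """
    SELECT g.ownerid, g.name
    FROM `Group` g
    WHERE NOT EXISTS (
        SELECT 1 FROM `User` u
        WHERE u.`group` = g.name AND u.id = g.ownerid)
    ORDER BY g.name
"""

inner_rows = conn.execute(INNER_JOIN).fetchall()
anti_rows = conn.execute(ANTI_JOIN).fetchall()
not_exists_rows = conn.execute(NOT_EXISTS).fetchall()

print(inner_rows)
print(anti_rows)
print(not_exists_rows)
```

Which behaviour is right depends on whether a group with a missing owner should count as a mismatch; that is a data-modelling decision rather than something the query can settle.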
SELECT G.*
FROM `Group` AS G
WHERE G.Name NOT IN (SELECT DISTINCT U.Group FROM User AS U)

mysql returning duplicates on JOIN

I have two tables in a database.
The table clients looks like this:
----------------------------
|id | name | age | gender |
|---------------------------
|1 | CL1 | 22 | M |
|2 | CL2 | 23 | M |
|3 | CL3 | 24 | M |
|4 | CL4 | 25 | F |
|5 | CL5 | 26 | NA |
----------------------------
Now I have another table which relates to this clients table. Please note that the "id" in the table above is not AUTO_INCREMENT, but it is UNIQUE.
The second table is "images", which contains portfolio images of the clients and looks like this:
------------------------------
|id | client_id | url |
|------------------------------
|1 | 1 | img1_1.jpg |
|2 | 1 | img1_2.jpg |
|3 | 1 | img1_3.jpg |
|4 | 2 | img2_1.jpg |
|5 | 2 | img2_2.jpg |
-------------------------------
What I want to achieve is to pull all columns from the clients table (name, age, gender, etc.) together with only the first matching row from the images table. For example, if I query for CL1 in the clients table, the result should show img1_1.jpg from the images table.
For this i am doing something like this :
SELECT DISTINCT c.* , i.* FROM clients c LEFT JOIN images i ON i.client_id = c.id
This query returns results, but with duplicates. Either I am confused about what DISTINCT is for, since it still returns duplicates, or I am missing something.
Any help regarding would be appreciated.
Best,
Ahsan
Here's one way to do it, using a correlated subquery:
SELECT c.*
, ( SELECT i.url
FROM images i
WHERE i.client_id = c.id
ORDER BY i.id
LIMIT 1
) AS url
FROM clients c
You don't really need to pull client_id from the images table; you already know its value. If you need to return the id value from the images table, you'd need to add another correlated subquery to the select list:
, ( SELECT i.id
FROM images i
WHERE i.client_id = c.id
ORDER BY i.id
LIMIT 1
) AS images_id
This approach can get expensive on large sets, but it performs reasonably for a limited number of rows returned from clients.
A more general query is of the form:
SELECT c.*
, i.*
FROM clients c
LEFT
JOIN ( SELECT m.client_id, MIN(m.id) as images_id
FROM images m
GROUP BY m.client_id
) n
ON n.client_id = c.id
LEFT
JOIN images i
ON i.id = n.images_id
The inline view aliased as n will get a single id value from the images table for each client_id, and then we can use that id value to join back to the images table, to retrieve the entire row.
Performance of this form can be better, but with large sets, materializing the inline view aliased as n can take some time. If the outer query has a predicate on clients.id, then for better performance that predicate can be repeated on m.client_id inside the inline view as well, to limit the number of rows.
Assuming that by "first" you mean the record with the minimal images.id, you are after the groupwise minimum:
SELECT * FROM images NATURAL JOIN (
SELECT client_id, MIN(id) id
FROM images
GROUP BY client_id
) t JOIN clients ON clients.id = images.client_id
SELECT DISTINCT operates on a ROW basis. It checks all values in a row against all other rows. If even one value is different, then the row is not a duplicate and the whole thing will be output. If you want to force a single FIELD to be distinct, then you should GROUP BY that field instead.
Since you're doing a left join, you'll get all records from the clients table, and ANY matching records from the images table.
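To see the row-based behaviour of DISTINCT next to a groupwise-minimum join, here is a small sketch on the sample tables. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the alias first_img is a name I picked for the illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the SQL used here is portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, gender TEXT);
    CREATE TABLE images (id INTEGER PRIMARY KEY, client_id INTEGER, url TEXT);
    INSERT INTO clients VALUES
        (1, 'CL1', 22, 'M'), (2, 'CL2', 23, 'M'), (3, 'CL3', 24, 'M'),
        (4, 'CL4', 25, 'F'), (5, 'CL5', 26, 'NA');
    INSERT INTO images VALUES
        (1, 1, 'img1_1.jpg'), (2, 1, 'img1_2.jpg'), (3, 1, 'img1_3.jpg'),
        (4, 2, 'img2_1.jpg'), (5, 2, 'img2_2.jpg');
""")

# DISTINCT compares whole rows. Every image row differs in id and url,
# so nothing is removed: 3 + 2 image rows plus 3 clients without images.
distinct_rows = conn.execute("""
    SELECT DISTINCT c.*, i.*
    FROM clients c
    LEFT JOIN images i ON i.client_id = c.id
""").fetchall()

# Groupwise minimum: pick the lowest image id per client, then join back
# to images to fetch that row's url. Clients without images keep NULL.
first_image = conn.execute("""
    SELECT c.id, c.name, i.url
    FROM clients c
    LEFT JOIN (SELECT client_id, MIN(id) AS images_id
               FROM images
               GROUP BY client_id) first_img
           ON first_img.client_id = c.id
    LEFT JOIN images i ON i.id = first_img.images_id
    ORDER BY c.id
""").fetchall()

print(len(distinct_rows))
print(first_image)
```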

How to merge column data using the last updated value in MySQL?

This is somewhat confusing, so it's easier if I start with an example and the expected output.
I have a table that could look like this: (Unit1 - Unit2 columns could span up to 30 columns in the same general format)
| ID | Name | Unit1_left | Unit2_left |
| 1 | Tom | 50 | NULL |
| 2 | Tom | NULL | 1 |
| 3 | Tom | 45 | NULL |
| 4 | Dan | NULL | NULL |
What I am trying to select is a table like this:
| Name | Unit1_left | Unit2_left |
| Tom | 45 | 1 |
| Dan | NULL | NULL |
What that is doing is grouping by name and attempting to find the last values in the 2 other columns if they exist (if not then it returns NULL).
I have looked at various other questions and they all say to use Max() however this will not work since it selects the highest value (incorrect). I have seen that in MsSQL there is a Last() function which looks vaguely like what I want it to do but its not implemented in MySQL and isn't exactly what I need anyway.
What I am trying to ask is, does anyone know of a possible method of selecting the data like this or if I will have to use a separate programming language to do this?
This will produce the result set you've described
SELECT dname.name,
l1value.unit1_left,
l2value.unit2_left
FROM (SELECT DISTINCT `name`
FROM table1) `DName`
LEFT JOIN (SELECT `name`,
Max(id) id
FROM table1
WHERE unit1_left IS NOT NULL
GROUP BY `name`) l1
ON dname.`name` = l1.`name`
LEFT JOIN table1 l1value
ON l1.id = l1value.id
LEFT JOIN (SELECT `name`,
Max(id) id
FROM table1
WHERE unit2_left IS NOT NULL
GROUP BY `name`) l2
ON dname.`name` = l2.`name`
LEFT JOIN table1 l2value
ON l2.id = l2value.id ;
I did it by creating 2 inline views to the highest id for non-null values for both unit1_left and unit2_left (l1 and l2). Then joined it back to original table to get the values (l1value and l2value). We then join that back to a third inline view (dname) that creates the distinct names.
It's quite messy and it might make more sense just to keep your data in a more sensible manner.
You can use subqueries in your select statement. Using SqlFiddle, I came up with this.
select o.name,
(select o2.Unit1_left
from original as o2
where o.name = o2.name
and o2.Unit1_left is not null
order by o2.id desc
LIMIT 1) as Unit1_left,
(select o3.Unit2_left
from original as o3
where o.name = o3.name
and o3.Unit2_left is not null
order by o3.id desc
LIMIT 1) as Unit2_left
from original as o
group by o.name
order by min(o.id);
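As a quick check of the "latest non-NULL value per column" idea, the sketch below runs the correlated-subquery approach on the sample rows. It uses Python's built-in sqlite3 module in place of MySQL, assumes the table is called original as in the query above, and orders by MIN(o.id) so that the sort column is well defined for grouped rows.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the SQL used here is portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE original (
        id INTEGER PRIMARY KEY, name TEXT,
        Unit1_left INTEGER, Unit2_left INTEGER);
    INSERT INTO original VALUES
        (1, 'Tom', 50, NULL),
        (2, 'Tom', NULL, 1),
        (3, 'Tom', 45, NULL),
        (4, 'Dan', NULL, NULL);
""")

# For each name, each subquery walks that name's rows from the highest id
# down and returns the first non-NULL value it finds (or NULL if none).
LATEST_VALUES = """
    SELECT o.name,
           (SELECT o2.Unit1_left
            FROM original AS o2
            WHERE o2.name = o.name AND o2.Unit1_left IS NOT NULL
            ORDER BY o2.id DESC
            LIMIT 1) AS Unit1_left,
           (SELECT o3.Unit2_left
            FROM original AS o3
            WHERE o3.name = o.name AND o3.Unit2_left IS NOT NULL
            ORDER BY o3.id DESC
            LIMIT 1) AS Unit2_left
    FROM original AS o
    GROUP BY o.name
    ORDER BY MIN(o.id)
"""

latest = conn.execute(LATEST_VALUES).fetchall()
print(latest)
```

With up to 30 unit columns the select list gets long, so generating the subqueries from a list of column names in application code may be less error-prone than writing them by hand.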