MySQL intersection - mysql

I have an existing site whose DB is poorly designed and already contains a lot of records, so we can't change the DB structure.
The database for the current issue mainly contains 4 tables: users, questions, options and answers. There is a standard set of questions and options, but for each user there is one row in the answers table for each question and option combination. The DB structure and example data are available on SQL Fiddle.
As a new advanced-search requirement, I need to find users by applying multiple search filters. Example input and expected output are given in comments on SQL Fiddle.
I tried all types of joins and intersections, but it always fails somehow. Can someone please help me write a correct query, preferably with lightweight/optimized joins, as the DB contains a lot of records (10000+ users, 100+ questions, 500+ options and 500000+ records in the answers table)?
EDIT: Based on the two answers, I used the following query:
SELECT u.id, u.first_name, u.last_name
FROM users u
JOIN answers a ON a.user_id = u.id
WHERE (a.question_id = 1 AND a.option_id IN (3, 5))
OR (a.question_id = 2 AND a.option_id IN (8))
GROUP BY u.id, u.first_name, u.last_name
HAVING
SUM(CASE WHEN (a.question_id = 1 AND a.option_id IN (3, 5)) THEN 1 ELSE 0 END) >=1
AND SUM(CASE WHEN (a.question_id = 2 AND a.option_id IN (8)) THEN 1 ELSE 0 END) >= 1;
Please note: on the real database, the columns user_id, question_id and option_id of the answers table are indexed.
The running query is available on SQL Fiddle.
SQL Fiddle for dnoeth's answer.
SQL Fiddle for calcinai's answer.

Add all your n filters to the WHERE clause using OR, and repeat them in a HAVING clause as SUM(CASE ...) conditions combined with AND:
SELECT u.id, u.first_name, u.last_name
FROM users u JOIN answers a
ON a.user_id = u.id
JOIN questions q
ON a.question_id = q.id
JOIN question_options o
ON a.option_id = o.id
WHERE (q.question = 'Language known' AND o.OPTION IN ('French','Russian'))
OR (q.question = 'height' AND o.OPTION = '1.51 - 1.7')
GROUP BY u.id, u.first_name, u.last_name
HAVING
SUM(CASE WHEN (q.question = 'Language known' AND o.OPTION IN ('French','Russian')) THEN 1 ELSE 0 END) >=1
AND
SUM(CASE WHEN (q.question = 'height' AND o.OPTION = '1.51 - 1.7') THEN 1 ELSE 0 END) >= 1
;
I changed your joins into the more readable Standard SQL syntax.
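Since the query has the same shape for any number of filters, it can be generated. The following is only a minimal sketch in Python, using the ID-based variant from the question's EDIT and an in-memory SQLite database as a stand-in for MySQL. The function name, the shape of the filters dict and the sample rows are assumptions made for illustration.

```python
import sqlite3

def build_having_query(filters):
    """Build the WHERE ... OR / HAVING ... AND query for n filters.

    filters maps question_id -> list of acceptable option_ids
    (a hypothetical input shape, not something from the original post).
    """
    if not filters:
        raise ValueError("at least one filter is required")
    conditions, params = [], []
    for question_id, option_ids in filters.items():
        marks = ", ".join("?" for _ in option_ids)
        conditions.append(f"(a.question_id = ? AND a.option_id IN ({marks}))")
        params += [question_id, *option_ids]
    where = " OR ".join(conditions)
    having = " AND ".join(
        f"SUM(CASE WHEN {c} THEN 1 ELSE 0 END) >= 1" for c in conditions
    )
    sql = (
        "SELECT u.id, u.first_name, u.last_name "
        "FROM users u JOIN answers a ON a.user_id = u.id "
        f"WHERE {where} "
        "GROUP BY u.id, u.first_name, u.last_name "
        f"HAVING {having} ORDER BY u.id"
    )
    # Every condition appears twice (WHERE, then HAVING), so bind values twice.
    return sql, params + params

# Made-up sample data: only user 1 satisfies both filters.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE answers (user_id INTEGER, question_id INTEGER, option_id INTEGER);
INSERT INTO users VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox');
INSERT INTO answers VALUES (1, 1, 3), (1, 2, 8),
                           (2, 1, 5), (2, 2, 9),
                           (3, 1, 4), (3, 2, 8);
""")
sql, params = build_having_query({1: [3, 5], 2: [8]})
matching_users = db.execute(sql, params).fetchall()
print(matching_users)
```

With a MySQL driver the placeholder style is typically %s rather than ?, so the marker would need adjusting there.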

This will require a bit of fiddling for a dynamic filter, but what you really want to do is search by the IDs, as it means fewer joins and a faster query.
This produces the results you'd expect. I assume that the search filters are generated from options in the database, so instead of passing the actual value back into the query, pass the ID.
The multiple inner joins support multiple AND criteria: each join narrows the result set to users who match that filter.
SELECT u.id, u.first_name, u.last_name FROM users u
INNER JOIN answers a ON a.user_id = u.id
AND (a.question_id, a.option_id) IN ((1,3),(1,5)) # q 1: Lang, answer 3/5: En/Ru
INNER JOIN answers a2 ON a2.user_id = u.id
AND (a2.question_id, a2.option_id) = (2,8) # q 2: Height, answer 8: 1.71...
GROUP BY u.id;
I'd suggest making sure there's an index on (user_id, question_id, option_id) for searching:
ALTER TABLE `answers` ADD INDEX idx_search(`user_id`, `question_id`, `option_id`);
Otherwise it should be using primary keys for the joins (if properly defined) so it will be fast.
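The "bit of fiddling" for a dynamic filter could look roughly like the sketch below: one inner join on answers per filter, generated in Python. It runs against SQLite as a stand-in for MySQL and uses a plain IN list per join instead of the row-value comparison. The function name, the filters dict and the sample rows are assumptions for illustration.

```python
import sqlite3

def build_join_query(filters):
    """Generate one INNER JOIN on answers per filter (AND semantics).

    filters maps question_id -> list of acceptable option_ids
    (a hypothetical input shape).
    """
    if not filters:
        raise ValueError("at least one filter is required")
    joins, params = [], []
    for i, (question_id, option_ids) in enumerate(filters.items()):
        alias = f"a{i}"
        marks = ", ".join("?" for _ in option_ids)
        joins.append(
            f"JOIN answers {alias} ON {alias}.user_id = u.id "
            f"AND {alias}.question_id = ? AND {alias}.option_id IN ({marks})"
        )
        params += [question_id, *option_ids]
    # DISTINCT collapses users who match a filter through several options.
    sql = "SELECT DISTINCT u.id FROM users u " + " ".join(joins) + " ORDER BY u.id"
    return sql, params

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE answers (user_id INTEGER, question_id INTEGER, option_id INTEGER);
CREATE INDEX idx_search ON answers (user_id, question_id, option_id);
-- Made-up rows: user 1 picked two matching options for question 1.
INSERT INTO users VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO answers VALUES (1, 1, 3), (1, 1, 5), (1, 2, 8),
                           (2, 1, 5), (2, 2, 9);
""")
join_sql, join_params = build_join_query({1: [3, 5], 2: [8]})
matching_ids = db.execute(join_sql, join_params).fetchall()
print(matching_ids)
```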

Related

How to make a query that filter out two AND conditions on same column

Write a query that returns all pages that have been visited by at least one child (demo='child') and ALSO has been visited by at least one person aged 18-25 (demo='18-25'). Your query should return a set of urls.
I am not sure how to write a query that filters results on two AND conditions against the same column; my attempt returns an empty set.
These are the two tables:
User
uid | ip  | demo
A   | 001 | child
B   | 002 | 18-25

Visit
url | dt         | uid | src | rev
A01 | 1890-05-14 | A   | A02 | 10
A01 | 002        | B   | A03 | 15
Select distinct V.url
from Visit V, [User] Z, [User] F
WHERE V.uid = Z.uid AND V.uid = F.uid
AND Z.demo = 'child' AND F.demo = '18-25'
The code above returns an empty set.
I want it to return A01 as the url.
First, you don't need to use the User table twice in the same FROM clause. Both copies are joined to the same Visit row, so Z and F are always the same user, and one user cannot be both 'child' and '18-25'.
I think it can be solved with a nested query (a subquery).
In other words: the outer query finds the URLs that match the first condition (demo='child'), and the subquery restricts them to URLs that also match the second condition (demo='18-25').
Your code will look like this:
Select distinct V.url
from Visit V, [User] Z
WHERE V.uid = Z.uid AND Z.demo = 'child'
AND V.url IN (Select distinct V1.url
from Visit V1, [User] Z1
WHERE V1.uid = Z1.uid AND Z1.demo = '18-25')
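To see the difference on the sample data, here is a small sketch that runs both the question's query and the nested version. It uses an in-memory SQLite database as a stand-in for the asker's DBMS (an assumption), with the table names written without brackets.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE User (uid TEXT, ip TEXT, demo TEXT);
CREATE TABLE Visit (url TEXT, dt TEXT, uid TEXT, src TEXT, rev INTEGER);
INSERT INTO User VALUES ('A', '001', 'child'), ('B', '002', '18-25');
INSERT INTO Visit VALUES ('A01', '1890-05-14', 'A', 'A02', 10),
                         ('A01', '002', 'B', 'A03', 15);
""")

# The question's query: Z and F are both tied to the same visit row,
# so they are the same user and the two demo tests can never both hold.
original_rows = db.execute("""
    SELECT DISTINCT V.url
    FROM Visit V, User Z, User F
    WHERE V.uid = Z.uid AND V.uid = F.uid
      AND Z.demo = 'child' AND F.demo = '18-25'
""").fetchall()

# The nested version: each condition is checked against its own visit row.
nested_rows = db.execute("""
    SELECT DISTINCT V.url
    FROM Visit V, User Z
    WHERE V.uid = Z.uid AND Z.demo = 'child'
      AND V.url IN (SELECT DISTINCT V1.url
                    FROM Visit V1, User Z1
                    WHERE V1.uid = Z1.uid AND Z1.demo = '18-25')
""").fetchall()

print(original_rows, nested_rows)
```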
One way is to join the users, GROUP BY the URL, sum the occurrences of children and 18 to 25 year olds and check that these sums each exceed 0 in a HAVING clause.
SELECT v.url
FROM Visit v
INNER JOIN User u
ON v.uid = u.uid
GROUP BY v.url
HAVING sum(CASE
WHEN u.demo = 'child' THEN
1
ELSE
0
END) > 0
AND sum(CASE
WHEN u.demo = '18-25' THEN
1
ELSE
0
END) > 0;
(Note: In MySQL you don't need the CASE expressions but could directly use the Boolean = expressions. But a CASE doesn't harm there either and with a CASE it'll also work in other DBMS. And since it's not entirely clear which DBMS you use a CASE expression is a safer bet.)
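As a rough check of the GROUP BY / HAVING approach, the sketch below runs it on the question's sample data plus one made-up URL (B07) that only the child visited. It uses SQLite as a stand-in; like MySQL, SQLite evaluates a comparison to 0 or 1, so the shorter Boolean form can be tried next to the CASE form.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE User (uid TEXT, ip TEXT, demo TEXT);
CREATE TABLE Visit (url TEXT, dt TEXT, uid TEXT, src TEXT, rev INTEGER);
INSERT INTO User VALUES ('A', '001', 'child'), ('B', '002', '18-25');
INSERT INTO Visit VALUES ('A01', '1890-05-14', 'A', 'A02', 10),
                         ('A01', '002', 'B', 'A03', 15),
                         ('B07', '1890-05-15', 'A', 'A02', 5);  -- made-up row
""")

# Portable form with CASE expressions.
case_rows = db.execute("""
    SELECT v.url
    FROM Visit v
    INNER JOIN User u ON v.uid = u.uid
    GROUP BY v.url
    HAVING SUM(CASE WHEN u.demo = 'child' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN u.demo = '18-25' THEN 1 ELSE 0 END) > 0
    ORDER BY v.url
""").fetchall()

# Shorter form that sums the Boolean comparison directly.
bool_rows = db.execute("""
    SELECT v.url
    FROM Visit v
    INNER JOIN User u ON v.uid = u.uid
    GROUP BY v.url
    HAVING SUM(u.demo = 'child') > 0
       AND SUM(u.demo = '18-25') > 0
    ORDER BY v.url
""").fetchall()

print(case_rows, bool_rows)
```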
Another approach is to use a conjunction of EXISTS predicates with correlated subqueries that join the users to the visits and pick the records with the URL and demo in question. It would help if you already had a table with only the URLs. I'll simulate that with a derived table aliased x.
SELECT x.url
FROM (SELECT DISTINCT
v.url
FROM Visit v) x
WHERE EXISTS (SELECT *
FROM Visit v
INNER JOIN User u
ON u.uid = v.uid
WHERE v.url = x.url
AND u.demo = 'child')
AND EXISTS (SELECT *
FROM Visit v
INNER JOIN User u
ON u.uid = v.uid
WHERE v.url = x.url
AND u.demo = '18-25');
You can also solve this with joins alone, but you need to join both Visit and User twice: one pair finds a visit by a user that is "18-25", and the other finds a visit to the same URL by a user that is a "child". A URL is included in the results only if both are found:
SELECT
DISTINCT V.url
FROM
Visit V
INNER JOIN User U ON (V.uid = U.uid AND U.demo = '18-25')
INNER JOIN Visit V2 ON (V2.url = V.url)
INNER JOIN User U2 ON (V2.uid = U2.uid AND U2.demo = 'child')
You don't have to join the User table twice, and please use the recommended JOIN operator rather than a comma join.
A comment from @stickybit made me realize that I misunderstood the question, so I'm updating my answer to something that works for the question's requirement. I'll retain most of my original answer with small modifications, just to make sure that it returns the desired result for the OP's current data set. Here's the query:
SELECT * FROM
(SELECT url,
GROUP_CONCAT(demo) dd
FROM Visit V
JOIN User U
ON V.uid = U.uid
GROUP BY url) A
WHERE dd LIKE '%child%18-25%' OR dd LIKE '%18-25%child%';
I know this is not the best solution, but others have already posted their queries based on the same understanding, so this is just another variant.
Check the updated demo fiddle

Order by inside the LEFT JOIN

I am trying to write a query. I got it to work halfway, but I am having problems with the LEFT JOIN.
I have three tables:
user
user_preferences
user_subscription_plan
A user will always have one user_preferences entry, but can have many or no entries in user_subscription_plan.
If the user has no entry in user_subscription_plan, or only one, then my SQL works. If there is more than one, I have an issue. In the case of two entries, how can I make it return the last one entered? I tried playing with the ORDER BY clause, but it does not work as expected. Somehow I get empty rows.
Here is my query:
SELECT u.id AS GYM_USER_ID, subscription_plan.id AS subscriptionId, up.onboarding_completed AS CompletedOnboarding,
(CASE
WHEN ((up.onboarding_completed = 1)
AND (ISNULL(subscription_plan.id)))
THEN 'freemiun'
WHEN (ISNULL(up.onboarding_completed)
AND (ISNULL(subscription_plan.id)))
THEN 'not_paying'
END) AS subscription_status
FROM user AS u
INNER JOIN user_preferences up ON up.user_id = u.id
LEFT JOIN (
SELECT * FROM user_subscription_plan AS usp ORDER BY usp.id DESC LIMIT 1
) AS subscription_plan ON subscription_plan.user_id = u.id
GROUP BY u.id;
If I run it as it is, then subscription_plan.id AS subscriptionId is always empty.
If I remove the LIMIT clause, then it's not empty, but I still get the first entry, which is wrong in my case.
I have more CASEs to cover, but I can't proceed until I solve this problem.
Please try using MAX(usp.id) with GROUP BY usp.user_id instead of LIMIT 1.
With LIMIT 1 in the subquery, the subquery returns only one record in total (if the table has data), not one per user.
So the query above can be rewritten like this.
Sorry, I haven't tested it because I don't have the data, but please try it; I hope it helps.
SELECT
u.id AS GYM_USER_ID,
subscription_plan.id AS subscriptionId,
up.onboarding_completed AS CompletedOnboarding,
(CASE
WHEN
((up.onboarding_completed = 1)
AND (ISNULL(subscription_plan.id)))
THEN
'freemiun'
WHEN
(ISNULL(up.onboarding_completed)
AND (ISNULL(subscription_plan.id)))
THEN
'not_paying'
END) AS subscription_status
FROM
user AS u
INNER JOIN
user_preferences up ON up.user_id = u.id
LEFT JOIN
(SELECT
usp.user_id, MAX(usp.id) AS id
FROM
user_subscription_plan AS usp
GROUP BY usp.user_id) AS subscription_plan ON subscription_plan.user_id = u.id;
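The difference between the two derived tables can be checked on a few made-up rows. This is only a sketch against SQLite (a stand-in for MySQL), reduced to the join in question; the sample data is invented and the CASE logic is left out.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY);
CREATE TABLE user_subscription_plan (id INTEGER PRIMARY KEY, user_id INTEGER);
-- Made-up rows: user 1 has two plans, user 2 has one, user 3 has none.
INSERT INTO user VALUES (1), (2), (3);
INSERT INTO user_subscription_plan VALUES (10, 1), (11, 1), (12, 2);
""")

# LIMIT 1 keeps a single row of the whole table (plan 12 of user 2),
# so every other user ends up with NULL.
limit_rows = db.execute("""
    SELECT u.id, sp.id
    FROM user u
    LEFT JOIN (SELECT * FROM user_subscription_plan
               ORDER BY id DESC LIMIT 1) AS sp ON sp.user_id = u.id
    ORDER BY u.id
""").fetchall()

# MAX(id) per user_id keeps the latest plan of each user.
max_rows = db.execute("""
    SELECT u.id, sp.id
    FROM user u
    LEFT JOIN (SELECT usp.user_id, MAX(usp.id) AS id
               FROM user_subscription_plan AS usp
               GROUP BY usp.user_id) AS sp ON sp.user_id = u.id
    ORDER BY u.id
""").fetchall()

print(limit_rows)
print(max_rows)
```

Because the derived table now has at most one row per user, the outer GROUP BY u.id is no longer needed.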

how to change this mysql query to a efficient one

My table user contains these fields:
id,company_id,created_by,name,image
Table valet contains:
id,vid,dept_id
Table cart contains:
id,dept_id,map_id,purchase,time
To get the details I have written this MySQL query:
SELECT c.id, a.id, c.purchace, c.time
FROM user a
LEFT JOIN valet b ON a.vid = b.id
AND a.is_deleted = 0
LEFT JOIN cart c ON b.dept_id = c.dept_id
WHERE a.company_id = 18
AND a.created_by = 102
AND a.is_deleted = 0
AND c.time
IN ( SELECT MAX( time ) FROM cart WHERE dept_id = b.dept_id )
From these three tables I want to select the last updated row from cart, along with the id from the user table that is mapped in the valet table.
This query works fine, but it takes almost 15 seconds to retrieve the details.
Is there any way to improve this query, or am I doing something wrong?
Any help would be appreciated.
For one thing, I can see that you're running the subquery for each row. Depending on what the optimiser does, that may have an impact. Without a suitable index, MAX is a pretty expensive operation (there's nothing for it but to read every row).
If you plan to use this query repeatedly, you should at least index the cart table on time. This will make it much easier to find the maximum value.
Rather than running the subquery per row, you can compute the maximum time for each dept_id once in a derived table and join to it, and that might help:
SELECT c.id, a.id, c.purchace, c.time
FROM
user a
LEFT JOIN valet b ON a.vid = b.id AND a.is_deleted = '0'
LEFT JOIN cart c ON b.dept_id = c.dept_id
LEFT JOIN (SELECT dept_id,max(time) as mx FROM cart GROUP BY dept_id) m on m.dept_id=c.dept_id
WHERE
a.company_id = '18'
AND a.created_by = '102'
AND a.is_deleted = '0'
AND c.time=m.mx;
Note also:
Since you're only testing a single value (the maximum) for c.time, you should be using = rather than IN.
One thing I'm not sure about is why you are using strings instead of integers. I should have thought that leaving off the quotes makes more sense.
Your JOIN includes AND a.is_deleted = '0', though you make no mention of that column in your table description. In any case, why is it in the JOIN and not in the WHERE clause?
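To check that the derived-table form returns the same rows as the correlated subquery, here is a minimal sketch against SQLite (a stand-in for MySQL). The sample rows, the index name and the composite index on (dept_id, time) are assumptions. Inner joins are used because the WHERE condition on c.time discards the NULL rows of the left joins anyway.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE user (id INTEGER, vid INTEGER, company_id INTEGER,
                   created_by INTEGER, is_deleted INTEGER);
CREATE TABLE valet (id INTEGER, dept_id INTEGER);
CREATE TABLE cart (id INTEGER, dept_id INTEGER, time TEXT);
-- Assumed index: lets the maximum time per dept_id be read from the index.
CREATE INDEX idx_cart_dept_time ON cart (dept_id, time);
-- Made-up rows; user 3 is deleted and must not appear.
INSERT INTO user VALUES (1, 1, 18, 102, 0), (2, 2, 18, 102, 0), (3, 1, 18, 102, 1);
INSERT INTO valet VALUES (1, 10), (2, 20);
INSERT INTO cart VALUES (1, 10, '2020-01-01'), (2, 10, '2020-02-01'),
                        (3, 20, '2020-01-15');
""")

# Correlated subquery, evaluated per joined row.
correlated_rows = db.execute("""
    SELECT c.id, a.id, c.time
    FROM user a
    JOIN valet b ON a.vid = b.id
    JOIN cart c ON b.dept_id = c.dept_id
    WHERE a.company_id = 18 AND a.created_by = 102 AND a.is_deleted = 0
      AND c.time = (SELECT MAX(c2.time) FROM cart c2
                    WHERE c2.dept_id = b.dept_id)
    ORDER BY a.id
""").fetchall()

# Derived table: the maximum per dept_id is computed once, then joined.
derived_rows = db.execute("""
    SELECT c.id, a.id, c.time
    FROM user a
    JOIN valet b ON a.vid = b.id
    JOIN cart c ON b.dept_id = c.dept_id
    JOIN (SELECT dept_id, MAX(time) AS mx
          FROM cart GROUP BY dept_id) m ON m.dept_id = c.dept_id
    WHERE a.company_id = 18 AND a.created_by = 102 AND a.is_deleted = 0
      AND c.time = m.mx
    ORDER BY a.id
""").fetchall()

print(derived_rows)
```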

SQL multiple test between 3 tables

I've been struggling with a query that I have to do. I have basic knowledge with SQL.
I have 3 tables, which are users, companies and orders. Here is a sample for each one with the columns that interest us :
Users table is just a table with id, firstname, lastname.
Companies table linked with user_id
Orders table linked with user_id
I need to display two columns, one that will be all the different user_id, and the second one, a "current step" that doesn't exist yet, that will be determined like this :
IF (user has an order with status field at ok - orders table)
current = 8
else if (user has an order with status field empty - orders table)
current = 7
else if (user is linked to a company with current-step field at 0-5 - companies table)
current = 1-6
else if (user has a test company in name field - companies table)
current = 0
The thing is, a user can be related to multiple companies, in which case we have to take the highest current value possible. A user can also be related to multiple orders: we take the one with status = ok if there is one, and otherwise current will be 7, since the user has an order anyway.
I could make the first test work, but when I have to test across the three tables, I don't know how to put all the tests together, and my query ends up a mess. If you could point me in the right direction, that would be kind.
EDIT:
I forgot something, we're working with PHP on CakePHP framework. Would it be possible to divide the query, and to make the test directly in PHP ?
SQL is not a procedural language, so it does not support if-else statements in a plain query.
That being said, you would need an API like JDBC or embedded SQL to perform such queries. Look up SQL APIs and embedded SQL to learn what they are.
If this is a college assignment, then I don't know how your teacher expects you to do it. But if you are learning about SQL and databases out of your own interest, then you should look into the topics I mentioned above!
You will also need to look into SQL joins.
Without test data, I can't test it, but try this:
select oc.UserID,
case when oc.Current > cc.Current then oc.Current else cc.Current end as Current
from(
-- per user: 8 if any order is OK, 7 if there are only other orders, -1 if none
select u.UserID,
IfNull( Max( case when o.Status = 'OK' then 8
when o.UserID is not null then 7 end ), -1 ) Current
from Users u
left join Orders o
on o.UserID = u.UserID
group by u.UserID
) oc
join(
-- per user: highest company step + 1, 0 for a test company, -1 otherwise
select u.UserID,
IfNull( Max( case when c.CurrentStep between 0 and 5 then c.CurrentStep + 1
when c.Name = 'Test' then 0 else -1 end ), -1 ) as Current
from Users u
left join Companies c
on c.UserID = u.UserID
group by u.UserID
) cc
on cc.UserID = oc.UserID;
This would be a lot neater with CTEs, alas. Also, you don't state the value of Current for those users who meet none of the requirements, so I used -1.
The first inline view returns 8 if the user has an order with status OK, 7 if the user has orders but none with status OK, and -1 if the user has no orders. The second inline view returns 0 through 6 as appropriate, or -1 if the user is not linked to any qualifying company.
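A runnable sketch of the two-inline-view idea follows, using SQLite as a stand-in for MySQL. The table and column names (users, orders, companies, status, current_step, name), the alias step and the sample rows are assumptions made for illustration. The test-company check uses LIKE '%test%' here.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE orders (user_id INTEGER, status TEXT);
CREATE TABLE companies (user_id INTEGER, name TEXT, current_step INTEGER);
INSERT INTO users VALUES (1), (2), (3), (4), (5);
-- Made-up rows covering each branch of the rule list.
INSERT INTO orders VALUES (1, 'ok'), (1, ''), (2, '');
INSERT INTO companies VALUES (2, 'Acme', 5), (3, 'Foo', 3), (3, 'Bar', 0),
                             (4, 'test co', NULL);
""")

steps = db.execute("""
    SELECT oc.user_id,
           CASE WHEN oc.step > cc.step THEN oc.step ELSE cc.step END AS step
    FROM (
        -- 8 if any order is ok, 7 if the user has other orders, -1 if none
        SELECT u.id AS user_id,
               IFNULL(MAX(CASE WHEN o.status = 'ok' THEN 8
                               WHEN o.user_id IS NOT NULL THEN 7 END), -1) AS step
        FROM users u
        LEFT JOIN orders o ON o.user_id = u.id
        GROUP BY u.id
    ) oc
    JOIN (
        -- highest company step + 1, 0 for a test company, -1 otherwise
        SELECT u.id AS user_id,
               IFNULL(MAX(CASE WHEN c.current_step BETWEEN 0 AND 5
                                    THEN c.current_step + 1
                               WHEN c.name LIKE '%test%' THEN 0 END), -1) AS step
        FROM users u
        LEFT JOIN companies c ON c.user_id = u.id
        GROUP BY u.id
    ) cc ON cc.user_id = oc.user_id
    ORDER BY oc.user_id
""").fetchall()

print(steps)
```

Aggregating orders and companies separately before joining keeps the two one-to-many relations from multiplying each other's rows.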
Sorry for answering that late.
I proposed your solutions to my team, but a friend of mine succeeded by doing this query :
select u.* from (
SELECT u.*,
c.name,
c.canton,
c.country as 'company_country',
c.id as 'company_id',
c.user_id,
MAX(case
when o.status = 'ok' then 8
when o.status = '' then 7
when c.current_step = 5 then 6
when c.current_step = 4 then 5
when c.current_step = 3 then 4
when c.current_step = 2 then 3
when c.current_step = 1 then 2
when c.current_step = 0 then 1
else
(case
when c.name LIKE '%test%' then 0
else 'Empty'
end )
end) as current
FROM ezycount_users u
LEFT OUTER JOIN ezycount_companies c ON c.user_id = u.id
LEFT OUTER JOIN ezycount_orders o ON o.user_id = u.id
group by u.id ) as u
We then used a CakePHP function to group by again, and it worked for us!
Thanks all of you !

SQL query in WHERE condition

Is it OK to write a query like this (see the subquery in the WHERE condition)?
SELECT distinct(id) "idea_id"
FROM ideas
WHERE deleted_by_user = 0 AND moderation_flag = 1 AND
user_id in (select id
from users
where confirm like "yes")
ORDER BY time_of_creation DESC
Let me know if there is any issue with this query.
Thanks in advance.
You can write this query in two ways:
SELECT DISTINCT(i.id) "idea_id"
FROM ideas i
INNER JOIN users u ON i.user_id = u.id
WHERE i.deleted_by_user = 0 AND i.moderation_flag = 1 AND u.confirm = 'yes'
ORDER BY i.time_of_creation DESC;
And
SELECT DISTINCT(i.id) "idea_id"
FROM ideas i
WHERE i.deleted_by_user = 0 AND i.moderation_flag = 1 AND
EXISTS (SELECT * FROM users u WHERE i.user_id = u.id AND u.confirm = 'yes')
ORDER BY i.time_of_creation DESC;
SELECT distinct a.ID idea_id
FROM ideas a
INNER JOIN users b
ON a.user_id = b.id
WHERE a.deleted_by_user = 0 AND
a.moderation_flag = 1 AND
b.confirm = 'YES'
ORDER BY time_of_creation DESC
To answer your question - there are no problems with using subqueries.
On the other hand, you have (at least) three different things to think about when writing a query one way or another:
How efficiently will the database run my query? (If the database is small, this may not matter at all.)
How easy is this to formulate and write? This often connects to the next point.
How easy is this to understand for someone else who reads my code? (And I may count as "somebody else" myself if I look at code I wrote a year ago...)
If you have a database of a size where efficiency counts, the best way to choose how to formulate a query is normally to write it in different ways and test each on the database. (But often the query optimizer in the database is so good that it does not matter.)
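"Write it in different ways and test it" can be sketched as below: the IN, JOIN and EXISTS formulations are run against the same data, their results are compared, and the query plan of each is printed. SQLite stands in for MySQL here, so the plans are only illustrative; the sample rows are made up, and DISTINCT is left out because users.id is unique in this data.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, confirm TEXT);
CREATE TABLE ideas (id INTEGER PRIMARY KEY, user_id INTEGER,
                    deleted_by_user INTEGER, moderation_flag INTEGER,
                    time_of_creation INTEGER);
-- Made-up rows: only ideas 1 and 3 pass every condition.
INSERT INTO users VALUES (1, 'yes'), (2, 'no'), (3, 'yes');
INSERT INTO ideas VALUES (1, 1, 0, 1, 100), (2, 2, 0, 1, 200),
                         (3, 3, 0, 1, 300), (4, 1, 1, 1, 400),
                         (5, 3, 0, 0, 500);
""")

variants = {
    "in": """
        SELECT i.id FROM ideas i
        WHERE i.deleted_by_user = 0 AND i.moderation_flag = 1
          AND i.user_id IN (SELECT id FROM users WHERE confirm = 'yes')
        ORDER BY i.time_of_creation DESC""",
    "join": """
        SELECT i.id FROM ideas i
        INNER JOIN users u ON i.user_id = u.id
        WHERE i.deleted_by_user = 0 AND i.moderation_flag = 1
          AND u.confirm = 'yes'
        ORDER BY i.time_of_creation DESC""",
    "exists": """
        SELECT i.id FROM ideas i
        WHERE i.deleted_by_user = 0 AND i.moderation_flag = 1
          AND EXISTS (SELECT 1 FROM users u
                      WHERE u.id = i.user_id AND u.confirm = 'yes')
        ORDER BY i.time_of_creation DESC""",
}

results = {}
for name, sql in variants.items():
    results[name] = [row[0] for row in db.execute(sql)]
    # The last column of each EXPLAIN QUERY PLAN row is a readable plan step.
    plan = [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql)]
    print(name, results[name], plan)
```

On a real MySQL database the equivalent check would be EXPLAIN plus timing each variant on production-sized data.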
SELECT distinct i.id "idea_id"
FROM ideas i join users u
on i.user_id=u.id and u.confirm ='yes'
WHERE i.deleted_by_user = 0
AND i.moderation_flag = 1
ORDER BY i.time_of_creation DESC