Using MySQL to find the common set between two lists

I have two queries and would like to find the values they have in common. Ultimately, I'm trying to find out what percentage of users have visited both webpages.
SELECT DISTINCT user_id
FROM table
WHERE url ='y'
ORDER BY user_id;
SELECT DISTINCT user_id
FROM table
WHERE url ='z'
ORDER BY user_id;
I've tried a NOT IN and a UNION but haven't had much luck, though I could easily be doing it wrong. I'm new.

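One method is to use conditional aggregation. In MySQL a comparison such as url = 'y' evaluates to 1 (true) or 0 (false), so SUM() over it counts the matching rows. To get information for each user: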
select user_id,
sum(url = 'y') as y_visits,
sum(url = 'z') as z_visits
from t
group by user_id;
To get the list of users who visited both pages, add a having clause:
having y_visits >= 1 and z_visits >= 1
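A self-contained sketch of the complete query with that HAVING clause. The temporary table t(user_id, url) and its rows are hypothetical sample data, not taken from the question:

```sql
-- Hypothetical sample data for illustration only.
DROP TEMPORARY TABLE IF EXISTS t;
CREATE TEMPORARY TABLE t (
  user_id INT NOT NULL,
  url     VARCHAR(20) NOT NULL
);
INSERT INTO t (user_id, url) VALUES
  (1, 'y'), (1, 'z'),            -- user 1: both pages
  (2, 'y'), (2, 'y'),            -- user 2: only y, twice
  (3, 'z'),                      -- user 3: only z
  (4, 'y'), (4, 'z'), (4, 'x');  -- user 4: both pages plus another one

-- Users who visited both y and z.
SELECT user_id,
       SUM(url = 'y') AS y_visits,
       SUM(url = 'z') AS z_visits
FROM t
GROUP BY user_id
HAVING y_visits >= 1 AND z_visits >= 1
ORDER BY user_id;
-- returns user_id 1 and user_id 4
```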
To get summary information:
select y_visitor, z_visitor, count(*)
from (select user_id,
max(url = 'y') as y_visitor,
max(url = 'z') as z_visitor
from t
group by user_id
) yz
group by y_visitor, z_visitor;
To get the fraction of users who visited both pages (multiply by 100 for a percentage):
select avg(y_visitor = 1 and z_visitor = 1) as p_VisitedBothYandZ
from (select user_id,
max(url = 'y') as y_visitor,
max(url = 'z') as z_visitor
from t
group by user_id
) yz;
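A runnable sketch of the percentage query on hypothetical sample data (the table t and its rows are made up for illustration). The denominator is every user_id that appears in t, including users who visited neither page:

```sql
-- Hypothetical sample data for illustration only.
DROP TEMPORARY TABLE IF EXISTS t;
CREATE TEMPORARY TABLE t (
  user_id INT NOT NULL,
  url     VARCHAR(20) NOT NULL
);
INSERT INTO t (user_id, url) VALUES
  (1, 'y'), (1, 'z'),   -- both pages
  (2, 'y'),             -- only y
  (3, 'z'),             -- only z
  (4, 'y'), (4, 'z'),   -- both pages
  (5, 'x');             -- neither page, but still counted in the denominator

-- One row per user with 0/1 flags, then the average of "both flags set".
SELECT 100 * AVG(y_visitor = 1 AND z_visitor = 1) AS pct_visited_both_y_and_z
FROM (SELECT user_id,
             MAX(url = 'y') AS y_visitor,
             MAX(url = 'z') AS z_visitor
      FROM t
      GROUP BY user_id
     ) yz;
-- 2 of the 5 users (1 and 4) visited both pages, so the result is 40
```

If you want the share among users who visited at least one of the two pages instead, adding WHERE url IN ('y', 'z') to the inner query removes users like 5 from the denominator.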

Related

My select query is very slow, I need insights on how to improve it

I have a query that I use to retrieve table data. The query is very slow when there are more than 3-4 rows returned, and I don't understand why.
This is my query:
select
ig.id,
ig.username,
ig.created,
ig.is_completed,
ig.user_id,
ig.is_error,
ig.last_appeal_process_update,
(unix_timestamp() - ig.created) as time_running,
ig.is_deleted,
ap.id as appealprocessid,
(case
when ap.id is null then 0
when ap.id is not null then ap.status
end
) as current_status,
( select count(*) from appeal_process where ig_account_id = ig.id)
as total_appeals
from instagram_accounts ig
left join appeal_process ap on ig.id = ap.ig_account_id
where
(ig.username like CONCAT('%',?,'%') or ig.id like CONCAT('%',?,'%') or ig.username like CONCAT('%',?,'%')) and
ig.user_id = ? and is_deleted = 0 and
(
ap.id is null or
ap.id = ( -- WE SELECT ONLY THE LATEST APPEAL PROCESS
select max(id) from appeal_process tmp where tmp.ig_account_id = ig.id limit 1
)
)
order by ig.username asc
limit ?,?
EDIT
I also posted the EXPLAIN output for this query (although, to be honest, I have no idea how to read it) and the SHOW CREATE TABLE output for instagram_accounts and appeal_process, all as screenshots.
On instagram_accounts you have criteria on user_id and is_deleted, but also on username. As you are also sorting by username, the first fix is to add an index:
CREATE INDEX user_id_deleted_username ON
instagram_accounts(user_id, is_deleted, username);
In appeal_process you are searching by ig_account_id both in the JOIN and in the tmp subquery:
CREATE INDEX id_account_id ON
appeal_process (ig_account_id);
In the query, you are pulling the status of the highest appeal process id for each account. Let's GROUP BY ig.id; that simplifies getting the results, because MAX and the other aggregates are then computed per account (changes are in CAPS for emphasis):
select
ig.id,
ig.username,
ig.created,
ig.is_completed,
ig.user_id,
ig.is_error,
ig.last_appeal_process_update,
(unix_timestamp() - ig.created) as time_running,
ig.is_deleted,
COALESCE(MAX(ap.id),0) as appealprocessid,
(SELECT status FROM appeal_process WHERE appeal_process.id = appealprocessid LIMIT 1) as current_status,
COUNT(ap.id) as total_appeals
from instagram_accounts ig
left join appeal_process ap on ig.id = ap.ig_account_id
where
(ig.username like CONCAT('%',?,'%') or ig.id like CONCAT('%',?,'%') or ig.username like CONCAT('%',?,'%')) and
ig.user_id = ? and is_deleted = 0
GROUP BY ig.id
order by ig.username asc
limit ?,?
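A different technique, sketched here as an alternative rather than as this answer's method: aggregate appeal_process once in a derived table (latest id and appeal count per account), then join back on that latest id to read its status. The demo_ tables are hypothetical, trimmed-down stand-ins with only the columns needed; the LIKE search filter is omitted and the ? placeholders are replaced by literal values (user_id = 7, LIMIT 0, 20):

```sql
-- Hypothetical, trimmed-down demo tables with only the columns this sketch uses.
DROP TABLE IF EXISTS demo_instagram_accounts, demo_appeal_process;
CREATE TABLE demo_instagram_accounts (
  id         INT PRIMARY KEY,
  username   VARCHAR(50) NOT NULL,
  user_id    INT NOT NULL,
  is_deleted TINYINT NOT NULL DEFAULT 0,
  INDEX user_id_deleted_username (user_id, is_deleted, username)
);
CREATE TABLE demo_appeal_process (
  id            INT PRIMARY KEY,
  ig_account_id INT NOT NULL,
  status        INT NOT NULL,
  INDEX id_account_id (ig_account_id)
);
INSERT INTO demo_instagram_accounts (id, username, user_id, is_deleted) VALUES
  (1, 'alpha', 7, 0), (2, 'beta', 7, 0), (3, 'gamma', 7, 0);
INSERT INTO demo_appeal_process (id, ig_account_id, status) VALUES
  (10, 1, 1), (11, 1, 3),  -- account 1: two appeals, latest is id 11
  (12, 2, 2);              -- account 2: one appeal; account 3: none

-- Aggregate appeal_process once per account, then join back on the latest id
-- to read that appeal's status. Regular (not TEMPORARY) tables are used because
-- MySQL cannot refer to a TEMPORARY table twice in one query.
SELECT ig.id,
       ig.username,
       COALESCE(latest.latest_id, 0)     AS appealprocessid,
       COALESCE(ap.status, 0)            AS current_status,
       COALESCE(latest.total_appeals, 0) AS total_appeals
FROM demo_instagram_accounts ig
LEFT JOIN (SELECT ig_account_id,
                  MAX(id)  AS latest_id,
                  COUNT(*) AS total_appeals
           FROM demo_appeal_process
           GROUP BY ig_account_id) latest
       ON latest.ig_account_id = ig.id
LEFT JOIN demo_appeal_process ap
       ON ap.id = latest.latest_id
WHERE ig.user_id = 7
  AND ig.is_deleted = 0
ORDER BY ig.username ASC
LIMIT 0, 20;
-- alpha: 11, 3, 2 / beta: 12, 2, 1 / gamma: 0, 0, 0
```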
Your problem very likely comes from a lack of indexes: the columns used in the WHERE and JOIN clauses need to be indexed.
Create an index for each of these fields:
ig_account_id
ig.id
ig.username
is_deleted
ig.user_id
tmp.ig_account_id

SQL query that limits the results to one when using count inside count

I am trying to select the count of likes on a specific project. The idea I came up with is:
CAST(count(uploads.ID in (SELECT uploadID from votes)) as decimal) as numberoflikes
This works, but the query then returns only one row.
The entire query:
SELECT DISTINCT users.NAME AS username
,users.ID AS userID
,subjects.NAME AS subjectname
,uploads.TIME
,uploads.description
,uploads.NAME
,uploads.ID
,CASE
WHEN uploads.ID IN (
SELECT uploadID
FROM votes
WHERE userID = 2
)
THEN CAST(1 AS DECIMAL)
ELSE CAST(0 AS DECIMAL)
END AS liked
,CASE
WHEN uploads.ID IN (
SELECT uploadID
FROM bookmarks
WHERE userID = 2
)
THEN CAST(1 AS DECIMAL)
ELSE CAST(0 AS DECIMAL)
END AS bookmarked
,CAST(count(uploads.ID IN (
SELECT uploadID
FROM votes
)) AS DECIMAL) AS numberoflikes
FROM uploads
INNER JOIN subjects ON (subjects.ID = uploads.subjectID)
INNER JOIN users ON (users.ID = uploads.userID)
INNER JOIN uploadGrades ON (uploads.ID = uploadGrades.uploadID)
INNER JOIN grades ON (grades.ID = uploadGrades.gradeID)
WHERE uploads.active = 1
AND subjects.ID IN (
SELECT subjectID
FROM userSubjects
INNER JOIN users ON (users.ID = userSubjects.userID)
WHERE userSubjects.userID = 2
)
AND grades.ID IN (
SELECT userGrades.gradeID
FROM uploadGrades
INNER JOIN userGrades ON (uploadGrades.gradeID = userGrades.gradeID)
WHERE userGrades.userID = 2
)
ORDER BY uploads.trueRating DESC;
Let's try a reduced version of your query; that is the basis for getting better answers.
To start, I reduced the initial query to users and uploads, and removed the fields you already know how to calculate.
SELECT DISTINCT users.NAME AS username
,users.ID AS userID
,uploads.NAME
,uploads.ID
,CAST(count(uploads.ID IN (
SELECT uploadID
FROM votes
)) AS DECIMAL) AS numberoflikes
FROM uploads
INNER JOIN users ON (users.ID = uploads.userID)
WHERE uploads.active = 1
ORDER BY uploads.trueRating DESC;
Then add votes with a LEFT JOIN to replace the SELECT inside the COUNT. That way, an upload with no matching vote gets NULL, and, as I said in my comment, COUNT doesn't count NULLs. Group by the upload so that the likes are counted per upload rather than over the whole result:
SELECT users.NAME AS username
,users.ID AS userID
,uploads.NAME
,uploads.ID
,CAST(count(votes.uploadID) AS DECIMAL) AS numberoflikes
FROM uploads
INNER JOIN users ON (users.ID = uploads.userID)
LEFT JOIN votes ON (uploads.ID = votes.uploadID)
WHERE uploads.active = 1
GROUP BY uploads.ID, users.ID
ORDER BY uploads.trueRating DESC;
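A self-contained sketch of the LEFT JOIN counting approach with a per-upload GROUP BY. The demo_ temporary tables are hypothetical, trimmed-down versions of uploads, users and votes, with made-up rows:

```sql
-- Hypothetical, trimmed-down tables; the demo_ prefix avoids shadowing real ones.
DROP TEMPORARY TABLE IF EXISTS demo_users, demo_uploads, demo_votes;
CREATE TEMPORARY TABLE demo_users (ID INT PRIMARY KEY, NAME VARCHAR(50) NOT NULL);
CREATE TEMPORARY TABLE demo_uploads (
  ID         INT PRIMARY KEY,
  NAME       VARCHAR(50) NOT NULL,
  userID     INT NOT NULL,
  active     TINYINT NOT NULL,
  trueRating INT NOT NULL
);
CREATE TEMPORARY TABLE demo_votes (userID INT NOT NULL, uploadID INT NOT NULL);
INSERT INTO demo_users VALUES (1, 'ann'), (2, 'bob');
INSERT INTO demo_uploads VALUES
  (100, 'essay', 1, 1, 5), (101, 'slides', 2, 1, 9), (102, 'notes', 2, 1, 1);
INSERT INTO demo_votes VALUES (1, 100), (2, 100), (2, 101);

-- One row per upload; COUNT(demo_votes.uploadID) skips the NULLs the LEFT JOIN
-- produces for uploads without votes, so those get 0.
SELECT demo_users.NAME AS username,
       demo_uploads.NAME,
       demo_uploads.ID,
       COUNT(demo_votes.uploadID) AS numberoflikes
FROM demo_uploads
INNER JOIN demo_users ON demo_users.ID = demo_uploads.userID
LEFT JOIN demo_votes ON demo_votes.uploadID = demo_uploads.ID
WHERE demo_uploads.active = 1
GROUP BY demo_uploads.ID, demo_users.ID
ORDER BY demo_uploads.trueRating DESC;
-- slides: 1, essay: 2, notes: 0
```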
Try something like this...
SELECT users.name as username, users.ID as userID, subjects.name as subjectname,
uploads.time, uploads.description, uploads.name, uploads.ID,
count(userVotes.userId) AS liked, count(bookmarksMade.userId) AS bookmarked
FROM uploads
join subjects on(subjects.ID = uploads.subjectID)
join users on(users.ID = uploads.userID)
join uploadGrades on(uploads.ID = uploadGrades.uploadID)
join grades on(grades.ID = uploadGrades.gradeID)
left join (select userId, uploadId from votes where userId = 2) as userVotes on uploads.id = userVotes.uploadId
left join (select userId, uploadId from bookmarks where userId = 2) as bookmarksMade on uploads.id = bookmarksMade.uploadId
join userSubjects on subjects.id = userSubjects.subjectID
WHERE uploads.active = 1 AND
userSubjects.userID = 2
GROUP BY uploads.ID
ORDER BY uploads.trueRating DESC;
But I am leaving out the userGrades part, because you are doing a funky join there that I don't really understand (joining two tables on something that looks like it is not the whole primary key of either table).
Anyway, you really need to go to something more like this or what Oropeza suggests in his answer. Get more direct about what you want. This query looks like a monster that has been growing and getting things added in with "IN" clauses, as you needed them. Time to go back to the drawing board and think about what you want and how to get at it directly.
count(uploads.ID in (SELECT uploadID from votes)) as numberoflikes
group by uploads.Id ORDER BY uploads.trueRating DESC
I managed to do it like this. When I added the GROUP BY, it split numberoflikes across rows, so the query returned more than one row. Thanks for the help!
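One caveat on this fix, sketched with hypothetical demo_ tables: uploads.ID IN (SELECT uploadID FROM votes) evaluates to 1 or 0 here, and COUNT() counts every non-NULL value, so COUNT(... IN ...) counts joined rows per upload rather than likes. In the sketch below every upload gets 1, even one with no votes, whereas counting the matching rows of a LEFT JOIN gives the actual number of votes:

```sql
-- Hypothetical demo tables with only the columns needed here.
DROP TEMPORARY TABLE IF EXISTS demo_uploads, demo_votes;
CREATE TEMPORARY TABLE demo_uploads (ID INT PRIMARY KEY);
CREATE TEMPORARY TABLE demo_votes (userID INT NOT NULL, uploadID INT NOT NULL);
INSERT INTO demo_uploads VALUES (100), (101), (102);
INSERT INTO demo_votes VALUES (1, 100), (2, 100), (2, 101);

-- COUNT of a 0/1 expression counts rows: every upload gets 1, including 102.
SELECT demo_uploads.ID,
       COUNT(demo_uploads.ID IN (SELECT uploadID FROM demo_votes)) AS numberoflikes
FROM demo_uploads
GROUP BY demo_uploads.ID;

-- Counting the joined vote rows gives 100: 2, 101: 1, 102: 0.
SELECT demo_uploads.ID,
       COUNT(demo_votes.uploadID) AS numberoflikes
FROM demo_uploads
LEFT JOIN demo_votes ON demo_votes.uploadID = demo_uploads.ID
GROUP BY demo_uploads.ID;
```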

Multiple Selects in Single Query

I'm hoping someone fluent in MySQL will be able to assist me with this. I'm trying to do a select on a select on a select, but the query doesn't seem to want to complete. Any help would be greatly appreciated.
SELECT
product as pid,
leg_name as name,
dimensions as dims
FROM
pmaint
WHERE
product in (
SELECT product
FROM qb_export_items_attributes
WHERE attribute_name = 'Z'
AND product in (
SELECT product
FROM pmainT
WHERE type_ID = (
SELECT ID
FROM type
WHERE SOFTCARTCATEGORY = 'End Table Legs'
)
)
AND attribute_value <= 3.5
)
Try to use INNER JOINs instead of IN subqueries.
UPD: I've edited this query according to your comment. The first JOIN subquery outputs every product for which both attribute conditions hold.
SELECT
pmaint.product as pid,
pmaint.leg_name as name,
pmaint.dimensions as dims
FROM
pmaint
JOIN (select product from qb_export_items_attributes
where ((attribute_name = 'Z') and (attribute_value <= 3.5))
OR
((attribute_name = 'top_square') and (attribute_value >= 4))
GROUP BY product HAVING COUNT(*)=2
)
t1 on (pmaint.product=t1.product )
JOIN type on (pmaint.type_ID=type.ID)
WHERE
type.SOFTCARTCATEGORY = 'End Table Legs'
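A self-contained sketch of this pattern with hypothetical data (the demo_ temporary tables stand in for pmaint, qb_export_items_attributes and type, and attribute_value is assumed numeric). HAVING COUNT(*) = 2 only means "both conditions hold" if each product has at most one row per attribute_name; if duplicates are possible, COUNT(DISTINCT attribute_name) = 2 is the safer test:

```sql
-- Hypothetical demo tables and rows.
DROP TEMPORARY TABLE IF EXISTS demo_pmaint, demo_attrs, demo_type;
CREATE TEMPORARY TABLE demo_type (
  ID               INT PRIMARY KEY,
  SOFTCARTCATEGORY VARCHAR(50) NOT NULL
);
CREATE TEMPORARY TABLE demo_pmaint (
  product    VARCHAR(20) PRIMARY KEY,
  leg_name   VARCHAR(50) NOT NULL,
  dimensions VARCHAR(50) NOT NULL,
  type_ID    INT NOT NULL
);
-- One row per (product, attribute_name), enforced by the primary key.
CREATE TEMPORARY TABLE demo_attrs (
  product         VARCHAR(20) NOT NULL,
  attribute_name  VARCHAR(30) NOT NULL,
  attribute_value DECIMAL(6,2) NOT NULL,
  PRIMARY KEY (product, attribute_name)
);
INSERT INTO demo_type VALUES (1, 'End Table Legs'), (2, 'Sofa Legs');
INSERT INTO demo_pmaint VALUES
  ('P1', 'Taper',  '3x3x20', 1),
  ('P2', 'Square', '5x5x20', 1),
  ('P3', 'Bun',    '3x3x4',  2);
INSERT INTO demo_attrs VALUES
  ('P1', 'Z', 3.0), ('P1', 'top_square', 4),   -- both conditions hold
  ('P2', 'Z', 5.0), ('P2', 'top_square', 5),   -- Z is too large
  ('P3', 'Z', 3.0), ('P3', 'top_square', 4);   -- wrong category

SELECT p.product AS pid, p.leg_name AS name, p.dimensions AS dims
FROM demo_pmaint p
JOIN (SELECT product
      FROM demo_attrs
      WHERE (attribute_name = 'Z' AND attribute_value <= 3.5)
         OR (attribute_name = 'top_square' AND attribute_value >= 4)
      GROUP BY product
      HAVING COUNT(*) = 2) t1 ON p.product = t1.product
JOIN demo_type ON p.type_ID = demo_type.ID
WHERE demo_type.SOFTCARTCATEGORY = 'End Table Legs';
-- returns only P1
```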

How do I get more than one column from a SELECT subquery?

Here is my problem:
I have 3 tables: account, account_event and account_subscription.
account contains details like company_name, email, phone, ...
account_event contains the following events: incoming calls, outgoing calls, visits, mail.
I use account_subscription in this query to retrieve the "prospect" accounts. If an account does not have a subscription, it is a prospect.
What I am using right now is the following query, which works fine:
SELECT `account`.*,
(SELECT event_date
FROM clients.account_event cae
WHERE cae.account_id = account.id
AND cae.event_type = 'visit'
AND cae.event_done = 'Y'
ORDER BY event_date DESC
LIMIT 1) last_visit_date
FROM (`clients`.`account`)
WHERE (SELECT count(*)
FROM clients.account_subscription cas
WHERE cas.account_id = account.id) = 0
ORDER BY `last_visit_date` DESC
You can see that it returns the last_visit_date.
I would like to modify my query to return the last event details (last contact). I need the event_date AND the event_type.
So I tried the following query which is NOT working because apparently I can't get more than one column from my select subquery.
SELECT `account`.*,
(SELECT event_date last_contact_date, event_type last_contact_type
FROM clients.account_event cae
WHERE cae.account_id = account.id
AND cae.event_done = 'Y'
ORDER BY event_date DESC
LIMIT 1)
FROM (`clients`.`account`)
WHERE (SELECT count(*)
FROM clients.account_subscription cas
WHERE cas.account_id = account.id) = 0
ORDER BY `last_visit_date` DESC
I tried a lot of join-based solutions, but my problem is that I need the last event for each account.
Any ideas?
Thank you in advance.
Jerome
Get a PRIMARY KEY in a subquery and join the actual table on it:
SELECT a.*, ae.*
FROM account a
JOIN account_event ae
ON ae.id =
(
SELECT id
FROM account_event aei
WHERE aei.account_id = a.id
AND aei.event_done = 'Y'
ORDER BY
event_date DESC
LIMIT 1
)
WHERE a.id NOT IN
(
SELECT account_id
FROM account_subscription
)
ORDER BY
ae.event_date DESC
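A self-contained sketch of this approach with hypothetical demo_ tables and rows, ordered by the date of the joined event. The inner JOIN drops prospects that have no completed events; a LEFT JOIN would keep them with NULL event columns:

```sql
-- Hypothetical demo tables. Plain tables rather than TEMPORARY ones, because
-- demo_account_event is referenced twice in the query and MySQL cannot refer
-- to a TEMPORARY table more than once in the same query.
DROP TABLE IF EXISTS demo_account, demo_account_event, demo_account_subscription;
CREATE TABLE demo_account (id INT PRIMARY KEY, company_name VARCHAR(50) NOT NULL);
CREATE TABLE demo_account_event (
  id         INT PRIMARY KEY,
  account_id INT NOT NULL,
  event_type VARCHAR(20) NOT NULL,
  event_date DATE NOT NULL,
  event_done CHAR(1) NOT NULL
);
CREATE TABLE demo_account_subscription (account_id INT NOT NULL);
INSERT INTO demo_account VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech'), (4, 'Umbrella');
INSERT INTO demo_account_event VALUES
  (1, 1, 'visit',         '2024-01-10', 'Y'),
  (2, 1, 'outgoing call', '2024-02-03', 'Y'),
  (3, 1, 'mail',          '2024-03-01', 'N'),  -- not done, so ignored
  (4, 2, 'incoming call', '2024-01-20', 'Y'),
  (5, 3, 'visit',         '2024-02-15', 'Y');
INSERT INTO demo_account_subscription VALUES (3);  -- Initech is a customer, not a prospect

-- The subquery picks the primary key of each account's latest completed event;
-- joining on it makes every column of that event row available.
SELECT a.id, a.company_name,
       ae.event_date AS last_contact_date,
       ae.event_type AS last_contact_type
FROM demo_account a
JOIN demo_account_event ae
  ON ae.id = (SELECT aei.id
              FROM demo_account_event aei
              WHERE aei.account_id = a.id
                AND aei.event_done = 'Y'
              ORDER BY aei.event_date DESC
              LIMIT 1)
WHERE a.id NOT IN (SELECT account_id FROM demo_account_subscription)
ORDER BY ae.event_date DESC;
-- Acme (2024-02-03, outgoing call), then Globex (2024-01-20, incoming call);
-- Umbrella has no events, so the inner JOIN drops it
```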
Try moving the subquery into the FROM clause and giving it an alias; it then behaves like just another table, and you can select more than one column from it.
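A sketch of that derived-table approach, assuming MySQL 8.0 or later for ROW_NUMBER() (a window function, swapped in here to pick the newest row; it is not part of the answer). The derived table last_ev exposes both event_date and event_type and ranks each account's completed events newest first; the LEFT JOIN keeps prospects without events. Tables and rows are hypothetical:

```sql
-- Hypothetical demo tables and rows (drop them when done).
DROP TABLE IF EXISTS demo_account, demo_account_event, demo_account_subscription;
CREATE TABLE demo_account (id INT PRIMARY KEY, company_name VARCHAR(50) NOT NULL);
CREATE TABLE demo_account_event (
  id         INT PRIMARY KEY,
  account_id INT NOT NULL,
  event_type VARCHAR(20) NOT NULL,
  event_date DATE NOT NULL,
  event_done CHAR(1) NOT NULL
);
CREATE TABLE demo_account_subscription (account_id INT NOT NULL);
INSERT INTO demo_account VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech'), (4, 'Umbrella');
INSERT INTO demo_account_event VALUES
  (1, 1, 'visit',         '2024-01-10', 'Y'),
  (2, 1, 'outgoing call', '2024-02-03', 'Y'),
  (3, 1, 'mail',          '2024-03-01', 'N'),
  (4, 2, 'incoming call', '2024-01-20', 'Y'),
  (5, 3, 'visit',         '2024-02-15', 'Y');
INSERT INTO demo_account_subscription VALUES (3);

-- rn = 1 marks each account's newest completed event (ties broken by id).
SELECT a.id, a.company_name,
       last_ev.event_date AS last_contact_date,
       last_ev.event_type AS last_contact_type
FROM demo_account a
LEFT JOIN (SELECT account_id, event_date, event_type,
                  ROW_NUMBER() OVER (PARTITION BY account_id
                                     ORDER BY event_date DESC, id DESC) AS rn
           FROM demo_account_event
           WHERE event_done = 'Y') last_ev
       ON last_ev.account_id = a.id
      AND last_ev.rn = 1
WHERE a.id NOT IN (SELECT account_id FROM demo_account_subscription)
ORDER BY last_contact_date DESC;
-- Acme (2024-02-03, outgoing call), Globex (2024-01-20, incoming call),
-- Umbrella (NULL, NULL); NULLs sort last with DESC in MySQL
```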

Select value that is not "*username*"

My problem is this:
I need to select the values from the "groups" table that do not contain a specific "user_id".
So I execute this:
SELECT DISTINCT `group`.active, hide_group.user_id, `group`.group_id, hide_group.hide, `group`.desc AS group_desc
FROM `group`
LEFT JOIN hide_group ON hide_group.group_id = `group`.group_id
WHERE (
hide_group.user_id != 'testuser'
OR hide_group.user_id IS NULL
AND (
hide != 'true'
OR hide IS NULL
)
)
AND active =1
ORDER BY `group`.`group_id` ASC
LIMIT 0 , 500
But now let's say we have these records in my 'group_hide' table:
[group_hide]
(user_id, group_id, hide)
john, ABC, true;
joe, ZZZ, true;
mark, ABC, true;
So when I run my query, mark still sees the ABC group: the condition is true for the row with user_id = john, so the ABC value is returned even though it is hidden for mark.
I have tried changing this query several times, but I can't figure out this simple problem.
I suggest using a subquery:
SELECT *
FROM `group`
WHERE group_id NOT IN (
SELECT group_id
FROM group_hide
WHERE user_id = 'testuser'
AND hide = 'true')
AND active =1
ORDER BY `group`.`group_id` ASC
LIMIT 0 , 500
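A runnable sketch of this subquery on the sample rows from the question, with hypothetical demo_ temporary tables and hide stored as the string 'true'. Because group_id is NOT NULL in the hide table, NOT IN is safe here; if that column can contain NULL, NOT IN returns no rows and NOT EXISTS is the usual alternative:

```sql
-- Hypothetical demo tables mirroring the question's sample rows.
DROP TEMPORARY TABLE IF EXISTS demo_group, demo_group_hide;
CREATE TEMPORARY TABLE demo_group (
  group_id VARCHAR(10) PRIMARY KEY,
  active   TINYINT NOT NULL
);
CREATE TEMPORARY TABLE demo_group_hide (
  user_id  VARCHAR(20) NOT NULL,
  group_id VARCHAR(10) NOT NULL,
  hide     VARCHAR(5)  NOT NULL
);
INSERT INTO demo_group VALUES ('ABC', 1), ('XYZ', 1), ('ZZZ', 1);
INSERT INTO demo_group_hide VALUES
  ('john', 'ABC', 'true'), ('joe', 'ZZZ', 'true'), ('mark', 'ABC', 'true');

-- Active groups, minus the ones mark has hidden.
SELECT g.group_id
FROM demo_group g
WHERE g.group_id NOT IN (SELECT h.group_id
                         FROM demo_group_hide h
                         WHERE h.user_id = 'mark'
                           AND h.hide = 'true')
  AND g.active = 1
ORDER BY g.group_id;
-- XYZ, ZZZ (ABC is hidden for mark)
```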
You can use HAVING to apply conditions on the JOIN results. I would also use GROUP BY rather than DISTINCT.
I am not sure I completely follow your WHERE condition (I think you have mixed up the parentheses), but here's a version that may get you the result you need:
SELECT DISTINCT `group`.active, hide_group.user_id, `group`.group_id, hide_group.hide, `group`.desc AS group_desc
FROM `group`
LEFT JOIN hide_group ON hide_group.group_id = `group`.group_id
WHERE active =1
HAVING (hide_group.user_id != 'testuser'
OR hide_group.user_id IS NULL)
AND
(hide != 'true'
OR hide IS NULL)
ORDER BY `group`.`group_id` ASC
LIMIT 0 , 500
If you post the table structures, some sample data and the desired output, you'll get better responses with working queries.
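For comparison, a sketch (not taken from either answer) of how the question's original LEFT JOIN can be made to work: put the user and hide conditions in the ON clause, so only the current user's hide rows are joined, then keep the groups that found no such row. Tables and rows are hypothetical, mirroring the sample data in the question:

```sql
-- Hypothetical demo tables mirroring the question's sample rows.
DROP TEMPORARY TABLE IF EXISTS demo_group, demo_group_hide;
CREATE TEMPORARY TABLE demo_group (
  group_id VARCHAR(10) PRIMARY KEY,
  active   TINYINT NOT NULL
);
CREATE TEMPORARY TABLE demo_group_hide (
  user_id  VARCHAR(20) NOT NULL,
  group_id VARCHAR(10) NOT NULL,
  hide     VARCHAR(5)  NOT NULL
);
INSERT INTO demo_group VALUES ('ABC', 1), ('XYZ', 1), ('ZZZ', 1);
INSERT INTO demo_group_hide VALUES
  ('john', 'ABC', 'true'), ('joe', 'ZZZ', 'true'), ('mark', 'ABC', 'true');

SELECT g.group_id
FROM demo_group g
LEFT JOIN demo_group_hide h
       ON h.group_id = g.group_id
      AND h.user_id = 'mark'     -- filter on the user inside ON, not in WHERE
      AND h.hide = 'true'
WHERE h.group_id IS NULL         -- keep groups with no hide row for mark
  AND g.active = 1
ORDER BY g.group_id;
-- XYZ, ZZZ: john's hide row for ABC is never joined, so it cannot leak through
```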