How to optimize this MySQL query? Runs slow

I'm using a database that, in my opinion, wasn't designed well, though maybe I just don't understand it. Anyway, I have a query that pulls the correct information, but it really slows down my PHP script. I was hoping someone could take a look and tell me whether nesting queries this deep is bad, and whether the query can be simplified given the relationships in the SQL statement below.
SELECT name
FROM groups
WHERE id = (SELECT DISTINCT immediateparentid
FROM cachedgroupmembers
WHERE groupid = (SELECT g.id AS AdminCc
FROM Tickets t, groups g
WHERE t.Id = 124 AND t.id = g.instance AND g.type = 'AdminCc')
AND immediateparentid <> (SELECT g.id AS AdminCc
FROM Tickets t, groups g
WHERE t.Id = 124 AND t.id = g.instance AND g.type = 'AdminCc'))
Please help
Update:
Here is the output from using Explain

From what I can tell, you can eliminate one sub-select.
SELECT name
FROM groups
WHERE id = (
SELECT DISTINCT immediateparentid
FROM cachedgroupmembers
WHERE groupid = (
SELECT g.id
FROM Tickets t, groups g
WHERE t.Id = 124 AND t.id = g.instance AND g.type = 'AdminCc'
) AND immediateparentid != groupid
)

I'm much more used to PL/SQL on Oracle but I'll give it a try.
Get rid of aliases, you don't need them here.
Make sure columns used in the where clause are indexed (t.Id and g.type).
I don't know whether MySQL indexes foreign keys by default, but it is worth checking.
You can shorten your SQL code like this:
SELECT name
FROM groups
WHERE id = (
SELECT DISTINCT immediateparentid
FROM cachedgroupmembers
WHERE groupid = (
SELECT g.id
FROM Tickets t, groups g
WHERE t.Id = 124 AND t.id = g.instance AND g.type = 'AdminCc'
) AND immediateparentid != groupid
)
or:
SELECT name
FROM groups
WHERE id = (
SELECT DISTINCT immediateparentid
FROM cachedgroupmembers
WHERE groupid = (
SELECT g.id
FROM Tickets t inner join groups g on t.id = g.instance
WHERE t.Id = 124 AND g.type = 'AdminCc'
) AND immediateparentid != groupid
)

If your Tickets table is big, you may want to consider a temp table instead of querying it twice.
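The answers above keep the nested subqueries. As a rough sketch of how the same lookup could be written with joins instead, the snippet below runs both forms against a tiny in-memory SQLite database and compares the results. The column types and the sample rows are assumptions made up for illustration (the real schema is not shown in the question), and the join form assumes the lookup yields at most one parent group, just as the `=` subquery does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tickets (id INTEGER PRIMARY KEY);
    CREATE TABLE "groups" (id INTEGER PRIMARY KEY, name TEXT, type TEXT, instance INTEGER);
    CREATE TABLE cachedgroupmembers (groupid INTEGER, immediateparentid INTEGER);
    -- Hypothetical rows: ticket 124 has an AdminCc group (id 10) whose parent is group 20.
    INSERT INTO Tickets VALUES (124);
    INSERT INTO "groups" VALUES (10, 'ticket-124-admincc', 'AdminCc', 124),
                                (20, 'Support', 'UserDefined', NULL);
    INSERT INTO cachedgroupmembers VALUES (10, 10), (10, 20);
""")

# Nested form, as in the shortened query from the answers.
NESTED = """
SELECT name
FROM "groups"
WHERE id = (
    SELECT DISTINCT immediateparentid
    FROM cachedgroupmembers
    WHERE groupid = (
        SELECT g.id
        FROM Tickets t, "groups" g
        WHERE t.id = ? AND t.id = g.instance AND g.type = 'AdminCc'
    ) AND immediateparentid != groupid
)
"""

# Join form: each nesting level becomes one join.
JOINED = """
SELECT DISTINCT p.name
FROM Tickets t
JOIN "groups" g ON g.instance = t.id AND g.type = 'AdminCc'
JOIN cachedgroupmembers m ON m.groupid = g.id AND m.immediateparentid <> m.groupid
JOIN "groups" p ON p.id = m.immediateparentid
WHERE t.id = ?
"""

nested_rows = conn.execute(NESTED, (124,)).fetchall()
joined_rows = conn.execute(JOINED, (124,)).fetchall()
print(nested_rows, joined_rows)
```

Whether the join form is actually faster on MySQL depends on the indexes and the optimizer, so it is worth comparing both with EXPLAIN on the real data.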


How to find a result with multiple different values on same column?

Imagine that we have a database with a logs table and types table. I want to do a query where I figure out if UserX has entries for certain types of logs. Let's say that UserX has logged type_1 and type_2, but not type_3. I want to write a simple query to see if this is true or false.
At first I tried something like:
SELECT * FROM logs AS l
INNER JOIN types AS t
ON t.id = l.type_id
WHERE t.name = "type_1"
AND t.name = "type_2"
AND t.name != "type_3";
But I quickly realised that it was not possible to do it like this, since t.name cannot have multiple values. I have tried a bunch of different approaches now, but cannot seem to find the one right for me. I'm sure the solution is fairly simple, I just don't see it at the moment.
Hope someone can point me in the right direction.
I have made a simple test database in this fiddle, to use for testing and example: https://www.db-fiddle.com/f/nA6iKgCcJwKnXKsxaNvsLt/0
One option with conditional aggregation.
SELECT l.userID
FROM logs AS l
JOIN types AS t ON t.id = l.type_id
GROUP BY l.userID
HAVING COUNT(DISTINCT CASE WHEN t.name IN ('type_1','type_2') THEN t.name END) = 2
AND COUNT(DISTINCT CASE WHEN t.name = 'type_3' THEN t.name END) = 0
You can do it like Vamsi did, but if you prefer SQL that is easier to understand, you can do it like this:
SELECT * FROM logs AS l
INNER JOIN types AS t
ON t.id = l.type_id
WHERE true
AND EXISTS (SELECT 1 FROM logs ll WHERE l.user_id = ll.user_id AND ll.type_id = 1)
AND EXISTS (SELECT 1 FROM logs ll WHERE l.user_id = ll.user_id AND ll.type_id = 2)
AND NOT EXISTS (SELECT 1 FROM logs ll WHERE l.user_id = ll.user_id AND ll.type_id = 3)
I do not recommend using count(distinct) for this purpose. It can be expensive. I would simply do:
SELECT l.userId
FROM logs l INNER JOIN
types t
ON t.id = l.type_id
WHERE t.name IN ('type_1', 'type_2', 'type_3')
GROUP BY l.userId
HAVING SUM(t.name = 'type_1') > 0 AND -- at least one
SUM(t.name = 'type_2') > 0 AND -- at least one
SUM(t.name = 'type_3') = 0 ; -- none
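To see the two aggregation answers side by side, here is a small sketch that runs both against an in-memory SQLite database. The table layout and rows are invented for illustration, and the column name `user_id` is an assumption (the answers above disagree between `userID` and `user_id`, and the fiddle's schema is not reproduced here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE types (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE logs (id INTEGER PRIMARY KEY, user_id INTEGER, type_id INTEGER);
    INSERT INTO types VALUES (1, 'type_1'), (2, 'type_2'), (3, 'type_3');
    -- user 1: type_1 and type_2 only; user 2: all three types; user 3: type_1 only
    INSERT INTO logs (user_id, type_id) VALUES
        (1, 1), (1, 2), (1, 1),
        (2, 1), (2, 2), (2, 3),
        (3, 1);
""")

# SUM over boolean comparisons: each comparison evaluates to 0 or 1.
SUM_QUERY = """
SELECT l.user_id
FROM logs AS l
JOIN types AS t ON t.id = l.type_id
WHERE t.name IN ('type_1', 'type_2', 'type_3')
GROUP BY l.user_id
HAVING SUM(t.name = 'type_1') > 0
   AND SUM(t.name = 'type_2') > 0
   AND SUM(t.name = 'type_3') = 0
"""

# Conditional aggregation with COUNT(DISTINCT ...).
COUNT_QUERY = """
SELECT l.user_id
FROM logs AS l
JOIN types AS t ON t.id = l.type_id
GROUP BY l.user_id
HAVING COUNT(DISTINCT CASE WHEN t.name IN ('type_1', 'type_2') THEN t.name END) = 2
   AND COUNT(DISTINCT CASE WHEN t.name = 'type_3' THEN t.name END) = 0
"""

users_by_sum = sorted(row[0] for row in conn.execute(SUM_QUERY))
users_by_count = sorted(row[0] for row in conn.execute(COUNT_QUERY))
print(users_by_sum, users_by_count)
```

Both queries pick out only the user who has logged type_1 and type_2 but never type_3; the difference between them is cost, not result.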

MySQL select with group and one to many relations condition

For example, take this structure:
CREATE TABLE clicks
(`date` varchar(50), `sum` int, `id` int)
;
CREATE TABLE marks
(`click_id` int, `name` varchar(50), `value` varchar(50))
;
where a click can have many marks.
Example data:
INSERT INTO clicks
(`sum`, `id`, `date`)
VALUES
(100, 1, '2017-01-01'),
(200, 2, '2017-01-01')
;
INSERT INTO marks
(`click_id`, `name`, `value`)
VALUES
(1, 'utm_source', 'test_source1'),
(1, 'utm_medium', 'test_medium1'),
(1, 'utm_term', 'test_term1'),
(2, 'utm_source', 'test_source1'),
(2, 'utm_medium', 'test_medium1')
;
I need to get aggregated values of clicks, grouped by date, for clicks that contain all of the selected values.
I make this request:
select
c.date,
sum(c.sum)
from clicks as c
left join marks as m ON m.click_id = c.id
where
(m.name = 'utm_source' AND m.value='test_source1') OR
(m.name = 'utm_medium' AND m.value='test_medium1') OR
(m.name = 'utm_term' AND m.value='test_term1')
group by date
and get 2017-01-01 = 700, but I want to get 100, since only click 1 has all of the marks.
Or if the condition is
(m.name = 'utm_source' AND m.value='test_source1') OR
(m.name = 'utm_medium' AND m.value='test_medium1')
I need to get 300 instead of 600.
I found an answer: get the distinct click_id values with a first query, then sum and group by date with a whereIn condition. But on the real database, which is very large and uses UUIDs as ids, this request runs extremely slowly. Any advice on how to get it to work properly?
You can achieve it using the queries below.
When there are three conditions, use HAVING count(*) >= 3:
SELECT cc.DATE
,sum(cc.sum)
FROM clicks AS cc
INNER JOIN (
SELECT id
FROM clicks AS c
LEFT JOIN marks AS m ON m.click_id = c.id
WHERE (
m.NAME = 'utm_source'
AND m.value = 'test_source1'
)
OR (
m.NAME = 'utm_medium'
AND m.value = 'test_medium1'
)
OR (
m.NAME = 'utm_term'
AND m.value = 'test_term1'
)
GROUP BY id
HAVING count(*) >= 3
) AS t ON cc.id = t.id
GROUP BY cc.DATE
When there are two conditions, use HAVING count(*) >= 2:
SELECT cc.DATE
,sum(cc.sum)
FROM clicks AS cc
INNER JOIN (
SELECT id
FROM clicks AS c
LEFT JOIN marks AS m ON m.click_id = c.id
WHERE (
m.NAME = 'utm_source'
AND m.value = 'test_source1'
)
OR (
m.NAME = 'utm_medium'
AND m.value = 'test_medium1'
)
GROUP BY id
HAVING count(*) >= 2
) AS t ON cc.id = t.id
GROUP BY cc.DATE
Demo: http://sqlfiddle.com/#!9/fe571a/35
Hope this works for you...
You're getting 700 because the join generates multiple rows for each click. There are three rows in the marks table with click_id=1 (sum=100) and two rows with click_id=2 (sum=200). After the join we have three rows with sum=100 and two rows with sum=200, so adding these sums gives 700. To fix this you have to aggregate on click_id too, as illustrated below:
select
c.date,
sum(c.sum)
from clicks as c
inner join (select click_id from marks
where (name = 'utm_source' AND value='test_source1')
OR (name = 'utm_medium' AND value='test_medium1')
OR (name = 'utm_term' AND value='test_term1')
group by click_id
having count(*) = 3) as m -- keep only clicks that match all three conditions
ON m.click_id = c.id
group by c.date;
I found the right way myself, and it works on large amounts of data.
The main goal is to make the request generate one table, with subqueries (conditions) that do not depend on the amount of data in the results, so the best way is:
select
c.date,
sum(c.sum)
from clicks as c
join marks as m1 ON m1.click_id = c.id
join marks as m2 ON m2.click_id = c.id
join marks as m3 ON m3.click_id = c.id
where
(m1.name = 'utm_source' AND m1.value='test_source1') AND
(m2.name = 'utm_medium' AND m2.value='test_medium1') AND
(m3.name = 'utm_term' AND m3.value='test_term1')
group by date
So we need as many joins as we have conditions.
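The "one join per condition" idea can be sketched as a small helper that builds the query from a list of (name, value) pairs. This is only an illustration run on SQLite with the sample data from the question; the helper names `build_db` and `sums_by_date` are made up here, and the conditions are placed in the ON clauses rather than in WHERE, which is equivalent for inner joins. It assumes a click has at most one mark row per (name, value) pair, since duplicates would multiply the joined rows.

```python
import sqlite3

def build_db():
    """Create an in-memory database holding the sample data from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE clicks (`date` TEXT, `sum` INTEGER, `id` INTEGER);
        CREATE TABLE marks (`click_id` INTEGER, `name` TEXT, `value` TEXT);
        INSERT INTO clicks (`sum`, `id`, `date`) VALUES
            (100, 1, '2017-01-01'),
            (200, 2, '2017-01-01');
        INSERT INTO marks (`click_id`, `name`, `value`) VALUES
            (1, 'utm_source', 'test_source1'),
            (1, 'utm_medium', 'test_medium1'),
            (1, 'utm_term', 'test_term1'),
            (2, 'utm_source', 'test_source1'),
            (2, 'utm_medium', 'test_medium1');
    """)
    return conn

def sums_by_date(conn, conditions):
    """Sum clicks per date, keeping only clicks that carry every (name, value) mark."""
    joins, params = [], []
    for i, (name, value) in enumerate(conditions):
        alias = f"m{i}"  # one alias of the marks table per condition
        joins.append(
            f"JOIN marks AS {alias} ON {alias}.click_id = c.id "
            f"AND {alias}.name = ? AND {alias}.value = ?"
        )
        params += [name, value]
    sql = (
        "SELECT c.`date`, SUM(c.`sum`) FROM clicks AS c "
        + " ".join(joins)
        + " GROUP BY c.`date`"
    )
    return conn.execute(sql, params).fetchall()

conn = build_db()
three = [("utm_source", "test_source1"), ("utm_medium", "test_medium1"), ("utm_term", "test_term1")]
print(sums_by_date(conn, three))      # only click 1 has all three marks
print(sums_by_date(conn, three[:2]))  # both clicks have the first two marks
```

Values are passed as bound parameters; only the alias names, which the helper generates itself, are interpolated into the SQL text.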

Join between sub-queries in SQLAlchemy

In relation to the answer I accepted for this post, SQL Group By and Limit issue, I need to figure out how to create that query using SQLAlchemy. For reference, the query I need to run is:
SELECT t.id, t.creation_time, c.id, c.creation_time
FROM (SELECT id, creation_time
FROM thread
ORDER BY creation_time DESC
LIMIT 5
) t
LEFT OUTER JOIN comment c ON c.thread_id = t.id
WHERE 3 >= (SELECT COUNT(1)
FROM comment c2
WHERE c.thread_id = c2.thread_id
AND c.creation_time <= c2.creation_time
)
I have the first half of the query, but I am struggling with the syntax for the WHERE clause and how to combine it with the JOIN. Does anyone have any suggestions?
Thanks!
EDIT: First attempt seems to mess up around the .filter() call:
c = aliased(Comment)
c2 = aliased(Comment)
subq = db.session.query(Thread.id).filter_by(topic_id=122098).order_by(Thread.creation_time.desc()).limit(2).offset(2).subquery('t')
subq2 = db.session.query(func.count(1).label("count")).filter(c.id==c2.id).subquery('z')
q = db.session.query(subq.c.id, c.id).outerjoin(c, c.thread_id==subq.c.id).filter(3 >= subq2.c.count)
this generates the following SQL:
SELECT t.id AS t_id, comment_1.id AS comment_1_id
FROM (SELECT count(1) AS count
FROM comment AS comment_1, comment AS comment_2
WHERE comment_1.id = comment_2.id) AS z, (SELECT thread.id AS id
FROM thread
WHERE thread.topic_id = :topic_id ORDER BY thread.creation_time DESC
LIMIT 2 OFFSET 2) AS t LEFT OUTER JOIN comment AS comment_1 ON comment_1.thread_id = t.id
WHERE z.count <= 3
Notice the sub-query ordering is incorrect, and subq2 somehow is selecting from comment twice. Manually fixing that gives the right results, I am just unsure of how to get SQLAlchemy to get it right.
Try this:
c = db.aliased(Comment, name='c')
c2 = db.aliased(Comment, name='c2')
sq = (db.session
.query(Thread.id, Thread.creation_time)
.order_by(Thread.creation_time.desc())
.limit(5)
).subquery(name='t')
sq2 = (
db.session.query(db.func.count(1))
.select_from(c2)
.filter(c.thread_id == c2.thread_id)
.filter(c.creation_time <= c2.creation_time)
.correlate(c)
.as_scalar()
)
q = (db.session
.query(
sq.c.id, sq.c.creation_time,
c.id, c.creation_time,
)
.outerjoin(c, c.thread_id == sq.c.id)
.filter(3 >= sq2)
)
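Independently of SQLAlchemy, it can help to check what the target SQL from the question actually selects before comparing it with the generated statement. The sketch below runs that SQL on SQLite with made-up rows; the integer `creation_time` values and the sample ids are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thread (id INTEGER PRIMARY KEY, creation_time INTEGER);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, thread_id INTEGER, creation_time INTEGER);
    INSERT INTO thread VALUES (1, 100), (2, 200);
    -- thread 1 has five comments, thread 2 has none
    INSERT INTO comment VALUES
        (11, 1, 101), (12, 1, 102), (13, 1, 103), (14, 1, 104), (15, 1, 105);
""")

rows = conn.execute("""
    SELECT t.id, t.creation_time, c.id, c.creation_time
    FROM (SELECT id, creation_time
          FROM thread
          ORDER BY creation_time DESC
          LIMIT 5
         ) t
    LEFT OUTER JOIN comment c ON c.thread_id = t.id
    WHERE 3 >= (SELECT COUNT(1)
                FROM comment c2
                WHERE c.thread_id = c2.thread_id
                  AND c.creation_time <= c2.creation_time
               )
""").fetchall()

# Comment ids kept for thread 1, and the set of threads that appear at all.
kept_comment_ids = sorted(r[2] for r in rows if r[2] is not None)
thread_ids = sorted({r[0] for r in rows})
print(kept_comment_ids, thread_ids)
```

The correlated subquery counts, for each comment, how many comments in the same thread are at least as new, so only the three newest per thread survive; a thread without comments still appears once with NULL comment columns because the count for it is 0.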

(My)SQL JOIN - get teams with exactly specified members

Assume tables
team: id, title
team_user: id_team, id_user
I'd like to select teams with exactly the specified members and nobody else. In this example I want the team(s) whose only users are those with id 1 and 5. I came up with this SQL, but it seems like overkill for such a simple task.
SELECT team.*, COUNT(`team_user`.id_user) AS cnt
FROM `team`
JOIN `team_user` user0 ON `user0`.id_team = `team`.id AND `user0`.id_user = 1
JOIN `team_user` user1 ON `user1`.id_team = `team`.id AND `user1`.id_user = 5
JOIN `team_user` ON `team_user`.id_team = `team`.id
GROUP BY `team`.id
HAVING cnt = 2
EDIT: Thank you all for your help. If you want to actually try your ideas, you can use example database structure and data found here: http://down.lipe.cz/team_members.sql
How about
SELECT *
FROM team t
JOIN team_user tu ON (tu.id_team = t.id)
GROUP BY t.id
HAVING (SUM(tu.id_user IN (1,5)) = 2) AND (SUM(tu.id_user NOT IN (1,5)) = 0)
I'm assuming a unique index on team_user(id_team, id_user).
You can use
SELECT
DISTINCT id,
COUNT(tu.id_user) as cnt
FROM
team t
JOIN team_user tu ON ( tu.id_team = t.id )
GROUP BY
t.id
HAVING
count(tu.id_user) = count( CASE WHEN tu.id_user = 1 or tu.id_user = 5 THEN 1 END )
AND cnt = 2
Not sure why you'd need the cnt = 2 condition; the query would get only those teams where all of the users have an ID of either 1 or 5.
Try This
SELECT team.*, COUNT(`team_user`.id_user) AS cnt FROM `team`
JOIN `team_user` ON `team_user`.id_team = `team`.id
where `team_user`.id_user IN (1,5)
GROUP BY `team`.id
HAVING cnt = 2
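The difference between "contains these members" and "has exactly these members" is easy to miss, so here is a small sketch on SQLite with invented teams. The helper names and sample rows are hypothetical, and, like the SUM-based answer above, it assumes a unique index on team_user(id_team, id_user).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE team_user (id_team INTEGER, id_user INTEGER, UNIQUE (id_team, id_user));
    INSERT INTO team VALUES (1, 'exact'), (2, 'extra member'), (3, 'missing member');
    INSERT INTO team_user VALUES (1, 1), (1, 5), (2, 1), (2, 5), (2, 7), (3, 1);
""")

def teams_with_exact_members(conn, member_ids):
    """Teams whose member set is exactly member_ids (SUM over boolean tests)."""
    marks = ", ".join("?" for _ in member_ids)
    sql = f"""
        SELECT t.id
        FROM team t
        JOIN team_user tu ON tu.id_team = t.id
        GROUP BY t.id
        HAVING SUM(tu.id_user IN ({marks})) = ?
           AND SUM(tu.id_user NOT IN ({marks})) = 0
    """
    params = [*member_ids, len(member_ids), *member_ids]
    return sorted(row[0] for row in conn.execute(sql, params))

def teams_containing_members(conn, member_ids):
    """Teams that contain all of member_ids; extra members are not excluded."""
    marks = ", ".join("?" for _ in member_ids)
    sql = f"""
        SELECT t.id
        FROM team t
        JOIN team_user tu ON tu.id_team = t.id
        WHERE tu.id_user IN ({marks})
        GROUP BY t.id
        HAVING COUNT(tu.id_user) = ?
    """
    params = [*member_ids, len(member_ids)]
    return sorted(row[0] for row in conn.execute(sql, params))

print(teams_with_exact_members(conn, [1, 5]))
print(teams_containing_members(conn, [1, 5]))
```

Filtering with WHERE ... IN before grouping hides any other members from the count, so a team with users 1, 5 and 7 still passes the count check; testing inside HAVING keeps every member row visible to the aggregate.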

MySQL: on what columns should I put indexes?

I have this query:
SELECT Concat(f.name, ' ', f.parent_names) AS FullName,
stts.name AS 'Status',
u.name AS Unit,
city.name AS City,
(SELECT Group_concat(c.mobile1)
FROM contacts c
WHERE c.id = f.husband_id
OR c.id = f.wife_id) AS MobilePhones,
f.phone AS HomePhone,
f.contact_initiation_date AS InitDate,
f.status_change_date AS StatusChangeDate,
cmt.created_at AS CommentDate,
cmt.comment AS LastComment,
f.reconnection_date AS ReconnectionDate,
(SELECT Group_concat(t.name, ' ')
FROM taggings tgs
JOIN tags t
ON tgs.tag_id = t.id
WHERE tgs.taggable_type = 'family'
AND tgs.taggable_id = f.id) AS HandlingStatus
FROM families f
JOIN categories stts
ON f.family_status_cat_id = stts.id
JOIN units u
ON f.unit_id = u.id
JOIN categories city
ON f.main_city_cat_id = city.id
LEFT JOIN comments cmt
ON f.last_comment_id = cmt.id
WHERE 1 = 0
OR ( u.is_busy = 1 )
OR ( f.family_status_cat_id = 1423 )
OR ( f.family_status_cat_id = 1422
AND f.status_change_date BETWEEN '2011-03-21' AND '2012-03-13' )
My problem is very specific. It is regarding the line:
(SELECT GROUP_CONCAT( c.mobile1 )
FROM contacts c
WHERE c.id = f.husband_id
OR c.id = f.wife_id
) AS MobilePhones
When I use EXPLAIN, it seems that this query is bad. I get for this table (c = contacts): 38307 rows.
On what columns should I put the index according to the query?
I tried mobile1 - but no improvement (BTW - family_id is indexed in the contacts table).
I attach the image of the explain result:
Or maybe someone can help me optimize the query...
Index any column you'll be searching on, to speed up the lookups. Keep in mind that primary keys are already indexed.
Well, it seems that using GROUP_CONCAT is the problem.
I just separated the wife and husband mobile numbers into 2 different columns.
At first I thought that using GROUP_CONCAT would be faster, but that proved to be VERY WRONG.
Just out of curiosity, what is the performance of this query?
(SELECT GROUP_CONCAT( c.mobile1 )
FROM contacts c
WHERE c.id IN(f.husband_id, f.wife_id)
) AS MobilePhones