Optimize MySQL query nested not exists possible? - mysql

I have a list of exercise submissions made by students who belong to a group (classroom). The tables are:
submission table: userId, groupId, exercise_id (and more irrelevant data)
users table: userId, groupId
I want to select all the exercises done by all the students in a specific group. For this I currently have:
SELECT DISTINCT(exercise_id) FROM submissions as c1 WHERE c1.groupId = 1
AND NOT EXISTS(
SELECT DISTINCT(UserId) FROM users as u WHERE u.GroupId = 1
AND NOT EXISTS (
SELECT exercise_id FROM submissions as c2 WHERE u.UserId = c2.UserId
AND c2.exercise_id = c1.exercise_id
)
)
i.e. I select all the exercises for which there are no users in the group that have not done the exercise.
However, this query takes 5 seconds on a submission table with 1.5 million rows. Which steps could I take to further optimise this query? I have considered inner joins, but won't this result in the same query execution plan?

The groupid really shouldn't be in both tables. Assuming the values are consistent, try the following:
select s.exercise_id
from submissions s
where s.groupid = 1
group by s.exercise_id
having count(distinct userid) = (select count(distinct userid) from users where groupid = 1);
For performance, you want an index on submissions(groupid, exercise_id). Also, if you know there are no duplicate submissions or users, then remove the distinct, because that has an adverse effect on performance.
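A minimal sketch of the idea on toy data, assuming the column names from the question. The sample rows and the index name idx_sub_group_ex_user are invented for illustration, and appending userId to the suggested index is my own assumption (it lets the count be answered from the index alone), not something the question confirms will help.

```sql
-- Toy schema; column names follow the question, the rows are made up.
CREATE TABLE users (
    userId  INT PRIMARY KEY,
    groupId INT NOT NULL
);
CREATE TABLE submissions (
    userId      INT NOT NULL,
    groupId     INT NOT NULL,
    exercise_id INT NOT NULL
);

-- Hypothetical index name; userId is appended as an assumption so the
-- grouped count can be read from the index without visiting table rows.
CREATE INDEX idx_sub_group_ex_user
    ON submissions (groupId, exercise_id, userId);

INSERT INTO users VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO submissions VALUES
    (1, 1, 10), (2, 1, 10),   -- exercise 10: done by both group-1 users
    (1, 1, 20),               -- exercise 20: done by user 1 only
    (3, 2, 10);               -- another group, excluded by the filter

-- Keep exercises whose number of distinct submitters equals the group size.
SELECT s.exercise_id
FROM submissions s
WHERE s.groupId = 1
GROUP BY s.exercise_id
HAVING COUNT(DISTINCT s.userId) =
       (SELECT COUNT(*) FROM users WHERE groupId = 1);
```

With these rows only exercise 10 qualifies, since group 1 has two users and exercise 20 has a single submitter. Running EXPLAIN on the SELECT should show whether the index is actually used on your data.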

Related

Creating a SQL view with personal best records

I have the following SQL Database structure:
Users are the registered users. Maps are like circuits or race tracks. When a user sets a time, a new time record is created containing the userId, the mapId and the time needed to finish the race track.
I want to create a view that lists every user's personal best on every map.
I tried creating the view like this:
CREATE VIEW map_pb AS
SELECT MID, UID, TID
FROM times
WHERE score IN (SELECT MIN(score) FROM times)
ORDER BY registered
This does not give the desired result.
Thank you for your help!
I assume you have a 'times' table as in the diagram above, with a 'score' column that measures the record (MIN(score) is the best record).
You can create a view with the personal best records using a subquery like this:
CREATE VIEW map_pb AS
SELECT a.MID, a.UID, a.TID
FROM times a
INNER JOIN (
-- best (lowest) score per user and map
SELECT UID, MID, MIN(score) AS score
FROM times
GROUP BY UID, MID
) b ON a.UID = b.UID AND a.MID = b.MID AND a.score = b.score
-- if you have a 'registered' column in the 'times' table to order the result
ORDER BY a.registered
I hope this works.
You probably need to use a query that will first return the minimum score for each user on each map. Something like this:
SELECT UID,
MID,
MIN(score) AS best_time
FROM times
GROUP BY UID, MID
Note: I used MIN(score) as this is what is shown in your example query, but perhaps it should be MIN(time) instead?
Then just use the subquery JOINed to your other tables to get the output:
SELECT *
FROM (
SELECT UID,
MID,
MIN(score) AS best_time
FROM times
GROUP BY UID, MID
) a
INNER JOIN users u ON u.UID = a.UID
INNER JOIN maps m ON m.MID = a.MID
Of course, replace SELECT * with the columns you actually want.
Note: code untested but does give an idea as to a solution.
Start with a subquery to determine each user's minimum time on each map:
SELECT UID, MID, MIN(time) time
FROM times
GROUP BY UID, MID
Then join that subquery into a main query:
SELECT times.UID, times.MID, times.TID,
mintimes.time
FROM times
JOIN (
SELECT UID, MID, MIN(time) time
FROM times
GROUP BY UID, MID
) mintimes ON times.MID = mintimes.MID
AND times.UID = mintimes.UID
AND times.time = mintimes.time
JOIN maps ON times.MID = maps.MID
JOIN users ON times.UID = users.UID
This query pattern uses a GROUP BY function to find the outlying (MIN in this case) value for each combination. It then uses that subquery to find the detail record for each outlying value.
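To make the pattern concrete, here is a small sketch on invented rows. It assumes TID is the primary key of a recorded time and that the measured value lives in a column called score, as in the question's own query; adjust the names if your table differs.

```sql
-- Toy table; TID is assumed to identify one recorded time.
CREATE TABLE times (
    TID   INT PRIMARY KEY,
    UID   INT NOT NULL,
    MID   INT NOT NULL,
    score INT NOT NULL
);

INSERT INTO times VALUES
    (1, 1, 1, 95), (2, 1, 1, 90),   -- user 1, map 1: best is 90 (TID 2)
    (3, 1, 2, 70),                  -- user 1, map 2: best is 70 (TID 3)
    (4, 2, 1, 88);                  -- user 2, map 1: best is 88 (TID 4)

-- Step 1 (subquery): lowest score per (user, map).
-- Step 2 (outer query): join back to fetch the full row for that score.
SELECT t.TID, t.UID, t.MID, t.score
FROM times t
JOIN (
    SELECT UID, MID, MIN(score) AS best
    FROM times
    GROUP BY UID, MID
) b ON b.UID = t.UID
   AND b.MID = t.MID
   AND b.best = t.score
ORDER BY t.UID, t.MID;
```

This returns the rows with TID 2, 3 and 4. Note that if a user has the same best score twice on one map, both rows come back, so a view built this way can contain ties.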

How to get the sum of a specific user from two tables?

I currently have 2 tables:
Favorite:
userID
drinkName
History:
userID
drinkName
I want to count how many times a specific userID shows up in each table, and then get the total number of times that userID shows up across both tables.
(SELECT COUNT(userID) AS totalDrinks FROM History
WHERE userID = 'sai') union
(SELECT COUNT(userID) AS totalDrinks FROM Favorite
WHERE userID = 'sai')
So that code gets me the following output:
totalDrinks
4
2
However I am trying to use the MySQL sum function and that's not adding the two things up though.
So I was wondering how I would rewrite my query to output 6?
SELECT SUM(userID)as totalDrinks FROM History h
JOIN Favorite f ON h.userID=f.userID
GROUP BY userID
WHERE userID = 'sai'
Your UNION approach was almost there. You will have to SUM the result of both queries:
SELECT SUM(totalDrinks) totalDrinks FROM (
SELECT COUNT(*) totalDrinks FROM History
WHERE userID = 'sai'
UNION ALL
SELECT COUNT(*) FROM Favorite
WHERE userID = 'sai'
) s
A few things to note. You should use UNION ALL otherwise if the COUNTs result in the same number then they will be added only once. Another thing is that you should not use an INNER JOIN in here as that will force the users to be present in both tables.
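The UNION versus UNION ALL point is easy to reproduce. The sketch below uses invented rows in which the user appears twice in each table, so both counts are equal; the drink names are placeholders.

```sql
CREATE TABLE History  (userID VARCHAR(20), drinkName VARCHAR(50));
CREATE TABLE Favorite (userID VARCHAR(20), drinkName VARCHAR(50));

INSERT INTO History  VALUES ('sai', 'tea'), ('sai', 'cola');
INSERT INTO Favorite VALUES ('sai', 'tea'), ('sai', 'milk');

-- UNION removes duplicate rows, so the two identical counts (2 and 2)
-- collapse into a single row and the sum is 2.
SELECT SUM(n) AS totalDrinks FROM (
    SELECT COUNT(*) AS n FROM History  WHERE userID = 'sai'
    UNION
    SELECT COUNT(*)      FROM Favorite WHERE userID = 'sai'
) s;

-- UNION ALL keeps both rows, so the sum is 4 as intended.
SELECT SUM(n) AS totalDrinks FROM (
    SELECT COUNT(*) AS n FROM History  WHERE userID = 'sai'
    UNION ALL
    SELECT COUNT(*)      FROM Favorite WHERE userID = 'sai'
) s;
```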

How Can I improve this MySQL query?

I am trying to improve the performance of this query as it is taking 3-4 seconds to execute.
Here is the query
SELECT SQL_NO_CACHE
ac.account_id,
ac.account_name,
cl.name AS client_name,
IFNULL(cn.contact_number, "") AS Phone
FROM accounts AS ac
STRAIGHT_JOIN clients AS cl ON cl.client_id = ac.client_id
LEFT JOIN (
SELECT bc.contact_number, bc.account_id
FROM contact_numbers AS bc
INNER JOIN (
SELECT account_id, MAX(number_id) AS number_id
FROM contact_numbers
WHERE status = 1 AND contact_type != "Fax" AND contact_link = "Account"
GROUP BY account_id
) AS bb ON bb.number_id = bc.number_id
) AS cn ON ac.account_id = cn.account_id
WHERE ac.status = 1
ORDER BY ac.account_name
LIMIT 0, 100
The clients table contains about 10 rows, which is why I use STRAIGHT_JOIN. The accounts table contains 350K records and the contact_numbers table about 500K records.
I believe the problem here is the left Join and also the ORDER BY but I am not sure how to work around it. Also I am using SQL_NO_CACHE because the accounts, contact_numbers tables are being updated at a fast rate.
What else can I do to improve performance of this query?
this is a screenshot of the explain on this query
I am using MySQL 5.6.13
I Set sort_buffer_size=1M
My server has 32GB of RAM
The below should make the outer query run without requiring a filesort.
CREATE INDEX ac_status_acctname ON accounts (status, account_name);
The below should make the inner query Using index, and help it to do the GROUP by without using a temp table.
CREATE INDEX cn_max ON contact_numbers (account_id, status, contact_link,
contact_type, number_id);
You need to join on both account_id and number_id to get the greatest entry per account. The way you have it now, you just get any account that happens to have the same number_id, which is probably not what you intended, and it could be what's generating too many rows for the subquery result set.
bc INNER JOIN ... bb ON bb.account_id = bc.account_id AND bb.number_id = bc.number_id
You can also write the same join condition as:
bc INNER JOIN ... bb USING (account_id, number_id)
I would actually rewrite the query. You currently select a lot of data that you do not need and then discard; I would minimize the amount of fetched data.
It seems you basically select something for each account with a certain status and take only 100 of them. So I would put this in a subquery:
SELECT
account_id,
account_name,
c.name AS client_name,
IFNULL(contact_number, '') as Phone
FROM (
SELECT
a.account_id,
MAX(n.number_id) as number_id
FROM (
SELECT account_id
FROM accounts
WHERE status = 1 -- other conditions on accounts go here
ORDER BY account_name
LIMIT 0, 100) as a
LEFT JOIN contact_numbers n
ON a.account_id = n.account_id
AND n.status = 1
AND n.contact_type != "Fax"
AND n.contact_link = "Account"
GROUP BY a.account_id) an
LEFT JOIN contact_numbers USING (account_id, number_id)
JOIN accounts a USING (account_id)
JOIN clients c USING (client_id);
You will need a (status, account_name) index on the accounts table (and a (status, client_id, account_name) index as well for the variant of the query that filters on client_id = 4), plus an index on account_id in contact_numbers. This should suffice.

How to query data without repeats and minimize the time?

There are 3 entities - articles, journals and subscribers. There are no restrictions on how to store data in database.
The same article can be simultaneously published in several journals.
How to select all published articles from subscribed journals sorted
by date of publication and without repeats?
The easiest way:
Create a table with articles:
posts
p_id, j1_id, j2_id, text, date
Create a table with subscriptions:
follows
f_id, u_id, j_id (u_id — is a user id from table users)
Execute:
example query
select posts.* from posts inner join follows on (j_id = j1_id or j_id = j2_id)
where u_id = 1 order by date desc
This query returns duplicates. You can remove them with DISTINCT or GROUP BY, but that adds an extra sorting operation.
Another way is to use UNION, but it also performs a DISTINCT.
(select posts.* from posts inner join follows on j_id = j1_id where u_id = 1)
union
(select posts.* from posts inner join follows on j_id = j2_id where u_id = 1)
order by date desc
Perhaps I chose the wrong storage structure.
So the actual question is: can anything be done about this problem to minimize the query time on large data sets?
You can use the following table structure:
posts : pid, text, date
journals : jid, jtext
journals_posts : jid, pid
follows : fid, uid, jid
select distinct posts.* from posts
inner join journals_posts on journals_posts.pid = posts.pid
inner join follows on follows.jid = journals_posts.jid
where follows.uid = <userid>
To take care of speed, you can create indexes on:
journals_posts(jid)
follows(uid)
You might need to create indexes on other fields as well; check with EXPLAIN which tables are scanned without using an index.
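A possible sketch of that structure with a few invented rows. Instead of DISTINCT it uses an EXISTS test, which is a technique I am swapping in as an alternative and not something taken from the answer above: each post is checked once, so no duplicate rows are produced in the first place. The index names and the wider (uid, jid) index are assumptions.

```sql
CREATE TABLE posts (
    pid  INT PRIMARY KEY,
    text VARCHAR(255),
    date DATE
);
CREATE TABLE journals_posts (
    jid INT NOT NULL,
    pid INT NOT NULL,
    PRIMARY KEY (jid, pid)
);
CREATE TABLE follows (
    fid INT PRIMARY KEY,
    uid INT NOT NULL,
    jid INT NOT NULL
);

-- Hypothetical index names.
CREATE INDEX idx_follows_uid_jid ON follows (uid, jid);
CREATE INDEX idx_jp_pid_jid      ON journals_posts (pid, jid);

INSERT INTO posts VALUES
    (1, 'a', '2024-01-01'), (2, 'b', '2024-01-02'), (3, 'c', '2024-01-03');
INSERT INTO journals_posts VALUES
    (1, 1), (2, 1),   -- post 1 is published in journals 1 and 2
    (1, 2),           -- post 2 is published in journal 1
    (3, 3);           -- post 3 is published in journal 3
INSERT INTO follows VALUES
    (1, 1, 1), (2, 1, 2);   -- user 1 follows journals 1 and 2

-- Each post is tested once, so post 1 appears only once even though
-- it sits in two followed journals.
SELECT p.*
FROM posts p
WHERE EXISTS (
    SELECT 1
    FROM journals_posts jp
    JOIN follows f ON f.jid = jp.jid
    WHERE jp.pid = p.pid
      AND f.uid = 1
)
ORDER BY p.date DESC;
```

On this data the query returns post 2 followed by post 1. Whether it beats the DISTINCT version on a large table depends on the optimizer and the data, so compare both with EXPLAIN.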

How to search when joining 3 tables but exclude result for one of them?

I've been trying for hours now to get a particular result and haven't found an answer on the web, and as I'm not an SQL expert at all, I'm asking the question here.
I have 3 tables: user (id, name...), cars (id, type, color, engine power...) and an intermediary table to save all the scores users gave to the car: scores (id, user_id, car_id, score).
I'm trying to find a query that could return for one particular user, all the cars that he hasn't rated yet. I've tried the following but it returns null:
$q=mysql_query("SELECT * FROM cars LEFT OUTER JOIN scores ON cars.id = scores.car_id WHERE scores.user_id != ('".$userId."')");
Does someone have a clue?
SELECT
*
FROM
cars
WHERE
NOT EXISTS (SELECT 1 FROM scores WHERE car_id = cars.id AND user_id = ?)
where ? is the ID of that particular user.
A composite index in scores over (car_id, user_id) is useful here.
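A small sketch of the index and the query on made-up rows. The index name and the cut-down column lists are assumptions, and the literal 7 stands in for the ? placeholder.

```sql
CREATE TABLE cars (
    id   INT PRIMARY KEY,
    type VARCHAR(50)
);
CREATE TABLE scores (
    id      INT PRIMARY KEY,
    user_id INT NOT NULL,
    car_id  INT NOT NULL,
    score   INT NOT NULL
);

-- Hypothetical name for the composite index mentioned above.
CREATE INDEX idx_scores_car_user ON scores (car_id, user_id);

INSERT INTO cars VALUES (1, 'sedan'), (2, 'coupe'), (3, 'van');
INSERT INTO scores VALUES
    (1, 7, 1, 5),   -- user 7 rated car 1
    (2, 8, 2, 3);   -- user 8 rated car 2

-- Cars that user 7 has not rated yet.
SELECT c.*
FROM cars c
WHERE NOT EXISTS (
    SELECT 1
    FROM scores s
    WHERE s.car_id = c.id
      AND s.user_id = 7
);
```

This returns cars 2 and 3. The attempt in the question fails because the user_id != ... test in the WHERE clause throws away the NULL rows produced by the outer join (cars nobody rated) and keeps cars rated by other users, which is a different question from "not rated by this user".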
You can use your code with small modification:
SELECT * FROM cars
LEFT OUTER JOIN scores ON cars.id = scores.car_id and scores.user_id=".$userId."
WHERE scores.id IS NULL
SELECT * FROM
cars c
WHERE c.id NOT IN (
SELECT s.car_id
FROM scores s, user u
WHERE u.id = s.user_id
AND u.id = ?
)