MySQL query with limited subqueries

I'm trying to get some statistical data from a few tables. We have a users table, a quiz table, a quiz question set table, and a quiz questions table. Each quiz has many sets, and each set has one or many questions. There's also a questions table, which is where each question comes from (the quiz questions table links a question to a question set, which links to a quiz, which links to the user). What I need is the number of questions answered and the number answered correctly, but only over the past 50 questions. So if a user has answered 120 questions, only the most recent 50 should be used in this query; if a user has answered 37 questions, all of them should be used. I'd like the result laid out as user_id, questions_answered, questions_answered_correctly. I currently have this working, but I'm looping through each user and grabbing their 50 most recent questions, and with some additional tables joined in to limit by organization, I have to run hundreds, if not thousands, of these queries to get one statistical report.
I'm guessing I need a subquery somewhere that pulls only the most recent questions for each user, but I'm not sure how a subquery like that would work. Here's what I have so far, though I'm sure I'm totally off on this. It executes, but incorrectly: some of the results are over 50 when they shouldn't be:
SELECT users.id, (SELECT COUNT(grammar_quiz_questions.id) FROM `grammar_quiz_questions`
INNER JOIN `grammar_quiz_question_sets` ON `grammar_quiz_question_sets`.`id` = `grammar_quiz_questions`.`grammar_quiz_question_set_id`
INNER JOIN `grammar_quizzes` ON `grammar_quizzes`.`id` = `grammar_quiz_question_sets`.`grammar_quiz_id`
INNER JOIN `grammar_questions` ON `grammar_questions`.`id` = `grammar_quiz_questions`.`grammar_question_id`
WHERE (grammar_quiz_questions.finished is not null AND grammar_quizzes.user_id = users.id)
ORDER BY grammar_quiz_questions.finished DESC LIMIT 50) AS `questions_answered`, (SELECT COUNT(grammar_quiz_questions.id) FROM `grammar_quiz_questions`
INNER JOIN `grammar_quiz_question_sets` ON `grammar_quiz_question_sets`.`id` = `grammar_quiz_questions`.`grammar_quiz_question_set_id`
INNER JOIN `grammar_quizzes` ON `grammar_quizzes`.`id` = `grammar_quiz_question_sets`.`grammar_quiz_id`
INNER JOIN `grammar_questions` ON `grammar_questions`.`id` = `grammar_quiz_questions`.`grammar_question_id`
WHERE (grammar_quiz_questions.finished is not null AND grammar_quizzes.user_id = users.id AND grammar_quiz_question_sets.correct_on_first_attempt = 1)
ORDER BY grammar_quiz_questions.finished DESC LIMIT 50) AS `questions_answered_correctly`
FROM users
Thanks,
James

UPDATE:
The following update is not a complete answer to the question, just some nudges. I am not sure why you are querying all of these tables. Are grammar_quiz_question_sets mutually exclusive subsets of grammar_quiz_questions? What about grammar_quizzes and grammar_questions: what is the set relation there? I don't know these answers, but you do, so look at the code snippet below. I hope it guides you:
set @correct:=0;
select t.user_id, count(p.id) as answered, sum(if(r.correct_on_first_attempt = 1,1,0)) as correct
from grammar_quiz_questions p
join grammar_quiz_question_sets r on r.id = p.grammar_quiz_question_set_id
join grammar_quizzes t on t.id = r.grammar_quiz_id
where p.finished is not null group by t.user_id;
ORIGINAL:
I imagine you have a control and data access layer (Java, PHP, Python, etc.) through which records are added and manipulated. Further, I imagine you need to grab statistics more than once in the lifetime of a user. So while you may need a query like yours to recalibrate once in a while (if that is ever necessary), for everyday use you need something lighter. Hence the following proposal.
1] create a statistics table:
create table statistics(
user_id int(11) not null, -- foreign key to users.id
questions_answered int(11) not null default 0,
questions_answered_correctly int(11) not null default 0,
primary key (user_id) -- "on duplicate key update" needs a unique key; an auto record_id plus a unique index on user_id would also work
)
2] the first time around, run your "heavy/administrative" query
3] subsequently, update the stats for a user after each quiz or each answered question. The idea here is that you will have that information in memory (i.e. in your programming layer) since you have to update the quiz table; during that time do some math to update the stats table. e.g. imagine java:
public void updateStats(int userId, int questions, int correct){
String query =
"insert into statistics(user_id,questions_answered,questions_answered_correctly) "+
"values("+userId+", "+questions+", "+correct+") "+
"on duplicate key update "+
"questions_answered=questions_answered+values(questions_answered), "+
"questions_answered_correctly = questions_answered_correctly + values(questions_answered_correctly)";
... //execute the statement
}
Now for the "heavy" query: I am rewriting it below with a bit more clarity to encourage others to take a stab at it:
SELECT users.id,
(
SELECT COUNT(p.id)
FROM grammar_quiz_questions p, grammar_quiz_question_sets r, grammar_quizzes t, grammar_questions u
WHERE r.id = p.grammar_quiz_question_set_id
AND t.id = r.grammar_quiz_id
AND u.id = p.grammar_question_id
AND p.finished is not null
AND t.user_id = users.id
ORDER BY p.finished DESC LIMIT 50
) AS questions_answered,
(
SELECT COUNT(p.id)
FROM grammar_quiz_questions p, grammar_quiz_question_sets r, grammar_quizzes t, grammar_questions u
WHERE r.id = p.grammar_quiz_question_set_id
AND t.id = r.grammar_quiz_id
AND u.id = p.grammar_question_id
AND p.finished is not null
AND t.user_id = users.id
AND r.correct_on_first_attempt = 1
ORDER BY p.finished DESC LIMIT 50
) AS questions_answered_correctly
FROM users
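
One reason the subqueries above can return more than 50: COUNT without GROUP BY produces a single row, and LIMIT 50 then applies to that one row rather than to the rows being counted. Below is a minimal sketch of one way around this, run against an in-memory SQLite database from Python so it can be tried without a MySQL server. It assumes the joined tables have been collapsed into a single hypothetical `answers` table (`user_id`, `finished`, `correct`), and it ranks each answer by counting how many newer answers the same user has. The table, column, and function names are illustrative and not taken from the original schema.

```python
import sqlite3


def recent_stats(conn, limit=50):
    """Per user: answered and correct counts over their `limit` newest answers."""
    sql = """
        SELECT a.user_id,
               COUNT(*)       AS questions_answered,
               SUM(a.correct) AS questions_answered_correctly
        FROM answers a
        WHERE a.finished IS NOT NULL
          -- keep a row only if fewer than `limit` newer rows exist for the user
          AND (SELECT COUNT(*)
               FROM answers newer
               WHERE newer.user_id = a.user_id
                 AND newer.finished IS NOT NULL
                 AND newer.finished > a.finished) < ?
        GROUP BY a.user_id
        ORDER BY a.user_id
    """
    return conn.execute(sql, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answers ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, finished INTEGER, correct INTEGER)"
)
# Hypothetical data: user 1 has five answers (finished = 1..5), user 2 has two.
rows = [(1, t, int(t % 2 == 0)) for t in range(1, 6)] + [(2, 1, 1), (2, 2, 0)]
conn.executemany(
    "INSERT INTO answers (user_id, finished, correct) VALUES (?, ?, ?)", rows
)

# A small limit keeps the example readable; the question would use limit=50.
stats = recent_stats(conn, limit=3)
print(stats)
```

Answers that share the same `finished` value tie and are all kept, and users with no finished answers do not appear in the result. On MySQL 8.0 or later, a `ROW_NUMBER()` window function partitioned by user would be another way to rank the rows.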

Related

How can I create this complicated SQL query?

I’m using Active Record now, but this code runs too slowly.
@communities = current_user.get_up_voted(Community)
@codes = Code.includes(:user).where(:community_id => @communities.collect(&:id)).order("users.active_at DESC").page(params[:page])
So I’d like to get better performance from a direct SQL query, but I have no idea how to write such a complicated condition.
How can I write a fast SQL query for this condition?
There are 4 tables, as below:
Users: id, active_at (date time)
Votes: votable_id (this links to community.id), voter_id (the id of the user who bookmarked), vote_flag (this has to be 1)
Community: id
Codes: community_id (this links to community.id), user_id (this links to the user who created the code record)
A user can bookmark a Community by updating vote_flag in the Votes table.
I’d like to fetch all the codes that belong to communities that current_user has already bookmarked, sorted by the ‘user.active_at’ column.

You just need to JOIN all 4 tables and use ORDER BY to sort the records:
SELECT * FROM Users, Votes, Community, Codes
WHERE Codes.community_id = Community.id
AND Codes.user_id = Users.id
AND Votes.votable_id = Community.id
AND Votes.voter_id = Users.id
AND Votes.votable_id = Codes.community_id
AND Votes.voter_id = Codes.user_id
AND Votes.vote_flag = 1
ORDER BY Users.active_at ASC
Or, for more readability:
SELECT * FROM Users
JOIN Codes ON Codes.user_id = Users.id
JOIN Community ON Codes.community_id = Community.id
JOIN Votes ON Votes.votable_id = Codes.community_id AND Votes.voter_id = Codes.user_id
WHERE Votes.vote_flag = 1
ORDER BY Users.active_at ASC
Hope it helps.
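
The joins above match Votes.voter_id against the author of each code, whereas the question asks for codes in communities that the current user bookmarked, sorted by the author's active_at. Here is a hedged sketch of that second reading, using Python's sqlite3 with an in-memory database: the voter is filtered to the current user, and the users table is joined only through codes.user_id. The lower-cased table names, the sample rows, and the `codes_for` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, active_at TEXT);
    CREATE TABLE communities (id INTEGER PRIMARY KEY);
    CREATE TABLE votes (votable_id INTEGER, voter_id INTEGER, vote_flag INTEGER);
    CREATE TABLE codes (id INTEGER PRIMARY KEY, community_id INTEGER, user_id INTEGER);

    INSERT INTO users VALUES (1, '2013-01-01'), (2, '2013-01-03'), (3, '2013-01-02');
    INSERT INTO communities VALUES (10), (20);
    -- user 1 bookmarked community 10; the vote on community 20 has vote_flag = 0
    INSERT INTO votes VALUES (10, 1, 1), (20, 1, 0);
    INSERT INTO codes VALUES (100, 10, 2), (101, 10, 3), (102, 20, 2);
""")


def codes_for(conn, current_user_id):
    """Ids of codes in communities bookmarked by the given user, newest author first."""
    sql = """
        SELECT codes.id
        FROM votes
        JOIN codes ON codes.community_id = votes.votable_id
        JOIN users ON users.id = codes.user_id      -- the code's author
        WHERE votes.voter_id = ?                    -- the current user
          AND votes.vote_flag = 1
        ORDER BY users.active_at DESC, codes.id
    """
    return [row[0] for row in conn.execute(sql, (current_user_id,))]


result = codes_for(conn, 1)
print(result)
```

The community table is not joined here because votes.votable_id already carries the community id; it could be added back if columns from it are needed.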

Query efficiency (multiple selects)

I have two tables - one called customer_records and another called customer_actions.
customer_records has the following schema:
CustomerID (auto increment, primary key)
CustomerName
...etc...
customer_actions has the following schema:
ActionID (auto increment, primary key)
CustomerID (relates to customer_records)
ActionType
ActionTime (UNIX time stamp that the entry was made)
Note (TEXT type)
Every time a user carries out an action on a customer record, an entry is made in customer_actions, and the user is given the opportunity to enter a note. ActionType can be one of a few values (like 'designatory update' or 'added case info' - can only be one of a list of options).
What I want to be able to do is display a list of records from customer_records where the last ActionType was a certain value.
So far, I've searched the net/SO and come up with this monster:
SELECT * FROM (
SELECT * FROM (
SELECT * FROM `customer_actions` ORDER BY `EntryID` DESC
) list1 GROUP BY `CustomerID`
) list2 WHERE `ActionType`='whatever' LIMIT 0,30
Which is great - it lists each customer ID and their last action. But the query is extremely slow on occasions (note: there are nearly 20,000 records in customer_records). Can anyone offer any tips on how I can sort this monster of a query out or adjust my table to give faster results? I'm using MySQL. Any help is really appreciated, thanks.
Edit: To be clear, I need to see a list of customers whose last action was 'whatever'.

To filter customers by their last action, you could use a correlated sub-query (ActionTime is the timestamp column from the schema in the question)...
SELECT
  *
FROM
  customer_records
INNER JOIN
  customer_actions
    ON customer_actions.CustomerID = customer_records.CustomerID
    AND customer_actions.ActionTime = (
      SELECT MAX(lookup.ActionTime)
      FROM customer_actions AS lookup
      WHERE lookup.CustomerID = customer_records.CustomerID
    )
WHERE
  customer_actions.ActionType = 'Whatever'
You may find it more efficient to avoid the correlated sub-query as follows...
SELECT
  *
FROM
  customer_records
INNER JOIN
  (SELECT CustomerID, MAX(ActionTime) AS ActionTime FROM customer_actions GROUP BY CustomerID) AS last_action
    ON customer_records.CustomerID = last_action.CustomerID
INNER JOIN
  customer_actions
    ON customer_actions.CustomerID = last_action.CustomerID
    AND customer_actions.ActionTime = last_action.ActionTime
WHERE
  customer_actions.ActionType = 'Whatever'
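
The "latest row per customer, then filter" idea can be sketched as follows with Python's sqlite3 and an in-memory database. As a stated assumption, this version picks the latest action by MAX(ActionID) instead of the timestamp, on the premise that the auto-increment key follows insertion order; that avoids ties when two actions share a timestamp. The sample rows and the `customers_with_last_action` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_records (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE customer_actions (
        ActionID INTEGER PRIMARY KEY, CustomerID INTEGER,
        ActionType TEXT, ActionTime INTEGER, Note TEXT);

    INSERT INTO customer_records VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO customer_actions (CustomerID, ActionType, ActionTime, Note) VALUES
        (1, 'added case info',    100, ''),
        (1, 'designatory update', 200, ''),
        (2, 'designatory update', 150, ''),
        (2, 'added case info',    250, ''),
        (3, 'added case info',    300, '');
""")


def customers_with_last_action(conn, action_type, limit=30):
    """Customers whose most recent action has the given type."""
    sql = """
        SELECT cr.CustomerID, cr.CustomerName
        FROM customer_records cr
        JOIN (SELECT CustomerID, MAX(ActionID) AS LastActionID
              FROM customer_actions
              GROUP BY CustomerID) la ON la.CustomerID = cr.CustomerID
        JOIN customer_actions ca ON ca.ActionID = la.LastActionID
        WHERE ca.ActionType = ?
        ORDER BY cr.CustomerID
        LIMIT ?
    """
    return conn.execute(sql, (action_type, limit)).fetchall()


matches = customers_with_last_action(conn, "added case info")
print(matches)
```

Ann is left out because her latest action is a 'designatory update', even though she has an earlier 'added case info' row. An index on customer_actions (CustomerID, ActionID) would likely help the grouped sub-query on larger tables.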

I'm not sure if I understand the requirements but it looks to me like a JOIN would be enough for that.
SELECT cr.CustomerID, cr.CustomerName, ...
FROM customer_records cr
INNER JOIN customer_actions ca ON ca.CustomerID = cr.CustomerID
WHERE `ActionType` = 'whatever'
ORDER BY
ca.EntryID
Note that 20,000 records should not pose a performance problem.

Please note that I've adapted Lieven's answer (I made a separate post as this was too long for a comment). Any credit for the solution itself goes to him, I'm just trying to show you some key points for improving performance.
If speed is a concern then the following should give you some suggestions for improving it:
select
cr.CustomerID,
cr.CustomerName,
cr.MoreDetail1,
cr.Etc
from customer_records cr
inner join customer_actions ca
on ca.CustomerID = cr.CustomerID
where ca.ActionType = 'x'
order by cr.CustomerID
limit 100 -- Change as required (MySQL uses LIMIT; TOP is SQL Server syntax)
A few notes:
In some cases I find left outer joins to be faster than inner joins. It would be worth measuring performance for both for this query.
Avoid returning * wherever possible.
You don't have to reference 'cr.x' in the initial select, but it's a good habit to get into for when you start working on large queries that can have multiple joins in them (this will make a lot of sense once you start doing this).
When using joins, always join on a primary key.

Maybe I'm missing something but what's wrong with a simple join and a where clause?
Select ActionType, ActionTime, Note
FROM Customer_Records CR
INNER JOIN customer_Actions CA
ON CR.CustomerID = CA.CustomerID
Where ActionType = 'added case info'

Explain SQL and Query optimization

Running Explain SQL (in phpMyAdmin) on a query that takes more than 5 seconds gives me the above. I read that we can study the Explain SQL output to optimize a query. Can anyone tell me whether this output points to anything useful?
Thanks guys.
Edit:
The query itself:
SELECT
a.`depart` , a.user,
m.civ, m.prenom, m.nom,
CAST( GROUP_CONCAT( DISTINCT concat( c.id, '~', c.prenom, ' ', c.nom ) ) AS char ) AS coordinateur,
z.dr
FROM `0_activite` AS a
JOIN `0_member` AS m ON a.user = m.id
LEFT JOIN `0_depart` AS d ON ( m.depart = d.depart AND d.rank = 'mod' AND d.user_sec =2 )
LEFT JOIN `0_member` AS c ON d.user_id = c.id
LEFT JOIN `zone_base` AS z ON m.depart = z.deprt_num
GROUP BY a.user
Edit 2:
Structures of the two tables a and d. Top: a and bottom: d
Edit 3:
What I want in this query?
I first want to get the values of 'depart' and 'user' (which is an id) from the table 0_activite. Next, I want to get the name of the person (civ, prenom and nom) from 0_member, whose id I am getting from 0_activite via 'user', by matching 0_activite.user with 0_member.id. Here depart is short for department, which is also an id.
So at this point, I have the depart, id, civ, nom and prenom of a person from two tables, 0_activite and 0_member.
Next, I want to know which dr is related to this depart, and this I get from zone_base. The value of depart is the same in both 0_activite and 0_member.
Then comes the trickier part. A person from 0_member can be associated with multiple departs, and this is stored in 0_depart. Also, every user has a level, one of which is 'mod', which stands for moderator. Now I want to get all the people who are moderators in the depart of the first user, and then get those moderators' names from 0_member again. I also have a variable user_sec, which is probably less important in this context, though I cannot overlook it.
This is what makes the query a tricky one. 0_member stores the id and name of each user plus one depart; 0_depart stores all departs of each user, one row per depart; and 0_activite stores some other data. I want to relate these through the user id in 0_activite.
Hope I have been clear. If I am not, please let me know and I will try again to edit this post.
Many many thanks again.

Aside from the few answers provided by the others here, it might help to better understand the "what do I want" from the query. As you've accepted a rather recent answer from me in another of your questions, you have filters applied by department information.
Your query is doing a LEFT join at the Department table by rank = 'mod' and user_sec = 2. Is your overall intent to show ALL records in the 0_activite table REGARDLESS of a valid join to the 0_Depart table... and if there IS a match to the 0_Depart table, you only care about the 'mod' and 2 values?
If you only care about those people specifically associated with the 0_depart with 'mod' and 2 conditions, I would reverse the query starting with THIS table first, then join to the rest.
Having keys on tables via relationship or criteria is always a performance benefit (vs not having the indexes).
Start your query with whatever would be your smallest set FIRST, then join to other tables.
From clarification in your question... I would start with the inner-most... Who it is and what departments are they associated with... THEN get the moderators (from department where condition)... Then get actual moderator's name info... and finally out to your zone_base for the dr based on the department of the MODERATOR...
select STRAIGHT_JOIN
DeptPerMember.*,
Moderator.Civ as ModCiv,
Moderator.Prenom as ModPrenom,
Moderator.Nom as ModNom,
z.dr
from
( select
m.ID,
m.Depart,
m.Civ,
m.Prenom,
m.Nom
from
0_Activite as a
join 0_member m
on a.User = m.ID
join 0_Depart as d
on m.depart = d.depart ) DeptPerMember
join 0_Depart as DeptForMod
on DeptPerMember.Depart = DeptForMod.Depart
and DeptForMod.rank = 'mod'
and DeptForMod.user_sec = 2
join 0_Member as Moderator
on DeptForMod.user_id = Moderator.ID
join zone_base z
on Moderator.depart = z.deprt_num
Notice how I tier'd the query to get each part and joined to the next and next and next. I'm building the chain based on the results of the previous with clear "alias" references for clarification of content. Now, you can get whatever respective elements from any of the levels via their distinct "alias" references...

The output from EXPLAIN is showing us that the first and third tables listed (a & d) are not having any indexes utilised by the database engine in executing this query. The key column is NULL for both - which is a shame since both are 'large' tables (OK, they're not really large, but compared to the rest of the tables they're the big 'uns).
Judging from the query, an index on user on 0_activite and an index on (depart, rank, user_sec) on 0_depart would go some way to improving performance.

You can see that the key and key_len columns are NULL, which means the query is not using any of the keys in the possible_keys column. So tables a and d are both being scanned in full. (Check the larger numbers in the rows column; you want these to be smaller.)
To deal with 0_depart:
Make sure you have a key on (d.depart, d.rank, d.user_sec), the columns used in the join to 0_depart.
To deal with 0_activite:
I'm not positive, but a GROUP BY column should be indexed too, so you need a key on a.user.
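
To illustrate the effect of the composite key suggested above, here is a small sketch that compares the query plan before and after adding it. It uses Python's sqlite3 and SQLite's EXPLAIN QUERY PLAN, so the planner and its output differ from MySQL's EXPLAIN; only the general idea (full scan versus index lookup) carries over. The `depart` table is a hypothetical stand-in for 0_depart, and the index name and `plan` helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE depart ("
    "user_id INTEGER, depart INTEGER, rank TEXT, user_sec INTEGER)"
)

# Same three filter columns as the join condition in the question.
query = "SELECT user_id FROM depart WHERE depart = ? AND rank = ? AND user_sec = ?"
params = (75, "mod", 2)


def plan(conn, sql, params):
    """Return SQLite's query plan as one string (detail is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


before = plan(conn, query, params)

# Composite index covering every column in the filter.
conn.execute("CREATE INDEX idx_depart_rank_sec ON depart (depart, rank, user_sec)")
after = plan(conn, query, params)

print("before:", before)
print("after: ", after)
```

On MySQL the equivalent check would be to add the key and re-run EXPLAIN, looking for a non-NULL key column and a smaller rows estimate for table d.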

MYSQL: SELECT users from one user group and exclude users from another group - optimisation

I have an optimisation question here.
Background
I have 12,000 users in a user table, one record per user. Each user can be in zero or more groups. I have a groups table with 45 groups and a groups_link table with 75,000 records (to facilitate the many-to-many relationship).
I am making a querying screen which allows a user to filter users from the user list.
Aim
Specifically, I need help with: Selecting all users that are in one group but are not in another group.
DB Structure
Query
My current query which runs too slowly...
SELECT U.user_id,U.user_email
FROM (sc_module_users AS U)
JOIN sc_module_users_groups_links AS group_join ON group_join.user_id = U.user_id
LEFT JOIN sc_module_users_groups_links AS excluded_group_join ON group_join.user_id = U.user_id
WHERE group_join.group_id IN (27) AND excluded_group_join.group_id NOT IN (19) OR excluded_group_join.group_id IS NULL AND U.user_subscribed=1 AND U.user_active=1
GROUP BY U.user_id,U.user_id
This query takes 9 minutes to complete, it returns 11,000 records (out of 12,000).
Explain
Here's the explain on that query:
Can anyone help me optimise this to below the 1 minute mark...?
After 3 revisions, I changed it to this
SELECT U.user_id,U.user_email FROM (sc_module_users AS U) WHERE ( user_country LIKE '%australia%' ) AND
EXISTS (SELECT group_id FROM sc_module_users_groups_links l WHERE group_id in (31) AND l.user_id=U.user_id) AND
NOT EXISTS (SELECT group_id FROM sc_module_users_groups_links l WHERE group_id in (27) AND l.user_id=U.user_id)
AND U.user_subscribed=1 AND U.user_active=1 GROUP BY U.user_id
Much faster.

EDIT: removed my query suggestion but the index stuff should still apply:
The indexes on sc_module_users_groups_links could be improved by creating a composite index on just user_id and group_id. The order of the columns in the index can have an impact; I believe having user_id first should perform better.
You could also try removing the link_id and just using a composite primary key since the link_id doesn't seem to serve any other purpose.

I believe the very first thing you need to do is to place parentheses:
// should be
.. AND ( excluded_group_join.group_id NOT IN (19)
OR excluded_group_join.group_id IS NULL) AND ....
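
For the filter described in the question (users in one group but not in another), a minimal EXISTS / NOT EXISTS sketch is shown below, run with Python's sqlite3 against an in-memory database. The shortened table names (`users`, `groups_links`), the sample rows, and the `in_group_not_in` helper are assumptions for illustration; the composite primary key on (user_id, group_id) follows the indexing advice given above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY, user_email TEXT,
        user_subscribed INTEGER, user_active INTEGER);
    CREATE TABLE groups_links (
        user_id INTEGER, group_id INTEGER,
        PRIMARY KEY (user_id, group_id));

    INSERT INTO users VALUES
        (1, 'a@example.com', 1, 1),
        (2, 'b@example.com', 1, 1),
        (3, 'c@example.com', 1, 1),
        (4, 'd@example.com', 0, 1);   -- not subscribed
    INSERT INTO groups_links VALUES (1, 27), (2, 27), (2, 19), (3, 19), (4, 27);
""")


def in_group_not_in(conn, include_id, exclude_id):
    """Active, subscribed users who are in include_id but not in exclude_id."""
    sql = """
        SELECT u.user_id, u.user_email
        FROM users u
        WHERE u.user_subscribed = 1
          AND u.user_active = 1
          AND EXISTS (SELECT 1 FROM groups_links l
                      WHERE l.user_id = u.user_id AND l.group_id = ?)
          AND NOT EXISTS (SELECT 1 FROM groups_links l
                          WHERE l.user_id = u.user_id AND l.group_id = ?)
        ORDER BY u.user_id
    """
    return conn.execute(sql, (include_id, exclude_id)).fetchall()


selected = in_group_not_in(conn, 27, 19)
print(selected)
```

Because each user row is tested once, no GROUP BY is needed to remove duplicates, which is one thing the join-based version has to pay for.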

Serious MySQL Performance Issue (Joins, Temporary Table, Filesort....)

I've got a users table and a votes table. The votes table stores votes toward other users. And for better or worse, a single row in the votes table, stores the votes in both directions between the two users.
Now, the problem comes when I want to list, for example, all the people someone has voted on.
I'm no MySQL expert, but from what I've figured out, thanks to the OR condition in the join statement, it needs to look through the whole users table (currently +44,000 rows), and it creates a temporary table to do so.
Currently, the query below takes about two minutes, yes, two minutes to complete. If I remove the OR condition, and everything after it in the join statement, it runs in less than half a second, as it only needs to look through about 17 of the 44,000 user rows (explain ftw!).
In the example below, the user ID is 9834, and I'm trying to fetch his/her own "no" votes and join the info of the user who was voted on to the result.
Is there a better and faster way to do this query? Or should I restructure the tables? I seriously hope it can be fixed by modifying the query, because there are already a lot of users (+44,000) and votes (+130,000) in the tables, which I'd have to migrate.
thanks :)
SELECT *, votes.id as vote_id
FROM `votes`
LEFT JOIN users ON (
(
votes.user_id_1 = 9834
AND
users.uid = votes.user_id_2
)
OR
(
votes.user_id_2 = 9834
AND
users.uid = votes.user_id_1
)
)
WHERE (
(
votes.user_id_1 = 9834
AND
votes.vote_1 = 0
)
OR
(
votes.user_id_2 = 9834
AND
votes.vote_2 = 0
)
)
ORDER BY votes.updated_at DESC
LIMIT 0, 10

Instead of the OR, you could do a UNION of 2 queries. I have known instances where this is an order of magnitude faster in at least one other DBMS, and I'm guessing MySQL's query optimizer may share the same "feature".
SELECT whatever
FROM votes v
INNER JOIN
users u
ON v.user_id_1 = u.uid
WHERE v.user_id_2 = 9834
AND v.vote_2 = 0
UNION
SELECT whatever
FROM votes v
INNER JOIN
users u
ON v.user_id_2 = u.uid
WHERE v.user_id_1 = 9834
AND v.vote_1 = 0
ORDER BY updated_at DESC
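
A runnable sketch of the two-branch approach, using Python's sqlite3 and an in-memory database, is given below. Each branch handles one direction of the vote row, so each join has a single equality condition. It uses UNION ALL on the assumption that a user never votes on themselves, in which case the branches cannot return the same row; plain UNION would be the safer choice if that assumption does not hold. The `name` column, the sample rows, and the `no_votes_by` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE votes (
        id INTEGER PRIMARY KEY, user_id_1 INTEGER, user_id_2 INTEGER,
        vote_1 INTEGER, vote_2 INTEGER, updated_at INTEGER);

    INSERT INTO users VALUES (9834, 'me'), (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO votes VALUES
        (1, 9834, 1, 0, 1, 10),   -- 9834 (as user_id_1) voted no on alice
        (2, 2, 9834, 1, 0, 30),   -- 9834 (as user_id_2) voted no on bob
        (3, 9834, 3, 1, 0, 20),   -- 9834 voted yes on carol
        (4, 1, 2, 0, 0, 40);      -- unrelated pair
""")


def no_votes_by(conn, uid, limit=10):
    """Users that `uid` voted 'no' on, most recently updated first."""
    sql = """
        SELECT v.id AS vote_id, u.uid, u.name, v.updated_at
        FROM votes v
        JOIN users u ON u.uid = v.user_id_2
        WHERE v.user_id_1 = ? AND v.vote_1 = 0
        UNION ALL
        SELECT v.id, u.uid, u.name, v.updated_at
        FROM votes v
        JOIN users u ON u.uid = v.user_id_1
        WHERE v.user_id_2 = ? AND v.vote_2 = 0
        ORDER BY updated_at DESC
        LIMIT ?
    """
    return conn.execute(sql, (uid, uid, limit)).fetchall()


voted_no = no_votes_by(conn, 9834)
print(voted_no)
```

Separate indexes on votes.user_id_1 and votes.user_id_2 would let each branch look up its rows directly instead of scanning the votes table.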

You've answered your own question: yes, you should redesign the table, as it's not working for you. It's too slow, and it requires overly complicated queries. Fortunately, migrating the data is just a matter of running essentially the query you're asking about here, but for all users instead of just one. (That is, a sum or count over the union the first answer suggested.)
