I have a game application in which users answer questions, and their rating is based on the time elapsed while answering those questions.
I am trying to build a query that returns the ratings for the top 20 players. The game has several stages, and I need to retrieve only the players who have played all of them (assume the number of stages is 5).
This is what I have written:
SELECT `usersname`, `time`
FROM `users`
WHERE `users`.`id` IN (
    SELECT `steps`.`user_id`
    FROM `steps`
    GROUP BY `steps`.`user_id`
    HAVING COUNT(`steps`.`id`) = 5
)
ORDER BY `time` ASC
LIMIT 20
In the inner SELECT I am selecting all user_ids who have played 5 stages (steps). The query works correctly, but it's horribly slow: it takes about a minute and a half to execute. Can you provide some tips on optimizing it? The inner SELECT returns about 2000 rows.
Feel free to ask me if you need additional information.
Try a JOIN instead of IN (SELECT ...):
SELECT usersname , `time`
FROM users
JOIN
( SELECT steps.user_id
FROM steps
GROUP BY steps.user_id
HAVING COUNT(*) = 5
) grp
ON grp.user_id = users.id
ORDER BY `time` ASC
LIMIT 20
Assuming that you have an index on users.time, which is the first obvious optimization, replacing HAVING with WHERE in the inner query may be worth a try.
The query optimizer might do this already if you are lucky, but you cannot rely on it; strictly per the specification, HAVING runs after fetching every record, whereas WHERE prunes them beforehand.
If that does not help, simply keeping a counter in the users table that increments with every stage completed might speed things up by eliminating the sub-query. It makes completing a stage minimally slower (but that won't happen a million times per second!), while querying only the users who have completed all 5 stages becomes very fast (especially if you have an index on that field).
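A minimal sketch of that counter idea, assuming a new stages_completed column (the column name, index name and the user id 123 are made up):
ALTER TABLE users ADD COLUMN stages_completed TINYINT NOT NULL DEFAULT 0;
ALTER TABLE users ADD INDEX idx_stages_completed (stages_completed);
-- run whenever a user completes a stage (123 is a placeholder id):
UPDATE users SET stages_completed = stages_completed + 1 WHERE id = 123;
-- the top-20 query then needs no subquery at all:
SELECT usersname, `time`
FROM users
WHERE stages_completed = 5
ORDER BY `time` ASC
LIMIT 20;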
Also, using memcached or a similar caching technology may be worthwhile for something like a high-score list, which is typically the kind of data that is "not necessarily accurate to the second, changes slowly, and is queried billions of times".
If memcached is not an option, even writing the result to a temp file and re-using it for 1-2 seconds (or even longer) would be an option. Nobody will notice. Even if you cache high scores for as long as 1-2 minutes, still nobody will take offense, because that is just "how long it takes".
I think you should use WHERE instead of HAVING. Also, in my opinion this would be best done in a stored function: run the inner query, store its results, and then run the outer query against those stored results.
This use case may benefit from de-normalization. There is no need to search through all 2000 user records to determine whether a user belongs in the top 20.
Create a Top_20_Users table.
After the 5th stage, check if the user's time is less than any in the Top_20_Users table. If it is, update the slowest/worst record.
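A rough sketch of that table and the check (the column names, and storing the time as an integer, are assumptions):
CREATE TABLE Top_20_Users (
    user_id INT NOT NULL PRIMARY KEY,
    total_time INT NOT NULL
);
-- after the 5th stage: fetch the current worst entry...
SELECT user_id, total_time FROM Top_20_Users ORDER BY total_time DESC LIMIT 1;
-- ...and if the new total is lower, delete that row and insert the new user's row.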
Things you can do with this.
Since the Top_20_Users table will be so small, add a field for stage and include Top 20 times for each stage as well as for all five stages completed.
Let the Top_20_Users table grow. A history of all top 20 users ever, their times and the date when that time was good enough to be a top 20. Show trends as users learn the game and the top 20 times get better and better.
OK, so we have a rather large database on a WordPress website. We run a query every month or so that removes old logs from the database, and I've been using the query below to accomplish it.
I essentially look at the total rows that need removing and keep running this query until the logs are all gone.
However, with a LIMIT of 1000 the query takes around 30 seconds to complete. The posts table contains around 400,000 entries and on this occasion, around 60,000 of these need removing.
Any help is much, much appreciated! I'm a novice when it comes to SQL, so this query will probably hurt some people's eyes, but please be gentle, I'm learning! :)
CREATE TEMPORARY TABLE LOGSTOCLEAN
SELECT
ID
FROM
wp_posts WTCUP
LEFT JOIN wp_term_relationships WTCUTR ON WTCUP.ID = WTCUTR.object_id
LEFT JOIN wp_term_taxonomy WTCUTT ON WTCUTR.term_taxonomy_id = WTCUTT.term_taxonomy_id
WHERE
WTCUTT.term_id IN (10)
AND WTCUP.post_date < DATE_SUB(NOW(), INTERVAL 45 DAY)
GROUP BY
WTCUP.ID
LIMIT 1000;
DELETE FROM wp_posts
WHERE wp_posts.ID IN(SELECT ID FROM LOGSTOCLEAN)
LIMIT 1000;
Rather than a row limit, a time window would allow the oldest rows to be removed, which we might hope are co-located by clustering or partitioning, or at least quickly locatable by an index. Select the minimum date-time plus some interval and delete everything older than that.
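Something along these lines, where the one-day window is only an assumption (and the term_id = 10 filter from the original query would still have to be applied somewhere):
SET @cutoff = (
    SELECT DATE_ADD(MIN(post_date), INTERVAL 1 DAY)
    FROM wp_posts
    WHERE post_date < DATE_SUB(NOW(), INTERVAL 45 DAY)
);
DELETE FROM wp_posts
WHERE post_date < @cutoff;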
Partitioning by day means you can just clean/drop one partition at a time. Your RDBMS may vary! Dropping a day (partition?) every day at midnight seems sensible!
PARTITIONing is out of the question since you are purging only rows with term_id=10, and that test is not in wp_posts.
It sounds like there are more than 1000 rows to remove on a typical day? Then consider removing only, say, 300, but do it every hour instead of every month. A cron job or a MySQL Event could run that automatically. (If there are more than 300 in an hour, don't worry; the next invocation will take care of the rest.)
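A hedged sketch of that hourly purge as a MySQL Event (the event scheduler must be enabled, and the event name is made up). The doubly nested subquery works around MySQL's restriction on deleting from a table that the same statement also selects from:
CREATE EVENT purge_old_logs
ON SCHEDULE EVERY 1 HOUR
DO
  DELETE FROM wp_posts
  WHERE ID IN (
      SELECT ID FROM (
          SELECT WTCUP.ID
          FROM wp_posts WTCUP
          JOIN wp_term_relationships WTCUTR ON WTCUP.ID = WTCUTR.object_id
          JOIN wp_term_taxonomy WTCUTT ON WTCUTR.term_taxonomy_id = WTCUTT.term_taxonomy_id
          WHERE WTCUTT.term_id = 10
            AND WTCUP.post_date < DATE_SUB(NOW(), INTERVAL 45 DAY)
          LIMIT 300
      ) x
  );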
You have not really asked a Question. Is what you are doing too slow? Interferes with other activities? Too 'manual'? Or something else?
If it is a performance question, then please provide SHOW CREATE TABLE for each table.
So I have 2 tables, one called user, and one called user_favorite. user_favorite stores an itemId and userId, for storing the items that the user has favorited. I'm simply trying to locate the users who don't have a record in user_favorite, so I can find those users who haven't favorited anything yet.
For testing purposes, I have 6001 records in user and 6001 in user_favorite, so there's just one record who doesn't have any favorites.
Here's my query:
SELECT u.* FROM user u
JOIN user_favorite fav ON u.id != fav.userId
ORDER BY id DESC
Here the id in the last statement is not ambiguous; it refers to the id from the user table. I have a PK index on u.id and an index on fav.userId.
When I run this query, my computer just becomes unresponsive and completely freezes, with no output ever being given. I have 2gb RAM, not a great computer, but I think it should be able to handle a query like this with 6k records easily.
Both tables are in MyISAM, could that be the issue? Would switching to INNODB fix it?
Let's first discuss what your query (as written) is doing. Because of the != in the ON clause, you are joining every user record with every other user's favorites. So your query is going to produce something like 36M rows. This is not going to give you the answer you want, and it explains why your computer is unhappy.
How should you write the query? There are three main patterns you can use. I think this is a pretty good explanation: http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/ and it discusses performance specifically in the context of MySQL. It also shows you how to look at and read an execution plan, which is critical to optimizing queries.
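For reference, the LEFT JOIN / IS NULL variant of that pattern, written against the question's tables, would look roughly like this:
SELECT u.*
FROM user u
LEFT JOIN user_favorite fav ON fav.userId = u.id
WHERE fav.userId IS NULL
ORDER BY u.id DESC;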
Change your query to something like this:
select * from User
where not exists (select * from user_favorite where User.id = user_favorite.userId)
Let me know how it goes.
A join on A != B means that every record of A is joined with every record of B in which the ids aren't equal.
In other words, instead of producing 6000 rows, you're producing approximately 36 million (6000 * 6001) rows of output, which all have to be collected, then sorted...
[site_list] ~100,000 rows... 10mb in size.
site_id
site_url
site_data_most_recent_record_id
[site_list_data] ~ 15+ million rows and growing... about 600mb in size.
record_id
site_id
site_connect_time
site_speed
date_checked
columns in bold are unique index keys.
I need to return 50 most recently updated sites AND the recent data that goes with it - connect time, speed, date...
This is my query:
SELECT SQL_CALC_FOUND_ROWS
site_list.site_url,
site_list_data.site_connect_time,
site_list_data.site_speed,
site_list_data.date_checked
FROM site_list
LEFT JOIN site_list_data
ON site_list.site_data_most_recent_record_id = site_list_data.record_id
ORDER BY site_list_data.date_checked DESC
LIMIT 50
Without the ORDER BY and SQL_CALC_FOUND_ROWS (I need it for pagination), the query takes about 1.5 seconds; with them it takes over 2 seconds or more, which is not good enough, because the page where this data will be shown gets 20K+ pageviews/day and this query is apparently too heavy (the server almost dies when I put it live) and too slow.
Experts of MySQL, how would you do this? What if the table got to 100 million records? Caching this huge result into a temp table every 30 seconds is the only other solution I have.
You need to add a heuristic to gate the query to reasonable performance. As written, it is effectively sorting your site_list_data table by date descending -- the ENTIRE table.
So, if you know that the top 50 will be within the last day or week, add "AND date_checked > <boundary_date>" to the query. Then it will reduce the overall result set first, and THEN sort it.
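Roughly like this, where the seven-day window is only an assumption and would have to be wide enough to always contain at least 50 rows:
SELECT site_list.site_url,
       site_list_data.site_connect_time,
       site_list_data.site_speed,
       site_list_data.date_checked
FROM site_list
LEFT JOIN site_list_data
ON site_list.site_data_most_recent_record_id = site_list_data.record_id
WHERE site_list_data.date_checked > DATE_SUB(NOW(), INTERVAL 7 DAY)
ORDER BY site_list_data.date_checked DESC
LIMIT 50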
SQL_CALC_FOUND_ROWS is slow; use a separate COUNT query instead.
A couple of observations.
Both ORDER BY and SQL_CALC_FOUND_ROWS add to the cost of your query. ORDER BY clauses can potentially be improved with appropriate indexing -- do you have an index on your date_checked column? That could help.
What is your exact need for SQL_CALC_FOUND_ROWS? Consider replacing this with a separate query that uses COUNT instead. This can be vastly better assuming your Query Cache is enabled.
And if you can use COUNT, consider replacing your LEFT JOIN with an INNER JOIN as this will help performance as well.
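A sketch of the separate count, using the tables from the question:
SELECT COUNT(*)
FROM site_list
INNER JOIN site_list_data
ON site_list.site_data_most_recent_record_id = site_list_data.record_id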
Good luck.
I have a table with more than 1 million records. The problem is that the query takes too much time, around 5 minutes. The ORDER BY is my problem: I need the expression in the ORDER BY to rank the most popular videos, and because of that expression I can't create an index on it.
How can I resolve this problem?
Thanks.
SELECT DISTINCT
`v`.`id`,`v`.`url`, `v`.`title`, `v`.`hits`, `v`.`created`, ROUND((r.likes*100)/(r.likes+r.dislikes),0) AS `vote`
FROM
`videos` AS `v`
INNER JOIN
`votes` AS `r` ON v.id = r.id_video
ORDER BY
(v.hits+((r.likes-r.dislikes)*(r.likes-r.dislikes))/2*v.hits)/DATEDIFF(NOW(),v.created) DESC
Does the most popular list have to be calculated every time? I doubt the answer is yes. Some operations will take a long time to run no matter how efficient your query is.
Also bear in mind you have 1 million rows now; you might have 10 million in the next few months. So the query might work now but not in a month. The solution needs to be scalable.
I would create a job that runs every couple of hours to calculate and store this information in a different table. This might not be the answer you are looking for, but I just had to say it.
What I have done in the past is to create a voting system based on Integers.
Nothing will outperform integers.
The voting system table has 2 Columns:
ProductID
VoteCount (INT)
The VoteCount column stores the running total of all the votes that are submitted.
Like = +1
Unlike = -1
Create an Index in the vote table based on ID.
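A minimal sketch of that table (the table name, and the product id 42 in the examples, are illustrative):
CREATE TABLE product_votes (
    ProductID INT NOT NULL PRIMARY KEY,
    VoteCount INT NOT NULL DEFAULT 0
);
-- Like = +1, Unlike = -1:
UPDATE product_votes SET VoteCount = VoteCount + 1 WHERE ProductID = 42;
UPDATE product_votes SET VoteCount = VoteCount - 1 WHERE ProductID = 42;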
You have two alternatives to improve this:
1) create a new column with the needed value pre-calculated
2) create a second table that holds the videos' primary key and the result of the calculation.
This could be a calculated column (in the first case), or you could modify your app or add triggers that keep it in sync (you'd need to load it manually the first time, and afterwards let your program keep it updated).
If you use the second option, your key could be composed of the finalRating plus the primary key of the videos table. This way your searches would be hugely improved.
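A sketch of the first alternative, assuming a new final_rating column on videos (the column and index names are made up); it is back-filled once with the question's formula and then kept up to date by the application or a trigger:
ALTER TABLE videos ADD COLUMN final_rating FLOAT NOT NULL DEFAULT 0;
ALTER TABLE videos ADD INDEX idx_final_rating (final_rating);
-- one-off back-fill using the expression from the original ORDER BY
-- (GREATEST(..., 1) guards against division by zero for videos created today):
UPDATE videos v
JOIN votes r ON r.id_video = v.id
SET v.final_rating =
    (v.hits + ((r.likes - r.dislikes) * (r.likes - r.dislikes)) / 2 * v.hits)
    / GREATEST(DATEDIFF(NOW(), v.created), 1);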
Have you tried moving the arithmetic of the ORDER BY into your SELECT, and then ordering by the virtual column, such as:
SELECT (col1+col2) AS a
FROM TABLE
ORDER BY a
Arithmetic on sort is expensive.
Here's the query (the largest table has about 40,000 rows)
SELECT
Course.CourseID,
Course.Description,
UserCourse.UserID,
UserCourse.TimeAllowed,
UserCourse.CreatedOn,
UserCourse.PassedOn,
UserCourse.IssuedOn,
C.LessonCnt
FROM
UserCourse
INNER JOIN
Course
USING(CourseID)
INNER JOIN
(
SELECT CourseID, COUNT(*) AS LessonCnt FROM CourseSection GROUP BY CourseID
) C
USING(CourseID)
WHERE
UserCourse.UserID = 8810
If I run this, it executes very quickly (.05 seconds roughly). It returns 13 rows.
When I add an ORDER BY clause at the end of the query (ordering by any column) the query takes about 10 seconds.
I'm using this database in production now, and everything is working fine. All my other queries are speedy.
Any ideas of what it could be? I ran the query in MySQL's Query Browser, and from the command line. Both places it was dead slow with the ORDER BY.
EDIT: Tolgahan ALBAYRAK solution works, but can anyone explain why it works?
Maybe this helps:
SELECT * FROM (
SELECT
Course.CourseID,
Course.Description,
UserCourse.UserID,
UserCourse.TimeAllowed,
UserCourse.CreatedOn,
UserCourse.PassedOn,
UserCourse.IssuedOn,
C.LessonCnt
FROM
UserCourse
INNER JOIN
Course
USING(CourseID)
INNER JOIN
(
SELECT CourseID, COUNT(*) AS LessonCnt FROM CourseSection GROUP BY CourseID
) C
USING(CourseID)
WHERE
UserCourse.UserID = 8810
) AS t ORDER BY CourseID
Is the column you're ordering by indexed?
Indexing drastically speeds up ordering and filtering.
You are selecting from "UserCourse" which I assume is a joining table between courses and users (Many to Many).
You should index the column that you need to order by, in the "UserCourse" table.
Suppose you want to "order by CourseID", then you need to index it on UserCourse table.
Ordering by any other column that is not present in the joining table (i.e. UserCourse) may require further denormalization and indexing on the joining table to be optimized for speed;
In other words, you need to have a copy of that column in the joining table and index it.
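For instance, to order by the course description without touching the Course table at query time, a sketch might be (the copied column's type, the index name and the need for a trigger to keep it in sync are assumptions):
ALTER TABLE UserCourse ADD COLUMN CourseDescription VARCHAR(255);
ALTER TABLE UserCourse ADD INDEX idx_user_description (UserID, CourseDescription);
-- one-off back-fill from the Course table:
UPDATE UserCourse uc
JOIN Course c USING (CourseID)
SET uc.CourseDescription = c.Description;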
P.S.
The answer given by Tolgahan Albayrak, although correct for this question, would not produce the desired result, in cases where one is doing a "LIMIT x" query.
Have you updated the statistics on your database? I ran into something similar on mine where I had two identical queries whose only difference was a capital letter; one returned in half a second and the other took nearly 5 minutes. Updating the statistics resolved the issue.
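In MySQL, updating the statistics would be done with something like the following (table names taken from the question):
ANALYZE TABLE UserCourse, Course, CourseSection;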
I realise this answer is late, but I have just had a similar problem: adding ORDER BY increased the query time from seconds to 5 minutes, and having tried most of the other suggestions for speeding it up, I noticed that the /tmp files were getting to be 12G for this query. I changed the query so that a VARCHAR(20000) field being returned was TRIM()ed, and performance dramatically improved (back to seconds). So I guess it's worth checking whether you are returning large VARCHARs as part of your query and, if so, processing them (maybe SUBSTRING(x, 1, LENGTH(x))?) if you don't want to trim them.
The query was returning 500k rows, and the /tmp file indicated that each row was using about 20k of data.
A similar question was asked before here.
It might help you as well. Basically it describes using composite indexes and how order by works.
Today I ran into the same kind of problem. As soon as I sorted the result set by a field from a joined table, the whole query became horribly slow and took more than a hundred seconds.
The server was running MySQL 5.0.51a, and by chance I noticed that the same query ran as fast as it always should have on a server with MySQL 5.1. When comparing the EXPLAIN output for that query, I saw that the usage and handling of indexes had obviously changed a lot (at least from 5.0 to 5.1).
So if you encounter such a problem, maybe your resolution is simply to upgrade your MySQL.