I'm working on implementing a leaderboard. What I'd like is to be able to sort the table by several different criteria (score, number of submissions, average). The table might look like this:
+--------+-----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-----------------------+------+-----+---------+-------+
| userID | mediumint(8) unsigned | NO | PRI | 0 | |
| score | int | YES | MUL | NULL | |
| numSub | int | YES | MUL | NULL | |
+--------+-----------------------+------+-----+---------+-------+
And a sample set of data like so:
+--------+----------+--------+
| userID | score | numSub |
+--------+----------+--------+
| 505610 | 1245 | 2 |
| 544222 | 1458 | 2 |
| 547278 | 245 | 1 |
| 659241 | 12487 | 8 |
| 681087 | 5487 | 3 |
+--------+----------+--------+
My queries will be coming from PHP.
// get the top 100 scores
$q = "select userID, score from table order by score desc limit 0, 100";
This will return a set of userID/score pairs sorted with the highest score first.
I also have a query to sort by numSub (number of submissions).
What I would like is to sort the table by average score, i.e. score/numSub. The table could be large, so efficiency is important to me.
Thanks in advance!
If efficiency is important, then add a column avgscore and assign it the value of score/numSub. Then create an index on that column.
You can use an insert/update trigger to do the average calculation automatically when a row is added or modified.
Once your table gets large, that sort is going to take a noticeable amount of time.
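A minimal sketch of what that could look like, assuming the leaderboard table is called scores (the question never names it) and using made-up index and trigger names; the IF() guard just avoids a division by zero when numSub is 0:
ALTER TABLE scores ADD COLUMN avgscore DECIMAL(10,2) NULL;
UPDATE scores SET avgscore = score / numSub WHERE numSub > 0;
CREATE INDEX idx_scores_avgscore ON scores (avgscore);

-- keep avgscore current as rows are inserted or modified
CREATE TRIGGER scores_avg_ins BEFORE INSERT ON scores
FOR EACH ROW SET NEW.avgscore = IF(NEW.numSub > 0, NEW.score / NEW.numSub, NULL);

CREATE TRIGGER scores_avg_upd BEFORE UPDATE ON scores
FOR EACH ROW SET NEW.avgscore = IF(NEW.numSub > 0, NEW.score / NEW.numSub, NULL);
With the index in place, a query like SELECT userID, avgscore FROM scores ORDER BY avgscore DESC LIMIT 0, 100 can be satisfied by walking the index instead of sorting the whole table.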
As far as I can see, there's no reason to make it more complicated than this:
SELECT userID, score/numsub AS average_score
FROM Table1
ORDER BY score/numsub DESC;
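If you want the top 100 by average, the same LIMIT pattern as the PHP query from the question applies; MySQL also lets you reuse the alias in the ORDER BY:
SELECT userID, score/numsub AS average_score
FROM Table1
ORDER BY average_score DESC
LIMIT 0, 100;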
First of all, I want to apologize for providing such a weak title; I couldn't describe it in a better way.
Consider the following: we have three tables, one for users, one for records and one for ratings. The tables are quite self-explanatory, but the database schema is as follows:
+---------------------+
| Tables_in_relations |
+---------------------+
| records |
| ratings |
| users |
+---------------------+
The schema for the records table is as follows:
+----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+----------------+
| id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
| title | varchar(256) | NO | | NULL | |
| year | int(4) | NO | | NULL | |
+----------+----------------------+------+-----+---------+----------------+
The schema for the users table is as follows:
+----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+----------------+
| id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
| email | varchar(256) | NO | | NULL | |
| name | varchar(256) | NO | | NULL | |
| password | varchar(256) | NO | | NULL | |
+----------+----------------------+------+-----+---------+----------------+
The ratings table is, obviously, where the ratings are stored along with the record_id and user_id, and it works as a relation table.
Its schema is as follows:
+----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+----------------+
| id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
| record_id| smallint(5) unsigned | NO | MUL | NULL | |
| user_id | smallint(5) unsigned | NO | MUL | NULL | |
| rating | int(1) | NO | | NULL | |
+----------+----------------------+------+-----+---------+----------------+
Now, in my application, I have a search function that fetches records based on a certain keyword. The output should also include the average rating of each record and the total number of ratings per record. This can be accomplished by the following query:
SELECT re.id, re.title, re.year, ROUND(avg(ra.rating)) as avg_rate,
COUNT(ra.record_id) as total_times_rated
FROM records re
LEFT JOIN ratings ra ON ra.record_id = re.id
GROUP BY re.id;
which will give me the following output:
+----+------------------------+------+----------+-------------------+
| id | title | year | avg_rate | total_times_rated |
+----+------------------------+------+----------+-------------------+
| 1 | Test Record 1 | 2008 | 3 | 4 |
| 2 | Test Record 2 | 2012 | 2 | 4 |
| 3 | Test Record 3 | 2003 | 3 | 4 |
| 4 | Test Record 4 | 2012 | 3 | 3 |
| 5 | Test Record 5 | 2003 | 2 | 3 |
| 6 | Test Record 6 | 2006 | 2 | 3 |
+----+------------------------+------+----------+-------------------+
Question:
Now, here comes the tricky part, at least for me. Within my app, you can search records whether signed in or not, and if the user is signed in, I'd also like to include that user's own rating value in the above query.
I know that I can check whether the user is signed in by reading the session value and execute a corresponding query based on that. I just don't know how to include that individual user's rating value in the above query.
You can add the user's rating to the result by adding a subquery to the column list:
SELECT re.id, re.title, re.year, ROUND(avg(ra.rating)) as avg_rate,
COUNT(ra.record_id) as total_times_rated,
(SELECT SUM(rating) FROM ratings WHERE user_id = ? AND record_id = re.id) as user_rating
FROM records re
LEFT JOIN ratings ra ON ra.record_id = re.id
GROUP BY re.id;
We can get the user_id from the session and pass it to this query in order to generate the user_rating column in the result.
Assuming a user can rate a record multiple times, I have used SUM; if not, you can remove it from the query.
Update
If you don't want the GROUP BY to consider that value, you can wrap the existing query in another query and add the column there, e.g.:
SELECT a.id, a.title, a.year, a.avg_rate, a.total_times_rated,
(SELECT SUM(rating) FROM ratings WHERE user_id = ? AND record_id = a.id) as user_rating
FROM (SELECT re.id as id, re.title as title, re.year as year, ROUND(avg(ra.rating)) as avg_rate,
COUNT(ra.record_id) as total_times_rated
FROM records re
LEFT JOIN ratings ra ON ra.record_id = re.id
GROUP BY re.id) a;
I ran a somewhat nonsense query on MySQL, but because its output is the same each time, I'm wondering if someone can help me understand the underlying algorithm.
Here's the table Orders on which we'll execute the query (database taken from here, just in case someone's interested):
+----------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| orderNumber | int(11) | NO | PRI | NULL | |
| orderDate | date | NO | | NULL | |
| requiredDate | date | NO | | NULL | |
| shippedDate | date | YES | | NULL | |
| status | varchar(15) | NO | | NULL | |
| comments | text | YES | | NULL | |
| customerNumber | int(11) | NO | MUL | NULL | |
+----------------+-------------+------+-----+---------+-------+
There are 326 records for now, with the largest orderNumber being 10425.
Now here's the query I ran (basically removed GROUP BY from a sensible query):
mysql> select count(1), orderNumber, status from orders;
+----------+-------------+---------+
| count(1) | orderNumber | status |
+----------+-------------+---------+
| 326 | 10100 | Shipped |
+----------+-------------+---------+
1 row in set (0.00 sec)
So I'm asking for the total number of rows, along with status and orderNumber, which can be just about anything under the given circumstances. But the query always returns orderNumber 10100, even if I log out and run it again.
Is there a predictable answer for this?
There's no predictable answer, and you shouldn't rely on one in your design. In general, the DB will return the values from the first row that happens to match the query. If you want predictability, apply an aggregate to every column (e.g. MIN or MAX to always get the smallest/largest value).
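For example, to get deterministic values alongside the count (picking orderNumber here just for illustration):
SELECT COUNT(1), MIN(orderNumber) AS first_order, MAX(orderNumber) AS last_order
FROM orders;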
I have a table in my database.
+--------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| rollno | int(11) | NO | | NULL | |
| name | varchar(20) | NO | | NULL | |
| marks | int(11) | NO | | NULL | |
+--------+-------------+------+-----+---------+----------------+
By default, if I query
select * from students;
the result comes back sorted by id (the auto-increment INT).
+----+--------+------------+-------+
| id | rollno | name | marks |
+----+--------+------------+-------+
| 1 | 65 | John Doe | 89 |
| 2 | 62 | John Skeet | 76 |
| 3 | 33 | Mike Ross | 78 |
+----+--------+------------+-------+
3 rows in set (0.00 sec)
I want to change the default sorting behaviour and make rollno the default sort field. How do I do this?
There is no default sort order!
The DB returns the data in the fastest way possible. Whether that happens to be the order in which it is stored, or the order of some key, is up to the system; you can't rely on it.
Think about it: why would the DB spend performance ordering something by default if you didn't ask for it ordered? DBs are optimised for speed.
If you want the result ordered, you have to specify that in an ORDER BY clause.
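In this case, that would simply be:
SELECT * FROM students ORDER BY rollno;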
Run this:
ALTER TABLE students ORDER BY rollno ASC;
Note, though, that this only reorders the stored rows once (InnoDB ignores it in favour of the clustered primary key); it does not change the order of future SELECTs, so you still need an explicit ORDER BY to rely on the order.
select * from students order by rollno asc; will return your results sorted by that column. It should be noted that there is no default sorting behavior as far as data is actually stored in the database (aside from identities and indexes); you should never depend on your results being sorted a certain way unless you explicitly sort them (using order by).
Is there any way to get better performance out of this?
select * from p_all where sec='0P00009S33' order by date desc
The query took 0.1578 sec.
The table structure is shown below. There are more than 100 million records in this table.
+------------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+---------------+------+-----+---------+-------+
| sec | varchar(10) | NO | PRI | NULL | |
| date | date | NO | PRI | NULL | |
| open | decimal(13,3) | NO | | NULL | |
| high | decimal(13,3) | NO | | NULL | |
| low | decimal(13,3) | NO | | NULL | |
| close | decimal(13,3) | NO | | NULL | |
| volume | decimal(13,3) | NO | | NULL | |
| unadjusted_close | decimal(13,3) | NO | | NULL | |
+------------------+---------------+------+-----+---------+-------+
EXPLAIN result
+----+-------------+-----------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | price_all | ref | PRIMARY | PRIMARY | 12 | const | 1731 | Using where |
+----+-------------+-----------+------+---------------+---------+---------+-------+------+-------------+
How can I speed up this query?
In your example, you do a SELECT *, but you only have an INDEX that contains the columns sec and date.
As a result, MySQL's execution plan roughly looks like the following:
(1) Find all rows that have sec = 0P00009S33 in the INDEX. This is fast.
(2) Sort all returned rows by date. This is also possibly fast, depending on the size of your MySQL buffer; there is possibly room for improvement by tuning sort_buffer_size.
(3) Fetch all columns (= the full row) for each row returned by the previous INDEX lookup. This is slow!
You can optimize this drastically by reducing the SELECTed fields to the minimum. Example: if you only need the open price, do a SELECT sec, date, open instead of SELECT *.
Once you have identified the minimum columns you need to query, add a combined INDEX that contains exactly these columns (all columns involved, whether in the WHERE, SELECT or ORDER BY clause).
This way you can completely skip the slow part of this query, (3) in my example above. When the INDEX already contains all necessary columns, MySQL's optimizer can avoid looking up the full rows and serve your query directly from the INDEX.
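A sketch of what that could look like, assuming only the open price is actually needed (the index name is made up; the existing PRIMARY KEY already covers sec and date, so the new index just adds open):
ALTER TABLE p_all ADD INDEX idx_sec_date_open (sec, date, open);

SELECT sec, date, open
FROM p_all
WHERE sec = '0P00009S33'
ORDER BY date DESC;
With such a covering index, EXPLAIN should show "Using index" in the Extra column, meaning the rows are served from the index alone.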
Disclaimer: I'm unsure in which order MySQL executes the steps; possibly I have ordered (2) and (3) the wrong way round. That doesn't change the answer, though.
I apologize in advance if this question is too specific, but I think it describes a fairly typical scenario: joins and GROUP BYs bogging down the DB, and the best way to get around that. My specific problem is that I need to create a scoreboard based on:
plays (userid,gameid,score) 40M rows
games (gameid) 100K rows
app_games (appid,gameid) i.e. the games are grouped into apps, and there's a total score for each app, which is the sum over all of its associated games; <20 rows
Users can play multiple times, and their best score for each game is recorded. Formulating the query is easy; I've done several variations, but they have a nasty tendency to get stuck in "copying temp table" for 30-60 seconds when under load.
What can I do? Are there server variables I should be tweaking, or is there a way to reformulate the query to make it faster? The derived-table version of the query I'm using is as follows (minus a join to the user table to grab the name):
select userID,sum(score) as cumscore from
(select userID, gameID,max(p.score) as score
from play p join app_game ag using (gameID)
where ag.appID = 1 and p.score>0
group by userID,gameID ) app_stats
group by userid order by cumscore desc limit 0,20;
Or as a temp table:
drop table if exists app_stats;
create temporary table app_stats
select userID,gameID,max(p.score) as score
from play p join app_game ag using (gameID)
where ag.appID = 1 and p.score>0
group by userid,gameID;
select userID,sum(score) as cumscore from app_stats group by userid
order by cumscore desc limit 0,20;
I have indexes as follows:
show indexes from play;
+-------+------------+----------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+
| play | 0 | PRIMARY | 1 | playID | A | 38353712 | NULL | NULL | | BTREE | |
| play | 0 | uk_play_uniqueID | 1 | uniqueID | A | 38353712 | NULL | NULL | YES | BTREE | |
| play | 1 | play_score_added | 1 | dateTimeFinished | A | 19176856 | NULL | NULL | YES | BTREE | |
| play | 1 | play_score_added | 2 | score | A | 19176856 | NULL | NULL | | BTREE | |
| play | 1 | fk_playData_game | 1 | gameID | A | 76098 | NULL | NULL | | BTREE | |
| play | 1 | user_hiscore | 1 | userID | A | 650062 | NULL | NULL | YES | BTREE | |
| play | 1 | user_hiscore | 2 | score | A | 2397107 | NULL | NULL | | BTREE | |
+-------+------------+----------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+
I suspect both queries basically need to go through all the data in your table, whether you create the temp table or do everything at once. If you have a lot of data, that's just going to take a while.
I'd maintain a separate table with the ID and total score for each player. Whenever you update the play table, also update the summary table. If they get out of sync, just empty the summary table and re-create the data from the play table. (Or, if you already use redis in your infrastructure, you could maintain the summary there; it has functions that make this particular thing really fast.)
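A rough sketch of that approach, with made-up table and column names, keyed per app so the existing scoreboard query becomes a simple indexed lookup:
CREATE TABLE user_app_score (
    userID   INT UNSIGNED NOT NULL,
    appID    INT UNSIGNED NOT NULL,
    cumscore INT NOT NULL DEFAULT 0,
    PRIMARY KEY (userID, appID),
    KEY idx_app_cumscore (appID, cumscore)
);

-- rebuild from play/app_game whenever the summary drifts out of sync
REPLACE INTO user_app_score (userID, appID, cumscore)
SELECT b.userID, ag.appID, SUM(b.best)
FROM (SELECT userID, gameID, MAX(score) AS best
      FROM play WHERE score > 0
      GROUP BY userID, gameID) b
JOIN app_game ag USING (gameID)
GROUP BY b.userID, ag.appID;

-- the scoreboard then becomes a cheap indexed read
SELECT userID, cumscore
FROM user_app_score
WHERE appID = 1
ORDER BY cumscore DESC
LIMIT 0, 20;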
Instead of making temporary tables, try making a view. You can query against it just like a normal table, and it always reflects the current data, so you don't have to drop and re-create anything each time.
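A sketch of the view approach based on the query from the question (the view name is made up; keep in mind a plain MySQL view is a stored query rather than a precomputed result, so the grouping work still happens when you SELECT from it):
CREATE VIEW app_best_scores AS
SELECT ag.appID, p.userID, p.gameID, MAX(p.score) AS score
FROM play p JOIN app_game ag USING (gameID)
WHERE p.score > 0
GROUP BY ag.appID, p.userID, p.gameID;

SELECT userID, SUM(score) AS cumscore
FROM app_best_scores
WHERE appID = 1
GROUP BY userID
ORDER BY cumscore DESC
LIMIT 0, 20;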