Changing default sorting behavior of MySQL

I have a table in my database.
+--------+-------------+------+-----+---------+----------------+
| Field  | Type        | Null | Key | Default | Extra          |
+--------+-------------+------+-----+---------+----------------+
| id     | int(11)     | NO   | PRI | NULL    | auto_increment |
| rollno | int(11)     | NO   |     | NULL    |                |
| name   | varchar(20) | NO   |     | NULL    |                |
| marks  | int(11)     | NO   |     | NULL    |                |
+--------+-------------+------+-----+---------+----------------+
By default, if I query
select * from students;
the result comes back sorted by id, the INT auto-increment primary key.
+----+--------+------------+-------+
| id | rollno | name       | marks |
+----+--------+------------+-------+
|  1 |     65 | John Doe   |    89 |
|  2 |     62 | John Skeet |    76 |
|  3 |     33 | Mike Ross  |    78 |
+----+--------+------------+-------+
3 rows in set (0.00 sec)
I want to change the default sorting behaviour and make rollno the default sort field. How do I do this?

There is no default sort order!
The DB returns the data in the fastest way possible. If that happens to be the order in which it is stored, or the order of some key, that is up to the system. You can't rely on it.
Think about it: why would the DB spend performance ordering something by default when you didn't ask for it ordered? DBs are optimised for speed.
If you want the result ordered, you have to specify that with an ORDER BY clause.
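For example, using the students table from the question:
select * from students order by rollno;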

Run this:
ALTER TABLE students ORDER BY rollno ASC;
Note that this only re-sorts the existing rows once and does not affect the order of future inserts (and InnoDB ignores it, since InnoDB always orders rows by the primary key), so it is not a true default sort.

select * from students order by rollno asc; will return your results sorted by that column. It should be noted that there is no default sorting behavior, regardless of how the data is actually stored in the database (aside from identities and indexes); you should never depend on your results being sorted a certain way unless you explicitly sort them (using ORDER BY).

Related

Why does this query return an intermediate record?

I ran a somewhat nonsensical query on MySQL, but because its output is the same each time, I'm wondering if someone can help me understand the underlying algorithm.
Here's the table Orders on which we'll execute the query (database taken from here, just in case someone's interested):
+----------------+-------------+------+-----+---------+-------+
| Field          | Type        | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| orderNumber    | int(11)     | NO   | PRI | NULL    |       |
| orderDate      | date        | NO   |     | NULL    |       |
| requiredDate   | date        | NO   |     | NULL    |       |
| shippedDate    | date        | YES  |     | NULL    |       |
| status         | varchar(15) | NO   |     | NULL    |       |
| comments       | text        | YES  |     | NULL    |       |
| customerNumber | int(11)     | NO   | MUL | NULL    |       |
+----------------+-------------+------+-----+---------+-------+
There are 326 records for now, with the largest orderNumber being 10425.
Now here's the query I ran (basically a sensible query with its GROUP BY removed):
mysql> select count(1), orderNumber, status from orders;
+----------+-------------+---------+
| count(1) | orderNumber | status  |
+----------+-------------+---------+
|      326 |       10100 | Shipped |
+----------+-------------+---------+
1 row in set (0.00 sec)
So I'm asking for the total number of rows, along with status and orderNumber, which can be just about anything under the given circumstances. But the query always returns orderNumber 10100, even if I log out and run it again.
Is there a predictable answer for this?
There's no predictable answer here that you should rely on in your design. In general, the DB will return the values of the first row that matches the query. If you want predictability, you should apply an aggregate to every selected column (e.g. MIN or MAX to always get the smallest/largest value).
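A sketch of the predictable version, aggregating every non-counted column (table and column names as in the question):
SELECT count(1), MIN(orderNumber), MIN(status) FROM orders;
-- each column now has a well-defined value (the smallest), instead of
-- whichever row the engine happens to read first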

mysql query running too slow for discontinuous data

I am new to MySQL and am trying to use it on a project that tracks player performance.
Below are the table fields.
+-------------------+----------------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+
| Field             | Type                 | Collation         | Null | Key | Default | Extra          | Privileges                      | Comment |
+-------------------+----------------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+
| unique_id         | int(11)              | NULL              | NO   | PRI | NULL    | auto_increment | select,insert,update,references |         |
| record_time       | datetime             | NULL              | NO   |     | NULL    |                | select,insert,update,references |         |
| game_sourceid     | char(20)             | latin1_swedish_ci | NO   | MUL | NULL    |                | select,insert,update,references |         |
| game_number       | smallint(6)          | NULL              | NO   |     | NULL    |                | select,insert,update,references |         |
| game_difficulty   | char(12)             | latin1_swedish_ci | NO   | MUL | NULL    |                | select,insert,update,references |         |
| cost_time         | smallint(5) unsigned | NULL              | NO   | MUL | NULL    |                | select,insert,update,references |         |
| country           | char(3)              | latin1_swedish_ci | NO   |     | NULL    |                | select,insert,update,references |         |
| source            | char(7)              | latin1_swedish_ci | NO   |     | NULL    |                | select,insert,update,references |         |
+-------------------+----------------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+
I have added indexes on game_sourceid and game_difficulty, and the engine is InnoDB.
I have inserted about 11M rows of test data into this table, generated randomly but resembling the real data.
The most common query is like this, getting the average time and best time for a specific game_sourceid:
SELECT avg(cost_time) AS avgtime
, min(cost_time) AS mintime
, count(*) AS count
FROM statistics_work_table
WHERE game_sourceid = 'standard_easy_1';
+-----------+---------+--------+
| avgtime   | mintime | count  |
+-----------+---------+--------+
| 1681.2851 |     420 | 138034 |
+-----------+---------+--------+
1 row in set (4.97 sec)
and the query took about 5s.
I googled this, and someone said it may be caused by the number of rows the query has to count, so I tried narrowing down the scope like this:
SELECT avg(cost_time) AS avgtime
, min(cost_time) AS mintime
, count(*) AS count
FROM statistics_work_table
WHERE game_sourceid = 'standard_easy_1'
AND record_time > '2015-11-19 04:40:00';
+-----------+---------+-------+
| avgtime   | mintime | count |
+-----------+---------+-------+
| 1275.2222 |     214 |     9 |
+-----------+---------+-------+
1 row in set (4.46 sec)
As you can see, the query covering only 9 rows also took about 5s, so I don't think the problem is the row count.
The test data was generated randomly to simulate real user activity, so the data is discontinuous. I therefore added more continuous data (about 250k rows) with the same game_sourceid = 'standard_easy_9' but everything else still random; in other words, the last 250k rows in this table share the same game_sourceid. Then I tried a query like this:
SELECT avg(cost_time) AS avgtime
, min(cost_time) AS mintime
, count(*) AS count
FROM statistics_work_table
WHERE game_sourceid = 'standard_easy_9';
+-----------+---------+--------+
| avgtime   | mintime | count  |
+-----------+---------+--------+
| 1271.4806 |      70 | 259379 |
+-----------+---------+--------+
1 row in set (0.40 sec)
This time the query took only 0.4s, which was totally beyond my expectations.
So here's the question: the data is retrieved from players in real time, so it is necessarily random and discontinuous.
I am thinking of separating the data into multiple tables by game_sourceid, but that would take another 80 tables, maybe more in the future.
Since I am new to MySQL, I am wondering if there are any other solutions for this, or whether my query is just bad.
Update: here are the indexes on my table
mysql> show index from statistics_work_table;
+-----------------------+------------+-------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table                 | Non_unique | Key_name                | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------------+------------+-------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| statistics_work_table |          0 | PRIMARY                 |            1 | unique_id       | A         |    11362113 |     NULL | NULL   |      | BTREE      |         |               |
| statistics_work_table |          1 | GameSourceId_CostTime   |            1 | game_sourceid   | A         |          18 |     NULL | NULL   |      | BTREE      |         |               |
| statistics_work_table |          1 | GameSourceId_CostTime   |            2 | cost_time       | A         |      344306 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------------+------------+-------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
ALTER TABLE `statistics_work_table`
ADD INDEX `GameSourceId_CostTime` (`game_sourceid`,`cost_time`)
This index should make your queries very fast. Also, after you run the above statement, you should drop the single-column index you have on game_sourceid, as the composite index makes it redundant (and a redundant index hurts insert speed).
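The drop would look something like this; the question doesn't show the single-column index's name, so `game_sourceid` (MySQL's default name for it) is an assumption:
ALTER TABLE `statistics_work_table` DROP INDEX `game_sourceid`;  -- assumed index name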
The reason your queries are slow is that the database is using your single-column index on game_sourceid to find the matching rows and then, for each row, following the primary key stored alongside the index entry into the main clustered index (the primary key in this, and most, cases) to look up the cost_time value. This is referred to as a double lookup, and it is something you want to avoid.
The index provided above is called a "covering index". It allows your query to use ONLY the index, so you need just a single lookup per row, greatly improving performance.
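You can verify that the covering index is being used with EXPLAIN; with the composite index in place, the Extra column should show "Using index", meaning the query is answered from the index alone:
EXPLAIN SELECT avg(cost_time) AS avgtime
     , min(cost_time) AS mintime
     , count(*) AS count
  FROM statistics_work_table
 WHERE game_sourceid = 'standard_easy_1';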

Database schema: Key/Value table or all keys in one record

I guess that this is somewhat of a philosophical question. I need to collect pathology results for a group of patients and store them in a database. In the past I have used a very simple table structure (simplified):
+-------------------+--------------+------+-----+---------+-------+
| Field             | Type         | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| ID                | bigint(20)   | NO   | PRI | NULL    |       |
| Updated           | datetime     | NO   | PRI | NULL    |       |
| PatientId         | varchar(255) | NO   |     | NULL    |       |
| Name              | varchar(255) | NO   |     | NULL    |       |
| Value             | varchar(255) | NO   |     | NULL    |       |
+-------------------+--------------+------+-----+---------+-------+
More often in schema design I see:
+-------------------+--------------+------+-----+---------+-------+
| Field             | Type         | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| ID                | bigint(20)   | NO   | PRI | NULL    |       |
| PatientId         | varchar(255) | NO   |     | NULL    |       |
| Ph_Value          | varchar(255) | NO   |     | NULL    |       |
| K_Value           | varchar(255) | NO   |     | NULL    |       |
| Ca_Value          | varchar(255) | NO   |     | NULL    |       |
| Ph_Value_updated  | datetime     | NO   |     | NULL    |       |
| K_Value_updated   | datetime     | NO   |     | NULL    |       |
| Ca_Value_updated  | datetime     | NO   |     | NULL    |       |
+-------------------+--------------+------+-----+---------+-------+
It seems to me that the first design is much more flexible, expandable, etc. However, I do wonder about the performance hit once the records run into the millions.
The issue with the second is that there may be a couple of hundred fields that need to be recorded on occasion.
I would be really interested to get comments / advice / guidance on this.
You are absolutely right, the first schema is a lot more flexible: you can add new keys on a live database without changing the schema. However, flexibility is usually bought with time and/or space. In this case it's both: you need more space because the ID is replicated once per key, and the joins or pivots required to get the fields back together take time.
There is no reason to pay for flexibility unless you need it. If most of your queries need most of the columns, the second design is the most economical. However, if most of your queries ask for a single column, the flexibility may be worth the CPU time and the database space.
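To make the time cost concrete, here is a sketch of the pivot needed to read the first (key/value) schema as if it were the second; the table name `results` and the key names 'Ph', 'K', 'Ca' are assumptions based on the question's second schema:
SELECT PatientId,
       MAX(CASE WHEN Name = 'Ph' THEN Value END) AS Ph_Value,
       MAX(CASE WHEN Name = 'K'  THEN Value END) AS K_Value,
       MAX(CASE WHEN Name = 'Ca' THEN Value END) AS Ca_Value
  FROM results
 GROUP BY PatientId;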
In my opinion, if those name/value pairs won't change much, the second option is much better in terms of space and number of rows.
You could also optimize the first schema by putting the names in another table and storing a name_id instead of repeating the same name several times.
A third schema is to have a patient table plus a separate table for each value, each containing patient_id and the value, with the table named after that value.
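A minimal sketch of the name-lookup optimization (the table and column names here are assumptions):
CREATE TABLE result_name (
  name_id INT AUTO_INCREMENT PRIMARY KEY,
  name    VARCHAR(255) NOT NULL UNIQUE
);
-- the results table then stores name_id instead of the repeated Name string:
-- (ID, Updated, PatientId, name_id, Value)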

Mysql sorting by

I'm working out how to implement a leaderboard. What I'd like is to be able to sort the table by several different filters (score, number of submissions, average). The table might look like this:
+--------+-----------------------+------+-----+---------+-------+
| Field  | Type                  | Null | Key | Default | Extra |
+--------+-----------------------+------+-----+---------+-------+
| userID | mediumint(8) unsigned | NO   | PRI | 0       |       |
| score  | int                   | YES  | MUL | NULL    |       |
| numSub | int                   | YES  | MUL | NULL    |       |
+--------+-----------------------+------+-----+---------+-------+
And a sample set of data like so:
+--------+----------+--------+
| userID | score    | numSub |
+--------+----------+--------+
| 505610 |     1245 |      2 |
| 544222 |     1458 |      2 |
| 547278 |      245 |      1 |
| 659241 |    12487 |      8 |
| 681087 |     5487 |      3 |
+--------+----------+--------+
My queries will be coming from PHP.
// get the top 100 scores
$q = "select userID, score from table order by score desc limit 0, 100";
This will return a set of userID/score pairs sorted with the highest score first.
I also have a query to sort by numSub (number of submissions).
What I would like is to sort the table by average score, that being score/numSub. The table could be large, so efficiency is important to me.
Thanks in advance!
If efficiency is important, add a column avgscore, assign it the value of score/numSub, and create an index on that column.
You can use insert and update triggers to do the average calculation automatically when a row is added or modified.
Once your table gets large, sorting on an expression computed at query time will take a noticeable amount of time.
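A minimal sketch of this, assuming the table is named `leaderboard` (the question doesn't give its name):
ALTER TABLE leaderboard
  ADD COLUMN avgscore DECIMAL(12,2),
  ADD INDEX idx_avgscore (avgscore);

CREATE TRIGGER leaderboard_bi BEFORE INSERT ON leaderboard
FOR EACH ROW SET NEW.avgscore = NEW.score / NULLIF(NEW.numSub, 0);

CREATE TRIGGER leaderboard_bu BEFORE UPDATE ON leaderboard
FOR EACH ROW SET NEW.avgscore = NEW.score / NULLIF(NEW.numSub, 0);
-- NULLIF yields NULL when numSub is 0, avoiding a division-by-zero error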
As far as I can see, there's no reason to make it more complicated than this:
SELECT userID, score/numsub AS average_score
FROM Table1
ORDER BY score/numsub DESC;

analyze query mysql

I have a question about how to analyze a query to judge its performance (good or bad).
I searched a lot and found advice like this:
SELECT count(*) FROM users; => many experts said it's bad.
SELECT count(id) FROM users; => many experts said it's good.
Please see the table:
+---------------+-------------+------+-----+---------+----------------+
| Field         | Type        | Null | Key | Default | Extra          |
+---------------+-------------+------+-----+---------+----------------+
| userId        | int(11)     | NO   | PRI | NULL    | auto_increment |
| f_name        | varchar(50) | YES  |     | NULL    |                |
| l_name        | varchar(50) | YES  |     | NULL    |                |
| user_name     | varchar(50) | NO   |     | NULL    |                |
| password      | varchar(50) | YES  |     | NULL    |                |
| email         | varchar(50) | YES  |     | NULL    |                |
| active        | char(1)     | NO   |     | Y       |                |
| groupId       | smallint(4) | YES  | MUL | NULL    |                |
| created_date  | datetime    | YES  |     | NULL    |                |
| modified_date | datetime    | YES  |     | NULL    |                |
+---------------+-------------+------+-----+---------+----------------+
But when I try to using EXPLAIN command for that, I got the results:
EXPLAIN SELECT count(*) FROM `user`;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | user  | index | NULL          | groupId | 3       | NULL |   83 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
EXPLAIN SELECT count(userId) FROM user;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | user  | index | NULL          | groupId | 3       | NULL |   83 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
So, the first thing I want to know:
Can I conclude that the performance is the same?
P/S: MySQL version is 5.5.8.
No, you cannot. EXPLAIN doesn't reflect all the work done by MySQL; it just gives you the plan of how the query will be executed.
What about count(*) vs count(id) specifically? The first is never slower than the second, and in some cases it is faster.
count(col) means the number of non-NULL values, while count(*) means the number of rows.
MySQL can probably optimize count(col) into count(*) when the column is a PK and therefore cannot be NULL (if it can't, it has to check for NULLs, which is slower), but I still suggest you use COUNT(*) in such cases.
Also, the internal processing depends on the storage engine used. For MyISAM, a precalculated number of rows is returned in both cases (as long as you don't use WHERE).
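The semantic difference is easy to see on a nullable column from the table above:
SELECT COUNT(*), COUNT(email) FROM user;
-- COUNT(email) can be smaller than COUNT(*), because email allows NULLs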
In the example you give, the performance is identical.
The execution plan shows that the optimiser is clever enough to count the total number of records by scanning an index (the key column shows it picked the groupId index) in both cases.
There is no significant difference when it comes to counting. The reason is that most optimizers will figure out the best way to count rows by themselves.
The performance difference comes from searching for values without indexing. If you search on a field that has no index assigned ({f_name, l_name}) versus a field that has one ({userId (MySQL automatically uses the index on primary keys), groupId (which looks like a foreign key)}), then you will see the difference in performance.
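A quick illustration with EXPLAIN, using the columns above (the lookup values are made up):
EXPLAIN SELECT * FROM user WHERE userId = 42;      -- indexed: type const, key PRIMARY
EXPLAIN SELECT * FROM user WHERE f_name = 'John';  -- no index: type ALL, full table scan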