Trouble with MySQL Performance Optimization

I did a MySQL performance optimization test, but the test results surprised me.
First of all, I prepared several tables for my test: t_worker_attendance_300w (3 million rows), t_worker_attendance_1000w (10 million rows), t_worker_attendance_1y (100 million rows), and t_worker_attendance_4y (400 million rows).
All the tables have the same columns and the same indexes. They are copies of one another; even the 400-million-row table was grown from the original 3 million rows.
In my understanding, MySQL's performance is bound to be severely affected by data volume, but the results have puzzled me for a whole week. I've tested almost every scenario I can think of, but the execution times are all the same!
This is a new MySQL 5.6.16 server. I tested every scenario I could think of, including INNER JOIN...
A) SHOW CREATE TABLE t_worker_attendance_4y
CREATE TABLE `t_worker_attendance_4y` (
`id` bigint(20) NOT NULL ,
`attendance_id` char(32) NOT NULL,
`worker_id` char(32) NOT NULL,
`subcontractor_id` char(32) NOT NULL ,
`project_id` char(32) NOT NULL ,
`sign_date` date NOT NULL ,
`sign_type` char(2) NOT NULL ,
`latitude` double DEFAULT NULL,
`longitude` double DEFAULT NULL ,
`sign_wages` decimal(16,2) DEFAULT NULL ,
`confirm_wages` decimal(16,2) DEFAULT NULL ,
`work_content` varchar(60) DEFAULT NULL ,
`team_leader_id` char(32) DEFAULT NULL,
`sign_state` char(2) NOT NULL ,
`confirm_date` date DEFAULT NULL ,
`sign_mode` char(2) DEFAULT NULL ,
`checkin_time` datetime DEFAULT NULL ,
`checkout_time` datetime DEFAULT NULL ,
`sign_hours` decimal(6,1) DEFAULT NULL ,
`overtime` decimal(6,1) DEFAULT NULL ,
`confirm_hours` decimal(6,1) DEFAULT NULL ,
`signimg` varchar(200) DEFAULT NULL ,
`signoutimg` varchar(200) DEFAULT NULL ,
`photocheck` char(2) DEFAULT NULL ,
`machine_type` varchar(2) DEFAULT '1' ,
`project_coordinate` text ,
`floor_num` varchar(200) DEFAULT NULL ,
`device_serial_no` varchar(32) DEFAULT NULL ,
KEY `checkin_time` (`checkin_time`),
KEY `worker_id` (`worker_id`),
KEY `project_id` (`project_id`),
KEY `subcontractor_id` (`subcontractor_id`),
KEY `sign_date` (`sign_date`),
KEY `project_id_2` (`project_id`,`sign_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
B) SHOW INDEX FROM t_worker_attendance_4y
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t_worker_attendance_4y | 1 | checkin_time | 1 | checkin_time | A | 5017494 | NULL | NULL | YES | BTREE | | |
| t_worker_attendance_4y | 1 | worker_id | 1 | worker_id | A | 1686552 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | project_id | 1 | project_id | A | 102450 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | subcontractor_id | 1 | subcontractor_id | A | 380473 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | sign_date | 1 | sign_date | A | 512643 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | project_id_2 | 1 | project_id | A | 102059 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | project_id_2 | 2 | sign_date | A | 1776104 | NULL | NULL | | BTREE | | |
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
C) EXPLAIN SELECT SQL_NO_CACHE tw.project_id, tw.sign_date FROM t_worker_attendance_4y tw WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb' AND sign_date >= '07/01/2018' AND sign_date < '08/01/2018' ;
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
| 1 | SIMPLE | tw | ref | project_id,sign_date,project_id_2 | project_id_2 | 96 | const | 54134596 | Using where; Using index |
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
All of the queries below use the same composite index.
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_300w tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sign_date >= '07/01/2018'
AND sign_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.02 sec
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_1000w tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sign_date >= '07/01/2018'
AND sign_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.01 sec
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_1y tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sign_date >= '07/01/2018'
AND sign_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.02 sec
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_4y tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sign_date >= '07/01/2018'
AND sign_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.02 sec
......
My guess was that MySQL's query performance would decline dramatically as the data volume increased, but the timings are not much different. So I have no basis for optimizing my query, and I don't know when to introduce table partitioning or a sharding (split-database, split-table) plan.
What I want to know is why an index lookup is as fast on a large data volume as it is on a small one. I would be very grateful for any help.

Search performance is the same on a large data volume because of the BTREE index, whose lookups are O(log(n)). Relatively speaking, that means the search algorithm has to complete:
6 operations on 3m of data
7 operations on 10m of data
8 operations on 100m of data
8 operations on 400m of data
As you can see, the number of operations is almost the same.
"My guess is that MySQL's query performance will decline dramatically with the increase of data volume"
This is true for full table scan cases.
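To make the logarithmic growth concrete, here is a rough sketch that counts how many levels a B-tree would need for each table size in the question. The fan-out of 100 keys per page is an assumed, illustrative figure, not one measured on this server; the real value depends on key size and page fill.

```python
def btree_levels(row_count: int, fanout: int) -> int:
    """Smallest number of levels such that fanout ** levels >= row_count."""
    levels, capacity = 0, 1
    while capacity < row_count:
        capacity *= fanout  # each extra level multiplies the reachable rows
        levels += 1
    return levels


# Table sizes from the question; FANOUT is a hypothetical keys-per-page figure.
FANOUT = 100
for rows in (3_000_000, 10_000_000, 100_000_000, 400_000_000):
    print(f"{rows:>11,} rows -> {btree_levels(rows, FANOUT)} levels")
```

Under this assumed fan-out, going from 3 million to 400 million rows adds a single level, so an index lookup reads only one more page.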

I have a new answer. Someone told me: "Because your query is covered by the index, the query time is really just the time spent searching the index. MySQL indexes use a B+ tree structure, and query time is basically the same for trees of the same height. You can check whether the index trees of these tables have the same height."
So I ran the checks as suggested.
mysql> SELECT b.name, a.name, index_id, type, a.space, a.PAGE_NO
-> FROM information_schema.INNODB_SYS_INDEXES a,
-> information_schema.INNODB_SYS_TABLES b
-> WHERE a.table_id = b.table_id AND a.space <> 0;
+-------------------------------------------------+---------------------+----------+------+-------+---------+
| name | name | index_id | type | space | PAGE_NO |
+-------------------------------------------------+---------------------+----------+------+-------+---------+
| mysql/innodb_index_stats | PRIMARY | 18 | 3 | 2 | 3 |
| mysql/innodb_table_stats | PRIMARY | 17 | 3 | 1 | 3 |
| mysql/slave_master_info | PRIMARY | 20 | 3 | 4 | 3 |
| mysql/slave_relay_log_info | PRIMARY | 19 | 3 | 3 | 3 |
| mysql/slave_worker_info | PRIMARY | 21 | 3 | 5 | 3 |
| test_gomeet/t_worker_attendance_1y | GEN_CLUST_INDEX | 45 | 1 | 12 | 3 |
| test_gomeet/t_worker_attendance_1y | checkin_time | 46 | 0 | 12 | 16389 |
| test_gomeet/t_worker_attendance_1y | project_id | 50 | 0 | 12 | 32775 |
| test_gomeet/t_worker_attendance_1y | worker_id | 53 | 0 | 12 | 49161 |
| test_gomeet/t_worker_attendance_1y | subcontractor_id | 54 | 0 | 12 | 65547 |
| test_gomeet/t_worker_attendance_1y | sign_date | 66 | 0 | 12 | 81933 |
| test_gomeet/t_worker_attendance_1y | project_id_2 | 408 | 0 | 12 | 98319 |
| test_gomeet/t_worker_attendance_300w | GEN_CLUST_INDEX | 56 | 1 | 13 | 3 |
| test_gomeet/t_worker_attendance_300w | checkin_time | 58 | 0 | 13 | 16389 |
| test_gomeet/t_worker_attendance_300w | project_id | 59 | 0 | 13 | 16427 |
| test_gomeet/t_worker_attendance_300w | worker_id | 60 | 0 | 13 | 16428 |
| test_gomeet/t_worker_attendance_300w | subcontractor_id | 61 | 0 | 13 | 16429 |
| test_gomeet/t_worker_attendance_300w | sign_date | 67 | 0 | 13 | 65570 |
| test_gomeet/t_worker_attendance_300w | project_id_2 | 397 | 0 | 13 | 81929 |
| test_gomeet/t_worker_attendance_4y | GEN_CLUST_INDEX | 42 | 1 | 9 | 3 |
| test_gomeet/t_worker_attendance_4y | checkin_time | 47 | 0 | 9 | 16389 |
| test_gomeet/t_worker_attendance_4y | worker_id | 49 | 0 | 9 | 32775 |
| test_gomeet/t_worker_attendance_4y | project_id | 52 | 0 | 9 | 49161 |
| test_gomeet/t_worker_attendance_4y | subcontractor_id | 55 | 0 | 9 | 65547 |
| test_gomeet/t_worker_attendance_4y | sign_date | 69 | 0 | 9 | 81933 |
| test_gomeet/t_worker_attendance_4y | project_id_2 | 412 | 0 | 9 | 98319 |
+-------------------------------------------------+---------------------+----------+------+-------+---------+
mysql> SHOW GLOBAL STATUS LIKE 'Innodb_page_size';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| Innodb_page_size | 16384 |
+------------------+-------+
root@localhost:/usr/local/mysql/data/test_gomeet# hexdump -s 49216 -n 02 t_worker_attendance_300w.ibd
000c040 0200
000c042
root@localhost:/usr/local/mysql/data/test_gomeet# hexdump -s 49216 -n 02 t_worker_attendance_1y.ibd
000c040 0300
000c042
root@localhost:/usr/local/mysql/data/test_gomeet# hexdump -s 49216 -n 02 t_worker_attendance_4y.ibd
000c040 0300
000c042
By my calculation, the tree height comes out at about 3.34 for 100 million rows and 3.589 for 400 million rows, which is almost the same. Is this the reason?
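The hexdump offsets can be reproduced with a short script. This is a sketch that assumes the usual InnoDB page layout, in which the 2-byte big-endian PAGE_LEVEL field sits 64 bytes into an index page (a 38-byte file header plus 26 bytes into the page header); the function names are my own. Note that offset 49216 is page 3, which the INNODB_SYS_INDEXES output lists as the root of GEN_CLUST_INDEX, so to inspect project_id_2 its own PAGE_NO would be needed instead. On a little-endian host, hexdump's default 16-bit display shows the bytes 00 02 as 0200.

```python
import io
import struct

PAGE_SIZE = 16384        # Innodb_page_size reported by the server
PAGE_LEVEL_OFFSET = 64   # assumed: 38-byte FIL header + 26 bytes into the page header


def page_level_offset(page_no: int, page_size: int = PAGE_SIZE) -> int:
    """Byte offset of the PAGE_LEVEL field of the given page in an .ibd file."""
    return page_no * page_size + PAGE_LEVEL_OFFSET


def read_page_level(ibd, page_no: int, page_size: int = PAGE_SIZE) -> int:
    """Read PAGE_LEVEL (0 for a leaf page) from a binary file-like object."""
    ibd.seek(page_level_offset(page_no, page_size))
    (level,) = struct.unpack(">H", ibd.read(2))  # 2 bytes, big-endian
    return level


# Demo on an in-memory stand-in for a tablespace file (not real data):
fake = io.BytesIO(bytes(page_level_offset(3)) + b"\x00\x02")
print(page_level_offset(3))      # 49216, the offset passed to hexdump -s
print(read_page_level(fake, 3))  # 2, i.e. a tree of height 3
```

Against a real file, the same function could be called as `read_page_level(open("t_worker_attendance_4y.ibd", "rb"), 98319)` to look at the root page of project_id_2; the tree height is the root's level plus one.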

Related

How to perform a sum for all previous records

I've been trying to implement the solution here with the added flavour of updating existing records. As an MRE I'm looking to populate the sum_date_diff column in a table with the sum of all the differences between the current row date and the date of every previous row where the current row p1_id matches the previous row p1_id or p2_id. I have already filled out the expected result below:
+-----+------------+-------+-------+---------------+
| id_ | date_time | p1_id | p2_id | sum_date_diff |
+-----+------------+-------+-------+---------------+
| 1 | 2000-01-01 | 1 | 2 | Null |
| 2 | 2000-01-02 | 2 | 4 | 1 |
| 3 | 2000-01-04 | 1 | 3 | 3 |
| 4 | 2000-01-07 | 2 | 5 | 11 |
| 5 | 2000-01-15 | 2 | 3 | 35 |
| 6 | 2000-01-20 | 1 | 3 | 35 |
| 7 | 2000-01-31 | 1 | 3 | 68 |
+-----+------------+-------+-------+---------------+
My query so far looks like:
UPDATE test.sum_date_diff AS sdd0
JOIN
(SELECT
id_,
SUM(DATEDIFF(sdd1.date_time, sq.date_time)) AS sum_date_diff
FROM
test.sum_date_diff AS sdd1
LEFT OUTER JOIN (SELECT
sdd2.date_time AS date_time, sdd2.p1_id AS player_id
FROM
test.sum_date_diff AS sdd2 UNION ALL SELECT
sdd3.date_time AS date_time, sdd3.p2_id AS player_id
FROM
test.sum_date_diff AS sdd3) AS sq ON sq.date_time < sdd1.date_time
AND sq.player_id = sdd1.p1_id
GROUP BY sdd1.id_) AS master_sq ON master_sq.id_ = sdd0.id_
SET
sdd0.sum_date_diff = master_sq.sum_date_diff
This works as shown here.
However, on a table of 1.5m records the query has been hanging for the last hour. Even when I add a WHERE clause onto the bottom to restrict the update to a single record then it hangs for 5 mins+.
Here is the EXPLAIN statement for the query on the full table:
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
| 1 | UPDATE | sum_date_diff | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100 | NULL |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 4 | const | 10 | 100 | NULL |
| 2 | DERIVED | sum_date_diff | NULL | index | PRIMARY,ix__match_oc_history__date_time,ix__match_oc_history__p1_id,ix__match_oc_history__p2_id,ix__match_oc_history__date_time_players | ix__match_oc_history__date_time_players | 14 | NULL | 1484288 | 100 | Using index; Using temporary |
| 2 | DERIVED | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 2968576 | 100 | Using where; Using join buffer (hash join) |
| 3 | DERIVED | sum_date_diff | NULL | index | NULL | ix__match_oc_history__date_time_players | 14 | NULL | 1484288 | 100 | Using index |
| 4 | UNION | sum_date_diff | NULL | index | NULL | ix__match_oc_history__date_time_players | 14 | NULL | 1484288 | 100 | Using index |
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
Here is the CREATE TABLE statement:
CREATE TABLE `sum_date_diff` (
`id_` int NOT NULL AUTO_INCREMENT,
`date_time` datetime DEFAULT NULL,
`p1_id` int NOT NULL,
`p2_id` int NOT NULL,
`sum_date_diff` int DEFAULT NULL,
PRIMARY KEY (`id_`),
KEY `ix__sum_date_diff__date_time` (`date_time`),
KEY `ix__sum_date_diff__p1_id` (`p1_id`),
KEY `ix__sum_date_diff__p2_id` (`p2_id`),
KEY `ix__sum_date_diff__date_time_players` (`date_time`,`p1_id`,`p2_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1822120 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
MySQL version is 8.0.26, running on a 2016 MacBook Pro with macOS Monterey and 16 GB of RAM.
After reading around about boosting the RAM available to MySQL I've added the following to the standard my.cnf file:
innodb_buffer_pool_size = 8G
tmp_table_size=2G
max_heap_table_size=2G
I'm wondering if:
I've done something wrong
This is just a very slow task no matter what I do
There is a faster method
I'm hoping someone could enlighten me!
While it is possible to do calculations like this in SQL, it is messy. If the number of rows is not in the millions, I would fetch the necessary columns into my application and do the arithmetic there. (Loops are easier and faster in PHP/Java/etc. than in SQL.)
LEAD() and LAG() are possible, but they are not well optimized (or so has been my experience). In an application language, it is easy and efficient to look things up in arrays.
The SELECT can (easily and efficiently) do any filtering and sorting so that the app only receives the necessary data.
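As a sketch of that application-side approach (Python is used here purely as an example language, and the function name is hypothetical), the sums can be computed in a single pass over the rows in date order. It relies on the identity that the sum of (d - d_i) over n earlier dates equals n * d minus the sum of the d_i, so only a running count and a running total per player need to be kept. It assumes the rows were fetched as (id_, date, p1_id, p2_id) tuples and compares whole days, as DATEDIFF does.

```python
from collections import defaultdict
from datetime import date


def sum_date_diffs(rows):
    """rows: iterable of (id_, day, p1_id, p2_id). Returns {id_: sum or None}."""
    count = defaultdict(int)  # earlier appearances per player (as p1 or p2)
    total = defaultdict(int)  # sum of the day ordinals of those appearances
    result = {}
    # Ties on the same day are processed in id_ order and contribute 0 days.
    for id_, day, p1, p2 in sorted(rows, key=lambda r: (r[1], r[0])):
        n = count[p1]
        result[id_] = n * day.toordinal() - total[p1] if n else None
        for player in (p1, p2):
            count[player] += 1
            total[player] += day.toordinal()
    return result


# Sample data from the question.
ROWS = [
    (1, date(2000, 1, 1), 1, 2),
    (2, date(2000, 1, 2), 2, 4),
    (3, date(2000, 1, 4), 1, 3),
    (4, date(2000, 1, 7), 2, 5),
    (5, date(2000, 1, 15), 2, 3),
    (6, date(2000, 1, 20), 1, 3),
    (7, date(2000, 1, 31), 1, 3),
]
for id_, value in sum_date_diffs(ROWS).items():
    print(id_, value)
```

On the sample rows this reproduces the expected sum_date_diff column. The results could then be written back with batched UPDATE statements keyed on id_.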

ProxySQL Query Cache doesn't always respect the query rules for some reason

I use ProxySQL (2.0.17) to cache all SELECT queries sent to MySQL. The mysql_query_rules table looks like this:
+---------+--------+----------+------------+--------+-------------+------------+------------+--------+------------------------------+---------------+----------------------+--------------+---------+-----------------+-----------------------+-----------+--------------------+---------------+-----------+---------+---------+-------+-------------------+----------------+------------------+-----------+--------+-------------+-----------+---------------------+-----+-------+---------+
| rule_id | active | username | schemaname | flagIN | client_addr | proxy_addr | proxy_port | digest | match_digest | match_pattern | negate_match_pattern | re_modifiers | flagOUT | replace_pattern | destination_hostgroup | cache_ttl | cache_empty_result | cache_timeout | reconnect | timeout | retries | delay | next_query_flagIN | mirror_flagOUT | mirror_hostgroup | error_msg | OK_msg | sticky_conn | multiplex | gtid_from_hostgroup | log | apply | comment |
+---------+--------+----------+------------+--------+-------------+------------+------------+--------+------------------------------+---------------+----------------------+--------------+---------+-----------------+-----------------------+-----------+--------------------+---------------+-----------+---------+---------+-------+-------------------+----------------+------------------+-----------+--------+-------------+-----------+---------------------+-----+-------+---------+
| 1 | 1 | NULL | NULL | 0 | NULL | NULL | NULL | NULL | ^[(]?SELECT (?!SQL_NO_CACHE) | NULL | 0 | CASELESS | NULL | NULL | NULL | 300000 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 1 | NULL |
+---------+--------+----------+------------+--------+-------------+------------+------------+--------+------------------------------+---------------+----------------------+--------------+---------+-----------------+-----------------------+-----------+--------------------+---------------+-----------+---------+---------+-------+-------------------+----------------+------------------+-----------+--------+-------------+-----------+---------------------+-----+-------+---------+
There is one simple rule (I tried ^SELECT .* as well), with a cache_ttl of 300 seconds before a cached query is purged.
For some reason, about 5% of the executions of each cacheable query are still sent to the backend. For instance, this is the most popular query:
+-----------+------------+----------+----------------+--------------------+--------------------------+------------+------------+------------+-------------+----------+----------+-------------------+---------------+
| hostgroup | schemaname | username | client_address | digest | digest_text | count_star | first_seen | last_seen | sum_time | min_time | max_time | sum_rows_affected | sum_rows_sent |
+-----------+------------+----------+----------------+--------------------+--------------------------+------------+------------+------------+-------------+----------+----------+-------------------+---------------+
| 2 | ------ | ---- | | 0xFB50749BCFE0DA3C | SELECT * FROM `language` | 12839 | 1621445210 | 1621455115 | 45069293213 | 31321 | 82235606 | 0 | 56960 |
| -1 | ------ | ---- | | 0xFB50749BCFE0DA3C | SELECT * FROM `language` | 326243 | 1621445210 | 1621455116 | 0 | 0 | 0 | 0 | 0 |
+-----------+------------+----------+----------------+--------------------+--------------------------+------------+------------+------------+-------------+----------+----------+-------------------+---------------+
I can't get my head around this peculiarity. Whenever I re-query stats_mysql_query_digest, count_star on hostgroup 2 (the backend) has been incremented, without waiting the 300 seconds for the cached query to be purged.
The query cache size is set to 512 MB. At its peak, it takes up around 100 MB.
Help?..
Cranking mysql-query_cache_size_MB up to 5120 MB (which is ridiculous, of course) seems to have resolved the problem to some extent. The frequency of backend requests for that query has dropped tenfold (thanks to ProxySQL's Query Logging, you can log just one query and analyze it). The cache_ttl value is still somewhat far from being respected, but I guess this workaround is better than nothing at this point.

tree branch from mysql

I have a MySQL table whose schema stores a tree structure.
CREATE TABLE `treedata` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`parent_id` int(11) unsigned NOT NULL DEFAULT '0',
`depth` tinyint(3) unsigned NOT NULL DEFAULT '0',
`name` varchar(128) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `uniquecheck` (`parent_id`,`name`) USING BTREE,
KEY `depth` (`depth`) USING BTREE,
KEY `parent_id` (`parent_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=14 DEFAULT CHARSET=latin1
It contains the following data.
mysql> select * from treedata;
+----+-----------+-------+------+
| id | parent_id | depth | name |
+----+-----------+-------+------+
| 1 | 1 | 0 | root |
| 2 | 1 | 1 | b1 |
| 3 | 1 | 1 | b2 |
| 4 | 1 | 1 | b3 |
| 5 | 2 | 2 | b1_1 |
| 6 | 2 | 2 | b1_2 |
| 7 | 2 | 2 | b1_3 |
| 8 | 3 | 2 | b2_1 |
| 9 | 3 | 2 | b2_2 |
| 10 | 3 | 2 | b2_3 |
| 11 | 4 | 2 | b3_1 |
| 12 | 4 | 2 | b3_2 |
| 13 | 4 | 2 | b3_3 |
+----+-----------+-------+------+
13 rows in set (0.00 sec)
I need to select a branch and its children based on depth and name. For example, if depth is 1 and name is b1, it should return:
+----+-----------+-------+------+
| id | parent_id | depth | name |
+----+-----------+-------+------+
| 2 | 1 | 1 | b1 |
| 5 | 2 | 2 | b1_1 |
| 6 | 2 | 2 | b1_2 |
| 7 | 2 | 2 | b1_3 |
+----+-----------+-------+------+
I am new to databases. I tried a left join; it gives all the children but not the branch itself.
mysql> select td2.* from treedata as td1 left join treedata as td2 on td1.id=td2.parent_id where td1.name='b1';
+------+-----------+-------+------+
| id | parent_id | depth | name |
+------+-----------+-------+------+
| 5 | 2 | 2 | b1_1 |
| 6 | 2 | 2 | b1_2 |
| 7 | 2 | 2 | b1_3 |
+------+-----------+-------+------+
3 rows in set (0.00 sec)
Note: I can't change the database schema.
You can use a LIKE clause to select all the data belonging to the b1 branch, like this:
select td2.* from treedata as td1 left join treedata as td2 on td1.id=td2.parent_id where td1.name LIKE '%b1%';
I think this may help you:
select * from (select * from table_name order by `depth`) products_sorted,
(select @pv := 'your_node_id(string)') initialisation
where (find_in_set(parent_id, @pv) or id = your_node_id)
and length(@pv := concat(@pv, ',', id))
It will find all children of your starting node.
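If the rows can be fetched into the application, the branch can also be collected there. The following is a minimal sketch, assuming the table has been read as (id, parent_id, depth, name) tuples; `select_branch` is a hypothetical helper name. It walks the tree breadth-first from every row matching the requested depth and name, and skips the self-reference of the root row (id 1 has parent_id 1 in the sample data).

```python
from collections import defaultdict, deque


def select_branch(rows, depth, name):
    """Return the rows matching (depth, name) plus all of their descendants.

    rows: iterable of (id, parent_id, depth, name) tuples.
    """
    rows = list(rows)
    children = defaultdict(list)
    for row in rows:
        node_id, parent_id = row[0], row[1]
        if node_id != parent_id:  # the sample root is its own parent; skip that loop
            children[parent_id].append(row)

    queue = deque(r for r in rows if r[2] == depth and r[3] == name)
    branch = []
    while queue:
        row = queue.popleft()
        branch.append(row)
        queue.extend(children[row[0]])
    return sorted(branch)  # tuples sort by id first


# Sample data from the question.
TREEDATA = [
    (1, 1, 0, "root"),
    (2, 1, 1, "b1"), (3, 1, 1, "b2"), (4, 1, 1, "b3"),
    (5, 2, 2, "b1_1"), (6, 2, 2, "b1_2"), (7, 2, 2, "b1_3"),
    (8, 3, 2, "b2_1"), (9, 3, 2, "b2_2"), (10, 3, 2, "b2_3"),
    (11, 4, 2, "b3_1"), (12, 4, 2, "b3_2"), (13, 4, 2, "b3_3"),
]
for row in select_branch(TREEDATA, 1, "b1"):
    print(row)
```

For depth 1 and name b1 this returns the b1 row followed by b1_1, b1_2 and b1_3, and it needs no schema change.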

In a very large MySQL analytics table - should I index the timestamp?

I'm looking to improve the speed of queries on a very large MySQL analytics table that I have. This table tracks the player count on game servers, and its structure looks like this:
`server_tracker` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`ip` int(10) unsigned NOT NULL,
`port` smallint(5) unsigned NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`players` tinyint(3) unsigned NOT NULL,
`map` varchar(28) NOT NULL,
`portjoin` smallint(5) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_tracking_ip_port` (`ip`,`port`)
) ENGINE=InnoDB AUTO_INCREMENT=310729056 DEFAULT CHARSET=utf8 ROW_FORMAT=FIXED
This table is inserted into very frequently, with 10k+ servers being tracked 10+ times an hour. However, every hour the data is taken and averaged out, and put into an "averaged" table with basically the same structure.
Currently I have the IP/port setup as key. However - sometimes it can be a tad slow when doing that hourly averaging - so I am curious if it would be worth putting an index on the timestamp, which is frequently used to select data from a certain timeframe like so:
SELECT `players`
FROM `server_tracker`
WHERE `ip` = x
AND `port` = x
AND `date` > NOW()
AND `date` < NOW() + INTERVAL 60 MINUTE
ORDER BY `id` DESC
This is the only type of query run on this table. The table is only used for fetching the player count from game servers within a specific timeframe. The data is never updated or changed.
However, I am a bit new to all of this - and I am not sure if putting an index on the timestamp would do much of anything. Just looking for some friendly advice.
Results of EXPLAIN SELECT players FROM server_tracker WHERE ip = x AND port = x AND date > NOW() AND date < NOW() + INTERVAL 60 MINUTE ORDER BY id DESC
+----+-------------+-----------------+------+----------------------+----------------------+---------+-------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+----------------------+----------------------+---------+-------------+-------+-------------+
| 1 | SIMPLE | server_tracker | ref | idx_tracking_ip_port | idx_tracking_ip_port | 6 | const,const | 15354 | Using where |
+----+-------------+-----------------+------+----------------------+----------------------+---------+-------------+-------+-------------+
One of the most important things to know about MySQL is that, with very few exceptions, only ONE index per table can be used in a query.
So putting an index on a single column does not help much when all of these fields are used together in the WHERE clause.
Only a composite index over these fields helps.
The order of the fields matters a great deal, because it determines whether the index can also be used for other queries.
An example:
An index on (field1, field2, field3) is used when the WHERE clause contains field1, or field1 and field2, or field1, field2 and field3. It is not used if the WHERE clause contains only field2, or only field3, or field2 and field3. So always include the first field.
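The leftmost-prefix rule can be modelled in a few lines. This is a deliberately simplified sketch, not how the optimizer is implemented: it only checks which leading columns of a composite index are constrained by the WHERE clause, and ignores range conditions, index merge and other access methods. The function name and the (ip, port, date) index in the last call are hypothetical illustrations for the table in the question.

```python
def usable_prefix(index_columns, where_columns):
    """Return the leading index columns that the WHERE clause can use.

    Simplified model: a column counts only if every index column before it
    is also constrained by the WHERE clause.
    """
    constrained = set(where_columns)
    prefix = []
    for column in index_columns:
        if column not in constrained:
            break  # a gap ends the usable prefix
        prefix.append(column)
    return prefix


INDEX = ("field1", "field2", "field3")
print(usable_prefix(INDEX, ["field1"]))            # ['field1']
print(usable_prefix(INDEX, ["field1", "field2"]))  # ['field1', 'field2']
print(usable_prefix(INDEX, ["field2", "field3"]))  # []
# Hypothetical composite index for the server_tracker query:
print(usable_prefix(("ip", "port", "date"), ["ip", "port", "date"]))
```

An empty result corresponds to the case where the index cannot be used for the lookup at all.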
To find out easily how the query is executed, run it with EXPLAIN; this tells you directly whether an index is used, and which one. If the output has several lines, you can multiply the individual values under rows together as an indicator. The smaller this number is, the better your query performs.
MariaDB [tmp]> EXPLAIN select * from content;
+------+-------------+---------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------+------+---------------+------+---------+------+------+-------+
| 1 | SIMPLE | content | ALL | NULL | NULL | NULL | NULL | 13 | |
+------+-------------+---------+------+---------------+------+---------+------+------+-------+
1 row in set (0.00 sec)
MariaDB [tmp]>
Alternatively, you can use the profiler to see how long the query spends in each stage, and optimize the server accordingly.
An example:
MariaDB [(none)]> use tmp
Database changed
MariaDB [tmp]> SET PROFILING=ON;
Query OK, 0 rows affected (0.00 sec)
MariaDB [tmp]>
MariaDB [tmp]> SELECT * FROM content;
+----+------+---------------------+--------+------+--------------+------+------+------+------+
| id | Wert | Zeitstempel | WertID | aaa | d | e | wwww | n | ddd |
+----+------+---------------------+--------+------+--------------+------+------+------+------+
| 1 | 10 | 2001-01-01 00:00:00 | 1 | NULL | 1.5000 | NULL | NULL | 1 | NULL |
| 2 | 12.3 | 2001-01-01 00:01:00 | 2 | NULL | 2.5000 | NULL | NULL | 2 | NULL |
| 3 | 17.4 | 2001-01-01 00:02:00 | 3 | NULL | 123456.1250 | NULL | NULL | 3 | NULL |
| 4 | 10.9 | 2001-01-01 01:01:00 | 1 | NULL | 1000000.0000 | NULL | NULL | 4 | NULL |
| 5 | 15.4 | 2001-01-01 01:02:00 | 2 | NULL | NULL | NULL | NULL | 5 | NULL |
| 6 | 20.9 | 2001-01-01 01:03:00 | 3 | NULL | NULL | NULL | NULL | 6 | NULL |
| 7 | 22 | 2001-01-02 00:00:00 | 1 | NULL | NULL | NULL | NULL | 7 | NULL |
| 8 | 12.3 | 2001-01-02 00:01:00 | 2 | NULL | NULL | NULL | NULL | 8 | NULL |
| 9 | 17.4 | 2001-01-02 00:02:00 | 3 | NULL | NULL | NULL | NULL |
+----+------+---------------------+--------+------+--------------+------+------+------+------+
13 rows in set (0.00 sec)
MariaDB [tmp]>
MariaDB [tmp]> SHOW PROFILE;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000031 |
| checking permissions | 0.000005 |
| Opening tables | 0.000036 |
| After opening tables | 0.000004 |
| System lock | 0.000003 |
| Table lock | 0.000002 |
| After opening tables | 0.000005 |
| init | 0.000013 |
| optimizing | 0.000006 |
| statistics | 0.000013 |
| preparing | 0.000010 |
| executing | 0.000002 |
| Sending data | 0.000073 |
| end | 0.000003 |
| query end | 0.000003 |
| closing tables | 0.000006 |
| freeing items | 0.000003 |
| updating status | 0.000012 |
| cleaning up | 0.000003 |
+----------------------+----------+
19 rows in set (0.00 sec)
MariaDB [tmp]>

Mysql Group by time interval optimization

I have a very large table (several hundred million rows) that stores test results along with a datetime and a foreign key to a related entity called 'link'. I need to group rows by time intervals of 10, 15, 20, 30 and 60 minutes, as well as filter by time and 'link_id'. I know this can be done with this query, as explained here:
SELECT time,AVG(RTT),MIN(RTT),MAX(RTT),COUNT(*) FROM trace
WHERE link_id=1 AND time>='2015-01-01' AND time <= '2015-01-30'
GROUP BY UNIX_TIMESTAMP(time) DIV 600;
This solution worked, but it was extremely slow (about 10 seconds on average), so I tried adding a datetime column for each 'group by' interval. For example, the row:
id | time | rtt | link_id
1 | 2014-01-01 12:34:55.4034 | 154.3 | 2
became:
id | time | rtt | link_id | time_60 |time_30 ...
1 | 2014-01-01 12:34:55.4034 | 154.3 | 2 | 2014-01-01 12:00:00.00 | 2014-01-01 12:30:00.00 ...
and I get the intervals with the following query:
SELECT time_10,AVG(RTT),MIN(RTT),MAX(RTT),COUNT(*) FROM trace
WHERE link_id=1 AND time>='2015-01-01' AND time <= '2015-01-30'
GROUP BY time_10;
This query was at least 50% faster (about 5 seconds on average), but it is still pretty slow. How can I optimize it to be faster?
EXPLAIN for the query outputs this:
+----+-------------+------------+------+------------------------------------------------------------------------+----------------------------------------------------+---------+-------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+------------------------------------------------------------------------+----------------------------------------------------+---------+-------+---------+----------------------------------------------+
| 1 | SIMPLE | main_trace | ref | main_trace_link_id_c6febb11f84677f_fk_main_link_id,main_trace_e7549e3e | main_trace_link_id_c6febb11f84677f_fk_main_link_id | 4 | const | 1478359 | Using where; Using temporary; Using filesort |
+----+-------------+------------+------+------------------------------------------------------------------------+----------------------------------------------------+---------+-------+---------+----------------------------------------------+
and these are the table indexes:
+------------+------------+----------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+----------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| main_trace | 0 | PRIMARY | 1 | id | A | 2956718 | NULL | NULL | | BTREE | | |
| main_trace | 1 | main_trace_link_id_c6febb11f84677f_fk_main_link_id | 1 | link_id | A | 2 | NULL | NULL | | BTREE | | |
| main_trace | 1 | main_trace_07cc694b | 1 | time | A | 2956718 | NULL | NULL | | BTREE | | |
| main_trace | 1 | main_trace_e7549e3e | 1 | time_10 | A | 22230 | NULL | NULL | YES | BTREE | | |
| main_trace | 1 | main_trace_01af8333 | 1 | time_15 | A | 14783 | NULL | NULL | YES | BTREE | | |
| main_trace | 1 | main_trace_1681ff94 | 1 | time_20 | A | 10870 | NULL | NULL | YES | BTREE | | |
| main_trace | 1 | main_trace_f7c28c93 | 1 | time_30 | A | 6399 | NULL | NULL | YES | BTREE | | |
| main_trace | 1 | main_trace_0f29fcc5 | 1 | time_60 | A | 3390 | NULL | NULL | YES | BTREE | | |
+------------+------------+----------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
For this query:
SELECT time_10, AVG(RTT), MIN(RTT), MAX(RTT), COUNT(*)
FROM trace
WHERE link_id = 1 AND time >= '2015-01-01' AND time <= '2015-01-30'
GROUP BY time_10;
The best index is the covering index: trace(link_id, time, time_10, rtt).
A composite index on (id, time), possibly followed by ANALYZE TABLE trace, would make it snappy.
It is just a suggestion; I am not saying you should do it. ANALYZE TABLE can take hours to run on tables with millions of rows.
Suggesting index creation based on just one query is not a great idea. The assumption is that you have other queries, and indexes are a drag on inserts/updates.
time <= '2015-01-30' stops at midnight at the start of January 30, so it excludes nearly all of that day (and all of January 31); did you want that? This pattern works well, and avoids many edge cases (e.g., leap years):
WHERE time >= '2015-01-01'
AND time < '2015-01-01' + INTERVAL 1 MONTH
If this is static data (such as a write-once Data Warehouse), you could make the query much faster by building and maintaining Summary Tables.
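The summary-table idea can be sketched outside the database to show why it helps. The code below is an illustration under assumptions, not a drop-in solution: rows are taken as (time, rtt, link_id) tuples, the function names are hypothetical, and bucket sizes are assumed to divide 60 minutes evenly. Each fine-grained bucket stores count, sum, min and max rather than the average, so that coarser intervals can be derived later without rereading the raw rows.

```python
from datetime import datetime


def floor_to_bucket(ts: datetime, minutes: int) -> datetime:
    """Round ts down to the start of its N-minute bucket (N divides 60)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


def build_summary(rows, minutes=10):
    """Aggregate raw rows into {(link_id, bucket_start): [count, sum, min, max]}."""
    summary = {}
    for ts, rtt, link_id in rows:
        key = (link_id, floor_to_bucket(ts, minutes))
        if key not in summary:
            summary[key] = [1, rtt, rtt, rtt]
        else:
            s = summary[key]
            s[0] += 1
            s[1] += rtt
            s[2] = min(s[2], rtt)
            s[3] = max(s[3], rtt)
    return summary


def rollup(summary, link_id, start, end, minutes):
    """Combine summary buckets in [start, end) into coarser N-minute buckets.

    Returns {bucket_start: (avg, min, max, count)}.
    """
    coarse = {}
    for (lid, bucket), (n, total, lo, hi) in summary.items():
        if lid != link_id or not (start <= bucket < end):
            continue
        key = floor_to_bucket(bucket, minutes)
        if key in coarse:
            cn, ct, cl, ch = coarse[key]
            coarse[key] = (cn + n, ct + total, min(cl, lo), max(ch, hi))
        else:
            coarse[key] = (n, total, lo, hi)
    return {k: (t / n, lo, hi, n) for k, (n, t, lo, hi) in sorted(coarse.items())}


# Made-up rows for illustration only.
RAW = [
    (datetime(2015, 1, 1, 12, 3), 100.0, 1),
    (datetime(2015, 1, 1, 12, 7), 200.0, 1),
    (datetime(2015, 1, 1, 12, 34), 150.0, 1),
    (datetime(2015, 1, 1, 12, 5), 999.0, 2),
]
SUMMARY = build_summary(RAW, minutes=10)
print(rollup(SUMMARY, 1, datetime(2015, 1, 1), datetime(2015, 2, 1), 60))
```

The half-open range [start, end) mirrors the `time >= ... AND time < ... + INTERVAL 1 MONTH` pattern. In MySQL the same structure would be a table keyed on (link_id, bucket_start) that is updated as new rows arrive.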