Performance problem in MySQL - Sum() and select is taking too long - mysql

I have a performance problem in MySQL.
I have an InnoDB table with about 215,000 rows, and running SUM() on one column over only 1254 of those rows takes 500 ms.
The version I am using is MySQL 5.7.32.
The computer specs are the following:
Core i5 3.0 GHz
8 GB RAM
Solid State Drive
Here is the structure of the table:
mysql> select count(*) from cuenta_corriente;
+----------+
| count(*) |
+----------+
| 214514 |
+----------+
mysql> describe cuenta_corriente;
+-----------------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+-------------+------+-----+---------+----------------+
| ID_CUENTA_CORRIENTE | int(11) | NO | PRI | NULL | auto_increment |
| ID_CLIENTE | int(11) | YES | MUL | NULL | |
| ID_PROVEEDOR | int(11) | YES | MUL | NULL | |
| FECHA | varchar(50) | YES | MUL | NULL | |
| FECHA_FISCAL | varchar(50) | YES | | NULL | |
| ID_OPERACION | int(11) | YES | MUL | NULL | |
| ID_TIPO_OPERACION | int(11) | YES | MUL | NULL | |
| DEBE | double | YES | | 0 | |
| HABER | double | YES | | 0 | |
| TOTAL | double | YES | | 0 | |
| SALDO_ANTERIOR | double | YES | | 0 | |
| SALDO_ACTUAL | double | YES | | 0 | |
| ID_OPERACION_ASOCIADA | int(11) | YES | | NULL | |
| ELIMINADO | int(11) | YES | | 0 | |
| ID_EMPLEADO | int(11) | NO | | 0 | |
+-----------------------+-------------+------+-----+---------+----------------+
mysql> show indexes from cuenta_corriente;
+------------+---------------------------------------+---------------------+
| Non_unique | Key_name | Column_name |
+------------+---------------------------------------+---------------------+
| 0 | PRIMARY | ID_CUENTA_CORRIENTE |
| 1 | IDX_CUENTA_CORRIENTE | ID_CLIENTE |
| 1 | IX_cuenta_corriente_FECHA | FECHA |
| 1 | IX_cuenta_corriente_ID_CLIENTE | ID_CLIENTE |
| 1 | IX_cuenta_corriente_ID_PROVEEDOR | ID_PROVEEDOR |
| 1 | IX_cuenta_corriente_ID_TIPO_OPERACION | ID_TIPO_OPERACION |
| 1 | IX_cuenta_corriente_ID_OPERACION | ID_OPERACION |
| 1 | IDX_cuenta_corriente_ID_OPERACION | ID_OPERACION |
+------------+---------------------------------------+---------------------+
8 rows in set (0.00 sec)
The problem is with the following queries. In my opinion they are taking too long, considering that I have an index on the ID_CLIENTE column and that only 1254 rows have ID_CLIENTE = 19. Here are the query results:
mysql> SELECT COUNT(*) FROM CUENTA_CORRIENTE WHERE ID_CLIENTE = 19;
1254 rows
mysql> SELECT DEBE FROM CUENTA_CORRIENTE WHERE ID_CLIENTE = 19;
1254 rows - 0.513 sec
mysql> SELECT SUM(DEBE) FROM CUENTA_CORRIENTE WHERE ID_CLIENTE = 19;
0.582 sec
The strange thing is that if I select all the columns instead of selecting only the "DEBE" column, it takes less time:
mysql> SELECT * FROM CUENTA_CORRIENTE WHERE ID_CLIENTE = 19;
0.095 sec
Can anyone help me to improve the performance?

You can make just that query fast by creating a composite index to support it.
For example:
CREATE INDEX IDX_QUERY_FAST ON cuenta_corriente (ID_CLIENTE, DEBE);
But don't forget that each index has to be maintained, which slows down every insert into the table, so you don't want 200 indexes supporting every possible query.
With the existing index, the engine should be smart enough to identify the 1200 rows you care about using the index, but then it has to go read all the table records (across however many pages) that have the individual rows to get the DEBE column.

Add this index to help most of the queries you have shown:
INDEX(ID_CLIENTE, DEBE)
and drop INDEX(ID_CLIENTE) if you have such.
Are all of your secondary indexes starting with the PRIMARY KEY? (The names imply such; please provide SHOW CREATE TABLE to definitively say what columns are in each index.) Don't start an index with the PK; it is likely to be useless.
Run EXPLAIN SELECT ... to see which index a query uses.
When timing a query, run it twice. The first run may spend extra time loading index or data rows into cache (in RAM in the buffer_pool); the second run may be significantly faster because of the caching.
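A minimal sketch of what both answers suggest, using the table from the question. The index name idx_cliente_debe is made up for illustration, and the DROP INDEX step assumes nothing else depends on the two duplicate single-column ID_CLIENTE indexes under those names:

```sql
-- Composite index: the WHERE column first, then the column being summed,
-- so the query can be answered from the index alone.
ALTER TABLE cuenta_corriente
  ADD INDEX idx_cliente_debe (ID_CLIENTE, DEBE);  -- hypothetical name

-- SHOW INDEXES in the question lists two single-column indexes on
-- ID_CLIENTE; the composite index above makes both redundant.
ALTER TABLE cuenta_corriente
  DROP INDEX IDX_CUENTA_CORRIENTE,
  DROP INDEX IX_cuenta_corriente_ID_CLIENTE;

-- Check the plan: "key" should name the new index, and "Extra" should
-- contain "Using index" if the index covers the query.
EXPLAIN SELECT SUM(DEBE) FROM cuenta_corriente WHERE ID_CLIENTE = 19;
```

Run the SELECT twice after creating the index, as noted above, so that the timing is not dominated by the first load into the buffer pool.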

Related

Index not used in query. How to improve performance?

I have this query:
SELECT
*
FROM
`av_cita`
JOIN `av_cita_cstm` ON (
(
`av_cita`.`id` = `av_cita_cstm`.`id_c`
)
)
WHERE
av_cita.deleted = 0
This query takes over 120 seconds to finish, yet I have added all indexes.
When I ask for the execution plan:
explain SELECT * FROM `av_cita`
JOIN `av_cita_cstm` ON ( ( `av_cita`.`id` = `av_cita_cstm`.`id_c` ) )
WHERE av_cita.deleted = 0;
I get this:
+----+-------------+--------------+--------+----------------------+---------+---------+---------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+----------------------+---------+---------+---------------------------+--------+-------------+
| 1 | SIMPLE | av_cita | ALL | PRIMARY,delete_index | NULL | NULL | NULL | 192549 | Using where |
| 1 | SIMPLE | av_cita_cstm | eq_ref | PRIMARY | PRIMARY | 108 | rednacional_v2.av_cita.id | 1 | |
+----+-------------+--------------+--------+----------------------+---------+---------+---------------------------+--------+-------------+
delete_index is listed in the possible_keys column, but the key is null, and it doesn't use the index.
Table and index definitions:
+------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| id | char(36) | NO | PRI | NULL | |
| name | varchar(255) | YES | MUL | NULL | |
| date_entered | datetime | YES | MUL | NULL | |
| date_modified | datetime | YES | | NULL | |
| modified_user_id | char(36) | YES | | NULL | |
| created_by | char(36) | YES | MUL | NULL | |
| description | text | YES | | NULL | |
| deleted | tinyint(1) | YES | MUL | 0 | |
| assigned_user_id | char(36) | YES | MUL | NULL | |
+------------------+--------------+------+-----+---------+-------+
+---------+------------+--------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+--------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| av_cita | 0 | PRIMARY | 1 | id | A | 192786 | NULL | NULL | | BTREE | | |
| av_cita | 1 | delete_index | 1 | deleted | A | 2 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | name_index | 1 | name | A | 96393 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | date_entered_index | 1 | date_entered | A | 96393 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | created_by | 1 | created_by | A | 123 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | assigned_user_id | 1 | assigned_user_id | A | 1276 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | deleted_id | 1 | deleted | A | 2 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | deleted_id | 2 | id | A | 192786 | NULL | NULL | | BTREE | | |
| av_cita | 1 | id | 1 | id | A | 192786 | NULL | NULL | | BTREE | | |
+---------+------------+--------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
How can I improve the performance of this query?
The query is losing time on making the join. I would strongly suggest creating an index on av_cita_cstm.id_c. Then the plan will probably be changed to use that index for the av_cita_cstm table, which is much better than PRIMARY. As a consequence PRIMARY will be used on av_cita.
I think that will bring a big improvement. You might get further improvement if you make sure delete_index is defined on two fields, (deleted, id), and then move the WHERE condition of the SQL statement into the join condition. But I am not sure MySQL will see this as a possibility.
The index on deleted is not used probably because the optimizer has decided that a full table-scan is cheaper than using the index. MySQL tends to make this decision if the value you search for is found on about 20% or more of the rows in the table.
By analogy, think of the index at the back of a book. You can understand why common words like "the" aren't indexed. It would be easier to just read the book cover-to-cover than to flip back and forth to the index, which only tells you that "the" appears on a majority of pages.
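To see whether that rule of thumb applies here, one could count how the rows are split between the values of deleted. This is a quick check that is not shown in the question, and the alias cnt is arbitrary:

```sql
-- How many rows carry each value of the indexed flag?
SELECT deleted, COUNT(*) AS cnt
FROM av_cita
GROUP BY deleted;
```

If nearly all rows have deleted = 0, the index filters out almost nothing and a table scan is the cheaper plan.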
If you think MySQL has made the wrong decision, you can make it pretend that a table-scan is more expensive than using a specific index:
SELECT
*
FROM
`av_cita` FORCE INDEX (delete_index)
JOIN `av_cita_cstm` ON (
(
`av_cita`.`id` = `av_cita_cstm`.`id_c`
)
)
WHERE
av_cita.deleted = 0
Read http://dev.mysql.com/doc/refman/5.7/en/index-hints.html for more information about index hints. Don't overuse index hints; they're useful only in rare cases. Most of the time the optimizer makes the right decision.
Your EXPLAIN plan shows that your join to av_cita_cstm is already using a unique index (the clue is "type: eq_ref" and also the "rows: 1"). I don't think any new index is needed in that table.
I notice the EXPLAIN shows that the table-scan on av_cita scans about an estimated 192549 rows. I'm really surprised that this takes 120 seconds. On any reasonably powerful computer, that should run much faster.
That makes me wonder if you have something else that needs tuning or configuration on this server:
What other processes are running on the server? A lot of applications, perhaps? Are the other processes also running slowly on this server? Do you need to increase the power of the server, or move applications onto their own server?
If you're on MySQL 5.7, try querying the sys schema like this:
select * from sys.innodb_buffer_stats_by_table
where object_name like 'av_cita%';
Are there other costly SQL queries running concurrently?
Did you under-allocate MySQL's innodb_buffer_pool_size? If it's too small, it could be furiously recycling pages in RAM as it scans your table.
select @@innodb_buffer_pool_size;
Did you over-allocate innodb_buffer_pool_size? Once I helped tune a server that was running very slowly. It turned out they had a 4GB buffer pool, but only 1GB of physical RAM. The operating system was swapping like crazy, causing everything to run slowly.
Another thought: You have shown us the columns in av_cita, but not the table structure for av_cita_cstm. Why are you fetching SELECT *? Do you really need all the columns? Are there huge BLOB/TEXT columns in the latter table? If so, it could be reading a large amount of data from disk that you don't need.
When you ask SQL questions, it would help if you run
SHOW CREATE TABLE av_cita\G
SHOW TABLE STATUS LIKE 'av_cita'\G
And also run the same commands for the other table av_cita_cstm, and include the output in your question above.
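As a rough way to compare the buffer pool against the data it has to hold, something like the following could be run. It assumes the database containing the two tables is currently selected; the sizes reported by information_schema are estimates:

```sql
-- Configured buffer pool size, in MB.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

-- Approximate size (data plus indexes) of the two tables, in MB.
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name LIKE 'av_cita%';
```

If the tables are much larger than the buffer pool, or the buffer pool is larger than physical RAM, that points at the configuration problems described above.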

What index should increase performance of select query?

This is the table structure:
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| visitor_hash | varchar(40) | YES | MUL | NULL | |
| uri | varchar(255) | YES | | NULL | |
| ip_address | char(15) | YES | MUL | NULL | |
| last_visit | datetime | YES | | NULL | |
| visits | int(11) | NO | | NULL | |
| object_app | varchar(255) | YES | MUL | NULL | |
| object_model | varchar(255) | YES | | NULL | |
| object_id | varchar(255) | YES | | NULL | |
| blocked | tinyint(1) | NO | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
This is the query:
SELECT `object_id`
FROM `visits_visit`
WHERE `object_model` = 'News'
GROUP BY `object_id`
ORDER BY COUNT( * ) DESC
LIMIT 0, 3
The response time is ~77.63 ms.
CREATE INDEX resource_model ON visits_visit (object_model(100));
After creating this index, the response time increased to ~150 ms.
How can I improve performance in this case? Thank you.
UPDATED:
In reply to Michal Komorowski:
This is the EXPLAIN output before adding the index:
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | visits_visit | ALL | NULL | NULL | NULL | NULL | 142938 | Using where; Using temporary; Using filesort |
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------------------------------------+
1 row in set (0.00 sec)
And this is after adding the index:
+----+-------------+--------------+------+----------------+----------------+---------+-------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------+----------------+---------+-------+-------+----------------------------------------------+
| 1 | SIMPLE | visits_visit | ref | resource_model | resource_model | 303 | const | 64959 | Using where; Using temporary; Using filesort |
+----+-------------+--------------+------+----------------+----------------+---------+-------+-------+----------------------------------------------+
1 row in set (0.00 sec)
I don't know what this information tells me.
SELECT `object_id`
FROM `visits_visit`
WHERE `object_model` = 'News'
GROUP BY `object_id`
ORDER BY COUNT( * ) DESC
LIMIT 0, 3
78.85 ms before indexing and 365.59 ms after indexing.
I also have this index:
CREATE INDEX resource ON visits_visit (object_app(100), object_model(100), object_id(100));
But I need this one, because the WHERE clauses of other SELECT queries contain all three of these keys.
UPDATE:
I'm using django debug toolbar to test performance of requests.
UPDATE:
Query:
ANALYZE TABLE visits_visit;
Output:
+-----------------------------+---------+----------+-----------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------------------+---------+----------+-----------------------------+
| **************.visits_visit | analyze | status | Table is already up to date |
+-----------------------------+---------+----------+-----------------------------+
1 row in set (0.00 sec)
UPDATE:
SHOW INDEXES FROM visits_visit;
Output:
+--------------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| visits_visit | 0 | PRIMARY | 1 | id | A | 142938 | NULL | NULL | | BTREE | | |
| visits_visit | 1 | visits_visit_0880babc | 1 | visitor_hash | A | 142938 | NULL | NULL | YES | BTREE | | |
| visits_visit | 1 | visits_visit_5325a746 | 1 | ip_address | A | 142938 | NULL | NULL | YES | BTREE | | |
| visits_visit | 1 | resource | 1 | object_app | A | 1 | 100 | NULL | YES | BTREE | | |
| visits_visit | 1 | resource | 2 | object_model | A | 3 | 100 | NULL | YES | BTREE | | |
| visits_visit | 1 | resource | 3 | object_id | A | 959 | 100 | NULL | YES | BTREE | | |
+--------------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
It seems to me that although you have an index, MySQL doesn't know how to use it properly. This happens when the information about data distribution (statistics) within a table is not up to date. To update it, run ANALYZE TABLE visits_visit and then check the results.
I was confused because I misunderstood how SQL works internally, so I decided to create a Popular model and save instances to it every 24 hours. Thanks to everyone who tried to help.
As I said in your other question, Prefix indexes are virtually useless; don't use them except in rare circumstances.
Shrink the fields to reasonable lengths and you won't be tempted to use Prefix indexes.
The optimal index for that query is INDEX(object_model, object_id). Attempting to use INDEX(object_model(##), ...) will not get past object_model to anything after it.
If object_model is things like 'News', I suspect the other possible values are short, and perhaps there is a finite number of models. For "short" change to some smaller VARCHAR. For "finite", consider using ENUM('News', 'Weather', 'Sports', ...).
As for why it took longer after indexing...
Without the index, the Optimizer had no choice but to scan the entire table. This is a simple linear scan. It would read but not count any non-News rows.
With the index, the Optimizer has the additional choice of using the index. But, perhaps most rows are News? Well, it would scan the index (nice), but for each News item in the index, it would have to look up the row to get object_id (not so nice). It seems (from the timings) that the latter is less efficient.
By shrinking the declarations and using INDEX(object_model, object_id) (in this order), the query can be performed in the index. Think of the index as a mini-table with just those two columns in it. It is smaller. It is ordered by model, so it only needs to scan the 'News' part. The explain will show this "covering" by saying "Using index".
In all cases, the GROUP BY adds some overhead -- either keeping a hash of object_id in RAM or by saving intermediate results and sorting them. Then the ORDER BY requires a sort (or a priority hash) before the LIMIT can apply.
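A sketch of the suggested index on the table from the question. The index name is made up, and the statement indexes the columns in full (no prefix lengths). This assumes each VARCHAR(255) fits within InnoDB's per-column key length limit for this table's character set and row format; if ADD INDEX fails with a "key was too long" error, shrink the column declarations first, as advised above:

```sql
-- Filter column first, then the GROUP BY column, both without prefixes.
ALTER TABLE visits_visit
  ADD INDEX idx_model_object (object_model, object_id);  -- hypothetical name

-- "Extra" should now include "Using index" if the query is covered.
EXPLAIN
SELECT object_id
FROM visits_visit
WHERE object_model = 'News'
GROUP BY object_id
ORDER BY COUNT(*) DESC
LIMIT 3;
```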

mysql query running too slow for discontinuous data

I am new to MySQL and am trying to use it on a project that tracks players' performance.
Below are the table fields.
+-------------------+----------------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------------------+----------------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+
| unique_id | int(11) | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | |
| record_time | datetime | NULL | NO | | NULL | | select,insert,update,references | |
| game_sourceid | char(20) | latin1_swedish_ci | NO | MUL | NULL | | select,insert,update,references | |
| game_number | smallint(6) | NULL | NO | | NULL | | select,insert,update,references | |
| game_difficulty | char(12) | latin1_swedish_ci | NO | MUL | NULL | | select,insert,update,references | |
| cost_time | smallint(5) unsigned | NULL | NO | MUL | NULL | | select,insert,update,references | |
| country | char(3) | latin1_swedish_ci | NO | | NULL | | select,insert,update,references | |
| source | char(7) | latin1_swedish_ci | NO | | NULL | | select,insert,update,references | |
+-------------------+----------------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+
I have added indexes on game_sourceid and game_difficulty, and the engine is InnoDB.
I have inserted about 11 million rows of test data into this table; the data is randomly generated but resembles the real data.
The most common query looks like this; it gets the average time and best time for a specific game_sourceid:
SELECT avg(cost_time) AS avgtime
, min(cost_time) AS mintime
, count(*) AS count
FROM statistics_work_table
WHERE game_sourceid = 'standard_easy_1';
+-----------+---------+--------+
| avgtime | mintime | count |
+-----------+---------+--------+
| 1681.2851 | 420 | 138034 |
+-----------+---------+--------+
1 row in set (4.97 sec)
The query took about 5 s.
I googled this, and someone said it may be caused by the number of rows the query matches, so I tried to narrow the scope like this:
SELECT avg(cost_time) AS avgtime
, min(cost_time) AS mintime
, count(*) AS count
FROM statistics_work_table
WHERE game_sourceid = 'standard_easy_1'
AND record_time > '2015-11-19 04:40:00';
+-----------+---------+-------+
| avgtime | mintime | count |
+-----------+---------+-------+
| 1275.2222 | 214 | 9 |
+-----------+---------+-------+
1 row in set (4.46 sec)
As you can see, the query matching only 9 rows also took about 5 s, so I don't think the number of matched rows is the problem.
The test data was generated randomly to simulate real users' activity, so the rows for a given game_sourceid are scattered (discontinuous) across the table. So I added more continuous data (about 250k rows) with the same game_sourceid = 'standard_easy_9', keeping all other fields random; in other words, the last 250k rows in this table have the same game_sourceid. Then I ran this query:
SELECT avg(cost_time) AS avgtime
, min(cost_time) AS mintime
, count(*) AS count
FROM statistics_work_table
WHERE game_sourceid = 'standard_easy_9';
+-----------+---------+--------+
| avgtime | mintime | count |
+-----------+---------+--------+
| 1271.4806 | 70 | 259379 |
+-----------+---------+--------+
1 row in set (0.40 sec)
This time the query magically took only 0.4 s, which is totally beyond my expectations.
So here's the question: the data is received from players in real time, so it will necessarily be random and discontinuous.
I am thinking of separating the data into multiple tables by game_sourceid, but that would take another 80 tables, maybe more in the future.
Since I am new to MySQL, I am wondering whether there are any other solutions for this, or whether my query is just badly written.
Update: Here's the index of my table
mysql> show index from statistics_work_table;
+-----------------------+------------+-------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------------+------------+-------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| statistics_work_table | 0 | PRIMARY | 1 | unique_id | A | 11362113 | NULL | NULL | | BTREE | | |
| statistics_work_table | 1 | GameSourceId_CostTime | 1 | game_sourceid | A | 18 | NULL | NULL | | BTREE | | |
| statistics_work_table | 1 | GameSourceId_CostTime | 2 | cost_time | A | 344306 | NULL | NULL | | BTREE | | |
+-----------------------+------------+-------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
ALTER TABLE `statistics_work_table`
ADD INDEX `GameSourceId_CostTime` (`game_sourceid`,`cost_time`)
This index should make your queries super fast. Also, after you run the above statement, you should drop the single-column index you have on game_sourceid, as the above makes the single-column one redundant (and keeping a redundant index hurts insert speed).
The reason your queries are slow is because the database is using your single column index on game_sourceid, finding the rows, and then, for each row, using the primary key that is stored along with the index to find the main clustered index (aka primary key in this, and most cases), and then looking up the cost_time value. This is referred to as a double lookup, and it is something you want to avoid.
The index I provided above is called a "covering index". It allows your query to use ONLY the index, and so you only need a single lookup per row, greatly improving performance.
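The second query in the question also filters on record_time, which the (game_sourceid, cost_time) index does not contain, so that query would still have to visit the table rows to check the time. A possible variant, assuming the time-filtered query matters enough to justify another index (the index name is made up):

```sql
-- Equality column first, then the range column, then the aggregated
-- column so that the index still covers the query.
ALTER TABLE statistics_work_table
  ADD INDEX idx_source_time_cost (game_sourceid, record_time, cost_time);

-- "Extra" should contain "Using index" if the query is covered.
EXPLAIN
SELECT avg(cost_time) AS avgtime
     , min(cost_time) AS mintime
     , count(*) AS count
FROM statistics_work_table
WHERE game_sourceid = 'standard_easy_1'
  AND record_time > '2015-11-19 04:40:00';
```

On an 11-million-row table each extra index adds to insert cost, so this is only worth it if the time-bounded form is actually used.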

In a very large MySQL analytics table - should I index the timestamp?

I'm looking to improve the speed of queries on a very large MySQL analytics table that I have. This table tracks player counts on game servers, and the structure looks like this:
CREATE TABLE `server_tracker` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`ip` int(10) unsigned NOT NULL,
`port` smallint(5) unsigned NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`players` tinyint(3) unsigned NOT NULL,
`map` varchar(28) NOT NULL,
`portjoin` smallint(5) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_tracking_ip_port` (`ip`,`port`)
) ENGINE=InnoDB AUTO_INCREMENT=310729056 DEFAULT CHARSET=utf8 ROW_FORMAT=FIXED
This table is inserted into very frequently, with 10k+ servers being tracked 10+ times an hour. However, every hour the data is taken and averaged out, and put into an "averaged" table with basically the same structure.
Currently I have the IP/port setup as key. However - sometimes it can be a tad slow when doing that hourly averaging - so I am curious if it would be worth putting an index on the timestamp, which is frequently used to select data from a certain timeframe like so:
SELECT `players`
FROM `server_tracker`
WHERE `ip` = x
AND `port` = x
AND `date` > NOW()
AND `date` < NOW() + INTERVAL 60 MINUTE
ORDER BY `id` DESC
This is the only type of query run on this table. The table is only used for fetching the player count from game servers within a specific timeframe. The data is never updated or changed.
However, I am a bit new to all of this - and I am not sure if putting an index on the timestamp would do much of anything. Just looking for some friendly advice.
Results of EXPLAIN SELECT players FROM server_tracker WHERE ip = x AND port = x AND date > NOW() AND date < NOW() + INTERVAL 60 MINUTE ORDER BY id DESC
+----+-------------+-----------------+------+----------------------+----------------------+---------+-------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+----------------------+----------------------+---------+-------------+-------+-------------+
| 1 | SIMPLE | server_tracker | ref | idx_tracking_ip_port | idx_tracking_ip_port | 6 | const,const | 15354 | Using where |
+----+-------------+-----------------+------+----------------------+----------------------+---------+-------------+-------+-------------+
One of the most important things to know about MySQL is that, with very few exceptions, it uses only ONE index per table in a query.
So it does not help much to put a separate index on each column when all 4 conditions appear together in the WHERE clause.
Only a combined index over these fields helps.
The order of the fields matters a great deal, so that this index can also be used for other queries.
An example:
An index on field1, field2 and field3 is used when the WHERE clause contains field1, or field1 and field2, or field1, field2 and field3. This index is not used if the WHERE clause contains only field2, or only field3, or field2 and field3. So always include the first field.
To find out easily how your query is executed, simply run it with EXPLAIN; you will see directly whether an index is used, and which one. If the output has several lines, you can multiply the individual values under rows together as an indicator. The smaller this number is, the better your query performs.
MariaDB [tmp]> EXPLAIN select * from content;
+------+-------------+---------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------+------+---------------+------+---------+------+------+-------+
| 1 | SIMPLE | content | ALL | NULL | NULL | NULL | NULL | 13 | |
+------+-------------+---------+------+---------------+------+---------+------+------+-------+
1 row in set (0.00 sec)
MariaDB [tmp]>
Alternatively, you can use the profiler to see how long the query spends in each stage, and optimize the server accordingly.
An example:
MariaDB [(none)]> use tmp
Database changed
MariaDB [tmp]> SET PROFILING=ON;
Query OK, 0 rows affected (0.00 sec)
MariaDB [tmp]>
MariaDB [tmp]> SELECT * FROM content;
+----+------+---------------------+--------+------+--------------+------+------+------+------+
| id | Wert | Zeitstempel | WertID | aaa | d | e | wwww | n | ddd |
+----+------+---------------------+--------+------+--------------+------+------+------+------+
| 1 | 10 | 2001-01-01 00:00:00 | 1 | NULL | 1.5000 | NULL | NULL | 1 | NULL |
| 2 | 12.3 | 2001-01-01 00:01:00 | 2 | NULL | 2.5000 | NULL | NULL | 2 | NULL |
| 3 | 17.4 | 2001-01-01 00:02:00 | 3 | NULL | 123456.1250 | NULL | NULL | 3 | NULL |
| 4 | 10.9 | 2001-01-01 01:01:00 | 1 | NULL | 1000000.0000 | NULL | NULL | 4 | NULL |
| 5 | 15.4 | 2001-01-01 01:02:00 | 2 | NULL | NULL | NULL | NULL | 5 | NULL |
| 6 | 20.9 | 2001-01-01 01:03:00 | 3 | NULL | NULL | NULL | NULL | 6 | NULL |
| 7 | 22 | 2001-01-02 00:00:00 | 1 | NULL | NULL | NULL | NULL | 7 | NULL |
| 8 | 12.3 | 2001-01-02 00:01:00 | 2 | NULL | NULL | NULL | NULL | 8 | NULL |
| 9 | 17.4 | 2001-01-02 00:02:00 | 3 | NULL | NULL | NULL | NULL |
+----+------+---------------------+--------+------+--------------+------+------+------+------+
13 rows in set (0.00 sec)
MariaDB [tmp]>
MariaDB [tmp]> SHOW PROFILE;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000031 |
| checking permissions | 0.000005 |
| Opening tables | 0.000036 |
| After opening tables | 0.000004 |
| System lock | 0.000003 |
| Table lock | 0.000002 |
| After opening tables | 0.000005 |
| init | 0.000013 |
| optimizing | 0.000006 |
| statistics | 0.000013 |
| preparing | 0.000010 |
| executing | 0.000002 |
| Sending data | 0.000073 |
| end | 0.000003 |
| query end | 0.000003 |
| closing tables | 0.000006 |
| freeing items | 0.000003 |
| updating status | 0.000012 |
| cleaning up | 0.000003 |
+----------------------+----------+
19 rows in set (0.00 sec)
MariaDB [tmp]>
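Applied to the server_tracker table from the question, the combined-index advice might look like the sketch below. The index name and the literal ip and port values are placeholders, not values from the question:

```sql
-- Equality columns first (ip, port), then the range column (date).
ALTER TABLE server_tracker
  ADD INDEX idx_ip_port_date (ip, port, `date`);  -- hypothetical name

-- idx_tracking_ip_port (ip, port) is a leftmost prefix of the new index,
-- so it could be dropped once the plan below has been verified.

EXPLAIN
SELECT players
FROM server_tracker
WHERE ip = 2130706433      -- placeholder value
  AND port = 27015         -- placeholder value
  AND `date` > NOW()
  AND `date` < NOW() + INTERVAL 60 MINUTE
ORDER BY id DESC;
```

In the EXPLAIN output, the key column should show the new index and the rows estimate should drop well below the 15354 reported for the (ip, port) index alone.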

How can a 'WHERE column LIKE "%expression%" ' perform better than a MATCH(column) AGAINST("expression") in MySQL?

I've run into a serious MySQL performance bottleneck which I'm unable to understand and resolve. Here are the table structures, indexes and record counts (bear with me, it's only two tables):
mysql> desc elggobjects_entity;
+-------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+-------+
| guid | bigint(20) unsigned | NO | PRI | NULL | |
| title | text | NO | MUL | NULL | |
| description | text | NO | | NULL | |
+-------------+---------------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
mysql> show index from elggobjects_entity;
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| elggobjects_entity | 0 | PRIMARY | 1 | guid | A | 613637 | NULL | NULL | | BTREE | |
| elggobjects_entity | 1 | title | 1 | title | NULL | 131 | NULL | NULL | | FULLTEXT | |
| elggobjects_entity | 1 | title | 2 | description | NULL | 131 | NULL | NULL | | FULLTEXT | |
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.00 sec)
mysql> select count(*) from elggobjects_entity;
+----------+
| count(*) |
+----------+
| 613637 |
+----------+
1 row in set (0.00 sec)
mysql> desc elggentity_relationships;
+--------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| guid_one | bigint(20) unsigned | NO | MUL | NULL | |
| relationship | varchar(50) | NO | MUL | NULL | |
| guid_two | bigint(20) unsigned | NO | MUL | NULL | |
| time_created | int(11) | NO | | NULL | |
+--------------+---------------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql> show index from elggentity_relationships;
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| elggentity_relationships | 0 | PRIMARY | 1 | id | A | 11408236 | NULL | NULL | | BTREE | |
| elggentity_relationships | 0 | guid_one | 1 | guid_one | A | NULL | NULL | NULL | | BTREE | |
| elggentity_relationships | 0 | guid_one | 2 | relationship | A | NULL | NULL | NULL | | BTREE | |
| elggentity_relationships | 0 | guid_one | 3 | guid_two | A | 11408236 | NULL | NULL | | BTREE | |
| elggentity_relationships | 1 | relationship | 1 | relationship | A | 11408236 | NULL | NULL | | BTREE | |
| elggentity_relationships | 1 | guid_two | 1 | guid_two | A | 11408236 | NULL | NULL | | BTREE | |
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.00 sec)
mysql> select count(*) from elggentity_relationships;
+----------+
| count(*) |
+----------+
| 11408236 |
+----------+
1 row in set (0.00 sec)
Now I'd like to use an INNER JOIN on those two tables and perform a full text search.
Query:
SELECT
count(DISTINCT o.guid) as total
FROM
elggobjects_entity o
INNER JOIN
elggentity_relationships r on (r.relationship="image" AND r.guid_one = o.guid)
WHERE
((MATCH (o.title, o.description) AGAINST ('scelerisque' )))
This gave me a 6 minute (!) response time.
On the other hand this one
SELECT
count(DISTINCT o.guid) as total
FROM
elggobjects_entity o
INNER JOIN
elggentity_relationships r on (r.relationship="image" AND r.guid_one = o.guid)
WHERE
((o.title like "%scelerisque%") OR (o.description like "%scelerisque%"))
returned the same count value in 0.02 seconds.
How is that possible? What am I missing here?
(MySQL info: mysql Ver 14.14 Distrib 5.1.49, for debian-linux-gnu (x86_64) using readline 6.1)
EDIT
EXPLAINing the first query (using match .. against) gives:
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
| 1 | SIMPLE | r | ref | guid_one,relationship | relationship | 152 | const | 6145 | Using where |
| 1 | SIMPLE | o | fulltext | PRIMARY,title | title | 0 | | 1 | Using where |
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
while the second query (using LIKE "%..%"):
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
| 1 | SIMPLE | r | ref | guid_one,relationship | relationship | 152 | const | 6145 | Using where |
| 1 | SIMPLE | o | eq_ref | PRIMARY | PRIMARY | 8 | elgg1710.r.guid_one | 1 | Using where |
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
Judging by your timings and the EXPLAIN output, the fulltext index is not as useful as you would expect in this particular case. How useful it is depends on the data in your database, on the table structure and on the query itself.
Usually a database engine uses no more than one index per table in a query. So when a table has more than one index, the query optimizer tries to pick the best one. But the optimizer is not always clever enough.
EXPLAIN's output shows that the query optimizer decided to use the relationship and title indexes. The relationship filter reduces table elggentity_relationships to 6145 rows, and the title (fulltext) filter reduces table elggobjects_entity to 72697 rows. MySQL then has to join those two row sets (6145 x 72697 = 446723065 comparisons) without any index, because the indexes have already been used for filtering. That can be too much work. MySQL may even decide to keep intermediate results on disk in order to keep enough memory free.
Now let's look at the other query. It uses relationship and the PRIMARY KEY (of table elggobjects_entity) as its indexes. The relationship filter again reduces table elggentity_relationships to 6145 rows. Joining the tables on the PRIMARY KEY index leaves only 3957 rows. That is not much for the last filter (i.e. LIKE "%scelerisque%") to process, even though no index is used for it at all.
As you can see, the speed depends heavily on which indexes are selected for a query. In this particular case the PRIMARY KEY index is much more useful than the fulltext title index, because it reduces the result set far more than title does.
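The intermediate row counts discussed above can be checked on your own data with a few COUNT queries. This is only a sketch built from the table and column names in the question; the numbers you get depend entirely on your data.

```sql
-- Rows of elggentity_relationships that pass the relationship filter
SELECT COUNT(*)
FROM elggentity_relationships
WHERE relationship = 'image';

-- Rows of elggobjects_entity that match the full-text search
SELECT COUNT(*)
FROM elggobjects_entity
WHERE MATCH (title, description) AGAINST ('scelerisque');

-- Rows left after joining on the primary key, before any text filter
SELECT COUNT(*)
FROM elggentity_relationships r
INNER JOIN elggobjects_entity o ON o.guid = r.guid_one
WHERE r.relationship = 'image';
```

Comparing the second and third counts shows which filter narrows the result more, which is the point the two EXPLAIN plans are making.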
MySQL is not always clever enough to pick the right indexes. We can steer it manually with clauses like IGNORE INDEX (index_name), FORCE INDEX (index_name), etc.
But in your case the problem is that a query using MATCH() AGAINST() requires the fulltext index, because MATCH() AGAINST() doesn't work without a fulltext index at all. This is the main reason why MySQL has chosen the wrong indexes for the query.
UPDATE
OK, I did some investigation.
First, you can try to make MySQL use the guid_one index instead of relationship on table elggentity_relationships by adding USE INDEX (guid_one).
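As a sketch of where the hint goes, here is the MATCH ... AGAINST query from the question with the hint added. The hint follows the table alias and comes before ON. Whether it actually changes the plan on your data is something to confirm with EXPLAIN, not something this example guarantees.

```sql
-- Original full-text query, with an index hint on the relationships table
SELECT
    COUNT(DISTINCT o.guid) AS total
FROM
    elggobjects_entity o
INNER JOIN
    elggentity_relationships r USE INDEX (guid_one)
        ON (r.relationship = 'image' AND r.guid_one = o.guid)
WHERE
    MATCH (o.title, o.description) AGAINST ('scelerisque');
```

Prefix the statement with EXPLAIN to see which key is chosen for r. If USE INDEX is not enough, FORCE INDEX (guid_one) is the stronger variant of the same hint.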
But for even better performance I think you can create a single index on the combination of the two columns (guid_one, relationship). The current guid_one index is very similar, but it covers 3 columns, not 2, and this query uses only 2 of them. In my opinion MySQL should automatically use the right index once it has been created. If not, force MySQL to use it.
Note: After creating the index, don't forget to remove the old USE INDEX instruction from your query, because it may prevent the query from using the newly created index. :)
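A minimal sketch of the two-column index suggestion follows. The index name idx_guid_one_relationship is made up for illustration; the table and column names are the ones from the question. On a table with about 11 million rows, building the index may take a while.

```sql
-- Hypothetical index name; the columns are the two used in the join condition
CREATE INDEX idx_guid_one_relationship
    ON elggentity_relationships (guid_one, relationship);

-- Re-check the plan afterwards, with no index hint in the query
EXPLAIN
SELECT
    COUNT(DISTINCT o.guid) AS total
FROM
    elggobjects_entity o
INNER JOIN
    elggentity_relationships r
        ON (r.relationship = 'image' AND r.guid_one = o.guid)
WHERE
    MATCH (o.title, o.description) AGAINST ('scelerisque');
```

Since the existing guid_one index already starts with the same two columns, compare the EXPLAIN output before and after to confirm that the new index is actually chosen and that it makes a difference.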