Joining tables on foreign key: MySQL performance

Joining tables on foreign key: MySQL performance - mysql

I have two tables that have a foreign key constraint between them
Table event
mysql> describe event;
+------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| sid | int(10) unsigned | NO | PRI | NULL | |
| cid | int(10) unsigned | NO | PRI | NULL | |
| signature | int(10) unsigned | NO | MUL | NULL | |
| timestamp | datetime | NO | MUL | NULL | |
| is_deleted | tinyint(1) | NO | MUL | 0 | |
+------------+------------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
Table signature
mysql> describe signature;
+--------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+----------------+
| sig_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| sig_name | varchar(255) | NO | MUL | NULL | |
| sig_class_id | int(10) unsigned | NO | MUL | NULL | |
| sig_priority | int(10) unsigned | YES | | NULL | |
| sig_rev | int(10) unsigned | YES | | NULL | |
| sig_sid | int(10) unsigned | YES | | NULL | |
| sig_gid | int(10) unsigned | YES | | NULL | |
+--------------+------------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)
event.signature is a foreign key and links to signature.sig_id. Both have indexes as well
Table event is large(say 1M records) while table signature will be comparatively small (Few thousand at most)
Joined Queries that access any signature attribute take a very long time to execute. A look at explain
mysql> explain select event.sid,event.cid,signature.sig_name from event join signature on signature.sig_id=event.signature;
+----+-------------+-----------+------+--------------------------------+-----------------------+---------+-------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+--------------------------------+-----------------------+---------+-------------------------+------+--------------------------+
| 1 | SIMPLE | signature | ALL | PRIMARY,index_signature_sig_id | NULL | NULL | NULL | 127 | |
| 1 | SIMPLE | event | ref | index_event_signature | index_event_signature | 5 | snorby.signature.sig_id | 68 | Using where; Using index |
+----+-------------+-----------+------+--------------------------------+-----------------------+---------+-------------------------+------+--------------------------+
2 rows in set (0.00 sec)
While if no signature attribute is accessed
mysql> explain select event.sid,event.cid from event join signature on signature.sig_id=event.signature;
+----+-------------+-----------+-------+--------------------------------+------------------------+---------+-------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+--------------------------------+------------------------+---------+-------------------------+------+--------------------------+
| 1 | SIMPLE | signature | index | PRIMARY,index_signature_sig_id | index_signature_sig_id | 4 | NULL | 127 | Using index |
| 1 | SIMPLE | event | ref | index_event_signature | index_event_signature | 5 | snorby.signature.sig_id | 68 | Using where; Using index |
+----+-------------+-----------+-------+--------------------------------+------------------------+---------+-------------------------+------+--------------------------+
2 rows in set (0.00 sec)
As can be seen if Signature attribute is queried it does a full scan with ALL join type.
Is it possible to rewrite the query to be faster? I ask this because this is a part of multiple table join and joining event with signature is the bottleneck that is slowing down the query tremendously
I am using 5.1.52 MySQL and SQLAlchemy 0.7.8 as ORM

Your query does require a full scan, by definition.
That is, you give no filtering condition. No ... WHERE sig_rev = 17, for example.
Therefore, there is not much to improve here. MySQL picks a table to start with, does a full scan, and per row fetches matching rows from second table.
So the scan is essential. But you may turn it into an index scan instead of a table scan. I am assuming you have an index on the signature column only, and on the sig_id column only.
What you may do is create an additional index on sig_id, sig_name, like this:
ALTER TABLE signature ADD UNIQUE INDEX(sig_id, sig_name);
The index is unique by definition, since it is broader than the PRIMARY KEY, but this is outside the point.
What you may gain now is an execution plan similar to the second example you have posted: an index scan on signature, followed by an index lookup on event.
Make sure to compare and verify that you do get performance boost on this particular query. Check that the new index does not hurt INSERT performance etc.
Good luck.

Related

Performance problem in MySQL - Sum() and select is taking too long

i have a performance problem in MySQL.
I have a table with 215000 rows (InnoDB Engine) inserted in it and to execute the function SUM() on one column for only 1254 rows is taking 500ms.
The version i am using is : MySQL 5.7.32
The computer specs are the following:
Core I5 3.0 Ghz
8 Gb Ram
Solid State Drive
Here i leave information about the structure of the database:
mysql> select count(*) from cuenta_corriente;
+----------+
| count(*) |
+----------+
| 214514 |
+----------+
mysql> describe cuenta_corriente;
+-----------------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+-------------+------+-----+---------+----------------+
| ID_CUENTA_CORRIENTE | int(11) | NO | PRI | NULL | auto_increment |
| ID_CLIENTE | int(11) | YES | MUL | NULL | |
| ID_PROVEEDOR | int(11) | YES | MUL | NULL | |
| FECHA | varchar(50) | YES | MUL | NULL | |
| FECHA_FISCAL | varchar(50) | YES | | NULL | |
| ID_OPERACION | int(11) | YES | MUL | NULL | |
| ID_TIPO_OPERACION | int(11) | YES | MUL | NULL | |
| DEBE | double | YES | | 0 | |
| HABER | double | YES | | 0 | |
| TOTAL | double | YES | | 0 | |
| SALDO_ANTERIOR | double | YES | | 0 | |
| SALDO_ACTUAL | double | YES | | 0 | |
| ID_OPERACION_ASOCIADA | int(11) | YES | | NULL | |
| ELIMINADO | int(11) | YES | | 0 | |
| ID_EMPLEADO | int(11) | NO | | 0 | |
+-----------------------+-------------+------+-----+---------+----------------+
show indexes from cuenta_corriente;
+------------+------------------------------------------------------------+
| Non_unique | Key_name Column_name |
+------------+------------------------------------------------------------+
| 0 | PRIMARY ID_CUENTA_CORRIENTE |
| 1 | IDX_CUENTA_CORRIENTE ID_CLIENTE |
| 1 | IX_cuenta_corriente_FECHA FECHA |
| 1 | IX_cuenta_corriente_ID_CLIENTE ID_CLIENTE |
| 1 | IX_cuenta_corriente_ID_PROVEEDOR ID_PROVEEDOR |
| 1 | IX_cuenta_corriente_ID_TIPO_OPERACION ID_TIPO_OPERACION |
| 1 | IX_cuenta_corriente_ID_OPERACION ID_OPERACION |
| 1 | IDX_cuenta_corriente_ID_OPERACION ID_OPERACION |
+------------+------------------------------------------------------------+
8 rows in set (0.00 sec)
The problem is with the folowing queries, in my opinion they are taking too long, considering that i have an index for the column ID_CLIENTE and that there are only 1254 rows with the ID_CLIENTE column = 19. Here are the query results:
mysql> SELECT COUNT(*) FROM CUENTA_CORRIENTE WHERE ID_CLIENTE = 19;
1254 rows
mysql> SELECT DEBE FROM CUENTA_CORRIENTE WHERE ID_CLIENTE = 19;
1254 rows - 0.513 sec
mysql> SELECT SUM(DEBE) FROM CUENTA_CORRIENTE WHERE ID_CLIENTE = 19;
0.582 sec
The strange thing is if i select all the columns instead selecting only the "DEBE" column, it takes less time:
mysql> SELECT * FROM CUENTA_CORRIENTE WHERE ID_CLIENTE = 19;
0.095 sec
Can anyone help me to improve the performance?

You can make just that query fast by creating a composite index to support it.
ie:
CREATE INDEX IDX_QUERY_FAST ON cuenta_corriente (ID_CLIENTE, DEBE)
But don't forget, each index has to be maintained, so it slows down any inserts into the table, so you don't want 200 indexes supporting every possible query.
With the existing index, the engine should be smart enough to identify the 1200 rows you care about using the index, but then it has to go read all the table records (across however many pages) that have the individual rows to get the DEBE column.

Add this index to help most of the queries you have shown:
INDEX(ID_CLIENTE, DEBE)
and drop INDEX(ID_CLIENTE) if you have such.
Are all of your secondary indexes starting with the PRIMARY KEY?? (The names imply such; please provide SHOW CREATE TABLE to definitively say what columns are in each index.) Don't start an index with the PK; it is likely to be useless.
Run EXPLAIN SELECT ... to see which index a query uses.
When timing a query, run it twice. The first run may spend extra time loading index or data rows into cache (in RAM in the buffer_pool); the second run may be significantly faster because of the caching.

What index should increase performance of select query?

This is table structure:
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| visitor_hash | varchar(40) | YES | MUL | NULL | |
| uri | varchar(255) | YES | | NULL | |
| ip_address | char(15) | YES | MUL | NULL | |
| last_visit | datetime | YES | | NULL | |
| visits | int(11) | NO | | NULL | |
| object_app | varchar(255) | YES | MUL | NULL | |
| object_model | varchar(255) | YES | | NULL | |
| object_id | varchar(255) | YES | | NULL | |
| blocked | tinyint(1) | NO | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
This is request:
SELECT `object_id`
FROM `visits_visit`
WHERE `object_model` = 'News'
GROUP BY `object_id`
ORDER BY COUNT( * ) DESC
LIMIT 0, 3
Time for response is ~77,63 ms.
CREATE INDEX resource_model ON visits_visit (object_model(100));
After this request the time for response increased to ~150ms.
How to improve performance for this case? Thank you.
UPDATED:
Answering to Michal Komorowski.
This is explain before index:
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | visits_visit | ALL | NULL | NULL | NULL | NULL | 142938 | Using where; Using temporary; Using filesort |
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------------------------------------+
1 row in set (0.00 sec)
And this is after index:
+----+-------------+--------------+------+----------------+----------------+---------+-------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------+----------------+---------+-------+-------+----------------------------------------------+
| 1 | SIMPLE | visits_visit | ref | resource_model | resource_model | 303 | const | 64959 | Using where; Using temporary; Using filesort |
+----+-------------+--------------+------+----------------+----------------+---------+-------+-------+----------------------------------------------+
1 row in set (0.00 sec)
I don't know what gives me this information.
SELECT `object_id`
FROM `visits_visit`
WHERE `object_model` = 'News'
GROUP BY `object_id`
ORDER BY COUNT( * ) DESC
LIMIT 0, 3
78,85 ms before indexing and 365,59 ms after indexing.
Also i have index
CREATE INDEX resource ON visits_visit (object_app(100), object_model(100), object_id(100));
But i need this one, because in other select queries WHERE contains this three keys.
UPDATE:
I'm using django debug toolbar to test performance of requests.
UPDATE:
Query:
ANALYZE TABLE visits_visit;
Output:
+-----------------------------+---------+----------+-----------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------------------+---------+----------+-----------------------------+
| **************.visits_visit | analyze | status | Table is already up to date |
+-----------------------------+---------+----------+-----------------------------+
1 row in set (0.00 sec)
UPDATE:
SHOW INDEXES FROM visits_visit;
Output:
+--------------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| visits_visit | 0 | PRIMARY | 1 | id | A | 142938 | NULL | NULL | | BTREE | | |
| visits_visit | 1 | visits_visit_0880babc | 1 | visitor_hash | A | 142938 | NULL | NULL | YES | BTREE | | |
| visits_visit | 1 | visits_visit_5325a746 | 1 | ip_address | A | 142938 | NULL | NULL | YES | BTREE | | |
| visits_visit | 1 | resource | 1 | object_app | A | 1 | 100 | NULL | YES | BTREE | | |
| visits_visit | 1 | resource | 2 | object_model | A | 3 | 100 | NULL | YES | BTREE | | |
| visits_visit | 1 | resource | 3 | object_id | A | 959 | 100 | NULL | YES | BTREE | | |
+--------------+------------+-----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

It seems to me that although you have an index, MySQL doesn't know how to use it properly. It happens when information about data distribution (statistics) within a table are not up to date. In order to update them you should call ANALYZE TABLE visits_visit and then check results.

I was confused by misunderstanding of sql mechanisms, so i decided to create model Popular and save instances in it every 24 hours. Thanks to everyone, who tried to help.

As I said in your other question, Prefix indexes are virtually useless; don't use them except in rare circumstances.
Shrink the fields to reasonable lengths and you won't be tempted to use Prefix indexes.
The optimal index for that query is INDEX(object_model, object_id). Attempting to use INDEX(object_model(##), ...) will not get past object_model to anything after it.
If object_model is things like 'News', I suspect the other possible values are short, and perhaps there is a finite number of models. For "short" change to some smaller VARCHAR. For "finite", consider using ENUM('News', 'Weather', 'Sports', ...).
As for why it took longer after indexing...
Without the index, the Optimizer had no choice but to scan the entire table. This is a simple linear scan. It would read but not count any non-News rows.
With the index, the Optimizer has the additional choice of using the index. But, perhaps most rows are News? Well, it would scan the index (nice), but for each News item in the index, it would have to look up the row to get object_id (not so nice). It seems (from the timings) that the latter is less efficient.
By shrinking the declarations and using INDEX(object_model, object_id) (in this order), the query can be performed in the index. Think of the index as a mini-table with just those two columns in it. It is smaller. It is ordered by model, so it only needs to scan the 'News' part. The explain will show this "covering" by saying "Using index".
If all cases, the GROUP BY adds some overhead -- either keeping a hash of object_id in RAM or by saving intermediate results and sorting them. Then the ORDER BY requires a sort (or a priority hash) before the LIMIT can apply.

Spliting database according to user's ID

I have a database of 5m rows and it grows and it's getting harder and harder to do operations with it.
Is it a good idea to split the table in 10 tables (v0_table, v1_table... v9_table), where the number(v*) is the first number of the user's id?
The user's id in my case are not auto-increment so it would sort the data evenly across those 10 tables.
The problem is I have never done similar things....
Can anyone spot any disadvantages?
EDIT:
I would appreciate any help with tuning the structure or the query.
So the slowest query is the following one:
SELECT logos.user,
logos.date,
logos.level,
logos.title,
Count(guesses.id),
Sum(guesses.points)
FROM logos
LEFT JOIN guesses
ON guesses.user = '".$user['uid']."'
AND guesses.done = '1'
AND guesses.logo = logos.id
WHERE open = '1'
GROUP BY level
Where guesses table:
+--------+------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| logo | int(11) | NO | MUL | NULL | |
| user | int(11) | NO | MUL | NULL | |
| date | timestamp | NO | | CURRENT_TIMESTAMP | |
| points | int(4) | YES | MUL | 100 | |
| done | tinyint(1) | NO | MUL | 0 | |
+--------+------------+------+-----+-------------------+----------------+
LOGOS table:
+-------+--------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(100) | NO | | NULL | |
| img | varchar(222) | NO | MUL | NULL | |
| level | int(3) | NO | MUL | NULL | |
| date | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| user | int(11) | NO | MUL | NULL | |
| open | tinyint(1) | NO | MUL | 0 | |
+-------+--------------+------+-----+-------------------+----------------+
EXPLAIN:
+----+-------------+---------+------+----------------+------+---------+-------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+----------------+------+---------+-------+------+----------------------------------------------+
| 1 | SIMPLE | logos | ref | open | open | 1 | const | 521 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | guesses | ref | done,user,logo | user | 4 | const | 87 | |
+----+-------------+---------+------+----------------+------+---------+-------+------+----------------------------------------------+

Your problem isn't that you have too much data, it's that this data is not properly indexed. Try adding an index:
CREATE INDEX open_level ON logos(open, level)
This should eliminate Using temporary; Using filesort on logos.
Basically, you need an index on this table for this query to cover two things: open - for WHERE open = '1' and level - for GROUP BY level in this order, as MySQL will first filter by open, then will group the results by level (implicitly sorting by it in process).

Short and sweet: No. This is never a good idea. Is your table properly indexed? Is MySQL properly tuned? Are your queries efficient? Are you using any caching?

Instead of sharding your table, you may want to examine other tables in your database to see if they can be split off into other dbs. For example tables, that are never joined to are great candidates for this type of vertical partitioning.
This allows you to optimize hardware for smaller sets of data.

analyze query mysql

I have a question about, how to analyze a query to know performance of its (good or bad).
I searched a lot and got something like below:
SELECT count(*) FROM users; => Many experts said it's bad.
SELECT count(id) FROM users; => Many experts said it's good.
Please see the table:
+---------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-------------+------+-----+---------+----------------+
| userId | int(11) | NO | PRI | NULL | auto_increment |
| f_name | varchar(50) | YES | | NULL | |
| l_name | varchar(50) | YES | | NULL | |
| user_name | varchar(50) | NO | | NULL | |
| password | varchar(50) | YES | | NULL | |
| email | varchar(50) | YES | | NULL | |
| active | char(1) | NO | | Y | |
| groupId | smallint(4) | YES | MUL | NULL | |
| created_date | datetime | YES | | NULL | |
| modified_date | datetime | YES | | NULL | |
+---------------+-------------+------+-----+---------+----------------+
But when I try to using EXPLAIN command for that, I got the results:
EXPLAIN SELECT count(*) FROM `user`;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | user | index | NULL | groupId | 3 | NULL | 83 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
EXPLAIN SELECT count(userId) FROM user;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | user | index | NULL | groupId | 3 | NULL | 83 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
So, the first thing for me:
Can I understand it's the same performance?
P/S: MySQL version is 5.5.8.

No, you cannot. Explain doesn't reflect all the work done by mysql, it just gives you a plan of how it will be performed.
What about specifically count(*) vs count(id). The first one is always not slower than the second, and in some cases it is faster.
count(col) semantic is amount of not null values, while count(*) is - amount of rows.
Probably mysql can optimize count(col) by rewriting into count(*) as well as id is a PK thus cannot be NULL (if not - it looks up for NULLS, which is not fast), but I still propose you to use COUNT(*) in such cases.
Also - the internall processes depend on used storage engine. For myisam the precalculated number of rows returned in both cases (as long as you don't use WHERE).

In the example you give the performance is identical.
The execution plan shows you that the optimiser is clever enough to know that it should use the Primary key to find the total number of records when you use count(*).

There is not significant difference when it comes on counting. The reason is that most optimizers will figure out the best way to count rows by themselves.
The performance difference comes to searching for values and lack of indexing. So if you search for a field that has no index assigned {f_name,l_name} and a field that has{userID(mysql automatically use index on primary keys),groupID(seems like foraign key)} then you will see the difference in performance.

How can a 'WHERE column LIKE "%expression%" ' perform better than a MATCH(column) AGAINST("expression") in MySQL?

I've run into a serious MySQL performance bottleneck which I'm unable to understand and resolve. Here are the table structures, indexes and record counts (bear with me, it's only two tables):
mysql> desc elggobjects_entity;
+-------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+-------+
| guid | bigint(20) unsigned | NO | PRI | NULL | |
| title | text | NO | MUL | NULL | |
| description | text | NO | | NULL | |
+-------------+---------------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
mysql> show index from elggobjects_entity;
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| elggobjects_entity | 0 | PRIMARY | 1 | guid | A | 613637 | NULL | NULL | | BTREE | |
| elggobjects_entity | 1 | title | 1 | title | NULL | 131 | NULL | NULL | | FULLTEXT | |
| elggobjects_entity | 1 | title | 2 | description | NULL | 131 | NULL | NULL | | FULLTEXT | |
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.00 sec)
mysql> select count(*) from elggobjects_entity;
+----------+
| count(*) |
+----------+
| 613637 |
+----------+
1 row in set (0.00 sec)
mysql> desc elggentity_relationships;
+--------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| guid_one | bigint(20) unsigned | NO | MUL | NULL | |
| relationship | varchar(50) | NO | MUL | NULL | |
| guid_two | bigint(20) unsigned | NO | MUL | NULL | |
| time_created | int(11) | NO | | NULL | |
+--------------+---------------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql> show index from elggentity_relationships;
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| elggentity_relationships | 0 | PRIMARY | 1 | id | A | 11408236 | NULL | NULL | | BTREE | |
| elggentity_relationships | 0 | guid_one | 1 | guid_one | A | NULL | NULL | NULL | | BTREE | |
| elggentity_relationships | 0 | guid_one | 2 | relationship | A | NULL | NULL | NULL | | BTREE | |
| elggentity_relationships | 0 | guid_one | 3 | guid_two | A | 11408236 | NULL | NULL | | BTREE | |
| elggentity_relationships | 1 | relationship | 1 | relationship | A | 11408236 | NULL | NULL | | BTREE | |
| elggentity_relationships | 1 | guid_two | 1 | guid_two | A | 11408236 | NULL | NULL | | BTREE | |
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.00 sec)
mysql> select count(*) from elggentity_relationships;
+----------+
| count(*) |
+----------+
| 11408236 |
+----------+
1 row in set (0.00 sec)
Now I'd like to use an INNER JOIN on those two tables and perform a full text search.
Query:
SELECT
count(DISTINCT o.guid) as total
FROM
elggobjects_entity o
INNER JOIN
elggentity_relationships r on (r.relationship="image" AND r.guid_one = o.guid)
WHERE
((MATCH (o.title, o.description) AGAINST ('scelerisque' )))
This gave me a 6 minute (!) response time.
On the other hand this one
SELECT
count(DISTINCT o.guid) as total
FROM
elggobjects_entity o
INNER JOIN
elggentity_relationships r on (r.relationship="image" AND r.guid_one = o.guid)
WHERE
((o.title like "%scelerisque%") OR (o.description like "%scelerisque%"))
returned the same count value in 0.02 seconds.
How is that possible? What am I missing here?
(MySQL info: mysql Ver 14.14 Distrib 5.1.49, for debian-linux-gnu (x86_64) using readline 6.1)
EDIT
EXPLAINing the first query (using match .. against) gives:
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
| 1 | SIMPLE | r | ref | guid_one,relationship | relationship | 152 | const | 6145 | Using where |
| 1 | SIMPLE | o | fulltext | PRIMARY,title | title | 0 | | 1 | Using where |
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
while the second query (using LIKE "%..%"):
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
| 1 | SIMPLE | r | ref | guid_one,relationship | relationship | 152 | const | 6145 | Using where |
| 1 | SIMPLE | o | eq_ref | PRIMARY | PRIMARY | 8 | elgg1710.r.guid_one | 1 | Using where |
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+

By combining your experience and EXPLAIN's results, it seems that fulltext index is not as useful as you expect in this particular case. This depends on particular data in your database, on database structure or/and particular query.
Usually database engines use no more than one index per table. So when the table has more than one index, query optimizer tries to use the better one. But optimizer is not always clever enough.
EXPLAIN's output shows that database query optimizer decided to use indexes for relationship and title. The relationship filter reduces table elggentity_relationships to 6145 rows. And the title filter reduces the table elggobjects_entity to 72697 rows. Then MySQL needs to join those tables (6145 x 72697 = 446723065 filtering operations) without using any index because indexes have already been used for filtering. In this case this can be too much. MySQL can even make a decision to keep intermediate calculations in the hard drive by trying to keep enough free space in memory.
Now let's take a look into another query. It uses relationship and PRIMARY KEY (of table elggobjects_entity) as its indexes. The relationship filter reduces table elggentity_relationships to 6145 rows. By joining those tables on PRIMARY KEY index, the result gets only 3957 rows. This is not much for the last filter (i.e. LIKE "%scelerisque%"), even if index is NOT used for this purpose at all.
As you can see the speed much depends on indexes selected for a query. So, in this particular case the PRIMARY KEY index is much more useful than fulltext title index, because PRIMARY KEY has bigger impact for result reduction than title.
MySQL is not always clever to set the right indexes. We can do this manually, by using clauses like IGNORE INDEX (index_name), FORCE INDEX (index_name), etc.
But in your case the problem is that if we use MATCH() AGAINST() in a query then the fulltext index is required, because MATCH() AGAINST() doesn't work without fulltext index at all. So this is the main reason why MySQL has chosen incorrect indexes for the query.
UPDATE
OK, I did some investigation.
Firstly, you may try to force MySQL to use guid_one index instead of relationship on table elggentity_relationships: USE INDEX (guid_one).
But for even better performance I think you can try to create one index for the composition of two columns (guid_one, membership). Current index guid_one is very similar, but for 3 columns, not for 2. In this query there are only 2 columns used. In my opinion after index creation MySQL should automatically use the right index. If not, force MySQL to use it.
Note: After index creation don't forget to remove old USE INDEX instruction from your query, because this may prevent query from using the newly created index. :)

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008