SELECT DISTINCT on two TEXT columns slow in MySQL 5.7.20

A "select distinct col1,col2 from table1" where col1 and col2 are of type TEXT and table1 has about 65K rows works fine with MySQL 5.5.58. Now that I've upgraded to MySQL 5.7.20 it takes almost an hour! Does anyone know of any changes to MySQL that may be causing this? Does anyone have any suggestions how col1 and col2 should be optimally indexed for this query, or what other settings I should check to make this query run faster? I don't get the feeling that indexes are even being used since EXPLAIN says it's using a temporary table and no keys:
mysql> explain SELECT DISTINCT author,sort_author from itemsbyauthor;
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| 1 | SIMPLE | itemsbyauthor | NULL | ALL | NULL | NULL | NULL | NULL | 64727 | 100.00 | Using temporary |
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
1 row in set, 1 warning (0.00 sec)

In many cases, MySQL doesn't use prefix indexes effectively, and this seems to be one of those cases.
Do you really need the column type to be TEXT?
From the column names, it looks like the columns hold author names, which are relatively short strings (say, up to 50 or 100 characters).
I would reconsider the column type and alter it to VARCHAR with a bounded length instead of TEXT.
Then add a compound index that includes both columns.
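A minimal sketch of that change, assuming no existing value is longer than 100 characters (the length and the index name idx_author_sort_author are illustrative, not taken from the question). Check the current maximum lengths first, because shrinking a column below its longest value either fails or truncates data depending on the SQL mode:

```sql
-- Confirm that the assumed 100-character limit is safe for the existing data.
SELECT MAX(CHAR_LENGTH(author)), MAX(CHAR_LENGTH(sort_author))
FROM itemsbyauthor;

-- Convert both columns and add a compound index covering them.
-- VARCHAR(100) and the index name are assumptions for illustration.
ALTER TABLE itemsbyauthor
  MODIFY author VARCHAR(100),
  MODIFY sort_author VARCHAR(100),
  ADD INDEX idx_author_sort_author (author, sort_author);

-- Re-check the plan; the aim is for the new index to appear in the key column.
EXPLAIN SELECT DISTINCT author, sort_author FROM itemsbyauthor;
```

If the columns are currently declared NOT NULL or use a specific character set, repeat those attributes in the MODIFY clauses, since MODIFY replaces the whole column definition.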

Related

Why are no keys used in this EXPLAIN?

I was expecting this query to use a key.
mysql> DESCRIBE Foo;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| name | varchar(50) | NO | UNI | NULL | |
+-------+-------------+------+-----+---------+----------------+
mysql> EXPLAIN SELECT id FROM Foo WHERE name='foo';
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
Foo has a unique index on name, so why isn't the index being used in the SELECT?
From the MySQL Manual page entitled EXPLAIN Output Format:
Impossible WHERE noticed after reading const tables (JSON property:
message)
MySQL has read all const (and system) tables and notice that the WHERE
clause is always false.
and the definition of const tables, from the page entitled Constants and Constant Tables:
A MySQL constant is something more than a mere literal in the query.
It can also be the contents of a constant table, which is defined as
follows:
A table with zero rows, or with only one row
A table expression that is restricted with a WHERE condition,
containing expressions of the form column = constant, for all the
columns of the table's primary key, or for all the columns of any of
the table's unique keys (provided that the unique columns are also
defined as NOT NULL).
The second reference is a page and a half long. Please refer to it.
const
The table has at most one matching row, which is read at the start of
the query. Because there is only one row, values from the column in
this row can be regarded as constants by the rest of the optimizer.
const tables are very fast because they are read only once.
const is used when you compare all parts of a PRIMARY KEY or UNIQUE
index to constant values. In the following queries, tbl_name can be
used as a const table:
SELECT * FROM tbl_name WHERE primary_key=1;
SELECT * FROM tbl_name WHERE primary_key_part1=1 AND
primary_key_part2=2;
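The behaviour described in these excerpts can be reproduced with a small sketch. The table foo_demo is a hypothetical scratch copy of Foo, reconstructed from the DESCRIBE output in the question; the exact wording of the Extra column differs between MySQL versions, so the comments describe the expectation only loosely:

```sql
-- Hypothetical scratch table mirroring the structure of Foo.
CREATE TABLE foo_demo (
  id   BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(50) NOT NULL UNIQUE
);

INSERT INTO foo_demo (name) VALUES ('bar');

-- No row has name = 'foo'. The unique index is probed during optimization,
-- and EXPLAIN should report that the WHERE clause cannot match instead of naming a key.
EXPLAIN SELECT id FROM foo_demo WHERE name = 'foo';

-- A row exists for 'bar', so the table should be treated as a const table
-- and the unique index on name should appear in the key column.
EXPLAIN SELECT id FROM foo_demo WHERE name = 'bar';
```

In other words, the empty key column in the question does not mean the index was ignored: the lookup already happened before the plan was printed.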
It could be because the table Foo holds very little data. In that case the optimizer will choose a table scan rather than looking through the index.
As the MySQL documentation says:
Indexes are less important for queries on small tables, or big tables
where report queries process most or all of the rows. When a query
needs to access most of the rows, reading sequentially is faster than
working through an index. Sequential reads minimize disk seeks, even
if not all the rows are needed for the query.

simple SQL statement takes longer time to execute [duplicate]

Possible Duplicate:
Disadvantages of quoting integers in a Mysql query?
I have a very simple table called Device in a MySQL database.
+-----------------------------------+--------------+------+-----+----------------+
| Field | Type | Null | Key | Extra |
+-----------------------------------+--------------+------+-----+----------------+
| DTYPE | varchar(31) | NO | | |
| id | bigint(20) | NO | PRI | auto_increment |
| dateCreated | datetime | NO | | |
| dateModified | datetime | NO | | |
| phoneNumber | varchar(255) | YES | MUL | |
| version | bigint(20) | NO | | |
| oldPhoneNumber | varchar(255) | YES | | |
+-----------------------------------+--------------+------+-----+----------------+
This table has more than 100K records. I am running a very simple query:
select * from AttDevice where phoneNumber = 5107357058;
This query takes 4-6 seconds. But when I change the query slightly, as shown below:
select * from AttDevice where phoneNumber = '5107357058';
it takes almost no time to execute.
Notice that the phoneNumber column is varchar. I don't understand why the former takes longer and the latter doesn't. The only difference between the two queries is the single quotes.
Does MySQL treat these two queries differently, and if so, why?
EDIT 1
I used EXPLAIN and got the following output but don't know how to interpret these two results.
mysql> EXPLAIN select * from AttDevice where phoneNumber = 5107357058;
+----+-------------+-----------+------+---------------------------------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------------------------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | Device | ALL | phoneNumber,idx_Device_phoneNumber | NULL | NULL | NULL | 6482116 | Using where |
+----+-------------+-----------+------+---------------------------------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)
mysql> EXPLAIN select * from AttDevice where phoneNumber = '5107357058';
+----+-------------+-----------+------+---------------------------------------+-------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------------------------------+-------------+---------+-------+------+-------------+
| 1 | SIMPLE | Device | ref | phoneNumber,idx_Device_phoneNumber | phoneNumber | 258 | const | 2 | Using where |
+----+-------------+-----------+------+---------------------------------------+-------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
Can someone explain the key, key_len and rows columns in the EXPLAIN output?
1) Thank you for the "EXPLAIN". We all (including you, I'm sure) knew that the problem was that mysql had to convert the string column to a number, and had to do it for each row. But your "EXPLAIN" proved it.
2) Here's a nice, short article about EXPLAIN:
http://www.lornajane.net/posts/2011/explaining-mysqls-explain
The *possible_keys* shows which indexes apply to this query and the key
tells us which of those was actually used -... Finally the rows entry tell
us how many rows MySQL had to look at to find the result set.
Search value: key: type: ref: rows:
------------- --- ---- ---- ----
5107357058 NULL ALL NULL 6482116
'5107357058' phoneNumber ref const 2
3) The "ref" column is "The columns compared to the index". In the second case, the string literal ("constant") '5107357058' was compared to the key "phoneNumber". In the first case, there was no usable key (because your search condition was a completely different type); hence "ref" was NULL.
4) The "type" column is "The join type". "ref" means "All rows with matching index values are read from this table" (in this case, 2 rows). "ALL" means "full table scan", which in this case means 6 million rows.
5) Here's the mysql documentation for "EXPLAIN":
http://dev.mysql.com/doc/refman/5.5/en/explain-output.html
You fooled MySQL into making a bad choice by NOT quoting the phone number. Consider:
The column definition is varchar
In the first (unquoted) case you provided the value as an integer (long). I would have thought MySQL could figure this one out, but obviously it didn't, and did a full table scan.
In the second (quoted) case, you gave the search key in the correct datatype (character) and MySQL chose the index over the full-table-scan.
The varchar index cannot be used when you use a number as the operand. Here is an excerpt from the fine documentation on implicit type conversions:
For comparisons of a string column with a number, MySQL cannot use an index on the column to look up the value quickly. If str_col is an indexed string column, the index cannot be used when performing the lookup in the following statement:
SELECT * FROM tbl_name WHERE str_col=1;
The reason for this is that there are many different strings that may convert to the value 1, such as '1', ' 1', or '1a'.
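A short sketch of what that excerpt means in practice. The first statement shows the string-to-number conversion the documentation describes; the other two are possible ways to keep the comparison string-to-string against the AttDevice table from the question. The CAST variant is an assumption about how one might handle a value that arrives as a number, not something proposed in the thread:

```sql
-- Each string is converted to a number before comparing,
-- so all three comparisons are true ('1a' also raises a truncation warning).
SELECT '1' = 1, ' 1' = 1, '1a' = 1;

-- Quoting the literal compares string with string,
-- so the index on phoneNumber can be used for the lookup.
SELECT * FROM AttDevice WHERE phoneNumber = '5107357058';

-- If the value is only available as a number, convert the constant once
-- instead of letting every row's column value be converted.
SELECT * FROM AttDevice WHERE phoneNumber = CAST(5107357058 AS CHAR);
```

Running EXPLAIN on the last two statements is the way to confirm that the key column is no longer NULL.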
I believe that MySQL has to convert each row's varchar value into a number in the first example. In the second example it does not. I'm guessing that's where the difference is coming from.
The first example looks through the table row by row; the other one uses the index.
http://dev.mysql.com/doc/refman/5.0/en/show-columns.html
If Key is MUL, multiple occurrences of a given value are permitted within the column. The column is the first column of a nonunique index or a unique-valued index that can contain NULL values.
So instead of scanning all the null values, the second query looks exclusively for non-null values, which speeds things up.
....I think.

MySQL MyISAM table index cardinality is zero

I have a table containing 60 million rows. The structure is entryid, date, sourceid, detail, views. (entryid, date, sourceid, detail) is the PK, and I also have an index on each field except views.
The problem is that the cardinalities of the four indexes are zero, but I am sure they should not be.
Why is that? And does it mean the indexes don't work?
It's possible that the table statistics have not been updated.
See this page on optimizing MyISAM tables:
To help MySQL better optimize queries, use ANALYZE TABLE or run
myisamchk --analyze on a table after it has been loaded with data.
This updates a value for each index part that indicates the average
number of rows that have the same value. (For unique indexes, this is
always 1.) MySQL uses this to decide which index to choose when you
join two tables based on a nonconstant expression. You can check the
result from the table analysis by using SHOW INDEX FROM tbl_name and
examining the Cardinality value. myisamchk --description --verbose
shows index distribution information.
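The steps in that excerpt can be sketched as follows. The table name entries is a placeholder, since the question does not give the real name:

```sql
-- Rebuild the key distribution statistics for the table.
-- On a 60-million-row MyISAM table this can take a while and
-- locks the table against writes while it runs.
ANALYZE TABLE entries;

-- Inspect the Cardinality column for each index part afterwards.
SHOW INDEX FROM entries;
```

If Cardinality is still zero after this, the statistics were not the cause and the table itself is worth checking.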
The best way to determine whether an index is helping is to explain a query:
mysql> explain select 1;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)

Column optimization for LIKE queries

I have a database table with 300 000 records. The most common query done is:
SELECT [..] WHERE `word` LIKE 'userInput%';
At the moment the column type is varchar(50) UNIQUE. I am wondering whether there is a way to optimize the column for this specific query?
Update 2011 11 19 GMT 00:00:
mysql> EXPLAIN SELECT `word_id`, `word` FROM `words` WHERE `word` LIKE 'bar%';
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| 1 | SIMPLE | words | range | word | word | 152 | NULL | 435 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
As long as you are using a "starts with" type wildcard search, a standard index on word should work fine.
It's only when you get into a "contains" wildcard search that you start having real trouble with indexing.
Of course, attaching an explain plan would help...
Based on the explain plan, it looks like you're doing about as well as you can. It indicates that the index is being used properly, with no full table scans or file sorts.
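The difference between the two kinds of wildcard search can be checked directly against the words table from the question. This is only a sketch of the comparison; the plan for the second query depends on the MySQL version and on which columns the index covers:

```sql
-- Prefix search: the fixed prefix 'bar' lets MySQL do a range scan on the index.
EXPLAIN SELECT `word_id`, `word` FROM `words` WHERE `word` LIKE 'bar%';

-- "Contains" search: the leading wildcard leaves no fixed prefix,
-- so a range scan is not possible and every entry has to be examined.
EXPLAIN SELECT `word_id`, `word` FROM `words` WHERE `word` LIKE '%bar%';
```

Comparing the type and rows columns of the two plans shows how much work the leading wildcard adds.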

Getting a Column's Max Value

Is there any tangible difference (speed/efficiency) between these statements? Assume the column is indexed.
SELECT MAX(someIntColumn) AS someIntColumn
or
SELECT someIntColumn ORDER BY someIntColumn DESC LIMIT 1
This depends largely on the query optimizer in your SQL implementation. At best, they will have the same performance. Typically, however, the first query is potentially much faster.
The first query essentially asks for the DBMS to inspect every value in someIntColumn and pick the largest one.
The second query asks the DBMS to sort all the values in someIntColumn from largest to smallest and pick the first one. Depending on the number of rows in the table and the existence (or lack thereof) of an index on the column, this could be significantly slower.
If the query optimizer is sophisticated enough to realize that the second query is equivalent to the first one, you are in luck. But if you retarget your app to another DBMS, you might get unexpectedly poor performance.
EDIT based on explain plan:
The explain plan shows that max(column) is more efficient. The explain plan says, “Select tables optimized away”.
EXPLAIN SELECT version from schema_migrations order by version desc limit 1;
+----+-------------+-------------------+-------+---------------+--------------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+-------+---------------+--------------------------+---------+------+------+-------------+
| 1 | SIMPLE | schema_migrations | index | NULL | unique_schema_migrations | 767 | NULL | 1 | Using index |
+----+-------------+-------------------+-------+---------------+--------------------------+---------+------+------+-------------+
1 row in set (0.00 sec)
EXPLAIN SELECT max(version) FROM schema_migrations ;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.00 sec)