Add Index to Date String field in MySql - mysql

I have a table with approximately 2 million records. It has a column that stores date values as strings (all in a similar format). Now I need to filter records based on this string date column. I tried STR_TO_DATE, but it takes ages to fetch records because this column doesn't have an index.
Can anyone help me add an index to it?

Wrapping a column in a function in a predicate forces MySQL to evaluate the function on every row, effectively disabling a potentially more efficient index seek or range scan.
As an example, MySQL can't use an index range scan with this
... FROM t WHERE STR_TO_DATE(t.mycol,'%Y-%m-%d') = '2018-04-23'
                 ^^^^^^^^^^^^       ^^^^^^^^^^^^
but having the SQL reference a bare column would allow MySQL to consider using a range scan operation on an appropriate index ...
... FROM t WHERE t.mycol = DATE_FORMAT('2018-04-23','%Y-%m-%d')
                 ^^^^^^^
A first cut at a suitable index for the latter query would be
CREATE INDEX t_IX1 ON t (mycol)
This isn't necessarily the best index for the query. It really depends on the query. For example, a covering index might be a more suitable choice.
The question mentions storing "date" values as strings, presumably meaning CHAR or VARCHAR datatype. Note that MySQL implements a native DATE datatype, which is custom designed for storing "date" values.
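If changing the schema is an option, moving the values into a native DATE column could look roughly like the sketch below. It assumes every value in t.mycol follows the '%Y-%m-%d' format used in the example above; mycol_date and t_IX2 are hypothetical names, and under a strict SQL mode the UPDATE may fail on strings that don't parse, so those rows would need cleaning up first.

```sql
-- Assumption: t.mycol holds strings such as '2018-04-23'.
-- mycol_date and t_IX2 are hypothetical names, not from the question.
ALTER TABLE t ADD COLUMN mycol_date DATE NULL;

-- One-off conversion; this still evaluates STR_TO_DATE on every row, but only once.
UPDATE t SET mycol_date = STR_TO_DATE(mycol, '%Y-%m-%d');

CREATE INDEX t_IX2 ON t (mycol_date);

-- The predicate now references the bare DATE column, so a range scan on t_IX2 is possible.
SELECT *
FROM t
WHERE mycol_date >= '2018-04-01'
  AND mycol_date <  '2018-05-01';
```

A DATE column also sorts and compares chronologically, which a string column only does if the format happens to be year-month-day.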

Related

Why mysql hit index when column wrap with date function

IMO, the MySQL query below should not hit an index, because the add_time field is wrapped in the date() function. But according to the EXPLAIN result it does hit an index. Why?
explain
SELECT count(0)
FROM xxx
WHERE date(add_time) >= date_sub(curdate(), INTERVAL 7 day);
The EXPLAIN report is known to be hard to interpret. They have overloaded too much information into a few fields. You might try
The type: index indicates it's doing an index-scan, which means it's visiting every entry in the index.
This visits the same number of entries as a table-scan, except it's against a secondary index instead of the clustered (primary) index.
When we see type: index, EXPLAIN shows possible_keys: NULL which means it can't use any index for searching efficiently. But it also shows key: add_time which means the index it's using for the index-scan is add_time.
The index-scan is due to the fact that MySQL cannot optimize expressions or function calls by itself. For example, if you were to try to search for dates with a specific month, you could search for month(add_time) = 4 but that wouldn't use the index on add_time because the dates with that month are scattered through the index, not all grouped together.
You may know that date(add_time) should be able to be searched by the index, but MySQL does not make that inference. MySQL just sees that you're using a function, and it doesn't even try to use the index.
That's why MySQL 5.7 introduced generated columns to allow us to index an expression, and MySQL 8.0 made it even better by allowing an index to be defined for an expression without requiring us to define a generated column first.
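As a rough sketch of the options mentioned above, assuming xxx.add_time is a DATETIME column that already has an ordinary index (add_date, idx_add_date and idx_add_time_date are hypothetical names). Options 2 and 3 are alternatives; you would pick one, not both.

```sql
-- Option 1: keep the column bare and move the date arithmetic to the constant side.
-- For a DATETIME column this selects the same rows as DATE(add_time) >= ...,
-- and it lets MySQL consider a range scan on the existing add_time index.
SELECT COUNT(*)
FROM xxx
WHERE add_time >= CURDATE() - INTERVAL 7 DAY;

-- Option 2 (MySQL 5.7+): index a generated column that holds the expression.
ALTER TABLE xxx
  ADD COLUMN add_date DATE GENERATED ALWAYS AS (DATE(add_time)) VIRTUAL,
  ADD INDEX idx_add_date (add_date);

-- Option 3 (MySQL 8.0.13+): index the expression directly.
-- The expression needs its own extra pair of parentheses.
CREATE INDEX idx_add_time_date ON xxx ((DATE(add_time)));
```

Option 1 needs no schema change at all, so it is usually the first thing to try.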
I think you may be misinterpreting the explain plan here. If you look under possible_keys, you will see that it is NULL. From the MySQL documentation on explain output format:
If this column is NULL (or undefined in JSON-formatted output), there are no relevant indexes.
So, the index which is being used as mentioned in the Extra section probably has nothing to do with add_time.

MySQL query optimization access method.

What does it mean to have access type "ALL" in the query optimizer? I found this using the EXPLAIN FORMAT=JSON statement in MySQL.
"ALL" means it's a full table scan: no index can be used at all. This is not good, so you'd better add some indexes.
Check out the manual.
Here "ALL" means a full table scan is done for each combination of rows from the previous tables. This is normally not good if the table is the first table not marked const, and usually very bad in all other cases. Normally, you can avoid ALL by adding indexes that enable row retrieval from the table based on constant values or column values from earlier tables.
One problem here is that MySQL can use indexes on columns more efficiently if they are declared as the same type and size. In this context, VARCHAR and CHAR are considered the same if they are declared as the same size. Table1.column1 is declared as CHAR(10) and Table2.column2 is CHAR(15), so there is a length mismatch. If the columns in both tables have the same data type and length, MySQL can use the indexes more efficiently.
Developers often create many indexes on one table. You can tell MySQL which index to use for your query with the "USE INDEX" keyword,
e.g:
SELECT select_fields
FROM tablename USE INDEX(indexname)
WHERE condition;

Exclude records with empty binary column data

I have a column with type binary(16) not null in a mysql table. Not all records have data in this column, but because it setup to disallow NULL such records have an all-zero value.
I want to exclude these all-zero records from a query.
So far, the only solution I can find is to HEX the value and compare that:
SELECT uuid
FROM example
WHERE HEX(uuid) != '00000000000000000000000000000000'
which works, but is there a better way?
To match a binary type using a literal, use the \0 escape sequence, which stands for one zero (NUL) byte.
In your case with a binary(16), this is how to exclude only zeroes:
where uuid != '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0'
See SQLFiddle.
The advantage of using a plain comparison with a literal (like this) is that an index on the column can be used and it's a lot faster. If you invoke functions to make the comparison, indexes will not be used and the function(s) must be invoked on every row, which can cause big performance problems on large tables.
SELECT uuid FROM example WHERE TRIM('\0' FROM uuid)!='';
Note that Bohemian's answer is a lot neater; I just use this when I am not sure about the length of the field (which probably comes down to bad design on some level, feel free to educate me).
select uuid from example where uuid!=cast('' as binary(N));
where N is the length of the UUID column. Not pretty, but working.
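Two more ways to write the all-zero constant without wrapping the column in a function, as a small sketch for the binary(16) column from the question. Both build a 16-byte value (32 hex digits) on the constant side, so the column itself stays bare.

```sql
-- UNHEX turns 32 hex digits into a 16-byte binary string.
SELECT uuid
FROM example
WHERE uuid != UNHEX('00000000000000000000000000000000');

-- The same constant written as a hexadecimal literal.
SELECT uuid
FROM example
WHERE uuid != 0x00000000000000000000000000000000;
```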

Understanding Indexes in MySQL

I am trying to understand indexes in MySQL. I know that an index created in a table can speed up executing queries and it can slow down the inserting and updating of rows.
When creating an index, I used this query on a table called authors that contains (AuthorNum, AuthorFName, AuthorLName, ...)
Create index Index_1 on Authors ([What to put here]);
I know I have to put a column name, but which one?
Do I have to put the column name that will be compared in the WHERE clause when a user queries the table, or what?
The Anatomy of an Index
An index is a distinct data structure within a database and is a form of data redundancy. Its primary purpose is to provide an ordered representation of the indexed data through a logical ordering which is independent of the physical ordering. We do this using a doubly linked list and a tree structure known as the balanced search tree (B-tree). B-trees are nice because they keep data sorted and allow searches, access, insertions, and deletions in logarithmic time. Because of the doubly linked list, we are able to go backwards or forwards as needed on the index for various queries easily. Inserts become simple since we only have to rearrange pointers to the different pieces of data. Databases use these doubly linked lists to connect leaf nodes (usually in a B+ tree or B-tree), each of which is stored in a page, and to establish logical ordering between the leaf nodes. Operations like UPDATE or INSERT become slower because they are actually two writing operations in the filesystem (one for the table data and one for the index data).
Defining an Optimal Index With WHERE
To define an optimal index you must not only understand how indexes work, but you must also understand how the application queries the data. E.g., you must know the column combinations that appear in the WHERE clause.
A common restriction with queries on LAST_NAME and FIRST_NAME columns deals with case sensitivity. For example, instead of doing an exact search like Hotinger we would prefer to match all results such as HoTingEr and so on. This is very easy to do in a WHERE clause: we just say WHERE UPPER(LAST_NAME) = UPPER('Hotinger')
However, if we define an index of LAST_NAME and query, it will actually run a full table scan because the query is not on LAST_NAME but on UPPER(LAST_NAME). From the database's perspective, this is completely different. So, in this case you should define the index on UPPER(LAST_NAME) instead.
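In MySQL specifically, indexing an expression like this is only possible from version 8.0.13 on (functional key parts); older versions need an indexed generated column instead. A minimal sketch, assuming a hypothetical employees table with a LAST_NAME column:

```sql
-- Assumption: employees and emp_upper_last_name are hypothetical names.
-- MySQL 8.0.13+: the indexed expression needs its own extra pair of parentheses.
CREATE INDEX emp_upper_last_name ON employees ((UPPER(LAST_NAME)));

-- The query must use the same expression as the index definition to benefit from it.
SELECT *
FROM employees
WHERE UPPER(LAST_NAME) = UPPER('Hotinger');
```

Note that MySQL's default collations are case-insensitive, so with a default setup a plain index on LAST_NAME and WHERE LAST_NAME = 'Hotinger' may already match HoTingEr without any UPPER() call.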
Indexes do not necessarily have to be for one column. For example, if the primary key is a composite key (consisting of multiple columns) it will create a concatenated index, also known as a combined index. Note that the column order of the concatenated index has a significant impact on its usability and scalability, so it must be chosen carefully. Basically, an index can be used only through its leading (leftmost) columns, so the columns your WHERE clauses filter on should come first in the index.
Defining an Optimal Index With LIKE
The position of the wildcard characters makes a huge difference. LIKE clauses only use the characters before the first wildcard during tree traversal; the rest do not narrow the scanned index range. The more selective the prefix of the LIKE pattern, the narrower the scanned index range becomes. This makes the index lookup faster. As a tip, avoid LIKE patterns that lead with a wildcard, like "%OTINGER%". For full-text searches, MySQL offers the MATCH and AGAINST keywords. Starting with MySQL 5.6, InnoDB tables can have full-text indexes (MyISAM tables supported them earlier). Look at Full-Text Search Functions from MySQL for more in-depth discussion on indexing these results.
Yes, generally you need an index on the column or columns that you compare in the WHERE clause of your queries to speed up queries.
If you search by AuthorFName, then you create an index on that column. If they search by AuthorLName, then you create an index on that column.
In this case though, maybe what you should be looking at is a FULLTEXT index. That would allow users to enter fuzzy queries, which would return a number of results ordered by relevance.
From the MySQL Manual:
Indexes are used to find rows with specific column values quickly.
Without an index, MySQL must begin with the first row and then read
through the entire table to find the relevant rows. The larger the
table, the more this costs. If the table has an index for the columns
in question, MySQL can quickly determine the position to seek to in
the middle of the data file without having to look at all the data. If
a table has 1,000 rows, this is at least 100 times faster than reading
sequentially. If you need to access most of the rows, it is faster to
read sequentially, because this minimizes disk seeks.
An index usually means a B-Tree. Understand the structure of the B-Tree and you'll understand what index can and cannot do.
In your particular case:
WHERE AuthorLName = 'something' and WHERE AuthorLName LIKE 'something%' can be sped-up by an index on {AuthorLName}.
WHERE AuthorLName = 'something' AND AuthorFName = 'something else' can be sped-up by a composite index on {AuthorLName, AuthorFName} or {AuthorFName, AuthorLName}.
WHERE AuthorLName = 'something' OR AuthorFName = 'something else' (which doesn't make much sense, but is here as an example) can be sped-up by having two indexes: on {AuthorLName} and on {AuthorFName}.
WHERE AuthorLName LIKE '%something' cannot be sped-up by a B-Tree index (consider full-text indexing).
Etc...
See Use The Index, Luke! for a much more thorough treatment of the subject than possible in a simple SO post.
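The cases listed above could be tried out roughly like this; the index names are made up for the example, and EXPLAIN shows whether the optimizer actually picks an index for your data.

```sql
-- Hypothetical index names; Authors and its columns are from the question.
CREATE INDEX Authors_LName_FName ON Authors (AuthorLName, AuthorFName);
CREATE INDEX Authors_FName ON Authors (AuthorFName);

-- Equality or prefix search on the leading column of the composite index.
EXPLAIN SELECT * FROM Authors WHERE AuthorLName LIKE 'something%';

-- Both columns of the composite index.
EXPLAIN SELECT * FROM Authors
WHERE AuthorLName = 'something' AND AuthorFName = 'something else';

-- Leading wildcard: a B-Tree index cannot narrow this search.
EXPLAIN SELECT * FROM Authors WHERE AuthorLName LIKE '%something';
```

The composite index also serves queries on AuthorLName alone, since AuthorLName is its leftmost column, so a separate single-column index on AuthorLName would be redundant here.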
Limited length index:
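When using text columns or very large varchar columns you won't be able to create an index over the entire length of the text/varchar; the maximum key length depends on the storage engine and row format (for InnoDB, 767 bytes with the older COMPACT and REDUNDANT row formats and 3072 bytes with DYNAMIC and COMPRESSED).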
In such a case you specify the length in the index declaration.
CREATE INDEX `my_limited_length_index` ON `my_table`(`long_text_content`(512));
-- please notice the use of the numeric length of the index after the column name
Processed value index (supported in PostgreSQL; MySQL only gained expression indexes in 8.0.13, where the expression must be wrapped in an extra pair of parentheses):
Indexes are not exclusively built from one column, some may be built from multiple columns and other may be built from just some of the info a column has. For example if you have a full datetime column but you know you're only going to filter records by date you can build an index based on the datetime column but only containing date info.
-- `my_table` has a `created` column of type timestamp
CREATE INDEX `my_date_created` ON `my_table`(DATE(`created`));
-- please notice the use of the DATE function which extracts only
-- the date from the `created` timestamp
An index should span the columns you are going to use in the WHERE clause.
To better understand, here is an example:
SELECT * FROM Authors WHERE AuthorNum > 10 AND AuthorLName LIKE 'A%';
SELECT * FROM Authors WHERE AuthorLName LIKE 'Be%';
If you often run the queries shown above, you are strongly advised to have two indexes:
Create index AuthNum_AuthLName_Index on Authors (AuthorNum, AuthorLName);
Create index AuthLName_Index on Authors (AuthorLName);
The key thing to remember: an index should have the same combination of columns used in the WHERE clause.

MySQL CAST() causes significant performance hit

So I ran the following in the MySQL console as a control test to see what was holding back the speed of my query.
SELECT bbva_deductions.ded_code, SUBSTRING_INDEX(bbva_deductions.employee_id, '-' , -1) AS tt_emplid,
bbva_job.paygroup, bbva_job.file_nbr, bbva_deductions.ded_amount
FROM bbva_deductions
LEFT JOIN bbva_job
ON CAST(SUBSTRING_INDEX(bbva_deductions.employee_id, '-' , -1) AS UNSIGNED) = bbva_job.emplid LIMIT 500
It took consistently around 4 seconds to run. (seems very high for only 500 rows). Simply removing the CAST within the JOIN decreased that to just 0.01 seconds.
In this context, why on earth is CAST so slow?
Here is the output of an EXPLAIN for this query:
And the same for the query without a CAST:
EXPLAIN EXTENDED:
As documented under How MySQL Uses Indexes:
MySQL uses indexes for these operations:
[ deletia ]
To retrieve rows from other tables when performing joins. MySQL can use indexes on columns more efficiently if they are declared as the same type and size. In this context, VARCHAR and CHAR are considered the same if they are declared as the same size. For example, VARCHAR(10) and CHAR(10) are the same size, but VARCHAR(10) and CHAR(15) are not.
Comparison of dissimilar columns may prevent use of indexes if values cannot be compared directly without conversion. Suppose that a numeric column is compared to a string column. For a given value such as 1 in the numeric column, it might compare equal to any number of values in the string column such as '1', ' 1', '00001', or '01.e1'. This rules out use of any indexes for the string column.
In your case, you are attempting to join on a comparison between a substring (of a string column in one table) and a string column in another table. An index can be used for this operation, however the comparison is performed lexicographically (i.e. treating the operands as strings, even if they represent numbers).
By explicitly casting one side to an integer, the comparison is performed numerically (as desired) - but this requires MySQL to implicitly convert the type of the string column and therefore it is unable to use that column's index.
You have hit this road bump because your schema is poorly designed. You should strive to ensure that all columns:
are encoded using the data types that are most relevant to their content; and
contain only a single piece of information — see Is storing a delimited list in a database column really that bad?
At the very least, your bbva_job.emplid should be an integer; and your bbva_deductions.employee_id should be split so that its parts are stored in separate (appropriately-typed) columns. With appropriate indexes, your query will then be considerably more performant.
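A first step in that direction might look like the sketch below. It assumes every existing bbva_job.emplid value is a plain non-negative integer stored as a string (under a strict SQL mode the ALTER may fail otherwise), and that emplid is already indexed, as the fast query without the CAST suggests.

```sql
-- Assumption: all bbva_job.emplid values are digit-only strings.
-- BIGINT UNSIGNED matches the type that CAST(... AS UNSIGNED) produces.
ALTER TABLE bbva_job MODIFY emplid BIGINT UNSIGNED;

-- Both sides of the join condition are now integers, so MySQL no longer has to
-- convert bbva_job.emplid for every comparison and can consider its index.
SELECT d.ded_code,
       SUBSTRING_INDEX(d.employee_id, '-', -1) AS tt_emplid,
       j.paygroup, j.file_nbr, d.ded_amount
FROM bbva_deductions AS d
LEFT JOIN bbva_job AS j
  ON CAST(SUBSTRING_INDEX(d.employee_id, '-', -1) AS UNSIGNED) = j.emplid
LIMIT 500;
```

Splitting bbva_deductions.employee_id into separate typed columns, as suggested above, would remove the remaining CAST and SUBSTRING_INDEX calls as well.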