Does an index improve performance when using modulo? - mysql

Imagine a MySQL table with one field id containing 1 billion rows from number 1 to a billion.
When I do a query like this
SELECT * FROM table WHERE id > 2000 AND id < 5000;
It is obvious that an index on id will improve the performance of that query.
However does such an index also help with modulo as in the following query
SELECT * FROM table WHERE (id % 4) = 0;
Does using an index help when using modulo?

No.
Functions on columns used in an index (almost) always preclude the use of the index. Even if this weren't true, the optimizer might decide not to use an index anyway. Fetching just one out of four records may not be selective enough for the index to be worthwhile.

In Oracle, for example, you can define so-called function-based indexes that include the modulo expression in the index definition. Older MySQL versions have no such feature, though MySQL 5.7 added generated columns that can be indexed and MySQL 8.0.13 added functional key parts.
What you could do as a workaround is add an additional column that stores the result of your modulo function. You have to modify your insert scripts to fill it for future inserts, and update the existing rows. Then you can add an index on that column and use it in your WHERE clause.
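The extra-column workaround can be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name numbers, the columns payload and id_mod4, and the index name idx_mod4 are made up for the example. MySQL's EXPLAIN output looks different, but the idea should carry over.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical table: an id plus one payload column.
conn.execute("CREATE TABLE numbers (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO numbers (id, payload) VALUES (?, ?)",
                 [(i, "row %d" % i) for i in range(1, 1001)])

# Filtering on an expression over the column: no index lookup is possible.
plan_expression = query_plan(conn, "SELECT * FROM numbers WHERE (id % 4) = 0")

# Workaround: store the modulo result in its own column and index that column.
conn.execute("ALTER TABLE numbers ADD COLUMN id_mod4 INTEGER")
conn.execute("UPDATE numbers SET id_mod4 = id % 4")
conn.execute("CREATE INDEX idx_mod4 ON numbers (id_mod4)")
plan_stored = query_plan(conn, "SELECT * FROM numbers WHERE id_mod4 = 0")

matches = conn.execute(
    "SELECT COUNT(*) FROM numbers WHERE id_mod4 = 0").fetchone()[0]

print(plan_expression)  # a full scan of the table
print(plan_stored)      # an index search via idx_mod4
print(matches)
```

Whether the index pays off on real data is a separate question: as noted above, matching one row in four may not be selective enough for the optimizer to prefer it.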

Related

SQLite & MySQL Compound Index vs Single index

I have a table with two fields: a,b
Both fields are indexed separately -- no compound index.
While trying to run a select query with both fields:
select * from table where a=<sth> and b=<sth>
it took over 400 ms, while
select * from table where a=<sth>
took only 30 ms.
Do I need to set a compound index for (a,b)?
Reasonably, if I have indexes on both a and b, it should be fast for queries of a AND b like above right?
For this query:
select *
from table
where a = <sth> and b = <sth>;
The best index is on table(a, b). It can be used for your second query as well.
Usually (but not always).
In your case, the number of distinct values in a (and b) and the number of columns you select can change whether the database decides to use the index or the table.
For example, if the table has, say, 100,000 records and 80,000 of them have the same value for a, then when you query
SELECT * FROM table WHERE a=<your value>
the database engine could decide to scan the table directly without using the index, while if you query
SELECT a, b FROM table WHERE a=<your value>
and the index also contains column b (as a key column or, where the database supports it, via INCLUDE), it is quite probable that the database engine will use the index.
Search the internet for index tips, and also have a look at "How can I index these queries?".
The SQLite documentation explains how index lookups work.
Once the database has used an index to look up some rows, the other index is no longer efficient to use (there is no easy method to filter the results of the first lookup because the other index refers to rows in the original table, not to entries in the first index). See Multiple AND-Connected WHERE-Clause Terms.
To make index lookups on two columns as fast as possible, you need Multi-Column Indices.
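A small experiment along these lines can make the difference visible. This is a sketch under assumptions: it uses Python's built-in sqlite3 module, and the table t, the payload column, and the index names idx_single_a, idx_single_b and idx_pair are invented for the example. Other databases may plan the query differently.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, payload TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(i % 50, i % 70, "row %d" % i) for i in range(5000)])

# Two separate single-column indexes, as in the question.
conn.execute("CREATE INDEX idx_single_a ON t (a)")
conn.execute("CREATE INDEX idx_single_b ON t (b)")

query = "SELECT * FROM t WHERE a = 7 AND b = 7"
plan_separate = query_plan(conn, query)

# Add a multi-column index covering both WHERE terms.
conn.execute("CREATE INDEX idx_pair ON t (a, b)")
plan_compound = query_plan(conn, query)

print(plan_separate)  # only one of the two single-column indexes is used
print(plan_compound)  # the two-column index satisfies both terms
```

With only the separate indexes, one index narrows the candidates and the other condition is checked row by row, which matches the explanation above.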

Fastest result when checking date range

The user will select a date, e.g. 06-MAR-2017, and I need to retrieve a hundred thousand records with a date earlier than 06-MAR-2017 (the number can vary depending on the user's selection).
For this case, I am using this query:
SELECT col from table_a where DATE_FORMAT(mydate,'%Y%m%d') < '20170306'
It feels kind of slow. Is there a faster way to get date results like this?
With 100,000 records to read, the DBMS may decide to read the table record for record (full table scan) and there wouldn't be much you could do.
If on the other hand the table contains billions of records, so 100,000 would just be a small part, then the DBMS may decide to use an index instead.
Either way, you should at least give the DBMS the opportunity to select via an index. This means: create an index first (if one doesn't exist yet).
You can create an index on the date column alone:
create index idx on table_a (mydate);
or even provide a covering index that contains the other columns used in the query, too:
create index idx on table_a (mydate, col);
Then write your query such that the date column is accessed directly. You have no index on DATE_FORMAT(mydate,'%Y%m%d'), so the above indexes don't help with your original query. You need a query that compares the date column itself:
select col from table_a where mydate < date '2017-03-06';
Whether the DBMS then uses the index or not is still up to the DBMS. It will try to use the fastest approach, which very well can still be the full table scan.
If you apply a function to the column on the left side of the comparison, MySQL cannot use an ordinary index on that column and will typically fall back to a full table scan.
The fastest method is to have an index on mydate and to make the right side ('20170306') the same data type as the column (and the index).
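The effect of wrapping the column in a function can be checked with a quick experiment. The sketch below makes several assumptions: it uses Python's built-in sqlite3 module instead of MySQL, stores dates as ISO text, substitutes SQLite's strftime for DATE_FORMAT, and invents the index name idx_mydate.

```python
import sqlite3
from datetime import date, timedelta

def query_plan(conn, sql):
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (mydate TEXT, col TEXT)")
start = date(2017, 1, 1)
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?)",
    [((start + timedelta(days=i)).isoformat(), "row %d" % i)
     for i in range(120)])
conn.execute("CREATE INDEX idx_mydate ON table_a (mydate)")

# Function applied to the column: the index on mydate cannot be searched.
wrapped = ("SELECT col FROM table_a "
           "WHERE strftime('%Y%m%d', mydate) < '20170306'")
# Column compared directly: the index on mydate can be searched.
direct = "SELECT col FROM table_a WHERE mydate < '2017-03-06'"

plan_wrapped = query_plan(conn, wrapped)
plan_direct = query_plan(conn, direct)

# Both forms return the same rows; only the access path differs.
rows_wrapped = sorted(conn.execute(wrapped).fetchall())
rows_direct = sorted(conn.execute(direct).fetchall())

print(plan_wrapped)
print(plan_direct)
```

In MySQL, the equivalent check would be to run EXPLAIN on both forms of the query and compare the reported access type and key.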

What are the ways to improve the performance for below query

I have the query below, which is executed to get the latest updates from the REG_LOG table. It performs a full table scan to get the results.
SELECT REG_PATH,
REG_USER_ID,
REG_LOGGED_TIME,
REG_ACTION,
REG_ACTION_DATA
FROM REG_LOG
WHERE REG_LOGGED_TIME > <last-access-time>
AND REG_LOGGED_TIME < '<current-time>'
AND REG_TENANT_ID = <tenant-id>
This table can contain millions of rows.
My question is: what can we do to improve the performance of this query? As a workaround we have created an index on the REG_LOGGED_TIME column to avoid the full table scan.
You have 2 fields in the WHERE clause. They are the first candidates for indexing.
You should analyze the selectivity of your fields: the number of distinct values divided by the number of rows. The closer the result is to 1, the more an index on that field helps.
Example:
CREATE INDEX idx_reg_log_tenant_time ON REG_LOG (REG_TENANT_ID, REG_LOGGED_TIME);
Also you should review your other queries against this table. Probably you should create just one composite index. In that case, the first field should be the column with the highest selectivity.
Adding the following index will significantly improve the speed of your query:
CREATE INDEX idx_reg_log_tenant_time ON REG_LOG (REG_TENANT_ID, REG_LOGGED_TIME);
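To try the suggested composite index on sample data, something like the following could be used. It is a sketch, not the original poster's setup: it runs on Python's built-in sqlite3 module, the column types, the sample rows and the index name idx_reg_log_tenant_time are assumptions, and literal values stand in for the placeholders in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are guesses; only the column names come from the question.
conn.execute("""CREATE TABLE REG_LOG (
    REG_PATH TEXT, REG_USER_ID TEXT, REG_LOGGED_TIME TEXT,
    REG_ACTION INTEGER, REG_ACTION_DATA TEXT, REG_TENANT_ID INTEGER)""")
rows = [("/path/%d" % i, "user%d" % (i % 5),
         "2017-03-%02d 12:00:00" % (i % 28 + 1), 1, None, i % 4)
        for i in range(2000)]
conn.executemany("INSERT INTO REG_LOG VALUES (?, ?, ?, ?, ?, ?)", rows)

def selectivity(column):
    """Distinct values divided by row count for one REG_LOG column."""
    distinct, total = conn.execute(
        "SELECT COUNT(DISTINCT %s), COUNT(*) FROM REG_LOG" % column).fetchone()
    return distinct / total

tenant_selectivity = selectivity("REG_TENANT_ID")
time_selectivity = selectivity("REG_LOGGED_TIME")

conn.execute("CREATE INDEX idx_reg_log_tenant_time "
             "ON REG_LOG (REG_TENANT_ID, REG_LOGGED_TIME)")

query = ("SELECT REG_PATH, REG_USER_ID, REG_LOGGED_TIME, REG_ACTION, "
         "REG_ACTION_DATA FROM REG_LOG "
         "WHERE REG_LOGGED_TIME > '2017-03-10 00:00:00' "
         "AND REG_LOGGED_TIME < '2017-03-20 00:00:00' "
         "AND REG_TENANT_ID = 1")
plan = " | ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
matched = len(conn.execute(query).fetchall())

print(tenant_selectivity, time_selectivity)
print(plan)  # a search on the composite index for tenant and time range
```

Here the equality column (REG_TENANT_ID) comes first and the range column (REG_LOGGED_TIME) second, so one index search can apply both conditions.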

SELECT vs UPDATE performance with index

If I SELECT IDs then UPDATE using those IDs, then the UPDATE query is faster than if I would UPDATE using the conditions in the SELECT.
To illustrate:
SELECT id FROM table WHERE a IS NULL LIMIT 10; -- 0.00 sec
UPDATE table SET field = value WHERE id IN (...); -- 0.01 sec
The above is about 100 times faster than an UPDATE with the same conditions:
UPDATE table SET field = value WHERE a IS NULL LIMIT 10; -- 0.91 sec
Why?
Note: the a column is indexed.
Most likely the second UPDATE statement locks many more rows, while the first one uses a unique key and locks only the rows it's going to update.
The two queries are not identical. You only know that the IDs are unique in the table.
UPDATE ... LIMIT 10 will update at most 10 records.
UPDATE ... WHERE id IN (SELECT ... LIMIT 10) may update more than 10 records if there are duplicate ids.
I don't think there can be a one straight-forward answer to your "why?" without doing some sort of analysis and research.
The SELECT queries are normally cached, which means that if you run the same SELECT query multiple times, the execution time of the first query is normally greater than the following queries. Please note that this behavior can only be experienced where the SELECT is heavy and not in scenarios where even the first SELECT is much faster. So, in your example it might be that the SELECT took 0.00s because of the caching. The UPDATE queries are using different WHERE clauses and hence it is likely that their execution times are different.
Although column a is indexed, MySQL does not necessarily use the index when doing the SELECT or the UPDATE. Please study the EXPLAIN output. Also, look at the output of SHOW INDEX and check whether the "Comment" column reads "disabled" for any index. You may read more here - http://dev.mysql.com/doc/refman/5.0/en/show-index.html and http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html.
Also, if we ignore the SELECT for a while and focus only on the UPDATE queries, it is obvious that they aren't both using the same WHERE condition - the first one runs on the id column and the latter on a. Though both columns are indexed, that does not necessarily mean that all the table's indexes perform alike. It is possible that one index is more efficient than the other depending on the size of the index, the datatype of the indexed column, or whether it is a single- or multiple-column index. There sure might be other reasons but I ain't an expert on it.
Also, I think that the second UPDATE is doing more work in the sense that it might be taking more row-level locks than the first UPDATE. It is true that both UPDATEs finally update the same number of rows. But where the first UPDATE locks 10 rows, I think the second UPDATE locks all rows with a as NULL (which is more than 10) before doing the update. Perhaps MySQL first applies the locking and then applies the LIMIT clause to update only a limited number of records.
Hope the above explanation makes sense!
Do you have a composite index or separate indexes?
If it is a composite index on the id and a columns,
then in the 2nd UPDATE statement the index would not be used for a. The reason is that only leftmost prefixes of an index are used (unless a is the PRIMARY KEY).
So if you want the index to be used for a, you need to include id in your WHERE clause as well, with id first and then a.
Also it depends on what storage engine you are using since MySQL does indexes at the engine level, not server.
You can try this:
UPDATE table SET field = value WHERE id IN (...) AND a IS NULL LIMIT 10;
By doing this, id is the leftmost index column, followed by a.
Also, from your comments: the lookups are much faster because, if you are using InnoDB, updating indexed columns means that the storage engine may have to move index entries to a different page, or split a page if it is already full, since InnoDB stores index entries in sorted order. This process is slow and expensive, and it gets even slower if your indexes are fragmented or your table is very big.
The comment by Michael J.V is the best description. This answer assumes a is a column that is not indexed and 'id' is.
The WHERE clause in the first UPDATE command is working off the primary key of the table, id
The WHERE clause in the second UPDATE command is working off a non-indexed column. This makes finding the rows to be updated significantly slower.
Never underestimate the power of indexes. A table will perform better if the indexes are used correctly than a table a tenth the size with no indexing.
Regarding "MySQL doesn't support updating the same table you're selecting from":
UPDATE table SET field = value
WHERE id IN (SELECT id FROM table WHERE a IS NULL LIMIT 10);
Just wrap the subquery in a derived table (MySQL requires an alias for it, here tmp):
UPDATE table SET field = value
WHERE id IN (SELECT id FROM (SELECT id FROM table WHERE a IS NULL LIMIT 10) AS tmp);
The accepted answer seems right but is incomplete; there are major differences.
As far as I understand, and I'm not a SQL expert:
In the first case you SELECT N rows and UPDATE them using the primary key.
That's very fast, as you have direct access to all rows through the fastest possible index.
In the second case you UPDATE N rows using LIMIT.
That will lock all rows and release them again after the update is finished.
The big difference is that you have a RACE CONDITION in case 1) and an atomic UPDATE in case 2).
If you have two or more simultaneous calls of the case 1) queries, you can end up selecting the SAME ids from the table.
Both calls will update the same ids simultaneously, overwriting each other.
This is called a "race condition".
The second case avoids that issue: MySQL will lock all rows during the update.
If a second session runs the same command, it has to wait until the rows are unlocked.
So no race condition is possible, at the expense of lost time.
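One way to soften the race in the two-step variant is to repeat the original condition in the UPDATE, as suggested earlier with "id IN (...) AND a IS NULL". The sketch below only illustrates that guard: it runs on Python's built-in sqlite3 module with a single connection, the table jobs and its contents are invented, and the "other session" is simulated by an extra statement between the two steps. It says nothing about how MySQL locks rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: rows with a IS NULL are the ones waiting to be updated.
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, a TEXT, field TEXT)")
conn.executemany("INSERT INTO jobs (id, a, field) VALUES (?, NULL, 'old')",
                 [(i,) for i in range(1, 51)])
conn.execute("CREATE INDEX idx_a ON jobs (a)")
conn.commit()

# Step 1: pick up to 10 candidate ids.
ids = [row[0] for row in conn.execute(
    "SELECT id FROM jobs WHERE a IS NULL ORDER BY id LIMIT 10")]

# Simulated concurrent change: another session touches one candidate row
# after it was selected but before it is updated.
conn.execute("UPDATE jobs SET a = 'taken' WHERE id = ?", (ids[0],))

# Step 2: update by primary key, but re-check the original condition so rows
# that changed in the meantime are skipped instead of overwritten.
placeholders = ", ".join("?" for _ in ids)
cursor = conn.execute(
    "UPDATE jobs SET field = 'new' WHERE id IN (%s) AND a IS NULL"
    % placeholders, ids)
updated = cursor.rowcount
conn.commit()

skipped_field = conn.execute(
    "SELECT field FROM jobs WHERE id = ?", (ids[0],)).fetchone()[0]
print(updated, skipped_field)
```

The guard only helps if the competing session changes something the condition can see; otherwise the single UPDATE with LIMIT, or explicit locking, remains the safer choice.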

Index creation time in MySql

I am designing a database in MySQL which will be filled with quite large amounts of raw data. I want to know whether I should define indexes before I insert the data, or first insert my data and then create the indexes. Is there any difference?
Also, if I want an index on 2 columns, is it better to index them separately or together?
Thanks
If you are doing a bulk load, my opinion is to not have indexes up front: constantly writing index pages will slow the load, especially for a larger data set. That being said, after the tables are populated, use a SINGLE statement to build ALL the indexes you expect instead of building them one by one. I learned the hard way a long time ago. I had a table of 14+ million rows and had to build 15+ indexes. Each index took longer to build than the last. It appeared that each new index required rebuilding the pages for the prior ones. Doing them all at once proved significantly better.
As for multiple column indexes... it depends on how your querying will be performed. If many queries WILL utilize a pair or more of columns in the WHERE condition, then yes, use multiple columns in a single index.
"Also I wanna know if I want to have an index on 2 columns, is it better to index them separately or together?"
This depends on your queries. When you have an index (colA, colB) the database can never use this index when you don't use colA in the WHERE condition of your queries. If you have queries WHERE colB = ? then you need an index that starts with this column.
index (colA, colB);
WHERE colA = ?; -- can use the index
WHERE colA = ? AND colB = ?; -- can use the index
This one cannot use the index:
WHERE colB = ?;
But... if you change the order of the columns in the index:
index (colB, colA); -- different order
WHERE colB = ?; -- can use the index
WHERE colA = ? AND colB = ?; -- can use the index
And now this one can't use the index:
WHERE colA = ?;
Check your queries, use EXPLAIN, and create only the indexes you really need.
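The leftmost-prefix behaviour described above can be checked on a toy table. The sketch assumes Python's built-in sqlite3 module rather than MySQL; the table t, the payload column and the index name idx_a_b are invented. Some optimizers have extra strategies for the third case, so treat the output as illustrative.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (colA INTEGER, colB INTEGER, payload TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(i % 40, i % 60, "row %d" % i) for i in range(3000)])
conn.execute("CREATE INDEX idx_a_b ON t (colA, colB)")

conditions = ("colA = 1", "colA = 1 AND colB = 2", "colB = 2")
plans = {where: query_plan(conn, "SELECT * FROM t WHERE " + where)
         for where in conditions}

for where in conditions:
    # The first two conditions search idx_a_b; the last one scans the table.
    print(where, "->", plans[where])
```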
Insert data first.
For two columns that are searched both in combination and individually, the indexes would (under normal circumstances) be:
idx_a (fldA + fldB)
idx_b (fldB)
regards,
//t
Typically when you are doing large inserts of data you will want to index it afterwards, that way it doesn't have to maintain and rebuild the indexes as data is inserted, therefore speeding up the insert process.
The indexing strategy depends entirely on how you intend to query the database. Are you going to be querying the columns as a set (i.e. both in the WHERE clause together) or individually (i.e. one or the other in your WHERE clause)?
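The load-first, index-afterwards approach recommended in the answers above might look like this in outline. It is a sketch under assumptions: it uses Python's built-in sqlite3 module, and the table raw_data, its columns and the index names are hypothetical. SQLite has no multi-index statement, so the indexes are created one after another inside one transaction; in MySQL the single-statement form would be one ALTER TABLE with several ADD INDEX clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, created without any secondary indexes.
conn.execute(
    "CREATE TABLE raw_data (col_a INTEGER, col_b INTEGER, payload TEXT)")

# 1. Bulk load first, in one transaction, with no index maintenance.
with conn:
    conn.executemany(
        "INSERT INTO raw_data VALUES (?, ?, ?)",
        ((i % 100, i % 37, "row %d" % i) for i in range(20000)))

# 2. Build the indexes once the data is in place.
#    MySQL equivalent (single statement):
#    ALTER TABLE raw_data ADD INDEX idx_a_b (col_a, col_b), ADD INDEX idx_b (col_b);
with conn:
    conn.execute("CREATE INDEX idx_a_b ON raw_data (col_a, col_b)")
    conn.execute("CREATE INDEX idx_b ON raw_data (col_b)")

index_names = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'index' AND tbl_name = 'raw_data'"))
row_count = conn.execute("SELECT COUNT(*) FROM raw_data").fetchone()[0]
print(index_names, row_count)
```

The pair of indexes follows the (fldA + fldB) plus (fldB) layout suggested above, so queries on col_a alone, on both columns, or on col_b alone each have an index whose leftmost column matches.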