MySQL: Adding indexes on a table with existing records - mysql

I have a query that is running very slowly. The table it is querying has about 100k records and no indexes on most of the columns used in the where clause. I just added indexes on those columns but the query hasn't gotten any faster.
I think this is because when a column is indexed, its value is written to the index at the time of insertion. I only added the indexes now, after all those records were inserted. So is there a way to "re-run the indexes" on the table?
Edit
Here is the query and explain result:
Oddly enough, when I copy the query and run it directly in my SQL manager tool it runs quite fast, so maybe the problem is in my application code and not in the query itself.

MySQL keeps its indexes consistent. It does not matter whether the data is added first, the index is added first, or the data is changed at any time. The same final index will result (assuming the same final data and index type).
Your slow query is not caused by adding the index later. There will be some other reason.

This is an extremely common problem.
Use MySQL explain http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
When you precede a SELECT statement with the keyword EXPLAIN, MySQL displays information from the optimizer about the query execution plan. That is, MySQL explains how it would process the statement, including information about how tables are joined and in which order.
Using these results... verify the index you created is functioning the way you expected.
If not, you will want to tweak your index until you have it working as expected.
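As a rough illustration of the check described above, here is a minimal sketch. The table `orders` and its columns are hypothetical stand-ins, not taken from the question.

```sql
-- Hypothetical table and columns; substitute your own query.
EXPLAIN
SELECT id, status
FROM orders
WHERE customer_id = 42;

-- Columns worth reading in the output:
--   possible_keys  indexes the optimizer considered
--   key            the index it actually chose (NULL means none)
--   type           ALL indicates a full table scan
--   rows           estimated number of rows to examine
```

If `key` is NULL or names a different index than the one you just created, the index is probably not shaped the way the query needs.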
You might want to create a new table, create indexes, then insert all elements from old table to new while testing this. It's easier than dropping and re-adding indices a million times.
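The copy-and-test approach might look roughly like this. All names (`orders`, `orders_test`, `idx_customer`, `idx_customer_status`) are made up for illustration.

```sql
-- Hypothetical names throughout; adapt to your schema.
-- LIKE copies the column definitions and existing indexes, but no rows.
CREATE TABLE orders_test LIKE orders;

-- Add the index under test, then copy the data once.
ALTER TABLE orders_test ADD INDEX idx_customer (customer_id);
INSERT INTO orders_test SELECT * FROM orders;

-- Check the plan against the copy.
EXPLAIN SELECT id, status FROM orders_test WHERE customer_id = 42;

-- Try a different index shape without touching the original table.
ALTER TABLE orders_test
  DROP INDEX idx_customer,
  ADD INDEX idx_customer_status (customer_id, status);
```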

Related

MySQL indexing has no speed effect through PHP but does on PhpMyAdmin

I am trying to speed up a simple SELECT query on a table that has around 2 million entries, in a MariaDB MySQL database. It took over 1.5s until I created an index for the columns that I need, and running it through PhpMyAdmin showed a significant boost in speed (now takes around 0.09s).
The problem is, when I run it through my PHP server (mysqli), the execution time does not change at all. I'm logging my execution time by running microtime() before and after the query, and it takes ~1.5s to run it, regardless of having the index or not (tried removing/readding it to see the difference).
Query example:
SELECT `pair`, `price`, `time` FROM `live_prices` FORCE INDEX (pairPriceTime) WHERE `time` = '2022-08-07 03:01:59';
Index created:
ALTER TABLE `live_prices` ADD INDEX pairPriceTime (pair, price, time);
Any thoughts on this? Does PHP PDO ignore indexes? Do I need to restart the server in order for it to "acknowledge" that there is a new index? (Which is a problem since I'm using a shared hosting service...)
If that is really the query, then it needs an INDEX starting with the value tested in the WHERE:
INDEX(time)
Or, to make a "covering index":
INDEX(time, pair, price)
However, I suspect that most of your accesses involve pair? If so, then other queries may need
INDEX(pair, time)
especially if you ask for a range of times.
To discuss various options further, please provide EXPLAIN SELECT ...
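Written out as statements, the suggestions above could look like the sketch below. The table and column names come from the question; the index names are invented here. The underlying reason is that a composite index is generally only useful for lookups on its leftmost columns, so an index starting with `pair` does little for a WHERE on `time` alone.

```sql
-- Index names are made up for illustration.

-- Index that starts with the column tested in the WHERE clause:
ALTER TABLE `live_prices` ADD INDEX idx_time (`time`);

-- Or a covering index, so the query can be answered from the index alone:
ALTER TABLE `live_prices` ADD INDEX idx_time_pair_price (`time`, `pair`, `price`);

-- For queries that filter on pair and a range of times:
ALTER TABLE `live_prices` ADD INDEX idx_pair_time (`pair`, `time`);

-- Check which index the optimizer picks, without FORCE INDEX:
EXPLAIN
SELECT `pair`, `price`, `time`
FROM `live_prices`
WHERE `time` = '2022-08-07 03:01:59';
```

You would normally keep only one of the first two indexes, since the covering index also serves lookups on `time` alone.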
PDO, mysqli, phpmyadmin -- These all work the same way. (A possible exception deals with an implicit LIMIT on phpmyadmin.)
Try hard to avoid the use of FORCE INDEX -- what helps on today's query and dataset may hurt on tomorrow's.
When you see puzzling anomalies in timings, run the query twice. Caching may be the explanation.
The MySQL documentation says:
The FORCE INDEX hint acts like USE INDEX (index_list), with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the named indexes to find rows in the table.
The MariaDB documentation on FORCE INDEX says:
FORCE INDEX works by only considering the given indexes (like with USE_INDEX) but in addition, it tells the optimizer to regard a table scan as something very expensive. However, if none of the 'forced' indexes can be used, then a table scan will be used anyway.
Use of the index is not mandatory. Since you have only specified one condition - the time, it can choose to use some other index for the fetch. I would suggest that you use another condition for the select in the where clause or add an order by
order by pair, price, time
I ended up creating another index (just for the time column) and it did the trick, running at ~0.002s now. Setting the LIMIT clause had no effect since I was always getting 423 rows (for 423 coin pairs).
Bottom line, I probably needed a more specific index, although the weird part is that the first index worked great on PMA but not through PHP, but the second one now applies to both approaches.
Thank you all for the kind replies :)

MySQL EXPLAIN - it keeps giving me different explanations each time

I have a really large, complex query I'm trying to optimise using MySQL EXPLAIN SELECT or EXPLAIN EXTENDED SELECT.
If I run it against the query, I'll see every table in the query is using Using where in the Extra column, which is great.
No data will be changed at all, I'll go off and make a cup of tea or something, come back and re-run EXPLAIN.
This time, just a few minutes later, only 20% of the tables are Using where, the primary table is now Using index; Using temporary; Using filesort, and my day becomes a nightmare trying to debug this.
I am aware that sometimes things like temporary tables and filesorts are more efficient than using where clauses and indexes. But not in the case of this database, which is 10GB in size, and creating temporary tables and filesorts kills the server completely.
Any ideas why this would be happening? Is there logic or reason behind such a thing?!
You are using InnoDB, correct? You are using a version older than 5.6.6, correct?
You have encountered an interesting variant on InnoDB's lack of "persistent statistics". Several things used to trigger re-computing the statistics for InnoDB tables. And those statistics are used for deciding how to execute the query.
Probably your particular query was "on the fence" -- a slight change in some statistic would lead to a different query plan.
If you would like, we could dig deeper. But we need to see
SHOW CREATE TABLE
SHOW TABLE STATUS (for clues of table size)
EXPLAIN EXTENDED SELECT...
EXPLAIN FORMAT=JSON SELECT... (5.6.5 or later)
And we may be able to suggest ways to speed up the query.
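For reference, the diagnostics above and a manual statistics refresh might be run like this. `my_table` is a placeholder, and the `innodb_stats_on_metadata` setting is only a suggestion under the assumption that your server version has that variable and that you have the privilege to change it.

```sql
-- `my_table` is a placeholder for each table in the query.
SHOW CREATE TABLE my_table;
SHOW TABLE STATUS LIKE 'my_table';

-- Recompute the index statistics on demand.
ANALYZE TABLE my_table;

-- Assumption: an older InnoDB where metadata statements (such as
-- SHOW TABLE STATUS) trigger a statistics recalculation.
-- Turning this off may make plans flip less often.
SET GLOBAL innodb_stats_on_metadata = OFF;
```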

Is there an equivalent of EXPLAIN that will work in front of an ALTER TABLE query?

It looks like the MySQL EXPLAIN prefix only works in front of certain queries. Is there an equivalent of EXPLAIN that will work in front of an ALTER TABLE query?
I would love to be able to find out how long my planned ALTER TABLE statement is likely to take.
Background: I have a table from someone else that contains 300 columns of data. I know that I'm only going to need to use a few of those columns, and in order to figure out which columns I need, I'm planning to do a full-text search for a few key words. But in order to do that, I need to add a full-text index. And since I'm new to this size of data set, I'm not entirely sure that this is a realistic plan. I'm hoping something like EXPLAIN (or, more likely, a substitute tool from this thread) might help determine that.
EDIT: In answer to a couple questions below, I should mention that this table has about 4 million rows and is on a local testing machine. So I can just run this thing blindly if needed. I just don't prefer to if possible. Thanks for all the good information so far.
Most ALTER TABLE statements trigger a "copy to tmp table" operation: MySQL creates a temporary table with the new schema, locks the table, copies the data from the old table into the new one, renames it, and drops the old table.
So most of the time is spent copying to the temporary table, and how long that takes depends on how big the table is and whether the server has enough memory. Use SHOW TABLE STATUS to check the size of the table (data_length + index_length), time a sample copy on another table to learn the transfer speed of your MySQL server, and from those you can estimate how long the ALTER will take.
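A sketch of that estimate, assuming a placeholder table `my_table` in the current database; the sample size of 100000 rows is arbitrary.

```sql
-- Size of the table in MB (table_rows is only an estimate for InnoDB).
SELECT table_name,
       table_rows,
       ROUND((data_length + index_length) / 1024 / 1024) AS total_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 'my_table';

-- Time a copy of a sample; the client reports the elapsed time.
CREATE TABLE my_table_sample LIKE my_table;
INSERT INTO my_table_sample SELECT * FROM my_table LIMIT 100000;
```

Scaling the sample's copy time by the ratio of total rows to sampled rows gives a rough lower bound, since building a new index adds work on top of the copy.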
The MySQL documentation mentions another approach, EXPLAIN on DML statements, but I didn't get a result from it; maybe the feature isn't finished yet:
http://dev.mysql.com/doc/refman/5.6/en/explain.html
As of MySQL 5.6.3, permitted explainable statements for EXPLAIN are SELECT, DELETE, INSERT, REPLACE, and UPDATE. Before MySQL 5.6.3, SELECT is the only explainable statement.

In Mysql, why do unused indexes affect the query plan?

I've seen this several times but I could be misinterpreting the EXPLAIN query plan.
Suppose I have a table(col1, col2).
I want to join it with another table on both col1 and col2.
So I create an index(col1, col2).
Sometimes, the EXPLAIN shows that the index is not being used. Perhaps some other inefficient index is used or none at all.
But if I create another index(col1), then the first index(col1, col2) is used.
Has anyone ever had this happen to them before? Do you have any idea why this might happen?
My theory is that the unused index provides some more accurate statistics about the table that hints to the query plan to use the first index. But I'm not familiar enough with the inner workings of mysql to know if this is true or how to prove it.
The MySQL documentation for ALTER TABLE states that you may need to run ANALYZE TABLE afterwards to refresh the index cardinality, which I believe to be a factor in the behaviour you're seeing. Also, the query optimiser usually handles empty (or nearly empty) tables quite differently from populated tables, and it will often do a full table scan instead of using an index when there are only a few rows. For my own development at $work I can't rely on the EXPLAIN output of my dev database because of that.
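One way to test that theory is to refresh the statistics and compare the plan before and after. The table names `t1` and `t2` are placeholders for the two joined tables.

```sql
-- t1 and t2 are placeholder names for the joined tables.
ANALYZE TABLE t1, t2;

-- The Cardinality column shows the estimate the optimizer works from.
SHOW INDEX FROM t1;

-- Re-check the plan once the statistics are fresh.
EXPLAIN
SELECT t1.col1, t1.col2
FROM t1
JOIN t2 ON t2.col1 = t1.col1 AND t2.col2 = t1.col2;
```

If the plan changes after ANALYZE TABLE alone, stale cardinality estimates were likely the cause rather than the extra index itself.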

dilemma about mysql. using condition to limit load on a dbf

I have a table of about 800 000 records. It's basically a log which I query often.
I added a condition so that the query only returns entries from the last month, in an attempt to reduce the load on the database.
My thinking is
a) if the database only goes through the last month and then returns entries, it's good.
b) if the database goes through the whole database + checking the condition against every single record, it's actually worse than no condition.
What is your opinion?
How would you go about reducing load on a dbf?
If the field containing the entry date is keyed/indexed, and is used by the DB software to optimize the query, that should reduce the set of rows examined to the rows matching that date range.
That said, it's commonly understood that you are better off optimizing queries, indexes, database server settings and hardware, in that order. Changing how you query your data can reduce the impact of a badly formulated query a millionfold, depending on the dataset.
If there are no obvious areas for speedup in how the query itself is formulated (joins done correctly or no joins needed, or effective use of indexes), adding indexes to help your common queries would be a good next step.
If you want more information about how the database is going to execute your query, you can use the MySQL EXPLAIN command to find out. For example, it will tell you whether it's able to use an index for the query.
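A minimal sketch of the indexed date condition, assuming a hypothetical log table `access_log` with a `created_at` column:

```sql
-- Hypothetical table and column names.
ALTER TABLE access_log ADD INDEX idx_created_at (created_at);

-- Compare the bare column against a computed boundary so the index
-- can be used for a range scan.
EXPLAIN
SELECT id, message
FROM access_log
WHERE created_at >= NOW() - INTERVAL 1 MONTH;

-- Wrapping the column in a function, e.g. WHERE MONTH(created_at) = 7,
-- generally prevents the index from being used and forces case (b):
-- every row is checked.
```

If EXPLAIN reports the index in `key` with a `rows` estimate well below the table size, the condition is narrowing the work as in case (a).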