MySQL: slow query on indexed field

The orders table has 2M records, with ~900K unique ship_to_id values.
There is an index on ship_to_id (the field is INT(8)).
The query below takes nearly 10 minutes to complete. I've run SHOW PROCESSLIST, which shows Command = Query and State = Sending data.
When I run EXPLAIN, the existing index is used, and possible_keys is NULL.
Is there anything I can do to speed this query up? Thanks.
SELECT
ship_to_id as customer_id
FROM orders
GROUP BY ship_to_id
HAVING SUM( price_after_discount ) > 0

It doesn't look like you have a useful index. Try adding an index on price_after_discount, and add a WHERE condition like this:
WHERE price_after_discount > 0
to minimize the number of rows you need to sum, since rows with a value of 0 contribute nothing to the SUM. (Note that this filter also discards negative values, so it only gives the same result if price_after_discount is never negative.)
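Putting that together, here is a sketch of the suggested index and rewritten query. The index name is illustrative, and the WHERE filter assumes price_after_discount is never negative:

```sql
-- Illustrative index name; the WHERE filter below assumes
-- price_after_discount is never negative, otherwise it changes the SUM.
ALTER TABLE orders ADD INDEX idx_price (price_after_discount);

SELECT ship_to_id AS customer_id
FROM orders
WHERE price_after_discount > 0
GROUP BY ship_to_id
HAVING SUM(price_after_discount) > 0;
```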
Also try running the "top" command and watch the I/O "wait" column while the query is running. If it's high, the query is causing a lot of disk I/O. If you have spare RAM, you can increase various memory buffers to speed this up (the InnoDB buffer pool if you're using InnoDB; MyISAM data is cached through the filesystem cache). Restarting the server will flush these caches.
If you do not have enough RAM (and 2M records shouldn't need much), then consider a partitioning scheme, perhaps on the ship_to_id column (if your version of MySQL supports it).

If some of the orders in that table are no longer current (i.e. will never change again), you could archive them off into another table to reduce how much data has to be scanned.
Another option is to throw a last_modified timestamp on the table with an index. You could then keep track of when the query is run and store the results in another table (query_results). When it's time to run the query again, you would only need to select the orders that were modified since the last time the query was run, then use that to update the query_results. The logic is a little more complicated, but it should be much faster assuming a low percentage of the orders are updated between query executions.
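A sketch of that incremental scheme; the column, index, and variable names here are illustrative, not from the original question:

```sql
-- Track modification times on orders.
ALTER TABLE orders
  ADD COLUMN last_modified TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        ON UPDATE CURRENT_TIMESTAMP,
  ADD INDEX idx_last_modified (last_modified);

-- On each run, re-aggregate only the customers whose orders changed
-- since the last run (@last_run_at is tracked by the application).
SELECT ship_to_id, SUM(price_after_discount) AS total
FROM orders
WHERE ship_to_id IN (
    SELECT ship_to_id FROM orders WHERE last_modified > @last_run_at
)
GROUP BY ship_to_id;
```

The application then merges these totals into the query_results table instead of recomputing everything.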

MySQL can use an index for a GROUP BY, at least according to the documentation.
To be most useful, all the columns used in the query should be in the index. This prevents the engine from having to reference the table data as well as the index (a "covering index"). So, try an index on orders(ship_to_id, price_after_discount).
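For example (the index name is illustrative):

```sql
-- Covering index: the GROUP BY column first, then the summed column,
-- so the query can be answered from the index alone.
ALTER TABLE orders ADD INDEX idx_ship_price (ship_to_id, price_after_discount);
```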

Related

MySQL indexing has no speed effect through PHP but does on PhpMyAdmin

I am trying to speed up a simple SELECT query on a table that has around 2 million entries, in a MariaDB (MySQL) database. It took over 1.5s until I created an index for the columns that I need, and running it through phpMyAdmin showed a significant boost in speed (it now takes around 0.09s).
The problem is, when I run it through my PHP server (mysqli), the execution time does not change at all. I'm measuring execution time by calling microtime() before and after the query, and it takes ~1.5s to run, regardless of whether the index exists (I tried removing and re-adding it to see the difference).
Query example:
SELECT `pair`, `price`, `time` FROM `live_prices` FORCE INDEX
(pairPriceTime) WHERE `time` = '2022-08-07 03:01:59';
Index created:
ALTER TABLE `live_prices` ADD INDEX pairPriceTime (pair, price, time);
Any thoughts on this? Does PHP PDO ignore indexes? Do I need to restart the server in order for it to "acknowledge" that there is a new index? (Which is a problem since I'm using a shared hosting service...)
If that is really the query, then it needs an INDEX starting with the column tested in the WHERE:
INDEX(time)
Or, to make a "covering index":
INDEX(time, pair, price)
However, I suspect that most of your accesses involve pair? If so, then other queries may need
INDEX(pair, time)
especially if you ask for a range of times.
To discuss various options further, please provide EXPLAIN SELECT ...
PDO, mysqli, phpmyadmin -- These all work the same way. (A possible exception deals with an implicit LIMIT on phpmyadmin.)
Try hard to avoid the use of FORCE INDEX -- what helps on today's query and dataset may hurt on tomorrow's.
When you see puzzling anomalies in timings, run the query twice. Caching may be the explanation.
The MySQL documentation says:
The FORCE INDEX hint acts like USE INDEX (index_list), with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the named indexes to find rows in the table.
The MariaDB documentation for FORCE INDEX says this:
FORCE INDEX works by only considering the given indexes (like with USE_INDEX) but in addition, it tells the optimizer to regard a table scan as something very expensive. However, if none of the 'forced' indexes can be used, then a table scan will be used anyway.
Use of the index is not mandatory. Since you have only specified one condition (the time), the optimizer can choose some other index for the fetch. I would suggest adding another condition to the WHERE clause, or adding an ORDER BY:
order by pair, price, time
I ended up creating another index (just for the time column) and it did the trick, running at ~0.002s now. Setting the LIMIT clause had no effect since I was always getting 423 rows (for 423 coin pairs).
Bottom line, I probably needed a more specific index. The weird part is that the first index worked great in phpMyAdmin but not through PHP, while the second one now works for both approaches.
Thank you all for the kind replies :)

Is it possible to optimize a query that gets all the rows in a table

I have this query
SELECT id, alias, parent FROM `content`
Is there a way to optimize this query so the EXPLAIN 'type' is different from 'ALL'?
id - primary, unique
id - index
parent - index
alias - index
....
Note that this query will almost never return more than 1500 rows.
Thank you
Your query is fetching all the rows, so by definition it's going to report "ALL" as the query type in the EXPLAIN report. The only other possibility is the "index" query type, an index-scan that visits every entry in the index. But that's virtually the same cost as a table-scan.
There's a saying that the fastest SQL query is one that you don't run at all, because you get the data some other way.
For example, if the data is in a cache of some type. If your data has no more than 1500 rows, and it doesn't change frequently, it may be a good candidate for putting in memory. Then you run the SQL query only if the cached data is missing.
There are a couple of common options:
The MySQL query cache is an in-memory cache maintained in the MySQL server, purged automatically when the data in the table changes. (Note that the query cache was deprecated in MySQL 5.7 and removed in MySQL 8.0.)
Memcached is a popular in-memory key-value store used frequently by applications that also use MySQL. It's very fast. Another option is Redis, which is similar but is also backed by disk storage.
Turn OFF log_queries_not_using_indexes; it clutters the slow log with red herrings like what you got.
0.00XX seconds -- good enough not to worry.
ALL is actually optimal for fetching multiple columns from 'all' rows of a table.

MySQL Query Taking a Long Time

I have a pretty simple query over a table with about 14 millions records that is taking about 30 minutes to complete. Here is the query:
select a.switch_name, a.recording_id, a.recording_date, a.start_time,
a.recording_id, a.duration, a.ani, a.dnis, a.agent_id, a.campaign,
a.call_type, a.agent_call_result, a.queue_name, a.rec_stopped,
a.balance, a.client_number, a.case_number, a.team_code
from recording_tbl as a
where client_number <> '1234567'
Filtering on client_number seems to be the culprit, and that column does have an index. I'm not sure what else to try.
You can start by creating an INDEX on client_number and see how it helps, but you'll get the best results when you analyze your problem using the EXPLAIN command.
http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html
Is the table MyISAM or InnoDB? If InnoDB, increase the InnoDB buffer pool to a large enough amount that the entire table can fit into memory. If MyISAM, it should automatically be loaded into memory via the OS cache buffers. Install more RAM. Install faster disk drives. These seem to be your only options, considering you are doing a full table scan (minus whatever that client number is, which appears to be your testing client id?).
It takes a while to load the tables into RAM as well, so don't expect it as soon as the db starts up.
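If you are on InnoDB and have the RAM, the buffer pool size is the main knob. A sketch, with an illustrative size; on MySQL 5.7+ this variable can be changed at runtime, while on older versions it must be set in my.cnf and requires a restart:

```sql
-- Size the buffer pool so the working set fits in memory.
SET GLOBAL innodb_buffer_pool_size = 4294967296;  -- 4 GiB, illustrative
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
```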
Your query is doing a full table scan on the one table in the query, recording_tbl. I am assuming this is a table and not a view, because of the "tbl" prefix. If this is a view, then you need to optimize the view.
There is no need to look at the EXPLAIN. An index is unlikely to be helpful, unless 99% or so of the records have a client_number of 1234567. An index might even make things worse, because of a phenomenon called thrashing.
Your problem is either undersized hardware or underallocated resources for the MySQL query engine. I would first look at buffering for the engine, and then the disk hardware and bandwidth to the processor.
Maybe...
where client_number = '1234567'
...would be a bit faster.
If client_number is stored as a numeric column, then
where client_number = 1234567
may be faster, if the string comparison was forcing a cast and possibly preventing the index from being used.
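Applied to the original predicate, that looks like the following (column list trimmed for brevity; this assumes client_number really is an integer column):

```sql
-- Comparing against a number avoids a per-row cast of client_number.
-- Note that a <> predicate still tends to scan most of the table
-- unless very few rows match.
SELECT switch_name, recording_id, client_number
FROM recording_tbl
WHERE client_number <> 1234567;
```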
Why do you need to return 14M rows? (I'm assuming that most records do not have the ID you are filtering on.)
If you don't need all 14M rows, add LIMIT to the end of your query. Fewer rows -> less memory -> a faster query.
Example:
select a.switch_name, a.recording_id, a.recording_date, a.start_time,
a.recording_id, a.duration, a.ani, a.dnis, a.agent_id, a.campaign,
a.call_type, a.agent_call_result, a.queue_name, a.rec_stopped,
a.balance, a.client_number, a.case_number, a.team_code
from recording_tbl as a
where client_number <> '1234567'
LIMIT 1000
Would return the first 1000 rows.
And here's a comparison of how to return the top N rows across different SQL RDBMS:
http://www.petefreitag.com/item/59.cfm

the efficiency of MYSQL COUNT query

so I executed this query on a table:
EXPLAIN SELECT COUNT(*) FROM table;
and the 'rows' column in the output is NULL (whereas it usually shows how many rows the query examined)...
does this mean that the COUNT is instantaneous and therefore does not need to examine any rows at all?
If your table uses the MyISAM storage engine, then yes, that query resolves in constant time. The row count is part of the table metadata, the table itself does not have to be examined.
From: http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL#COUNT.28.2A.29
Depending on what engine you are using, and I'm guessing it's MyISAM, a quick index count is executed rather than actually counting all the rows.
Many database engines use an index scan to get the count. If you're using MyISAM (fairly likely), it just reads a number from the table metadata and returns it. Nearly instantaneous.
Edit:
InnoDB does a full table or index scan, so it will almost always be slower (than an engine that stores the row count), unless you're comparing queries with a WHERE clause.
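Under InnoDB you can check which index the count uses; a sketch with illustrative table and column names:

```sql
-- InnoDB stores no exact row count, so COUNT(*) scans an index.
-- A narrow secondary index makes that scan cheaper than reading
-- the clustered index.
ALTER TABLE t ADD INDEX idx_narrow (small_col);
EXPLAIN SELECT COUNT(*) FROM t;  -- the key column typically shows idx_narrow
```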

Mysql performance on 6 million row table

One day I suspect I'll have to learn Hadoop and transfer all this data to a non-structured database, but I'm surprised to see performance degrade so significantly in such a short period of time.
I have a mysql table with just under 6 million rows.
I am doing a very simple query on this table, and believe I have all the correct indexes in place.
the query is
SELECT date, time FROM events WHERE venid='47975' AND date>='2009-07-11' ORDER BY date
the explain returns
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE updateshows range date_idx date_idx 7 NULL 648997 Using where
So I am using the correct index as far as I can tell, but this query is taking 11 seconds to run.
The database is MyISAM, and phpMyAdmin says the table is 1.0GiB.
Any ideas here?
Edited:
The date_idx indexes both the date and venid columns. Should those be two separate indexes?
What you want is for the query to use ONLY the index, so make sure the index covers all the fields you are selecting. Also, since a range query is involved, you need venid first in the index, since it is compared to a constant. I would therefore create an index like so:
ALTER TABLE events ADD INDEX indexNameHere (venid, date, time);
With this index, all the information needed to answer the query is in the index. This means that, hopefully, the storage engine can fetch the information without actually seeking inside the table itself. However, MyISAM might not be able to do this, since it doesn't store the row data in the leaves of its indexes, so you might not get the speed increase you desire. If that's the case, try creating a copy of the table using the InnoDB engine, repeat the same steps there, and see if you get a significant speed increase. InnoDB does store the field values in the index leaves, and allows covering indexes.
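One way to try that InnoDB experiment without touching the original table (the copy's name is illustrative):

```sql
-- Copy the data into an InnoDB table and rebuild the covering index there.
CREATE TABLE events_innodb ENGINE=InnoDB AS SELECT * FROM events;
ALTER TABLE events_innodb ADD INDEX indexNameHere (venid, date, time);
```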
Now, hopefully you'll see the following when you explain the query:
mysql> EXPLAIN SELECT date, time FROM events WHERE venid='47975' AND date>='2009-07-11' ORDER BY date;
id select_type table type possible_keys key [..] Extra
1 SIMPLE events range date_idx, indexNameHere indexNameHere Using index, Using where
Try adding a key that spans venid and date (or the other way around, or both...)
I would imagine that a 6M row table should be able to be optimised with quite normal techniques.
I assume that you have a dedicated database server, and it has a sensible amount of ram (say 8G minimum).
You will want to ensure you've tuned MySQL to use your RAM efficiently. If you're running a 32-bit OS, don't. If you are using MyISAM, tune your key buffer to use a significant proportion, but not too much, of your RAM.
In any case you want to run repeated performance testing on production-grade hardware.
Try putting an index on the venid column.