Is it okay to use the following query? How is the performance?
select * from table where id not in ( 2000 or much more ids here)
My initial test comes back very fast, but I guess that is because I am the only one using the server right now.
If you have an index it can be very fast.
However, there is a bug in MySQL (possibly fixed in MySQL 5.5) where, if there is no index, the query won't just be slow, it will be incredibly slow. This is because the subquery can be detected as a DEPENDENT SUBQUERY (correlated subquery) even when it is not. You can see whether MySQL is using the correct query plan by running EXPLAIN SELECT ... and checking that key is not NULL for your subquery. I have made another post about this bug with some more details:
Why would an IN condition be slower than “=” in sql?
You can also consider rewriting your query to use JOIN instead of IN to avoid this bug.
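A minimal sketch of the JOIN rewrite suggested above, under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the table names items and skip_ids are hypothetical. The excluded ids are loaded into a keyed temporary table, and a LEFT JOIN ... IS NULL anti-join replaces the long NOT IN list. The query shape should carry over to MySQL, but the plan still needs checking with EXPLAIN on the real server.

```python
import sqlite3


def fetch_not_in(conn, excluded_ids):
    """Original shape: one placeholder per excluded id in a NOT IN list."""
    placeholders = ",".join("?" for _ in excluded_ids)
    sql = f"SELECT id FROM items WHERE id NOT IN ({placeholders}) ORDER BY id"
    return [row[0] for row in conn.execute(sql, list(excluded_ids))]


def fetch_anti_join(conn, excluded_ids):
    """Rewrite: load the ids into a keyed temp table, then anti-join against it."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS skip_ids (id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM skip_ids")
    # set() drops duplicate ids so the primary key is never violated.
    conn.executemany(
        "INSERT INTO skip_ids (id) VALUES (?)",
        [(i,) for i in set(excluded_ids)],
    )
    sql = """
        SELECT items.id
        FROM items
        LEFT JOIN skip_ids ON skip_ids.id = items.id
        WHERE skip_ids.id IS NULL      -- keep only rows with no match
        ORDER BY items.id
    """
    return [row[0] for row in conn.execute(sql)]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(i, f"item {i}") for i in range(1, 1001)],
    )
    excluded_ids = list(range(1, 1001, 5))  # 200 hypothetical ids to skip
    return fetch_not_in(conn, excluded_ids), fetch_anti_join(conn, excluded_ids)
```

Both functions return the same ids. The anti-join version also keeps the statement text short no matter how many ids are excluded.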
Related
I am trying to speed up a simple SELECT query on a table that has around 2 million entries, in a MariaDB MySQL database. It took over 1.5s until I created an index for the columns that I need, and running it through PhpMyAdmin showed a significant boost in speed (now takes around 0.09s).
The problem is, when I run it through my PHP server (mysqli), the execution time does not change at all. I'm logging my execution time by running microtime() before and after the query, and it takes ~1.5s to run it, regardless of having the index or not (tried removing/readding it to see the difference).
Query example:
SELECT `pair`, `price`, `time` FROM `live_prices` FORCE INDEX
(pairPriceTime) WHERE `time` = '2022-08-07 03:01:59';
Index created:
ALTER TABLE `live_prices` ADD INDEX pairPriceTime (pair, price, time);
Any thoughts on this? Does PHP PDO ignore indexes? Do I need to restart the server in order for it to "acknowledge" that there is a new index? (Which is a problem since I'm using a shared hosting service...)
If that is really the query, then it needs an INDEX starting with the value tested in the WHERE:
INDEX(time)
Or, to make a "covering index":
INDEX(time, pair, price)
However, I suspect that most of your accesses involve pair? If so, then other queries may need
INDEX(pair, time)
especially if you ask for a range of times.
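The leading-column point above can be checked with a small experiment. This is only a sketch: it uses Python's built-in sqlite3 module rather than MySQL/MariaDB, so the plan wording differs from MySQL's EXPLAIN output, and the index name idx_demo is made up. The underlying B-tree behaviour is the same idea, though: an index can only be searched by a column that sits at its front.

```python
import sqlite3


def plan_for(index_columns):
    """Return SQLite's plan text for the time-equality query, given one index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE live_prices (pair TEXT, price REAL, time TEXT)")
    # idx_demo is a hypothetical name; only the column order matters here.
    conn.execute(f"CREATE INDEX idx_demo ON live_prices ({index_columns})")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT pair, price, time FROM live_prices WHERE time = ?",
        ("2022-08-07 03:01:59",),
    ).fetchall()
    conn.close()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


if __name__ == "__main__":
    for columns in ("pair, price, time", "time", "time, pair, price"):
        print(f"INDEX({columns}): {plan_for(columns)}")
```

With time last in the index, SQLite reports a SCAN (every entry is visited). With time first, it reports a SEARCH, and the three-column variant is additionally reported as a covering index.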
To discuss various options further, please provide EXPLAIN SELECT ...
PDO, mysqli, phpmyadmin -- These all work the same way. (A possible exception deals with an implicit LIMIT on phpmyadmin.)
Try hard to avoid the use of FORCE INDEX -- what helps on today's query and dataset may hurt on tomorrow's.
When you see puzzling anomalies in timings, run the query twice. Caching may be the explanation.
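A small timing helper along those lines, as a sketch only: it is written against Python's built-in sqlite3 module, the helper name time_query is hypothetical, and the sample data is invented. The same pattern applies to a mysqli or PDO script: time the identical statement more than once and fetch every row each time.

```python
import sqlite3
import time


def time_query(conn, sql, params=(), runs=2):
    """Run the same query `runs` times and return each elapsed time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # fetchall() pulls every row, so a client that only reads the
        # first page of results does not look artificially fast.
        conn.execute(sql, params).fetchall()
        timings.append(time.perf_counter() - start)
    return timings


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE live_prices (pair TEXT, price REAL, time TEXT)")
    conn.executemany(
        "INSERT INTO live_prices VALUES (?, ?, ?)",
        [(f"PAIR{i % 423}", float(i), f"t{i % 100}") for i in range(20000)],
    )
    return time_query(
        conn,
        "SELECT pair, price, time FROM live_prices WHERE time = ?",
        ("t7",),
    )


if __name__ == "__main__":
    first, second = demo()
    print(f"first run: {first:.6f}s, second run: {second:.6f}s")
```

If the second run is much faster than the first, caching is a likely explanation for the anomaly.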
The MySQL documentation says
The FORCE INDEX hint acts like USE INDEX (index_list), with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the named indexes to find rows in the table.
The MariaDB documentation on FORCE INDEX says this:
FORCE INDEX works by only considering the given indexes (like with USE_INDEX) but in addition, it tells the optimizer to regard a table scan as something very expensive. However, if none of the 'forced' indexes can be used, then a table scan will be used anyway.
Use of the index is not mandatory. Since you have only specified one condition - the time, it can choose to use some other index for the fetch. I would suggest that you use another condition for the select in the where clause or add an order by
order by pair, price, time
I ended up creating another index (just for the time column) and it did the trick, running at ~0.002s now. Setting the LIMIT clause had no effect since I was always getting 423 rows (for 423 coin pairs).
Bottom line, I probably needed a more specific index. The weird part is that the first index worked great in PMA but not through PHP, whereas the second one now helps with both approaches.
Thank you all for the kind replies :)
I am not sure if this is a mysql question or spring batch
I am using spring batch to read data from mysql database (Using JdbcPagingItemReader)
There are close to 1 million records that I am trying to read with fetch and pageSize of 10000
The issue is that the read operation for each batch of 10000 records is very slow. I analysed the SQL with EXPLAIN: even though the indexes are in place, MySQL internally uses a filesort because of the sort by primary key.
Has anyone faced similar issue before?
Sorry if the details are not sufficient (I haven't provided the query, but it's a simple query with a couple of joins and a GROUP BY. All the join ids are indexed and sorting is based on the primary key)
SELECT ...
GROUP BY ...
ORDER BY ...
LIMIT 10000
A query of this shape must do the grouping and sorting before imposing the LIMIT.
In some cases the query can be "turned inside out" to locate the 10000 rows first (in a subquery), then do the JOINs, etc. This may run faster.
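A sketch of what "turned inside out" can look like, since the real query was not posted. Everything here is an assumption: the orders and order_items tables are hypothetical, the grouping is by the primary key of the table being paged, and Python's built-in sqlite3 module stands in for MySQL. The rewrite is only equivalent under those conditions (a LEFT JOIN is used so that rows without matches still count toward the page).

```python
import sqlite3

# Straightforward shape: join, group and sort everything, then cut to the page.
STRAIGHT = """
    SELECT o.id, COUNT(i.id) AS item_count
    FROM orders AS o
    LEFT JOIN order_items AS i ON i.order_id = o.id
    GROUP BY o.id
    ORDER BY o.id
    LIMIT ?
"""

# Inside-out shape: pick the page of primary keys first, then join and group
# only those rows.
INSIDE_OUT = """
    SELECT o.id, COUNT(i.id) AS item_count
    FROM (SELECT id FROM orders ORDER BY id LIMIT ?) AS o
    LEFT JOIN order_items AS i ON i.order_id = o.id
    GROUP BY o.id
    ORDER BY o.id
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER)"
    )
    conn.execute("CREATE INDEX idx_items_order ON order_items (order_id)")
    conn.executemany("INSERT INTO orders (id) VALUES (?)", [(n,) for n in range(1, 51)])
    # Order n gets n % 4 items, so every fourth order has none at all.
    conn.executemany(
        "INSERT INTO order_items (order_id) VALUES (?)",
        [(n,) for n in range(1, 51) for _ in range(n % 4)],
    )
    return conn


def demo(page_size=10):
    conn = build_demo_db()
    straight = conn.execute(STRAIGHT, (page_size,)).fetchall()
    inside_out = conn.execute(INSIDE_OUT, (page_size,)).fetchall()
    return straight, inside_out
```

Both statements return the same page here. Whether the second one is faster on MySQL depends on the actual query and indexes, which is why the real SQL and SHOW CREATE TABLE output are needed.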
We need to study the actual query, not talk in generalizations. Also, please provide SHOW CREATE TABLE, there may be a missing composite index and/or some datatype issues. Please provide the generated SQL, not Spring's rendition of it.
If you'd write a query like so:
SELECT * FROM `posts` WHERE `views` > 200 OR `views` > 100
Would MySQL analyze that query and realize that it's actually equivalent to this?
SELECT * FROM `posts` WHERE `views` > 100
In other words, would MySQL optimize the query such that it skips any unnecessary WHERE checks?
I'm asking because I'm working on a piece of code that, for now, generates queries with redundant WHERE clauses. I'm wondering if I should optimize those queries before I send them to MySQL, or if that's unnecessary, because MySQL would do it anyway.
Yes. MySQL does optimize queries before running them. In fact, what runs has no obvious relationship to the SQL statement itself -- it is a directed acyclic graph.
In the process, MySQL determines what indexes to use for the query, what join algorithms to use, sorts the lists of constants in IN lists, and much more.
The optimizer also does some simplifications of the query. I'm not sure if those simplifications extend to inequalities. However, there is little overhead in making the comparison twice.
EXPLAIN SELECT ... shows how the query was rewritten -- but it still has the OR.
The "Optimizer trace" says the same thing. However, when it gets into discussing the "cost", it gets smart and merges the two comparisons. (This is the case at least as far back as 5.6.)
In many cases, OR should be avoided like covid.
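If the generating code wants to drop the redundancy itself rather than rely on the optimizer, the merge for this particular pattern is tiny. A sketch, with a hypothetical helper name merge_gt_or, and with Python's built-in sqlite3 module used only to confirm that the two forms return the same rows: col > a OR col > b is true exactly when col > min(a, b).

```python
import sqlite3


def merge_gt_or(thresholds):
    """Collapse `col > a OR col > b OR ...` into the single bound `col > min(...)`."""
    if not thresholds:
        raise ValueError("need at least one threshold")
    return min(thresholds)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, views INTEGER)")
    conn.executemany(
        "INSERT INTO posts (views) VALUES (?)",
        [(v,) for v in (50, 100, 101, 150, 200, 201, 500)],
    )
    # The redundant form from the question.
    redundant = conn.execute(
        "SELECT id FROM posts WHERE views > ? OR views > ? ORDER BY id",
        (200, 100),
    ).fetchall()
    # The merged form, with the bound computed before the query is sent.
    merged = conn.execute(
        "SELECT id FROM posts WHERE views > ? ORDER BY id",
        (merge_gt_or([200, 100]),),
    ).fetchall()
    return redundant, merged
```

This only covers ORs of > comparisons on one column. For ANDs of > comparisons the bound would be the maximum instead.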
I have a problem with my MySQL database. I have an expensive query with some joins, but I always run it for one specific id, which makes the execution very fast.
Now I have put this query into a view. If I query this view and use the WHERE clause with the id on the view, it seems as if MySQL first loads all records and only then applies my WHERE clause. This results in very bad performance.
Is there a possibility to let MySQL use also my where clauses in the view before querying all records?
Thanks a lot and cheers,
Argonitas
I've linked a MySQL view into MS Access via ODBC, but it's running WAY slow.
It's a simple select, that compares two other selects to find records that are unique to the first select.
SELECT `contacts_onlinedonors`.`contactkey` AS `contactkey`
FROM (`hal9k3-testbed`.`contacts_onlinedonors`
LEFT JOIN `hal9k3-testbed`.`contacts_offlinedonors`
ON(( `contacts_onlinedonors`.`contactkey` =
`contacts_offlinedonors`.`contactkey` )))
WHERE Isnull(`contacts_offlinedonors`.`contactkey`)
The slow query log says it returns 34,000 rows after examining 1.5 Billion. There are only 200,000 in the base table. What the heck?
The field "contactkey" is obviously an index on the table.
First thing to do is to "explain" this query.
See http://dev.mysql.com/doc/refman/5.0/en/explain.html
The idea is to figure out what the MySQL server is doing and which indexes it is using, and then to add indexes where needed or rewrite your query so it can use indexes.
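The explain-then-index loop described above can be sketched as follows. Assumptions are stated up front: Python's built-in sqlite3 module and its EXPLAIN QUERY PLAN stand in for MySQL's EXPLAIN, both contact objects are modelled as plain tables with no indexes to begin with, and the index name idx_offline_contactkey is made up. The aliases onl and offl are also only for illustration.

```python
import sqlite3

# Same anti-join shape as the view in the question.
ANTI_JOIN = """
    SELECT onl.contactkey
    FROM contacts_onlinedonors AS onl
    LEFT JOIN contacts_offlinedonors AS offl
        ON onl.contactkey = offl.contactkey
    WHERE offl.contactkey IS NULL
"""


def explain(conn, sql):
    """Return the human-readable detail column of each plan row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts_onlinedonors (contactkey INTEGER)")
    conn.execute("CREATE TABLE contacts_offlinedonors (contactkey INTEGER)")

    # Step 1: look at the plan before changing anything.
    before = explain(conn, ANTI_JOIN)

    # Step 2: index the column the inner side of the join is probed on.
    conn.execute(
        "CREATE INDEX idx_offline_contactkey "
        "ON contacts_offlinedonors (contactkey)"
    )

    # Step 3: explain again and confirm the new index is picked up.
    after = explain(conn, ANTI_JOIN)
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```

On MySQL the equivalent check is whether the key column of the EXPLAIN output names an index for the joined table. A row count like 1.5 billion examined for 200,000 base rows suggests that the inner side is being scanned in full for each outer row.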