I have a table with 500,000 records on my MySQL server. Queries against it take a long time to execute; sometimes it goes beyond a minute.
Below I have added my MySQL machine detail.
RAM: 16 GB
Processor: Intel(R) Core(TM) i5-4460M CPU @ 3.20GHz
OS: Windows Server, 64-bit
I know there is no problem with my machine, since it is a standalone machine with no other applications running on it.
Maybe the problem is with my query. I have gone through the MySQL site and confirmed that I have used proper syntax, but I don't know the exact reason for the delay in the result.
SELECT SUM(`samplesalesdata50000`.`UnitPrice`) AS `UnitPrice`, `samplesalesdata50000`.`SalesChannel` AS `Channel`
FROM `samplesalesdata50000` AS `samplesalesdata50000`
GROUP BY `samplesalesdata50000`.`SalesChannel`
ORDER BY 2 ASC
LIMIT 200 OFFSET 0
Can anyone please let me know whether the duration depends on the table or on the query that I have used?
Note: Even when I try indexing, there is not much difference in the result time.
Thanks
Two approaches to this:
One approach is to create a covering index on the columns needed to satisfy your query. The correct index for your query contains these columns in this order: (SalesChannel, UnitPrice).
Why does this help? For one thing, the index itself contains all data needed to satisfy your query, and nothing else. This means your server does less work.
For another thing, MySQL's indexes are BTREE-organized. That means they're accessible in order. So your query can be satisfied one SalesChannel at a time, and MySQL doesn't need an internal temporary table. That's faster.
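In case it helps, the covering index described above can be created like this (the index name is just illustrative):

```sql
-- SalesChannel first so the groups are read in order; UnitPrice second so
-- the SUM can be computed from the index alone, without touching table rows.
ALTER TABLE samplesalesdata50000
    ADD INDEX idx_channel_price (SalesChannel, UnitPrice);
```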
A second approach involves recognizing that ORDER BY ... LIMIT is a notorious performance antipattern. You require MySQL to sort a big mess of data, and then discard most of it.
You could try this:
SELECT SUM(UnitPrice) UnitPrice,
SalesChannel Channel
FROM samplesalesdata50000
WHERE SalesChannel IN (
    -- MySQL does not support LIMIT directly inside IN (...),
    -- so wrap it in a derived table
    SELECT SalesChannel FROM (
        SELECT DISTINCT SalesChannel
        FROM samplesalesdata50000
        ORDER BY SalesChannel
        LIMIT 200 OFFSET 0
    ) AS first_channels
)
GROUP BY SalesChannel
ORDER BY SalesChannel
LIMIT 200 OFFSET 0
If you have an index on SalesChannel (the covering index mentioned above works), this should speed you up a lot, because your aggregate (GROUP BY) query need only consider a subset of your table.
Your problem may be with ORDER BY 2 ASC. Try ORDER BY Channel instead.
If it were MS SQL Server you would use WITH (NOLOCK); the MySQL equivalent is
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED ;
SELECT SUM(`samplesalesdata50000`.`UnitPrice`) AS `UnitPrice`, `samplesalesdata50000`.`SalesChannel` AS `Channel`
FROM `samplesalesdata50000` AS `samplesalesdata50000`
GROUP BY `samplesalesdata50000`.`SalesChannel`
ORDER BY SalesChannel ASC
LIMIT 200 OFFSET 0
COMMIT ;
To improve on OJones's answer, note that
SELECT SalesChannel FROM samplesalesdata50000
ORDER BY SalesChannel LIMIT 200, 1
will quickly (assuming the index given) find the end of the desired list. Then adding this limits the main query to only the rows needed:
WHERE SalesChannel < (that-select)
There is, however, a problem: if there are 200 or fewer rows in the table, the subquery will return nothing.
You seem to be setting up for "paginating"? In that case, a similar technique can be used to find the starting value:
WHERE SalesChannel >= ...
AND SalesChannel < ...
This also avoids using the inefficient OFFSET, which has to read, then toss, all the rows being skipped over.
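Putting the pieces together, a bounded version of the first page might look like this sketch (it assumes an index on SalesChannel, and inherits the caveat above about small tables):

```sql
-- Bound the GROUP BY to only the rows belonging to the first 200 channels.
SELECT SUM(UnitPrice) AS UnitPrice,
       SalesChannel   AS Channel
FROM samplesalesdata50000
WHERE SalesChannel < (SELECT SalesChannel
                      FROM samplesalesdata50000
                      ORDER BY SalesChannel
                      LIMIT 200, 1)   -- the row just past the desired list
GROUP BY SalesChannel
ORDER BY SalesChannel;
```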
But the real solution may be to build and maintain a Summary Table of the data. It would contain subtotals for each, say, month. Then run the query against the Summary Table -- it might be 10x faster.
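A hedged sketch of such a summary table (the OrderDate column is an assumption; substitute whatever date column the table actually has):

```sql
-- Hypothetical monthly summary; rebuild or incrementally update it after each load.
CREATE TABLE samplesales_monthly (
    yr_month     CHAR(7)       NOT NULL,  -- e.g. '2017-01'
    SalesChannel VARCHAR(64)   NOT NULL,
    total_price  DECIMAL(14,2) NOT NULL,
    PRIMARY KEY (yr_month, SalesChannel)
);

INSERT INTO samplesales_monthly
SELECT DATE_FORMAT(OrderDate, '%Y-%m'), SalesChannel, SUM(UnitPrice)
FROM samplesalesdata50000
GROUP BY 1, 2;
```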
I am trying to calculate the rank of characters in my table. For whatever reason, the query runs forever whenever I run it with an order by clause.
I have a very similar query running in a different server with a different schema but it's essentially doing the same thing and that does finish almost instantly. I am completely lost as to why this query never finishes and takes forever.
I am indexing almost everything in the characters table and still no luck.
KEY `accountid` (`accountid`),
KEY `party` (`party`),
KEY `ranking1` (`level`,`exp`),
KEY `ranking2` (`gm`,`job`),
KEY `idx_characters_gm` (`gm`),
KEY `idx_characters_fame` (`fame`),
KEY `idx_characters_job` (`job`),
KEY `idx_characters_level` (`level`),
KEY `idx_characters_exp` (`exp`),
When I don't include the ORDER BY it runs just fine and finishes instantly. When I do, it runs forever.
There are only 28,000 characters in the DB so it can't be that intensive to compute a rank, especially when the limit's only 1.
SELECT c.name
, 1+(
SELECT COUNT(*)
FROM msd.characters as rankc
WHERE rankc.level > c.level
LIMIT 1
) as jobRank
FROM characters as c
JOIN accounts as a
ON c.accountid = a.id
WHERE c.gm = 0 AND a.banned = 0
ORDER BY c.`level` DESC, c.exp DESC
LIMIT 1
OFFSET 0;
Any help would be greatly appreciated!
EDIT:
Essentially, each character has a unique job and I want to get the job ranking of that character. The default order of the rankings is by level. That's why I'm doing a comparison in my jobRank SELECT.
Here is an example of my desired result (screenshot omitted).
The first thing you should do when you have query performance problems is run an EXPLAIN on your query: just prefix your SELECT statement with EXPLAIN (see https://dev.mysql.com/doc/refman/5.7/en/using-explain.html for more details). While you might think you have all the appropriate indexes in place, the decisions the DBMS engine makes will not always be obvious. Don't be surprised to find one or more full table scans where you thought indexes would be used; that's probably the most common source of "query never finishes" issues.
Update: based on your EXPLAIN output, the query is performing multiple full table scans in the subquery, which is probably your performance problem. You want to get that subquery using an index, so I'd suggest eliminating LIMIT 1 from the subquery (the WHERE clause already filters out the levels you don't want, and there's no GROUP BY clause, so only one row will be returned) and seeing if that does the job.
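For reference, that suggestion applied to the query from the question (only the subquery's LIMIT 1 removed, everything else unchanged):

```sql
SELECT c.name,
       1 + (SELECT COUNT(*)
            FROM msd.characters AS rankc
            WHERE rankc.level > c.level) AS jobRank
FROM characters AS c
JOIN accounts AS a
  ON c.accountid = a.id
WHERE c.gm = 0 AND a.banned = 0
ORDER BY c.`level` DESC, c.exp DESC
LIMIT 1 OFFSET 0;
```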
I was wondering which would be faster, and what the tradeoffs are of using one or the other query?
SELECT * FROM table WHERE somecolumn = 'something' LIMIT 999;
vs.
SELECT * FROM table WHERE somecolumn = 'something';
Now, considering that the query will never return more than a couple of hundred rows, does using LIMIT 999 make a significant performance impact or not?
I'm looking into this option because my project will give the user a way to limit results as they'd like, and they can leave the limit empty to show all; it's easier for me to keep the LIMIT part of the query and just change the number.
Now, the table is really big, ranging from a couple of hundred thousand to a couple of million rows.
The exact query looks something like:
SELECT SUM(revenue) AS cost,
IF(ISNULL(headline) OR headline = '', 'undefined', headline
) AS headline
FROM `some_table`
WHERE ((date >= '2017-01-01')
AND (date <= '2017-12-31')
)
AND -- (sic)
GROUP BY `headline`
ORDER BY `cost` DESC
As I said before, this query will never return more than about a hundred rows.
Disk I/O, if any, is by far the most costly part of a query.
Fetching each row ranks next.
Almost everything else is insignificant.
However, if the existence of LIMIT can change what the Optimizer does, then there could be a significant difference.
In most cases, including the queries you gave, a too-big LIMIT has no impact.
In certain subqueries, a LIMIT will prevent the elimination of ORDER BY. A subquery is, by definition, a set, not an ordered set; so LIMIT is a kludge to prevent the optimizer from removing the ORDER BY.
If there is a composite index that includes all the columns needed for WHERE, GROUP BY, and ORDER BY, then the Optimizer can stop when the LIMIT is reached. Other situations go through tmp tables and sorts for GROUP BY and ORDER BY and can do the LIMIT only against a full set of rows.
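As a concrete sketch for the revenue query above: a covering composite index would look like the following. It cannot make the final sort on the computed cost disappear (that still needs all groups), but it lets the whole query run from the index:

```sql
-- date first for the range filter; headline and revenue included so the
-- query is satisfied entirely from the index (no table-row lookups).
ALTER TABLE some_table
    ADD INDEX idx_date_headline_revenue (date, headline, revenue);
```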
Two caches were alluded to in the Comments so far.
"Query cache" -- This records exact queries and their result sets. If it is turned on and applicable, the query comes back "instantly". By "exact", I include the existence and value of LIMIT.
To speed up all queries, data and index blocks are "cached" in RAM (see innodb_buffer_pool_size). This avoids disk I/O when a similar (not necessarily identical) query is run. See my first sentence, above.
I've got a complex query I have to run in an application that is giving me some performance trouble. I've simplified it here. The database is MySQL 5.6.35 on CentOS.
SELECT a.`po_num`,
Count(*) AS item_count,
Sum(b.`quantity`) AS total_quantity,
Group_concat(`web_sku` SEPARATOR ' ') AS web_skus
FROM `order` a
INNER JOIN `order_item` b
ON a.`order_id` = b.`order_key`
WHERE `store` LIKE '%foobar%'
LIMIT 200 offset 0;
The key part of this query is where I've placed "foobar" as a placeholder. If this value is something like big_store, the query takes much longer (roughly 0.4 seconds in the query provided here, and much longer in the query I'm actually using) than if the value is small_store (roughly 0.1 seconds). big_store would return significantly more results if there were no limit.
But there is a limit, and that's what surprises me. Both datasets have more rows than the LIMIT, which is only 200. It appears to me that MySQL is performing the select functions COUNT, SUM, GROUP_CONCAT for all big_store/small_store rows and then applying the LIMIT retroactively. I would imagine that it'd be best to stop when you get to 200.
Could it not apply the select functions COUNT, SUM, GROUP_CONCAT after grabbing the 200 rows it will use, making my query much quicker? This seems feasible to me, except in cases where there's an ORDER BY on one of those columns.
Does MySQL not use LIMIT to optimize a query's select functions? If not, is there a good reason for that? If so, did I make a mistake in my thinking above?
It can stop short due to the LIMIT, but that is not a reasonable query since there is no ORDER BY.
Without ORDER BY, it will pick whatever 200 rows it feels like and stop short.
With an ORDER BY, it will have to scan the entire table that contains store (please qualify columns with which table they come from!). This is because of the leading wildcard. Only then can it trim to 200 rows.
Another problem -- Without a GROUP BY, aggregates (SUM, etc) are performed across the entire table (or at least those that remain after filtering). The LIMIT does not apply until after that.
Perhaps what you are asking about is MariaDB 5.5.21's "LIMIT ROWS EXAMINED".
Think of it this way ... All of the components of a SELECT are done in the order specified by the syntax. Since LIMIT is last, it does not apply until after the other stuff is performed.
(There are a couple of exceptions: (1) SELECT col... must be done after FROM ..., since otherwise it would not know which table(s) are involved; (2) the optimizer readily reorders JOINed tables and the clauses in WHERE ... AND ....)
More details on that query.
The optimizer peeks ahead, and sees that the WHERE is filtering on order (that is where store is, yes?), so it decides to start with the table order.
It fetches all rows from order that match %foobar%.
For each such row, find the row(s) in order_item. Now it has some number of rows (possibly more than 200) with which to do the aggregates.
Perform the aggregates - COUNT, SUM, GROUP_CONCAT. (Actually this will probably be done as it gathers the rows -- another optimization.)
There is now 1 row (with an unpredictable value for a.po_num).
Skip 0 rows for the OFFSET part of the LIMIT. (OK, another out-of-order thingie.)
Deliver up to 200 rows. (There is only 1.)
Add ORDER BY (but no GROUP BY) -- big deal, sort the 1 row.
Add GROUP BY (but no ORDER BY) in, now you may have more than 200 rows coming out, and it can stop short.
Add GROUP BY and ORDER BY and they are identical, then it may have to do a sort for the grouping, but not for the ordering, and it may stop at 200.
Add GROUP BY and ORDER BY and they are not identical, then it may have to do a sort for the grouping, and will have to re-sort for the ordering, and cannot stop at 200 until after the ORDER BY. That is, virtually all the work is performed on all the data.
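To make the last two cases concrete, here are sketches of the two shapes using the question's tables (simplified, hypothetical queries):

```sql
-- GROUP BY and ORDER BY identical: rows can stream out in group order,
-- so the scan can stop once 200 groups have been produced.
SELECT store, COUNT(*) AS cnt
FROM `order`
GROUP BY store
ORDER BY store
LIMIT 200;

-- ORDER BY on an aggregate: every group must be fully computed before the
-- sort, so the LIMIT trims only at the very end.
SELECT store, COUNT(*) AS cnt
FROM `order`
GROUP BY store
ORDER BY cnt DESC
LIMIT 200;
```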
Oh, and all of this gets worse if you don't have the optimal index. Oh, did I fail to insist on providing SHOW CREATE TABLE?
I apologize for my tone. I have thrown quite a few tips in your direction; please learn from them.
This query takes an hour:
select *,
unix_timestamp(finishtime)-unix_timestamp(submittime) timetaken
from joblog
where jobname like '%cas%'
and submittime>='2013-01-01 00:00:00'
and submittime<='2013-01-10 00:00:00'
order by id desc limit 300;
but the same query with only one submittime bound finishes in about 0.03 seconds.
The table has 2.1 million rows.
Any idea what's causing the issue or how to debug it?
Your first step should be to use MySQL's EXPLAIN to see what the query is doing. It'll probably give you some insight into how to fix your issue.
My guess is that jobname LIKE '%cas%' is the slowest part, because you're doing a wildcard text search. Adding an index there won't help, since the pattern has a leading wildcard. Is there any way to do this query without a leading wildcard like that? Also, adding an index on submittime might improve the speed of this query.
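The submittime index suggestion is a one-liner (index name illustrative):

```sql
-- Lets the optimizer use a range scan on the submittime bounds
-- instead of scanning all 2.1 million rows.
ALTER TABLE joblog ADD INDEX idx_joblog_submittime (submittime);
```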
You might try adding a LIMIT to the query and see if that improves the speed at which it returns...
Excerpt from http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
"Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.) However, if such a query uses LIMIT to retrieve only some of the rows, MySQL uses an index anyway, because it can much more quickly find the few rows to return in the result. "
select *,unix_timestamp(finishtime)-unix_timestamp(submittime) timetaken
from joblog
where (submittime between '2013-01-10 00:00:00' and '2013-01-19 00:00:00')
and jobname is not null
and jobname like '%cas%';
This helped (0.93 seconds).
I have a database of a million rows which isn't a lot. They're all sorted by cities with a city_id (indexed). I wanted to show the most recent post with:
SELECT * FROM table FORCE INDEX(PRIMARY, city_id) WHERE city_id=1 ORDER BY `id` DESC LIMIT 0,4
id is also labeled primary. Before adding the FORCE INDEX it took 5.9 seconds; I found that solution on SO and it worked great. The query now takes 0.02 seconds.
The problem is that this seems to work only for city_id 1; when I change the city to 2 or 3 or anything else, it's back to 6 seconds.
I'm not certain how MySQL works. Does it index better on frequent queries, or am I missing something here?
Do an explain on your query (with and without the force):
explain SELECT * FROM table WHERE city_id=1 ORDER BY `id` DESC LIMIT 0,4
and see what MySQL tells you about the cost of using a certain index. Concerning your indexing strategy and your FORCE INDEX: MySQL loves combined indexes but is usually not very good at combining separate indexes itself, and the primary index is always considered, so there is no need to specify it. Concerning your statement, I would use something like this and see if it improves the performance:
select * from table use index(city_id) where city_id=1 ORDER BY `id`
DESC LIMIT 0,4
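If that still doesn't help, a combined index matching both the filter and the sort is worth a try (a sketch; with InnoDB the primary key `id` is implicitly appended to every secondary index anyway, but naming it explicitly makes the intent clear):

```sql
-- Lets MySQL jump to city_id = N and read rows already ordered by id,
-- walking the index backwards for DESC and stopping after 4 rows.
ALTER TABLE `table` ADD INDEX idx_city_id (city_id, id);
```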
And always keep in mind: measuring the time of a statement on the command line more than once will always let the cache kick in, so it's pretty useless (and maybe that's the reason you see bad performance when you change the city_id to a different value).
You will find a lot of good tips and performance hints in the MySQL Performance Blog.