MySQL GROUP BY and SUM performance issues

We have just one table with millions of rows, and this query as it stands takes 138 seconds to run on a Linux server with SSD drives and a 25G InnoDB buffer pool.
I am wondering if anyone could suggest any improvements, in the MySQL settings or in the query itself, that would reduce the run time. Only about 8 large member_ids have this performance problem; the rest run in under 5 seconds. We run multiple summary tables like this for rollup reporting.
select *
from (
SELECT distinct account_name AS source,SUM(royalty_amount) AS total_amount
FROM royalty_stream
WHERE member_id = '1050705'
AND deleted = 0
AND period_year_quarter >= '2016_Q1'
AND period_year_quarter <= '2016_Q2'
GROUP BY account_name
ORDER BY total_amount desc
LIMIT 1
) a

I see a few obvious improvements.
Subselects
Don't use a subselect. This isn't a huge deal, but it makes little sense to add the overhead here.
Using Distinct
Is the distinct really needed here? Since you're grouping, it should be unnecessary overhead.
Data Storage Practices
Your period_year_quarter evaluation is going to be a hurdle. String comparisons are one of the slower things you can do, unfortunately. If you have the ability to update the data structure, I would highly recommend that you break period_year_quarter into two distinct, integer fields. One for the year, one for the quarter.
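A hypothetical migration along those lines (the new column names period_year and period_quarter are mine, and this assumes every existing value follows the 'YYYY_Qn' pattern):

ALTER TABLE royalty_stream
  ADD COLUMN period_year SMALLINT UNSIGNED,
  ADD COLUMN period_quarter TINYINT UNSIGNED;

-- Backfill from the existing string: '2016_Q1' -> year 2016, quarter 1
UPDATE royalty_stream
SET period_year    = CAST(SUBSTRING(period_year_quarter, 1, 4) AS UNSIGNED),
    period_quarter = CAST(SUBSTRING(period_year_quarter, 7, 1) AS UNSIGNED);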
Is royalty_amount actually stored as a number, or are you making the database implicitly convert it on every query? If it's stored as a string (a surprisingly common mistake), converting it to a numeric type will also help.
Indexing
You haven't explained what indexes are on this table. I'm hoping that you at least have one on member_id. If not, it should certainly be indexed.
I would further recommend an index on (member_id, period_year_quarter). If you took my advice from the previous section, that should be (member_id, year, quarter).
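A minimal sketch of that index (the index name is illustrative):

ALTER TABLE royalty_stream
  ADD INDEX idx_member_period (member_id, period_year_quarter);

-- or, if you split the period column as suggested above:
-- ALTER TABLE royalty_stream
--   ADD INDEX idx_member_period_split (member_id, period_year, period_quarter);

With an index like that in place, the query itself simplifies to: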
select
account_name as source
, sum(royalty_amount) as total_amount
from
royalty_stream
where
member_id = '1050705'
and deleted = 0
and period_year_quarter between '2016_Q1' and '2016_Q2'
group by
account_name
order by
total_amount desc
limit 1

Related

MySQL query takes more time to fetch data

I have a 500,000-record table on my MySQL server. When running a query it takes a long time to execute; sometimes it goes beyond a minute.
Below I have added my MySQL machine details.
RAM: 16GB
Processor: Intel(R) Core™ i5-4460M CPU @ 3.20GHz
OS: Windows Server, 64-bit
I know there is no problem with my machine, since it is a standalone machine with no other applications running on it.
Maybe the problem is with my query. I have gone through the MySQL site and found that I have used proper syntax, but I don't know the exact reason for the delay in the result.
SELECT SUM(`samplesalesdata50000`.`UnitPrice`) AS `UnitPrice`, `samplesalesdata50000`.`SalesChannel` AS `Channel`
FROM `samplesalesdata50000` AS `samplesalesdata50000`
GROUP BY `samplesalesdata50000`.`SalesChannel`
ORDER BY 2 ASC
LIMIT 200 OFFSET 0
Can anyone please let me know whether the duration depends on the table or on the query that I have used?
Note: even when I try indexing, there is not much difference in the result time.
Thanks
Two approaches to this:
One approach is to create a covering index on the columns needed to satisfy your query. The correct index for your query contains these columns in this order: (SalesChannel, UnitPrice).
Why does this help? For one thing, the index itself contains all data needed to satisfy your query, and nothing else. This means your server does less work.
For another thing, MySQL's indexes are BTREE-organized. That means they're accessible in order. So your query can be satisfied one SalesChannel at a time, and MySQL doesn't need an internal temporary table. That's faster.
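A minimal sketch of that covering index (the index name is illustrative):

ALTER TABLE samplesalesdata50000
  ADD INDEX idx_channel_price (SalesChannel, UnitPrice);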
A second approach involves recognizing that ORDER BY ... LIMIT is a notorious performance antipattern. You require MySQL to sort a big mess of data, and then discard most of it.
You could try this:
SELECT SUM(UnitPrice) AS UnitPrice,
       SalesChannel AS Channel
FROM samplesalesdata50000
WHERE SalesChannel IN (
    -- MySQL does not allow LIMIT directly inside an IN subquery,
    -- so the limited channel list is wrapped in a derived table
    SELECT SalesChannel
    FROM (
        SELECT DISTINCT SalesChannel
        FROM samplesalesdata50000
        ORDER BY SalesChannel
        LIMIT 200 OFFSET 0
    ) AS first_channels
)
GROUP BY SalesChannel
ORDER BY SalesChannel
LIMIT 200 OFFSET 0
If you have an index on SalesChannel (the covering index mentioned above works) this should speed you up a lot, because your aggregate (GROUP BY) query need only consider a subset of your table.
Your problem is with "ORDER BY 2 ASC". Try "ORDER BY Channel" instead.
If it were MS SQL Server you would use WITH (NOLOCK); the MySQL equivalent is:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED ;
SELECT SUM(`samplesalesdata50000`.`UnitPrice`) AS `UnitPrice`, `samplesalesdata50000`.`SalesChannel` AS `Channel`
FROM `samplesalesdata50000` AS `samplesalesdata50000`
GROUP BY `samplesalesdata50000`.`SalesChannel`
ORDER BY SalesChannel ASC
LIMIT 200 OFFSET 0
COMMIT ;
To improve on OJones's answer, note that
SELECT SalesChannel FROM samplesalesdata50000
ORDER BY SalesChannel LIMIT 200, 1
will quickly (assuming the index given) find the end of the desired list. Then adding this limits the main query to only the rows needed:
WHERE SalesChannel < (that-select)
There is, however, a problem. If there are fewer than 201 rows in the table, the subquery will return nothing.
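A hypothetical combined form of that idea; note I've added DISTINCT so the offset counts channels rather than rows:

SELECT SUM(UnitPrice) AS UnitPrice,
       SalesChannel AS Channel
FROM samplesalesdata50000
WHERE SalesChannel < ( SELECT DISTINCT SalesChannel
                       FROM samplesalesdata50000
                       ORDER BY SalesChannel
                       LIMIT 200, 1 )  -- the 201st distinct channel bounds page 1
GROUP BY SalesChannel
ORDER BY SalesChannel;
-- Caveat from above: with fewer than 201 distinct channels the subquery
-- returns NULL, so the WHERE matches nothing.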
You seem to be setting up for "paginating"? In that case, a similar technique can be used to find the starting value:
WHERE SalesChannel >= ...
AND SalesChannel < ...
This also avoids using the inefficient OFFSET, which has to read, then toss, all the rows being skipped over.
But the real solution may be to build and maintain a Summary Table of the data. It would contain subtotals for each, say, month. Then run the query against the Summary Table -- it might be 10x faster, as sketched below.
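A hypothetical shape for such a Summary Table (all names here are mine, and this assumes the base table has a date column to bucket by, called sale_date below):

CREATE TABLE sales_channel_monthly (
  yr_month CHAR(7) NOT NULL,               -- e.g. '2019-06'
  SalesChannel VARCHAR(50) NOT NULL,
  total_unit_price DECIMAL(14,2) NOT NULL,
  PRIMARY KEY (yr_month, SalesChannel)
);

-- Rebuilt (or incrementally maintained) on a schedule, e.g. nightly:
REPLACE INTO sales_channel_monthly
SELECT DATE_FORMAT(sale_date, '%Y-%m'), SalesChannel, SUM(UnitPrice)
FROM samplesalesdata50000
GROUP BY 1, 2;

The report query then reads a few hundred subtotal rows instead of scanning the base table:

SELECT SUM(total_unit_price) AS UnitPrice, SalesChannel AS Channel
FROM sales_channel_monthly
GROUP BY SalesChannel
ORDER BY Channel
LIMIT 200;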

MySQL optimization problems with LIMIT keyword

I'm trying to optimize a MySQL query. The query below runs great as long as there are more than 15 entries in the database for a particular user.
SELECT activityType, activityClass, startDate, endDate, activityNum, count(*) AS activityType
FROM (
SELECT activityType, activityClass, startDate, endDate, activityNum
FROM ActivityX
WHERE user=?
ORDER BY activityNum DESC
LIMIT 15) temp
WHERE startDate=? OR endDate=?
GROUP BY activityType
When there are fewer than 15 entries, the performance is terrible. My timing is roughly 25 ms vs. 4000 ms. (I need the "15" to ensure I get all the relevant data.)
I found these interesting sentences:
"LIMIT N" is the keyword and N is any number starting from 0, putting 0 as the limit does not return any records in the query. Putting a number say 5 will return five records. If the records in the specified table are less than N, then all the records from the queried table are returned in the result set. [source: guru99.com]
To get around this problem, I'm using a heuristic to guess if the number of entries for a user is small - if so, I use a different query that takes about 1500 ms.
Is there anything I'm missing here? I cannot use an index, since the data is encrypted.
Thanks much,
Jon
I think an index on ActivityX(user, ActivityNum) will solve your problem.
I am guessing that you have an index on (ActivityNum) alone, and the optimizer is trying to figure out whether it should use it. This causes the threshold behavior you are seeing. The composite index matches the query better.
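A minimal sketch of that composite index (the index name is illustrative):

ALTER TABLE ActivityX
  ADD INDEX idx_user_activitynum (user, activityNum);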

Using index with IN clause and ordering by primary key

I am having a problem with the following task in MySQL. I have a table Records(id, enterprise, department, status), where id is the primary key, enterprise and department are foreign keys, and status is an integer value (0 = CREATED, 1 = APPROVED, 2 = REJECTED).
Now, usually the application needs to filter for a concrete enterprise, department, and status:
SELECT * FROM Records WHERE status = 0 AND enterprise = 11 AND department = 21
ORDER BY id desc LIMIT 0,10;
The order by is required, since I have to provide the user with the most recent records. For this query I have created an index (enterprise, department, status), and everything works fine. However, for some privileged users the status should be omitted:
SELECT * FROM Records WHERE enterprise = 11 AND department = 21
ORDER BY id desc LIMIT 0,10;
This obviously breaks the index - it's still good for filtering, but not for sorting. So, what should I do? I don't want to create a separate index (enterprise, department), so what if I modify the query like this:
SELECT * FROM Records WHERE enterprise = 11 AND department = 21
AND status IN (0,1,2)
ORDER BY id desc LIMIT 0,10;
MySQL definitely does use the index now, since it's provided with values of status, but how quick will the sorting by primary key be? Will it take the 10 most recent values for each status and then merge them, or will it first merge the ids for all statuses together and only after that take the first ten (that way it's going to be much slower, I guess)?
All of the queries will benefit from one composite index:
INDEX(enterprise, department, status, id)
enterprise and department can be swapped, but keep the rest of the columns in that order.
The first query will use that index for both the WHERE and the ORDER BY, thereby be able to find the 10 rows without scanning the table or doing a sort.
The second query is missing status, so my index is less than perfect. This would be better:
INDEX(enterprise, department, id)
At that point, it works like above. (Note: If the table is InnoDB, then this 3-column index is identical to your 2-column INDEX(enterprise, department) -- the PK is silently included.)
The third query gets dicier because of the IN. Still, my 4-column index will be nearly the best. It will use the first 3 columns, but not be able to do the ORDER BY id, so it won't use id. And it won't be able to consume the LIMIT. Hence the EXPLAIN will say Using temporary and/or Using filesort. Don't worry, performance should still be nice.
My second index is not as good for the third query.
See my Index Cookbook.
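A minimal sketch of the two indexes discussed above (index names are illustrative):

ALTER TABLE Records
  ADD INDEX idx_ent_dept_status_id (enterprise, department, status, id),
  ADD INDEX idx_ent_dept_id (enterprise, department, id);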
"How quick will sorting by id be"? That depends on two things.
Whether the sort can be avoided (see above);
How many rows in the query without the LIMIT;
Whether you are selecting TEXT columns.
I was careful to say whether the INDEX is used all the way through the ORDER BY, in which case there is no sort, and the LIMIT is folded in. Otherwise, all the rows (after filtering) are written to a temp table, sorted, then 10 rows are peeled off.
The "temp table" I just mentioned is necessary for various complex queries, such as those with subqueries, GROUP BY, ORDER BY. (As I have already hinted, sometimes the temp table can be avoided.) Anyway, the temp table comes in 2 flavors: MEMORY and MyISAM. MEMORY is favorable because it is faster. However, TEXT (and several other things) prevent its use.
If MEMORY is used then Using filesort is a misnomer -- the sort is really an in-memory sort, hence quite fast. For 10 rows (or even 100) the time taken is insignificant.
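If the MEMORY flavor is losing out because of size limits rather than TEXT columns, the relevant knobs are tmp_table_size and max_heap_table_size; a hedged example of raising both for one session:

-- Both caps must be raised; the smaller of the two wins.
-- (This applies to the classic MEMORY/MyISAM temp-table scheme described
-- above; recent MySQL versions use the TempTable engine instead.)
SET SESSION tmp_table_size = 64*1024*1024;
SET SESSION max_heap_table_size = 64*1024*1024;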

Very slow query, any other ways to format this with better performance?

I have this query (I didn't write it) that was working fine for a client until the table got more than a few thousand rows in it; now it's taking 40+ seconds on only 4200 rows.
Any suggestions on how to optimize it and get the same result?
I've tried a few other methods, but they didn't return the correct result that this slower query does...
SELECT COUNT(*) AS num
FROM `fl_events`
WHERE id IN(
SELECT DISTINCT (e2.id)
FROM `fl_events` AS e1, fl_events AS e2
WHERE e1.startdate >= now() AND e1.startdate = e2.startdate
)
ORDER BY `startdate`
Any help would be greatly appreciated!
Apart from the obvious indexes needed, I don't really get why you are joining the table with itself to build the IN condition. The ORDER BY is also not needed, since the outer query returns a single COUNT row. Are you sure your query can't be written just like this?:
SELECT COUNT(*) AS num
FROM `fl_events` AS e1
WHERE e1.startdate >= now()
I don't think rewriting the query will help. The key to your question is "until the table got more than a few thousand rows." This implies that important columns aren't indexed. Prior to a certain number of records, all the data fit on a single memory block; over that point, it takes a new block. An index is the only way to speed up the search.
First, check to see that the id in fl_events is actually marked as a primary key. That physically orders the records, and without it you can see data corruption and occasionally super-slow results. The use of DISTINCT in the query makes it look like it might NOT be a unique value. That will pose a problem.
Then, make sure to add an index on startdate, as sketched below.
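A minimal sketch of both steps (the index name is illustrative; skip the first statement if id is already the primary key):

ALTER TABLE fl_events ADD PRIMARY KEY (id);
ALTER TABLE fl_events ADD INDEX idx_startdate (startdate);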
The slowness is probably related to the join of the event table with itself, and possibly startdate not having an index.

Best approach to getting count of distinct values in MySQL

I have this query:
select count(distinct User_ID) from Web_Request_Log where Added_Timestamp like '20110312%' and User_ID Is Not Null;
User_ID and Added_Timestamp are indexed.
The query is painfully slow (we have millions of records and the table is growing fast).
I've read all the posts I could find about count and distinct, here, but they seem to be mostly syntax related. I'm interested in optimization and I'm wondering if I'm using the right tool for the job.
I can use an intermediate counter table to summarize overall hits, but I'd like a way to do this that would allow me to easily generate ad-hoc 'range' queries, e.g., what is the distinct visitor count for last week, or last month.
Did some tests to see if GROUP BY can help and it seems it can.
On table A with ~8M records and ~340K distinct values of a given non-indexed field:
GROUP BY: 17 seconds
COUNT(DISTINCT ..): 21 seconds
On table A with ~2M records and ~50K distinct values of a given indexed field:
GROUP BY: 200 ms
COUNT(DISTINCT ..): 2.5 seconds
This is MySQL with the InnoDB engine, BTW.
I can't find any relevant documentation though, and I wonder if that comparison is dependent on the data (how many duplicates there are).
For your table, the GROUP BY query will look like this:
SELECT COUNT(t.c)
FROM (SELECT 1 AS c
FROM Web_Request_Log
WHERE Added_Timestamp LIKE '20110312%'
AND User_ID IS NOT NULL
GROUP BY User_ID
) AS t
Try it and let us know if it's quicker :)