MySQL GROUP BY query optimization - mysql

I have a query like this in MySQL:
select count(*)
from (
select count(idCustomer)
from customer_details
where ...
group by idCustomer having max(trx_staus) = -1
) as temp
So basically I am finding the count of customers that fulfill a certain WHERE condition (one or two) and whose max transaction state = -1 (other states can be 2, 3, 4). But this query takes about 30 min on my local machine and 13 sec on a high-configuration server (about 20 GB RAM and an 8-core processor). I have 13 lakh (1.3 million) rows in the table. I know GROUP BY and HAVING with MAX are costly. What can I do to optimize this query? Any suggestions?

The inner query has to inspect all rows to determine the aggregate maximum; if you want to optimize this, add a calculated field that contains the maximum to your customer table and select on that.
The trick is then to keep that field up-to-date :)
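A minimal sketch of that idea, run against an in-memory SQLite database through Python so it is self-contained (MySQL trigger syntax differs slightly). The customer table, the max_trx_status column, the index, and the trigger name are assumptions made for illustration; only customer_details, idCustomer and trx_staus come from the question. The trigger only covers inserts, so updates and deletes on customer_details would need their own handling, and the shortcut only applies when the elided WHERE conditions can also be evaluated per customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (idCustomer INTEGER PRIMARY KEY, max_trx_status INTEGER);
CREATE TABLE customer_details (idCustomer INTEGER, trx_staus INTEGER);
CREATE INDEX idx_customer_max ON customer (max_trx_status);

-- Hypothetical trigger: keep the stored maximum current on every insert.
CREATE TRIGGER trg_details_insert AFTER INSERT ON customer_details
BEGIN
    -- Create the customer row the first time this customer is seen.
    INSERT INTO customer (idCustomer, max_trx_status)
    SELECT NEW.idCustomer, NEW.trx_staus
    WHERE NOT EXISTS (SELECT 1 FROM customer WHERE idCustomer = NEW.idCustomer);
    -- Raise the stored maximum when the new status is higher.
    UPDATE customer
    SET max_trx_status = NEW.trx_staus
    WHERE idCustomer = NEW.idCustomer AND max_trx_status < NEW.trx_staus;
END;
""")

conn.executemany(
    "INSERT INTO customer_details (idCustomer, trx_staus) VALUES (?, ?)",
    [(1, -1), (1, -1), (2, -1), (2, 3), (3, -1), (4, 2)],
)

# Fast path: a simple indexed lookup on the precomputed column.
fast_count = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE max_trx_status = -1"
).fetchone()[0]

# Original shape of the query, kept only to cross-check the result.
slow_count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT idCustomer FROM customer_details
        GROUP BY idCustomer HAVING MAX(trx_staus) = -1
    ) AS temp
""").fetchone()[0]

print(fast_count, slow_count)
```

Both counts should agree; the difference is that the first query no longer has to aggregate every detail row at read time.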

Related

Filter large number of records on mysql when using INNER JOIN with two fields

I'm working on an existing database with millions of inserts per day. The database design itself is pretty bad and filtering records from it takes a huge amount of time. We are in the process of moving this to an ELK cluster, but in the meantime I have to filter some records for immediate use.
I have two tables like this:
table - log_1
datetime | id | name | ip
2017-01-01 01:01:00 | 12345 | sam | 192.168.100.100
table - log_2
datetime | mobile | id
2017-01-01 01:01:00 | 999999999 | 12345
I need to filter my data using ip from log_1 and datetime on both log_1 and log_2. To do that I use the query below:
SELECT log_1.datetime, log_1.id, log_1.name, log_1.ip, log_2.datetime, log_2.mobile, log_2.id
FROM log_1
INNER JOIN log_2
ON log_1.id = log_2.id AND log_1.datetime = log_2.datetime
where log_1.ip = '192.168.100.100'
limit 100
Needless to say, this takes forever to retrieve results with such a large number of records. Is there a better way to do the same thing without waiting so long for MySQL to respond? In other words, how can I optimize my query against such a large database?
The database is not production; it is just for analytics.
First of all, your current LIMIT clause is fairly meaningless, because the query has no ORDER BY clause. It is not clear which 100 records you want to retain. So, you might want to use something like this:
SELECT
l1.datetime,
l1.id,
l1.name,
l1.ip,
l2.datetime,
l2.mobile,
l2.id
FROM log_1 l1
INNER JOIN log_2 l2
ON l1.id = l2.id AND l1.datetime = l2.datetime
WHERE
l1.ip = '192.168.100.100'
ORDER BY
l1.datetime DESC
LIMIT 100;
This would return the 100 most recent matching records. As for speeding up this query, one way to at least make the join faster would be to add the following index on the log_2 table:
CREATE INDEX idx ON log_2 (datetime, id, mobile);
Assuming MySQL chooses to use this index, it should make the join much faster, because each id and datetime value can be looked up in a B-tree instead of doing a manual scan of the entire table. Note that the index also covers the mobile column, which is needed in the select.
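As a rough way to see the effect, the sketch below rebuilds the two tables in an in-memory SQLite database from Python and prints the query plan. SQLite stands in for MySQL here, so the planner output is only an analogy (in MySQL you would run EXPLAIN on the same query), and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE log_1 (datetime TEXT, id INTEGER, name TEXT, ip TEXT);
CREATE TABLE log_2 (datetime TEXT, mobile TEXT, id INTEGER);
-- The index suggested above: join columns first, then the selected column.
CREATE INDEX idx ON log_2 (datetime, id, mobile);
""")

# Made-up sample rows.
conn.executemany("INSERT INTO log_1 VALUES (?, ?, ?, ?)", [
    ("2017-01-01 01:01:00", 12345, "sam", "192.168.100.100"),
    ("2017-01-01 01:02:00", 12346, "ann", "10.0.0.1"),
    ("2017-01-02 09:00:00", 12345, "sam", "192.168.100.100"),
])
conn.executemany("INSERT INTO log_2 VALUES (?, ?, ?)", [
    ("2017-01-01 01:01:00", "999999999", 12345),
    ("2017-01-01 01:02:00", "888888888", 12346),
    ("2017-01-02 09:00:00", "999999999", 12345),
    ("2017-01-03 10:00:00", "777777777", 12399),
])

query = """
SELECT l1.datetime, l1.id, l1.name, l1.ip, l2.datetime, l2.mobile, l2.id
FROM log_1 l1
INNER JOIN log_2 l2
    ON l1.id = l2.id AND l1.datetime = l2.datetime
WHERE l1.ip = '192.168.100.100'
ORDER BY l1.datetime DESC
LIMIT 100
"""

rows = conn.execute(query).fetchall()
# The last column of each EXPLAIN QUERY PLAN row is the readable plan step.
plan = [step[3] for step in conn.execute("EXPLAIN QUERY PLAN " + query)]

for step in plan:
    print(step)
```

If the plan mentions idx for log_2, the join is being resolved from the index rather than from a scan of the table.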
Can you try this:
1. Create an index on the id column of both tables if not already created (this will take time).
2. Create two temp tables, log_1_tmp and log_2_tmp, with data as below:
Query 1 - insert into log_1_tmp select * from log_1 where log_1.ip = '192.168.100.100'
Query 2 - insert into log_2_tmp select * from log_2 where log_2.id in (select id from log_1_tmp)
3. Run your query on the two tables above; here you can remove the WHERE condition from your query.
See if this works.

Moving average query MS Access

I am trying to calculate the moving average of my data. I have googled and found many examples on this site and others but am still stumped. I need to calculate the average of the previous 5 flows for the selected record, for the specific product.
My table looks like the following:
TMDT                    Prod  Flow
8/21/2017 12:01:00 AM   A     100
8/20/2017 11:30:45 PM   A     150
8/20/2017 10:00:15 PM   A     200
8/19/2017 5:00:00 AM    B     600
8/17/2017 12:00:00 AM   A     300
8/16/2017 11:00:00 AM   A     200
8/15/2017 10:00:31 AM   A     50
I have been trying the following query:
SELECT b.TMDT, b.Flow, (SELECT AVG(Flow) as MovingAVG
FROM(SELECT TOP 5 *
FROM [mytable] a
WHERE Prod="A" AND [a.TMDT]< b.TMDT
ORDER BY a.TMDT DESC))
FROM mytable AS b;
When I try to run this query I get an input prompt for b.TMDT. Why is b.TMDT not being pulled from mytable?
Should I be using a different method altogether to calculate my moving averages?
I would like to add that I started with another method that works but is extremely slow. It runs fast enough for tables with 100 records or less. However, if the table has more than 100 records it feels like the query comes to a screeching halt.
Original method below.
I created two queries for each product code (There are 15 products): Q_ProdA_Rank and Q_ProdA_MovAvg
Q_ProdA_Rank (T_ProdA is a table with Product A's information):
SELECT a.TMDT, a.Flow, (Select count(*) from [T_ProdA]
where TMDT<=a.TMDT) AS Rank
FROM [T_ProdA] AS a
ORDER BY a.TMDT DESC;
Q_ProdA_MovAvg
SELECT b.TMDT, b.Flow, Round((Select sum(Flow) from [Q_PRodA_Rank] where
Rank between b.Rank-1 and (b.Rank-5))/IIf([Rank]<5,Rank-1,5),0) AS
MovingAvg
FROM [Q_ProdA_Rank] AS b;
The problem is that you're using a nested subquery, and as far as I know (can't find the right site for the documentation at the moment), variable scope in subqueries is limited to the direct parent of the subquery. This means that for your nested query, b.TMDT is outside of the variable scope.
Edit: As this is an interesting problem, and a properly asked question, here is the full SQL answer. It's somewhat more complex than your attempt, but should run more efficiently.
It contains a nested subquery that first lists the 5 previous flows per TMDT and Prod, then averages them, and then joins that result to the actual query.
SELECT A.TMDT, A.Prod, B.MovingAverage
FROM MyTable AS A LEFT JOIN (
    SELECT Prev.TMDT, Prev.Prod, Avg(Prev.Flow) AS MovingAverage
    FROM (
        SELECT JoinKeys.TMDT, JoinKeys.Prod, Top5.Flow
        FROM MyTable AS JoinKeys INNER JOIN MyTable AS Top5 ON JoinKeys.Prod = Top5.Prod
        WHERE Top5.TMDT In (
            SELECT TOP 5 A.TMDT FROM MyTable AS A WHERE JoinKeys.Prod = A.Prod AND A.TMDT < JoinKeys.TMDT ORDER BY A.TMDT DESC
        )
    ) AS Prev
    GROUP BY Prev.TMDT, Prev.Prod
) AS B
ON A.Prod = B.Prod AND A.TMDT = B.TMDT
While in my previous version I advocated a VBA approach, this is probably more efficient, only more difficult to write and adjust.
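To check the logic outside Access, here is a small sketch that runs the same kind of query on the sample data in an in-memory SQLite database from Python. It is an approximation, not Access SQL: SQLite uses LIMIT 5 where Access uses TOP 5, the timestamps are rewritten as ISO strings so they sort correctly as text, and the alias P is introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (TMDT TEXT, Prod TEXT, Flow REAL)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    ("2017-08-21 00:01:00", "A", 100),
    ("2017-08-20 23:30:45", "A", 150),
    ("2017-08-20 22:00:15", "A", 200),
    ("2017-08-19 05:00:00", "B", 600),
    ("2017-08-17 00:00:00", "A", 300),
    ("2017-08-16 11:00:00", "A", 200),
    ("2017-08-15 10:00:31", "A", 50),
])

query = """
SELECT A.TMDT, A.Prod, B.MovingAverage
FROM MyTable AS A
LEFT JOIN (
    SELECT JoinKeys.TMDT AS TMDT, JoinKeys.Prod AS Prod,
           AVG(Top5.Flow) AS MovingAverage
    FROM MyTable AS JoinKeys
    INNER JOIN MyTable AS Top5 ON JoinKeys.Prod = Top5.Prod
    -- Keep only the 5 most recent earlier rows of the same product.
    WHERE Top5.TMDT IN (
        SELECT P.TMDT FROM MyTable AS P
        WHERE P.Prod = JoinKeys.Prod AND P.TMDT < JoinKeys.TMDT
        ORDER BY P.TMDT DESC LIMIT 5
    )
    GROUP BY JoinKeys.TMDT, JoinKeys.Prod
) AS B ON A.Prod = B.Prod AND A.TMDT = B.TMDT
ORDER BY A.TMDT DESC
"""

# Map (TMDT, Prod) to the moving average; None means no earlier rows exist.
result = {(tmdt, prod): avg for tmdt, prod, avg in conn.execute(query)}

for key, avg in result.items():
    print(key, avg)
```

Rows with no earlier record for their product (the oldest A row and the only B row) come back with an empty average because of the LEFT JOIN.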

How to write search(list) queries in Mysql

There is a search page in a web application (pagination is used: 10 records per page). Database used: MySQL. The table has around 100,000 records. The query is tuned in the sense that it uses an index (checked with the EXPLAIN plan). The result set is around 17,000 rows and takes around 5 sec to fetch. Can anyone please suggest how to optimize the search query? (Note: I tried to use LIMIT but the query time did not improve.)
Query example:
Select * from abc
Join def on abc.id=def.id
where date >= '2013-09-03'
and date <='2014-10-01'
and def.state=1
-- id on both table is indexed
-- date and state column cannot be indexed as they have low SI.

Assistance with complex MySQL query (using LIMIT ?)

I wonder if anyone could help with a MySQL query I am trying to write to return relevant results.
I have a big table of change log data, and I want to retrieve a number of record 'groups'. For example, in this case a group would be where two or more records are entered with the same timestamp.
Here is a sample table.
==============================================
ID DATA TIMESTAMP
==============================================
1 Some text 1379000000
2 Something 1379011111
3 More data 1379011111
3 Interesting data 1379022222
3 Fascinating text 1379033333
If I wanted the first two grouped sets, I could use LIMIT 0,2 but this would miss the third record. The ideal query would return three rows (as two rows have the same timestamp).
==============================================
ID DATA TIMESTAMP
==============================================
1 Some text 1379000000
2 Something 1379011111
3 More data 1379011111
Currently I've been using PHP to process the entire table, which mostly works, but for a table of 1000+ records, this is not very efficient on memory usage!
Many thanks in advance for any help you can give...
Get the timestamps for the filtering using a join. For instance, the following would make sure that the second timestamp is in a complete group:
select t.*
from t join
(select timestamp
from t
order by timestamp
limit 2
) tt
on t.timestamp = tt.timestamp;
The following would get the first three groups, no matter what their size:
select t.*
from t join
(select distinct timestamp
from t
order by timestamp
limit 3
) tt
on t.timestamp = tt.timestamp;
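Both queries can be tried on the sample table from the question with the short sketch below, which uses an in-memory SQLite database from Python in place of MySQL. The table name t follows the answer; the lowercase column names are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, data TEXT, timestamp INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "Some text", 1379000000),
    (2, "Something", 1379011111),
    (3, "More data", 1379011111),
    (3, "Interesting data", 1379022222),
    (3, "Fascinating text", 1379033333),
])

# First query: timestamps of the first two rows, joined back to the table
# so that every row sharing those timestamps is returned.
first_two_rows = conn.execute("""
    SELECT t.*
    FROM t JOIN
         (SELECT timestamp FROM t ORDER BY timestamp LIMIT 2) tt
         ON t.timestamp = tt.timestamp
    ORDER BY t.timestamp, t.data
""").fetchall()

# Second query: the first three distinct timestamps, whatever the group sizes.
first_three_groups = conn.execute("""
    SELECT t.*
    FROM t JOIN
         (SELECT DISTINCT timestamp FROM t ORDER BY timestamp LIMIT 3) tt
         ON t.timestamp = tt.timestamp
    ORDER BY t.timestamp, t.data
""").fetchall()

print(len(first_two_rows), len(first_three_groups))
```

On this data the first query returns the three rows shown as the ideal result in the question, and the second adds the row with timestamp 1379022222.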

Building a pre-sorted (DESC) table in MySQL to make search queries faster

I am trying to make a search query faster on a MySQL database.
I have 800,000 rows in the BOOKMARK table.
When I run this query:
SELECT * FROM `BOOKMARK` WHERE `topic` = 'Apple'
Showing rows 0 - 29 ( 501 total, Query took 0.0008 sec)
It's damn fast!
Each row has a total point value, and I want to find the best ones first.
SELECT * FROM `BOOKMARK` WHERE `topic` = 'Apple' ORDER BY total DESC
Showing rows 0 - 29 ( 501 total, Query took 0.4770 sec) [b_total: 9.211558703193814 - 1.19674062055568]
It's now 0.5 seconds!!
This is a huge problem for me.
Here is the table information.
* There are 20,000 different topics in this table.
* The total value is between 0 and 10.
* The server calculates the total points once a day.
I was thinking that if the table were ordered by total for each topic, the search query would not have to include 'ORDER BY total DESC'.
It would save a lot of time if the table were re-ordered once a day.
Is there a way to make this happen?
It was very simple.
I used phpMyAdmin and changed the setting on the Operations menu,
as in the image below.
After this,
SELECT * FROM `BOOKMARK` WHERE `topic` = 'Apple'
I ran this query and it showed me the results ordered by total value DESC.
Perfect!!