Query optimization with multiple sub-queries - MySQL

I want to retrieve the meter value for this month minus the meter value for the previous month,
with the subtraction done per code, and then sum the whole result.
There are about 8,000 records,
but when I try to fetch just 5 records it takes 2.53 sec,
and 100 records take 1 min 1.57 sec,
which really matters.
My query looks like this:
SELECT code AS hvCode,
       IFNULL((SELECT meter
               FROM bmrpt
               WHERE waktu_foto LIKE '2014-05%'
               GROUP BY code
               HAVING code = hvCode), 0)
     - IFNULL((SELECT meter
               FROM bmrpt
               WHERE waktu_foto LIKE '2014-04%'
               GROUP BY code
               HAVING code = hvCode), 0) AS hasil
FROM bmrpt
GROUP BY code;
Does anybody have an idea how to change the query so that it is optimized?
Here is the SQLFiddle: http://www.sqlfiddle.com/#!2/495c0/1
Best regards

Though your question is unclear, try the subquery below, as far as I understand it:
SELECT (SELECT COALESCE(SUM(`meter`), 0)
        FROM table
        WHERE code = 'hvCode' AND MONTH(`date_column`) = 5)
     -
       (SELECT COALESCE(SUM(`meter`), 0)
        FROM table
        WHERE code = 'hvCode' AND MONTH(`date_column`) = 4)
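As a further idea, not taken from the question or the answer above: the correlated subqueries could be replaced by a single pass over bmrpt using conditional aggregation. This is only a sketch, assuming the columns code, meter and waktu_foto shown in the question:

-- One scan of bmrpt; each month is picked out with a CASE inside the SUM.
SELECT code AS hvCode,
       COALESCE(SUM(CASE WHEN waktu_foto LIKE '2014-05%' THEN meter END), 0)
     - COALESCE(SUM(CASE WHEN waktu_foto LIKE '2014-04%' THEN meter END), 0) AS hasil
FROM bmrpt
GROUP BY code;

Because it scans the table once instead of running two subqueries per code, it should scale far better on 8,000 rows. Note that it sums all meter rows per code and month, whereas the original subqueries return an arbitrary single meter per group, so whether the result matches the intent would need to be checked against the data.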

Related

Efficient SQL Query to calculate portion of a row in half hourly time series that has occurred

I have a table that looks like this:
id | slot             | total
---+------------------+------
 1 | 2022-12-01T12:00 |   100
 2 | 2022-12-01T12:30 |   150
 3 | 2022-12-01T13:00 |   200
There's an index on slot already. The table has ~100 million rows (and a bunch more columns not shown here).
I want to sum the total up to the current moment in time (EDIT: this wasn't clear initially; I will provide a lower slot bound, so the sum will be over some number of days/weeks, not over the full table). Let's say the time is currently 2022-12-01T12:45. If I run select * from my_table where slot < CURRENT_TIMESTAMP(),
then I get back records 1 and 2.
However, in my data, the records represent forecasted sales within a time slot. I want to find the forecasts as of 2022-12-01T12:45, and so I want to find the proportion of the half hour slot of record 2 that has elapsed, and return that proportion of the total.
As of 2022-12-01T12:45 (assuming minute granularity), 50% of row 2 has elapsed, so I would expect the total to return as 150 / 2 = 75.
My current query works, but is slow. What are some ways I can optimise this, or other approaches I can take?
Also, how can we extend this solution to be generalised to any interval frequency? Maybe tomorrow we change our forecasting model and the data comes in sporadically. The hardcoded 30 would not work in that case.
select sum(fraction * total) as t
from (
    select total,
           least(
               timestampdiff(minute, slot, current_timestamp()),
               30
           ) / 30 as fraction
    from my_table
    where slot <= current_timestamp()
) x
Consider computing your running sum first, then removing the partial total of the last element. In order to keep the last element's total, I'd prefer applying window functions instead of aggregations, and limiting the output to the last row.
SET @current_time = CURRENT_TIMESTAMP();

WITH cte AS (
    SELECT slot,
           SUM(total) OVER (ORDER BY slot) AS total,
           total AS rowtotal
    FROM my_table
    WHERE slot < @current_time
    ORDER BY slot DESC
    LIMIT 1
)
SELECT slot,
       total - (30 - TIMESTAMPDIFF(MINUTE, slot, @current_time)) / 30 * rowtotal AS total
FROM cte
Check the demo here.
Note1: Adding an index on the slot field is likely to boost this query's performance.
Note2: If your query runs over millions of rows, the current timestamp is likely to change while the query runs. You could store it in a variable before the query is run (or in another CTE).
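A minimal sketch of the CTE variant mentioned in Note2; the name now_cte is an assumption, the rest mirrors the query above:

WITH now_cte AS (
    -- Freeze the timestamp once so every reference sees the same value.
    SELECT CURRENT_TIMESTAMP() AS now_ts
),
cte AS (
    SELECT slot,
           SUM(total) OVER (ORDER BY slot) AS total,
           total AS rowtotal,
           now_ts
    FROM my_table
    CROSS JOIN now_cte
    WHERE slot < now_ts
    ORDER BY slot DESC
    LIMIT 1
)
SELECT slot,
       total - (30 - TIMESTAMPDIFF(MINUTE, slot, now_ts)) / 30 * rowtotal AS total
FROM cte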
Create a B-tree index on the slot column, as it has high selectivity.
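For example (the index name is an assumption; InnoDB secondary indexes are B-trees by default):

CREATE INDEX idx_my_table_slot ON my_table (slot);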

What is the difference between the two SQL queries below?

select
substr(insert_date, 1, 14),
device, count(1)
from
abc.xyztable
where
insert_date >= DATE_SUB(NOW(), INTERVAL 10 DAY)
group by
device, substr(insert_date, 1, 14) ;
And then I am trying to get the average of the same row counts that I got above.
SELECT
date, device, AVG(count)
FROM
(SELECT
substr(insert_date, 1, 14) AS date,
device,
COUNT(1) AS count
FROM
abc.xyztable
WHERE
insert_date >= DATE_SUB(NOW(), INTERVAL 10 DAY)
GROUP BY
device, substr(insert_date, 1, 14)) a
GROUP BY
device, date;
I found that both queries return the same results; I tried with the last 10 days of data.
My purpose is to get the average row count for the last 10 days, which I get from the first query above.
I'm not entirely sure what you're asking. The "difference" between the two queries is that the first one is valid but the second does not appear to be, as per HoneyBadger's comment. They also seem to be trying to achieve two different goals.
However, I think what you are trying to do is produce a query based on the data from the first query, which returns the date, device, and an average of the count column. If so, I believe the following query would calculate this:
WITH dataset AS (
    SELECT substr(insert_date, 1, 14) AS theDate,
           device,
           count(*) AS theCount
    FROM abc.xyztable
    WHERE insert_date >= DATE_SUB(NOW(), INTERVAL 10 DAY)
    GROUP BY device, substr(insert_date, 1, 14)
)
SELECT theDate,
       device,
       (SELECT ROUND(AVG(CAST(theCount AS FLOAT)), 2) FROM dataset) AS Average
FROM dataset
GROUP BY theDate, device
I have referenced the accepted answers of this question to calculate the average: How to calculate average of a column and then include it in a select query in oracle?
And this question to tidy up the query: Formatting Clear and readable SQL queries
Without having a sample of your data, or any proper context, I can't see how this would be especially useful, so if it was not what you were looking for, please edit your question and clarify exactly what you need.
EDIT: Based on the extra information you have provided, I've made a tweak to my solution to increase the precision of the average column. It now calculates the average to two decimal places. You have stated that this returns the same result as your original query, but the two queries are not calculating the same thing. If the count column is consistently the same number with little variation, the AVG function will round it, which in turn can produce results that look the same, especially if you only compare a small sample, so I have amended my answer to demonstrate this. Again, we'd all be able to help you much more easily if you would provide more information, such as a sample of your data.
If you want an average you need to change the last GROUP BY:
to get an average per device, use
GROUP BY device;
to get an average per date, use
GROUP BY date;
or remove it completely to get an average over all rows in the sub-query (see the sketch after the example below).
Update
Below is a full example for getting the average per device
SELECT device, avg(count)
FROM (SELECT substr(insert_date,1,14) as date, device, count(1) as count
FROM abc.xyztable
WHERE insert_date >=DATE_SUB(NOW(), INTERVAL 10 DAY)
GROUP BY device,substr(insert_date,1,14)) a
GROUP BY device;
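For completeness, here is the third variant described above, with the outer GROUP BY removed entirely so that a single overall average is returned; it is only a sketch reusing the table and column names from the example above:

SELECT AVG(count)
FROM (SELECT substr(insert_date, 1, 14) AS date, device, count(1) AS count
      FROM abc.xyztable
      WHERE insert_date >= DATE_SUB(NOW(), INTERVAL 10 DAY)
      GROUP BY device, substr(insert_date, 1, 14)) a;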

MySQL - group by interval query optimisation

Some background first. We have a MySQL database with a "live currency" table. We use an API to pull the latest currency values for different currencies, every 5 seconds. The table currently has over 8 million rows.
Structure of the table is as follows:
id (INT 11 PK)
currency (VARCHAR 8)
value (DECIMAL)
timestamp (TIMESTAMP)
Now we are trying to use this table to plot the data on a graph. We are going to have various different graphs, e.g: Live, Hourly, Daily, Weekly, Monthly.
I'm having a bit of trouble with the query. Using the Weekly graph as an example, I want to output data from the last 7 days, in 15 minute intervals. So here is how I have attempted it:
SELECT *
FROM currency_data
WHERE ((currency = 'GBP')) AND (timestamp > '2017-09-20 12:29:09')
GROUP BY UNIX_TIMESTAMP(timestamp) DIV (15 * 60)
ORDER BY id DESC
This outputs the data I want, but the query is extremely slow. I have a feeling the GROUP BY clause is the cause.
Also BTW I have switched off the sql mode 'ONLY_FULL_GROUP_BY' as it was forcing me to group by id as well, which was returning incorrect results.
Does anyone know of a better way of doing this query which will reduce the time taken to run the query?
You may want to create summary tables for each of the graphs you want to do.
If your data really is coming every 5 seconds, you can attempt something like:
SELECT *
FROM currency_data cd
WHERE currency = 'GBP' AND
timestamp > '2017-09-20 12:29:09' AND
UNIX_TIMESTAMP(timestamp) MOD (15 * 60) BETWEEN 0 AND 4
ORDER BY id DESC;
For both this query and your original query, you want an index on currency_data(currency, timestamp, id).
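Taking both suggestions together, a sketch might look like the following; the composite index matches the columns listed above, while the summary table currency_15min, its columns, and the refresh query are assumptions for illustration:

ALTER TABLE currency_data
    ADD INDEX idx_currency_ts_id (currency, timestamp, id);

-- Assumed 15-minute summary table for the weekly graph; refresh it
-- periodically (cron or the event scheduler) so the graph reads a few
-- hundred pre-aggregated rows instead of millions of raw ticks.
CREATE TABLE IF NOT EXISTS currency_15min (
    currency     VARCHAR(8)    NOT NULL,
    bucket_start DATETIME      NOT NULL,
    value        DECIMAL(18,6) NOT NULL,
    PRIMARY KEY (currency, bucket_start)
);

INSERT INTO currency_15min (currency, bucket_start, value)
SELECT currency,
       FROM_UNIXTIME(UNIX_TIMESTAMP(timestamp) DIV (15 * 60) * (15 * 60)),
       AVG(value)    -- averaging the ticks per bucket; pick MAX or the last tick if preferred
FROM currency_data
WHERE timestamp > '2017-09-20 12:29:09'
GROUP BY currency,
         FROM_UNIXTIME(UNIX_TIMESTAMP(timestamp) DIV (15 * 60) * (15 * 60))
ON DUPLICATE KEY UPDATE value = VALUES(value);

The weekly graph then becomes a plain range scan of currency_15min.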

MySQL: sum reputation this month and select the 2 results below and the 2 results above my result

I am trying to calculate a user's reputation for this month and then to find the 4 nearest other results (2 lower and 2 higher), so 5 results in a sequence altogether.
For example, if the reputation for a certain user is 4500, I should end up with the results: 2750, 3000, 4500, 4650, 8900.
This is the query I have (it only selects the current month's reputation for a certain user): SELECT SUM(reputation_change) FROM activity WHERE user_id = '1' AND YEAR(datetime) = YEAR(CURDATE()) AND MONTH(datetime) = MONTH(CURDATE())
My table is as follows:
So the question is: how can I make this perform well? Do I have to restructure the table and add a reputation_this_month column for each user?
Thanks for all your suggestions.
You can run a MySQL routine every night that creates a separate table based on the query above. You'll see faster results when you SELECT from this table, and you won't be taxing your production table with resource-intensive queries.
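A minimal sketch of such a nightly routine using the MySQL event scheduler; the table reputation_monthly and the event name are assumptions, the activity columns come from the question, and the scheduler must be enabled (event_scheduler=ON):

-- Summary table keyed per user and month, so old months stay intact.
CREATE TABLE IF NOT EXISTS reputation_monthly (
    user_id    INT NOT NULL,
    yr         INT NOT NULL,
    mo         INT NOT NULL,
    reputation INT NOT NULL,
    PRIMARY KEY (user_id, yr, mo)
);

-- Rebuild the current month's totals once a day.
CREATE EVENT IF NOT EXISTS refresh_reputation_monthly
ON SCHEDULE EVERY 1 DAY
STARTS CURRENT_DATE + INTERVAL 1 DAY
DO
    REPLACE INTO reputation_monthly (user_id, yr, mo, reputation)
    SELECT user_id,
           YEAR(datetime),
           MONTH(datetime),
           SUM(reputation_change)
    FROM activity
    WHERE datetime >= DATE_FORMAT(CURDATE(), '%Y-%m-01')
    GROUP BY user_id, YEAR(datetime), MONTH(datetime);

Finding the 2 lower and 2 higher neighbours then becomes two cheap ordered lookups against reputation_monthly rather than a full aggregation over activity.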

subtract the data for every 5 minutes between two particular times

I have a problem with MySQL: I need to subtract the data between two particular times, for every 5 minutes, and then average the 5-minute data.
What I am doing now is:
select (avg(columnname)),convert((min(datetime) div 500)*500, datetime) + INTERVAL 5 minute as endOfInterval
from Databasename.Tablename
where datetime BETWEEN '2012-09-12 10:50:00' AND '2012-09-12 14:50:00'
group by datetime div 500;
This gives the cumulative average.
Suppose I get 500 at 11 o'clock and 700 at 11:05; the average I need is (700-500)/5 = 40,
but right now I am getting (500+700)/5 = 240.
I don't need the cumulative average.
Kindly help me.
For the kind of average you're talking about, you don't want to aggregate multiple rows using a GROUP BY clause. Instead, you want to compute your result from exactly two different rows of the same table. This calls for a self-join:
SELECT (b.columnname - a.columnname)/5, a.datetime, b.datetime
FROM Database.Tablename a, Database.Tablename b
WHERE b.datetime = a.datetime + INTERVAL 5 MINUTE
AND a.datetime BETWEEN '2012-09-12 10:50:00' AND '2012-09-12 14:45:00'
a and b refer to two different rows of the same table. The WHERE clause ensures that they are exactly 5 minutes apart.
If there is no second row at that temporal distance, no corresponding row will be included in the query result. If your table doesn't have data points exactly every five minutes and you have to search for a suitable partner row instead, things become much more difficult. This answer might perhaps be adjusted for that use case, or you might implement this at the application level instead of on the database server.
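One possible adjustment for readings that are not exactly 5 minutes apart, only a sketch reusing the table and column names above, with an assumed 6-minute tolerance window:

-- Pair each reading with the first later reading within 6 minutes and
-- divide the delta by the actual gap (in minutes) instead of a fixed 5.
SELECT (b.columnname - a.columnname)
       / (TIMESTAMPDIFF(SECOND, a.datetime, b.datetime) / 60) AS avg_per_minute,
       a.datetime,
       b.datetime
FROM Database.Tablename a
JOIN Database.Tablename b
  ON b.datetime = (SELECT MIN(c.datetime)
                   FROM Database.Tablename c
                   WHERE c.datetime > a.datetime
                     AND c.datetime <= a.datetime + INTERVAL 6 MINUTE)
WHERE a.datetime BETWEEN '2012-09-12 10:50:00' AND '2012-09-12 14:45:00';

Rows with no later reading inside the window are simply dropped, which keeps the semantics of the original self-join.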