Single query to group data - MySQL

I'm saving data to a MySQL database every 5 seconds, and I want to aggregate this data into 5-minute averages.
The SELECT is this:
SELECT MIN(revision) as revrev,
AVG(temperature),
AVG(humidity)
FROM dht22_sens t
GROUP BY revision div 500
ORDER BY `revrev` DESC
Is it possible to save this aggregated data with a single query, possibly into the same table?

If it is about reducing the number of rows, then I think you have to insert new rows with the aggregated values and then delete the original, detailed rows. I don't know of any single SQL statement for inserting and deleting in one go (cf. also a similar answer from symcbean on StackOverflow, who additionally suggests packing these two statements into a procedure).
I'd suggest adding an extra column aggregationLevel and running two statements (with or without a procedure):
insert into dht22_sens (revision, temperature, humidity, aggregationLevel)
SELECT MIN(t.revision) as revision,
AVG(t.temperature) as temperature,
AVG(t.humidity) as humidity,
500 as aggregationLevel
FROM dht22_sens t
WHERE t.aggregationLevel IS NULL
GROUP BY t.revision div 500;
delete from dht22_sens where aggregationLevel is null;
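A minimal sketch of the procedure variant mentioned above; the procedure name is made up, and the transaction assumes InnoDB. Note that sensor rows arriving between the INSERT and the DELETE would be deleted without being aggregated, so a fixed revision cutoff would be safer in production:
DELIMITER //
CREATE PROCEDURE aggregate_dht22()
BEGIN
  START TRANSACTION;
  -- same two statements as above, executed back to back
  INSERT INTO dht22_sens (revision, temperature, humidity, aggregationLevel)
    SELECT MIN(revision), AVG(temperature), AVG(humidity), 500
    FROM dht22_sens
    WHERE aggregationLevel IS NULL
    GROUP BY revision div 500;
  DELETE FROM dht22_sens WHERE aggregationLevel IS NULL;
  COMMIT;
END //
DELIMITER ;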


Profiling JOIN and Subquery Queries in phpMyAdmin

Problem Description
I have an audit table that contains the change history of some objects. Each audit entry has a unique audit event id, the id of the object being changed, the date of the change, the property that was changed, the before and after values, and other columns.
What I need to do is query the audit data and get the date the same field was previously changed on the same object. So I need to look at the audit table a second time and, for each audit entry, attach the previous similar entry with its date as the previous change date.
Schema & Data
The table schema has id (id) as the primary key and the object id (parent_id) as an index. Nothing else is indexed. In my test case I have roughly 150 objects with around 80k audit entries for them.
Solution
There are two obvious solutions: subqueries and a LEFT JOIN.
In the LEFT JOIN version I join the audit table to itself; the join condition makes sure the object, field, and change values correspond and that the joined changes are older than the current change. I select the maximum change date, and to pick up only the latest previous change I group by id. If no previous change is found, I use the creation date of the object itself.
LEFT JOIN SQL
SELECT `audit`.`id` AS `id`,
`audit`.`parent_id` AS `parent_id`,
`audit`.`date_created` AS `date_created`,
COALESCE(MAX(`audit_prev`.`date_created`), `audit_parent`.`date_entered`) AS `date_created_before`,
`audit`.`field_name` AS `field_name`,
`audit`.`before_value_string` AS `before_value_string`,
`audit`.`after_value_string` AS `after_value_string`
FROM `opportunities_audit` `audit`
LEFT JOIN `opportunities_audit` `audit_prev`
ON(`audit`.`parent_id` = `audit_prev`.`parent_id`
AND `audit_prev`.`date_created` < `audit`.`date_created`
AND `audit_prev`.`after_value_string` = `audit`.`before_value_string`
AND `audit`.`field_name` = `audit_prev`.`field_name`)
LEFT JOIN `opportunities` `audit_parent` ON(`audit`.`parent_id` = `audit_parent`.`id`)
GROUP BY `audit`.`id`;
The subquery logic is rather similar, but instead of grouping and using the MAX function I simply ORDER BY date DESC and LIMIT 1:
SELECT `audit`.`id` AS `id`,
`audit`.`parent_id` AS `parent_id`,
`audit`.`date_created` AS `date_created`,
COALESCE((SELECT `audit_prev`.`date_created`
FROM `opportunities_audit` AS `audit_prev`
WHERE
(`audit_prev`.`parent_id` = `audit`.`parent_id`)
AND (`audit_prev`.`date_created` < `audit`.`date_created`)
AND (`audit_prev`.`after_value_string` = `audit`.`before_value_string`)
AND (`audit_prev`.`field_name` = `audit`.`field_name` )
ORDER BY `date_created` DESC
LIMIT 1
), `audit_parent`.`date_entered`) AS `date_created_before`,
`audit`.`field_name` AS `field_name`,
`audit`.`before_value_string` AS `before_value_string`,
`audit`.`after_value_string` AS `after_value_string`
FROM `opportunities_audit` `audit`
LEFT JOIN `opportunities` `audit_parent` ON(`audit`.`parent_id` = `audit_parent`.`id`);
Both queries produce identical result sets.
Issue
When I run the JOIN solution in phpMyAdmin, it takes roughly 2m30s to return the result; however, phpMyAdmin says the query took 0.04 seconds. When I run the subquery solution, the result comes back immediately, and the reported execution time is something like 0.06 seconds.
So I have a hard time understanding where this difference in actual execution time comes from. My initial guess was that the problem was related to phpMyAdmin's automatic LIMIT on the returned data set: while the result has 80k rows, it only displays 25. But adding the LIMIT manually makes both queries execute fast.
Also, running the queries from the command-line mysql tool returns the full result sets for both, the reported execution times correspond to the actual execution times, and the JOIN method is still roughly 1.5x faster than the subquery.
From the profiler data it seems that the bulk of the wait time is spent on sending data: that stage takes on the order of minutes, while everything else is on the order of microseconds.
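For reference, the per-stage timings mentioned above can be collected with MySQL's statement profiler (deprecated since 5.6.7 in favour of the Performance Schema, but still available):
SET profiling = 1;
-- run the query under test here ...
SHOW PROFILES;              -- lists recent statements with their total duration
SHOW PROFILE FOR QUERY 1;   -- per-stage breakdown, including "Sending data"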
Still, why would the behaviour of phpMyAdmin differ so greatly between the two queries?

SQL DELETE query is taking too long, 20k records to be deleted from half a million records

In my database I have around 500,000 records. I ran a query to delete around 20,000 of them.
It's been 45 minutes and HeidiSQL is still showing that the command is being executed.
Here is my command:
DELETE FROM DIRECTINOUT_MSQL WHERE MACHINENO LIKE '%TEXAOUT%' AND DATASENT = 1 AND UPDATETIME = NULL AND DATE <= '2017/4/30';
Please advise how to avoid this kind of situation in the future, and what should I do now? Should I break this query up with some condition and execute smaller queries?
I have exported my database backup file; it's around 47 MB.
Kindly advise.
Try adding an index; it will improve your query's performance.
A database index helps speed up the retrieval of data from tables. When you query data from a table, MySQL first checks whether suitable indexes exist, then uses them to locate the exact matching rows instead of scanning the whole table.
https://dev.mysql.com/doc/refman/5.5/en/optimization-indexes.html
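For example, a sketch using the column names from the question (the index name is made up). A composite index covering the equality and range predicates helps most; note that the leading-wildcard LIKE '%TEXAOUT%' can never use an index, so that part is filtered after the index narrows down the candidate rows:
ALTER TABLE DIRECTINOUT_MSQL
  ADD INDEX idx_datasent_updatetime_date (DATASENT, UPDATETIME, DATE);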
Which engine are you using, MyISAM or InnoDB?
What isolation level is set on the database?
How much time does a SELECT of the same data take? Be aware that some clients apply limits that can make you think a query finished within a few seconds when you actually got only part of the results.
I had this problem once. I solved it using a loop.
First write a fast SELECT to check whether there are still records to delete. (Note: UPDATETIME = NULL in the original query never matches anything in SQL; UPDATETIME IS NULL is presumably what was meant, and is used below.)
SELECT 1 FROM DIRECTINOUT_MSQL WHERE MACHINENO LIKE '%TEXAOUT%' AND DATASENT = 1 AND UPDATETIME IS NULL AND DATE <= '2017/4/30' LIMIT 1;
While this query still returns a row, run:
DELETE FROM DIRECTINOUT_MSQL WHERE MACHINENO LIKE '%TEXAOUT%' AND DATASENT = 1 AND UPDATETIME IS NULL AND DATE <= '2017/4/30' LIMIT 1000;
I just chose 1000 arbitrarily; you have to find the fastest batch size for your server configuration.
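If you prefer to keep the loop inside MySQL, a hedged sketch as a stored procedure (the procedure name is made up; ROW_COUNT() reports how many rows the last statement affected):
DELIMITER //
CREATE PROCEDURE batch_delete_texaout()
BEGIN
  REPEAT
    DELETE FROM DIRECTINOUT_MSQL
    WHERE MACHINENO LIKE '%TEXAOUT%'
      AND DATASENT = 1
      AND UPDATETIME IS NULL
      AND DATE <= '2017/4/30'
    LIMIT 1000;
  UNTIL ROW_COUNT() = 0 END REPEAT;
END //
DELIMITER ;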

SQL discrepancy using COUNT command

I am taking my first steps with SQL. Currently I am trying to analyse a database and stumbled over a problem that I can't explain. Perhaps someone could give me a hint.
I have a MySQL table ('cap851312') which has 330,178 rows. I already imported the table into Excel and verified this number!
Every single row includes a field (column 'ID_MES_ANO') for the entry's date. For the time being, the date is uniformly set to "201312".
Running the following command, I would expect to see the given number of rows as the result; however, the number that appears is 476,598.
SELECT movedb.cap851312.ID_MES_ANO, count(*)
FROM movedb.cap851312;
I already imported the file into Excel and verified the number of lines. Indeed, it is 330,178!
How can I find out what exactly is going wrong?
Update:
I've tried:
SELECT count(*) FROM movedb.cap851312
This also returns 476,598.
As I am using Workbench, I could easily confirm the number of 330,178 table rows.
Update 2:
The Workbench Table Inspector confirms: "Table rows: 330178"
Solved - However unsure why:
I changed the statement to
SELECT count(ID_MES_ANO) FROM movedb.cap851312;
This time the result is 330,178!
COUNT(*) counts all rows.
COUNT(ID_MES_ANO) counts only rows for which ID_MES_ANO is not null.
So the difference between the two are the rows where ID_MES_ANO is null.
You can verify this with
SELECT count(*) FROM movedb.cap851312 WHERE ID_MES_ANO IS NULL;
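A tiny self-contained illustration of the difference (the table name is made up):
CREATE TABLE count_demo (x INT);
INSERT INTO count_demo VALUES (1), (NULL), (3);
SELECT COUNT(*), COUNT(x) FROM count_demo;  -- returns 3 and 2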
By the way:
SELECT movedb.cap851312.ID_MES_ANO, count(*) FROM movedb.cap851312;
means: aggregate all rows into one single result row (by using the aggregate function COUNT without GROUP BY). This result row shows ID_MES_ANO and the number of all records. Standard SQL does not allow this, because you don't tell the DBMS which ID_MES_ANO of those hundreds of thousands of records to show. MySQL violates the standard here and simply picks one ID_MES_ANO arbitrarily from the rows (since MySQL 5.7, the default ONLY_FULL_GROUP_BY SQL mode rejects such queries).

I always have a "WHERE date" in all my SQL queries. Speed up?

I have a large table with hundreds of thousands of rows, but only about 50,000 rows are actually "active" and part of my queries, because I only select rows that have been updated in the last 14 days, with WHERE crdate > "2014-08-10". To speed up queries against this table, I'm wondering which of the following options (or maybe you have another suggestion?) is the best one:
I can delete all old entries and move them into a "history" table with a cronjob running every day/week. However, this will still leave the history table slow if I want to run queries against it.
I can put an index on my "crdate" column. However, my dates are in the format "2014-08-10 06:32:59", so I guess that because it stores so many different values, the index will be quite large(?) and potentially slow(?).
Do you have any other suggestions for how I can speed up queries against this table? Is it a bad idea to put an index on a date column that has so many different values?
1st rule of databases: always have indexes on the columns you are filtering on.
So yes, put an index on crdate.
You can also go with a history table in parallel, but make sure you put an index on the crdate column in the history table too. Having the history table will allow you to have a smaller index in the main table.
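A minimal sketch, assuming the table is called mytable (the table and index names are placeholders):
ALTER TABLE mytable ADD INDEX idx_crdate (crdate);
-- Many distinct values are not a problem: high cardinality makes the index
-- more selective, and a range scan on crdate > '2014-08-10' then touches
-- only the ~50,000 matching rows instead of the whole table.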
I wanted to add to this for future Googlers: if you are querying a DATETIME, a more specific value results in a more efficient query. For example,
SELECT * FROM MyTable WHERE MyDateTime = '01/01/2015 00:00:00'
will be faster than:
SELECT * FROM MyTable WHERE MyDateTime = '01/01/2015'
I tested this repeatedly on a view of 5 million rows indexed by datetime; the more specific query gave me a response about 1 second quicker.

Questions on how to randomly query multiple rows from MySQL without using "ORDER BY RAND()"

I need to query MySQL with some condition and get five random, different rows from the result.
Say I have a table named 'user' and a field named 'cash'. I can compose SQL like this:
SELECT * FROM user WHERE cash < 1000 ORDER BY RAND() LIMIT 5;
The result is good: totally random, unsorted, and the rows are different from each other, exactly what I want.
But I learned from Google that the efficiency is bad when the table gets large, because MySQL creates a temporary table with all the result rows and assigns each of them a random sorting index; the results are then sorted and returned.
Then I went on searching and found a solution like this:
SELECT * FROM `user` AS t1
JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `user`) - (SELECT MIN(id) FROM `user`)) + (SELECT MIN(id) FROM `user`)) AS id) AS t2
WHERE t1.id >= t2.id AND cash < 1000
ORDER BY t1.id LIMIT 5;
This method uses a JOIN and MAX(id), and its efficiency is better than the first one according to my testing. However, there is a problem: since I also need the condition cash < 1000, if the random id happens to be so big that no row at or after it has cash < 1000, then no result is returned.
Does anyone have a good idea of how to compose SQL that has the same effect as the first query but better efficiency?
Or shall I just do a simple query in MySQL and let PHP randomly pick 5 different rows from the query result?
Your help is appreciated.
To make the first query faster, just SELECT id: that will make the temporary table rather small (it will contain only IDs and not all the fields of each row) and maybe it will fit in memory (temp tables with text/blob columns are always created on disk, for example). Then, when you get the result, run another query: SELECT * FROM xy WHERE id IN (a,b,c,d,...). As you mentioned, this approach is not very efficient, but as a quick fix this modification will make it several times faster.
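A sketch of that two-step approach against the question's user table (the ids in the IN list are placeholders for whatever the first query returns):
SELECT id FROM user WHERE cash < 1000 ORDER BY RAND() LIMIT 5;
-- then fetch the full rows for the five ids obtained:
SELECT * FROM user WHERE id IN (17, 42, 108, 256, 911);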
One of the best approaches seems to be getting the total number of rows, choosing random numbers, and running a new query for each one: SELECT * FROM xy WHERE abc LIMIT $random,1. It should be quite efficient for 3-5 random rows, but not good if you want 100 random rows each time :)
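For instance, with the question's schema (12345 stands for a random offset your application picks between 0 and the count minus one; run the second query five times with distinct offsets):
SELECT COUNT(*) FROM user WHERE cash < 1000;          -- suppose this returns N
SELECT * FROM user WHERE cash < 1000 LIMIT 12345, 1;  -- 0 <= offset < N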
Also consider caching your results. Often you don't need different random rows on every page load; generate your random rows only once per minute. If you generate the data via cron, for example, you can even live with a query that takes several seconds, as users will see the old data while the new data is being generated.
Here are some of my bookmarks on this problem, for reference:
http://jan.kneschke.de/projects/mysql/order-by-rand/
http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/