MySQL Group by clusters of results in time

I have a DATETIME field in a mysql table with random entries. I would like to select object groups, and break them apart when there is a four hour gap. Ideally I could return all IDs within each group in an array (or sub-group?).
This seems beyond my SQL skills.
Any interesting solutions?

SQL is meant to perform set operations on data, while what you're asking for involves sequential processing.
So I'd suggest processing it sequentially rather than twisting SQL to do this. I believe you have two basic choices:
Select data from the table in a stored procedure, using a cursor to process the results.
Execute a select statement from the external, stand-alone language of your choice (Java, C#, Python, PHP...whatever floats your boat), and group the data appropriately there.
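A minimal sketch of the second choice, in Python and assuming the rows come back as (id, timestamp) pairs (for example from a hypothetical SELECT id, created_at FROM your_table). The function name cluster_by_gap is made up for illustration. It walks the rows in time order and starts a new group whenever the gap to the previous row is more than four hours; a gap of exactly four hours is assumed to stay in the same group.

```python
from datetime import datetime, timedelta

def cluster_by_gap(rows, gap=timedelta(hours=4)):
    """Split (id, timestamp) rows into groups of ids, starting a new group
    wherever consecutive timestamps are more than `gap` apart."""
    groups = []
    prev_ts = None
    # Sequential processing: the rows must be visited in timestamp order.
    for row_id, ts in sorted(rows, key=lambda r: r[1]):
        # First row, or a gap longer than `gap`: open a new group.
        if prev_ts is None or ts - prev_ts > gap:
            groups.append([])
        groups[-1].append(row_id)
        prev_ts = ts
    return groups

# Usage: ids 1 and 2 are 90 minutes apart; 2 and 3 are 5.5 hours apart.
rows = [
    (1, datetime(2013, 1, 1, 8, 0)),
    (2, datetime(2013, 1, 1, 9, 30)),
    (3, datetime(2013, 1, 1, 15, 0)),
    (4, datetime(2013, 1, 1, 16, 0)),
]
groups = cluster_by_gap(rows)
```

Each inner list holds all the ids of one cluster, which matches the "array (or sub-group)" shape asked for in the question.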

You can use the MySQL DATE_FORMAT() function to truncate each timestamp to the hour and group on the result.
Example
SELECT CAST(DATE_FORMAT(date,'%Y-%m-%d %k:00:00') AS DATETIME) hour
FROM your_table
WHERE date >= NOW() - INTERVAL 4 HOUR
GROUP BY CAST(DATE_FORMAT(date,'%Y-%m-%d %k:00:00') AS DATETIME)
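For illustration, the same hour-truncation idea sketched in Python (the function name hour_buckets is hypothetical): each timestamp is cut back to the start of its hour and the distinct buckets are returned, which is what grouping on the DATE_FORMAT/CAST expression produces.

```python
from datetime import datetime

def hour_buckets(timestamps):
    """Return the distinct hour buckets present in `timestamps`, in order,
    mirroring GROUP BY CAST(DATE_FORMAT(date, '%Y-%m-%d %k:00:00') AS DATETIME)."""
    # Zero out minutes, seconds and microseconds, then de-duplicate and sort.
    return sorted({ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps})

buckets = hour_buckets([
    datetime(2013, 1, 1, 8, 5),
    datetime(2013, 1, 1, 8, 55),
    datetime(2013, 1, 1, 9, 0),
])
```

Fixed hourly buckets are not the same thing as the gap-based clusters described in the question: two rows ten minutes apart can still land in different buckets (08:55 and 09:00 above).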

Related

SQL SELECT - order dates with wrong format

I was tasked with ordering some entries in our web application. The current solution, written by someone else 10 years ago, runs a SELECT against the database and then iterates over the results to build a table.
The problem is that the date is stored as varchar data in dd-mm-yyyy format.
And I'm not really sure I am brave enough to make changes to the database.
So is there some way to order it anyway within the SELECT, maybe in an ORDER BY at the end? Or is changing the database the only way, short of writing some gruesome function in code?
You can use the STR_TO_DATE() function for this. Try
ORDER BY STR_TO_DATE(varcharDateColumn, '%d-%m-%Y')
It converts your character-string dates to the DATE datatype where ordering works without trouble.
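The same principle, parse first and then compare, sketched in Python for illustration (sort_dmy is a hypothetical helper, not part of the original answer). A plain string sort of dd-mm-yyyy values puts "01-02-2013" before "31-01-2013", which is wrong; sorting on the parsed date fixes that, just as ORDER BY STR_TO_DATE(...) does in MySQL.

```python
from datetime import datetime

def sort_dmy(date_strings):
    """Sort 'dd-mm-yyyy' strings chronologically by parsing them first,
    analogous to ORDER BY STR_TO_DATE(col, '%d-%m-%Y')."""
    return sorted(date_strings, key=lambda s: datetime.strptime(s, "%d-%m-%Y"))

# A naive string sort would give ["01-02-2013", "31-01-2013"].
ordered = sort_dmy(["01-02-2013", "31-01-2013"])
```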
In MySQL 5.7 and later, you can add a so-called generated column to your table without touching the other data.
ALTER TABLE tbl
ADD COLUMN goodDate DATE
AS (STR_TO_DATE(varcharDateColumn, '%d-%m-%Y'))
STORED;
You can even put an index on that column if you need to use it for searching.
ALTER TABLE tbl ADD INDEX goodDate(goodDate);
You can use the STR_TO_DATE function, but it will only perform well on small tables (perhaps a few thousand records). On large data sets you will face performance problems, because ordering by a function of the column cannot use an index on it:
SELECT *
FROM (
SELECT '01-5-2013' AS Date
UNION ALL
SELECT '02-6-2013' AS Date
UNION ALL
SELECT '01-6-2013' AS Date
) AS t1
ORDER BY STR_TO_DATE(Date,'%d-%m-%Y')
Long term solution should be conversion of that column to proper date type.

MySQL - Group By date/time functions on a large table

I have a bunch of financial stock data in a MySQL table. The data is stored in a 1-minute-tick-per-row format (OHLC). From that data I'd like to create 30-minute/hourly/daily aggregates. The problem is that the table is enormous, and grouping by date functions on the timestamp column yields horrible performance.
For example, the following query produces the right result but takes too long:
SELECT market, max(timestamp) AS TS
FROM tbl_data
GROUP BY market, DATE(timestamp), HOUR(timestamp)
ORDER BY market, TS ASC
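To pin down what the query computes, here is a hedged Python sketch (last_per_hour is a hypothetical name) of the same result: for every (market, hour) bucket it keeps the latest timestamp, i.e. MAX(timestamp) grouped by market, DATE(timestamp) and HOUR(timestamp). It only spells out the semantics in a single pass over the rows; it is not a claim about how MySQL executes the query.

```python
from datetime import datetime

def last_per_hour(rows):
    """rows: iterable of (market, timestamp) pairs.
    Returns {(market, hour_start): latest timestamp in that hour}."""
    result = {}
    for market, ts in rows:
        # Truncating to the hour plays the role of DATE(timestamp), HOUR(timestamp).
        bucket = (market, ts.replace(minute=0, second=0, microsecond=0))
        # Keep the maximum, so the input does not have to be pre-sorted.
        result[bucket] = max(result.get(bucket, ts), ts)
    return result

ticks = [
    ("A", datetime(2017, 1, 1, 9, 0)),
    ("A", datetime(2017, 1, 1, 9, 59)),
    ("A", datetime(2017, 1, 1, 10, 0)),
    ("B", datetime(2017, 1, 1, 9, 30)),
]
last = last_per_hour(ticks)
```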
The table has a primary index on the (market, timestamp) columns. And I have also added an additional index on the timestamp column. However, that is not of much help as the usage of date/hour functions means a table scan regardless.
How can I improve the performance? Perhaps I should consider a different database than MySQL that provides specialized date/time indexes? If so, what would be a good option?
One thing to note is that it would suffice if I could get the LAST row of each hour/day/timeframe. The database has tens of millions of rows.
MySQL version: 5.7
Thanks in advance for the help.
Edit: Here is what Explain shows on a smaller DB of the exact same format:

Stored Procedure: Use User-Defined Variable or Call Database functions multiple times

I am writing a stored procedure that will go through and calculate different sums using 3 dynamically calculated dates in multiple queries.
If this was in code I would calculate the dates at the beginning of the function and use the local variables in the subsequent statements so that the calculation only had to happen once. I'm not very familiar with the inner workings of a MySQL database, however, so I'm not sure if the same principle applies to stored procedures.
My first attempt at the stored procedure in question looks like:
CREATE PROCEDURE `CalcValues` (
IN p_id int
)
BEGIN
declare thirty datetime DEFAULT DATE_ADD(CURDATE(), INTERVAL -30 DAY);
UPDATE rpt
SET decTarget = (SELECT SUM(decOne)
FROM tblOne
WHERE id = p_id AND
dteOne <= CURDATE() AND dteOne > thirty)
WHERE id = p_id;
END
I've removed a bunch of stuff that would be processed after this section but basically I'm calculating 3 dates instead of one and using at least one of them in each query. There are a total of 10 or so queries in the procedure.
Would it be more efficient to calculate the datetimes at the beginning of the procedure and use the local variables, or to call DATE_ADD(CURDATE(), INTERVAL -30 DAY) whenever I want to use the calculated date?
This is really more a question of programming philosophy than of pragmatics, and the answer is very much a matter of opinion. My opinion is that factoring the function result out into a local variable leads to more readable code, and I believe it is what a decent optimizing compiler would do anyway. The fact that it's SQL does not, in my opinion, make a difference here.
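The "compute once, reuse everywhere" style the answer recommends looks like this in a Python analogue of CalcValues. This is a sketch under assumptions: the (id, dteOne, decOne) tuple shape is hypothetical, and the 60- and 90-day cutoffs are made up to stand in for the question's three dates, of which only the 30-day one is shown.

```python
from datetime import date, timedelta

def calc_values(records, p_id, today=None):
    """Sum decOne per cutoff window for one id.
    records: iterable of (id, dteOne, decOne) tuples (hypothetical shape)."""
    today = today or date.today()
    # Computed once up front, like DECLARE thirty ... DEFAULT DATE_ADD(...).
    cutoffs = {days: today - timedelta(days=days) for days in (30, 60, 90)}
    rows = [(dte, dec) for rid, dte, dec in records if rid == p_id]
    # Every "query" below reuses the precomputed cutoffs instead of recomputing them.
    return {
        days: sum(dec for dte, dec in rows if cutoff < dte <= today)
        for days, cutoff in cutoffs.items()
    }

records = [
    (1, date(2020, 1, 10), 5),
    (1, date(2019, 12, 1), 7),
    (1, date(2019, 10, 20), 11),
    (2, date(2020, 1, 10), 100),
]
sums = calc_values(records, 1, today=date(2020, 1, 15))
```

Besides readability, computing the cutoffs once guarantees that every sum uses the same reference dates.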

MySQL calculation in select statement

I have been doing my office work in Excel, and my records have become too many, so I want to use MySQL. I have a view from the database with the columns "date, stockdelivered, sales", and I want to add another calculated field known as "stock balance".
I know this is supposed to be done on the client side during data entry.
I have a script that generates PHP lists/reports based only on views and tables; it has no option for adding calculated fields, so I would like to make a view in MySQL if possible.
In Excel I used to do it as follows.
I would like to know if this is possible in MySQL.
I don't have much experience with MySQL, but I imagine that first
one must be able to select the previous row's column 4,
then add it to the current row's column 2 minus the current row's column 3.
If there is another way to achieve the same output, please suggest it.
Generally speaking, SQL wasn't really intended to yield "running totals" like you desire. Other RDBMS have introduced proprietary extensions to deliver analytic functions which enable calculations of this sort, but MySQL (before version 8.0, which added window functions) lacks such features.
Instead, one broadly has four options. In no particular order:
Accumulate a running total in your application, as you loop over the resultset;
Alter your schema to keep track of a running total within your database (especially good in situations like this, where new data is only ever appended "to the end");
Group a self-join:
SELECT a.Sale_Date,
SUM(a.Stock_Delivered) AS Stock_Delivered,
SUM(a.Units_Sold) AS Units_Sold,
SUM(b.Stock_Delivered - b.Units_Sold) AS `Stock Balance`
FROM sales_report a
JOIN sales_report b ON b.Sale_Date <= a.Sale_Date
GROUP BY a.Sale_Date
Accumulate the running total in a user variable:
SELECT Sale_Date,
Stock_Delivered,
Units_Sold,
@t := @t + Stock_Delivered - Units_Sold AS `Stock Balance`
FROM sales_report, (SELECT @t:=0) init
ORDER BY Sale_Date
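As an illustration of the first option (accumulating in the application while looping over the result set), here is a Python sketch. It assumes rows arrive as (Sale_Date, Stock_Delivered, Units_Sold) tuples already ordered by date; stock_balance is a hypothetical helper name.

```python
from itertools import accumulate

def stock_balance(rows):
    """rows: (sale_date, stock_delivered, units_sold) tuples in date order.
    Returns each row with its running 'Stock Balance' appended."""
    # Running sum of (delivered - sold), one value per row.
    balances = accumulate(delivered - sold for _, delivered, sold in rows)
    return [(*row, bal) for row, bal in zip(rows, balances)]

report = stock_balance([
    ("2012-01-01", 100, 30),
    ("2012-01-02", 0, 20),
    ("2012-01-03", 50, 10),
])
```

This is the same arithmetic as the user-variable query: the balance for each row is the previous balance plus that row's deliveries minus its sales.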
Eggyal has four good solutions. I think the cleanest way to do a running total in MySQL is using a correlated subquery -- it eliminates the group by at the end. So I would add to the list of options:
SELECT sr.Sale_Date, sr.Stock_Delivered, sr.Units_Sold,
(select SUM(sr2.Stock_Delivered) - sum(sr2.Units_Sold)
from sales_report sr2
where sr2.sale_date <= sr.sale_date
) as StockBalance
FROM sales_report sr
ORDER BY Sale_Date
SELECT
sales_report.Stock_Delivered,
sales_report.Units_Sold,
sales_report.Stock_Delivered - sales_report.Units_Sold
FROM
sales_report;

How to dynamically add SELECT statements if not enough results?

I am querying a table for a set of data that may or may not have enough results for the correct operation of my page.
Typically, I would just broaden the range of my SELECT statement to ensure enough results, but in this particular case we need to start with as small a range of results as possible before expanding (if we need to).
My goal, therefore, is to create a query that will search the db, determine if it got a sufficient amount of rows, and continue searching if it didn't. In PHP something like this would be very simple, but I can't figure out how to dynamically add select statements in a query based off of a current count of rows.
Here is a crude illustration of what I'd like to do - in a SINGLE query:
SELECT *, COUNT(`id`) FROM `blogs` WHERE `date` BETWEEN '2011-01-01' AND '2011-01-02' LIMIT 25
IF COUNT(`id`) < 25 {
SELECT * FROM `blogs` WHERE `date` BETWEEN '2011-01-02' AND '2011-01-03' LIMIT 25
}
Is this possible to do with a single query?
You have 2 possible solutions:
Compare the count in your programming language, and if there are not enough rows, perform one more query. (It is not as bad as you think: query cache, memcached, proper indexes, enough memory on the server, etc. There are many ways to improve performance.)
Create a stored procedure.
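A sketch of the first solution in Python, under stated assumptions: fetch_range(lo, hi, limit) is a hypothetical callback that would run something like SELECT * FROM blogs WHERE date >= lo AND date < hi LIMIT limit, and the one-day step and 30-slice cap are made-up defaults. It queries consecutive date slices, asking each time only for the rows still missing, and stops once it has enough.

```python
from datetime import date, timedelta

def fetch_at_least(fetch_range, start, need=25, step=timedelta(days=1), max_steps=30):
    """Query consecutive [lo, hi) date slices until `need` rows are collected
    or `max_steps` slices have been tried."""
    rows = []
    lo = start
    for _ in range(max_steps):
        hi = lo + step
        # One query per slice, limited to the number of rows still missing.
        rows.extend(fetch_range(lo, hi, need - len(rows)))
        if len(rows) >= need:
            break
        lo = hi
    return rows

# In-memory stand-in for the blogs table: 3 posts on Jan 1, 2 on Jan 2, 10 on Jan 3.
posts = [date(2011, 1, 1)] * 3 + [date(2011, 1, 2)] * 2 + [date(2011, 1, 3)] * 10

def fake_fetch(lo, hi, limit):
    return [d for d in posts if lo <= d < hi][:limit]

result = fetch_at_least(fake_fetch, date(2011, 1, 1), need=6)
```

Fetching only the next slice, rather than re-running a wider query from the start, avoids reading the same rows twice.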