converting randomly generate string instances to datetime mysql - mysql

I want to update a table (cust_id date_id) with randomly generated content
for the cust_id I am using
rand()*1000
for the datetime I am generating random dates over the past three years as follows
CONCAT(ROUND(RAND()*-3) + YEAR(NOW()),"-",ROUND(RAND()*11) + 1,"-",ROUND(RAND()*27) + 1)
Then I am generating many instances by creating inner joins from a table with 10 numbers
FROM numbers JOIN number n2 JOIN number n3
Putting it all together I run
INSERT INTO orders (cust_id, date_id)
SELECT ROUND(RAND()*1000) AS cust_id,
CONVERT(DATETIME, (CONCAT(ROUND(RAND()*-3) + YEAR(NOW()),"-",ROUND(RAND()*11) + 1,"-
",ROUND(RAND()*27) + 1))) AS date_id
FROM numbers JOIN number n2 JOIN number n3 JOIN;
I have played around with different conversion formats, and tried to set the result to a variable and cast that to datetime but it is all throwing up errors. I suspect that the problem is that mysql is reading it as a function rather than a string. I have found another work around where I keep the original datetime and use intervals, but would like to know what the issue with my initial approach is. Any insights that people have would be appreciated.

I have your basic formula working using the DATE() function.
SELECT DATE(CONCAT(ROUND(RAND()*-3) + YEAR(NOW()),"-",
ROUND(RAND()*11) + 1,"-",
ROUND(RAND()*27) + 1))
Still, you're much better off using
SELECT CURDATE() - INTERVAL ROUND(RAND()*3*365.25) DAY
Why? If you leave leap-year February 29 out of your test data, you leave out something critical to test. And if you leave out days 29, 30, and 31 from all your test months, you may not get test coverage for end-of-month date arithmetic.

Related

Calculate Sum of Mean and Std dev in sql query on single column

I am having table name as "Table1" in mysql.I have to find Sum of Mean and Std dev on column "Open".I did it easily using python but I am unable to do it using sql.
Select * from BANKNIFTY_cal_spread;
Date Current Next difference
2021-09-03 00:00:00 36914.8 37043.95 129.14999999999418
2021-09-06 00:00:00 36734 36869.15 135.15000000000146
2021-09-07 00:00:00 36572.9 36710.65 137.75
2021-09-08 00:00:00 36945 37065 120
2021-09-09 00:00:00 36770 36895.1 125.09999999999854
Python Code-
nf_fut_mean = round(df['difference'].mean())
print(f"NF Future Mean: {nf_fut_mean}")
nf_fut_std = round(df['difference'].std())
print(f"NF Future Standard Deviation: {nf_fut_std}")
upper_range = round((nf_fut_mean + nf_fut_std))
lower_range = round((nf_fut_mean - nf_fut_std))
I search for Sql solution but I didn't get it. I tried building query but it's not showing correct results in query builder in grafana alerting.
Now I added Mean column ,std dev column , upper_range and lower_range column using python dataframe and pushed to mysql table.
#Booboo,
After removing Date from SQL Query, it's showing correct results in two columns- average + std_deviation and average - std_deviation.
select average + std_deviation, average - std_deviation from (
select avg(difference) as average, stddev_pop(difference) as std_deviation from BANKNIFTY_cal_spread
) sq
It looks as though the sample you're using for the aggregations for MEAN, STDDEV, etc is the entire table - in which case you have to drop the DATE field from the query's result set.
You could also establish the baseline query using a CTE (Common Table Expression) using a WITH statement instead of a subquery, and then apply the subsequent processing:
WITH BN_CTE AS
(
select avg(difference) as average, stddev_pop(difference) as std_deviation from BANKNIFTY_cal_spread
)
select average + std_deviation, average - std_deviation from BN_CTE;
With the data you posted having only a single Open column value for any given Date column value, you standard deviation should be 0 (and the average just that single value).
I am having difficulty in understanding your SQL since I cannot see how it relates to finding the sum (and presumably the difference, which you also seem to want) of the average and standard deviation of column Open in table Table1. If I just go by your English-language description of what you are trying to do and your definition of table Table1, then the following should work. Note that since we want both the sum and difference of two values, which are not trivial to calculate, we should calculate those two values only once:
select Date, average + std_deviation, average - std_deviation from (
select Date, avg(Open) as average, stddev_pop(Open) as std_deviation from Table1
group by Date
) sq
order by Date
Note that I am using column aliases in the subquery that do not conflict with built-in MySQL function names.
SQL does not allow both calculating something in the SELECT clause and using it. (Yes, #variables allow in limited cases; but that won't work for aggregates in the way hinted in the Question.)
Either repeat the expressions:
SELECT average(difference) AS mean,
average(difference) + stddev_pop(difference) AS "mean-sigma",
average(difference) - stddev_pop(difference) AS "mean+sigma"
FROM BANKNIFTY_cal_spread;
Or use a subquery to call the functions only once:
SELECT mean, mean-sigma, mean+sigma
FROM ( SELECT
average(difference) AS mean,
stddev_pop(difference) AS sigma
FROM BANKNIFTY_cal_spread
) AS x;
I expect the timings to be similar.
And, as already mentioned, avoid using aliases that are identical to function names, etc.

MySQL Daily Time Coverage Without Gaps

I have a table like the following example:
What I need to do is return the coverage (number of hours an operator/s were onsite) for each day. The challenge is that I need to ignore gaps in coverage and not double count hours where two operators were signed in at the same time. For instance, the image below is a visual representation of the table.
The logic of the image is as follows:
Operator A: Signed in at 10 and signed out at noon for a total of 2 hours
Operator B: Signed in at 1 and signed out at 3 for a total of 2 hours
Operator A: Came back and signed in at 2 and signed out at 5 for a total of 3 hours but 1 hour overlaps with operator A so I cannot count that 1 hour otherwise I will be double counting coverage
Therefore the total coverage time without overlaps is 6 hours and the value I need the query to produce. So far I can ignore double counting by taking the max in min dates of each day and subtracting the two:
SELECT YEAR, WEEK, SUM(HOURS)
FROM
(SELECT
YEAR(SignedIn) AS YEAR,
WEEK(SignedIn) AS WEEK,
DAY(SignedIn) AS DAY,
time_to_sec(timediff(MAX(SignedOut), MIN(SignedIn)))/ 3600 AS HOURS
FROM OperatorLogs
GROUP BY YEAR, WEEK, DAY) As VirtualTable
GROUP BY YEAR, WEEK
Which produces 7 because it takes the first sign-in (10 AM) and calculates the hours up until the last sign-out (4:00 PM). However, it includes the gap in coverage (12 - 1) which should not be included. I am unsure of how to remove that time from the total hours while also not double counting when there is overlap, i.e. from 2-3 there should only be 1 hour of coverage even though two separate operators are on site each putting in an hour. Any help is appreciated.
Sorry, work interrupted me.
Here's my working solution, I'm not convinced it's optimal due to the (relatively) expensive nature of the joins, but I've optimised it slightly based on the soft-rule that "shifts" never span multiple days.
SELECT
calendar_date,
SUM(coverage_seconds) / 3600 AS coverage_hours
FROM
(
-- Signins that didn't happen within another operators shift
SELECT DISTINCT
DATE(e.signedin) AS calendar_date,
-(UNIX_TIMESTAMP(e.signedin) MOD 86400) AS coverage_seconds
FROM
OperatorLogs e
LEFT JOIN
OperatorLogs o
ON o.signedin >= DATE(e.signedin)
AND o.signedin < e.signedin
AND o.signedout >= e.signedin
WHERE
o.signedin IS NULL
UNION ALL
-- Signouts that didn't happen within another operators shift
SELECT DISTINCT
DATE(e.signedout) AS calendar_date,
+(UNIX_TIMESTAMP(e.signedout) MOD 86400) AS coverage_seconds
FROM
OperatorLogs e
LEFT JOIN
OperatorLogs o
ON o.signedin >= DATE(e.signedout)
AND o.signedin <= e.signedout
AND o.signedout > e.signedout
WHERE
o.signedin IS NULL
)
AS coverage_markers
GROUP BY
calendar_date
;
Feel free to test it with more rigourous data...
https://www.db-fiddle.com/f/4RgWVhcdNEro21rUksVdXD/0
(As a note, to make your sample data match your excel image, your first shift should have started at 9am)

How to count days between 2 dates except holiday and weekend?

I started a HR management project and I want to count days between 2 dates without counting the holidays and weekends. So the HR can count employee's day off
Here's the case, I want to count between 2018-02-14 and 2018-02-20 where there is an office holiday on 2018-02-16. The result should be 3 days.
I have already created a table called tbl_holiday where I put all weekends and holidays in one year there
I found this post, and I tried it on my MariaDB
Here's my query:
SELECT 5 * (DATEDIFF('2018-02-20', '2018-02-14') DIV 7) +
MID('0123444401233334012222340111123400012345001234550', 7 *
WEEKDAY('2018-02-14') + WEEKDAY('2018-02-20') + 1, 1) -
(SELECT COUNT(dates) FROM tbl_holiday WHERE dates NOT IN (SELECT dates FROM tbl_holiday)) as Days
The query works but the result is 4 days, not 3 days. It means the query only exclude the weekends but not the holiday
What is wrong with my query? Am I missing something? Thank you for helping me
#RichardDoe, from the question comments.
In a reasonable implementation of a date table, you create a list of all days (covering a sufficient range to cope with any query you may run against it - 15 years each way from today is probably a useful minimum), and alongside each day you store a variety of derived attributes.
I wrote a Q&A recently with basic tools that would get you started in SQL Server: https://stackoverflow.com/a/48611348/9129668
Unfortunately I don't have a MySQL environment or intimate familiarity with it to allow me to write or adapt queries off the top of my head (as I'm doing here), but I hope this will illustrate the structure of a solution for you in SQL Server syntax.
In terms of the answer I link to (which generates a date table on the fly) and extending it by adding in your holiday table (and making some inferences about how you've defined your holiday table), and noting that a working day is any day Mon-Fri that isn't a holiday, you'd write a query like so to get the number of working days between any two dates:
WITH
dynamic_date_table AS
(
SELECT *
FROM generate_series_datetime2('2000-01-01','2030-12-31',1)
CROSS APPLY datetime2_params_fxn(datetime2_value)
)
,date_table_ext1 AS
(
SELECT
ddt.*
,IIF(hol.dates IS NOT NULL, 1, 0) AS is_company_holiday
FROM
dynamic_date_table AS ddt
LEFT JOIN
tbl_holiday AS hol
ON (hol.dates = ddt.datetime2_value)
)
,date_table_ext2 AS
(
SELECT
*
,IIF(is_weekend = 1 OR is_company_holiday = 1, 0, 1) AS is_company_work_day
FROM date_table_ext1
)
SELECT
COUNT(datetime2_value)
FROM
date_table_ext2
WHERE
(datetime2_value BETWEEN '2018-02-14' AND '2018-02-20')
AND
(is_company_work_day = 1)
Obviously, the idea for a well-factored solution is that these intermediate calculations (being general in nature to the entire company) get rolled into the date_params_fxn, so that any query run against the database gains access to the pre-defined list of company workdays. Queries that are run against it then start to resemble plain English (rather than the approach you linked to and adapted in your question, which is ingenious but far from clear).
If you want top performance (which will be relevant if you are hitting these calculations heavily) then you define appropriate parameters, save the lot into a stored date table, and index that table appropriately. This way, your query would become as simple as the final part of the query here, but referencing the stored date table instead of the with-block.
The sequentially-numbered workdays I referred to in my comment on your question, are another step again for the efficiency and indexability of certain types of queries against a date table, but I won't complicate this answer any further for now. If any further clarification is required, please feel free to ask.
I found the answer for this problem
It turns out, I just need to use a simple arithmetic operator for this problem
SELECT (SELECT DATEDIFF('2018-02-20', '2018-02-14')) - (SELECT COUNT(id) FROM tbl_holiday WHERE dates BETWEEN '2018-02-14' AND '2018-02-20');

MySQL Comparing Times of Different Formats

I am working with a database full of songs, with titles and durations.
I need to return all songs with a duration greater than 29:59 (MM:SS).
The data is formatted in two different ways.
Format 1
Most of the data in the table is formatted as MM:SS, with some songs being greater than 60 minutes formatted for example as 72:15.
Format 2
Other songs in the table are formatted as HH:MM:SS, where the example given for Format 1 would instead be 01:12:15.
I have tried two different types of queries to solve this problem.
Query 1
The following query returns all of the values that I seek to return for Format 1, but I could not find a way to get values included for Format 2.
select title, duration from songs where
time(cast(duration as time)) >
time(cast('29:59' as time))
Query 2
With the next query, I hoped to use the format specifiers in str_to_date to locate those results with the format HH:MM:SS, but instead I received results such as 3:50. The interpreter is assuming that all of the data is of the form HH:MM, and I do not know how to tell it otherwise without ruining the results.
select title, duration from songs where
time(cast(str_to_date(duration, '%H:%i:%s') as time)) >
time(cast(str_to_date('00:29:59', '%H:%i:%s') as time))
I've tried changing the specifiers in the first call to str_to_date to %i:%s, which gives me all values greater than 29:59, but none greater than 59:59. This is worse than the original query. I've also tried 00:%i:%s and '00:' || duration, '%H:%i:%s'. These two in particular would ruin the results anyway, but I'm just fiddling at this point.
I'm thoroughly stumped, but I'm sure the solution is an easy one. Any help is appreciated.
EDIT: Here is some data requested from the comments below.
Results from show create table:
CREATE TABLE `songs` (
`song_id` int(11) NOT NULL,
`title` varchar(100) NOT NULL,
`duration` varchar(20) DEFAULT NULL,
PRIMARY KEY (`song_id`),
UNIQUE KEY `songs_uq` (`title`,`duration`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Keep in mind, there are more columns than I described above, but I left some out for the sake of simplicity. I will also leave them out in the sample data.
Sample Data
title duration
(Allegro Moderato) 3:50
Agatha 1:56
Antecessor Machine 06:16
Very Long Song 01:24:16
Also Very Long 2:35:22
You are storing unstructured data in a relational database. And that is making you unhappy. So structure it.
Either add a TIME column, or copy song_id into a parallel time table on the side that you can JOIN against. Select all the two-colon durations and trivially update TIME. Repeat, prepending '00:' to all the one-colon durations. Now you have parsed all rows, and can safely ignore the duration column.
Ok, fine, I suppose you could construct a VIEW that offers UNION ALL of those two queries, but that is slow and ugly, much better to fix the on-disk data.
Forget times. Convert to seconds. Here is one way:
select s.*
from (select s.*,
( substring_index(duration, ':', -1) + 0 +
substring_index(substring_index(duration, ':', -2), ':', 1) * 60 +
(case when duration like '%:%:%' then substring_index(duration, ':', 1) * 60*60
else 0
end)
) as duration_seconds
from songs s
) s
where duration_seconds > 29*60 + 59;
After some research I have come up with an answer of my own that I am happy with.
select title, duration from songs where
case
when length(duration) - length(replace(duration, ':', '')) = 1
then time_to_sec(duration) > time_to_sec('29:59')
else time_to_sec(duration) > time_to_sec('00:29:59')
end
Thank you to Gordon Linoff for suggesting that I convert the times to seconds. This made things much easier. I just found his solution a bit overcomplicated, and it reinvents the wheel by not using time_to_sec.
Output Data
title duration
21 Album Mix Tape 45:40
Act 1 1:20:25
Act 2 1:12:05
Agog Opus I 30:00
Among The Vultures 2:11:00
Anabasis 1:12:00
Avalanches Mixtape 60:00
Beautiful And Timeless 73:46
Beggars Banquet Tracks 76:07
Bonus Tracks 68:55
Chindogu 66:23
Spun 101:08
Note: Gordon mentioned his reason for not using time_to_sec was to account for songs greater than 23 hours long. After testing, I found that time_to_sec does support hours larger than 23, just as it supports minutes greater than 59.
It is also perfectly fine with other non-conforming formats such as 1:4:32 (e.g. 01:04:32).

MySQL: Average interval between records

Assume this table:
id date
----------------
1 2010-12-12
2 2010-12-13
3 2010-12-18
4 2010-12-22
5 2010-12-23
How do I find the average intervals between these dates, using MySQL queries only?
For instance, the calculation on this table will be
(
( 2010-12-13 - 2010-12-12 )
+ ( 2010-12-18 - 2010-12-13 )
+ ( 2010-12-22 - 2010-12-18 )
+ ( 2010-12-23 - 2010-12-22 )
) / 4
----------------------------------
= ( 1 DAY + 5 DAY + 4 DAY + 1 DAY ) / 4
= 2.75 DAY
Intuitively, what you are asking should be equivalent to the interval between the first and last dates, divided by the number of dates minus 1.
Let me explain more thoroughly. Imagine the dates are points on a line (+ are dates present, - are dates missing, the first date is the 12th, and I changed the last date to Dec 24th for illustration purposes):
++----+---+-+
Now, what you really want to do, is evenly space your dates out between these lines, and find how long it is between each of them:
+--+--+--+--+
To do that, you simply take the number of days between the last and first days, in this case 24 - 12 = 12, and divide it by the number of intervals you have to space out, in this case 4: 12 / 4 = 3.
With a MySQL query
SELECT DATEDIFF(MAX(dt), MIN(dt)) / (COUNT(dt) - 1) FROM a;
This works on this table (with your values it returns 2.75):
CREATE TABLE IF NOT EXISTS `a` (
`dt` date NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `a` (`dt`) VALUES
('2010-12-12'),
('2010-12-13'),
('2010-12-18'),
('2010-12-22'),
('2010-12-24');
If the ids are uniformly incremented without gaps, join the table to itself on id+1:
SELECT d.id, d.date, n.date, datediff(d.date, n.date)
FROM dates d
JOIN dates n ON(n.id = d.id + 1)
Then GROUP BY and average as needed.
If the ids are not uniform, do an inner query to assign ordered ids first.
I guess you'll also need to add a subquery to get the total number of rows.
Alternatively
Create an aggregate function that keeps track of the previous date, and a running sum and count. You'll still need to select from a subquery to force the ordering by date (actually, I'm not sure if that's guaranteed in MySQL).
Come to think of it, this is a much better way of doing it.
And Even Simpler
Just noting that Vegard's solution is much better.
The following query returns correct result
SELECT AVG(
DATEDIFF(i.date, (SELECT MAX(date)
FROM intervals WHERE date < i.date)
)
)
FROM intervals i
but it runs a dependent subquery which might be really inefficient with no index and on a larger number of rows.
You need to do self join and get differences using DATEDIFF function and get average.