MySQL: Populating a Table with Random Dates or NULL Values - mysql

I would like to populate a column in my table with either random dates from the past or NULL values. The random dates should fall between two dates, January 1 1920 and December 1 2018.
I've come across some confusing code that could be a solution for generating a random date within a specific period, but it doesn't cater for the NULL values.
INSERT INTO `FootballPlayers` VALUES (SELECT timestamp('2010-04-30') - INTERVAL FLOOR( RAND( ) * 366) DAY);
I would like the column of the table to look something like this:
+---------------+
| Date of Death |
+---------------+
| 20/10/1990    |
| 01/11/1988    |
| 04/02/2006    |
| NULL          |
| 17/05/2011    |
| 22/04/1972    |
| NULL          |
| NULL          |
| 13/04/1989    |
| 10/03/1999    |
+---------------+

That query never produces a NULL on its own, but you can get one by wrapping the random date in a CASE expression.
I used CASE ... WHEN and got the desired result.
Here you go:
SELECT
CASE
WHEN (SELECT FLOOR( RAND( ) * 366)) BETWEEN 50 AND 255
THEN TIMESTAMP('2010-04-30') -INTERVAL FLOOR( RAND( ) * 366) DAY
ELSE NULL
END time_of_death
You can use this in your INSERT statement as well.

You can use the ELT() function.
SELECT ELT(N,MYRANDOMDATE) AS RANDOMDATE FROM
(SELECT TIMESTAMP('2010-04-30') - INTERVAL FLOOR( RAND( ) * 366) DAY AS MYRANDOMDATE,
FLOOR(RAND()*(2-1+1))+1 AS N) AS FINAL
Here I'm generating a random number between 1 and 2 along with the random date, and I'm passing only one value argument to ELT(). When the random number is 2, ELT() returns NULL because there is no second argument. You can adjust the arguments according to how often you need NULL values.

As a convenient single IF expression:
SELECT IF(RAND() > 0.2, TIMESTAMP(NOW()) - INTERVAL FLOOR(RAND() * 43200) MINUTE, NULL)
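For comparison, the same idea (roll once to decide on NULL, then once more for the date) can be sketched outside the database. This is only an illustration in Python and not part of any answer above: the function name `random_date_or_none` and the 30% NULL probability are assumptions, while the date range is the one from the question.

```python
import random
from datetime import date, timedelta

START = date(1920, 1, 1)   # range taken from the question
END = date(2018, 12, 1)

def random_date_or_none(rng, null_probability=0.3):
    """Return None with the given probability, else a random date in [START, END]."""
    if rng.random() < null_probability:
        return None
    span_days = (END - START).days
    return START + timedelta(days=rng.randint(0, span_days))

rng = random.Random(42)  # seeded only so that runs are repeatable
values = [random_date_or_none(rng) for _ in range(10)]
for value in values:
    print("NULL" if value is None else value.strftime("%d/%m/%Y"))
```

In MySQL the day span used here corresponds to DATEDIFF('2018-12-01', '1920-01-01'), which could replace the hard-coded 366 in the queries above.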

Related

How to get data between start and expiration date if date is not empty or null?

I am trying to select offers between two dates, one of start and one of expiration and in case the expiration date is empty or null it will always show the offers.
Table
+----------------+---------------------+---------------------+
| deal_title | deal_start | deal_expire |
+----------------+---------------------+---------------------+
| Example Deal | 10-24-2021 16:10:00 | 10-25-2021 16:10:00 |
| Example Deal 2 | 10-24-2021 16:10:00 | NULL |
+----------------+---------------------+---------------------+
PHP function to get the current date by timezone:
function getDateByTimeZone(){
$date = new DateTime("now", new DateTimeZone("Europe/London") );
return $date->format('m-d-Y H:i:s');
}
MySQL query:
SELECT deals.*, categories.category_title AS category_title
FROM deals
LEFT JOIN categories ON deal_category = categories.category_id
WHERE deals.deal_status = 1
AND deals.deal_featured = 1
AND deals.deal_start >= '".getDateByTimeZone()."'
AND '".getDateByTimeZone()."' < deals.deal_expire
OR deals.deal_expire IS NULL
OR deals.deal_expire = ''
GROUP BY deals.deal_id ORDER BY deals.deal_created DESC
You didn't really explain what problem you're having. Having written queries like this many times in the past, you likely need parentheses around the expiration side of your date qualifications.
WHERE deals.deal_status = 1
AND deals.deal_featured = 1
AND deals.deal_start >= '".getDateByTimeZone()."'
AND (
'".getDateByTimeZone()."' < deals.deal_expire
OR deals.deal_expire IS NULL
)
If you don't put parentheses around your OR clause, then operator precedence will cause the whole WHERE clause to be true whenever the expire date is NULL and that's not what you want. You want a compounded OR clause here.
I don't think you need to compare against empty string either, just assuming you put that in there trying to figure things out so I left it out in my sample code.
Also I'm not familiar with PHP string interpolation enough to know if there's an issue with the way you're interpolating the result of the 'getDateByTimeZone' function into that query. It looks funky to me based on past experience with PHP, but I'm ignoring that part of it under the assumption that there's something wrapping this code which resolves it correctly.
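To see why the parentheses matter, here is a small sketch of the precedence issue. It is written in Python rather than SQL, on the assumption that the analogy holds: Python's `and` binds tighter than `or`, just as AND binds tighter than OR in MySQL. The function names and the boolean parameters are made up for illustration, and SQL's three-valued NULL logic is simplified to plain True/False.

```python
def without_parens(status_ok, featured, started, not_expired, expire_is_null):
    # Mirrors: status AND featured AND started AND not_expired OR expire IS NULL
    return status_ok and featured and started and not_expired or expire_is_null

def with_parens(status_ok, featured, started, not_expired, expire_is_null):
    # Mirrors: status AND featured AND started AND (not_expired OR expire IS NULL)
    return status_ok and featured and started and (not_expired or expire_is_null)

# A deal that is NOT active (status check fails) but has no expiry date.
inactive_deal = dict(status_ok=False, featured=True, started=True,
                     not_expired=False, expire_is_null=True)

print(without_parens(**inactive_deal))  # True: the deal leaks into the result
print(with_parens(**inactive_deal))     # False: the status check is respected
```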
The best approach would be to store proper MySQL DATETIME values in your database from the start.
But you can do it all in MySQL.
STR_TO_DATE costs time every time it runs.
When you wrap all the expiry conditions in parentheses, the group evaluates to true if any one of them is true.
CREATE TABLE deals (
  deal_id int,
  deal_status int,
  deal_featured int,
  deal_category int,
  `deal_title` VARCHAR(14),
  `deal_start` VARCHAR(19),
  `deal_expire` VARCHAR(19),
  deal_created DATETIME
);
INSERT INTO deals
(deal_id, deal_status, deal_featured, deal_category, `deal_title`, `deal_start`, `deal_expire`, deal_created)
VALUES
(1,1,1,1,'Example Deal', '10-24-2021 16:10:00', '10-25-2021 16:10:00',NOW()),
(2,1,1,1,'Example Deal 2', '10-24-2021 16:10:00', NULL,NOW());
CREATE TABLE categories (category_id int, category_title varchar(20));
INSERT INTO categories VALUES (1,'test');
SELECT
deals.deal_id, MIN(`deal_title`), MIN(`deal_start`), MIN(`deal_expire`),MIN(deals.deal_created) as deal_created , MIN(categories.category_title)
FROM
deals
LEFT JOIN
categories ON deal_category = categories.category_id
WHERE
deals.deal_status = 1
AND deals.deal_featured = 1
AND STR_TO_DATE(deals.deal_start, "%m-%d-%Y %H:%i:%s") >= NOW() - INTERVAL 1 DAY
AND (NOW() < STR_TO_DATE(deals.deal_expire, "%m-%d-%Y %H:%i:%s")
OR deals.deal_expire IS NULL
OR deals.deal_expire = '')
GROUP BY deals.deal_id
ORDER BY deal_created DESC
deal_id | MIN(`deal_title`) | MIN(`deal_start`) | MIN(`deal_expire`) | deal_created | MIN(categories.category_title)
------: | :---------------- | :------------------ | :------------------ | :------------------ | :-----------------------------
1 | Example Deal | 10-24-2021 16:10:00 | 10-25-2021 16:10:00 | 2021-10-24 22:42:34 | test
2 | Example Deal 2 | 10-24-2021 16:10:00 | null | 2021-10-24 22:42:34 | test

Calculate the number of unique strings with the date, with a possible error

In the select, I get rows from the table with time in the format TIMESTAMP. I want to count unique rows, BUT with a possible error of 1 second. In the example below, for example, 3 unique records (1 and 2 have an error of 1 second, and therefore counted as one).
I was thinking to make a function like ABS(time_1 - time_2) > 1 to search for unique values.
Is it possible to implement this somehow on the SQL side, or would it be better to implement it on the server-side, which would be pulling this data?
Is it possible to do it without functions?
How much of a burden will this put on the database?
Any tips for solving the problem are welcome!
P.S. I am on an old version, MySQL 5.7.
Example output:
+------------+
| time |
+------------+
| 1583060400 |
+------------+
| 1583060401 |
+------------+
| 1583060460 |
+------------+
| 1583074860 |
+------------+
Assuming that "if a row TIMESTAMP differs from previous row TIMESTAMP by not more than 1 second then ignore this row presence" you may use
SELECT MAX(counter) groups_amount
FROM ( SELECT CASE WHEN TIMESTAMPDIFF(SECOND, @previous, `time`) > 1
                   THEN @counter := @counter + 1
                   END counter,
              @previous := `time`
       FROM test
       CROSS JOIN ( SELECT @previous := '1970-01-01 00:00:01',
                           @counter := 0 ) init_vars
       ORDER BY `time` ASC ) subquery;
https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=2aba3b8f473e65f4f40e449c8d97a79d
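If you would rather do this on the application side, the same "compare each row with the previous one" rule is only a few lines. This is a minimal sketch in Python, assuming the timestamps arrive as integers (Unix seconds) as in the example output; `count_groups` and `tolerance` are names invented here.

```python
def count_groups(timestamps, tolerance=1):
    """Count rows, treating a row within `tolerance` seconds of the previous row as a duplicate."""
    groups = 0
    previous = None
    for current in sorted(timestamps):
        # A new group starts when the gap to the previous row exceeds the tolerance.
        if previous is None or current - previous > tolerance:
            groups += 1
        previous = current
    return groups

times = [1583060400, 1583060401, 1583060460, 1583074860]
print(count_groups(times))  # 3
```

Like the query, this compares each row with the row immediately before it, so a run such as 0, 1, 2 counts as a single group even though its first and last rows are 2 seconds apart.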

How can I get the difference between the individual maximum values of different days?

I am new to MySQL, and I am trying to find:
The difference between a given day's maximum value occurred and the previous day's maximum value.
I was able to get the maximum values for dates via:
select max(`bundle_count`), `Production_date`
from `table`
group by `Production_date`
But I don't know how to use SQL to calculate the differences between maximums for two given dates.
I am expecting output like this
Please help me.
Update 1: Here is a fiddle, http://sqlfiddle.com/#!2/818ad/2, that I used for testing.
Update 2: Here is a fiddle, http://sqlfiddle.com/#!2/3f78d/10 that I used for further refining/fixing, based on Sandy's comments.
Update 3: For some reason the case where there is no previous day was not being dealt with correctly. I thought it was. However, I've updated to make sure that works (a bit cumbersome, but it appears to be right). Last fiddle: http://sqlfiddle.com/#!2/3f78d/45
I think @Grijesh conceptually got you the main thing you needed via the self-join of the input data (so make sure you vote up his answer!). I've cleaned up his query a bit on syntax (building off of his query!):
SELECT
DATE(t1.`Production_date`) as theDate,
MAX( t1.`bundle_count` ) AS 'max(bundle_count)',
MAX( t1.`bundle_count` ) -
IF(
EXISTS
(
SELECT date(t2.production_date)
FROM input_example t2
WHERE t2.machine_no = 1 AND
date_sub(date(t1.production_date), interval 1 day) = date(t2.production_date)
),
(
SELECT MAX(t3.bundle_count)
FROM input_example t3
WHERE t3.machine_no = 1 AND
date_sub(date(t1.production_date), interval 1 day) = date(t3.production_date)
GROUP BY DATE(t3.production_date)
), 0
)
AS Total_Bundles_Used
FROM `input_example` t1
WHERE t1.machine_no = 1
GROUP BY DATE( t1.`production_date` )
Note 1: I think @Grijesh and I were cleaning up the query syntax issues at the same time. It's encouraging that we ended up with very similar versions after we were both doing cleanup. My version differs in using IFNULL() for when there is no preceding data. I also ended up with a DATE_SUB, and I made sure to reduce various dates to mere dates without time component, via DATE().
Note 2: I originally had not fully understood your source tables, so I thought I needed to implement a running count in the query. But upon better inspection, it's clear that your source data already has a running count, so I took that stuff back out.
I am not sure, but I think you need something like this. I hope it will be helpful to you to some extent.
Try this:
SELECT t1.`Production_date` ,
MAX(t1.`bundle_count`) - MAX(t2.`bundle_count`) ,
COUNT(t1.`bundle_count`)
FROM `table_name` AS t1
INNER JOIN `table_name` AS t2
ON ABS(DATEDIFF(t1.`Production_date` , t2.`Production_date`)) = 1
GROUP BY t1.`Production_date`
EDIT
I created a table named `table_name`, as below:
mysql> SELECT * FROM `table_name`;
+---------------------+--------------+
| Production_date | bundle_count |
+---------------------+--------------+
| 2004-12-01 20:37:22 | 1 |
| 2004-12-01 20:37:22 | 2 |
| 2004-12-01 20:37:22 | 3 |
| 2004-12-02 20:37:22 | 2 |
| 2004-12-02 20:37:22 | 5 |
| 2004-12-02 20:37:22 | 7 |
| 2004-12-03 20:37:22 | 6 |
| 2004-12-03 20:37:22 | 7 |
| 2004-12-03 20:37:22 | 2 |
| 2004-12-04 20:37:22 | 1 |
| 2004-12-04 20:37:22 | 9 |
+---------------------+--------------+
11 rows in set (0.00 sec)
My query to find the difference in bundle_count between two consecutive dates:
SELECT t1.`Production_date` ,
MAX(t2.`bundle_count`) - MAX(t1.`bundle_count`) ,
COUNT(t1.`bundle_count`)
FROM `table_name` AS t1
INNER JOIN `table_name` AS t2
ON ABS(DATEDIFF(t1.`Production_date` , t2.`Production_date`)) = 1
GROUP BY t1.Production_date;
its output:
+---------------------+-------------------------------------------------+--------------------------+
| Production_date | MAX(t2.`bundle_count`) - MAX(t1.`bundle_count`) | COUNT(t1.`bundle_count`) |
+---------------------+-------------------------------------------------+--------------------------+
| 2004-12-01 20:37:22 | 4 | 9 |
| 2004-12-02 20:37:22 | 0 | 18 |
| 2004-12-03 20:37:22 | 2 | 15 |
| 2004-12-04 20:37:22 | -2 | 6 |
+---------------------+-------------------------------------------------+--------------------------+
4 rows in set (0.00 sec)
This is PostgreSQL syntax (sorry; it's what I'm familiar with) but should fundamentally work in either database. Note this doesn't exactly run in PostgreSQL either because group is not a valid table name (it's a reserved keyword). The approach is a self-join as others have mentioned but I've used a view to handle the max-by-day and the difference as separate steps.
create view max_by_day as
select
date_trunc('day', production_date) as production_date,
max(bundle_count) as bundle_count
from
group
group by
date_trunc('day', production_date);
select
today.production_date as production_date,
today.bundle_count,
today.bundle_count - coalesce(yesterday.bundle_count, 0)
from
max_by_day as today
left join max_by_day yesterday on (yesterday.production_date = today.production_date - '1 day'::interval)
order by
production_date;
PostgreSQL also has a construct called window functions which is useful for this and a bit easier to understand. Just had to stick in a bit of advocacy for a superior database. :-P
select
date_trunc('day', production_date),
max(bundle_count),
max(bundle_count) - lag(max(bundle_count), 1, 0)
over
(order by date_trunc('day', production_date))
from
group
group by
date_trunc('day', production_date);
These two approaches differ in how they handle missing days in the data - the first will treat it as a 0, the second will use the previous day which is present. There wasn't a case like this in your sample so I don't know if this is something you care about.
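To make the "max per day, then subtract the previous day's max" logic concrete, here is a rough Python sketch over the sample rows from the `table_name` listing earlier in this thread (dates only, times dropped). The function name `daily_max_diff` is hypothetical, and a missing previous day is treated as 0, which is the behaviour of the view-based query rather than the lag() one.

```python
from datetime import date, timedelta

# (production_date, bundle_count) pairs copied from the sample table above.
rows = [
    (date(2004, 12, 1), 1), (date(2004, 12, 1), 2), (date(2004, 12, 1), 3),
    (date(2004, 12, 2), 2), (date(2004, 12, 2), 5), (date(2004, 12, 2), 7),
    (date(2004, 12, 3), 6), (date(2004, 12, 3), 7), (date(2004, 12, 3), 2),
    (date(2004, 12, 4), 1), (date(2004, 12, 4), 9),
]

def daily_max_diff(rows):
    """Map each day to (its max) minus (the previous calendar day's max, or 0)."""
    daily_max = {}
    for day, count in rows:
        daily_max[day] = max(count, daily_max.get(day, count))
    result = {}
    for day in sorted(daily_max):
        previous_max = daily_max.get(day - timedelta(days=1), 0)
        result[day] = daily_max[day] - previous_max
    return result

for day, diff in daily_max_diff(rows).items():
    print(day, diff)
```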

Query database in weekly interval

I have a database with a created_at column containing the datetime in Y-m-d H:i:s format.
The latest datetime entry is 2011-09-28 00:10:02.
I need the query to be relative to the latest datetime entry.
The first value in the query should be the latest datetime entry.
The second value in the query should be the entry closest to 7 days from the first value.
The third value should be the entry closest to 7 days from the second value.
REPEAT #3.
What I mean by "closest to 7 days from":
The following are dates, the interval I desire is a week, in seconds a week is 604800 seconds.
7 days from the first value is equal to 1316578202 (1317183002-604800)
the value closest to 1316578202 (7 days) is... 1316571974
unix timestamp | Y-m-d H:i:s
1317183002 | 2011-09-28 00:10:02 -> appear in query (first value)
1317101233 | 2011-09-27 01:27:13
1317009182 | 2011-09-25 23:53:02
1316916554 | 2011-09-24 22:09:14
1316836656 | 2011-09-23 23:57:36
1316745220 | 2011-09-22 22:33:40
1316659915 | 2011-09-21 22:51:55
1316571974 | 2011-09-20 22:26:14 -> closest to 7 days from 1317183002 (first value)
1316499187 | 2011-09-20 02:13:07
1316064243 | 2011-09-15 01:24:03
1315967707 | 2011-09-13 22:35:07 -> closest to 7 days from 1316571974 (second value)
1315881414 | 2011-09-12 22:36:54
1315794048 | 2011-09-11 22:20:48
1315715786 | 2011-09-11 00:36:26
1315622142 | 2011-09-09 22:35:42
I would really appreciate any help, I have not been able to do this via mysql and no online resources seem to deal with relative date manipulation such as this. I would like the query to be modular enough to be able to change the interval weekly, monthly, or yearly. Thanks in advance!
Answer #1 Reply:
SELECT
UNIX_TIMESTAMP(created_at)
AS unix_timestamp,
(
SELECT MIN(UNIX_TIMESTAMP(created_at))
FROM my_table
WHERE created_at >=
(
SELECT max(created_at) - 7
FROM my_table
)
)
AS `random_1`,
(
SELECT MIN(UNIX_TIMESTAMP(created_at))
FROM my_table
WHERE created_at >=
(
SELECT MAX(created_at) - 14
FROM my_table
)
)
AS `random_2`
FROM my_table
WHERE created_at =
(
SELECT MAX(created_at)
FROM my_table
)
Returns:
unix_timestamp | random_1 | random_2
1317183002 | 1317183002 | 1317183002
Answer #2 Reply:
RESULT SET:
This is the result set for a yearly interval:
id | created_at | period_index | period_timestamp
267 | 2010-09-27 22:57:05 | 0 | 1317183002
1 | 2009-12-10 15:08:00 | 1 | 1285554786
I desire this result:
id | created_at | period_index | period_timestamp
626 | 2011-09-28 00:10:02 | 0 | 0
267 | 2010-09-27 22:57:05 | 1 | 1317183002
I hope this makes more sense.
It's not exactly what you asked for, but the following example is pretty close....
Example 1:
select
floor(timestampdiff(SECOND, tbl.time, most_recent.time)/604800) as period_index,
unix_timestamp(max(tbl.time)) as period_timestamp
from
tbl
, (select max(time) as time from tbl) most_recent
group by period_index
gives results:
+--------------+------------------+
| period_index | period_timestamp |
+--------------+------------------+
| 0 | 1317183002 |
| 1 | 1316571974 |
| 2 | 1315967707 |
+--------------+------------------+
This breaks the dataset into groups based on "periods", where (in this example) each period is 7-days (604800 seconds) long. The period_timestamp that is returned for each period is the 'latest' (most recent) timestamp that falls within that period.
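The bucketing that the query performs can be reproduced in a few lines if that makes it easier to follow. This is just a sketch in Python using the timestamps from the question; `latest_per_period` is a name I made up, and integer division stands in for FLOOR(TIMESTAMPDIFF(...)/604800).

```python
WEEK = 604800  # seconds in 7 days

# Unix timestamps copied from the question.
timestamps = [
    1317183002, 1317101233, 1317009182, 1316916554, 1316836656,
    1316745220, 1316659915, 1316571974, 1316499187, 1316064243,
    1315967707, 1315881414, 1315794048, 1315715786, 1315622142,
]

def latest_per_period(timestamps, period=WEEK):
    """Bucket rows by whole periods back from the newest row; keep the newest row per bucket."""
    most_recent = max(timestamps)
    latest = {}
    for ts in timestamps:
        index = (most_recent - ts) // period
        latest[index] = max(ts, latest.get(index, ts))
    return dict(sorted(latest.items()))

print(latest_per_period(timestamps))
# {0: 1317183002, 1: 1316571974, 2: 1315967707}
```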
The period boundaries are all computed based on the most recent timestamp in the database, rather than computing each period's start and end time individually based on the timestamp of the period before it. The difference is subtle - your question requests the latter (iterative approach), but I'm hoping that the former (approach I've described here) will suffice for your needs, since SQL doesn't lend itself well to implementing iterative algorithms.
If you really do need to determine each period based on the timestamp in the previous period, then your best bet is going to be an iterative approach -- either using a programming language of your choice (like php), or by building a stored procedure that uses a cursor.
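For what it's worth, the iterative version is short in application code. Below is a minimal sketch in Python (standing in for "a programming language of your choice"), assuming the timestamps have already been fetched as integers; `pick_periodic` is a hypothetical name, and stopping when no older row remains is my own choice of termination rule.

```python
WEEK = 604800  # seconds in 7 days

# Unix timestamps copied from the question.
timestamps = [
    1317183002, 1317101233, 1317009182, 1316916554, 1316836656,
    1316745220, 1316659915, 1316571974, 1316499187, 1316064243,
    1315967707, 1315881414, 1315794048, 1315715786, 1315622142,
]

def pick_periodic(timestamps, interval=WEEK):
    """Start at the newest row, then repeatedly pick the older row closest to (last pick - interval)."""
    if not timestamps:
        return []
    picked = [max(timestamps)]
    while True:
        current = picked[-1]
        older = [ts for ts in timestamps if ts < current]
        if not older:
            break  # nothing older left, so stop
        target = current - interval
        picked.append(min(older, key=lambda ts: abs(ts - target)))
    return picked

print(pick_periodic(timestamps))
# [1317183002, 1316571974, 1315967707, 1315622142]
```

The first three values match the rows marked in the question. The fourth is simply the closest remaining row, which is only about four days older, so you may want an extra check that rejects picks too far from the target.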
Edit #1
Here's the table structure for the above example.
CREATE TABLE `tbl` (
`id` int(10) unsigned NOT NULL auto_increment PRIMARY KEY,
`time` datetime NOT NULL
)
Edit #2
Ok, first: I've improved the original example query (see revised "Example 1" above). It still works the same way, and gives the same results, but it's cleaner, more efficient, and easier to understand.
Now... the query above is a group-by query, meaning it shows aggregate results for the "period" groups as I described above - not row-by-row results like a "normal" query. With a group-by query, you're limited to using aggregate columns only. Aggregate columns are those columns that are named in the group by clause, or that are computed by an aggregate function like MAX(time). It is not possible to extract meaningful values for non-aggregate columns (like id) from within the projection of a group-by query.
Unfortunately, MySQL doesn't generate an error when you try to do this (unless the ONLY_FULL_GROUP_BY SQL mode is enabled). Instead, it just picks an arbitrary value from within the grouped rows, and shows that value for the non-aggregate column in the grouped result. This is what's causing the odd behavior the OP reported when trying to use the code from Example #1.
Fortunately, this problem is fairly easy to solve. Just wrap another query around the group query, to select the row-by-row information you're interested in...
Example 2:
SELECT
entries.id,
entries.time,
periods.idx as period_index,
unix_timestamp(periods.time) as period_timestamp
FROM
tbl entries
JOIN
(select
floor(timestampdiff( SECOND, tbl.time, most_recent.time)/31536000) as idx,
max(tbl.time) as time
from
tbl
, (select max(time) as time from tbl) most_recent
group by idx
) periods
ON entries.time = periods.time
Result:
+-----+---------------------+--------------+------------------+
| id | time | period_index | period_timestamp |
+-----+---------------------+--------------+------------------+
| 598 | 2011-09-28 04:10:02 | 0 | 1317183002 |
| 996 | 2010-09-27 22:57:05 | 1 | 1285628225 |
+-----+---------------------+--------------+------------------+
Notes:
Example 2 uses a period length of 31536000 seconds (365 days), while Example 1 (above) uses a period of 604800 seconds (7 days). Other than that, the inner query in Example 2 is the same as the primary query shown in Example 1.
If a matching period_time belongs to more than one entry (i.e. two or more entries have the exact same time, and that time matches one of the selected period_time values), then the above query (Example 2) will include multiple rows for the given period timestamp (one for each match). Whatever code consumes this result set should be prepared to handle such an edge case.
It's also worth noting that these queries will perform much, much better if you define an index on your datetime column. For my example schema, that would look like this:
ALTER TABLE tbl ADD INDEX idx_time ( time )
If you're willing to go for the closest that is after the week is out then this'll work. You can extend it to work out the closest but it'll look so disgusting it's probably not worth it.
select unix_timestamp
     , ( select min(unix_tstamp)
           from my_table
          where sql_tstamp >= ( select max(sql_tstamp) - INTERVAL 7 DAY
                                  from my_table )
       )
     , ( select min(unix_tstamp)
           from my_table
          where sql_tstamp >= ( select max(sql_tstamp) - INTERVAL 14 DAY
                                  from my_table )
       )
  from my_table
 where sql_tstamp = ( select max(sql_tstamp)
                        from my_table )

Find adjacent rows without stored procedure

Considering the following table:
someId INTEGER #PK
ageStart TINYINT(3)
ageEnd TINYINT(3)
dateStart INTEGER
dateEnd INTEGER
Where dateStart and dateEnd are dates represented as days since 1800-12-28...
And considering some sample data:
someId | ageStart | ageEnd | dateStart | dateEnd
------------------------------------------------
203 | 16 | 25 | 76533 | 76539 \
506 | 16 | 25 | 76540 | 76546 adjacent rows
384 | 16 | 25 | 76547 | 76553 /
342 | 16 | 25 | 76563 | 76569 \
545 | 16 | 25 | 76570 | 76576 adjacent rows
764 | 16 | 25 | 76577 | 76583 /
(There would be arbitrary rows mixed in, of course; I just want to illustrate 2 relevant rowsets.)
Is it possible to find adjacent rows for a given age category (ageStart to ageEnd) without a stored procedure? The criteria for adjacency is: dateStart is 1 day after dateEnd of the previous found row.
For instance, given the above sample data, if I were to query it with the following parameters:
ageStart = 16
ageEnd = 25
dateStart = 76533
I would like it to return rows 1, 2 and 3 of the sample data, since their dates are adjacent (dateStart is the day after the previous row's dateEnd).
ageStart = 16
ageEnd = 25
dateStart = 76563
...would give me rows 4, 5 and 6 of the sample data
This is probably not efficient if your table holds a lot of data, but try this:
SELECT b.*
FROM
  (SELECT @continue:=2) init,
  (
    SELECT *
    FROM ageTable
    WHERE ageStart=16 AND
          ageEnd=25 AND
          dateStart=76533
  ) a
  INNER JOIN (
    SELECT *
    FROM ageTable
    ORDER BY dateStart
  ) b ON (
    b.ageStart=a.ageStart AND
    b.ageEnd=a.ageEnd AND
    b.dateStart>=a.dateStart
  )
  LEFT JOIN ageTable c ON (
    c.dateStart=b.dateEnd+1 AND
    c.ageStart=b.ageStart AND
    c.ageEnd=b.ageEnd
  )
WHERE
  CASE
    WHEN @continue=2 THEN
      CASE
        WHEN c.someId IS NULL THEN
          @continue:=1
        ELSE
          @continue
      END
    WHEN @continue=1 THEN
      @continue:=0
    ELSE
      @continue
  END
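If doing the chaining in application code is acceptable, the adjacency rule (next dateStart equals previous dateEnd + 1) is straightforward to follow row by row. Here is a minimal sketch in Python over the sample data from the question; `adjacent_chain` is a made-up name, and it assumes at most one row per dateStart within an age category and dateEnd >= dateStart on every row.

```python
# (someId, ageStart, ageEnd, dateStart, dateEnd) tuples copied from the question.
rows = [
    (203, 16, 25, 76533, 76539),
    (506, 16, 25, 76540, 76546),
    (384, 16, 25, 76547, 76553),
    (342, 16, 25, 76563, 76569),
    (545, 16, 25, 76570, 76576),
    (764, 16, 25, 76577, 76583),
]

def adjacent_chain(rows, age_start, age_end, date_start):
    """Return someId values of the chain that begins at date_start within one age category."""
    # Index the matching rows by dateStart so each "next row" lookup is O(1).
    by_start = {
        row[3]: row for row in rows
        if row[1] == age_start and row[2] == age_end
    }
    chain = []
    current = by_start.get(date_start)
    while current is not None:
        chain.append(current[0])
        # The next adjacent row starts the day after this one ends.
        current = by_start.get(current[4] + 1)
    return chain

print(adjacent_chain(rows, 16, 25, 76533))  # [203, 506, 384]
print(adjacent_chain(rows, 16, 25, 76563))  # [342, 545, 764]
```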
You can consider your data to be in a parent-child relationship: a record is a child of a (parent) record if the child's dateStart equals the parent's dateEnd + 1. For hierarchical data (with parent-child relationships), the nested sets model allows you to query the data without stored procedures. You can find a brief description of the nested sets model here:
http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
The idea is to number your records in a clever way so that you can use simple queries instead of recursive stored procedures.
While it is very easy to query hierarchical data stored in this way, some care is required when adding new records. Adding new records in a nested sets model requires updates of existing records. This may or may not be acceptable in your use case.
Well, you can generate a result-set ordered in a specific way and use LIMIT, to get only first record from it.
For example, get the next record by dateEnd in the list:
SELECT *
FROM `table`
WHERE `dateEnd` > '76546'
ORDER BY `dateEnd`
LIMIT 1
You will get:
384 | 16 | 25 | 76547 | 76553
For a previous row:
SELECT *
FROM `table`
WHERE `dateEnd` < '76546'
ORDER BY `dateEnd` DESC
LIMIT 1
You will get:
203 | 16 | 25 | 76533 | 76539
I doubt that it can be done with just one query...