These are my table columns:
ID || Date || Description || Priority
My goal is to insert 2000 rows of random test data, with dates between 7/1/2019 and 7/1/2020 and a priority picked at random from the list (High, Medium, Low).
I know how to insert random numbers, but I am stuck on the date and priority fields.
If I need to write code, any pointers on how to do it?
To be clear: my problem is randomizing values and inserting them from a given list.
CREATE TABLE mytable (
id SERIAL PRIMARY KEY,
date DATE NOT NULL,
description TEXT,
priority ENUM('High','Medium','Low') NOT NULL
);
INSERT INTO mytable (date, priority)
SELECT '2019-07-01' + INTERVAL FLOOR(RAND()*365) DAY,
ELT(1+FLOOR(RAND()*3), 'High', 'Medium', 'Low')
FROM DUAL;
The fake table DUAL is a special keyword. You can select from it, and it always returns exactly one row. But it has no real columns with data, so you can only select expressions.
Do this INSERT a few times and you get:
mysql> select * from mytable;
+----+------------+-------------+----------+
| id | date | description | priority |
+----+------------+-------------+----------+
| 1 | 2019-10-20 | NULL | Medium |
| 2 | 2020-05-17 | NULL | High |
| 3 | 2020-06-25 | NULL | Low |
| 4 | 2020-05-06 | NULL | Medium |
| 5 | 2019-09-30 | NULL | High |
| 6 | 2019-08-06 | NULL | Low |
| 7 | 2020-02-21 | NULL | High |
| 8 | 2019-11-10 | NULL | High |
| 9 | 2019-07-30 | NULL | High |
+----+------------+-------------+----------+
Here's a trick to use the number of rows in the table itself to insert the same number of rows, basically doubling the number of rows:
INSERT INTO mytable (date, priority)
SELECT '2019-07-01' + INTERVAL FLOOR(RAND()*365) DAY,
ELT(1+FLOOR(RAND()*3), 'High', 'Medium', 'Low')
FROM mytable;
By changing FROM DUAL to FROM mytable, the SELECT returns one row for each row currently in the table instead of a single row. But the values inserted are still random expressions, not the values already in those rows, so the new rows get new random values.
Repeat this INSERT as many times as you need; each run doubles the number of rows.
See also the documentation of the ELT() function, which returns the N-th string from its argument list.
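To try the doubling trick without a MySQL server, here is a minimal sketch in Python using the standard sqlite3 module. It is a port under assumptions: SQLite has no RAND(), ELT() or DUAL, so it uses random() with a modulo, a simple CASE in place of ELT(), and a SELECT without FROM for the seed row. The function name build_test_data and the final DELETE that trims the table to exactly 2000 rows are additions, not part of the answer.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL here, so RAND(), ELT() and DUAL are
# replaced by random() % n, a simple CASE and a SELECT without a FROM clause.
# ((random() % n) + n) % n maps SQLite's signed 64-bit random() into 0..n-1.
RANDOM_VALUES = """
    SELECT date('2019-07-01', '+' || (((random() % 365) + 365) % 365) || ' days'),
           CASE ((random() % 3) + 3) % 3
               WHEN 0 THEN 'High' WHEN 1 THEN 'Medium' ELSE 'Low'
           END"""


def build_test_data(target_rows=2000):
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE mytable (
            id INTEGER PRIMARY KEY,
            date TEXT NOT NULL,
            description TEXT,
            priority TEXT NOT NULL CHECK (priority IN ('High', 'Medium', 'Low')))""")
    # Seed a single random row (the equivalent of SELECT ... FROM DUAL).
    conn.execute("INSERT INTO mytable (date, priority)" + RANDOM_VALUES)
    # Each pass inserts one new random row per existing row, doubling the table.
    while conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0] < target_rows:
        conn.execute("INSERT INTO mytable (date, priority)" + RANDOM_VALUES + " FROM mytable")
    # 1 row doubled 11 times is 2048; drop the rows past the target.
    conn.execute("DELETE FROM mytable WHERE id > ?", (target_rows,))
    conn.commit()
    return conn
```

Calling build_test_data() seeds one row and then runs the doubling INSERT 11 times (1, 2, 4, ..., 2048 rows) before trimming back to 2000.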
You seem to be looking for something like this. A basic random sample is:
select t.*
from t
where date >= '2019-07-01' and date < '2020-07-01'
order by random()
fetch first 2000 rows only;
Of course, the function for random() varies by database, as does the logic for limiting rows. This should get about the same distribution of priorities as in the original data.
If you want the rows to come by priority first, then use:
select t.*
from t
where date >= '2019-07-01' and date < '2020-07-01'
order by (case when priority = 'High' then 1 when priority = 'Medium' then 2 else 3 end),
random()
fetch first 2000 rows only;
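Both queries can be sketched in SQLite through Python's sqlite3 module, which illustrates the dialect differences mentioned above: SQLite shuffles with random() and limits with LIMIT rather than FETCH FIRST. The table t and its six rows are made-up sample data, and random_sample and priority_first_sample are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, date TEXT, priority TEXT)")
conn.executemany("INSERT INTO t (date, priority) VALUES (?, ?)", [
    ("2019-06-15", "High"),    # before the range, never selected
    ("2019-08-01", "Low"),
    ("2019-12-24", "High"),
    ("2020-03-10", "Medium"),
    ("2020-06-30", "Low"),
    ("2020-07-01", "Medium"),  # upper bound is exclusive, never selected
])

RANGE = "date >= '2019-07-01' AND date < '2020-07-01'"


def random_sample(limit):
    # SQLite spelling: random() for the shuffle, LIMIT instead of FETCH FIRST.
    return conn.execute(
        f"SELECT * FROM t WHERE {RANGE} ORDER BY random() LIMIT ?", (limit,)).fetchall()


def priority_first_sample(limit):
    # Rank the priorities explicitly, then shuffle within each priority.
    return conn.execute(
        f"""SELECT * FROM t WHERE {RANGE}
            ORDER BY CASE priority WHEN 'High' THEN 1 WHEN 'Medium' THEN 2 ELSE 3 END,
                     random()
            LIMIT ?""", (limit,)).fetchall()
```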
In my SQL database (MySQL), I want to record the price history of an asset.
I have a table with a timestamp as the primary key and price as the value. It has only two columns: timestamp and price.
One price point should be recorded per second.
Sometimes price points are missing (when the server goes down).
Here is an example of the timestamp column.
**timestamp**
1581431400
1581431401
1581431402
1581431403 //missing 1 price point after this
1581431405
1581431406 //missing 3 price points after this
1581431410
1581431411
1581431412
1581431413
1581431414
1581431415 //missing 2 price points after this
1581431418
1581431419
1581431420
Given two timestamps, how can I write a SQL query that fetches the timestamp ranges where data exists, without querying the entire database?
For example, let's say the two Unix timestamps are 1 and 2000000000.
What SQL query should I run to return the following ranges:
[
[1581431400,1581431403],
[1581431405,1581431406],
[1581431410,1581431415],
[1581431418,1581431420]
]
Here is my answer (a hack). Note that it returns the missing ranges (the gaps) rather than the ranges where data exists. You can use a query like this:
SELECT CONCAT( '[',GROUP_CONCAT('\n',
'[', res.missing_from, '],'
,'[', res.missing_to -1,']') , '\n]') AS missing
FROM (
SELECT m.ts+1 AS missing_from,
(SELECT ts FROM mytable WHERE ts > m.ts ORDER BY ts LIMIT 1 ) as missing_to
FROM mytable m
LEFT JOIN mytable mf ON m.ts+1 = mf.ts
WHERE
mf.ts IS NULL
) AS res
WHERE res.missing_to - res.missing_from > 0;
SAMPLE
mysql> SELECT * FROM mytable;
+------------+
| ts |
+------------+
| 1581431400 |
| 1581431401 |
| 1581431402 |
| 1581431403 |
| 1581431405 |
| 1581431406 |
| 1581431410 |
| 1581431411 |
| 1581431412 |
| 1581431413 |
| 1581431414 |
| 1581431415 |
| 1581431418 |
| 1581431419 |
| 1581431420 |
+------------+
15 rows in set (0.00 sec)
TEST
mysql> SELECT CONCAT( '[',GROUP_CONCAT('\n',
'[', res.missing_from, '],'
,'[', res.missing_to -1,']') , '\n]') AS missing
FROM (
SELECT m.ts+1 AS missing_from,
(SELECT ts FROM mytable WHERE ts > m.ts ORDER BY ts LIMIT 1 ) as missing_to
FROM mytable m
LEFT JOIN mytable mf ON m.ts+1 = mf.ts
WHERE
mf.ts IS NULL
) AS res
WHERE res.missing_to - res.missing_from > 0;
+-------------------------------------------------------------------------------------+
| missing |
+-------------------------------------------------------------------------------------+
| [
[1581431404],[1581431404],
[1581431407],[1581431409],
[1581431416],[1581431417]
] |
+-------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
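For experimenting outside MySQL, the gap query can be sketched in SQLite through Python's sqlite3 module. This is an adaptation, not the answer's exact query: the GROUP_CONCAT formatting is dropped so that each gap comes back as a (first_missing, last_missing) tuple, and the helper name missing_ranges is hypothetical.

```python
import sqlite3

TIMESTAMPS = [1581431400, 1581431401, 1581431402, 1581431403, 1581431405,
              1581431406, 1581431410, 1581431411, 1581431412, 1581431413,
              1581431414, 1581431415, 1581431418, 1581431419, 1581431420]


def missing_ranges(timestamps):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (ts INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO mytable (ts) VALUES (?)", [(t,) for t in timestamps])
    # A ts whose successor ts + 1 is absent (LEFT JOIN ... IS NULL) starts a gap;
    # the gap ends one second before the next ts that does exist. The last ts
    # has no successor, so missing_to is NULL and the outer WHERE drops it.
    return conn.execute("""
        SELECT res.missing_from, res.missing_to - 1
        FROM (SELECT m.ts + 1 AS missing_from,
                     (SELECT ts FROM mytable WHERE ts > m.ts
                      ORDER BY ts LIMIT 1) AS missing_to
              FROM mytable m
              LEFT JOIN mytable mf ON m.ts + 1 = mf.ts
              WHERE mf.ts IS NULL) AS res
        WHERE res.missing_to - res.missing_from > 0
        ORDER BY res.missing_from""").fetchall()
```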
I would simply use window functions (available in MySQL 8.0 and later):
select min(timestamp), max(timestamp)
from (select timestamp, row_number() over (order by timestamp) as seqnum
from t
) t
group by (timestamp - seqnum)
order by min(timestamp);
I'm not sure what "without querying the entire database" is supposed to mean. This reads the table (as any such query would need to) but does not need to query anything else in the database.
This illustrates what happens:
timestamp seqnum diff
1581431400 1 1581431399
1581431401 2 1581431399
1581431402 3 1581431399
1581431403 4 1581431399
1581431405 5 1581431400
1581431406 6 1581431400
1581431410 7 1581431403
1581431411 8 1581431403
The last column is the same for every timestamp in a run of consecutive seconds (adjacent timestamps that differ by 1) and changes after each gap. Grouping by it in the outer query therefore gives one row per run.
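The same "value minus row number" idea can be sketched in plain Python with itertools.groupby, which also applies the two bounds from the question. The function name existing_ranges and its lo/hi parameters are illustrative assumptions; enumerate() counts from 0 rather than 1, which shifts every difference by the same amount and so does not change the grouping.

```python
from itertools import groupby

TIMESTAMPS = [1581431400, 1581431401, 1581431402, 1581431403, 1581431405,
              1581431406, 1581431410, 1581431411, 1581431412, 1581431413,
              1581431414, 1581431415, 1581431418, 1581431419, 1581431420]


def existing_ranges(timestamps, lo, hi):
    """Return [first, last] for each run of consecutive timestamps within [lo, hi]."""
    in_window = sorted(t for t in timestamps if lo <= t <= hi)
    ranges = []
    # timestamp - position is constant inside a run, like the "diff" column above.
    for _, run in groupby(enumerate(in_window), key=lambda pair: pair[1] - pair[0]):
        run = list(run)
        ranges.append([run[0][1], run[-1][1]])
    return ranges
```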
UPDATE
Sometimes, when a family is inactivated in the system, it contains more than one individual. In my case, shown in the SQL fiddle, the family with household_id=12 has 3 individuals.
I need to copy the data of these 3 individuals unchanged from the individual table to the individual_history table, changing only the ind_action field to the message "HH has been inactivated".
Here is a sample data:
| individual_id | household_id | family_relation_id | marital_status_id | ind_lmms_id | ind_un_id | head_of_hh | ind_first_name_ar | ind_last_name_ar | ind_first_name_en | ind_last_name_en | ind_gender | dob | ind_status | ind_date_added | user_id | system_date |
|---------------|--------------|--------------------|-------------------|-------------|-----------|------------|-------------------|------------------|-------------------|------------------|------------|------------|------------|----------------------|---------|----------------------|
| 1 | 12 | 3 | 1 | 321 | (null) | no | u | x | (null) | (null) | Male | 2012-01-01 | Active | 2018-07-19T00:00:00Z | 1 | 2018-07-19T00:00:00Z |
| 2 | 12 | 1 | 2 | 123 | (null) | no | x | y | (null) | (null) | Female | 1998-03-05 | Active | 2015-03-05T00:00:00Z | 1 | 2015-03-05T00:00:00Z |
| 3 | 12 | 3 | 1 | 1234 | (null) | no | x | z | (null) | (null) | Female | 2004-04-05 | Active | 2018-04-11T00:00:00Z | 1 | 2018-04-11T00:00:00Z |
All 3 rows should be inserted into the individual_history table, with ind_action set to the message above.
I need to insert into a table called individual_history values of a SELECT query from table individual.
Here is the query:
INSERT INTO individual_history
(individual_id,
household_id,
family_relation_id_history,
marital_status_id_history,
ind_lmms_id_history,
ind_un_id_history,
head_of_hh_history,
ind_first_name_ar_history,
ind_last_name_ar_history,
ind_first_name_en_history,
ind_last_name_en_history,
ind_gender_history,
dob_history,
ind_status_history,
ind_action,
ind_date_changed,
user_id,
system_date)
VALUES ((SELECT i.individual_id,
i.household_id,
i.family_relation_id,
i.marital_status_id,
i.ind_lmms_id,
i.ind_un_id,
i.head_of_hh,
i.ind_first_name_ar,
i.ind_last_name_ar,
i.ind_first_name_en,
i.ind_last_name_en,
i.ind_gender,
i.dob,
i.ind_status
FROM individual i
WHERE i.household_id = :hid),
'HH Status Changed to inactive',
(SELECT i.ind_date_added,
i.user_id
FROM individual i
WHERE i.household_id = :hid),
:systemDate)
As you can see, I split the SELECT statement into 2 parts because I want to insert a specific ind_action message in between, and then get the other 2 fields, ind_date_added and user_id.
:systemDate is just the result of the now() function.
I tried to run this query using 12 as hid and I received the following error:
1136 - Column count doesn't match value count at row 1
After a few searches, I found that I should add parentheses around each of the values. So I changed the query to:
INSERT INTO individual_history
(individual_id,
household_id,
family_relation_id_history,
marital_status_id_history,
ind_lmms_id_history,
ind_un_id_history,
head_of_hh_history,
ind_first_name_ar_history,
ind_last_name_ar_history,
ind_first_name_en_history,
ind_last_name_en_history,
ind_gender_history,
dob_history,
ind_status_history,
ind_action,
ind_date_changed,
user_id,
system_date)
VALUES ((SELECT i.individual_id,
i.household_id,
i.family_relation_id,
i.marital_status_id,
i.ind_lmms_id,
i.ind_un_id,
i.head_of_hh,
i.ind_first_name_ar,
i.ind_last_name_ar,
i.ind_first_name_en,
i.ind_last_name_en,
i.ind_gender,
i.dob,
i.ind_status
FROM individual i
WHERE i.household_id = 12),
( 'HH Status Changed to inactive' ),
(SELECT i.ind_date_added,
i.user_id
FROM individual i
WHERE i.household_id = 12),
( NOW() ))
But still got the same error.
I counted the fields I am inserting and the fields I am selecting, and the numbers match (18 fields).
UPDATE
I changed the query by removing the VALUES clause:
INSERT INTO individual_history
(
individual_id,
household_id,
family_relation_id_history,
marital_status_id_history,
ind_lmms_id_history,
ind_un_id_history,
head_of_hh_history,
ind_first_name_ar_history,
ind_last_name_ar_history,
ind_first_name_en_history,
ind_last_name_en_history,
ind_gender_history,
dob_history,
ind_status_history,
ind_action,
ind_date_changed,
user_id,
system_date
)
SELECT i.individual_id,
i.household_id,
i.family_relation_id,
i.marital_status_id,
i.ind_lmms_id,
i.ind_un_id,
i.head_of_hh,
i.ind_first_name_ar,
i.ind_last_name_ar,
i.ind_first_name_en,
i.ind_last_name_en,
i.ind_gender,
i.dob,
i.ind_status
FROM individual i
WHERE i.household_id=12,
'HH Status Changed to inactive',
(
SELECT i.ind_date_added,
i.user_id
FROM individual i
WHERE i.household_id=12),
now()
And I got the following error:
1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use
near '
'HH Status Changed to inactive',
' at line 10
Please note that the data types of the fields are exactly the same in both tables, and the individual_history table contains an auto-increment primary key.
HERE IS AN SQL FIDDLE to check with sample data.
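You don't need two SELECTs for what you're trying to do. Your VALUES list supplies only four expressions (two subqueries, the string and the date) for 18 columns, which is what error 1136 is complaining about. If you want a specific value for ind_action, simply put it in your SELECT list as a literal, just as you did with the now() function: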
INSERT INTO targetTable (col1, col2, col3, col4, colTime)
SELECT colA, colB, 'my specific string', colD, now()
FROM sourceTable WHERE colA = 12;
Here, col3 gets the string, colTime the now().
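A minimal sketch of that pattern, runnable with Python's sqlite3 module. The two tables are hypothetical, trimmed-down versions of individual and individual_history (most columns omitted), archive_household is an invented helper, and CURRENT_TIMESTAMP stands in for MySQL's NOW(). The action text is passed as a bound parameter rather than written inline; that is a design choice, not a requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down tables: only a few of the real columns are kept.
conn.executescript("""
    CREATE TABLE individual (
        individual_id INTEGER PRIMARY KEY,
        household_id INTEGER,
        ind_status TEXT,
        user_id INTEGER);
    CREATE TABLE individual_history (
        history_id INTEGER PRIMARY KEY AUTOINCREMENT,
        individual_id INTEGER,
        household_id INTEGER,
        ind_status_history TEXT,
        ind_action TEXT,
        user_id INTEGER,
        system_date TEXT);
    INSERT INTO individual VALUES
        (1, 12, 'Active', 1), (2, 12, 'Active', 1), (3, 12, 'Active', 1),
        (4, 13, 'Active', 1);
""")


def archive_household(conn, household_id, action):
    # One INSERT ... SELECT: most values come from the source row, the action
    # is a bound constant, and CURRENT_TIMESTAMP plays the role of NOW().
    cur = conn.execute("""
        INSERT INTO individual_history
            (individual_id, household_id, ind_status_history,
             ind_action, user_id, system_date)
        SELECT individual_id, household_id, ind_status,
               ?, user_id, CURRENT_TIMESTAMP
        FROM individual
        WHERE household_id = ?""", (action, household_id))
    return cur.rowcount  # number of history rows inserted
```

Running archive_household(conn, 12, 'HH Status Changed to inactive') copies the three members of household 12 and leaves household 13 alone.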
@Marting Hennings, I am a bit late, but this query should work:
INSERT INTO individual_history
(individual_id,
household_id,
family_relation_id_history,
marital_status_id_history,
ind_lmms_id_history,
ind_un_id_history,
head_of_hh_history,
ind_first_name_ar_history,
ind_last_name_ar_history,
ind_first_name_en_history,
ind_last_name_en_history,
ind_gender_history,
dob_history,
ind_status_history,
ind_action,
ind_date_changed,
user_id,
system_date)
SELECT individual_id,
household_id,
family_relation_id,
marital_status_id,
ind_lmms_id,
ind_un_id,
head_of_hh,
ind_first_name_ar,
ind_last_name_ar,
ind_first_name_en,
ind_last_name_en,
ind_gender,
dob,
ind_status,
'HH Status Changed to inactive',
ind_date_added,
user_id,
now()
FROM individual
WHERE individual.household_id = 12;
If I have a MySQL table such as:
I want to use SQL to calculate the sum of the PositiveResult column and also the NegativeResult column. Normally I could simply do SUM(PositiveResult) in a query.
But what if I wanted to go a step further and place the totals in a row at the bottom of the result set:
Can this be achieved at the data level or is it a presentation layer issue? If it can be done by SQL, how might I do this? I am a bit of an SQL newbie.
Thanks to the respondents. I will now check things with the customer.
Also, can a text column be added so that the value of the last row of data is not shown in the summary row? Like this:
I would also do this in the presentation layer, but you can do it in MySQL:
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,pos DECIMAL(5,2)
,neg DECIMAL(5,2)
);
INSERT INTO my_table VALUES
(1,0,0),
(2,1,-2.5),
(3,1.6,-1),
(4,1,-2);
SELECT COALESCE(id,'total') my_id,SUM(pos),SUM(neg) FROM my_table GROUP BY id WITH ROLLUP;
+-------+----------+----------+
| my_id | SUM(pos) | SUM(neg) |
+-------+----------+----------+
| 1 | 0.00 | 0.00 |
| 2 | 1.00 | -2.50 |
| 3 | 1.60 | -1.00 |
| 4 | 1.00 | -2.00 |
| total | 3.60 | -5.50 |
+-------+----------+----------+
5 rows in set (0.02 sec)
Here's a hack for the amended problem, assuming a text column named string has been added to my_table. It ain't pretty, but I think it works:
SELECT COALESCE(id,'') my_id
, SUM(pos)
, SUM(neg)
, COALESCE(string,'') n
FROM my_table
GROUP
BY id
, string
WITH ROLLUP
HAVING n <> '' OR my_id = ''
;
select keyword,sum(positiveResults)+sum(NegativeResults)
from mytable
group by
Keyword
If you need the absolute value, use sum(abs(NegativeResults)).
This should be handled at least one layer above the SQL query layer.
The initial query can fetch the detail rows, and the application layer can then calculate the aggregation (the summary row). Alternatively, a second database call can fetch the summary directly, although that is only worthwhile when calculating the summary is very resource-intensive and a second call is really necessary; most of the time the application layer can do it more efficiently.
The ordering/layout of the results (i.e. the detail rows followed by the "footer" summary row) should be handled at the presentation layer.
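As a sketch of that application-layer approach, assuming the detail rows arrive as (keyword_id, positive, negative) tuples (a hypothetical shape), the summary row can be computed in Python after the query. Decimal is used to mirror a DECIMAL column and keep the totals free of floating-point rounding; with_summary_row is an invented name.

```python
from decimal import Decimal


def with_summary_row(rows):
    """Return the detail rows followed by a ("total", sum_pos, sum_neg) row."""
    total_pos = sum((row[1] for row in rows), Decimal("0"))
    total_neg = sum((row[2] for row in rows), Decimal("0"))
    return list(rows) + [("total", total_pos, total_neg)]


detail = [(1, Decimal("0.00"), Decimal("0.00")),
          (2, Decimal("1.00"), Decimal("-2.50")),
          (3, Decimal("1.60"), Decimal("-1.00")),
          (4, Decimal("1.00"), Decimal("-2.00"))]
```

The view then renders the list in order, so the footer row needs no special sorting in SQL.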
I'd recommend doing this at the presentation layer, but it is also possible in SQL:
create table test (
keywordid int,
positiveresult decimal(10,2),
negativeresult decimal(10,2)
);
insert into test values
(1, 0, 0), (2, 1, -2.5), (3, 1.6, -1), (4, 1, -2);
select * from (
select keywordid, positiveresult, negativeresult
from test
union all
select null, sum(positiveresult), sum(negativeresult) from test
) main
order by
case when keywordid is null then 1000000 else keywordid end;
I added ordering with an arbitrarily high number when keywordid is null, so that the summary row sorts last and the view can display the recordset in order as-is (this assumes every keywordid is below 1000000).
Result:
+-----------+----------------+----------------+
| keywordid | positiveresult | negativeresult |
+-----------+----------------+----------------+
| 1 | 0.00 | 0.00 |
| 2 | 1.00 | -2.50 |
| 3 | 1.60 | -1.00 |
| 4 | 1.00 | -2.00 |
| NULL | 3.60 | -5.50 |
+-----------+----------------+----------------+
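The UNION ALL approach also runs in SQLite, sketched here with Python's sqlite3 module. One deliberate change: instead of the arbitrarily high sort value, the sketch adds a hypothetical is_total flag column and sorts on it first, so the summary row stays last whatever the keyword IDs are. The columns are declared REAL, so the totals are floating-point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (keywordid INTEGER, positiveresult REAL, negativeresult REAL);
    INSERT INTO test VALUES (1, 0, 0), (2, 1, -2.5), (3, 1.6, -1), (4, 1, -2);
""")

rows = conn.execute("""
    SELECT keywordid, positiveresult, negativeresult FROM (
        -- detail rows get is_total = 0, the summary row gets is_total = 1
        SELECT keywordid, positiveresult, negativeresult, 0 AS is_total FROM test
        UNION ALL
        SELECT NULL, SUM(positiveresult), SUM(negativeresult), 1 FROM test
    ) AS main
    ORDER BY is_total, keywordid""").fetchall()
```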
Table Name: DemoTable.
Total Fields: 2
Fields:
id (int, auto increment, primary key)
month_and_year (varchar(10))
month_and_year contains dates such as '2015-03', '2015-01', '2014-12' and so on.
I am trying to get values from the table between '2014-10' and '2015-03'.
SELECT * FROM DemoTable where month_and_year>='2014-10' AND month_and_year<='2015-03' ORDER BY month_and_year DESC
The query does not give the desired output because month_and_year is a varchar field. Changing it to the DATE data type isn't possible, as DATE does not accept values in 'yyyy-mm' format.
How can I get the desired result?
PS: Is UNIX_TIMESTAMP() a safe bet in this case?
You should never store date values as varchar; use MySQL's native date types such as DATE, DATETIME or TIMESTAMP instead.
However, in your case you need to do some date calculations in the SELECT query itself. Consider the following table:
mysql> select * from test ;
+------+----------------+
| id | month_and_year |
+------+----------------+
| 1 | 2014-10 |
| 2 | 2014-10 |
| 3 | 2014-09 |
| 4 | 2014-11 |
| 5 | 2015-01 |
| 6 | 2014-08 |
+------+----------------+
The approach is as follows:
First, convert the varchar to a real date.
Then, for the lower limit, always start the comparison from the first day of the year-month value.
The upper limit runs until the end of the month.
So the query becomes:
select * from test
where
-- append '-01' so STR_TO_DATE() gets a complete date; a bare '%Y-%m'
-- value has day 0, which STR_TO_DATE() turns into NULL under some SQL modes
str_to_date(concat(month_and_year, '-01'), '%Y-%m-%d')
>=
str_to_date('2014-10-01', '%Y-%m-%d')
and
last_day(str_to_date(concat(month_and_year, '-01'), '%Y-%m-%d'))
<=
last_day(str_to_date('2015-03-01', '%Y-%m-%d'));
The output will be as
+------+----------------+
| id | month_and_year |
+------+----------------+
| 1 | 2014-10 |
| 2 | 2014-10 |
| 4 | 2014-11 |
| 5 | 2015-01 |
+------+----------------+
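The same "first day of the lower month, last day of the upper month" logic can be sketched in Python with datetime.strptime and calendar.monthrange, for example to check the expected rows above. The helpers month_bounds and rows_between are hypothetical names, and the sample rows are the ones from the test table.

```python
import calendar
from datetime import datetime


def month_bounds(month_and_year):
    """Return (first_day, last_day) of a 'YYYY-MM' string as date objects."""
    first = datetime.strptime(month_and_year, "%Y-%m").date()  # day defaults to 1
    days_in_month = calendar.monthrange(first.year, first.month)[1]
    return first, first.replace(day=days_in_month)


def rows_between(rows, start, end):
    """Keep (id, 'YYYY-MM') rows whose whole month lies within [start, end]."""
    lower = month_bounds(start)[0]   # first day of the lower month
    upper = month_bounds(end)[1]     # last day of the upper month
    kept = []
    for row_id, month in rows:
        first, last = month_bounds(month)
        if first >= lower and last <= upper:
            kept.append((row_id, month))
    return kept


sample = [(1, "2014-10"), (2, "2014-10"), (3, "2014-09"),
          (4, "2014-11"), (5, "2015-01"), (6, "2014-08")]
```

If every value is zero-padded 'YYYY-MM', a plain string comparison ('2014-10' <= value <= '2015-03') selects the same rows, because lexicographic order then matches chronological order.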
Use the function STR_TO_DATE(string,format);
http://www.mysqltutorial.org/mysql-str_to_date/
You should either use MySQL date and time functions, or use an INT field, store a Unix timestamp and compare as you are already doing. I think storing a Unix timestamp is overkill, though: you only need the month and year, so you won't benefit much from its advantages.
I have the following table structure:
Table:
Column 1: pid (Integer, Primary Key, Auto Increment, NOT NULL )
Column 2: name (Varchar, NOT NULL )
Column 3: balance (Int(11) )
Column 3 (balance) can have NULL values.
I want to write a query that gets all entries sorted by balance (in both ASC and DESC order), but the entries with NULL balances should come after the sorted list, sorted by name.
For example, for the following data:
|--------pid--------|--------name--------|--------balance--------
1 | Tom | 1000000000
2 | Jerry | NULL
3 | Spike | 4000000000
4 | Butch | NULL
5 | Nibbles | NULL
6 | Tyke | 3000000000
The expected result (for ascending order) is:
|--------pid--------|--------name--------|--------balance--------
1 | Tom | 1000000000
6 | Tyke | 3000000000
3 | Spike | 4000000000
4 | Butch | NULL
2 | Jerry | NULL
5 | Nibbles | NULL
What should be the query for this case?
You can do something like:
SELECT * FROM `table` ORDER BY (IF(`balance` IS NULL, `name`, `balance`))
Try this:
SELECT * FROM table_name ORDER BY ISNULL(balance), IFNULL(balance, name);
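It will first sort by ISNULL (so non-null balances come first), and then sort by "name" if "balance" is null, or by "balance" otherwise. One caveat: IFNULL() returns a string when it mixes an integer and a string argument, so the balances are compared as text. That only gives the right order when all balances have the same number of digits, as in the sample data; ORDER BY ISNULL(balance), balance, name keeps the numeric order.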
You can of course add DESC or ASC after the ISNULL() or IFNULL() expressions. You can control whether you want null values first:
SELECT * FROM table_name ORDER BY ISNULL(balance) DESC, IFNULL(balance, name) ASC;
or null values last:
SELECT * FROM table_name ORDER BY ISNULL(balance) ASC, IFNULL(balance, name) ASC;
Explanation of IFNULL can be found here and explanation of ISNULL is here.
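A sketch of the same ordering in SQLite via Python's sqlite3 module, under a few assumptions: balance IS NULL (which evaluates to 0 or 1) stands in for MySQL's ISNULL(balance), the table is named accounts, and one extra row (Droopy, 999) is invented to check that balances of different lengths still sort numerically. It sorts on balance itself and then name, instead of IFNULL(balance, name), so numbers are never compared as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (pid INTEGER PRIMARY KEY, name TEXT NOT NULL, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", [
    (1, "Tom", 1000000000), (2, "Jerry", None), (3, "Spike", 4000000000),
    (4, "Butch", None), (5, "Nibbles", None), (6, "Tyke", 3000000000),
    (7, "Droopy", 999),  # hypothetical row with a shorter balance
])


def sorted_pids(descending=False):
    direction = "DESC" if descending else "ASC"
    # (balance IS NULL) is 0 for real balances and 1 for NULLs, so NULLs go last;
    # NULL rows tie on balance and fall through to the name key.
    sql = f"SELECT pid FROM accounts ORDER BY balance IS NULL, balance {direction}, name"
    return [pid for (pid,) in conn.execute(sql)]
```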