Find adjacent rows without stored procedure - mysql

Considering the following table:
someId INTEGER #PK
ageStart TINYINT(3)
ageEnd TINYINT(3)
dateStart INTEGER
dateEnd INTEGER
Where dateStart and dateEnd are dates represented as days since 1800-12-28...
And considering some sample data:
someId | ageStart | ageEnd | dateStart | dateEnd
------------------------------------------------
203 | 16 | 25 | 76533 | 76539 \
506 | 16 | 25 | 76540 | 76546 adjacent rows
384 | 16 | 25 | 76547 | 76553 /
342 | 16 | 25 | 76563 | 76569 \
545 | 16 | 25 | 76570 | 76576 adjacent rows
764 | 16 | 25 | 76577 | 76583 /
(There would be arbitrary rows mixed in, of course; I just want to illustrate 2 relevant rowsets.)
Is it possible to find adjacent rows for a given age category (ageStart to ageEnd) without a stored procedure? The criterion for adjacency is: dateStart is 1 day after dateEnd of the previously found row.
For instance, given the above sample data, if I were to query it with the following parameters:
ageStart = 16
ageEnd = 25
dateStart = 76533
I would like it to return rows 1, 2 and 3 of the sample data, since their dates are adjacent (dateStart is the day after the previous row's dateEnd).
ageStart = 16
ageEnd = 25
dateStart = 76563
...would give me rows 4, 5 and 6 of the sample data

This probably won't be efficient if your table holds a lot of data, but try this:
SELECT b.*
FROM
  (SELECT @continue:=2) init,
  (
    SELECT *
    FROM ageTable
    WHERE ageStart=16 AND
          ageEnd=25 AND
          dateStart=76533
  ) a
  INNER JOIN (
    SELECT *
    FROM ageTable
    ORDER BY dateStart
  ) b ON (
    b.ageStart=a.ageStart AND
    b.ageEnd=a.ageEnd AND
    b.dateStart>=a.dateStart
  )
  LEFT JOIN ageTable c ON (
    c.dateStart=b.dateEnd+1 AND
    c.ageStart=b.ageStart AND
    c.ageEnd=b.ageEnd
  )
WHERE
  CASE
    WHEN @continue=2 THEN
      CASE
        WHEN c.someId IS NULL THEN
          @continue:=1
        ELSE
          @continue
      END
    WHEN @continue=1 THEN
      @continue:=0
    ELSE
      @continue
  END
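For comparison, the same chain walk can also be written as a recursive common table expression, which avoids user variables. This is only a sketch under stated assumptions, not the query above: it runs against SQLite through Python's sqlite3 module so that it is self-contained, the helper name adjacent_rows is made up for the example, and row 999 is an invented "noise" row. MySQL supports WITH RECURSIVE only from version 8.0 onward.

```python
import sqlite3

def adjacent_rows(conn, age_start, age_end, date_start):
    """Return someId values of the chain that starts at date_start.

    Each step joins the next row whose dateStart is exactly one day
    after the dateEnd of the row found before it.
    """
    sql = """
    WITH RECURSIVE chain AS (
        SELECT someId, ageStart, ageEnd, dateStart, dateEnd
        FROM ageTable
        WHERE ageStart = ? AND ageEnd = ? AND dateStart = ?
        UNION ALL
        SELECT t.someId, t.ageStart, t.ageEnd, t.dateStart, t.dateEnd
        FROM ageTable t
        JOIN chain c
          ON t.ageStart = c.ageStart
         AND t.ageEnd = c.ageEnd
         AND t.dateStart = c.dateEnd + 1
    )
    SELECT someId FROM chain ORDER BY dateStart
    """
    return [row[0] for row in conn.execute(sql, (age_start, age_end, date_start))]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ageTable (someId INTEGER PRIMARY KEY, ageStart INTEGER, "
    "ageEnd INTEGER, dateStart INTEGER, dateEnd INTEGER)"
)
conn.executemany(
    "INSERT INTO ageTable VALUES (?, ?, ?, ?, ?)",
    [
        (203, 16, 25, 76533, 76539),
        (506, 16, 25, 76540, 76546),
        (384, 16, 25, 76547, 76553),
        (342, 16, 25, 76563, 76569),
        (545, 16, 25, 76570, 76576),
        (764, 16, 25, 76577, 76583),
        (999, 30, 40, 76540, 76546),  # hypothetical row from another age category
    ],
)

print(adjacent_rows(conn, 16, 25, 76533))
print(adjacent_rows(conn, 16, 25, 76563))
```

The recursion stops on its own as soon as no row starts the day after the last dateEnd, so the gap between 76553 and 76563 separates the two rowsets.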

You can consider your data to be in a parent-child relationship: a record is a child of a (parent) record if the child's dateStart equals the parent's dateEnd + 1. For hierarchical data (with parent-child relationships), the nested sets model allows you to query the data without stored procedures. You can find a brief description of the nested sets model here:
http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
The idea is to number your records in a clever way so that you can use simple queries instead of recursive stored procedures.
While it is very easy to query hierarchical data stored in this way, some care is required when adding new records. Adding new records in a nested sets model requires updates of existing records. This may or may not be acceptable in your use case.

Well, you can generate a result set ordered in a specific way and use LIMIT to get only the first record from it.
For example, to get the next record by dateEnd in the list:
SELECT *
FROM `table`
WHERE `dateEnd` > '76546'
ORDER BY `dateEnd`
LIMIT 1
You will get:
384 | 16 | 25 | 76547 | 76553
For a previous row:
SELECT *
FROM `table`
WHERE `dateEnd` < '76546'
ORDER BY `dateEnd` DESC
LIMIT 1
You will get:
203 | 16 | 25 | 76533 | 76539
I doubt that it can be done with just one query...

Related

Select statement returning the IDs backwards

I have a table with some primary IDs inserted.
In another post I made, I was given half of the answer I asked for, and I am thankful for that. (MySQL select statement returning results in circle mode)
I tried to accomplish the other half with no luck. What I want to achieve is a select statement that returns the opposite of the example below.
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table (id SERIAL PRIMARY KEY);
INSERT INTO my_table VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9);
The select statement i was provided with:
SELECT * FROM my_table ORDER BY id > 5 DESC, id;
Returns 6 - 7 - 8 - 9 - 1 - 2 - 3 - 4 - 5
I also need a select statement to return:
5 - 4 - 3 - 2 - 1 - 9 - 8 - 7 - 6
Thank you in advance!
You need conditional sorting:
SELECT * FROM my_table
ORDER BY id < 6 DESC, id DESC;
Results:
| id |
| --- |
| 5 |
| 4 |
| 3 |
| 2 |
| 1 |
| 9 |
| 8 |
| 7 |
| 6 |
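To see why the conditional sort works: a comparison such as id < 6 evaluates to 1 or 0, so sorting on it in descending order puts the matching rows first, and the second sort key orders the rows within each group. The snippet below is a small sketch that reproduces both orderings. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption made only to keep it runnable), and ordered_ids is a made-up helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO my_table VALUES (?)", [(i,) for i in range(1, 10)])

def ordered_ids(conn, order_by):
    # order_by is a trusted, hard-coded ORDER BY clause in this sketch.
    query = "SELECT id FROM my_table ORDER BY " + order_by
    return [row[0] for row in conn.execute(query)]

forward = ordered_ids(conn, "id > 5 DESC, id")        # ids above 5 first, ascending
backward = ordered_ids(conn, "id < 6 DESC, id DESC")  # ids below 6 first, descending

print(forward)   # [6, 7, 8, 9, 1, 2, 3, 4, 5]
print(backward)  # [5, 4, 3, 2, 1, 9, 8, 7, 6]
```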
From your earlier question, it seems you need ways to find the next and previous item in a database. This assumes:
- you know the id of the current item
- the id is a primary key or other unique value
- next gets the next higher id value
- previous gets the next lower id value
- when you get to either end of the range of id values, you get nothing back... there isn't any next or previous.
Here's how to get the next id.
SELECT id FROM tbl WHERE id > [[[current_id]]] ORDER BY id LIMIT 1
And, similarly, here's how to get the previous value.
SELECT id FROM tbl WHERE id < [[[current_id]]] ORDER BY id DESC LIMIT 1
These queries return just one row, or no rows if there's no next or previous.
There's not much to be gained from a conditional ordering scheme when you only need a single value.

Find latest value in a comparison of data between 2 tables

I have 2 tables in my DB and I want to compare the values returned by 2 select queries I've made, one on each table.
Table 1: click_log
Query table 1:
SELECT *
FROM click_log
Table 2: km_articles
Query table 2:
SELECT km_article_no
FROM km_articles
WHERE km_article_date <= "2017-10-31" AND km_article_status = "Published" AND km_article_view_count <= "5"
The columns I want to compare are link_clicked in table 1 and km_article_no in table 2. I know I will find repeated matches; from those repeated matches I want only the latest one, determined by another column in table 1 called "when_clicked" that contains date information. I'm not sure how I can put those two queries together and then narrow them down.
This is what the tables look like:
Table 1:
link_clicked | when_clicked
KB00001      | 2017-08-02
KB00001      | 2017-12-02
KB00002      | 2017-08-02
KB00002      | 2017-09-02
KB00003      | 2017-09-02
KB00003      | 2017-09-02
Table 2:
km_article_no | km_article_ti | km_article_status | km_article_view_count | km_article_date
KB00001       | outlook IOS   | Published         | 5                     | 2017-01-02
KB00002       | outlook CSS   | Published         | 4                     | 2017-01-05
KB00003       | outlook ZTE   | Retired           | 3                     | 2017-01-09
If I understand correctly, you want to show all km_articles rows, each with the latest related click_log.when_clicked date. So aggregate your click_log per link_clicked and find the maximum when_clicked. Then join this to km_articles.
select kma.*, cl.last_clicked
from km_articles kma
join
(
select link_clicked, max(when_clicked) as last_clicked
from click_log
group by link_clicked
) cl on cl.link_clicked = kma.km_article_no
where kma.km_article_date <= date '2017-10-31'
and kma.km_article_status = 'Published'
and kma.km_article_view_count <= 5;
(If you also want to show km_articles rows that have no match in click_log, then change join to left join.)
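As a quick check of the aggregate-then-join idea against the sample rows from the question, here is a sketch that loads them into an in-memory SQLite database via Python's sqlite3 module. SQLite stands in for MySQL here, dates are stored as ISO text (both are assumptions made for the example), and the km_article_ti column is left out because it plays no part in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE click_log (link_clicked TEXT, when_clicked TEXT);
CREATE TABLE km_articles (
    km_article_no TEXT PRIMARY KEY,
    km_article_status TEXT,
    km_article_view_count INTEGER,
    km_article_date TEXT
);
INSERT INTO click_log VALUES
    ('KB00001', '2017-08-02'), ('KB00001', '2017-12-02'),
    ('KB00002', '2017-08-02'), ('KB00002', '2017-09-02'),
    ('KB00003', '2017-09-02'), ('KB00003', '2017-09-02');
INSERT INTO km_articles VALUES
    ('KB00001', 'Published', 5, '2017-01-02'),
    ('KB00002', 'Published', 4, '2017-01-05'),
    ('KB00003', 'Retired',   3, '2017-01-09');
""")

# Aggregate the click log first (one row per article), then join.
latest_clicks = conn.execute("""
    SELECT kma.km_article_no, cl.last_clicked
    FROM km_articles kma
    JOIN (
        SELECT link_clicked, MAX(when_clicked) AS last_clicked
        FROM click_log
        GROUP BY link_clicked
    ) cl ON cl.link_clicked = kma.km_article_no
    WHERE kma.km_article_date <= '2017-10-31'
      AND kma.km_article_status = 'Published'
      AND kma.km_article_view_count <= 5
    ORDER BY kma.km_article_no
""").fetchall()

print(latest_clicks)
```

KB00003 drops out because its status is Retired, and each remaining article appears once with its most recent click date.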

query over 2 columns where value never appears in either column more than once

Looking for a query that takes the following table ProductList
id| column_1 | column_2 | Sum
================================
1 | Product-A | Product-B | 67
2 | Product-A | Product-C | 55
3 | Product-A | Product-D | 23
4 | Product-B | Product-C | 95
5 | Product-C | Product-D | 110
and returns the first record Product-A_Product-B and then skips all records that contain Product-A or Product-B in either column and returns Product-C_Product-D.
I only want to return the row if everything in the row is appearing for the first time.
Assuming that the products don't contain a comma (,), you could use a comma-delimited session variable to store already selected products and check for every row if one of the columns is already contained in that variable:
select column_1, column_2
from (
  select l.*,
    case when find_in_set(l.column_1, @products) or find_in_set(l.column_2, @products)
      then 1
      else (@products := concat(@products, ',', l.column_1, ',', l.column_2)) = ''
    end as skip
  from ProductList l
  cross join (select @products := '') init
  order by l.id
) t
where skip = 0;
Demo: http://rextester.com/NDVBW87988
But you should know the risks:
ORDER BY in a subquery is not really valid and usually doesn't make sense. The engine may skip it or move it to the outer query.
If you read and write the same session variable in one statement, the execution order is not defined. So the query might not work for all (future) versions.
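Given those risks, one alternative is to fetch the rows ordered by id and do the filtering in application code, where the evaluation order is explicit. The following is a minimal Python sketch of that idea, not part of the query above; the function name first_appearance_pairs and the hard-coded rows (copied from the question's ProductList sample) are assumptions for illustration.

```python
def first_appearance_pairs(rows):
    """Keep a row only if neither product was seen in an earlier kept row.

    rows: iterable of (id, column_1, column_2) tuples.
    """
    seen = set()
    kept = []
    for _row_id, col1, col2 in sorted(rows):  # tuples sort by id first
        if col1 in seen or col2 in seen:
            continue  # skip: one of the products already appeared
        kept.append((col1, col2))
        seen.update((col1, col2))
    return kept

product_list = [
    (1, "Product-A", "Product-B"),
    (2, "Product-A", "Product-C"),
    (3, "Product-A", "Product-D"),
    (4, "Product-B", "Product-C"),
    (5, "Product-C", "Product-D"),
]

print(first_appearance_pairs(product_list))
```

As in the SQL version, products from skipped rows are not remembered, which is why row 5 survives even though Product-C and Product-D appear in skipped rows 2 to 4.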

How can I get the difference between the individual maximum values of different days?

I am new to MySQL, and I am trying to find:
The difference between a given day's maximum value occurred and the previous day's maximum value.
I was able to get the maximum values for dates via:
select max(`bundle_count`), `Production_date`
from `table`
group by `Production_date`
But I don't know how to use SQL to calculate the differences between maximums for two given dates.
I am expecting output like this
Please help me.
Update 1: Here is a fiddle, http://sqlfiddle.com/#!2/818ad/2, that I used for testing.
Update 2: Here is a fiddle, http://sqlfiddle.com/#!2/3f78d/10 that I used for further refining/fixing, based on Sandy's comments.
Update 3: For some reason the case where there is no previous day was not being dealt with correctly. I thought it was. However, I've updated to make sure that works (a bit cumbersome--but it appears to be right. Last fiddle: http://sqlfiddle.com/#!2/3f78d/45
I think @Grijesh conceptually got you the main thing you needed via the self-join of the input data (so make sure you vote up his answer!). I've cleaned up his query a bit on syntax (building off of his query!):
SELECT
DATE(t1.`Production_date`) as theDate,
MAX( t1.`bundle_count` ) AS 'max(bundle_count)',
MAX( t1.`bundle_count` ) -
IF(
EXISTS
(
SELECT date(t2.production_date)
FROM input_example t2
WHERE t2.machine_no = 1 AND
date_sub(date(t1.production_date), interval 1 day) = date(t2.production_date)
),
(
SELECT MAX(t3.bundle_count)
FROM input_example t3
WHERE t3.machine_no = 1 AND
date_sub(date(t1.production_date), interval 1 day) = date(t3.production_date)
GROUP BY DATE(t3.production_date)
), 0
)
AS Total_Bundles_Used
FROM `input_example` t1
WHERE t1.machine_no = 1
GROUP BY DATE( t1.`production_date` )
Note 1: I think @Grijesh and I were cleaning up the query syntax issues at the same time. It's encouraging that we ended up with very similar versions after we were both doing cleanup. My version differs in using IFNULL() for when there is no preceding data. I also ended up with a DATE_SUB, and I made sure to reduce various dates to mere dates without time component, via DATE().
Note 2: I originally had not fully understood your source tables, so I thought I needed to implement a running count in the query. But upon better inspection, it's clear that your source data already has a running count, so I took that stuff back out.
I am not sure, but you need something like this. I hope it will be helpful to you to some extent:
Try this:
SELECT t1.`Production_date` ,
MAX(t1.`bundle_count`) - MAX(t2.`bundle_count`) ,
COUNT(t1.`bundle_count`)
FROM `table_name` AS t1
INNER JOIN `table_name` AS t2
ON ABS(DATEDIFF(t1.`Production_date` , t2.`Production_date`)) = 1
GROUP BY t1.`Production_date`
EDIT
I created a table named 'table_name', as below:
mysql> SELECT * FROM `table_name`;
+---------------------+--------------+
| Production_date | bundle_count |
+---------------------+--------------+
| 2004-12-01 20:37:22 | 1 |
| 2004-12-01 20:37:22 | 2 |
| 2004-12-01 20:37:22 | 3 |
| 2004-12-02 20:37:22 | 2 |
| 2004-12-02 20:37:22 | 5 |
| 2004-12-02 20:37:22 | 7 |
| 2004-12-03 20:37:22 | 6 |
| 2004-12-03 20:37:22 | 7 |
| 2004-12-03 20:37:22 | 2 |
| 2004-12-04 20:37:22 | 1 |
| 2004-12-04 20:37:22 | 9 |
+---------------------+--------------+
11 rows in set (0.00 sec)
My query: to find difference in bundle_count between two consecutive dates:
SELECT t1.`Production_date` ,
MAX(t2.`bundle_count`) - MAX(t1.`bundle_count`) ,
COUNT(t1.`bundle_count`)
FROM `table_name` AS t1
INNER JOIN `table_name` AS t2
ON ABS(DATEDIFF(t1.`Production_date` , t2.`Production_date`)) = 1
GROUP BY t1.Production_date;
its output:
+---------------------+-------------------------------------------------+--------------------------+
| Production_date | MAX(t2.`bundle_count`) - MAX(t1.`bundle_count`) | COUNT(t1.`bundle_count`) |
+---------------------+-------------------------------------------------+--------------------------+
| 2004-12-01 20:37:22 | 4 | 9 |
| 2004-12-02 20:37:22 | 0 | 18 |
| 2004-12-03 20:37:22 | 2 | 15 |
| 2004-12-04 20:37:22 | -2 | 6 |
+---------------------+-------------------------------------------------+--------------------------+
4 rows in set (0.00 sec)
This is PostgreSQL syntax (sorry; it's what I'm familiar with) but should fundamentally work in either database. Note this doesn't exactly run in PostgreSQL either because group is not a valid table name (it's a reserved keyword). The approach is a self-join as others have mentioned but I've used a view to handle the max-by-day and the difference as separate steps.
create view max_by_day as
select
date_trunc('day', production_date) as production_date,
max(bundle_count) as bundle_count
from
group
group by
date_trunc('day', production_date);
select
today.production_date as production_date,
today.bundle_count,
today.bundle_count - coalesce(yesterday.bundle_count, 0)
from
max_by_day as today
left join max_by_day yesterday on (yesterday.production_date = today.production_date - '1 day'::interval)
order by
production_date;
PostgreSQL also has a construct called window functions which is useful for this and a bit easier to understand. Just had to stick in a bit of advocacy for a superior database. :-P
select
date_trunc('day', production_date),
max(bundle_count),
max(bundle_count) - lag(max(bundle_count), 1, 0)
over
(order by date_trunc('day', production_date))
from
group
group by
date_trunc('day', production_date);
These two approaches differ in how they handle missing days in the data - the first will treat it as a 0, the second will use the previous day which is present. There wasn't a case like this in your sample so I don't know if this is something you care about.
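To make the self-join variant concrete, here is a sketch that computes the daily maximum and its difference from the previous calendar day for the 11 sample rows shown earlier in this thread. It is written for SQLite via Python's sqlite3 module (an assumption made for runnability; the date() calls are SQLite functions, not MySQL or PostgreSQL syntax), and the column aliases are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (Production_date TEXT, bundle_count INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [
        ("2004-12-01 20:37:22", 1), ("2004-12-01 20:37:22", 2), ("2004-12-01 20:37:22", 3),
        ("2004-12-02 20:37:22", 2), ("2004-12-02 20:37:22", 5), ("2004-12-02 20:37:22", 7),
        ("2004-12-03 20:37:22", 6), ("2004-12-03 20:37:22", 7), ("2004-12-03 20:37:22", 2),
        ("2004-12-04 20:37:22", 1), ("2004-12-04 20:37:22", 9),
    ],
)

# Step 1: one row per day with that day's maximum.
# Step 2: left join each day to the calendar day before it.
daily_diffs = conn.execute("""
    WITH max_by_day AS (
        SELECT date(Production_date) AS day, MAX(bundle_count) AS day_max
        FROM table_name
        GROUP BY date(Production_date)
    )
    SELECT today.day,
           today.day_max,
           today.day_max - COALESCE(yesterday.day_max, 0) AS diff
    FROM max_by_day AS today
    LEFT JOIN max_by_day AS yesterday
           ON yesterday.day = date(today.day, '-1 day')
    ORDER BY today.day
""").fetchall()

for row in daily_diffs:
    print(row)
```

Because the join is on the exact previous calendar day, a day whose predecessor is missing from the data is compared against 0, which is the behavior described for the first approach.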

Query database in weekly interval

I have a database with a created_at column containing the datetime in Y-m-d H:i:s format.
The latest datetime entry is 2011-09-28 00:10:02.
I need the query to be relative to the latest datetime entry.
1. The first value in the query should be the latest datetime entry.
2. The second value in the query should be the entry closest to 7 days from the first value.
3. The third value should be the entry closest to 7 days from the second value.
4. Repeat #3.
What I mean by "closest to 7 days from":
The following are dates, the interval I desire is a week, in seconds a week is 604800 seconds.
7 days from the first value is equal to 1316578202 (1317183002-604800)
the value closest to 1316578202 (7 days) is... 1316571974
unix timestamp | Y-m-d H:i:s
1317183002 | 2011-09-28 00:10:02 -> appear in query (first value)
1317101233 | 2011-09-27 01:27:13
1317009182 | 2011-09-25 23:53:02
1316916554 | 2011-09-24 22:09:14
1316836656 | 2011-09-23 23:57:36
1316745220 | 2011-09-22 22:33:40
1316659915 | 2011-09-21 22:51:55
1316571974 | 2011-09-20 22:26:14 -> closest to 7 days from 1317183002 (first value)
1316499187 | 2011-09-20 02:13:07
1316064243 | 2011-09-15 01:24:03
1315967707 | 2011-09-13 22:35:07 -> closest to 7 days from 1316571974 (second value)
1315881414 | 2011-09-12 22:36:54
1315794048 | 2011-09-11 22:20:48
1315715786 | 2011-09-11 00:36:26
1315622142 | 2011-09-09 22:35:42
I would really appreciate any help, I have not been able to do this via mysql and no online resources seem to deal with relative date manipulation such as this. I would like the query to be modular enough to be able to change the interval weekly, monthly, or yearly. Thanks in advance!
Answer #1 Reply:
SELECT
UNIX_TIMESTAMP(created_at)
AS unix_timestamp,
(
SELECT MIN(UNIX_TIMESTAMP(created_at))
FROM my_table
WHERE created_at >=
(
SELECT max(created_at) - 7
FROM my_table
)
)
AS `random_1`,
(
SELECT MIN(UNIX_TIMESTAMP(created_at))
FROM my_table
WHERE created_at >=
(
SELECT MAX(created_at) - 14
FROM my_table
)
)
AS `random_2`
FROM my_table
WHERE created_at =
(
SELECT MAX(created_at)
FROM my_table
)
Returns:
unix_timestamp | random_1 | random_2
1317183002 | 1317183002 | 1317183002
Answer #2 Reply:
RESULT SET:
This is the result set for a yearly interval:
id | created_at | period_index | period_timestamp
267 | 2010-09-27 22:57:05 | 0 | 1317183002
1 | 2009-12-10 15:08:00 | 1 | 1285554786
I desire this result:
id | created_at | period_index | period_timestamp
626 | 2011-09-28 00:10:02 | 0 | 0
267 | 2010-09-27 22:57:05 | 1 | 1317183002
I hope this makes more sense.
It's not exactly what you asked for, but the following example is pretty close....
Example 1:
select
floor(timestampdiff(SECOND, tbl.time, most_recent.time)/604800) as period_index,
unix_timestamp(max(tbl.time)) as period_timestamp
from
tbl
, (select max(time) as time from tbl) most_recent
group by period_index
gives results:
+--------------+------------------+
| period_index | period_timestamp |
+--------------+------------------+
| 0 | 1317183002 |
| 1 | 1316571974 |
| 2 | 1315967707 |
+--------------+------------------+
This breaks the dataset into groups based on "periods", where (in this example) each period is 7-days (604800 seconds) long. The period_timestamp that is returned for each period is the 'latest' (most recent) timestamp that falls within that period.
The period boundaries are all computed based on the most recent timestamp in the database, rather than computing each period's start and end time individually based on the timestamp of the period before it. The difference is subtle - your question requests the latter (iterative approach), but I'm hoping that the former (approach I've described here) will suffice for your needs, since SQL doesn't lend itself well to implementing iterative algorithms.
If you really do need to determine each period based on the timestamp in the previous period, then your best bet is going to be an iterative approach -- either using a programming language of your choice (like php), or by building a stored procedure that uses a cursor.
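For what it's worth, the iterative version is short in application code. The sketch below is one possible reading of the rule from the question (start at the latest entry, then repeatedly pick the older entry closest to "current minus one interval"); it is written in Python rather than php, and the function name pick_periodic and the hard-coded list of unix timestamps (copied from the question) are assumptions for illustration.

```python
WEEK = 7 * 24 * 60 * 60  # 604800 seconds

def pick_periodic(timestamps, interval=WEEK):
    """Walk backwards from the latest timestamp, one interval at a time.

    At each step, the entry closest to (current - interval) is chosen
    from the entries strictly older than the current one.
    """
    remaining = sorted(set(timestamps), reverse=True)
    if not remaining:
        return []
    picked = [remaining[0]]
    while True:
        current = picked[-1]
        older = [t for t in remaining if t < current]
        if not older:
            break
        target = current - interval
        picked.append(min(older, key=lambda t: abs(t - target)))
    return picked

sample = [
    1317183002, 1317101233, 1317009182, 1316916554, 1316836656,
    1316745220, 1316659915, 1316571974, 1316499187, 1316064243,
    1315967707, 1315881414, 1315794048, 1315715786, 1315622142,
]

for ts in pick_periodic(sample):
    print(ts)
```

Changing the interval argument (for example to 31536000 for a 365-day year) switches the period length. Note that near the end of the data the "closest" entry can be much less than a full interval away, simply because nothing older exists.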
Edit #1
Here's the table structure for the above example.
CREATE TABLE `tbl` (
`id` int(10) unsigned NOT NULL auto_increment PRIMARY KEY,
`time` datetime NOT NULL
)
Edit #2
Ok, first: I've improved the original example query (see revised "Example 1" above). It still works the same way, and gives the same results, but it's cleaner, more efficient, and easier to understand.
Now... the query above is a group-by query, meaning it shows aggregate results for the "period" groups as I described above - not row-by-row results like a "normal" query. With a group-by query, you're limited to using aggregate columns only. Aggregate columns are those columns that are named in the group by clause, or that are computed by an aggregate function like MAX(time). It is not possible to extract meaningful values for non-aggregate columns (like id) from within the projection of a group-by query.
Unfortunately, mysql doesn't generate an error when you try to do this (unless the ONLY_FULL_GROUP_BY SQL mode is enabled, which is the default from MySQL 5.7.5 onward). Instead, it just picks an arbitrary value from within the grouped rows, and shows that value for the non-aggregate column in the grouped result. This is what's causing the odd behavior the OP reported when trying to use the code from Example #1.
Fortunately, this problem is fairly easy to solve. Just wrap another query around the group query, to select the row-by-row information you're interested in...
Example 2:
SELECT
entries.id,
entries.time,
periods.idx as period_index,
unix_timestamp(periods.time) as period_timestamp
FROM
tbl entries
JOIN
(select
floor(timestampdiff( SECOND, tbl.time, most_recent.time)/31536000) as idx,
max(tbl.time) as time
from
tbl
, (select max(time) as time from tbl) most_recent
group by idx
) periods
ON entries.time = periods.time
Result:
+-----+---------------------+--------------+------------------+
| id | time | period_index | period_timestamp |
+-----+---------------------+--------------+------------------+
| 598 | 2011-09-28 04:10:02 | 0 | 1317183002 |
| 996 | 2010-09-27 22:57:05 | 1 | 1285628225 |
+-----+---------------------+--------------+------------------+
Notes:
Example 2 uses a period length of 31536000 seconds (365 days), while Example 1 (above) uses a period of 604800 seconds (7 days). Other than that, the inner query in Example 2 is the same as the primary query shown in Example 1.
If a matching period_time belongs to more than one entry (i.e. two or more entries have the exact same time, and that time matches one of the selected period_time values), then the above query (Example 2) will include multiple rows for the given period timestamp (one for each match). Whatever code consumes this result set should be prepared to handle such an edge case.
It's also worth noting that these queries will perform much, much better if you define an index on your datetime column. For my example schema, that would look like this:
ALTER TABLE tbl ADD INDEX idx_time ( time )
If you're willing to go for the closest that is after the week is out then this'll work. You can extend it to work out the closest but it'll look so disgusting it's probably not worth it.
-- In MySQL, subtract days with INTERVAL; a bare "- 7" does numeric arithmetic on the DATETIME value.
select unix_tstamp
, ( select min(unix_tstamp)
    from my_table
    where sql_tstamp >= ( select max(sql_tstamp) - interval 7 day
                          from my_table )
  )
, ( select min(unix_tstamp)
    from my_table
    where sql_tstamp >= ( select max(sql_tstamp) - interval 14 day
                          from my_table )
  )
from my_table
where sql_tstamp = ( select max(sql_tstamp)
                     from my_table )