Using Self-Join to find differences between rows

I have tried finding a solution to this question, but everything I've found has either asked a slightly different question or hasn't had an adequate answer. I have a table with the following setup:
fullvna
+--------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+----------------+
| startdate | date | YES | | NULL | |
| starttime | time | YES | | NULL | |
| id | int(11) | NO | PRI | NULL | auto_increment |
+--------------+-------------+------+-----+---------+----------------+
I want to find the time difference between each pair of consecutive lines, so the starttime of id=1 minus the starttime of id=2 (the table is ordered in reverse chronological order). I based my query off of what I found here: http://www.mysqltutorial.org/mysql-tips/mysql-compare-calculate-difference-successive-rows/
create table difference as SELECT
one.id,
     one.starttime,
     two.starttime,
    (one.starttime - two.starttime) AS diff
FROM
     fullvna one
        INNER JOIN
    fullvna two ON two.id = one.id + 1;
I'm receiving the following printout, and am not sure what it means or what I'm doing wrong:
ERROR 1064 (42000): You have an error in your SQL syntax; check the
manual that corresponds to your MySQL server version for the right
syntax to use near '  one.starttime,
    two.starttime,
    (one.starttime - two.starttime' at line 3

Your query contains hidden characters that are displayed as spaces but are not spaces, and they are causing the syntax error. Copy the query from my answer. Also, as Juan suggested, it is better to use the TIMEDIFF() function than to subtract the two times directly:
CREATE TABLE difference AS
SELECT one.id,
one.starttime AS starttime,
two.starttime AS endtime,
TIMEDIFF(one.starttime, two.starttime) AS diff
FROM fullvna one
INNER JOIN fullvna two ON two.id = one.id + 1;
EDIT: As xQbert mentioned, the two starttime columns need different names, so I corrected the query above accordingly.

Don't use the alias one, as it's a keyword; pick a different one.
Alias the starttime columns, as two columns with the same name in a CREATE TABLE ... SELECT will not work.
Use TIMEDIFF() (as others mentioned in the comments).
CREATE TABLE difference as
SELECT a1.id
, a1.starttime as OneStartTime
, a2.starttime as TwoStartTime
, TIMEDIFF(a1.starttime, a2.starttime) AS diff
FROM fullvna a1
INNER JOIN fullvna a2
ON a2.id = a1.id + 1;
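The self-join from the answers above can be tried out on a few rows. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL and three invented sample rows. SQLite has no TIMEDIFF(), so the difference is computed in seconds with strftime('%s', ...) instead; in MySQL you would keep TIMEDIFF().

```python
import sqlite3

# In-memory stand-in for the fullvna table (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fullvna (id INTEGER PRIMARY KEY, startdate TEXT, starttime TEXT)"
)
conn.executemany(
    "INSERT INTO fullvna (id, startdate, starttime) VALUES (?, ?, ?)",
    [
        (1, "2018-01-01", "10:00:30"),  # newest row first (reverse chronological)
        (2, "2018-01-01", "10:00:00"),
        (3, "2018-01-01", "09:58:45"),
    ],
)

# Pair each row with the row whose id is one higher.
# The two starttime columns get distinct aliases, as the answers recommend.
diffs = conn.execute(
    """
    SELECT a1.id,
           a1.starttime AS one_start_time,
           a2.starttime AS two_start_time,
           CAST(strftime('%s', a1.starttime) AS INTEGER)
             - CAST(strftime('%s', a2.starttime) AS INTEGER) AS diff_seconds
    FROM fullvna a1
    INNER JOIN fullvna a2 ON a2.id = a1.id + 1
    ORDER BY a1.id
    """
).fetchall()

for row in diffs:
    print(row)
```

Note that the join on id + 1 assumes the ids have no gaps; the last row has no partner and is dropped by the INNER JOIN.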

Related

MySQL - get a column showing COUNT() from an associated table on each row

I have a database in MySQL (5.5.60-MariaDB).
I'm running a SELECT query to get rows from a table called revision_filters, followed by various INNER JOINs to get associated data. The query looks as follows and executes correctly:
SELECT RevisionFilters.id AS `RevisionFilters__id`,
RevisionFilters.date AS `RevisionFilters__date`,
RevisionFilters.comment AS `RevisionFilters__comment`,
filters.label AS `Filters__label`,
filters.anchor AS `Filters__anchor`,
groups.label AS `Groups__label`
FROM revision_filters RevisionFilters
INNER JOIN dev_hub_subdb.filters Filters
ON filters.id = ( RevisionFilters.filter_id )
INNER JOIN dev_hub_subdb.groups Groups
ON groups.id = ( filters.group_id )
INNER JOIN dev_hub_subdb.regulations Regulations
ON regulations.id = ( groups.regulation_id )
There is a table called revision_filters_substances. The structure of the table is as follows. In this instance revision_filter_id is a foreign key that relates to revision_filters.id.
mysql> describe revision_filters_substances;
+--------------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+-----------------------+------+-----+---------+----------------+
| id | mediumint(8) unsigned | NO | PRI | NULL | auto_increment |
| revision_filter_id | mediumint(8) unsigned | NO | MUL | NULL | |
| substance_id | mediumint(8) unsigned | NO | MUL | NULL | |
+--------------------+-----------------------+------+-----+---------+----------------+
What I want to do is adapt my SELECT query so that on each row returned I can get a COUNT of the number of rows in revision_filters_substances that correspond to the rows in the SELECT query for revision_filters.
In some instances, it's possible that there are no rows in revision_filters_substances corresponding to a particular revision_filters.id and in this case I need the count to return 0.
I've read https://dba.stackexchange.com/questions/133384/counting-rows-from-a-subquery
But I can't see how to adapt this to my query.
It says on the linked article
The subquery should immediately follow the FROM keyword.
So I've tried doing this immediately following FROM revision_filters RevisionFilters in the query I have already:
, (SELECT COUNT(id) FROM revision_filters_substances WHERE id = revision_filters.id) AS count_substances
But this errors:
Unknown column 'revision_filters.id' in 'where clause'
Please can someone advise if this is possible? I don't see how to specify 0 if there are no corresponding rows either, so also need advice on how to achieve that.

You have aliased the revision_filters table as RevisionFilters, so use RevisionFilters.id in the WHERE clause of the correlated subquery. The subquery belongs in the SELECT list, wrapped in its own parentheses, and it should compare the foreign key revision_filter_id (not id) against RevisionFilters.id.
As for "no rows": a COUNT() subquery without GROUP BY already returns 0 when nothing matches, so COALESCE(..) is not strictly required here; keeping it is harmless.
SELECT .... ,
COALESCE((SELECT COUNT(id)
FROM revision_filters_substances
WHERE revision_filter_id = RevisionFilters.id), 0) AS count_substances
.... /* the rest of your query here (FROM, WHERE clauses etc.) */
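To see how a correlated COUNT subquery behaves for a row with no matches, here is a small sketch. It assumes Python's sqlite3 module as a stand-in for MySQL/MariaDB, a cut-down revision_filters table with an invented comment column, and invented sample rows. The subquery matches on the foreign key revision_filter_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE revision_filters (id INTEGER PRIMARY KEY, comment TEXT);
    CREATE TABLE revision_filters_substances (
        id INTEGER PRIMARY KEY,
        revision_filter_id INTEGER NOT NULL,
        substance_id INTEGER NOT NULL
    );
    -- Filter 1 has two substances, filter 2 has none (invented data).
    INSERT INTO revision_filters (id, comment) VALUES (1, 'first'), (2, 'second');
    INSERT INTO revision_filters_substances (revision_filter_id, substance_id)
    VALUES (1, 10), (1, 11);
    """
)

# The subquery is evaluated once per outer row and refers to the outer alias.
counts = conn.execute(
    """
    SELECT RevisionFilters.id,
           (SELECT COUNT(*)
            FROM revision_filters_substances rfs
            WHERE rfs.revision_filter_id = RevisionFilters.id) AS count_substances
    FROM revision_filters RevisionFilters
    ORDER BY RevisionFilters.id
    """
).fetchall()

print(counts)
```

The row without any substances still appears in the result, because the count is computed in the SELECT list rather than through an INNER JOIN.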

Identifying faulty data in MySQL table

Apologies for asking a question that may have answers in some form or another on here, but I was unable to make any of those solutions work for me.
I have the following query:
SELECT `user_id`, `application_id`, `unallocated_date`, `check_in_date`, `check_out_date`
FROM `student_room`
WHERE `user_id` = 17225
ORDER BY `application_id` DESC
It produces the following result:
user_id | application_id | unallocated_date | check_in_date | check_out_date
--------+----------------+---------------------+---------------------+---------------
17225 | 30782 | 2018-02-04 14:32:29 | NULL | NULL
17225 | 30782 | 2018-02-04 14:32:49 | NULL | NULL
17225 | 30782 | 2018-02-04 14:32:51 | NULL | NULL
17225 | 30782 | NULL | NULL | NULL
17225 | 30782 | NULL | 2018-02-04 14:41:54 | NULL
The fourth row in the result is a fault in my data; it should look similar to the first three rows. Such rows occur when a student is allocated a new room and the previous one needs to be unallocated. In this case, the unallocation of row 4 did not actually happen, due to either a historical bug in the system I am working on or user error, but most likely the former.
How can I identify ALL such rows? My attempts with GROUP BY and HAVING did not work, as I checked where all three date fields were NULL, but it did not pick up this particular user - so I was doing something wrong. My original query was:
SELECT COUNT(user_id) AS `count`, user_id FROM `student_room`
WHERE `unallocated_date` IS NULL
AND `check_in_date` IS NULL
AND `check_out_date` IS NULL
GROUP BY `user_id`
HAVING COUNT(user_id) > 1
ORDER BY `user_id` ASC
I tried various INNER JOIN attempts too, but I did not get any of them right...
The rows that I am interested in will have at least one entry with all three dates NULL, but also one where there is a check_in_date that is NOT NULL, as per this example. If I only had the first four rows, then the data could be correct, but the fifth row's presence makes the fourth row a faulty record - it should've been given an "unallocated_date" value at the time of the allocation of the room in the fifth row, which for some reason did not happen.

Together with a friend of mine, we made the following query that works. I have now learned that you can use "EXISTS" in MySQL. I saw it used when dropping or creating tables, but never like this. It ended up that this query solves the problem:
SELECT cte.user_id, COUNT(*)
FROM (
    SELECT sro.user_id
    FROM student_room AS sro
    WHERE sro.unallocated_date IS NULL
      AND sro.check_in_date IS NULL
      AND sro.check_out_date IS NULL
      AND EXISTS (
          SELECT *
          FROM student_room AS sri
          WHERE sri.user_id = sro.user_id
            AND sri.student_room_id > sro.student_room_id
      )
    ORDER BY user_id DESC
)
AS cte
GROUP BY cte.user_id
ORDER BY COUNT(*) DESC
This query is the result of more than an hour of tinkering with records that were erroneous, so apologies if it does not appear to match the question's requirements 100%, but it does solve the problem for me.
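The EXISTS check above can be exercised on a handful of rows. This is a minimal sketch, assuming Python's sqlite3 module as a stand-in for MySQL, a student_room_id key that increases with each new allocation (as the query above implies), and invented sample rows including a hypothetical second user 99999. The derived table is flattened into a single query here, which should give the same counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student_room (
        student_room_id INTEGER PRIMARY KEY,
        user_id INTEGER,
        unallocated_date TEXT,
        check_in_date TEXT,
        check_out_date TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO student_room VALUES (?, ?, ?, ?, ?)",
    [
        (1, 17225, "2018-02-04 14:32:29", None, None),   # properly unallocated
        (2, 17225, None, None, None),                    # faulty: never unallocated
        (3, 17225, None, "2018-02-04 14:41:54", None),   # later allocation
        (4, 99999, None, None, None),                    # open row, but the latest one: fine
    ],
)

# A row is suspicious when all three dates are NULL
# and the same user has a later row in the table.
faulty = conn.execute(
    """
    SELECT sro.user_id, COUNT(*) AS faulty_rows
    FROM student_room AS sro
    WHERE sro.unallocated_date IS NULL
      AND sro.check_in_date IS NULL
      AND sro.check_out_date IS NULL
      AND EXISTS (
          SELECT 1
          FROM student_room AS sri
          WHERE sri.user_id = sro.user_id
            AND sri.student_room_id > sro.student_room_id
      )
    GROUP BY sro.user_id
    ORDER BY faulty_rows DESC
    """
).fetchall()

print(faulty)
```

User 99999 is not reported, because the all-NULL row is that user's most recent row and no later allocation exists.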

SQLYOG - SQL - Merging two columns into 1 column

I have two pairs of columns displaying the same type of information but not necessarily the same data. Some of the data overlaps, but either pair may be missing a row, in which case it holds NULL values. Like so:
Company ID | Company Name | Company ID | Company Name
-----------+--------------+------------+-------------
1 | A | 1 | A
2 | B | NULL | NULL
NULL | NULL | 3 | C
I am trying to merge columns 1 and 2 to columns 3 and 4, respectively, so that I have two columns that look like this:
Company ID | Company Name
-----------+-------------
1 | A
2 | B
3 | C
Looking at similar Stack Overflow questions, I doubt this can be done easily. Is this possible? Please let me know!
Anything helps.

As you don't seem to be around to answer clarifying questions right now, let's go ahead.
It seems you actually have the four columns in question in a single table, but then there should be no duplicate column names. Once they are unique, the following should work:
UPDATE SomeTable
SET company_ID_1 = IFNULL(company_ID_1, company_ID_2)
, company_Name_1 = IFNULL(company_Name_1, company_Name_2)
WHERE
company_ID_1 IS NULL
OR
company_Name_1 IS NULL
;
If what you presented is actually the output of a join, you could replace that join with:
SELECT
IFNULL(SomeTable1.company_ID, SomeTable2.company_ID) company_ID
, IFNULL(SomeTable1.company_Name, SomeTable2.company_Name) company_Name
FROM SomeTable1
LEFT JOIN SomeTable2
ON SomeTable1.company_ID = SomeTable2.company_ID
UNION ALL
SELECT
IFNULL(SomeTable1.company_ID, SomeTable2.company_ID) company_ID
, IFNULL(SomeTable1.company_Name, SomeTable2.company_Name) company_Name
FROM SomeTable1
RIGHT JOIN SomeTable2
ON SomeTable1.company_ID = SomeTable2.company_ID
WHERE SomeTable1.company_ID IS NULL
ORDER BY company_ID
;
See it in action: SQL Fiddle
Please comment if this requires adjustment or further detail.
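The join-based variant can be checked against the three companies from the question. The sketch below assumes Python's sqlite3 module as a stand-in for MySQL and uses the placeholder table names SomeTable1 and SomeTable2 from the answer. Because older SQLite versions lack RIGHT JOIN, the second branch is written as a LEFT JOIN with the tables swapped, and COALESCE takes the place of IFNULL; the idea is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE SomeTable1 (company_ID INTEGER, company_Name TEXT);
    CREATE TABLE SomeTable2 (company_ID INTEGER, company_Name TEXT);
    INSERT INTO SomeTable1 VALUES (1, 'A'), (2, 'B');
    INSERT INTO SomeTable2 VALUES (1, 'A'), (3, 'C');
    """
)

# First branch: everything from SomeTable1, filled in from SomeTable2 if needed.
# Second branch: rows that exist only in SomeTable2.
merged = conn.execute(
    """
    SELECT COALESCE(t1.company_ID, t2.company_ID) AS company_ID,
           COALESCE(t1.company_Name, t2.company_Name) AS company_Name
    FROM SomeTable1 t1
    LEFT JOIN SomeTable2 t2 ON t1.company_ID = t2.company_ID
    UNION ALL
    SELECT t2.company_ID, t2.company_Name
    FROM SomeTable2 t2
    LEFT JOIN SomeTable1 t1 ON t1.company_ID = t2.company_ID
    WHERE t1.company_ID IS NULL
    ORDER BY company_ID
    """
).fetchall()

print(merged)
```

The WHERE clause in the second branch is what prevents company 1, which exists in both tables, from appearing twice.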

Using mySQL variables in subqueries

I am trying to use user defined variables to limit the results of a subquery, in order to get the difference between two timestamps in some analytics data. The code I am working with is as follows:
SELECT @visitID := `s`.`visit_id` AS `visit_id`, # Get the visit ID and assign to a variable
       @dt := `s`.`dt` AS `visit`, # Get the timestamp of the visit and assign to a variable
       `tmp`.`dt` AS `next-visit` # Get the 'next visit' timestamp which should be returned by the subquery
FROM `wp_slim_stats` AS `s` # From the main table...
LEFT JOIN (SELECT `s`.`visit_id`, # Start the subquery
                  MIN(`s`.`dt`) AS `dt` # Get the lowest timestamp returned
           FROM `wp_slim_stats` AS `s` # ...from the same table
           WHERE `s`.`visit_id` = @visitID # Visit ID should be the same as the row the main query is working on
             AND `s`.`dt` > @dt # Timestamp should be HIGHER than the row we are working on
           LIMIT 0, 1) AS `tmp` ON `tmp`.`visit_id` = `s`.`visit_id` # Join on visit_id
WHERE `s`.`resource` LIKE 'foo%' # Limit all results to the page we are looking for
The intention is to get an individual pageview and record its visit ID and the timestamp. The subquery should then return the next record from the database with the same visit ID. I can then subtract one from the other to get the seconds spent on a page.
The problem I am running into is that the subquery seems to be re-evaluating for each row returned, and not populating the next-visit column until the end. This means that all the rows returned are matched against the subquery's results for the final row, thus all next-visit columns are null apart from the final row.
The results I am looking for would be something like:
_________________________________________________
| visit_id | visit | next-visit|
|--------------|---------------|----------------|
| 1 | 123456789 | 123457890 |
|--------------|---------------|----------------|
| 4 | 234567890 | 234567891 |
|--------------|---------------|----------------|
| 6 | 345678901 | 345678902 |
|--------------|---------------|----------------|
| 8 | 456789012 | 456789013 |
|______________|_______________|________________|
But I am getting
_________________________________________________
| visit_id | visit | next-visit|
|--------------|---------------|----------------|
| 1 | 123456789 | NULL |
|--------------|---------------|----------------|
| 4 | 234567890 | NULL |
|--------------|---------------|----------------|
| 6 | 345678901 | NULL |
|--------------|---------------|----------------|
| 8 | 456789012 | 456789013 |
|______________|_______________|________________|
I am still fairly new to using variables in MySQL, particularly when assigning them dynamically. As I mentioned, I think I am messing up the order of operations somewhere, which is causing the subquery to re-populate each row at the end.
Ideally I need to be able to do this in pure MySQL due to restrictions from the client, so no PHP, unfortunately. Is it possible to do what I am trying to do?
Thank you!

You don't need variables here at all.
SELECT `s`.`visit_id` AS `visit_id`,
`s`.`dt` AS `visit`,
(SELECT MIN(dt) FROM `wp_slim_stats` ws WHERE ws.visit_id = s.visit_id AND ws.dt > s.dt) AS `next-visit`
FROM `wp_slim_stats` AS `s`
WHERE `s`.`resource` LIKE 'foo%'
As for why your solution doesn't work, have a look at the order of operations in a SQL query. The derived table in the FROM clause is evaluated before the SELECT clause assigns the variables:
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
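The variable-free approach can be tried on a few rows. This is a minimal sketch, assuming Python's sqlite3 module as a stand-in for MySQL, a wp_slim_stats table reduced to three columns, and invented timestamps. The correlated subquery is evaluated per outer row, so each pageview gets its own next timestamp, or NULL when none exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_slim_stats (visit_id INTEGER, dt INTEGER, resource TEXT)"
)
conn.executemany(
    "INSERT INTO wp_slim_stats VALUES (?, ?, ?)",
    [
        (1, 100, "foo/page"),
        (1, 130, "bar/page"),
        (1, 190, "baz/page"),
        (4, 200, "foo/page"),  # last pageview of visit 4: no next row
    ],
)

rows = conn.execute(
    """
    SELECT s.visit_id,
           s.dt AS visit,
           (SELECT MIN(ws.dt)
            FROM wp_slim_stats ws
            WHERE ws.visit_id = s.visit_id
              AND ws.dt > s.dt) AS next_visit
    FROM wp_slim_stats AS s
    WHERE s.resource LIKE 'foo%'
    ORDER BY s.visit_id
    """
).fetchall()

# Seconds spent on the page; None when the visit has no later pageview.
durations = [(vid, None if nxt is None else nxt - dt) for vid, dt, nxt in rows]
print(durations)
```

For visit 1 the subquery picks 130 rather than 190, because MIN() selects the closest later timestamp within the same visit.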

Here's the query you would need to run.
select visits.visitid as vId, temp.time as tTime, visits.time as vTime
from visits
inner join (select min(id) as firstId, visitid, time
            from visits v1
            group by visitid) temp
on visits.visitid = temp.visitid
where id > temp.firstid
group by visits.visitid;
See this SQL fiddle

Get a result by comparing two tables with an identical column

mysql> select * from on_connected;
+----+-----------+-------------+---------------------------+---------------------+
| id | extension | destination | call_id | created_at |
+----+-----------+-------------+---------------------------+---------------------+
| 11 | 1111111 | 01155555551 | 521243ad953e-965inwuz1gku | 2013-08-19 17:11:53 |
+----+-----------+-------------+---------------------------+---------------------+
mysql> select * from on_disconnected;
+----+-----------+-------------+---------------------------+---------------------+
| id | extension | destination | call_id | created_at |
+----+-----------+-------------+---------------------------+---------------------+
| 1 | 1111111 | 01155555551 | 521243ad953e-965inwuz1gku | 2013-08-19 17:11:57 |
+----+-----------+-------------+---------------------------+---------------------+
1 row in set (0.00 sec)
There is a time difference of 4 seconds between the two. I would like to calculate the difference using a query of some type. I'm aware of TIMEDIFF() and joins but lack the skills to form the query at this point.
Here's my attempt thus far:
SELECT TIMEDIFF(to_seconds(od.created_at), to_seconds(oc.created_at))
FROM on_connected oc
JOIN on_disconnected od
ON oc.call_id=od.call_id
WHERE call_id='521243ad953e-965inwuz1gku';
Mysql reports:
ERROR 1052 (23000): Column 'call_id' in where clause is ambiguous

In your where clause change
WHERE call_id='521243ad953e-965inwuz1gku';
to
WHERE oc.call_id='521243ad953e-965inwuz1gku';
or
WHERE od.call_id='521243ad953e-965inwuz1gku';
It doesn't matter which.

If you want the differences for all times:
SELECT TIME_TO_SEC(TIMEDIFF(od.created_at, oc.created_at))
FROM on_connected oc
JOIN on_disconnected od ON od.call_id = oc.call_id
Demo
For a single call_id, you need to qualify the column name with a table alias in the filter:
WHERE oc.call_id = '521243ad953e-965inwuz1gku'
Demo

Try oc.call_id in the WHERE clause.
Although the values will have matched at this point, the SQL parser still needs to know which column you're referring to.

When you JOIN two tables using a column whose name is identical in both tables, you could use the USING clause instead of ON:
SELECT TIMEDIFF(od.created_at, oc.created_at)
FROM on_connected oc
JOIN on_disconnected od
USING(call_id) -- eq. to `od.call_id = oc.call_id`
WHERE call_id='521243ad953e-965inwuz1gku'; -- no need to specify the table name here
Not only will this save a few keystrokes, but you will also be able to reference that column without specifying the table name.
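The USING form can be checked with the two rows from the question. This is a minimal sketch, assuming Python's sqlite3 module as a stand-in for MySQL and tables reduced to three columns. SQLite has no TIMEDIFF() or TIME_TO_SEC(), so the difference in seconds is computed with strftime('%s', ...) instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE on_connected (id INTEGER, call_id TEXT, created_at TEXT);
    CREATE TABLE on_disconnected (id INTEGER, call_id TEXT, created_at TEXT);
    INSERT INTO on_connected
    VALUES (11, '521243ad953e-965inwuz1gku', '2013-08-19 17:11:53');
    INSERT INTO on_disconnected
    VALUES (1, '521243ad953e-965inwuz1gku', '2013-08-19 17:11:57');
    """
)

# With USING (call_id), the unqualified call_id in WHERE is not ambiguous.
seconds = conn.execute(
    """
    SELECT CAST(strftime('%s', od.created_at) AS INTEGER)
             - CAST(strftime('%s', oc.created_at) AS INTEGER) AS seconds
    FROM on_connected oc
    JOIN on_disconnected od USING (call_id)
    WHERE call_id = ?
    """,
    ("521243ad953e-965inwuz1gku",),
).fetchone()[0]

print(seconds)
```

With an ON clause instead of USING, the same unqualified call_id in the WHERE clause would be rejected as ambiguous.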
