Get a result by comparing two tables with an identical column - mysql

mysql> select * from on_connected;
+----+-----------+-------------+---------------------------+---------------------+
| id | extension | destination | call_id | created_at |
+----+-----------+-------------+---------------------------+---------------------+
| 11 | 1111111 | 01155555551 | 521243ad953e-965inwuz1gku | 2013-08-19 17:11:53 |
+----+-----------+-------------+---------------------------+---------------------+
mysql> select * from on_disconnected;
+----+-----------+-------------+---------------------------+---------------------+
| id | extension | destination | call_id | created_at |
+----+-----------+-------------+---------------------------+---------------------+
| 1 | 1111111 | 01155555551 | 521243ad953e-965inwuz1gku | 2013-08-19 17:11:57 |
+----+-----------+-------------+---------------------------+---------------------+
1 row in set (0.00 sec)
There is a time difference of 4 seconds between the two. I would like to calculate the difference
using a query of some type. I'm aware of TIMEDIFF() and joins but lack the skills to form the query
at this point.
Here's my attempt thus far:
SELECT TIMEDIFF(to_seconds(od.created_at), to_seconds(oc.created_at))
FROM on_connected oc
JOIN on_disconnected od
ON oc.call_id=od.call_id
WHERE call_id='521243ad953e-965inwuz1gku';
MySQL reports:
ERROR 1052 (23000): Column 'call_id' in where clause is ambiguous

In your where clause change
WHERE call_id='521243ad953e-965inwuz1gku';
to
WHERE oc.call_id='521243ad953e-965inwuz1gku';
or
WHERE od.call_id='521243ad953e-965inwuz1gku';
Either works, because the join condition guarantees both columns hold the same value.

If you want the differences for all times:
SELECT TIME_TO_SEC(TIMEDIFF(od.created_at, oc.created_at))
FROM on_connected oc
JOIN on_disconnected od ON od.call_id = oc.call_id
For a single call_id, you need to qualify the column name with a table alias in the filter:
WHERE oc.call_id = '521243ad953e-965inwuz1gku'

Try oc.call_id in the WHERE clause.
Although the values will have matched at this point, the SQL parser still needs to know which column you're referring to.
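The ambiguity error and its fix can be reproduced in a small sketch. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module, so SQLite stands in for MySQL, the tables are cut down to two columns, and strftime('%s', ...) replaces TIMEDIFF() because the MySQL function is not the one being exercised here.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE on_connected    (call_id TEXT, created_at TEXT);
    CREATE TABLE on_disconnected (call_id TEXT, created_at TEXT);
    INSERT INTO on_connected    VALUES ('521243ad953e-965inwuz1gku', '2013-08-19 17:11:53');
    INSERT INTO on_disconnected VALUES ('521243ad953e-965inwuz1gku', '2013-08-19 17:11:57');
""")

# Same join as in the question; only the column used in the filter varies.
base = """
    SELECT CAST(strftime('%s', od.created_at) AS INTEGER)
         - CAST(strftime('%s', oc.created_at) AS INTEGER)
    FROM on_connected oc
    JOIN on_disconnected od ON oc.call_id = od.call_id
    WHERE {col} = ?
"""
call_id = "521243ad953e-965inwuz1gku"

# Unqualified column: both tables expose call_id, so the parser rejects it.
try:
    conn.execute(base.format(col="call_id"), (call_id,))
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

# Qualified column: oc.call_id (or od.call_id) resolves the ambiguity.
seconds = conn.execute(base.format(col="oc.call_id"), (call_id,)).fetchone()[0]

print(ambiguous_error)
print(seconds)
```

The unqualified filter fails while the query is being parsed, before any rows are compared, which is why the error appears even though the join has already made the two values equal.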

When you JOIN two tables using a column whose name is identical in both tables, you could use the USING clause instead of ON:
SELECT TIME_TO_SEC(TIMEDIFF(od.created_at, oc.created_at))
FROM on_connected oc
JOIN on_disconnected od
USING(call_id) -- eq. to `od.call_id = oc.call_id`
WHERE call_id='521243ad953e-965inwuz1gku'; -- no need to specify the table name here
Not only will this save a few keystrokes, but it also lets you reference that column without specifying the table name.

Related

Inconsistency with MySQL - USING vs ON [duplicate]

In a MySQL JOIN, what is the difference between ON and USING()? As far as I can tell, USING() is just more convenient syntax, whereas ON allows a little more flexibility when the column names are not identical. However, that difference is so minor, you'd think they'd just do away with USING().
Is there more to this than meets the eye? If yes, which should I use in a given situation?
It is mostly syntactic sugar, but a couple differences are noteworthy:
ON is the more general of the two. One can join tables ON a column, a set of columns and even a condition. For example:
SELECT * FROM world.City JOIN world.Country ON (City.CountryCode = Country.Code) WHERE ...
USING is useful when both tables share a column of the exact same name on which they join. In this case, one may say:
SELECT ... FROM film JOIN film_actor USING (film_id) WHERE ...
An additional nice treat is that one does not need to fully qualify the joining columns:
SELECT film.title, film_id -- film_id is not prefixed
FROM film
JOIN film_actor USING (film_id)
WHERE ...
To illustrate, to do the above with ON, we would have to write:
SELECT film.title, film.film_id -- film.film_id is required here
FROM film
JOIN film_actor ON (film.film_id = film_actor.film_id)
WHERE ...
Notice the film.film_id qualification in the SELECT clause. It would be invalid to just say film_id since that would make for an ambiguity:
ERROR 1052 (23000): Column 'film_id' in field list is ambiguous
As for select *, the joining column appears in the result set twice with ON while it appears only once with USING:
mysql> create table t(i int);insert t select 1;create table t2 select*from t;
Query OK, 0 rows affected (0.11 sec)
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
Query OK, 1 row affected (0.19 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> select*from t join t2 on t.i=t2.i;
+------+------+
| i | i |
+------+------+
| 1 | 1 |
+------+------+
1 row in set (0.00 sec)
mysql> select*from t join t2 using(i);
+------+
| i |
+------+
| 1 |
+------+
1 row in set (0.00 sec)
mysql>
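The same column-count difference can be checked programmatically. The sketch below is an illustration only: it assumes SQLite (via Python's sqlite3 module) in place of MySQL, and column_names is a hypothetical helper introduced for this example.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; t and t2 mirror the transcript above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t  (i INTEGER);
    CREATE TABLE t2 (i INTEGER);
    INSERT INTO t  VALUES (1);
    INSERT INTO t2 VALUES (1);
""")


def column_names(sql):
    """Hypothetical helper: return the result-set column names of a query."""
    cursor = conn.execute(sql)
    return [col[0] for col in cursor.description]


# ON keeps the join column from both tables; USING merges it into one.
on_columns = column_names("SELECT * FROM t JOIN t2 ON t.i = t2.i")
using_columns = column_names("SELECT * FROM t JOIN t2 USING (i)")

print(len(on_columns), len(using_columns))
```

Code that reads SELECT * results by position is therefore sensitive to which form of join is used.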
Thought I would chip in here with when I have found ON to be more useful than USING. It is when OUTER joins are introduced into queries.
ON lets you restrict the rows of the table being OUTER joined while still preserving the OUTER join. Restricting the result set with a WHERE clause instead will, effectively, turn the OUTER join into an INNER join.
Granted, this may be a relative corner case, but it is worth putting out there.
For example:
CREATE TABLE country (
countryId int(10) unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
country varchar(50) not null,
UNIQUE KEY countryUIdx1 (country)
) ENGINE=InnoDB;
insert into country(country) values ("France");
insert into country(country) values ("China");
insert into country(country) values ("USA");
insert into country(country) values ("Italy");
insert into country(country) values ("UK");
insert into country(country) values ("Monaco");
CREATE TABLE city (
cityId int(10) unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
countryId int(10) unsigned not null,
city varchar(50) not null,
hasAirport boolean not null default true,
UNIQUE KEY cityUIdx1 (countryId,city),
CONSTRAINT city_country_fk1 FOREIGN KEY (countryId) REFERENCES country (countryId)
) ENGINE=InnoDB;
insert into city (countryId,city,hasAirport) values (1,"Paris",true);
insert into city (countryId,city,hasAirport) values (2,"Beijing",true);
insert into city (countryId,city,hasAirport) values (3,"New York",true);
insert into city (countryId,city,hasAirport) values (4,"Napoli",true);
insert into city (countryId,city,hasAirport) values (5,"Manchester",true);
insert into city (countryId,city,hasAirport) values (5,"Birmingham",false);
insert into city (countryId,city,hasAirport) values (3,"Cincinnati",false);
insert into city (countryId,city,hasAirport) values (6,"Monaco",false);
-- Gah. Left outer join is now effectively an inner join
-- because of the where predicate
select *
from country left join city using (countryId)
where hasAirport
;
-- Hooray! I can see Monaco again thanks to
-- moving my predicate into the ON
select *
from country co left join city ci on (co.countryId=ci.countryId and ci.hasAirport)
;
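To make the difference between the two queries concrete, here is a reduced sketch. It is an assumption-laden illustration rather than the original setup: SQLite (through Python's sqlite3 module) replaces MySQL, only two countries and three cities are loaded, and hasAirport is stored as 0/1.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; a trimmed-down copy of the schema above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (countryId INTEGER PRIMARY KEY, country TEXT NOT NULL);
    CREATE TABLE city (
        cityId     INTEGER PRIMARY KEY,
        countryId  INTEGER NOT NULL REFERENCES country (countryId),
        city       TEXT NOT NULL,
        hasAirport INTEGER NOT NULL DEFAULT 1
    );
    INSERT INTO country (countryId, country) VALUES (5, 'UK'), (6, 'Monaco');
    INSERT INTO city (countryId, city, hasAirport) VALUES
        (5, 'Manchester', 1), (5, 'Birmingham', 0), (6, 'Monaco', 0);
""")

# Predicate in WHERE: NULL-extended rows fail the filter, so Monaco disappears.
filtered_in_where = conn.execute("""
    SELECT co.country, ci.city
    FROM country co LEFT JOIN city ci USING (countryId)
    WHERE ci.hasAirport
    ORDER BY co.countryId
""").fetchall()

# Predicate in ON: it only limits which cities match; every country is kept.
filtered_in_on = conn.execute("""
    SELECT co.country, ci.city
    FROM country co
    LEFT JOIN city ci ON (co.countryId = ci.countryId AND ci.hasAirport)
    ORDER BY co.countryId
""").fetchall()

print(filtered_in_where)
print(filtered_in_on)
```

With the predicate in ON, Monaco comes back with a NULL city, which is the row the WHERE version silently drops.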
Wikipedia has the following information about USING:
The USING construct is more than mere syntactic sugar, however, since
the result set differs from the result set of the version with the
explicit predicate. Specifically, any columns mentioned in the USING
list will appear only once, with an unqualified name, rather than once
for each table in the join. In the case above, there will be a single
DepartmentID column and no employee.DepartmentID or
department.DepartmentID.
The tables it refers to are the employee and department tables from the Wikipedia example.
The Postgres documentation also defines them pretty well:
The ON clause is the most general kind of join condition: it takes a
Boolean value expression of the same kind as is used in a WHERE
clause. A pair of rows from T1 and T2 match if the ON expression
evaluates to true.
The USING clause is a shorthand that allows you to take advantage of
the specific situation where both sides of the join use the same name
for the joining column(s). It takes a comma-separated list of the
shared column names and forms a join condition that includes an
equality comparison for each one. For example, joining T1 and T2 with
USING (a, b) produces the join condition ON T1.a = T2.a AND T1.b =
T2.b.
Furthermore, the output of JOIN USING suppresses redundant columns:
there is no need to print both of the matched columns, since they must
have equal values. While JOIN ON produces all columns from T1 followed
by all columns from T2, JOIN USING produces one output column for each
of the listed column pairs (in the listed order), followed by any
remaining columns from T1, followed by any remaining columns from T2.
Database tables
To demonstrate how the USING and ON clauses work, let's assume we have the following post and post_comment database tables, which form a one-to-many table relationship via the post_id Foreign Key column in the post_comment table referencing the post_id Primary Key column in the post table:
The parent post table has 3 rows:
| post_id | title |
|---------|-----------|
| 1 | Java |
| 2 | Hibernate |
| 3 | JPA |
and the post_comment child table has 3 records:
| post_comment_id | review | post_id |
|-----------------|-----------|---------|
| 1 | Good | 1 |
| 2 | Excellent | 1 |
| 3 | Awesome | 2 |
The JOIN ON clause using a custom projection
Traditionally, when writing an INNER JOIN or LEFT JOIN query, we use the ON clause to define the join condition.
For example, to get the comments along with their associated post title and identifier, we can use the following SQL projection query:
SELECT
post.post_id,
title,
review
FROM post
INNER JOIN post_comment ON post.post_id = post_comment.post_id
ORDER BY post.post_id, post_comment_id
And, we get back the following result set:
| post_id | title | review |
|---------|-----------|-----------|
| 1 | Java | Good |
| 1 | Java | Excellent |
| 2 | Hibernate | Awesome |
The JOIN USING clause using a custom projection
When the Foreign Key column and the column it references have the same name, we can use the USING clause, like in the following example:
SELECT
post_id,
title,
review
FROM post
INNER JOIN post_comment USING(post_id)
ORDER BY post_id, post_comment_id
And, the result set for this particular query is identical to the previous SQL query that used the ON clause:
| post_id | title | review |
|---------|-----------|-----------|
| 1 | Java | Good |
| 1 | Java | Excellent |
| 2 | Hibernate | Awesome |
The USING clause works for Oracle, PostgreSQL, MySQL, and MariaDB. SQL Server doesn't support the USING clause, so you need to use the ON clause instead.
The USING clause can be used with INNER, LEFT, RIGHT, and FULL JOIN statements.
SQL JOIN ON clause with SELECT *
Now, if we change the previous ON clause query to select all columns using SELECT *:
SELECT *
FROM post
INNER JOIN post_comment ON post.post_id = post_comment.post_id
ORDER BY post.post_id, post_comment_id
We are going to get the following result set:
| post_id | title | post_comment_id | review | post_id |
|---------|-----------|-----------------|-----------|---------|
| 1 | Java | 1 | Good | 1 |
| 1 | Java | 2 | Excellent | 1 |
| 2 | Hibernate | 3 | Awesome | 2 |
As you can see, the post_id is duplicated because both the post and post_comment tables contain a post_id column.
SQL JOIN USING clause with SELECT *
On the other hand, if we run a SELECT * query that features the USING clause for the JOIN condition:
SELECT *
FROM post
INNER JOIN post_comment USING(post_id)
ORDER BY post_id, post_comment_id
We will get the following result set:
| post_id | title | post_comment_id | review |
|---------|-----------|-----------------|-----------|
| 1 | Java | 1 | Good |
| 1 | Java | 2 | Excellent |
| 2 | Hibernate | 3 | Awesome |
You can see that this time, the post_id column is deduplicated, so there is a single post_id column being included in the result set.
Conclusion
If the database schema is designed so that Foreign Key column names match the columns they reference, and the JOIN conditions only check if the Foreign Key column value is equal to the value of its mirroring column in the other table, then you can employ the USING clause.
Otherwise, if the Foreign Key column name differs from the referenced column or you want to include a more complex join condition, then you should use the ON clause instead.
For those experimenting with this in phpMyAdmin, just a word:
phpMyAdmin appears to have a few problems with USING. For the record this is phpMyAdmin run on Linux Mint, version: "4.5.4.1deb2ubuntu2", Database server: "10.2.14-MariaDB-10.2.14+maria~xenial - mariadb.org binary distribution".
I have run SELECT commands using JOIN and USING in both phpMyAdmin and in Terminal (command line), and the ones in phpMyAdmin produce some baffling responses:
1) a LIMIT clause at the end appears to be ignored.
2) the supposed number of rows as reported at the top of the page with the results is sometimes wrong: for example 4 are returned, but at the top it says "Showing rows 0 - 24 (2503 total, Query took 0.0018 seconds.)"
Logging on to mysql normally and running the same queries does not produce these errors. Nor do these errors occur when running the same query in phpMyAdmin using JOIN ... ON .... Presumably a phpMyAdmin bug.
Short answer:
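USING: when both tables share the join column name and you want to avoid ambiguous column references
ON: when the join columns have different names or the condition is more than a simple equality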

Using Self-Join to find differences between rows

I have tried finding a solution to this question, but everything I've found has either asked a slightly different question or hasn't had an adequate answer. I have a table with the following setup:
fullvna
+--------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+----------------+
| startdate | date | YES | | NULL | |
| starttime | time | YES | | NULL | |
| id | int(11) | NO | PRI | NULL | auto_increment |
+--------------+-------------+------+-----+---------+----------------+
I want to find the time difference between each pair of consecutive lines, so the starttime of id=1 minus the starttime of id=2 (the table is ordered in reverse chronological order). I based my query off of what I found here: http://www.mysqltutorial.org/mysql-tips/mysql-compare-calculate-difference-successive-rows/
create table difference as SELECT
one.id,
     one.starttime,
     two.starttime,
    (one.starttime - two.starttime) AS diff
FROM
     fullvna one
        INNER JOIN
    fullvna two ON two.id = one.id + 1;
I'm receiving the following printout, and am not sure what it means or what I'm doing wrong:
ERROR 1064 (42000): You have an error in your SQL syntax; check the
manual that corresponds to your MySQL server version for the right
syntax to use near '  one.starttime,
    two.starttime,
    (one.starttime - two.starttime' at line 3
You have hidden characters that are displayed as spaces, but they're not and they're causing the error. Copy the query from my answer. And as Juan suggested, it is recommended to use the TIMEDIFF() function instead of subtracting them:
CREATE TABLE difference AS
SELECT one.id,
one.starttime AS starttime,
two.starttime AS endtime,
TIMEDIFF(one.starttime, two.starttime) AS diff
FROM fullvna one
INNER JOIN fullvna two ON two.id = one.id + 1;
EDIT As xQbert mentioned, you need to use different names for the starttime column, so I corrected the query above accordingly.
1) Don't use the alias one, as it's a keyword; pick a different one.
2) Alias the starttime columns, as two columns with the same name in a CREATE TABLE ... SELECT will not work.
3) Use TIMEDIFF() (as others mentioned in the comments).
CREATE TABLE difference as
SELECT a1.id
, a1.starttime as OneStartTime
, a2.starttime as TwoStartTime
, TIMEDIFF(a1.starttime, a2.starttime) AS diff
FROM fullvna a1
INNER JOIN fullvna a2
ON a2.id = a1.id + 1;
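The self-join on consecutive ids can be tried out with a few sample rows. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than MySQL, the three starttime values are made up, and the difference is computed in seconds with strftime('%s', ...) instead of TIMEDIFF().

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample times are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fullvna (id INTEGER PRIMARY KEY, starttime TEXT);
    INSERT INTO fullvna (id, starttime) VALUES
        (1, '10:00:30'), (2, '10:00:10'), (3, '09:59:45');
""")

# Pair each row with the row whose id is one higher (reverse chronological order).
rows = conn.execute("""
    SELECT a1.id,
           a1.starttime AS OneStartTime,
           a2.starttime AS TwoStartTime,
           CAST(strftime('%s', a1.starttime) AS INTEGER)
         - CAST(strftime('%s', a2.starttime) AS INTEGER) AS diff_seconds
    FROM fullvna a1
    INNER JOIN fullvna a2 ON a2.id = a1.id + 1
    ORDER BY a1.id
""").fetchall()

for row in rows:
    print(row)
```

Note that the id + 1 join silently skips a pair whenever there is a gap in the id sequence, and the last row never appears on the left side because it has no successor.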

Using MySQL variables in subqueries

I am trying to use user defined variables to limit the results of a subquery, in order to get the difference between two timestamps in some analytics data. The code I am working with is as follows:
SELECT @visitID := `s`.`visit_id` AS `visit_id`, # Get the visit ID and assign to a variable
@dt := `s`.`dt` AS `visit`, # Get the timestamp of the visit and assign to a variable
`tmp`.`dt` AS `next-visit` # Get the 'next visit' timestamp which should be returned by the subquery
FROM `wp_slim_stats` AS `s` # From the main table...
LEFT JOIN (SELECT `s`.`visit_id`, # Start the subquery
MIN(`s`.`dt`) as `dt` # Get the lowest timestamp returned
FROM `wp_slim_stats` AS `s` # ...from the same table
WHERE `s`.`visit_id` = @visitID # Visit ID should be the same as the row the main query is working on
AND `s`.`dt` > @dt # Timestamp should be HIGHER than the row we are working on
LIMIT 0, 1) as `tmp` ON `tmp`.`visit_id` = `s`.`visit_id` # Join on visit_id
WHERE `s`.`resource` LIKE 'foo%' # Limit all results to the page we are looking for
The intention is to get an individual pageview and record its visit ID and the timestamp. The subquery should then return the next record from the database with the same visit ID. I can then subtract one from the other to get the seconds spent on a page.
The problem I am running into is that the subquery seems to be re-evaluating for each row returned, and not populating the next-visit column until the end. This means that all the rows returned are matched against the subquery's results for the final row, thus all next-visit columns are null apart from the final row.
The results I am looking for would be something like:
_________________________________________________
| visit_id | visit | next-visit|
|--------------|---------------|----------------|
| 1 | 123456789 | 123457890 |
|--------------|---------------|----------------|
| 4 | 234567890 | 234567891 |
|--------------|---------------|----------------|
| 6 | 345678901 | 345678902 |
|--------------|---------------|----------------|
| 8 | 456789012 | 456789013 |
|______________|_______________|________________|
But I am getting
_________________________________________________
| visit_id | visit | next-visit|
|--------------|---------------|----------------|
| 1 | 123456789 | NULL |
|--------------|---------------|----------------|
| 4 | 234567890 | NULL |
|--------------|---------------|----------------|
| 6 | 345678901 | NULL |
|--------------|---------------|----------------|
| 8 | 456789012 | 456789013 |
|______________|_______________|________________|
I am still fairly new to using variables in MySQL, particularly when assigning them dynamically. As I mentioned, I think I am messing up the order of operations somewhere, which is causing the subquery to re-populate each row at the end.
Ideally I need to be able to do this in pure MySQL due to restrictions from the client, so no PHP unfortunately. Is it possible to do what I am trying to do?
Thank you!
You don't need variables here at all.
SELECT `s`.`visit_id` AS `visit_id`,
`s`.`dt` AS `visit`,
(SELECT MIN(ws.dt) FROM `wp_slim_stats` ws WHERE ws.visit_id = s.visit_id AND ws.dt > s.dt) AS `next-visit`
FROM `wp_slim_stats` AS `s`
WHERE `s`.`resource` LIKE 'foo%'
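And to answer why your solution doesn't work, have a look at the logical order of operations in a SQL query. The FROM clause, which includes your joined subquery, is processed before the SELECT clause where the variables are assigned: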
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
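The correlated-subquery approach from this answer can be exercised on a handful of rows. The sketch is illustrative only: it assumes SQLite (through Python's sqlite3 module) instead of MySQL, a three-column version of wp_slim_stats, and invented timestamps; next_visit is used as the alias to avoid quoting a hyphenated name.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rows and timestamps are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_slim_stats (visit_id INTEGER, dt INTEGER, resource TEXT);
    INSERT INTO wp_slim_stats VALUES
        (1, 100, 'foo/a'),
        (1, 160, 'bar'),
        (4, 200, 'foo/b'),
        (4, 230, 'foo/c');
""")

# For each matching pageview, look up the earliest later hit in the same visit.
rows = conn.execute("""
    SELECT s.visit_id,
           s.dt AS visit,
           (SELECT MIN(ws.dt)
              FROM wp_slim_stats ws
             WHERE ws.visit_id = s.visit_id
               AND ws.dt > s.dt) AS next_visit
    FROM wp_slim_stats s
    WHERE s.resource LIKE 'foo%'
    ORDER BY s.dt
""").fetchall()

for row in rows:
    print(row)
```

The subquery is re-evaluated per outer row, which is exactly what the variable-based version was trying to emulate; the last pageview of a visit gets NULL because nothing follows it.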
Here's the query you would need to run.
select visits.visitid as vId, temp.time as tTime, visits.time as vTime
from visits
inner join (select min(id) as firstId, visitid, time
from visits v1
group by visitid) temp on visits.visitid = temp.visitid
where id > temp.firstid
group by visits.visitid;

Delete all rows except first N from a table having single column

I need a single query. Delete all rows from the table except the top N rows. The table has only one column. Like,
|friends_name|
==============
| Arunji |
| Roshit |
| Misbahu |
| etc... |
This column may contain repeated names as well.
Contains repeated names
Only one column.
If you can order your records by friends_name, and if there are no duplicates, you could use this:
DELETE FROM names
WHERE
friends_name NOT IN (
SELECT * FROM (
SELECT friends_name
FROM names
ORDER BY friends_name
LIMIT 10) s
)
Or you can use this:
DELETE FROM names ORDER BY friends_name DESC
LIMIT total_records-10
where total_records is (SELECT COUNT(*) FROM names). You have to compute this value in application code, though, because you can't put a count subquery in the LIMIT clause of your query.
If you don't have an id field, I suppose you can use alphabetical order.
MySQL:
DELETE FROM friends
WHERE friends_name
NOT IN (
SELECT * FROM (
SELECT friends_name
FROM friends
ORDER BY friends_name ASC
LIMIT 10) r
)
This deletes all rows except the first 10 (in alphabetical order).
I just wanted to follow up on this relatively old question because the existing answers don't capture the requirement and/or are incorrect. The question states the names can be repeated, but only the top N must be preserved. Other answers will delete incorrect rows and/or incorrect number of them.
For example, if we have this table:
|friends_name|
==============
| Arunji |
| Roshit |
| Misbahu |
| Misbahu |
| Roshit |
| Misbahu |
| Rohan |
And we want to delete all but top 3 rows (N = 3), the expected result would be:
|friends_name|
==============
| Arunji |
| Roshit |
| Misbahu |
The DELETE statement from the currently selected answer will result in:
|friends_name|
==============
| Arunji |
| Misbahu |
| Misbahu |
| Misbahu |
The reason for this is that it first sorts the names alphabetically, takes the top 3, and then deletes every row whose name is not among them. But since they are sorted by name, they may not be the top 3 we want, and because names repeat there's no guarantee that we'll end up with only 3 rows.
In the absence of unique indexes and other fields to determine what "top N" means, we go by the order returned by the database. We could be tempted to do something like this (substitute 99999 with any sufficiently high number):
DELETE FROM names LIMIT 99999 OFFSET 3
But according to MySQL docs, while the DELETE supports the LIMIT clause, it does not support OFFSET. So, doing this in a single query, as requested, does not seem to be possible; we must perform the steps manually.
Solution 1 - temporary table to hold top 3
CREATE TEMPORARY TABLE temp_names LIKE names;
INSERT INTO temp_names SELECT * FROM names LIMIT 3;
DELETE FROM names;
INSERT INTO names SELECT * FROM temp_names;
Solution 2 - new table with rename
CREATE TABLE new_names LIKE names;
INSERT INTO new_names SELECT * FROM names LIMIT 3;
RENAME TABLE names TO old_names, new_names TO names;
DROP TABLE old_names;
In either case, we end up with top 3 rows in our original table:
|friends_name|
==============
| Arunji |
| Roshit |
| Misbahu |
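The temporary-table approach can be sketched end to end as follows. Several assumptions apply: the example runs on SQLite through Python's sqlite3 module rather than MySQL, CREATE TABLE ... LIKE is replaced by an explicit column definition because SQLite lacks it, and "top N" is pinned to insertion order with SQLite's implicit rowid, which has no direct MySQL counterpart.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rowid pins down what "first N" means.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (friends_name TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?)",
    [(n,) for n in ["Arunji", "Roshit", "Misbahu", "Misbahu",
                    "Roshit", "Misbahu", "Rohan"]],
)

keep = 3  # N: number of leading rows to preserve

# Step 1: copy the first N rows aside.
conn.execute("CREATE TEMPORARY TABLE temp_names (friends_name TEXT)")
conn.execute(
    "INSERT INTO temp_names "
    "SELECT friends_name FROM names ORDER BY rowid LIMIT ?",
    (keep,),
)

# Step 2: empty the original table, then restore the saved rows.
conn.execute("DELETE FROM names")
conn.execute(
    "INSERT INTO names SELECT friends_name FROM temp_names ORDER BY rowid"
)
conn.execute("DROP TABLE temp_names")

remaining = [
    row[0]
    for row in conn.execute("SELECT friends_name FROM names ORDER BY rowid")
]
print(remaining)
```

Because duplicates are copied by position rather than matched by value, the repeated names beyond the first three rows are removed as required.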

mysql query logic

I have an SQL query which shows the delivery details of a vehicle. (It uses GREATEST to fetch the max value from a range of columns for each vehicle stop.)
SELECT deliveryid AS deliverynumber, loadid1 AS loadnumberdate,
haulieraccepted AS haulier,
greatest(drop1arrivedatetime, drop2arrivedatetime, drop3arrivedatetime,
drop4arrivedatetime, drop5arrivedatetime) AS planneddate,
date(greatest(ActualDrop1Arrive, ActualDrop2Arrive, ActualDrop3Arrive,
ActualDrop4Arrive, ActualDrop5Arrive )) AS actualenddate,
mitigation
FROM deliverydetails
WHERE deliveryid=44
the output is
deliverynumber | loadnumberdate | haulier | planneddate | actualenddate | mitigation
44 | 484487 | stols transport | 2011-11-26 15:50:00 | 2011-11-26 | customerdelay
How can I add to the MySQL query to compare the columns 'planneddate' and 'actualenddate'? If the dates are the same, then set the query field to 'ontime'; else, if actualenddate>planneddate, then 'deliverylate'. So ideally I want the following output:
deliverynumber | loadnumberdate | haulier | planneddate | actualenddate | mitigation | Status
44 | 484487 | stols transport | 2011-11-26 15:50:00 | 2011-11-26 | customerdelay | ontime.
Thanks for the assistance.
You can use a CASE statement or IF function. Perhaps something like:
SELECT ...., IF(actualenddate>planneddate,'deliverylate','ontime') AS status FROM ....
Use MySQL's IF() function together with a date conversion function to compare the two values and display the status accordingly.
You can wrap your original query as a subquery. This makes the column aliases available as ordinary column names in the outer query. Then, use a CASE ... WHEN ... THEN expression to add the column.
Assuming your original query works just fine, it would look like this:
select
*,
case when (... some comparison on 'planneddate' and 'actualenddate' ...)
then <true output>
else <false output> end
from
(<your original query>) as myalias;
The trick is that the columns from the subquery are renamed, allowing you to use their new names (planneddate and actualenddate).
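A filled-in version of that template might look like the sketch below. It rests on assumptions that are not in the question: SQLite (through Python's sqlite3 module) replaces MySQL, the table is reduced to a single planned and a single actual drop column so GREATEST is not needed, the second delivery row is invented, and the planned timestamp is truncated with date() so that both sides are compared as dates.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; schema reduced to one drop per delivery.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deliverydetails (
        deliveryid          INTEGER PRIMARY KEY,
        drop1arrivedatetime TEXT,
        ActualDrop1Arrive   TEXT
    );
    INSERT INTO deliverydetails VALUES
        (44, '2011-11-26 15:50:00', '2011-11-26 18:05:00'),
        (45, '2011-11-26 15:50:00', '2011-11-27 09:10:00');
""")

# The inner query defines the aliases; the outer query can then compare them.
rows = conn.execute("""
    SELECT deliverynumber,
           CASE WHEN actualenddate > date(planneddate)
                THEN 'deliverylate'
                ELSE 'ontime'
           END AS status
    FROM (
        SELECT deliveryid AS deliverynumber,
               drop1arrivedatetime AS planneddate,
               date(ActualDrop1Arrive) AS actualenddate
        FROM deliverydetails
    ) AS myalias
    ORDER BY deliverynumber
""").fetchall()

for row in rows:
    print(row)
```

Referencing planneddate and actualenddate directly in the same SELECT list that defines them would not work, which is why the wrapping subquery is needed.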