SQL query to find out of order records


I have a PHP program with a MySQL database that contains many records. Two columns of particular relevance are incidentnumber and date. Both should only move forward. However, sometimes a user enters data that is out of sequence, e.g.:
Incident Date
1 Jan 1 2000
2 Jan 1 2010
3 Jan 1 2002
It appears that incident 2 was entered with the wrong date; it should be Jan 1 2001.
Is there any way to query for records where the date is out of sequence? Or do I have to iterate through all records tracking last date to find the error?
ADDED NOTE: The incidents are not sequential (they might go 1,3,6,123, etc). Nor are the dates sequential. And these are columns in the same table.

This command selects any records for which there exists in the same table a record with a lower Incident number but a higher Date.
SELECT * FROM TableName T1 WHERE EXISTS
(SELECT * FROM TableName T2
WHERE T2.Incident < T1.Incident AND T2.Date > T1.Date)
This slightly more complex command finds only records that are out of order in "both directions", meaning they have a later-dated record earlier in the file and an earlier-dated record later in the file. This avoids the situation in which a mistake in a very early record makes all the subsequent records appear out of order. However, it will not catch a problem in the records with the lowest or highest incident numbers.
SELECT * FROM TableName T1 WHERE EXISTS
(SELECT * FROM TableName T2
WHERE T2.Incident < T1.Incident AND T2.Date > T1.Date)
AND EXISTS
(SELECT * FROM TableName T2
WHERE T2.Incident > T1.Incident AND T2.Date < T1.Date)
Finally, as ruakh points out in the comments, the above query gives you ALL the out-of-order records. Although that is technically what you asked for, it makes it difficult to find the "point of breakage" in the chain of dates. The following query gives you only the records where the chain breaks, by comparing each record with its immediate predecessor and successor. It does not require IncidentID values to be consecutive, so it tolerates gaps and deleted incidents.
SELECT * FROM TableName T1 WHERE
Date < (SELECT Date FROM TableName T2 WHERE T2.IncidentID =
(SELECT MAX(IncidentID) FROM TableName T3 WHERE T3.IncidentID < T1.IncidentID))
OR Date > (SELECT Date FROM TableName T2 WHERE T2.IncidentID =
(SELECT MIN(IncidentID) FROM TableName T3 WHERE T3.IncidentID > T1.IncidentID))
(Not tested, since I don't have a copy of MySQL handy).
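The difference between the first query and the neighbour-comparison query shows up on a small data set. The sketch below is only an illustration: it runs equivalent SQL against an in-memory SQLite database from Python (not MySQL), and the table name incidents, the sample rows, and the use of MIN(IncidentID) to pick the next record are assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (IncidentID INTEGER PRIMARY KEY, Date TEXT)")
# Hypothetical sample data: IDs have gaps, and incident 3 carries a date that is too late.
conn.executemany(
    "INSERT INTO incidents VALUES (?, ?)",
    [(1, "2000-01-01"), (3, "2010-01-01"), (6, "2002-01-01"),
     (123, "2003-01-01"), (200, "2004-01-01")],
)

# Every record that has a lower-numbered record with a later date.
all_out_of_order = [row[0] for row in conn.execute("""
    SELECT T1.IncidentID FROM incidents T1
    WHERE EXISTS (SELECT 1 FROM incidents T2
                  WHERE T2.IncidentID < T1.IncidentID AND T2.Date > T1.Date)
    ORDER BY T1.IncidentID
""")]

# Only the records that disagree with an immediate neighbour.
break_points = [row[0] for row in conn.execute("""
    SELECT T1.IncidentID FROM incidents T1
    WHERE T1.Date < (SELECT T2.Date FROM incidents T2 WHERE T2.IncidentID =
            (SELECT MAX(T3.IncidentID) FROM incidents T3
             WHERE T3.IncidentID < T1.IncidentID))
       OR T1.Date > (SELECT T2.Date FROM incidents T2 WHERE T2.IncidentID =
            (SELECT MIN(T3.IncidentID) FROM incidents T3
             WHERE T3.IncidentID > T1.IncidentID))
    ORDER BY T1.IncidentID
""")]

print(all_out_of_order)  # [6, 123, 200]
print(break_points)      # [3, 6]
```

The first list flags everything after the bad record, while the second narrows the search to the pair of neighbours around the break. Deciding which of the two is actually wrong still takes a human.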

select * from yourtable t1
inner join yourtable t2
on t1.incident=t2.incident-1
and t1.date>t2.date

This selects all of the ids where the date is greater than the next record's date. That should tell you which ones are out of order. (The table is called yourtable here because table itself is a reserved word in MySQL.)
SELECT Incident FROM yourtable a
WHERE a.Date > (SELECT b.Date FROM yourtable b WHERE b.Incident = (a.Incident + 1))

If the IncidentID column always increases by exactly one, with no gaps:
SELECT c.IncidentID AS cincID, p.IncidentID AS pincID,
c.Date AS cDate, p.Date AS pDate,
DATEDIFF(c.Date, p.Date)
FROM Incident c, Incident p
WHERE c.IncidentID = (p.IncidentID + 1)
AND datediff(c.Date, p.Date) < 1

Related

Select rows from tableA based on age calculation from tableB

In table1 we have ID and DOB (date of birth, e.g. 01/01/1980).
In table2 we have id and other columns.
How do I get all rows from table2 whose id belongs to someone under the age of 20?
I currently have:
SELECT *
FROM table2
WHERE id IN (
SELECT id
FROM table1
WHERE TIMESTAMPDIFF(Year,DOB,curdate()) <= 20
)
Is my solution efficient?

You would be better off calculating the date 20 years ago once and asking whether the table data is after that date. This means one calculation is needed, not a calculation for every row in the table. Any time you perform a calculation on row data, an index on that column cannot be used. This is a catastrophe for performance if DOB is indexed.
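Also be clear about what the function counts. In MySQL, TIMESTAMPDIFF(YEAR, ...) returns the number of complete years between the two dates, so the difference between 31 Dec and the following 1 Jan is 0. (SQL Server's DATEDIFF is the function that counts how many times the year rolls over, reporting 1 year for two dates that are only a day or two apart.) As a consequence, TIMESTAMPDIFF(Year,DOB,curdate()) <= 20 also matches everyone who is 20 but not yet 21. For "under 20", compare against the date 20 years ago instead: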
SELECT id
FROM table1
where DOB > DATE_SUB(CURDATE(), INTERVAL 20 YEAR)
Personally I use a join rather than IN because, once you learn the pattern, it is easy to extend with LEFT joins to look for rows that don't exist or don't match. In practical terms the query optimizer usually rewrites IN and JOIN so that they execute the same way anyway, although some databases perform poorly with IN because they execute it differently from a join.
SELECT *
FROM
table1 t1
INNER JOIN table2 t2
ON t1.id = t2.id
where t1.DOB > DATE_SUB(CURDATE(), INTERVAL 20 YEAR)
Mech makes the point that select * should be avoided in production code. That is a relevant point for the most part: always select only the columns you need. If the table has an index that contains every column you ask for, the database can answer the query purely from the index for a speed boost, whereas select * forces it to use the index to find the rows and then look up each row as well. The only time I might consider using select * is in a subquery, where the optimizer will rewrite it anyway.
Always alias your tables and use the aliases. This prevents your query from breaking if you later add a column to either table that has the same name as a column in the other table. Adding columns isn't usually a cause of bugs and crashes, but if a query just says "select name from a join b.." and only table a has a name column, it will start failing once a name column is added to b. Specifying a.name prevents this.
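As a rough illustration of the "compute the cutoff once" advice, the following sketch runs the join against an in-memory SQLite database from Python. SQLite has no DATE_SUB, so date(?, '-20 years') stands in for DATE_SUB(CURDATE(), INTERVAL 20 YEAR); the fixed reference date, the name column, the index name, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, DOB TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE INDEX idx_table1_dob ON table1 (DOB);
    INSERT INTO table1 VALUES (1, '1980-01-01'), (2, '2010-03-04'),
                              (3, '2004-06-15'), (4, '2004-06-16');
    INSERT INTO table2 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
""")

today = "2024-06-15"  # fixed instead of the current date so the result is repeatable

# The cutoff is computed once and the bare DOB column is compared against it,
# which leaves the optimizer free to use the index on DOB.
under_20 = conn.execute("""
    SELECT t2.id, t2.name
    FROM table1 t1
    INNER JOIN table2 t2 ON t1.id = t2.id
    WHERE t1.DOB > date(?, '-20 years')
    ORDER BY t2.id
""", (today,)).fetchall()

print(under_20)  # [(2, 'b'), (4, 'd')]
```

Id 3 turns 20 exactly on the reference date, so it is no longer "under 20" and is excluded.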

For MySQL
SELECT table2.*
FROM table1
JOIN table2 ON table1.id = table2.id
WHERE table1.dob >= CURRENT_DATE - INTERVAL 20 YEAR

Historically, MySQL has implemented EXISTS more efficiently than IN. So, I would recommend:
SELECT t2.*
FROM table2 t2
WHERE EXISTS (SELECT 1
FROM table1 t1
WHERE t1.id = t2.id AND
TIMESTAMPDIFF(Year, t1.DOB, curdate()) <= 20
);
For performance, you want an index on table1(id, DOB).
You can also change the year comparison to:
t1.DOB > curdate() - interval 20 year
That is presumably the logic you want (strictly under 20), and the index could take advantage of it.
I recommend this over a join because there is no risk of having duplicate rows in the result set. Your question does not specify that id is unique in table1, so duplicates are a risk. Even if there are no duplicates, this would also have the best performance under many circumstances.
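The duplicate-row risk is easy to reproduce. In this sketch (Python with an in-memory SQLite database; the sample rows, the name column, and the literal cutoff date are assumptions made for illustration), table1 deliberately contains id 1 twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, DOB TEXT);   -- id is NOT unique here
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table1 VALUES (1, '2010-01-01'), (1, '2011-01-01'), (2, '1980-01-01');
    INSERT INTO table2 VALUES (1, 'a'), (2, 'b');
""")

cutoff = "2004-06-15"  # stands in for curdate() - interval 20 year

# JOIN: one output row per matching table1 row, so id 1 appears twice.
join_rows = conn.execute("""
    SELECT t2.id, t2.name
    FROM table2 t2 JOIN table1 t1 ON t1.id = t2.id
    WHERE t1.DOB > ?
""", (cutoff,)).fetchall()

# EXISTS: each table2 row is returned at most once.
exists_rows = conn.execute("""
    SELECT t2.id, t2.name
    FROM table2 t2
    WHERE EXISTS (SELECT 1 FROM table1 t1
                  WHERE t1.id = t2.id AND t1.DOB > ?)
""", (cutoff,)).fetchall()

print(join_rows)    # [(1, 'a'), (1, 'a')]
print(exists_rows)  # [(1, 'a')]
```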

MYSQL - limit between results for fast search in ranges

I have two tables in MYSQL where table2 contains ranges of serial numbers (unique) with 17 digits (varchar 17) and table1 contains serial values (same format as ranges)
ex:
table 1:
serial_id serial
1 12345678123456799
table 2:
range_id date start end
1 2012-01-01 12345678123456789 12345678123456999
2 2012-01-01 12345678123457000 12345678123457099
3 2012-01-01 12345678123457100 12345678123457199
I want to find the range id that each serial belongs to. The simplest query that can be used is:
select *
from table1,table2
where table1.serial between table2.start and table2.end
but I want to optimize it to run faster, given the facts below:
The serials and ranges are unique, so each serial belongs to one and only one range. It is therefore not necessary to search other ranges once one range contains the serial.
The first 11 digits of each range are the same. For example, one range can be from 12345678120000000 to 12345678129999999.
Serials and ranges are ordered by date, and it is more likely to find ranges in early dates. There are about 6000000 serial records and about 100000 range records.
Any ideas for a better query?

This is a bit challenging to speed up. Here is one method that I've used with IP address ranges:
select t1.*,
(select t2.range_id
from table2 t2
where t2.start <= t1.serial
order by t2.start desc
limit 1
) as range_id
from table1 t1;
This can take advantage of an index on table2(start, range_id).
Note: this does not check the end of the range. For that, I would add another join . . . although this (unhappily) requires materializing a subquery:
select *
from (select t1.*,
(select t2.range_id
from table2 t2
where t2.start <= t1.serial
order by t2.start desc
limit 1
) as range_id
from table1 t1
) t1 left join
table2 t2
on t1.range_id = t2.range_id and t2.end >= t1.serial;
The additional join wants an index on table2(range_id, end).
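The two-step idea (pick the range with the greatest start not exceeding the serial, then confirm the serial does not run past that range's end) can be tried out with the sample data from the question. This is only a sketch using Python and an in-memory SQLite database; the columns are renamed range_start and range_end because END is a reserved word in SQLite, and the second serial is made up to show a miss.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (serial_id INTEGER PRIMARY KEY, serial TEXT);
    CREATE TABLE table2 (range_id INTEGER PRIMARY KEY,
                         range_start TEXT, range_end TEXT);
    CREATE INDEX idx_start ON table2 (range_start, range_id);
    INSERT INTO table1 VALUES (1, '12345678123456799'),
                              (2, '12345678123457250');  -- falls in no range
    INSERT INTO table2 VALUES
        (1, '12345678123456789', '12345678123456999'),
        (2, '12345678123457000', '12345678123457099'),
        (3, '12345678123457100', '12345678123457199');
""")

rows = conn.execute("""
    SELECT c.serial, r.range_id
    FROM (SELECT t1.serial,
                 (SELECT t2.range_id
                  FROM table2 t2
                  WHERE t2.range_start <= t1.serial
                  ORDER BY t2.range_start DESC
                  LIMIT 1) AS candidate_id
          FROM table1 t1) c
    LEFT JOIN table2 r
           ON r.range_id = c.candidate_id AND r.range_end >= c.serial
    ORDER BY c.serial
""").fetchall()

print(rows)  # [('12345678123456799', 1), ('12345678123457250', None)]
```

Comparing the serials as strings works here only because every value has the same fixed width of 17 digits.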

I think a small change to the data model would give a big performance improvement:
add a rangeid column to table1 as a foreign key.
table 1:
serial_id serial rangeid
1 12345678123456799 1
Then write the following query:
select *
from table1 join table2 using(rangeid);
If that change is impossible, you can use the LIKE operator as below:
select *
from table1 join table2
on(table2.start like concat(left(table1.serial,12),'%'))
where table1.serial between table2.start and table2.end;
The table2.start column must be indexed.
Edit:
Increase the number "12" to the largest value that the relationship between the serial field and the start field allows.

How to calculate time difference between current and previous row in MySQL

I have mysql table t1 like this :
What I want to do is calculate the difference between each row and the previous one and save the value in a new column called diff:
TICKETID| DATENEW | DIFF
16743 12:36:46 0
16744 12:51:25 15. minute
16745 12:57:25 6.5 minute
..........
.......
etc
I know there are similar questions, but I've tried all of the solutions
posted here with no success. How can I solve this?

To get the time difference in minutes between the current and previous row, you can use timestampdiff on datenew and the previous row's datenew, which you can get via a subquery. timestampdiff(unit, a, b) returns b - a, so the earlier time goes first:
select ticketid, datenew,
timestampdiff(minute,(select datenew from mytable t2
where t2.ticketid < t1.ticketid order by t2.ticketid desc limit 1),datenew) as diff
from mytable t1
Update
Here's another way, using a user variable to store the previous datenew value, that might be faster:
select ticketid, datenew, timestampdiff(minute,prevdatenew,datenew) as diff
from (
select ticketid, datenew, @prevDateNew as prevdatenew,
@prevDateNew := datenew
from mytable order by ticketid
) t1
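For reference, the correlated-subquery approach can be checked against the sample times from the question. The sketch uses Python with an in-memory SQLite database, so strftime('%s', ...) replaces timestampdiff and the result is in seconds; the date part of each timestamp and the diff_seconds column name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (ticketid INTEGER PRIMARY KEY, datenew TEXT);
    INSERT INTO mytable VALUES (16743, '2020-01-01 12:36:46'),
                               (16744, '2020-01-01 12:51:25'),
                               (16745, '2020-01-01 12:57:25');
""")

# For each row, subtract the datenew of the closest lower ticketid;
# the first row has no predecessor, so COALESCE turns its NULL into 0.
rows = conn.execute("""
    SELECT t1.ticketid,
           COALESCE(
               strftime('%s', t1.datenew) - strftime('%s',
                   (SELECT t2.datenew FROM mytable t2
                    WHERE t2.ticketid < t1.ticketid
                    ORDER BY t2.ticketid DESC LIMIT 1)),
               0) AS diff_seconds
    FROM mytable t1
    ORDER BY t1.ticketid
""").fetchall()

print(rows)  # [(16743, 0), (16744, 879), (16745, 360)]
for ticketid, seconds in rows:
    print(ticketid, round(seconds / 60, 2), "minutes")
```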

select
t1.*
,coalesce(timestampdiff(MINUTE,t2.dt,t1.dt),0) as tdiff
from t t1 left join t t2
on t1.id = t2.id+1
order by t1.id
As you are only looking for the difference between the current row and the previous one, you can join each row to the previous row and calculate the time difference in minutes.
Note: This assumes there are no missing ids in the table. You might have to change the join condition if there are missing ids.
SQL Fiddle: http://www.sqlfiddle.com/#!9/4dcae/15

SELECT * FROM table while condition=true?

I want to select rows from a table while a condition is true:
SELECT * FROM (SELECT * FROM`table1` `t1` ORDER BY t1.date) `t2` WHILE t2.id!=5
When the while condition becomes false, it should stop selecting further rows.
Please help me. I have already searched a lot, including many similar questions on Stack Overflow, but I can't get it to work.
Please don't tell me about WHERE. I want a solution in SQL, not in PHP or anything else.
OK the real problem is here
SELECT *,(SELECT SUM(t2.amount) FROM (select * from transaction as t1 order by t1.date) `t2`) as total_per_transition FROM transaction
Here I want to calculate the running total balance at each transaction.

First find the first date where the condition fails, so where id=5:
SELECT date
FROM table1
WHERE id = 5
ORDER BY date
LIMIT 1
Then use the above as a scalar subquery and compare every row of the original table against it to get all rows with earlier dates:
SELECT t.*
FROM table1 AS t
WHERE t.date < COALESCE(
( SELECT date
FROM table1
WHERE id = 5
ORDER BY date
LIMIT 1
), '9999-12-31') ;
The COALESCE() is for the case when there are no rows at all with id=5. The scalar subquery then yields NULL, and in that case we want all rows from the table. (An inner join to a derived table would not work for this case: an empty derived table makes the whole join return no rows.)

SQL Work out the average time difference between total rows

I've searched around SO and can't seem to find a question with an answer that works fine for me. I have a table with almost 2 million rows in, and each row has a MySQL Date formatted field.
I'd like to work out (in seconds) how often a row was inserted, so work out the average difference between the dates of all the rows with a SQL query.
Any ideas?
-- EDIT --
Here's what my table looks like
id, name, date (datetime), age, gender

If you want to know how often (on average) a row was inserted, I don't think you need to calculate all the differences. You only need to sum up the differences between adjacent rows (adjacent based on the timestamp) and divide the result by the number of the summands.
The formula
((T1 - T0) + (T2 - T1) + … + (TN - T(N-1))) / N
can obviously be simplified to merely
(TN-T0) / N
So, the query would be something like this:
SELECT TIMESTAMPDIFF(SECOND, MIN(date), MAX(date)) / (COUNT(*) - 1)
FROM atable
Make sure the number of rows is more than 1, or you'll get the Division By Zero error. Still, if you like, you can prevent the error with a simple trick:
SELECT
IFNULL(TIMESTAMPDIFF(SECOND, MIN(date), MAX(date)) / NULLIF(COUNT(*) - 1, 0), 0)
FROM atable
Now you can safely run the query against a table with a single row.
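A quick way to convince yourself that the simplified formula matches the average of the adjacent gaps is to compute both. The sketch below uses Python with an in-memory SQLite database, so strftime('%s', ...) takes the place of TIMESTAMPDIFF(SECOND, ...); the sample timestamps are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (id INTEGER PRIMARY KEY, date TEXT)")
stamps = ["2020-01-01 00:00:00", "2020-01-01 00:00:10",
          "2020-01-01 00:00:40", "2020-01-01 00:01:00"]
conn.executemany("INSERT INTO atable (date) VALUES (?)", [(s,) for s in stamps])

AVG_SQL = """
    SELECT (strftime('%s', MAX(date)) - strftime('%s', MIN(date))) * 1.0
           / NULLIF(COUNT(*) - 1, 0)
    FROM atable
"""
shortcut = conn.execute(AVG_SQL).fetchone()[0]

# Long way round: average the three adjacent gaps (10 s, 30 s, 20 s) in Python.
secs = [int(r[0]) for r in conn.execute(
    "SELECT strftime('%s', date) FROM atable ORDER BY date")]
gaps = [b - a for a, b in zip(secs, secs[1:])]
long_way = sum(gaps) / len(gaps)

print(shortcut, long_way)  # 20.0 20.0

# With a single row, NULLIF turns the zero divisor into NULL, so the result is NULL.
conn.execute("DELETE FROM atable WHERE id > 1")
single_row = conn.execute(AVG_SQL).fetchone()[0]
print(single_row)  # None
```

In this SQLite version the single-row case comes back as NULL; wrapping the expression in IFNULL(..., 0), as in the query above, would turn that into 0.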

Give this a shot:
select AVG(theDelay) from (
select TIMESTAMPDIFF(SECOND,a.date, b.date) as theDelay
from myTable a
join myTable b on b.date = (select MIN(x.date)
from myTable x
where x.date > a.date)
) p
The inner query joins each row with the next row (by date) and returns the number of seconds between them. That query is then encapsulated and is queried for the average number of seconds.
EDIT: If your ID column is auto-incrementing and they are in date order, you can speed it up a bit by joining to the next ID row rather than the MIN next date.
select AVG(theDelay) from (
select TIMESTAMPDIFF(SECOND,a.date, b.date) as theDelay
from myTable a
join myTable b on b.id = (select MIN(x.id)
from myTable x
where x.id > a.id)
) p
EDIT2: As brilliantly commented by Mikael Eriksson, you may be able to just do:
select TIMESTAMPDIFF(SECOND, MIN(date), MAX(date)) / (COUNT(*) - 1) from myTable
There's a lot you can do with this to eliminate off-peak hours or big spans without a new record, using the join syntax in my first example.

Try this:
select avg(diff) as AverageSecondsBetweenDates
from (
select TIMESTAMPDIFF(SECOND, t1.MyDate, min(t2.MyDate)) as diff
from MyTable t1
inner join MyTable t2 on t2.MyDate > t1.MyDate
group by t1.MyDate
) a
