SQL unusual query, find max deltas between consecutive elements - mysql

I've run into an interesting problem.
I have a table of worker ids and the days of their visits. Here is the dump:
CREATE TABLE `pp` (
`id` int(11) DEFAULT '1',
`day` int(11) DEFAULT '1',
`key` varchar(45) NOT NULL,
PRIMARY KEY (`key`)
);
INSERT INTO `pp` VALUES
(1,1,'1'),
(1,20,'2'),
(1,50,'3'),
(1,70,'4'),
(2,1,'5'),
(2,120,'6'),
(2,90,'7'),
(1,90,'8'),
(2,100,'9');
I need to find the workers who have missed more than 50 days at least once. For example, if a worker visited on the 5th, 95th, 96th and 97th days, the deltas between consecutive visits are 90, 1 and 1; the largest delta (90) is more than 50, so this worker should be included in the result.
The problem is: how do I efficiently find the deltas between consecutive visits of each worker?
I can't even imagine how to treat MySQL tables as sequential arrays of data.
We need to separate the day values for each worker, sort them, and then find the max delta for each. But how? Is there any way to, for example, enumerate sorted arrays in SQL?

Try this query (edited):
SELECT t.id, MAX(t.day2 - t.day1) AS max_delta FROM (
-- pair each visit (day1) with the next visit of the same worker (day2)
SELECT p1.id, p1.day AS day1, MIN(p2.day) AS day2 FROM pp p1
JOIN pp p2
ON p1.id = p2.id AND p1.day < p2.day
GROUP BY p1.id, p1.day
) t
GROUP BY t.id
HAVING MAX(t.day2 - t.day1) > 50

This is how I used to deal with such problems:
SELECT distinct t3.id FROM
(SELECT t1.id, t1.day, MIN(t2.day) nextday
FROM pp t1
JOIN pp t2 ON t1.id=t2.id AND t1.day<t2.day
GROUP BY t1.id, t1.day
HAVING nextday-t1.day >50) t3
(EDIT this version is slightly better)
This finds all the IDs for which there is a delta > 50. (I assumed that this is what you're after)
To find the max deltas:
SELECT t3.id, MAX(t3.nextday-t3.day) FROM
(SELECT t1.id, t1.day, MIN(t2.day) nextday
FROM pp t1
JOIN pp t2 ON t1.id=t2.id AND t1.day<t2.day
GROUP BY t1.id, t1.day) t3
GROUP BY t3.id
The logic behind this is to find the "next" item, whatever that means. As this is an ordered attribute, the next item can be defined as the one with the lowest value among the rows whose value is larger than the one examined. Then you join the "next" values to the original values, compute the delta, and return only those that are applicable. If you need the other columns too, just JOIN the outer select to the original table.
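The "next item" logic above can be tried out end to end with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example is runnable anywhere), and the helper names make_db, next_visits and max_deltas are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (id, day, key).
ROWS = [(1, 1, "1"), (1, 20, "2"), (1, 50, "3"), (1, 70, "4"), (2, 1, "5"),
        (2, 120, "6"), (2, 90, "7"), (1, 90, "8"), (2, 100, "9")]

def make_db():
    """Build an in-memory copy of the pp table (SQLite stands in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE pp (id INTEGER, day INTEGER, "key" TEXT PRIMARY KEY)')
    conn.executemany("INSERT INTO pp VALUES (?, ?, ?)", ROWS)
    return conn

# Step 1: for every visit, the "next" visit is the smallest later day
# recorded for the same worker.
NEXT_VISIT_SQL = """
    SELECT t1.id, t1.day, MIN(t2.day) AS nextday
    FROM pp t1
    JOIN pp t2 ON t1.id = t2.id AND t1.day < t2.day
    GROUP BY t1.id, t1.day
"""

def next_visits(conn):
    """Return (id, day, nextday) triples ordered by worker and day."""
    return conn.execute(NEXT_VISIT_SQL + " ORDER BY t1.id, t1.day").fetchall()

def max_deltas(conn):
    """Step 2: compute the delta per pair and keep the largest one per worker."""
    sql = (f"SELECT id, MAX(nextday - day) FROM ({NEXT_VISIT_SQL}) AS t3 "
           "GROUP BY id ORDER BY id")
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    conn = make_db()
    for row in next_visits(conn):
        print(row)            # (id, day, nextday)
    print(max_deltas(conn))   # largest gap per worker
```

On the sample data the largest gaps are 30 days for worker 1 and 89 days for worker 2, so only worker 2 exceeds the 50-day threshold.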
I'm not sure if this is the best solution regarding performance, but I only wrote queries like this for one-off reports, where I could afford to let the query run for a while.
There is one semantic error that can arise, though: if somebody was present on the 1st, 2nd and 3rd days but never after, this does not find the absence, because the last visit has no "next" row to compare against. To overcome this, you could UNION in a special row per ID that specifies today's day count, but that would make this query disgusting enough not to try writing it down...
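For what it's worth, the sentinel-row idea is not too bad when written with a CTE. The sketch below is an assumption-laden illustration, not the answerer's query: it runs on SQLite through Python's sqlite3 module, the function name workers_with_gap and the today parameter are invented here, and on MySQL a CTE needs version 8.0 or later (older versions would have to repeat the UNION as a derived table).

```python
import sqlite3

# (id, day) pairs from the question's sample data.
ROWS = [(1, 1), (1, 20), (1, 50), (1, 70), (2, 1),
        (2, 120), (2, 90), (1, 90), (2, 100)]

def workers_with_gap(today, threshold=50):
    """Ids of workers with a gap > threshold, counting the gap up to `today`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pp (id INTEGER, day INTEGER)")
    conn.executemany("INSERT INTO pp VALUES (?, ?)", ROWS)
    sql = """
        WITH visits AS (
            SELECT id, day FROM pp
            UNION                      -- UNION also removes duplicate rows
            SELECT id, ? FROM pp       -- sentinel "visit" on the current day
        )
        SELECT DISTINCT t3.id FROM (
            SELECT t1.id, t1.day, MIN(t2.day) AS nextday
            FROM visits t1
            JOIN visits t2 ON t1.id = t2.id AND t1.day < t2.day
            GROUP BY t1.id, t1.day
            HAVING MIN(t2.day) - t1.day > ?
        ) AS t3
        ORDER BY t3.id
    """
    return [row[0] for row in conn.execute(sql, (today, threshold))]

if __name__ == "__main__":
    print(workers_with_gap(today=130))
    print(workers_with_gap(today=150))
```

With today=130 only worker 2 is reported (the 89-day gap between days 1 and 90). With today=150 worker 1 is reported as well, because the last visit on day 90 is now followed by a 60-day absence.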

This could also be a solution:
select distinct pp.id
from pp
where pp.day-(select max(day)
from pp pp2
where
pp2.id=pp.id and
pp2.day<pp.day)>50
(Since days are not ordered by key, I'm not searching for the previous key but for the max day before the current day.)
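As an alternative to looking up the previous day with a correlated subquery, engines with window functions can get it with LAG. This is a different technique from the answers above, shown only as a sketch: it assumes MySQL 8.0+ on the MySQL side, and the demo itself runs on SQLite (3.25 or later is needed for window functions) via Python's sqlite3 module. The function name big_gaps is hypothetical.

```python
import sqlite3

# (id, day) pairs from the question's sample data.
ROWS = [(1, 1), (1, 20), (1, 50), (1, 70), (2, 1),
        (2, 120), (2, 90), (1, 90), (2, 100)]

GAP_SQL = """
    SELECT id, MAX(day - prev_day) AS max_delta
    FROM (
        -- LAG returns the previous day within each worker's sorted visits
        SELECT id, day,
               LAG(day) OVER (PARTITION BY id ORDER BY day) AS prev_day
        FROM pp
    ) AS gaps
    WHERE prev_day IS NOT NULL          -- the first visit has no predecessor
    GROUP BY id
    HAVING MAX(day - prev_day) > ?
    ORDER BY id
"""

def big_gaps(threshold=50):
    """Return (id, max_delta) for workers whose largest gap exceeds threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pp (id INTEGER, day INTEGER)")
    conn.executemany("INSERT INTO pp VALUES (?, ?)", ROWS)
    return conn.execute(GAP_SQL, (threshold,)).fetchall()

if __name__ == "__main__":
    print(big_gaps())
```

This reads each worker's visits once in sorted order instead of joining the table to itself, which is essentially the "enumerate sorted arrays" operation the question asks about.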

Related

How to get the maximum and not duplicated data in my table in mysql?

My table data looks like this:
id car_id create_time remark
6c3befd0201a4691 4539196f55b54523986535539ed7beef 2017-07-1 16:42:49 firstcar
769d85b323bb4a1c 4539196f55b54523986535539ed7beef 2017-07-18 16:42:49 secondcar
984660c4189e499 575d90e340d14cf1bef4349b7bb5de9a 2017-07-3 16:42:49 firstjeep
I want to get the newest data: if two rows have the same car_id, I want only the one with the newest create_time. How do I write this?
I tried the query below, but I think it may be wrong. What if a record for another car has the same create_time? How do I fix that?
SELECT * FROM t_decorate_car
WHERE create_time IN
(SELECT tmptime FROM
(SELECT MAX(create_time),tmptime,car_id
FROM decorate
GROUP BY car_id
) tmp
)
One canonical way to handle this is to join your table to a subquery which finds the latest record for each car_id. This subquery serves as a filter to remove the older records you don't want to see.
SELECT t1.*
FROM t_decorate_car t1
INNER JOIN
(
SELECT car_id, MAX(create_time) AS max_create_time
FROM t_decorate_car
GROUP BY car_id
) t2
ON t1.car_id = t2.car_id AND
t1.create_time = t2.max_create_time
By the way, if you want to continue down your current road, you can also solve this using a correlated subquery:
SELECT t1.*
FROM t_decorate_car t1
WHERE t1.create_time = (SELECT MAX(t2.create_time) FROM t_decorate_car t2
WHERE t2.car_id = t1.car_id)
You were on the right track but you never connected the subquery to the main query using the right WHERE clause.
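To see why the subquery has to be tied to car_id and not only to create_time, here is a small sketch. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, and it deliberately changes the jeep's timestamp (a hypothetical modification of the sample data) so that it collides with the older record of the other car, which is the situation the question worries about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE t_decorate_car (
    id TEXT PRIMARY KEY, car_id TEXT, create_time TEXT, remark TEXT)""")
conn.executemany("INSERT INTO t_decorate_car VALUES (?, ?, ?, ?)", [
    ("6c3befd0201a4691", "4539196f55b54523986535539ed7beef",
     "2017-07-01 16:42:49", "firstcar"),
    ("769d85b323bb4a1c", "4539196f55b54523986535539ed7beef",
     "2017-07-18 16:42:49", "secondcar"),
    # Hypothetical change: same timestamp as the older record of the other car.
    ("984660c4189e499", "575d90e340d14cf1bef4349b7bb5de9a",
     "2017-07-01 16:42:49", "firstjeep"),
])

# Filtering on create_time alone: any row whose time equals *some* car's
# maximum slips through, even if it is not the newest row for its own car.
time_only = [r[0] for r in conn.execute("""
    SELECT remark FROM t_decorate_car
    WHERE create_time IN (SELECT MAX(create_time)
                          FROM t_decorate_car GROUP BY car_id)
    ORDER BY remark""")]

# Joining on car_id AND create_time keeps only each car's newest row.
per_car = [r[0] for r in conn.execute("""
    SELECT t1.remark
    FROM t_decorate_car t1
    INNER JOIN (SELECT car_id, MAX(create_time) AS max_create_time
                FROM t_decorate_car GROUP BY car_id) t2
      ON t1.car_id = t2.car_id AND t1.create_time = t2.max_create_time
    ORDER BY t1.remark""")]

if __name__ == "__main__":
    print(time_only)
    print(per_car)
```

The time-only filter also returns firstcar, while the join returns just secondcar and firstjeep. If two rows of the same car share the maximum create_time, both forms still return both rows, so a further tie-breaker would be needed in that case.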

SQL query, check if cells with the same value in column A have the same value in column B

I have a table with a lot of columns and rows, such as the example below:
I'm trying to find Orders that are partially complete, such that there is at least one Item Line record for the order that does have a Goods Issue Date value and one Item Line record for the order that does not have a Goods Issue Date value. I can easily get orders with no goods issue date at all, but I need to know the orders that have some item lines with a date and some without.
Looking at the sample data above, I should only see results for Order #1, because Orders 2,3, and 5 are all fully complete and Order 4 has not started yet.
SELECT
*
FROM
theTable t1
WHERE
t1.`Goods Issue Date` IS NULL
AND EXISTS ( SELECT
*
FROM
theTable t2
WHERE
t2.`Order` = t1.`Order`
AND t2.`Goods Issue Date` IS NOT NULL );
You can also use a simple IN if Order is non-nullable
SELECT *
FROM theTable
WHERE `Goods Issue Date` IS NULL
AND `Order` IN (
SELECT `Order`
FROM theTable
WHERE `Goods Issue Date` IS NOT NULL
)
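Since the sample data did not survive here, the sketch below rebuilds a table that matches the description (Order 1 partially complete, Orders 2, 3 and 5 complete, Order 4 not started). The table and column names order_lines, order_no, line_no and goods_issue_date are hypothetical, and SQLite via Python's sqlite3 stands in for MySQL. It runs the EXISTS query and, as an alternative that is not from the answers above, a GROUP BY form that returns one row per partial order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE order_lines (
    order_no INTEGER, line_no INTEGER, goods_issue_date TEXT)""")
conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?)", [
    (1, 1, "2020-01-05"), (1, 2, None),          # partially complete
    (2, 1, "2020-01-06"),                        # complete
    (3, 1, "2020-01-07"), (3, 2, "2020-01-08"),  # complete
    (4, 1, None), (4, 2, None),                  # not started
    (5, 1, "2020-01-09"),                        # complete
])

# EXISTS form: open lines that belong to an order with at least one issued line.
open_lines = conn.execute("""
    SELECT t1.order_no, t1.line_no
    FROM order_lines t1
    WHERE t1.goods_issue_date IS NULL
      AND EXISTS (SELECT 1 FROM order_lines t2
                  WHERE t2.order_no = t1.order_no
                    AND t2.goods_issue_date IS NOT NULL)
    ORDER BY t1.order_no, t1.line_no""").fetchall()

# Aggregate form: COUNT(column) skips NULLs, so an order is partial when it has
# some dated lines but fewer dated lines than lines in total.
partial_orders = [r[0] for r in conn.execute("""
    SELECT order_no
    FROM order_lines
    GROUP BY order_no
    HAVING COUNT(goods_issue_date) > 0
       AND COUNT(goods_issue_date) < COUNT(*)
    ORDER BY order_no""")]

if __name__ == "__main__":
    print(open_lines)
    print(partial_orders)
```

Both forms single out Order 1 only. The EXISTS query returns the open item lines themselves, while the aggregate query returns each partial order once.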
You first need to identify the bad orders. Here I've done this with a CTE (common table expression) for readability, but it can be accomplished with a simple subquery in the join statement.
Once you have that, simply join the results back to the parent table.
WITH BadOrders AS (
SELECT DISTINCT [Order]
FROM theTable
WHERE [Goods Issue Date] IS NULL)
SELECT t.*
FROM theTable t
JOIN BadOrders ON t.[Order] = BadOrders.[Order]
-- as written this also returns orders with no issue date at all (e.g. Order 4)

Create a VIEW where a record in t1 is not present in t2 ? Confirmation on Union/Left Join/Inner Join?

I am trying to make a view of records in t1 where the source id from t1 is not in t2.
Like... "what records are not present in the other table?"
Do I need to include t2 in the FROM clause? Thanks
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1
WHERE t1.fee_source_id NOT IN (
SELECT t1.fee_source_id
FROM t1 INNER JOIN t2 ON t1.fee_source_id = t2.fee_source
)
ORDER BY t1.aif_id DESC
You're looking to effect an anti-join, for which there are three possibilities in MySQL:
Using NOT IN:
SELECT fee_source_id, company_name, document
FROM t1
WHERE fee_source_id NOT IN (SELECT fee_source FROM t2)
ORDER BY aif_id DESC
Using EXISTS:
SELECT fee_source_id, company_name, document
FROM t1
WHERE NOT EXISTS (
SELECT * FROM t2 WHERE t2.fee_source = t1.fee_source_id LIMIT 1
)
ORDER BY aif_id DESC
Using JOIN:
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1 LEFT JOIN t2 ON t2.fee_source = t1.fee_source_id
WHERE t2.fee_source IS NULL
ORDER BY t1.aif_id DESC
According to @Quassnoi's analysis:
Summary
MySQL can optimize all three methods to do a sort of NESTED LOOPS ANTI JOIN.
It will take each value from t_left and look it up in the index on t_right.value. In case of an index hit or an index miss, the corresponding predicate will immediately return FALSE or TRUE, respectively, and the decision to return the row from t_left or not will be made immediately without examining other rows in t_right.
However, these three methods generate three different plans which are executed by three different pieces of code. The code that executes EXISTS predicate is about 30% less efficient than those that execute index_subquery and LEFT JOIN optimized to use Not exists method.
That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.
However, I'm not entirely sure how this analysis reconciles with the MySQL manual section on Optimizing Subqueries with EXISTS Strategy which (to my reading) suggests that the second approach above should be more efficient than the first.
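Performance aside, the three anti-join forms are not fully interchangeable once NULLs are involved. The sketch below compares them on tiny made-up data, using SQLite through Python's sqlite3 module as a stand-in for MySQL; the helper name anti_join_results is hypothetical. It says nothing about speed, only about which rows come back.

```python
import sqlite3

QUERIES = {
    "not_in": """SELECT fee_source_id FROM t1
                 WHERE fee_source_id NOT IN (SELECT fee_source FROM t2)
                 ORDER BY fee_source_id""",
    "not_exists": """SELECT fee_source_id FROM t1
                     WHERE NOT EXISTS (SELECT 1 FROM t2
                                       WHERE t2.fee_source = t1.fee_source_id)
                     ORDER BY fee_source_id""",
    "left_join": """SELECT t1.fee_source_id
                    FROM t1 LEFT JOIN t2 ON t2.fee_source = t1.fee_source_id
                    WHERE t2.fee_source IS NULL
                    ORDER BY t1.fee_source_id""",
}

def anti_join_results(t2_values):
    """Run all three forms against t1 = {1, 2, 3} and the given t2 values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (fee_source_id INTEGER, company_name TEXT)")
    conn.execute("CREATE TABLE t2 (fee_source INTEGER)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?)",
                     [(1, "A"), (2, "B"), (3, "C")])
    conn.executemany("INSERT INTO t2 VALUES (?)", [(v,) for v in t2_values])
    return {name: [r[0] for r in conn.execute(sql)]
            for name, sql in QUERIES.items()}

if __name__ == "__main__":
    print(anti_join_results([1, 3]))        # no NULLs in t2
    print(anti_join_results([1, 3, None]))  # one NULL in t2.fee_source
```

Without NULLs all three return the unmatched id 2. With a NULL in t2.fee_source, NOT IN returns no rows at all, because a comparison against NULL is never true, whereas NOT EXISTS and LEFT JOIN / IS NULL still return 2. If t2.fee_source is nullable, that difference matters more than the choice of plan.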
Another option below (similar to anti-join)... Great answer above though. Thanks!
SELECT D1.deptno, D1.dname
FROM dept D1
MINUS
SELECT D2.deptno, D2.dname
FROM dept D2, emp E2
WHERE D2.deptno = E2.deptno
ORDER BY 1;

How do I write this kind of query (returning the latest available data for each row)

I have a table defined like this:
CREATE TABLE mytable (id INT NOT NULL AUTO_INCREMENT, PRIMARY KEY(id),
user_id INT REFERENCES user(id) ON UPDATE CASCADE ON DELETE RESTRICT,
amount REAL NOT NULL CHECK (amount > 0),
record_date DATE NOT NULL
);
CREATE UNIQUE INDEX idxu_mybl_key ON mytable (user_id, amount, record_date);
I want to write a query that will have two columns:
user_id
amount
There should be only ONE entry in the returned result set for a given user. Furthermore, the amount figure returned should be the last recorded amount for the user (i.e. the one with MAX(record_date)).
The complication arises because amounts are recorded on different dates for different users, so there is no single LAST record_date for all users.
How may I write a query (preferably ANSI SQL) that returns the columns mentioned previously, while ensuring that only the last recorded amount for each user is returned?
As an aside, it is probably a good idea to return the 'record_date' column as well in the query, so that it is eas(ier) to verify that the query is working as required.
I am using MySQL as my backend db, but ideally the query should be db agnostic (i.e. ANSI SQL) if possible.
First you need the last record_date for each user:
select user_id, max(record_date) as last_record_date
from mytable
group by user_id
Now, you can join previous query with mytable itself to get amount for this record_date:
select
t1.user_id, last_record_date, amount
from
mytable t1
inner join
( select user_id, max(record_date) as last_record_date
from mytable
group by user_id
) t2
on t1.user_id = t2.user_id
and t1.record_date = t2.last_record_date
A problem appears because a user can have several rows for the same last_record_date (with different amounts). In that case you have to pick one of them, for example the max of the different amounts:
select
t1.user_id, t1.record_date as last_record_date, max(t1.amount)
from
mytable t1
inner join
( select user_id, max(record_date) as last_record_date
from mytable
group by user_id
) t2
on t1.user_id = t2.user_id
and t1.record_date = t2.last_record_date
group by t1.user_id, t1.record_date
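The effect of the tie is easy to see on a few rows. The following is a sketch with invented data (user 1 has two amounts on the same latest date), run on SQLite through Python's sqlite3 module as a stand-in for MySQL; the table is reduced to the three columns that matter here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE mytable (
    user_id INTEGER, amount REAL NOT NULL, record_date TEXT NOT NULL)""")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, 10.0, "2020-01-01"),
    (1, 12.0, "2020-02-01"),   # two rows on user 1's latest date
    (1, 15.0, "2020-02-01"),
    (2, 7.5, "2020-01-15"),
])

LATEST = """(SELECT user_id, MAX(record_date) AS last_record_date
             FROM mytable GROUP BY user_id)"""

# Plain join: every row on the latest date is returned, ties included.
with_ties = conn.execute(f"""
    SELECT t1.user_id, t2.last_record_date, t1.amount
    FROM mytable t1
    INNER JOIN {LATEST} t2
      ON t1.user_id = t2.user_id AND t1.record_date = t2.last_record_date
    ORDER BY t1.user_id, t1.amount""").fetchall()

# Grouped join: ties collapse to a single row carrying the largest amount.
one_per_user = conn.execute(f"""
    SELECT t1.user_id, t1.record_date, MAX(t1.amount)
    FROM mytable t1
    INNER JOIN {LATEST} t2
      ON t1.user_id = t2.user_id AND t1.record_date = t2.last_record_date
    GROUP BY t1.user_id, t1.record_date
    ORDER BY t1.user_id""").fetchall()

if __name__ == "__main__":
    print(with_ties)
    print(one_per_user)
```

The plain join yields two rows for user 1, while the grouped version yields exactly one row per user. Whether MAX is the right tie-breaker depends on what the amounts mean, which the question does not say.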
I do not know about MySQL, but in general SQL you need a subquery for that. You must join the query that calculates the greatest record_date with the original table to get the corresponding amount. Roughly like this:
SELECT B.*
FROM
(select user_id, max(record_date) max_date from mytable group by user_id) A
join
mytable B
on A.user_id = B.user_id and A.max_date = B.record_date
SELECT datatable.* FROM
mytable AS datatable
INNER JOIN (
SELECT user_id,max(record_date) AS max_record_date FROM mytable GROUP BY user_id
) AS selectortable ON
selectortable.user_id=datatable.user_id
AND
selectortable.max_record_date=datatable.record_date
In some SQL dialects you might need
SELECT MAX(user_id), ...
in the selectortable view instead of simply SELECT user_id,...
The definition of maximum: there is no larger(or: "more recent") value than this one. This naturally leads to a NOT EXISTS query, which should be available in any DBMS.
SELECT user_id, amount
FROM mytable mt
WHERE mt.user_id = $user
AND NOT EXISTS ( SELECT *
FROM mytable nx
WHERE nx.user_id = mt.user_id
AND nx.record_date > mt.record_date
)
;
BTW: your table definition allows more than one record to exist for a given {user_id, record_date}, but with different amounts. This query will return them all.
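A quick way to check the NOT EXISTS formulation, including the behaviour on ties mentioned above, is the sketch below. The data is invented, the function name latest_amounts is hypothetical, the $user placeholder is replaced by an optional bound parameter, and SQLite via Python's sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE mytable (
    user_id INTEGER, amount REAL NOT NULL, record_date TEXT NOT NULL)""")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, 10.0, "2020-01-01"),
    (1, 12.0, "2020-02-01"),   # same latest date, different amounts
    (1, 15.0, "2020-02-01"),
    (2, 7.5, "2020-01-15"),
])

def latest_amounts(user_id=None):
    """Rows for which no later row exists for the same user.

    With user_id=None the filter is skipped and all users are returned.
    """
    sql = """
        SELECT mt.user_id, mt.amount
        FROM mytable mt
        WHERE (? IS NULL OR mt.user_id = ?)
          AND NOT EXISTS (SELECT 1
                          FROM mytable nx
                          WHERE nx.user_id = mt.user_id
                            AND nx.record_date > mt.record_date)
        ORDER BY mt.user_id, mt.amount
    """
    return conn.execute(sql, (user_id, user_id)).fetchall()

if __name__ == "__main__":
    print(latest_amounts())    # all users
    print(latest_amounts(2))   # a single user
```

User 1 comes back twice because both amounts sit on the latest date and neither row has a later one. Replacing the strict > with a comparison on a unique tie-breaker (for example the auto-increment id) would be one way to get a single row, under the assumption that a higher id means a later insert.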

MySQL: get all values of one column for which there is no row matching a condition on a second column

Let's say I have a MySQL join that gives me a bunch of rows with the following columns: PERSON_ID, ENTRY_ID, DATE
Each person can have multiple entries, but an entry can't belong to multiple people. What I want to retrieve is a list of all people who have not posted an entry in the last month - (all PERSON_IDs for which there is no row with a DATE within 30 days of NOW()). Can I get that result in a single query?
My current join is basically:
SELECT P.ID,P.NAME,P.EMAIL,E.ID,E.DATE FROM PERSON P, ENTRY E WHERE P.ID = E.PERSON_ID.
That gives the list of all entries, joined to additional information about the person who posted each one. The entries table is a big one, so scalability is somewhat important.
You probably want some kind of left null join... however, I'm slightly puzzled. Is there another table with just a single row for each possible person, or do you mean you want a list of everyone with at least 1 row in your example table, but no entry in the last month?
Assuming the latter (although tbh the former seems more likely):
select distinct(t1.person_id) from mytable as t1
left join mytable as t2 on t1.person_id = t2.person_id and
t2.`date` > date_sub(now(), interval 30 day)
where t2.person_id is null
;
This is basically saying "give me a list of distinct person ids where you can't join to that person id in the last 30 days"
(untested btw)
... as another comment, you might be tempted to do something like:
select distinct(t1.person_id) from mytable as t1 where t1.person_id not in
(select person_id from mytable where `date` > date_sub(now(), interval 30 day)
);
That's fine as long as your dataset is small and will stay small, but will scale very badly on a large number of rows...
I would recommend outer-joining the entries table to the persons table, and group by the PERSON_ID.
This will cause all the values in the date column to be NULL whenever there are no matching rows, but will still display the PERSON_ID.
After that, all that is left is to check whether the date column is NULL.
If you include your table structure, I will be able to add a MySQL statement that explains this a lot better.
Assuming your query goes something like:
select tablea.person_id, temptable.entry_id, temptable.date from tablea
left join
(select * from tableb where tableb.date between subdate(now(),30) and adddate(now(),30)) as temptable on tablea.person_id=temptable.person_id where temptable.date is null