How to avoid self-joins that return symmetric (mirrored) results in MySQL?

I was looking for records in the same table that are within 2 weeks of each other, like this:
SELECT stuff
FROM mytable AS a
JOIN mytable AS b
ON a.ID = b.ID
WHERE
(
a.Date = b.Date
OR
a.Date BETWEEN DATE_SUB(b.Date, INTERVAL 14 DAY) AND DATE_ADD(b.Date, INTERVAL 14 DAY)
OR
b.Date BETWEEN DATE_SUB(a.Date, INTERVAL 14 DAY) AND DATE_ADD(a.Date, INTERVAL 14 DAY)
)
;
It works, but now the result has this structure:
| ID | a.Date | b.Date | a.Value | b.Value |
|----|------------|------------|---------|---------|
| 1 | 2016-01-01 | 2016-01-02 | foo | bar |
| 1 | 2016-01-02 | 2016-01-01 | bar | foo |
Either my join is written badly, which leads to this duplicated structure, or the join is fine and I need some way to remove the mirror-image ("chiral") record. Can anyone advise me on how to proceed?

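Add:
a.Value < b.Value
to the WHERE clause. Because each mirrored pair contains the same two values in swapped order, this keeps exactly one row of each pair (pairs whose values are equal are dropped entirely).
Or, better yet, if you have a primary key (and every table should have one):
a.pk < b.pk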
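To see why the extra comparison removes the mirrored row, here is a minimal in-memory Python sketch that emulates the self-join. It is only an illustration, assuming a hypothetical integer primary key named pk and the sample data from the question; it does not talk to MySQL.

```python
from datetime import date

# Hypothetical sample rows: (pk, ID, Date, Value). "pk" stands in for the
# table's primary key; the values mirror the example in the question.
rows = [
    (1, 1, date(2016, 1, 1), "foo"),
    (2, 1, date(2016, 1, 2), "bar"),
    (3, 1, date(2016, 3, 1), "baz"),
]


def pairs_within(rows, days=14):
    """Emulate the self-join: same ID, dates at most `days` apart.

    The extra a.pk < b.pk test keeps exactly one row per unordered pair
    and also drops each row's pairing with itself.
    """
    result = []
    for a in rows:
        for b in rows:
            same_id = a[1] == b[1]
            close = abs((a[2] - b[2]).days) <= days
            if same_id and close and a[0] < b[0]:
                result.append((a[1], a[2], b[2], a[3], b[3]))
    return result


print(pairs_within(rows))
```

Without the a[0] < b[0] test the loop would also return the (bar, foo) mirror and every row paired with itself.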


MySQL: select the row that matches the first valid condition (with a GROUP BY)

I don't know how to search for an answer to my question, so I'm asking here.
Here is the table I have:
+----+------------+------------+-------------+
| id | start_date | end_date | id_person |
+----+------------+------------+-------------+
| 1 | 2017-10-01 | 2017-12-01 | 1 |
| 2 | 2017-07-01 | 2017-09-01 | 1 |
| 3 | 2016-01-01 | 2016-02-01 | 1 |
| 4 | 2016-05-01 | 2016-06-01 | 2 |
| 5 | 2016-01-01 | 2016-02-01 | 2 |
+----+------------+------------+-------------+
And here is the query I tried:
SELECT * FROM table
WHERE ((start_date < NOW() AND end_date > NOW())
OR start_date > NOW()
OR end_date < NOW())
GROUP BY `id_person`
The result I was expecting is this one:
+----+------------+------------+-------------+
| id | start_date | end_date | id_person |
+----+------------+------------+-------------+
| 2 | 2017-07-01 | 2017-09-01 | 1 | // matches first condition
| 4 | 2016-05-01 | 2016-06-01 | 2 | // matches 3rd condition and has the most recent start_date
+----+------------+------------+-------------+
In case it isn't clear yet what I did wrong, here it is.
I am trying to show a single row per person, and I want that row to be the one matching the first condition (in the order they are written), not the later ones. I don't want the rows simply ordered by start_date; it is more like a custom order in which I want the first row for each person.
The problem is that this query doesn't work, because GROUP BY does not apply the conditions in that order first (and even if it did, I'm not sure the conditions would select only one row).
I really don't know how to achieve this, or even whether it is possible; I hope someone can point me towards a solution.
Thanks for reading; I'll answer as fast as I can if you need more information.
Here's one idea...
SELECT m.*
FROM my_table m
JOIN
( SELECT x.id_person
, MAX(x.start_date) start_date
FROM my_table x
JOIN
( SELECT id_person
, MIN(CASE WHEN NOW() BETWEEN start_date AND end_date THEN 'A' WHEN start_date > NOW() THEN 'B' WHEN end_date < NOW() THEN 'C' END) rule
FROM my_table
GROUP
BY id_person
) y
ON y.id_person = x.id_person
AND y.rule = CASE WHEN NOW() BETWEEN start_date AND end_date THEN 'A' WHEN start_date > NOW() THEN 'B' WHEN end_date < NOW() THEN 'C' END
GROUP
BY id_person
) n
ON n.id_person = m.id_person
AND n.start_date = m.start_date;
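The three nested steps of that query (best rule letter per person, latest start_date among rows with that letter, then the matching full rows) can be sketched in memory like this. It is a minimal Python illustration, not MySQL code: NOW is a hypothetical fixed date chosen so the result is reproducible, and dates are compared as plain dates rather than datetimes.

```python
from datetime import date

# Hypothetical "now"; the SQL uses NOW().
NOW = date(2017, 8, 15)

# (id, start_date, end_date, id_person), copied from the question's table.
rows = [
    (1, date(2017, 10, 1), date(2017, 12, 1), 1),
    (2, date(2017, 7, 1), date(2017, 9, 1), 1),
    (3, date(2016, 1, 1), date(2016, 2, 1), 1),
    (4, date(2016, 5, 1), date(2016, 6, 1), 2),
    (5, date(2016, 1, 1), date(2016, 2, 1), 2),
]


def rule(start, end, now=NOW):
    """Same CASE as the query: A = current, B = future, C = past."""
    if start <= now <= end:
        return "A"
    if start > now:
        return "B"
    return "C"  # end < now, assuming start <= end


def pick_rows(rows, now=NOW):
    # Subquery y: lowest (best) rule letter per person.
    best_rule = {}
    for _id, start, end, person in rows:
        r = rule(start, end, now)
        if person not in best_rule or r < best_rule[person]:
            best_rule[person] = r
    # Subquery n: latest start_date among rows that have that letter.
    latest = {}
    for _id, start, end, person in rows:
        if rule(start, end, now) == best_rule[person]:
            latest[person] = max(latest.get(person, start), start)
    # Outer join: return the full rows matching (id_person, start_date).
    return [row for row in rows if latest[row[3]] == row[1]]


print([row[0] for row in pick_rows(rows)])
```

With this NOW the sketch picks rows 2 and 4, the same rows as the result table below.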
+----+------------+------------+-----------+
| id | start_date | end_date | id_person |
+----+------------+------------+-----------+
| 2 | 2017-07-01 | 2017-09-01 | 1 |
| 4 | 2016-05-01 | 2016-06-01 | 2 |
+----+------------+------------+-----------+
If you are happy to write the rules directly in SQL rather than as WHERE conditions, you can ask the database for what you want more directly.
This means taking a step back to work out what your rules actually are. It looks like you want to prioritise the entries by closest date, showing current ones first, then future ones, then historical ones. It also looks like end_date >= start_date always holds, which means you only need to look at end_date to find what you are looking for.
MySQL can answer the question by abusing its GROUP BY behaviour (this only worked until recent versions).
SELECT t.* FROM
(
SELECT t.*
FROM `table` t
-- not-yet-ended rows first, then the row whose end_date is closest to today
ORDER BY t.end_date < NOW(), ABS(DATEDIFF(t.end_date, NOW()))
) AS t
GROUP BY t.id_person
A standard SQL method, which also plays better with indexes, is to look for end dates before and after today separately.
SELECT t.*
FROM `table` t
JOIN (
SELECT p.id_person
,COALESCE(first_not_ended.end_date, last_ended.end_date) AS end_date
FROM (SELECT DISTINCT id_person FROM `table`) p
LEFT JOIN (
SELECT id_person, MIN(end_date) AS end_date
FROM `table`
WHERE end_date >= NOW()
GROUP BY id_person
) first_not_ended
ON p.id_person = first_not_ended.id_person
LEFT JOIN (
SELECT id_person, MAX(end_date) AS end_date
FROM `table`
WHERE end_date < NOW()
GROUP BY id_person
) last_ended
ON p.id_person = last_ended.id_person
) closest
ON t.id_person = closest.id_person
AND t.end_date = closest.end_date;
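The "first not ended, otherwise last ended" rule behind that query can be sketched in memory as follows. This is a minimal Python illustration under the same assumptions as above: NOW is a hypothetical fixed date, and the rows are the question's sample data.

```python
from datetime import date

NOW = date(2017, 8, 15)  # hypothetical "today" for a reproducible result

# (id, start_date, end_date, id_person), copied from the question's table.
rows = [
    (1, date(2017, 10, 1), date(2017, 12, 1), 1),
    (2, date(2017, 7, 1), date(2017, 9, 1), 1),
    (3, date(2016, 1, 1), date(2016, 2, 1), 1),
    (4, date(2016, 5, 1), date(2016, 6, 1), 2),
    (5, date(2016, 1, 1), date(2016, 2, 1), 2),
]


def closest_per_person(rows, now=NOW):
    not_ended = {}  # earliest end_date >= now, per person (first_not_ended)
    ended = {}      # latest end_date < now, per person (last_ended)
    for _id, _start, end, person in rows:
        if end >= now:
            not_ended[person] = min(not_ended.get(person, end), end)
        else:
            ended[person] = max(ended.get(person, end), end)
    # COALESCE(first_not_ended.end_date, last_ended.end_date) per person.
    people = set(not_ended) | set(ended)
    target = {p: not_ended.get(p, ended.get(p)) for p in people}
    # Join back on (id_person, end_date) to get the full rows.
    return [row for row in rows if row[2] == target[row[3]]]


print([row[0] for row in closest_per_person(rows)])
```

Each person needs only one MIN and one MAX lookup, which is why this shape can use an index on (id_person, end_date).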

MySQL set OFFSET depending on subquery

I'm trying to delete all records older than one week while keeping at least one for each user.
Example:
| ID | user | date | other columns...
| 1 | 1234 | -2 days | ...
| 2 | 1234 | -3 days | ...
| 3 | 1234 | -8 days | ...
| 4 | 5678 | -9 days | ...
| 5 | 5678 | -10 days | ...
Should become
| ID | user | date | other columns...
| 1 | 1234 | -2 days | ...
| 2 | 1234 | -3 days | ...
| 4 | 5678 | -9 days | ... // Keeping the most recent record for this user
So far I've written this, but it uses CASE to set OFFSET, which MySQL does not allow here (the LIMIT and OFFSET values must be integer constants), so it doesn't work:
DELETE FROM transactions WHERE ID < (
SELECT ID FROM (
SELECT ID FROM transactions t WHERE
DATE(date) <= DATE_SUB(CURDATE(), INTERVAL 7 DAY) AND
user = transactions.user
ORDER BY ID DESC
LIMIT 1 OFFSET CASE WHEN EXISTS (
SELECT ID FROM transactions x WHERE
DATE(date) > DATE_SUB(CURDATE(), INTERVAL 7 DAY) AND
user = transactions.user
) THEN 0 ELSE 1 END
)
)
So the question is: how do I fix the code above?
P.S.: I'm relatively new to anything beyond the most basic operations in SQL.
By grouping the transactions by user, you can determine those that you wish to preserve:
SELECT user, MAX(date) date
FROM transactions
GROUP BY user
You can then make an outer join between these results and your original table using the multiple-table DELETE syntax in order to delete only the desired records:
DELETE transactions
FROM transactions NATURAL LEFT JOIN (
SELECT user, MAX(date) date
FROM transactions
GROUP BY user
) t
WHERE date < CURRENT_DATE - INTERVAL 7 DAY
AND t.date IS NULL
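The effect of that DELETE (remove rows older than the cutoff unless they are their user's newest row) can be checked with a small in-memory Python sketch. TODAY is a hypothetical fixed date, and the rows reproduce the question's example as offsets from it.

```python
from datetime import date, timedelta

TODAY = date(2024, 1, 15)  # hypothetical "today"

# (ID, user, date), mirroring the question's example.
rows = [
    (1, 1234, TODAY - timedelta(days=2)),
    (2, 1234, TODAY - timedelta(days=3)),
    (3, 1234, TODAY - timedelta(days=8)),
    (4, 5678, TODAY - timedelta(days=9)),
    (5, 5678, TODAY - timedelta(days=10)),
]


def rows_to_keep(rows, today=TODAY, days=7):
    # GROUP BY user / MAX(date): each user's newest row is always preserved.
    newest = {}
    for _id, user, d in rows:
        newest[user] = max(newest.get(user, d), d)
    cutoff = today - timedelta(days=days)
    # Delete a row only if it is older than the cutoff AND has no match in
    # the grouped result (the "t.date IS NULL" part of the LEFT JOIN).
    return [r for r in rows if not (r[2] < cutoff and r[2] != newest[r[1]])]


print([r[0] for r in rows_to_keep(rows)])
```

This keeps rows 1, 2 and 4, matching the expected result in the question.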
Try:
DELETE FROM transactions
WHERE DATE(date) <= DATE_SUB(CURDATE(), INTERVAL 7 DAY)
AND (user, date) NOT IN (
-- the extra derived table lets MySQL read the table being deleted from
SELECT user, max_date FROM (
SELECT user, MAX(date) AS max_date
FROM transactions
GROUP BY user
) AS latest
);

MySQL Transform a date-range into single rows/count days per year?

I am looking for a solution to count the days in a date range per year. My table looks like this:
+----+-----------+------------+------------+
| id | source_id | start_date | end_date |
+----+-----------+------------+------------+
| 1 | 1 | 2015-11-01 | 2017-01-31 |
+----+-----------+------------+------------+
Now I want to count the days in between. Counting the whole range is easy with DATEDIFF(), but how do I do it per year?
I tried a kind of temporary transformation into one row per year, so that I could count and group:
+----+-----------+------------+------------+
| id | source_id | start_date | end_date |
+----+-----------+------------+------------+
| 1 | 1 | 2015-11-01 | 2015-12-31 |
+----+-----------+------------+------------+
| 1 | 1 | 2016-01-01 | 2016-12-31 |
+----+-----------+------------+------------+
| 1 | 1 | 2017-01-01 | 2017-01-31 |
+----+-----------+------------+------------+
EDIT:
The desired output should look like this:
+-----------+------+------+
| source_id | year | days |
+-----------+------+------+
| 1 | 2015 | 60 |
+-----------+------+------+
| 1 | 2016 | 365 |
+-----------+------+------+
| 1 | 2017 | 30 |
+-----------+------+------+
That way it becomes possible to sum all days grouped by source_id and year.
Is there an easy way to do it in MySQL?
Create another table that lists all the years:
CREATE TABLE years (
year_start DATE,
year_end DATE
);
INSERT INTO years VALUES
('2015-01-01', '2015-12-31'),
('2016-01-01', '2016-12-31'),
('2017-01-01', '2017-12-31');
Then you can join with this table:
SELECT t.source_id, YEAR(y.year_start) AS year, DATEDIFF(LEAST(year_end, end_date), GREATEST(year_start, start_date)) AS day_count
FROM yourTable AS t
JOIN years AS y
-- overlap test: also matches ranges that lie entirely inside one year
ON y.year_start <= t.end_date
AND y.year_end >= t.start_date
If you don't want to create a real table, you can use a subquery that creates it on the fly:
SELECT t.source_id, YEAR(y.year_start) AS year, DATEDIFF(LEAST(year_end, end_date), GREATEST(year_start, start_date)) AS day_count
FROM yourTable AS t
JOIN (SELECT CAST('2015-01-01' AS DATE) AS year_start, CAST('2015-12-31' AS DATE) AS year_end
UNION
SELECT CAST('2016-01-01' AS DATE) AS year_start, CAST('2016-12-31' AS DATE) AS year_end
UNION
SELECT CAST('2017-01-01' AS DATE) AS year_start, CAST('2017-12-31' AS DATE) AS year_end
) AS y
-- overlap test: also matches ranges that lie entirely inside one year
ON y.year_start <= t.end_date
AND y.year_end >= t.start_date
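The per-year split done by the join (clip the range to each overlapping year, then apply DATEDIFF(LEAST(...), GREATEST(...))) can be reproduced in a few lines of Python. This is a minimal sketch for checking the arithmetic, not a replacement for the SQL.

```python
from datetime import date


def days_per_year(start, end):
    """Clip [start, end] to each calendar year it overlaps and apply the
    same formula as the SQL: DATEDIFF(LEAST(year_end, end),
    GREATEST(year_start, start))."""
    result = {}
    for year in range(start.year, end.year + 1):
        year_start, year_end = date(year, 1, 1), date(year, 12, 31)
        lo = max(year_start, start)  # GREATEST(year_start, start_date)
        hi = min(year_end, end)      # LEAST(year_end, end_date)
        result[year] = (hi - lo).days
    return result


print(days_per_year(date(2015, 11, 1), date(2017, 1, 31)))
```

For the question's row this gives 60, 365 and 30. Like DATEDIFF, the formula does not count both endpoints of each yearly slice (2016 is a leap year with 366 days), which matches the desired output shown above; add 1 per year if you need inclusive day counts.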
I found another snippet and combined the two. It's more of a working hack than a solution, but it works well enough for my purpose.
SELECT r.source_id,
YEAR(y.year_start) AS year,
DATEDIFF(LEAST(year_end, end_date), GREATEST(year_start, start_date)) AS day_count,
r.start_date,
r.end_date
FROM ranges AS r
JOIN (
SELECT @i := @i + 1 AS YEAR,
CAST(CONCAT(@i, '-01-01') AS DATE) AS year_start,
CAST(CONCAT(@i, '-12-31') AS DATE) AS year_end
FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY,
(SELECT @i := 1899) AS i
) AS y
-- overlap test, so years strictly between start_date and end_date are included too
ON r.start_date <= y.year_end AND r.end_date >= y.year_start;
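I think the table INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY is just a workaround to do the iteration: it only serves as a source of rows, so it must have at least as many rows as the number of years you need (counting from 1900). Not nice, but maybe someone needs something like that.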

find all rows that have two minutes difference in date

I have a table ACQUISITION, with 1 720 208 rows.
------------------------------------------------------
| id | date | value |
|--------------|-------------------------|-----------|
| 1820188 | 2011-01-22 17:48:56 | 1.287 |
| 1820187 | 2011-01-21 21:55:11 | 2.312 |
| 1820186 | 2011-01-21 21:54:00 | 2.313 |
| 1820185 | 2011-01-20 17:46:10 | 1.755 |
| 1820184 | 2011-01-20 17:45:05 | 1.785 |
| 1820183 | 2011-01-19 18:21:02 | 2.001 |
------------------------------------------------------
Because of a problem, I need to find every row that is less than two minutes apart from another row.
Ideally, I should find these rows here:
| 1820187 | 2011-01-21 21:55:11 | 2.312 |
| 1820186 | 2011-01-21 21:54:00 | 2.313 |
| 1820185 | 2011-01-20 17:46:10 | 1.755 |
| 1820184 | 2011-01-20 17:45:05 | 1.785 |
I'm quite lost here; any ideas?
Let us restate your question in a subtle fashion so we can make this query complete before the heat-death of the universe.
"I need to know the consecutive records in the table with timestamps closer together than two minutes."
We can tie the notion of "consecutive" to your id values.
Try this query and see if you get decent performance (http://sqlfiddle.com/#!9/28738/2/0)
SELECT a.date first_date, a.id first_id, a.value first_value,
b.id second_id, b.value second_value,
TIMESTAMPDIFF(SECOND, a.date, b.date) delta_t
FROM thetable AS a
JOIN thetable AS b ON b.id = a.id + 1
AND b.date <= a.date + INTERVAL 2 MINUTE
The self-join workload is brought to heel with ON b.id = a.id + 1. And, avoiding a function on one of the two date column values allows the query to exploit any index that's available on that column.
Creating a covering index on (id, date, value) will help the performance of this query.
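The same "consecutive rows" idea can be illustrated in memory by ordering on the date itself rather than on id. In this minimal Python sketch (sample rows copied from the question), comparing only adjacent rows is enough: after sorting, a row's nearest neighbour in time is always adjacent to it, so any row that has some other row less than two minutes away is found.

```python
from datetime import datetime

# (id, date, value), copied from the question's sample.
rows = [
    (1820188, datetime(2011, 1, 22, 17, 48, 56), 1.287),
    (1820187, datetime(2011, 1, 21, 21, 55, 11), 2.312),
    (1820186, datetime(2011, 1, 21, 21, 54, 0), 2.313),
    (1820185, datetime(2011, 1, 20, 17, 46, 10), 1.755),
    (1820184, datetime(2011, 1, 20, 17, 45, 5), 1.785),
    (1820183, datetime(2011, 1, 19, 18, 21, 2), 2.001),
]


def close_neighbours(rows, max_seconds=120):
    """Sort by date, compare each row only with the next one, and collect
    both ids of every pair that is less than max_seconds apart."""
    ordered = sorted(rows, key=lambda r: r[1])
    hits = set()
    for a, b in zip(ordered, ordered[1:]):
        if (b[1] - a[1]).total_seconds() < max_seconds:
            hits.update((a[0], b[0]))
    return sorted(hits, reverse=True)


print(close_neighbours(rows))
```

This returns the four ids listed in the question. The sort costs O(n log n) instead of the O(n²) of an unrestricted self-join.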
If the consecutive-row assumption doesn't work in this dataset, you can try this, to compare each row to the next ten rows. It will be slower. (http://sqlfiddle.com/#!9/28738/6/0)
SELECT a.date first_date, a.id first_id, a.value first_value,
b.id second_id, b.value second_value,
TIMESTAMPDIFF(SECOND, a.date, b.date) delta_t
FROM thetable AS a
JOIN thetable AS b ON b.id <= a.id + 10
AND b.id > a.id
AND b.date <= a.date + INTERVAL 2 MINUTE
If the id values are entirely worthless as a way of ordering your rows, you'll need this. And, it will be very slow. (http://sqlfiddle.com/#!9/28738/5/0)
SELECT a.date first_date, a.id first_id, a.value first_value,
b.id second_id, b.value second_value,
TIMESTAMPDIFF(SECOND, a.date, b.date) delta_t
FROM thetable AS a
JOIN thetable AS b ON b.date <= a.date + INTERVAL 2 MINUTE
AND b.date > a.date
AND b.id <> a.id
Do a self-join on the table and use the TIMEDIFF() function, like this:
SELECT DISTINCT t1.*
FROM ACQUISITION t1
JOIN ACQUISITION t2
ON t1.id <> t2.id
AND ABS(TIME_TO_SEC(TIMEDIFF(t1.`date`, t2.`date`))) < 120;

Number of increments through period in MySQL

I think this question is going to be hard to solve.
I have a table in my database like this one:
+----+--------+-------+
| ID | MONTH | VALUE |
+----+--------+-------+
| 1 | 1-2000 | 20.00 |
| 1 | 2-2000 | 21.00 |
| 1 | 3-2000 | 7.00 |
| 1 | 4-2000 | 8.00 |
+----+--------+-------+
With the following definition:
ID INTEGER(7) ZEROFILL NOT NULL
MONTH VARCHAR(7) NOT NULL
VALUE DOUBLE(20,2)
What I'm trying to achieve is to retrieve the number of times, over a period, that the field {VALUE} increased from its previous value.
In the example above, if the period is from "1-2000" to "4-2000", {VALUE} has increased 2 times: [20.00->21.00, 7.00->8.00]
In the end, I would like to have the following output:
+----+------------+
| ID | NUM_OF_INC |
+----+------------+
| 1 | 2 |
+----+------------+
The main issue, as I see it, is that {MONTH} is not a DATE field (and, of course, it cannot be).
Is there any way to achieve this?
I'm afraid the solution is to fetch all the values and then compare them one by one in the application that executes the queries.
Due to your date format and MySQL's lack of CTEs (before version 8.0) to convert the values a single time, the query gets pretty verbose. This searches the whole range, but it's fairly easy to add a range check using the same pattern:
SELECT a.id, COUNT(*) NUM_OF_INC
FROM Table1 a
JOIN Table1 b
ON a.id = b.id
AND a.value < b.value
AND STR_TO_DATE(CONCAT(a.`MONTH`, '-1'), '%c-%Y-%d')
< STR_TO_DATE(CONCAT(b.`MONTH`, '-1'), '%c-%Y-%d')
LEFT JOIN Table1 c
ON a.id = c.id
AND STR_TO_DATE(CONCAT(a.`MONTH`, '-1'), '%c-%Y-%d')
< STR_TO_DATE(CONCAT(c.`MONTH`, '-1'), '%c-%Y-%d')
AND STR_TO_DATE(CONCAT(c.`MONTH`, '-1'), '%c-%Y-%d')
< STR_TO_DATE(CONCAT(b.`MONTH`, '-1'), '%c-%Y-%d')
WHERE c.id IS NULL
GROUP BY a.id;
Sadly, this query will definitely not use any index you have on MONTH.
If it is an option, consider changing the data type of MONTH to something you can calculate with, such as an integer month number. Then you can join each row to the previous month (MONTH - 1) and select those where the difference is > 0:
SELECT
t1.ID, COUNT(*) AS NUM_OF_INC
FROM
Entity t1
INNER JOIN Entity t2
ON t1.ID = t2.ID
AND t2.MONTH = t1.MONTH - 1
WHERE
t1.VALUE - t2.VALUE > 0
AND t1.MONTH BETWEEN :beginDate AND :endDate
GROUP BY t1.ID
If you can't change the data type, you have to replace t1.MONTH - 1 with some MySQL functions:
DATE_FORMAT(
SUBDATE(
STR_TO_DATE(CONCAT(t1.MONTH, "-1"), "%c-%Y-%d"),
INTERVAL 1 MONTH),
"%c-%Y")
and replace t1.MONTH BETWEEN :beginDate AND :endDate with:
STR_TO_DATE(CONCAT(t1.MONTH, "-1"), "%c-%Y-%d")
BETWEEN :beginDate AND :endDate
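To show what "something you can calculate with" could look like, here is a minimal Python sketch that turns the 'M-YYYY' strings into an integer month counter (year * 12 + month - 1, a hypothetical encoding chosen so that MONTH - 1 works across year boundaries) and then counts increases the way the join does. As in the query, the period filter applies only to the later month of each pair, and IDs with no increases do not appear.

```python
def month_index(month):
    """Turn an 'M-YYYY' string such as '3-2000' into a sortable counter."""
    m, y = month.split("-")
    return int(y) * 12 + int(m) - 1


def count_increases(rows, begin, end):
    """rows: (ID, MONTH, VALUE). Count months in [begin, end] whose VALUE
    is greater than the VALUE of the previous month for the same ID."""
    lo, hi = month_index(begin), month_index(end)
    series = {}
    for id_, month, value in rows:
        series.setdefault(id_, {})[month_index(month)] = value
    result = {}
    for id_, values in series.items():
        # Mirrors the INNER JOIN on MONTH - 1 plus the range filter on t1.
        count = sum(
            1
            for idx, value in values.items()
            if lo <= idx <= hi and idx - 1 in values
            and value - values[idx - 1] > 0
        )
        if count:  # like GROUP BY after an INNER JOIN: no row for 0 matches
            result[id_] = count
    return result


rows = [(1, "1-2000", 20.00), (1, "2-2000", 21.00),
        (1, "3-2000", 7.00), (1, "4-2000", 8.00)]
print(count_increases(rows, "1-2000", "4-2000"))
```

For the sample data this reports two increases for ID 1 (20.00 to 21.00 and 7.00 to 8.00).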