Get earliest date as Start date and latest date as end date - mysql

I have a requirement to get the earliest date as the start date and, if a latest date is present, to use it as the end date. If the latest date is blank, which means the person is still active, the end date should stay blank.
I used MIN and MAX on the date fields, but my end date is not returned as blank when the date is absent.

If you want to get the earliest Start_Date by ID, and also bring along whatever is in the End_date field, no matter whether it is NULL or holds a date, then you can first group by ID (which is not unique in your example), then use MIN() on Start_Date. Then you find which row these values belong to, and thereby get the End_date. This works, but if the same ID has several rows sharing its earliest start date, that complicates things, and in that case we need some more example data with a bit more explanation of how it is supposed to work. But here goes:
Fiddle: https://www.db-fiddle.com/f/o2NyDpAc76TLYdmGFGHqag/3
CREATE TABLE my_table (
ID int,
Start_Date date,
End_date date null
);
INSERT INTO my_table (ID,Start_Date, End_date)
VALUES
(1,'2021-01-01', '2022-04-05'),
(1,'2022-01-01', '2022-04-02'),
(2,'2022-07-01', '2022-05-07'),
(2,'2022-01-01', null);
SELECT a.*
FROM my_table a
JOIN (SELECT ID,
             MIN(Start_Date) AS Start_Date
      FROM my_table
      GROUP BY ID) jn
  ON a.ID = jn.ID AND a.Start_Date = jn.Start_Date;
Source table:
ID | Start_Date | End_date
1  | 2021-01-01 | 2022-04-05
1  | 2022-01-01 | 2022-04-02
2  | 2022-07-01 | 2022-05-07
2  | 2022-01-01 | NULL
Results table:
ID | Start_Date | End_date
1  | 2021-01-01 | 2022-04-05
2  | 2022-01-01 | NULL
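The join-back idea can be tried outside MySQL with a small sketch. It assumes Python's standard-library sqlite3 module as a stand-in for MySQL; SQLite has no real DATE type, so the dates are stored as ISO-8601 text, which sorts chronologically. It loads the rows from the INSERT above and runs the same query:

```python
import sqlite3

# Rows from the INSERT statement above; None plays the role of NULL.
ROWS = [
    (1, "2021-01-01", "2022-04-05"),
    (1, "2022-01-01", "2022-04-02"),
    (2, "2022-07-01", "2022-05-07"),
    (2, "2022-01-01", None),
]


def earliest_row_per_id(rows):
    """Return the complete row holding the earliest Start_Date of each ID."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (ID INTEGER, Start_Date TEXT, End_date TEXT)")
    con.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    # The derived table finds MIN(Start_Date) per ID; joining back on
    # (ID, Start_Date) recovers the End_date stored on that same row.
    query = """
        SELECT a.ID, a.Start_Date, a.End_date
        FROM my_table a
        JOIN (SELECT ID, MIN(Start_Date) AS Start_Date
              FROM my_table
              GROUP BY ID) jn
          ON a.ID = jn.ID AND a.Start_Date = jn.Start_Date
        ORDER BY a.ID
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(earliest_row_per_id(ROWS))
# [(1, '2021-01-01', '2022-04-05'), (2, '2022-01-01', None)]
```

If an ID had two rows sharing its earliest Start_Date, both would be returned, which is the complication mentioned above.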

This might work:
SELECT ID, MIN(start_date) Start_Date,
NULLIF(MAX(COALESCE(end_date,'29991231')), '29991231') End_Date
FROM MyTable
GROUP BY ID
See it work here:
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=5febc25e9c79840fe6aa2e55d77cf5d0
At least it appears to give the right results for the sample data available. However, it would still show a NULL if a record with an earlier start date has a NULL end date while a record with a later start date does have an end date. That should probably never happen in real data, but real data tends to be messy even when it shouldn't be.
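A minimal sketch of the sentinel trick and of that caveat, again assuming SQLite (via Python's sqlite3) as a stand-in for MySQL. Because SQLite compares dates as ISO-8601 text, the sentinel is written '2999-12-31' instead of '29991231'; the sample rows are hypothetical:

```python
import sqlite3

# Assumed never to occur as a real End_Date.
SENTINEL = "2999-12-31"


def min_start_max_end(rows):
    """MIN(Start_Date) and MAX(End_Date) per ID, where a NULL End_Date wins."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE MyTable (ID INTEGER, Start_Date TEXT, End_Date TEXT)")
    con.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", rows)
    # COALESCE turns NULL into the sentinel so MAX() returns it,
    # then NULLIF turns the sentinel back into NULL.
    query = """
        SELECT ID,
               MIN(Start_Date) AS Start_Date,
               NULLIF(MAX(COALESCE(End_Date, ?)), ?) AS End_Date
        FROM MyTable
        GROUP BY ID
        ORDER BY ID
    """
    result = con.execute(query, (SENTINEL, SENTINEL)).fetchall()
    con.close()
    return result


# Hypothetical rows: ID 1 has ended, ID 2 is still active.
tidy = [(1, "2021-01-01", "2022-04-05"), (1, "2022-01-01", "2022-04-02"),
        (2, "2022-01-01", None), (2, "2022-07-01", "2022-05-07")]
print(min_start_max_end(tidy))
# Messy case: the earlier row is open, but the later row has ended.
messy = [(3, "2020-01-01", None), (3, "2021-01-01", "2021-06-30")]
print(min_start_max_end(messy))  # End_Date still comes back as None
```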
To really do this properly, you need to find the whole row with the latest start date and then take the end date from that row. Fortunately, there is a great way to number the rows within each group: the row_number() window function:
SELECT ID, Start_Date, End_Date
FROM (
SELECT ID, Start_Date, End_Date,
row_number() over (PARTITION BY ID ORDER BY Start_Date DESC) rn
FROM MyTable
) t0
WHERE rn=1
But this is only part of the solution. It now always has the right End_Date, but it has the wrong Start_Date whenever an ID has more than one row, because it shows the latest start date instead of the earliest. We can fix that with a correlated subquery:
SELECT ID, (SELECT MIN(Start_Date) FROM MyTable t WHERE t.ID=t0.ID) Start_Date, End_Date
FROM (
SELECT ID, Start_Date, End_Date,
row_number() over (PARTITION BY ID ORDER BY Start_Date DESC) rn
FROM MyTable
) t0
WHERE rn=1
And now we will always get the right result.
See it work here:
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=4b7d4cba4849eee9ba3bf978cebfc3bf
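The same query can be checked against the messy case from the caveat above in a short sketch. It assumes SQLite 3.25 or later (through Python's sqlite3) as a stand-in for MySQL 8.0, since that is the first SQLite release with window functions; the rows are hypothetical:

```python
import sqlite3


def latest_row_with_first_start(rows):
    """End_Date from the row with the latest Start_Date, plus the earliest Start_Date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE MyTable (ID INTEGER, Start_Date TEXT, End_Date TEXT)")
    con.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", rows)
    query = """
        SELECT ID,
               -- correlated subquery swaps the latest Start_Date for the earliest
               (SELECT MIN(Start_Date) FROM MyTable t WHERE t.ID = t0.ID) AS Start_Date,
               End_Date
        FROM (
            SELECT ID, Start_Date, End_Date,
                   -- rn = 1 marks the row with the latest Start_Date per ID
                   ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Start_Date DESC) AS rn
            FROM MyTable
        ) t0
        WHERE rn = 1
        ORDER BY ID
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Hypothetical rows: ID 3 is the messy case, ID 4 is still active.
rows = [(3, "2020-01-01", None), (3, "2021-01-01", "2021-06-30"),
        (4, "2019-05-01", "2019-12-31"), (4, "2020-02-01", None)]
print(latest_row_with_first_start(rows))
# [(3, '2020-01-01', '2021-06-30'), (4, '2019-05-01', None)]
```

Unlike the sentinel version, ID 3 now gets the end date of its latest row.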
Finally, all this assumes you have a reasonable schema using NULL and DATE/DATETIME values, and not an unreasonable schema using varchar and empty strings. If the latter really is your situation, the schema design really is BROKEN and you should fix it.
This also assumes at least MySQL 8.0, since earlier versions have no window functions such as row_number(). If you're using something older than that, condolences: 5.7 and earlier are rooted in basic design from 2006 and don't really qualify as a modern database platform.

Related

How to select either one row or the other

I have a table with columns VAT, start and end date.
I have two rows. The standard entry has 0000-00-00 as the start and end date, and the other row has the start_date 2020-06-01 and the end_date 2020-12-31.
I want the VAT of the second row to be selected if today's date is between its start and end date; otherwise the standard VAT with 0000-00-00 should be selected.
This is my table:
I tried
SELECT *
FROM taxes
WHERE (CASE WHEN start_date < "2020-06-06"
AND end_date > "2020-06-06" THEN 1
ELSE 0
END) = 1
But I don't know how to formulate the else case, or whether it can work like this at all.
You can use order by and limit for this:
select t.*
from taxes t
where start_date = '0000-00-00' or
'2020-06-06' between start_date and end_date
order by start_date desc
limit 1;
The idea is that the first condition gets the "default" value. The second condition gets the matching condition. These two rows are then sorted, so the matching condition will be first -- if there is one.
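A small sketch of the ORDER BY ... LIMIT 1 fallback, assuming SQLite (via Python's sqlite3) in place of MySQL; the column name vat and both rates are hypothetical, since the question's table was not included:

```python
import sqlite3


def vat_for(day):
    """Pick the VAT valid on `day`, falling back to the '0000-00-00' default row."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table: column names and rates are illustrative assumptions.
    con.execute("CREATE TABLE taxes (vat INTEGER, start_date TEXT, end_date TEXT)")
    con.executemany("INSERT INTO taxes VALUES (?, ?, ?)", [
        (19, "0000-00-00", "0000-00-00"),  # standard entry
        (16, "2020-06-01", "2020-12-31"),  # temporary rate
    ])
    # Both the default row and a matching dated row may qualify; sorting by
    # start_date DESC puts the dated row first, and LIMIT 1 keeps only it.
    row = con.execute("""
        SELECT vat FROM taxes
        WHERE start_date = '0000-00-00'
           OR ? BETWEEN start_date AND end_date
        ORDER BY start_date DESC
        LIMIT 1
    """, (day,)).fetchone()
    con.close()
    return row[0]


print(vat_for("2020-06-06"))  # 16
print(vat_for("2021-01-15"))  # 19
```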
There might be ways of doing it with your suggested '0000-00-00' dates for start and end points, but in my view you run a much cleaner ship if you address the time spans individually, i.e. spell out the date ranges before and after the "exception period", like:
INSERT INTO vat (startdt,enddt,fullrate,reducedrate)
VALUES ('2000-01-01','2020-06-30',.19,.07), -- before
('2020-07-01','2020-12-31',.16,.05), -- exception period
('2021-01-01','2500-12-31',.19,.07); -- after
select * from vat where curdate() between startdt and enddt;
This way you document in a very clear way which rates were applicable when. And the query itself becomes trivial, see above and check out my demo here: https://rextester.com/YLYUU53617
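A sketch of the explicit-ranges table, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL and reusing the three ranges from the INSERT above. It also shows why the value being compared should be a date: a timestamp on the last day of a period sorts after that day's enddt and falls into a gap. MySQL behaves similarly when NOW() is compared with a DATE column, because the date is treated as midnight, which is why CURDATE() is the safer choice there:

```python
import sqlite3


def rates_on(moment):
    """Return (fullrate, reducedrate) for the period containing `moment`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE vat (startdt TEXT, enddt TEXT, fullrate REAL, reducedrate REAL)")
    # Contiguous, non-overlapping periods, as in the INSERT above.
    con.executemany("INSERT INTO vat VALUES (?, ?, ?, ?)", [
        ("2000-01-01", "2020-06-30", 0.19, 0.07),  # before
        ("2020-07-01", "2020-12-31", 0.16, 0.05),  # exception period
        ("2021-01-01", "2500-12-31", 0.19, 0.07),  # after
    ])
    row = con.execute(
        "SELECT fullrate, reducedrate FROM vat WHERE ? BETWEEN startdt AND enddt",
        (moment,),
    ).fetchone()
    con.close()
    return row


print(rates_on("2020-12-31"))           # (0.16, 0.05)
print(rates_on("2021-01-01"))           # (0.19, 0.07)
# A timestamp on the last day sorts after '2020-12-31', so nothing matches:
print(rates_on("2020-12-31 15:00:00"))  # None
```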
SELECT *
FROM taxes
WHERE tax_id=IF(start_date < "2020-06-06" AND end_date > "2020-06-06", 1, 0)
You can find the records for the current date, then combine that set with the rows of the source table whose start_date is '0000-00-00', excluding the country codes already present in the first set:
with
current_taxes as (
select *
from taxes
where current_date between start_date and end_date
)
select *
from current_taxes
union all
select taxes.*
from taxes
left join current_taxes
using (country_code)
where taxes.start_date='0000-00-00'
and current_taxes.country_code is null
;
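A sketch of the CTE plus anti-join, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL 8.0. The country codes, rates and the vat column are hypothetical, and the join is written with ON instead of USING to make the matched column explicit. Note that the second branch selects only the taxes columns, so both halves of the UNION ALL have the same number of columns:

```python
import sqlite3


def taxes_in_force(day):
    """Current (country_code, vat) per country, defaulting to the '0000-00-00' row."""
    con = sqlite3.connect(":memory:")
    # Hypothetical data: DE has a temporary rate, AT only has its default.
    con.execute("CREATE TABLE taxes (country_code TEXT, vat INTEGER, start_date TEXT, end_date TEXT)")
    con.executemany("INSERT INTO taxes VALUES (?, ?, ?, ?)", [
        ("DE", 19, "0000-00-00", "0000-00-00"),
        ("DE", 16, "2020-07-01", "2020-12-31"),
        ("AT", 20, "0000-00-00", "0000-00-00"),
    ])
    query = """
        WITH current_taxes AS (
            SELECT * FROM taxes WHERE ? BETWEEN start_date AND end_date
        )
        SELECT country_code, vat FROM current_taxes
        UNION ALL
        -- default rows only for countries without a current dated row
        SELECT taxes.country_code, taxes.vat
        FROM taxes
        LEFT JOIN current_taxes ON current_taxes.country_code = taxes.country_code
        WHERE taxes.start_date = '0000-00-00'
          AND current_taxes.country_code IS NULL
        ORDER BY 1  -- sort by the first output column (country_code)
    """
    result = con.execute(query, (day,)).fetchall()
    con.close()
    return result


print(taxes_in_force("2020-08-01"))  # [('AT', 20), ('DE', 16)]
print(taxes_in_force("2021-03-01"))  # [('AT', 20), ('DE', 19)]
```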

SQL Performance on selecting first/last row for each user on bigger data table

I have read through quite a few posts with greatest-n-per-group but still don't seem to find a good solution in terms of performance. I'm running 10.1.43-MariaDB.
I'm trying to get the change in data values over a given time frame, so I need the earliest and the latest row in that period. The largest number of rows in a time frame that currently needs to be processed is around 700k, and it's only going to grow. For now I have resorted to running two queries, one for the latest and one for the earliest date, but even this performs slowly at the moment. The table looks like this:
user_id | data | date
4567 | 109 | 28/06/2019 11:04:45
4252 | 309 | 18/06/2019 11:04:45
4567 | 77 | 18/02/2019 11:04:45
7893 | 1123 | 22/06/2019 11:04:45
4252 | 303 | 11/06/2019 11:04:45
4252 | 317 | 19/06/2019 11:04:45
The date and user_id columns are indexed. The rows are not stored in any particular order in the database, if that makes a difference.
The furthest I have gotten with this issue is a query like this for the current one-year period (700k data points):
SELECT user_id,
MIN(date) as date, data
FROM datapoint_table
WHERE date >= '2019-01-14'
GROUP BY user_id
This gives me the right date and user_id very fast, in around 0.05 s. But, as is usual with greatest-n-per-group problems, the rest of the row (data in this case) does not come from the same row as the date. I have read other similar questions and tried a subquery like this:
SELECT a.user_id, a.date, a.data
FROM datapoint_table a
INNER JOIN (
SELECT datapoint_table.user_id,
MIN(date) as date, data
FROM datapoint_table
WHERE date >= '2019-01-01'
GROUP BY user_id
) b ON a.user_id = b.user_id AND a.date = b.date
This query takes around 15 s to complete and returns the correct data value. But 15 s is just way too long, and I must be doing something wrong when the first query is so fast. I also tried computing MAX(data) - MIN(data) with GROUP BY user_id, but that was slow as well.
What would be more efficient way of getting the same data value as the date or even the difference in latest and earliest data for each user?
Assuming you are using a fairly recent version of either MariaDB or MySQL (window functions need MariaDB 10.2 or MySQL 8.0 or later, so not the 10.1.43 you mention), ROW_NUMBER would probably be the most efficient way to find the earliest record for each user:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date) rn
FROM datapoint_table
WHERE date > '2019-01-14'
)
SELECT user_id, data, date
FROM cte
WHERE rn = 1;
To the above you could also consider adding the following index:
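CREATE INDEX idx_user_date ON datapoint_table (user_id, date);
You could also try the following variant index with the columns reversed (CREATE INDEX requires an index name; the names used here are only examples):
CREATE INDEX idx_date_user ON datapoint_table (date, user_id);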
It is not clear which version of the index would perform the best, which would depend on your data and the execution plan. Ideally one of the above two indices would help the database execute ROW_NUMBER, along with the WHERE clause.
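A runnable sketch of the ROW_NUMBER approach on the question's sample rows (dates rewritten as ISO-8601 text). It assumes SQLite 3.25 or later, via Python's sqlite3, as a stand-in for MariaDB 10.2+/MySQL 8.0, so it says nothing about performance on 700k rows. The second function is an illustrative extension toward the asker's actual goal: numbering the rows in both directions finds the earliest and the latest row in the same query, so data(latest) - data(earliest) needs no second query:

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO-8601 text.
POINTS = [
    (4567, 109, "2019-06-28 11:04:45"),
    (4252, 309, "2019-06-18 11:04:45"),
    (4567, 77, "2019-02-18 11:04:45"),
    (7893, 1123, "2019-06-22 11:04:45"),
    (4252, 303, "2019-06-11 11:04:45"),
    (4252, 317, "2019-06-19 11:04:45"),
]


def load(points):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE datapoint_table (user_id INTEGER, data INTEGER, date TEXT)")
    con.execute("CREATE INDEX idx_user_date ON datapoint_table (user_id, date)")
    con.executemany("INSERT INTO datapoint_table VALUES (?, ?, ?)", points)
    return con


def earliest_rows(con, since):
    # rn = 1 is the first row per user_id inside the time frame.
    return con.execute("""
        WITH cte AS (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date) AS rn
            FROM datapoint_table
            WHERE date > ?
        )
        SELECT user_id, data, date FROM cte WHERE rn = 1 ORDER BY user_id
    """, (since,)).fetchall()


def change_per_user(con, since):
    # rn_first = 1 is the earliest row, rn_last = 1 the latest row per user.
    return con.execute("""
        WITH ranked AS (
            SELECT user_id, data,
                   ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date) AS rn_first,
                   ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date DESC) AS rn_last
            FROM datapoint_table
            WHERE date > ?
        )
        SELECT user_id,
               MAX(CASE WHEN rn_last = 1 THEN data END)
             - MAX(CASE WHEN rn_first = 1 THEN data END) AS data_change
        FROM ranked
        GROUP BY user_id
        ORDER BY user_id
    """, (since,)).fetchall()


con = load(POINTS)
print(earliest_rows(con, "2019-01-14"))
print(change_per_user(con, "2019-01-14"))  # [(4252, 14), (4567, 32), (7893, 0)]
con.close()
```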
If your database version does not support ROW_NUMBER, then you may continue with your current approach:
SELECT d1.user_id, d1.data, d1.date
FROM datapoint_table d1
INNER JOIN
(
SELECT user_id, MIN(date) AS min_date
FROM datapoint_table
WHERE date > '2019-01-14'
GROUP BY user_id
) d2
ON d1.user_id = d2.user_id AND d1.date = d2.min_date
WHERE
d1.date > '2019-01-14';
Again, the indices suggested should at least speed up the execution of the GROUP BY subquery.
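The join-back fallback needs no window functions, so it should also run on MariaDB 10.1. Here is a small sketch on the question's sample rows, again assuming SQLite (via Python's sqlite3) as the stand-in engine. It returns the same rows as the ROW_NUMBER version, although ties on the minimum date would produce extra rows:

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO-8601 text.
POINTS = [
    (4567, 109, "2019-06-28 11:04:45"),
    (4252, 309, "2019-06-18 11:04:45"),
    (4567, 77, "2019-02-18 11:04:45"),
    (7893, 1123, "2019-06-22 11:04:45"),
    (4252, 303, "2019-06-11 11:04:45"),
    (4252, 317, "2019-06-19 11:04:45"),
]


def earliest_rows_join(points, since):
    """Earliest row per user via MIN() plus a join back, for engines without ROW_NUMBER."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE datapoint_table (user_id INTEGER, data INTEGER, date TEXT)")
    con.execute("CREATE INDEX idx_user_date ON datapoint_table (user_id, date)")
    con.executemany("INSERT INTO datapoint_table VALUES (?, ?, ?)", points)
    result = con.execute("""
        SELECT d1.user_id, d1.data, d1.date
        FROM datapoint_table d1
        INNER JOIN (
            SELECT user_id, MIN(date) AS min_date
            FROM datapoint_table
            WHERE date > ?
            GROUP BY user_id
        ) d2
          ON d1.user_id = d2.user_id AND d1.date = d2.min_date
        WHERE d1.date > ?
        ORDER BY d1.user_id
    """, (since, since)).fetchall()
    con.close()
    return result


print(earliest_rows_join(POINTS, "2019-01-14"))
```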

Select NULL otherwise latest date per group

I am trying to pick the account whose End Date is NULL first, and otherwise the one with the latest date, when there are several accounts with the same item.
Table Sample
Result expected
Select distinct *
from Sample
where End_Date is null
Need help to display the output.
Select *
from Sample
order by End_Date is not null, End_date desc
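A quick sketch of how this ORDER BY behaves, assuming SQLite (via Python's sqlite3) in place of MySQL and hypothetical rows, since the question's table was only shown as an image. It sorts every row with the NULL end dates first and then the newest dates; it does not by itself keep just one row per item:

```python
import sqlite3

# Hypothetical rows standing in for the question's missing table.
SAMPLE = [
    ("A", "acc1", None),
    ("A", "acc2", "2021-05-01"),
    ("B", "acc3", "2020-01-31"),
    ("B", "acc4", "2022-03-15"),
]


def accounts_in_order(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Sample (Item TEXT, Account TEXT, End_Date TEXT)")
    con.executemany("INSERT INTO Sample VALUES (?, ?, ?)", rows)
    # End_Date IS NOT NULL is 0 for NULL rows and 1 otherwise, so NULLs sort first;
    # the second key then orders the dated rows newest first.
    result = [acc for (acc,) in con.execute(
        "SELECT Account FROM Sample ORDER BY End_Date IS NOT NULL, End_Date DESC")]
    con.close()
    return result


print(accounts_in_order(SAMPLE))  # ['acc1', 'acc4', 'acc2', 'acc3']
```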
Based on the sample, it seems to me you need a UNION and a correlated NOT EXISTS subquery:
select * from table_name t where t.enddate is null
union
select * from table_name t
where t.enddate = (select max(enddate) from table_name t1 where t1.Item = t.Item and t1.Account = t.Account)
and not exists (select 1 from table_name t2 where t2.enddate is null and
t2.Item = t.Item
)
SELECT * FROM YourTable ORDER BY End_Date IS NOT NULL, End_Date DESC
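In a derived table, you can determine the end_date_to_consider for every Item (using GROUP BY Item). If any end_date in the group is NULL, we consider NULL; otherwise we consider the MAX() date. Because aggregate functions such as MIN() and MAX() ignore NULLs, MIN(end_date) is only NULL when every end_date of the Item is NULL, so the check compares COUNT(*) with COUNT(end_date) instead: COUNT(end_date) skips NULLs and is therefore smaller whenever at least one NULL is present.
Now, we can join this back to the main table on Item and the end_date to get the required rows.
Try:
SELECT t.*
FROM
Sample AS t
JOIN
(
SELECT
Item,
IF(COUNT(*) > COUNT(end_date),
NULL,
MAX(end_date)) AS end_date_to_consider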
FROM Sample
GROUP BY Item
) AS dt
ON dt.Item = t.Item AND
(dt.end_date_to_consider = t.end_date OR
(dt.end_date_to_consider IS NULL AND
t.end_date IS NULL)
)
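A sketch of the derived-table approach, assuming SQLite (via Python's sqlite3) in place of MySQL, with CASE standing in for MySQL's IF() and hypothetical sample rows. It runs the same query with two different "has a NULL" checks to show that MIN() and MAX() skip NULLs: COUNT(*) > COUNT(end_date) spots item A's open account, while MIN(end_date) IS NULL misses it. The check is interpolated into the SQL as a fixed literal purely for this comparison:

```python
import sqlite3

# Hypothetical rows standing in for the question's missing table.
SAMPLE = [
    ("A", "acc1", None),
    ("A", "acc2", "2021-05-01"),
    ("B", "acc3", "2020-01-31"),
    ("B", "acc4", "2022-03-15"),
]


def pick_rows(rows, null_check):
    """One row per Item: the NULL end_date row if any, else the latest end_date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Sample (Item TEXT, Account TEXT, end_date TEXT)")
    con.executemany("INSERT INTO Sample VALUES (?, ?, ?)", rows)
    # SQLite has no IF(), so CASE stands in for it.
    query = f"""
        SELECT t.Item, t.Account, t.end_date
        FROM Sample AS t
        JOIN (
            SELECT Item,
                   CASE WHEN {null_check} THEN NULL ELSE MAX(end_date) END
                       AS end_date_to_consider
            FROM Sample
            GROUP BY Item
        ) AS dt
          ON dt.Item = t.Item
         AND (dt.end_date_to_consider = t.end_date
              OR (dt.end_date_to_consider IS NULL AND t.end_date IS NULL))
        ORDER BY t.Item
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# COUNT(end_date) skips NULLs, so it is smaller than COUNT(*) when any NULL exists.
print(pick_rows(SAMPLE, "COUNT(*) > COUNT(end_date)"))
# [('A', 'acc1', None), ('B', 'acc4', '2022-03-15')]
# MIN() also skips NULLs, so this check misses item A's open account.
print(pick_rows(SAMPLE, "MIN(end_date) IS NULL"))
# [('A', 'acc2', '2021-05-01'), ('B', 'acc4', '2022-03-15')]
```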
First of all you should state clearly which result rows you want: You want one result row per Item and TOU. For each Item/TOU pair you want the row with highest date, with null having precedence (i.e. being considered the highest possible date).
Is this correct? Does that work with your real accounts? In your example it is always that all rows for one account have a higher date than all other account rows. If that is not the case with your real accounts, you need something more sophisticated than the following solution.
The highest date you can store in MySQL is 9999-12-31. Use this to treat the null dates as desired. Then it's just two steps:
Get the highest date per item and tou.
Get the row for these item, tou and date.
The query:
select * from
sample
where (item, tou, coalesce(enddate, date '9999-12-31')) in
(
select item, tou, max(coalesce(enddate, date '9999-12-31'))
from sample
group by item, tou
)
order by item, tou;
(If it is possible for your enddate to have the value 9999-12-31 and you want null have precedence over this, then you must consider this in the query, i.e. you can no longer simply use this date in case of null, and the query will get more complicated.)
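A sketch of the two steps with a row-value IN, assuming SQLite 3.15 or later (via Python's sqlite3) as a stand-in for MySQL. SQLite has no DATE literal, so the sentinel is plain ISO-8601 text, and the rows are hypothetical:

```python
import sqlite3

# Assumed to be later than any real enddate; SQLite compares ISO-8601 text.
MAX_DATE = "9999-12-31"


def latest_per_item_tou(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sample (item TEXT, tou TEXT, enddate TEXT)")
    con.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)
    # Step 1 (subquery): highest date per (item, tou), with NULL mapped to MAX_DATE.
    # Step 2 (row-value IN): fetch the rows carrying that item, tou and date.
    result = con.execute("""
        SELECT item, tou, enddate
        FROM sample
        WHERE (item, tou, COALESCE(enddate, :d)) IN (
            SELECT item, tou, MAX(COALESCE(enddate, :d))
            FROM sample
            GROUP BY item, tou
        )
        ORDER BY item, tou
    """, {"d": MAX_DATE}).fetchall()
    con.close()
    return result


# Hypothetical rows: the 'peak' pair still has an open (NULL) row.
rows = [("X", "peak", None), ("X", "peak", "2021-01-31"),
        ("X", "off", "2020-06-30"), ("X", "off", "2021-06-30")]
print(latest_per_item_tou(rows))  # [('X', 'off', '2021-06-30'), ('X', 'peak', None)]
```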

related to query using SQL

In Oracle SQL, how do I get the count of newly added customers for the months of April and May only, making sure they weren't there in the previous months?
SELECT CUSTOMER ID , COUNT(*)
FROM TABLE
WHERE DATE BETWEEN '1-APR-2018' AND '31-MAY-2018' AND ...
If we use max(date) and min(date), can we compare the dates to check whether this customer is new?
The expected output is a count per month:
April | ---
May | ---
It should show exactly how many new customers joined in each of these two months.
One approach is to use aggregation:
select customer_id, min(date) as min_date
from t
group by customer_id
having min(date) >= date '2018-04-01' and
min(date) < date '2018-06-01';
This gets the list of customers (which your query seems to be doing). To get the count, just use count(*) and make this a subquery.
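A sketch of that subquery, counting new customers per month. It assumes SQLite (via Python's sqlite3) instead of Oracle, so the month is extracted with strftime('%Y-%m', ...); in Oracle, TRUNC(min_date, 'MM') or TO_CHAR(min_date, 'YYYY-MM') would play that role. The table name t comes from the answer, while the column order_date and the rows are hypothetical:

```python
import sqlite3


def new_customers_per_month(rows):
    """Count customers whose first-ever row falls in April or May 2018, per month."""
    con = sqlite3.connect(":memory:")
    # Hypothetical column name; the question's column is called DATE.
    con.execute("CREATE TABLE t (customer_id INTEGER, order_date TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = con.execute("""
        SELECT strftime('%Y-%m', min_date) AS month, COUNT(*) AS new_customers
        FROM (
            -- first appearance per customer, kept only if it lies in the window
            SELECT customer_id, MIN(order_date) AS min_date
            FROM t
            GROUP BY customer_id
            HAVING MIN(order_date) >= '2018-04-01' AND MIN(order_date) < '2018-06-01'
        ) first_seen
        GROUP BY month
        ORDER BY month
    """).fetchall()
    con.close()
    return result


rows = [(1, "2018-03-10"), (1, "2018-04-05"),  # existed before April: not new
        (2, "2018-04-12"), (2, "2018-05-02"),
        (3, "2018-05-20"),
        (4, "2018-06-01"),                     # first seen in June: outside the window
        (5, "2018-04-30")]
print(new_customers_per_month(rows))  # [('2018-04', 2), ('2018-05', 1)]
```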

MySQL Group Grouped By Result

I have a very simple table which consists of the following columns:
id | customer_id | total | created_at
I was running this query to get the results per day for the last ten days:
SELECT SUM(total) AS total, DATE_FORMAT(created_at, "%d/%m/%Y") AS date
FROM table
WHERE created_at BETWEEN "2017-02-20" AND "2017-03-01"
GROUP BY created_at
ORDER BY created_at DESC
This works fine, but I've just noticed an issue with imported rows being duplicated for some reason, so I'd like to update the query to handle that situation if it ever happens again. In other words, it should count only one row instead of all of them when the date and customer id are the same (the total is also identical).
If I add customer_id to the GROUP BY, that seems to work, but then the query returns a result per day for each customer, when I only want the overall total.
I've tried a couple of things but haven't cracked it yet. I think it will be achievable using a subquery and/or an inner join; I have tried this so far, but the figures are very wrong:
SELECT
created_at,
(
SELECT SUM(total)
FROM table test
WHERE test.created_at = table.created_at
AND test.customer_id = table.customer_id
GROUP BY customer_id, created_at
LIMIT 1
) AS total
FROM table
WHERE created_at BETWEEN "2017-02-20" AND "2017-03-01"
GROUP BY created_at
ORDER BY created_at DESC
It's also a large table so finding a performant way to do this is also important.
First, are you sure that created_at is a date and not a datetime? This makes a big difference.
You can do what you want using two levels of aggregation:
SELECT SUM(max_total) AS total, DATE_FORMAT(created_at, '%d/%m/%Y') AS date
FROM (SELECT t.customer_id, t.created_at, MAX(total) as max_total
FROM table t
WHERE t.created_at BETWEEN '2017-02-20' AND '2017-03-01'
GROUP BY t.customer_id, t.created_at
) t
GROUP BY created_at
ORDER BY created_at DESC;
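A sketch of the two-level aggregation, assuming SQLite (via Python's sqlite3) in place of MySQL, with strftime() standing in for DATE_FORMAT(). Because `table` is a reserved word, the hypothetical name orders is used, and the rows are made up to include one duplicated import:

```python
import sqlite3


def daily_totals(rows):
    """Per-day totals that count duplicated (customer_id, created_at) rows once."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL, created_at TEXT)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    result = con.execute("""
        SELECT strftime('%d/%m/%Y', created_at) AS day, SUM(max_total) AS total
        FROM (
            -- inner level collapses duplicates to one value per customer and day
            SELECT customer_id, created_at, MAX(total) AS max_total
            FROM orders
            WHERE created_at BETWEEN '2017-02-20' AND '2017-03-01'
            GROUP BY customer_id, created_at
        ) t
        GROUP BY created_at
        ORDER BY created_at DESC
    """).fetchall()
    con.close()
    return result


rows = [(1, 10, 50.0, "2017-02-21"),
        (2, 10, 50.0, "2017-02-21"),  # duplicate import of row 1
        (3, 11, 20.0, "2017-02-21"),
        (4, 10, 30.0, "2017-02-22")]
print(daily_totals(rows))  # [('22/02/2017', 30.0), ('21/02/2017', 70.0)]
```

The trade-off is that two genuinely different orders by the same customer with the same created_at value would also be collapsed into one, which is one reason the date-versus-datetime question matters.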