Split logic in SQL - mysql

I am looking for an SQL query that takes the following table data as input:
ID Start date End date
ID1 01.01.2016 31.12.2016
ID2 01.02.2016 30.06.2016
ID3 01.10.2016 31.10.2016
ID4 01.02.2016 31.07.2016
and produces the following output:
ID1 01.01.2016 31.01.2016
ID4 01.02.2016 31.07.2016
ID1 01.08.2016 30.09.2016
ID3 01.10.2016 31.10.2016
ID1 01.11.2016 31.12.2016
Note that ID1 is split into several pieces because ID2, ID3 and ID4 overlap with it.
The idea is that the latest date range takes precedence. In the output, ID2 is rejected completely because it is overridden by ID4.
Could you please post some hints for the query?

I have thought up some logic for doing this. I can give the syntax for SQL Server (you would have to find the equivalent syntax for MySQL). Replace [Table Name] with the name of your table in the queries that follow.
Get the maximum end date from your table and store it in a variable:
declare @max_end_date datetime
set @max_end_date = (select max([end date]) from [Table Name])
Collect every possible segment start date into a temporary table: each range's start date, plus the day after each range's end date (except the maximum end date). Add a row number and two extra columns, an empty id and a dupe flag of 0:
select
row_number()over(order by start_date) as row,
start_date,
cast('' as varchar(50)) as id, -- explicit width (50 is an assumption) so the IDs fit later
0 as dupe
into #temp
from
(select [start date] as start_date
from [Table Name]
union
select dateadd(day,1,[end date])
from [Table Name]
where [end date] <> @max_end_date) as d
group by start_date
Update each row of the temporary table with the ID in effect on that start date. Where several ranges overlap, max(b.id) keeps the highest ID, treated here as the latest range (the IDs are compared as strings):
update #temp
set #temp.id = c.id
from #temp
left outer join
(select a.start_date, max(b.id) as id
from #temp a
inner join [Table Name] b
on a.start_date between b.[start date] and b.[end date]
group by a.start_date) c
on #temp.start_date = c.start_date
Flag 'duplicate' rows, i.e. rows with the same ID as the row immediately before them:
update a
set a.dupe = 1
from #temp a
inner join #temp b
on a.row = b.row + 1
and a.id = b.id
Delete the duplicate rows:
delete from #temp where dupe = 1
Renumber the row column (a window function cannot be used directly in an UPDATE's SET clause, so update through a CTE):
;with renumbered as
(select row, row_number()over(order by start_date) as new_row
from #temp)
update renumbered
set row = new_row
Join this table to itself to create the table you are after:
select
a.id as id,
a.start_date as start_date,
isnull(dateadd(day,-1,b.start_date),@max_end_date) as end_date
from #temp a
left outer join #temp b
on a.row = b.row - 1
order by a.row
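The same steps can be sketched outside the database to check the logic. This is a minimal Python version, assuming the ranges arrive as (id, start, end) tuples with inclusive dates. Like max(b.id) above, it lets the highest ID win and compares IDs as strings, so a hypothetical 'ID10' would sort before 'ID2'.

```python
from datetime import date, timedelta


def split_ranges(ranges):
    """Split overlapping (id, start, end) ranges; where ranges overlap, the highest id wins.

    Follows the temp-table steps: collect candidate start dates, pick the winning id
    for each, drop rows that repeat the previous id, then end each segment the day
    before the next segment starts.
    """
    max_end = max(end for _, _, end in ranges)
    # Every start date, plus the day after every end date except the latest one.
    points = {start for _, start, _ in ranges}
    points |= {end + timedelta(days=1) for _, _, end in ranges if end != max_end}

    rows = []  # (winning id, segment start)
    for point in sorted(points):
        covering = [rid for rid, start, end in ranges if start <= point <= end]
        winner = max(covering) if covering else None  # like max(b.id); None means a gap
        if rows and rows[-1][0] == winner:
            continue  # same id as the previous row: the "dupe" case
        rows.append((winner, point))

    result = []
    for i, (rid, start) in enumerate(rows):
        end = rows[i + 1][1] - timedelta(days=1) if i + 1 < len(rows) else max_end
        if rid is not None:  # skip gaps that no range covers
            result.append((rid, start, end))
    return result


data = [
    ("ID1", date(2016, 1, 1), date(2016, 12, 31)),
    ("ID2", date(2016, 2, 1), date(2016, 6, 30)),
    ("ID3", date(2016, 10, 1), date(2016, 10, 31)),
    ("ID4", date(2016, 2, 1), date(2016, 7, 31)),
]
for rid, start, end in split_ranges(data):
    print(rid, start.strftime("%d.%m.%Y"), end.strftime("%d.%m.%Y"))
```

On the question's sample data this yields the five expected rows, with ID2 dropped entirely.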

How can we return only unique records from table?

I have a table structure like this:
CREATE TABLE yourTable (
`Source` VARCHAR(20),
`Destination` VARCHAR(20),
`Distance` Integer
);
INSERT INTO yourTable
(`Source`, `Destination`, `Distance`)
VALUES
('Buffalo', 'Rochester', 2200),
('Yonkers', 'Syracuse', 1400),
('Cheektowaga', 'Schenectady', 600),
('Rochester', 'Buffalo', 2200);
How can we return only unique records? For example, 'Buffalo' and 'Rochester' appear in rows 1 and 4 (in opposite directions), so only one of those rows should be returned.
I tried this query, but it swaps the source and destination in row 3 (it returns Schenectady, Cheektowaga):
SELECT DISTINCT GREATEST(Source, Destination) as Source, LEAST(Source, Destination) AS Destination, Distance
FROM yourTable
Use two queries that you combine with UNION ALL. One query returns the rows that have no reversed twin, and the other returns a single row for each pair that appears in both directions.
SELECT t1.Source, t1.Destination, t1.Distance
FROM yourTable AS t1
LEFT JOIN yourTable AS t2 ON t1.Source = t2.Destination AND t1.Destination = t2.Source
WHERE t2.Source IS NULL
UNION ALL
SELECT GREATEST(Source, Destination) AS s, LEAST(Source, Destination) AS d, MAX(Distance) AS Distance
FROM yourTable
GROUP BY s, d
HAVING COUNT(*) > 1
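The UNION ALL answer can be tried without a MySQL server. This Python sketch runs it against an in-memory SQLite database; SQLite has no GREATEST()/LEAST(), so its multi-argument scalar MAX()/MIN() stand in for them, and the GROUP BY repeats those expressions instead of relying on the aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (Source TEXT, Destination TEXT, Distance INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        ("Buffalo", "Rochester", 2200),
        ("Yonkers", "Syracuse", 1400),
        ("Cheektowaga", "Schenectady", 600),
        ("Rochester", "Buffalo", 2200),
    ],
)

query = """
SELECT t1.Source, t1.Destination, t1.Distance
FROM yourTable AS t1
LEFT JOIN yourTable AS t2
  ON t1.Source = t2.Destination AND t1.Destination = t2.Source
WHERE t2.Source IS NULL            -- rows with no reversed twin keep their order
UNION ALL
SELECT MAX(Source, Destination), MIN(Source, Destination), MAX(Distance)
FROM yourTable
GROUP BY MAX(Source, Destination), MIN(Source, Destination)
HAVING COUNT(*) > 1                -- one row per pair stored in both directions
"""
unique_rows = sorted(conn.execute(query).fetchall())
print(unique_rows)
```

Cheektowaga/Schenectady keeps its original direction here, which is the row the GREATEST/LEAST-only query got wrong.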
Try this:
select * from yourTable group by greatest(source,destination);

Getting error while implementing CASE WHEN statement in SQL

I have to write a query that checks, for every salesman, whether sales increased (Yes) or decreased (No) compared with his previous year's performance, and shows the result in a column SaleIncreased.
The sample output should be as follows:
EmpID salesman year SaleIncreased
7843921 John 2016 Null
7843921 John 2017 Yes
7843934 Neil 2016 Null
7843934 Neil 2017 No
I have used a self join with a CASE WHEN statement as follows:
select t1.empid, t1.salesman, t1.year
from Sales_temp as t1
inner join Sales_temp as t2
on t1.empid = t2.empid and t2.year = t1.year - 1
case when t1.sale > t2.sale
then 'Yes'
else 'No'
end as 'SaleIncreased'
I'm unable to get the desired output.
Your CASE expression appears to be out of place, and you probably intended for it to be in the SELECT clause:
SELECT
t1.empid,
t1.salesman,
t1.year,
CASE WHEN t2.sale IS NULL THEN NULL
WHEN t1.sale > t2.sale THEN 'Yes'
ELSE 'No'
END AS SaleIncreased
FROM Sales_temp AS t1
LEFT JOIN Sales_temp AS t2
ON t1.empid = t2.empid AND t2.year = t1.year - 1
ORDER BY
t1.empid,
t1.year;
Another change I made is to use a left join instead of an inner join. This is important because it ensures that the earliest year for each employee still appears in the result set. Those rows have no previous year to compare with, so t2.sale is NULL; since a comparison with NULL is never true, the CASE needs the explicit IS NULL branch to return NULL for them instead of falling through to 'No'.
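To see why the IS NULL branch matters, this Python sketch runs both CASE variants over the same LEFT JOIN in an in-memory SQLite database. The sale figures are borrowed from the sample data in the next answer, since the question does not give any.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales_temp (empid INTEGER, salesman TEXT, year INTEGER, sale INTEGER)")
conn.executemany(
    "INSERT INTO Sales_temp VALUES (?, ?, ?, ?)",
    [
        (7843921, "John", 2016, 100),
        (7843921, "John", 2017, 150),
        (7843934, "Neil", 2016, 120),
        (7843934, "Neil", 2017, 90),
    ],
)

JOIN_AND_ORDER = """
FROM Sales_temp AS t1
LEFT JOIN Sales_temp AS t2
  ON t1.empid = t2.empid AND t2.year = t1.year - 1
ORDER BY t1.empid, t1.year
"""

# Without an IS NULL branch, the first year falls through to ELSE 'No'.
naive = conn.execute(
    "SELECT t1.empid, t1.year, CASE WHEN t1.sale > t2.sale THEN 'Yes' ELSE 'No' END"
    + JOIN_AND_ORDER
).fetchall()

# With the explicit IS NULL branch, the first year stays NULL (None in Python).
fixed = conn.execute(
    "SELECT t1.empid, t1.year, CASE WHEN t2.sale IS NULL THEN NULL"
    " WHEN t1.sale > t2.sale THEN 'Yes' ELSE 'No' END AS SaleIncreased"
    + JOIN_AND_ORDER
).fetchall()

print(naive)
print(fixed)
```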
Is this useful? (LAG() requires SQL Server 2012 or later, or MySQL 8.0 or later.)
DECLARE @tab1 TABLE(EMPID BIGINT,Saleman VARCHAR(100),[Year] BIGINT,Sales BIGINT)
INSERT INTO @tab1
SELECT 7843921,'John',2016,100 Union ALL
SELECT 7843921,'John',2017,150 Union ALL
SELECT 7843934,'Neil',2016,120 Union ALL
SELECT 7843934,'Neil',2017,90
Select *,CASE
WHEN LAG(Sales) OVER(Partition by EmpID order by [year]) IS NULL then NULL
WHEN Sales - LAG(Sales) OVER(Partition by EmpID order by [year])>0 THEN 'Yes'
ELSE 'No' END AS SaleIncreased
from @tab1
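The LAG() version can be checked the same way in SQLite, which supports window functions from version 3.25. In this Python sketch a regular table named tab1 stands in for the table variable, filled with the sample rows above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (EmpID INTEGER, Saleman TEXT, Year INTEGER, Sales INTEGER)")
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?, ?, ?)",
    [
        (7843921, "John", 2016, 100),
        (7843921, "John", 2017, 150),
        (7843934, "Neil", 2016, 120),
        (7843934, "Neil", 2017, 90),
    ],
)

# LAG() looks at the previous year's Sales for the same EmpID; NULL for the first year.
rows = conn.execute("""
SELECT EmpID, Year, Sales,
       CASE
         WHEN LAG(Sales) OVER (PARTITION BY EmpID ORDER BY Year) IS NULL THEN NULL
         WHEN Sales - LAG(Sales) OVER (PARTITION BY EmpID ORDER BY Year) > 0 THEN 'Yes'
         ELSE 'No'
       END AS SaleIncreased
FROM tab1
ORDER BY EmpID, Year
""").fetchall()
print(rows)
```

Compared with the self join, the window function avoids joining the table to itself and does not depend on the years being consecutive.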

How to delete only a single row from 2 duplicate rows?

I have two duplicate rows in a table. I want to delete only one of them and keep the other. How can I do that?
The PostgreSQL code might be a little different, but here's an example in T-SQL that does it with a CTE:
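; WITH duplicates
AS (
SELECT ServerName ,
ProcessName ,
DateCreated ,
RowRank = ROW_NUMBER() OVER(PARTITION BY ServerName, ProcessName, DateCreated ORDER BY (SELECT NULL))
FROM dbo.ErrorLog
)
-- Delete through the CTE so only the extra copies (RowRank > 1) are removed
DELETE FROM duplicates
WHERE RowRank > 1
Deleting through the CTE matters: joining it back to dbo.ErrorLog on the three columns would match every copy of a duplicated row, so both copies would be deleted. ORDER BY (SELECT NULL) is used because SQL Server does not accept an integer constant such as 1 in a window ORDER BY.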
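SQLite cannot delete through a CTE, but its implicit rowid gives each physical copy a distinct handle that can play the role of RowRank. This Python sketch swaps in that rowid technique (PostgreSQL's ctid system column is often used in a similar way); the ErrorLog contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ErrorLog (ServerName TEXT, ProcessName TEXT, DateCreated TEXT)")
conn.executemany(
    "INSERT INTO ErrorLog VALUES (?, ?, ?)",
    [
        ("srv1", "backup", "2020-01-01"),  # hypothetical sample rows
        ("srv1", "backup", "2020-01-01"),  # exact duplicate of the row above
        ("srv2", "import", "2020-01-02"),
    ],
)

# Keep the first physical copy (lowest rowid) of each group and delete the rest.
conn.execute("""
DELETE FROM ErrorLog
WHERE rowid NOT IN (
    SELECT MIN(rowid)
    FROM ErrorLog
    GROUP BY ServerName, ProcessName, DateCreated
)
""")
remaining = conn.execute("SELECT * FROM ErrorLog ORDER BY ServerName").fetchall()
print(remaining)
```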

CASE Statement instead of inner join

This is my table structure:
CUST_ID ORDER_MONTH
---------------------
1 1
1 5
2 3
2 4
My objective is to tag these customers as either New or Returning customers.
When I filter the query for, say, month 1, customer 1 should be tagged 'New', but when I filter it for month 5, customer 1 should show up as 'Return' because he already made a purchase in month 1.
In the same way, customer 2 should show up as 'New' for month 3 and 'Return' for month 4.
I want to do this using a CASE statement rather than an inner join.
Thanks
If you insist on using a CASE statement, the logic would be: "If this is the customer's first month, write 'New'; otherwise, write 'Returning'." Using myTable, customer and month as stand-ins for your table and column names, the query would be:
SELECT m.customer, m.month,
CASE
WHEN m.month = (SELECT MIN(month) FROM myTable WHERE customer = m.customer) THEN 'New'
ELSE 'Returning' END AS customerType
FROM myTable m;
However, I think this would be nicer and more readable as a JOIN. You can write an aggregation query to get the earliest month for each customer, LEFT JOIN it back to the table, and then use COALESCE() to replace the resulting NULL values with 'Returning'. The aggregation:
SELECT customer, MIN(month) AS minMonth, 'New' AS customerType
FROM myTable
GROUP BY customer ;
To get the rest:
SELECT m.customer, m.month, COALESCE(t.customerType, 'Returning') AS customerType
FROM myTable m
LEFT JOIN(
SELECT customer, MIN(month) AS minMonth, 'New' AS customerType
FROM myTable
GROUP BY customer) t ON t.customer = m.customer AND t.minMonth = m.month;
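To see the LEFT JOIN/COALESCE version produce the tags, this Python sketch runs the query above (with an ORDER BY added) against the question's four rows in an in-memory SQLite database, using the myTable, customer and month names from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (customer INTEGER, month INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", [(1, 1), (1, 5), (2, 3), (2, 4)])

# Rows matching a customer's earliest month get 'New'; the rest stay NULL and become 'Returning'.
rows = conn.execute("""
SELECT m.customer, m.month, COALESCE(t.customerType, 'Returning') AS customerType
FROM myTable m
LEFT JOIN (
    SELECT customer, MIN(month) AS minMonth, 'New' AS customerType
    FROM myTable
    GROUP BY customer
) t ON t.customer = m.customer AND t.minMonth = m.month
ORDER BY m.customer, m.month
""").fetchall()
print(rows)
```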
You don't need a JOIN, and a CASE statement would probably be overkill. This uses MySQL's IF() function, with the_month standing for the month you are filtering on:
SELECT CUST_ID, IF(COUNT(1)>1, 'Returning', 'New') AS blah
FROM the_table
WHERE ORDER_MONTH <= the_month
GROUP BY CUST_ID
;
Of course, using just the month will cause problems after a year (or, really, as soon as you pass December). Filtering on a full order date would be better:
SELECT CUST_ID, IF(COUNT(1)>1, 'Returning', 'New') AS blah
FROM the_table
WHERE order_date <= some_date
GROUP BY CUST_ID
;
Well, I do not recommend this approach, but this is what you asked for:
select *
,case when order_month = (select MIN(order_month) from #temp t2 where t1.cust_ID =t2.cust_id) THEN 'NEW' ELSE 'Return' end 'Type'
from #temp t1
I think I get what you're trying to do. Your case statement basically just needs to check if the customer's month equals the month you're filtering by. Something like this:
SELECT
<your other fields>,
CASE WHEN Order_Month = <your filter> THEN 'New'
ELSE 'Return'
END AS 'SomeName'
FROM <your table>
Try this query
select distinct a.CUST_ID, a.ORDER_MONTH, case when b.CUST_ID is not null then 'Return' else 'New' end as type
from tablename a
left join tablename b on a.CUST_ID=b.CUST_ID and a.ORDER_MONTH>b.ORDER_MONTH
SELECT *,
CASE
WHEN EXISTS (SELECT *
FROM [YourTable] t2
WHERE t1.cust_id = t2.cust_id
AND t2.order_month < t1.order_month) THEN 'Return'
ELSE 'New'
END
FROM [YourTable] t1
This query uses CASE on an EXISTS clause.
The EXISTS is on a subquery which queries the same table for any rows in previous months.
If there are rows for previous months then the EXISTS is true and the CASE returns 'Return'. If there are no rows for previous months then the EXISTS is false and the CASE returns 'New'.
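The EXISTS version can be exercised with the question's sample rows, filtering on one month at a time as the question asks. This is a Python sketch against an in-memory SQLite database; customer_types() is a hypothetical helper, and its SQL is the answer's query with a month filter added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (CUST_ID INTEGER, ORDER_MONTH INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?)", [(1, 1), (1, 5), (2, 3), (2, 4)])


def customer_types(month):
    """Tag every customer who ordered in `month` as 'New' or 'Return'."""
    return conn.execute("""
        SELECT t1.CUST_ID,
               CASE WHEN EXISTS (SELECT 1
                                 FROM YourTable t2
                                 WHERE t2.CUST_ID = t1.CUST_ID
                                   AND t2.ORDER_MONTH < t1.ORDER_MONTH)
                    THEN 'Return' ELSE 'New'
               END AS CustomerType
        FROM YourTable t1
        WHERE t1.ORDER_MONTH = ?
        ORDER BY t1.CUST_ID
    """, (month,)).fetchall()


for month in (1, 3, 4, 5):
    print(month, customer_types(month))
```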

How to insert conditionally

I create a temporary table #tbl(account, last_update). I have the following two inserts from different sources (which could be tables in different databases) that insert each account with its last update date. For example:
create table #tbl ([account] numeric(18, 0), [last_update] datetime)
insert into #tbl(account , last_update)
select table1.account, max(table1.last_update)
from table1 join…
group by table1.account
insert into #tbl(account , last_update)
select table2.account, max(table2.last_update)
from table2 join…
group by table2.account
The problem is that this can produce duplicate accounts in #tbl. I either have to avoid them during each insert or remove the duplicates after both inserts. Also, if an account has two different last_update values, I want #tbl to keep the latest one. How do I achieve this conditional insert, and which approach performs better?
Do you think you could rewrite your query as something like this:
create table #tbl ([account] numeric(18, 0), [last_update] datetime)
insert into #tbl(account , last_update)
select theaccount, MAX(theupdate) from
(
select table1.account AS theaccount, table1.last_update AS theupdate
from table1 join…
UNION ALL
select table2.account AS theaccount, table2.last_update AS theupdate
from table2 join…
) AS tmp GROUP BY theaccount
The UNION ALL combines the table1 and table2 rows into a single derived table. From there you can treat it like a regular table and take the latest last_update for each account with MAX() and GROUP BY, so each account is inserted only once.
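A quick way to check the UNION ALL + GROUP BY approach is to run it in SQLite from Python. The table contents are hypothetical, the elided join… parts are left out, and last_update is stored as ISO-8601 text, which sorts in date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (account INTEGER, last_update TEXT)")
conn.execute("CREATE TABLE table2 (account INTEGER, last_update TEXT)")
conn.execute("CREATE TEMP TABLE tbl (account INTEGER, last_update TEXT)")
# Hypothetical sample data: account 1 and 2 appear in both sources.
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, "2020-01-05"), (1, "2020-02-01"), (2, "2020-03-01")])
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [(1, "2020-01-20"), (2, "2020-05-01"), (3, "2020-04-01")])

# One INSERT: combine both sources, then keep the latest date per account.
conn.execute("""
INSERT INTO tbl (account, last_update)
SELECT theaccount, MAX(theupdate)
FROM (
    SELECT account AS theaccount, last_update AS theupdate FROM table1
    UNION ALL
    SELECT account, last_update FROM table2
) AS tmp
GROUP BY theaccount
""")
rows = conn.execute("SELECT account, last_update FROM tbl ORDER BY account").fetchall()
print(rows)
```

Because the deduplication happens in the single INSERT ... SELECT, no cleanup pass over #tbl is needed afterwards.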
insert into #tbl(account , last_update)
select account, max(last_update)
from
(
select a.* from #table1 a where
last_update in( select top 1 last_update from #table1 b
where
a.account = b.account
order by last_update desc)
UNION
select a.* from #table2 a where
last_update in( select top 1 last_update from #table2 b
where
a.account = b.account
order by last_update desc)
) AS tmp
group by account