substitute duplicate value with null SQL - mysql

I have a table with the following data:
orderid item_amount total_bill_amount
123 2 8
123 6 8
455 4 11
455 6 11
455 1 11
I want to replace the duplicate values of total_bill_amount with null, always keeping the value on the first record of each order and setting every later record to null. Here is how I want the data to look:
orderid item_amount total_bill_amount
123 2 8
123 6 null
455 4 11
455 6 null
455 1 null
Note that my MySQL version is 5.7, so I can't use the window functions introduced in MySQL 8.

You can use row_number() with a subquery (note that both the WITH clause and row_number() require MySQL 8.0 or later):
with to_r(rnum, id, a, t) as (
select row_number() over (order by o.orderid), o.* from orders o
)
select r.id, r.a, case when
(select sum(r1.t = r.t and r1.id = r.id and r.rnum >= r1.rnum) from to_r r1) > 1
then null else r.t end
from to_r r
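Since the question rules out window functions, the same "first row keeps the value" logic can also be sketched with a plain correlated subquery. The sketch below runs against SQLite through Python's sqlite3 module as a stand-in for MySQL 5.7, and it assumes the table has some unique ordering column: here SQLite's implicit rowid plays that role, whereas the table in the question shows no such column, so in MySQL you would need an auto-increment key or similar.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL 5.7.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (orderid INT, item_amount INT, total_bill_amount INT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(123, 2, 8), (123, 6, 8), (455, 4, 11), (455, 6, 11), (455, 1, 11)],
)

# Assumption: rowid acts as the unique ordering column that defines "first".
# A row keeps its total only if no earlier row exists for the same orderid.
query = """
SELECT o.orderid,
       o.item_amount,
       CASE WHEN EXISTS (SELECT 1
                         FROM orders p
                         WHERE p.orderid = o.orderid
                           AND p.rowid < o.rowid)
            THEN NULL
            ELSE o.total_bill_amount
       END AS total_bill_amount
FROM orders o
ORDER BY o.rowid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Without a unique ordering column there is no well-defined "first" row per order, which is also why ordering row_number() by orderid alone leaves the choice among tied rows up to the database.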

Related

Replace null value with previous value without using ID in SQL Server 2008

This works by using the ID column, but I want a solution that does not use ID:
SELECT
    ID,
    COALESCE(p.number,
        (SELECT TOP (1) number
         FROM tablea AS p2
         WHERE p2.number IS NOT NULL
           AND p2.ID <= p.ID
         ORDER BY p2.ID DESC)) AS Number
FROM TableA AS p;
ID number
1 100
2 150
3 NULL
4 300
5 NULL
6 NULL
7 450
8 NULL
9 NULL
10 560
11 NULL
12 880
13 NULL
14 579
15 987
16 NULL
17 NULL
18 NULL
Try this query. This will help you get your desired result set. This query is written in SQL Server 2008 R2.
WITH CTE AS
( SELECT id, number FROM tablea)
SELECT A.id, A.number, ISNULL(A.number,B.number) number
FROM CTE A
OUTER APPLY (SELECT TOP 1 *
FROM CTE
WHERE id < a.id AND number IS NOT NULL
ORDER BY id DESC) B
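The fill-forward idea behind both queries can be tried out in a small script. This is only a sketch run against SQLite through Python's sqlite3 module, not SQL Server: SQLite has neither TOP nor OUTER APPLY, so a correlated scalar subquery with ORDER BY ... LIMIT 1 is swapped in, and only the first seven sample rows are loaded. Like the queries above, it still relies on id to define what "previous" means.

```python
import sqlite3

# In-memory SQLite database; a stand-in for the SQL Server table "tablea".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (id INTEGER PRIMARY KEY, number INT)")
conn.executemany(
    "INSERT INTO tablea VALUES (?, ?)",
    [(1, 100), (2, 150), (3, None), (4, 300), (5, None), (6, None), (7, 450)],
)

# For each NULL, look up the closest earlier row that has a value.
fill_query = """
SELECT a.id,
       COALESCE(a.number,
                (SELECT b.number
                 FROM tablea b
                 WHERE b.id < a.id
                   AND b.number IS NOT NULL
                 ORDER BY b.id DESC
                 LIMIT 1)) AS number
FROM tablea a
ORDER BY a.id
"""
filled = conn.execute(fill_query).fetchall()
for row in filled:
    print(row)
```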
You can also try the LAG and LEAD functions, which are available from SQL Server 2012 onward.

Display distinct rows of a table with the sum of a column of all duplicate rows in SQL Server 2008

There are two tables :
Tasks table :
TaskName (PK)
TaskAllocation table :
AllocationID(PK),
TaskName(F.K to TaskName in 'Tasks' Table),
UserID( F.K to ID in 'Users' Table),
EngineerType( F.K to ID in 'EngineerType' Table),
Start Date,
End Date,
Hours,
Location
'Users' Table :
ID,
FirstName,
LastName
'EngineerTypes' Table :
ID,
Type
Each task can have multiple allocations. Hence, the task name can occur multiple times in the TaskAllocation table. The same task can be mapped to multiple users (UserID).
I need to display the selected task (given as input from U.I), the UserIDs allocated to that task, first occurrence of start date, first occurrence of end date and Sum(Hours) for each user of the selected tasks.
Example: TaskAllocation data :
TaskName UserID TypeId StartDate EndDate Hours Location
Task1 1 11 Feb 5 Feb 7 1 NULL
Task1 1 11 Feb 6 Feb 7 2 NULL
Task1 1 11 Feb 7 Feb 7 3 Onsite
Task1 2 12 Feb 8 Feb 10 4 Offshore
Task1 2 12 Feb 9 Feb 10 5 NULL
Task1 2 12 Feb 10 Feb 10 6 NULL
'EngineerTypes' data :
ID Type
11 Type1
12 Type2
'Users' Data :
ID FirstName
1 Name1
2 Name2
The query which I implemented was :
Select TaskAllocation.UserId as UserId,Users.FirstName as Name,
TaskAllocation.EngineerType as TypeId,EngineerTypes.Type as Type,
min(TaskAllocation.StartDate) as AllocationStartDate,
max(TaskAllocation.EndDate) as AllocationEndDate,
sum(TaskAllocation.Hours) as Hours, TaskAllocation.Location
from TaskAllocation join Users on TaskAllocation.UserId=Users.ID
join EngineerTypes on EngineerTypes.ID = TaskAllocation.EngineerType
where TaskAllocation.TaskName = 'Task1' group by
FirstName,UserId,EngineerType,Type,AllocationStartDate,AllocationEndDate,Hours,
Location,TaskName order by TaskName, UserId
Output: UserId Name TypeId Type AllocationStartDate AllocationEndDate Hours Location
1. 126 Name1 11 Type1 2015-11-23 2015-11-25 0.1 NULL
2. 126 Name1 11 Type1 2015-11-24 2015-11-25 0.2 NULL
3. 126 Name1 11 Type1 2015-11-25 2015-11-25 0.3 NULL
4. 127 Name2 12 Type2 2015-11-23 2015-11-25 0.2 NULL
5. 127 Name2 12 Type2 2015-11-24 2015-11-25 0.3 NULL
6. 127 Name2 12 Type2 2015-11-25 2015-11-25 0.4 NULL
You could try any of the approaches below.
Method1:
select taskname,
userid,
min(startdate) as'first occurence',
max(enddate) as'last occurence'
,sum(hours)
from t1
group by taskname,userid
Method2:Cross apply
select
distinct taskname,userid,b.*
from t1
cross apply
(select min(startdate) as Firstoccur,max(enddate) as secondocc,sum(hours) as hrs
from t1 t2 where t1.taskname =t2.taskname and t1.userid=t2.userid
group by t2.taskname,t2.userid) b
Method 3:
Window functions
with cte
as
(
select taskname,userid,
min(startdate) over (partition by taskname,userid) as 'first',
max(enddate) over (partition by taskname,userid) as 'second',
sum(hours) over (partition by taskname,userid) as 'hrs',
ROW_NUMBER() over (partition by taskname,userid order by taskname,userid) as rn
from t1
)
select * from cte where rn = 1
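Method 1 can be checked against the sample allocations with a short script. This is a sketch only: it runs on SQLite through Python's sqlite3 module rather than SQL Server, it uses the table name t1 from the answer, and the dates are written as ISO strings with an arbitrary year (the question gives only "Feb 5" and so on) so that MIN and MAX compare them correctly.

```python
import sqlite3

# In-memory SQLite table named t1, as in the answer; the year in the dates
# is an assumption made only so the strings sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (taskname TEXT, userid INT, startdate TEXT, "
    "enddate TEXT, hours INT)"
)
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?, ?)",
    [
        ("Task1", 1, "2015-02-05", "2015-02-07", 1),
        ("Task1", 1, "2015-02-06", "2015-02-07", 2),
        ("Task1", 1, "2015-02-07", "2015-02-07", 3),
        ("Task1", 2, "2015-02-08", "2015-02-10", 4),
        ("Task1", 2, "2015-02-09", "2015-02-10", 5),
        ("Task1", 2, "2015-02-10", "2015-02-10", 6),
    ],
)

# Group only by the columns that identify a result row; every other output
# column is an aggregate, so each (taskname, userid) collapses to one row.
summary_query = """
SELECT taskname,
       userid,
       MIN(startdate) AS first_start,
       MAX(enddate)   AS last_end,
       SUM(hours)     AS total_hours
FROM t1
GROUP BY taskname, userid
ORDER BY userid
"""
summary = conn.execute(summary_query).fetchall()
for row in summary:
    print(row)
```

The query in the question returns one row per allocation because its GROUP BY also lists the date and hours columns, so every distinct combination of them forms its own group.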

Mysql JOIN with extra priority column

I have spent two days trying to write this query with no luck.
I have two tables, 'DEMAND' and 'DEMAND_STATE' (a one-to-many relation). The DEMAND_STATE table has millions of entries.
CREATE TABLE DEMAND
(
ID INT NOT NULL,
DESTINY_ID INT NOT NULL
)
CREATE TABLE DEMAND_STATE
(
ID INT NOT NULL,
PRIORITY INT NOT NULL,
QUANTITY DOUBLE NOT NULL,
CASE_ID INT NOT NULL,
DEMAND_ID INT NOT NULL,
PHASE_ID INT NOT NULL
)
The QUANTITY of a DEMAND_STATE is given according to a CASE_ID and a PHASE_ID. We have 'N' PHASES in 'M' CASES, always with the same number of phases in every case. There is always an initial base quantity, called the 'BASE CASE', in the case with CASE_ID = 1.
For example, to obtain the quantities for Case (id=2) and the Case Base (id=1):
select D.*, S.PRIORITY, S.QUANTITY, S.CASE_ID, S.DEMAND_ID, S.PHASE_ID
FROM DEMAND D
join DEMAND_STATE S on (D.ID = S.DEMAND_ID)
WHERE (S.CASE_ID = 2 OR S.CASE_ID = 1)
(paste only for id=8)
ID PRIORITY QUANTITY CASE_ID DEMAND_ID PHASE_ID
8 0 85 1 8 1
8 0 83 1 8 2
8 0 88 1 8 3
8 0 89 1 8 4
8 10 85 2 8 1
8 10 84 2 8 2
8 10 86 2 8 3
8 10 89 2 8 4
We need to obtain for all Demand in 'DEMAND' only the Quantity for Each Phase with MAX priority. The idea is to avoid duplicating DEMAND_STATE data each time a new case is created: new state rows are created only when a Demand-Case-Phase differs from the Case Base. This is a new project, so we are open to changing the model for better performance.
I also tried a MAX calculation. This query over DEMAND_STATE works, but it only returns data for one specific DEMAND_ID. I also think this solution could be very expensive.
SELECT P.ID, P.QUANTITY, P.CASE_ID, P.DEMAND_ID, P.PHASE_ID
FROM DEMAND_STATE P
JOIN (
SELECT PHASE_ID, MAX(PRIORITY) max_priority, S.DEMAND_ID
from DEMAND_STATE S
WHERE S.DEMAND_ID = 1
AND (S.CASE_ID=1 OR S.CASE_ID=2)
GROUP BY S.PHASE_ID
) SUB
ON (SUB.PHASE_ID = P.PHASE_ID AND SUB.max_priority = P.PRIORITY)
WHERE P.DEMAND_ID = 1
GROUP BY P.PHASE_ID
The result:
ID QUANTITY CASE_ID DEMAND_ID PHASE_ID
1 86 1 1 1
2 85 1 1 2
3 81 1 1 3
8 500 2 1 4
This is the result expected:
ID ID PRIORITY QUANTITY CASE_ID PHASE_ID
8 1 0 86 1 1 (data from Case Base id=1 priority 0)
8 2 10 85 1 2 (data from Case Base id=1 priority 0)
8 3 10 81 1 3 (data from Case Base id=1 priority 0)
8 64 10 500 2 4 (data from Case id=2 priority 10)
Thanks for the help :)
Edit:
Result of Simon proposal:
ID QUANTITY CASE_ID DEMAND_ID PHASE_ID
1 86 1 1 1
2 85 1 1 2
3 81 1 1 3
4 84 1 1 4 (this row shouldnt exist)
8 500 2 1 4 (this is the correct row)
Also would have to join it with DEMAND
@didierc's response:
ID ID MAX(S.PRIORITY) QUANTITY CASE_ID PHASE_ID
1 8 10 500 2 4
2 13 10 81 2 1
2 14 10 83 2 2
2 15 10 84 2 3
3 21 10 81 2 1
4 31 10 86 2 3
4 32 10 80 2 4
4 29 10 85 2 1
4 30 10 81 2 2
We need four rows, each with its quantity value, for every DEMAND. The Case Base has four quantities, and Case 2 only changes the quantity for phase 4, so we always need four rows per demand.
Database DEMAND_STATE data:
ID PRIORITY QUANTITY CASE_ID DEMAND_ID PHASE_ID
1 0 86 1 1 1
2 0 85 1 1 2
3 0 81 1 1 3
4 0 84 1 1 4
8 10 500 2 1 4
We need to obtain for all Demand in 'DEMAND' only the Quantity for Each Phase with MAX priority
I translate the above, according to your sample result set, as:
SELECT
D.ID, S.ID, MAX(S.PRIORITY), S.QUANTITY, S.CASE_ID, S.PHASE_ID
FROM DEMAND D
LEFT JOIN DEMAND_STATE S
ON D.ID = S.DEMAND_ID
GROUP BY S.PHASE_ID, S.DEMAND_ID
Update:
To get the maximum priority for each pair (demand_id, phase_id), we use the following query:
SELECT
DEMAND_ID, PHASE_ID, MAX(PRIORITY) AS PRIORITY
FROM DEMAND_STATE
GROUP BY DEMAND_ID, PHASE_ID
Next, to retrieve the set of phases for a given demand, just make an inner join on demand state:
SELECT S.* FROM DEMAND_STATE S
INNER JOIN (
SELECT
DEMAND_ID, PHASE_ID, MAX(PRIORITY) AS PRIORITY
FROM DEMAND_STATE
GROUP BY DEMAND_ID, PHASE_ID
) S2
USING (DEMAND_ID,PHASE_ID, PRIORITY)
WHERE DEMAND_ID = 1
If you want to limit the possible cases, include a where clause in the query S2:
SELECT S.* FROM DEMAND_STATE S
INNER JOIN (
SELECT
DEMAND_ID, PHASE_ID, MAX(PRIORITY) AS PRIORITY
FROM DEMAND_STATE
WHERE CASE_ID IN (1,2)
GROUP BY DEMAND_ID, PHASE_ID
) S2
USING (DEMAND_ID,PHASE_ID, PRIORITY)
WHERE DEMAND_ID = 1
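To see what this join returns, the five DEMAND_STATE rows from the question can be loaded into a throwaway database. The sketch below uses SQLite through Python's sqlite3 module in place of MySQL, declares every column as INT for simplicity, and qualifies the outer filter as S.DEMAND_ID; those choices are assumptions of the sketch, not part of the original schema.

```python
import sqlite3

# In-memory SQLite copy of the sample DEMAND_STATE rows for demand 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DEMAND_STATE (ID INT, PRIORITY INT, QUANTITY INT, "
    "CASE_ID INT, DEMAND_ID INT, PHASE_ID INT)"
)
conn.executemany(
    "INSERT INTO DEMAND_STATE VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 0, 86, 1, 1, 1),
        (2, 0, 85, 1, 1, 2),
        (3, 0, 81, 1, 1, 3),
        (4, 0, 84, 1, 1, 4),
        (8, 10, 500, 2, 1, 4),
    ],
)

# S2 finds the highest priority per (demand, phase) among cases 1 and 2;
# joining back on all three columns keeps only the rows holding that maximum.
state_query = """
SELECT S.ID, S.QUANTITY, S.CASE_ID, S.PHASE_ID
FROM DEMAND_STATE S
INNER JOIN (
    SELECT DEMAND_ID, PHASE_ID, MAX(PRIORITY) AS PRIORITY
    FROM DEMAND_STATE
    WHERE CASE_ID IN (1, 2)
    GROUP BY DEMAND_ID, PHASE_ID
) S2 USING (DEMAND_ID, PHASE_ID, PRIORITY)
WHERE S.DEMAND_ID = 1
ORDER BY S.PHASE_ID
"""
states = conn.execute(state_query).fetchall()
for row in states:
    print(row)
```

On this data, phases 1 to 3 come from the base case and phase 4 comes from case 2, giving the four rows per demand that the question asks for.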
However, your comments and update indicate that MAX(PRIORITY) does not seem very relevant after all. My understanding is that you have a base case, which may be overridden by another case in a given scenario (that scenario is the pair base case + some other case). Clarify that point in your question body if this is incorrect. If that is the case, you may change the above query by replacing PRIORITY with CASE_ID:
SELECT S.* FROM DEMAND_STATE S
INNER JOIN (
SELECT
DEMAND_ID, PHASE_ID, MAX(CASE_ID) AS CASE_ID
FROM DEMAND_STATE
WHERE CASE_ID IN (1,2)
GROUP BY DEMAND_ID, PHASE_ID
) S2
USING (DEMAND_ID,PHASE_ID, CASE_ID)
WHERE DEMAND_ID = 1
The only reason I see from having a priority is if you wish to combine more than 2 cases, and use priority to select which case will prevail depending on the phase.
You may of course prepend an inner join on DEMAND to include the related demand data.
Use of subqueries should be able to do as you wish, if I understand your question correctly. Something along the lines of the following:
SELECT
P.ID,
P.QUANTITY,
P.CASE_ID,
P.DEMAND_ID,
P.PHASE_ID
FROM DEMAND_STATE P
INNER JOIN (
-- Next level up groups it down and so gets the rows first returned for each PHASE_ID, which is the highest priority due to the subquery
SELECT
D.PHASE_ID,
D.PRIORITY,
D.DEMAND_ID
FROM (
-- Top level query to get all rows and order them in desc priority order
SELECT
S.PHASE_ID,
S.PRIORITY,
S.DEMAND_ID
FROM DEMAND_STATE S
WHERE S.DEMAND_ID IN (1) -- Update this to be whichever DEMAND_IDs you are interested in
AND S.CASE_ID IN (1,2)
ORDER BY
S.PHASE_ID ASC,
S.DEMAND_ID ASC,
S.PRIORITY DESC
) D
GROUP BY
D.PHASE_ID,
D.DEMAND_ID
) SUB
ON SUB.PHASE_ID = P.PHASE_ID
AND SUB.DEMAND_ID = P.DEMAND_ID
The top level subquery fetches the rows you are interested in and sorts them so that grouping by PHASE_ID and DEMAND_ID gives predictable results. That in turn should allow a simple INNER JOIN to DEMAND_STATE (unless I have misunderstood your question).
This may still be expensive, though, depending on how much data that top level query returns.

How to split column based on row value in mysql?

I need to turn row values into columns.
I have two tables.
Table 1: working_day contains the list of all working dates.
date
--------
2013-03-30
2013-03-29
2013-03-28
Table 2: entry contains each employee's in and out times.
id In Out Date
1 9 0 2013-03-30
2 8 0 2013-03-30
3 7 0 2013-03-30
1 8 18 2013-03-29
2 9 16 2013-03-29
3 6 20 2013-03-29
4 12 15 2013-03-29
Expected Output :
ID 29-03-2013_IN 29-03-2013_Out 30-03-2013_In
1 8 18 9
2 9 16 8
3 6 20 7
4 12 15 0
Tried :
SELECT id,
Case condition1 for 29_in, // I don't know which condition suits here.
Case condition1 for 29_out,
Case condition1 for 30_in
FROM entry
WHERE DATE
IN (
SELECT *
FROM (
SELECT DATE
FROM working_day
ORDER BY DATE DESC
LIMIT 0 , 2
)a
)
You could try something like this:
select
e.id,
(SELECT `in` FROM entry WHERE id = e.id AND date = '2013-03-30') as '2013-03-30_in',
(SELECT `in` FROM entry WHERE id = e.id AND date = '2013-03-29') as '2013-03-29_in',
(SELECT `out` FROM entry WHERE id = e.id AND date = '2013-03-29') as '2013-03-29_out'
from entry e
group by e.id;
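The query above can be run against the sample entry rows to see the shape of the result. This is a sketch on SQLite through Python's sqlite3 module rather than MySQL; the reserved-looking column names are quoted with double quotes instead of backticks, and the output aliases (in_2013_03_30 and so on) are hypothetical names chosen for the sketch.

```python
import sqlite3

# In-memory SQLite copy of the "entry" table from the question.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE entry (id INT, "in" INT, "out" INT, date TEXT)')
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?, ?)",
    [
        (1, 9, 0, "2013-03-30"),
        (2, 8, 0, "2013-03-30"),
        (3, 7, 0, "2013-03-30"),
        (1, 8, 18, "2013-03-29"),
        (2, 9, 16, "2013-03-29"),
        (3, 6, 20, "2013-03-29"),
        (4, 12, 15, "2013-03-29"),
    ],
)

# One correlated scalar subquery per output column: each looks up the value
# for the current employee (e.id) on one fixed date.
pivot_query = """
SELECT e.id,
       (SELECT "in"  FROM entry WHERE id = e.id AND date = '2013-03-30')
           AS in_2013_03_30,
       (SELECT "in"  FROM entry WHERE id = e.id AND date = '2013-03-29')
           AS in_2013_03_29,
       (SELECT "out" FROM entry WHERE id = e.id AND date = '2013-03-29')
           AS out_2013_03_29
FROM entry e
GROUP BY e.id
ORDER BY e.id
"""
pivoted = conn.execute(pivot_query).fetchall()
for row in pivoted:
    print(row)
```

Employee 4 has no entry on 2013-03-30, so that column comes back as NULL (None in Python) rather than the 0 shown in the expected output; wrapping the subquery in COALESCE(..., 0) would be one way to get 0 instead.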
IMO you should do this in the application instead of in SQL.

Display results in a particular format MYSQL

I have a table like
ID Name Points
1 A 10
1 A 11
1 B 11
1 B 12
1 C 12
1 C 13
2 A 8
2 A 9
2 B 9
2 B 10
2 C 10
2 C 11
I want my output to look like the following
ID Average(A) Average(B) Average(C)
1 10.5 11.5 12.5
2 8.5 9.5 10.5
The following group by query displays the output but not in above format
Select Avg(Points),ID,name from table group by Name,ID
Thanks
Wrapping your existing query in a subquery will allow you to build out a pivot table around it. The `MAX()` aggregate's purpose is only to eliminate the NULLs produced by the CASE statement, and therefore collapse multiple rows per ID down to one row per ID with a non-NULL in each column.
SELECT
ID,
MAX(CASE WHEN Name = 'A' THEN Points ELSE NULL END) AS `Average (A)`,
MAX(CASE WHEN Name = 'B' THEN Points ELSE NULL END) AS `Average (B)`,
MAX(CASE WHEN Name = 'C' THEN Points ELSE NULL END) AS `Average (C)`
FROM (
SELECT ID, AVG(Points) AS Points, Name FROM yourtable GROUP BY Name, ID
) avg_subq
GROUP BY ID
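The pivot can be verified against the sample rows from the question. The sketch below runs the same query shape on SQLite through Python's sqlite3 module instead of MySQL; the table name yourtable is the placeholder from the answer, and the aliases avg_a, avg_b and avg_c are hypothetical stand-ins for the quoted column titles.

```python
import sqlite3

# In-memory SQLite copy of the sample table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (ID INT, Name TEXT, Points INT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (1, "A", 10), (1, "A", 11), (1, "B", 11), (1, "B", 12),
        (1, "C", 12), (1, "C", 13), (2, "A", 8), (2, "A", 9),
        (2, "B", 9), (2, "B", 10), (2, "C", 10), (2, "C", 11),
    ],
)

# The inner query yields one average per (ID, Name); the outer query spreads
# those averages into one column per Name and collapses to one row per ID.
avg_pivot_query = """
SELECT ID,
       MAX(CASE WHEN Name = 'A' THEN Points END) AS avg_a,
       MAX(CASE WHEN Name = 'B' THEN Points END) AS avg_b,
       MAX(CASE WHEN Name = 'C' THEN Points END) AS avg_c
FROM (
    SELECT ID, AVG(Points) AS Points, Name
    FROM yourtable
    GROUP BY Name, ID
) avg_subq
GROUP BY ID
ORDER BY ID
"""
averages = conn.execute(avg_pivot_query).fetchall()
for row in averages:
    print(row)
```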