Wrong data due to Joins - mysql

How can I remove the incorrect rows that I'm getting after running several joins?
My entire query is:
SELECT
distinct vortex_dbo.vw_public_material_location.material_name
,vw_public_request_material_location_mir.material_request_id
,vw_public_request_material_location_mir.parttype_name
,operation_code
,vw_public_request_material_location_mir.result_name
,vw_public_request_material_location_mir.qdf_number
, requestor
,[vortex_hvc].[vortex_dbo].[material_request].created_by
,[vortex_hvc].[vortex_dbo].[material_request].created_datetime as time1
,[vortex_hvc].[vortex_dbo].[material_request].distribution_list
,[vortex_hvc].[vortex_dbo].[material_request].recipient_name
, DATEPART(WW,[vortex_hvc].[vortex_dbo].[material_request].created_datetime) as WW
,vw_public_request_material_location_mir.product_code_name
,task_name
,vw_public_request_material_location_mir.full_location_name
FROM [vortex_hvc].[vortex_dbo].[vw_public_request_material_location_mir]
left join request on vw_public_request_material_location_mir.material_request_id = request.request_key
left join vortex_dbo.material_request on vw_public_request_material_location_mir.material_request_id = vortex_dbo.material_request.material_request_id
left join vortex_dbo.vw_public_material_location on vw_public_request_material_location_mir.last_result_id = vortex_dbo.vw_public_material_location.last_result_id
left join vortex_dbo.vw_public_material_history on vw_public_request_material_location_mir.material_request_id like (substring(vw_public_material_history.comments,12,6))
where (vw_public_request_material_location_mir.qdf_number not like 'null' and vw_public_request_material_location_mir.qdf_number not like '')
and vw_public_request_material_location_mir.product_code_name like 'LAKE%'
and vw_public_request_material_location_mir.task_id not like 'null'
and (vw_public_request_material_location_mir.result_name like 'bin 100' or vw_public_request_material_location_mir.result_name like 'bin 01'
or vw_public_request_material_location_mir.result_name like 'bin 02' or vw_public_request_material_location_mir.result_name like 'pass')
and (requestor like 'BUGANIM, RINAT' and employee_name like 'BUGANIM, RINAT')
and ( DateDiff(DD,[vortex_hvc].[vortex_dbo].[material_request].created_datetime, getdate()) < 180)
and (concat('',substring(vortex_dbo.vw_public_material_location.comments,12,6)) like vw_public_request_material_location_mir.material_request_id
or vortex_dbo.vw_public_material_location.comments like 'Changed by Matrix Transaction Handler' or vortex_dbo.vw_public_material_location.comments like 'Unit Ownership:%')
and (unit_number = vortex_dbo.vw_public_material_location.material_name or unit_number is null)
and vortex_dbo.vw_public_material_location.material_name like 'D7QM748200403'
order by vortex_dbo.vw_public_material_location.material_name desc
The query returns two rows, but only the second one contains correct data:
material_name material_request_id parttype_name operation_code result_name qdf_number requestor created_by time1 WW product_code_name task_name full_location_name
D7QM748200403 332160 H6 4GXDCV K Y 7295 BIN 01 Q1T5 BUGANIM, RINAT SMS_Interface 2017-12-03 20:27:30.327 49 CANNON LAKE Y 2+2 PPV-M SAMPLE: QDF INVENTORY
D7QM748200403 332176 H6 4GXDCV K Y 7295 BIN 01 Q1T5 BUGANIM, RINAT SMS_Interface 2017-12-03 21:02:33.247 49 CANNON LAKE Y 2+2 PPV-M SAMPLE: QDF INVENTORY
What can I do to retrieve only the correct row? I have more cases like this.
Thanks!!

Related

Issue with Union Sub-query

I'm attempting to use a union subquery to combine the results of a couple of different queries. What I'm looking to do is select all the players who hit a home run in the 2014 season, create a home run count for each player, and find the average pitch speed of each home run. I'm also attempting to break things down by pitch type. My current code and result are as follows:
Select output.Batter_Name,
output.Qty,
output.speed,
output.avg_Speed,
output.break,
output.Type_Pitch,
Output.CH_Qty,
Output.CH_Pitch,
Output.Ch_Speed,
Output.CH_Avg_speed,
Output.CH_Break,
Output.CH_Type_Pitch
From(
SELECT
count(gameday.atbats.event) as Qty,
gameday.batters.name_display_first_last as Batter_Name,
gameday.pitches.type as Pitch,
gameday.pitches.start_speed as speed,
avg(gameday.pitches.start_speed) as avg_speed,
avg(gameday.pitches.break_length) as Break,
gameday.pitches.Pitch_type as Type_Pitch,
"0" as CH_Qty,
"0" as CH_Pitch,
"0" as Ch_Speed,
"0" as CH_Avg_speed,
"0" as CH_Break,
"0" as CH_Type_Pitch
FROM
gameday.atbats
JOIN
gameday.pitches ON gameday.atbats.num = gameday.pitches.gameAtBatID
AND gameday.pitches.gamename = gameday.atbats.gamename
INNER JOIN
gameday.batters ON gameday.atbats.batter = gameday.batters.ID
AND gameday.atbats.gamename = gameday.batters.gameName
INNER JOIN
gameday.pitchers ON gameday.atbats.pitcher = gameday.pitchers.ID
AND gameday.atbats.gamename = gameday.pitchers.gamename
WHERE
(gameday.atbats.event = 'Home Run')
AND gameday.pitches.type = 'x'
and gameday.pitches.Pitch_type = 'FF'
group by gameday.batters.name_display_first_last
UNION ALL
SELECT
"0" as Qty,
gameday.batters.name_display_first_last as Batter_Name,
"0" as Pitch,
"0" as Speed,
"0" as Avg_speed,
"0" as Break,
"0" as Type_Pitch,
count(gameday.atbats.event) as CH_Qty,
gameday.pitches.type as CH_Pitch,
gameday.pitches.start_speed as CH_speed,
avg(gameday.pitches.start_speed) as CH_avg_speed,
avg(gameday.pitches.break_length) as CH_Break,
gameday.pitches.Pitch_type as CH_Type_Pitch
FROM
gameday.atbats
JOIN
gameday.pitches ON gameday.atbats.num = gameday.pitches.gameAtBatID
AND gameday.pitches.gamename = gameday.atbats.gamename
INNER JOIN
gameday.batters ON gameday.atbats.batter = gameday.batters.ID
AND gameday.atbats.gamename = gameday.batters.gameName
INNER JOIN
gameday.pitchers ON gameday.atbats.pitcher = gameday.pitchers.ID
AND gameday.atbats.gamename = gameday.pitchers.gamename
WHERE
(gameday.atbats.event = 'Home Run')
AND gameday.pitches.type = 'x'
and gameday.pitches.Pitch_type = 'CH'
group by gameday.batters.name_display_first_last
) as Output
group by Output.Batter_name
A sample of my results is below:
Batter_Name, Qty, speed, avg_Speed, break, Type_Pitch, CH_Qty, CH_Pitch, Ch_Speed, CH_Avg_speed, CH_Break, CH_Type_Pitch
A.J. Pollock 1 89 90 4.3 FF 0 0 0 0 0 0
Aaron Hicks 0 0 0 0 0 1 X 83 83 6 CH
The first player, Ellis, shows one home run on an FF and zero on a CH. The second player, Peirzynski, had 0 home runs on an FF but 1 on a CH. The issue is that I know these players hit home runs on both types of pitches, but the query returns only one or the other, not both. My intended results are something like this:
Batter_Name, Qty, speed, avg_Speed, break, Type_Pitch, CH_Qty, CH_Pitch, Ch_Speed, CH_Avg_speed, CH_Break, CH_Type_Pitch
A.J. Pollock 1 89 90 4.3 FF 2 X 84 82 3.2 CH
Aaron Hicks 4 90 91 2.5 FF 1 X 83 83 6 CH
I'm thinking the issue has to be my setting some fields to 0 as placeholders, but I can't seem to find a workable solution that gets me the results I want.
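One way to test the placeholder hypothesis, as a minimal sketch rather than a fix for the exact query above: the outer GROUP BY in the posted query selects no aggregates, so each batter's FF row and CH row are never combined (with ONLY_FULL_GROUP_BY disabled, MySQL returns values from just one row of the group). Aggregating over the union merges them. The example runs the SQL through Python's sqlite3 module; the table hr, its columns, and the sample rows are hypothetical stand-ins for the gameday schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hr (batter TEXT, pitch_type TEXT, speed REAL);  -- hypothetical table
INSERT INTO hr VALUES
  ('Player A', 'FF', 90.0),
  ('Player A', 'CH', 84.0),
  ('Player A', 'CH', 80.0),
  ('Player B', 'FF', 91.0);
""")

# Each branch of the union fills its own columns and uses 0 as a placeholder
# for the other pitch type. The outer SUM/MAX folds both rows of a batter
# into a single row instead of keeping only one of them.
merged = conn.execute("""
SELECT batter,
       SUM(ff_qty)       AS ff_qty,
       MAX(ff_avg_speed) AS ff_avg_speed,
       SUM(ch_qty)       AS ch_qty,
       MAX(ch_avg_speed) AS ch_avg_speed
FROM (
    SELECT batter,
           COUNT(*)   AS ff_qty,
           AVG(speed) AS ff_avg_speed,
           0          AS ch_qty,
           0          AS ch_avg_speed
    FROM hr WHERE pitch_type = 'FF' GROUP BY batter
    UNION ALL
    SELECT batter, 0, 0, COUNT(*), AVG(speed)
    FROM hr WHERE pitch_type = 'CH' GROUP BY batter
) AS output
GROUP BY batter
ORDER BY batter
""").fetchall()

for row in merged:
    print(row)
```

MAX only works as a merge here because the placeholders are 0 and the real values are positive; using NULL placeholders would be a safer choice when that assumption does not hold.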

Two methods of performing cohort analysis in MySQL using joins

I am building a cohort analysis processor. Input parameters: a time range and step, a condition (the initial event) used to extract cohorts, and an additional condition (the retention event) to check after each N hours/days/months. Output: a cohort analysis grid, like this:
0h | 16h | 32h | 48h | 64h | 80h | 96h |
cohort #00 15 | 6 | 4 | 1 | 1 | 2 | 2 |
cohort #01 1 | 35 | 8 | 0 | 2 | 0 | 1 |
cohort #02 0 | 3 | 31 | 11 | 5 | 3 | 0 |
cohort #03 0 | 0 | 4 | 27 | 7 | 6 | 2 |
cohort #04 0 | 1 | 1 | 4 | 29 | 4 | 3 |
Basically:
fetch cohorts: the unique users who did something 1 in each period, starting at time_begin and stepping by time_step.
find how many users in each cohort did something 2 after N seconds, N*2 seconds, N*3 seconds, and so on until now.
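The two steps above can be sketched in plain Python to make the window arithmetic explicit. This is an illustration under assumptions, not the stored procedure from the post: the function names, the in-memory event list, and the toy timestamps are hypothetical. Windows are half-open, (beg + period*i, beg + period*(i+1)], matching the tm > ... AND tm <= ... conditions used in the SQL further down.

```python
from collections import defaultdict


def window_index(tm, beg, period):
    """Index i of the window (beg + period*i, beg + period*(i+1)] that contains tm."""
    if tm <= beg:
        return None  # before the first window
    return (tm - beg - 1) // period


def cohort_grid(events, beg, period, initial, retention):
    """Count, per (period, cohort), the cohort members with a retention event in that period."""
    members = defaultdict(set)  # cohort index -> users with the initial event
    active = defaultdict(set)   # period index -> users with the retention event
    for uid, tm, name in events:
        idx = window_index(tm, beg, period)
        if idx is None:
            continue
        if initial in name:
            members[idx].add(uid)
        if retention in name:
            active[idx].add(uid)
    grid = {}
    for p, users in active.items():
        for c, cohort in members.items():
            n = len(users & cohort)
            if n:
                grid[(p, c)] = n
    return grid


# Toy data: (uid, timestamp, event_name)
events = [
    ("u1", 5, "1st_launch"),
    ("u2", 8, "1st_launch"),
    ("u3", 12, "1st_launch"),
    ("u1", 15, "level_end:001:win"),
    ("u3", 20, "level_end:001:win"),
    ("u2", 25, "level_end:001:win"),
]
grid = cohort_grid(events, beg=0, period=10,
                   initial="1st_launch", retention="level_end:001:win")
print(grid)
```

Because membership is a set per cohort, a user who appears in two cohorts is counted once in each, which is the behaviour to compare against when cohorts intersect.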
In short, I have two solutions. The first is too slow because it runs a heavy select with joins for each data step: 1 day, 2 days, 3 days, etc. The second tries to optimize this by joining the results for every data step to the cohorts in a single query. It appears to work, but I'm not sure it's the best way, or that it will give the same result when cohorts intersect. Please check it out.
Here's the whole story.
I have a table of > 100,000 events, something like this:
#user-id, timestamp, event_name
events_view (uid varchar(64), tm int(11), e varchar(64))
example input row:
"user_sampleid1", 1423836540, "level_end:001:win"
To do a cohort analysis, I first extract the cohorts: for example, users who sent the special event '1st_launch' in 10-hour periods starting from 2015-02-13 and ending on 2015-02-16. All code in this post is simplified and shortened to show the idea.
DROP TABLE IF EXISTS tmp_c;
create temporary table tmp_c (uid varchar(64), tm int(11), c int(11) );
set beg = UNIX_TIMESTAMP('2015-02-13 00:00:00');
set en = UNIX_TIMESTAMP('2015-02-16 00:00:00');
select min(tm) into t_start from events_view ;
select max(tm) into t_end from events_view ;
if beg < t_start then
set beg = t_start;
end if;
if en > t_end then
set en = t_end;
end if;
set period = 3600 * 10;
set cnt_c = ceil((en - beg) / period) ;
/*works quick enough*/
WHILE i < cnt_c DO
insert into tmp_c (
select uid, min(tm), i from events_view where
locate("1st_launch", e) > 0 and tm > (beg + period * i)
AND tm <= (beg + period * (i+1)) group by uid );
SET i = i+1;
END WHILE;
Cohorts may contain the same user ids, though usually a user exists in only one cohort. Within each cohort, users are unique.
Now I have a temp table like this:
user_id | 1st timestamp | cohort_no
uid1 1423836540 0
uid2 1423839540 0
uid3 1423841160 1
uid4 1423841460 2
...
uidN 1423843080 M
Then I need to divide the time range into periods again and calculate, for each period, how many users from each cohort have sent the event "level_end:001:win".
For each small period I select all unique users who have sent the "level_end:001:win" event and left join them to the tmp_c cohorts table. So I have something like this:
user_id | 1st timestamp | cohort_no | user_id | other fields...
uid1 1423836540 0 uid1
uid2 1423839540 0 null
uid3 1423841160 1 null
uid4 1423841460 2 uid4
...
uidN 1423843080 M null
This way I see which users from my cohorts are among those who have sent "level_end:001:win"; unmatched users are excluded by the where clause: where t2.uid is not null.
Finally I group the rows and get the count of users in each cohort who have sent "level_end:001:win" in this particular period.
Here's the code:
DROP TABLE IF EXISTS tmp_res;
create temporary table tmp_res (uid varchar(64) CHARACTER SET cp1251 NOT NULL, c int(11), cnt int(11) );
set i = 0;
set cnt_c = ceil((t_end - beg) / period) ;
WHILE i < cnt_c DO
insert into tmp_res
select concat(beg + period * i, "_", beg + period * (i+1)), c, count(distinct(uid)) from
(select t1.uid, t1.c from tmp_c t1 left join
(select uid, min(tm) from events_view where
locate("level_end:001:win", e) > 0 and
tm > (beg + period * i) AND tm <= (beg + period * (i+1)) group by uid ) t2
on t1.uid = t2.uid where t2.uid is not null) t3
group by c;
SET i = i+1;
END WHILE;
/*getting result of the first method: tooo slooooow!*/
select * from tmp_res;
The result I got (it's OK that some cohorts do not appear in some periods):
"1423832400_1423890000","1","35"
"1423832400_1423890000","2","3"
"1423832400_1423890000","3","1"
"1423832400_1423890000","4","1"
"1423890000_1423947600","1","21"
"1423890000_1423947600","2","50"
"1423890000_1423947600","3","2"
"1423947600_1424005200","1","9"
"1423947600_1424005200","2","24"
"1423947600_1424005200","3","70"
"1423947600_1424005200","4","6"
"1424005200_1424062800","1","7"
"1424005200_1424062800","2","15"
"1424005200_1424062800","3","21"
"1424005200_1424062800","4","32"
"1424062800_1424120400","1","7"
"1424062800_1424120400","2","13"
"1424062800_1424120400","3","24"
"1424062800_1424120400","4","18"
"1424120400_1424178000","1","10"
"1424120400_1424178000","2","12"
"1424120400_1424178000","3","18"
"1424120400_1424178000","4","14"
"1424178000_1424235600","1","6"
"1424178000_1424235600","2","7"
"1424178000_1424235600","3","9"
"1424178000_1424235600","4","12"
"1424235600_1424293200","1","6"
"1424235600_1424293200","2","8"
"1424235600_1424293200","3","9"
"1424235600_1424293200","4","5"
"1424293200_1424350800","1","5"
"1424293200_1424350800","2","3"
"1424293200_1424350800","3","11"
"1424293200_1424350800","4","10"
"1424350800_1424408400","1","8"
"1424350800_1424408400","2","5"
"1424350800_1424408400","3","7"
"1424350800_1424408400","4","7"
"1424408400_1424466000","2","6"
"1424408400_1424466000","3","7"
"1424408400_1424466000","4","3"
"1424466000_1424523600","1","3"
"1424466000_1424523600","2","4"
"1424466000_1424523600","3","8"
"1424466000_1424523600","4","2"
"1424523600_1424581200","2","3"
"1424523600_1424581200","3","3"
It works, but it takes too long because it runs many queries instead of one, so I need to rewrite it.
I think it can be rewritten with joins, but I'm still not sure how.
I decided to create a temporary table and write the period boundaries into it:
DROP TABLE IF EXISTS tmp_times;
create temporary table tmp_times (tm_start int(11), tm_end int(11));
set cnt_c = ceil((t_end - beg) / period) ;
set i = 0;
WHILE i < cnt_c DO
insert into tmp_times values( beg + period * i, beg + period * (i+1));
SET i = i+1;
END WHILE;
Then I build a periods-to-events mapping (user_id + timestamp identifies a particular event) in a subquery, left join it to the cohorts table, and group the result:
SELECT Concat(tm_start, "_", tm_end) per,
t1.c coh,
Count(DISTINCT( t2.uid ))
FROM tmp_c t1
LEFT JOIN (SELECT *
FROM tmp_times t3
LEFT JOIN (SELECT uid,
tm
FROM events_view
WHERE Locate("level_end:001:win", e) > 0)
t4
ON ( t4.tm > t3.tm_start
AND t4.tm <= t3.tm_end )
WHERE t4.uid IS NOT NULL
ORDER BY t3.tm_start) t2
ON t1.uid = t2.uid
WHERE t2.uid IS NOT NULL
GROUP BY per,
coh
ORDER BY per,
coh;
In my tests this returns the same result as method #1. I can't check the result manually, but I understand better how method #1 works, and as far as I can see it gives what I want. Method #2 is faster, but I'm not sure it's the best way, or that it will give the same result when cohorts intersect.
Are there well-known common methods for performing cohort analysis in SQL? Is method #1 more reliable than method #2? I don't work with joins that often, so I don't fully understand join magic yet.
Method #2 looks like pure magic, and I tend not to trust what I don't understand :)
Thanks for answers!
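One way to gain confidence in method #2 is to run both methods on a tiny data set where the expected counts can be worked out by hand, including a user who sits in two cohorts. The sketch below does this with Python's sqlite3 module rather than MySQL, so it is only an approximation of the original setup: instr() stands in for Locate(), inner joins replace the LEFT JOIN ... IS NOT NULL pairs (which filter the same rows), and all timestamps are toy values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events_view (uid TEXT, tm INTEGER, e TEXT);
CREATE TABLE tmp_c (uid TEXT, tm INTEGER, c INTEGER);
CREATE TABLE tmp_times (tm_start INTEGER, tm_end INTEGER);
""")

beg, period, steps = 0, 10, 3  # toy values, not the ones from the post
WIN = "level_end:001:win"

conn.executemany("INSERT INTO events_view VALUES (?, ?, ?)", [
    ("u1", 15, WIN),
    ("u1", 17, WIN),      # second win in the same period, must count once
    ("u2", 25, WIN),
    ("u3", 20, WIN),      # exactly on a period boundary
    ("u4", 30, "other"),
])
# u1 is deliberately placed in two cohorts to test intersecting cohorts.
conn.executemany("INSERT INTO tmp_c VALUES (?, ?, ?)", [
    ("u1", 5, 0), ("u2", 8, 0), ("u1", 12, 1), ("u3", 12, 1),
])
conn.executemany("INSERT INTO tmp_times VALUES (?, ?)", [
    (beg + period * i, beg + period * (i + 1)) for i in range(steps)
])

# Method 1: one query per period.
method1 = []
for i in range(steps):
    lo, hi = beg + period * i, beg + period * (i + 1)
    rows = conn.execute(
        """
        SELECT t1.c, COUNT(DISTINCT t1.uid)
        FROM tmp_c t1
        JOIN (SELECT uid FROM events_view
              WHERE instr(e, ?) > 0 AND tm > ? AND tm <= ?
              GROUP BY uid) t2 ON t1.uid = t2.uid
        GROUP BY t1.c
        ORDER BY t1.c
        """,
        (WIN, lo, hi),
    ).fetchall()
    method1.extend((f"{lo}_{hi}", c, cnt) for c, cnt in rows)

# Method 2: a single query that range-joins events to the period table.
method2 = conn.execute(
    """
    SELECT t3.tm_start || '_' || t3.tm_end AS per,
           t1.c AS coh,
           COUNT(DISTINCT t4.uid)
    FROM tmp_c t1
    JOIN events_view t4 ON t4.uid = t1.uid
    JOIN tmp_times t3 ON t4.tm > t3.tm_start AND t4.tm <= t3.tm_end
    WHERE instr(t4.e, ?) > 0
    GROUP BY t3.tm_start, t3.tm_end, t1.c
    ORDER BY t3.tm_start, t1.c
    """,
    (WIN,),
).fetchall()

print(method1)
print(method2)
```

On this data both methods count u1 once in each of its two cohorts. That agreement depends on COUNT(DISTINCT uid) being grouped by both period and cohort, and on the periods in tmp_times not overlapping.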

MYSQL: Select specific column value depending on condition

I have a table with these columns:
id doctor_name charges_cash charges_cashless
1 1 300 600
2 2 200 400
Now I am trying to run this query:
SELECT ipd.patient_name, r.room_name, doctor.doctor_name,
CASE p.tpa_name
WHEN NULL
THEN i.charges_cash
ELSE i.charges_cashless
END AS 'charges'
FROM `daily_ward_entry` d, ipd_charges i, ipd_patient_entry ipd, room_charges r,
patient_detail p, doctor
WHERE d.doctor_visit_name = i.doctor
AND r.id = d.room_name
AND d.patient_name = ipd.id
AND d.doctor_visit_name = doctor.id
I am getting 400 for charges even though p.tpa_name is null; I expect it to be 200.
I have no clue what I am doing wrong here.
The result set is like this
patient_name room_name doctor_name charges
Sapna Agrawal MG-1 Dr. Dungri 400
Thanks.
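You need the IS operator when comparing to NULL. CASE p.tpa_name WHEN NULL tests p.tpa_name = NULL, which is never true, so the ELSE branch always runs. Use a searched CASE instead: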
CASE WHEN p.tpa_name IS NULL
THEN i.charges_cash
ELSE i.charges_cashless
END AS 'charges'
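The difference between the two CASE forms can be checked on a two-row table. This is a minimal sketch using Python's sqlite3 module instead of MySQL; the comparison semantics for NULL shown here are the same in both, but the table charges and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE charges (tpa_name TEXT, charges_cash INTEGER, charges_cashless INTEGER);
INSERT INTO charges VALUES (NULL, 200, 400), ('Some TPA', 200, 400);
""")

# Simple CASE: compares tpa_name = NULL, which is never true,
# so every row falls through to ELSE.
simple_case = conn.execute("""
SELECT CASE tpa_name WHEN NULL THEN charges_cash ELSE charges_cashless END
FROM charges ORDER BY rowid
""").fetchall()

# Searched CASE with IS NULL: the NULL row takes the THEN branch.
searched_case = conn.execute("""
SELECT CASE WHEN tpa_name IS NULL THEN charges_cash ELSE charges_cashless END
FROM charges ORDER BY rowid
""").fetchall()

print(simple_case)
print(searched_case)
```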

SQL Server: calculate field data from fields in same table but different set of data

I was looking around and found no solution to this. I'd be glad if someone could help me out here:
I have a table that has, among others, the following columns:
Vehicle_No, Stop1_depTime, Segment_TravelTime, Stop_arrTime, Stop_Sequence
The data might look something like this:
Vehicle_No Stop1_DepTime Segment_TravelTime Stop_Sequence Stop_arrTime
201 13000 60 1
201 13000 45 2
201 13000 120 3
201 13000 4
202 13300 240 1
202 13300 60 2
...
and I need to calculate the arrival time at each stop from the departure time at the first stop and the travel times in between for each vehicle. What I need in this case would look like this:
Vehicle_No Stop1_DepTime Segment_TravelTime Stop_Sequence Stop_arrTime
201 13000 60 1
201 13000 45 2 13060
201 13000 120 3 13105
201 13000 4 13225
202 13300 240 1
202 13300 60 2 13540
...
I have tried to find a solution for some time but was not successful - Thanks for any help you can give me!
Here is the query that still does not work. I am sure I did something wrong when adapting this to the table from my database, but I don't know where. Sorry if this is a really simple error; I have just begun working with MSSQL.
Also, I have implemented the solution provided below and it works. At this point I mainly want to understand what went wrong here so I can learn from it. If it takes too much time, please do not bother with my question for too long. Otherwise, thanks a lot :)
;WITH recCTE
AS
(
SELECT ZAEHL_2011.dbo.L32.Zaehl_Fahrt_Id, ZAEHL_2011.dbo.L32.PlanAbfahrtStart, ZAEHL_2011.dbo.L32.Fahrzeit, ZAEHL_2011.dbo.L32.Sequenz, ZAEHL_2011.dbo.L32.PlanAbfahrtStart AS Stop_arrTime
FROM ZAEHL_2011.dbo.L32
WHERE ZAEHL_2011.dbo.L32.Sequenz = 1
UNION ALL
SELECT t. ZAEHL_2011.dbo.L32.Zaehl_Fahrt_Id, t. ZAEHL_2011.dbo.L32.PlanAbfahrtStart, t. ZAEHL_2011.dbo.L32.Fahrzeit,t. ZAEHL_2011.dbo.L32.Sequenz, r.Stop_arrTime + r. ZAEHL_2011.dbo.L32.Fahrzeit AS Stop_arrTime
FROM recCTE AS r
JOIN ZAEHL_2011.dbo.L32 AS t
ON t. ZAEHL_2011.dbo.L32.Zaehl_Fahrt_Id = r. ZAEHL_2011.dbo.L32.Zaehl_Fahrt_Id
AND t. ZAEHL_2011.dbo.L32.Sequenz = r. ZAEHL_2011.dbo.L32.Sequenz + 1
)
SELECT ZAEHL_2011.dbo.L32.Zaehl_Fahrt_Id, ZAEHL_2011.dbo.L32.PlanAbfahrtStart, ZAEHL_2011.dbo.L32.Fahrzeit, ZAEHL_2011.dbo.L32.Sequenz, ZAEHL_2011.dbo.L32.PlanAbfahrtStart,
CASE WHEN Stop_arrTime = ZAEHL_2011.dbo.L32.PlanAbfahrtStart THEN NULL ELSE Stop_arrTime END AS Stop_arrTime
FROM recCTE
ORDER BY ZAEHL_2011.dbo.L32.Zaehl_Fahrt_Id, ZAEHL_2011.dbo.L32.Sequenz
A recursive CTE solution - assumes that each Vehicle_No appears in the table only once:
DECLARE @t TABLE
(Vehicle_No INT
,Stop1_DepTime INT
,Segment_TravelTime INT
,Stop_Sequence INT
,Stop_arrTime INT
)
INSERT @t (Vehicle_No,Stop1_DepTime,Segment_TravelTime,Stop_Sequence)
VALUES(201,13000,60,1),
(201,13000,45,2),
(201,13000,120,3),
(201,13000,NULL,4),
(202,13300,240,1),
(202,13300,60,2)
;WITH recCTE
AS
(
SELECT Vehicle_No, Stop1_DepTime, Segment_TravelTime,Stop_Sequence, Stop1_DepTime AS Stop_arrTime
FROM @t
WHERE Stop_Sequence = 1
UNION ALL
SELECT t.Vehicle_No, t.Stop1_DepTime, t.Segment_TravelTime,t.Stop_Sequence, r.Stop_arrTime + r.Segment_TravelTime AS Stop_arrTime
FROM recCTE AS r
JOIN @t AS t
ON t.Vehicle_No = r.Vehicle_No
AND t.Stop_Sequence = r.Stop_Sequence + 1
)
SELECT Vehicle_No, Stop1_DepTime, Segment_TravelTime,Stop_Sequence, Stop1_DepTime,
CASE WHEN Stop_arrTime = Stop1_DepTime THEN NULL ELSE Stop_arrTime END AS Stop_arrTime
FROM recCTE
ORDER BY Vehicle_No, Stop_Sequence
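The recursion can be checked against the expected arrival times from the question. The sketch below is an adaptation, not the T-SQL above: it uses Python's sqlite3 module, which needs WITH RECURSIVE and an ordinary table (here named t, a hypothetical name) instead of a table variable. It also blanks the first stop by testing Stop_Sequence = 1 rather than comparing times, an assumption made so that a zero travel time would not hide a later arrival.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (Vehicle_No INTEGER, Stop1_DepTime INTEGER,
                Segment_TravelTime INTEGER, Stop_Sequence INTEGER);
INSERT INTO t VALUES
  (201, 13000,   60, 1),
  (201, 13000,   45, 2),
  (201, 13000,  120, 3),
  (201, 13000, NULL, 4),
  (202, 13300,  240, 1),
  (202, 13300,   60, 2);
""")

rows = conn.execute("""
WITH RECURSIVE recCTE AS (
    -- anchor: first stop of every vehicle, running time = departure time
    SELECT Vehicle_No, Stop_Sequence, Segment_TravelTime,
           Stop1_DepTime AS running_time
    FROM t
    WHERE Stop_Sequence = 1
    UNION ALL
    -- step: next stop = previous running time + previous segment travel time
    SELECT t.Vehicle_No, t.Stop_Sequence, t.Segment_TravelTime,
           r.running_time + r.Segment_TravelTime
    FROM recCTE AS r
    JOIN t ON t.Vehicle_No = r.Vehicle_No
          AND t.Stop_Sequence = r.Stop_Sequence + 1
)
SELECT Vehicle_No, Stop_Sequence,
       CASE WHEN Stop_Sequence = 1 THEN NULL ELSE running_time END AS Stop_arrTime
FROM recCTE
ORDER BY Vehicle_No, Stop_Sequence
""").fetchall()

for row in rows:
    print(row)
```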
EDIT
Corrected version of OP's query - note that it's not necessary to fully qualify the column names:
;WITH recCTE
AS
(
SELECT Zaehl_Fahrt_Id, PlanAbfahrtStart, Fahrzeit, L32.Sequenz, PlanAbfahrtStart AS Stop_arrTime
FROM ZAEHL_2011.dbo.L32
WHERE Sequenz = 1
UNION ALL
SELECT t.Zaehl_Fahrt_Id, t.PlanAbfahrtStart, t.Fahrzeit,t.Sequenz, r.Stop_arrTime + r.Fahrzeit AS Stop_arrTime
FROM recCTE AS r
JOIN ZAEHL_2011.dbo.L32 AS t
ON t.Zaehl_Fahrt_Id = r.Zaehl_Fahrt_Id
AND t.Sequenz = r.Sequenz + 1
)
SELECT Zaehl_Fahrt_Id, PlanAbfahrtStart, Fahrzeit, Sequenz, PlanAbfahrtStart,
CASE WHEN Stop_arrTime = PlanAbfahrtStart THEN NULL ELSE Stop_arrTime END AS Stop_arrTime
FROM recCTE
ORDER BY Zaehl_Fahrt_Id, Sequenz
I'm quite sure this works:
SELECT a.Vehicle_No, a.Stop1_DepTime,
a.Segment_TravelTime, a.Stop_Sequence, a.Stop1_DepTime +
(SELECT SUM(b.Segment_TravelTime) FROM your_table b
WHERE b.Vehicle_No = a.Vehicle_No AND b.Stop_Sequence < a.Stop_Sequence) AS Stop_arrTime
FROM your_table a
ORDER BY a.Vehicle_No, a.Stop_Sequence
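The correlated-subquery approach can be verified on the sample data from the question. A minimal sketch using Python's sqlite3 module rather than SQL Server; your_table is the placeholder name from the answer. For the first stop the subquery sums over no rows and returns NULL, so the arrival time is NULL without any extra CASE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (Vehicle_No INTEGER, Stop1_DepTime INTEGER,
                         Segment_TravelTime INTEGER, Stop_Sequence INTEGER);
INSERT INTO your_table VALUES
  (201, 13000,   60, 1),
  (201, 13000,   45, 2),
  (201, 13000,  120, 3),
  (201, 13000, NULL, 4),
  (202, 13300,  240, 1),
  (202, 13300,   60, 2);
""")

# Arrival time = departure at stop 1 + sum of travel times of all earlier segments.
arrivals = conn.execute("""
SELECT a.Vehicle_No, a.Stop_Sequence,
       a.Stop1_DepTime +
       (SELECT SUM(b.Segment_TravelTime)
        FROM your_table b
        WHERE b.Vehicle_No = a.Vehicle_No
          AND b.Stop_Sequence < a.Stop_Sequence) AS Stop_arrTime
FROM your_table a
ORDER BY a.Vehicle_No, a.Stop_Sequence
""").fetchall()

for row in arrivals:
    print(row)
```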

mysql,select query ,and or clause problem

I have a table named item with four attributes: name, code, class, value.
Now I want to group the rows in the following way:
group a: name='A',code=11,class='high',value between( (5300 and 5310),(7100 and 7200),(8210 and 8290))
group b: name='b',code=11,class='high',value between( (1300 and 1310),(2100 and 2200),(3210 and 3290))
How can I do it?
You might want to try something like this:
SELECT
CASE
WHEN code = 11 AND
class = 'high' AND
(value BETWEEN 5300 AND 5310 OR
value BETWEEN 7100 AND 7200 OR
value BETWEEN 8210 AND 8290)
THEN 'A'
WHEN code = 11 AND
class = 'high' AND
(value BETWEEN 1300 AND 1310 OR
value BETWEEN 2100 AND 2200 OR
value BETWEEN 3210 AND 3290)
THEN 'B'
ELSE 'Unknown'
END AS item_group,
your_table.*
FROM your_table
ORDER BY item_group
You might wish to change ORDER BY to GROUP BY and you should be aware that BETWEEN includes both endpoints.
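A small check of the CASE approach, including the inclusive endpoints of BETWEEN. This is a sketch run through Python's sqlite3 module rather than MySQL; the sample rows and the item_group alias are assumptions chosen for illustration, and the ranges are applied to the value column as described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (name TEXT, code INTEGER, class TEXT, value INTEGER);
INSERT INTO item VALUES
  ('A', 11, 'high', 5300),   -- lower endpoint, included
  ('A', 11, 'high', 5310),   -- upper endpoint, included
  ('A', 11, 'high', 5311),   -- just outside the range
  ('b', 11, 'high', 2150),
  ('b', 11, 'low',  2150);   -- wrong class
""")

groups = conn.execute("""
SELECT value, class,
       CASE
         WHEN code = 11 AND class = 'high' AND
              (value BETWEEN 5300 AND 5310 OR
               value BETWEEN 7100 AND 7200 OR
               value BETWEEN 8210 AND 8290)
         THEN 'A'
         WHEN code = 11 AND class = 'high' AND
              (value BETWEEN 1300 AND 1310 OR
               value BETWEEN 2100 AND 2200 OR
               value BETWEEN 3210 AND 3290)
         THEN 'B'
         ELSE 'Unknown'
       END AS item_group
FROM item
ORDER BY rowid
""").fetchall()

for row in groups:
    print(row)
```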
First group:
select * from item
where name = 'A'
and code = 11
and class = 'high'
and (value BETWEEN 5300 AND 5310 OR value BETWEEN 7100 AND 7200 OR value BETWEEN 8210 AND 8290)
The same idea applies to group b.