SQL - Using sum but optionally using default value for row - mysql

Given tables
asset
  col - id
date_sequence
  col - date
daily_history
  col - date
  col - num_error_seconds
  col - asset_id
historical_event
  col - start_date
  col - end_date
  col - asset_id
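For reference, a simplified DDL for these tables (the column types here are illustrative only, not the real definitions):
CREATE TABLE asset (
    id INT PRIMARY KEY
);
CREATE TABLE date_sequence (
    date DATE PRIMARY KEY
);
CREATE TABLE daily_history (
    date DATE,
    num_error_seconds INT,
    asset_id INT
);
CREATE TABLE historical_event (
    start_date DATETIME,
    end_date DATETIME,
    asset_id INT
);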
I'm trying to count up all the daily num_error_seconds for all assets in a given time range in order to display "Percentage NOT in error" by day. The catch is: if there is a historical_event involving an asset whose end_date is beyond the SQL query range, then daily_history should be ignored and a default value of 86400 seconds (one full day of error seconds) should be used for that asset.
The query I have that does not use the historical_event is:
select ds.date,
       IF(count(dh.date) = 0,
          100,
          100 - (100 * sum(dh.num_error_seconds) / (86400 * count(*)))
       ) percent
from date_sequence ds
join asset a
left join daily_history dh on dh.date = ds.date and dh.asset_id = a.id
where ds.date >= in_start_time and ds.date <= in_end_time
group by ds.date;
To build on this is beyond my SQL knowledge. Because of the aggregate function, I cannot simply inject 86400 seconds for each asset that is associated with an event that has an end_date beyond the in_end_time.
Sample Data
Asset
1
2
Date Sequence
2013-09-01
2013-09-02
2013-09-03
2013-09-04
Daily History (date, num_error_seconds, asset_id)
2013-09-01, 1400, 1
2013-09-02, 1501, 1
2013-09-03, 1420, 1
2013-09-04, 0, 1
2013-09-01, 10000, 2
2013-09-02, 20000, 2
2013-09-03, 30000, 2
2013-09-04, 40000, 2
Historical Event
start_date, end_date, asset_id
2013-09-03 12:01:03, 2014-01-01 00:00:00, 1
What I would expect to see with this sample data is the percentage of time these assets are NOT in error:
2013-09-01 => 100 - (100*(1400 + 10000))/(86400*2)
2013-09-02 => 100 - (100*(1501 + 20000))/(86400*2)
2013-09-03 => 100 - (100*(1420 + 30000))/(86400*2)
2013-09-04 => 100 - (100*(0 + 40000))/(86400*2)
Except: there was a historical event which should take precedence. It happened on 9/3 and is open-ended (it has an end date in the future), so the calculations would change to:
2013-09-01 => 100 - (100*(1400 + 10000))/(86400*2)
2013-09-02 => 100 - (100*(1501 + 20000))/(86400*2)
2013-09-03 => 100 - (100*(86400 + 30000))/(86400*2)
2013-09-04 => 100 - (100*(86400 + 40000))/(86400*2)
Asset 1's num_error_seconds gets overwritten with a full day of error seconds because there is a historical event with a start_date before in_end_time and an end_date after in_end_time.
Can this be accomplished in one query? Or do I need to stage data with an initial query?

I think you're after something like this:
Select
    ds.date,
    100 - 100 * Sum(
        case
            when he.asset_id is not null then 86400      -- have a historical_event
            when dh.num_error_seconds is null then 0     -- no daily_history record
            else dh.num_error_seconds
        end
    ) / 86400 / count(a.id) as percent                   -- need to divide by number of assets
From
    date_sequence ds
    cross join asset a
    left outer join daily_history dh
        on a.id = dh.asset_id and
           ds.date = dh.date
    left outer join (
        select distinct                                  -- avoid counting multiple he records
            asset_id
        from
            historical_event he
        where
            he.end_date > in_end_time
    ) he
        on a.id = he.asset_id
Where
    ds.date >= in_start_time and
    ds.date <= in_end_time                               -- I'd prefer < here
Group By
    ds.date
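If in_start_time and in_end_time are stored procedure parameters, you can test the query on its own by substituting literal dates from the sample data, e.g. replacing the final Where clause with:
Where
    ds.date >= '2013-09-01' and
    ds.date <= '2013-09-04'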

Related

Where do I begin Looping statements in MySQL Workbench

I need to have this query run 12 times (once for each of the previous 12 months) and append the results to a table. I am not very good with looping and am looking for input; I am just not sure where to put my counter variables or any other looping statements. I think I may need two variables in the loop because of the previous-month first-day and last-day variables.
SET @PM_FD = last_day(curdate() - interval 2 month) + interval 1 day;
SET @PM_LD = last_day(curdate() - interval 1 month);
insert into sandbox.metrics_history
SELECT
'CHI' as Company
,count(*) as Result
,'SSRM10' as Metric_ID
,'PONoReqLine' as Metric_Name
, MONTHNAME(@PM_FD) as Month, year(@PM_FD) as Year
FROM
poline pol
INNER JOIN
purchorder po ON pol.company = po.company
AND pol.po_number = po.po_number
AND pol.po_release = po.po_release
AND pol.po_code = po.po_code
LEFT JOIN
polinesrc src ON pol.company = src.company
AND pol.po_number = src.po_number
AND pol.po_release = src.po_release
AND pol.line_nbr = src.line_nbr
AND pol.po_code = src.po_code
LEFT JOIN
buyer byr ON pol.buyer_code = byr.buyer_code
WHERE
pol.buyer_code != 'POC'
AND src.company IS NULL
AND po.po_date >= @PM_FD
AND po.po_date <= @PM_LD
ORDER BY pol.company , pol.po_number , pol.line_nbr
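One way to approach this, as a minimal sketch: wrap the INSERT in a stored procedure and step a counter back one month per iteration. The procedure name is just illustrative, and the body reuses your query unchanged apart from how the two date variables are set (the ORDER BY is dropped, since the SELECT aggregates to a single row per month):
DELIMITER //
CREATE PROCEDURE sandbox.load_last_12_months()
BEGIN
    DECLARE i INT DEFAULT 1;   -- months back: 1 = previous month, 12 = a year ago
    WHILE i <= 12 DO
        -- first and last day of the month that is i months before the current one
        SET @PM_FD = last_day(curdate() - INTERVAL (i + 1) MONTH) + INTERVAL 1 DAY;
        SET @PM_LD = last_day(curdate() - INTERVAL i MONTH);

        INSERT INTO sandbox.metrics_history
        SELECT 'CHI' as Company,
               count(*) as Result,
               'SSRM10' as Metric_ID,
               'PONoReqLine' as Metric_Name,
               MONTHNAME(@PM_FD) as Month,
               year(@PM_FD) as Year
        FROM poline pol
        INNER JOIN purchorder po ON pol.company = po.company
            AND pol.po_number = po.po_number
            AND pol.po_release = po.po_release
            AND pol.po_code = po.po_code
        LEFT JOIN polinesrc src ON pol.company = src.company
            AND pol.po_number = src.po_number
            AND pol.po_release = src.po_release
            AND pol.line_nbr = src.line_nbr
            AND pol.po_code = src.po_code
        LEFT JOIN buyer byr ON pol.buyer_code = byr.buyer_code
        WHERE pol.buyer_code != 'POC'
            AND src.company IS NULL
            AND po.po_date >= @PM_FD
            AND po.po_date <= @PM_LD;

        SET i = i + 1;
    END WHILE;
END //
DELIMITER ;

CALL sandbox.load_last_12_months();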

Combining two MySQL queries returns OK instead of rows

I have a query that returns some information regarding an invoice. I take that invoice and compare it to another table, payment, to see if that invoice's (fullamount - fullpaid) exists in the other table; if it does, a certain function should not run in my backend code.
SELECT a.status, a.rf_reference, a.payor_orig_id , a.gross_amount + a.type_related_amount as fullamount,
a.gross_paid + a.type_related_paid as fullpaid
FROM invoice a
where a.type = 3 and
a.status in (10, 30) and
a.UPDATE_DT is null
having fullamount > fullpaid
order by a.ORIG_ID;
The above query returns
status | rf_reference | payor_orig_id | fullamount | fullpaid
30     | RF123456     | 212           | 1000       | 200
So now I take the above information and pass it onto another query to see if a row field matches.
I pass it on like this
select *
from payment
where
payor_orig_id = 212 and
rf_reference = 'RF123456' and
payment_amount = (1000-200) and
status = 10 and
INSERT_DT BETWEEN DATE_SUB(NOW(), INTERVAL 185 DAY) AND NOW() and
UPDATE_DT IS NULL;
So the above query returns a row, in which case I do not run my backend function.
Since these are two separate queries, I would like to combine them into one, adding a HAVING condition so that only rows are returned where there is no match between the invoice and payment tables.
SELECT a.status, a.rf_reference, a.payor_orig_id , a.gross_amount + a.type_related_amount as fullamount,
a.gross_paid + a.type_related_paid as fullpaid,
(select b.payment_amount
from payment b
where
b.payor_orig_id = a.payor_orig_id and
b.rf_reference = a.rf_reference and
b.status = 10 and
b.INSERT_DT BETWEEN DATE_SUB(NOW(), INTERVAL 185 DAY) AND NOW() and
b.UPDATE_DT IS NULL) as payment_amount
FROM invoice a
where a.type = 3 and
a.status in (10, 30) and
a.UPDATE_DT is null
having fullamount > fullpaid and
(fullamount - fullpaid ) <> payment_amount
order by a.ORIG_ID;
The above query returns "OK" instead of rows, which is odd, and I am not sure how to debug it.
Try checking whether a matching row exists in the other table using NOT EXISTS:
SELECT a.* ,
a.gross_amount + a.type_related_amount as fullamount,
a.gross_paid + a.type_related_paid as fullpaid
FROM invoice a
where a.type = 3 and
a.status in (10, 30) and
a.UPDATE_DT is null and
NOT EXISTS ( select *
from payment
where
payor_orig_id = a.payor_orig_id and
rf_reference = a.rf_reference and
payment_amount = ((a.gross_amount + a.type_related_amount) - (a.gross_paid + a.type_related_paid)) and
status = 10 and
INSERT_DT BETWEEN DATE_SUB(NOW(), INTERVAL 185 DAY) AND NOW() and
UPDATE_DT IS NULL )
having fullamount > fullpaid
order by a.ORIG_ID;
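If NOT EXISTS turns out to be slow on your data, an equivalent anti-join with LEFT JOIN ... IS NULL is worth comparing; it expresses the same logic as a join:
SELECT a.*,
       a.gross_amount + a.type_related_amount as fullamount,
       a.gross_paid + a.type_related_paid as fullpaid
FROM invoice a
LEFT JOIN payment p
    ON  p.payor_orig_id = a.payor_orig_id
    AND p.rf_reference = a.rf_reference
    AND p.payment_amount = ((a.gross_amount + a.type_related_amount) - (a.gross_paid + a.type_related_paid))
    AND p.status = 10
    AND p.INSERT_DT BETWEEN DATE_SUB(NOW(), INTERVAL 185 DAY) AND NOW()
    AND p.UPDATE_DT IS NULL
WHERE a.type = 3
    AND a.status in (10, 30)
    AND a.UPDATE_DT is null
    AND p.payor_orig_id IS NULL   -- no matching payment row
HAVING fullamount > fullpaid
ORDER BY a.ORIG_ID;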

MySQL query optimization 5

I am working on accounting software with Java + MySQL (MariaDB). I calculate the stock amount with the following query, and it takes 16 seconds to run. Is that query duration normal? Am I making a mistake in the query?
SELECT products_id as ID,prod_name as 'Product Name',
IFNULL((SELECT sum(piece)
FROM `ktgcari_000_fatura_xref`
WHERE product_id = ktgcari_000_stok.products_id AND
(type = 1 or type = 4)
), 0) -
IFNULL((SELECT sum(piece)
FROM `ktgcari_000_fatura_xref`
WHERE product_id = ktgcari_000_stok.products_id AND
(type = 2 or type = 5)
), 0) +
IFNULL((SELECT sum(piece)
FROM ktgcari_000_ssayim
WHERE urun_id = ktgcari_000_stok.products_id
), 0) as stock
FROM ktgcari_000_stok
LIMIT 0,1000
 
Stock = (sum of incoming invoices + sum of incoming dispatches) - (sum of outgoing invoices + sum of outgoing dispatches) + (sum of counting receipts)
Database information:
Number of stock cards: 39000
Number of invoices: 545
Rows in the invoice content (xref) table: 1800
Number of counting receipts: 942
Database size: 5 MB
I would write the query as:
SELECT s.products_id as ID, s.prod_name as `Product Name`,
       (COALESCE((SELECT SUM(CASE WHEN x.type IN (1, 4) THEN piece
                                  WHEN x.type IN (2, 5) THEN - piece
                             END)
                  FROM `ktgcari_000_fatura_xref` x
                  WHERE x.product_id = s.products_id AND
                        x.type IN (1, 2, 4, 5)
                 ), 0) +
        COALESCE((SELECT SUM(ss.piece)
                  FROM ktgcari_000_ssayim ss
                  WHERE ss.urun_id = s.products_id
                 ), 0)
       ) as stock
FROM ktgcari_000_stok s
LIMIT 0, 1000
Then for performance, you want indexes on ktgcari_000_fatura_xref(product_id, type, piece) and ktgcari_000_ssayim(urun_id, piece).
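For example (the index names are just illustrative):
CREATE INDEX idx_fatura_xref_product_type_piece
    ON ktgcari_000_fatura_xref (product_id, type, piece);

CREATE INDEX idx_ssayim_urun_piece
    ON ktgcari_000_ssayim (urun_id, piece);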
I also note that you are using LIMIT without ORDER BY. You do realize that SQL result sets are unordered, unless they have an explicit ORDER BY.
I edited the SQL query as follows; the query time dropped to 8 seconds. How can I reduce the duration further?
SELECT products_id as ID, prod_name,
       (SELECT IF(type = 1 or type = 4, sum(urun_adet), 0) -
               IF(type = 2 or type = 5, sum(urun_adet), 0)
        FROM ktgcari_000_fatura_xref
        WHERE product_id = ktgcari_000_stok.products_id) +
       IFNULL((SELECT sum(miktar)
               FROM ktgcari_000_ssayim
               WHERE urun_id = ktgcari_000_stok.products_id), 0) as 'stock'
FROM ktgcari_000_stok
LIMIT 0, 1000

How to optimize query when using sub-queries in left join

Tables:
Please take a look at my earlier question to see the tables: How to query counting specific wins of team and find the winner of the series.
Questions:
How can I make the query more optimized?
How can I reduce the query's redundancy?
How can I make this query faster?
Summary
As you can see in the example query, this part is used many times:
WHERE leagueid = 2096
AND start_time >= 1415938900
AND ((matches.radiant_team_id= 1848158 AND matches.dire_team_id= 15)
OR (matches.radiant_team_id= 15 AND matches.dire_team_id= 1848158))
SELECT matches.radiant_team_id,
matches.dire_team_id,
matches.radiant_name,
matches.dire_name,
TA.Count AS teamA,
TB.Count AS teamB,
TA.Count + TB.Count AS total_matches,
SUM(TA.wins),
SUM(TB.wins),
(CASE
WHEN series_type = 0 THEN 1
WHEN series_type = 1 THEN 2
WHEN series_type = 2 THEN 3
END) AS wins_goal
FROM matches
LEFT JOIN
(SELECT radiant_team_id,
COUNT(id) AS COUNT,
CASE
WHEN matches.radiant_team_id = radiant_team_id && radiant_win = 1 THEN 1
END AS wins
FROM matches
WHERE leagueid = 2096
AND start_time >= 1415938900
AND ((matches.radiant_team_id= 1848158
AND matches.dire_team_id= 15)
OR (matches.radiant_team_id= 15
AND matches.dire_team_id= 1848158))
GROUP BY radiant_team_id) AS TA ON TA.radiant_team_id = matches.radiant_team_id
LEFT JOIN
(SELECT dire_team_id,
COUNT(id) AS COUNT,
CASE
WHEN matches.dire_team_id = dire_team_id && radiant_win = 0 THEN 1
END AS wins
FROM matches
WHERE leagueid = 2096
AND start_time >= 1415938900
AND ((matches.radiant_team_id= 1848158
AND matches.dire_team_id= 15)
OR (matches.radiant_team_id= 15
AND matches.dire_team_id= 1848158))
GROUP BY dire_team_id) AS TB ON TB.dire_team_id = matches.dire_team_id
WHERE leagueid = 2096
AND start_time >= 1415938900
AND ((matches.radiant_team_id= 1848158
AND matches.dire_team_id= 15)
OR (matches.radiant_team_id= 15
AND matches.dire_team_id= 1848158))
GROUP BY series_id
Scheduled Matches
ID| leagueid| team_a_id| team_b_id| starttime
1| 2096| 1848158| 15| 1415938900
I believe it can be done without subqueries.
I set up a matches table with your sample data and used the following query to group the results, one line per series:
SELECT
matches.leagueid,
matches.series_id,
matches.series_type,
COUNT(id) as matches,
IF(radiant_team_id=1848158,radiant_name, dire_name) AS teamA,
IF(radiant_team_id=1848158,dire_name, radiant_name) AS teamB,
SUM(CASE
WHEN radiant_team_id=1848158 AND radiant_win=1 THEN 1
WHEN dire_team_id=1848158 AND radiant_win=0 THEN 1
ELSE 0 END) AS teamAwin,
SUM(CASE
WHEN radiant_team_id=15 AND radiant_win=1 THEN 1
WHEN dire_team_id=15 AND radiant_win=0 THEN 1
ELSE 0 END) AS teamBwin
FROM `matches`
WHERE leagueid = 2096
AND start_time >= 1415938900
AND dire_team_id IN (15, 1848158)
AND radiant_team_id IN (15, 1848158)
group by leagueid,series_id,series_type,teamA,teamB
which yields one row per series.
Please note that, when grouping the results of one series, there is no such thing as a radiant team or a dire team. The radiant and dire roles might be switched several times during the same series, so I only refer to the teams as teamA and teamB.
Now, looking at your prior question, I see that you need to determine the series winner based on the series type and each team's victories. For that, you would need to wrap the former query and use it as a subquery, such as:
SELECT matchresults.*,
CASE series_type
WHEN 0 then IF(teamAwin>=1, teamA,teamB)
WHEN 1 then IF(teamAwin>=2, teamA,teamB)
ELSE IF(teamAwin>=3, teamA,teamB)
END as winner
from ( THE_MAIN_QUERY) as matchresults
There may be more efficient ways to get the results you want. But, to make this query more efficient, you can add indexes. This is the repeated where clause:
WHERE leagueid = 2096 AND
start_time >= 1415938900 AND
((matches.radiant_team_id= 1848158 AND matches.dire_team_id= 15) OR
(matches.radiant_team_id= 15 AND matches.dire_team_id= 1848158))
Conditions with OR are hard for the optimizer. The following index will be helpful: matches(leagueid, start_time). A covering index (for the WHERE conditions at least) is matches(leagueid, start_time, radiant_team_id, dire_team_id). I would start with this latter index and see if that improves performance sufficiently for your purposes.
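For example (the index name is just illustrative):
CREATE INDEX idx_matches_league_time_teams
    ON matches (leagueid, start_time, radiant_team_id, dire_team_id);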

Running WHILE or CURSOR or both in SQL Server 2008

I am trying to run a loop of some sort in SQL Server 2008 / T-SQL, and I am unsure whether this should be a WHILE loop, a CURSOR, or both. The end result is that I am trying to loop through a list of user logins, determine the unique users, and then run a loop to determine how many visits it took for each user to be on the site for 5 minutes, broken out by channel.
Table: LoginHistory
UserID  Channel  DateTime          DurationInSeconds
1       Website  1/1/2013 1:13PM   170
2       Mobile   1/1/2013 2:10PM   60
3       Website  1/1/2013 3:10PM   180
4       Website  1/1/2013 3:20PM   280
5       Website  1/1/2013 5:00PM   60
1       Website  1/1/2013 5:05PM   500
3       Website  1/1/2013 5:45PM   120
1       Mobile   1/1/2013 6:00PM   30
2       Mobile   1/1/2013 6:10PM   90
5       Mobile   1/1/2013 7:30PM   400
3       Website  1/1/2013 8:00PM   30
1       Mobile   1/1/2013 9:30PM   200
I can select the unique users into a new table like so:
SELECT UserID
INTO #Users
FROM LoginHistory
GROUP BY UserID
Now, the functionality I'm trying to develop is to loop over these unique UserIDs, order the logins by DateTime, then count the number of logins needed to get to 300 seconds.
The result set I would hope to get to would look something like this:
UserID  TotalLogins  WebsiteLogins  MobileLogins  LoginsNeededTo5Min
1       4            2              2             2
2       2            2              0             0
3       3            3              0             3
4       1            1              0             0
5       2            1              1             2
If I were doing this in another language, I think it would look something like this (and apologies, because this is not complete, just where I think I am going):
for (i in #Users):
TotalLogins = Count(*),
WebsiteLogins = Count(*) WHERE Channel = 'Website',
MobileLogins = Count(*) WHERE Channel = 'Mobile',
for (i in LoginHistory):
if Duration < 300:
count(NumLogins) + 1
** Ok - I'm laughing at myself the way I combined multiple different languages/syntaxes, but this is how I am thinking about solving this **
Thoughts on a good way to accomplish this? My preference is to use a loop so I can continue to write if/then logic into the code.
Ok, this is one of those times where a CURSOR would probably outperform a set-based solution. Sadly, I'm not very good with cursors, so I can give you a set-based solution to try:
;WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY [DateTime]) RN
FROM UserLogins
), CTE2 AS
(
SELECT *, 1 RecursionLevel
FROM CTE
WHERE RN = 1
UNION ALL
SELECT B.UserID, B.Channel, B.[DateTime],
A.DurationInSeconds+B.DurationInSeconds,
B.RN, RecursionLevel+1
FROM CTE2 A
INNER JOIN CTE B
ON A.UserID = B.UserID AND A.RN = B.RN - 1
)
SELECT A.UserID,
COUNT(*) TotalLogins,
SUM(CASE WHEN Channel = 'Website' THEN 1 ELSE 0 END) WebsiteLogins,
SUM(CASE WHEN Channel = 'Mobile' THEN 1 ELSE 0 END) MobileLogins,
ISNULL(MIN(RecursionLevel),0) LoginsNeedeto5Min
FROM UserLogins A
LEFT JOIN ( SELECT UserID, MIN(RecursionLevel) RecursionLevel
FROM CTE2
WHERE DurationInSeconds > 300
GROUP BY UserID) B
ON A.UserID = B.UserID
GROUP BY A.UserID
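Note that if a single user can have more than 100 logins in the range, this recursive CTE can hit SQL Server's default recursion limit of 100; you can raise it by appending a query hint to the final SELECT, e.g.:
OPTION (MAXRECURSION 0)   -- 0 removes the limit; an explicit cap such as 1000 also works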
A slightly different piece-wise approach. A minor difference is that the recursive portion terminates when it reaches 300 seconds for each user rather than summing all of the available logins.
An index on UserId/StartTime should improve performance on larger datasets.
declare @Logins as Table ( UserId Int, Channel VarChar(10), StartTime DateTime, DurationInSeconds Int )
insert into @Logins ( UserId, Channel, StartTime, DurationInSeconds ) values
( 1, 'Website', '1/1/2013 1:13PM', 170 ),
( 2, 'Mobile', '1/1/2013 2:10PM', 60 ),
( 3, 'Website', '1/1/2013 3:10PM', 180 ),
( 4, 'Website', '1/1/2013 3:20PM', 280 ),
( 5, 'Website', '1/1/2013 5:00PM', 60 ),
( 1, 'Website', '1/1/2013 5:05PM', 500 ),
( 3, 'Website', '1/1/2013 5:45PM', 120 ),
( 1, 'Mobile', '1/1/2013 6:00PM', 30 ),
( 2, 'Mobile', '1/1/2013 6:10PM', 90 ),
( 5, 'Mobile', '1/1/2013 7:30PM', 400 ),
( 3, 'Website', '1/1/2013 8:00PM', 30 ),
( 1, 'Mobile', '1/1/2013 9:30PM', 200 )
select * from @Logins
; with MostRecentLogins as (
-- Logins with flags for channel and sequenced by StartTime (ascending) for each UserId .
select UserId, Channel, StartTime, DurationInSeconds,
case when Channel = 'Website' then 1 else 0 end as WebsiteLogin,
case when Channel = 'Mobile' then 1 else 0 end as MobileLogin,
Row_Number() over ( partition by UserId order by StartTime ) as Seq
from @Logins ),
CumulativeDuration as (
-- Start with the first login for each UserId .
select UserId, Seq, DurationInSeconds as CumulativeDurationInSeconds
from MostRecentLogins
where Seq = 1
union all
-- Accumulate additional logins for each UserId until the running total exceeds 300 or they run out of logins.
select CD.UserId, MRL.Seq, CD.CumulativeDurationInSeconds + MRL.DurationInSeconds
from CumulativeDuration as CD inner join
MostRecentLogins as MRL on MRL.UserId = CD.UserId and MRL.Seq = CD.Seq + 1 and CD.CumulativeDurationInSeconds < 300 )
-- Display the summary.
select UserId, Sum( WebsiteLogin + MobileLogin ) as TotalLogins,
Sum( WebsiteLogin ) as WebsiteLogins, Sum( MobileLogin ) as MobileLogins,
( select Max( Seq ) from CumulativeDuration where UserId = LT3.UserId and CumulativeDurationInSeconds >= 300 ) as LoginsNeededTo5Min
from MostRecentLogins as LT3
group by UserId
order by UserId
Note that your sample results seem to have an error. UserId 3 reaches 300 seconds in two calls: 180 + 120. Your example shows three calls.