MySQL query optimization

I am working on accounting software with Java + MySQL (MariaDB). I calculate the stock amount with the following query, and it takes 16 seconds to run. Is this query duration normal? Am I making a mistake in the query?
SELECT products_id as ID,prod_name as 'Product Name',
IFNULL((SELECT sum(piece)
FROM `ktgcari_000_fatura_xref`
WHERE product_id = ktgcari_000_stok.products_id AND
(type = 1 or type = 4)
), 0) -
IFNULL((SELECT sum(piece)
FROM `ktgcari_000_fatura_xref`
WHERE product_id = ktgcari_000_stok.products_id AND
(type = 2 or type = 5)
), 0) +
IFNULL((SELECT sum(piece)
FROM ktgcari_000_ssayim
WHERE urun_id = ktgcari_000_stok.products_id
), 0) as stock
FROM ktgcari_000_stok
LIMIT 0,1000
 
Stock = (sum of incoming invoices + sum of incoming dispatches) - (sum of outgoing invoices + sum of outgoing dispatches) + (sum of counting receipts)
Database Information:
Number of stock cards: 39,000
Number of invoices: 545
Rows in the invoice content table: 1,800
Number of counting receipts: 942
Database size: 5 MB

I would write the query as:
SELECT s.products_id as ID, s.prod_name as `Product Name`,
(COALESCE((SELECT SUM(CASE WHEN x.type IN (1, 4) THEN piece
WHEN x.type IN (2, 5) THEN - piece
END)
FROM `ktgcari_000_fatura_xref` x
WHERE x.product_id = s.products_id AND
x.type IN (1, 2, 4, 5)
), 0) +
COALESCE((SELECT SUM(ss.piece)
FROM ktgcari_000_ssayim ss
WHERE ss.urun_id = s.products_id
), 0)
) as stock
FROM ktgcari_000_stok s
LIMIT 0, 1000
Then for performance, you want indexes on ktgcari_000_fatura_xref(product_id, type, piece) and ktgcari_000_ssayim(urun_id, piece).
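In DDL form, those indexes would look something like this (the index names are my own):
ALTER TABLE ktgcari_000_fatura_xref
    ADD INDEX idx_xref_product_type_piece (product_id, type, piece);
ALTER TABLE ktgcari_000_ssayim
    ADD INDEX idx_ssayim_urun_piece (urun_id, piece);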
I also note that you are using LIMIT without ORDER BY. SQL result sets are unordered unless you specify an explicit ORDER BY, so which 1000 rows you get back is not guaranteed.
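If the paging is meant to be deterministic, add an explicit ORDER BY before the LIMIT, for example (stock expression omitted for brevity):
SELECT s.products_id AS ID, s.prod_name AS `Product Name`
FROM ktgcari_000_stok s
ORDER BY s.products_id
LIMIT 0, 1000;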

I edited the SQL query as follows and the query time dropped to 8 seconds. How can I reduce the duration further?
SELECT products_id as ID, prod_name,
(SELECT IF(type=1 or type=4, sum(urun_adet), 0) -
        IF(type=2 or type=5, sum(urun_adet), 0)
 FROM ktgcari_000_fatura_xref
 WHERE product_id = ktgcari_000_stok.products_id) +
IFNULL((SELECT sum(miktar)
 FROM ktgcari_000_ssayim
 WHERE urun_id = ktgcari_000_stok.products_id), 0) as 'stock'
FROM ktgcari_000_stok
LIMIT 0,1000

Related

Can someone optimize this SQL query?

I am currently working on a project that has two very large SQL tables, Users and UserDocuments, with around 1 million and 2-3 million records respectively. I have a query that returns the count of all documents each individual user has uploaded, provided the document is not rejected.
A user can have multiple documents against his/her id.
My current query:
SELECT
u.user_id,
u.name,
u.date_registered,
u.phone_no,
t1.docs_count,
t1.last_uploaded_on
FROM
Users u
JOIN(
SELECT user_id,
MAX(updated_at) AS last_uploaded_on,
SUM(CASE WHEN STATUS != 2 THEN 1 ELSE 0 END) AS docs_count
FROM
UserDocuments
WHERE
user_id IN(
SELECT
user_id
FROM
Users
WHERE
region_id = 1 AND city_id = 8 AND user_type = 1 AND user_suspended = 0 AND is_enabled = 1 AND verification_status = -1
) AND document_id IN('1', '2', '3', '4', '10', '11')
GROUP BY
user_id
ORDER BY
user_id ASC
) t1
ON
u.user_id = t1.user_id
WHERE
docs_count < 6 AND region_id = 1 AND city_id = 8 AND user_type = 1 AND user_suspended = 0 AND is_enabled = 1 AND verification_status = -1
LIMIT 1000, 100
Currently the query is taking very long, around 20 seconds, to return data even with indexes. Can someone suggest some tweaks to the following query to gain some more performance out of it?
SELECT
u.user_id,
max( u.name ) name,
max( u.date_registered ) date_registered,
max( phone_no ) phone_no,
MAX(d.updated_at) last_uploaded_on,
SUM(CASE WHEN d.STATUS != 2
THEN 1 ELSE 0 END) docs_count
FROM
Users u
JOIN UserDocuments d
ON u.user_id = d.user_id
AND d.document_id IN ('1', '2', '3', '4', '10', '11')
WHERE
u.region_id = 1
AND u.city_id = 8
AND u.user_type = 1
AND u.user_suspended = 0
AND u.is_enabled = 1
AND u.verification_status = -1
GROUP BY
u.user_id
HAVING
SUM(CASE WHEN d.STATUS != 2
THEN 1 ELSE 0 END) < 6
ORDER BY
u.user_id ASC
LIMIT
1000, 100
Have indexes on your tables such as:
user ( region_id, city_id, user_type, user_suspended, is_enabled, verification_status )
UserDocuments ( user_id, document_id, status, updated_at )
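In DDL form (again, the index names are my own):
ALTER TABLE Users
    ADD INDEX idx_users_filter (region_id, city_id, user_type,
        user_suspended, is_enabled, verification_status);
ALTER TABLE UserDocuments
    ADD INDEX idx_docs_covering (user_id, document_id, status, updated_at);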
You were querying the Users table in both the inner and outer parts of the join, which might be what is killing it. Having an index on your critical WHERE components for users pre-filters that set; only then does it join to the UserDocuments table, and the outer query gets the counts once at the top level.
Since the user's name, registration date, and phone don't change per user, applying MAX() to each avoids having to add those columns to the GROUP BY clause.
The index on the documents table includes only the columns needed to check document_id and status and when the document was last updated. This makes it a covering index: the engine can get the qualifying details directly from the index without touching the raw data pages, saving you time too.
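To verify the covering behavior you can run EXPLAIN on the aggregation and look for "Using index" in the Extra column; a sketch against the index above:
EXPLAIN
SELECT user_id,
       MAX(updated_at) AS last_uploaded_on,
       SUM(CASE WHEN status != 2 THEN 1 ELSE 0 END) AS docs_count
FROM UserDocuments
WHERE document_id IN ('1', '2', '3', '4', '10', '11')
GROUP BY user_id;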
LIMIT without ORDER BY does not make sense.
An ORDER BY in a 'derived table' is ignored.
Will you really have thousands of result rows? (I see the "offset of 1000".)
Use JOIN instead of IN ( SELECT ... )
What indexes do you have? I suggest INDEX(region_id, city_id, user_id)
CASE WHEN d.STATUS != 2 THEN 1 ELSE 0 END can be shortened to d.status != 2.
How many different values of status are there? If only two, then flip the test to d.status = 1.
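To illustrate that shortening: MySQL evaluates a boolean expression as 1 or 0 in a numeric context, so the two aggregates below are equivalent (a sketch against UserDocuments):
SELECT user_id,
       SUM(CASE WHEN status != 2 THEN 1 ELSE 0 END) AS docs_count_case,
       SUM(status != 2) AS docs_count_short
FROM UserDocuments
GROUP BY user_id;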

Get total cost for a customer call

I have 2 tables: customer and their call history.
Now I want to charge them based on call duration, for a specific month, say Jan 2015.
Here are the criteria for calculating the cost of calls:
A) For incoming calls, the charge is 1 unit per second. For example, if the duration is 250 seconds then the cost is 250.
B) For outgoing calls, the cost is fixed at 500 units for the first 2 minutes; each subsequent second costs 2 units.
For example, if the outgoing duration is 5 minutes then the cost is 500 units + 2*3*60 units = 860 units.
Below are the tables:
customer table with columns id, name, phone
history table with columns id, incoming_phone, outgoing_phone, duration, dialed_on (YYYY-MM-DD)
I have come up with below queries for my conditions:
For incoming call cost:
select c.name, c.phone, h.duration as cost
from customer c join history h on c.phone = h.incoming_phone
When I run the above query I did not get any syntax errors.
For outgoing call cost:
select c.name, c.phone, CASE
WHEN h.duration > 120 THEN 500 + 2*(h.duration-120)
ELSE 2*(h.duration-120)
END; as cost
from customer c join history h on c.phone = h.outgoing_phone
When I run the above query I got syntax error like "ERROR 1109 (42S02) at line 1: Unknown table 'c' in field list"
I want to join these two queries and get the total cost and display the fields as name, phone, cost
I still need to add a condition to restrict the data to a specific month, Jan 2015, but I got stuck on the approach.
The error is due to the extra semicolon ; after END.
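For reference, here is the outgoing-cost query with the semicolon removed. I have also changed the ELSE branch to the flat 500 units, on the assumption (from your stated pricing) that calls up to two minutes cost the fixed amount; the original ELSE 2*(h.duration-120) would have gone negative:
SELECT c.name, c.phone,
       CASE
           WHEN h.duration > 120 THEN 500 + 2*(h.duration - 120)
           ELSE 500
       END AS cost
FROM customer c
JOIN history h ON c.phone = h.outgoing_phone;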
Sounds like your final query will be this:
SELECT c.name,
c.phone,
SUM(CASE WHEN h.direction = 'in' THEN h.duration END) as IncomingCost,
SUM(CASE WHEN h.direction = 'out' AND h.duration > 120 THEN 500 + 2*(h.duration-120)
WHEN h.direction = 'out' AND h.duration <= 120 THEN 500
END) as OutgoingCost,
SUM(CASE WHEN h.direction = 'in' THEN h.duration ELSE 0 END +
CASE WHEN h.direction = 'out' AND h.duration > 120 THEN 500 + 2*(h.duration-120)
WHEN h.direction = 'out' AND h.duration <= 120 THEN 500
ELSE 0
END) as TotalCost
FROM customer c
JOIN (SELECT 'out' as direction, duration, dialed_on, outgoing_phone as phone
FROM history
WHERE YEAR(dialed_on) = 2015
AND MONTH(dialed_on) = 1
UNION ALL
SELECT 'in' as direction, duration, dialed_on, incoming_phone as phone
FROM history
WHERE YEAR(dialed_on) = 2015
AND MONTH(dialed_on) = 1
) h ON c.phone = h.phone
GROUP BY c.name,
c.phone
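One optional refinement, not part of the answer above: YEAR()/MONTH() applied to dialed_on keeps MySQL from using an index on that column, while a plain date range is sargable. For the 'in' branch that would be:
SELECT 'in' as direction, duration, dialed_on, incoming_phone as phone
FROM history
WHERE dialed_on >= '2015-01-01'
  AND dialed_on < '2015-02-01'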

Combining two MySQL queries returns OK instead of rows

I have a query that returns some information regarding an invoice. I take that invoice and compare it to another table, "payment", to see if that invoice's (fullamount - fullpaid) exists in the other table; if it does, a certain function should not run in my backend code.
SELECT a.status, a.rf_reference, a.payor_orig_id , a.gross_amount + a.type_related_amount as fullamount,
a.gross_paid + a.type_related_paid as fullpaid
FROM invoice a
where a.type = 3 and
a.status in (10, 30) and
a.UPDATE_DT is null
having fullamount > fullpaid
order by a.ORIG_ID;
The above query returns
status | rf_reference | payor_orig_id | fullamount | fullpaid
30     | RF123456     | 212           | 1000       | 200
So now I take the above information and pass it on to another query to see if a matching row exists.
I pass it on like this:
select *
from payment
where
payor_orig_id = 212 and
rf_reference = 'RF123456' and
payment_amount = (1000-200) and
status = 10 and
INSERT_DT BETWEEN DATE_SUB(NOW(), INTERVAL 185 DAY) AND NOW() and
UPDATE_DT IS NULL;
So now the above query returns a row, and when it does, I do not run my backend function.
Since these are two separate queries, I would like to combine them into one, adding a HAVING clause to check that ONLY rows are returned where there is no match between the invoice and payment tables.
SELECT a.status, a.rf_reference, a.payor_orig_id , a.gross_amount + a.type_related_amount as fullamount,
a.gross_paid + a.type_related_paid as fullpaid,
(select b.payment_amount
from payment b
where
b.payor_orig_id = a.payor_orig_id and
b.rf_reference = a.rf_reference and
b.status = 10 and
b.INSERT_DT BETWEEN DATE_SUB(NOW(), INTERVAL 185 DAY) AND NOW() and
b.UPDATE_DT IS NULL) as payment_amount
FROM invoice a
where a.type = 3 and
a.status in (10, 30) and
a.UPDATE_DT is null
having fullamount > fullpaid and
(fullamount - fullpaid ) <> payment_amount
order by a.ORIG_ID;
The above query returns "OK" instead of rows, which is odd, and I am not sure how to debug it.
Try checking whether a matching row exists in the other table using NOT EXISTS:
SELECT a.* ,
a.gross_amount + a.type_related_amount as fullamount,
a.gross_paid + a.type_related_paid as fullpaid
FROM invoice a
where a.type = 3 and
a.status in (10, 30) and
a.UPDATE_DT is null and
NOT EXISTS ( select *
from payment
where
payor_orig_id = a.payor_orig_id and
rf_reference = a.rf_reference and
payment_amount = ((a.gross_amount + a.type_related_amount) - (a.gross_paid + a.type_related_paid)) and
status = 10 and
INSERT_DT BETWEEN DATE_SUB(NOW(), INTERVAL 185 DAY) AND NOW() and
UPDATE_DT IS NULL )
having fullamount > fullpaid
order by a.ORIG_ID;

SQL - Using sum but optionally using default value for row

Given tables
asset
col - id
date_sequence
col - date
daily_history
col - date
col - num_error_seconds
col - asset_id
historical_event
col - start_date
col - end_date
col - asset_id
I'm trying to add up all the daily num_error_seconds for all assets in a given time range, in order to display "percentage NOT in error" by day. The catch is: if there is a historical_event involving an asset whose end_date is beyond the SQL query range, then daily_history should be ignored and a default value of 86400 seconds (one full day of error seconds) should be used for that asset.
The query I have that does not use the historical_event is:
select ds.date,
IF(count(dh.date) = 0,
100,
100 - (100*sum(dh.num_error_seconds) / (86400 * count(*)))
) percent
from date_sequence ds
cross join asset a
left join daily_history dh on dh.date = ds.date and dh.asset_id = a.id
where ds.date >= in_start_time and ds.date <= in_end_time
group by ds.date;
Building on this is beyond my SQL knowledge. Because of the aggregate function, I cannot simply inject 86400 seconds for each asset that is associated with an event whose end_date is beyond in_end_time.
Sample Data
Asset
1
2
Date Sequence
2013-09-01
2013-09-02
2013-09-03
2013-09-04
Daily History
date, num_error_seconds, asset_id
2013-09-01, 1400, 1
2013-09-02, 1501, 1
2013-09-03, 1420, 1
2013-09-04, 0, 1
2013-09-01, 10000, 2
2013-09-02, 20000, 2
2013-09-03, 30000, 2
2013-09-04, 40000, 2
Historical Event
start_date, end_date, asset_id
2013-09-03 12:01:03, 2014-01-01 00:00:00, 1
What I would expect to see with this sample data is a % of time these assets are in error
2013-09-01 => 100 - (100*(1400 + 10000))/(86400*2)
2013-09-02 => 100 - (100*(1501 + 20000))/(86400*2)
2013-09-03 => 100 - (100*(1420 + 30000))/(86400*2)
2013-09-04 => 100 - (100*(0 + 40000))/(86400*2)
Except: there was a historical event, which should take precedence. It happened on 9/3 and is open-ended (it has an end date in the future), so the calculations would change to:
2013-09-01 => 100 - (100*(1400 + 10000))/(86400*2)
2013-09-02 => 100 - (100*(1501 + 20000))/(86400*2)
2013-09-03 => 100 - (100*(86400 + 30000))/(86400*2)
2013-09-04 => 100 - (100*(86400 + 40000))/(86400*2)
Asset 1's num_error_seconds gets overwritten with a full day of error seconds, because there is a historical event with a start_date before in_end_time and an end_date after in_end_time.
Can this be accomplished in one query? Or do I need to stage data with an initial query?
I think you're after something like this:
Select
ds.date,
100 - 100 * Sum(
case
when he.asset_id is not null then 86400 -- have a historical_event
when dh.num_error_seconds is null then 0 -- no daily_history record
else dh.num_error_seconds
end
) / 86400 / count(a.id) as percent -- need to divide by number of assets
From
date_sequence ds
cross join
asset a
left outer join
daily_history dh
on a.id = dh.asset_id and
ds.date = dh.date
left outer join (
select distinct -- avoid counting multiple he records
asset_id
from
historical_event he
Where
he.end_date > in_end_time
) he
on a.id = he.asset_id
Where
ds.date >= in_start_time and
ds.date <= in_end_time -- I'd prefer < here
Group By
ds.date
Example Fiddle

Running WHILE or CURSOR or both in SQL Server 2008

I am trying to run a loop of some sort in SQL Server 2008 / T-SQL, and I am unsure whether this should be a WHILE loop, a CURSOR, or both. The end result is that I am trying to loop through a list of user logins, determine the unique users, and then run a loop to determine how many visits it took for each user to be on the site for 5 minutes, broken out by channel.
Table: LoginHistory
UserID  Channel  DateTime         DurationInSeconds
1       Website  1/1/2013 1:13PM  170
2       Mobile   1/1/2013 2:10PM  60
3       Website  1/1/2013 3:10PM  180
4       Website  1/1/2013 3:20PM  280
5       Website  1/1/2013 5:00PM  60
1       Website  1/1/2013 5:05PM  500
3       Website  1/1/2013 5:45PM  120
1       Mobile   1/1/2013 6:00PM  30
2       Mobile   1/1/2013 6:10PM  90
5       Mobile   1/1/2013 7:30PM  400
3       Website  1/1/2013 8:00PM  30
1       Mobile   1/1/2013 9:30PM  200
SQL Fiddle to this schema
I can select the unique users into a new table like so:
SELECT UserID
INTO #Users
FROM LoginHistory
GROUP BY UserID
Now, the functionality I'm trying to develop is to loop over these unique UserIDs, order the logins by DateTime, then count the number of logins needed to get to 300 seconds.
The result set I would hope to get to would look something like this:
UserID  TotalLogins  WebsiteLogins  MobileLogins  LoginsNeededTo5Min
1       4            2              2             2
2       2            2              0             0
3       3            3              0             3
4       1            1              0             0
5       2            1              1             2
If I were performing this in another language, I think it would be something like this (and apologies, because this is not complete, just where I think I am going):
for (i in #Users):
TotalLogins = Count(*),
WebsiteLogins = Count(*) WHERE Channel = 'Website',
MobileLogins = Count(*) WHERE Channel = 'Mobile',
for (i in LoginHistory):
if Duration < 300:
count(NumLogins) + 1
** Ok - I'm laughing at myself the way I combined multiple different languages/syntaxes, but this is how I am thinking about solving this **
Thoughts on a good way to accomplish this? My preference is to use a loop so I can continue to write if/then logic into the code.
Ok, this is one of those times where a CURSOR would probably outperform a set-based solution. Sadly, I'm not very good with cursors, so I can give you a set-based solution for you to try:
;WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY [DateTime]) RN
FROM LoginHistory
), CTE2 AS
(
SELECT *, 1 RecursionLevel
FROM CTE
WHERE RN = 1
UNION ALL
SELECT B.UserID, B.Channel, B.[DateTime],
A.DurationInSeconds+B.DurationInSeconds,
B.RN, RecursionLevel+1
FROM CTE2 A
INNER JOIN CTE B
ON A.UserID = B.UserID AND A.RN = B.RN - 1
)
SELECT A.UserID,
COUNT(*) TotalLogins,
SUM(CASE WHEN Channel = 'Website' THEN 1 ELSE 0 END) WebsiteLogins,
SUM(CASE WHEN Channel = 'Mobile' THEN 1 ELSE 0 END) MobileLogins,
ISNULL(MIN(RecursionLevel),0) LoginsNeededTo5Min
FROM LoginHistory A
LEFT JOIN ( SELECT UserID, MIN(RecursionLevel) RecursionLevel
FROM CTE2
WHERE DurationInSeconds > 300
GROUP BY UserID) B
ON A.UserID = B.UserID
GROUP BY A.UserID
A slightly different piece-wise approach. A minor difference is that the recursive portion terminates when it reaches 300 seconds for each user rather than summing all of the available logins.
An index on UserId/StartTime should improve performance on larger datasets.
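Assuming the production table is the question's LoginHistory (the demo below uses a table variable instead; its StartTime column corresponds to DateTime in the question), that index might be created as:
CREATE INDEX IX_LoginHistory_User_Time
    ON LoginHistory ( UserID, [DateTime] );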
declare @Logins as Table ( UserId Int, Channel VarChar(10), StartTime DateTime, DurationInSeconds Int )
insert into @Logins ( UserId, Channel, StartTime, DurationInSeconds ) values
( 1, 'Website', '1/1/2013 1:13PM', 170 ),
( 2, 'Mobile', '1/1/2013 2:10PM', 60 ),
( 3, 'Website', '1/1/2013 3:10PM', 180 ),
( 4, 'Website', '1/1/2013 3:20PM', 280 ),
( 5, 'Website', '1/1/2013 5:00PM', 60 ),
( 1, 'Website', '1/1/2013 5:05PM', 500 ),
( 3, 'Website', '1/1/2013 5:45PM', 120 ),
( 1, 'Mobile', '1/1/2013 6:00PM', 30 ),
( 2, 'Mobile', '1/1/2013 6:10PM', 90 ),
( 5, 'Mobile', '1/1/2013 7:30PM', 400 ),
( 3, 'Website', '1/1/2013 8:00PM', 30 ),
( 1, 'Mobile', '1/1/2013 9:30PM', 200 )
select * from @Logins
; with MostRecentLogins as (
-- Logins with flags for channel and sequenced by StartTime (ascending) for each UserId .
select UserId, Channel, StartTime, DurationInSeconds,
case when Channel = 'Website' then 1 else 0 end as WebsiteLogin,
case when Channel = 'Mobile' then 1 else 0 end as MobileLogin,
Row_Number() over ( partition by UserId order by StartTime ) as Seq
from @Logins ),
CumulativeDuration as (
-- Start with the first login for each UserId .
select UserId, Seq, DurationInSeconds as CumulativeDurationInSeconds
from MostRecentLogins
where Seq = 1
union all
-- Accumulate additional logins for each UserId until the running total exceeds 300 or they run out of logins.
select CD.UserId, MRL.Seq, CD.CumulativeDurationInSeconds + MRL.DurationInSeconds
from CumulativeDuration as CD inner join
MostRecentLogins as MRL on MRL.UserId = CD.UserId and MRL.Seq = CD.Seq + 1 and CD.CumulativeDurationInSeconds < 300 )
-- Display the summary.
select UserId, Sum( WebsiteLogin + MobileLogin ) as TotalLogins,
Sum( WebsiteLogin ) as WebsiteLogins, Sum( MobileLogin ) as MobileLogins,
( select Max( Seq ) from CumulativeDuration where UserId = LT3.UserId and CumulativeDurationInSeconds >= 300 ) as LoginsNeededTo5Min
from MostRecentLogins as LT3
group by UserId
order by UserId
Note that your sample results seem to have an error. UserId 3 reaches 300 seconds in two calls: 180 + 120. Your example shows three calls.