ColdFusion query too slow - MySQL

I have queries inside a cfloop that make the process very slow. Is there a way to make this faster?
<cfquery name="GetCheckRegister" datasource="myDB">
SELECT * FROM CheckRegister, ExpenseType
Where PropertyID=10
and ExpenseType.ExpenseTypeID=CheckRegister.ExpenseTypeID
</cfquery>
<CFOUTPUT query=GetCheckRegister>
<cfquery name="GetVendorName" datasource="myDB"> SELECT * FROM Vendors WHERE VendorID=#VendorID#</cfquery>
<!--- I use the vendor name here --->
<cfset local.CreditDate = "" />
<cfquery name="getTenantTransactionDateFrom" dataSource="myDB">
Select TenantTransactionDate as fromDate From TenantTransactions
Where CheckRegisterID = #CheckRegisterID#
Order By TenantTransactionDate Limit 1
</cfquery>
<cfquery name="getTenantTransactionDateTo" dataSource="myDB">
Select TenantTransactionDate as ToDate From TenantTransactions
Where CheckRegisterID = #CheckRegisterID#
Order By TenantTransactionDate desc Limit 1
</cfquery>
<cfif getTenantTransactionDateFrom.fromDate neq "" AND getTenantTransactionDateTo.ToDate neq "">
<cfif getTenantTransactionDateFrom.fromDate eq getTenantTransactionDateTo.ToDate>
<cfset local.CreditDate = DateFormat(getTenantTransactionDateFrom.fromDate, 'mm/dd/yyyy') />
<cfelse>
<cfset local.CreditDate = DateFormat(getTenantTransactionDateFrom.fromDate, 'mm/dd/yyyy') & " - " & DateFormat(getTenantTransactionDateTo.ToDate, 'mm/dd/yyyy') />
</cfif>
</cfif>
<!--- I use the local.CreditDate here --->
<!--- Here goes a table with the data --->
</CFOUTPUT>
cfoutput works like a loop.

As others have said, you should get rid of the loop and use joins. Looking at your inner loop, the code retrieves the earliest and latest date for each CheckRegisterID. Instead of using LIMIT, use the aggregate functions MIN and MAX with GROUP BY CheckRegisterID. Then wrap that result in a derived table so you can join it back to CheckRegister on CheckRegisterID.
Some of the columns in the original query aren't scoped, so I took a few guesses. There's room for improvement, but something like this is enough to get you started.
-- select only needed columns
SELECT cr.CheckRegisterID, ... other columns
FROM CheckRegister cr
INNER JOIN ExpenseType ex ON ex.ExpenseTypeID=cr.ExpenseTypeID
INNER JOIN Vendors v ON v.VendorID = cr.VendorID
LEFT JOIN
(
SELECT CheckRegisterID
, MIN(TenantTransactionDate) AS MinDate
, MAX(TenantTransactionDate) AS MaxDate
FROM TenantTransactions
GROUP BY CheckRegisterID
) tt ON tt.CheckRegisterID = cr.CheckRegisterID
WHERE cr.PropertyID = 10
I'd highly recommend reading up on JOINs, as they're critical to any web application, IMO.

You should get all of your data in one query, then work with that data to output what you want. Multiple connections to a database will almost always be more resource-intensive than getting the data in one trip and working with it. To get your results:
SQL Fiddle
Initial Schema Setup:
CREATE TABLE CheckRegister ( checkRegisterID int, PropertyID int, VendorID int, ExpenseTypeID int ) ;
CREATE TABLE ExpenseType ( ExpenseTypeID int ) ;
CREATE TABLE Vendors ( VendorID int ) ;
CREATE TABLE TenantTransactions ( checkRegisterID int, TenantTransactionDate date, note varchar(20) );
INSERT INTO CheckRegister ( checkRegisterID, PropertyID, VendorID, ExpenseTypeID )
VALUES (1,10,1,1),(1,10,1,1),(1,10,2,1),(1,10,1,2),(1,5,1,1),(2,10,1,1),(2,5,1,1)
;
INSERT INTO ExpenseType ( ExpenseTypeID ) VALUES (1), (2) ;
INSERT INTO Vendors ( VendorID ) VALUES (1), (2) ;
INSERT INTO TenantTransactions ( checkRegisterID, TenantTransactionDate, note )
VALUES
(1,'2018-01-01','start')
, (1,'2018-01-02','another')
, (1,'2018-01-03','another')
, (1,'2018-01-04','stop')
, (2,'2017-01-01','start')
, (2,'2017-01-02','another')
, (2,'2017-01-03','another')
, (2,'2017-01-04','stop')
;
Main Query:
SELECT cr.*
, min(tt.TenantTransactionDate) AS startDate
, max(tt.TenantTransactionDate) AS endDate
FROM CheckRegister cr
INNER JOIN ExpenseType et ON cr.ExpenseTypeID = et.ExpenseTypeID
INNER JOIN Vendors v ON cr.vendorID = v.VendorID
LEFT OUTER JOIN TenantTransactions tt ON cr.checkRegisterID = tt.CheckRegisterID
WHERE cr.PropertyID = 10
GROUP BY cr.CheckRegisterID, cr.PropertyID, cr.VendorID, cr.ExpenseTypeID
Results:
| checkRegisterID | PropertyID | VendorID | ExpenseTypeID | startDate  | endDate    |
|-----------------|------------|----------|---------------|------------|------------|
| 1               | 10         | 1        | 1             | 2018-01-01 | 2018-01-04 |
| 1               | 10         | 1        | 2             | 2018-01-01 | 2018-01-04 |
| 1               | 10         | 2        | 1             | 2018-01-01 | 2018-01-04 |
| 2               | 10         | 1        | 1             | 2017-01-01 | 2017-01-04 |
I only added 2 check registers, but CheckRegisterID 1 has 2 vendors and 2 Expense Types for Vendor 1. This will look like repeated data in your query. If your data isn't set up that way, you won't have to worry about it in the final query.
Use proper JOIN syntax to get the related data you need. Then you can aggregate that data to get the fromDate and toDate. If your data is more complex, you may need to look at Window Functions. https://dev.mysql.com/doc/refman/8.0/en/window-functions.html
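For example, on MySQL 8.0+ a window function keeps every transaction row while still exposing the per-register first and last dates. A sketch only, using the question's table:
SELECT tt.CheckRegisterID
     , tt.TenantTransactionDate
     , MIN(tt.TenantTransactionDate) OVER (PARTITION BY tt.CheckRegisterID) AS fromDate
     , MAX(tt.TenantTransactionDate) OVER (PARTITION BY tt.CheckRegisterID) AS toDate
FROM TenantTransactions tt;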
I don't know what your final output looks like, but the above query gives you all of the query data in one pass. And each row of that data should give you what you need to output, so now you've only got one query to loop over.

It has been a long time since I did any ColdFusion development, but a common rule of thumb is: don't call queries within a loop. Depending on what you are doing, queries in loops can be considered an RBAR (row by agonizing row) operation.
You are essentially defining one query and looping over each record. For each record, you are doing three additional queries, i.e. three additional database network calls per record. The way I see it, you have a few options:
1. Rewrite your first query so each record already includes the data you need.
2. Leave your first query the way it is and add functionality that fetches more information asynchronously when the user interacts with a record. Something like a "Show Credit Date" link which goes out and gets the data on demand.
3. Combine the two getTenantTransaction... queries in your loop into one query (see the sketch after this list) and check if performance improves. This reduces the RBAR database calls from three to two.
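For option 3, the two LIMIT 1 queries collapse into a single aggregate statement; a sketch using the question's table and column names:
SELECT MIN(TenantTransactionDate) AS fromDate
     , MAX(TenantTransactionDate) AS ToDate
FROM TenantTransactions
WHERE CheckRegisterID = 123 -- bind the current CheckRegisterID here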

You always want to avoid having queries in a loop. Whenever you query the database, you incur a roundtrip (from server to database and back from database to server), which is slow by nature.
A general approach is to bulk data by querying all required information with as few statements as possible. Joining everything in a single statement would be ideal, but this obviously depends on your table schemes. If you cannot solve it using SQL only, you can transform your queries like this:
GetCheckRegister...
(no loop)
<cfquery name="GetVendorName" datasource="rent">
SELECT * FROM Vendors WHERE VendorID IN (#valueList(GetCheckRegister.VendorID)#)
</cfquery>
<cfquery name="getTenantTransactionDateFrom" dataSource="rent">
Select TenantTransactionDate as fromDate From TenantTransactions
Where CheckRegisterID IN (#valueList(GetCheckRegister.CheckRegisterID)#)
</cfquery>
etc.
valueList(query.column) returns a comma delimited list of the specified column values. This list is then used with MySQL's IN (list) selector to retrieve all records that belong to any of the listed values.
Now you would only have a single query for each statement in your loop (4 queries total, instead of 4 times the number of records in GetCheckRegister). But all records are clumped together, so you need to match them accordingly. To do this we can utilize ColdFusion's Query of Queries (QoQ), which allows you to query already retrieved data. Since the retrieved data is in memory, accessing it is quick.
GetCheckRegister, GetVendorName, getTenantTransactionDateFrom, getTenantTransactionDateTo etc.
<CFOUTPUT query="GetCheckRegister">
<!--- query of queries --->
<cfquery name="GetVendorNameSingle" dbType="query">
SELECT * FROM [GetVendorName] WHERE VendorID = #GetCheckRegister.VendorID#
</cfquery>
etc.
</CFOUTPUT>
You basically moved the real queries out of the loop and instead query the result of the real queries in your loop using QoQ.
Regardless of this, make sure your real queries are fast by profiling them in MySQL. Use indices!
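As a starting point, indexes matching the columns you filter and join on help most; index names here are illustrative:
CREATE INDEX idx_cr_property ON CheckRegister (PropertyID);
CREATE INDEX idx_tt_register_date ON TenantTransactions (CheckRegisterID, TenantTransactionDate);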

Using the main query and the loop to process the data could be faster if:
Using SELECT with only the specific fields you need (instead of SELECT *), to avoid fetching unneeded columns, unless you actually use all the fields:
SELECT VendorID, CheckRegisterId, ... FROM CheckRegister, ExpenseType ...
Using fewer subqueries in the loop by joining tables into the main query. For example, using the Vendors table in the main query (if it is possible to join this table):
SELECT VendorID, CheckRegisterId, VendorName ... FROM CheckRegister, ExpenseType, Vendors ...
Finally, you can estimate the time of the process and detect the performance problem:
ROWS = Number of rows of the result in the main query
TIME_V = Time (ms) to get the result of GetVendorName using a valid VendorId
TIME_TD1 = Time (ms) to get the result of getTenantTransactionDateFrom using a valid CheckRegisterID
TIME_TD2 = Time (ms) to get the result of getTenantTransactionDateTo using a valid CheckRegisterID
Then, you can calculate the resulting time as TOTAL = ROWS * (TIME_V + TIME_TD1 + TIME_TD2).
For example, if ROWS = 10000, TIME_V = 30, TIME_TD1 = 15, TIME_TD2 = 15: TOTAL = 10000 * (30 + 15 + 15) = 10000 * 60 = 600000 (ms) = 600 (sec) = 10 min
So, for 10000 rows, every millisecond spent inside the loop adds 10 seconds to the process.
When the main query returns many rows, you need to minimize the query time of each statement in the loop; every millisecond counts. So make sure the right indexes exist for each field each loop query filters on.
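You can verify that the loop queries hit an index with EXPLAIN; a quick sketch (substitute a real CheckRegisterID):
EXPLAIN SELECT TenantTransactionDate
FROM TenantTransactions
WHERE CheckRegisterID = 123
ORDER BY TenantTransactionDate
LIMIT 1;
-- type "ref" with a key on (CheckRegisterID, TenantTransactionDate) means an indexed lookup;
-- type "ALL" means a full table scan on every loop iteration.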

Related

How to sum daily resetting data using MySQL

I am attempting to plot data cumulatively from a MySQL table which logs a value, resetting to 0 every day. After selecting the values using select * from table where DateTime BETWEEN DateA AND DateB, the data looks like this: current data. I would like the output to look like this: preferred data, ignoring the daily resets.
As I am a novice in SQL I was unable to find a solution to this. I did, however, obtain the correct output in Matlab using a for loop:
output = data;
for k = 1:(size(data, 1) - 1)
    % check if next value is smaller than current (a daily reset)
    if data(k+1) < data(k)
        % add current value to all subsequent values
        output = output + (1:size(data, 1) > k)' .* data(k);
    end
end
I would like the final product to connect to a web page, so I am curious if it would be possible to obtain a similar result using only SQL. While I have tried using SUM(), I have only been able to sum all values, but I need to add the last value of each day to all subsequent values.
Using a CTE and comparing dates, you can sum the daily totals into a running total.
Let's say that table1 below is defined.
create table table1 (col_date date, col_value int);
insert into table1 values
('2020-07-15',1000),
('2020-07-15',2000),
('2020-07-16',1000),
('2020-07-16',3000),
('2020-07-16',4000),
('2020-07-17',1000),
('2020-07-18',2000),
('2020-07-19',1000),
('2020-07-19',1000),
('2020-07-19',2000),
('2020-07-19',3000),
('2020-07-20',4000),
('2020-07-20',5000),
('2020-07-21',6000)
;
In this case, the query looks like this:
with cte1 as (
select col_date, sum(col_value) as col_sum from table1
where col_date between '2020-07-16' and '2020-07-20'
group by col_date
)
select a.col_date, max(a.col_sum), sum(b.col_sum)
from cte1 a inner join cte1 b on a.col_date >= b.col_date
group by a.col_date;
The output is below:
col_date   | max(a.col_sum) | sum(b.col_sum)
2020-07-16 |           8000 |           8000
2020-07-17 |           1000 |           9000
2020-07-18 |           2000 |          11000
2020-07-19 |           7000 |          18000
2020-07-20 |           9000 |          27000
The column of max() is just for reference.
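On MySQL 8.0+, a window function produces the same running total without the self-join; a sketch against the same table1:
SELECT col_date
     , SUM(SUM(col_value)) OVER (ORDER BY col_date) AS running_total
FROM table1
WHERE col_date BETWEEN '2020-07-16' AND '2020-07-20'
GROUP BY col_date;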

MySQL: I want to compare the results of two queries and return the results

In MS Access I have the following query, and I want to duplicate it in MySQL:
SELECT New_Date_Sev54.*
FROM New_Date_Sev54 LEFT JOIN Old_Date_Sev54 ON New_Date_Sev54.[Expr1] = Old_Date_Sev54.[Expr1]
WHERE (((Old_Date_Sev54.Expr1) Is Null));
New Date Query:
SELECT perimeter.*, perimeter.IP, perimeter.QID, perimeter.Severity, [IP] & [QID] AS Expr1
FROM perimeter
WHERE (((perimeter.QID)<>38628 And (perimeter.QID)<>38675) AND ((perimeter.Severity)=5) AND ((perimeter.Date)=22118)) OR (((perimeter.Severity)=4));
Old Date Query:
SELECT perimeter.*, perimeter.IP, perimeter.QID, perimeter.Severity, [IP] & [QID] AS Expr1
FROM perimeter
WHERE (((perimeter.QID)<>38628 And (perimeter.QID)<>38675) AND ((perimeter.Severity)=5) AND ((perimeter.Date)=21918)) OR (((perimeter.Severity)=4));
In the ACCESS query, I basically take all the results with the new date and compare them against the results of the old date (week prior) and return anything that did not exist the week prior.
The database is used to quickly identify new vulnerabilities that exist in the perimeter, and is shaped like this:
Date | IP | VulnID | VulnName | Severity | Threat | Resolution
What I have been trying in MySQL is a "NOT IN" comparison of two select statements. However, it is not working.
I want to know all the new vulnerabilities that have a severity of 4 or 5 and that do not have the Vuln id of 32628
Thanks
Put each query into temp tables:
CREATE TEMPORARY TABLE newVulns AS ([new date query])
CREATE TEMPORARY TABLE oldVulns AS ([old date query])
where [new date query] and [old date query] are your select statements.
Then
SELECT * FROM newVulns n
LEFT JOIN oldVulns o
ON n.VulnID = o.VulnID
WHERE o.VulnID IS NULL
AND n.VulnID != 32628
AND n.Severity IN (4, 5)
I believe that should do it.
Temp table creation info can be found in the manual, and a neat visual representation of joins can be found here. I find myself looking at those all the time.
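If you'd rather avoid temp tables, the same anti-join can be expressed with NOT EXISTS; a sketch reusing the temp-table names from above:
SELECT n.*
FROM newVulns n
WHERE NOT EXISTS (SELECT 1 FROM oldVulns o WHERE o.VulnID = n.VulnID)
AND n.VulnID != 32628
AND n.Severity IN (4, 5);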

Query optimization for MySQL

I have the following query which takes about 28 seconds on my machine. I would like to optimize it and know if there is any way to make it faster by creating some indexes.
select rr1.person_id as person_id, rr1.t1_value, rr2.t0_value
from (select r1.person_id, avg(r1.avg_normalized_value1) as t1_value
from (select ma1.person_id, mn1.store_name, avg(mn1.normalized_value) as avg_normalized_value1
from matrix_report1 ma1, matrix_normalized_notes mn1
where ma1.final_value = 1
and (mn1.normalized_value != 0.2
and mn1.normalized_value != 0.0 )
and ma1.user_id = mn1.user_id
and ma1.request_id = mn1.request_id
and ma1.request_id = 4 group by ma1.person_id, mn1.store_name) r1
group by r1.person_id) rr1
,(select r2.person_id, avg(r2.avg_normalized_value) as t0_value
from (select ma.person_id, mn.store_name, avg(mn.normalized_value) as avg_normalized_value
from matrix_report1 ma, matrix_normalized_notes mn
where ma.final_value = 0 and (mn.normalized_value != 0.2 and mn.normalized_value != 0.0 )
and ma.user_id = mn.user_id
and ma.request_id = mn.request_id
and ma.request_id = 4
group by ma.person_id, mn.store_name) r2
group by r2.person_id) rr2
where rr1.person_id = rr2.person_id
Basically, it aggregates data depending on the request_id and final_value (0 or 1). Is there a way to simplify it for optimization? And it would be nice to know which columns should be indexed. I created an index on user_id and request_id, but it doesn't help much.
There are about 4907424 rows on matrix_report1 and 335740 rows on matrix_normalized_notes table. These tables will grow as we have more requests.
First, the others are right about formatting your samples better. Explaining in plain language what you are trying to do also helps, and sample data with expected results is even better.
However, that said, I think the query can be significantly simplified. Your two queries are almost completely identical except for the one field, final_value, being 1 or 0 respectively. Since each query results in one record per person_id, you can just base the average on a CASE/WHEN and remove the rest.
To help optimize the query, your matrix_report1 table should have an index on ( request_id, final_value, user_id ). Your matrix_normalized_notes table should have an index on ( request_id, user_id, store_name, normalized_value ).
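In DDL, those suggested indexes would look like this (index names are illustrative):
CREATE INDEX ix_report_req_final_user ON matrix_report1 (request_id, final_value, user_id);
CREATE INDEX ix_notes_req_user_store_val ON matrix_normalized_notes (request_id, user_id, store_name, normalized_value);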
Since your outer query averages the per-store averages, you do need to keep it nested. The following should help.
SELECT
r1.person_id,
avg(r1.ANV1) as t1_value,
avg(r1.ANV0) as t0_value
from
( select
ma1.person_id,
mn1.store_name,
avg( case when ma1.final_value = 1
then mn1.normalized_value end ) as ANV1,
avg( case when ma1.final_value = 0
then mn1.normalized_value end ) as ANV0
from
matrix_report1 ma1
JOIN matrix_normalized_notes mn1
ON ma1.request_id = mn1.request_id
AND ma1.user_id = mn1.user_id
AND NOT mn1.normalized_value in ( 0.0, 0.2 )
where
ma1.request_id = 4
AND ma1.final_Value in ( 0, 1 )
group by
ma1.person_id,
mn1.store_name) r1
group by
r1.person_id
Notice the inner query is pulling all transactions for the final value as either a zero OR one. But then, the AVG is based on a case/when of the respective value for the normalized value. When the condition is NOT the 1 or 0 respectively, the result is NULL and is thus not considered when the average is computed.
So at this point, it is grouped on a per-person basis already, with each store's Avg1 and Avg0 already set. Now, roll these values up directly per person regardless of the store. Again, NULL values are not considered as part of the average computation, so if Store "A" doesn't have a value in Avg1, it won't skew the results; similarly if Store "B" doesn't have a value in Avg0.
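To illustrate how AVG skips the NULLs produced by the CASE, a toy example (not from the original data):
SELECT AVG(CASE WHEN final_value = 1 THEN v END) AS avg1
FROM ( SELECT 1 AS final_value, 10 AS v
       UNION ALL SELECT 0, 99 ) t;
-- returns 10.0000: the second row yields NULL and is excluded from the average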

MySQL find minimum and maximum date associated with a record in another table

I am trying to write a query to find the number of miles on a bicycle fork. This number is calculated by taking the distance_reading associated with the date the fork was installed (the minimum reading_date on or after the Bicycle_Fork.start_date associated with the Bicycle_Fork record) and subtracting it from the distance_reading associated with the date the fork was removed (the maximum reading_date on or before the Bicycle_Fork.end_date or, if that is null, the reading closest to today's date). I've managed to restrict the range of OdometerReadings to the appropriate ones, but I cannot figure out how to find the minimum and maximum date for each odometer that represent when the fork was installed and removed. It was easy when I only had to look at records matching the start_date or end_date, but the user is not required to enter a new odometer reading for each date that a part is changed. I've been working on this query for several hours now, and I can't find a way to use MIN() that doesn't just take the single smallest date out of all of the results.
Question: How can I find the minimum reading_date and the maximum reading_date associated with each odometer_id while maintaining the restrictions created by my WHERE clause?
If this is not possible, I plan to store the values retrieved from the first query in an array in PHP and deal with it from there, but I would like to be able to find a solution solely in MySQL.
Here is an SQL fiddle with the database schema and the current state of the query: http://sqlfiddle.com/#!2/015642/1
SELECT OdometerReadings.distance_reading, OdometerReadings.reading_date,
OdometerReadings.odometer_id, Bicycle_Fork.fork_id
FROM Bicycle_Fork
INNER JOIN (Bicycles, Odometers, OdometerReadings)
ON (Bicycles.bicycle_id = Bicycle_Fork.bicycle_id
AND Odometers.bicycle_id = Bicycles.bicycle_id AND OdometerReadings.odometer_id = Odometers.odometer_id)
WHERE (OdometerReadings.reading_date >= Bicycle_Fork.start_date) AND
((Bicycle_Fork.end_date IS NOT NULL AND OdometerReadings.reading_date<= Bicycle_Fork.end_date) XOR (Bicycle_Fork.end_date IS NULL AND OdometerReadings.reading_date <= CURRENT_DATE()))
This is the old query that didn't take into account the possibility of the database lacking a record that corresponded with the start_date or end_date:
SELECT MaxReadingOdo.distance_reading, MinReadingOdo.distance_reading
FROM
(SELECT OdometerReadings.distance_reading, OdometerReadings.reading_date,
OdometerReadings.odometer_id
FROM Bicycle_Fork
LEFT JOIN (Bicycles, Odometers, OdometerReadings)
ON (Bicycles.bicycle_id = Bicycle_Fork.bicycle_id
AND Odometers.bicycle_id = Bicycles.bicycle_id AND OdometerReadings.odometer_id = Odometers.odometer_id)
WHERE Bicycle_Fork.start_date = OdometerReadings.reading_date) AS MinReadingOdo
INNER JOIN
(SELECT OdometerReadings.distance_reading, OdometerReadings.reading_date,
OdometerReadings.odometer_id
FROM Bicycle_Fork
LEFT JOIN (Bicycles, Odometers, OdometerReadings)
ON (Bicycles.bicycle_id = Bicycle_Fork.bicycle_id AND Odometers.bicycle_id
= Bicycles.bicycle_id AND OdometerReadings.odometer_id = Odometers.odometer_id)
WHERE Bicycle_Fork.end_date = OdometerReadings.reading_date) AS
MaxReadingOdo
ON MinReadingOdo.odometer_id = MaxReadingOdo.odometer_id
I'm trying to get the following to return from the SQL schema:
I will eventually sum these into one number, but I've been working with them separately to make it easier to check the values.
min_distance_reading | max_distance_reading | odometer_id
=========================================================
                75.5 |               2580.5 |           1
               510.5 |               4078.5 |           2
                17.5 |                 78.5 |           3
I don't understand the final part of the puzzle, but this seems close...
SELECT MIN(ro.distance_reading) min_val
, MAX(ro.distance_reading) max_val
, ro.odometer_id
FROM OdometerReadings ro
JOIN odometers o
ON o.odometer_id = ro.odometer_id
JOIN Bicycle_Fork bf
ON bf.bicycle_id = o.bicycle_id
AND bf.start_date <= ro.reading_date
GROUP BY ro.odometer_id;
http://sqlfiddle.com/#!2/015642/8
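The remaining piece is the upper bound. A sketch that also applies the question's end_date rule (readings on or before end_date, or up to today when end_date is NULL):
SELECT MIN(ro.distance_reading) AS min_distance_reading
     , MAX(ro.distance_reading) AS max_distance_reading
     , ro.odometer_id
FROM OdometerReadings ro
JOIN Odometers o ON o.odometer_id = ro.odometer_id
JOIN Bicycle_Fork bf ON bf.bicycle_id = o.bicycle_id
 AND ro.reading_date >= bf.start_date
 AND ro.reading_date <= COALESCE(bf.end_date, CURRENT_DATE())
GROUP BY ro.odometer_id;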

Combine 2 SELECTs into one SELECT in my Rails application

I have a table called ORDEREXECUTIONS that stores all orders that have been executed. It's a multi-currency application, hence the table has two columns, CURRENCY1_ID and CURRENCY2_ID.
To get a list of all orders for a specific currency pair (e.g. EUR/USD) I need two lines to get the totals:
v = Orderexecution.where("is_master=1 and currency1_id=? and currency2_id=? and created_at>=?",c1,c2,Time.now()-24.hours).sum("quantity").to_d
v+= Orderexecution.where("is_master=1 and currency1_id=? and currency2_id=? and created_at>=?",c2,c1,Time.now()-24.hours).sum("unitprice*quantity").to_d
Note that my SUM() formula is different depending on the sequence of the currencies.
e.g. if I want the total ordered quantities of the currency pair USD/EUR, it executes the following (assuming the currency ID for USD is 1 and for EUR is 2):
v = Orderexecution.where("is_master=1 and currency1_id=? and currency2_id=? and created_at>=?",1,2,Time.now()-24.hours).sum("quantity").to_d
v+= Orderexecution.where("is_master=1 and currency1_id=? and currency2_id=? and created_at>=?",2,1,Time.now()-24.hours).sum("unitprice*quantity").to_d
How do I write this in RoR so that it triggers only one single SQL statement to MySQL?
I guess this would do:
v = Orderexecution.where("is_master=1
and ( (currency1_id, currency2_id) = (?,?)
or (currency1_id, currency2_id) = (?,?)
)
and created_at>=?"
,c1, c2, c2, c1, Time.now()-24.hours
)
.sum("CASE WHEN currency1_id=?
THEN quantity
ELSE unitprice*quantity
END"
,c1
)
.to_d
So you could do
SELECT SUM(IF(currency1_id = 1 AND currency2_id = 2, quantity, 0)) AS quantity,
       SUM(IF(currency2_id = 1 AND currency1_id = 2, unitprice * quantity, 0)) AS unitprice_quantity
FROM orderexecutions
WHERE created_at > ? AND (currency1_id = 1 OR currency1_id = 2)
If you plug that into find_by_sql you should get one object back, with 2 attributes, quantity and unitprice_quantity (they won't show up in the output of inspect in the console but they should be there if you inspect the attributes hash or call the accessor methods directly)
But depending on your indexes, that might actually be slower because it might not be able to use them as efficiently. The seemingly redundant condition on currency1_id means this version can use an index on [currency1_id, created_at]. Do benchmark before and after - sometimes 2 fast queries are better than one slow one!
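For reference, the index that remark alludes to would be something like this (name illustrative; table name assumed from the Orderexecution model):
CREATE INDEX ix_orderexec_cur1_created ON orderexecutions (currency1_id, created_at);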