Reducing number of calls to Database - mysql

I am using the following approach to make db calls,
for record in records:
num = "'"+str(record['Number'])+"'"
id = "'"+str(record['Id'])+"'"
query = """select col2_text,col3_text from table where id= {} and num = {} and is_active = 'Y';""".format(id,num)
Since it is iteration where total number of DB calls is equal to the number of records. I want to optimize my call and make minimum number of DB calls, ideally in a single call.

You can reduce the number of DB calls to a single one. You might want to have a look at the SQL-IN operator.
You could do the following:
values = ""
for record in records:
num = "'"+str(record['Number'])+"'"
id = "'"+str(record['Id'])+"'"
values += "({},{}),".format(num, id)
values = values[:-1]
query = """select col2_text,col3_text from table where (id, num) in ({}) and is_active = 'Y';""".format(values)

Related

generate queries for each key in pyspark data frame

I have a data frame in pyspark like below
df = spark.createDataFrame(
[
('2021-10-01','A',25),
('2021-10-02','B',24),
('2021-10-03','C',20),
('2021-10-04','D',21),
('2021-10-05','E',20),
('2021-10-06','F',22),
('2021-10-07','G',23),
('2021-10-08','H',24)],("RUN_DATE", "NAME", "VALUE"))
Now using this data frame I want to update a table in MySql
# query to run should be similar to this
update_query = "UPDATE DB.TABLE SET DATE = '2021-10-01', VALUE = 25 WHERE NAME = 'A'"
# mysql_conn is a function which I use to connect to `MySql` from `pyspark` and run queries
# Invoking the function
mysql_conn(host, user_name, password, update_query)
Now when I invoke the mysql_conn function by passing parameters the query runs successfully and the record gets updated in the MySql table.
Now I want to run the update statement for all the records in the data frame.
For each NAME it has to pick the RUN_DATE and VALUE and replace in update_query and trigger the mysql_conn.
I think we need to a for loop but not sure how to proceed.
Instead of iterating through the dataframe with a for loop, it would be better to distribute the workload across each partitions using foreachPartition. Moreover, since you are writing a custom query instead of executing one query for each query, it would be more efficient to execute a batch operation to reduce the round trips, latency and concurrent connections. Eg
def update_db(rows):
temp_table_query=""
for row in rows:
if len(temp_table_query) > 0:
temp_table_query = temp_table_query + " UNION ALL "
temp_table_query = temp_table_query + " SELECT '%s' as RUNDATE, '%s' as NAME, %d as VALUE " % (row.RUN_DATE,row.NAME,row.VALUE)
update_query="""
UPDATE DBTABLE
INNER JOIN (
%s
) new_records ON DBTABLE.NAME = new_records.NAME
SET
DBTABLE.DATE = new_records.RUNDATE,
DBTABLE.VALUE = new_records.VALUE
""" % (temp_table_query)
mysql_conn(host, user_name, password, update_query)
df.foreachPartition(update_db)
View Demo on how the UPDATE query works
Let me know if this works for you.

Iterating over records in one table and adding them to another with calculations

I'm currently using PHP and MySQL to retrieve a set of 100,000 records in a table, then iterate over each of those records to do some calculations and then insert the result into another table. I'm wondering if I'd be able to do this in pure SQL and make the query run faster.
Here's what I"m currently using:
$stmt= $pdo->query("
SELECT Well_Permit_Num
, Gas_Quantity
, Gas_Production_Days
FROM DEP_OG_Production_Import
ORDER
BY id ASC
");
foreach ($stmt as $row) {
$data = array('well_id' => $row['Well_Permit_Num'],
'gas_quantity' => $row['Gas_Quantity'],
'gas_days' => $row['Gas_Production_Days'],
'gas_average' => ($row['Gas_Production_Days']);
$updateTot = $pdo->prepare("INSERT INTO DEP_OG_TOTALS
(Well_Permit_Num,
Total_Gas,
Total_Gas_Days,
Total_Gas_Avg)
VALUES (:well_id,
:gas_quantity,
:gas_days,
:gas_average)
ON DUPLICATE KEY UPDATE
Total_Gas = Total_Gas + VALUES(Total_Gas),
Total_Gas_Days = Total_Gas_Days + VALUES(Total_Gas_Days),
Total_Gas_Avg =(Total_Gas + VALUES(Total_Gas)) / (Total_Gas_Days + VALUES(Total_Gas_Days))");
}
I'd like to see if this can be done in pure MySQL instead of having to use PHP just for the fact of using it to hold the variables.
My Result should be 1 record that is a running total for each Well. The source table may house 60-70 records for the same well, but over a few thousand different Wells.
It's a constant import process that has to be run, so it's not like there is a final table which you can just do SUM(Gas_Quantity)... etc.. on
As commented by Uueerdo, you seem to need an INSERT ... SELECT query. The role of such query is to INSERT insert the resultset returned by an inner SELECT. The inner select is an aggregate query that computes the total sum of gas and days for each well.
INSERT INTO DEP_OG_TOTALS (Well_Permit_Num, Total_Gas, Total_Gas_Days, Total_Gas_Avg)
SELECT
t.Well_Permit_Num,
SUM(t.Gas_Quantity) Total_Gas,
SUM(t.Gas_Production_Days) Total_Gas_Days
FROM DEP_OG_Production_Import t
GROUP BY t.Well_Permit_Num
ON DUPLICATE KEY UPDATE
Total_Gas = Total_Gas + t.Total_Gas,
Total_Gas_Days = Total_Gas_Days + t.Total_Gas_Days,
Total_Gas_Avg =(Total_Gas + t.Total_Gas) / (Total_Gas_Days + t.Total_Gas_Days)

MySQL user-defined function returns incorrect value when used in a SELECT statement

I met a problem when calling a user-defined function in MySQL. The computation is very simple but can't grasp where it went wrong and why it went wrong. Here's the thing.
So I created this function:
DELIMITER //
CREATE FUNCTION fn_computeLoanAmortization (_empId INT, _typeId INT)
RETURNS DECIMAL(17, 2)
BEGIN
SET #loanDeduction = 0.00;
SELECT TotalAmount, PeriodicDeduction, TotalInstallments, DeductionFlag
INTO #totalAmount, #periodicDeduction, #totalInstallments, #deductionFlag
FROM loans_table
WHERE TypeId = _typeId AND EmpId = _empId;
IF (#deductionFlag = 1) THEN
SET #remaining = #totalAmount - #totalInstallments;
IF(#remaining < #periodicDeduction) THEN
SET #loanDeduction = #remaining;
ELSE
SET #loanDeduction = #periodicDeduction;
END IF;
END IF;
RETURN #loanDeduction;
END;//
DELIMITER ;
If I call it like this, it works fine:
SELECT fn_computeLoanAmortization(3, 4)
But if I call it inside a SELECT statement, the result becomes erroneous:
SELECT Id, fn_computeLoanAmortization(Id, 4) AS Amort FROM emp_table
There's only one entry in the loans_table and the above statement should only result with one row having value in the Amort column but there are lots of random rows with the same Amort value as the one with the matching entry, which should not be the case.
Have anyone met this kind of weird dilemma? Or I might have done something wrong from my end. Kindly enlighten me.
Thank you very much.
EDIT:
By erroneous, I meant it like this:
loans_table has one record
EmpId = 1
TypeId = 2
PeriodicDeduction = 100
TotalAmount = 1000
TotalInstallments = 200
DeductionFlag = 1
emp_table has several rows
EmpId = 1
Name = Paolo
EmpId = 2
Name = Nikko
...
EmpId = 5
Name = Ariel
when I query the following statements, I get the correct value:
SELECT fn_computeLoanAmortization(1, 2)
SELECT Id, fn_computeLoanAmortization(Id, 2) AS Amort FROM emp_table WHERE EmpId = 1
But when I query this statement, I get incorrect values:
SELECT Id, fn_computeLoanAmortization(Id, 2) AS Amort FROM emp_table
Resultset would be:
EmpId | Amort
--------------------
1 | 100
2 | 100 (this should be 0, but the query returns 100)
3 | 100 (same error here)
...
5 | 100 (same error here up to the last record)
Inside your function, the variables you use to retrieve the values from the loans_table table are not local variables local to the function but session variables. When the select inside the function does not find any row, those variables still have the same values as from the previous execution of the function.
Use real local variables instead. In order to do that, use the variables names without # as a prefix and declare the variables at the beginning of the function. See this answer for more details.
I suspect the problem is that the variables in the INTO are not re-set when there is no matching row.
Just set them before the INTO:
BEGIN
SET #loanDeduction = 0.00;
SET #totalAmount = 0;
SET #periodicDeduction = 0;
SET #totalInstallments = 0;
SET #deductionFlag = 0;
SELECT TotalAmount, PeriodicDeduction, TotalInstallments, DeductionFlag
. . .
You might just want to set them to NULL.
Or, switch your logic to use local variables:
SET v_loanDeduction = 0.00;
SET v_totalAmount = 0;
SET v_periodicDeduction = 0;
SET v_totalInstallments = 0;
SET v_deductionFlag = 0;
And so on.

how to query several MYSQL tables in one statement

There are 30 tables(categories) all with the same structure storing news items with a siteID field to filter on a particular client.
The client select which tables(categories) they show by setting the field visible(tinyint) field to 1 or 0.
I have the following test MYSQL which works okay. I am using Applicationcraft.com so the syntax is different than standard MYSQL but you can see the query.
function _getAllData(cObj,p){
var result = [];
console.log('started');
selectObj=cObj.select().from('schoolNews').order('newsIDDESC').where('siteID=?',p.siteID);
result[0] = cObj.exec(selectObj);
selectObj=cObj.select().from('schoolDocs').order('newsIDASC').where('siteID=?',p.siteID);
result[1] = cObj.exec(selectObj);
return result;
}
So I have an array with the results of each table in result[0] & result[1].
So I created the following to :
function _getAllData(cObj,p){
var result = [];
console.log('started');
selectObj=cObj.select().from('schoolNews').order('newsIDDESC').where('siteID=?',p.siteID).where('visible=?',1);
result[0] = cObj.exec(selectObj);
selectObj=cObj.select().from('schoolDocs').order('newsIDASC').where('siteID=?',p.siteID).where('visible=?',1);
result[1] = cObj.exec(selectObj);
selectObj=Obj.select().from('schoolNews_copy').order('newsIDDESC').where('siteID=?',p.siteID).where('visible=?',1);
result[2] = cObj.exec(selectObj);
selectObj=cObj.select().from('schoolNews_copy').order('newsIDDESC').where('siteID=?',p.siteID).where('visible=?',1);
result[3] = cObj.exec(selectObj);
selectObj=cObj.select().from('schoolNews_copy').order('newsIDDESC').where('siteID=?',p.siteID;
result[4] = cObj.exec(selectObj).where('visible=?', 1);
upto result[30].
I have populated schoolNews_copy with 1000 records and run the query from my app.
I am getting a timed out error.
Is this because.
query the same table causes the problem.
This is the wrong approach all together.
If not what is the best approach.
Is there a way to query every table in a single statement and populate the results into an array named results.
So the result I need is an example array :
result[0] has data visible set to 1
result[1] has data visible set to 1
result[2] has data visible set to 0
I have now restructured the table as you said. And using joins can get all the info I need in one query.
SELECT * FROM categories INNER JOIN allNews on allNews.catID = categories.catID WHERE categories.visible = 1 AND categories.siteID = '+p.siteID;
MrWarby.

Executing an update statement with a select subquery clause in C

I have the following sql that I run in C:
snprintf(sql, 200, "update rec set name = (select name from pers where id = %d )
where id = %d",rec_id , emp_id );
mysql_query(conn, sql) returns a successful result but it's putting 1 in the "rec" table in the "name" field instead of the name, but when I printf the output and use it in MySQL it's working fine.
update rec set name = (select name from pers where id = 104 ) where id = 43
Is there something wrong with my sprintf? Or something has to be added?
I also tried static sql command like this
snprintf(sql,"update rec set name = (select name from pers where id = 104 ) where id = 43");
and it also put 1 in the rec.name
Is that due to count of record returned by the sub query? Can you verify by putting a condition which returns e.g. 2 records so that the name is set to 2? if this is the reason then (though less performing approach) try splitting the queries and see if it works this time.