Why are there two "Query ROLLBACK" entries every time I query - SQLAlchemy

sql.log
2022-12-09T10:46:30.814252Z 877 Query SELECT flag.id AS flag_id, flag.code AS flag_code, flag.isused AS flag_isused, flag.user_id AS flag_user_id
FROM flag
WHERE flag.code = '000027'
LIMIT 1
2022-12-09T10:46:30.815296Z 877 Query ROLLBACK
2022-12-09T10:46:30.815512Z 877 Query ROLLBACK
code
def exchange_code():
    code = request.json.get('code')
    flag = db.session.query(Flag).filter_by(code=code).first()
    return flag.id
I'm using SQLAlchemy==1.4.44 and Flask-SQLAlchemy==2.5.1
I just want to figure out why there are two "Query ROLLBACK" statements every time I query.

Related

Reducing number of calls to Database

I am using the following approach to make db calls,
for record in records:
    num = "'" + str(record['Number']) + "'"
    id = "'" + str(record['Id']) + "'"
    query = """select col2_text,col3_text from table where id= {} and num = {} and is_active = 'Y';""".format(id, num)
Since this iterates over the records, the total number of DB calls equals the number of records. I want to optimize this and make the minimum number of DB calls, ideally a single one.
You can reduce the number of DB calls to a single one. You might want to have a look at the SQL IN operator.
You could do the following:
values = ""
for record in records:
num = "'"+str(record['Number'])+"'"
id = "'"+str(record['Id'])+"'"
values += "({},{}),".format(num, id)
values = values[:-1]
query = """select col2_text,col3_text from table where (id, num) in ({}) and is_active = 'Y';""".format(values)

generate queries for each key in pyspark data frame

I have a data frame in pyspark like below
df = spark.createDataFrame(
    [
        ('2021-10-01', 'A', 25),
        ('2021-10-02', 'B', 24),
        ('2021-10-03', 'C', 20),
        ('2021-10-04', 'D', 21),
        ('2021-10-05', 'E', 20),
        ('2021-10-06', 'F', 22),
        ('2021-10-07', 'G', 23),
        ('2021-10-08', 'H', 24),
    ],
    ("RUN_DATE", "NAME", "VALUE"))
Now, using this data frame, I want to update a table in MySQL.
# query to run should be similar to this
update_query = "UPDATE DB.TABLE SET DATE = '2021-10-01', VALUE = 25 WHERE NAME = 'A'"
# mysql_conn is a function which I use to connect to `MySql` from `pyspark` and run queries
# Invoking the function
mysql_conn(host, user_name, password, update_query)
Now when I invoke the mysql_conn function by passing parameters the query runs successfully and the record gets updated in the MySql table.
Now I want to run the update statement for all the records in the data frame.
For each NAME it has to pick the RUN_DATE and VALUE and replace in update_query and trigger the mysql_conn.
I think we need a for loop, but I am not sure how to proceed.
Instead of iterating through the dataframe with a for loop, it would be better to distribute the workload across the partitions using foreachPartition. Moreover, since you are building a custom query, it is more efficient to execute one batched statement per partition rather than one query per row, which reduces round trips, latency and concurrent connections. E.g.:
def update_db(rows):
    # Build a derived table of the partition's rows as a chain of UNION ALL selects
    temp_table_query = ""
    for row in rows:
        if len(temp_table_query) > 0:
            temp_table_query = temp_table_query + " UNION ALL "
        temp_table_query = temp_table_query + " SELECT '%s' as RUNDATE, '%s' as NAME, %d as VALUE " % (row.RUN_DATE, row.NAME, row.VALUE)
    if len(temp_table_query) == 0:
        # empty partition: nothing to update
        return
    update_query = """
        UPDATE DBTABLE
        INNER JOIN (
            %s
        ) new_records ON DBTABLE.NAME = new_records.NAME
        SET
            DBTABLE.DATE = new_records.RUNDATE,
            DBTABLE.VALUE = new_records.VALUE
    """ % (temp_table_query)
    mysql_conn(host, user_name, password, update_query)

df.foreachPartition(update_db)
Let me know if this works for you.

What is a MySQL buffered cursor w.r.t. Python MySQL Connector?

Can someone please give an example to understand this?
After executing a query, a MySQLCursorBuffered cursor fetches the entire result set from the server and buffers the rows.
For queries executed using a buffered cursor, row-fetching methods such as fetchone() return rows from the set of buffered rows. For nonbuffered cursors, rows are not fetched from the server until a row-fetching method is called. In this case, you must be sure to fetch all rows of the result set before executing any other statements on the same connection, or an InternalError (Unread result found) exception will be raised.
Thanks
I can think of two ways these two types of cursors are different.
The first way is that if you execute a query using a buffered cursor, you can get the number of rows returned by checking MySQLCursorBuffered.rowcount. However, the rowcount attribute of an unbuffered cursor returns -1 right after the execute method is called, which basically means that the entire result set has not yet been fetched from the server. Furthermore, the rowcount attribute of an unbuffered cursor increases as you fetch rows from it, while that of a buffered cursor stays fixed at the full row count.
The following code snippet tries to illustrate the points made above:
import mysql.connector

conn = mysql.connector.connect(database='db',
                               user='username',
                               password='pass',
                               host='localhost',
                               port=3306)
buffered_cursor = conn.cursor(buffered=True)
unbuffered_cursor = conn.cursor(buffered=False)

create_query = """
drop table if exists people;
create table if not exists people (
    personid int(10) unsigned auto_increment,
    firstname varchar(255),
    lastname varchar(255),
    primary key (personid)
);
insert into people (firstname, lastname)
values ('Jon', 'Bon Jovi'),
       ('David', 'Bryan'),
       ('Tico', 'Torres'),
       ('Phil', 'Xenidis'),
       ('Hugh', 'McDonald')
"""

# Create and populate a table; with multi=True the statements are executed
# as the returned iterator is consumed, so drain it before committing
for result in buffered_cursor.execute(create_query, multi=True):
    pass
conn.commit()
buffered_cursor.execute("select * from people")
print("Row count from a buffer cursor:", buffered_cursor.rowcount)
unbuffered_cursor.execute("select * from people")
print("Row count from an unbuffered cursor:", unbuffered_cursor.rowcount)

print()
print("Fetching rows from a buffered cursor: ")
while True:
    try:
        row = next(buffered_cursor)
        print("Row:", row)
        print("Row count:", buffered_cursor.rowcount)
    except StopIteration:
        break

print()
print("Fetching rows from an unbuffered cursor: ")
while True:
    try:
        row = next(unbuffered_cursor)
        print("Row:", row)
        print("Row count:", unbuffered_cursor.rowcount)
    except StopIteration:
        break
The above snippet should return something like the following:
Row count from a buffer cursor: 5
Row count from an unbuffered cursor: -1
Fetching rows from a buffered cursor:
Row: (1, 'Jon', 'Bon Jovi')
Row count: 5
Row: (2, 'David', 'Bryan')
Row count: 5
Row: (3, 'Tico', 'Torres')
Row count: 5
Row: (4, 'Phil', 'Xenidis')
Row count: 5
Row: (5, 'Hugh', 'McDonald')
Row count: 5
Fetching rows from an unbuffered cursor:
Row: (1, 'Jon', 'Bon Jovi')
Row count: 1
Row: (2, 'David', 'Bryan')
Row count: 2
Row: (3, 'Tico', 'Torres')
Row count: 3
Row: (4, 'Phil', 'Xenidis')
Row count: 4
Row: (5, 'Hugh', 'McDonald')
Row count: 5
As you can see, the rowcount attribute for the unbuffered cursor starts at -1 and increases as we loop through the result it generates. This is not the case with the buffered cursor.
The second way to tell the difference is by paying attention to which of the two (under the same connection) executes first. If you start by executing a query with an unbuffered cursor whose rows have not been fully fetched and then try to execute a query with the buffered cursor, an InternalError exception will be raised, asking you to consume or discard what is returned by the unbuffered cursor. Below is an illustration:
import mysql.connector

conn = mysql.connector.connect(database='db',
                               user='username',
                               password='pass',
                               host='localhost',
                               port=3306)
buffered_cursor = conn.cursor(buffered=True)
unbuffered_cursor = conn.cursor(buffered=False)

create_query = """
drop table if exists people;
create table if not exists people (
    personid int(10) unsigned auto_increment,
    firstname varchar(255),
    lastname varchar(255),
    primary key (personid)
);
insert into people (firstname, lastname)
values ('Jon', 'Bon Jovi'),
       ('David', 'Bryan'),
       ('Tico', 'Torres'),
       ('Phil', 'Xenidis'),
       ('Hugh', 'McDonald')
"""

# Create and populate a table (consume the multi=True iterator as above)
for result in buffered_cursor.execute(create_query, multi=True):
    pass
conn.commit()
unbuffered_cursor.execute("select * from people")
unbuffered_cursor.fetchone()
buffered_cursor.execute("select * from people")
The snippet above will raise an InternalError exception with a message indicating that there is an unread result. It is basically saying that the result returned by the unbuffered cursor needs to be fully consumed before you can execute another query with any cursor under the same connection. If you replace unbuffered_cursor.fetchone() with unbuffered_cursor.fetchall(), the error disappears.
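For illustration, here are the last three lines again with the unbuffered result fully consumed; this variant does not raise:

unbuffered_cursor.execute("select * from people")
rows = unbuffered_cursor.fetchall()              # drain the whole result set first
buffered_cursor.execute("select * from people")  # now succeeds on the same connection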
There are other, less obvious differences, such as memory consumption: a buffered cursor will likely consume more memory, since it fetches the entire result set from the server and buffers the rows.
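As a rough sketch of why that matters, with an unbuffered cursor you can stream a large result row by row instead of holding it all in memory. This assumes the same conn as above; big_table and process are hypothetical names used only for illustration:

# Hypothetical example: stream a large result with an unbuffered cursor
streaming_cursor = conn.cursor(buffered=False)
streaming_cursor.execute("select * from big_table")  # big_table is a placeholder name
for row in streaming_cursor:   # rows are fetched from the server as you iterate
    process(row)               # process() stands in for your own row handling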
I hope this proves useful.

Executing an update statement with a select subquery clause in C

I have the following sql that I run in C:
snprintf(sql, 200, "update rec set name = (select name from pers where id = %d ) "
                   "where id = %d", rec_id, emp_id);
mysql_query(conn, sql) returns a successful result, but it puts 1 in the "name" field of the "rec" table instead of the name. However, when I printf the query and run it in MySQL directly, it works fine.
update rec set name = (select name from pers where id = 104 ) where id = 43
Is there something wrong with my snprintf? Or does something have to be added?
I also tried a static SQL command like this:
snprintf(sql, 200, "update rec set name = (select name from pers where id = 104 ) where id = 43");
and it also put 1 in rec.name.
Is that due to the count of records returned by the subquery? Can you verify by using a condition that returns e.g. 2 records, so that the name is set to 2? If this is the reason, then (though it is a less performant approach) try splitting the queries and see if it works this time.

Hive calculates wrong sum for JSON object

I have an external table with one column, data, where the data is a JSON object.
When I run the following Hive query:
hive> select get_json_object(data, "$.ev") from data_table limit 3;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201212171824_0218, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201212171824_0218
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=master:8021 -kill job_201212171824_0218
2013-01-24 10:41:37,271 Stage-1 map = 0%, reduce = 0%
....
2013-01-24 10:41:55,549 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201212171824_0218
OK
2
2
2
Time taken: 21.449 seconds
But when I run the sum aggregation, the result is strange:
hive> select sum(get_json_object(data, "$.ev")) from data_table limit 3;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201212171824_0217, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201212171824_0217
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=master:8021 -kill job_201212171824_0217
2013-01-24 10:39:24,485 Stage-1 map = 0%, reduce = 0%
.....
2013-01-24 10:41:00,760 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201212171824_0217
OK
9.4031522E7
Time taken: 100.416 seconds
Could anyone explain to me why that is? And what should I do for it to work properly?
Hive seems to be treating the values in your JSON as floats instead of ints. It looks like your table is pretty big, so Hive is probably using the "exponent" notation for big float numbers: 9.4031522E7 probably means 94031522.
If you want to make sure you're doing a sum over int, you can cast the field of your JSON to int, and the sum will then return an int:
$ hive -e "select sum(get_json_object(dt, '$.ev')) from json_table"
8.806305E7
$ hive -e "select sum(cast(get_json_object(dt, '$.ev') as int)) from json_table"
88063050