Update and insert from QTableWidget into a MySQL database more efficiently

I'm building a desktop app with PyQt5 to connect to, load data from, insert data into, and update a MySQL database. What I came up with to update and insert into the database works, but I feel there should be a much faster way to do it in terms of computation speed. If anyone could help, that would be really helpful. What I have as of now for updating the database is this:
def log_change(self, item):
    self.changed_items.append([item.row(), item.column()])
    # I connect this function to the itemChanged signal to log any cells which have been changed

def update_db(self):
    # Creating an empty list to remove the duplicated cells from the initial list
    self.changed_items_load = []
    [self.changed_items_load.append(x) for x in self.changed_items if x not in self.changed_items_load]
    # Loop through the changed_items list and remove cells with no values in them
    for db_wa in self.changed_items_load:
        if self.tableWidget.item(db_wa[0], db_wa[1]).text() == "":
            self.changed_items_load.remove(db_wa)
    try:
        mycursor = mydb.cursor()
        # Loop through the list and update the database cell by cell
        for ecr in self.changed_items_load:
            command = "update table1 set `{col_name}` = %s where id=%s;"
            # Table widget column name matches db table column name
            data = (str(self.tableWidget.item(ecr[0], ecr[1]).text()),
                    int(self.tableWidget.item(ecr[0], 0).text()))
            # self.col_names is a list of the tableWidget columns
            mycursor.execute(command.format(col_name=self.col_names[ecr[1]]), data)
        mydb.commit()
        mycursor.close()
    except OperationalError:
        Msgbox = QMessageBox()
        Msgbox.setText("Error! Connection to database lost!")
        Msgbox.exec()
    except NameError:
        Msgbox = QMessageBox()
        Msgbox.setText("Error! Connect to database!")
        Msgbox.exec()
For inserting data and new rows into the db I was able to find some info online. But I have not been able to insert multiple rows at once, or to insert a varying number of columns for each row. Like if I want to insert only 2 columns at row 1, and then 3 columns at row 2... something like that.
def insert_db(self):
    # creating a list of each column
    self.a = [self.tableWidget.item(row, 1).text() for row in range(self.tableWidget.rowCount()) if self.tableWidget.item(row, 1) != None]
    self.b = [self.tableWidget.item(row, 2).text() for row in range(self.tableWidget.rowCount()) if self.tableWidget.item(row, 2) != None]
    self.c = [self.tableWidget.item(row, 3).text() for row in range(self.tableWidget.rowCount()) if self.tableWidget.item(row, 3) != None]
    self.d = [self.tableWidget.item(row, 4).text() for row in range(self.tableWidget.rowCount()) if self.tableWidget.item(row, 4) != None]
    try:
        mycursor = mydb.cursor()
        mycursor.execute("INSERT INTO table1(Name, Date, Quantity, Comments) VALUES ('%s', '%s', '%s', '%s')" % (''.join(self.a),
                                                                                                                 ''.join(self.b),
                                                                                                                 ''.join(self.c),
                                                                                                                 ''.join(self.d)))
        mydb.commit()
        mycursor.close()
    except OperationalError:
        Msgbox = QMessageBox()
        Msgbox.setText("Error! Connection to database lost!")
        Msgbox.exec()
    except NameError:
        Msgbox = QMessageBox()
        Msgbox.setText("Error! Connect to database!")
        Msgbox.exec()
Help would be appreciated. Thanks.

Like if I want to insert only 2 columns at row 1, and then 3 columns at row 2
No. A given database table has a specific number of columns. That is an integral part of the definition of a "table".
INSERT adds new rows to a table. It is possible to construct a single SQL statement that inserts multiple rows "all at once".
UPDATE modifies one or more rows of a table. The rows are indicated by some condition specified in the Update statement.
Constructing SQL with %s is risky -- it gets in trouble if there are quotes in the string being inserted.
(I hope these comments help you get to the next stage of understanding databases.)
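As a minimal sketch of both points, the rows collected from the table widget can be sent in one executemany call, with %s placeholders so the connector handles quoting and cells left empty simply arrive as NULL. It reuses mydb, tableWidget and the column layout from the question; the function name, the column range and the empty-row filter are assumptions:

def insert_db_batch(self):
    # Collect one tuple per widget row; missing cells become None (NULL in MySQL)
    rows = []
    for row in range(self.tableWidget.rowCount()):
        values = tuple(
            self.tableWidget.item(row, col).text() if self.tableWidget.item(row, col) else None
            for col in range(1, 5)  # Name, Date, Quantity, Comments
        )
        if any(v not in (None, "") for v in values):  # skip completely empty rows
            rows.append(values)
    mycursor = mydb.cursor()
    # One round trip inserts all rows; the driver escapes every value, quotes included
    mycursor.executemany(
        "INSERT INTO table1 (Name, Date, Quantity, Comments) VALUES (%s, %s, %s, %s)",
        rows,
    )
    mydb.commit()
    mycursor.close()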

Related

generate queries for each key in pyspark data frame

I have a data frame in pyspark like below
df = spark.createDataFrame(
    [
        ('2021-10-01', 'A', 25),
        ('2021-10-02', 'B', 24),
        ('2021-10-03', 'C', 20),
        ('2021-10-04', 'D', 21),
        ('2021-10-05', 'E', 20),
        ('2021-10-06', 'F', 22),
        ('2021-10-07', 'G', 23),
        ('2021-10-08', 'H', 24)
    ], ("RUN_DATE", "NAME", "VALUE"))
Now using this data frame I want to update a table in MySQL.
# query to run should be similar to this
update_query = "UPDATE DB.TABLE SET DATE = '2021-10-01', VALUE = 25 WHERE NAME = 'A'"
# mysql_conn is a function which I use to connect to `MySql` from `pyspark` and run queries
# Invoking the function
mysql_conn(host, user_name, password, update_query)
Now when I invoke the mysql_conn function by passing parameters, the query runs successfully and the record gets updated in the MySQL table.
Now I want to run the update statement for all the records in the data frame.
For each NAME it has to pick the RUN_DATE and VALUE, substitute them into update_query, and trigger mysql_conn.
I think we need a for loop, but I'm not sure how to proceed.
Instead of iterating through the dataframe with a for loop, it would be better to distribute the workload across the partitions using foreachPartition. Moreover, since you are writing a custom query, it is more efficient to execute one batch operation per partition rather than one query per record, reducing round trips, latency and concurrent connections. E.g.
def update_db(rows):
    temp_table_query = ""
    for row in rows:
        if len(temp_table_query) > 0:
            temp_table_query = temp_table_query + " UNION ALL "
        temp_table_query = temp_table_query + " SELECT '%s' as RUNDATE, '%s' as NAME, %d as VALUE " % (row.RUN_DATE, row.NAME, row.VALUE)
    update_query = """
        UPDATE DBTABLE
        INNER JOIN (
            %s
        ) new_records ON DBTABLE.NAME = new_records.NAME
        SET
            DBTABLE.DATE = new_records.RUNDATE,
            DBTABLE.VALUE = new_records.VALUE
    """ % (temp_table_query)
    mysql_conn(host, user_name, password, update_query)

df.foreachPartition(update_db)
Let me know if this works for you.
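If the values can contain quotes, the same per-partition batching can be done with bound parameters by opening the connection inside the partition function instead of building the SQL as a string for mysql_conn. A rough sketch, assuming pymysql is available and reusing the host, user_name and password variables from above (the database name and table layout are placeholders):

import pymysql

def update_db_params(rows):
    # One connection per partition; the driver binds each row's values, so quoting is handled for us
    conn = pymysql.connect(host=host, user=user_name, password=password, database="DB")
    with conn.cursor() as cursor:
        cursor.executemany(
            "UPDATE DBTABLE SET DATE = %s, VALUE = %s WHERE NAME = %s",
            [(row.RUN_DATE, row.VALUE, row.NAME) for row in rows],
        )
    conn.commit()
    conn.close()

df.foreachPartition(update_db_params)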

Insert multiple values in MySQL with Python from an array

I am trying to insert data from my array into MySQL.
To my big surprise, there were not many examples of how to do it when you build the array with a for loop; every example I found started from an already existing list.
Thanks to Adrian below, we noticed that I need tuples for my list.
Updated code
connection = mysql.connector.connect(
    host='localhost',
    database='test',
    user='root',
    password='pass'
)

query = "INSERT INTO blue (created, published, publisher) VALUES (%s, %s, %s)"
array = []

# The idea here is to get all table rows in the page so you can group the values into rows that are going to be added to MySQL
tr = soup.find_all('tr')
for table_row in tr:
    row_data = table_row.find_all('td')
    insert_row = []
    for data in row_data:
        data = re.sub('<[^>]*>', '', str(data))
        insert_row.append(data)
    array.append(tuple(insert_row))
print(array)

cursor = connection.cursor()
cursor.executemany(query, array)
cursor.commit()
Getting close, but at the moment I receive the following:
IndexError: Tuple index out of range
mysql.connector.errors.ProgrammingError: Not enough parameters for the SQL statement
Thanks in advance!
I think you are mixing two ways of solving the problem...
One way is using the executemany method as described in the documentation
query = "INSERT INTO blues (created, published, publisher) VALUES (%s, %s, %s)"
array = []
# The idea here is to get all table rows in the page so you
# can group the values into rows that are going to be added to MySQL
tr = soup.find_all('tr')
for table_row in tr:
row_data = table_row.find_all('td')
insert_row = [None, None, None]
for idx in range(len(row_data)):
if row_data[idx] and idx < 3:
data = re.sub('<[^>]*>', '', str(row_data[idx]))
if data:
insert_row[idx] = data
array.append(tuple(insert_row))
cursor = connection.cursor()
cursor.executemany(query, array)
cursor.commit()
Another way is to build the query yourself...
query = "INSERT INTO blues (created, published, publisher) VALUES "
array = []
# The idea here is to get all table rows in the page so you can group the values into rows that are going to be added to MySQL
tr = soup.find_all('tr')
for table_row in tr:
row_data = table_row.find_all('td')
insert_row = []
for data in row_data:
data = re.sub('<[^>]*>', '', str(data))
insert_row.append(data)
array.append(tuple(insert_row))
values = []
for item in array:
row = [None, None, None]
for idx in range(len(item)):
row[idx] = item[idx]
values.append(str(tuple(row)))
query += ",".join(values)
cursor = connection.cursor()
cursor.execute(query)
cursor.commit()
Hope this helps...

How to create an insert statement based on the input of different types of tables?

I have written a function that uses mysql connector to insert data into a table in MySQL.
def insert_row(data, table, conn):
    """Insert new row of data received."""
    cursor = conn.cursor()
    query = ("INSERT INTO " + table + " "
             "(temp, humidity)"
             " VALUES (%(temp)s, %(humidity)s)")
    print(query)
    cursor.execute(query, data)
    conn.commit()
    cursor.close()
However, now I want to modify it to build the query dynamically based on the table and the columns coming from the data.
The data arg is a dict object.
Currently the statement that gets constructed is INSERT INTO particle_photon (temp, humidity) VALUES (%(temp)s, %(humidity)s), but a different table may have different columns coming in the data dict object.
I figured it out myself:
def insert_row(data, table, conn):
    """Insert new row of data received."""
    cursor = conn.cursor()
    placeholder = ", ".join(["%s"] * len(data))
    stmt = "insert into `{table}` ({columns}) values ({values});".format(
        table=table,
        columns=",".join(data.keys()),
        values=placeholder
    )
    cursor.execute(stmt, list(data.values()))
    conn.commit()
    cursor.close()
You can remove the column names and the parentheses around them from your prepared query, then include them again in your table parameter. So the table would be equal to particle_photon (temp, humidity) instead of just particle_photon.
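A minimal sketch of that suggestion (the sample values and the conn object are illustrative, not from the question):

def insert_row(data, table, conn):
    """Insert a new row; `table` now carries the column list as well."""
    cursor = conn.cursor()
    # e.g. table = "particle_photon (temp, humidity)"
    query = "INSERT INTO " + table + " VALUES (%(temp)s, %(humidity)s)"
    cursor.execute(query, data)
    conn.commit()
    cursor.close()

insert_row({"temp": 21.5, "humidity": 40}, "particle_photon (temp, humidity)", conn)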

Making thousands of SELECT queries faster

Situation
Working with Python 3.7.2
I have read privilege on a MariaDB table with 5M rows on a server.
I have a local text file with 7K integers, one per line.
The integers represent IDXs of the table.
The IDX column of the table is the primary key. (so I suppose it is automatically indexed?)
Problem
I need to select all the rows whose IDX is in the text file.
My effort
Version 1
Make 7K queries, one for each line in the text file. This runs at approximately 130 queries per second and takes about 1 minute to complete.
import pymysql

connection = pymysql.connect(....)

with connection.cursor() as cursor:
    query = (
        "SELECT *"
        " FROM TABLE1"
        " WHERE IDX = %(idx)s;"
    )
    all_selected = {}
    with open("idx_list.txt", "r") as f:
        for idx in f:
            idx = idx.strip()
            if idx:
                idx = int(idx)
                parameters = {"idx": idx}
                cursor.execute(query, parameters)
                result = cursor.fetchall()[0]
                all_selected[idx] = result
Version 2
Select the whole table, iterate over the cursor, and cherry-pick rows. The for loop over .fetchall_unbuffered() covers 30-40K rows per second, and the whole script takes about 3 minutes to complete.
import pymysql

connection = pymysql.connect(....)

with connection.cursor() as cursor:
    query = "SELECT * FROM TABLE1"
    set_of_idx = set()
    with open("idx_list.txt", "r") as f:
        for line in f:
            if line.strip():
                line = int(line.strip())
                set_of_idx.add(line)
    all_selected = {}
    cursor.execute(query)
    for row in cursor.fetchall_unbuffered():
        if row[0] in set_of_idx:
            all_selected[row[0]] = row[1:]
Expected behavior
I need to select faster, because the number of IDXs in the text file will grow as big as 10K-100K in the future.
I consulted other answers including this, but I can't make use of it since I only have read privilege, which makes it impossible to create another table to join with.
So how can I make the selection faster?
A temporary table implementation would look like:
connection = pymysql.connect(...., local_infile=True)
with connection.cursor() as cursor:
    cursor.execute("CREATE TEMPORARY TABLE R (IDX INT PRIMARY KEY)")
    cursor.execute("LOAD DATA LOCAL INFILE 'idx_list.txt' INTO TABLE R")
    cursor.execute("SELECT TABLE1.* FROM TABLE1 JOIN R USING ( IDX )")
    ..
    cursor.execute("DROP TEMPORARY TABLE R")
Thanks to the hint (or more than a hint) from @danblack, I was able to achieve the desired result with the following query.
query = (
    "SELECT *"
    " FROM TABLE1"
    " INNER JOIN R"
    " ON R.IDX = TABLE1.IDX;"
)
cursor.execute(query)
danblack's SELECT statement didn't work for me, raising an error:
pymysql.err.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'IDX' at line 1")
This is probably because of MariaDB's join syntax, so I consulted the MariaDB documentation on joining tables.
And now it selects 7K rows in 0.9 seconds.
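Putting the pieces together, the whole flow looks roughly like this (connection details are placeholders; LOAD DATA needs the TABLE keyword, and local_infile must be enabled on both the client and the server):

import pymysql

# host/user/password/database values are placeholders for the real connection details
connection = pymysql.connect(host="localhost", user="user", password="pass",
                             database="db", local_infile=True)
with connection.cursor() as cursor:
    cursor.execute("CREATE TEMPORARY TABLE R (IDX INT PRIMARY KEY)")
    cursor.execute("LOAD DATA LOCAL INFILE 'idx_list.txt' INTO TABLE R")
    cursor.execute(
        "SELECT TABLE1.*"
        " FROM TABLE1"
        " INNER JOIN R"
        " ON R.IDX = TABLE1.IDX;"
    )
    # Same shape as the dict built in the question: IDX -> remaining columns
    all_selected = {row[0]: row[1:] for row in cursor.fetchall()}
    cursor.execute("DROP TEMPORARY TABLE R")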
Leaving this here as an answer just for the sake of completeness, and for future readers.

What is a MySQL buffered cursor w.r.t. Python MySQL Connector?

Can someone please give an example to understand this?
After executing a query, a MySQLCursorBuffered cursor fetches the entire result set from the server and buffers the rows.
For queries executed using a buffered cursor, row-fetching methods such as fetchone() return rows from the set of buffered rows. For nonbuffered cursors, rows are not fetched from the server until a row-fetching method is called. In this case, you must be sure to fetch all rows of the result set before executing any other statements on the same connection, or an InternalError (Unread result found) exception will be raised.
Thanks
I can think of two ways these two types of cursors are different.
The first way is that if you execute a query using a buffered cursor, you can get the number of rows returned by checking MySQLCursorBuffered.rowcount. However, the rowcount attribute of an unbuffered cursor returns -1 right after the execute method is called. This, basically, means that the entire result set has not yet been fetched from the server. Furthermore, the rowcount attribute of an unbuffered cursor increases as you fetch rows from it, while the rowcount attribute of a buffered cursor remains the same, as you fetch rows from it.
The following code snippet tries to illustrate the points made above:
import mysql.connector

conn = mysql.connector.connect(database='db',
                               user='username',
                               password='pass',
                               host='localhost',
                               port=3306)

buffered_cursor = conn.cursor(buffered=True)
unbuffered_cursor = conn.cursor(buffered=False)

create_query = """
drop table if exists people;
create table if not exists people (
    personid int(10) unsigned auto_increment,
    firstname varchar(255),
    lastname varchar(255),
    primary key (personid)
);
insert into people (firstname, lastname)
values ('Jon', 'Bon Jovi'),
       ('David', 'Bryan'),
       ('Tico', 'Torres'),
       ('Phil', 'Xenidis'),
       ('Hugh', 'McDonald')
"""

# Create and populate a table
results = buffered_cursor.execute(create_query, multi=True)
conn.commit()

buffered_cursor.execute("select * from people")
print("Row count from a buffered cursor:", buffered_cursor.rowcount)
unbuffered_cursor.execute("select * from people")
print("Row count from an unbuffered cursor:", unbuffered_cursor.rowcount)

print()
print("Fetching rows from a buffered cursor: ")
while True:
    try:
        row = next(buffered_cursor)
        print("Row:", row)
        print("Row count:", buffered_cursor.rowcount)
    except StopIteration:
        break

print()
print("Fetching rows from an unbuffered cursor: ")
while True:
    try:
        row = next(unbuffered_cursor)
        print("Row:", row)
        print("Row count:", unbuffered_cursor.rowcount)
    except StopIteration:
        break
The above snippet should return something like the following:
Row count from a buffered cursor: 5
Row count from an unbuffered cursor: -1
Fetching rows from a buffered cursor:
Row: (1, 'Jon', 'Bon Jovi')
Row count: 5
Row: (2, 'David', 'Bryan')
Row count: 5
Row: (3, 'Tico', 'Torres')
Row count: 5
Row: (4, 'Phil', 'Xenidis')
Row count: 5
Row: (5, 'Hugh', 'McDonald')
Row count: 5
Fetching rows from an unbuffered cursor:
Row: (1, 'Jon', 'Bon Jovi')
Row count: 1
Row: (2, 'David', 'Bryan')
Row count: 2
Row: (3, 'Tico', 'Torres')
Row count: 3
Row: (4, 'Phil', 'Xenidis')
Row count: 4
Row: (5, 'Hugh', 'McDonald')
Row count: 5
As you can see, the rowcount attribute for the unbuffered cursor starts at -1 and increases as we loop through the result it generates. This is not the case with the buffered cursor.
The second way to tell the difference is by paying attention to which of the two (under the same connection) executes first. If you start with executing an unbuffered cursor whose rows have not been fully fetched and then try to execute a query with the buffered cursor, an InternalError exception will be raised, and you will be asked to consume or ditch what is returned by the unbuffered cursor. Below is an illustration:
import mysql.connector

conn = mysql.connector.connect(database='db',
                               user='username',
                               password='pass',
                               host='localhost',
                               port=3306)

buffered_cursor = conn.cursor(buffered=True)
unbuffered_cursor = conn.cursor(buffered=False)

create_query = """
drop table if exists people;
create table if not exists people (
    personid int(10) unsigned auto_increment,
    firstname varchar(255),
    lastname varchar(255),
    primary key (personid)
);
insert into people (firstname, lastname)
values ('Jon', 'Bon Jovi'),
       ('David', 'Bryan'),
       ('Tico', 'Torres'),
       ('Phil', 'Xenidis'),
       ('Hugh', 'McDonald')
"""

# Create and populate a table
results = buffered_cursor.execute(create_query, multi=True)
conn.commit()

unbuffered_cursor.execute("select * from people")
unbuffered_cursor.fetchone()
buffered_cursor.execute("select * from people")
The snippet above will raise an InternalError exception with a message indicating that there is some unread result. What it is basically saying is that the result returned by the unbuffered cursor needs to be fully consumed before you can execute another query with any cursor under the same connection. If you replace unbuffered_cursor.fetchone() with unbuffered_cursor.fetchall(), the error will disappear.
There are other less obvious differences, such as memory consumption. Buffered cursors will likely consume more memory, since they fetch the entire result set from the server and buffer the rows.
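For example, streaming a large result set is where an unbuffered cursor pays off (conn is the connection from the snippets above; process() is a hypothetical per-row handler):

# Unbuffered: rows stream from the server as you iterate, so client memory stays roughly constant,
# but you must consume (or close) the result before reusing the connection.
unbuffered = conn.cursor(buffered=False)
unbuffered.execute("select * from people")
for row in unbuffered:
    process(row)  # hypothetical per-row handler

# Buffered: execute() pulls the whole result set into client memory up front.
buffered = conn.cursor(buffered=True)
buffered.execute("select * from people")
all_rows = buffered.fetchall()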
I hope this proves useful.