I just want to know how to create the engine from a config file so that it gives the same result as:
engine = create_engine('sqlite:///mydb.sqlite', echo=True)
I think it would be done with a config.py file like
DATABASE_URL = 'sqlite:///oracle.sqlite'
and
import config
from sqlalchemy import create_engine, engine_from_config
from sqlalchemy_utils.functions import database_exists, drop_database
engine = create_engine(config.DATABASE_URL, echo=True)
for creating the database every time it's needed. For deleting it:
if database_exists(config.DATABASE_URL):
    drop_database(engine.url)
config = {'db.url':'sqlite:///./somedb.db', 'db.echo':'True'}
engine = engine_from_config(config, prefix='db.')
You can keep the config outside the file and import it! Also see the engine configuration section of the SQLAlchemy docs.
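For example, here is a minimal sketch of keeping the settings in an INI file and handing them to engine_from_config. The settings.ini file name and its [database] section are just assumptions for illustration:
# settings.ini (hypothetical):
# [database]
# db.url = sqlite:///./somedb.db
# db.echo = True

import configparser
from sqlalchemy import engine_from_config

parser = configparser.ConfigParser()
parser.read('settings.ini')

# engine_from_config takes a flat dict of string options; keys starting
# with the prefix are stripped and passed on to create_engine().
engine = engine_from_config(dict(parser['database']), prefix='db.')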
If you want to capture SQL commands, see statistics, and find the slowest statements, check out sqlalchemy-capture-sql (disclaimer: I am the author of the package):
from sqlalchemy_capture_sql import CaptureSqlStatements

with CaptureSqlStatements(sqlalchemy_engine) as capture_stmts:
    # put here calls to functions that issue sqlalchemy commands that
    # produce some sql statements execution
    session.add(user)

capture_stmts.pp()
The library will produce a full report, for instance:
============================================================
1. 0.0020 INSERT INTO users (name, fullname, nickname) VALUES (?, ?, ?)
<- 'joe+Joe Joey+joey'
2. 0.0009 SELECT FROM users WHERE users.id = ?
<- '2'
...
============================================================
== Slowest (top 5):
1. INSERT USERS 1 0.002 s INSERT INTO users (name, fullname, nickname) VALUES (?, ?, ?)
2. INSERT USERS 1 0.001 s INSERT INTO users (name, fullname, nickname) VALUES (?, ?, ?)
...
============================================================
== By sql command (top 20):
INSERT 2 0.003 s
SELECT 4 0.003 s
============================================================
== By table (top 20):
USERS 6 0.007 s
...
============================================================
== By sql command + table (top 20):
INSERT USERS 2 0.003 s
SELECT USERS 2 0.002 s
...
== Totally captured 8 statement(s) in 0.008866 s
I am building a database that needs to accept at least 500 row inserts per second.
A row looks like this:
[1, "amit", "55555", "2020-12-21 12:12:12"]
When I try to insert rows into MySQL using a queue and threading in Python, it only inserts about 150 rows per second.
My code is as follows:
import mysql.connector
from queue import Queue
from threading import Thread

def do_stuff(q, mydb, mycursor):
    while True:
        # mycursor = mydb.cursor()
        a = q.get()
        sql = "INSERT INTO list (id, ticker, price, tmp) VALUES (%s, %s, %s, %s)"
        val = (a[0], a[1], float(a[2]), a[3])
        mycursor.execute(sql, val)
        mydb.commit()
        q.task_done()

q = Queue(maxsize=0)
num_threads = 10

for i in range(num_threads):
    mydb = mysql.connector.connect(host="localhost", user="root", password="password",
                                   database="list1", auth_plugin='mysql_native_password')
    mycursor = mydb.cursor()
    worker = Thread(target=do_stuff, args=(q, mydb, mycursor))
    worker.start()

def strt():
    for i in range(100000):
        a = [i, "name", "154.55", "2020-12-21 12:12:12"]
        q.put(a)
If anyone has suggestions to improve performance, or any other method to write faster, please let me know. Thank you.
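One common approach, sketched below rather than a drop-in fix, is to batch the rows and commit once per batch instead of once per row; with mysql-connector-python that can be done with cursor.executemany (the batch size of 1000 is an arbitrary assumption):
import mysql.connector

mydb = mysql.connector.connect(host="localhost", user="root", password="password",
                               database="list1", auth_plugin='mysql_native_password')
mycursor = mydb.cursor()

sql = "INSERT INTO list (id, ticker, price, tmp) VALUES (%s, %s, %s, %s)"
rows = [(i, "name", 154.55, "2020-12-21 12:12:12") for i in range(100000)]

# Send the rows in chunks: one executemany() call and one commit() per chunk,
# instead of one execute() and one commit() per row.
batch_size = 1000
for start in range(0, len(rows), batch_size):
    mycursor.executemany(sql, rows[start:start + batch_size])
    mydb.commit()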
I am attempting to do a bulk insert into MySQL using
INSERT INTO TABLE (a, b, c) VALUES (?, ?, ?), (?, ?, ?)
I have the general log on, and see that this works splendidly for most cases. However, when the table has a BLOB column, it doesn't work as well.
I am trying to insert 20 records.
Without the BLOB, I see all 20 records inserted in the same query in the general log.
WITH the BLOB, I see only 2 records per query in the general log, so it takes 10 queries in total.
Is this a problem with MySQL, the JDBC driver, or am I missing something else? I would prefer to use a BLOB as I have the data in protobufs.
Here is an example table...
CREATE TABLE my_table (
    id   CHAR(36)     NOT NULL,
    name VARCHAR(256) NOT NULL,
    data BLOB         NOT NULL,
    PRIMARY KEY (id)
);
Then, create your batch inserts in code...
val ps = conn.prepareStatement(
  "INSERT INTO my_table(id, name, data) VALUES (?, ?, ?)")

records.grouped(1000).foreach { group =>
  group.foreach { r =>
    ps.setString(1, UUID.randomUUID.toString)
    ps.setString(2, r.name)
    ps.setBlob(3, new MariaDbBlob(r.data))
    ps.addBatch()
  }
  ps.executeBatch()
}
If you run this and inspect the general log, you will see...
"2018-10-12T18:37:55.714825Z 4 Query INSERT INTO my_table(id, name, fqdn, data) VALUES ('b4955537-2450-48c4-9953-e27f3a0fc583', '17-apply-test', _binary '
17-apply-test\"AAAA(?2Pending8?????,J$b4955537-2450-48c4-9953-e27f3a0fc583
1:2:3:4:5:6:7:8Rsystem'), ('480e470c-6d85-4bbc-b718-21d9e80ac7f7', '18-apply-test', _binary '
18-apply-test\"AAAA(?2Pending8?????,J$480e470c-6d85-4bbc-b718-21d9e80ac7f7
1:2:3:4:5:6:7:8Rsystem')
2018-10-12T18:37:55.715489Z 4 Query INSERT INTO my_table(id, name, data) VALUES ('7571a651-0e0b-4e78-bff0-1394070735ce', '19-apply-test', _binary '
19-apply-test\"AAAA(?2Pending8?????,J$7571a651-0e0b-4e78-bff0-1394070735ce
1:2:3:4:5:6:7:8Rsystem'), ('f77ebe28-73d2-4f6b-8fd5-284f0ec2c3f0', '20-apply-test', _binary '
20-apply-test\"AAAA(?2Pending8?????,J$f77ebe28-73d2-4f6b-8fd5-284f0ec2c3f0
As you can see, each INSERT INTO only has 2 records in it.
Now, if you remove the data field from the schema and insert and re-run, you will see the following output (for 10 records)...
"2018-10-12T19:04:24.406567Z 4 Query INSERT INTO my_table(id, name) VALUES ('d323d21e-25ac-40d4-8cff-7ad12f83b8c0', '1-apply-test'), ('f20e37f2-35a4-41e9-8458-de405a44f4d9', '2-apply-test'), ('498f4e96-4bf1-4d69-a6cb-f0e61575ebb4', '3-apply-test'), ('8bf7925d-8f01-494f-8f9f-c5b8c742beae', '4-apply-test'), ('5ea663e7-d9bc-4c9f-a9a2-edbedf3e5415', '5-apply-test'), ('48f535c8-44e6-4f10-9af9-1562081538e5', '6-apply-test'), ('fbf2661f-3a23-4317-ab1f-96978b39fffe', '7-apply-test'), ('3d781e25-3f30-48fd-b22b-91f0db8ba401', '8-apply-test'), ('55ffa950-c941-44dc-a233-ebecfd4413cf', '9-apply-test'), ('6edc6e25-6e70-42b9-8473-6ab68d065d44', '10-apply-test')"
All 10 records are in the same query
I tinkered until I found the fix...
val ps = conn.prepareStatement(
  "INSERT INTO my_table(id, name, data) VALUES (?, ?, ?)")

records.grouped(1000).foreach { group =>
  group.foreach { r =>
    ps.setString(1, UUID.randomUUID.toString)
    ps.setString(2, r.name)
    // ps.setBlob(3, new MariaDbBlob(r.data))
    ps.setBytes(3, r.data)
    ps.addBatch()
  }
  ps.executeBatch()
}
Using PreparedStatement.setBytes instead of MariaDbBlob seemed to do the trick.
I'm using Python 3.6, mysql-connector-python 8.0.11, and MySQL Community Server 8.0.11 (GPL). The table in question uses the InnoDB engine.
When using the MySQL Workbench I can enter:
USE test; START TRANSACTION; SELECT * FROM tasks WHERE task_status != 1 LIMIT 1 FOR UPDATE;
And it returns 1 record as expected.
When I run a script using python3 (from the same machine, same access, etc.):
* SQL QRY: START TRANSACTION; SELECT * FROM test WHERE task_status != 1 LIMIT 1 FOR UPDATE;
* SQL RES: No result set to fetch from.
This is the debug output from my script. If I change the query to a normal SELECT, I do get output:
* SQL QRY: SELECT * FROM test WHERE task_status != 1 LIMIT 1;
* SQL RES: [(1, 0, 'TASK0001')]
I know SELECT * isn't the way to go, but I'm just trying to get some response for now.
I'm trying to allow multiple worker scripts to pick up a task without the workers taking the same task:
1. Do a SELECT and row-lock the task so other workers' SELECT queries don't see it.
2. Set the task status to 'being processed' and unlock the record.
This is my first venture into locking so this is new ground. I'm able to do normal queries and populate tables etc so have some experience but not with locking.
TABLE creation:
create table test
(
    id int auto_increment primary key,
    task_status int not null,
    task_ref varchar(16) not null
);
Questions:
1. Is this the correct mindset? I.e. is there a more pythonic/mysql way to do this?
2. Is there a specific way I need to initiate the mysql connection? Why would it work using the MySQL Workbench but not via the script? I've tried using the mysql client directly and this works too, so I think it is the Python connector that may need setting up correctly, as it is the only component not working.
3. Currently I'm using 'autocommit=1' on the connector and 'buffered=True' on the cursor. I know you can set 'autocommit=0' in the SQL before the 'START TRANSACTION', so I understand I may need to do this for the locking, but for all other transactions I would prefer to keep autocommit on. Is this OK and/or doable?
CODE:
#!/usr/bin/env python
import mysql.connector
import pprint

conn = mysql.connector.connect(user='testuser',
                               password='testpass',
                               host='127.0.0.1',
                               database='test_db',
                               autocommit=True)
dbc = conn.cursor(buffered=True)

qry = "START TRANSACTION; SELECT * FROM test WHERE task_status != 1 LIMIT 1 FOR UPDATE;"
sql_select = dbc.execute(qry)
try:
    output = dbc.fetchall()
except mysql.connector.Error as e:
    print(" * SQL QRY: {0}".format(qry))
    print(" * SQL RES: {0}".format(e))
    exit()
else:
    print(" * SQL QRY: {0}".format(qry))
    print(" * SQL RES: {0}".format(output))
Many Thanks,
Frank
So after playing around a bit, I worked out (by trial and error) that the proper way to do this is to just put 'FOR UPDATE' at the end of the normal query:
Full code is below (including option to add dummy records for testing):
#!/usr/bin/env python
import mysql.connector
import pprint
import os

conn = mysql.connector.connect(user='testuser',
                               password='testpass',
                               host='127.0.0.1',
                               database='test_db',
                               autocommit=True)
dbc = conn.cursor(buffered=True)

worker_pid = os.getpid()
all_done = False
create = False

if create:
    items = []
    for i in range(10000):
        items.append([0, 'TASK%04d' % i])
    dbc.executemany('INSERT INTO test (task_status, task_ref) VALUES (%s, %s)', tuple(items))
    conn.commit()
    conn.close()
    exit()

while all_done is False:
    print(all_done)
    qry = (
        "SELECT id FROM test WHERE task_status != 1 LIMIT 1 FOR UPDATE;"
    )
    sql_select = dbc.execute(qry)
    try:
        output = dbc.fetchall()
    except mysql.connector.Error as e:
        print(" * SQL QRY: {0}".format(qry))
        print(" * SQL RES: {0}".format(e))
        exit()
    else:
        print(" * SQL QRY: {0}".format(qry))
        print(" * SQL RES: {0}".format(output))

    if len(output) == 0:
        print("All Done = Yes")
        all_done = True
        continue
    else:
        print("Not Done yet!")

    if len(output) > 0:
        test_id = output[0][0]
        print("WORKER {0} FOUND: '{1}'".format(worker_pid, test_id))
        qry = "UPDATE test SET task_status = %s, task_ref = %s WHERE id = %s;"
        sql_select = dbc.execute(qry, tuple([1, worker_pid, test_id]))
        conn.commit()
        try:
            output = dbc.fetchall()
        except mysql.connector.Error as e:
            print(" * SQL QRY: {0}".format(qry))
            print(" * SQL RES: {0}".format(e))
        else:
            print(" * SQL QRY: {0}".format(qry))
            print(" * SQL RES: {0}".format(output))

print(all_done)
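As a side note on question 3 above (autocommit): if you would rather not run the whole session with autocommit=True, mysql-connector-python also lets you open an explicit transaction around the lock-and-update step. A minimal sketch, assuming the same test table as above, not a definitive pattern:
import os
import mysql.connector

conn = mysql.connector.connect(user='testuser', password='testpass',
                               host='127.0.0.1', database='test_db')  # autocommit off (the default)
dbc = conn.cursor(buffered=True)

conn.start_transaction()
dbc.execute("SELECT id FROM test WHERE task_status != 1 LIMIT 1 FOR UPDATE")
row = dbc.fetchone()
if row:
    # The selected row stays locked until commit()/rollback(); other workers
    # trying to lock the same row will block until this transaction ends.
    dbc.execute("UPDATE test SET task_status = %s, task_ref = %s WHERE id = %s",
                (1, str(os.getpid()), row[0]))
conn.commit()  # releases the row lock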
Hope this helps someone else save some time, as there are a lot of places with different info, but searches for python3, mysql-connector and transactions didn't get me anywhere.
Good Luck,
Frank
I'm using the mysql-otp driver for Erlang. It seems to be working fine but there is no documentation on using it to insert multiple rows into a table.
A simple use case for a single-row insert:
ok = mysql:query(Pid, "INSERT INTO mytable (id, bar) VALUES (?, ?)", [1, 42]).
But I need to insert multiple values; can I do something like this?
ok = mysql:query(Pid, "INSERT INTO mytable (id, bar) VALUES (?, ?)", [(1, 42),(2, 36), (3,12)]).
Documentation states Params = [term()], so probably not, which is a bummer.
You can certainly do a combination of lists:foldl/3 and lists:join/2 on your arguments to create the desired query format:
L = [[1, 42], [2, 36], [3, 12]],
PreparedList = lists:foldl(fun (Params, Inserts) ->
                               Inserts ++ [io_lib:format("(~p,~p)", Params)]
                           end, [], L),
%% Then you need to join these with a comma:
Prepared = lists:flatten(lists:join(",", PreparedList)),
%% this will result in "(1,42),(2,36),(3,12)"
Now you just need to run the mysql insert with this Prepared variable, concatenated into the query string (it is raw SQL, not a parameter):
ok = mysql:query(Pid, "INSERT INTO mytable (id, bar) VALUES " ++ Prepared).
%% The query will look like: "INSERT INTO mytable (id, bar) VALUES (1,42),(2,36),(3,12)"
I don't think this driver or MySQL can do that kind of thing. I think you should do it like below:
insert_mytable(Pid, Data) ->
    {ok, Ref} = mysql:prepare(Pid, insert_mytable, "INSERT INTO mytable (id, bar) VALUES (?, ?)"),
    loop_insert(Pid, Ref, Data).

loop_insert(_Pid, _Ref, []) -> ok;
loop_insert(Pid, Ref, [H|T]) ->
    ok = mysql:execute(Pid, Ref, H),
    loop_insert(Pid, Ref, T).
I have a data frame with 10 million rows and 5 columns that I want to insert into an existing SQL table. Note that I do not have permission to create a table; I can only insert values into an existing table. I'm currently using RODBCext:
query_ch <- "insert into [blah].[dbo].[blahblah]
(col1, col2, col3, col4, col5)
values (?,?,?,?,?)"
sqlExecute(channel, query_ch, my_data)
This takes way too long (more than 10 hours). Is there a way accomplish this faster?
TL;DR: LOAD DATA INFILE is one order of magnitude faster than multiple INSERT statements, which are themselves one order of magnitude faster than single INSERT statements.
Below I benchmark the three main strategies for importing data from R into MySQL:
single insert statements, as in the question:
INSERT INTO test (col1,col2,col3) VALUES (1,2,3)
multiple insert statements, formatted like so:
INSERT INTO test (col1,col2,col3) VALUES (1,2,3),(4,5,6),(7,8,9)
load data infile statement, i.e. loading a previously written CSV file into MySQL:
LOAD DATA INFILE 'the_dump.csv' INTO TABLE test
I use RMySQL here, but any other MySQL driver should lead to similar results. The SQL table was instantiated with:
CREATE TABLE `test` (
`col1` double, `col2` double, `col3` double, `col4` double, `col5` double
) ENGINE=MyISAM;
The connection and test data were created in R with:
library(RMySQL)
con = dbConnect(MySQL(),
                user = 'the_user',
                password = 'the_password',
                host = '127.0.0.1',
                dbname = 'test')
n_rows = 1000000 # number of tuples
n_cols = 5 # number of fields
dump = matrix(runif(n_rows*n_cols), ncol=n_cols, nrow=n_rows)
colnames(dump) = paste0('col',1:n_cols)
Benchmarking single insert statements:
before = Sys.time()
for (i in 1:nrow(dump)) {
  query = paste0('INSERT INTO test (', paste0(colnames(dump), collapse = ','),
                 ') VALUES (', paste0(dump[i,], collapse = ','), ');')
  dbExecute(con, query)
}
time_naive = Sys.time() - before
=> this takes about 4 minutes on my computer
Benchmarking multiple insert statements:
before = Sys.time()
chunksize = 10000 # arbitrary chunk size
for (i in 1:ceiling(nrow(dump)/chunksize)) {
  query = paste0('INSERT INTO test (', paste0(colnames(dump), collapse = ','), ') VALUES ')
  vals = NULL
  for (j in 1:chunksize) {
    k = (i-1)*chunksize+j
    if (k <= nrow(dump)) {
      vals[j] = paste0('(', paste0(dump[k,], collapse = ','), ')')
    }
  }
  query = paste0(query, paste0(vals, collapse = ','))
  dbExecute(con, query)
}
time_chunked = Sys.time() - before
=> this takes about 40 seconds on my computer
Benchmarking load data infile statement:
before = Sys.time()
write.table(dump, 'the_dump.csv',
            row.names = F, col.names = F, sep = '\t')
query = "LOAD DATA INFILE 'the_dump.csv' INTO TABLE test"
dbSendStatement(con, query)
time_infile = Sys.time() - before
=> this takes about 4 seconds on my computer
Crafting your SQL query to handle many insert values is the simplest way to improve performance. Transitioning to LOAD DATA INFILE will lead to optimal results. Good performance tips can be found on this page of the MySQL documentation.