I am using pandas.to_sql to insert rows into tables. I read that SQLAlchemy released a fast_executemany option for mssql+pyodbc connections. Is there something similar I can use for MySQL?
My current code:
dstConn = create_engine('mysql+mysqlconnector://{}:{}@{}:{}/{}'.format(user, pwd, dbServer, PORT, dbName),
                        echo=False, pool_size=10, max_overflow=20)
for items in pd.read_sql(sqlFrom, con=origConn, chunksize=2000):
    items = items.rename(columns=rename_cols)
    table_name = 'mytable'
    items.reindex(columns=get_dst_colums).to_sql(name=table_name, con=dstConn,
                                                 if_exists='append', index=False, method=None)
In the official pandas docs for to_sql there is another argument, method, which defaults to None. If I set it to 'multi', would I see any speed-up for the inserts above?
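For reference, this is what that change would look like (a sketch; the chunksize value is illustrative, and any actual speed-up depends on the driver and row width):

items.reindex(columns=get_dst_colums).to_sql(
    name=table_name,
    con=dstConn,
    if_exists='append',
    index=False,
    method='multi',  # pandas packs each chunk into one multi-row INSERT
    chunksize=1000,  # keeps each statement under MySQL's max_allowed_packet
)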
I'm using a Google Cloud Function to create a table, and for now everything works the way it is supposed to.
But I would like to add a new field to the table, one that shows the time of its creation.
This is an example of the code that I'm using at the moment.
I don't know why it is not working, but my main goal is to get this working for one table and then replicate the process in the code where I handle two or more tables.
Example:
Structure of the data in the bucket:
Function code
main.py:
from google.cloud import bigquery
import pandas as pd
from previsional_tables import table_test1

creation_date = pd.Timestamp.now()  # Here is where I'm supposed to get the date.
def main_function(event, context):
    dataset = 'bd_clients'
    file = event
    input_bucket_name = file['bucket']
    path_file = file['name']
    uri = 'gs://{}/{}'.format(input_bucket_name, path_file)
    path_file_list = path_file.split("/")
    file_name_ext = path_file_list[len(path_file_list) - 1]
    file_name_ext_list = file_name_ext.split(".")
    name_file = file_name_ext_list[0]
    print('file name ==> ' + name_file.upper())
    print('Getting the data from bucket "{}"'.format(uri))
    path_file_name = str(uri)
    print("path: ", path_file_name)
    if "gs://bucket_test" in path_file_name:
        client = bigquery.Client()
        job_config = bigquery.LoadJobConfig()
        table_test1(dataset, client, uri, job_config, bigquery)
previsional_tables.py:
def table_test1(dataset, client, uri, job_config, bigquery):
    table = "test1"
    dataset_ref = client.dataset(dataset)
    job_config.autodetect = True
    job_config.max_bad_records = 1000
    job_config.schema = [
        bigquery.SchemaField("NAME", "STRING"),
        bigquery.SchemaField("LAST_NAME", "STRING"),
        bigquery.SchemaField("ADDRESS", "STRING"),
        bigquery.SchemaField("DATE", bigquery.enums.SqlTypeNames.DATE),  # create each column in BigQuery along with its type
    ]
    job_config.source_format = bigquery.SourceFormat.CSV
    job_config.field_delimiter = ';'
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
    load_job = client.load_table_from_uri(uri, dataset_ref.table(table), job_config=job_config)
requirements.txt:
# Function dependencies, for example:
# package>=version
google-cloud-bigquery==2.25.1
pysftp==0.2.9
pandas==1.4.2
fsspec==2022.5.0
gcsfs==2022.5.0
Output structure in Database:
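One possible way to get the creation time in (a sketch, not a confirmed fix: the helper name table_test1_with_date and the column name CREATION_DATE are assumptions, and the table schema would also need a matching TIMESTAMP field) is to read the file into a DataFrame, stamp the column there, and load the frame with load_table_from_dataframe instead of load_table_from_uri:

import pandas as pd
from google.cloud import bigquery

def table_test1_with_date(dataset, client, uri):
    # Read the bucket file into a DataFrame (gcsfs, already in
    # requirements.txt, handles gs:// paths), then stamp the load time.
    df = pd.read_csv(uri, sep=';')
    df['CREATION_DATE'] = pd.Timestamp.now()
    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    table_ref = client.dataset(dataset).table('test1')
    load_job = client.load_table_from_dataframe(df, table_ref, job_config=job_config)
    load_job.result()  # wait for the load job to finish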
First of all, sorry for my lack of knowledge regarding databases; this is my first time working with them.
I am having some issues trying to get the data from an Excel file and put it into a database.
Using answers from the site, I managed to more or less connect to the database by doing this:
import pandas as pd
import pyodbc
server = 'XXXXX'
db = 'XXXXXdb'
# create Connection and Cursor objects
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + db + ';Trusted_Connection=yes')
cursor = conn.cursor()
# read data from excel
data = pd.read_excel('data.csv')
But I don't really know what to do next.
I have 3 tables, which are connected by a productID; my Excel file mimics the database, meaning that every column in the Excel file has a place to go in the DB.
My plan was to read the Excel file and make lists with each column, then insert each column value into the DB, but I have no idea how to create a query that can do this.
Once I have the query, I think the data insertion can be done like this:
query = "xxxxxxxxxxxxxx"
for row in data:
    # The following is not the real code
    productID = productID
    name = name
    url = url
    values = (productID, name, url)
    cursor.execute(query, values)
conn.commit()
conn.close()
Database looks like this.
https://prnt.sc/n2d2fm
http://prntscr.com/n2d3sh
http://prntscr.com/n2d3yj
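One common pattern for linked tables like these (a sketch; the table and column names here are hypothetical, since the real schema is only visible in the screenshots) is to insert the parent row first and then the child rows that reference the same productID. pyodbc uses ? placeholders:

insert_product = "INSERT INTO Productos(productID, name) VALUES (?, ?)"
insert_url = "INSERT INTO ProductUrls(productID, url) VALUES (?, ?)"  # hypothetical child table

for _, row in data.iterrows():
    cursor.execute(insert_product, (row['productID'], row['name']))
    cursor.execute(insert_url, (row['productID'], row['url']))
conn.commit()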
EDIT:
I tried doing something like this, but I'm getting a 'not all arguments converted during string formatting' TypeError.
import pymysql
import pandas as pd

connStr = pymysql.connect(host='xx.xxx.xx.xx', port=xxxx, user='xxxx', password='xxxxxxxxxxx')
df = pd.read_csv('GenericProducts.csv')
cursor = connStr.cursor()

query = "INSERT INTO [Productos]([ItemID],[Nombre])) values (?,?)"
for index, row in df.iterrows():
    #cursor.execute("INSERT INTO dbo.Productos([ItemID],[Nombre])) values (?,?,?)", row['codigoEspecificoProducto'], row['nombreProducto'])
    codigoEspecificoProducto = row['codigoEspecificoProducto']
    nombreProducto = row['nombreProducto']
    values = (codigoEspecificoProducto, nombreProducto)
    cursor.execute(query, values)
connStr.commit()
cursor.close()
connStr.close()
I think my problem is in how I'm defining the query; surely that's not the right way.
Try this. You seem to have switched libraries from pyodbc to pymysql, which expects %s placeholders instead of ?. Note too that MySQL quotes identifiers with backticks, not the square brackets used by SQL Server:
import pymysql
import pandas as pd

connStr = pymysql.connect(host='xx.xxx.xx.xx', port=xxxx, user='xxxx', password='xxxxxxxxxxx')
df = pd.read_csv('GenericProducts.csv')
cursor = connStr.cursor()

query = "INSERT INTO `Productos`(`ItemID`,`Nombre`) values (%s,%s)"
for index, row in df.iterrows():
    #cursor.execute("INSERT INTO `Productos`(`ItemID`,`Nombre`) values (%s,%s)", (row['codigoEspecificoProducto'], row['nombreProducto']))
    codigoEspecificoProducto = row['codigoEspecificoProducto']
    nombreProducto = row['nombreProducto']
    values = (codigoEspecificoProducto, nombreProducto)
    cursor.execute(query, values)
connStr.commit()
cursor.close()
connStr.close()
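As a side note (a sketch, assuming the same DataFrame and column names): pymysql's executemany can replace the row-by-row loop and send the rows in batches:

# Batch all rows into one executemany call instead of looping.
rows = list(df[['codigoEspecificoProducto', 'nombreProducto']].itertuples(index=False, name=None))
cursor.executemany("INSERT INTO `Productos`(`ItemID`,`Nombre`) VALUES (%s,%s)", rows)
connStr.commit()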
I am doing some data analytics using MySQL and MATLAB. My program works fine if there is already an existing database present, but it doesn't work when the database I am trying to connect to does not exist. What I want to do is create a database with a fixed name if no database exists under that name. I searched all over the internet and haven't found any option for that; maybe I am missing something.
Additionally, I would like to create a table in the newly created database and store some random data in it. I can do that part, but I am stuck on the first part, which is creating the database programmatically from MATLAB.
Please note that I have to use only MATLAB for this project. Any cooperation will be greatly appreciated.
Update
The code example is given below:
function findIfFeederExists(id)
%findIfFeederExists Checks whether the number of feeders for a given id is empty or not.

% Set preferences with setdbprefs.
setdbprefs('DataReturnFormat', 'dataset');
setdbprefs('NullNumberRead', 'NaN');
setdbprefs('NullStringRead', 'null');

% Make connection to database, using the ODBC driver.
% Note that the password has been omitted.
conn = database('wisedb', 'root', '');
conn.Message;

% Read data from database.
sqlQuery = 'SELECT * FROM joined_table';
%sqlQuery = 'SELECT * FROM joined_table where joined_table.`N. of Feeder` > 0';
curs = exec(conn, sqlQuery);
curs = fetch(curs);
dbMatrix = curs.data;

[row_count, ~] = size(dbMatrix);
if (row_count >= id)
    val = dbMatrix(id, 3);
    disp(val);
    if (val.N0x2EOfFeeder > 0)
        Str = strcat('Feeder is present on the id : ', num2str(id));
        disp(Str);
        disp(dbMatrix(id, 1:end));
    else
        Str = strcat('Feeder is NOT present on the id : ', num2str(id));
        disp(Str);
    end
else
    Str = strcat('No row found for id : ', num2str(id));
    disp(Str);
end

% Close cursor and database connection.
close(curs);
close(conn);
% Clear variables.
clear curs conn
end
You can see I can connect to an existing database using ODBC, but I am not sure how to create a new one. What can be done about this?
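One possible approach (a sketch, assuming the connection can be opened against the server without selecting a schema; the DSN name is a placeholder): issue CREATE DATABASE IF NOT EXISTS through exec, the same way the SELECT above is run:

% Connect without a default database, then create the schema if missing.
conn = database('myServerDsn', 'root', '');  % DSN configured with no default database
curs = exec(conn, 'CREATE DATABASE IF NOT EXISTS wisedb');
close(curs);
close(conn);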
I have a MySQL database table containing "tasks". Each task has a flag (whether or not it has been taken).
Now, for example, 3 threads do this:
query_base = session.query(PredykcjaRow).filter(
    PredykcjaRow.predyktor == predictor,
    PredykcjaRow.czy_wziete == False
)
query_disprot = query_base.join(NieustrRow, NieustrRow.fastaId == PredykcjaRow.fastaId)
query_pdb = query_base.join(RawBialkoRow, RawBialkoRow.fasta_id == PredykcjaRow.fastaId)

response = query_pdb.union(query_disprot)
response = response.with_for_update()
response = response.first()

if response is None:
    return None

response.czy_wziete = True
try:
    session.commit()
    return response
except:
    return None
Each thread has its own session (scoped_session), but all 3 threads get the same object.
The configuration has:
tx_isolation: REPEATABLE-READ
Assuming the scoped session is created like this:
Session = scoped_session(sessionmaker(bind=engine))
Make sure you aren't doing this:
session = Session()
give_to_thread1(session)
give_to_thread2(session)
With a scoped session, you can use it directly, e.g.
Session.query(...)
So your threads should do this:
def runs_in_thread():
    Session.add(...)
    # or
    session = Session()
    session.add(...)
The problem is the union statement. MySQL does not support FOR UPDATE across a UNION of SELECTs: the statement executes without warning, but the row is not locked.
I found this information in the official documentation, but now I can't find it again. If anyone can, please post a comment.
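One possible workaround (a sketch, not part of the original answer): drop the UNION and try each joined query separately, so FOR UPDATE applies to a single SELECT that MySQL will actually lock:

# Try each single-SELECT query in turn instead of a UNION.
for query in (query_pdb, query_disprot):
    response = query.with_for_update().first()  # plain SELECT ... FOR UPDATE
    if response is not None:
        response.czy_wziete = True
        try:
            session.commit()
            return response
        except Exception:
            session.rollback()
            return None
return None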
Is my code written the wrong way? I need to create an update button so the user can edit the information in MATLAB. After updating, the button needs to connect to MySQL Server 5.6 through the ODBC connector.
This is my code:
% --- Executes on button press in update.
function update_Callback(hObject, eventdata, handles)
% hObject    handle to update (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Display dialog box to confirm save.
choice = questdlg('Confirm update to database?', ...
    '', ...
    'Yes', 'No', 'Yes');
% Handle dialog box response.
switch choice
    case 'Yes'
        % Set preferences with setdbprefs.
        setdbprefs('DataReturnFormat', 'cellarray');
        % Make connection to database.
        conn = database('animal_cbir', '', '');
        % Test if database connection is valid.
        testConnection = isconnection(conn);
        disp(testConnection);

        fileID = getappdata(0, 'namevalue');
        imageID = fileID;
        name = get(handles.edit11, 'String');
        commonName = get(handles.edit1, 'String');
        scientificName = get(handles.edit2, 'String');
        class = get(handles.edit3, 'String');
        diet = get(handles.edit4, 'String');
        habitat = get(handles.edit5, 'String');
        lifeSpan = get(handles.edit6, 'String');
        size = get(handles.edit7, 'String');
        weight = get(handles.edit8, 'String');
        characteristic = get(handles.edit10, 'String');

        tablename = 'animal';
        colnames = {'imageID','name','commonName','scientificName','class','diet','habitat','lifeSpan','size','weight','characteristic'};
        data = {imageID,name,commonName,scientificName,class,diet,habitat,lifeSpan,size,weight,characteristic};
        disp(data);
        whereClause = sprintf('where imageID = "%s"', fileID);
        update(conn, tablename, colnames, data, whereClause);
        updateSuccess = helpdlg('Existing animal species successfully updated in database.');
        commit(conn);
    case 'No'
end
The error I am getting:
No method 'setInt' with matching signature found for class 'sun.jdbc.odbc.JdbcOdbcPreparedStatement'.
I hope someone can help me solve it.
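One workaround sometimes tried for this JDBC-ODBC bridge error (a sketch, not a confirmed fix; it only covers two of the columns and assumes the values contain no quote characters): build the UPDATE statement by hand and send it through exec, bypassing the prepared-statement path that raises setInt:

% Hand-built UPDATE sent through exec instead of update().
sqlUpdate = sprintf(['UPDATE animal SET name = ''%s'', commonName = ''%s'' ' ...
                     'WHERE imageID = ''%s'''], name, commonName, fileID);
curs = exec(conn, sqlUpdate);
close(curs);
commit(conn);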