Writing to MySQL with Python without using SQL strings - mysql

I am importing data into my Python3 environment and then writing it to a MySQL database. However, there is a lot of different data tables, and so writing out each INSERT statement isn't really pragmatic, plus some have 50+ columns.
Is there a good way to create a table in MySQL directly from a dataframe, and then send insert commands to that same table using a dataframe of the same format, without having to actually type out all the col names? I started trying to call column names and format it and concat everything as a string, but it is extremely messy.
Ideally there is a function out there to directly handle this. For example:
apiconn.request("GET", url, headers=datheaders)
#pull in some JSON data from an API
eventres = apiconn.getresponse()
eventjson = json.loads(eventres.read().decode("utf-8"))
#create a dataframe from the data
eventtable = json_normalize(eventjson)
dbconn = pymysql.connect(host='hostval',
user='userval',
passwd='passval',
db='dbval')
cursor = dbconn.cursor()
sql = sqltranslate(table = 'eventtable', fun = 'append')
#where sqlwrite() is some magic function that takes a dataframe and
#creates SQL commands that pymysql can execute.
cursor.execute(sql)

What you want is a way to abstract the generation of the SQL statements.
A library like SQLAlchemy will do a good job, including a powerful way to construct DDL, DML, and DQL statements without needing to directly write any SQL.

Related

how to insert variables into Sqlite3 database

I'm not familiar with SQL(or python to be honest), and i was wondering how to add variables into tables. This is what i tried:
import sqlite3
conn = sqlite3.connect('TESTDB.db')
c = conn.cursor()
c.execute('''CREATE TABLE IF NOT EXISTS table(number real, txt text)''')
r=4
c.execute('''INSERT INTO table(r,'hello')''')
conn.commit()
conn.close()
This doesn't work.I get the error, "TypeError: 'str' object is not callable"How do i make the table insert the variable r??
Thanks
You have to bind values to placeholders in a prepared statement. See examples in the documentation.
Trying to build a query string on the fly with unknown values inserted directly in it is a great way to get a SQL injection attack and should be avoided.
Your problem is that you don't inject r properly into your query. This should work:
c.execute('''INSERT INTO table(''' + str(r) + ''','hello')''')
cheers

Python3, MySQL, and SqlAlchemy -- does SqlAlchemy always require a DBAPI?

I am in the process of migrating databases from sqlite to mysql. Now that I've migrated the data to mysql, I'm not able to use my sqlalchemy code (in Python3) to access it in the new mysql db. I was under the impression that sqlalchemy syntax was database agnostic (i.e. the same syntax would work for accessing sqlite and mysql), but this appears not to be the case. So my question is: Is it absolutely required to use a DBAPI in addition to Sqlalchemy to read the data? Do I have to edit all of my sqlalchemy code to now read mysql?
The documentation says: The MySQL dialect uses mysql-python as the default DBAPI. There are many MySQL DBAPIs available, including MySQL-connector-python and OurSQL, which I think means that I DO need a DBAPI.
My old code with sqlite successfully worked like this with sqlite:
engine = create_engine('sqlite:///pmids_info.db')
def connection():
conn = engine.connect()
return conn
def load_tables():
metadata = MetaData(bind=engine) #init metadata. will be empty
metadata.reflect(engine) #retrieve db info for metadata (tables, columns, types)
inputPapers = Table('inputPapers', metadata)
return inputPapers
inputPapers = load_tables()
def db_inputPapers_retrieval(user_input):
result = engine.execute("select title, author, journal, pubdate, url from inputPapers where pmid = :0", [user_input])
for row in result:
title = row['title']
author = row['author']
journal = row['journal']
pubdate = row['pubdate']
url = row['url']
apa = str(author+' ('+pubdate+'). '+title+'. '+journal+'. Retrieved from '+url)
return apa
This worked fine and dandy. So then I tried to update it to work with the mysql db like this:
engine = create_engine('mysql://snarkshark#localhost/pmids_info')
At first when I tried to run my sample code like this, it complained because I didn't have MySqlDB. Some googling around informed me that MySqlDB does NOT work for Python 3. So then I tried pip installing pymysql and changing my engine statement to
engine = create_engine('mysql+pymysql://snarkshark#localhost/pmids_info')
which also ends up giving me various syntax errors when I try to adjust things.
So what I want to know, is if there is any way I can get my current syntax to work with mysql? Since the syntax is from sqlalchemy, I thought it would work perfectly for the exact same data in mysql that was previously in sqlite. Will I have to go through and update ALL of my db functions to use the syntax of the DBAPI?
This will sound like a dumb answer, but you'll need to change all the places where you're using database-specific behavior. SQLAlchemy does not guarantee that anything you do with it is portable across all backends. It leaks some abstractions on purpose to allow you to do things that are only available on certain backends. What you're doing is like using Python because it's cross-platform, then doing a bunch of os.fork()s everywhere, and then being surprised that it doesn't work on Windows.
For your specific case, at a minimum, you need to wrap all your raw SQL in text() so that you're not affected by the supported paramstyle of the DBAPI. However, there are still subtle differences between different dialects of SQL, so you'll need to use the SQLAlchemy SQL expression language instead of raw SQL if you want portability. After all that, you'll still need to be careful not to use backend-specific features in the SQL expression language.

SqlAlchemy table name reflection using an efficient method

I am using the code below to extract table names on a database at a GET call in a Flask app.:
session = db.session()
qry = session.query(models.BaseTableModel)
results = session.execute(qry)
table_names = []
for row in results:
for column, value in row.items():
#this seems like a bit of a hack
if column == "tables_table_name":
table_names.append(value)
print('{0}: '.format(table_names))
Given that tables in the database may added/deleted regularly, is the code above an efficient and reliable way to get the names of tables in a database?
One obvious optimization is to use row["tables_table_name"] instead of second loop.
Assuming that BaseTableModel is a table, which contains names of all other tables, than you're using the fastest approach to get this data.

How to parsimoniously refer to a data frame in RMySQL

I have a MySQL table that I am reading with the RMySQL package of R. I would like to be able to directly refer to the data frame stored in the table so I can seamlessly interact with it rather than having to execute RMySQL statement every time I want to do something. Is there a way to accomplish this? I tried:
data <- dbReadTable(conn = con, name = 'tablename')
For example, if I now want to check how many rows I have in this table I would run:
nrow(data)
Does this go through the database connection, or am I now storing the object "data" locally, defeating the whole purpose of using an external database?
data <- dbReadTable(conn = con, name = 'tablename')
This command downloads all the data into a local R dataframe (assuming you have enough RAM). Any operations with data from that point forward do not require the SQL connection.

Any Best practice of doing table record insert with SQL CLR store procedure?

Recently we turned a set of complicate C# based scheduling logic into SQL CLR stored procedure (running in SQL Server 2005). We believed that our code is a great SQL CLR candidate because:
The logic involves tons of data from sQL Server.
The logic is complicate and hard to be done using TSQL
There is no threading or sychronization or accessing resources from outside of the sandbox.
The result of our sp is pretty good so far. However, since the output of our logic is in form of several tables of data, we can't just return a single rowset as the result of the sp. Instead, in our code we have a lot of "INSERT INTO ...." statements in foreach loops in order to save each record from C# generic collection into SQL tables. During code review, someone raised concern about whether the inline SQL INSERT approach within the SQL CLR can cause perforamnce problem, and wonder if there's other better way to dump data out (from our C# generic collections).
So, any suggestion?
I ran across this while working on an SQLite project a few months back and found it enlightening. I think it might be what you're looking for.
...
Fastest universal way to insert data
using standard ADO.NET constructs
Now that the slow stuff is out of the
way, lets talk about some hardcore
bulk loading. Aside from SqlBulkCopy
and specialized constructs involving
ISAM or custom bulk insert classes
from other providers, there is simply
no beating the raw power of
ExecuteNonQuery() on a parameterized
INSERT statement. I will demonstrate:
internal static void FastInsertMany(DbConnection cnn)
{
using (DbTransaction dbTrans = cnn.BeginTransaction())
{
using (DbCommand cmd = cnn.CreateCommand())
{
cmd.CommandText = "INSERT INTO TestCase(MyValue) VALUES(?)";
DbParameter Field1 = cmd.CreateParameter();
cmd.Parameters.Add(Field1);
for (int n = 0; n < 100000; n++)
{
Field1.Value = n + 100000;
cmd.ExecuteNonQuery();
}
}
dbTrans.Commit();
}
}
You could return a table with 2 columns (COLLECTION_NAME nvarchar(max), CONTENT xml) filled with as many rows as internal collections you have. CONTENT will be an XML representation of the data in the collection.
Then you can use the XML features of SQL 2005/2008 to parse each collection's XML into tables, and perform your INSERT INTO's or MERGE statements on the whole table.
That should be faster than individual INSERTS inside your C# code.