This is a follow-up to Convert sqlalchemy core statements to raw SQL without a connection?. I would like to use insert with named parameters, but without executing the statement through SQLAlchemy.
I have an insert statement defined as such:
table = Table("table", meta_data,
Column("id", Integer, auto_increment=True, primary_key=1),
Column("name", String),
Column("full_name", String))
params = {"full_name": "some_name", "name": "some_other_name"}
stmt = sqlalchemy.sql.expression.insert(table).values(params)
c_stmt = stmt.compile(dialect=dialect)
I would later execute this statement through a DBAPI connection. How can I make sure the generated SQL string and my set of parameters stay consistent with each other?
Most SQL engines (all?) support named bind parameters as well as positional parameters, and SA provides an interface (bindparam, funny enough) to do so. The SA Docs have a section with examples that will probably serve your needs. Of course, the examples assume you are executing the queries with SA, so you'll need to interpret them for your use case.
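For illustration, a minimal sketch building on the question's objects (table, params, and dialect are assumed to be the ones defined above); the compiled statement exposes both the SQL string and the matching parameter dictionary, which you can hand to the DBAPI together:

# A minimal sketch, reusing `table`, `params`, and `dialect` from the question.
stmt = table.insert().values(params)          # or bind explicitly with bindparam()
compiled = stmt.compile(dialect=dialect)

sql_string = str(compiled)      # SQL text rendered in the dialect's paramstyle
bind_params = compiled.params   # dict of bind-parameter names to their values

# With a named or pyformat paramstyle, the DBAPI cursor accepts the dict directly:
# dbapi_cursor.execute(sql_string, bind_params)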
I would like to use update_all in order to update all records in a database.
Assume that the name of the attribute to be updated is price.
Command A succeeded.
A
MyModel.update_all(price: "$500")
However, I would like to use a MySQL function (e.g. IF, CONCAT, arithmetic...) in the update,
so I tried command B, but it failed and the literal string CONCAT('$', 500) was stored.
B
MyModel.update_all(price: "CONCAT('$', 500)")
Command C succeeded, but I don't want to use it because of the SQL injection risk.
C
MyModel.update_all("price = CONCAT('$', 500)")
How can I do that without any risk of SQL injection attacks?
I have to use a SQL function here for certain reasons.
Thank you in advance.
You need to create methods corresponding to the allowable SQL functions that take the arguments as input. So if you want to provide "CONCAT(..,...)", the user should be able to call a method like the one below. You can extend it to check which argument types are allowed and add checks there as well.
# Whitelist: only build SQL for function names we explicitly allow
def allowed_functions(func_name, all_args_here)
  case func_name
  when 'concat' then "CONCAT(#{all_args_here})"
  end
end
I've got a confusing issue on Airflow which I don't understand.
I have a SQL scripts folder at DML/analytics/my_script.sql. The MySQL operator works perfectly in normal circumstances, but does not when I try to call it from a Python operator as follows. This is necessitated by needing to pass in XCOM values from another task:
def insert_func(**kwargs):
    run_update = MySqlOperator(
        sql='DML/analytics/my_script.sql',
        task_id='insert_func',
        mysql_conn_id="bi_mysql",
        params={
            "table_name": table_name,
            'ts': kwargs['task_instance'].xcom_pull(key='return_value', task_ids='get_existing_data')
        },
    )
    run_update.execute(context=kwargs['task_instance'])
with DAG("my_dag", **dag_params) as dag:
with TaskGroup(group_id='insert') as insert:
get_existing_data = PythonOperator(
task_id='get_existing_data',
python_callable=MySQLGetRecord,
op_kwargs={
'target_db_conn_id':'bi_mysql',
'target_db':'analytics',
'sql': f'SELECT invoice_date FROM analytics.{table_name} ORDER BY 1 DESC'
}
),
insert = PythonOperator(
task_id='insert',
python_callable=insert_func
)
get_existing_data >> insert_func
The error I get is: MySQLdb._exceptions.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DML/analytics/my_script.sql' at line 1")
Clearly it is trying to run the literal string passed in the sql parameter rather than using it as a file location. Why is this happening? Again, this works if I move the run_update task to the my_dag with clause, but I need to do it this way to get the XCOM value from get_existing_data, correct...?
When you use an operator normally (i.e. it is run by Airflow), Airflow is responsible for the whole task lifecycle. This means Airflow handles the templating, executing pre_execute(), executing execute(), handling on_failure/retries, etc.
What you did is use an operator inside an operator -> a PythonOperator that contains a MySqlOperator. In this case the inner operator (MySqlOperator) is just a regular Python class. While it's called an Operator, it is not a "real" Operator.
You don't get any of the lifecycle steps you might expect.
You might have already realised this, since in your own example you specifically triggered the execute():
run_update.execute(context=kwargs['task_instance'])
Notice you didn't need to do this for the PythonOperator.
You can see in the code base that Airflow invokes render_templates before it invokes pre_execute() and before it invokes execute().
This means that if you want the MySqlOperator to be templated, you need to call the function that does the templating yourself before you invoke execute(), as in the sketch below.
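A minimal sketch of that idea, reusing the names from the question. Hedged: render_template_fields() is the templating hook in Airflow 2.x, and exposing the inner operator's params to the Jinja context is an assumption here, since only the outer PythonOperator gets the normal lifecycle.

from airflow.providers.mysql.operators.mysql import MySqlOperator

def insert_func(**kwargs):
    run_update = MySqlOperator(
        sql='DML/analytics/my_script.sql',
        task_id='insert_func',
        mysql_conn_id="bi_mysql",
        params={
            "table_name": table_name,
            'ts': kwargs['task_instance'].xcom_pull(key='return_value',
                                                    task_ids='get_existing_data'),
        },
    )
    # Render the templated fields (this resolves the .sql file into its contents)
    # before calling execute() manually, which Airflow would normally do for us.
    context = dict(kwargs)
    context['params'] = run_update.params  # assumption: make the inner params visible to Jinja
    run_update.render_template_fields(context)
    run_update.execute(context)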
That said, I strongly encourage you: do not use an operator inside an operator.
From your code I don't see a reason why you can't just use MySqlOperator directly without the PythonOperator, but should there be a reason, the proper way to handle it is to create a CustomMySqlOperator that handles the logic you seek; a sketch of what that could look like follows. By doing so you will not have problems with using .sql files.
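A hypothetical sketch of such a CustomMySqlOperator (an assumption, not the only design): it pulls the XCom at execution time and hands it to the database driver as a bind parameter, so the operator can sit directly in the DAG with its normal lifecycle; the .sql file would then use a driver-level %(ts)s placeholder for that value instead of a Jinja param.

from airflow.providers.mysql.operators.mysql import MySqlOperator

class CustomMySqlOperator(MySqlOperator):
    # Hypothetical: resolve the XCom in execute() and pass it through
    # `parameters`, which MySqlOperator forwards to the database driver.
    def execute(self, context):
        ts = context['task_instance'].xcom_pull(key='return_value',
                                                task_ids='get_existing_data')
        self.parameters = {**(self.parameters or {}), 'ts': ts}
        return super().execute(context)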
I'm writing a simple - or it should be simple - script to acquire tweets from Twitter's API (I have developer/app keys and am using the Tweepy interface, not scraping or anything of that sort - I may ditch Tweepy for something closer to the modern API but that is almost certainly not what's causing this issue here).
I have a MySQL instance which I connect to and can query just fine, until it comes time to insert the tweet - which has a lot of special characters, almost inevitably. To be clear, I am using the official Python driver/connector for MySQL.
import mysql.connector
from mysql.connector import errorcode
Now, I'm aware StackOverflow is LITTERED with threads where people get my exact error - simply stating to check the MySQL syntax manual. These threads, which aren't all that old (and I'm not using the latest Python; I use 3.7.9 for compatibility with some NLP libraries), insist the answer is to place the string that has the special characters into an old-style format string WITHIN the cursor.execute method, to enclose string variable placeholders in quotes, and to pass a tuple with an empty second value if, as in my case, only one variable is to be inserted. This is also a solution posted as part of a bug report response on the MySQL website - and yet, I have no success.
Here's what I've got - following the directions on dozens of pages here and the official database website:
for tweet in tweepy.Cursor(twilek.search, q=keyword, tweet_mode='extended').items():
    twi_tweet = tweet.full_text
    print(twi_tweet)
    twi_tweet = twi_tweet.encode('utf8')
    requests_total += 1
    os.environ['TWITTER_REQUESTS'] = str(requests_total)
    requests_total = int(os.environ.get('TWITTER_REQUESTS'))
    # insert the archived tweet text into the database table
    sql = 'USE hate_tweets'
    ms_cur.execute(sql)
    twi_tweet = str(twi_tweet)
    insert_tweet = re.sub(r'[^A-Za-z0-9 ]+', '', twi_tweet)
    ms_cur.execute("INSERT INTO tweets_lgbt (text) VALUES %s" % (insert_tweet,))
    cnx.commit()
    print(ms_cur.rowcount, "record inserted.")
(twilek is my cursor object because I'm a dork)
expected result: string formatter passes MySQL a modified tweet string that it can process and add as a row to the tweets_lgbt table
actual result: insertion fails on a syntax error for any tweet
I've tried going so far as to use regex to strip everything but alphanumeric and spaces - same issue. I'm wondering if the new string format features of current Python versions have broken compatibility with this connector? I prefer to use the official driver but I'll switch to an ORM if I must. (I did try the newer features like F strings, and found they caused the same result.)
I have these observations:
the VALUES clause requires parentheses VALUES (%s)
the quoting / escaping of values should be delegated to the cursor's execute method, by using unquoted placeholders in the SQL and passing the values as the second argument: cursor.execute(sql, (tweet_text,)) or cursor.executemany(sql, [(tweet_text1,), (tweet_text2,)])
once these steps are applied there's no need for encoding/stringifying/regex-ifying: assuming twi_tweet is a str and the database's charset/collation supports the full UTF-8 range (for example utf8mb4), the insert should succeed.
in particular, encoding a str and then calling str on the result is to be avoided: you end up with "b'my original string'"
This modified version of the code in the question works for me:
import mysql.connector
DDL1 = """DROP TABLE IF EXISTS tweets_lgbt"""
DDL2 = """\
CREATE TABLE tweets_lgbt (
`text` VARCHAR (256))
"""
# From https://twitter.com/AlisonMitchell/status/1332567013701500928?s=20
insert_tweet = """\
Particularly pleased to see #SarahStylesAU
quoted in this piece for the work she did
👌
Thrive like a girl: Why women's cricket in Australia is setting the standard
"""
# Older connector releases don't support with...
with mysql.connector.connect(database='test') as cnx:
    with cnx.cursor() as ms_cur:
        ms_cur.execute(DDL1)
        ms_cur.execute(DDL2)
        ms_cur.execute("INSERT INTO tweets_lgbt (`text`) VALUES (%s)", (insert_tweet,))
        cnx.commit()
        print(ms_cur.rowcount, "record inserted.")
This is how you should insert a row into your table:
insert_tweet = "ABCEFg 9 XYZ"
"INSERT INTO tweets_lgbt (text) VALUES ('%s');"%(insert_tweet)
"INSERT INTO tweets_lgbt (text) VALUES ('ABCEFg 9 XYZ');"
Things to note
The arguments to a string formatter are just like the arguments to a function, so you cannot add a comma at the end to convert a string to a tuple there.
If you are trying to insert multiple values at once, you can use cursor.executemany or this answer.
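For the multiple-row case, a minimal executemany sketch (assuming an open mysql.connector connection cnx and cursor ms_cur as in the question; the driver quotes and escapes each value):

tweets = ["first example tweet", "second example tweet"]
ms_cur.executemany(
    "INSERT INTO tweets_lgbt (text) VALUES (%s)",
    [(t,) for t in tweets],
)
cnx.commit()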
I'm using a MySQL database and was trying to find a MySQL alternative to tedious.js (a SQL Server parameterised query builder). I'm using Node.js for my backend.
I read that the .raw() command from knex.js is susceptible to sql injection, if not used with bindings.
But are the other commands and knex.js as a whole safe to use to prevent sql injection? Or am I barking up the wrong tree?
Read carefully in the knex documentation how to pass values to knex.raw (http://knexjs.org/#Raw).
If you pass values as parameter bindings to raw, like:
knex.raw('select * from foo where id = ?', [1])
In that case the parameters and the query string are passed separately to the database driver, protecting the query from SQL injection.
Other query builder methods always use the binding format internally, so they are safe too.
To see how a certain query is passed to the database driver, one can do:
knex('foo').where('id', 1).toSQL().toNative()
Which will output SQL string and bindings that are given to driver for running the query (https://runkit.com/embed/2yhqebv6pte6).
The biggest mistake one can make with knex raw queries is to use a JavaScript template string and interpolate variables directly into the SQL string, like:
knex.raw(`select * from foo where id = ${id}`) // NEVER DO THIS
One thing to note is that knex table/identifier names cannot be passed as bindings to the driver, so with those one should be extra careful not to take table/column names from user input and use them without properly validating them first.
Edit:
By saying that identifier names cannot be passed as bindings, I mean that when one uses the ?? knex binding for an identifier name, it will be rendered as part of the SQL string before being passed to the database driver.
This code works when the connection is made to an accdb database:
Dim customer = connection.Query(Of Klantgegevens)("Select Actief,Onderhoudscontract From Klantgegevens Where Klantnummer=#Idx", New With {.Idx = customerId}).SingleOrDefault
But the code below gives the error about the Idx parameter when the connection is made to a SQL server database that has a table with the same structure:
Dim customer = connection.Query(Of Klantgegevens)("Select Actief,Onderhoudscontract From [dbo.Klantgegevens] Where Klantnummer=#Idx", New With {.Idx = customerId}).SingleOrDefault
What is going wrong here? I had hoped that by using Dapper I would be able to write database-agnostic code, but it seems that is not the case!
If you are using an ODBC/OLEDB connection, then my first suggestion would be: move to SqlClient (SqlConnection). Everything should work fine with SqlConnection.
If you can't do that for some reason - i.e. you're stuck with a provider that doesn't have good support for named parameters - then you might need to tell dapper to use pseudo-positional parameters. Instead of #Idx, use ?Idx?. Dapper interprets this as an instruction to replace ?Idx? with the positional placeholder (simply: ?), using the value from the member Idx.
This is also a good fix for talking to accdb, which has very atypical parameter usage for an ADO.NET provider: it allows named parameter tokens, but all the tokens are replaced with ?, and values are assigned from the positions of the added parameters (not via their names).