Related
I'm writing a simple - or it should be simple - script to acquire tweets from Twitter's API (I have developer/app keys and am using the Tweepy interface, not scraping or anything of that sort - I may ditch Tweepy for something closer to the modern API but that is almost certainly not what's causing this issue here).
I have a MySQL instance which I connect to and can query just fine, until it comes time to insert the tweet - which has a lot of special characters, almost inevitably. To be clear, I am using the official Python driver/connector for MySQL.
import mysql.connector
from mysql.connector import errorcode
Now, I'm aware StackOverflow is LITTERED with threads where people get my exact error - simply stating to check the MySQL syntax manual. These threads, which aren't all that old (and I'm not using the latest Python, I use 3.7.9 for compatibility with some NLP libraries) insist the answer is to place the string that has the special characters into an old-style format string WITHIN the cursor.execute method, to enclose string variable placeholders in quotes, and to pass a tuple with an empty second value if, as in my case, only one variable is to be inserted. This is also a solution posted as part of a bug report response on the MySQL website - and yet, I have no success.
Here's what I've got - following the directions on dozens of pages here and the official database website:
for tweet in tweepy.Cursor(twilek.search, q=keyword, tweet_mode='extended').items():
twi_tweet = tweet.full_text
print(twi_tweet)
twi_tweet = twi_tweet.encode('utf8')
requests_total+=1
os.environ['TWITTER_REQUESTS'] = str(requests_total)
requests_total = int(os.environ.get('TWITTER_REQUESTS'))
# insert the archived tweet text into the database table
sql = 'USE hate_tweets'
ms_cur.execute(sql)
twi_tweet = str(twi_tweet)
insert_tweet = re.sub(r'[^A-Za-z0-9 ]+', '', twi_tweet)
ms_cur.execute("INSERT INTO tweets_lgbt (text) VALUES %s" % (insert_tweet,))
cnx.commit()
print(ms_cur.rowcount, "record inserted.")
(twilek is my cursor object because I'm a dork)
expected result: string formatter passes MySQL a modified tweet string that it can process and add as a row to the tweets_lgbt table
actual result: insertion fails on a syntax error for any tweet
I've tried going so far as to use regex to strip everything but alphanumeric and spaces - same issue. I'm wondering if the new string format features of current Python versions have broken compatibility with this connector? I prefer to use the official driver but I'll switch to an ORM if I must. (I did try the newer features like F strings, and found they caused the same result.)
I have these observations:
the VALUES clause requires parentheses VALUES (%s)
the quoting / escaping of values should be delegated to the cursor's execute method, by using unquoted placeholders in the SQL and passing the values as the second argument: cursor.execute(sql, (tweet_text,)) or cursor.executemany(sql, [(tweet_text1,), (tweet_text2,)])
once these steps are applied there's no need for encoding/stringifying/regex-ifying: assuming twi_text is a str and the database's charset/collation supports the full UTF-8 range (for example utf8mb4) then the insert should succeed.
in particular, encoding a str and then calling str on the result is to be avoided: you end up with "b'my original string'"
This modified version of the code in the question works for me:
import mysql.connector
DDL1 = """DROP TABLE IF EXISTS tweets_lgbt"""
DDL2 = """\
CREATE TABLE tweets_lgbt (
`text` VARCHAR (256))
"""
# From https://twitter.com/AlisonMitchell/status/1332567013701500928?s=20
insert_tweet = """\
Particularly pleased to see #SarahStylesAU
quoted in this piece for the work she did
👌
Thrive like a girl: Why women's cricket in Australia is setting the standard
"""
# Older connector releases don't support with...
with mysql.connector.connect(database='test') as cnx:
with cnx.cursor() as ms_cur:
ms_cur.execute(DDL1)
ms_cur.execute(DDL2)
ms_cur.execute("INSERT INTO tweets_lgbt (`text`) VALUES (%s)", (insert_tweet,))
cnx.commit()
print(ms_cur.rowcount, "record inserted.")
This is how you should insert a row to your table,
insert_tweet = "ABCEFg 9 XYZ"
"INSERT INTO tweets_lgbt (text) VALUES ('%s');"%(insert_tweet)
"INSERT INTO tweets_lgbt (text) VALUES ('ABCEFg 9 XYZ');"
Things to note
The arguments to a string formatter is just like the arguments to a
function. So, you cannot add a comma at the end to convert a string
to a tuple there.
If you are trying to insert multiple values at once, you can use cursor.executemany or this answer.
This comment made me worried that the way I am searching my databases may result in an injection attack. Bellow is the query I'm currently using:
return db.query("SELECT * FROM customers
WHERE ( num LIKE ? AND name LIKE ? )",
[customer.num + "%", customer.name + "%"], callback)
Should I be adding the % symbol in my call to the API, or how would this be properly implemented?
The comment you're referring to uses string concatenation, whereas you are using prepared statements
This means that the intention is that the db.query function should be applying filtering and escaping to the inputs you provide, and hopefully they would be fairly extensive, and protecting you well.
It doesn't mean that you're immune to SQLI attacks, because there are just so many of them, but you have followed good practices in using prepared statements.
To check and/or improve your security against SQLI attacks, you could:
Audit the database connector library you are using to see whether it has known issues. npm audit is your friend here
Consider using another system for the queries, like an ORM like Sequelize, which tend to use querying systems even further separated from the actual SQL.
No, you misunderstood it. The problem comes when you concatenate a string with the SQL Query . e.g db.query("SELECT * FROM customers WHERE ( num LIKE "+ var +"AND name LIKE "+ var2+" )". You are safe here because you are using placeholders that will escape them.
I'm trying to accomplish a multiple word searching in a quotes database using Ruby, ActiveRecord, and MySQL. The way I did is shown bellow, and it is working, but I would like to know if there a better way to do.
# receives a string, splits it in a array of words, create the 'conditions'
# query, and send it to ActiveRecord
def search
query = params[:query].strip.split if params[:query]
like = "quote LIKE "
conditions = ""
query.each do |word|
conditions += (like + "'%#{word}%'")
conditions += " AND " unless query.last == word
end
#quotes = Quote.all(:conditions => conditions)
end
I would like to know if there is better way to compose this 'conditions' string. I also tried it using string interpolation, e.g., using the * operator, but ended up needing more string processing. Thanks in advance
First, I strongly encourage you to move Model's logic into Models. Instead of creating the search logic into the Controller, create a #search method into your Quote mode.
class Quote
def self.search(query)
...
end
end
and your controller becomes
# receives a string, splits it in a array of words, create the 'conditions'
# query, and send it to ActiveRecord
def search
#quotes = Quote.search(params[:query])
end
Now, back to the original problem. Your existing search logic does a very bad mistake: it directly interpolates value opening your code to SQL injection. Assuming you use Rails 3 you can take advantage of the new #where syntax.
class Quote
def self.search(query)
words = query.to_s.strip.split
words.inject(scoped) do |combined_scope, word|
combined_scope.where("quote LIKE ?", "%#{word}%")
end
end
end
It's a little bit of advanced topic. I you want to understand what the combined_scope + inject does, I recommend you to read the article The Skinny on Scopes.
MySQL fulltext search not working, so best way to do this:
class Quote
def self.search_by_quote(query)
words = query.to_s.strip.split
words.map! { |word| "quote LIKE '%#{word}%'" }
sql = words.join(" AND ")
self.where(sql)
end
end
The better way to do it would be to implement full text searching. You can do this in MySQL but I would highly recommend Solr. There are many resources online for implementing Solr within rails but I would recommend Sunspot as an entrance point.
Create a FULLTEXT index in MySQL. With that, you can leave string processing to MySQL.
Example : http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
I'm trying to run a native query through JPA that uses a ':' character. The particular instance is using a MySQL user variable in the query:
SELECT foo, bar, baz,
#rownum:= if (#id = foo, #rownum+1, 1) as rownum,
#id := foo as rep_id
FROM
foo_table
ORDER BY
foo,
bar desc
The JPA code:
Query q = getEntityManager().createNativeQuery(query, SomeClass.class);
return q.getResultList();
However, this gives me an exception about not being allowed to follow a ':' with a space. I've tried escaping them with backslashes, I've tried escaping them by doubling them up. Is there any way to actually do this, or am I SOL?
I faced similar experience when using postgresql json function in native JPA query.
select * from component where data ::json ->> ?1 = ?2
JPA will throw error that i have not set the named parameter :json.
The solution:
"select * from component where data \\:\\:json ->> ?1 = ?2"
I'm not aware of a standard way to escape a colon character in a query that is obviously interpreted as a named parameter prefix, and thus confuses the query parser.
My suggestion would be to create and use SQL functions if possible. Depending on your provider, there might be other options (like using another character and substituting the chosen character by a : in an interceptor) but at least the previous suggestion would keep your JPA code portable across providers.
PS: if you're using Hibernate, there is a very old patch attached to HHH-1237.
Update: There is an "interesting" paragraph in the JPA 1.0 spec about named parameters and native queries:
3.6.3 Named Parameters
A named parameter is an identifier
that is prefixed by the ":" symbol.
Named parameters are case-sensitive.
Named parameters follow the rules for
identifiers defined in Section 4.4.1.
The use of named parameters applies to the Java Persistence query
language, and is not defined for
native queries. Only positional
parameter binding may be portably used
for native queries.
The parameter names passed to the
setParameter methods of the Query
API do not include the ":" prefix.
This won't really help you but your case is a strong hint that the ":" in native queries shouldn't even be considered (at least not without a way to escape it or disable it detection).
Try this:
String query =
"SELECT foo, bar, baz,
#rownum \\\\:= if (#id = foo, #rownum+1, 1) as rownum,
#id \\\\:= foo as rep_id
FROM
foo_table
ORDER BY
foo,
bar desc -- escape='\' ";
Query q = getEntityManager().createNativeQuery(query, SomeClass.class);
return q.getResultList();
I just asked an SQL related question, and the first answer was: "This is a situation where dynamic SQL is the way to go."
As I had never heard of dynamic SQL before, I immediately searched this site and the web for what it was. Wikipedia has no article with this title. The first Google results all point to user forums where people ask more or less related questions.
However, I didn't find a clear definition of what a 'dynamic SQL' is. Is it something vendor specific? I work with MySQL and I didn't find a reference in the MySQL handbook (only questions, mostly unanswered, in the MySQL user forums).
On the other hand, I found many references to stored procedures. I have a slightly better grasp of what stored procedures are, although I have never used any. How are the two concepts related? Are they the same thing or does one uses the other?
Basically, what is needed is a simple introduction to dynamic SQL for someone who is new to the concept.
P.S.: If you feel like it, you may have a go at answering my previous question that prompted this one: SQL: How can we make a table1 JOIN table2 ON a table given in a field in table1?
Dynamic SQL is merely where the query has been built on the fly - with some vendors, you can build up the text of the dynamic query within one stored procedure, and then execute the generated SQL. In other cases, the term merely refers to a decision made by code on the client (this is at least vendor neutral)
Other answers have defined what dynamic SQL is, but I didn't see any other answers that attempted to describe why we sometimes need to use it. (My experience is SQL Server, but I think other products are generally similar in this respect.)
Dynamic SQL is useful when you are replacing parts of a query that can't be replaced using other methods.
For example, every time you call a query like:
SELECT OrderID, OrderDate, TotalPrice FROM Orders WHERE CustomerID = ??
you will be passing in a different value for CustomerID. This is the simplest case, and one that can by solved using a parameterized query, or a stored procedure that accepts a parameter, etc.
Generally speaking, dynamic SQL should be avoided in favor of parameterized queries, for performance and security reasons. (Although the performance difference probably varies quite a bit between vendors, and perhaps even between product versions, or even server configuration).
Other queries are possible to do using parameters, but might be simpler as dynamic SQL:
SELECT OrderID, OrderDate, TotalPrice FROM Orders
WHERE CustomerID IN (??,??,??)
If you always had 3 values, this is as easy as the first one. But what if this is a variable-length list? Its possible to do with parameters, but can be very difficult. How about:
SELECT OrderID, OrderDate, TotalPrice FROM Orders WHERE CustomerID = ??
ORDER BY ??
This can't be substituted directly, you can do it with a huge complicated CASE statement in the ORDER BY explicitly listing all possible fields, which may or may not be practical, depending on the number of fields available to sort by.
Finally, some queries simply CAN'T be done using any other method.
Let's say you have a bunch of Orders tables (not saying this is great design), but you might find yourself hoping you can do something like:
SELECT OrderID, OrderDate, TotalPrice FROM ?? WHERE CustomerID = ??
This can't be done using any other methods. In my environment, I frequently encounter queries like:
SELECT (programatically built list of fields)
FROM table1 INNER JOIN table2
(Optional INNER JOIN to table3)
WHERE (condition1)
AND (long list of other optional WHERE clauses)
Again, not saying that this is necessarily great design, but dynamic SQL is pretty much required for these types of queries.
Hope this helps.
Dynamic SQL is simply a SQL statement that is composed on the fly before being executed. For example, the following C# (using a parameterized query):
var command = new SqlCommand("select * from myTable where id = #someId");
command.Parameters.Add(new SqlParameter("#someId", idValue));
Could be re-written using dynamic sql as:
var command = new SqlCommand("select * from myTable where id = " + idValue);
Keep in mind, though, that Dynamic SQL is dangerous since it readily allows for SQL Injection attacks.
Dynamic SQL is a SQL built from strings at runtime. It is useful to dynamically set filters or other stuff.
An example:
declare #sql_clause varchar(1000)
declare #sql varchar(5000)
set #sql_clause = ' and '
set #sql = ' insert into #tmp
select
*
from Table
where propA = 1 '
if #param1 <> ''
begin
set #sql = #sql + #sql_clause + ' prop1 in (' + #param1 + ')'
end
if #param2 <> ''
begin
set #sql = #sql + #sql_clause + ' prop2 in (' + #param2 + ')'
end
exec(#sql)
It is exactly what Rowland mentioned. To elaborate on that a bit, take the following SQL:
Select * from table1 where id = 1
I am not sure which language you are using to connect to the database, but if I were to use C#, an example of a dynamic SQL query would be something like this:
string sqlCmd = "Select * from table1 where id = " + userid;
You want to avoid using dynamic SQL, because it becomes a bit cumbersome to keep integrity of the code if the query get too big. Also, very important, dynamic SQL is susceptible to SQL injection attacks.
A better way of writing the above statement would be to use parameters, if you are using SQL Server.
Rowland is correct, and as an addendum, unless you're properly using parameters (versus just concatonating parameter values inline from provided text, etc.) it can also be a security risk. It's also a bear to debug, etc.
Lastly, whenever you use dynamic SQL unwisely, things are unleashed and children are eaten.
To most databases, every SQL query is "dynamic" meaning that it is a program that is interpreted by the query optimiser given the input SQL string and possibly the parameter bindings ("bind variables").
Static SQL
However, most of the time, that SQL string is not constructed dynamically but statically, either in procedural languages like PL/SQL:
FOR rec IN (SELECT * FROM foo WHERE x = 1) LOOP
-- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ "static SQL"
..
END LOOP;
Or in client / host languages like Java, using JDBC:
try (ResultSet rs = stmt.executeQuery("SELECT * FROM foo WHERE x = 1")) {
// "static SQL" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
..
}
In both cases, the SQL string is "static" in the language that embeds it. Technically, it will still be "dynamic" to the SQL engine, which doesn't know how the SQL string is constructed, nor that it was a static SQL string.
Dynamic SQL
Sometimes, the SQL string needs to be constructed dynamically, given some input parameters. E.g. the above query might not need any predicate at all in some cases.
You might then choose to proceed to constructing the string dynamically, e.g. in PL/SQL:
DECLARE
TYPE foo_c IS REF CURSOR;
v_foo_c foo_c;
v_foo foo%ROWTYPE;
sql VARCHAR2(1000);
BEGIN
sql := 'SELECT * FROM foo';
IF something THEN
sql := sql || ' WHERE x = 1'; -- Beware of syntax errors and SQL injection!
END IF;
OPEN v_foo_c FOR sql;
LOOP
FETCH v_foo_c INTO v_foo;
EXIT WHEN v_foo_c%NOTFOUND;
END LOOP;
END;
Or in Java / JDBC:
String sql = "SELECT * FROM foo";
if (something)
sql += " WHERE x = 1"; // Beware of syntax errors and SQL injection!
try (ResultSet rs = stmt.executeQuery(sql)) {
..
}
Or in Java using a SQL builder like jOOQ
// No syntax error / SQL injection risk here
Condition condition = something ? FOO.X.eq(1) : DSL.trueCondition();
for (FooRecord foo : DSL.using(configuration)
.selectFrom(FOO)
.where(condition)) {
..
}
Many languages have query builder libraries like the above, which shine most when doing dynamic SQL.
(Disclaimer: I work for the company behind jOOQ)
Is it something vendor specific?
The SQL-92 Standard has a whole chapter on dynamic SQL (chapter 17) but it only applies to FULL SQL-92 and I know of no vendor that has implemented it.
I think what's meant is that you should build the query dynamically before executing it. For your other questions this means that you should select the table name you need first and the use your programming language to build a second query for doing what you want (what you want to do in the other question isn't possible directly like you want).