Since it can be convenient to paste a flawed SQL query directly into a database administration tool such as phpMyAdmin and work on it until it returns the expected result:
Is there any way to retrieve the final SQL statement that SQLAlchemy Core passes to the MySQL database, in a ready-to-execute form?
This typically means that you want the bound parameters to be rendered inline. There is limited automatic support for this (as of SQLAlchemy 0.9 the following works):
from sqlalchemy.sql import table, column, select
t = table('x', column('a'), column('b'))
stmt = select([t.c.a, t.c.b]).where(t.c.a > 5).where(t.c.b == 10)
print(stmt.compile(compile_kwargs={"literal_binds": True}))
Also, you'd probably want the query to be MySQL-specific, so if you already have an engine lying around you can pass that in too:
from sqlalchemy import create_engine
engine = create_engine("mysql://")
print(stmt.compile(engine, compile_kwargs={"literal_binds": True}))
and it prints:
SELECT x.a, x.b
FROM x
WHERE x.a > 5 AND x.b = 10
Now, if you have more elaborate values among the parameters, such as dates, SQLAlchemy might throw an error, since it only has "literal binds" renderers for a very limited number of types. An approach that bypasses that system and gives you a fairly direct shot at turning those parameters into strings is to do a "search and replace" on the statement object, replacing the bound parameters with literal strings:
from sqlalchemy.sql import visitors, literal_column
from sqlalchemy.sql.expression import BindParameter
def _replace(arg):
    if isinstance(arg, BindParameter):
        return literal_column(
            repr(arg.effective_value)  # <- do any fancier conversion here
        )

stmt = visitors.replacement_traverse(stmt, {}, _replace)
once you do that you can just print it:
print(stmt)
or the MySQL version:
print(stmt.compile(engine))
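If dates or other richer types are among the parameters, the repr() call above produces output like datetime.date(2024, 1, 1), which MySQL can't parse. Here is a hedged sketch of a "fancier conversion" (my addition, not part of the original recipe; it handles only dates and datetimes and falls back to repr() for everything else):

import datetime

def _render_value(value):
    # Quote dates/datetimes as ISO strings; fall back to repr() otherwise.
    if isinstance(value, (datetime.date, datetime.datetime)):
        return "'%s'" % value.isoformat()
    return repr(value)

def _replace(arg):
    if isinstance(arg, BindParameter):
        return literal_column(_render_value(arg.effective_value))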
I am creating an app which performs raw queries across different databases and I am struggling with list parameters (IN).
I use SQLAlchemy for performing these queries.
I want to perform a query that accepts list parameter and that parameter might be NULL, which means I don't have to filter by field.
from sqlalchemy import create_engine, text
SQL = """SELECT group, count(1) cnt
FROM some_table
WHERE group IN :groups OR :groups IS NULL
GROUP BY group
"""
params = {'groups': ('group1', 'group2')}
engine = create_engine(connection_string)
query = text(SQL).bindparams(**params)
cursor = engine.execute(query)
Currently I'm testing it on PostgreSQL, MySQL and SQLite, but in production mode it is also supposed to work with SQL Server and Oracle.
The code above works only on PostgreSQL; however, if I change params to None
params = {'groups': None}
the code doesn't work on any of the databases.
Is there a workaround for this problem?
I understand that the solution might be specific to each RDBMS.
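One portable angle (a sketch only; I have not verified it on all five backends) is to let SQLAlchemy expand the IN list with an "expanding" bind parameter and to handle the NULL ("no filter") case in Python rather than in SQL. Note the column is renamed grp here because GROUP is a reserved word on most backends:

from sqlalchemy import create_engine, text, bindparam

FILTERED = text("""
    SELECT grp, count(1) cnt
    FROM some_table
    WHERE grp IN :groups
    GROUP BY grp
""").bindparams(bindparam("groups", expanding=True))

UNFILTERED = text("""
    SELECT grp, count(1) cnt
    FROM some_table
    GROUP BY grp
""")

def count_by_group(engine, groups=None):
    # Branch in Python: no groups means no WHERE clause at all.
    stmt = FILTERED if groups else UNFILTERED
    params = {"groups": list(groups)} if groups else {}
    with engine.connect() as conn:
        return conn.execute(stmt, params).fetchall()

The expanding flag renders one placeholder per list element at execution time, and the Python-side branch sidesteps the OR :groups IS NULL construct, which is the part that breaks outside PostgreSQL.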
I'm writing a simple - or it should be simple - script to acquire tweets from Twitter's API (I have developer/app keys and am using the Tweepy interface, not scraping or anything of that sort; I may ditch Tweepy for something closer to the modern API, but that is almost certainly not what's causing this issue).
I have a MySQL instance which I connect to and can query just fine, until it comes time to insert the tweet, which almost inevitably has a lot of special characters. To be clear, I am using the official Python driver/connector for MySQL.
import mysql.connector
from mysql.connector import errorcode
Now, I'm aware StackOverflow is LITTERED with threads where people get my exact error, most of them simply told to check the MySQL syntax manual. Those threads aren't all that old (and I'm not using the latest Python; I use 3.7.9 for compatibility with some NLP libraries). They insist the answer is to place the string containing the special characters into an old-style format string WITHIN the cursor.execute method, to enclose string variable placeholders in quotes, and to pass a tuple with an empty second value if, as in my case, only one variable is to be inserted. This is also a solution posted as part of a bug report response on the MySQL website, and yet I have no success.
Here's what I've got - following the directions on dozens of pages here and the official database website:
for tweet in tweepy.Cursor(twilek.search, q=keyword, tweet_mode='extended').items():
    twi_tweet = tweet.full_text
    print(twi_tweet)
    twi_tweet = twi_tweet.encode('utf8')
    requests_total += 1
    os.environ['TWITTER_REQUESTS'] = str(requests_total)
    requests_total = int(os.environ.get('TWITTER_REQUESTS'))
    # insert the archived tweet text into the database table
    sql = 'USE hate_tweets'
    ms_cur.execute(sql)
    twi_tweet = str(twi_tweet)
    insert_tweet = re.sub(r'[^A-Za-z0-9 ]+', '', twi_tweet)
    ms_cur.execute("INSERT INTO tweets_lgbt (text) VALUES %s" % (insert_tweet,))
    cnx.commit()
    print(ms_cur.rowcount, "record inserted.")
(twilek is my cursor object because I'm a dork)
expected result: string formatter passes MySQL a modified tweet string that it can process and add as a row to the tweets_lgbt table
actual result: insertion fails on a syntax error for any tweet
I've tried going so far as to use a regex to strip everything but alphanumeric characters and spaces - same issue. I'm wondering if the newer string-formatting features of current Python versions have broken compatibility with this connector. I prefer to use the official driver, but I'll switch to an ORM if I must. (I did try newer features like f-strings and found they caused the same result.)
I have these observations:
the VALUES clause requires parentheses: VALUES (%s)
the quoting/escaping of values should be delegated to the cursor's execute method, by using unquoted placeholders in the SQL and passing the values as the second argument: cursor.execute(sql, (tweet_text,)) or cursor.executemany(sql, [(tweet_text1,), (tweet_text2,)])
once these steps are applied, there's no need for encoding/stringifying/regex-ifying: assuming twi_tweet is a str and the database's charset/collation supports the full UTF-8 range (for example utf8mb4), the insert should succeed
in particular, encoding a str and then calling str() on the result is to be avoided: you end up with "b'my original string'"
This modified version of the code in the question works for me:
import mysql.connector
DDL1 = """DROP TABLE IF EXISTS tweets_lgbt"""
DDL2 = """\
CREATE TABLE tweets_lgbt (
`text` VARCHAR (256))
"""
# From https://twitter.com/AlisonMitchell/status/1332567013701500928?s=20
insert_tweet = """\
Particularly pleased to see #SarahStylesAU
quoted in this piece for the work she did
👌
Thrive like a girl: Why women's cricket in Australia is setting the standard
"""
# Older connector releases don't support the with statement (context managers)
with mysql.connector.connect(database='test') as cnx:
    with cnx.cursor() as ms_cur:
        ms_cur.execute(DDL1)
        ms_cur.execute(DDL2)
        ms_cur.execute("INSERT INTO tweets_lgbt (`text`) VALUES (%s)", (insert_tweet,))
        cnx.commit()
        print(ms_cur.rowcount, "record inserted.")
This is how you should insert a row into your table:
insert_tweet = "ABCEFg 9 XYZ"
"INSERT INTO tweets_lgbt (text) VALUES ('%s');"%(insert_tweet)
"INSERT INTO tweets_lgbt (text) VALUES ('ABCEFg 9 XYZ');"
Things to note:
The arguments to a string formatter are just like the arguments to a function, so you cannot add a trailing comma there to turn a string into a tuple.
If you are trying to insert multiple values at once, you can use cursor.executemany or this answer.
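For reference, here is a minimal executemany() sketch (reusing the cnx connection and ms_cur cursor from the code above; the rows are made-up examples), with placeholders so the driver handles quoting:

rows = [("first example tweet",), ("second example tweet",)]
ms_cur.executemany("INSERT INTO tweets_lgbt (`text`) VALUES (%s)", rows)
cnx.commit()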
I am importing data into my Python 3 environment and then writing it to a MySQL database. However, there are a lot of different data tables, so writing out each INSERT statement isn't really practical, plus some have 50+ columns.
Is there a good way to create a table in MySQL directly from a dataframe, and then send insert commands to that same table using a dataframe of the same format, without having to actually type out all the column names? I started trying to pull out the column names, format them, and concatenate everything into a string, but it is extremely messy.
Ideally there is a function out there to directly handle this. For example:
apiconn.request("GET", url, headers=datheaders)
#pull in some JSON data from an API
eventres = apiconn.getresponse()
eventjson = json.loads(eventres.read().decode("utf-8"))
#create a dataframe from the data
eventtable = json_normalize(eventjson)
dbconn = pymysql.connect(host='hostval',
                         user='userval',
                         passwd='passval',
                         db='dbval')
cursor = dbconn.cursor()
sql = sqltranslate(table='eventtable', fun='append')
#where sqltranslate() is some magic function that takes a dataframe and
#creates SQL commands that pymysql can execute.
cursor.execute(sql)
What you want is a way to abstract the generation of the SQL statements.
A library like SQLAlchemy will do a good job, including a powerful way to construct DDL, DML, and DQL statements without needing to directly write any SQL.
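For the concrete case in the question, here is a sketch using pandas' DataFrame.to_sql with an SQLAlchemy engine (the connection values are the question's placeholders): to_sql creates the table from the frame's columns if it doesn't exist and appends rows otherwise, so no column list ever has to be typed out.

import pandas as pd
from sqlalchemy import create_engine

# Placeholder credentials, matching the question's pymysql example.
engine = create_engine("mysql+pymysql://userval:passval@hostval/dbval")

eventtable = pd.json_normalize(eventjson)  # eventjson as in the question
eventtable.to_sql("eventtable", engine, if_exists="append", index=False)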
I have been googling and reading through the SQLAlchemy documentation but haven't found what I am looking for.
I am looking for a function in SQLAlchemy that limits the number of results returned by a query to a certain number, for example 5, in the style of first() or all().
For SQLAlchemy >= 1.0.13:
Use the limit method.
query(Model).filter(something).limit(5).all()
Alternative slicing syntax (a slice on a Query applies LIMIT/OFFSET and executes it, returning a list, so no .all() is needed):
query(Model).filter(something)[:5]
If you need it for pagination you can do like this:
query = db.session.query(Table1, Table2, ...).filter(...)
if page_size is not None:
    query = query.limit(page_size)
if page is not None:
    query = query.offset(page*page_size)
query = query.all()
Or, if you query one table and have a model for it, you can:
query = (Model.query
         .filter(...)
         .paginate(page=start, per_page=size))
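Note that paginate() comes from Flask-SQLAlchemy rather than SQLAlchemy itself, and it returns a Pagination object, not a list; the actual rows are on its .items attribute.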
Since v1.4, SQLAlchemy core's select function provides a fetch method for RDBMS that support FETCH clauses*. FETCH was defined in the SQL 2008 standard to provide a consistent way to request a partial result, as LIMIT/OFFSET is not standard.
Example:
# As with limit queries, it's usually sensible to order
# the results to ensure results are consistent.
q = select(tbl).order_by(tbl.c.id).fetch(10)
# Offset is supported, but it is inefficient for large resultsets.
q_with_offset = select(tbl).order_by(tbl.c.id).offset(10).fetch(10)
# A suitable where clause may be more efficient
q = (select(tbl)
     .where(tbl.c.id > max_id_from_previous_query)
     .order_by(tbl.c.id)
     .fetch(10))
The syntax is supported in the ORM layer since v1.4.38. It is only supported for 2.0-style select on models; the legacy session.query syntax does not support it.
q = select(Model).order_by(Model.id).fetch(10)
* Currently Oracle, PostgreSQL and MSSQL.
In my case, it works like this:
def get_members():
    m = Member.query[:30]
    return m
I have a model with a PointField for location coordinates. I have a MySQL function that calculates the distance between two points, called dist. I use extra() "select" to calculate the distance for each returned object in the queryset. I also use extra() "where" to filter those objects that are within a specific range, like this:
query = queryset.extra(
    select={
        "distance": "dist(geomfromtext('%s'),geomfromtext('%s'))" % (loc1, loc2)
    },
    where=["1 having `distance` <= %s" % (km)]
)  # simplified example
This works fine for getting and reading the results, except that when I try counting the result set I get an error saying that 'distance' is not a field. After exploring a bit further, it seems that count ignores the "select" from extra and just uses "where". The full SQL query looks like this:
SELECT (dist(geomfromtext('POINT (-4.6858300000000003 36.5154300000000021)'),geomfromtext('POINT (-4.8858300000000003 36.5154300000000021)'))) AS `distance`, `testmodel`.`id`, `testmodel`.`name`, `testmodel`.`email`, (...) FROM `testmodel` WHERE 1 having `distance` <= 50.0
The count query is much shorter and doesn't have the dist selection part:
SELECT COUNT( `testmodel`.`id`) FROM `testmodel` WHERE 1 having `distance` <= 50.0
Logically, MySQL gives an error because "distance" is undefined. Is there a way to tell Django it has to include the extra select for the count?
Thanks for any ideas!
You could use a raw query if you are not planning to use any other database system.
params = {'point1': wktpoint1, 'point2': wktpoint2}
query = """
SELECT
    dist(%(point1)s, %(point2)s)
FROM
    testmodel
;"""
query_set = self.raw(query, params)
Also, if you need more GIS support, you should evaluate PostgreSQL+PostGIS (if you don't want to reinvent the wheel, you should not write your own dist function).
Django offers GIS support through GeoDjango, which provides functions like distance. You should check backend support here.
In order to use GeoDjango you need to add a geometry field to your model and tell it to use the GeoManager. Then you can start doing geo-queries, and you should have no problems with count.
With MySQL you can do something like this using GeoDjango:
### models.py
from django.contrib.gis.db import models

class YourModel(models.Model):
    your_geo_field = models.PolygonField()
    #your_geo_field = models.PointField()
    #your_geo_field = models.GeometryField()
    objects = models.GeoManager()
### your code
from django.contrib.gis.geos import *
from django.contrib.gis.measure import D

a_geom = fromstr('POINT(-96.876369 29.905320)', srid=4326)
distance = 5
YourModel.objects.filter(your_geo_field__distance_lt=(a_geom, D(m=distance))).count()
You can see better examples here and the reference here.