I need SQLAlchemy to check a database table column for occurrences of Python-pickled strings (such as S'foo'\np0\n.), unpickle them (which in this example case would yield foo), and write them back. How do I do that (efficiently)? (Can I somehow abuse SQLAlchemy's PickleType?)
Okay, found a way using sqlalchemy.sql.expression.func.substr:
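from sqlalchemy import and_
from sqlalchemy.sql.expression import func
# .execute() on the statement relies on bound metadata (SQLAlchemy < 2.0)
table.update().where(
    and_(table.c.column.startswith("S'"),
         table.c.column.endswith("'\np0\n."))
).values({table.c.column:
          # SQL substr() is 1-based: start after the 2-char prefix "S'" and
          # drop 8 chars in total (prefix "S'" + 6-char suffix "'\np0\n.")
          func.substr(table.c.column,
                      3,
                      func.char_length(table.c.column) - 8)
          }).execute()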
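The substr() approach only works when the pickled payload contains no escape sequences, because it slices the raw text instead of unpickling it. A quick way to check that assumption is to compare the SQL-side slicing with Python's own unpickler on a few sample values. This is a sketch; the helper names strip_like_sql and unpickle_text are made up for illustration.

```python
import pickle

PREFIX = "S'"        # protocol-0 STRING opcode followed by the opening quote
SUFFIX = "'\np0\n."  # closing quote + newline, memo PUT 0, STOP


def strip_like_sql(value: str) -> str:
    """Mirror func.substr(col, 3, func.char_length(col) - 8) in Python."""
    start = 3 - 1  # SQL substr() is 1-based; Python slicing is 0-based
    length = len(value) - (len(PREFIX) + len(SUFFIX))  # 2 + 6 = 8
    return value[start:start + length]


def unpickle_text(value: str) -> str:
    """Decode with the real unpickler (only do this for data you trust)."""
    return pickle.loads(value.encode("ascii"))


simple = "S'foo'\np0\n."
escaped = "S'a\\nb'\np0\n."  # a Python 2 pickle of the 3-char string "a\nb"

print(strip_like_sql(simple), unpickle_text(simple))  # foo foo
print(repr(strip_like_sql(escaped)))  # 'a\\nb' (escape sequence left as-is)
print(repr(unpickle_text(escaped)))   # 'a\nb'
```

Rows where the two results differ would need to be fetched and written back from Python using pickle.loads, keeping in mind that pickle.loads must never be run on untrusted data.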
I want a "group by and count" command in SQLAlchemy. How can I do this?
The documentation on counting says that for group_by queries it is better to use func.count():
from sqlalchemy import func
session.query(Table.column,
func.count(Table.column)).group_by(Table.column).all()
If you are using the Table.query property (as provided on Flask-SQLAlchemy models):
from sqlalchemy import func
Table.query.with_entities(Table.column, func.count(Table.column)).group_by(Table.column).all()
If you are using session.query() method (as stated in miniwark's answer):
from sqlalchemy import func
session.query(Table.column, func.count(Table.column)).group_by(Table.column).all()
You can also count on multiple groups and their intersection:
self.session.query(func.count(Table.column1), Table.column1, Table.column2).group_by(Table.column1, Table.column2).all()
The query above returns one count for each combination of column1 and column2 values that actually occurs in the table; combinations with no rows are not listed.
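Here is a self-contained sketch of both kinds of grouping, assuming SQLAlchemy 1.4 or 2.0 and an in-memory SQLite database. The pets table is hypothetical and stands in for Table; it uses the Core select() construct instead of session.query(), but the GROUP BY it generates follows the same idea.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, func, select

metadata = MetaData()

# Hypothetical table standing in for "Table" in the answers above.
pets = Table(
    "pets",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("species", String(20)),
    Column("color", String(20)),
)

engine = create_engine("sqlite://")  # in-memory SQLite database
metadata.create_all(engine)

with engine.begin() as conn:  # begin() commits when the block exits
    conn.execute(
        pets.insert(),
        [
            {"species": "cat", "color": "black"},
            {"species": "cat", "color": "white"},
            {"species": "cat", "color": "black"},
            {"species": "dog", "color": "brown"},
        ],
    )

with engine.connect() as conn:
    # One group: SELECT species, count(id) ... GROUP BY species
    per_species = [
        tuple(row)
        for row in conn.execute(
            select(pets.c.species, func.count(pets.c.id))
            .group_by(pets.c.species)
            .order_by(pets.c.species)
        )
    ]
    # Two groups: one row per (species, color) pair that occurs in the data;
    # func.count() with no argument renders count(*)
    per_pair = [
        tuple(row)
        for row in conn.execute(
            select(pets.c.species, pets.c.color, func.count())
            .group_by(pets.c.species, pets.c.color)
            .order_by(pets.c.species, pets.c.color)
        )
    ]

print(per_species)  # [('cat', 3), ('dog', 1)]
print(per_pair)     # [('cat', 'black', 2), ('cat', 'white', 1), ('dog', 'brown', 1)]
```

No ('dog', 'black', 0) row appears: GROUP BY only reports combinations that are present in the data.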
I'm writing a simple - or it should be simple - script to acquire tweets from Twitter's API (I have developer/app keys and am using the Tweepy interface, not scraping or anything of that sort - I may ditch Tweepy for something closer to the modern API but that is almost certainly not what's causing this issue here).
I have a MySQL instance which I connect to and can query just fine, until it comes time to insert the tweet - which has a lot of special characters, almost inevitably. To be clear, I am using the official Python driver/connector for MySQL.
import mysql.connector
from mysql.connector import errorcode
Now, I'm aware StackOverflow is LITTERED with threads where people get my exact error - simply stating to check the MySQL syntax manual. These threads, which aren't all that old (and I'm not using the latest Python, I use 3.7.9 for compatibility with some NLP libraries) insist the answer is to place the string that has the special characters into an old-style format string WITHIN the cursor.execute method, to enclose string variable placeholders in quotes, and to pass a tuple with an empty second value if, as in my case, only one variable is to be inserted. This is also a solution posted as part of a bug report response on the MySQL website - and yet, I have no success.
Here's what I've got - following the directions on dozens of pages here and the official database website:
for tweet in tweepy.Cursor(twilek.search, q=keyword, tweet_mode='extended').items():
    twi_tweet = tweet.full_text
    print(twi_tweet)
    twi_tweet = twi_tweet.encode('utf8')
    requests_total+=1
    os.environ['TWITTER_REQUESTS'] = str(requests_total)
    requests_total = int(os.environ.get('TWITTER_REQUESTS'))
    # insert the archived tweet text into the database table
    sql = 'USE hate_tweets'
    ms_cur.execute(sql)
    twi_tweet = str(twi_tweet)
    insert_tweet = re.sub(r'[^A-Za-z0-9 ]+', '', twi_tweet)
    ms_cur.execute("INSERT INTO tweets_lgbt (text) VALUES %s" % (insert_tweet,))
    cnx.commit()
    print(ms_cur.rowcount, "record inserted.")
(twilek is my cursor object because I'm a dork)
expected result: string formatter passes MySQL a modified tweet string that it can process and add as a row to the tweets_lgbt table
actual result: insertion fails on a syntax error for any tweet
I've tried going so far as to use a regex to strip everything but alphanumerics and spaces - same issue. I'm wondering whether the new string-formatting features of current Python versions have broken compatibility with this connector. I prefer to use the official driver, but I'll switch to an ORM if I must. (I did try the newer features like f-strings, and found they caused the same result.)
I have these observations:
the VALUES clause requires parentheses VALUES (%s)
the quoting / escaping of values should be delegated to the cursor's execute method, by using unquoted placeholders in the SQL and passing the values as the second argument: cursor.execute(sql, (tweet_text,)) or cursor.executemany(sql, [(tweet_text1,), (tweet_text2,)])
once these steps are applied there's no need for encoding/stringifying/regex-ifying: assuming twi_tweet is a str and the database's charset/collation supports the full UTF-8 range (for example utf8mb4), the insert should succeed.
in particular, encoding a str and then calling str on the result is to be avoided: you end up with "b'my original string'"
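To see why the encode/str/regex chain cannot help, here is a short pure-Python sketch of what happens to a sample string (the text "Café 👌" is an arbitrary choice). No database is needed, because the SQL is already malformed before it ever reaches MySQL.

```python
import re

tweet = "Café 👌"

# What the question's code does: encode to bytes, then call str() on the bytes.
mangled = str(tweet.encode("utf8"))
print(mangled)  # b'Caf\xc3\xa9 \xf0\x9f\x91\x8c'

# The regex strips the quotes and backslashes but keeps the escape letters.
cleaned = re.sub(r"[^A-Za-z0-9 ]+", "", mangled)
print(cleaned)  # bCafxc3xa9 xf0x9fx91x8c

# Without parentheses or quotes, the value lands in the SQL as bare words.
sql = "INSERT INTO tweets_lgbt (text) VALUES %s" % (cleaned,)
print(sql)  # INSERT INTO tweets_lgbt (text) VALUES bCafxc3xa9 xf0x9fx91x8c
```

The last line is not valid SQL for any input, which is why every tweet failed with a syntax error regardless of its content.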
This modified version of the code in the question works for me:
import mysql.connector
DDL1 = """DROP TABLE IF EXISTS tweets_lgbt"""
DDL2 = """\
CREATE TABLE tweets_lgbt (
`text` VARCHAR (256))
"""
# From https://twitter.com/AlisonMitchell/status/1332567013701500928?s=20
insert_tweet = """\
Particularly pleased to see #SarahStylesAU
quoted in this piece for the work she did
👌
Thrive like a girl: Why women's cricket in Australia is setting the standard
"""
# Older connector releases don't support with...
with mysql.connector.connect(database='test') as cnx:
    with cnx.cursor() as ms_cur:
        ms_cur.execute(DDL1)
        ms_cur.execute(DDL2)
        ms_cur.execute("INSERT INTO tweets_lgbt (`text`) VALUES (%s)", (insert_tweet,))
        cnx.commit()
        print(ms_cur.rowcount, "record inserted.")
This is how you should insert a row into your table:
insert_tweet = "ABCEFg 9 XYZ"
"INSERT INTO tweets_lgbt (text) VALUES ('%s');"%(insert_tweet)
"INSERT INTO tweets_lgbt (text) VALUES ('ABCEFg 9 XYZ');"
Things to note
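The comma is not the issue: in "..." % (insert_tweet,) the trailing comma makes a one-element tuple, while "..." % (insert_tweet) passes the plain string, and % accepts either for a single %s. What the original query lacked were the parentheses and the quotes around the value.
Quoting by hand like this still breaks as soon as the tweet itself contains a single quote, so passing the value as the second argument of cursor.execute is the safer option.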
If you are trying to insert multiple values at once, you can use cursor.executemany or this answer.
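To compare quoting by hand with letting the driver bind the value, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. That substitution is an assumption for illustration: sqlite3 uses ? placeholders where mysql.connector uses %s, but the principle is the same.

```python
import sqlite3

# Sample text with an apostrophe and an emoji (adapted from the tweet above).
tweet = "Why women's cricket in Australia is setting the standard 👌"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets_lgbt (text TEXT)")

# Hand-quoted: the apostrophe in "women's" ends the SQL string literal early,
# so SQLite reports a syntax error (an sqlite3.Error subclass).
hand_built = "INSERT INTO tweets_lgbt (text) VALUES ('%s');" % (tweet,)
try:
    conn.execute(hand_built)
    hand_built_ok = True
except sqlite3.Error as exc:
    hand_built_ok = False
    print("hand-quoted insert failed:", exc)

# Parameterized: the value travels separately from the SQL text, so quotes
# and emoji need no escaping.
conn.execute("INSERT INTO tweets_lgbt (text) VALUES (?)", (tweet,))
conn.commit()

stored = conn.execute("SELECT text FROM tweets_lgbt").fetchall()
print(stored == [(tweet,)])  # True
conn.close()
```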
I use the neo4j-admin import tool to bulk-import data in CSV format. I use Integer as the ID data type in the header [journal:ID:int(Journal-ID)], and importing the nodes works fine. When the import tool gets to the relationships, I get an error that the referenced node is missing.
It seems that the relationship import is looking up the ID as a String.
I already tried to change the type of the ID in the relationship file as well, but got another error. I found no way to specify the ID as int in the relationship file.
Here is a minimal example. Let's say we have two node types with the headers:
journal:ID:int(Journal-ID)
and
documentID:ID(Document-ID),title
and the example files journal.csv:
"123"
"987"
and document.csv:
"PMID:1", "Title"
"PMID:2", "Other Title"
We also have a relation "hasDocument" with the header:
:START_ID(Journal-ID),:END_ID(Document-ID)
and the example file relation.csv:
"123", "PMID:1"
When running the import, I get the error:
Error in input data
Caused by:123 (Journal-ID)-[hasDocument]->PMID:1 (Document-ID) referring to missing node 123
I tried to specify the relation header as
:START_ID:int(Journal-ID),:END_ID(Document-ID)
but this also produces an error.
The command to start the import is:
neo4j-admin import --nodes:Document="document-header.csv,documentNodes.csv" --nodes:Journal="journal-header.csv,journalNodes.csv" --relationships:hasDocument="hasDocument-header.csv,relationsHasDocument.csv"
Is there a way to specify the ID in the relationship file as Integer, or is there another solution to this problem?
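It doesn't seem to be supported: the documentation doesn't mention it, and the code has no test case for it.
You could import the data with String IDs and convert them after you start the database. The queries below assume the ID is stored in a property called id; with a header such as journal:ID(Journal-ID) the property is named journal, so adjust the name to match your header: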
MATCH (j:Journal)
SET j.id = toInteger(j.id)
If your dataset is large, you can use APOC's apoc.periodic.iterate to run the update in batches:
call apoc.periodic.iterate("
MATCH (j:Journal) RETURN j
","
SET j.id = toInteger(j.id)
",{batchSize:10000})
I am using a PostgreSQL database with SQLAlchemy.
When I run the query select now() directly, I get a result that can be converted into a string, but I can't produce this output using SQLAlchemy.
I have already tried the following import, which does not give me the result I need:
from sqlalchemy import func
func is not a module but a proxy object that generates SQL function expressions, so func.now() will produce the column you want:
now = session.query(func.now()).scalar()
This returns a Python datetime object.
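A runnable sketch of the same idea, assuming SQLAlchemy 1.4 or 2.0. Because a PostgreSQL server may not be at hand, it executes against an in-memory SQLite database and only compiles the statement for PostgreSQL; SQLAlchemy renders func.now() as now() for PostgreSQL and as CURRENT_TIMESTAMP for SQLite. If a string is what you need, format the returned datetime yourself.

```python
import datetime

from sqlalchemy import create_engine, func, select
from sqlalchemy.dialects import postgresql

engine = create_engine("sqlite://")  # in-memory SQLite standing in for PostgreSQL
stmt = select(func.now())

# The same expression is rendered differently per dialect.
pg_sql = str(stmt.compile(dialect=postgresql.dialect()))  # contains now()
sqlite_sql = str(stmt.compile(dialect=engine.dialect))    # contains CURRENT_TIMESTAMP
print(pg_sql)
print(sqlite_sql)

with engine.connect() as conn:
    now = conn.execute(stmt).scalar()

print(type(now))
print(now.isoformat(sep=" "))  # a plain string, if that is what you need
```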
Since it can be efficient to paste a flawed SQL query directly into a database administration tool such as phpMyAdmin and work on it until it returns the expected result, is there any way to retrieve the final SQL statement that SQLAlchemy Core passes to the MySQL database, in a ready-to-execute form?
This typically means that you want the bound parameters to be rendered inline. There is limited support for this automatically (as of SQLA 0.9 this will work):
from sqlalchemy.sql import table, column, select
t = table('x', column('a'), column('b'))
# on SQLAlchemy 1.4+ write select(t.c.a, t.c.b); 2.0 no longer accepts the list form
stmt = select([t.c.a, t.c.b]).where(t.c.a > 5).where(t.c.b == 10)
print(stmt.compile(compile_kwargs={"literal_binds": True}))
Also, you'd probably want the query to be MySQL-specific, so if you already have an engine lying around, you can pass that in too:
from sqlalchemy import create_engine
engine = create_engine("mysql://")
print(stmt.compile(engine, compile_kwargs={"literal_binds": True}))
and it prints:
SELECT x.a, x.b
FROM x
WHERE x.a > 5 AND x.b = 10
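If no engine is at hand, or the MySQL driver isn't installed (create_engine("mysql://") tries to load the DBAPI module), a dialect object can be passed instead. This is a minimal sketch assuming SQLAlchemy 1.4+ (hence select(t.c.a, t.c.b) rather than the list form); it also compiles the statement without literal_binds so the difference is visible.

```python
from sqlalchemy import column, select, table
from sqlalchemy.dialects import mysql

t = table("x", column("a"), column("b"))
stmt = select(t.c.a, t.c.b).where(t.c.a > 5).where(t.c.b == 10)

# Without literal_binds the MySQL dialect renders driver placeholders (%s).
with_params = str(stmt.compile(dialect=mysql.dialect()))

# With literal_binds the values are inlined, ready to paste into a SQL console.
inlined = str(stmt.compile(dialect=mysql.dialect(),
                           compile_kwargs={"literal_binds": True}))

print(with_params)
print(inlined)
```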
Now, if you have more elaborate values in the parameters, like dates, SQLAlchemy might raise an error, since it only has "literal binds" renderers for a very limited number of types. An approach that bypasses that system and gives you a pretty direct way to turn those parameters into strings is to do a "search and replace" on the statement object, replacing the bound parameters with literal strings:
from sqlalchemy.sql import visitors, literal_column
from sqlalchemy.sql.expression import BindParameter
def _replace(arg):
    if isinstance(arg, BindParameter):
        return literal_column(
            repr(arg.effective_value)  # <- do any fancier conversion here
        )
stmt = visitors.replacement_traverse(stmt, {}, _replace)
Once you do that, you can just print it:
print(stmt)
or the MySQL version:
print(stmt.compile(engine))