pandas dataframe index datetime.date converts to object KeyError - mysql

I retrieve some data from my MySQL database. This data has the date (not datetime) in one column and some other random data in the other columns. Let's say dtf is my dataframe. There is no index yet, so I set one:
dtf.set_index('date', inplace=True)
Now I would like to get data from a specific date so I write for example
dtf.loc['2000-01-03']
or just
dtf['2000-01-03']
This gives me a KeyError:
KeyError: '2000-01-03'
But I know it's in there; dtf.head() shows me that.
So I took a look at the type of the index of the first row:
type(dtf.index[0])
and it tells me datetime.date. All good. Now, what happens if I just type
dtf.index
Index([2000-01-03, 2000-01-04, 2000-01-05, 2000-01-06, 2000-01-07, 2000-01-10,
2000-01-11, 2000-01-12, 2000-01-13, 2000-01-14,
...
2015-09-09, 2015-09-10, 2015-09-11, 2015-09-14, 2015-09-15, 2015-09-16,
2015-09-17, 2015-09-18, 2015-09-21, 2015-09-22],
dtype='object', name='date', length=2763)
I am a bit confused about the dtype='object'. Shouldn't this read datetime.date?
If I use datetime in my MySQL table instead of date, everything works like a charm. Is this a bug or a feature? I really would like to use datetime.date because it describes my data best.
My pandas version is 0.17.0.
I am using Python 3.5.0.
My OS is Arch Linux.

You should use datetime64/Timestamp rather than datetime.date:
dtf.index = pd.to_datetime(dtf.index)
will mean you have a DatetimeIndex and can do nifty things like loc by strings.
dtf.loc['2000-01-03']
You won't be able to do that with a plain datetime.date index.
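
For illustration, here is a minimal, self-contained sketch of the conversion (the column name and values are made up):

import datetime
import pandas as pd

# An index built from datetime.date objects gets dtype object
dtf = pd.DataFrame({'value': [1, 2]},
                   index=[datetime.date(2000, 1, 3), datetime.date(2000, 1, 4)])
dtf.index.name = 'date'
print(dtf.index.dtype)    # object

# After conversion you have a DatetimeIndex, and string lookups work
dtf.index = pd.to_datetime(dtf.index)
print(dtf.loc['2000-01-03'])    # no more KeyError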

Related

Python MySQL reads the data wrong for Timestamp(3)

I recently ran into a weird problem with SQL timestamps.
I created a table and the column was like
`time` TIMESTAMP(3) DEFAULT '1970-01-01 08:00:01.000'
I manually inserted 2021-03-18 17:00:32.123,
but what I read back through Python mysql.connector is 2021-03-18 17:00:32.000123?!
To see whether there was a rule, I changed the column to TIMESTAMP(1). Guess what I got: 2021-03-18 17:00:32.000001.
Obviously the fractional seconds are being read the wrong way around. What could be the problem? Thanks.
--- Update ---
For the Python code, there's nothing special:
cursor.execute("select time from table")
times = list(cursor)
And from the debug console I can see the time is incorrect, as is the UNIX timestamp, like
unixTime = times[0].timestamp()
The Unix time will be something like XXXX.000123 instead of XXXX.123
But I can get the correct result from UNIX_TIMESTAMP(), like
cursor.execute("select UNIX_TIMESTAMP(time) from table")
So it seems like the Python MySQL library didn't read or convert the fractional-seconds format correctly.
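
No answer is recorded here, but given the asker's observation that UNIX_TIMESTAMP() returns the right value, one possible workaround is to let MySQL produce the epoch value and rebuild the datetime in Python. This is only a sketch under that assumption; the connection settings are placeholders, and the table/column names are taken from the question:

import datetime
import mysql.connector

cnx = mysql.connector.connect(database='test')    # hypothetical connection settings
cursor = cnx.cursor()

# UNIX_TIMESTAMP() was reported to return the correct fractional seconds,
# so convert that epoch value back to a datetime on the Python side
cursor.execute("SELECT UNIX_TIMESTAMP(`time`) FROM `table`")
times = [datetime.datetime.fromtimestamp(float(epoch)) for (epoch,) in cursor]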

Have Python's string-formatter changes in recent versions broken the MySQL connector?

I'm writing a simple - or it should be simple - script to acquire tweets from Twitter's API (I have developer/app keys and am using the Tweepy interface, not scraping or anything of that sort - I may ditch Tweepy for something closer to the modern API but that is almost certainly not what's causing this issue here).
I have a MySQL instance which I connect to and can query just fine, until it comes time to insert the tweet - which has a lot of special characters, almost inevitably. To be clear, I am using the official Python driver/connector for MySQL.
import mysql.connector
from mysql.connector import errorcode
Now, I'm aware StackOverflow is LITTERED with threads where people get my exact error, simply stating to check the MySQL syntax manual. These threads, which aren't all that old (and I'm not using the latest Python; I use 3.7.9 for compatibility with some NLP libraries), insist the answer is to place the string that has the special characters into an old-style format string WITHIN the cursor.execute method, to enclose string variable placeholders in quotes, and to pass a tuple with an empty second value if, as in my case, only one variable is to be inserted. This is also a solution posted as part of a bug report response on the MySQL website - and yet, I have no success.
Here's what I've got - following the directions on dozens of pages here and the official database website:
for tweet in tweepy.Cursor(twilek.search, q=keyword, tweet_mode='extended').items():
    twi_tweet = tweet.full_text
    print(twi_tweet)
    twi_tweet = twi_tweet.encode('utf8')
    requests_total += 1
    os.environ['TWITTER_REQUESTS'] = str(requests_total)
    requests_total = int(os.environ.get('TWITTER_REQUESTS'))
    # insert the archived tweet text into the database table
    sql = 'USE hate_tweets'
    ms_cur.execute(sql)
    twi_tweet = str(twi_tweet)
    insert_tweet = re.sub(r'[^A-Za-z0-9 ]+', '', twi_tweet)
    ms_cur.execute("INSERT INTO tweets_lgbt (text) VALUES %s" % (insert_tweet,))
    cnx.commit()
    print(ms_cur.rowcount, "record inserted.")
(twilek is my cursor object because I'm a dork)
expected result: the string formatter passes MySQL a modified tweet string that it can process and add as a row to the tweets_lgbt table
actual result: insertion fails with a syntax error for any tweet
I've tried going so far as to use regex to strip everything but alphanumeric and spaces - same issue. I'm wondering if the new string format features of current Python versions have broken compatibility with this connector? I prefer to use the official driver but I'll switch to an ORM if I must. (I did try the newer features like F strings, and found they caused the same result.)
I have these observations:
- the VALUES clause requires parentheses: VALUES (%s)
- the quoting/escaping of values should be delegated to the cursor's execute method, by using unquoted placeholders in the SQL and passing the values as the second argument: cursor.execute(sql, (tweet_text,)) or cursor.executemany(sql, [(tweet_text1,), (tweet_text2,)])
- once these steps are applied there's no need for encoding/stringifying/regex-ifying: assuming twi_tweet is a str and the database's charset/collation supports the full UTF-8 range (for example utf8mb4), the insert should succeed
- in particular, encoding a str and then calling str on the result is to be avoided: you end up with "b'my original string'"
This modified version of the code in the question works for me:
import mysql.connector
DDL1 = """DROP TABLE IF EXISTS tweets_lgbt"""
DDL2 = """\
CREATE TABLE tweets_lgbt (
    `text` VARCHAR (256))
"""
# From https://twitter.com/AlisonMitchell/status/1332567013701500928?s=20
insert_tweet = """\
Particularly pleased to see #SarahStylesAU
quoted in this piece for the work she did
👌
Thrive like a girl: Why women's cricket in Australia is setting the standard
"""
# Older connector releases don't support the with... context-manager syntax
with mysql.connector.connect(database='test') as cnx:
    with cnx.cursor() as ms_cur:
        ms_cur.execute(DDL1)
        ms_cur.execute(DDL2)
        ms_cur.execute("INSERT INTO tweets_lgbt (`text`) VALUES (%s)", (insert_tweet,))
        cnx.commit()
        print(ms_cur.rowcount, "record inserted.")
This is how you would insert a row into your table:
insert_tweet = "ABCEFg 9 XYZ"
"INSERT INTO tweets_lgbt (text) VALUES ('%s');"%(insert_tweet)
"INSERT INTO tweets_lgbt (text) VALUES ('ABCEFg 9 XYZ');"
Things to note:
The arguments to a string formatter are just like the arguments to a function, so you cannot add a comma at the end to convert a string to a tuple there.
If you are trying to insert multiple values at once, you can use cursor.executemany or this answer.
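
As a quick sketch of the executemany route, reusing the cursor and table names from the code above (the row values here are made up, and this uses placeholder binding rather than string formatting):

rows = [("first example tweet",), ("second example tweet",)]
# executemany runs the parameterized INSERT once per tuple in the list
ms_cur.executemany("INSERT INTO tweets_lgbt (`text`) VALUES (%s)", rows)
cnx.commit()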

Can store and retrieve Object Django BinaryField in SQLite but not MySQL

I have some implementation that is best served by pickling a pandas dataframe and storing it in a DB.
This works fine if the database is SQLite, but fails with a load error when it is MySQL.
I have found other people with similar issues on Stack Overflow and Google, but it seems that everybody's solution is to store the dataframe via SQL.
As a last resort I would go down that route, but it would be a shame to do that for this use case.
Anybody got a solution to get the same behaviour from MySQL as from SQLite here?
I simply dump the dataframe with
pickledframe = pickle.dumps(frame)
and store pickledframe as a BinaryField
pickledframe = models.BinaryField(null=True)
I load it in with
unpickled = pickle.loads(pickledframe)
With SQLite it works fine; with MySQL I get
Exception Type: UnpicklingError
Exception Value: invalid load key, ','.
upon trying to load it.
Thanks
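
No answer is recorded here, but one thing worth noting: "invalid load key, ','" means pickle.loads saw a comma where a pickle opcode was expected, so the bytes were altered somewhere between dumps and loads. A minimal sketch for checking that, outside of Django (the dataframe contents are made up):

import pickle
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3]})
pickledframe = pickle.dumps(frame)

# Modern pickle protocols start with the byte 0x80; if the value read back
# from the BinaryField starts with anything else (here, a comma), the blob
# was mangled in storage or retrieval, not by pickle itself
print(pickledframe[:1])    # b'\x80'

unpickled = pickle.loads(pickledframe)
assert unpickled.equals(frame)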

How to parse json column in MySQL Version < 5.7

I have a column called inventory.
id ..... inventory
the content of this column is:
{"STC1":{"count":"1"},"STC2":{"count":0}}
the count value is variable.
I don't want to do this on the application side; I want to use SQL.
for example
what I want:
... where STC1.count > 0
or
... where STC1.count > 1 or STC2.count < 5
You can't: according to the official MySQL documentation, JSON support was introduced in version 5.7.8; it isn't available natively for older versions.
For MySQL < 5.7.8, this JSON content is just a string, and MySQL has nothing to extract structured data from it.
At best you'll be able to check WHERE inventory LIKE '%"STC1":{"count":"0"%' to detect rows with an STC1 count of 0, or something like that, but not much more, and it will rapidly become really hairy to do anything complex.
In general it's better to store atomic data to avoid this kind of problem.
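
The asker ruled out application-side parsing, but since pre-5.7 SQL genuinely can't look inside the JSON, a client-side filter is the realistic fallback. A sketch under that assumption; the table name items and the connection settings are made up, while the column names come from the question:

import json
import mysql.connector

cnx = mysql.connector.connect(database='test')    # hypothetical connection settings
cursor = cnx.cursor()

# Fetch the raw JSON string and filter in Python, since MySQL < 5.7.8
# can only treat the column as opaque text
cursor.execute("SELECT id, inventory FROM items")
for row_id, inventory in cursor:
    counts = json.loads(inventory)
    # the question shows count stored as both "1" (string) and 0 (number),
    # so normalize with int() before comparing
    if int(counts.get('STC1', {}).get('count', 0)) > 0:
        print(row_id)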

Mysql "Time" type gives an "ArgumentError: argument out of range" in Rails if over 24 hours

I'm writing a Rails application on top of a legacy MySQL DB which also feeds a PHP production tool. Because of this setup, it's not possible for me to change the database's structure.
The problem I'm having is that two tables have a "time" attribute (a duration). As long as the time is under 24:00:00, Rails handles this, but as soon as Rails comes across something like 39:00:34, I get "ArgumentError: argument out of range".
I've looked into this problem and seen how Rails handles the time type; from my understanding it treats it like a datetime, so a value of 39:00:34 throws this error.
I need some way of mapping, or changing the type cast, so I don't get this error. Reading the value as a string would also be fine.
Any ideas would be most appreciated.
Cheers
I'm not familiar with Rails, so there may be a clean, native solution to this, but if all else fails, one workaround might be writing into a VARCHAR field, then running a second query to copy it over into a TIME field within MySQL (note that MySQL's TIME type itself only supports values up to 838:59:59):
INSERT INTO tablename (name, stringfield)
VALUES ('My Record', '39:00:34');
UPDATE tablename SET timefield = CAST(stringfield AS TIME)
WHERE id = LAST_INSERT_ID();