SQLAlchemy reports "Invalid utf8mb4 character string" for BINARY column - sqlalchemy

Assuming this MySQL table schema:
CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `uuid` binary(16) NOT NULL,
  `email` varchar(255) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  `photo` binary(16) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uuid` (`uuid`),
  UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8mb4;
When I use the execute() API from the SQLAlchemy connection class like this:
with self.engine.begin() as connection:
    user_uuid = uuid.UUID("...")
    result = connection.execute("SELECT email, name, photo FROM user WHERE uuid=%s", user_uuid.bytes)
If the UUID is F393A167-A919-4B50-BBB7-4AD356E89E6B, then SQLAlchemy prints this warning:
/site-packages/sqlalchemy/engine/default.py:450: Warning: Invalid utf8mb4 character string: 'F393A1'
The uuid column is a BINARY column, so why does SQLAlchemy treat this parameter as text rather than binary, and how can I prevent this?

The explanation and solution are actually in this MySQL bug report:
replace:
cursor.execute("""
    INSERT INTO user (uuid)
    VALUES (%s)
""", my_uuid)
with:
cursor.execute("""
    INSERT INTO user (uuid)
    VALUES (_binary %s)
""", my_uuid)
Mind the underscore. It's "_binary", not "binary".
This "_binary" tells MySQL that the following string is to be interpreted as binary, not to be interpreted/validated as utf8.

The problem doesn't happen on Python 3, so I suspect the database driver simply cannot tell which values are meant to be bytes when they arrive as the Python 2 str type.
Regardless, using SQLAlchemy Core directly seems to work correctly, presumably because it knows the column types:
from sqlalchemy import MetaData, Table, select

meta = MetaData()
user = Table('user', meta, autoload_with=engine)
stmt = select([user]).where(user.c.uuid == user_uuid.bytes)
with engine.begin() as connection:
    result = connection.execute(stmt)
If you wish to keep executing a plain SQL string, you can wrap the value in a bytearray, as SQLAlchemy appears to do internally:
with self.engine.begin() as connection:
    user_uuid = uuid.UUID("...")
    result = connection.execute(
        "SELECT email, name, photo FROM user WHERE uuid=%s",
        bytearray(user_uuid.bytes))
Or, to have SQLAlchemy handle this automatically, tell it what type the bound parameter is:
from sqlalchemy import text, bindparam, BINARY

with self.engine.begin() as connection:
    user_uuid = uuid.UUID("...")
    stmt = text("SELECT email, name, photo FROM user WHERE uuid = :uuid")
    stmt = stmt.bindparams(bindparam('uuid', user_uuid.bytes, type_=BINARY))
    result = connection.execute(stmt)

Related

Using json blob field attributes in MySQL where clause

I am using MySQL 5.7 and have a table with the following schema:
CREATE TABLE `Test` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `created_by` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
  `status` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
  `metadata` blob COMMENT 'to capture the custom metadata',
  `created_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
And the sample row data for the table looks like this
1234,user1,open,"{'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}",2021-05-18 16:01:25
I want to select rows from this table based on the keys in the JSON blob field metadata; for example, let's say where key1 = 'value1'. So I tried something like this:
select * from `test` where metadata->>"$.key1" = "value1";
But I got this error: Cannot create a JSON value from a string with CHARACTER SET 'binary'. So I cast it to JSON first with something like below:
select JSON_EXTRACT(CAST(metadata as JSON), "$") as meta from test;
The problem is that this returns a base64-encoded string, and when I try to decode it using FROM_BASE64 as below, I get NULL values in the column.
select FROM_BASE64(JSON_EXTRACT(CAST(metadata as JSON), "$")) as meta from test;
So I think I have two problems here: first, how to decode the base64-encoded data I get after casting the blob as JSON, and second, how to filter rows based on keys in the metadata field.
I do feel this is a design error and the ideal data type would have been JSON, but since this is how it is now, I need some way to work around it.
Edit
I also tried the following, as suggested in one of the comments:
select cast(convert(cast(metadata as char) using utf8) as json) from test;
but I get this error:
Data truncation: Invalid JSON text in argument 1 to function cast_as_json: "Missing a name for object member." at position 1
Is there any way I can work around this?
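For reference, here is a minimal sketch of the convert-then-extract filter attempted above, written with mysql.connector (the driver and connection parameters are assumptions). It can only work if the blob actually stores valid JSON text, i.e. double-quoted keys and values rather than the single quotes shown in the sample row:
import mysql.connector

# Placeholder connection parameters.
connection = mysql.connector.connect(host='localhost', database='mydb',
                                     user='root', password='root')
cursor = connection.cursor()

# Convert the blob to text before the JSON functions see it, then filter on a key.
cursor.execute(
    "SELECT id, CONVERT(metadata USING utf8mb4) AS meta "
    "FROM Test "
    "WHERE JSON_UNQUOTE(JSON_EXTRACT(CONVERT(metadata USING utf8mb4), %s)) = %s",
    ('$.key1', 'value1'))
for row in cursor.fetchall():
    print(row)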

Error : "Not all parameters were used in the SQL statement" Insert and update in mariadb using Python

I am inserting records into a MariaDB table from a file using Python. The input file has a header, and some of the columns are fully or partially empty. I am trying the code below.
Table definition:
CREATE TABLE `local_db`.`table_x` (
  `Unique_code` varchar(50) NOT NULL,
  `city` varchar(200) DEFAULT NULL,
  `state` varchar(50) DEFAULT NULL,
  `population` bigint(20) DEFAULT NULL,
  `Govt` varchar(50) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
import csv
import mysql.connector

input_file = "C:\\Users\\input_file.csv"
csv_data = csv.reader(open(input_file))
try:
    connection = mysql.connector.connect(host='localhost',
                                         database='local_db',
                                         user='root',
                                         password='root',
                                         port='3306')
    cursor = connection.cursor()
    for row in csv_data:
        sql = """
            INSERT INTO table_x(Unique_code,city,state,population,Govt)
            VALUES(?, ?, ?, ?, ?)
            ON DUPLICATE KEY UPDATE city = VALUES(city),state = VALUES(state),
            population = VALUES(population),Govt = VALUES(Govt)"""
        cursor.execute(sql, row)
        connection.commit()
    print(cursor.rowcount, "Record inserted successfully into table_x")
    cursor.close()
except mysql.connector.Error as error:
    print("Failed to insert record into table_x table {}".format(error))
finally:
    if connection.is_connected():
        connection.close()
        print("MySQL connection is closed")
But I am getting the error below:
Failed to insert record into table_x table Not all parameters were used in the SQL statement
MySQL connection is closed
Please suggest what code changes I can make here to handle this situation.
You may find it convenient to pip install sqlalchemy
and then build the statement with sql = sqlalchemy.text("""INSERT ...""").
The quoting syntax for bind parameters will then look like this:
... VALUES(:Unique_code,
:city,
:state,
:population,
:Govt) ...
Obtaining input with a csv.DictReader may also prove convenient.
The bind-param syntax for the driver your posted code uses (mysql.connector) would look like this:
... VALUES(%(Unique_code)s,
%(city)s,
%(state)s,
%(population)s,
%(Govt)s) ...
Your CREATE TABLE omits a PRIMARY KEY, so you should probably promote that initial Unique_code column to PK. Since the posted schema does not enforce any unique keys, the ON DUPLICATE KEY clause could be removed without changing behavior, which would leave you a simpler INSERT statement to worry about during debugging.
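Putting those two suggestions together, a sketch using dict-style %(name)s placeholders fed from a csv.DictReader (it assumes the CSV header names match the column names exactly and that Unique_code has been promoted to the primary key):
import csv
import mysql.connector

sql = """
    INSERT INTO table_x (Unique_code, city, state, population, Govt)
    VALUES (%(Unique_code)s, %(city)s, %(state)s, %(population)s, %(Govt)s)
    ON DUPLICATE KEY UPDATE city = VALUES(city), state = VALUES(state),
                            population = VALUES(population), Govt = VALUES(Govt)
"""

connection = mysql.connector.connect(host='localhost', database='local_db',
                                     user='root', password='root', port=3306)
cursor = connection.cursor()
with open("C:\\Users\\input_file.csv", newline='') as f:
    for row in csv.DictReader(f):
        # Empty CSV fields arrive as '', which MySQL rejects for the bigint
        # column in strict mode, so map them to NULL.
        params = {key: (value or None) for key, value in row.items()}
        cursor.execute(sql, params)
connection.commit()
cursor.close()
connection.close()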

MySQL Case Sensitivity (or otherwise, how to store passwords correctly in MySQL)

CAUSE:
I have a table whose columns are all suitably collated as utf8mb4_unicode_ci,
CREATE TABLE IF NOT EXISTS `users` (
  `user_id` int(8) NOT NULL AUTO_INCREMENT,
  `username` varchar(100) NOT NULL,
  `pass_word` varchar(512) NOT NULL,
  ...etc etc...
  PRIMARY KEY (`user_id`),
  UNIQUE KEY `email_addr` (`email_addr`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 AUTO_INCREMENT=989;
...Including the column storing the password hash (generated from password_hash) such as $2y$14$tFpExwd2TXm43Bd20P4nkMbL1XKxwF.VCpL.FXeVRaUO3FFxGJ4Di.
BUT, I find that due to the case insensitivity of the column, a hash of $2y$14$tFpExwd2tXm43Bd20P4NKmbL1XKxwF.VCpL.FxEVRaUO3FFxGJ4DI would still allow access.
This means that there are potentially hundreds of collisions possible by storing the data in a case insensitive manner. Not good.
ISSUE:
Now, is there a way of forcing MySQL to treat the pass_word column as case sensitive when doing comparisons? I want to avoid having to edit every occurrence of the PHP/SQL querying, and instead simply set the database table column to compare in a case-sensitive manner by default.
The utf8mb4 character set does not give me any _cs options, and the only non-_ci option appears to be utf8mb4_bin.
So simple questions:
Does the UTF8mb4_bin character set & collation on MySQL treat standard comparisons case sensitively? [yes]
Does UTF8mb4_bin suit what I want to do? Should I use another set, and if so, why?
Are there any issues in storing password_hash outputs in a MySQL utf8mb4_bin column?
Does this approach conveniently sidestep the need to edit the SQL of each login query? Can I change the column type and then move on?
EDIT
As detailed by nj_, this is a silly issue that is not an issue at all, because the value of pass_word is never queried directly when logging in.
... It's been a long day.
If you're really that worried about the potential 2^55 collisions in your 62^55 address space, you can simply change the column type to BLOB, which is always case-sensitive.
CREATE TABLE IF NOT EXISTS `users` (
  `user_id` int(8) NOT NULL AUTO_INCREMENT,
  `username` varchar(100) NOT NULL,
  `pass_word` BLOB NOT NULL,
  ...etc etc...
  PRIMARY KEY (`user_id`),
  UNIQUE KEY `email_addr` (`email_addr`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 AUTO_INCREMENT=989;
Example:
INSERT INTO `users` (..., `pass_word`) VALUES (..., 'AbC');
SELECT * FROM `users` WHERE `pass_word` = 'AbC' LIMIT 0,1000; -> 1 hit
SELECT * FROM `users` WHERE `pass_word` = 'abc' LIMIT 0,1000; -> 0 hits
Case sensitivity is no problem in this case, because you cannot verify the password directly with SQL anyway. A correctly salted password hash cannot be searched for in the database. Search by username only and extract the stored hash from the database:
$sql = 'SELECT * FROM users WHERE username = ?';
$stmt = $db->prepare($sql);
$stmt->bind_param('s', $_POST['username']);
$stmt->execute();
Afterwards you can extract the hash from the row and check the entered password against the found hash with the password_verify() function:
// Check if the hash of the entered login password, matches the stored hash.
// The salt and the cost factor will be extracted from $existingHashFromDb.
$isPasswordCorrect = password_verify($password, $existingHashFromDb);

PySpark, order of column on write to MySQL with JDBC

I'm struggling a bit with understanding Spark and writing dataframes to a MySQL database. I have the following code:
forecastDict = {'uuid': u'8df34d5a-ce02-4d02-b282-e10363690122', 'created_at': datetime.datetime(2014, 12, 31, 23, 0)}
forecastFrame = sqlContext.createDataFrame([forecastDict])
forecastFrame.write.jdbc(url="jdbc:mysql://example.com/example_db?user=bla&password=blabal123", table="example_table", mode="append")
The last line in the code throws the following error:
Incorrect datetime value: '8df34d5a-ce02-4d02-b282-e10363690122' for column 'created_at' at row 1
I can post the entire stack trace if necessary, but basically what's happening here is that PySpark is mapping the uuid field to the wrong column in MySQL. Here's the MySQL definition:
mysql> show create table example_table;
...
CREATE TABLE `example_table` (
  `uuid` varchar(36) NOT NULL,
  `created_at` datetime NOT NULL,
  PRIMARY KEY (`uuid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
...
If we change the MySQL definition to the following (notice that only the order of the columns is different):
CREATE TABLE `example_table` (
  `created_at` datetime NOT NULL,
  `uuid` varchar(36) NOT NULL,
  PRIMARY KEY (`uuid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The insert works fine. Is there a way to implement this without depending on the order of the columns, or what's the preferred way of saving data to an external relational database from Spark?
Thanks!
--chris
I would simply force the expected order on write:
url = ...
table = ...

columns = (sqlContext.read.format('jdbc')
           .options(url=url, dbtable=table)
           .load()
           .columns)

forecastFrame.select(*columns).write.jdbc(url=url, table=table, mode='append')
Also be careful with using schema inference on dictionaries. This is not only deprecated but also rather unstable.
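If you want to avoid dict-based inference altogether, one option is to pass an explicit schema when building the frame; a sketch, with the types guessed from the table definition in the question:
import datetime
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

schema = StructType([
    StructField('uuid', StringType(), False),
    StructField('created_at', TimestampType(), False),
])
forecastFrame = sqlContext.createDataFrame(
    [(u'8df34d5a-ce02-4d02-b282-e10363690122',
      datetime.datetime(2014, 12, 31, 23, 0))],
    schema)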

MySQL SHA1 hash does not match

I have a weird problem with a MySQL users table. I have quickly created a simplified version as a test case.
I have the following table
CREATE TABLE IF NOT EXISTS `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `identity` varchar(255) NOT NULL,
  `credential` varchar(255) NOT NULL,
  `credentialSalt` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=ucs2 AUTO_INCREMENT=2;

INSERT INTO `users` (`id`, `identity`, `credential`, `credentialSalt`) VALUES
(1, 'test', '7288edd0fc3ffcbe93a0cf06e3568e28521687bc', '123');
And I run the following query
SELECT id,
IF (credential = SHA1(CONCAT('test', credentialSalt)), 1, 0) AS dynamicSaltMatches,
credentialSalt AS dynamicSalt,
SHA1(CONCAT('test', credentialSalt)) AS dynamicSaltHash,
IF (credential = SHA1(CONCAT('test', 123)), 1, 0) AS staticSaltMatches,
123 AS staticSalt,
SHA1(CONCAT('test', 123)) AS staticSaltHash
FROM users
WHERE identity = 'test'
Which gives me the following result
The dynamic salt does NOT match while the static salt DOES match.
This is blowing my mind. Can someone help me point out the cause of this?
My MySQL version is 5.5.29
It's because of the default character set of your table. You appear to be running this on a UTF8 database and something in SHA1() is having problems with the differing character sets.
If you change your table declaration to the following it will match again:
CREATE TABLE IF NOT EXISTS `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `identity` varchar(255) NOT NULL,
  `credential` varchar(255) NOT NULL,
  `credentialSalt` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=2;
SQL Fiddle
As robertklep commented, explicitly converting your string to a matching character set will also work; basically, ensure you're using the same character set on both sides when doing comparisons with SHA1().
As the encryption functions documentation says:
Many encryption and compression functions return strings for which the result might contain arbitrary byte values. If you want to store these results, use a column with a VARBINARY or BLOB binary string data type. This will avoid potential problems with trailing space removal or character set conversion that would change data values, such as may occur if you use a nonbinary string data type (CHAR, VARCHAR, TEXT).
This was changed in version 5.5.3:
As of MySQL 5.5.3, the return value is a nonbinary string in the connection character set. Before 5.5.3, the return value is a binary string; see the notes at the beginning of this section about using the value as a nonbinary string.