String to Blob comparison in MySQL where clause

I have a table called messages with a column (BLOB) called message_text. I'm passing a value from the web app when a user sends a new message, and I want to check whether it's an exact duplicate of an existing message text.
SELECT count(message_id) FROM messages WHERE message_text = '$msgTxt' AND user_id = $userId
Where $msgTxt will be a formatted string like...
"Hello there. I don\'t know you.
I\'ve just made a new line. "
The problem is that the comparison isn't working and I'm never finding duplicates. Even if I literally copy/paste an existing value from the database and substitute it for $msgTxt in my query, I never get any results, so I'm assuming there's something wrong with the way I'm comparing a blob to a string.

BLOB values are treated as binary strings (byte strings). They have the binary character set and collation, and comparison and sorting are based on the numeric values of the bytes in column values. String or Text values are treated as nonbinary strings (character strings). They have a character set other than binary, and values are sorted and compared based on the collation of the character set.
So you have to convert one side so that both operands share a representation: either the BLOB to a string, or the string to binary, and then compare.
If you are using Java,
Convert Blob to String
byte[] bdata = blob.getBytes(1, (int) blob.length());
String data1 = new String(bdata, StandardCharsets.UTF_8); // specify the charset explicitly

What API are you using to call MySQL? I see some backslashes, but need to verify that \ is not turning into \\, and that other escaping is not being applied (or skipped) where it shouldn't be.
Which OS are you using? Windows, when reading data, likes to convert NL into CRLF, which makes the stored value no longer match.
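If Windows-style line endings are the culprit, normalizing both sides before comparing makes the duplicate check robust. A minimal Python sketch (the function name and sample strings here are invented for illustration):

```python
def normalize_message(text: str) -> bytes:
    """Normalize CRLF/CR line endings to LF and encode for a byte-wise compare."""
    return text.replace("\r\n", "\n").replace("\r", "\n").encode("utf-8")

# Stored BLOB uses CRLF; the incoming web-app value uses bare LF.
stored_blob = b"Hello there. I don't know you.\r\nI've just made a new line. "
new_message = "Hello there. I don't know you.\nI've just made a new line. "

# Compare raw bytes after normalizing both sides.
is_duplicate = normalize_message(stored_blob.decode("utf-8")) == normalize_message(new_message)
print(is_duplicate)  # -> True
```

The same normalization would need to be applied before the INSERT as well, so stored values stay consistent.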

Related

Validate string before saving it to MySQL for chars taking more than 3 bytes

The MySQL db I am using has character set utf8, but certain sets of chars which take more than 3 bytes are not getting saved. I could have changed the encoding to utf8mb4, but that is not an option. All I want to do is validate a string to check whether it will get saved in MySQL. I don't want to unnecessarily limit my chars to ASCII. How do I check if a char will take more than three bytes?
Plan A:
In your app language, convert your string to hex. Then look for f0. That byte would indicate utf8mb4 is needed.
In MySQL, the expression is HEX(col) REGEXP '^(..)*f0'.
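Plan A is easy to mirror in application code. A Python sketch (note that in UTF-8 a 4-byte sequence can begin with any lead byte in 0xF0–0xF4, so checking for f0 alone slightly undercounts):

```python
def needs_utf8mb4(s: str) -> bool:
    """True if the string contains any 4-byte UTF-8 sequence (lead byte 0xF0-0xF4)."""
    return any(0xF0 <= b <= 0xF4 for b in s.encode("utf-8"))

print(needs_utf8mb4("šč"))       # 2-byte characters: fits in MySQL's 3-byte utf8
print(needs_utf8mb4("hello 😀")) # emoji is a 4-byte character: needs utf8mb4
```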
Plan B:
Attempt to insert your text into a CHARACTER SET utf8 column of a spare table. Read it back and see if it matches. Storing a 4-byte character will either turn it into question marks or truncate the string. Either way, it won't match.
If you wish to insert the data via a MySQL query only, rather than programmatically, then you can use the LENGTH() function to check the byte length.
MySQL provides the LENGTH() function to get the length of a string in bytes, and the CHAR_LENGTH() function to get the length of a string in characters. If a string contains multi-byte characters, the result of LENGTH() is greater than the result of CHAR_LENGTH().
http://www.mysqltutorial.org/mysql-character-set/
A sample query follows:
insert into x_table(data_string)
SELECT 'šč' as data_string where length('šč')<4
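The LENGTH() vs CHAR_LENGTH() distinction can be reproduced in Python, where len(s) counts characters (like CHAR_LENGTH) and the length of the UTF-8 encoding counts bytes (like LENGTH):

```python
s = "šč"
char_length = len(s)                  # characters, like CHAR_LENGTH('šč')
byte_length = len(s.encode("utf-8"))  # bytes, like LENGTH('šč')

print(char_length)  # -> 2
print(byte_length)  # -> 4 (each of these characters takes 2 bytes in UTF-8)
```

Note that 'šč' encodes to exactly 4 bytes, so the length('šč')<4 condition in the sample query above would actually reject it; the threshold has to match the byte budget you really intend.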
In Java, check the byte length before inserting into MySQL:
String s = "stringvalue";
byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
System.out.println("bytes.length = " + bytes.length);
bytes.length can be checked before inserting.

MySQL: Which data type can handle special character?

I have a MySQL table, and in the message field I want to store encrypted data. The encrypted data looks like
�O-�H,,E%P!�O-�H-!E%!P!�O-�H,E%�P!�O-�H,,E$�P"�O-!H,E%P!�O-H+�E%P"
Hence, I cannot store such characters in message, whether I use utf8_general_ci or BLOB.
Please help me to figure out which datatype can store such characters.
Take a look at this URL: https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html
"Many encryption and compression functions return strings for which the result might contain arbitrary byte values. If you want to store these results, use a column with a VARBINARY or BLOB binary string data type. This will avoid potential problems with trailing space removal or character set conversion that would change data values, such as may occur if you use a nonbinary string data type (CHAR, VARCHAR, TEXT)."
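The underlying issue is that ciphertext is arbitrary bytes, which is frequently not valid UTF-8 at all, so a character column can reject or corrupt it. A quick Python illustration (the byte values are taken from the hex string in a later question on this page, purely as an example):

```python
# Arbitrary ciphertext-like bytes.
ciphertext = bytes([0x16, 0xD7, 0xA4, 0xFC, 0xA7, 0x44, 0x2D, 0xDA])

try:
    ciphertext.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    # 0xFC can never appear in valid UTF-8, so decoding fails.
    valid_utf8 = False

print(valid_utf8)  # -> False: these bytes must go into a binary column
```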

How can I insert arbitrary binary data into a VARCHAR column?

I have a MySQL table with a VARCHAR(100) column, using the utf8_general_ci collation.
I can see rows where this column contains arbitrary byte sequences (i.e. data that contains invalid UTF8 character sequences), but I can't figure out how to write an UPDATE or INSERT statement that allows this type of data to be entered.
For example, I've tried the following:
UPDATE DataTable SET Data = CAST(BINARY(X'16d7a4fca7442dda3ad93c9a726597e4') AS CHAR(100)) WHERE Id = 1;
But I get the error:
Incorrect string value: '\xFC\xA7D-\xDA:...' for column 'Data' at row 1
How can I write an INSERT or UPDATE statement that bypasses the destination column's collation, allowing me to insert arbitrary byte sequences?
Have you considered using one of the Blob data types instead of varchar? I believe that this'd take a lot of the pain away from your use-case.
EDIT: Alternatively, there is the HEX and UNHEX functions, which MySQL supports. Hex takes either a str or a numeric argument and returns the hexadecimal representation of your argument as a string. Unhex does the inverse; taking a hexadecimal string and returning a binary string.
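The HEX()/UNHEX() round trip can be mirrored in Python: hex-encoding turns arbitrary bytes into a plain ASCII string that is always safe to embed in a statement, and UNHEX (or an X'...' literal) restores the original bytes. A sketch using the hex value from the question:

```python
payload = bytes.fromhex("16d7a4fca7442dda3ad93c9a726597e4")  # the bytes from the question

hex_string = payload.hex()              # like HEX(payload) in MySQL
restored   = bytes.fromhex(hex_string)  # like UNHEX(hex_string)

# The equivalent statement would be something like:
#   UPDATE DataTable SET Data = UNHEX('16d7a4fca7442dda3ad93c9a726597e4') WHERE Id = 1;
# which only round-trips cleanly if Data is a binary column (VARBINARY/BLOB).
```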
The short answer is that it shouldn't be possible to insert values with invalid UTF8 characters into VARCHAR column declared to use UTF8 characterset.
That's the design goal of MySQL, to disallow invalid values. When there's an attempt to do that, MySQL will return either an error or a warning, or (more leniently?) silently truncate the supplied value at the first invalid character encountered.
The more usual variety of characterset issues are with MySQL performing a characterset conversion when a characterset conversion isn't required.
But the issue you are reporting is that invalid characters were inserted into a UTF8 column. It's as if a latin1 (ISO-8859) encoding was supplied, and a characterset conversion was required, but was not performed.
As far as working around that... I believe it was possible in earlier versions of MySQL. I believe it was possible to cast a value to BINARY, and then wrap that in CONVERT( ... USING UTF8), and MySQL wouldn't perform a validation of the characterset. I don't know if that's still possible with the current MySQL Connectors.
If it is possible, then that's (IMO) a bug in the Connector.
The only way I can think of to get around that characterset check/validation would be to get the MySQL server to trust the client and decide that no check of the characterset is required. (That would also mean the MySQL server wouldn't be doing a characterset conversion: the client would be lying to the server, telling the server that it's supplying valid UTF8 characters.)
Basically, the client would be telling the server "Hey server, I'm going to be sending UTF8 character encodings".
And the server says "Okay. I'll not do any characterset conversion then, since we match. And I'll just trust that what you send is valid UTF8".
And then the client mischievously chuckles to itself, "Heh, heh, I lied. I'm actually sending character encodings that aren't valid UTF8".
And I think it's much more likely to be able to achieve such mischief using prepared statements with the old-school MySQL C API (mysql_stmt_prepare, mysql_stmt_execute), supplying invalid UTF8 encodings as values for string bind parameters. (The onus is really on the client to supply valid values for bind parameters.)
You should base64-encode your value beforehand so you can generate valid SQL with it:
UPDATE DataTable SET Data = from_base64('mybase64-encoded-representation-of-my-value') WHERE Id = 1;

How to save cipher text to db if it has illegal characters?

I am working on encrypting text strings that contain sensitive data. I need to save these encrypted strings to a MySQL db. The strings are ciphertext, and all characters (from printable ASCII through control characters to null) are equally likely.
These strings are not long (< 40 characters). I am using Ruby 2.1 (no Rails) along with the encryptor gem with a custom salt and iv to do the encryption. The encryptor gem is a wrapper for Ruby's openssl.
For many strings this works fine. However, I have run into a small number of these strings that, once encrypted, contain illegal or improperly quoted characters. As a result, when the string is saved I get an error.
What is the best way to handle the encrypted value so it can be reliably saved to MySQL?
Here is my encryption command:
require 'encryptor'
encrypted_value = Encryptor.encrypt(@sensitive_string,
  :key  => @config["encryption"]["key"],
  :iv   => @config["encryption"]["iv"],
  :salt => @config["encryption"]["salt"])
Here is the encrypted value:
encrypted_value: /:Z`߉Nc??"v'??\??؟??????Oa?jR
and a screenshot since some of the characters did not copy correctly:
MySQL update statement:
query = "UPDATE db.table
SET `key` = mysql_real_escape_string(#{encrypted_value})"
With the value in the query it looks like:
query UPDATE db.table
SET `key` = mysql_real_escape_string(/:Z`߉Nc??"v'??\??؟??????Oa?jR)
I have tried both the MySQL QUOTE and mysql_real_escape_string functions. I get the same error with both, and also when wrapping the encrypted_value in double or single quotes.
Then I get this error:
wrong number of arguments (0 for 2) (ArgumentError)
What is the best way to tackle this? Any advice is appreciated.
You can perform base64 encoding of the encrypted string, and store the encoded value in the DB.
When retrieving the value, you can decode it back to binary and decrypt it.
require "base64"
enc = Base64.encode64('Send reinforcements')
# -> "U2VuZCByZWluZm9yY2VtZW50cw==\n"
plain = Base64.decode64(enc)
# -> "Send reinforcements"
http://ruby-doc.org/stdlib-2.2.0/libdoc/base64/rdoc/Base64.html
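More generally, the quoting problem disappears entirely if the ciphertext is bound as a query parameter instead of being interpolated into the SQL string: bound parameters are sent separately from the statement, so arbitrary bytes never need escaping. A sketch using Python's built-in sqlite3 as a stand-in database (the table and column names are invented; a MySQL driver, or Ruby's mysql2 gem, exposes the same placeholder mechanism):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE secrets (id INTEGER PRIMARY KEY, cipher BLOB)")

# Every possible byte value, including NUL, quotes, and backslashes.
ciphertext = bytes(range(256))

# The placeholder (?) binds the raw bytes; no quoting or escaping is involved.
conn.execute("INSERT INTO secrets (id, cipher) VALUES (?, ?)", (1, ciphertext))

(stored,) = conn.execute("SELECT cipher FROM secrets WHERE id = 1").fetchone()
```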

Force mySQL queries to be characters not numeric in R

I'm using RODBC to interface R with a MySQL database and have encountered a problem. I need to join two tables based on unique ID numbers (IDNUM below). The issue is that the ID numbers are 20 digit integers and R wants to round them. OK, no problem, I'll just pull these IDs as character strings instead of numeric using CAST(blah AS CHAR).
But R sees the incoming character strings as numbers and thinks "hey, I know these are character strings... but these character strings are just numbers, so I'm pretty sure this guy wants me to store this as numeric, let me fix that for him" then converts them back into numeric and rounds them. I need to force R to take the input as given and can't figure out how to make this happen.
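To see why the numeric round-trip is destructive: a 64-bit float (which is where a numeric conversion lands) carries only about 15-16 significant decimal digits, so distinct 20-digit IDs can collapse to the same value. A quick Python illustration (the sample IDs are invented):

```python
id_a = "12345678901234567890"
id_b = "12345678901234567891"  # differs only in the last digit

print(id_a == id_b)                # -> False: as strings they are distinct
print(float(id_a) == float(id_b))  # -> True: as floats they collide
```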
Here's the code I'm using (Interval is a vector containing a beginning and an ending timestamp, so this code is meant to pull data only from a chosen time period):
test = sqlQuery(channel, paste("SELECT CAST(table1.IDNUM AS CHAR),PartyA,PartyB FROM
table1, table2 WHERE table1.IDNUM=table2.IDNUM AND table1.Timestamp>=",Interval[1],"
AND table2.Timestamp<",Interval[2],sep=""))
You will most likely want to read the documentation for the function you are using at ?sqlQuery, which includes notes about the following two relevant arguments:
as.is which (if any) columns returned as character should be
converted to another type? Allowed values are as for read.table. See
‘Details’.
and
stringsAsFactors logical: should columns returned as character and
not excluded by as.is and not converted to anything else be converted
to factors?
In all likelihood you want to specify the columns in question in as.is.