SQLAlchemy convert field using utf8 - mysql

How do I get the equivalent SQLAlchemy statement for this SQL query?
select CONVERT(blobby USING utf8) as blobby from myTable where CONVERT(blobby USING utf8)="Hello";
Where blobby is a blob type field that has to be unblobbed to a string.
I tried this
DBSession.query(myTable).filter(myTable.blobby.encode(utf-8)="Hello").first()
But it doesn't work; it gives this error:
AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with myTable.blobby has an attribute 'encode'
Moreover, I tried inserting into the database as follows:
que=myTable(blobby=str(x.blobby))
DBSession.add(que)
This also returns an error that says
(ProgrammingError) You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
This error goes away if I insert like
que=myTable(blobby=str(x))
But the inserted entries then do not support foreign-language values.
EDIT:
This is what I am trying to do: I have a list of utf-8 strings. The SQLAlchemy column into which I am trying to insert them has convert_unicode set to True. I have to insert these strings into my database without loss of data, preserving the multilingual characters that worked fine when I fetched them as utf-8 from my MySQL database. The trouble is inserting them into SQLite using SQLAlchemy.
Please help. Thanks and Regards.
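For the SQLite side specifically, passing native unicode strings avoids the text_factory error entirely. Here is a minimal sketch using the stdlib sqlite3 module (the table and column names are made up for illustration, not taken from the question's schema):

```python
import sqlite3

# In-memory database; "mytable"/"blobby" are hypothetical names for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (blobby TEXT)")

# Insert native unicode strings directly -- no str()/encode() round trips needed.
values = ["Hello", "héllo wörld", "こんにちは"]
conn.executemany("INSERT INTO mytable (blobby) VALUES (?)", [(v,) for v in values])

# The multilingual characters survive the round trip intact.
rows = [row[0] for row in conn.execute("SELECT blobby FROM mytable")]
print(rows)
```

The point of the sketch: as long as what you hand the driver is already a unicode string (not an encoded byte string), no text_factory configuration is needed.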

So here is what I did to get rid of my encoding hell, and this error.
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 10: ordinal not in range(128)
I shall explain it stepwise.
Use this command: type(mydirtyunicodestring).
If it returns <type 'str'>, it means you have a byte string and must convert it to unicode to get rid of the error.
Use dirtystring.decode('utf-8') (or whichever encoding applies) to get your value to <type 'unicode'>. There you go: no more 8-bit errors saying (ProgrammingError) You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str).
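In Python 3 terms the same check and fix look like this (Python 2's str/unicode pair became bytes/str):

```python
# A byte string containing UTF-8 encoded data -- the Python 3 analogue of
# Python 2's <type 'str'>.
dirty = "ä string".encode("utf-8")
print(type(dirty))   # <class 'bytes'> -- needs decoding

# Decode it to a real unicode string (Python 2: <type 'unicode'>, Python 3: str).
clean = dirty.decode("utf-8")
print(type(clean))   # <class 'str'>
print(clean)
```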

Related

Writing "latin1" encoded byte array with escape characters

I'm working on a database import/export process in VB.NET which writes data from a MySQL (5.5) database to a plain text file. The application reads the data to a DataTable, then goes through the rows/columns to actually write the data to the OutputFile (System.IO.StreamWriter object). The encoding on the tables in this database is Latin1. There is a MediumBlob field in one of the tables I've been using for testing which contains image files stored as a byte array.
In my attempts to validate the output from my application, I've exported the data directly from the database using the MySQL Workbench, then compared that with the results I get when I write the same data from my application. In the direct export from MySQL Workbench, I see some of these bytes are exported with the backslash. When I read the data through my application, however, this escape character does not appear. Viewed through Notepad++, it clearly shows some distinct differences between the two output results (see screenshot).
Obviously, while apparently very similar, the two are not completely identical. My application is not including the backslashes for escaped characters, and some characters such as NULL are coming out differently altogether. My code for writing this field to the file is:
OutputFile.Write("'" & System.Text.Encoding.GetEncoding(28591).GetString(CType(COPYRow(ColumnIndex), Byte())) & "'")
There doesn't appear to be an overload for the GetString method that allows me to specify an escape character, so I'm wondering if there's another way that, using this method, I can ensure the characters are correctly encoded, including escape characters.
I'm "assuming" that this method should also work in general when I start working with my PostgreSQL database, but with possibly a different encoding. I'm trying to build things as "generic" as possible, but I'll have to worry about specifying encodings at run-time instead of hard-coding them later.
EDIT
I just ran across another SO question, which might point me in the right direction: Convert a Unicode string to an escaped ASCII string. Obviously, it might take a bit more work to get it right, but this looks like the closest thing to what I'm trying to accomplish.
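To make the escaping concrete: the backslashes the MySQL Workbench export adds come from MySQL's string-escaping rules. A hypothetical Python sketch of that step (the escape set below approximates what mysqldump uses; verify it against your actual target's rules before relying on it):

```python
# Hypothetical sketch of MySQL-style backslash escaping applied to raw bytes
# before writing them out as latin1 text.
MYSQL_ESCAPES = {
    0x00: b"\\0",   # NUL
    0x0A: b"\\n",   # newline
    0x0D: b"\\r",   # carriage return
    0x1A: b"\\Z",   # Ctrl-Z
    0x22: b'\\"',   # double quote
    0x27: b"\\'",   # single quote
    0x5C: b"\\\\",  # backslash
}

def escape_mysql_bytes(data: bytes) -> str:
    out = bytearray()
    for b in data:
        out += MYSQL_ESCAPES.get(b, bytes([b]))
    # Decode as latin1 (code page 28591) for writing to a text file, mirroring
    # Encoding.GetEncoding(28591).GetString(...) in the VB.NET code above.
    return out.decode("latin-1")

print(escape_mysql_bytes(b"ab\x00c'd\\e"))
```

This mirrors the structure of the VB.NET code in the question: decode bytes as latin1 for output, but escape the problem bytes first rather than passing them through GetString unchanged.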

Is data returned from a MySQL Connector/C query not in native C data format?

If I execute a query against the MySQL Connector/C library, all the data I get back appears to be in plain char * format, including numerical data types.
For example, if I execute a query that returns 4 columns, all of which are INTEGER in MySQL, rather than getting back 4 bytes worth of data (each byte representing a single column row value), I'm actually getting back 4 ASCII encoded character bytes, where 1 is actually a byte with the numeric value 49 in it (ASCII for 1).
Is this accurate, or am I just missing something completely?
Do I really need to then atoi that returned byte into an int in my code or is there a mechanism to get the native C data types out of the MySQL client directly?
I guess my real question is: is the mysql_store_result structure converting that data to ASCII encoded representations in a way that can be bypassed by my application code?
I believe the data is sent on the wire as text in the MySQL protocol (I just confirmed this with Wireshark). So that means mysql_store_result() is not converting the data, it's just simply passing the data on as it was received. MySQL actually sends integers as text. I agree this always seemed like an odd design to me as well.
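A quick way to see what this means in practice: the integer 1 travels over the text protocol as the ASCII character '1' (byte value 49), not as a native machine integer. A pure-Python illustration, no MySQL connection involved:

```python
# Over MySQL's text protocol, the INTEGER value 1 arrives as the ASCII
# character '1' (byte value 49), not as a native 4-byte int.
wire_value = b"1"
print(wire_value[0])   # 49 -- the ASCII code for '1'

# Client code therefore converts it, just as atoi() would in C.
native_value = int(wire_value)
print(native_value)    # 1
```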
MySQL originally only offered the Text Protocol that you are currently using, in which (as you note) results are encoded as strings. MySQL v4.1 (released in April 2003) introduced the Prepared Statement protocol, which (amongst other things) transmits results in a binary format.
See C API Prepared Statements for more information on how to use the latter protocol with Connector/C.

'FROM_BASE64' Function Returns Hex Value

I just upgraded the MySQL that I am using to MySQL version 5.6.14. When I issued this query,
SELECT FROM_BASE64(TO_BASE64('MySQL'));
I received 4d7953514c (hex value) as the answer instead of 'MySQL'. What is actually the problem? Is there anything that I have to do to unhex it?
NOTE: The UNHEX function in my MySQL also returns the same thing. If a hex value is given to UNHEX function, I will receive the same hex value again.
Thank you in advance.
The MySQL client displays binary data using hexadecimal notation by default. The result of TO_BASE64 is a VAR_STRING with binary collation and, given that the original character set/collation has already been lost, FROM_BASE64 has no way of guessing which character set was used when producing those bytes. As such, this particular client application (others may do it differently) chooses the safer route and just displays the hexadecimal notation.
You can change this behavior by providing the option --skip-binary-as-hex when starting the mysql client. Just mind that if you don't know the source of the data, this may yield unexpected results. See the documentation for --binary-as-hex for additional ways to work around this.
I'd also recommend turning on the display of result set metadata, as it lets you better understand what is causing this behavior.
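The behavior is easy to reproduce outside the mysql client: the bytes coming back from FROM_BASE64 really are 'MySQL', just shown in hex. A pure-Python illustration:

```python
import base64

# TO_BASE64('MySQL') / FROM_BASE64(...) round trip, done in Python.
encoded = base64.b64encode(b"MySQL")
decoded = base64.b64decode(encoded)
print(decoded)          # b'MySQL'

# The mysql client's default display of a binary result is its hex notation:
print(decoded.hex())    # 4d7953514c
```

So 4d7953514c is not a wrong answer; it is the correct bytes rendered in the client's hex notation for binary results.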

encoding issues between python and mysql

I have a weird encoding problem from my PyQt app to my mysql database.
I mean weird in the sense that it works in one case and not the other ones, even though I seem to be doing the exact same thing for all.
My process is the following:
I have some QFocusOutTextEdit elements in which I write text possibly containing accents and stuff (é,à,è,...)
I get the text written with :
text = self.ui.text_area.toPlainText()
text = text.toUtf8()
Then to insert it in my database I do :
text= str(text).decode('unicode_escape').encode('iso8859-1').decode('utf8')
I also set the character set of my database, the specific tables and the specific columns of the table to utf8.
It works for one of my text areas, but for the other ones it puts weird characters in my db instead.
Any hint appreciated on this !
RESOLVED :
sorry for the disturbance, apparently I had some fields in my database that weren't up to date and this was blocking the process of encoding somehow.
You are doing a lot of encoding, decoding, and reencoding which is hard to follow even if you know what all of it means. You should try to simplify this down to just working natively with Unicode strings. In Python 3 that means str (normal strings) and in Python 2 that means unicode (u"this kind of string").
Arrange for your connection to the MySQL database to use Unicode on input and output. If you use something high-level like Sqlalchemy, you probably don't need to do anything. If you use MySQLdb directly make sure you pass charset="UTF8" (which implies use_unicode) to the connect() method.
Then make sure the value you are getting from PyQT is a unicode value. I don't know PyQT. Check the type of self.ui.text_area or self.ui.text_area.toPlainText(). Hopefully it is already a Unicode string. If yes: you're all set. If no: it's a byte string which is probably encoded in UTF-8, so you can decode it with result.decode('utf8'), which will give you a Unicode object.
Once your code is dealing with all Unicode objects and no more encoded byte strings, you don't need to do any kind of encoding or decoding anymore. Just pass the strings directly from PyQT to MySQL.
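The fragile chain in the question (encode('iso8859-1').decode('utf8')) can be illustrated directly: it undoes one specific mistake, reading UTF-8 bytes as latin1, and only works when nothing has mangled the bytes in between. A pure-Python sketch:

```python
# A unicode string with an accent, as it should exist inside the application.
text = "é"

# If UTF-8 bytes are mistakenly read as latin1 (iso8859-1), you get mojibake:
mojibake = text.encode("utf-8").decode("iso8859-1")
print(mojibake)   # Ã©

# The question's encode('iso8859-1').decode('utf8') chain undoes exactly that
# mistake -- but only if the intermediate bytes survived unchanged.
repaired = mojibake.encode("iso8859-1").decode("utf-8")
print(repaired == text)   # True
```

Which is why working with unicode strings end to end, as the answer recommends, is more robust than repairing the damage after the fact.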

SQL storing MD5 in char column

I have a column of type char(32) where I want to store an MD5 hash key. The problem is I've used SQL to update the existing records using the HashBytes() function, which creates values like
:›=k! ©úw"5Ýâ‘<\
but when I do the insert via .NET it comes through as
3A9B3D6B2120A9FA772235DDE2913C5C
What do I need to do to get these to match up? Is it the encoding?
HashKey isn't a SQL function, did you mean HASHBYTES? Some actual code would help. SQL appears to be computing the raw binary hash and displaying it as ASCII characters.
.NET is computing the hash, then converting it to hexadecimal (or so it appears). CHAR(32) isn't a good way to store raw binary data, you would want to use the BINARY type.
An Example in SQL:
SELECT SUBSTRING(sys.fn_varbintohexstr(HASHBYTES('MD5',0x2040)),3, 32)
And an Example in .NET:
using (MD5 md5 = MD5.Create())
{
var data = new byte[] { 0x20, 0x40 };
var hashed = md5.ComputeHash(data);
var hexHash = BitConverter.ToString(hashed).Replace("-", "");
Console.Out.WriteLine("hexHash = {0}", hexHash);
}
These will both produce the same value. (Where 0x2040 is sample data).
You can store the data either as hexadecimal in CHAR(32) or as raw bytes in BINARY(16). Storing the binary data is twice as space-efficient as storing it as hex. What you should not be doing is storing the raw binary data in CHAR(16).
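For comparison, the same two representations in Python: the raw 16-byte digest (what belongs in BINARY(16)) versus the 32-character hex string (what belongs in CHAR(32)). This uses the same sample data, 0x2040, as the SQL and .NET examples above:

```python
import hashlib

# Same sample data as the SQL/.NET examples above: the two bytes 0x20 0x40.
digest = hashlib.md5(b"\x20\x40").digest()   # raw binary -> BINARY(16)
hex_digest = digest.hex().upper()            # hex text   -> CHAR(32)

print(len(digest))      # 16
print(len(hex_digest))  # 32
print(hex_digest)
```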
It's not clear what you mean by "when I do the insert via .NET" - but you shouldn't be storing binary data just in its raw form, as it looks like you're doing using HashKey(). (Do you definitely mean HashKey, by the way? I can't find a reference for it, but there's HashBytes...)
Two common options are to encode the raw binary data as hex - which it looks like you're doing in the second case - or to use base64. Either way should be easy from .NET (Base64 marginally easier, using Convert.ToBase64String) and you probably just need to find the equivalent SQL Server function.
MD5 is typically stored in hex encoding. I'd guess that your HashKey() SQL function is not hex-encoding the MD5 hash; rather it's just returning the ASCII characters representing the raw hash bytes. But your .NET method is hex-encoding. If you store your MD5 hashes consistently as hex (or not; up to you, but hex is usual), the results between the two should always be consistent.
For example, the : symbol from your SQL hash is the first character returned from HashKey(). In the .NET method, the first two characters are 3A. Hex 3A is 58 in decimal, and ASCII code 58 is the colon (:) character. Similarly, you can work your way through each of the other characters and do the hex conversion.
See any ASCII codes table for reference, i.e. http://www.asciitable.com/