Does PL/JSON support UTF-8 characters?

I'm using PL/JSON to parse data from MongoDB into Oracle DB. The packages work fine with Latin characters. However, whenever a JSON value contains Chinese characters, the resulting value in Oracle is totally corrupted (question marks, symbols, etc.).
As an example, I use the following line to parse:
Remarks := json_ext.get_string(json(l_list.get(i)),'Remarks');
I'm aware that the json type and the get_string function use varchar and not nvarchar. More importantly, the Oracle DB instance supports Chinese (when I insert directly into a table, no corruption occurs; it is only when I parse a Chinese JSON file that the data gets corrupted).
So my question is: does PL/JSON support Chinese characters? Do I need to update the package on my own to accommodate Chinese characters? What exactly needs to be fixed?
In addition, I ran the following:
select value from nls_database_parameters where parameter='NLS_CHARACTERSET';
and it returns AL32UTF8.
Doesn't that mean my DB is configured for UTF-8? Or are there other things that need to be checked for this purpose?
Update
Inspired by @jsumners' comment, I tried to track down the source of the corruption and found that it is there from the very beginning. That is, the snippet below runs before PL/JSON is used, in order to receive the HTTP response:
BEGIN
  LOOP
    UTL_HTTP.read_text(l_http_response, buf);
    DBMS_OUTPUT.PUT_LINE(buf);
    l_response_text := l_response_text || buf;
  END LOOP;
EXCEPTION
  WHEN UTL_HTTP.end_of_body THEN
    NULL;
END;
The line DBMS_OUTPUT.PUT_LINE(buf); already prints corrupted JSON fields. I'm not sure whether this is related to the way I send and receive HTTP requests. Note that buf is an nvarchar2.
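Question marks are the classic symptom of a lossy character-set conversion happening somewhere between the HTTP response body and the session character set; with UTL_HTTP, the response body charset (see UTL_HTTP.set_body_charset) is one thing worth checking. As a concept demo only, here are the two failure modes shown in Python:

# Illustration only: two ways Chinese text typically gets mangled.
text = u"你好"  # two Chinese characters

# 1. Converting into a character set that cannot represent CJK
#    replaces each character, classically with a question mark:
print(text.encode("latin-1", errors="replace"))  # b'??'

# 2. Decoding the UTF-8 bytes with the wrong character set
#    yields mojibake instead:
print(text.encode("utf-8").decode("latin-1"))    # ä½ å¥½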

Related

Problems with bad characters in UTF8 JSON strings with Delphi XE

We have a Delphi XE application that exchanges JSON data with a cloud application written in Node.
Generally everything works fine, but every once in a while we get bad (unknown) characters in some of our strings, and we are having problems tracking it down. These characters are rendered as diamonds and have a character code of 65533.
We perform a REST POST call to get our data from the cloud, and we get it as a JSON object that includes some metadata and an array of records. We use DBXJson's TJsonObject for the JSON parsing, in the form of
jsv := TJSONObject.parseJsonValue(s)
where s is the data we got from our POST call.
From this we use a TJsonArray to get the records, traverse them, and use JSonValue.ToString to retrieve the string values.
The data is stored in DBIsam using VarChar fields.
Any ideas on how to detect and prevent these "bad" characters, or on where else this could happen?
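Character code 65533 is U+FFFD, the Unicode replacement character: a decoder emits it when it runs into bytes that are invalid in the expected encoding, so somewhere a non-UTF-8 byte sequence is being decoded as UTF-8. A minimal Python illustration of the mechanism:

# 'café' encoded in Latin-1; the trailing 0xE9 byte is not valid UTF-8.
bad = b"caf\xe9"
s = bad.decode("utf-8", errors="replace")  # the decoder substitutes U+FFFD
print(s)           # caf� (rendered as a diamond)
print(ord(s[-1]))  # 65533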

SQLAlchemy convert field using utf8

How do I get the equivalent SQLAlchemy statement for this SQL query?
select CONVERT(blobby USING utf8) as blobby from myTable where CONVERT(blobby USING utf8)="Hello";
where blobby is a blob-type field that has to be converted ("unblobbed") to a string.
I tried this
DBSession.query(myTable).filter(myTable.blobby.encode(utf-8)="Hello").first()
But it doesn't work and gives an odd error:
AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with myTable.blobby has an attribute 'encode'
Moreover, I tried inserting into the database as follows:
que=myTable(blobby=str(x.blobby))
DBSession.add(que)
This also returns an error that says
(ProgrammingError) You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
This error goes away if I insert like
que=myTable(blobby=str(x))
but then the inserted entries do not support foreign-language values.
EDIT:
This is what I am trying to do: I have a list of UTF-8 strings. The SQLAlchemy column into which I am trying to insert the strings has the property convert_unicode set to true. Now, I have to insert these strings into my database without loss of data. I need to preserve the multilingual characters, which worked fine when I fetched them as UTF-8 from my MySQL database. The trouble is inserting them into SQLite using SQLAlchemy.
Please help. Thanks and regards.
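One way to express the original lookup without CONVERT is to compare the column against the UTF-8 bytes of the search string, since the blob holds UTF-8 encoded text anyway. A sketch only, reusing the question's myTable and DBSession names and assuming blobby maps to a binary column:

# Match the raw blob bytes against the UTF-8 encoding of "Hello";
# no CONVERT(... USING utf8) is needed on the SQL side.
row = (DBSession.query(myTable)
       .filter(myTable.blobby == u"Hello".encode("utf-8"))
       .first())
text = row.blobby.decode("utf-8") if row else None  # back to a unicode string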
So here is what I did to get rid of my encoding hell and this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 10: ordinal not in range(128)
I shall explain it stepwise.
Use this command: type(mydirtyunicodestring).
If it returns <type 'str'>, it means you have a byte string, and you have to convert it to Unicode to get rid of the error.
Use dirtystring.decode('utf-8') (or whatever encoding the bytes are in) to get your type to <type 'unicode'>. There you go: no more 8-bit errors that say "(ProgrammingError) You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str)".
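A compact Python 2 sketch of that check-then-decode step (variable names made up):

raw = "caf\xc3\xa9"            # <type 'str'>: UTF-8 encoded bytes
if isinstance(raw, str):       # it is a byte string...
    raw = raw.decode("utf-8")  # ...so decode it to <type 'unicode'>
print(type(raw))               # <type 'unicode'>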

JSON feed in UTF-8 without byte order marker

I have a WCF application written in C# that delivers my data in JSON or XML, depending on what the user asks for in the query string. Here is a quick snippet of the code that delivers the data:
Encoding utf8 = new System.Text.UTF8Encoding(false);
return WebOperationContext.Current.CreateTextResponse(data, "application/json", utf8);
When I deliver the data using the above method, the special characters are all messed up, so Chávez looks like ChÃ¡vez. On the other hand, if I create the utf8 variable above with the BOM, or use the enum (Encoding.UTF8), the special characters are fine. But then some of my consumers complain that their code throws an exception when consuming my API. This, of course, is happening because of the BOM in the feed. Is there a way for me to deliver the special characters correctly without the BOM in the feed?
It looks like the output is correct, but whatever you are using to display it expects ANSI-encoded text. ChÃ¡vez is what you get when you encode Chávez in UTF-8 and interpret the result as if it were Latin-1.
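The round trip the answer describes can be reproduced in one line of Python (illustration only; the mojibake is in the viewer, not in the WCF code):

# UTF-8 bytes of 'Chávez' reinterpreted as Latin-1 give the garbled form:
print(u"Chávez".encode("utf-8").decode("latin-1"))  # ChÃ¡vez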

Erlang emysql iPhone Emoji Encoding Issue

I'm trying to store text (with emoji) from an iPhone client app in a MySQL database with Erlang (into a varchar column).
I used to do it with a socket connection server written in C++ and mysqlpp, and it worked great. (It is the exact same database, so I can assume that the issue is not coming from the database.)
However, I decided to move everything to Erlang for scalability reasons, and since then I have been unable to store and retrieve emoji correctly.
I'm using emysql to communicate with my database.
When storing, I send this list to the database:
[240,159,152,130]
When retrieving, here is what I get:
<<195,176,194,159,194,152,194,130>>
There are obviously some similarities: we can see 159, 152 and 130 on both lines, but there is no 240, and I do not know where 195, 176 and 194 come from.
I thought about changing the emysql encoding when creating the connection pool:
emysql:add_pool(my_db, 3, "login", "password", "db.mydomain.com", 3306, "MyTable", utf8)
But I can't seem to find the proper atom for utf32 encoding. (The interesting thing is that I never set any encoding with C++ and mysqlpp; it worked out of the box.)
I have run some tests:
storing from C++, retrieving from C++ (Works fine)
storing from Erlang, retrieving from Erlang (Does not work)
storing from Erlang, retrieving from C++ (Does not work)
storing from C++, retrieving from Erlang (Does not work)
One more piece of information: I'm using prepared statements in Erlang, while I'm not in C++.
Any help would be appreciated.
As requested, here is the query used for storing the data:
UPDATE Table SET c=? WHERE id=?
Quite simple really...
It is all about UTF-8 encoding. In Erlang a list of characters, in your case [240,159,152,130], is normally not encoded; the elements are just the Unicode code points. When you retrieved the data you got a binary containing the UTF-8 encoded bytes of those characters. Exactly where this encoding occurred I don't know. From the Erlang shell:
10> Bin = <<195,176,194,159,194,152,194,130>>.
<<195,176,194,159,194,152,194,130>>
11> <<M/utf8,N/utf8,O/utf8,P/utf8,R/binary>> = Bin.
<<195,176,194,159,194,152,194,130>>
12> [M,N,O,P].
[240,159,152,130]
Handling Unicode in Erlang is pretty simple: characters in lists are usually the Unicode code points and are very rarely encoded, while storing them in binaries means you have to encode them in some way, as binaries are just arrays of bytes. The default encoding is UTF-8. The unicode module has functions for converting between Unicode lists and binaries.
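The observed bytes are consistent with a double encoding: each byte of the emoji's UTF-8 sequence was treated as a Latin-1 character and encoded as UTF-8 a second time. The same arithmetic, shown in Python for illustration:

# [240,159,152,130] is the UTF-8 encoding of U+1F602 (face with tears of joy).
emoji_utf8 = bytes([240, 159, 152, 130])
# Treat each byte as a Latin-1 code point and UTF-8 encode the result:
doubled = emoji_utf8.decode("latin-1").encode("utf-8")
print(list(doubled))  # [195, 176, 194, 159, 194, 152, 194, 130]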

encoding issues between python and mysql

I have a weird encoding problem going from my PyQt app to my MySQL database.
I mean weird in the sense that it works in one case and not in the others, even though I seem to be doing the exact same thing in all of them.
My process is the following:
I have some QFocusOutTextEdit elements in which I write text that may contain accents and the like (é, à, è, ...).
I get the written text with:
text = self.ui.text_area.toPlainText()
text = text.toUtf8()
Then, to insert it into my database, I do:
text= str(text).decode('unicode_escape').encode('iso8859-1').decode('utf8')
I also set the character set of my database, the specific tables, and the specific columns of the tables to utf8.
It works for one of my text areas, but for the other ones it puts weird characters in my db instead.
Any hint on this is appreciated!
RESOLVED:
Sorry for the disturbance; apparently I had some fields in my database that weren't up to date, and this was somehow blocking the encoding process.
You are doing a lot of encoding, decoding, and reencoding which is hard to follow even if you know what all of it means. You should try to simplify this down to just working natively with Unicode strings. In Python 3 that means str (normal strings) and in Python 2 that means unicode (u"this kind of string").
Arrange for your connection to the MySQL database to use Unicode on input and output. If you use something high-level like SQLAlchemy, you probably don't need to do anything. If you use MySQLdb directly, make sure you pass charset="utf8" (which implies use_unicode) to the connect() method.
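A minimal sketch of such a connection (credentials are hypothetical):

import MySQLdb

# charset="utf8" makes the driver talk UTF-8 on the wire and, together
# with use_unicode, hands Python unicode objects back to the caller.
conn = MySQLdb.connect(
    host="localhost", user="me", passwd="secret", db="mydb",
    charset="utf8", use_unicode=True,
)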
Then make sure the value you are getting from PyQt is a unicode value. I don't know PyQt. Check the type of self.ui.text_area or self.ui.text_area.toPlainText(). Hopefully it is already a Unicode string. If yes: you're all set. If no: it's a byte string, probably encoded in UTF-8, so you can decode it with something like result.decode('utf8'), which will give you a Unicode object.
Once your code is dealing with all Unicode objects and no more encoded byte strings, you don't need to do any kind of encoding or decoding anymore. Just pass the strings directly from PyQT to MySQL.