This problem began as the commonly-seen “incompatible character encodings: ASCII-8BIT and UTF-8” problem, but that is not what I'm asking. Rather, I discovered that this problem was happening because certain fields of my database are being tagged as ASCII-8BIT when they're retrieved, while most are correctly shown as UTF-8.
For example, in a table with columns country and nationality, where both columns in row 16 have identical values (copied-and-pasted), I get
irb(main):003:0> c = Country.find(16)
irb(main):004:0> puts "#{c.name}, #{c.name.encoding}, #{c.name.bytes.to_a}"
�land Islands, UTF-8, [195, 133, 108, 97, 110, 100, 32, 73, 115, 108, 97, 110, 100, 115]
irb(main):005:0> puts "#{c.nationality}, #{c.nationality.encoding}, #{c.nationality.bytes.to_a}"
�land Islands, ASCII-8BIT, [195, 133, 108, 97, 110, 100, 32, 73, 115, 108, 97, 110, 100, 115]
Likewise, a simple puts name gives �land Islands while for nationality it gives "\xC3\x85land Islands" -- same bytes, different presentation.
The encoding reported for a given column appears to be constant regardless of whether a string contains non-ASCII characters, so it is not simply a problem with an individual string. That is, all the values in nationality are tagged ASCII-8BIT and all those in name UTF-8.
The problem is not limited to a single table, and I have not found any pattern to which columns are mis-recognized.
Here are the settings and environment:
Rails 3.0.0 on Windows 7 64-bit
Database adapter: mysql2 and mysql both show the same behavior
database.yml includes encoding: utf8
application.rb includes config.encoding = "utf-8"
MySQL database, table, and both columns are defined as utf8
Both columns in MySQL are varchar, 255, allow null
I can reproduce the problem with a fresh installation of Rails and nothing defined except the Country model used to access the database. I have not yet tried with a fresh, one-row database.
Anyone know what's going on here?
Same problem here! mysql2 gem, Rails 3.0.3 and "incompatible character encodings" errors
I found the solution: use the ruby-mysql gem instead of the mysql or mysql2 gems.
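If switching gems is not an option, a possible workaround: since the byte dumps above show the data is already valid UTF-8, you can retag the strings after they come out of the adapter with force_encoding, which only relabels the bytes and does not transcode them. A minimal sketch using the exact bytes from the question:

```ruby
# The adapter hands back correct UTF-8 bytes mislabeled as ASCII-8BIT,
# exactly as in the question's byte dump for "Åland Islands".
raw = [195, 133, 108, 97, 110, 100, 32, 73, 115, 108, 97, 110, 100, 115]
nationality = raw.pack("C*")       # pack produces a binary (ASCII-8BIT) string
puts nationality.encoding          # => ASCII-8BIT

# force_encoding relabels the same bytes as UTF-8 without changing them
fixed = nationality.force_encoding("UTF-8")
puts fixed                         # => Åland Islands
puts fixed.valid_encoding?         # => true
```

In a model this could be wrapped in an after_find callback that calls force_encoding("UTF-8") on the affected string attributes; after_find is standard ActiveRecord, but whether it is the right hook for this particular Rails 3 setup is an assumption.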
I'm having a similar problem.
When I retrieve data from the MySQL DB, Rails converts strings to floats.
The columns "Complemento" and "estado" are strings in the DB, but the "show" action says:
Escola was successfully created.
Nome: Silva Braga
Endereco: Rua Dr Arnaldo
Numero: 99
Complemento: 0.0 -> DB is "apto 191"
Cidade: São Paulo
Estado: 0.0 -> DB is "MG"
Edit | Back
I have 100 polymers and I want to compare their solubility by their fingerprint.
Using RDKit, I get a list of bits for each polymer, like [39, 80, 152, 233, 234, 265, 310, 314, 321, 356, 360, 406, 547, 650, 662, 726, 730, 801, 819, 849, 935], but I ran into this error: "could not convert string to float".
My first question is: how can I get a single bit vector for each polymer?
And how can I define each bit as a single feature in RDKit?
Based on your problem, I believe you are using a Morgan fingerprint with radius=2 and fpSize=1024. However, a count fingerprint results in a list of hashed values.
If you want to do comparisons, I suggest you use rdkit.Chem.rdMolDescriptors.GetMorganFingerprintAsBitVect (see #1).
If you want to use a count fingerprint, see #2 and search for this passage: "The types of atom pairs and torsions are normal (default), hashed and bit vector (bv). The types of the Morgan fingerprint are bit vector (bv, default) and count vector (count)."
If you want the result as an np.array, you can run bv = GetMorganFingerprintAsBitVect(mol, radius=your_radius, nBits=1024).ToBitString(), then run np.frombuffer(bv.encode(), dtype=np.uint8) - 48.
However, I cannot provide a more specific explanation and solution without the code, so please provide it for further support. Thank you.
#1: https://www.rdkit.org/docs/source/rdkit.Chem.rdMolDescriptors.html?highlight=getmorganfingerprintasbitvect#rdkit.Chem.rdMolDescriptors.GetMorganFingerprintAsBitVect
#2: https://www.rdkit.org/docs/GettingStartedInPython.html
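To illustrate that last conversion step concretely without needing RDKit installed: ToBitString() returns a plain string of '0'/'1' characters, and since '0' is ASCII 48 and '1' is ASCII 49, subtracting 48 from the raw byte values yields 0/1 integers. The bit string below is made up for the sketch; with RDKit it would come from GetMorganFingerprintAsBitVect(...).ToBitString():

```python
import numpy as np

# Stand-in for GetMorganFingerprintAsBitVect(mol, 2, nBits=8).ToBitString();
# a real 1024-bit fingerprint would be 1024 characters long.
bit_string = "01001011"

# '0' is ASCII 48 and '1' is ASCII 49, so subtracting 48 gives 0/1 ints
arr = np.frombuffer(bit_string.encode(), dtype=np.uint8) - 48
print(arr)  # [0 1 0 0 1 0 1 1]
```

The resulting 0/1 vectors can then be compared across polymers, for example with a Tanimoto similarity computed from the arrays.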
I'm working with a Rails gem called Fie and ran into an issue with it that I would like to solve, but I'm having trouble doing so. Fie has some lines where it stores a Marshal dump of an ActiveRecord::Base in JSON, and I'm running into an encoding error. I have been able to replicate this across different machines and versions of Rails, all 5.2 and greater.
Easiest way to reproduce is:
[5] pry(main)> Marshal.dump(User.first).to_json
User Load (29.8ms) SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT $1 [["LIMIT", 1]]
Encoding::UndefinedConversionError: "\x80" from ASCII-8BIT to UTF-8
from /home/chris/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/activesupport-5.2.1/lib/active_support/core_ext/object/json.rb:38:in `encode'
Digging in, I tried a few things but was unable to make it work. It seems that a Marshal dump is ASCII-8BIT but JSON wants UTF-8. I was unable to force the encoding.
> User.first.to_json.encoding
=> #<Encoding:UTF-8>
> Marshal.dump(User.first).encoding
=> #<Encoding:ASCII-8BIT>
> { foo: Marshal.dump(object).force_encoding("ASCII-8BIT").encode("UTF-8") }.to_json
Encoding::UndefinedConversionError: "\x80" from ASCII-8BIT to UTF-8
from (pry):139:in `encode'
> { foo: Marshal.dump(object).force_encoding("ISO-8859-1").encode("ASCII-8BIT") }.to_json
Encoding::UndefinedConversionError: U+0080 to ASCII-8BIT in conversion from ISO-8859-1 to UTF-8 to ASCII-8BIT
ruby 2.5.1
Rails 5.2.1
git issue I opened
I had this issue and fixed it by using:
Marshal.dump(value).force_encoding("ISO-8859-1").encode("UTF-8")
I hope this helps!
But as Tom Lord suggested, you should be a bit more specific with your question to help us know what you are trying to achieve.
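Why the ISO-8859-1 trick works, and a caution: ISO-8859-1 defines all 256 byte values, so any binary string converts cleanly to UTF-8, and the conversion is reversible; but you must reverse it before Marshal.load. An alternative that sidesteps encoding entirely is Base64, which produces a pure-ASCII string that is always JSON-safe. A sketch of both, using a plain hash as a stand-in for the ActiveRecord object (Marshal round-trips any plain Ruby object the same way):

```ruby
require "base64"
require "json"

record = { id: 1, name: "Ada" }   # stand-in for User.first
dump   = Marshal.dump(record)     # ASCII-8BIT, may contain bytes like \x80

# Option 1: relabel bytes as ISO-8859-1, then transcode to UTF-8.
# Every byte is valid ISO-8859-1, so this never raises -- but the
# mapping must be reversed before Marshal.load.
utf8  = dump.force_encoding("ISO-8859-1").encode("UTF-8")
json1 = { blob: utf8 }.to_json
bytes = JSON.parse(json1)["blob"].encode("ISO-8859-1").force_encoding("ASCII-8BIT")
restored1 = Marshal.load(bytes)

# Option 2: Base64 -- the encoded form is pure ASCII, so to_json is safe.
json2 = { blob: Base64.strict_encode64(dump) }.to_json
restored2 = Marshal.load(Base64.strict_decode64(JSON.parse(json2)["blob"]))
```

Base64 costs about 33% extra space but avoids any chance of a lossy or forgotten reverse conversion, which is why it is often the safer choice for binary-in-JSON.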
I'm having an issue with PL/JSON chopping off string values at exactly 5000 characters.
Example data: {"n1":"v1","n2":"v2","n3":"10017325060844,10017325060845,... this goes on for a total of 32,429 characters ...10017325060846,10017325060847"}
After I convert the JSON string to an object I run this...
dbms_output.put_line(json_obj.get('n3').get_string);
And it only outputs the first 5000 characters. So I did some digging, see line 26 of this code. And right below it at line 31 the extended_str is being set and contains all 32,429 chars. So now let's move on to the get_string() member function. There are two of them. I verified that it's the first one that is being called, the one with the max_byte_size and max_char_size parameters. Both of those parameters are null. So why is my text being chopped off at 5000 characters? I need this to work for data strings of varchar2(32767) and clobs. Thanks!
Version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
UPDATE: I found that the chopping of the text is coming from line 35: dbms_lob.read(str, amount, 1, self.str);. I ignored this code before because I saw the comment and knew my string wasn't null. So why is this read needed? Is this a bug?
As maintainer of the pljson project I have answered your question on GitHub (https://github.com/pljson/pljson/issues/154). For any further questions, feel free to ask on the same issue thread on GitHub.
I am getting the following Exception when trying to save a row to a db:
Unexpected error: (<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('latin-1', u"First 'A\u043a' Last", 7, 8, 'ordinal not in range(256)'), <traceback object at 0x106562908>)
Before inserting, I am converting every string in a dictionary to latin-1 like this:
for k, v in row.items():
    if type(v) is str:
        row[k] = v.decode('utf-8').encode('latin-1')
The offending character seems to be u'\u043a' (in u"First 'A\u043a' Last"); in other cases there are other characters also "not within range."
Help appreciated.
Solved. The problem was attempting to decode a string that was already UTF-8. I also added 'ignore' to the encode() arguments:
v.encode('latin-1', 'ignore')
This drops any non-encodable characters; passing 'replace' instead substitutes a '?' for each of them.
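A Python 3 sketch of the same pitfall and the two error handlers (the original snippet is Python 2, where str has a decode method; in Python 3 text is already str and only needs encode):

```python
# The problem string from the traceback: 'к' (U+043A) has no Latin-1 mapping.
text = "First 'A\u043a' Last"

# Strict encoding reproduces the UnicodeEncodeError:
try:
    text.encode("latin-1")
except UnicodeEncodeError as e:
    print(e.reason)  # 'ordinal not in range(256)'

# 'replace' substitutes '?'; 'ignore' drops the character entirely:
replaced = text.encode("latin-1", "replace")   # b"First 'A?' Last"
ignored  = text.encode("latin-1", "ignore")    # b"First 'A' Last"
```

Note that both handlers lose information; if the target column can be made UTF-8, no conversion is needed at all.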
When I insert certain strings coming in from API calls into my db, they get cut off at certain characters. This is with ruby 1.8.7. I have everything set to utf8 app-wide and in MySQL. I typically don't have any problem entering utf8 content into the DB in other parts of the app.
It's supposed to be "El Soldado y La Muñeca". If I insert it into the db, only this makes it in: "11 El Soldado y La Mu".
>> name
=> "11 El Soldado y La Mu?eca(1).mp3"
>> name[20..20]
=> "u"
>> name[21..21]
=> "\361"
>> name[22..22]
=> "e"
Is that a UTF-8 character?
I know that Ruby 1.8 isn't encoding-aware, but to be honest I always forget how this should affect me -- I always just set everything at all the other layers to UTF-8 and everything is fine. Why doesn't that work now?
update
CORRECTION -- I was wrong: it's not coming from the API, it's coming from the file system.
The wrongly encoded character is coming from inside the house!
New question: how can I get UTF-8 characters from File#path?
You are somehow getting a Latin-1 (AKA ISO-8859-1) ñ rather than a UTF-8 ñ. In Latin-1 the ñ is 361 in octal (hence your single byte "\361"). In UTF-8 that lower case tilde-n should be \303\261 (i.e. bytes 0303 and 0261 in octal or 0xc3 and 0xb1 in hex).
You might have to start playing with Iconv on the Ruby side to make sure you get everything in UTF-8.
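A sketch of the repair in modern Ruby (1.9+), where String#encode replaces Iconv; on 1.8 the equivalent would be Iconv.conv('UTF-8', 'ISO-8859-1', name). The byte values are the ones from the answer above (\361 = 0xF1 for Latin-1 ñ; 0xC3 0xB1 for UTF-8 ñ):

```ruby
# The filename came back with a single Latin-1 byte \361 (0xF1) for ñ
name = "11 El Soldado y La Mu\xF1eca(1).mp3".force_encoding("ISO-8859-1")

# encode actually transcodes: the 0xF1 byte becomes the two bytes 0xC3 0xB1
utf8 = name.encode("UTF-8")
puts utf8                                        # => 11 El Soldado y La Muñeca(1).mp3
puts utf8.bytes.select { |b| b > 127 }.inspect   # => [195, 177]  (0xC3 0xB1)
```

Note that encode changes the bytes, unlike force_encoding, which only relabels them; here the bytes really are Latin-1, so transcoding is the right operation.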