Encoding latin1 to UTF8 in Rails - mysql

I have a legacy MySQL database with latin1 encoding, and new MySQL database with utf-8 encoding. When I migrate the data from the legacy to new one, the strings returned and rendered in webpages contains weird characters.
example: â¥Chocolate biscuitâ¢
There's a model that connects to the old database to get the records, and use a rake task to map the old records and create new records
class LegacyDatabase < ActiveRecord::Base
self.abstract_class = true
establish_connection(
adapter: "mysql2",
database: "old_database",
encoding: "latin1"
)
end
I referred to this website to solve the encoding problems, but the fallback hash is hardcoded. I need a dynamic one so that I don't need to add the fallbacks each time when a loop that fixes encoding is called.
def fix_encoding(str)
# Referring to the solution from the website
new_str = str.encode('cp1252',:fallback => {
"\u0080" => "\x80".force_encoding("cp1252"),
"\u0081" => "\x81".force_encoding("cp1252"),
"\u008D" => "\x8D".force_encoding("cp1252"),
"\u008F" => "\x8F".force_encoding("cp1252"),
"\u0090" => "\x90".force_encoding("cp1252"),
"\u009D" => "\x9D".force_encoding("cp1252"),
"\u0098" => "\x98".force_encoding("cp1252"),
"\u0099" => "\x99".force_encoding("cp1252")
}).force_encoding("utf-8")
return new_str
end
I want to change it to be dynamic, but I failed to convert it.
# How to do it in dynamic?
new_str = str.encode('cp1252', :fallback => Proc.new { |v| "#{v[4..5]}".force_encoding("cp1252") }).force_encoding("utf-8")
Or is there any other solution to fix the encoding problem?

Related

Perl issue when encoding mysql data from UTF-8 to UCS-2 for SMPP

I am trying to fetch UTF-8 accentuated characters "é" "ê" from mysql and convert them to UCS-2 when sending over SMPP. The data is stored as utf8_general_ci and I perform the following when opening the DB connection:
$dbh->{'mysql_enable_utf8'}=1;
$dbh->do("set NAMES 'utf8'");
If I test the sending part by hard coding the string value with "é" "ê" using data_encoding=8, it goes through perfectly. However if I comment out the first line and just use what comes from the DB, it fails. Also, if I try to send the characters using the DB and setting data_encoding=3, it also works fine, but then the "ê" would not appear, which is also expected. Here is what I use:
$fred = 'éêcole'; <-- If I comment out this line, the SMPP call fails
$fred = decode('utf-8', $fred);
$fred = encode('UCS-2', $fred);
$resp_pdu = $short_smpp->submit_sm(
source_addr_ton => 0x00,
source_addr_npi => 0x01,
source_addr => $didnb,
dest_addr_ton => 0x01,
dest_addr_npi => 0x01,
destination_addr => $number,
data_coding => 0x08,
short_message => $fred
) or do {
Log("ERROR: submit_sm indicated error: " . $resp_pdu->explain_status());
$success = 0;
};
The different values for the data_coding fields are the following:
Meaning of "data_coding" field in SMPP
00000000 (0) - usually GSM7
00000011 (3) for standard ISO-8859-1
00001000 (8) for the universal character set -- de facto UTF-16
The SMPP provider's documentation also mentions that special characters should be handled via UCS-2:
https://community.sinch.com/t5/SMS-365-enterprise-service/Handling-Special-Characters/ta-p/1137
How should I prepare the data that is coming out of the DB to make this SMPP call work?
I am using Perl v5.10.1
Thanks !
$dbh->{'mysql_enable_utf8'} = 1; is used to decode the values returned from the database, causing queries to return decoded text (strings of Unicode Code Points). It makes no sense to decode such a string. Go straight to the encode.
my $s_ucp = "\xE9\xEA\x63\x6F\x6C\x65"; # éêcole
# -or-
use utf8; # Script is encoded using UTF-8.
my $s_ucp = "éêcole";
printf "%vX\n", $s_ucp; # E9.EA.63.6F.6C.65
my $s_ucs2be = encode('UCS-2', $s_ucp);
printf "%vX\n", $s_ucs2be; # 0.E9.0.EA.0.63.0.6F.0.6C.0.65
SET NAMES says the encoding you have/want in the client. That is, regardless of the encoding in the table, MySQL will convert it to whatever SET NAMES says during a SELECT.
So, feed what comes from the SELECT directly to SMPP. (It won't be readable by most other clients.)
SET NAMES ucs2
(The collation is irrelevant to the encoding.)
You could ask the SELECT to convert with something like
CONVERT(col_name, CHAR UNICODE)
https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html

Ruby Mysql2 Client not taking backslash while insert

We are using production and staging databases in our application.
Our requirement is to insert all the records to staging database when ever a record is added in production database, so that both the servers are consistent and same data.
I have used Mysql2 client pool to connect to staging server and insert the record that is added to production.
here is my code:
def create
#aperson = Person.new
#person = #aperson.save
if #person && Rails.env == "production"
#add_new_person_to_staging
client = Mysql2::Client.new(:host => dbconfig[:host], :username => dbconfig[:username], :password => dbconfig[:password], :database => dbconfig[:database])
#person_result = client.query('INSERT INTO user_types(user_name, regex, code) Values ("myname" , "\.myregex\." , "ns" );')
end
end
Here "#person_result" record is inserted to mysql table but the "regex" column eliminates "\" slashes.
like : user_name = myname, regex = .myregex., code = ns
when I manually execute the "Insert" query in mysql command line it inserts as it is along with \ slash. but not through "client.query"
Why does \ slash is eliminated. please help me here.
Thanks.
\ is likely being removed by the MySQL2 client as part of a SQL injection protection preprocessor.
Have you looked at trying either a double backslash or using the escape method to properly escape the string?
Try using this
#person_result = client.query('INSERT INTO user_types(user_name, regex, code) Values (myname , "\."+myregx+".\" , ns )')

Datamapper not saving records to MySQL

Datamapper isn't saving my user models.
(This is a Sinatra webapp and the db is an AWS RDS mysql db.)
The User model:
class User
include DataMapper::Resource
property :uid, Serial
property :user, String, :key => true, :length => 3..20
property :pass, String, :required => true, :length => 6..50
end
The code to set it:
post "/register" do
username = params["username"]
password = params["password"]
begin
encrypted_password = BCrypt::Password.create password
meme = User.new :user => username, :pass => encrypted_password
meme.save
raise DatabaseError, "User record not saved" unless meme.saved?
flash[:register] = "Welcome, new user! Please log in now."
redirect "/login"
# disabled rescue stuff...
end
end
(if you want, test it yourself at dittoslash.uk)
(can i do this on stack overflow? edit this out if you can't)
EDIT: Updated validation rules. Now I'm getting an error of 'Pass must be between 6 and 50 characters long' (with a 28 (or 30?) character password)
max pleaner answered this for me.
For Googlers looking for answers:
check your validations, and make sure that encryption dosen't make the password longer than your maximum.

Inserting data from CSV into MySQL DB is very slow

Trying to insert data from a CSV file to a MySQL DB using Ruby, and it's very slow. Note that this is not a Rails application, just stand-alone Ruby script.
Here is my code:
def add_record (data1, data2, time)
date = DateTime.strptime(time, "%m/%d/%y %H:%M")
<my table>.create(data1: data1, data2: data2, time: date)
end
def parse_file (file)
path = #folder + "\\" + file
CSV.foreach(path, {headers: :first_row}) do |line|
add_record(line[4], line[5], line[0])
end
end
def analyze_data ()
Dir.foreach #folder do |file|
next if file == '.' or file == '..'
parse_file file
end
end
And my connection:
#connection = ActiveRecord::Base.establish_connection(
:adapter=> "mysql2",
:host => "localhost",
:database=> <db>,
:username => "root",
:password => <pw>
)
Any help appreciated.
Use Load Data Infile.
Here is a nice article on performance and strategies titled Testing the Fastest Way to Import a Table into MySQL. Don't let the mysql version of the title or inside the article scare you away. Jumping to the bottom and picking up some conclusions:
The fastest way you can import a table into MySQL without using raw
files is the LOAD DATA syntax. Use parallelization for InnoDB for
better results, and remember to tune basic parameters like your
transaction log size and buffer pool. Careful programming and
importing can make a >2-hour problem became a 2-minute process. You
can disable temporarily some security features for extra performance
You might just find your times greatly reduced.
Use the zdennis/activerecord-import gem. you can insert tons records quickly.

Ruby mysql2 error when executing statements in rapid succession

I have a strange issue using the Mysql2 client in Ruby. When trying to execute the following:
client.query("CREATE DATABASE ...; INSERT INTO ..."); #SQL truncated for brevity
client.query("SELECT 1 FROM ...") #SQL truncated for brevity
Ruby throws an error that the table I'm selecting from doesn't exist. However if I try the following:
client.query("CREATE DATABASE ...; INSERT INTO ..."); #SQL truncated for brevity
sleep 1
client.query("SELECT 1 FROM ...") #SQL truncated for brevity
The query works with no problems. It seems as though I need to give the MySQL server some time to load the data before I'm able to query it. Can anyone explain why this is happening and how to programmatically overcome this without using sleep?
Update
I initialize the client as so:
Mysql2::Client.new({
:adapter => "mysql2",
:host => ip_address,
:username => db_username,
:password => db_password,
:flags => Mysql2::Client::MULTI_STATEMENTS
})
I checked the 'query_options' attribute and async is set to false. I have tried explicitly setting the async => false flag to no avail.
The same issue happens if I use
Model.connection.execute(SQL HERE)
Note, this is all executed from within a Rails unit test.
Thanks
For some reason the only thing that ended up working and not needing the sleep 1 in between is the following:
#model = Model.new
Mysql2::Client.default_query_options[:connect_flags] |= Mysql2::Client::MULTI_STATEMENTS
#model.connection.reconnect!