Using UTF and Hindi in CakePHP and MySQL

I've created a form that contains Hindi (UTF-8) data which I want to store in a MySQL table. The columns corresponding to the UTF-8 data have their collation set to utf8_general_ci.
I've successfully stored the data in the table, but when I execute a SELECT ... WHERE query, it doesn't return the data. Here is my query:
SELECT Birth.sno, Birth.bookingnumber, Birth.birth_date, Birth.baby_gender, Birth.baby_name, Birth.baby_father_name, Birth.baby_father_address, Birth.baby_mother_name, Birth.birth_place, Birth.place_type, Birth.applicant_name, Birth.applicant_address, Birth.registration_number, Birth.registration_date, Birth.registration_ward, Birth.registration_city_village, Birth.registration_district, Birth.remark, Birth.mother_place_name, Birth.mother_place_type, Birth.mother_place_district, Birth.mother_place_state, Birth.person_religion, Birth.father_education, Birth.mother_education, Birth.father_occupation, Birth.mother_occupation, Birth.mother_age_at_marriage, Birth.mother_age_at_birth, Birth.count_of_mother_child, Birth.birth_by, Birth.birth_method, Birth.mother_weight_at_birth, Birth.pregnancy_duration, Birth.date_of_issue FROM np.births AS Birth WHERE Birth.baby_name = 'd' AND Birth.baby_father_name = 'e' AND Birth.baby_mother_name = 'f' AND Birth.baby_father_address = 'g' AND Birth.person_religion = 'हिंदू' AND Birth.baby_gender = 'पुरुष'
The name of the database is np and the name of the table is births.
The above query was printed in the log file. I tried to copy and paste the same query into HeidiSQL (a front end for MySQL), but it's not running there either. However, if I remove the following part: AND Birth.person_religion = 'हिंदू' AND Birth.baby_gender = 'पुरुष', the query works fine.
How can I resolve this issue?

This looks like a case where your MySQL client and your MySQL server do not "talk" the same encoding.
There are three places where you need to take care of your encoding:
The Web Form (what the user sees) -> Your Web Application (CakePHP) -> Your Database Server (MySQL)
One of those three is NOT using the same encoding as the others. So by the time 'हिंदू' and 'पुरुष' get to your database, they will be something totally different that will not be found in the database.
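To see why the comparison fails, here is a minimal sketch (in Python for brevity, not PHP). It assumes one hop in the chain reads UTF-8 bytes as latin1; which hop is actually at fault in your setup is not known, so treat this as an illustration only.

```python
# Illustration only: assume one layer treats UTF-8 bytes as latin1.
original = "हिंदू"

utf8_bytes = original.encode("utf-8")    # bytes a UTF-8 form would send
mangled = utf8_bytes.decode("latin-1")   # what a latin1 layer believes it received

# The mangled text no longer equals the stored value,
# so a WHERE ... = comparison finds nothing.
matches = (mangled == original)

# When every layer agrees on UTF-8, the value survives unchanged.
round_trip = utf8_bytes.decode("utf-8")

print(repr(mangled))
print(matches, round_trip == original)
```

Each byte becomes its own latin1 character, so the mangled string is longer than the original and shares nothing visible with it.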
So, make sure that in your default.ctp file you have set your encoding:
echo $this->Html->charset(); //this will result in a UTF-8 encoding of the page.
Look at the source code of your web page (where I guess you have a search/filter form).
At the top you should see:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Then look for the code generated for your search/filter form. You should see:
<form id="your_form_id" accept-charset="utf-8" method="post" action="/your/action/">
The important part is that "utf-8" MUST show up in both of those places.
Next, look into your database.php file and make sure this line:
'encoding' => 'utf8', is NOT commented out!
Finally, with a client that you are sure supports UTF-8 (probably HeidiSQL) have a look at your data table np.births and make sure that what data you have there actually makes sense! It's possible it got mangled because of the discrepancies in encoding before.
Once the data makes sense in the database you should be good to go!
If this does not do it for you, you'll have to read and thoroughly understand this article. Only then will you be able to locate where the problem is and get your encodings in sync.
(Obviously your PHP source files should be UTF-8 encoded as well...)

Related

SQLAlchemy mysql cannot get the correct charset

Python 3.8.8 program with Flask 2.0.1 and Flask-SQLAlchemy 2.5.1.
MySQL database, collation of the tables: utf8_general_ci.
I'm using two other SQL Server DBs with SQLALCHEMY_BINDS. Everything runs on Windows 10.
Some chars from select queries on the MySQL DB come out wrong: "situazione Ã¨ decisamente migliorata"
should be: "situazione è decisamente migliorata"
This would solve the problem:
mystring.encode('cp1252').decode('utf8')
but I need a solution at program level. I tried:
appending to SQLALCHEMY_DATABASE_URI connection string:
"?charset=utf8" or "?charset=cp1215" and others
setting app.config['MYSQL_CHARSET'] and
app.config['MYSQL_DATABASE_CHARSET'] to 'utf8', 'utf8mb4', 'latin1', 'cp1252'
...
passing a parameter to SQLAlchemy like db = SQLAlchemy(use_native_unicode="utf8"), many variations here too
No attempt worked. Any suggestions would be appreciated.
Are you looking for a way to specify the encoding per database connection?
For all connections try to use
app.config['SQLALCHEMY_ENGINE_OPTIONS'] = {'encoding': 'cp1252'}
For specific connections to different DBs you can also use engine_options:
engine = create_engine('mysql://user:password@hostname/dbname',
                       encoding='cp1252')
Got the solution.
The problem was not really a problem.
The person who built the original database (which is quite old) encoded some characters wrongly.
Some of my approaches, and the one suggested by olegsv, did work. I checked this by debugging deep down into the SQLAlchemy data structures: the driver accepted the character encoding, but the characters in the data were themselves wrong.
This was unexpected.
Maybe I should delete the whole question.
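If the stored data itself is wrong, the one-line repair from the question can be wrapped so that strings which are already correct pass through untouched. This is only a sketch: repair_mojibake is a hypothetical helper name, and it assumes the damage is specifically UTF-8 bytes that were once decoded as cp1252.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were decoded as cp1252.

    Hypothetical helper: returns the text unchanged when it does not
    fit that damage pattern.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


broken = "situazione Ã¨ decisamente migliorata"
fixed = repair_mojibake(broken)

# A string that is already correct fails the strict UTF-8 decode
# and is returned as-is.
already_ok = repair_mojibake("situazione è decisamente migliorata")

print(fixed)
print(already_ok)
```

A helper like this treats the symptom at read time; cleaning the affected rows once in the database is the more durable fix.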

SSIS MSSQL to MySQL with different collation is not copying Finnish letter å

I don't think the title could be any more descriptive as a tl;dr, because the problem goes a bit deeper.
I've got two databases (Finnish language):
MSSQL (collation: SQL_Latin1_General_CP437_CI_AI)
MySQL (collation: utf8_general_ci)
I've created a BI project in VS2017, connected the two databases and transferred tables from one to the other with no problem, except for one letter: "å" arrived as "?". I cannot change either database's collation, so I am trying to find a way to transfer words containing this letter.
What I've tried:
1. OLE DB Source -> ODBC Destination
2. Point 1 with a "Data Conversion" block in between (with code page 1252)
3. Script Component, in which I have tried:
Insert with "_latin1"
sql= "INSERT INTO db.words(Name) VALUES(_latin1'å')";
byte[] b = Encoding.UTF8.GetBytes(sql);
odbcCmd = new OdbcCommand(Encoding.UTF8.GetString(b), odbcConn);
odbcCmd.ExecuteNonQuery();
Insert without it
sql= "INSERT INTO db.words(Name) VALUES('å')";
byte[] b = Encoding.UTF8.GetBytes(sql);
odbcCmd = new OdbcCommand(Encoding.UTF8.GetString(b), odbcConn);
odbcCmd.ExecuteNonQuery();
Different ways of encoding
byte[] bytes = Encoding.GetEncoding(1252).GetBytes("å");
var myString = Encoding.GetEncoding(1252).GetString(bytes);
byte[] bytes2 = Encoding.Default.GetBytes("å");
var myString2 = Encoding.Default.GetString(bytes2);
Insert with COLLATE, which gave me an error
insert into db.words(Name) values ("å" COLLATE latin1_swedish_ci) ;
and error:
System.Data.Odbc.OdbcException: „ERROR [HY000] [MySQL][ODBC 5.3(a) Driver][mysqld-5.7.21-log]COLLATION 'latin1_swedish_ci' is not valid for CHARACTER SET 'cp1250'”
Here is the interesting part:
I can insert this letter in MySQL Workbench without a problem, but when I try to pass it from one database to the other it is lost. I've set Data Viewers after the Data Conversion and the letter was still there, and when debugging the script it was also present, after encoding, in the string that was inserted into the database.
Maybe someone has an idea what else I can try. I feel like I have tried everything and that the solution is really close, but I just don't see it.
CP1250 does not include å; CP437 and utf8 do include it.
COLLATE is irrelevant -- it applies only to comparing and sorting.
Don't use any encode/conversion functions; instead, specify how the data is encoded.
I see 'code' -- but what is the encoding for the source in that language and/or editor?
Show us the hex of any strings in question.
Which direction are you trying to transfer?
What are the connection parameters for each database?
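The first point can be checked quickly with Python's built-in codecs. This is a sketch for illustration, not part of the SSIS package, and it assumes the cp1250 named in the ODBC error message is the character set the connection is actually using.

```python
letter = "å"

cp437_bytes = letter.encode("cp437")   # source side: SQL_Latin1_General_CP437
utf8_bytes = letter.encode("utf-8")    # target side: MySQL utf8

# cp1250 has no code point for å, so a strict conversion fails...
try:
    letter.encode("cp1250")
    in_cp1250 = True
except UnicodeEncodeError:
    in_cp1250 = False

# ...and a lossy conversion substitutes a question mark.
lossy = letter.encode("cp1250", errors="replace")

print(cp437_bytes.hex(), utf8_bytes.hex(), in_cp1250, lossy)
```

That substitution is consistent with the "?" seen in the target table, which is why the connection parameters matter more than any conversion done inside the script.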

Hibernate, MySQL Encoding does not work on debian

I've made an application in Java EE that uses Hibernate to communicate with MySQL. It works perfectly on my Windows development machine, but I have a problem on Debian, where the application is deployed.
When I search for a keyword with Polish letters (like ł, ą, ć, ó, etc.), the result is OK on Windows, but on the server, where I have imported the database, it does not work.
Hibernate query looks like this:
@NamedQuery(name = "Keyword.findByKeyword", query = "SELECT k FROM Keyword k WHERE k.keyword = :keyword")
and is called like this:
myEntityManager.createNamedQuery("Keyword.findByKeyword").setParameter("keyword", keyword).getSingleResult();
When I use mysql on debian via SSH and type in SELECT query manually:
SELECT * FROM keywords WHERE keyword = 'ser żółty';
it also works and returns a single result. The encoding and collations of the tables and columns are also OK. In the datasource configuration I've added
?UseUnicode=true&characterEncoding=utf8
parameters, but that did not help either. I thought that maybe there was a problem with the encoding of the data sent by the form, but the problem appears even if the parameter, e.g. "ser żółty", is hardcoded in my repository class.
I also use Hibernate Search for indexing, and the FullTextEntityManager returns correct results with Polish letters.
I assume that there is some problem between Hibernate and MySQL, but I have no more ideas about what I could change. Any suggestions?
Server: WildFly 9.0.1, MySQL 5.6
OK, the problem was the encoding at the MySQL server level. It was set to latin1 by default. To fix this, follow the question "Change MySQL default character set to UTF-8 in my.cnf?" and edit your my.cnf file.

Mysql SQL to update URLs that do not have www

I have a million odd rows where most start
'http://www.' or 'https://www.'
but occasionally they start with no 'www.' - this may be correct but the website owner wants consistency throughout the data and thus I need to update the table to always have 'www.'
I'm struggling with the SQL to do this. I tried:
select * from the_million where URL like 'http://[!w]'
But that returns 0 records so I've fallen at the first hurdle of building up the SQL. I guess after I've got the records I want I'll then do a replace.
I'm happy to run this in two goes for each of http and https so no need for anything fancy there.
You can try this query:
UPDATE the_million SET url=REPLACE(url, 'http://', 'http://www.')
WHERE url LIKE 'http://%' AND url NOT LIKE 'http://www.%';
UPDATE the_million SET url=REPLACE(url, 'https://', 'https://www.')
WHERE url LIKE 'https://%' AND url NOT LIKE 'https://www.%';
Search & replace in 2 queries.
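Before touching a million rows, the two-pass idea can be tried on a throwaway table. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made purely so it runs anywhere; REPLACE and LIKE behave the same way for this case), with made-up example URLs. Each pass is restricted to rows that start with the scheme being rewritten.

```python
import sqlite3

# In-memory SQLite table standing in for the real MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_million (url TEXT)")
conn.executemany(
    "INSERT INTO the_million (url) VALUES (?)",
    [
        ("http://www.a.example/",),
        ("http://b.example/",),
        ("https://c.example/",),
        ("https://www.d.example/",),
    ],
)

# One pass per scheme: only rows that start with the scheme
# and do not already have the www. prefix are rewritten.
for scheme in ("http://", "https://"):
    conn.execute(
        "UPDATE the_million SET url = REPLACE(url, ?, ?) "
        "WHERE url LIKE ? AND url NOT LIKE ?",
        (scheme, scheme + "www.", scheme + "%", scheme + "www.%"),
    )

urls = [row[0] for row in conn.execute("SELECT url FROM the_million ORDER BY rowid")]
print(urls)
```

Note that REPLACE rewrites every occurrence of the scheme in a value, so a URL that embeds another URL (for example in a query string) would need a closer look.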
Try this:
select * from the_million where URL not like 'http://www.%'
This condition:
URL like 'http://[!w]'
... is identical to this one:
URL='http://[!w]'
because it doesn't contain any valid wildcard for MySQL LIKE operator. If you check the MySQL manual page you'll see that the only wildcards are % and _.
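A small experiment makes the point concrete. It runs against SQLite through Python's built-in sqlite3 module rather than MySQL (an assumption for portability; SQLite's LIKE also recognises only % and _), and the rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_million (url TEXT)")
conn.executemany(
    "INSERT INTO the_million (url) VALUES (?)",
    [("http://www.a.example/",), ("http://b.example/",), ("http://[!w]",)],
)


def count(where: str) -> int:
    """Count rows matching a hard-coded WHERE clause (demo only)."""
    sql = f"SELECT COUNT(*) FROM the_million WHERE {where}"
    return conn.execute(sql).fetchone()[0]


# The bracket pattern is taken literally, exactly like an equality test.
bracket_like = count("url LIKE 'http://[!w]'")
bracket_eq = count("url = 'http://[!w]'")

# _ matches exactly one character, % matches any run of characters.
one_char = count("url LIKE 'http://_.example/'")
no_www = count("url LIKE 'http://%' AND url NOT LIKE 'http://www.%'")

print(bracket_like, bracket_eq, one_char, no_www)
```

The bracket pattern only finds the row that literally contains the text [!w], while the NOT LIKE 'http://www.%' form finds every http row without the prefix.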
The W3Schools page where you read that [!charlist] is valid identifies the section as "SQL Wildcards" which is misleading or plain wrong (depending on how benevolent you feel). That's not standard SQL at all. The error messages returned by their "SQL Tryit Editor" suggest that queries run against a Microsoft Access database, thus it's only a (pretty irrelevant) SQL dialect.
My advice:
Avoid W3Schools as reference site. Their info is often wrong and they apparently don't care enough to amend it.
Always use the official manual of whatever DBMS engine you are using.
Last but not least, the good old www prefix is not a standard part of HTTP URIs (the way http:// is); it's only a naming convention. Prepending it to an arbitrary list of URLs is like adding "1st floor" to all your customer addresses. Make sure your client knows that he's paying money to corrupt his data on purpose. And if he feels generous, you can propose replacing all https: with http: as well.

How to get utf8-encoded text from restfb

I'm trying to fetch all posts and comments from a Facebook page using RestFB. Everything works, but when I fetch a Russian page, which has non-Latin characters, and store the result in MySQL, every row contains some ? characters, so I understand that the encoding is wrong.
So:
My table's collation is utf8_general_ci.
I fetch the page feed from RestFB this way:
Connection<Post> pagePosts = facebookClient.fetchConnection(page+"/feed", Post.class,Parameter.with("message", "utf8"));
but every comment stored in the DB looks something like this:
Liels paldies Amerikas Tirdzniec?bas pal?tai un m?su burv?gajiem viesiem par br?niš??go pas?kumu!
How can I fix this?
My problem was in the JDBC connection.
I solved it this way:
jdbc:mysql://server/database?characterEncoding=UTF-8&useUnicode=true
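The pattern of question marks in the stored comment is what a lossy conversion to a Western single-byte charset produces. Here is a sketch in Python (not the Java code from the question). It assumes the connection was falling back to cp1252, which the thread does not state but which fits the fact that š survived while ī and ķ did not.

```python
word = "brīnišķīgo"

# Assumption: the connection charset was cp1252. Characters it cannot
# represent are replaced with "?", the rest pass through.
stored = word.encode("cp1252", errors="replace").decode("cp1252")

# With UTF-8 end to end, nothing is lost.
intact = word.encode("utf-8").decode("utf-8")

print(stored)
print(intact)
```

Once the replacement has happened the original letters are gone, so rows stored before the connection fix have to be fetched again rather than repaired.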