MYSQL/Coldfusion replace registration symbol not working - mysql

I'd like to make all registration symbols superscript by wrapping them with a <sup> HTML tag. So, I can do this in SQL no problem:
SELECT s.id,
Replace(s.name,'®','<sup>®</sup>') AS name
FROM staff s
WHERE name LIKE '%®%'
Result:
id | name
1 | Name1 CFP<sup>®</sup>, CDFA
2 | Jeffrey test CFP<sup>®</sup>
3 | Matthew hello CFP<sup>®</sup> CFA
But when I run it in Coldfusion from a cfquery tag, it looks as if the ® character is interpreted as Â®.
<cfquery name="getStaff" dataSource="#this.dsn#">
SELECT s.id,
Replace(s.name,'®','<sup>®</sup>') AS name
FROM staff s
WHERE 1=1
<cfif isDefined("arguments.permalink")>
AND s.permalink=<cfqueryparam value="#arguments.permalink#" />
</cfif>
</cfquery>
Is there a better way to approach this? I originally did this in Coldfusion using <cfset getStaff.name = Replace(getStaff.name,Chr(174),'<sup>®</sup>') />, which worked fine until I switched to Mustache templating.
I'd definitely prefer to use the CHAR() function if I could figure out what numeric character ® is in Mysql. (Note, using utf8_general_ci on this and all DB tables) I tried CHAR(174) in Mysql, but it won't work because (as far as I can tell) Mysql isn't using the same character set - SELECT CHAR(174) returns a blob.

UPDATE:
I'd definitely prefer to use the CHAR() function if I could figure out
what numeric character ® is in Mysql. (Note, using utf8_general_ci on
this and all DB tables) I tried CHAR(174) in Mysql, but it won't work
because (as far as I can tell) Mysql isn't using the same character
set - SELECT CHAR(174) returns a blob.
As mentioned in the comments, it sounds like the default charset for your database is utf8. So presumably it failed because the decimal 174 is not the correct way to represent the registered sign in utf8. That symbol requires two bytes. Using the proper hex or decimal value for your default charset (ie utf8) it works as expected:
Hex: CHAR(0xC2AE)
Decimal: CHAR(194,174)
Though it would be better to specify the charset explicitly with USING:
Hex: CHAR(0xC2AE USING utf8)
Decimal: CHAR(194,174 USING utf8)
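The byte values above can be cross-checked outside the database. The following is a minimal sketch in Python (used here only as a byte calculator, it is not part of the original answer) showing why 174 alone is enough in Latin-1 but two bytes are needed in UTF-8:

```python
# Cross-check the byte values of the registered sign (U+00AE) in two charsets.
REGISTERED = "\u00ae"  # the ® character

latin1_bytes = REGISTERED.encode("latin-1")  # one byte in Latin-1
utf8_bytes = REGISTERED.encode("utf-8")      # two bytes in UTF-8

print(list(latin1_bytes))        # decimal value(s) in Latin-1
print(list(utf8_bytes))          # decimal values for CHAR(... USING utf8)
print(utf8_bytes.hex().upper())  # hex digits for CHAR(0x... USING utf8)
```

The same approach works for any other symbol whose UTF-8 byte values you need for CHAR().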
Is the symbol hard-coded into the .cfm script? If so, it is probably an issue with the character encoding of the script. When interpreting literal characters within the file, the page encoding is what matters. Try:
Adding <cfprocessingdirective pageEncoding="utf-8"> to the top of the script.
Note: For CFCs, the cfprocessingdirective tag must follow the cfcomponent tag.
If the default charset for your database is utf8, try using the CF equivalent function, i.e. #chr(174)#. However, IMO it is better to use the MySQL CHAR() function instead.
Side note about cfqueryparam, it is a good practice to always specify a cfsqltype. If omitted, it defaults to CF_SQL_CHAR, which may force implicit conversion and cause wrong/unintended results in some cases (numbers, dates, etcetera). Even for string values it is a good idea to specify the type, as there may be slight differences with how CHAR and VARCHAR types are treated on the database side.

It is possible to do something like ColdFusion's Chr() in SQL, using the CHAR() function:
<cfquery name="getStaff" dataSource="#this.dsn#">
SELECT s.id,
REPLACE(s.name, CHAR(174), '<sup>®</sup>') AS name
FROM staff s
WHERE 1=1
<cfif isDefined("arguments.permalink")>
AND s.permalink=<cfqueryparam value="#arguments.permalink#" />
</cfif>
</cfquery>
For MySQL:
See: http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_char
For SQL Server:
See: https://msdn.microsoft.com/en-us/library/ms187323.aspx

Related

MySQL strange characters replace with <BR

I inherited a MySQL table (MyISAM, utf8_general_ci collation) that contains a strange character, which looks like this in phpMyAdmin: •
I assume this a bullet point of some type?
When rendered on a HTML page it looks like this: �
How do I replace this value with a <BR><LI> so I can turn it into a line break with a properly formatted list item?
I've tried a standard UPDATE query but it does not replace these values? I assume I need to escape them somehow?
Query attempted:
UPDATE `FL_Regs` SET `Remarks` = "<BR><LI>" WHERE `Remarks` = "•"
You did not show your query, so I'm only guessing.
If your client is mangling the characters on the way to the server (I imagine you are using phpMyAdmin, which involves several steps between your browser and the actual server), you can try giving the search string as a sequence of bytes.
It happens that • is U+2022, a character named "BULLET" in Unicode, which is encoded as e2 80 a2 in UTF-8. So you can use X'E280A2' instead of '•' in your query.
Typically:
> select X'E280A2';
+-----------+
| X'E280A2' |
+-----------+
| • |
+-----------+
If you want to better understand what's happening, you can use the HEX() function, first to check what MySQL receives when you send a bullet:
SELECT HEX('•');
Typically I get E280A2, which, as seen above, is the UTF-8 encoding of the BULLET character.
Then, to see what's actually stored in your table:
SELECT HEX(your_column) FROM your_table;
Try to limit the search to a single row to keep the output readable.
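When reading the HEX() output, it helps to know which byte sequences to look for. The sketch below (Python, used only to compute bytes) lists what a bullet could look like under three storage scenarios. The scenarios are illustrative assumptions, not a diagnosis of the table in the question:

```python
# Hex strings that HEX(your_column) might show for a bullet, depending on
# how the bytes were written. The three scenarios are illustrative guesses.
BULLET = "\u2022"

def hex_of(data: bytes) -> str:
    """Return the bytes as upper-case hex, the way MySQL's HEX() prints them."""
    return data.hex().upper()

candidates = {
    "stored as UTF-8": hex_of(BULLET.encode("utf-8")),
    "stored as Windows-1252": hex_of(BULLET.encode("cp1252")),
    # UTF-8 bytes misread as Windows-1252 and then re-encoded as UTF-8
    "double-encoded": hex_of(
        BULLET.encode("utf-8").decode("cp1252").encode("utf-8")
    ),
}

for label, value in candidates.items():
    print(f"{label}: {value}")
```

Whichever value actually appears in the column is the one to pass as the X'...' literal in the search.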

mysql regex utf-8 characters

I am trying to get data from MySQL database via REGEX with or without special utf-8 characters.
Let me explain with an example:
If the user enters a word like sirena, it should return rows that include words like sirena, siréna, šíreňá, and so on.
It should also work the other way around: when the user enters siréná, it should return the same results.
I am trying to search it via REGEX, my query looks like this :
SELECT * FROM `content` WHERE `text` REGEXP '[sšŠ][iíÍ][rŕŔřŘ][eéÉěĚ][nňŇ][AaáÁäÄ0]'
It works only when the database contains the word sirena, but not when it contains the word siréňa.
Is it because of something to do with UTF-8 and MySQL? (The collation of the MySQL column is utf8_general_ci.)
Thank you!
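MySQL's regular expression library does not support utf-8. (This describes the library used before MySQL 8.0.4; later versions use ICU, which handles multi-byte characters.)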
See Bug #30241 Regular expression problems, which has been open since 2007. They will have to change the regular expression library they use before that can be fixed, and I haven't found any announcement of when or if they will do this.
The only workaround I've seen is to search for specific HEX strings:
mysql> SELECT * FROM `content` WHERE HEX(`text`) REGEXP 'C3A9C588';
+----------+
| text |
+----------+
| siréňa |
+----------+
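The hex workaround can be extended from one fixed string to the character classes in the question. The sketch below (Python, with hypothetical helper names hex_alternatives and hex_pattern) builds a pattern of hex alternatives that could be matched against HEX(`text`). It is only an illustration: a hex pattern can also match at an odd nibble offset, so false positives are possible.

```python
import re

def hex_alternatives(chars: str) -> str:
    """Turn a set of characters into a group of their UTF-8 hex encodings."""
    return "(" + "|".join(c.encode("utf-8").hex().upper() for c in chars) + ")"

def hex_pattern(classes: list[str]) -> str:
    """Concatenate one alternatives group per character position."""
    return "".join(hex_alternatives(cls) for cls in classes)

# Character classes taken from the query in the question.
classes = ["sšŠ", "iíÍ", "rŕŔřŘ", "eéÉěĚ", "nňŇ", "AaáÁäÄ0"]
pattern = hex_pattern(classes)
print(pattern)

# Simulate HEX(`text`) REGEXP pattern for two sample values.
for word in ("sirena", "siréňa"):
    hexed = word.encode("utf-8").hex().upper()
    print(word, bool(re.search(pattern, hexed)))
```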
Re your comment:
No, I don't know of any solution with MySQL.
You might have to switch to PostgreSQL, because that RDBMS supports \u codes for UTF characters in their regular expression syntax.
Try something like ... REGEXP '(a|b|[ab])'
SELECT * FROM `content` WHERE `text` REGEXP '(s|š|Š|[sšŠ])(i|í|Í|[iíÍ])(r|ŕ|Ŕ|ř|Ř|[rŕŔřŘ])(e|é|É|ě|Ě|[eéÉěĚ])(n|ň|Ň|[nňŇ])(A|a|á|Á|ä|Ä|0|[AaáÁäÄ0])'
It works for me!
Use the lib_mysqludf_preg library from the mysql UDF repository for PCRE regular expressions directly in mysql
Although MySQL's regular expression library does not support utf-8, the mysql UDF repository provides utf-8 compatible PCRE regular expressions that can be used directly in mysql.
http://www.mysqludf.org/
https://github.com/mysqludf/lib_mysqludf_preg#readme

MySQL query with non-printing characters (left-to-right mark)

I just found myself lost in the interesting situation that I need to query MySQL for fields containing a so called Left-to-right mark.
As the nature of this character is to be non-printing, thus invisible, I'm unable to simply copy/paste it into a query.
As mentioned in the linked Wikipedia article, the Left-to-right mark is Unicode character U+200E (U+200F is the Right-to-left mark), which is a fact that I'm sure is the key to success in my current adventure.
My question is: How do I use raw Unicode in a MySQL query? Something along the lines of:
SELECT * FROM users WHERE username LIKE '%\U+200E%'
or
SELECT * FROM users WHERE username REGEXP '\U+200E'
or whatever the correct syntax for Unicode in MySQL is and depending on whether this is supported with LIKE and/or REGEXP.
To get a unicode char, something like this should work:
SELECT CHAR(<number> USING utf8);
Also, don't use REGEXP, because the regexp lib used by MySQL is very old, and doesn't support multi-byte charsets.
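The <number> passed to CHAR() is the character's UTF-8 byte sequence, not its Unicode code point. A minimal sketch of the conversion follows (Python, with a hypothetical helper mysql_char_expr). Note that the Left-to-right mark is U+200E, while U+200F is the Right-to-left mark, so both are shown:

```python
def mysql_char_expr(code_point: int, charset: str = "utf8") -> str:
    """Build a CHAR(0x... USING charset) expression from a Unicode code point."""
    utf8_hex = chr(code_point).encode("utf-8").hex().upper()
    return f"CHAR(0x{utf8_hex} USING {charset})"

LRM = 0x200E  # LEFT-TO-RIGHT MARK
RLM = 0x200F  # RIGHT-TO-LEFT MARK

print(mysql_char_expr(LRM))
print(mysql_char_expr(RLM))

# One way the expression might be used with LIKE (table and column names
# are taken from the question).
like_query = (
    "SELECT * FROM users WHERE username LIKE "
    f"CONCAT('%', {mysql_char_expr(LRM)}, '%')"
)
print(like_query)
```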

MySQL Query to Identify bad characters?

We have some tables that were set up with the Latin character set instead of UTF-8, which allowed bad characters to be entered into the tables. The usual culprit is people copying and pasting from Word or Outlook, which copies those nasty hidden characters...
Is there any query we can use to identify these characters to clean them?
Thanks,
I assume that your connection character set was set to UTF8 when you filled the data in.
MySQL replaces unconvertible characters with ? (question marks):
SELECT CONVERT('тест' USING latin1);
----
????
The problem is distinguishing legitimate question marks from illegitimate ones.
Usually, the question marks in the beginning of a word are a bad sign, so this:
SELECT *
FROM mytable
WHERE myfield RLIKE '\\?[[:alnum:]]'
should give a good start.
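To see why the heuristic works, the lossy conversion can be imitated outside MySQL. The sketch below (Python, with a hypothetical helper to_latin1_lossy) replaces characters that do not fit in Latin-1 with ? and then applies a pattern roughly equivalent to the RLIKE above. It only approximates MySQL's behaviour:

```python
import re

def to_latin1_lossy(text: str) -> str:
    """Replace every character that Latin-1 cannot represent with '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")

# Rough equivalent of RLIKE '\\?[[:alnum:]]' for ASCII letters and digits.
SUSPECT = re.compile(r"\?[A-Za-z0-9]")

samples = ["тест", "It\u2019s fine", "Really? Yes"]
for original in samples:
    stored = to_latin1_lossy(original)
    print(repr(stored), bool(SUSPECT.search(stored)))
```

A curly apostrophe pasted from Word becomes a ? in the middle of a word and is flagged, while an ordinary question mark followed by a space is not.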
You're probably noticing something like this 'bug'. The 'bad characters' are most likely stray high bytes such as control characters (e.g. \x80). You might be able to identify them using a query like
SELECT bar FROM foo WHERE LOCATE(UNHEX('80'), bar) != 0
From that linked bug, they recommend using type BLOB to store text from windows files:
Use BLOB (with additional encoding field) instead of TEXT if you need to store windows files (even text files). Better than 3-byte UTF-8 and multi-tier encoding overhead.
Take a look at this Q/A (it's all about your client encoding aka SET NAMES )

Unicode (hexadecimal) character literals in MySQL

Is there a way to specify Unicode character literals in MySQL?
I want to replace a Unicode character with an Ascii character, something like the following:
Update MyTbl Set MyFld = Replace(MyFld, "ẏ", "y")
But I'm using even more obscure characters which are not available in most fonts, so I want to be able to use Unicode character literals, something like
Update MyTbl Set MyFld = Replace(MyFld, "\u1e8f", "y")
This SQL statement is being invoked from a PHP script - the first form is not only unreadable, but it doesn't actually work!
You can specify hexadecimal literals (or even binary literals) using 0x, x'', or X'':
select 0xC2A2;
select x'C2A2';
select X'C2A2';
But be aware that the return type is a binary string, so each and every byte is considered a character. You can verify this with char_length:
select char_length(0xC2A2)
2
If you want UTF-8 strings instead, you need to use convert:
select convert(0xC2A2 using utf8mb4)
And we can see that C2 A2 is considered 1 character in UTF-8:
select char_length(convert(0xC2A2 using utf8mb4))
1
Also, you don't have to worry about invalid bytes because convert will remove them automatically:
select char_length(convert(0xC1A2 using utf8mb4))
0
As can be seen, the output is 0 because C1 A2 is an invalid UTF-8 byte sequence.
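The distinction between byte length and character length can be reproduced outside MySQL. This is a small sketch in Python, not MySQL itself; errors="ignore" is used here as a stand-in for the dropping of invalid bytes described above:

```python
# Byte length versus character length for the same two byte sequences.
valid = bytes.fromhex("C2A2")    # a well-formed UTF-8 sequence
invalid = bytes.fromhex("C1A2")  # C1 can never start a UTF-8 sequence

byte_length = len(valid)                  # counts bytes
char_length = len(valid.decode("utf-8"))  # counts decoded characters

# Strict decoding rejects the invalid sequence outright.
try:
    invalid.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient decoding silently drops the invalid bytes.
dropped = invalid.decode("utf-8", errors="ignore")

print(byte_length, char_length, strict_ok, len(dropped))
```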
Thanks for your suggestions, but I think the problem was further back in the system.
There are a lot of levels to unpick, but as far as I can tell, (on this server at least) the command
set names utf8
makes the utf-8 handling work correctly, whereas
set character set utf8
doesn't.
In my environment, these are being called from PHP using PDO, for what difference that may make.
Thanks anyway!
You can use the hex and unhex functions, e.g.:
update mytable set myfield = unhex(replace(hex(myfield),'C383','C3'))
The MySQL string syntax is specified here, as you can see, there is no provision for numeric escape sequences.
However, as you are embedding the SQL in PHP, you can compute the right bytes in PHP. Make sure the bytes you put into the SQL actually match your client character set.
There is also the char function, which does what you wanted: you provide byte numbers and a charset name, and you get a character back.
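Computing the bytes in the calling script might look like the sketch below. It is written in Python rather than PHP purely for illustration, and utf8_hex_literal is a hypothetical helper. It assumes the column holds UTF-8 data, and it is meant for fixed, known characters only, not for user input:

```python
def utf8_hex_literal(ch: str) -> str:
    """Return a MySQL hex literal holding the UTF-8 bytes of one character."""
    return "X'" + ch.encode("utf-8").hex().upper() + "'"

target = "\u1e8f"  # the character from the question, written as an escape

# Table and column names are taken from the question.
sql = (
    "UPDATE MyTbl SET MyFld = "
    f"REPLACE(MyFld, CONVERT({utf8_hex_literal(target)} USING utf8mb4), 'y')"
)
print(sql)
```

Because the statement contains only ASCII, it is not affected by the encoding of the script file or the client connection.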