Encoding in Coldfusion - Observing a weird value from query result - json

I'm just running a query and forming a JSON string in cfloop.
For some values that are formed within JSON, I see some bogus extra characters at the end. At first, I suspected them to be white spaces or tabs but adding a Trim(name) did not work.
"first_name":"Jon "
When I copied the string into Notepad++ and converted it to UTF-8, here is what I saw:
"first_name":"Jon **xA0**"
I am not sure what that xA0 means here. Is there any way to suppress it?
Thanks.

Try stripping it out with this (xA0 is hex for character code 160, the non-breaking space):
<cfset lastname = replacelist(lastname, chr(160), '')>
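For context, xA0 is the hexadecimal form of character code 160, the non-breaking space (U+00A0), which is why the answer targets chr(160) and why a whitespace-only trim can miss it. Here is a minimal Python sketch of the same idea; it is not ColdFusion code, and the helper name strip_nbsp is made up for illustration:

```python
NBSP = "\u00a0"  # same character as chr(160); Notepad++ shows it as xA0


def strip_nbsp(value):
    """Remove non-breaking spaces, then trim ordinary ASCII whitespace."""
    return value.replace(NBSP, "").strip(" \t\r\n")


name = "Jon" + NBSP
print(ord(name[-1]))                 # 160
print(repr(name.strip(" \t\r\n")))   # an ASCII-only trim leaves the NBSP in place
print(repr(strip_nbsp(name)))        # the NBSP is gone
```

Whether the character should be removed or turned into a regular space depends on the data; the sketch simply drops it, as the answer does.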

Related

SQL is seemingly removing spaces that have been inserted by RegEx

I am working on a JavaScript app, in which I am preparing my data replacing tabs with spaces using RegEx in the frontend:
str = str.replace(/\t+/g, " ");
So
'tabbed title'
becomes
'tabbed title' and so on and so forth
This is then passed to an Express route, which sends the data to my MySQL database via a stored procedure, using the escape() method from the JavaScript MySQL SDK.
The issue is that when I pass a string where tab characters have been replaced with spaces by the RegEx, the title is stored in the database as 'tabbedtitle'.
When I enter 'tabbed title' normally, with spaces typed on my keyboard, the space is preserved. After the RegEx transform, it is not. It seems like SQL is doing something under the hood, or the " " in my RegEx is not a traditional space character (even though in all of my research it appears to be a regular space).
I've confirmed that I am indeed passing 'tabbed title' to the db from Express, and nothing transforms the data inside my SP. I've even tried entering a UTF-8 space \u0020 rather than " " in my RegEx, but the problem persists.
Instead of replacing tabs with a space maybe replace them with a hyphen or some other non-whitespace character? Might help narrow it down
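Another way to narrow it down is to list the code points of the string just before it goes to the database, which shows whether the replacement really is U+0020. A rough Python sketch of that check (the helper name describe_chars is hypothetical; in JavaScript the same check could be done with codePointAt):

```python
import re
import unicodedata


def describe_chars(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>")) for ch in text]


# Python counterpart of str.replace(/\t+/g, " ") from the question
cleaned = re.sub(r"\t+", " ", "tabbed\t\ttitle")

for code_point, name in describe_chars(cleaned):
    print(code_point, name)
```

If the separator prints as U+0020 SPACE here, the replacement itself is fine and the space is being lost somewhere later in the pipeline.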

Using REPLACE with CHAR(160) is Returning Hexadecimal as Value

I am trying to get rid of &nbsp; characters in MySQL, but am getting weird behavior where REPLACE returns a hexadecimal string.
The original value is some HTML stored in a field with the type BLOB:
<h3>This was just an appetizer. Are you ready for the full course?</h3><p>Dive into more business news, check out the latest tech trends, and get a couple quick tips from our health section.  </p></div>
The SQL I am using is this:
UPDATE tbl
SET field = REPLACE(field, CHAR(160), '');
And after executing, this is what is left in the database:
3C68333E5468697320776173206A75737420616E206170706574697A65722E2041726520796F7520726561647920666F72207468652066756C6C20636F757273653F3C2F68333E3C703E4469766520696E746F206D6F726520627573696E657373206E6577732C20636865636B206F757420746865206C61746573742074656368207472656E64732C20616E6420676574206120636F75706C6520717569636B20746970732066726F6D206F7572206865616C74682073656374696F6E2E20C23C2F703E3C2F6469763E
What is going on and how could I avoid this? Do I need to use VARCHAR for the field type?
REPLACE gives you a binary BLOB back here, so you have to convert the result back to text:
UPDATE tbl
SET field = CAST(REPLACE(field, CHAR(160), '') AS CHAR(10000) CHARACTER SET utf8);
Of course you have to check character set and size.
I found that CHAR codes didn't work, but a copy pasted whitespace worked. This looks like a normal space, but is in fact CHAR(160) and I don't have an error anymore. ' '
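The hex dump in the question also hints at why CHAR(160) alone may not be enough on UTF-8 data: in UTF-8 a non-breaking space is the two bytes C2 A0, and removing only A0 leaves a stray C2 behind. A small Python sketch, using just the tail of the hex value quoted in the question, makes this visible:

```python
# Last 20 bytes of the hex value quoted in the question
tail = bytes.fromhex("73656374696F6E2E20C23C2F703E3C2F6469763E")
print(tail)  # note the lone \xc2 just before </p>

# Reproduce it: UTF-8 text with a non-breaking space, then drop only byte 0xA0
original = "section. \u00a0</p></div>".encode("utf-8")
assert original.replace(b"\xa0", b"") == tail

# Removing the full two-byte sequence leaves valid UTF-8
fixed = original.replace(b"\xc2\xa0", b"")
print(fixed.decode("utf-8"))
```

In MySQL terms this suggests replacing UNHEX('C2A0') rather than CHAR(160) when the column holds UTF-8 text, though it is worth checking the column's actual character set first.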

Laravel Route Parameters Not Trimmed (it normally works when whitespace is added)

I have the following route in web.php:
Route::get('posts/{encoded_id}/{slug}', 'PostController@show');
... and it works fine:
http://example.test/posts/1Dl89aRjpk/this-is-some-title
But the "problem" is that it will also work when I add a white space at the end of route parameter {encoded_id}:
http://example.test/posts/1Dl89aRjpk /this-is-some-title
// or
http://example.test/posts/1Dl89aRjpk%20 /this-is-some-title
// or
http://example.test/posts/1Dl89aRjpk%20%20 /this-is-some-title
With whitespace added at the end - this will work normally and there is no 404:
Post::where('encoded_id', $encoded_id)->firstOrFail();
... but why? And how can I make it fail (i.e. return a 404)?
Maybe because of the type of field in the DB (CHAR)?
$table->char('encoded_id', 10)
If that's why, is there any way to configure MySQL in config/database.php to prevent this?
Or maybe it has something to do with .htaccess (I'm using XAMPP / Windows)?
I'm using Laravel 5.6.
EDIT:
I'm asking why this is happening and how can I prevent it, not how to trim route parameter. For example, add white space at the end of the question id on stackoverflow url and you will get 404:
https://stackoverflow.com/questions/51068436 /laravel-route-parameters-not-trimmed-it-normally-works-when-whitespace-is-added
This is expected SQL behaviour. In your controller you receive the full $encoded_id, including the trailing spaces. All Laravel does for you is run an SQL SELECT query with a WHERE clause, and MySQL ignores trailing spaces when comparing CHAR and VARCHAR values (with the usual PAD SPACE collations).
See this question.
If you want a 404, replace spaces in the ID to some dummy character:
$encoded_id = str_replace(' ', '#', $encoded_id);
Do this only if it is guaranteed that the ID doesn't contain spaces or hash marks otherwise.
Building on balping's answer, some other solutions would be:
Replace all trailing spaces with #
preg_replace("/\s+$/", "#", $encoded_id);
Use trim in combination with str_pad and strlen. This will trim the whitespaces from the front and back but pad the string with #'s so it's still the original length.
str_pad(trim($encoded_id), strlen($encoded_id), '#');
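To see why swapping the padding for # forces a 404, it can help to model the comparison outside the database. The Python sketch below is only an illustration: pad_space_equal is a simplified stand-in for how a PAD SPACE comparison treats trailing spaces, and mark_padding is a hypothetical counterpart of the str_pad/trim snippet above.

```python
def pad_space_equal(stored, given):
    """Simplified model of a PAD SPACE comparison: trailing spaces are ignored."""
    return stored.rstrip(" ") == given.rstrip(" ")


def mark_padding(encoded_id):
    """Python counterpart of str_pad(trim($id), strlen($id), '#')."""
    trimmed = encoded_id.strip(" \t\n\r\0\x0b")  # PHP trim()'s default character set
    return trimmed.ljust(len(encoded_id), "#")


stored_id = "1Dl89aRjpk"
from_url = "1Dl89aRjpk  "

print(pad_space_equal(stored_id, from_url))                # matches: spaces ignored
print(mark_padding(from_url))                              # padding made visible
print(pad_space_equal(stored_id, mark_padding(from_url)))  # no longer matches
```

An ID without extra whitespace passes through mark_padding unchanged, so valid URLs keep working.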

Weird character at the end of database entry

I am migrating an Excel sheet (CSV) to MySQL. However, when I do an insert, some fields end up with empty spaces at the end, and I can't get rid of them for some reason. I assume there is a weird character at the end, since not even this:
UPDATE FOO set FIELD2 = TRIM(Replace(Replace(Replace(FIELD2,'\t',''),'\n',''),'\r',''));
gets rid of it completely. I still have whitespace at the end and I don't know how to get rid of it. I have over 2000 entries, so doing it manually is not an option. I am using Laravel with the revision package, and it doesn't work because it treats those trailing spaces as changes and creates a bunch of duplicates. Thank you for your help.
If you think there are weird characters in the original CSV, you could open it in a text editor capable of regex replaces and replace all non-ASCII characters with nothing.
Your regex would look like this:
[^\u0000-\u007F]+
Then, after removing any possible strange characters, re-import the data into the database.
Unfortunately, I don't think regex replaces are possible in SQL here (MySQL only gained REGEXP_REPLACE() in version 8.0), so you'll probably need to re-import.
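If a script is easier than an editor, the same clean-up can be sketched in Python before the import. The function name strip_non_ascii and the sample row are made up for illustration, and note that this approach also deletes legitimate accented letters, so it only suits data that should be pure ASCII:

```python
import re

# Same character class as the regex above, written with \x escapes
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def strip_non_ascii(text):
    """Delete every run of non-ASCII characters from text."""
    return NON_ASCII.sub("", text)


row = ["Jon\u00a0", "Smith"]  # hypothetical CSV row with a trailing non-breaking space
print([strip_non_ascii(cell).strip() for cell in row])
```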

Removing strange characters from MySQL data

Somewhere along the way, between all the imports and exports I have done, a lot of the text on a blog I run is full of weird accented A characters.
When I export the data using mysqldump and load it into a text editor with the intention of using search-and-replace to clear out the bad characters, searching just matches every "a" character.
Does anyone know any way I can successfully hunt down these characters and get rid of them, either directly in MySQL or by using mysqldump and then reimporting the content?
This is an encoding problem; the Â comes from a non-breaking space (HTML entity &nbsp;) stored as UTF-8 but displayed as Latin1.
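To make that concrete, here is a short Python sketch (not part of the original answer) that encodes a non-breaking space as UTF-8 and then misreads the bytes as Latin1, which is where the accented A comes from:

```python
nbsp = "\u00a0"

utf8_bytes = nbsp.encode("utf-8")
print(utf8_bytes.hex())  # c2a0

# Reading those two bytes as Latin1 yields two characters:
# U+00C2 (the accented A) followed by U+00A0 (the non-breaking space itself)
misread = utf8_bytes.decode("latin-1")
print([f"U+{ord(ch):04X}" for ch in misread])
```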
You might try something like this... first we check to make sure the matching is working:
SELECT * FROM some_table WHERE some_field LIKE BINARY '%Â%'
This should return any rows in some_table where some_field has a bad character. Assuming that works properly and you find the rows you're looking for, try this:
UPDATE some_table SET some_field = REPLACE( some_field, BINARY 'Â', '' )
And that should remove those characters (based on the page you linked, you don't really want an nbsp there as you would end up with three spaces in a row between sentences etc, you should only have one).
If it doesn't work then you'll need to look at the encoding and collation being used.
EDIT: Just added BINARY to the strings; this should hopefully make it work regardless of encoding.
The accepted answer did not work for me.
From here http://nicj.net/mysql-converting-an-incorrect-latin1-column-to-utf8/ I found that the binary code for the Â character is c2a0 (by converting the column to VARBINARY and looking at what it turns into).
Then here http://www.oneminuteinfo.com/2013/11/mysql-replace-non-ascii-characters.html I found the actual solution to remove (replace) it:
update entry set english_translation = unhex(replace(hex(english_translation),'C2A0','20')) where entry_id = 4008;
The query above replaces it with a space, after which a normal trim can be applied; alternatively, replace it with '' instead.
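One thing to watch with replacing inside the hex string: the pattern 'C2A0' can also match across a byte boundary. A Python sketch of both approaches shows the difference; the function names are hypothetical, and replace_nbsp_via_hex only mirrors the shape of the query above rather than running it in MySQL:

```python
NBSP_UTF8 = b"\xc2\xa0"


def replace_nbsp_bytes(raw, replacement=b" "):
    """Replace the two-byte UTF-8 non-breaking space, respecting byte boundaries."""
    return raw.replace(NBSP_UTF8, replacement)


def replace_nbsp_via_hex(raw):
    """Mirror of UNHEX(REPLACE(HEX(col), 'C2A0', '20')): replaces inside the hex text."""
    return bytes.fromhex(raw.hex().upper().replace("C2A0", "20"))


good = "Jon\u00a0".encode("utf-8")
print(replace_nbsp_via_hex(good), replace_nbsp_bytes(good))

# "l*\n" is 6C 2A 0A in hex, and the text "6C2A0A" contains "C2A0" at an odd offset
tricky = b"l*\n"
print(replace_nbsp_via_hex(tricky), replace_nbsp_bytes(tricky))
```

Such collisions are probably rare in ordinary prose, but they are a reason to test the hex-based query on a copy of the data first.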
I have had this problem and it is annoying, but solvable. As well as Â you may find you have a whole load of characters showing up in your data like these:
“
This is connected to encoding changes in the database, but so long as you do not have any of these characters in your database that you want to keep (e.g. if you are actually using a Euro symbol) then you can strip them out with a few MySQL commands as previously suggested.
In my case I had this problem with a WordPress database that I had inherited, and I found a useful set of pre-formed queries that work for WordPress here http://digwp.com/2011/07/clean-up-weird-characters-in-database/
It's also worth noting that one of the causes of the problem in the first place is opening a database in a text editor which might change the encoding in some way. So if you can possibly manipulate the database using MySQL only and not a text editor this will reduce the risk of causing further trouble.