Removing non-breaking spaces? - mysql

I have a query to remove all special characters.
But ONE space at the end of the email string survives that query.
Example: 'test@hotmail.com '
UPDATE my_table SET email= REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TRIM(LTRIM(RTRIM(email))),\'\x0B\',\'\'),\'\0\',\'\'),\'\t\',\'\'),\'\r\',\'\'),\'\n\',\'\'),\'\r\n\',\'\'),\'\n\r\',\'\'),\' \',\'\'),CHAR(160),\'\') WHERE id=X
Why?
I use this statement because I have a WHERE id IN(), so I don't want to process special characters in PHP. I want to UPDATE every email directly with SET and the REPLACE and TRIM functions.
However, some whitespace is not deleted and I don't know why.
My table has approximately 12 million rows. I have programmed a CRON job which fetches them to delete all special characters (unfortunately, in the past we did not check them on INSERT).
So I built this query to process my 12 million rows. It works very well except for the trailing whitespace (sometimes it is removed, sometimes not). I should add that in Workbench the query works 100% of the time. It does not make sense.
Here is my query again without backslash and with my where IN:
UPDATE NEWSLETTER_SUBSCRIPTION SET email= REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TRIM(LTRIM(RTRIM(email))),'\x0B',''),'\0',''),'\t',''),'\r',''),'\n',''),'\r\n',''),'\n\r',''),' ',''),CHAR(160),'') WHERE id IN (' . implode(',', $idEmailToBeProcess) . ')
$idEmailToBeProcess contains around 500 ids.
I think the trailing whitespace is a non-breaking space, but my last test with CHAR(160) in my query didn't work.

How about whitelisting, i.e. allowing only valid characters?
regex_replace [^-_.@a-zA-Z] with ''

OK, I finally found the problem!
The encoding of the PDO connection was the problem.
I just adjusted the driver options and everything works now:
array(PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES \'UTF8\'')
Thanks anyway, guys!

I found that on MySQL, checking against CHAR(160) does not work. For the UTF-8 NBSP, the following worked for me:
REPLACE(the_field, UNHEX('C2A0'), ' ')
The solution comes from a similar Stack Overflow question: Whitespace in a database field is not removed by trim()
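A minimal sketch of why the two patterns behave differently, written in Python purely for illustration (the function name and sample address are hypothetical). The non-breaking space U+00A0 is the single byte A0 in Latin-1, which is what CHAR(160) produces, but the two bytes C2 A0 in UTF-8, which is what UNHEX('C2A0') produces:

```python
NBSP = "\u00a0"  # non-breaking space, U+00A0

# The same character has a different byte representation per encoding.
latin1_bytes = NBSP.encode("latin-1")  # one byte, like CHAR(160)
utf8_bytes = NBSP.encode("utf-8")      # two bytes, like UNHEX('C2A0')


def strip_nbsp_utf8(raw: bytes) -> bytes:
    """Replace UTF-8 encoded NBSPs with a plain space, then trim spaces."""
    return raw.replace(utf8_bytes, b" ").strip(b" ")


# Hypothetical stored value: an address followed by a UTF-8 NBSP.
stored = "user@example.com\u00a0".encode("utf-8")

print(latin1_bytes.hex(), utf8_bytes.hex())
print(strip_nbsp_utf8(stored))
```

Removing only the A0 byte from a UTF-8 value would leave a stray C2 byte behind, so the pattern has to match the encoding the data is actually stored in.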

Related

Laravel Route Parameters Not Trimmed (it normally works when whitespace is added)

I have the following route in web.php:
Route::get('posts/{encoded_id}/{slug}', 'PostController@show')
... and it works fine:
http://example.test/posts/1Dl89aRjpk/this-is-some-title
But the "problem" is that it will also work when I add a white space at the end of route parameter {encoded_id}:
http://example.test/posts/1Dl89aRjpk /this-is-some-title
// or
http://example.test/posts/1Dl89aRjpk%20 /this-is-some-title
// or
http://example.test/posts/1Dl89aRjpk%20%20 /this-is-some-title
With whitespace added at the end - this will work normally and there is no 404:
Post::where('encoded_id', $encoded_id)->firstOrFail();
... but why? And how can I make it fail (return a 404)?
Maybe because of the type of field in the DB (CHAR)?
$table->char('encoded_id', 10)
If that's why, is there any way to configure MySQL in config/database.php to prevent this?
Or maybe it has something to do with .htaccess (I'm using XAMPP / Windows)?
I'm using Laravel 5.6.
EDIT:
I'm asking why this is happening and how can I prevent it, not how to trim route parameter. For example, add white space at the end of the question id on stackoverflow url and you will get 404:
https://stackoverflow.com/questions/51068436 /laravel-route-parameters-not-trimmed-it-normally-works-when-whitespace-is-added
This is expected SQL behaviour. Your controller receives the full $encoded_id, including the spaces. All Laravel does for you is run an SQL SELECT query with a WHERE clause, and MySQL ignores trailing spaces when comparing CHAR or VARCHAR values with = (this applies to collations with the PAD SPACE attribute, which covers most MySQL collations).
See this question.
If you want a 404, replace the spaces in the ID with some dummy character:
$encoded_id = str_replace(' ', '#', $encoded_id);
Do this only if it is guaranteed that a valid ID never contains spaces or hash marks.
Building on balping's answer, some other solutions would be:
Replace the run of trailing whitespace with a single #:
$encoded_id = preg_replace("/\s+$/", "#", $encoded_id);
Use trim in combination with str_pad and strlen. This trims whitespace from the front and back, then right-pads the string with #'s so it keeps its original length:
$encoded_id = str_pad(trim($encoded_id), strlen($encoded_id), '#');
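To make the mechanics concrete, here is a small Python sketch of the same ideas (Python is used only for illustration, and all function names are hypothetical). pad_space_equal is a simplified model of a PAD SPACE comparison, not MySQL's actual implementation; the other two functions mirror the preg_replace and str_pad approaches:

```python
import re


def pad_space_equal(a: str, b: str) -> bool:
    """Simplified model of a PAD SPACE comparison: trailing spaces are ignored."""
    return a.rstrip(" ") == b.rstrip(" ")


def mark_trailing_whitespace(value: str) -> str:
    """Replace the run of trailing whitespace with a single '#'."""
    return re.sub(r"\s+$", "#", value, flags=re.ASCII)


def trim_and_pad(value: str) -> str:
    """Trim both ends, then right-pad with '#' back to the original length."""
    return value.strip(" \t\n\r\0\x0b").ljust(len(value), "#")


stored_id = "1Dl89aRjpk"
request_id = "1Dl89aRjpk  "  # two trailing spaces taken from the URL

print(pad_space_equal(stored_id, request_id))
print(mark_trailing_whitespace(request_id))
print(trim_and_pad(request_id))
print(pad_space_equal(stored_id, trim_and_pad(request_id)))
```

Both rewrites turn the trailing spaces into characters that the comparison cannot ignore, so the lookup no longer matches the stored ID, while an ID without extra whitespace passes through unchanged.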

Weird character at the end of database entry

I am migrating an Excel sheet (CSV) to MySQL. However, when I do an insert, some fields end up with empty spaces at the end, and I can't get rid of them for some reason. I assume there is a weird character at the end, since not even this:
UPDATE FOO set FIELD2 = TRIM(Replace(Replace(Replace(FIELD2,'\t',''),'\n',''),'\r',''));
gets rid of it completely. I still have whitespace at the end and I don't know how to get rid of it. I have over 2000 entries, so doing it manually is not an option. I am using Laravel with the revision package, and it doesn't work because it treats those trailing spaces as changes and creates a bunch of duplicates. Thank you for your help.
If you think there are weird characters in the original CSV, you could open it in a text editor capable of regex replaces and replace all non-ASCII characters with nothing.
Your regex would look like this:
[^\u0000-\u007F]+
Then, after removing any strange characters, re-import the data into the database.
Unfortunately, regex replaces are not possible in MySQL before 8.0 (which added REGEXP_REPLACE), so on older versions you'll need to re-import.
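If no suitable editor is at hand, the same cleanup can be scripted. The following is a minimal Python sketch, assuming the CSV is UTF-8 encoded; the function names and file paths are hypothetical. Note that it also drops legitimate accented letters, exactly like the regex above.

```python
import re

# Same character class as the regex above: anything outside 7-bit ASCII.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def strip_non_ascii(text: str) -> str:
    """Remove every run of non-ASCII characters from the text."""
    return NON_ASCII.sub("", text)


def clean_csv(src_path: str, dst_path: str, encoding: str = "utf-8") -> None:
    """Copy a CSV file line by line, dropping non-ASCII characters.

    The source encoding is an assumption; adjust it to match the export.
    """
    with open(src_path, encoding=encoding, newline="") as src, \
            open(dst_path, "w", encoding="ascii", newline="") as dst:
        for line in src:
            dst.write(strip_non_ascii(line))


print(repr(strip_non_ascii("value\u00a0")))
```

A call such as clean_csv("export.csv", "export_clean.csv") would then produce the file to re-import.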

MySQL: how to replace literal \r\n with special characters \r\n

I have some faulty PHP code which inserted literal \r\n characters into the database instead of the special characters representing new line and carriage return. Can anyone help me come up with a query that will replace the literals with the special characters?
Here's an SQL Fiddle setup. All I really need is something that will return the row containing "abc\r\ndef" rather than the other row. It's probably a very simple escape that's needed, but I can't work it out.
http://sqlfiddle.com/#!9/1f2acb/1
Once I have that query I guess I will simply use
UPDATE test SET txt = replace(txt, 'UNKNOWN EXPRESSION', '\r\n');
I'm running MySQL 5.5 on Ubuntu.
The answer was in a similar question that juanvan linked to.
UPDATE test set txt = replace(txt,'\\r\\n','\r\n');
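The doubled backslashes matter because '\\r\\n' denotes four literal characters (backslash, r, backslash, n), while '\r\n' denotes the two control characters carriage return and line feed. Python string literals follow the same escaping convention, so a short sketch can show the difference (the names are hypothetical):

```python
# Four literal characters: backslash, 'r', backslash, 'n'.
LITERAL = "\\r\\n"
# Two control characters: carriage return and line feed.
REAL = "\r\n"


def fix_line_breaks(text: str) -> str:
    """Replace the literal four-character sequence with a real CR LF."""
    return text.replace(LITERAL, REAL)


broken = "abc\\r\\ndef"  # what the faulty code stored

print(len(LITERAL), len(REAL))
print(repr(fix_line_breaks(broken)))
```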

How to replace delimiters from a string in SQL Server

I have the following data
abc
pqr
xyz,
jkl mno
This is one string separated by delimiters like space, new line, comma, tab.
There could be two or more consecutive spaces or tabs or any delimiter after or before a word.
I would like to be able to do the following
Get the individual words removing all leading and trailing delimiters off it
Append the individual words with "OR"
I am trying to achieve this to build a T-SQL query separated by OR clause.
Thanks
I think you can achieve what you need using just SQL (although I think a programming language is a much better fit). Here is my approach.
Note that I will only handle commas, newlines and multiple spaces, but you can simply follow the same technique to remove the rest of your undesired characters.
Let's assume that we have a table named ExampleData with a column named DataBefore and another called DataAfter.
DataBefore: holds the line value that you want to clean
DataAfter: will hold the cleaned text
First, we need to trim the leading and trailing space(s) from the text:
Update ExampleData
set DataAfter = LTRIM(RTRIM(DataBefore))
Second, we replace all commas and carriage returns (char(13)) with spaces (it doesn't matter if we end up with several spaces in a row):
Update ExampleData
set DataAfter = replace(replace(DataAfter,',',' '),char(13),' ')
This is the point at which you can continue and replace any other characters with a space using the same technique.
So far we have text with no spaces before or after it, and with every comma, newline, tab, dash, etc. replaced by a space. Let's continue the cleaning procedure.
We can now safely collapse the runs of spaces between words into a single space. Each space first becomes '<>', so a run of spaces becomes '<><><>'; deleting every '><' leaves a single '<>', which is then turned back into one space (this assumes the text contains no < or > of its own). This is done with the following SQL statement:
Update ExampleData
set DataAfter = replace(replace(replace(DataAfter,' ','<>'),'><',''),'<>',' ')
To meet your requirement, we need to place an OR between the words, which can be done with this SQL statement:
Update ExampleData
set DataAfter = replace(replace(replace(DataAfter,' ','<>'),'><',''),'<>',' OR ')
As a final step, which may or may not change anything, we remove any space at the end of the whole text, in case an unwanted character at the end was replaced by a space. This can be done with the following statement:
Update ExampleData
set DataAfter = RTRIM(DataAfter)
We are now done. :)
As a test, I put the following text in the DataBefore column:
this is just a, test, to be sure, that everything is, working, great .
and after running the previous commands, I ended up with this value in the DataAfter column:
this OR is OR just OR a OR test OR to OR be OR sure OR that OR everything OR is OR working OR great OR .
I hope this is what you want; let me know if you need any extra help :)
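Since the answer itself notes that a programming language may be the better tool, here is a minimal Python sketch that ports the steps above in the same order (the function names are hypothetical). As an assumption beyond the SQL shown, it also treats line feeds and tabs as delimiters, and it relies on the text containing no '<' or '>' characters.

```python
def collapse_spaces(text: str, joiner: str = " ") -> str:
    """Collapse each run of spaces into one `joiner` using the '<>' trick.

    Assumes the text contains no '<' or '>' characters of its own.
    """
    return text.replace(" ", "<>").replace("><", "").replace("<>", joiner)


def build_or_expression(raw: str) -> str:
    """Port of the SQL steps: trim, replace delimiters, join words with OR."""
    cleaned = raw.strip(" ")  # LTRIM(RTRIM(...))
    # Comma and carriage return come from the SQL; "\n" and "\t" are additions.
    for delimiter in (",", "\r", "\n", "\t"):
        cleaned = cleaned.replace(delimiter, " ")
    cleaned = collapse_spaces(cleaned, " OR ")
    return cleaned.rstrip(" ")  # final RTRIM


sample = "this is just a, test, to be sure, that everything is, working, great ."
print(build_or_expression(sample))
```

Like the SQL version, this leaves a dangling OR when the input starts or ends with a delimiter such as a comma; trimming after the delimiter replacement instead of before it would avoid that.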

Removing strange characters from MySQL data

Somewhere along the way, between all the imports and exports I have done, a lot of the text on a blog I run is full of weird accented A characters.
When I export the data using mysqldump and load it into a text editor with the intention of using search-and-replace to clear out the bad characters, searching just matches every "a" character.
Does anyone know any way I can successfully hunt down these characters and get rid of them, either directly in MySQL or by using mysqldump and then reimporting the content?
This is an encoding problem; the Â is a non-breaking space (HTML entity &nbsp;) in Unicode being displayed in Latin1.
You might try something like this... first we check to make sure the matching is working:
SELECT * FROM some_table WHERE some_field LIKE BINARY '%Â%'
This should return any rows in some_table where some_field has a bad character. Assuming that works properly and you find the rows you're looking for, try this:
UPDATE some_table SET some_field = REPLACE( some_field, BINARY 'Â', '' )
And that should remove those characters (based on the page you linked, you don't really want an nbsp there as you would end up with three spaces in a row between sentences etc, you should only have one).
If it doesn't work then you'll need to look at the encoding and collation being used.
EDIT: Just added BINARY to the strings; this should hopefully make it work regardless of encoding.
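The encoding mix-up can be reproduced outside the database. This is a small Python sketch for illustration only (the names are hypothetical): it encodes a non-breaking space as UTF-8, misreads the bytes as Latin-1 to produce the stray Â, and reverses the mistake.

```python
NBSP = "\u00a0"  # non-breaking space

utf8_bytes = NBSP.encode("utf-8")        # the two bytes C2 A0
misread = utf8_bytes.decode("latin-1")   # 'Â' followed by a Latin-1 NBSP


def repair(text: str) -> str:
    """Undo one round of UTF-8 text that was misread as Latin-1."""
    return text.encode("latin-1").decode("utf-8")


print(repr(misread))
print(repr(repair("Hello" + misread + "world")))
```

The repair function only works when the whole string went through exactly one such misreading; it raises an error on text that contains characters outside Latin-1 or byte sequences that are not valid UTF-8.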
The accepted answer did not work for me.
From here http://nicj.net/mysql-converting-an-incorrect-latin1-column-to-utf8/ I found that the binary code for the Â character is c2a0 (by converting the column to VARBINARY and looking at what it turns into).
Then here http://www.oneminuteinfo.com/2013/11/mysql-replace-non-ascii-characters.html I found the actual solution to remove (replace) it:
update entry set english_translation = unhex(replace(hex(english_translation),'C2A0','20')) where entry_id = 4008;
The query above replaces it with a space; a normal trim can then be applied, or you can simply replace it with '' instead.
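One caveat with replacing inside the hex string: the pattern 'C2A0' can also match at an odd offset, straddling two unrelated bytes. The Python sketch below (illustration only, hypothetical names) mirrors the hex-based query and compares it with a byte-aligned replacement, which corresponds to REPLACE(col, UNHEX('C2A0'), ' ') in MySQL.

```python
def hex_replace(raw: bytes) -> bytes:
    """Mirror UNHEX(REPLACE(HEX(col), 'C2A0', '20')) on a byte string."""
    return bytes.fromhex(raw.hex().upper().replace("C2A0", "20"))


def byte_replace(raw: bytes) -> bytes:
    """Replace only whole C2 A0 byte pairs with a space."""
    return raw.replace(b"\xc2\xa0", b" ")


good = "word\u00a0".encode("utf-8")  # ends with a real UTF-8 NBSP
tricky = b"L*\n"                     # hex 4C2A0A contains 'C2A0' at an odd offset

print(hex_replace(good), byte_replace(good))
print(hex_replace(tricky), byte_replace(tricky))
```

For the first value both approaches agree, but the hex version rewrites the second value even though it contains no non-breaking space at all, so the byte-aligned form is the safer choice on arbitrary text.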
I have had this problem and it is annoying, but solvable. As well as Â you may find you have a whole load of characters showing up in your data like these:
“
This is connected to encoding changes in the database, but so long as you do not have any of these characters in your database that you want to keep (e.g. if you are actually using a Euro symbol) then you can strip them out with a few MySQL commands as previously suggested.
In my case I had this problem with a Wordpress database that I had inherited, and I found a useful set of pre-formed queries that work for Wordpress here http://digwp.com/2011/07/clean-up-weird-characters-in-database/
It's also worth noting that one of the causes of the problem in the first place is opening a database in a text editor which might change the encoding in some way. So if you can possibly manipulate the database using MySQL only and not a text editor this will reduce the risk of causing further trouble.