mysql select query ignoring inner spaces - mysql

Banging my head against the wall with this one.
I have a table containing postcodes and street names, and another table where houses are listed for sale (where the street name is missing). I am trying to get the street name for each postcode.
The problem is that table 1 stores the postcode without the space, while table 2, which I am trying to update, stores the postcode with the space.
So in table 1 the postcode is stored as "l249pb" and in table 2 it is stored as "l24 9pb".
Now if the postcodes were both stored in exactly the same format, i.e. without the space, I would expect this query to work:
UPDATE Table1
INNER JOIN Table2 ON ( Table1.PostCode = Table2.PostCode )
SET Table1.StreetName = Table2.StreetName
I have tried this but it won't work:
UPDATE Table1
INNER JOIN Table2 ON ( Table1.PostCode = REPLACE(Table2.PostCode,' ',''))
SET Table1.StreetName = Table2.StreetName
Can anyone tell me how to check for a match ignoring spaces (like a trim, but removing every space)?
Many thanks for any help you can offer.

With the data you've given, your UPDATE runs just fine. Probably the whitespace you see is not actually made of spaces but of something else, e.g. non-breaking spaces, tabs etc.
After the normal SPACE, the next most common whitespace characters (which are not line breaks) are CHARACTER TABULATION (i.e. horizontal tab) and NO-BREAK SPACE. You could use CHAR(9) and CHAR(160), respectively, to reference them in your query.
It is also possible that your table viewer application shows line breaks as a space for brevity, so if replacing space, tab and nbsp isn't enough, try replacing those, too.
If you really need to replace all whitespace characters… Unfortunately there is no "whitespace wildcard" to use in MySQL. Technically, you could make a monster REPLACE(REPLACE(REPLACE(REPLACE… call which, in the end, would replace all whitespace characters with ''. For example, to replace every THREE-PER-EM SPACE, first look up its Unicode code point (U+2004), then replace its occurrences with, e.g.:
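To find out which characters are really stored, one option is to fetch a suspicious value and list its whitespace code points outside the database. A minimal Python sketch, assuming the value has already been read into a string; the function name `describe_whitespace` is made up for illustration:

```python
import unicodedata

def describe_whitespace(value: str) -> list:
    """Return (index, code point, Unicode name) for each whitespace character."""
    found = []
    for index, ch in enumerate(value):
        if ch.isspace():
            # Control characters such as tab have no Unicode name, hence the default.
            name = unicodedata.name(ch, "<unnamed control character>")
            found.append((index, f"U+{ord(ch):04X}", name))
    return found

# A postcode that looks like "l24 9pb" but contains a no-break space:
print(describe_whitespace("l24\u00a09pb"))  # [(3, 'U+00A0', 'NO-BREAK SPACE')]
```

If the output names anything other than SPACE, that is the character the REPLACE call has to target.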
REPLACE(PostCode, CHAR(0x2004 using ucs2), '')
There is a hackish shortcut to this: if you are sure that your data should contain only Latin-1 characters and no ? (question mark), you could CONVERT() the string to latin1 first, which replaces every character that has no Latin-1 representation with ?, and then replace all ? with '':
REPLACE(CONVERT(PostCode using latin1), '?', '')
This can be useful in one-off, manual queries, but for continuing use, better replace the characters explicitly.
But first you should check your data input sanitizer/validator, so future records won't be such a mess. Perhaps you could consider running a bulk replace to normalize the data on PostCode column(s), if possible, before even trying to do your join query. Legacy systems with legacy data only get worse over time.
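If the bulk normalization is done in a script rather than in SQL, stripping every kind of whitespace is a one-liner. The following is only a sketch under assumptions: the postcodes are available as Python strings, lowercasing is acceptable for matching, and `normalize_postcode` is a hypothetical helper name.

```python
import re

# In a str pattern, \s matches Unicode whitespace, including tab,
# NO-BREAK SPACE (U+00A0) and THREE-PER-EM SPACE (U+2004).
_WHITESPACE = re.compile(r"\s+")

def normalize_postcode(raw: str) -> str:
    """Remove every whitespace character and lowercase the result."""
    return _WHITESPACE.sub("", raw).lower()

print(normalize_postcode("l24\u00a09pb"))   # l249pb
print(normalize_postcode(" L24\t9PB "))     # l249pb
```

Running the same function over the values of both tables gives keys that compare equal regardless of which invisible character was used as the separator.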

Related

How to replace delimiters from a string in SQL Server

I have the following data
abc
pqr
xyz,
jkl mno
This is one string separated by delimiters like space, new line, comma, tab.
There could be two or more consecutive spaces or tabs or any delimiter after or before a word.
I would like to be able to do the following
Get the individual words removing all leading and trailing delimiters off it
Append the individual words with "OR"
I am trying to achieve this to build a T-SQL query separated by OR clause.
Thanks
I think you can achieve what you need using just SQL (although I think using a programming language is way better). Here is my approach.
Kindly note that I will only handle commas, newlines and multiple spaces, but you can simply follow the same technique to remove the rest of your undesired characters.
So let's assume that we have a table named ExampleData with a column named DataBefore and another called DataAfter.
DataBefore: has the line value that you want to clean
DataAfter: will host the cleaned text
First we need to trim the leading and trailing space(s) from the text:
Update ExampleData
set DataAfter = LTRIM(RTRIM(DataBefore))
Second, we should replace all the commas and line breaks (CHAR(13) and CHAR(10)) with spaces (it doesn't matter if we end up with several spaces together):
Update ExampleData
set DataAfter = replace(replace(replace(DataAfter,',',' '),char(13),' '),char(10),' ')
This is the part where you may continue and replace any other unwanted characters with a space, using the same technique.
So far we have a text that has no spaces before or after it, and every comma, newline, TAB, dash, etc. replaced by a space. Let's continue our cleaning procedure.
We can now safely collapse the runs of spaces between words into a single space. This is done with the following SQL statement:
Update ExampleData
set DataAfter = replace(replace(replace(DataAfter,' ','<>'),'><',''),'<>',' ')
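To see why the triple replace collapses runs of spaces, it can help to trace it outside SQL. The sketch below mirrors the same three steps with Python's `str.replace`; it assumes, as the SQL does, that the text contains no `<` or `>` characters, and `collapse_spaces` is just an illustrative name.

```python
def collapse_spaces(text: str) -> str:
    """Collapse runs of spaces with the same '<>' marker trick as the SQL."""
    marked = text.replace(" ", "<>")     # "a   b" -> "a<><><>b"
    squeezed = marked.replace("><", "")  # adjacent markers merge -> "a<>b"
    return squeezed.replace("<>", " ")   # one marker left per run -> "a b"

print(collapse_spaces("a   b  c"))  # a b c
```

Each space becomes one marker, the seams between neighbouring markers are deleted, and exactly one marker per run survives to be turned back into a space.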
As per your needs, we need to place an OR between each pair of words; this is achievable with this SQL statement:
Update ExampleData
set DataAfter = replace(replace(replace(DataAfter,' ','<>'),'><',''),'<>',' OR ')
As a final step that may or may not change anything, we remove any space at the end of the whole text, in case an unwanted character at the end of the text got replaced by a space. Note that if the text can end in a delimiter, this trim should be run before the OR step, otherwise a trailing " OR" is left behind. The trim is done with the following statement:
Update ExampleData
set DataAfter = RTRIM(DataAfter)
we are now done. :)
as a test, I've generated the following text inside the DataBefore column:
this is just a, test, to be sure, that everything is, working, great .
and after running the previous commands, ended up with this value inside the DataAfter column:
this OR is OR just OR a OR test OR to OR be OR sure OR that OR everything OR is OR working OR great OR .
Hope that this is what you want, let me know if you need any extra help :)
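For comparison, the programming-language route mentioned at the start of this answer is considerably shorter. A hedged Python sketch, assuming the delimiters are whitespace and commas only; `build_or_clause` is a hypothetical name, not part of any library:

```python
import re

def build_or_clause(text: str) -> str:
    """Split on any run of whitespace or commas and join the words with OR."""
    # findall keeps only the non-delimiter runs, so leading, trailing and
    # repeated delimiters never produce empty words.
    words = re.findall(r"[^\s,]+", text)
    return " OR ".join(words)

print(build_or_clause("abc\npqr\nxyz,\njkl mno"))
# abc OR pqr OR xyz OR jkl OR mno
```

Because empty tokens are never produced, no trimming step is needed before or after joining.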

Delete all characters before and after quotation marks

I have a CSV file, which has two columns and 4500 rows. In one column, I have several phrases that are surrounded in quotation marks. I need to delete all the text that comes before and after the quotations marks.
For example:
How would you say "Hello, my Friend" when speaking outside?
should become "Hello, my Friend"
I also have several rows that have the word NULL in the second column. I need these rows deleted in full.
What's the best way of doing something like this? I have been looking at regular expressions, but I'm not sure if they are flexible enough to do what I want to do, or how you would use them on a CSV file (I need the table structure to remain).
EDIT:
1) At the moment I am just using Apple Numbers, but I know that won't do it, so I am open to any suggestions. It must support Kanji characters.
2) I have removed all the NULL rows, so that is no longer needed (I simply added a column of numbers, sorted the table so all the NULLs were together, deleted them and then sorted back by the column of numbers).
Find a text editor that supports regular expression search and replace.
Something like this would match ,NULL in the second column: ^.*,NULL.*$. Replace it with "DELETEMEDELETEME" to mark the line, or with an empty string, or find a way to have it match on '\n' or '\r' to catch the line break and remove the entire line completely.
Stripping out parts of the quoted string might work like this:
^(.*,){n}(.*)(\".*\")(.*)(,.*)$ replaced with \1\3\5, where n is the number of columns preceding the one you want to edit. Repeat (.*,) if the {n} quantifier is not available. It will depend on the regex flavor of your tool.
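If a scripting language is an option, a CSV parser avoids having to encode the column structure in the regex at all. The following Python sketch is one possible approach, not the method described above: it assumes the file is standard CSV (cells containing quotes are written with doubled quotes), that the phrase sits in the first column, and that `keep_quoted` and `clean_csv` are illustrative names.

```python
import csv
import io
import re

QUOTED = re.compile(r'"[^"]*"')  # first "..." phrase inside a cell

def keep_quoted(cell: str) -> str:
    """Return only the quoted phrase, or the cell unchanged if there is none."""
    match = QUOTED.search(cell)
    return match.group(0) if match else cell

def clean_csv(text: str, column: int = 0) -> str:
    """Drop rows whose second column is NULL and trim one column to its quote."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        if len(row) > 1 and row[1] == "NULL":
            continue  # skip the whole row
        if len(row) > column:
            row[column] = keep_quoted(row[column])
        writer.writerow(row)
    return out.getvalue()

print(keep_quoted('How would you say "Hello, my Friend" when speaking outside?'))
# "Hello, my Friend"
```

Python strings are Unicode, so Kanji in the cells pass through unchanged as long as the file is opened with the right encoding.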

sql server select where breaks with field containing apostrophe

I have set up a job that runs reports and uses multiple tables with joins. I am joining two tables on a string field, and if the field contains an apostrophe, it does not return any matches. This is weird, and I am not sure why it is happening now when it never happened before. I may not have identified the exact cause, but I would appreciate any help here:
Example query: "today's deals"
SET @TITLE = (SELECT MAX(B.DATEADDED) as 'td','',
(C.CLIENT + CHAR(10) + B.CLIENTKEY) as 'td','',
B.BADQ as 'td','',A.FULLQ as 'td','', B.BADERROR as 'td',
''
FROM BADQUERY AS B
LEFT JOIN QDATA AS A ON B.BADQ = A.QUERYT
LEFT JOIN Clients AS C ON C.clientKey = B.clientKey
WHERE DATEDIFF(minute,CAST(B.DATEADDED as datetime),GETDATE())<=420 AND
DAY(GETDATE()) = DAY(B.DATEADDED)
GROUP BY B.BADT,A.FULLQ, B.CLIENTKEY,C.CLIENT, B.BADERROR
FOR XML PATH ('tr'), ELEMENTS XSINIL)
For some reason A.FULLQ is being returned as NULL. When I run the lookup separately as a plain query the result set is also empty, but I know the matching record exists in QDATA (aliased as A). So if the apostrophe is the problem, how can I get the matching row? Or is SQL Server matching the data correctly and something else is wrong?
If I match with LIKE instead, it returns results, but that is not accurate.
If B.BADQ and A.QUERYT don't exactly match, you won't get any records back. The fact that it works with a LIKE makes me wonder whether one of them has additional characters, either before or after the matching data (depending on how you set up the LIKE).
Michael Green is right, below, that trailing blanks by themselves don't prevent a match, but, depending on where your data originates, you might have some other character (such as an embedded CHAR(0) or a TAB character) that doesn't appear when you view the data in the record but which is enough to prevent the records from matching. You might use the CHECKSUM() function on the two strings to verify that they do represent the same data.
Another, similar possibility is that if there is a string of blanks in the values (something like "A, B, ' '") the number of blanks might be different between the two instances. They'd look the same in HTML (which it looks like you're generating) but they'd be different in reality and be enough to prevent a match.
Finally, the fact that you're generating XML and observing trouble with apostrophes made me think of this: if the content of an XML tag has an apostrophe, it will be converted to &apos;. That ought to affect only the output, not the functioning, of the query, but I don't know what your data actually looks like.
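When two values look identical but refuse to match, locating the first differing character usually settles the question. A small Python sketch, assuming both strings have been fetched from the two tables; the function name and the curly-apostrophe sample are hypothetical, chosen only to illustrate the idea:

```python
from itertools import zip_longest

def first_difference(a: str, b: str):
    """Return (index, code point in a, code point in b), or None if equal."""
    def code(ch):
        return "<end>" if ch is None else f"U+{ord(ch):04X}"

    for index, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return index, code(x), code(y)
    return None

# A straight apostrophe versus a typographic one (made-up sample data):
print(first_difference("today's deals", "today\u2019s deals"))
# (5, 'U+0027', 'U+2019')
```

A result of None means the strings really are identical and the cause lies elsewhere, for example in the join or the date filter.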

MySQL imported wrong datatype into a VARCHAR column

Today I did something stupid: I had a list of card numbers in an Excel file that I had to import into a DB table somehow. So I exported the numbers to a CSV file, but without any quotes (don't ask me why). The file looked like:
123456
234567
345678
...
Then I created a table with a single VARCHAR(22) column and did a
LOAD DATA LOCAL INFILE 'numbers.csv' INTO TABLE cards
This worked fine, apart from many warnings, which I ignored (the other stupid thing I did).
After that I tried to query with this SQL:
SELECT * FROM cards WHERE number='123456'
which gave me an empty result. Whereas this works:
SELECT * FROM cards WHERE number=123456
Notice the missing quotes! So it seems, that I managed to populate my VARCHAR table with INTEGER data. I have no idea how that is possible at all.
I already tried to fix this with an UPDATE like this
UPDATE cards SET number = CAST(number AS CHAR(22))
But that didn't work.
So is there a way to fix this and how could this even happen?
This is the result of some implicit conversion in order to do a numerical comparison:
SELECT * FROM cards WHERE number='123'
This will only match text fields that are literally "123" and will miss " 123" and "123\r" if you have those. "123 " and "123" are considered equivalent, presumably due to trailing-space removal on both sides.
When doing your import, don't forget LINES TERMINATED BY '\r\n'. If you're ever confused about what's in a field, including hidden characters, try:
SELECT HEX(number) FROM cards
This will show the hex-dumped output of each string. Things like 20 indicate a space, just as %20 in a URL is a space, and 0D indicates a carriage return.
You can also fix this by:
UPDATE cards SET number=REPLACE(number, '\r', '')
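To get a feel for what the HEX() output of an affected row looks like, the same dump can be reproduced outside the database. A minimal Python sketch, assuming the column holds ASCII digits plus a stray carriage return; `mysql_style_hex` is a made-up helper that only imitates the format:

```python
def mysql_style_hex(value: str, encoding: str = "utf-8") -> str:
    """Hex-dump a string as uppercase byte pairs, similar to HEX() output."""
    return value.encode(encoding).hex().upper()

print(mysql_style_hex("123456"))    # 313233343536
print(mysql_style_hex("123456\r"))  # 3132333435360D
print(mysql_style_hex("123456\r".replace("\r", "")))  # 313233343536
```

The trailing 0D in the second line is the carriage return left over from a file with '\r\n' line endings.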

MySQL won't replace words with empty space

Basically, I have a problem with the replace() function in MySQL (via phpMyAdmin). One table got messed up and some special characters (plus an empty space after them) appeared inside words. So all I wanted to do was:
UPDATE myTable SET columnName =
(replace(columnName, 'Å house',
'house'))
But MySQL returns
0 row(s) affected. ( Query took 0.0107 sec )
The same is when I try to replace foreign towns with special characters in the name of a town (Swedish town, German town, etc.)
Am I doing something wrong???
Å house
Is likely to actually be:
Å house
That is, with a U+00A0 NO-BREAK SPACE character and not a normal space. Normally you cannot see the difference, but a string replace can, and it won't touch it.
This was probably originally just a single non-breaking-space character, that has been mangled through a classic UTF-8-read-as-ISO-8859-1 encoding screw-up. Other non-ASCII characters in your database are likely to have been similarly messed up.
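The mechanism can be reproduced in a few lines. The Python sketch below is only an illustration under the assumption that the damage came from UTF-8 bytes being decoded as ISO-8859-1; the helper names are invented, and the exact characters seen in a given table depend on the original bytes.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Simulate the mix-up: UTF-8 bytes decoded as ISO-8859-1."""
    return text.encode("utf-8").decode("latin-1")

def undo_misread(text: str) -> str:
    """Reverse the mix-up, provided nothing else altered the text since."""
    return text.encode("latin-1").decode("utf-8")

mangled = misread_utf8_as_latin1("\u00a0")  # a lone no-break space
print([f"U+{ord(ch):04X}" for ch in mangled])  # ['U+00C2', 'U+00A0']
print(undo_misread(misread_utf8_as_latin1("Malmö")))  # Malmö
```

One invisible character becomes an accented letter followed by a no-break space, which is why a REPLACE written with an ordinary space finds nothing.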