Remove Quotes and Commas from a String in MySQL - mysql

I'm importing some data from a CSV file, and numbers larger than 1000 come through as quoted strings with a comma, e.g. "1,100".
What's a good way to remove both the quotes and the comma so I can put the value into an int field?
Edit:
The data is actually already in a MySQL table, so I need to be able to do this using SQL. Sorry for the mixup.

My guess is that, because the data imported at all, the field is actually a varchar or some other character field; importing into a numeric field would probably have failed. Here is a test case I ran as a pure MySQL (SQL) solution.
The table is just a single column (alpha) that is a varchar.
mysql> desc t;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| alpha | varchar(15) | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+
Add a record
mysql> insert into t values('"1,000,000"');
Query OK, 1 row affected (0.00 sec)
mysql> select * from t;
+-------------+
| alpha |
+-------------+
| "1,000,000" |
+-------------+
Update statement.
mysql> update t set alpha = replace( replace(alpha, ',', ''), '"', '' );
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from t;
+---------+
| alpha |
+---------+
| 1000000 |
+---------+
So in the end the statement I used was:
UPDATE table_name
SET field_name = replace( replace(field_name, ',', ''), '"', '' );
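For anyone who wants to try the nested replace without a MySQL server, here is a minimal sketch in Python. It assumes SQLite (via the standard sqlite3 module) as a stand-in, since its replace() and CAST behave the same way for this case. The table and column names are the ones from the test case above.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table t(alpha).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (alpha VARCHAR(15))")
conn.execute("INSERT INTO t VALUES (?)", ('"1,000,000"',))

# Same nested replace as above: strip the commas first, then the double quotes.
conn.execute("""UPDATE t SET alpha = replace(replace(alpha, ',', ''), '"', '')""")

cleaned = conn.execute("SELECT alpha FROM t").fetchone()[0]
as_int = conn.execute("SELECT CAST(alpha AS INTEGER) FROM t").fetchone()[0]
print(cleaned, as_int)
```

Once the text is clean, the cast (or a change of the column type to an integer type) goes through without losing digits.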
I looked at the MySQL documentation and it didn't look like I could do a regular expression find and replace. Although you could, like Eldila, use a regular expression for the find and then an alternative solution for the replace.
Also be careful with s/"(\d+),(\d+)"/$1$2/: if the number has more than a single comma, for instance "1,000,000", you're going to want a global replace (in Perl that is s///g). But even with a global replace, the search resumes where the last match ended (unless Perl is different), so it would miss every other comma-separated group. A possible solution is to make the first (\d+) optional, like so: s/(\d+)?,(\d+)/$1$2/g. In that case I would need a second find and replace to strip the quotes.
Here are some Ruby examples of the regular expressions acting on just the string "1,000,000". Notice there are NO double quotes inside the string; this is just the number itself.
>> "1,000,000".sub( /(\d+),(\d+)/, '\1\2' )
# => "1000,000"
>> "1,000,000".gsub( /(\d+),(\d+)/, '\1\2' )
# => "1000,000"
>> "1,000,000".gsub( /(\d+)?,(\d+)/, '\1\2' )
# => "1000000"
>> "1,000,000".gsub( /[,"]/, '' )
# => "1000000"
>> "1,000,000".gsub( /[^0-9]/, '' )
# => "1000000"
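The same behaviour can be reproduced in Python, in case Ruby isn't at hand. This is only a sketch with the standard re module (Python 3.5 or later, where an unmatched optional group is substituted as an empty string); the variable names are mine.

```python
import re

s = "1,000,000"

# One substitution only: joins the first two groups.
first_only = re.sub(r"(\d+),(\d+)", r"\1\2", s, count=1)

# Global substitution: the search resumes after "1,000", so the
# remaining ",000" has no leading digits left to match against.
global_required = re.sub(r"(\d+),(\d+)", r"\1\2", s)

# Making the leading digits optional lets the second comma match too.
global_optional = re.sub(r"(\d+)?,(\d+)", r"\1\2", s)

# Simplest of all: delete every comma and double quote.
char_class = re.sub(r'[,"]', "", s)

print(first_only, global_required, global_optional, char_class)
```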

Here is a good case for regular expressions. You can run a find and replace on the data either before you import (easier) or later on if the SQL import accepted those characters (not nearly as easy). But in either case, you have any number of methods to do a find and replace, be it editors, scripting languages, GUI programs, etc. Remember that you're going to want to find and replace all of the bad characters.
A typical regular expression to find the comma and quotes (assuming just double quotes) is: (Blacklist)
/[,"]/
Or, if you think something might change in the future, this regular expression matches anything except a digit or a decimal point. (Whitelist)
/[^0-9\.]/
What has been discussed by the people above is that we don't know all of the data in your CSV file. It sounds like you want to remove the commas and quotes from all of the numbers in the CSV file. But because we don't know what else is in the CSV file we want to make sure that we don't corrupt other data. Just blindly doing a find/replace could affect other portions of the file.
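One way to avoid touching the rest of the file is to let a CSV parser split the fields first and clean only the numeric column. A minimal sketch in Python follows; the sample data and the column name "amount" are made up for illustration.

```python
import csv
import io

# Hypothetical sample: the note column legitimately contains commas and quotes.
raw = 'id,amount,note\n1,"1,100","said ""hi"", then left"\n2,950,plain\n'

rows = list(csv.reader(io.StringIO(raw)))
header, data = rows[0], rows[1:]
amount_idx = header.index("amount")

# The parser has already removed the field quotes; only the thousands
# separator inside the amount column is stripped here.
for row in data:
    row[amount_idx] = int(row[amount_idx].replace(",", ""))

print(data)
```

The text column keeps its comma and its quotes, which a blind find/replace over the whole file would have removed.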

You could use this Perl command.
perl -lne 's/[,"]//g; print' file.txt > newfile.txt
You may need to play around with it a bit, but it should do the trick.

Here's the PHP way:
$stripped = str_replace(array(',', '"'), '', $value);

Actually nlucaroni, your case isn't quite right. Your example doesn't include double-quotes, so
id,age,name,...
1,23,phil,
won't match my regex. It requires the format "XXX,XXX". I can't think of an example of when it will match incorrectly.
In all of the following examples, the regex won't match the delimiter commas:
"111,111",234
234,"111,111"
"111,111","111,111"
Please let me know if you can think of a counter-example.
Cheers!

The solution to the changed question is basically the same.
You will have to run a select query with a regex in the where clause.
Something like
Select *
FROM SOMETABLE
WHERE SOMEFIELD REGEXP '"[0-9]+,[0-9]+"'
For each of these rows, you want to do the regex substitution s/"(\d+),(\d+)"/$1$2/ and then update the field with the new value.
Please take Joseph Pecoraro seriously and have a backup before doing mass changes to any files or databases. Whenever you use regexes, you can seriously mess up data if there are cases you have missed.

My command does remove all ',' and '"'.
In order to convert the string "1,000" more strictly, you will need the following command.
perl -lne 's/"(\d+),(\d+)"/$1$2/; print' file.txt > newfile.txt

Daniel's and Eldila's answers have one problem: they remove all quotes and commas in the whole file.
What I usually do when I have to do something like this is to first replace all separating quotes and (usually) semicolons by tabs.
Search: ";"
Replace: \t
Since I know in which column my affected values will be I then do another search and replace:
Search: ^([^\t]+)\t([^\t]+)\t([0-9]+),([0-9]+)\t
Replace: \1\t\2\t\3\4\t
... given the value with the comma is in the third column.
You need to start with "^" to anchor the match at the beginning of a line. Then you repeat ([^\t]+)\t as often as there are columns that you just want to leave as they are.
([0-9]+),([0-9]+) searches for values where there is a number, then a comma and then another number.
In the replace string we use \1 and \2 to just keep the values from the edited line, separating them with \t (tab). Then we put \3\4 (no tab between) to put the two components of the number without the comma right after each other. All values after that will be left alone.
If you need your file to have semicolon to separate the elements, you then can go on and replace the tabs with semicolons. However then - if you leave out the quotes - you'll have to make sure that the text values do not contain any semicolons themselves. That's why I prefer to use TAB as column separator.
I usually do that in an ordinary text editor (EditPlus) that supports RegExp, but the same regexps can be used in any programming language.
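As an illustration of the column-anchored replace in a programming language, here is a small Python sketch. It assumes the file has already been converted to tab-separated columns, that the first two columns are non-empty, and that `[^\t]+` is what skips them. The sample line is invented.

```python
import re

# Hypothetical tab-separated line; the third column holds the number.
line = "alice\tx;y\t1,100\tremark, with comma\n"

# Skip two columns untouched, then join the digits around the comma.
pattern = re.compile(r"^([^\t]+)\t([^\t]+)\t([0-9]+),([0-9]+)\t")
fixed = pattern.sub(r"\1\t\2\t\3\4\t", line)

print(repr(fixed))
```

Note that this pattern handles exactly one comma in the number; a value like 1,000,000 would need the digit group repeated or a second pass.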

Related

SQL Regex last character search not working

I'm using a regex for a specific search, but the last separator is getting ignored.
It must search for |49213[A-Z]| but it searches for |49213[A-Z]
SELECT * FROM table WHERE (data REGEXP '/\|49213[A-Z]+\|/')
Why are you using / in the pattern? Why the +?
SELECT * FROM table WHERE (data REGEXP '\|49213[A-Z]\|')
If you want multiple:
SELECT * FROM table WHERE (data REGEXP '\|49213[A-Z]+\|')
or:
SELECT * FROM table WHERE (data REGEXP '[|]49213[A-Z][|]')
Aha. That is rather subtle.
\ escapes certain characters that have special meaning.
But it does not seem to do so for | ("or") or . ("any byte"), etc.
So, \| is the same as |.
But the regexp parser does not like having either side of "or" being empty. (I suspect this is a "bug"). Hence the error message.
https://dev.mysql.com/doc/refman/5.7/en/regexp.html says
To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other. For example, to match the string 1+2 that contains the special + character, only the last of the following regular expressions is the correct one:
SELECT '1+2' REGEXP '1+2'; -> 0
SELECT '1+2' REGEXP '1\+2'; -> 0
SELECT '1+2' REGEXP '1\\+2'; -> 1
The best fix seems to be [|] or \\| instead of \| when you want the pipe character.
Someday, the REGEXP parser in MySQL will be upgraded to PCRE as in MariaDB. Then a lot more features will come, and this 'bug' may go away.
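To make the two parsing layers concrete, here is a rough Python sketch. The helper mysql_string_literal is hypothetical: it only models the "drop the backslash, keep the next character" rule and ignores MySQL's special escapes. Python's re module then stands in for the regex library; unlike the MySQL parser described above, it quietly accepts an empty side of "or", so the loose pattern matches everything instead of raising an error.

```python
import re

def mysql_string_literal(body):
    # Rough model of MySQL's string-literal pass (an assumption, not the real
    # parser): a backslash is removed and the character after it is kept.
    # Real MySQL also has special escapes such as \n, \t, \% and \_.
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)

# What the regex library receives after the string-literal pass.
one_backslash = mysql_string_literal(r"\|49213[A-Z]\|")
two_backslashes = mysql_string_literal(r"\\|49213[A-Z]\\|")
print(one_backslash)    # |49213[A-Z]|
print(two_backslashes)  # \|49213[A-Z]\|

row_without_trailing_pipe = "a|49213B"
row_with_both_pipes = "a|49213B|c"

# With one backslash the pipes are alternation, so the trailing separator is not required.
loose_hit = re.search(one_backslash, row_without_trailing_pipe) is not None
# With two backslashes the pipes are literal and both must be present.
strict_miss = re.search(two_backslashes, row_without_trailing_pipe) is None
strict_hit = re.search(two_backslashes, row_with_both_pipes) is not None
```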

Select special characters mysql

I need to select from fields that can contain special characters, for example:
+--------------+
| code |
+--------------+
| **4058947"_\ |
| **4123/"_\ |
| sew'-8947"_\ |
+--------------+
I tried this:
select code from table where code REGEXP '[(|**4058947"_\|)]';
select code from table where code REGEXP '[(**4058947"_\)]';
select code from table where code REGEXP '^[(**4058947"_\)]';
but these queries return all rows, and this query returns nothing:
select code from table where code REGEXP '^[(**4058947"_\)]$';
I need it to return only the first row, or whichever one I specify.
To select only one row, you could just do this if it doesn't matter which one.
SELECT code FROM table LIMIT 1
If it does matter, drop the regex.
SELECT code FROM table WHERE code = "**4058947\"_\\"
To match those special characters (in this case, " and \), you need to "escape" them. (That's how it's called. I didn't make that up.) In most mainstream languages this is done by putting a backslash in front of it (MySQL does it this way too). The backslash is the escape character, a backslash with another character behind it is called an escape sequence. As you see, I escaped the quote and the backslash in the code value I want to match, so it should work now.
If you need to keep the regexes (which I hope is not the case, since you have the literal string you want to match against), the same thing applies. Escape quotes and backslashes and you'll be fine, provided you drop the parentheses and brackets. Note that in a regex, you need to escape far more characters. This is because some characters (for example: | [] () * +) have a special function in a regex. This is very handy, but becomes a bit of a problem when you need to match a string with that character in it. In that case, you need to escape it, but with a double backslash! This is because MySQL first parses the query as a string, and the string parser consumes one level of backslashes. Only then is the result parsed as a regex, with the double backslashes already reduced to single backslashes. This gets ugly very quickly, since it means matching a backslash with a MySQL regex requires 4 backslashes! Two in the regex, but this needs to be doubled, since MySQL parses it as a string first!
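Python has the same two layers (string literal, then regex), so it makes a convenient place to see both effects. The sketch below uses Python's re module, not MySQL's regex library, and the variable names are mine. It shows why the bracketed patterns from the question matched every row (brackets make a character class), and how many backslashes a literal backslash needs.

```python
import re

codes = ['**4058947"_\\', '**4123/"_\\', "sew'-8947\"_\\"]

# The question's pattern is a character class: "any ONE of these characters".
# Every code contains at least one of them, so every row matches.
char_class = r'[(**4058947"_\)]'
class_hits = [c for c in codes if re.search(char_class, c)]

# Anchored with ^ and $, the class can only match a one-character value.
anchored_hits = [c for c in codes if re.search("^" + char_class + "$", c)]

# Four backslashes in a normal literal: the string parser halves them to two,
# and the regex engine reads those two as one literal backslash.
backslash_pattern = "\\\\"
backslash_hits = [c for c in codes if re.search(backslash_pattern, c)]

# Letting the library add the regex-level escapes for an exact match.
exact = "^" + re.escape(codes[0]) + "$"
exact_hits = [c for c in codes if re.search(exact, c)]
```

For an exact match like this, plain equality is still the simpler tool, as suggested above.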

SQL UPDATE with LIKE

I was trying to update approx 20,000 records in a table by using statements like this one.
However, I got a message saying 0 row(s) affected, so it didn't work.
UPDATE nc_files SET title ="Worklog details" WHERE "log_name" LIKE "%PC01%"
The log_name field has all the file names, with mixed file extensions and cases, e.g.
PC01.TXT | Worklog details
PC02.txt | Project Summary
PC03.DOC| Worklog details
PC04.doc| Project Summary
The reason I need to use LIKE is that the update file only has the file names without extensions, e.g.
PC01 | Worklog details
PC02 | Project Summary
How do I update records by using LIKE?
log_name is a field name; remove the literal quotes from it:
UPDATE nc_files SET title ="Worklog details" WHERE log_name LIKE "%PC01%"
This is because your column name log_name should not be in quotes.
The condition "log_name" LIKE "%PC01%" compares a string literal, not the column, so it always fails and zero rows get updated. Try this:
UPDATE nc_files
SET title ="Worklog details"
WHERE log_name LIKE "%PC01%";
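The difference between a quoted string and a column name is easy to see by counting affected rows. The sketch below is in Python with SQLite as a stand-in for MySQL. It uses single quotes for every string literal, because SQLite resolves double-quoted names differently from MySQL's default mode. The sample rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nc_files (log_name TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO nc_files VALUES (?, ?)",
    [("PC01.TXT", ""), ("PC02.txt", ""), ("PC03.DOC", ""), ("PC04.doc", "")],
)

# 'log_name' in quotes is just the text "log_name", which never contains PC01.
wrong = conn.execute(
    "UPDATE nc_files SET title = 'Worklog details' WHERE 'log_name' LIKE '%PC01%'"
).rowcount

# Unquoted, log_name is the column, so each row's value is tested.
right = conn.execute(
    "UPDATE nc_files SET title = 'Worklog details' WHERE log_name LIKE '%PC01%'"
).rowcount

print(wrong, right)
```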
By default MySQL treats a double-quoted name as a string literal; with the ANSI_QUOTES sql_mode enabled, the same text is an identifier (like a column name).
This is meant as a convenience, but I find the semantic ambiguity frustrating. MySQL must resolve the ambiguity, and cannot magically guess the coder's intention, as you discovered.
-- In default sql_mode under 5.5
--
SELECT "foo" -- "foo" is a string literal here, even if tbl.foo exists
FROM "tbl" -- Oops! ER_PARSE_ERROR (1064) Can't do this in default mode.
WHERE "foo" = "foo"; -- Both of these are strings
So, the way around it is to force unambiguous interpretation of identifiers:
1. Do not quote simple identifiers.
2. Use MySQL-specific backticks for quoting. (This is ODBC's SQL_IDENTIFIER_QUOTE_CHAR.)
3. Always change the sql_mode to include ANSI_QUOTES (or a superset of it). Double quotes are then exclusively for identifiers, single quotes for strings.
#3 is my personal favorite, for readability and portability. The problem is it tends to surprise people who only know MySQL, and you have to remember to override the default.
"log_name" should not be in quotes
I had similar trouble. The problem is the quotation marks ".
I fixed my code as follows.
UPDATE Table SET Table.Field = "myreplace"
WHERE (((Table.Field) Like '%A-16%'));
Regards, Alexwin1982
Try replace keyword
UPDATE nc_files SET title = REPLACE(title, 'PC01', 'Worklog details') WHERE log_name LIKE '%PC01%'

Creating variables and reusing within a mysql update query? possible?

I am struggling with this query and want to know if I am wasting my time and need to write a php script or is something like the following actually possible?
UPDATE my_table
SET #userid = user_id
AND SET filename('http://pathto/newfilename_'#userid'.jpg')
FROM my_table
WHERE filename
LIKE '%_%' AND filename
LIKE '%jpg'AND filename
NOT LIKE 'http%';
Basically I have 700 odd files that need renaming in the database as they do not match the filenames as I am changing system, they are called in the database.
The format is 2_gfhgfhf.jpg which translates to userid_randomjumble.jpg
But not all files in the database are in this format, only about 700 out of thousands. So I want to identify names that contain _ but don't start with http (that's the correct format, which I don't want to touch).
I can do that fine but now comes the tricky bit!!
I want to replace that file name userid_randomjumble.jpg with http://pathto/filename_userid.jpg So I want to set the column user_id in that row to a variable and insert it into my new filename.
The above doesn't work for obvious reasons but I am not sure if there is a way round what I'm trying to do. I have no idea if it's possible? Am I wasting my time with this and should I turn to PHP with mysql and stop being lazy? Or is there a way to get this to work?
Yes it is possible without the php. Here is a simple example
SET @a:=0;
SELECT * FROM table WHERE field_name = @a;
Yes you can do it using straightforward SQL:
UPDATE my_table
SET filename = CONCAT('http://pathto/newfilename_', user_id, '.jpg')
WHERE filename LIKE '%\_%jpg'
AND filename NOT LIKE 'http%';
Notes:
No need for variables. Any columns of rows being updated may be referenced
In mysql, use CONCAT() to add text values together
With LIKE, an underscore (_) has a special meaning - it means "any single character". If you want to match a literal underscore, you must escape it with a backslash (\)
Your two LIKE predicates may be safely merged into one for a simpler query
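Here is a small runnable check of the points above, written in Python against SQLite rather than MySQL, so two details differ and are assumptions of this sketch: SQLite concatenates with || instead of CONCAT(), and it needs an explicit ESCAPE '\' clause before the backslash escapes the underscore. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (user_id INTEGER, filename TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        (2, "2_gfhgfhf.jpg"),                    # should be renamed
        (3, "http://pathto/newfilename_3.jpg"),  # already correct
        (4, "plain.jpg"),                        # no underscore
        (5, "5_abc.png"),                        # not a jpg
    ],
)

# The row's own user_id column is referenced directly; no variable is needed.
conn.execute(r"""
    UPDATE my_table
    SET filename = 'http://pathto/newfilename_' || user_id || '.jpg'
    WHERE filename LIKE '%\_%jpg' ESCAPE '\'
      AND filename NOT LIKE 'http%'
""")

result = dict(conn.execute("SELECT user_id, filename FROM my_table"))
print(result)
```

Without the escape, plain.jpg would also match, because a bare underscore stands for any single character.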

Escape % character in MySQL

I'm trying to save a string which contains, among other characters, %. The problem is that while other characters such as ' / or " seem to work just fine, I just can't figure out how to escape %. I tried many things, but mysql_real_escape_string and the other things I tried just don't work.
At the moment, I'm replacing % with 'percent' (it didn't work with backslash + %), saving that to the database, and then replacing it back before echoing, but as you may notice, it's not really optimal. Please let me know if there's a better way to do it.
Don't try to put variable values in-line with the query. Use placeholders and value binding.
In PHP, you need to add a backslash to the percent character before inserting.
For MySQL it's:
'10\% off' //you need to escape % character
MySQL will not change % character while doing an insert/update. If you are having problems with it, it must be some other layer in your setup which is doing the conversion.
create table test ( a varchar(10));
insert into test values ('abc%def');
select * from test;
+---------+
| a |
+---------+
| abc%def |
+---------+
1 row in set (0.00 sec)
Apparently, the % can be escaped by making it double, so a quite simple str_replace did it for me.
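To illustrate both points (value binding leaves % alone, and doubling is a feature of printf-style formatting rather than of the database), here is a minimal Python sketch. It uses SQLite in place of MySQL, and the guess that the asker's query passed through a printf-style layer is an assumption, not something stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a VARCHAR(10))")

# Placeholder plus bound value: the % is stored exactly as given.
conn.execute("INSERT INTO test VALUES (?)", ("abc%def",))
stored = conn.execute("SELECT a FROM test").fetchone()[0]

# In printf-style formatting, % starts a format specifier, so a literal
# percent sign has to be written as %%.
formatted = "%d%% off" % 10

print(stored, "|", formatted)
```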