Escape single backslash in MySQL JSON INSERT - mysql

I'm trying to encode a regex in a JSON data field in a MySQL database.
The regex is as follows: ^\d*[13579]$ and should look the same, if I try to read it afterwards.
AFAIK, for single backslash escaping in SQL I need double backslashes.
However, when I replace the single backslash with two, like this:
^\\d*[13579]$, I get an error stating:
Invalid JSON text: "Invalid escape character in string.", and my IDE also flags it as an error. When I use another set of two backslashes (four in total), the error disappears, but I also get two backslashes in the final string.
Any idea, what the problem might be?
Thanks!

The double-backslash is correct for JSON.
JSON has its own escape sequences, similar to the escape sequences in regular expressions. In JSON, \b means backspace, \n means newline, \t means tab, and so on. If you want to store a literal backslash character, use \\. Otherwise the backslash must be followed by one of the recognized escape sequences.
If you store a literal backslash character in a JSON value, it must be a double backslash. If you extract that JSON value and "unquote" it, it will be returned as a single backslash, as you intended. Note that when you write the JSON inside a SQL string literal, MySQL's string parser consumes one level of backslashes as well, so each JSON-level \\ has to be typed as \\\\ in the SQL statement, which is why the demo below uses four backslashes.
Demo:
mysql> create table t ( j json );
mysql> insert into t set j = '["^\\\\d*[13579]$"]';
mysql> select j from t;
+-------------------+
| j                 |
+-------------------+
| ["^\\d*[13579]$"] |
+-------------------+
mysql> select j->>'$[0]' from t;
+--------------+
| j->>'$[0]'   |
+--------------+
| ^\d*[13579]$ |
+--------------+
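As a hedged aside (not part of the answer above), you can avoid counting JSON-level backslashes entirely by letting MySQL build the document with JSON_ARRAY or JSON_OBJECT; then only the usual SQL string-literal doubling remains:
-- Sketch: JSON_ARRAY escapes the backslash itself when it builds the JSON,
-- so the SQL literal only needs \\ (which MySQL parses down to a single \).
insert into t set j = JSON_ARRAY('^\\d*[13579]$');
-- stored as ["^\\d*[13579]$"]; j->>'$[0]' again returns ^\d*[13579]$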

Related

MySQL- Insert single escape character into MySQL JSON field

In dealing with the headache of the different rulesets with TEXT escaping and JSON escaping, I've come across the issue where double escaping is required to convert a string to a JSON literal. For example, the original UPDATE looks like this:
UPDATE sourcing_item_data SET data_JSON='{"test": "test \ test"}' WHERE ID = 1;
The above simply removes the '\'.
The problem is I can't see how we get a single backslash into the system. Using two \'s causes the Invalid JSON error. Using three \'s does the same. Using four \'s puts in two \'s.
How does one get a single backslash into a JSON literal from a string with MySQL?
Also, has anyone written a SP or Function that scans a string that's supposed to be converted to MySQL JSON to ensure the string is "scrubbed" for issues (such as this one)?
Thanks!
Four backslashes work.
UPDATE sourcing_item_data SET data_JSON='{"test": "test \\\\ test"}' WHERE ID = 1;
You need to double the backslash to escape it in JSON, and then double each of those again to escape them in the SQL string literal.
If you print the JSON value it will show up as two backslashes, but that's because it shows the value in JSON format, which means that the backslash has to be escaped. If you extract the value and unquote it, there will just be one backslash.
select data_JSON->>"$.test" as result
from sourcing_item_data
WHERE id = 1;
shows test \ test
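To generalize the rule (a sketch, reusing the question's table and column names): each literal backslash in the final, unquoted value costs four backslashes in the SQL statement, so two adjacent literal backslashes would need eight:
-- Sketch: SQL parsing halves the backslashes, JSON unquoting halves them again.
UPDATE sourcing_item_data
SET data_JSON = '{"test": "test \\\\\\\\ test"}'  -- unquotes to: test \\ test
WHERE ID = 1;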

Postgres row_to_json produces invalid JSON with double escaped quotes

Postgres escapes quotes incorrectly when creating a JSON export. Note the double quotes in the below update...
UPDATE models SET column='"hello"' WHERE id=1;
COPY (SELECT row_to_json(models)
FROM (SELECT column FROM shaders WHERE id=1) shaders)
TO '/output.json';
The contents of output.json:
{"column":"\\"hello\\""}
You can see that the quotes are escaped improperly and it creates invalid JSON.
It should be:
{"column":"\"hello\""}
How can I fix this Postgres bug or work around it?
This is not JSON-related. It's about the way the text format (the default) of the COPY command handles backslashes. From the PostgreSQL documentation on COPY:
Backslash characters (\) can be used in the COPY data to quote data characters that might otherwise be taken as row or column delimiters. In particular, the following characters must be preceded by a backslash if they appear as part of a column value: backslash itself, newline, carriage return, and the current delimiter character.
(Emphasis mine.)
You can solve it by using the CSV format and changing the quote character from a double quote to something else.
To demonstrate:
SELECT row_to_json(row('"hello"'))
| "{"f1":"\"hello\""}" |
COPY (SELECT row_to_json(row('"hello"'))) TO '/output.json';
| {"f1":"\\"hello\\""} |
COPY (SELECT row_to_json(row('"hello"'))) TO '/output.json' CSV QUOTE '$';
| {"f1":"\"hello\""} |
The answer by Simo Kivistö works if you are certain that the character $, or whatever the special quote character you chose does not appear in your strings. In my case, I had to export a very large table and there was no particular character which didn't appear in the strings.
To work around this issue, I piped the output of the COPY command to sed to revert the double escaping of quotes:
psql -c "COPY (SELECT row_to_json(t) from my_table as t) to STDOUT;" |
sed 's/\\"/\"/g' > my_table.json
The sed expression I am piping to simply replaces occurrences of \\" with \".
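If shelling out to sed is not an option, a hedged variation of the CSV trick above (a sketch, reusing the hypothetical my_table export) is to pick a quote character and delimiter that cannot plausibly occur in the data, such as control characters, so COPY never needs to quote or escape anything:
-- Sketch: with control characters as QUOTE and DELIMITER, ordinary JSON text
-- never collides with them, and COPY writes the row_to_json output verbatim.
COPY (SELECT row_to_json(t) FROM my_table AS t)
TO '/my_table.json' WITH (FORMAT csv, QUOTE E'\x01', DELIMITER E'\x02');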

Select special characters in MySQL

I need to select from fields that can contain special characters, for example:
+--------------+
| code         |
+--------------+
| **4058947"_\ |
| **4123/"_\   |
| sew'-8947"_\ |
+--------------+
I tried this:
select code from table where code REGEXP '[(|**4058947"_\|)]';
select code from table where code REGEXP '[(**4058947"_\)]';
select code from table where code REGEXP '^[(**4058947"_\)]';
but those queries return all rows, and this query returns an empty result:
select code from table where code REGEXP '^[(**4058947"_\)]$';
I need it to return only the first row (or whichever one I specify).
To select only one row, you could just do this if it doesn't matter which one.
SELECT code FROM table LIMIT 1
If it does matter, drop the regex.
SELECT code FROM table WHERE code = "**4058947\"_\\"
To match those special characters (in this case, " and \), you need to "escape" them. (That's what it's called; I didn't make that up.) In most mainstream languages this is done by putting a backslash in front of the character, and MySQL does it this way too. The backslash is the escape character, and a backslash with another character behind it is called an escape sequence. As you can see, I escaped the quote and the backslash in the code value I want to match, so it should work now.
If you need to keep the regexes (which I hope is not the case, since you have the literal string you want to match against), the same thing applies: escape quotes and backslashes and you'll be fine, provided you drop the parentheses and brackets. Note that in a regex you need to escape far more characters, because some characters (for example |, [ ], ( ), *, +) have a special function in a regex. That is very handy, but becomes a bit of a problem when you need to match a string containing one of those characters. In that case you need to escape it, but with a double backslash!
This is because MySQL first parses the pattern as a string literal and consumes one level of backslashes there (dropping the backslash from any escape sequence it doesn't recognize). Only then is the result parsed as a regex, with the double backslashes reduced to single backslashes. This gets ugly very quickly, since it means matching a literal backslash with a MySQL regex requires 4 backslashes: two in the regex, doubled again because MySQL parses the pattern as a string first.
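For completeness, here is a hedged sketch of what such a regex could look like for the first value (assuming the table really is named table, hence the backticks):
-- Sketch: * is escaped as \\* for the regex (the SQL parser halves it), and the
-- trailing literal backslash needs four backslashes in the SQL string in total.
SELECT code
FROM `table`
WHERE code REGEXP '^\\*\\*4058947"_\\\\$';
-- matches only the row containing **4058947"_\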

Why is using DBI's variable-binding causing a MySQL query to fail?

For some reason, using DBI's bind parameter feature for the below AES key is causing a query to fail to find any rows.
use strict;
use warnings;
use DBI;
my $dbh = DBI->connect('dbi:mysql:database=thedb;host=localhost');
my $aes_key = 'X`ku#wC_BI\SgY[S%/<iaB>&VXd5zDA+';
print length($aes_key), "\n";
my $test = $dbh->selectrow_hashref("SELECT COUNT(*) FROM users WHERE id = ?\
AND AES_DECRYPT(enc_pass, '$aes_key') IS NOT NULL", undef, 1);
print $test->{'COUNT(*)'}, "\n";
$test = $dbh->selectrow_hashref("SELECT COUNT(*) FROM users WHERE id = ?\
AND AES_DECRYPT(enc_pass, ?) IS NOT NULL", undef, 1, $aes_key);
print $test->{'COUNT(*)'}, "\n";
Output:
32
1
0
I see there's an escaped "S" in $aes_key, but it doesn't appear to have any impact on the variable since \S isn't a valid escape sequence in Perl. I do suspect that or something similar is the problem, though.
When you bind a variable to a placeholder, MySQL uses exactly what is in the Perl variable. When you interpolate a variable into the SQL text, its contents become part of a string literal, so MySQL's string-escape rules are applied to it.
MySQL handles unknown backslash escapes by removing the backslash. As MySQL string literals, '\S' and 'S' are equivalent. When you use placeholders, '\S' in a Perl variable is equivalent to '\\S' as a MySQL string literal.
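A tiny sketch of that rule in isolation (not from the original answer):
-- In a MySQL string literal, the unrecognized escape \S collapses to plain S.
SELECT '\S' = 'S' AS same, LENGTH('\S') AS len;  -- same = 1, len = 1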
It appears that you wound up encrypting the data with the backslash-stripped version of the key by passing it as a string literal, so the rows can't be found now that you pass the intended key via a placeholder. I'll bet that if you change the line where you initialize $aes_key to
my $aes_key = 'X`ku#wC_BISgY[S%/<iaB>&VXd5zDA+'; # note missing backslash
then the results would change to
31
1
1
because that's the key that MySQL has actually been using.
I see there's an escaped "S" in $aes_key, but it doesn't appear to have any impact on the variable since \S isn't a valid escape sequence in Perl. I do suspect that or something similar is the problem, though.
It is and it isn't. With DBI placeholders, there's no interpretation of escape sequences at all.
The problem is that mysql has been interpreting your escape sequences when you pasted the key into SQL:
mysql> select 'X`ku#wC_BI\SgY[S%/<iaB>&VXd5zDA+', length('X`ku#wC_BI\SgY[S%/<iaB>&VXd5zDA+') as len;
+---------------------------------+-----+
| X`ku#wC_BISgY[S%/<iaB>&VXd5zDA+ | len |
+---------------------------------+-----+
| X`ku#wC_BISgY[S%/<iaB>&VXd5zDA+ | 31 |
+---------------------------------+-----+
1 row in set (0.00 sec)
See? No backslash (if you put a backslash before a character that has no special meaning, you just get the same character, but without the backslash). So when you pass the key into mysql correctly it doesn't work, because you were using it incorrectly before.

Remove Quotes and Commas from a String in MySQL

I'm importing some data from a CSV file, and numbers larger than 1000 come through as quoted, comma-formatted values like "1,100".
What's a good way to remove both the quotes and the comma from this so I can put it into an int field?
Edit:
The data is actually already in a MySQL table, so I need to be able to this using SQL. Sorry for the mixup.
My guess is that because the data imported successfully, the field is actually a varchar or some other character type; importing into a numeric field would probably have failed. Here is a test case I ran as a pure MySQL, SQL-only solution.
The table is just a single column (alpha) that is a varchar.
mysql> desc t;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| alpha | varchar(15) | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
Add a record
mysql> insert into t values('"1,000,000"');
Query OK, 1 row affected (0.00 sec)
mysql> select * from t;
+-------------+
| alpha       |
+-------------+
| "1,000,000" |
+-------------+
Update statement.
mysql> update t set alpha = replace( replace(alpha, ',', ''), '"', '' );
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from t;
+---------+
| alpha   |
+---------+
| 1000000 |
+---------+
So in the end the statement I used was:
UPDATE table
SET field_name = replace( replace(field_name, ',', ''), '"', '' );
I looked at the MySQL documentation and it didn't look like I could do a regular-expression find-and-replace. Although you could, like Eldila, use a regular expression to find the rows and then an alternative solution to replace the values.
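(A hedged aside for readers on newer servers: MySQL 8.0 later added REGEXP_REPLACE, which was not available when this was written, so the whole cleanup can be done in one pass.)
-- Sketch, MySQL 8.0+ only: strip everything that is not a digit.
UPDATE t SET alpha = REGEXP_REPLACE(alpha, '[^0-9]', '');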
Also be careful with s/"(\d+),(\d+)"/$1$2/ because what if the number has more than just a single comma, for instance "1,000,000"? You're going to want to do a global replace (in Perl that is s///g). But even with a global replace, matching resumes after the previous replacement (unless Perl is different), so it would miss every other comma-separated group. A possible solution would be to make the first (\d+) optional, like so: s/(\d+)?,(\d+)/$1$2/g, and in this case I would need a second find and replace to strip the quotes.
Here are some Ruby examples of the regular expressions acting on just the string "1,000,000". Notice there are no double quotes inside the string; this is just a string of the number itself.
>> "1,000,000".sub( /(\d+),(\d+)/, '\1\2' )
# => "1000,000"
>> "1,000,000".gsub( /(\d+),(\d+)/, '\1\2' )
# => "1000,000"
>> "1,000,000".gsub( /(\d+)?,(\d+)/, '\1\2' )
# => "1000000"
>> "1,000,000".gsub( /[,"]/, '' )
# => "1000000"
>> "1,000,000".gsub( /[^0-9]/, '' )
# => "1000000"
Here is a good case for regular expressions. You can run a find and replace on the data either before you import (easier) or later on if the SQL import accepted those characters (not nearly as easy). But in either case, you have any number of methods to do a find and replace, be it editors, scripting languages, GUI programs, etc. Remember that you're going to want to find and replace all of the bad characters.
A typical regular expression to find the comma and quotes (assuming just double quotes) is: (Blacklist)
/[,"]/
Or, if you find something might change in the future, this regular expression, matches anything except a number or decimal point. (Whitelist)
/[^0-9\.]/
What has been discussed by the people above is that we don't know all of the data in your CSV file. It sounds like you want to remove the commas and quotes from all of the numbers in the CSV file. But because we don't know what else is in the CSV file we want to make sure that we don't corrupt other data. Just blindly doing a find/replace could affect other portions of the file.
You could use this Perl command:
perl -lne 's/[,"]//g; print' file.txt > newfile.txt
You may need to play around with it a bit, but it should do the trick.
Here's the PHP way:
$stripped = str_replace(array(',', '"'), '', $value);
Actually nlucaroni, your case isn't quite right. Your example doesn't include double-quotes, so
id,age,name,...
1,23,phil,
won't match my regex. It requires the format "XXX,XXX". I can't think of an example of when it will match incorrectly.
None of the following examples will include the delimiter in the match:
"111,111",234
234,"111,111"
"111,111","111,111"
Please let me know if you can think of a counter-example.
Cheers!
The solution to the changed question is basically the same.
You will have to run a select query with a regex WHERE clause.
Something like:
Select *
FROM SOMETABLE
WHERE SOMEFIELD REGEXP '"([0-9]+),([0-9]+)"'
For each of these rows, you want to do the regex substitution s/"(\d+),(\d+)"/$1$2/ and then update the field with the new value.
Please take Joseph Pecoraro's advice seriously and have a backup before doing mass changes to any files or databases, because whenever you do regex replacements you can seriously mess up data if there are cases you have missed.
My command does remove all ',' and '"'.
In order to convert the string "1,000" more strictly, you will need the following command.
perl -lne 's/"(\d+),(\d+)"/$1$2/g; print' file.txt > newfile.txt
Daniel's and Eldila's answers have one problem: they remove all quotes and commas in the whole file.
What I usually do when I have to do something like this is to first replace all separating quotes and (usually) semicolons with tabs.
Search: ";"
Replace: \t
Since I know in which column my affected values will be I then do another search and replace:
Search: ^([0-9]+)\t([0-9]+)\t([0-9]+),([0-9]+)\t
Replace: \1\t\2\t\3\4\t
... given the value with the comma is in the third column.
You need to start with an "^" to make sure that it starts at the beginning of a line. Then you repeat ([0-9]+)\t as often as there are columns that you just want to leave as they are.
([0-9]+),([0-9]+) searches for values where there is a number, then a comma and then another number.
In the replace string we use \1 and \2 to just keep the values from the edited line, separating them with \t (tab). Then we put \3\4 (no tab between) to put the two components of the number without the comma right after each other. All values after that will be left alone.
If you need your file to have semicolon to separate the elements, you then can go on and replace the tabs with semicolons. However then - if you leave out the quotes - you'll have to make sure that the text values do not contain any semicolons themselves. That's why I prefer to use TAB as column separator.
I usually do that in an ordinary text editor (EditPlus) that supports RegExp, but the same regexps can be used in any programming language.