I'm working on a SQL procedure where I came across a regular expression that contains this piece:
REGEXP '\\s*count\\s*\\('
I don't understand why there are two backslashes. What can be inferred from this code?
Backslash is processed at multiple levels.
It's the escape prefix in regular expressions: it makes special characters like . and ( be treated literally, and it is used to create escape sequences like \s (which means any whitespace character).
But it's also the escape prefix in string literals, used for things like \n (newline) and \b (backspace). So for the regular expression engine to receive a literal backslash, you need to escape that backslash in the string literal: the SQL literal '\\s' becomes \s before the regex engine ever sees it.
This is common in many programming languages, although a few have "raw string literals" where escape sequences are not processed, specifically to avoid having to double the backslashes so much.
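The same two layers can be observed in Python, which makes it easy to check what the regex engine actually receives. A minimal sketch; Python is used only for illustration, and the sample text "SELECT count(*) FROM t" is made up:

```python
import re

# In an ordinary string literal "\\" is ONE backslash, so the regex engine
# receives \s*count\s*\( exactly as in the SQL procedure.
escaped = "\\s*count\\s*\\("

# A raw string literal skips string-level escape processing,
# so single backslashes reach the regex engine unchanged.
raw = r"\s*count\s*\("

print(escaped == raw)                                    # True
print(re.search(raw, "SELECT count(*) FROM t") is not None)  # True
```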
Related
I have a valid regex
(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]#!\$&'\(\)\*\+,;=.]+
It can be found and validated at
https://www.regextester.com/94502
Now I am trying to create a JSON document in which the above expression is used as a value.
{
"regex": "^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]#!\$&'\(\)\*\+,;=.]+$"
}
This can be verified at
https://jsonlint.com/
It turns out to be invalid JSON. What is wrong with the above JSON?
The quoted string on the right of "regex": contains the character sequences
\. \w \] \+ \( \)
and none of these is a valid escape in a JSON string. See http://json.org/ for a brief and "visual" explanation of the grammar.
To represent the given regex as a valid JSON string, each backslash has to be doubled (i.e. replace \ by \\, much like in other languages like PHP, C/C++ etc.), so the relevant line should become something like
"regex": "^(?:http(s)?:\\/\\/)?[\\w.-]+ ...
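Rather than doubling backslashes by hand, one can let a JSON serializer do it. A minimal sketch using Python's standard json module, assuming the regex is held in a Python raw string so its backslashes are kept exactly as in the question:

```python
import json

# Raw string: every backslash is kept exactly as written in the question.
regex = r"^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]#!\$&'\(\)\*\+,;=.]+$"

# Pasting the regex between quotes by hand gives invalid JSON,
# because \w, \., \] ... are not JSON escape sequences.
try:
    json.loads('{"regex": "' + regex + '"}')
except json.JSONDecodeError as err:
    print("invalid:", err.msg)

# json.dumps doubles every backslash, producing a valid document.
document = json.dumps({"regex": regex})
print(document)

# Parsing it back restores the original single-backslash regex.
print(json.loads(document)["regex"] == regex)  # True
```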
The special characters need to be escaped according to http://www.json.com. You don't need to escape the single quote ', but you can't use it to delimit strings either: JSON strings must be enclosed in double quotes. For new readers: always use double quotes, and you'll avoid many headaches.
\b Backspace (ASCII code 08)
\f Form feed (ASCII code 0C)
\n Newline
\r Carriage return
\t Tab
\" Double quote
\\ Backslash character
\/ Forward slash (escaping it is optional)
\uXXXX Unicode character given by four hex digits
Todo:
Change single quote to double quote.
Use the escape sequences given above for special characters.
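The escape list above can be checked against a real JSON encoder. This sketch uses Python's json module as an illustration; the sample text is invented:

```python
import json

# Contains a tab, double quotes, a backslash, a newline and a single quote.
text = 'tab\there "quoted" back\\slash\nit\'s'

encoded = json.dumps(text)
print(encoded)  # "tab\there \"quoted\" back\\slash\nit's"

# The single quote is left alone, but it cannot delimit a JSON string:
try:
    json.loads("{'name': 'value'}")
except json.JSONDecodeError as err:
    print("rejected:", err.msg)
```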
When I create a string containing backslashes, they get duplicated:
>>> my_string = "why\does\it\happen?"
>>> my_string
'why\\does\\it\\happen?'
Why?
What you are seeing is the representation of my_string created by its __repr__() method. If you print it, you can see that you've actually got single backslashes, just as you intended:
>>> print(my_string)
why\does\it\happen?
The string below has three characters in it, not four:
>>> 'a\\b'
'a\\b'
>>> len('a\\b')
3
You can get the standard representation of a string (or any other object) with the repr() built-in function:
>>> print(repr(my_string))
'why\\does\\it\\happen?'
Python represents backslashes in strings as \\ because the backslash is an escape character - for instance, \n represents a newline, and \t represents a tab.
This can sometimes get you into trouble:
>>> print("this\text\is\not\what\it\seems")
this ext\is
ot\what\it\seems
Because of this, there needs to be a way to tell Python you really want the two characters \n rather than a newline, and you do that by escaping the backslash itself, with another one:
>>> print("this\\text\is\what\you\\need")
this\text\is\what\you\need
When Python returns the representation of a string, it plays safe, escaping all backslashes (even if they wouldn't otherwise be part of an escape sequence), and that's what you're seeing. However, the string itself contains only single backslashes.
More information about Python's string literals can be found at: String and Bytes literals in the Python documentation.
As Zero Piraeus's answer explains, using single backslashes like this (outside of raw string literals) is a bad idea.
But there's an additional problem: in the future, it will be an error to use an undefined escape sequence like \d, instead of meaning a literal backslash followed by a d. So, instead of just getting lucky that your string happened to use \d instead of \t so it did what you probably wanted, it will definitely not do what you want.
As of 3.6, it already raises a DeprecationWarning, although most people don't see those. Since 3.12 it raises a more visible SyntaxWarning instead, and it will become a SyntaxError in some future version.
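One way to see the warning mentioned above is to compile a snippet at run time with warnings forced on. A minimal sketch; the category you get depends on the Python version (DeprecationWarning up to 3.11, SyntaxWarning from 3.12):

```python
import warnings

source = '"\\d"'  # the Python source text "\d": an unrecognized escape sequence

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")          # make sure the warning is recorded
    compile(source, "<example>", "eval")

# Which warning category(ies) the compiler emitted on this interpreter.
print([w.category.__name__ for w in caught])

# The two unambiguous spellings of "backslash followed by d":
print(len("\\d"), len(r"\d"))  # 2 2
```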
In many other languages, including C, using a backslash that doesn't start an escape sequence means the backslash is ignored.
In a few languages, including Python, a backslash that doesn't start an escape sequence is a literal backslash.
In some languages, to avoid confusion about whether the language is C-like or Python-like, and to avoid the problem with \Foo working but \foo not working, a backslash that doesn't start an escape sequence is illegal.
I want to save mediumtext data in my table. Here's my code:
concat('{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Arial;}}\viewkind4\uc1\pard\fs18','1','\par }')
It's supposed to be RTF, but when I run it, this is what happens:
{
tf1ansiansicpg1252deff0deflang1033{fonttbl{f0fnilfcharset0 Arial;}}viewkind4uc1pardfs181par }
it should be like this:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Arial;}}\viewkind4\uc1\pard\fs181\par }
The backslashes disappear. Does anyone know how to keep them?
The backslash (\) is used as an escape character: it signals that the following character should be handled in a special way. \r, for instance, is read as a carriage return, which explains the line break at the beginning of your result. The other backslashes are dropped because the characters after them have no special meaning, and in that case MySQL simply discards the backslash.
Use a double backslash (\\) wherever you want a literal backslash; each pair produces a single backslash in your output. It works this way because the first backslash escapes the second, telling MySQL to treat it as a literal backslash.
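To make these rules concrete, here is a rough Python model of how MySQL reads the text between the quotes of a string literal. It is a hypothetical helper (mysql_unescape is not a real API), assumes the default sql_mode (NO_BACKSLASH_ESCAPES off), follows the escape table in the MySQL manual, and ignores quote doubling:

```python
# Escape letter -> resulting character, following the MySQL escape table.
ESCAPES = {"0": "\0", "'": "'", '"': '"', "b": "\b", "n": "\n",
           "r": "\r", "t": "\t", "Z": "\x1a", "\\": "\\"}


def mysql_unescape(body):
    """Hypothetical model of how MySQL reads the characters between the quotes
    of a string literal (NO_BACKSLASH_ESCAPES off; '' doubling not handled)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in ESCAPES:
                out.append(ESCAPES[nxt])   # recognized escape sequence
            elif nxt in "%_":
                out.append("\\" + nxt)     # \% and \_ stay as two characters
            else:
                out.append(nxt)            # unknown escape: backslash dropped
            i += 2
        else:
            out.append(ch)                 # ordinary character (or trailing \)
            i += 1
    return "".join(out)


single = r"{\rtf1\ansi\deff0"      # backslashes as written in the question
doubled = r"{\\rtf1\\ansi\\deff0"  # backslashes doubled, as in the answers

print(repr(mysql_unescape(single)))  # '{\rtf1ansideff0'
print(mysql_unescape(doubled))       # {\rtf1\ansi\deff0
```

Running it on a shortened version of the RTF header reproduces the symptom from the question: \r turns into a carriage return and the other backslashes vanish, while the doubled version comes through intact.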
Similar to what Jonathan said, I have this solution:
CONCAT('{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033{\\fonttbl{\\f0\\fnil\\fcharset0 Arial;}}\\viewkind4\\uc1\\pard\\fs18','1','\\par }')
Use this as the string to insert.
When I read some text from an XML file and insert it into the database, I get an error if the text contains an apostrophe ('). How can I overcome this problem when inserting into the DB?
Wherever there is an apostrophe, add a backslash (\) before it. Doing a find-and-replace-all of ' with \' in your text editor should work. BE CAREFUL not to mess up the XML structure.
For example,
John's
Needs to be
John\'s
You can also use PHP or C# to do the escaping for you.
Here is the PHP function.
You should always escape your input data with whatever language-specific methodology you have for doing so. For MySQL the escape character is \. Another alternative is to use prepared statements with parameterized inputs. This would eliminate the need to escape the single apostrophe.
My guess is that you also have a significant SQL injection vulnerability with the way you are doing things. If you are not even escaping your input values or using parametrized prepared statements, then one could easily inject malicious code into the XML.
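With parameterized inputs, the apostrophe problem disappears. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for a MySQL driver (the table and value are invented); MySQL drivers for Python typically use %s as the placeholder instead of ?:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")

name_from_xml = "John's"  # hypothetical value read from the XML file

# The ? placeholder sends the value separately from the SQL text,
# so the apostrophe needs no escaping and cannot break the statement.
conn.execute("INSERT INTO people (name) VALUES (?)", (name_from_xml,))

row = conn.execute("SELECT name FROM people WHERE name = ?",
                   (name_from_xml,)).fetchone()
print(row[0])  # John's
```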
I don't know what language or method you're using for your import. Typically the escape character is the backslash, \, so you would need to replace each ' with \'. Make sense?
From 9.1.1 String Literals in the MySQL reference:
There are several ways to include quote characters within a string:
A “'” inside a string quoted with “'” may be written as “''”.
A “"” inside a string quoted with “"” may be written as “""”.
Precede the quote character by an escape character (“\”).
A “'” inside a string quoted with “"” needs no special treatment and need not be doubled or escaped. In the same way, “"” inside a string quoted with “'” needs no special treatment.
Thus, the given string foo'bar can be written in MySQL (see note 1):
'foo''bar'
'foo\'bar'
"foo'bar"
While the above addresses the primary question, use placeholders (aka prepared statements) - look up the correct method per language/adapter. Placeholders eliminate the need to quote (which prevents mistakes introduced by custom logic) and prevent against many cases of malicious SQL injection.
1 The syntax chosen by MySQL differs from other common SQL implementations; this syntax is not universal.
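The footnote's point about portability is easy to demonstrate: doubling the apostrophe is standard SQL, while the backslash form is a MySQL extension. A sketch using Python's sqlite3 (chosen only because it ships with Python and does not support backslash escapes); quote_literal is a hypothetical helper, and placeholders remain the better choice in real code:

```python
import sqlite3


def quote_literal(value):
    """Hypothetical helper: standard-SQL quoting by doubling apostrophes."""
    return "'" + value.replace("'", "''") + "'"


conn = sqlite3.connect(":memory:")

literal = quote_literal("foo'bar")
print(literal)                                          # 'foo''bar'
print(conn.execute("SELECT " + literal).fetchone()[0])  # foo'bar

# SQLite has no backslash escapes, so the MySQL-only form fails to parse.
try:
    conn.execute("SELECT 'foo\\'bar'")
except sqlite3.Error as err:
    print("rejected:", err)
```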
I have two databases. One has apostrophes in names like O'Bannon and the other does not. I need to merge them and find the duplicates. Since it's harder to add the apostrophes, I'm trying to remove them instead.
But this...
UPDATE Client
SET Last_Name = REPLACE(''','')
This clearly won't work. How does one escape the '?
I'm using Xojo (not PHP)
Like you say, you'll want to escape quote characters.
See this documentation on string literals:
There are several ways to include quote characters within a string:
A “'” inside a string quoted with “'” may be written as “''”.
A “"” inside a string quoted with “"” may be written as “""”.
Precede the quote character by an escape character (“\”).
A “'” inside a string quoted with “"” needs no special treatment and need not be doubled or escaped. In the same way, “"” inside a string quoted with “'” needs no special treatment.
Depending on how you're dealing with the SQL, though, you may need to do more than that. If the application escapes the quote character and passes the result to a stored procedure call, you may run into the same issue again if you are not using parameter binding with prepared statements. This is because MySQL removes the escape character when it processes the SP's inputs; the now-unsanitized character then makes its way into the query the procedure constructs, and the problem repeats itself if it needs to be escaped there. In this case, you'll want to switch to parameter binding, so that escaping and query construction are out of your hands.
Here we go:
UPDATE Client SET Last_Name = REPLACE(Last_Name, '\'', '');
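For trying the fixed statement outside MySQL, here is a sketch against an in-memory SQLite table (an assumption for illustration; the two sample rows are invented). SQLite has no backslash escapes, so it uses the doubled-apostrophe spelling, and a parameter-bound variant is shown as well:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Client (Last_Name TEXT)")
conn.executemany("INSERT INTO Client VALUES (?)", [("O'Bannon",), ("Smith",)])

# Standard-SQL spelling: '''' is a one-character string holding an apostrophe.
# (MySQL also accepts '\'', SQLite does not.)
conn.execute("UPDATE Client SET Last_Name = REPLACE(Last_Name, '''', '')")

# Alternative: bind the apostrophe as a parameter, no escaping needed.
conn.execute("UPDATE Client SET Last_Name = REPLACE(Last_Name, ?, '')", ("'",))

names = [r[0] for r in conn.execute("SELECT Last_Name FROM Client ORDER BY Last_Name")]
print(names)  # ['OBannon', 'Smith']
```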
You just need to escape the apostrophe with a backslash.
Simply add an escape character (\) in front of the quote:
SET Last_Name = REPLACE(Last_Name, '\'', '')
Still, I don't think this is the right way to go, as you will lose the person's original name, and o'reily and oreily will look like the same surname to you.
From 9.1.1 String Literals
Table 9.1. Special Character Escape Sequences
Escape Sequence Character Represented by Sequence
\0 An ASCII NUL (0x00) character.
\' A single quote (“'”) character.
\" A double quote (“"”) character.
\b A backspace character.
\n A newline (linefeed) character.
\r A carriage return character.
\t A tab character.
\Z ASCII 26 (Control+Z). See note following the table.
\\ A backslash (“\”) character.
\% A “%” character. See note following the table.
\_ A “_” character. See note following the table.
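Read in the other direction, the table tells you how to escape text before placing it inside a quoted MySQL literal. A minimal sketch of that mapping; mysql_escape is a hypothetical helper, \% and \_ are left out because they only matter in LIKE patterns, and it assumes NO_BACKSLASH_ESCAPES is off. Prepared statements are still preferable to hand-escaping:

```python
# Character -> escape sequence, taken from the table above.
ESCAPE_FOR = {"\0": "\\0", "'": "\\'", '"': '\\"', "\b": "\\b", "\n": "\\n",
              "\r": "\\r", "\t": "\\t", "\x1a": "\\Z", "\\": "\\\\"}


def mysql_escape(text):
    """Hypothetical helper: escape text for use inside a quoted MySQL literal
    (default sql_mode, NO_BACKSLASH_ESCAPES off)."""
    return "".join(ESCAPE_FOR.get(ch, ch) for ch in text)


print(mysql_escape("O'Bannon"))        # O\'Bannon
print(mysql_escape("C:\\temp\nnext"))  # C:\\temp\nnext
```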
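Of course, if the ANSI_QUOTES SQL mode (which is also part of ANSI mode) is not enabled, you could delimit the string with double quotes instead: REPLACE(Last_Name, "'", '').
In case you are just looking to select, i.e., to match a field against data containing an apostrophe: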
SELECT PhraseId FROM Phrase WHERE Text = REPLACE("don't", "\'", "''")