Chrome console backslash with numbers - google-chrome

Why is the Chrome console converting numbers preceded by a backslash to Unicode escapes, or in some cases to special characters like "\b", "\t", "\f", etc.? In the case of \8 and \9 it even returns just the numbers themselves. What is the logic behind this?
You can test it by opening console in chrome and just typing strings.
Here are some examples:
"\0" - outputs "\u0000"
"\1" - outputs "\u0001"
"\2" - outputs "\u0002"
"\3" - outputs "\u0003"
"\4" - outputs "\u0004"
"\5" - outputs "\u0005"
"\6" - outputs "\u0006"
"\7" - outputs "\u0007"
"\8" - outputs "8" ???
"\9" - outputs "9" ???
"\10" - outputs "\b"
"\11" - outputs "\t"
"\12" - outputs "\n"
"\13" - outputs "\u000b"
"\14" - outputs "\f"
"\15" - outputs "\r"
"\16" - outputs "\u000e"
"\17" - outputs "\u000f"
"\18" - outputs "\u00018"
"\19" - outputs "\u00019"
"\20" - outputs "\u0010"

This is probably because \8 and \9 are invalid octal escape sequences in JavaScript.
\[0-7], \[0-7][0-7], and \[0-7][0-7][0-7] are octal escape sequences, which is obvious for \10 (octal) == \u0008 (hex/unicode) == \b (backspace).
Note that \18 is not an invalid octal escape sequence either: \1 is consumed as the octal escape \u0001, and the trailing 8 (not an octal digit) is kept as a literal character, giving "\u0001" followed by "8". The same rule explains \8 and \9: since 8 and 9 are not octal digits, the escape consumes nothing and the digit itself survives.
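The rule can be sketched in Python; this is a hypothetical re-implementation of the sloppy-mode octal-escape behavior (ignoring the \377 upper bound), not Chrome's actual parser:

```python
def decode_escape(seq):
    """Decode the part of a string after a backslash the way a
    sloppy-mode JS engine would: consume up to three octal digits,
    then keep any remaining characters literally."""
    digits = ""
    rest = seq
    # 8 and 9 are not octal digits, so for "\8" the loop consumes
    # nothing and the character is kept as-is.
    while rest and rest[0] in "01234567" and len(digits) < 3:
        digits += rest[0]
        rest = rest[1:]
    if not digits:
        return rest  # invalid octal escape: literal character(s)
    return chr(int(digits, 8)) + rest

assert decode_escape("0") == "\u0000"
assert decode_escape("8") == "8"         # no octal digits consumed
assert decode_escape("10") == "\b"       # octal 10 == 8 == backspace
assert decode_escape("18") == "\u00018"  # \1, then literal "8"
assert decode_escape("20") == "\u0010"   # octal 20 == 16
```

The assertions mirror the console outputs listed above.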

Related

com.univocity.parsers.common.TextParsingException

Trying to read the below data from a CSV file results in a com.univocity.parsers.common.TextParsingException:
B1456741975-266,"","{""m"": {""difference"": 60}}","","","",2022-02-04T17:03:59.566Z
B1789753479-460,"","",",","","",2022-02-18T14:46:57.332Z
B1456741977-123,"","{""m"": {""difference"": 60}}","","","",2022-02-04T17:03:59.566Z
Here's the Pyspark (3.1.2) code used to read the data:
from pyspark.sql.dataframe import DataFrame
df = (spark.read.format("com.databricks.spark.csv")
      .option("inferSchema", "true")
      .option("header", "false")
      .option("multiline", "true")
      .option("quote", '"')
      .option("escape", '"')
      .option("delimiter", ",")
      .option("unescapedQuoteHandling", "RAISE_ERROR")
      .load('/mnt/source/analysis/error_in_csv.csv'))
This is the exception that I'm getting.
Caused by: com.univocity.parsers.common.TextParsingException: com.univocity.parsers.common.TextParsingException - Unescaped quote character '"' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.
Internal state when error was thrown: line=2, column=3, record=1, charIndex=165, headers=[B1456741975-266, , {"m": {"difference": 60}}, , , , 2022-02-04T17:03:59.566Z]
Parser Configuration: CsvParserSettings:
Auto configuration enabled=true
Auto-closing enabled=true
Autodetect column delimiter=false
Autodetect quotes=false
Column reordering enabled=true
Delimiters for detection=null
Empty value=
Escape unquoted values=false
Header extraction enabled=null
Headers=null
Ignore leading whitespaces=false
Ignore leading whitespaces in quotes=false
Ignore trailing whitespaces=false
Ignore trailing whitespaces in quotes=false
Input buffer size=1048576
Input reading on separate thread=false
Keep escape sequences=false
Keep quotes=false
Length of content displayed on error=1000
Line separator detection enabled=true
Maximum number of characters per column=-1
Maximum number of columns=20480
Normalize escaped line separators=true
Null value=
Number of records to read=all
Processor=none
Restricting data in exceptions=false
RowProcessor error handler=null
Selected fields=field selection: []
Skip bits as whitespace=true
Skip empty lines=true
Unescaped quote handling=RAISE_ERROR
Format configuration:
CsvFormat:
Comment character=#
Field delimiter=,
Line separator (normalized)=\n
Line separator sequence=\n
Quote character="
Quote escape character="
Quote escape escape character=null
Internal state when error was thrown: line=2, column=3, record=1, charIndex=165, headers=[B1456741975-266, , {"m": {"difference": 60}}, , , , 2022-02-04T17:03:59.566Z]
at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:402)
at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:623)
at org.apache.spark.sql.catalyst.csv.UnivocityParser$$anon$1.next(UnivocityParser.scala:389)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at scala.collection.TraversableOnce$FlattenOps$$anon$2.hasNext(TraversableOnce.scala:469)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:335)
... 33 more
Caused by: com.univocity.parsers.common.TextParsingException: Unescaped quote character '"' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.
Internal state when error was thrown: line=2, column=3, record=1, charIndex=165, headers=[B1456741975-266, , {"m": {"difference": 60}}, , , , 2022-02-04T17:03:59.566Z]
at com.univocity.parsers.csv.CsvParser.handleValueSkipping(CsvParser.java:241)
at com.univocity.parsers.csv.CsvParser.handleUnescapedQuote(CsvParser.java:319)
at com.univocity.parsers.csv.CsvParser.parseQuotedValue(CsvParser.java:393)
at com.univocity.parsers.csv.CsvParser.parseSingleDelimiterRecord(CsvParser.java:177)
at com.univocity.parsers.csv.CsvParser.parseRecord(CsvParser.java:109)
at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:581)
... 39 more
Can someone please advise? It looks like the quoted delimiter in the second line is causing this. Is there a way to avoid it without changing the source data itself?
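As a sanity check that the rows themselves are well-formed CSV in the doubled-quote ("") style, Python's stdlib csv module, which defaults to that convention, parses them cleanly. This is a sketch separate from the Spark code path:

```python
import csv
import io

# The three sample rows from the question, verbatim.
data = (
    'B1456741975-266,"","{""m"": {""difference"": 60}}","","","",2022-02-04T17:03:59.566Z\n'
    'B1789753479-460,"","",",","","",2022-02-18T14:46:57.332Z\n'
    'B1456741977-123,"","{""m"": {""difference"": 60}}","","","",2022-02-04T17:03:59.566Z\n'
)

rows = list(csv.reader(io.StringIO(data)))
# Every row yields 7 fields; the quoted "," in row 2 survives as its
# own field, and the doubled quotes collapse to single quotes.
```

Since the data itself parses as standard CSV, the fix likely lies in the parser configuration. One thing to experiment with is a more lenient value for unescapedQuoteHandling, such as STOP_AT_CLOSING_QUOTE (another constant of the same univocity enum, accepted by Spark 3.1+); that is a workaround to try, not a guaranteed fix.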

regex pattern starting with "=" and exceptions

I'm trying to write an expression which will be used with JSON files for a VS Code extension. My expression should start with "=\s*" and then select everything after the equals sign, except in the following cases:
TRUE or FALSE after the equality
starting with digit
starting with ' or "
I have tried many things, and I can make each case work separately, but when I try to put them all together it doesn't work.
Example of doc strings:
abc = test
abc = TRUE
abc = FALSE
abc = "test"
abc = 'test'
abc = 123
Out of these examples my regex should match only the very first one, and "test" can be anything.
What was the closest to the solution was this one /(=\s*)^(((?!TRUE|FALSE|[0-9]|\"|\').)*)$/gm
You can use
Find what: ^(.*?=)(?!\s*(?:TRUE|FALSE|[0-9"'])).*
Replace With: $1
Details:
^ - start of a line
(.*?=) - Group 1: any zero or more chars other than line break chars, as few as possible and then a = char
(?!\s*(?:TRUE|FALSE|[0-9"'])) - a negative lookahead that fails the match if, immediately to the right of the current location, there are
\s* - zero or more whitespace
(?:TRUE|FALSE|[0-9"']) - TRUE, FALSE, digit or " or '
.* - the rest of the line.
See the regex demo.
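The same find/replace can be exercised with Python's re module (VS Code's search uses a JavaScript-flavored engine, but these constructs behave identically here); only the first line, whose value is not TRUE/FALSE/digit/quote, gets its value stripped:

```python
import re

text = """abc = test
abc = TRUE
abc = FALSE
abc = "test"
abc = 'test'
abc = 123"""

# re.M makes ^ match at the start of every line, like the editor's
# per-line find/replace.
pattern = re.compile(r'^(.*?=)(?!\s*(?:TRUE|FALSE|[0-9"\'])).*', re.M)
result = pattern.sub(r'\1', text)

# Only "abc = test" matched, so only its value was replaced by group 1.
```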

LISP: how to properly encode a slash ("/") with cl-json?

I have code that uses the cl-json library to add a line, {"main" : "build/electron.js"} to a package.json file:
(let ((package-json-pathname (merge-pathnames *app-pathname* "package.json")))
  (let ((new-json
          (with-open-file (package-json package-json-pathname
                           :direction :input :if-does-not-exist :error)
            (let ((decoded-package (json:decode-json package-json)))
              (let ((main-entry (assoc :main decoded-package)))
                (if (null main-entry)
                    (push '(:main . "build/electron.js") decoded-package)
                    (setf (cdr main-entry) "build/electron.js"))
                decoded-package)))))
    (with-open-file (package-json package-json-pathname
                     :direction :output :if-exists :supersede)
      (json:encode-json new-json package-json))))
The code works, but the result has an escaped slash:
"main":"build\/electron.js"
I'm sure this is a simple thing, but no matter which inputs I try -- "//", "/", "#//" -- I still get the escaped slash.
How do I just get a normal slash in my output?
Also, I'm not sure if there's a trivial way to get pretty-printed output, or if I need to write a function for that; right now the entire package.json file is printed on a single line.
Special characters
The JSON spec indicates that "Any character may be escaped", but some of them MUST be escaped: "quotation mark, reverse solidus, and the control characters". The linked section is followed by a grammar that shows "solidus" (/) in the list of escapable characters. I don't think it is really important in practice (typically it need not be escaped), but that may explain why the library escapes this character.
How to avoid escaping
cl-json relies on an internal list of escaped characters named +json-lisp-escaped-chars+, namely:
(defparameter +json-lisp-escaped-chars+
  '((#\" . #\")
    (#\\ . #\\)
    (#\/ . #\/)
    (#\b . #\Backspace)
    (#\f . #\Page)
    (#\n . #\Newline)
    (#\r . #\Return)
    (#\t . #\Tab)
    (#\u . (4 . 16)))
  "Mapping between JSON String escape sequences and Lisp chars.")
The symbol is not exported, but you can still refer to it externally with ::. You can dynamically rebind the parameter around the code that needs to use a different list of escaped characters; for example, you can do as follows:
(let ((cl-json::+json-lisp-escaped-chars+
        (remove #\/ cl-json::+json-lisp-escaped-chars+ :key #'car)))
  (cl-json:encode-json-plist '("x" "1/5")))
This prints:
{"x":"1/5"}

Json Files parsing

So I am trying to open some JSON files to look for a publication year and sort them accordingly. But before doing this, I decided to experiment on a single file. I am having trouble, though, because although I can get the files and the strings, when I try to print one word, it starts printing individual characters.
For example:
print data2[1] #prints
THE BRIDES ORNAMENTS, Viz. Fiue MEDITATIONS, Morall and Diuine. #results
but now
print data2[1][0] #should print THE
T #prints T
This is my code right now:
json_data = open(path)
data = json.load(json_data)
data2 = []
for x in range(0, len(data)):
    data2.append(data[x]['section'])
    if len(data[x]['content']) > 0:
        for i in range(0, len(data[x]['content'])):
            data2.append(data[x]['content'][i])
I probably need to look at your json file to be absolutely sure, but it seems to me that the data2 list is a list of strings. Thus, data2[1] is a string. When you do data2[1][0], the expected result is what you are getting - the character at the 0th index in the string.
>>> data2[1]
'THE BRIDES ORNAMENTS, Viz. Fiue MEDITATIONS, Morall and Diuine.'
>>> data2[1][0]
'T'
To get the first word, naively, you can split the string by spaces
>>> data2[1].split()
['THE', 'BRIDES', 'ORNAMENTS,', 'Viz.', 'Fiue', 'MEDITATIONS,', 'Morall', 'and', 'Diuine.']
>>> data2[1].split()[0]
'THE'
However, this will cause issues with punctuation, so you probably need to tokenize the text. This link should help - http://www.nltk.org/_modules/nltk/tokenize.html
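For instance, a naive character-class tokenizer (a sketch; NLTK's word_tokenize is the more robust option the link above describes) drops the trailing punctuation that split() keeps attached:

```python
import re

line = 'THE BRIDES ORNAMENTS, Viz. Fiue MEDITATIONS, Morall and Diuine.'

# split() keeps punctuation glued to the word.
by_spaces = line.split()        # third item is 'ORNAMENTS,'

# A letters-only pattern separates the words from the punctuation.
by_tokens = re.findall(r"[A-Za-z]+", line)  # third item is 'ORNAMENTS'
```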

What characters have to be escaped to prevent (My)SQL injections?

I'm using MySQL API's function
mysql_real_escape_string()
Based on the documentation, it escapes the following characters:
\0
\n
\r
\
'
"
\Z
Now, I looked into OWASP.org's ESAPI security library and in the Python port it had the following code (http://code.google.com/p/owasp-esapi-python/source/browse/esapi/codecs/mysql.py):
"""
Encodes a character for MySQL.
"""
lookup = {
0x00 : "\\0",
0x08 : "\\b",
0x09 : "\\t",
0x0a : "\\n",
0x0d : "\\r",
0x1a : "\\Z",
0x22 : '\\"',
0x25 : "\\%",
0x27 : "\\'",
0x5c : "\\\\",
0x5f : "\\_",
}
Now, I'm wondering whether all of those characters really need to be escaped. I understand why % and _ are there: they are metacharacters in the LIKE operator. But I can't understand why they added the backspace and tab characters (\b, \t). Is there a security issue if you run a query:
SELECT a FROM b WHERE c = '...user input ...';
Where user input contains tabulators or backspace characters?
My question is: why did they include \b and \t in the ESAPI security library? Are there any situations where you might need to escape those characters?
A guess concerning the backspace character: Imagine I send you an email "Hi, here's the query to update your DB as you wanted" and an attached textfile with
INSERT INTO students VALUES ("Bobby Tables",12,"abc",3.6);
You cat the file, see that it's okay, and just pipe it to MySQL. What you didn't know, however, was that I put
DROP TABLE students;\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b
before the INSERT statement, which you didn't see because the backspaces overwrote it in the console output. Bamm!
Just a guess, though.
Edit (couldn't resist):
The MySQL manual page for strings says:
\0   An ASCII NUL (0x00) character.
\'   A single quote (“'”) character.
\"   A double quote (“"”) character.
\b   A backspace character.
\n   A newline (linefeed) character.
\r   A carriage return character.
\t   A tab character.
\Z   ASCII 26 (Control-Z). See note following the table.
\\   A backslash (“\”) character.
\%   A “%” character. See note following the table.
\_   A “_” character. See note following the table.
Blacklisting (identifying bad characters) is never the way to go, if you have any other options.
You need to use a combination of whitelisting and, more importantly, bound-parameter approaches.
While this particular answer has a PHP focus, it still helps explain why just running a string through a character filter doesn't work in many cases. Please see Do htmlspecialchars and mysql_real_escape_string keep my PHP code safe from injection?
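The bound-parameter approach looks like this in Python (sqlite3 here purely for a self-contained sketch; MySQL drivers such as mysql-connector expose the same pattern with %s placeholders):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

# Hostile input: concatenated into the SQL text, this would try to
# break out of the string literal.
evil = "Robert'); DROP TABLE students;--"

# The driver passes the value separately from the SQL text, so no
# escaping logic lives in application code at all.
conn.execute("INSERT INTO students (name) VALUES (?)", (evil,))

row = conn.execute("SELECT name FROM students").fetchone()
# The hostile string is stored verbatim; the table still exists.
```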
Where user input contains tabulators or backspace characters?
It's quite remarkable that, to this day, most users believe that it's user input that has to be escaped, and that such escaping "prevents injections".
Java solution:
public static String filter( String s ) {
    StringBuilder buffer = new StringBuilder();
    for ( byte b : s.getBytes() ) {
        int i = b;
        switch ( i ) {
            case 9  : buffer.append( " " );    break;
            case 10 : buffer.append( "\\n" );  break;
            case 13 : buffer.append( "\\r" );  break;
            case 34 : buffer.append( "\\\"" ); break;
            case 39 : buffer.append( "\\'" );  break;
            case 92 : buffer.append( "\\\\" ); break;
            default :
                // pass printable ASCII through unchanged
                if ( i > 31 && i < 127 ) buffer.append( (char) b );
        }
    }
    return buffer.toString();
}
Couldn't one just delete the single quotes from user input?
e.g.: $input =~ s/\'|\"//g;