I see weird behavior with double quotes in json.loads(). In the code given below, x prints fine.
I want to understand the reason for the error when I print the value of y.
Also, why is 'a' printed inside single quotes when it's actually inside double quotes?
import json
x = '[["a"]]'
y = "[['b']]"
print(json.loads(x))
print(json.loads(y))
Output
[['a']]
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 3 (char 2)
The JSON specification does not allow single quotes around strings, as explained here:
https://www.w3schools.com/js/js_json_syntax.asp
As for why 'a' is printed inside single quotes: json.loads(x) returns a nested Python list whose inner element is the str 'a', and print shows that list's Python repr, which always renders strings with single quotes. The double quotes in x are JSON syntax, not part of the parsed value.
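To see both effects side by side, here is a small sketch (the ast.literal_eval part is an assumption about what you might want for the single-quoted input, not something the json module supports):

import json
import ast

x = '[["a"]]'   # valid JSON: strings must use double quotes
y = "[['b']]"   # not valid JSON: single-quoted strings are rejected

parsed = json.loads(x)
print(parsed)              # [['a']]  -- Python's repr of a list of str uses single quotes
print(json.dumps(parsed))  # [["a"]]  -- re-serializing shows the double quotes again

# If the single-quoted text is really a Python literal rather than JSON,
# ast.literal_eval can parse it instead:
print(ast.literal_eval(y))  # [['b']]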
Trying to read the data below from a CSV file results in a com.univocity.parsers.common.TextParsingException:
B1456741975-266,"","{""m"": {""difference"": 60}}","","","",2022-02-04T17:03:59.566Z
B1789753479-460,"","",",","","",2022-02-18T14:46:57.332Z
B1456741977-123,"","{""m"": {""difference"": 60}}","","","",2022-02-04T17:03:59.566Z
Here's the PySpark (3.1.2) code used to read the data:
from pyspark.sql.dataframe import DataFrame
df = (spark.read.format("com.databricks.spark.csv")
.option("inferSchema", "true")
.option("header","false")
.option("multiline","true")
.option("quote",'"')
.option("escape",'\"')
.option("delimiter",",")
.option("unescapedQuoteHandling", "RAISE_ERROR")
.load('/mnt/source/analysis/error_in_csv.csv'))
This is the exception that I'm getting.
Caused by: com.univocity.parsers.common.TextParsingException: com.univocity.parsers.common.TextParsingException - Unescaped quote character '"' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.
Internal state when error was thrown: line=2, column=3, record=1, charIndex=165, headers=[B1456741975-266, , {"m": {"difference": 60}}, , , , 2022-02-04T17:03:59.566Z]
Parser Configuration: CsvParserSettings:
Auto configuration enabled=true
Auto-closing enabled=true
Autodetect column delimiter=false
Autodetect quotes=false
Column reordering enabled=true
Delimiters for detection=null
Empty value=
Escape unquoted values=false
Header extraction enabled=null
Headers=null
Ignore leading whitespaces=false
Ignore leading whitespaces in quotes=false
Ignore trailing whitespaces=false
Ignore trailing whitespaces in quotes=false
Input buffer size=1048576
Input reading on separate thread=false
Keep escape sequences=false
Keep quotes=false
Length of content displayed on error=1000
Line separator detection enabled=true
Maximum number of characters per column=-1
Maximum number of columns=20480
Normalize escaped line separators=true
Null value=
Number of records to read=all
Processor=none
Restricting data in exceptions=false
RowProcessor error handler=null
Selected fields=field selection: []
Skip bits as whitespace=true
Skip empty lines=true
Unescaped quote handling=RAISE_ERROR
Format configuration:
CsvFormat:
Comment character=#
Field delimiter=,
Line separator (normalized)=\n
Line separator sequence=\n
Quote character="
Quote escape character="
Quote escape escape character=null
Internal state when error was thrown: line=2, column=3, record=1, charIndex=165, headers=[B1456741975-266, , {"m": {"difference": 60}}, , , , 2022-02-04T17:03:59.566Z]
at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:402)
at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:623)
at org.apache.spark.sql.catalyst.csv.UnivocityParser$$anon$1.next(UnivocityParser.scala:389)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at scala.collection.TraversableOnce$FlattenOps$$anon$2.hasNext(TraversableOnce.scala:469)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:335)
... 33 more
Caused by: com.univocity.parsers.common.TextParsingException: Unescaped quote character '"' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input.
Internal state when error was thrown: line=2, column=3, record=1, charIndex=165, headers=[B1456741975-266, , {"m": {"difference": 60}}, , , , 2022-02-04T17:03:59.566Z]
at com.univocity.parsers.csv.CsvParser.handleValueSkipping(CsvParser.java:241)
at com.univocity.parsers.csv.CsvParser.handleUnescapedQuote(CsvParser.java:319)
at com.univocity.parsers.csv.CsvParser.parseQuotedValue(CsvParser.java:393)
at com.univocity.parsers.csv.CsvParser.parseSingleDelimiterRecord(CsvParser.java:177)
at com.univocity.parsers.csv.CsvParser.parseRecord(CsvParser.java:109)
at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:581)
... 39 more
Can someone please advise? It looks like the quoted delimiter in the second line is causing this. Is there a way to avoid it without changing the source data itself?
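For reference, the sample rows follow standard RFC 4180 quoting: quotes inside quoted fields are doubled, and the ',' on the second line is itself a quoted field. The minimal sketch below with Python's standard csv module (which unescapes doubled quotes by default) only illustrates how the rows are meant to parse; it is not the Spark fix:

import csv
import io

sample = (
    'B1456741975-266,"","{""m"": {""difference"": 60}}","","","",2022-02-04T17:03:59.566Z\n'
    'B1789753479-460,"","",",","","",2022-02-18T14:46:57.332Z\n'
)

for row in csv.reader(io.StringIO(sample)):
    print(row)
# ['B1456741975-266', '', '{"m": {"difference": 60}}', '', '', '', '2022-02-04T17:03:59.566Z']
# ['B1789753479-460', '', '', ',', '', '', '2022-02-18T14:46:57.332Z']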
I'm trying to construct JSON from strings that contain "\n", like this:
ver_str= 'Package ID: version_1234\nBuild\nnumber: 154\nBuilt\n'
proj_ver_str = 'Version_123'
comb = '{"r_content": {0}, "s_version": {1}}'.format(ver_str,proj_ver_str)
json_content = json.loads()
d =json.dumps(json_content )
I'm getting this error:
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Dev/python/new_tester/simple_main.py", line 18, in <module>
comb = '{"r_content": {0}, "s_version": {1}}'.format(ver_str,proj_ver_str)
KeyError: '"r_content"'
The error arises not because of newlines in your values, but because of { and } characters in your format string other than the placeholders {0} and {1}. If you want to have an actual { or a } character in your string, double them.
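As a minimal illustration of the brace doubling (not from the original post):

>>> '{{"key": {0}}}'.format(42)
'{"key": 42}'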
Try replacing the line
comb = '{"r_content": {0}, "s_version": {1}}'.format(ver_str,proj_ver_str)
with
comb = '{{"r_content": {0}, "s_version": {1}}}'.format(ver_str,proj_ver_str)
However, this will give you a different error on the next line, loads() missing 1 required positional argument: 's'. This is because you presumably forgot to pass comb to json.loads().
Replacing json.loads() with json.loads(comb) gives you another error: json.decoder.JSONDecodeError: Expecting value: line 1 column 15 (char 14). This tells you that you've given json.loads malformed JSON to parse. If you print out the value of comb, you see the following:
{"r_content": Package ID: version_1234
Build
number: 154
Built
, "s_version": Version_123}
This isn't valid JSON, because the string values aren't surrounded by quotes. So a JSON parsing error is to be expected.
At this point, let's take a look at what your code is doing and what you seem to want it to do. You want to construct a JSON string from your data, but your code pieces a JSON string together by hand, parses it into a dict, and then formats it back into a JSON string.
If you want to create a JSON string from your data, it's far simpler to create a dict with your values and use json.dumps on that:
d = json.dumps({"r_content": ver_str, "s_version": proj_ver_str})
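For completeness, json.dumps escapes the newlines in the values automatically, so this produces valid JSON. The snippet below just reuses the values from the question (expected output shown as comments):

import json

ver_str = 'Package ID: version_1234\nBuild\nnumber: 154\nBuilt\n'
proj_ver_str = 'Version_123'

d = json.dumps({"r_content": ver_str, "s_version": proj_ver_str})
print(d)
# {"r_content": "Package ID: version_1234\nBuild\nnumber: 154\nBuilt\n", "s_version": "Version_123"}

# Round-tripping back to a dict works as expected:
print(json.loads(d)["s_version"])  # Version_123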
I'm trying to create a multi-line String in Scala as below.
val errorReport: String =
"""
|{
|"errorName":"blah",
|"moreError":"blah2",
|"errorMessage":{
| "status": "bad",
| "message": "Unrecognized token 'noformatting': was expecting 'null', 'true', 'false' or NaN
at [Source: (ByteArrayInputStream); line: 1, column: 25]"
| }
|}
"""
.stripMargin
It's a nested JSON and it's not displaying properly when I print it. The message field inside errorMessage (which is the output of calling getMessage on an instance of a Throwable) is causing the issue because it looks like there is a newline right before
at [Source: ....
If I get rid of that line the JSON displays properly. Any ideas on how to properly format this are appreciated.
EDIT: The issue is with the newline character. So, more concisely, the question is: how do I handle the newline within the triple quotes so that the result is still recognized as valid JSON?
EDIT 2: message is being set by a variable like so:
"message": "${ex.getMessage}"
where ex is a Throwable. An example of the contents of that getMessage call is provided above.
I assume that your question has nothing to do with JSON, and that you're simply asking how to create very wide strings without violating the horizontal 80-character limit in your Scala code. Fortunately, Scala's string literals have at least the following properties:
You can go from ordinary code to string-literal mode using quotes "..." and triple quotes """...""".
You can go from string-literal mode to ordinary code mode using ${...}
The free monoid over characters is reified as the String type and its methods; that is, there is a + operation that concatenates strings.
The whole construction can be made robust to whitespace and indentation using | and stripMargin.
Altogether, this allows you to write down arbitrarily long string literals without ever violating horizontal character limits, in a way that is robust with respect to indentation.
In this particular case, you want to make a line break in the ambient Scala code without introducing a line break in your text. For this, you simply:
exit the string-literal mode by closing """
insert concatenation operator + in code mode
make a line-break
indent however you want
re-enter the string-literal mode again by opening """
That is,
"""blah-""" +
"""blah"""
will create the string "blah-blah", without line break in the produced string.
Applied to your concrete problem:
val errorReport: String = (
"""{
| "errorName": "blah",
| "moreError": "blah2",
| "errorMessage": {
| "status": "bad",
| "message": "Unrecognized token 'noformatting'""" +
""": was expecting 'null', 'true', 'false' or NaN at """ +
"""[Source: (ByteArrayInputStream); line: 1, column: 25]"
| }
|}
"""
).stripMargin
Maybe a more readable option would be to construct the lengthy message separately from the neatly indented JSON, and then use string interpolation to combine the two components:
val errorReport: String = {
val msg =
"""Unrecognized token 'noformatting': """ +
"""was expecting 'null', 'true', 'false' or NaN at """ +
"""[Source: (ByteArrayInputStream); line: 1, column: 25]"""
s"""{
| "errorName": "blah",
| "moreError": "blah2",
| "errorMessage": {
| "status": "bad",
| "message": "${msg}"
| }
|}
"""
}.stripMargin
If the message itself contains line breaks
Since JSON does not allow multiline string literals, you have to do something else:
To remove line breaks, use .replaceAll("\\n", "") or rather .replaceAll("\\n", " ")
To encode line breaks with the escape sequence \n, use .replaceAll("\\n", "\\\\n") (yes... backslashes...)
I have variables that store strings containing both single and double quotes. For example, one variable, testing7, is the following string:
{'address_components': [{'long_name': 'Fairhope', 'short_name': 'Fairhope' 'types': ['locality', 'political']}...
I need to pass this variable through ast.literal_eval, then json.dumps, and finally json.loads. However, when I try to pass it through ast.literal_eval I get an error:
line 48, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
line 35, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "<unknown>", line 1
{'address_components': [{'long_name': 'Fairhope', 'short_name': 'Fairhope' 'types': ['locality', 'political']}...'
^
IndentationError: unexpected indent
If I copy and paste the string into literal_eval, it works when enclosed in triple quotes:
#This works
ast.literal_eval('''{'address_components': [{'long_name': 'Fairhope', 'short_name': 'Fairhope' 'types': ['locality', 'political']}...''')
#This does not work
ast.literal_eval(testing7)
It seems that the error isn't related to quotes. The cause of the error is that you have a space in front of the string.
For example:
>>> ast.literal_eval(" {'x':\"t\"}")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/leo/miniconda3/lib/python3.6/ast.py", line 48, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "/home/leo/miniconda3/lib/python3.6/ast.py", line 35, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "<unknown>", line 1
{'x':"t"}
^
IndentationError: unexpected indent
You may want to strip the string before passing it into the function. Also, I didn't see mixed double and single quotes in your example. Inside a single-quoted string, you can use \ to escape a single quote. For example:
>>> x = ' {"x":\'x\'}'.strip()
>>> x
'{"x":\'x\'}'
>>> ast.literal_eval(x)
{'x': 'x'}
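Putting it all together for a value like testing7 (the literal below is a hypothetical, shortened stand-in, since the original string is truncated in the question):

import ast
import json

# Hypothetical stand-in for testing7; assumed to be a complete Python-literal dict string
testing7 = " {'address_components': [{'long_name': 'Fairhope', 'types': ['locality', 'political']}]}"

parsed = ast.literal_eval(testing7.strip())  # strip the leading whitespace first
as_json = json.dumps(parsed)                 # now double-quoted, valid JSON
back = json.loads(as_json)
print(back["address_components"][0]["long_name"])  # Fairhope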
Here is my code that does CSV parsing, using the text and attoparsec
libraries:
import Control.Applicative (many, (<|>))  -- needed for <|> and many below
import qualified Data.Attoparsec.Text as A
import qualified Data.Text as T
-- | Parse a field of a record.
field :: A.Parser T.Text -- ^ parser
field = fmap T.concat quoted <|> normal A.<?> "field"
where
normal = A.takeWhile (A.notInClass "\n\r,\"") A.<?> "normal field"
quoted = A.char '"' *> many between <* A.char '"' A.<?> "quoted field"
between = A.takeWhile1 (/= '"') <|> (A.string "\"\"" *> pure "\"")
-- | Parse a block of text into a CSV table.
comma :: T.Text -- ^ CSV text
-> Either String [[T.Text]] -- ^ error | table
comma text
| T.null text = Right []
| otherwise = A.parseOnly table text
where
table = A.sepBy1 record A.endOfLine A.<?> "table"
record = A.sepBy1 field (A.char ',') A.<?> "record"
This works well for a variety of inputs but does not work when there is a trailing \n at the end of the input.
Current behaviour:
> comma "hello\nworld"
Right [["hello"],["world"]]
> comma "hello\nworld\n"
Right [["hello"],["world"],[""]]
Wanted behaviour:
> comma "hello\nworld"
Right [["hello"],["world"]]
> comma "hello\nworld\n"
Right [["hello"],["world"]]
I have been trying to fix this issue but I have run out of ideas. I am almost certain that it will have to involve A.endOfInput, as that is the significant anchor and the only "bonus" information we have. Any ideas on how to work that into the code?
One possible idea is to look at the end of the string before running the attoparsec parser and remove the last character (or the last two in the case of \r\n), but that seems like a hacky solution that I would like to avoid in my code.
Full code of the library can be found here: https://github.com/lovasko/comma