I am trying to parse JSON from a website in order to extract some data. Somehow the JSON is "combined": the data for one product ends and the next product begins without any separator. This makes json.load (Python 3.8) throw an error.
Here is the part in question:
"name": "30 stk"
}
}
} {
"#type":"Product"
jsonlint outputs the following error:
Error: Parse error on line 126:
...30 stk" } } } { "#type": "Produc
--------------------^
Expecting 'EOF', '}', ',', ']', got '{'
Full JSON:
{"#context":"https://schema.org","#type":"Product","#id":"/antistax-extra-venentabletten-bei-venenleiden-60stk-pzn-00002335","aggregateRating":{"#type":"AggregateRating","ratingValue":"4,8","reviewCount":"26"},"description":"AntistaxextraVenentablettenbeiVenenleiden60stkkaufenbeiderOnlineApothekeapo-discounter.Medikamente,Nahrungsergänzungenuvm.erhaltenSieinunsererVersandapothekezugünstigenPreisen.","name":"AntistaxextraVenentablettenbeiVenenleiden(60stk)","image":"https://www.apodiscounter.de/images/product_images/info_images/00002335_4.jpg","sku":"00002335","mpn":"00002335","productID":"00002335","category":"Venenerkrankung","brand":{"#type":"Organization","name":"Antistax"},"offers":{"#type":"Offer","availability":"https://schema.org/InStock","url":"https://www.apodiscounter.de/antistax-extra-venentabletten-bei-venenleiden-60stk-pzn-00002335","price":"27.49","priceValidUntil":"2021-01-26","priceCurrency":"EUR","category":"Filmtabletten","eligibleQuantity":"60stk","itemCondition":"NewCondition","seller":{"#type":"Organization","name":"apo-discounter.de"}},"issimilarto":[{"#type":"Product","mpn":"00002312","name":"AntistaxextraVenentablettenbeiVenenleiden","url":"https://www.apodiscounter.de/antistax-extra-venentabletten-bei-venenleiden-30stk-pzn-00002312","description":"AntistaxextraVenentablettenbeiVenenleiden30stkkaufenbeiderOnlineApothekeapo-discounter.Medikamente,Nahrungsergänzungenuvm.erhaltenSieinunsererVersandapothekezugünstigenPreisen.","image":"https://www.apodiscounter.de/images/product_images/info_images/00002312.jpg","sku":"00002312","brand":{"#type":"Organization","name":"Antistax"},"aggregateRating":{"#type":"AggregateRating","ratingValue":"5,0","reviewCount":"13"},"review":[{"#type":"Review","author":"ErnaD.","datePublished":"30.10.2020","description":"","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"5","worstRating":"1"}},{"#type":"Review","author":"DomenicaZ.","datePublished":"04.09.2020","description":"","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"5","worstRating":"1"}},{"#type":"Review","author":"RitaS.","datePublished":"11.07.2020","description":"kaumwirksam","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"5","worstRating":"1"}},{"#type":"Review","author":"VerifizierterKunde","datePublished":"10.04.2020","description":"Binnichtganzüberzeugt","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"5","worstRating":"1"}},{"#type":"Review","author":"VerifizierterKunde","datePublished":"06.03.2020","description":"BinpositivüberraschtvondemProdukt","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"5","worstRating":"1"}}],"offers":{"#type":"Offer","availability":"https://schema.org/InStock","url":"https://www.apodiscounter.de/antistax-extra-venentabletten-bei-venenleiden-30stk-pzn-00002312","price":"15.99","priceValidUntil":"2021-01-26","priceCurrency":"EUR","eligibleQuantity":{"#type":"QuantitativeValue","name":"30stk"}}}{"#type":"Product","mpn":"05954715","name":"AntistaxextraVenentablettenbeiVenenschwäche","url":"https://www.apodiscounter.de/antistax-extra-venentabletten-bei-venenschwaeche-90stk-pzn-05954715","description":"AntistaxextraVenentablettenbeiVenenschwäche90stkkaufenbeiderOnlineApothekeapo-discounter.Medikamente,Nahrungsergänzungenuvm.erhaltenSieinunsererVersandapothekezugünstigenPreisen.","image":"https://www.apodiscounter.de/images/product_images/info_images/05954715.jpg","sku":"05954715","brand":{"#type":"Organization","name":"Antistax"},"aggregateRating":{"#type":"AggregateRating","ratingValue":"4,8","reviewCount":"167"},"review":[{"#type":"Review","author":"VerifizierterKunde","datePublished":"05.12.2020","description":"","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"5","worstRating":"1"}},{"#type":"Review","author":"RoswithaT.","datePublished":"23.11.2020","description":"","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"5","worstRating":"1"}},{"#type":"Review","author":"RalfZ.","datePublished":"22.11.2020","description":"Gutundpreiswert!","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"5","worstRating":"1"}},{"#type":"Review","author":"ReinhardS.","datePublished":"22.11.2020","description":"","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"5","worstRating":"1"}},{"#type":"Review","author":"VerifizierterKunde","datePublished":"08.11.2020","description":"WennmanProblememitVenenhatundlangeaufdenBeinenist,hilftesdenTagbesserzumeistern...","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"5","worstRating":"1"}}],"offers":{"#type":"Offer","availability":"https://schema.org/InStock","url":"https://www.apodiscounter.de/antistax-extra-venentabletten-bei-venenschwaeche-90stk-pzn-05954715","price":"33.49","priceValidUntil":"2021-01-26","priceCurrency":"EUR","eligibleQuantity":{"#type":"QuantitativeValue","name":"90stk"}}},{"#type":"Product","mpn":"16156023","name":"AntistaxextraVenentablettenbeiVenenleiden&Venenschwäche","url":"https://www.apodiscounter.de/antistax-extra-venentabletten-bei-venenleiden-venenschwaeche-180stk-pzn-16156023","description":"AntistaxextraVenentablettenbeiVenenleiden&Venenschwäche180stkkaufenbeiderOnlineApothekeapo-discounter.Medikamente,Nahrungsergänzungenuvm.erhaltenSieinunsererVersandapothekezugünstigenPreisen.","image":"https://www.apodiscounter.de/images/product_images/info_images/16156023.jpg","sku":"16156023","aggregateRating":{"#type":"AggregateRating","ratingValue":"5,0","reviewCount":"3"},"review":[{"#type":"Review","author":"IrinaL.","datePublished":"20.01.2021","description":"","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"5","worstRating":"1"}},{"#type":"Review","author":"AlfredS.","datePublished":"20.01.2021","description":"","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"5","worstRating":"1"}},{"#type":"Review","author":"VerifizierterKunde","datePublished":"11.01.2021","description":"","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"5","worstRating":"1"}}],"offers":{"#type":"Offer","availability":"https://schema.org/InStock","url":"https://www.apodiscounter.de/antistax-extra-venentabletten-bei-venenleiden-venenschwaeche-180stk-pzn-16156023","price":"56.49","priceValidUntil":"2021-01-26","priceCurrency":"EUR","eligibleQuantity":{"#type":"QuantitativeValue","name":"180stk"}}}],"review":[{"#type":"Review","author":"KlausS.","datePublished":"11.12.2020","description":"","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"4","worstRating":"1"}},{"#type":"Review","author":"OttiZ.","datePublished":"19.10.2020","description":"Zufrieden","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"5","worstRating":"1"}},{"#type":"Review","author":"MonikaM.","datePublished":"03.09.2020","description":"","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"5","worstRating":"1"}},{"#type":"Review","author":"RitaP.","datePublished":"14.08.2020","description":"","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"5","worstRating":"1"}},{"#type":"Review","author":"VerifizierterKunde","datePublished":"24.07.2020","description":"","name":"","reviewRating":{"#type":"Rating","bestRating":"5","ratingValue":"3","worstRating":"1"}}]}
How can I separate the two objects, or is there a function for this? I tried .split('} {'), but it did not work.
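It seems you have a JSON array whose objects are not all separated by commas: in the "issimilarto" array, the first product ends with }}} and the second starts right after it with {"#type":"Product", with no comma in between (the second and third products are separated correctly).
You should just replace } { (or }{, which is how it appears in your full JSON) with }, {. This is also why .split('} {') did not work: your text has no space between the two braces, and even if it did, splitting would leave fragments that are not valid JSON on their own.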
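A minimal Python sketch of that replacement, assuming the page's JSON is already read into a string (so json.loads is used rather than json.load on a file). The helper name fix_missing_commas and the shortened sample below are hypothetical, not taken from the site. A regular expression also matches }{ with no space or with newlines between the braces. Outside of string values, valid JSON never has a } followed (directly or after whitespace) by a {, so every match is a missing comma; the one caveat is that a literal }{ inside a string value would be rewritten too.

```python
import json
import re


def fix_missing_commas(text: str) -> str:
    # Outside strings, "}" followed by optional whitespace and "{" is never
    # valid JSON, so each such spot is treated as a missing comma.
    # Caveat: a literal "}{" inside a string value would also be changed.
    return re.sub(r"\}\s*\{", "},{", text)


# Shortened, hypothetical sample with the same defect as the "issimilarto" array.
raw = '{"issimilarto":[{"mpn":"00002312"}{"mpn":"05954715"},{"mpn":"16156023"}]}'

data = json.loads(fix_missing_commas(raw))
for product in data["issimilarto"]:
    print(product["mpn"])
```

With the real page, pass the downloaded text through the helper before json.loads instead of splitting it.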
I'm attempting to read a JSON file ("my_file.json") into R. It contains the following:
[{"id":"484","comment":"They call me "Bruce""}]
Using the jsonlite package (0.9.12), the following fails:
library(jsonlite)
fromJSON(readLines('~/my_file.json'))
with the error:
"Error in parseJSON(txt) : lexical error: invalid char in json text.
84","comment":"They call me "Bruce""}]
(right here) ------^"
Here is R's escaped representation of the file contents:
readLines('~/my_file.json')
"[{\"id\":\"484\",\"comment\":\"They call me \"Bruce\"\"}]"
Removing the quotes around "Bruce" solves the problem, as in:
my_file.json
[{"id":"484","comment":"They call me Bruce"}]
But what is the issue with the escaping?
In R, string literals can be defined using either single or double quotes, for example:
s1 <- 'hello'
s2 <- "world"
Of course, if you want to include double quotes inside a string literal delimited by double quotes, you need to escape the inner quotes with a backslash; otherwise the R parser cannot detect the end of the string correctly (the same holds for single quotes inside a single-quoted literal). For example:
s1 <- "Hello, my name is \"John\""
If you print this string to the console (using cat¹) or write it to a file, you get the actual "face" of the string, not its R literal representation:
> cat("Hello, my name is \"John\"")
Hello, my name is "John"
The JSON parser reads the actual "face" of the string, so in your case it reads:
[{"id":"484","comment":"They call me "Bruce""}]
not the R literal representation:
"[{\"id\":\"484\",\"comment\":\"They call me \"Bruce\"\"}]"
That being said, the JSON parser also requires double quotes inside strings to be escaped with a backslash.
Hence, your file should contain:
[{"id":"484","comment":"They call me \"Bruce\""}]
If you simply add the backslashes to your file, you will be able to read the JSON without problems.
Note that the corresponding R literal representation of that string would be:
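The escaping rule belongs to JSON itself, not to R or jsonlite. As an illustration only, here is the same check with Python's standard json module instead of R: the unescaped file text is rejected and the escaped one is accepted. The strings are written as raw strings (r'...') so they hold exactly the characters of the file. As a side note, a JSON generator such as json.dumps inserts these backslashes by itself when it writes the data.

```python
import json

# File contents exactly as on disk (raw strings keep backslashes literally).
unescaped = r'[{"id":"484","comment":"They call me "Bruce""}]'
escaped = r'[{"id":"484","comment":"They call me \"Bruce\""}]'

try:
    json.loads(unescaped)
    rejected = False
except json.JSONDecodeError as err:
    # The string value ends at the quote before Bruce, so "Bruce" is unexpected.
    rejected = True
    print("rejected:", err)

records = json.loads(escaped)
print(records[0]["comment"])  # They call me "Bruce"

# Serializing the data produces the escaped form automatically.
generated = json.dumps([{"id": "484", "comment": 'They call me "Bruce"'}], separators=(",", ":"))
print(generated == escaped)  # True
```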
"[{\"id\":\"484\",\"comment\":\"They call me \\\"Bruce\\\"\"}]"
In fact, this works:
> fromJSON("[{\"id\":\"484\",\"comment\":\"They call me \\\"Bruce\\\"\"}]")
id comment
1 484 They call me "Bruce"
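For comparison, the same two layers of escaping show up in Python (used here purely as an illustration). In an ordinary string literal each backslash that should reach the JSON parser has to be doubled, just like the \\\" in the R literal above, while a raw literal takes the characters as typed. repr() plays roughly the role of R's default print (it shows the literal form), and print() shows the actual text, like cat.

```python
import json

# Ordinary literal: each \\ in the source code becomes ONE backslash in the string.
literal = '[{"id":"484","comment":"They call me \\"Bruce\\""}]'
# Raw literal: characters are kept exactly as typed, like the file on disk.
raw_literal = r'[{"id":"484","comment":"They call me \"Bruce\""}]'

print(literal == raw_literal)  # True: same text, two ways of writing it
print(repr(literal))           # literal form (like R's print): backslashes doubled
print(literal)                 # actual "face" (like R's cat): single backslashes
print(json.loads(literal)[0]["comment"])  # They call me "Bruce"
```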
¹ The default R print function (also invoked when you simply type a value and press ENTER) displays the corresponding R string literal. To print the actual string, use print(stringToPrint, quote = FALSE) or the cat function.
EDIT (in response to @EngrStudent's comment about automating quote escaping):
A JSON parser cannot escape quotes automatically.
Put yourself in the computer's shoes and imagine you had to parse this (unescaped) string as JSON: { "foo1" : " : "foo2" : "foo3" }
I see at least three possible escapings that yield valid JSON:
{ "foo1" : " : \"foo2\" : \"foo3" }
{ "foo1\" : " : "foo2\" : \"foo3" }
{ "foo1\" : \" : \"foo2" : "foo3" }
As you can see from this small example, escaping is really necessary to avoid ambiguities.
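To make the ambiguity concrete, here is a short check with Python's standard json module (an illustration, not part of jsonlite) that parses the three candidate escapings listed above. All three are valid JSON, yet each one yields a different key/value pair, so a parser has no basis for choosing between them.

```python
import json

# The three escapings listed above, as raw strings so the backslashes
# reach the parser unchanged.
candidates = [
    r'{ "foo1" : " : \"foo2\" : \"foo3" }',
    r'{ "foo1\" : " : "foo2\" : \"foo3" }',
    r'{ "foo1\" : \" : \"foo2" : "foo3" }',
]

parsed = [json.loads(c) for c in candidates]
for obj in parsed:
    # Each line shows a different dictionary built from the same characters.
    print(obj)
```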
If the string you want to escape has a very particular structure in which you can recognize, without uncertainty, the double quotes that need escaping, you can write your own automatic escaping procedure, but you have to start from scratch, because there is nothing built in.
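A sketch of such a procedure in Python, under one hypothetical and very rigid assumption: every document looks exactly like [{"id":"<digits>","comment":"<text>"}], with "comment" as the last field, so its value must run up to the final "}]. That assumption is what removes the ambiguity; the pattern and the function name escape_comment are illustrative, not from any library. Once the value is isolated, json.dumps does the actual escaping (quotes, backslashes, control characters).

```python
import json
import re

# Hypothetical rigid layout: [{"id":"<digits>","comment":"<any text>"}]
# The greedy (.*) makes the comment value run up to the final "}],
# so any inner quotes stay inside the value.
LAYOUT = re.compile(r'(\[\{"id":"\d+","comment":")(.*)("\}\])', re.DOTALL)


def escape_comment(text: str) -> str:
    match = LAYOUT.fullmatch(text.strip())
    if match is None:
        raise ValueError("text does not follow the assumed layout")
    head, value, tail = match.groups()
    # json.dumps returns the value as a quoted JSON string with every required
    # escape; [1:-1] drops the surrounding quotes it adds.
    return head + json.dumps(value, ensure_ascii=False)[1:-1] + tail


broken = '[{"id":"484","comment":"They call me "Bruce""}]'
fixed = escape_comment(broken)
print(fixed)  # [{"id":"484","comment":"They call me \"Bruce\""}]
print(json.loads(fixed))
```

If the real data does not follow such a fixed layout, the ambiguity shown above returns and no pattern can resolve it reliably.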