Invalid characters with fromJSON function - json

I am trying to import a JSON file into R. For this purpose I am using packages such as rjson, jsonlite, and RJSONIO. I have tried several approaches, shown below, but all of them produce errors.
mydata <- fromJSON(paste(readLines("result_prod_14-15_08_17.json"), collapse=""))
The error I got is:
Error: lexical error: invalid char in json text.
{ "_id" : ObjectId("59920f495401ac79452a0
(right here) ------^
Using the 'jsonlite' package's stream_in function:
md <- stream_in(file("result_prod_14-15_08_17.json"))
produces the following error:
opening fileconnectionoldClass input connection.
Error: parse error: premature EOF
{
(right here) ------^
closing fileconnectionoldClass input connection.
It must be something straightforward, but since I am new to R I am struggling to figure it out.
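Judging from the error context, ObjectId("…") is Mongo shell syntax rather than valid JSON, so a raw mongoexport dump will fail in any strict JSON parser. A minimal preprocessing sketch in Python, assuming one document per line and that ObjectId is the only shell wrapper present (the input name is taken from the question; the output name is made up):

import json, re

# A sketch: rewrite ObjectId("abc") as the plain string "abc" so each
# line becomes valid JSON, then write everything out as one JSON array
# that fromJSON can read. Assumes one document per line and no other
# Mongo shell wrappers such as ISODate(...).
docs = []
with open("result_prod_14-15_08_17.json", encoding="utf8") as f:
    for line in f:
        line = re.sub(r'ObjectId\("([^"]*)"\)', r'"\1"', line)
        if line.strip():
            docs.append(json.loads(line))

with open("result_prod_clean.json", "w", encoding="utf8") as f:
    json.dump(docs, f)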

Related

Snowflake throwing error (Error parsing JSON: misplaced { )

I am trying to load JSON files into Snowflake using the COPY command. I have two files with the same structure. However, one file loaded without issue; the other one throws the error:
"Error parsing JSON: misplaced { "
The simple example select parse_json($1) record from values ('{{'); also errors with "Error parsing JSON: misplaced {, pos 2", so your second file probably does in fact contain invalid JSON.
Try running the statement in validation mode (e.g. copy into mytable validation_mode = 'RETURN_ERRORS';), which will return a table containing useful troubleshooting info such as the line number and character position of the error(s).
The docs cover this here: https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#validating-staged-files
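The same failure reproduces in any strict JSON parser, which makes it easy to sanity-check a suspect snippet locally before re-running the COPY; a quick sketch in Python:

import json

# '{{' is rejected by any strict JSON parser, just as Snowflake's
# parse_json rejects it
try:
    json.loads('{{')
except json.JSONDecodeError as e:
    print(e)  # Expecting property name enclosed in double quotes: line 1 column 2 (char 1)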

How to extract large json file to csv using Python

I'm trying to convert a very large .json file to a .csv file. Here is a sample of the JSON file I have been using.
I'll be getting the file directly from a journal publisher in the same format.
The main purpose of this is to extract all the components from the .json file and put the information into our database.
Below is the code I have tried.
import csv, json, sys

if sys.argv[1] is not None and sys.argv[2] is not None:
    fileInput = sys.argv[1]
    fileOutput = sys.argv[2]
    inputFile = open(fileInput, encoding="utf8")  # open the json file
    outputFile = open(fileOutput, 'w')  # open the csv file
    data = json.load(inputFile)  # load the json content
    inputFile.close()  # close the input file
    output = csv.writer(outputFile)  # create a csv.writer
    output.writerow(data[0].keys())  # header row
    for row in data:
        output.writerow(row.values())  # values row
I'm getting this error:
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 542)
That is not valid JSON. The opening bracket at byte offset 0 is closed by a closing bracket at byte offset 383, and then another bracket opens at byte offset 386. A new bracket outside the one closed at offset 383 is illegal in JSON; the only thing that would be legal after the closing bracket is whitespace (spaces, tabs, newlines).
It looks a lot like 100 separate JSONs that are line-separated, though, but there is no easy way of parsing that reliably, as valid JSON documents may themselves contain newlines. If the data provider can guarantee that their individual JSONs NEVER contain raw newlines, i.e. that every newline is encoded as the two bytes 0x5C 0x6E ("\n") rather than a literal 0x0A byte, then you could of course split the JSONs on newlines. But that approach breaks the moment the provider's JSONs do contain newlines, and the JSON specification allows newlines (0x0A bytes) inside a document as whitespace, so it would require your provider to stick to a newline-free subset of JSON. If your provider is looking for a quick fix: use NUL bytes (0x00) as the separator instead of 0x0A. Valid JSON never contains a raw NUL byte (it always has to be encoded as "\u0000"), so you could reliably split the JSONs on NUL bytes.
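If the provider did switch to NUL-separated documents, the consumer side becomes trivial; a sketch in Python, with a hypothetical file name:

import json

# A sketch, assuming a hypothetical file whose documents are separated
# by NUL bytes (0x00), a byte that never appears inside valid JSON
with open("data.jsonnul", "rb") as f:
    blob = f.read()

docs = [json.loads(chunk) for chunk in blob.split(b"\x00") if chunk.strip()]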
Here is what happens when I try to parse all 100 lines as individual JSONs, splitting them on the 0x0A byte, using this code:
<?php
$jsons = file_get_contents("https://pastebin.com/raw/p9NbH2tG");
json_decode($jsons);
echo json_last_error_msg(), PHP_EOL;
$jsons = explode("\n", $jsons);
foreach ($jsons as $json) {
    json_decode($json);
    echo json_last_error_msg(), PHP_EOL;
}
Output:
$ php foo.php
Syntax error
No error
("No error" repeated, once for each of the 100 lines)
As you can see, each individual line in your file contains valid JSON, but the file as a whole is not valid JSON. Splitting on newlines is NOT reliable in general, though; it just happens to work here because none of the 100 JSONs in your test file contain newlines.
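If the publisher confirms the file is newline-delimited JSON with no raw newlines inside documents, the original Python script only needs to parse line by line instead of calling json.load once; a sketch under that assumption:

import csv, json, sys

# A sketch, assuming newline-delimited JSON (one object per line) and
# that every object has the same keys as the first one
fileInput, fileOutput = sys.argv[1], sys.argv[2]

with open(fileInput, encoding="utf8") as f:
    rows = [json.loads(line) for line in f if line.strip()]

with open(fileOutput, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(rows[0].keys())  # header row from the first object
    for row in rows:
        writer.writerow(row.values())  # values row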
This looks a lot like the question asked here Django convert JSON to CSV
Can you share a sample of the json response you are getting? Perhaps there is an issue with attempting to decode multiple dictionaries etc.

Error while importing JSON file in r

I am trying to import a JSON file in R. I have installed the necessary packages and libraries, but I keep getting an error.
Error: lexical error: invalid string in json text.
temp.jsonl
(right here) ------^
My code is below:
library(rjson)
library(jsonlite)
library(RJSONIO)
install.packages("rjson")
install.packages("RJSONIO")
json_data_raw<-fromJSON("temp.jsonl")
Thanks
To read lines of JSON you can use the jsonlite::stream_in function.
df <- jsonlite::stream_in(file("temp.jsonl"))
Reference
The jsonlite stream_in and stream_out functions implement line-by-line processing of JSON data over a connection, such as a socket, url, file or pipe
JSON streaming in R
Alternatively, wrap the lines into a single JSON array and parse it in one call:
dat <- fromJSON(sprintf("[%s]", paste(readLines(filepath), collapse=",")))
I was able to fix the issue and create the data frame in R.
Thanks, guys.

How to import Google Maps API into PostgreSQL?

I am trying to transfer data from a JSON file produced by the Google Maps API onto my PostgreSQL database. This is done through cURL and I made sure that the permissions have been correctly set.
The url:
https://maps.googleapis.com/maps/api/distancematrix/json?units=imperial&origins=London&destinations=Paris&key=AIza-[key-redacted]-3z6ho-o
The query:
copy bookings.import(info) from program 'C:/temp/mycurl/curl "https://maps.googleapis.com/maps/api/distancematrix/json?units=imperial&origins=London&destinations=Paris&key=AIza-[key-redacted]-3z6ho-o" --insecure'
However, when I try to do this on my table with column 'info' of type 'json', I get the following error:
ERROR: invalid input syntax for type json
SQL state: 22P02
Detail: The input string ended unexpectedly.
Context: JSON data, line 1: { COPY import, line 1, column info: "{"
For now I am trying not to bring in things such as PHP or any other tool, but if that is the only option I would certainly consider it.
What exactly do you guys think I am doing wrong? Is it the syntax, the format or am I missing something?
Thanks!
COPY assumes that each newline indicates a new record. Unfortunately, the Google Maps DistanceMatrix API pretty-prints its response, which means it comes through as 23 rows, none of which are valid JSON on their own.
You can get around this by piping the curl response through something like jq.
copy imports(info) from program 'curl "https://maps.googleapis.com/maps/api/distancematrix/json?units=imperial&origins=London&destinations=Paris&key=<my_key>" --insecure | /usr/local/bin/jq "." -c'
jq has lots of useful features if you want to massage the response a bit more before stashing it in the database.
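An alternative that avoids COPY ... FROM PROGRAM altogether is to fetch and insert the JSON from a small client script; a sketch in Python, assuming the requests and psycopg2 packages and a placeholder connection string:

import json
import requests
import psycopg2

# A sketch: fetch the DistanceMatrix response, re-serialize it as one
# compact line, and insert it, sidestepping COPY's newline handling.
# The API key and connection string are placeholders.
resp = requests.get(
    "https://maps.googleapis.com/maps/api/distancematrix/json",
    params={"units": "imperial", "origins": "London",
            "destinations": "Paris", "key": "<my_key>"},
)
payload = json.dumps(resp.json())  # compact, single-line JSON text

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute("insert into bookings.import (info) values (%s)", (payload,))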

How to load OSM (GeoJSON) data to ArangoDB?

How can I load OSM data into ArangoDB?
I downloaded the data set luxembourg-latest.osm.pbf from OSM, then converted it to JSON with osmtogeojson. After that I tried to load the resulting GeoJSON into ArangoDB with the command arangoimp --file out.json --collection lux1 --server.database geodb and got a huge list of errors:
...
2017-03-17T12:44:28Z [7712] WARNING at position 719386: invalid JSON type (expecting object, probably parse error), offending context: ],
2017-03-17T12:44:28Z [7712] WARNING at position 719387: invalid JSON type (expecting object, probably parse error), offending context: [
2017-03-17T12:44:28Z [7712] WARNING at position 719388: invalid JSON type (expecting object, probably parse error), offending context: 5.867441,
...
What am I doing wrong?
Update: it seems the converter should be run as osmtogeojson --ndjson, which produces the items not as a single JSON document but in line-by-line mode.
As @dmitry-bubnenkov already found out, --ndjson is required to produce the right input for arangoimp.
One has to know here that arangoimp expects a JSON subset dubbed JSONL (it does not fully parse the JSON on its own).
Thus, each line of the JSON file is expected to become one JSON document in the collection after the import. To maximize performance and simplify the implementation, the JSON is not completely parsed before being sent to the server.
arangoimp tries to chop the JSON into chunks of the maximum request size that the server permits, leaning on the JSONL line endings to isolate chunk boundaries.
However, the server expects valid JSON. Sending a chunk that contains incomplete JSON documents leads to parse errors on the server, which is the error message you saw in your output.
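Because of that chunking, it is worth validating that a candidate file really is line-by-line JSON before importing; a small sketch in Python (the file name out.json is taken from the question):

import json

# A sketch: check that every non-empty line of the file parses as a
# standalone JSON document, which is what arangoimp effectively requires
with open("out.json", encoding="utf8") as f:
    for lineno, line in enumerate(f, 1):
        if not line.strip():
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError as e:
            print(f"line {lineno}: {e}")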