Loading multiple JSON records into BigQuery using the console

I'm trying to upload some data into BigQuery in JSON format using the BigQuery Console, as described here.
If I have a single record in a JSON file, I can upload it successfully. If I put two or more newline-delimited records in a JSON file, I get this error:
Error while reading data, error message: JSON parsing error in row starting at position 0: Parser terminated before end of string
I tried searching Stack Overflow and Google but had no luck finding any information. The two records that fail when uploaded together upload successfully as individual records in separate JSON files.

My editor must have been adding some other character to my newlines. I went back to my original JSON array of records and used:
cat test.json | jq -c '.[]' > testNDJSON.json
This fixed everything.
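To see what that does, here is a sketch with a hypothetical two-record array (the file contents and field names are made up for illustration):
$ cat test.json
[{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
$ jq -c '.[]' test.json > testNDJSON.json
$ cat testNDJSON.json
{"id":1,"name":"alice"}
{"id":2,"name":"bob"}
The -c (compact) flag guarantees each record lands on exactly one line terminated by a plain \n, which is what BigQuery's newline-delimited JSON loader expects.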

Error when importing GeoJson into BigQuery

I'm trying to load GeoJson data [1] into BigQuery via Cloud Shell but I'm getting the following error:
Failed to parse JSON: Top-level GeoJson 'type' member should have value 'Feature', but was 'FeatureCollection'.; ParsedString returned false; Could not parse value; Parser terminated before end of string
It feels like the GeoJson file is not formatted properly for BQ, but I have no idea whether that's true or how to fix it.
[1] https://github.com/tonywr71/GeoJson-Data/blob/master/australian-suburbs.geojson
Expounding on @scespinoza's answer, I was able to convert to newline-delimited GeoJSON and load it into BigQuery with the following steps:
geojson2ndjson geodata.txt > geodata_converted.txt
Using this command I initially encountered an error, but I was able to work around it by splitting the data into two tables and applying the same command to each half. Both tables then loaded into BigQuery successfully.
Your file is in standard GeoJSON format, but BigQuery only accepts newline-delimited GeoJSON files and individual GeoJSON objects (see documentation: https://cloud.google.com/bigquery/docs/geospatial-data#geojson-files). So, you should first convert the dataset to the appropriate format. Here is a good and simple explanation of how it works: https://stevage.github.io/ndgeojson/.
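If you would rather not install geojson2ndjson, jq can do the same conversion, because a FeatureCollection keeps its features in a top-level array. A sketch against the linked file (the --json_extension=GEOJSON flag for loading newline-delimited GeoJSON is described in the geospatial docs above; my-dataset.suburbs is a hypothetical destination):
$ jq -c '.features[]' australian-suburbs.geojson > australian-suburbs.ndjson
$ bq load --source_format=NEWLINE_DELIMITED_JSON --json_extension=GEOJSON --autodetect my-dataset.suburbs australian-suburbs.ndjson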

BigQuery loading data from bq command line tool - how to skip header rows

I have a CSV data file with a header row that I am using to populate a BigQuery table:
$ cat dummy.csv
Field1,Field2,Field3,Field4
10.5,20.5,30.5,40.5
10.6,20.6,30.6,40.6
10.7,20.7,30.7,40.7
When using the Web UI, there is a text box where I am able to specify how many header rows to skip. However, if I upload the data into BigQuery using the bq command line tool, I do not have an option to do this, and always get the following error:
$ bq load my-project:my-dataset.dummydata dummy.csv Field1:float,Field2:float,Field3:float,Field4:float
Upload complete.
Waiting on bqjob_r7eccfe35f_0000015e3e8c_1 ... (0s) Current status: DONE
BigQuery error in load operation: Error processing job 'my-project:bqjob_r7eccfe35f_0000015e3e8c_1': CSV table encountered too many errors, giving up. Rows: 1;
errors: 1.
Failure details:
- file-00000000: Could not parse 'Field1' as double for field Field1
(position 0) starting at location 0
The bq command line tool quickstart documentation also does not mention any options for skipping headers.
One simple/obvious solution is to edit dummy.csv to remove the header row, but this is not an option if pointing to a CSV file on Google Cloud Storage instead of the local file dummy.csv.
This is possible to do through the web interface, and through the Python API, so it should also be possible to do with the bq tool.
Checking bq help load revealed a --skip_leading_rows option:
--skip_leading_rows : The number of rows at the beginning of the source file to skip.
(an integer)
I also found this option in the bq command-line tool documentation (which is not the same as the quickstart documentation linked above).
Adding --skip_leading_rows=1 to the bq load command worked like a charm.
Here is the successful command:
$ bq load --skip_leading_rows=1 my-project:my-dataset.dummydata dummy.csv Field1:float,Field2:float,Field3:float,Field4:float
Upload complete.
Waiting on bqjob_r43eb07bad58_0000015ecea_1 ... (0s) Current status: DONE
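Because --skip_leading_rows is applied server-side as part of the load job rather than by preprocessing the file, the same flag covers the Google Cloud Storage case mentioned above (gs://my-bucket/dummy.csv is a hypothetical path):
$ bq load --skip_leading_rows=1 my-project:my-dataset.dummydata gs://my-bucket/dummy.csv Field1:float,Field2:float,Field3:float,Field4:float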

How to import Google Maps API into PostgreSQL?

I am trying to transfer data from a JSON file produced by the Google Maps API onto my PostgreSQL database. This is done through cURL and I made sure that the permissions have been correctly set.
The URL:
https://maps.googleapis.com/maps/api/distancematrix/json?units=imperial&origins=London&destinations=Paris&key=AIza-[key-redacted]-3z6ho-o
The query:
copy bookings.import(info) from program 'C:/temp/mycurl/curl "https://maps.googleapis.com/maps/api/distancematrix/json?units=imperial&origins=London&destinations=Paris&key=AIza-[key-redacted]-3z6ho-o" --insecure'
However, when I try to do this on my table with column 'info' of type 'json', I get the following error:
ERROR: invalid input syntax for type json
SQL state: 22P02
DETAIL: The input string ended unexpectedly.
CONTEXT: JSON data, line 1: {
COPY import, line 1, column info: "{"
I am trying to avoid bringing in PHP or any other tool for now, but if that is the only option I would certainly consider it.
What exactly do you guys think I am doing wrong? Is it the syntax, the format, or am I missing something?
Thanks!
COPY assumes that each newline indicates a new record. Unfortunately, the Google Maps DistanceMatrix API is pretty-printing your response, which means that it comes through as 23 rows, none of which is valid JSON on its own.
You can get around this by piping the curl response through something like jq; its -c flag re-serializes the JSON compactly onto a single line.
copy imports(info) from program 'curl "https://maps.googleapis.com/maps/api/distancematrix/json?units=imperial&origins=London&destinations=Paris&key=<my_key>" --insecure | /usr/local/bin/jq "." -c'
jq has lots of useful features if you want to massage the response a bit more before stashing it in the database.
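For example, once each response is stored one per row, Postgres's JSON operators can pull fields back out. A sketch, assuming the DistanceMatrix response layout documented by Google (a top-level rows array of elements carrying distance and duration objects):
select info->'rows'->0->'elements'->0->'distance'->>'text' as distance
from bookings.import;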

How to load OSM (GeoJSON) data to ArangoDB?

How can I load OSM data into ArangoDB?
I downloaded the dataset luxembourg-latest.osm.pbf from OSM, then converted it to JSON with osmtogeojson. After that I tried to load the resulting GeoJSON into ArangoDB with the command arangoimp --file out.json --collection lux1 --server.database geodb and got a huge list of errors:
...
2017-03-17T12:44:28Z [7712] WARNING at position 719386: invalid JSON type (expecting object, probably parse error), offending context: ],
2017-03-17T12:44:28Z [7712] WARNING at position 719387: invalid JSON type (expecting object, probably parse error), offending context: [
2017-03-17T12:44:28Z [7712] WARNING at position 719388: invalid JSON type (expecting object, probably parse error), offending context: 5.867441,
...
What am I doing wrong?
Update: it seems the converter should be run as osmtogeojson --ndjson, which produces the items line by line instead of as a single JSON document.
As @dmitry-bubnenkov already found out, --ndjson is required to produce the right input for arangoimp.
One has to know here that arangoimp expects a JSON subset dubbed JSONL, since it doesn't fully parse the JSON on its own.
Each line of the JSON file is expected to become one JSON document in the collection after the import. To maximize performance and simplify the implementation, the JSON is not completely parsed before being sent to the server.
Instead, arangoimp chops the JSON into chunks of the maximum request size the server permits, relying on the JSONL line endings to find valid chunk boundaries.
The server, however, expects valid JSON. If the input is not line-delimited, the chunks can end mid-document, and sending those incomplete JSON documents to the server produces the parse errors you saw in your output.
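Put together, the whole pipeline might look like this (a sketch: osmconvert is one assumed way to turn the .pbf into the OSM XML that osmtogeojson reads; the arangoimp flags are the ones from the question):
osmconvert luxembourg-latest.osm.pbf -o=luxembourg-latest.osm
osmtogeojson --ndjson luxembourg-latest.osm > out.ndjson
arangoimp --file out.ndjson --collection lux1 --server.database geodb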

Parsing JSON log files using MS Message Analyzer

The MS Message Analyzer documentation says that it can read JSON files.
I have a log file with one JSON-encoded entry per line, like this:
{"userid":1,"action":"login","timestamp":1463734780036}
{"userid":1,"action":"logout","timestamp":1463734780036}
{"userid":2,"action":"login","timestamp":1463734780036}
{"userid":2,"action":"logout","timestamp":1463734780036}
However, it reads only the first line.
I've tried wrapping the log entries in an array:
[{"userid":1,"action":"login","timestamp":1463734780036},
{"userid":1,"action":"logout","timestamp":1463734780036},
{"userid":2,"action":"login","timestamp":1463734780036},
{"userid":2,"action":"logout","timestamp":1463734780036}]
MSMA read this as one big log entry.
I want each log entry to be read as a separate Message in MSMA, the way it happens with a CSV file.
Does MSMA not support splitting Messages for JSON files?