Error when importing GeoJson into BigQuery - json

I'm trying to load GeoJson data [1] into BigQuery via Cloud Shell but I'm getting the following error:
Failed to parse JSON: Top-level GeoJson 'type' member should have value 'Feature', but was 'FeatureCollection'.; ParsedString returned false; Could not parse value; Parser terminated before end of string
It feels like the GeoJson file is not formatted properly for BQ but I have no idea if that's true or how to fix it.
[1] https://github.com/tonywr71/GeoJson-Data/blob/master/australian-suburbs.geojson

Expounding on @scespinoza's answer, I was able to convert to newline-delimited GeoJSON and load it into BigQuery with the following steps:
geojson2ndjson geodata.txt > geodata_converted.txt
Using this command, I encountered an error, but was able to work around it by splitting the data into 2 tables and applying the same command to each.
The table then loaded into BigQuery.

Your file is in standard GeoJSON format, but BigQuery only accepts new-line delimited GeoJSON files and individual GeoJSON objects (see documentation: https://cloud.google.com/bigquery/docs/geospatial-data#geojson-files). So, you should first convert the dataset to the appropriate format. Here is a good and simple explanation of how it works: https://stevage.github.io/ndgeojson/.
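As a rough sketch of that conversion (the file and dataset/table names here are placeholders, not from the question), using jq plus a bq version that supports --json_extension=GEOJSON:
# one Feature per line = newline-delimited GeoJSON
jq -c '.features[]' australian-suburbs.geojson > australian-suburbs.ndgeojson
# load it; --json_extension=GEOJSON asks BigQuery to turn each Feature's geometry into a GEOGRAPHY column
bq load --source_format=NEWLINE_DELIMITED_JSON --json_extension=GEOJSON \
  --autodetect my_dataset.australian_suburbs australian-suburbs.ndgeojson
If your bq version predates that flag, the CSV workaround shown further down this page works as well.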

Related

Method Error in Julia: thinks CSV is boolean and won't convert to string

I am trying to read a CSV file into Julia. When I open the file in Excel, it's a 199x7 matrix of numbers. I am using the following code to create a variable, Xrel:
Xrel = CSV.read(joinpath(data_path,"Xrel.csv"), header=false)
However, when I try to do this, Julia produces:
"MethodError: Cannot 'convert' an object of type Bool to an object of type String."
data_path is defined in previous code to save space.
I've checked my paths and opened the CSV without a problem in R - it's only in Julia that I am having an issue.
I am confused as to why Julia is saying that my data is Boolean when it's a matrix of numbers.
How can I resolve this to read in my CSV file?
Thanks!!
I think you should either use CSV.File() or add a sink argument to CSV.read():
CSV.File(joinpath(data_path,"Xrel.csv"), header=false)
# or with a DataFrame as sink
using DataFrames
CSV.read(joinpath(data_path,"Xrel.csv"), DataFrame, header=false)
from the docs:
?CSV.read
CSV.read(source, sink::T; kwargs...) => T
Read and parses a delimited file, materializing directly using the sink function.
CSV.read supports all the same keyword arguments as CSV.File.

Loading multiple JSON records into BigQuery using the console

I'm trying to upload some data into BigQuery in JSON format using the BigQuery Console as described here.
If I have a single record in a JSON file I can upload it successfully. If I put two or more records in a JSON file, separated by newlines, then I get this error:
Error while reading data, error message: JSON parsing error in row starting at position 0: Parser terminated before end of string
I tried searching Stack Overflow and Google but didn't have any luck finding any information. The two records that fail when newline-separated in one file upload successfully as individual records in separate JSON files.
My editor must have been adding some other character on my newlines. I went back to my original JSON array of records and used:
cat test.json | jq -c '.[]' > testNDJSON.json
This fixed everything.
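For illustration (these records are made up, not the poster's data), the command turns a JSON array into one compact object per line:
# test.json contains a JSON array: [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
jq -c '.[]' test.json > testNDJSON.json
# testNDJSON.json now holds newline-delimited JSON that BigQuery accepts:
# {"id":1,"name":"a"}
# {"id":2,"name":"b"}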

Getting error when loading JSON from GCS

I am trying to load a schema and data from GCS as JSON files. I am using the command line for this purpose.
bq load --source_format=NEWLINE_DELIMITED_JSON --schema=gs://1samtest/JSONSample/personsDataSchema.json SSData.persons_data gs://1samtest/JSONSample/personsData.json
But I get this error:
//1SAMTEST/JSONSAMPLE/PERSONSDATASCHEMA.JSON is not a valid value
But when I change all the paths to my local machine it works completely fine. I don't know why it's getting an error for JSON.
If I run the command below after creating the table in BigQuery, it works fine.
bq load --source_format=NEWLINE_DELIMITED_JSON SSData.persons_data "gs://1samtest/JSONSample/personsData.json"
The schema flag/param doesn't support GCS URIs, i.e. using gs://...
bq load --help
The [destination_table] is the fully-qualified table name of table to
create, or append to if the table already exists.
The [source] argument can be a path to a single local file, or a
comma-separated list of URIs.
The [schema] argument should be either the name of a JSON file or a text schema. This schema should be omitted if the table already has one.
Only the source flag/param (i.e. the data) can be used with GCS URIs.
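So a workaround along these lines should work (paths are taken from the question; copying the schema to the current directory is just one option):
# copy the schema out of GCS, since --schema only accepts a local file or inline text
gsutil cp gs://1samtest/JSONSample/personsDataSchema.json ./personsDataSchema.json
# keep the data on GCS and point --schema at the local copy
bq load --source_format=NEWLINE_DELIMITED_JSON \
  --schema=./personsDataSchema.json \
  SSData.persons_data gs://1samtest/JSONSample/personsData.json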

Load geojson in bigquery

What is the best way to load the following GeoJSON file into Google BigQuery?
http://storage.googleapis.com/velibs/stations/test.json
I have a lot of json files like this (much bigger) on Google Storage, and I cannot download/modify/upload them all (it would take forever). Note that the file is not newline-delimited, so I guess it needs to be modified online.
Thanks all.
Step by step 2019:
If you get the error "Error while reading data, error message: JSON parsing error in row starting at position 0: Nested arrays not allowed.", you might have a GeoJSON file.
Transform GeoJSON into new-line delimited JSON with jq, load as CSV into BigQuery:
jq -c .features[] \
  san_francisco_censustracts.json > sf_censustracts_201905.json

bq load --source_format=CSV \
  --quote='' --field_delimiter='|' \
  fh-bigquery:deleting.sf_censustracts_201905 \
  sf_censustracts_201905.json row
Parse the loaded file in BigQuery:
CREATE OR REPLACE TABLE `fh-bigquery.uber_201905.sf_censustracts`
AS
SELECT FORMAT('%f,%f', ST_Y(centroid), ST_X(centroid)) lat_lon, *
FROM (
  SELECT *, ST_CENTROID(geometry) centroid
  FROM (
    SELECT
      CAST(JSON_EXTRACT_SCALAR(row, '$.properties.MOVEMENT_ID') AS INT64) movement_id
      , JSON_EXTRACT_SCALAR(row, '$.properties.DISPLAY_NAME') display_name
      , ST_GeogFromGeoJson(JSON_EXTRACT(row, '$.geometry')) geometry
    FROM `fh-bigquery.deleting.sf_censustracts_201905`
  )
)
Alternative approaches:
With ogr2ogr:
https://medium.com/google-cloud/how-to-load-geographic-data-like-zipcode-boundaries-into-bigquery-25e4be4391c8
https://medium.com/@mentin/loading-large-spatial-features-to-bigquery-geography-2f6ceb6796df
With Node.js:
https://github.com/mentin/geoscripts/blob/master/geojson2bq/geojson2bqjson.js
The bucket in the question no longer exists... However, five years later, there is a new answer.
In July 2018, Google announced an alpha (now beta) of BigQuery GIS.
The docs highlight a limitation that
BigQuery GIS supports only individual geometry objects in GeoJSON.
BigQuery GIS does not currently support GeoJSON feature objects,
feature collections, or the GeoJSON file format.
This means that any Feature or FeatureCollection properties would need to be added to separate columns, with a GEOGRAPHY column to hold the GeoJSON geometry.
In this tutorial by a Google trainer, polygons in a shapefile are converted into GeoJSON strings inside rows of a CSV file using GDAL.
ogr2ogr -f csv -dialect sqlite -sql "select AsGeoJSON(geometry) AS geom, * from LAYER_NAME" output.csv inputfilename.shp
You want to end up with one column containing the geometry content, like this:
{"type":"Polygon","coordinates":[[....]]}
Other columns may contain feature properties.
The CSV can then be imported to BQ. Then a query on the table can be viewed in BigQuery Geo Viz. You need to tell it which column contains the geometry.
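As a minimal sketch of that import (the dataset/table name is a placeholder, and output.csv is the file produced by the ogr2ogr command above):
# autodetect reads the header row written by ogr2ogr and creates the geom and property columns
bq load --source_format=CSV --autodetect my_dataset.layer_name output.csv
# in queries, convert the GeoJSON string with ST_GeogFromGeoJson(geom) to get a GEOGRAPHY value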

Load a JSON file from the BigQuery command line

Is it possible to load data from a JSON file (not just CSV) using the BigQuery command line tool? I am able to load a simple JSON file using the GUI; however, the command line assumes a CSV, and I don't see any documentation on how to specify JSON.
Here's the simple json file I'm using
{"col":"value"}
With schema
col:STRING
As of version 2.0.12, bq does allow uploading newline-delimited JSON files. This is an example command that does the job:
bq load --source_format NEWLINE_DELIMITED_JSON datasetName.tableName data.json schema.json
As mentioned above, "bq help load" will give you all of the details.
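For the file and schema in the question, a complete run might look like this (dataset and table names are placeholders):
# data.json: one JSON object per line
echo '{"col":"value"}' > data.json
# the schema can be passed inline as col:STRING instead of a schema file
bq load --source_format=NEWLINE_DELIMITED_JSON my_dataset.my_table data.json col:STRING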
1) Yes, you can.
2) The documentation is here. Go to step 3, "Upload the table", in the documentation.
3) You have to use the --source_format flag to tell bq that you are uploading a JSON file and not a CSV.
4) The complete command structure is:
bq load [--source_format=NEWLINE_DELIMITED_JSON] [--project_id=your_project_id] destination_data_set.destination_table data_source_uri table_schema
bq load --project_id=my_project bq_dataset_name.bq_table_name gs://bucket_name/json_file_name.json path_to_schema_in_your_machine
5) You can find other bq load variants by running:
bq help load
It does not support loading JSON-formatted data.
Here is the documentation (bq help load) for the load command with the latest bq version, 2.0.9:
USAGE: bq [--global_flags] <command> [--command_flags] [args]
load Perform a load operation of source into destination_table.
Usage:
load <destination_table> <source> [<schema>]
The <destination_table> is the fully-qualified table name of table to create, or append to if the table already exists.
The <source> argument can be a path to a single local file, or a comma-separated list of URIs.
The <schema> argument should be either the name of a JSON file or a text schema. This schema should be omitted if the table already has one.
In the case that the schema is provided in text form, it should be a comma-separated list of entries of the form name[:type], where type will default
to string if not specified.
In the case that <schema> is a filename, it should contain a single array object, each entry of which should be an object with properties 'name',
'type', and (optionally) 'mode'. See the online documentation for more detail:
https://code.google.com/apis/bigquery/docs/uploading.html#createtable
Note: the case of a single-entry schema with no type specified is
ambiguous; one can use name:string to force interpretation as a
text schema.
Examples:
bq load ds.new_tbl ./info.csv ./info_schema.json
bq load ds.new_tbl gs://mybucket/info.csv ./info_schema.json
bq load ds.small gs://mybucket/small.csv name:integer,value:string
bq load ds.small gs://mybucket/small.csv field1,field2,field3
Arguments:
destination_table: Destination table name.
source: Name of local file to import, or a comma-separated list of
URI paths to data to import.
schema: Either a text schema or JSON file, as above.
Flags for load:
/usr/local/bin/bq:
--[no]allow_quoted_newlines: Whether to allow quoted newlines in CSV import data.
-E,--encoding: <UTF-8|ISO-8859-1>: The character encoding used by the input file. Options include:
ISO-8859-1 (also known as Latin-1)
UTF-8
-F,--field_delimiter: The character that indicates the boundary between columns in the input file. "\t" and "tab" are accepted names for tab.
--max_bad_records: Maximum number of bad records allowed before the entire job fails.
(default: '0')
(an integer)
--[no]replace: If true erase existing contents before loading new data.
(default: 'false')
--schema: Either a filename or a comma-separated list of fields in the form name[:type].
--skip_leading_rows: The number of rows at the beginning of the source file to skip.
(an integer)
gflags:
--flagfile: Insert flag definitions from the given file into the command line.
(default: '')
--undefok: comma-separated list of flag names that it is okay to specify on the command line even if the program does not define a flag with that name.
IMPORTANT: flags in this list that have arguments MUST use the --flag=value format.
(default: '')