Load a JSON file from the BigQuery command line - json
Is it possible to load data from a JSON file (not just CSV) using the BigQuery command-line tool? I am able to load a simple JSON file using the GUI; however, the command line assumes CSV, and I don't see any documentation on how to specify JSON.
Here's the simple JSON file I'm using:
{"col":"value"}
With this schema:
col:STRING
As of version 2.0.12, bq does allow uploading newline-delimited JSON files. This is an example command that does the job:
bq load --source_format NEWLINE_DELIMITED_JSON datasetName.tableName data.json schema.json
As mentioned above, "bq help load" will give you all of the details.
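For reference, a minimal sketch of what the schema.json referenced above could contain for the one-column file in the question (the format, an array of objects with name, type and optionally mode, matches the bq help text quoted further down):
[
  {"name": "col", "type": "STRING", "mode": "NULLABLE"}
]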
1) Yes, you can.
2) The documentation is here. Go to step 3, "Upload the table", in the documentation.
3) You have to use the --source_format flag to tell bq that you are uploading a JSON file and not a CSV.
4) The complete command structure is as follows (a worked example tied to the question's data follows this list):
bq load [--source_format=NEWLINE_DELIMITED_JSON] [--project_id=your_project_id] destination_data_set.destination_table data_source_uri table_schema
bq load --project_id=my_project_bq dataset_name.bq_table_name gs://bucket_name/json_file_name.json path_to_schema_in_your_machine
5) You can find other bq load variants by running
bq help load
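Tying this back to the question's one-line file and col:STRING schema, a minimal sketch of the full invocation (dataset, table and file names are placeholders) could look like:
bq load --source_format=NEWLINE_DELIMITED_JSON datasetName.tableName data.json col:STRING
Here the schema is passed inline in name:type form; pointing the last argument at a schema JSON file, as in the 2.0.12 example above, works as well.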
As of version 2.0.9, bq does not support loading JSON-formatted data.
Here is the documentation (bq help load) for the load command in that version:
USAGE: bq [--global_flags] <command> [--command_flags] [args]
load Perform a load operation of source into destination_table.
Usage:
load <destination_table> <source> [<schema>]
The <destination_table> is the fully-qualified table name of table to create, or append to if the table already exists.
The <source> argument can be a path to a single local file, or a comma-separated list of URIs.
The <schema> argument should be either the name of a JSON file or a text schema. This schema should be omitted if the table already has one.
In the case that the schema is provided in text form, it should be a comma-separated list of entries of the form name[:type], where type will default
to string if not specified.
In the case that <schema> is a filename, it should contain a single array object, each entry of which should be an object with properties 'name',
'type', and (optionally) 'mode'. See the online documentation for more detail:
https://code.google.com/apis/bigquery/docs/uploading.html#createtable
Note: the case of a single-entry schema with no type specified is
ambiguous; one can use name:string to force interpretation as a
text schema.
Examples:
bq load ds.new_tbl ./info.csv ./info_schema.json
bq load ds.new_tbl gs://mybucket/info.csv ./info_schema.json
bq load ds.small gs://mybucket/small.csv name:integer,value:string
bq load ds.small gs://mybucket/small.csv field1,field2,field3
Arguments:
destination_table: Destination table name.
source: Name of local file to import, or a comma-separated list of
URI paths to data to import.
schema: Either a text schema or JSON file, as above.
Flags for load:
/usr/local/bin/bq:
--[no]allow_quoted_newlines: Whether to allow quoted newlines in CSV import data.
-E,--encoding: <UTF-8|ISO-8859-1>: The character encoding used by the input file. Options include:
ISO-8859-1 (also known as Latin-1)
UTF-8
-F,--field_delimiter: The character that indicates the boundary between columns in the input file. "\t" and "tab" are accepted names for tab.
--max_bad_records: Maximum number of bad records allowed before the entire job fails.
(default: '0')
(an integer)
--[no]replace: If true erase existing contents before loading new data.
(default: 'false')
--schema: Either a filename or a comma-separated list of fields in the form name[:type].
--skip_leading_rows: The number of rows at the beginning of the source file to skip.
(an integer)
gflags:
--flagfile: Insert flag definitions from the given file into the command line.
(default: '')
--undefok: comma-separated list of flag names that it is okay to specify on the command line even if the program does not define a flag with that name.
IMPORTANT: flags in this list that have arguments MUST use the --flag=value format.
(default: '')
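To illustrate a few of the flags listed above, a hedged example (reusing the table, bucket and file names from the built-in examples) that replaces the table's contents, tolerates up to ten bad records and skips a CSV header row might look like:
bq load --replace --max_bad_records=10 --skip_leading_rows=1 ds.new_tbl gs://mybucket/info.csv ./info_schema.json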
Related
JMeter - Save the complete JSON response of all the requests to a CSV file for test data preparation
I need to create a test data preparation script and capture JSON response data to a CSV file. In the actual test, I need to read parameters from the CSV file. Is there any possibility of saving the entire JSON data as a field in the CSV file, or do I need to extract each field and save it to the CSV file?
The main issue is that the JSON contains commas. You can overcome this by saving the JSON to the file with a different delimiter instead of commas, for example #, and then reading the file with a CSV Data Set Config using # as the Delimiter ("Delimiter to be used to split the records in the file. If there are fewer values on the line than there are variables, the remaining variables are not updated, so they will retain their previous value (if any)."). So you can save the JSON in every row and then read the data back using # as the delimiter.
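For illustration only, a line in such a #-delimited data file (the values are made up) could keep the whole JSON payload as one field:
user1#{"id": 1, "status": "ok"}
A CSV Data Set Config pointed at that file with Delimiter set to # would then expose the JSON as a single JMeter variable per row.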
You can save the entire JSON response into a JMeter Variable by adding a Regular Expression Extractor as a child of the HTTP Request sampler which returns JSON and configuring it like:
Name of created variables: anything meaningful, i.e. response
Regular Expression: (?s)(^.*)
Template: $1$
Then you need to declare this response as a Sample Variable by adding the next line to the user.properties file:
sample_variables=response
And finally you can use the Flexible File Writer plugin to store the response variable into a file; if you don't have any other Sample Variables you should use variable#0.
How to remove specific data from the beginning of a JSON/Avro schema file and the last bracket from the end of the file?
I am extracting the schema of a table from an Oracle DB using Apache NiFi, which I need to use to create a table in BigQuery. The ExecuteSQL processor in NiFi is giving me a schema file which I am saving in my home directory. Now, to use this schema file in BigQuery, I need to remove a certain part of it from the beginning and the end. How do I do this in Unix using sed/awk?
Here is the content of the output file:
Obj^A^D^Vavro.schema<88>^L{"type":"record","name":"NiFi_ExecuteSQL_Record","namespace":"any.data","fields":[{"name":"FEED_SEQUENCE","type":["null",{"type":"bytes","logicalType":"decimal","precision":38,"scale":0}]},{"name":"FEED_UNIQUE_NAME","type":["null","string"]},{"name":"COUNTRY_CODE","type":["null","string"]},{"name":"EXTRACTION_TYPE","type":["null","string"]},{"name":"PROJECT_SEQUENCE","type":["null",{"type":"bytes","logicalType":"decimal","precision":38,"scale":0}]},{"name":"CREATED_BY","type":["null","string"]},{"name":"CREATED_DATE","type":["null",{"type":"long","logicalType":"timestamp-millis"}]},{"name":"UPDATED_BY","type":["null","string"]},{"name":"UPDATED_DATE","type":["null",{"type":"long","logicalType":"timestamp-millis"}]},{"name":"FEED_DESC","type":["null","string"]}]}^Tavro.codec^Hnull^#àÂ<87>)[ù<8b><97><90>"õ^S<98>[<98>±
I want to remove the initial part Obj^A^D^Vavro.schema<88>^L{"type":"record","name":"NiFi_ExecuteSQL_Record","namespace":"any.data","fields": and the ending part }^Tavro.codec^Hnull^#àÂ<87>)[ù<8b><97><90>"õ^S<98>[<98>± from the above.
Considering that you want to remove everything outside the first [ and the last ]:
sed 's/^[^[]*//;s/[^]]*$//'
Test:
$ cat out.file
Obj^A^D^Vavro.schema<88>^L{"type":"record","name":"NiFi_ExecuteSQL_Record","namespace":"any.data","fields":[{"name":"FEED_SEQUENCE","type":["null",{"type":"bytes","logicalType":"decimal","precision":38,"scale":0}]},{"name":"FEED_UNIQUE_NAME","type":["null","string"]},{"name":"COUNTRY_CODE","type":["null","string"]},{"name":"EXTRACTION_TYPE","type":["null","string"]},{"name":"PROJECT_SEQUENCE","type":["null",{"type":"bytes","logicalType":"decimal","precision":38,"scale":0}]},{"name":"CREATED_BY","type":["null","string"]},{"name":"CREATED_DATE","type":["null",{"type":"long","logicalType":"timestamp-millis"}]},{"name":"UPDATED_BY","type":["null","string"]},{"name":"UPDATED_DATE","type":["null",{"type":"long","logicalType":"timestamp-millis"}]},{"name":"FEED_DESC","type":["null","string"]}]}^Tavro.codec^Hnull^#àÂ<87>)[ù<8b><97><90>"õ^S<98>[<98>±
$ sed 's/^[^[]*//;s/[^]]*$//' out.file
[{"name":"FEED_SEQUENCE","type":["null",{"type":"bytes","logicalType":"decimal","precision":38,"scale":0}]},{"name":"FEED_UNIQUE_NAME","type":["null","string"]},{"name":"COUNTRY_CODE","type":["null","string"]},{"name":"EXTRACTION_TYPE","type":["null","string"]},{"name":"PROJECT_SEQUENCE","type":["null",{"type":"bytes","logicalType":"decimal","precision":38,"scale":0}]},{"name":"CREATED_BY","type":["null","string"]},{"name":"CREATED_DATE","type":["null",{"type":"long","logicalType":"timestamp-millis"}]},{"name":"UPDATED_BY","type":["null","string"]},{"name":"UPDATED_DATE","type":["null",{"type":"long","logicalType":"timestamp-millis"}]},{"name":"FEED_DESC","type":["null","string"]}]
You can use the ExtractAvroMetadata processor to extract only the avro.schema from the Avro flowfile. In the processor, set the Metadata Keys property to avro.schema; the processor then extracts the Avro metadata and keeps it as a flowfile attribute. Use the attribute value (${avro.schema}) in a ReplaceText processor to overwrite the content of the flowfile and create the table.
With the data in the file 'd', using GNU sed:
sed -E 's/^[^\[]+(\[\{.+\})[^\}]+/\1/' d
Consider using regexes in Perl if you want to work on JSON string manipulation.
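A hedged Perl equivalent of the same idea (keeping everything from the first [ to the last ], mirroring the sed command in the earlier answer rather than actually parsing the Avro/JSON) could be:
perl -pe 's/^[^\[]*//; s/[^\]]*$//' d
It carries the same caveat as the sed approach: it is plain regex text manipulation, not a real parser.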
Oracle SQLcl: Spool to json, only include content in items array?
I'm making a query via Oracle SQLcl and spooling into a .json file. The correct data is returned by the query, but the format is strange. Starting off with:
SET ENCODING UTF-8
SET SQLFORMAT JSON
SPOOL content.json
followed by a query, this produces a JSON file as requested. However, how do I remove the outer structure, meaning this part:
{"results":[{"columns":[{"name":"ID","type":"NUMBER"}, {"name":"LANGUAGE","type":"VARCHAR2"},{"name":"LOCATION","type":"VARCHAR2"},{"name":"NAME","type":"VARCHAR2"}],"items": [ // Here is the actual data I want to see in the file exclusively ]
I only want to spool everything in the items array, not including that key itself. Is this possible to set as a parameter before querying? Reading the Oracle docs has not yielded any answers, hence asking here.
That's how I handle this. After spooling the output to a file, I use the jq command to recreate the file with only the items:
cat file.json | jq --compact-output --raw-output '.results[0].items' > items.json
Using this library: https://stedolan.github.io/jq/
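If the spooled file ever contains more than one result set, a hedged variant of the same jq call that concatenates the items arrays of all result sets (only the filter changes) would be:
jq --compact-output '[.results[].items[]]' file.json > items.json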
Reading csv without specifying enclosure characters in Weka
I have a dataset that I want to open in Weka, so I converted it to a CSV file. (The file contains some text including commas/apostrophes/quotation marks, while its separator is the pipe character.) When I try to read this CSV file, in the options window I specify pipe (|) as my fieldSeperator, leave enclosureCharacters empty, and don't touch the rest of the options. This can be seen in the screenshot. Then I get this error:
File not recognised as an 'CSV data files' file. Reason: Enclosures can only be single characters.
It seems like Weka's CSV loader does not accept an empty enclosureCharacters field? What can I write into this field? I think my file does not have enclosures for its text data.
Getting error when loading JSON from GCS
I am trying to load the schema and data from GCS as JSON files, using the command line for this purpose:
bq load --source_format=NEWLINE_DELIMITED_JSON --schema=gs://1samtest/JSONSample/personsDataSchema.json SSData.persons_data gs://1samtest/JSONSample/personsData.json
But I get this error:
//1SAMTEST/JSONSAMPLE/PERSONSDATASCHEMA.JSON is not a valid value
When I change all paths to my local machine it works completely fine, but I don't know why it fails for JSON. If I run the command below after creating the table in BigQuery, it works fine:
bq load --source_format=NEWLINE_DELIMITED_JSON SSData.persons_data "gs://1samtest/JSONSample/personsData.json"
The schema flag/param doesn't support URIs for GCS, i.e. using gs://.... From bq load --help:
The [destination_table] is the fully-qualified table name of table to create, or append to if the table already exists.
The [source] argument can be a path to a single local file, or a comma-separated list of URIs.
The [schema] argument should be either the name of a JSON file or a text schema. This schema should be omitted if the table already has one.
Only the source flag/param (i.e. the data) can be used with GCS URIs.
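As a sketch of a workaround (the paths are the ones from the question; the local file name and the use of gsutil are assumptions, not something this answer prescribes), copy the schema file to the local machine first, then pass the data by GCS URI and the schema by local path:
gsutil cp gs://1samtest/JSONSample/personsDataSchema.json ./personsDataSchema.json
bq load --source_format=NEWLINE_DELIMITED_JSON --schema=./personsDataSchema.json SSData.persons_data gs://1samtest/JSONSample/personsData.json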