Need help! - Unable to load JSON using COPY command - json

Need your expertise here!
I am trying to load a JSON file (generated by json.dumps) into Redshift using the COPY command. The file is in the following format:
[
{
"cookieId": "cb2278",
"environment": "STAGE",
"errorMessages": [
"70460"
]
}
,
{
"cookieId": "cb2271",
"environment": "STG",
"errorMessages": [
"70460"
]
}
]
We ran into the error - "Invalid JSONPath format: Member is not an object."
When I get rid of the square brackets [] and remove the "," separators between the JSON dicts, it loads perfectly fine:
{
"cookieId": "cb2278",
"environment": "STAGE",
"errorMessages": [
"70460"
]
}
{
"cookieId": "cb2271",
"environment": "STG",
"errorMessages": [
"70460"
]
}
But in reality, most JSON files from APIs have this formatting.
I could do a string replace or use a regex to get rid of the commas and [], but I am wondering if there is a better way to load the file into Redshift seamlessly, without modifying it.

One way to convert a JSON array into a stream of the array's elements is to pipe it into jq '.[]'. The output is sent to stdout.
If the JSON array is in a file named input.json, then the following command will produce a stream of the array's elements on stdout:
$ jq ".[]" input.json
If you want the output in JSON Lines format, use the -c switch (i.e. jq -c ...).
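For example, with the sample array from the question saved as input.json, the -c form emits one object per line, which is the same layout the question says loads fine (output below is what jq should print for that sample):
$ jq -c '.[]' input.json
{"cookieId":"cb2278","environment":"STAGE","errorMessages":["70460"]}
{"cookieId":"cb2271","environment":"STG","errorMessages":["70460"]}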
For more on jq, see https://stedolan.github.io/jq

Related

How to add commas in between JSON objects using Linux Shell and SnowSQL?

While there are several posts about this topic on Stack Overflow, none match my exact use case. I am using a Linux shell script to run SnowSQL to generate a json file.
My JSON file needs to have a comma between JSON objects.
This:
{
"CAMPAIGN": "Welcome_New",
"UUID": "fe881781-bdc2-41b2-95f2-e0e8c19dc597"
}
{
"CAMPAIGN": "Welcome_Existing",
"UUID": "77a41c02-beb9-48bf-ada4-b2074c1a78cb"
}
...needs to look like this:
{
"CAMPAIGN": "Welcome_New",
"UUID": "fe881781-bdc2-41b2-95f2-e0e8c19dc597"
},
{
"CAMPAIGN": "Welcome_Existing",
"UUID": "77a41c02-beb9-48bf-ada4-b2074c1a78cb"
}
Here is my complete ksh script:
#!/usr/bin/ksh
. /appl/.snf_logon
export SNOW_PKEY_FILE=$(mktemp ./pkey-XXXXXX)
trap "rm -f ${SNOW_PKEY_FILE}" EXIT
LibGetSnowCred
{
outFile=JSON_FILE_TYPE_TEST.json
inDir=/testing
outFileNm=@my_db.my_schema.my_file_stage/${outFile}
snowsql \
--private-key-path $SNOW_PKEY_FILE \
-o exit_on_error=true \
-o friendly=false \
-o timing=false \
-o log_level=ERROR \
-o echo=true <<!
COPY INTO ${outFileNm}
FROM (SELECT object_construct(
'UUID',UUID
,'CAMPAIGN',CAMPAIGN)
FROM my_db.my_schema.JSON_Test_Table
LIMIT 2)
FILE_FORMAT=(
TYPE=JSON
COMPRESSION=NONE
)
OVERWRITE=True
HEADER=False
SINGLE=True
MAX_FILE_SIZE=4900000000
;
get ${outFileNm} file://${inDir}/;
rm ${outFileNm};
!
if [ $? -eq 0 ]; then
echo "Export successful"
else
echo "ERROR in export"
fi
}
Is it best practice to add the comma during the SELECT, or after the file is generated, and how?
With or without that comma, the text is still not JSON, just text that happens to look like JSON. You are exporting several rows, each row as an independent object. You need to gather all these objects into an array to produce valid JSON.
A JSON that encodes an array of rows looks like this:
[
{
"CAMPAIGN": "Welcome_New",
"UUID": "fe881781-bdc2-41b2-95f2-e0e8c19dc597"
},
{
"CAMPAIGN": "Welcome_Existing",
"UUID": "77a41c02-beb9-48bf-ada4-b2074c1a78cb"
}
]
The easiest way to produce this output would be to ask the database, if it supports this option, to wrap all the records into a list before generating the JSON, instead of exporting each record as a separate JSON document.
If this is not possible, then you have a file that contains multiple JSON documents. You can use jq to combine these individual documents into a single JSON array like the one described above.
It is as simple as that:
jq --slurp '.' input_file > output_file
The --slurp option tells jq to read all the JSON documents from input_file into memory, parse them, and put them into an array. That array is the program's input.
'.' is the jq program. It says "dump the current object". It does not do any processing of the input data. The current object is the array.
After it executes the program (which, in this case, doesn't change anything), jq dumps the resulting value (as JSON, of course) to standard output (by default, the screen).
The > output_file part redirects this output to a file (named output_file) instead of showing it on screen.
You can see how it works on the jq playground.
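For instance, with the two sample records from the question in input_file, the command produces the array shown above; --compact-output (-c) is added here only to show the result on a single line:
$ jq --slurp --compact-output '.' input_file
[{"CAMPAIGN":"Welcome_New","UUID":"fe881781-bdc2-41b2-95f2-e0e8c19dc597"},{"CAMPAIGN":"Welcome_Existing","UUID":"77a41c02-beb9-48bf-ada4-b2074c1a78cb"}]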

how to extract multiple values from json object using jq command

I am trying to get multiple values from a json object.
{
"nextToken": "9i2x1mbCpfo5hQ",
"jobSummaryList": [
{
"jobName": "012210",
"jobId": "0196f81cae73"
}
]
}
I want nextToken's value and jobName in one jq command.
https://stedolan.github.io/jq/manual/
jq '.nextToken, .jobSummaryList[].jobName' file
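Run against the sample above, that filter emits both values; add -r if you want the raw strings without the JSON quotes:
$ jq '.nextToken, .jobSummaryList[].jobName' file
"9i2x1mbCpfo5hQ"
"012210"
$ jq -r '.nextToken, .jobSummaryList[].jobName' file
9i2x1mbCpfo5hQ
012210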

Retrieve one (last) value from influxdb

I'm trying to retrieve the last value inserted into a table in influxdb. What I need to do is then post it to another system via HTTP.
I'd like to do all this in a bash script, but I'm open to Python also.
$ curl -sG 'https://influx.server:8086/query' --data-urlencode "db=iotaWatt" --data-urlencode "q=SELECT LAST(\"value\") FROM \"grid\" ORDER BY time DESC" | jq -r
{
"results": [
{
"statement_id": 0,
"series": [
{
"name": "grid",
"columns": [
"time",
"last"
],
"values": [
[
"2018-01-17T04:15:30Z",
690.1
]
]
}
]
}
]
}
What I'm struggling with is getting this value into a clean format I can use. I don't really want to use sed, and I've tried jq but it complains the data is a string and not an index:
jq: error (at <stdin>:1): Cannot index array with string "series"
Anyone have a good suggestion?
Pipe that curl to the jq below
$ your_curl_stuff_here | jq '.results[].series[]|.name,.values[0][]'
"grid"
"2018-01-17T04:15:30Z"
690.1
The results could be stored into a bash array and used later.
$ results=( $(your_curl_stuff_here | jq '.results[].series[]|.name,.values[0][]') )
$ echo "${results[@]}"
"grid" "2018-01-17T04:15:30Z" 690.1
# Individual values could be accessed using "${results[0]}" and so on; mind the quotes
All good :-)
Given the JSON shown, the jq query:
.results[].series[].values[]
produces:
[
"2018-01-17T04:15:30Z",
690.1
]
This seems to be the output you want, but from the point of view of someone who is not familiar with influxdb, the requirements seem very opaque, so you might want to consider a variant, such as:
.results[-1].series[-1].values[-1]
which in this case produces the same result, as it happens.
If you just want the atomic values, you could simply append [] to either of the queries above.
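For example, using the response shown in the question (the -r flag drops the JSON quoting, which is handy if the value is then posted on to another system):
$ your_curl_stuff_here | jq -r '.results[-1].series[-1].values[-1][]'
2018-01-17T04:15:30Z
690.1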

jq construct with value strings spanning multiple lines

I am trying to form a JSON construct using jq that should ideally look like below:-
{
"api_key": "XXXXXXXXXX-7AC9-D655F83B4825",
"app_guid": "XXXXXXXXXXXXXX",
"time_start": 1508677200,
"time_end": 1508763600,
"traffic": [
"event"
],
"traffic_including": [
"unattributed_traffic"
],
"time_zone": "Australia/NSW",
"delivery_format": "csv",
"columns_order": [
"attribution_attribution_action",
"attribution_campaign",
"attribution_campaign_id",
"attribution_creative",
"attribution_date_adjusted",
"attribution_date_utc",
"attribution_matched_by",
"attribution_matched_to",
"attribution_network",
"attribution_network_id",
"attribution_seconds_since",
"attribution_site_id",
"attribution_site_id",
"attribution_tier",
"attribution_timestamp",
"attribution_timestamp_adjusted",
"attribution_tracker",
"attribution_tracker_id",
"attribution_tracker_name",
"count",
"custom_dimensions",
"device_id_adid",
"device_id_android_id",
"device_id_custom",
"device_id_idfa",
"device_id_idfv",
"device_id_kochava",
"device_os",
"device_type",
"device_version",
"dimension_count",
"dimension_data",
"dimension_sum",
"event_name",
"event_time_registered",
"geo_city",
"geo_country",
"geo_lat",
"geo_lon",
"geo_region",
"identity_link",
"install_date_adjusted",
"install_date_utc",
"install_device_version",
"install_devices_adid",
"install_devices_android_id",
"install_devices_custom",
"install_devices_email_0",
"install_devices_email_1",
"install_devices_idfa",
"install_devices_ids",
"install_devices_ip",
"install_devices_waid",
"install_matched_by",
"install_matched_on",
"install_receipt_status",
"install_san_original",
"install_status",
"request_ip",
"request_ua",
"timestamp_adjusted",
"timestamp_utc"
]
}
What I have tried unsuccessfully thus far is below:-
json_construct=$(cat <<EOF
{
"api_key": "6AEC90B5-4169-59AF-7AC9-D655F83B4825",
"app_guid": "komacca-s-rewards-app-au-ios-production-cv8tx71",
"time_start": 1508677200,
"time_end": 1508763600,
"traffic": ["event"],
"traffic_including": ["unattributed_traffic"],
"time_zone": "Australia/NSW",
"delivery_format": "csv"
"columns_order": ["attribution_attribution_action","attribution_campaign","attribution_campaign_id","attribution_creative","attribution_date_adjusted","attribution_date_utc","attribution_matched_by","attribution_matched_to","attributio
network","attribution_network_id","attribution_seconds_since","attribution_site_id","attribution_tier","attribution_timestamp","attribution_timestamp_adjusted","attribution_tracker","attribution_tracker_id","attribution_tracker_name","
unt","custom_dimensions","device_id_adid","device_id_android_id","device_id_custom","device_id_idfa","device_id_idfv","device_id_kochava","device_os","device_type","device_version","dimension_count","dimension_data","dimension_sum","ev
t_name","event_time_registered","geo_city","geo_country","geo_lat","geo_lon","geo_region","identity_link","install_date_adjusted","install_date_utc","install_device_version","install_devices_adid","install_devices_android_id","install_
vices_custom","install_devices_email_0","install_devices_email_1","install_devices_idfa","install_devices_ids","install_devices_ip","install_devices_waid","install_matched_by","install_matched_on","install_receipt_status","install_san_
iginal","install_status","request_ip","request_ua","timestamp_adjusted","timestamp_utc"]
}
EOF)
followed by:-
echo "$json_construct" | jq '.'
I get the following error:-
parse error: Expected separator between values at line 10, column 15
I am guessing it is because the string literals span multiple lines and jq is unable to parse them.
Use jq itself:
my_formatted_json=$(jq -n '{
"api_key": "XXXXXXXXXX-7AC9-D655F83B4825",
"app_guid": "XXXXXXXXXXXXXX",
"time_start": 1508677200,
"time_end": 1508763600,
"traffic": ["event"],
"traffic_including": ["unattributed_traffic"],
"time_zone": "Australia/NSW",
"delivery_format": "csv",
"columns_order": [
"attribution_attribution_action",
"attribution_campaign",
...,
"timestamp_utc"
]
}')
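If some of those values come from shell variables rather than being hard-coded, jq's --arg option injects them without any quoting trouble. A small sketch, reusing the $key and $guid placeholder variables from the answer below:
my_formatted_json=$(jq -n \
  --arg api_key "$key" \
  --arg app_guid "$guid" \
  '{
    api_key: $api_key,
    app_guid: $app_guid,
    delivery_format: "csv"
  }')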
Your input "JSON" is not valid JSON, as indicated by the error message.
The first error is that a comma is missing after the key/value pair: "delivery_format": "csv", but there are others -- notably, JSON strings cannot be split across lines. Once you fix the key/value pair problem and the JSON strings that are split incorrectly, jq . will work with your text. (Note that once your input is corrected, the longest JSON string is quite short -- 50 characters or so -- whereas jq has no problems processing strings of length 10^8 quite speedily ...)
Generally, jq is rather permissive when it comes to JSON-like input, but if you're ever in doubt, it would make sense to use a validator such as the online validator at jsonlint.com
By the way, the jq FAQ does suggest various ways for handling input that isn't strictly JSON -- see https://github.com/stedolan/jq/wiki/FAQ#processing-not-quite-valid-json
Along the lines of chepner's suggestion, since jq can read raw text data, you could just use a jq filter to generate a legal JSON object from your script variables. For example:
#!/bin/bash
# whatever logic you have to obtain bash variables goes here
key=XXXXXXXXXX-7AC9-D655F83B4825
guid=XXXXXXXXXXXXXX
# now use jq filter to read raw text and construct legal json object
json_construct=$(jq -MRn '[inputs]|map(split(" ")|{(.[0]):.[1]})|add' <<EOF
api_key $key
app_guid $guid
EOF)
echo $json_construct
Sample Run (assumes executable script is in script.sh)
$ ./script.sh
{ "api_key": "XXXXXXXXXX-7AC9-D655F83B4825", "app_guid": "XXXXXXXXXXXXXX" }
Try it online!

Why won't jsontool parse my array?

I'm having trouble parsing this json, which contains an array as its outermost element:
response=[ { "__type": "File", "name": "...tfss-ea51ec70-d3a8-45e5-abbf-294f2c2c81f0-myPicture.jpg", "url": "http://files.parse.com/ac3f079b-cacb-49e9-bd74-8325f993f308/...tfss-ea51ec70-d3a8-45e5-abbf-294f2c2c81f0-myPicture.jpg" } ]
for blob in $response
do
url=$(echo $blob | json url)
done
But this last json parsing gives a bunch of errors:
json: error: input is not JSON: Bad object at line 2, column 1:
^
Do I need to do something special to turn a JSON array into a bash array, or vice versa?
You should quote the value of response to protect it from the shell trying to interpret it:
response='[ { "__type": "File", "name": "...tfss-ea51ec70-d3a8-45e5-abbf-294f2c2c81f0-myPicture.jpg", "url": "http://files.parse.com/ac3f079b-cacb-49e9-bd74-8325f993f308/...tfss-ea51ec70-d3a8-45e5-abbf-294f2c2c81f0-myPicture.jpg" } ]'
And you can't expect the shell to be able to parse JSON, so for blob in $response isn't going to work out. According to http://trentm.com/json/, using -a should handle the array:
while read url ; do
# use $url here...
done < <(echo "$response" | json -a url)
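If jq is available instead of jsontool, an equivalent sketch (an alternative, not part of the json tool's interface) would be:
while read -r url ; do
# use $url here...
done < <(echo "$response" | jq -r '.[].url')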
After putting your JSON in a file and piping it through jsontool it works just fine, so I'm guessing your file has some weird whitespace that is breaking jsontool.
Try this instead:
cat your_file.json | sed 's/[[:space:]]\+$//' | json