Get JSON files from a particular interval based on a date field

I have a lot of JSON files whose structure looks like this:
{
    key1: 'val1'
    key2: {
        'key21': 'someval1',
        'key22': 'someval2',
        'key23': 'someval3',
        'date': '2018-07-31T01:30:30Z',
        'key25': 'someval4'
    }
    key3: []
    ... some other objects
}
My goal is to get only those files where the date field falls within a given period, for example from 2018-05-20 to 2018-07-20.
I can't rely on the files' creation dates, because they were all generated on the same day.
Maybe it is possible with sed or a similar program?

Fortunately, a date in this format can be compared as a string. You only need something to parse the JSON, e.g. Perl:
perl -l -0777 -MJSON::PP -ne '
    $date = decode_json($_)->{key2}{date};
    print $ARGV if $date gt "2018-07-01T00:00:00Z";
' *.json
-0777 makes Perl slurp each file whole instead of reading it line by line
-l adds a newline to print
$ARGV contains the name of the file currently being processed
See JSON::PP for details. If you have JSON::XS or Cpanel::JSON::XS, you can switch to them for faster processing.
I had to fix the input (replace ' with ", add commas, etc.) in order to make the parser happy.
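To select the interval from the question, both bounds can be checked as strings; a minimal variant of the above (the exact bound timestamps are illustrative):
perl -l -0777 -MJSON::PP -ne '
    $date = decode_json($_)->{key2}{date};
    print $ARGV
        if $date ge "2018-05-20T00:00:00Z"
        && $date le "2018-07-20T23:59:59Z";
' *.json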

If your files actually contain valid JSON, the task can be accomplished in a one-liner with jq, e.g.:
jq 'if .key2.date[0:10] | (. >= "2018-05-20" and . <= "2018-07-31") then input_filename else empty end' *.json
This is just an illustration. jq has date-handling functions for dealing with more complex requirements.
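For instance, here is a sketch using the builtin fromdateiso8601 to compare epoch seconds instead of strings (the bounds are illustrative):
jq '(.key2.date | fromdateiso8601) as $t
    | if $t >= ("2018-05-20T00:00:00Z" | fromdateiso8601)
         and $t <= ("2018-07-20T00:00:00Z" | fromdateiso8601)
      then input_filename else empty end' *.json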
Handling quasi-JSON
If your files contain quasi-JSON, then you could use jq in conjunction with a JSON rectifier. If your sample is representative, then hjson could be used, e.g.:
for f in *.qjson
do
    hjson -j "$f" | jq --arg f "$f" '
        if .key2.date[0:7] == "2018-07" then $f else empty end'
done

Try it like this:
1. Find an online converter (for example: https://codebeautify.org/json-to-excel-converter#) and convert the JSON to CSV.
2. Open the CSV file with Excel.
3. Filter your data.

Related

Bash: Ignore key value pairs from a JSON that failed to parse using jq

I'm writing a bash script to read a JSON file and export the key-value pairs as environment variables. Though I can extract the key-value pairs, I'm struggling to skip the entries that jq fails to parse.
JSON (key3 should fail to parse)
{
"KEY1":"ABC",
"KEY2":"XYZ",
"KEY3":"---ABC---\n
dskfjlksfj"
}
Here is what I tried
for pair in $(cat test.json | jq -r -R '. as $line | try fromjson catch $line | to_entries | map("\(.key)=\(.value)") | .[]' ); do
echo $pair
export $pair
done
And this is the error
jq: error (at <stdin>:1): string ("{") has no keys
jq: error (at <stdin>:2): string (" \"key1...) has no keys
My code is based on these posts:
How to convert a JSON object to key=value format in jq?
How to ignore broken JSON line in jq?
Ignore Unparseable JSON with jq
Here's a response to the revised question. Unfortunately, it will only be useful in certain limited cases, not including the example you give. (Basically, it depends on jq's parser being able to recover before the end of file.)
while read -r line ; do
    echo export "$line"
done < <(< test.json jq -rn '
    def do:
        try inputs catch null
        | objects
        | to_entries[]
        | "\(.key)=\(.value|@sh)" ;
    recurse(do) | select(.)
')
Note that further refinements may be warranted, especially if there is potentially something fishy about the key names being used as shell variable names.
[Note: this response was made to the original question, which has since been changed. The response essentially assumes the input consists of JSON Lines interspersed with other lines.]
Since the goal seems to be to ignore lines that don't have valid key-value pairs, you can simply use catch empty:
while read -r line ; do
    echo export "$line"
done < <(< test.json jq -r -R '
    try fromjson catch empty
    | objects
    | to_entries[]
    | "\(.key)=\(.value|@sh)"
')
Note also the use of @sh and of the shell's read, and the fact that .value (quoted via @sh in jq) and $line (in the shell) are both quoted. These are all important for robustness, though further refinements might still be necessary.
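As noted below, some key names may be unusable as shell variable names; here is a minimal guard on top of the same loop (the identifier regex is an assumption, not part of the original answer):
while read -r line ; do
    key=${line%%=*}
    # Emit the export only for keys that are valid shell identifiers
    if [[ $key =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
        echo export "$line"
    else
        printf 'skipping %s\n' "$key" >&2
    fi
done < <(< test.json jq -r -R '
    try fromjson catch empty
    | objects
    | to_entries[]
    | "\(.key)=\(.value|@sh)"
')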
Perhaps there is an algorithm that will repair the broken JSON produced by the upstream system. If not, the following is a horrible but possibly useful "hack" that will at least capture KEY1 and KEY2 from the example in the question:
jq -Rr '
capture("\"(?<key>[^\"]*)\"[ \t]*:[ \t]*(?<value>[^}]+)")
| (.value |= sub("[ \t]+$"; "") ) # trailing whitespace
| if .value|test("^\".*\"") then .value |= sub("\"[ \t]*[,}[ \t]*$"; "\"") else . end
| select(.value | test("^\".*\"$") or (contains("\"")|not) ) # a string or not a string
| "\(.key)=\(.value|#sh)"
'
The broken JSON in the example could be repaired in a number of ways, e.g.:
sed '/\\n$/{N; s/\\n\n/\\n/;}'
produces:
{
"KEY1":"ABC",
"KEY2":"XYZ",
"KEY3":"---ABC---\ndskfjlksfj"
}
At least that's JSON :-)
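Putting the two steps together, the repaired stream can feed a straightforward export pipeline; a sketch reusing the sed repair and the @sh quoting from above:
sed '/\\n$/{N; s/\\n\n/\\n/;}' test.json |
    jq -r 'to_entries[] | "\(.key)=\(.value|@sh)"' |
    while read -r line ; do
        echo export "$line"
    done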

Bulk update values in json files (writing files)

I have a set of JSON files in a local folder. What I want to do is change a particular string value in each of them, permanently. That means deleting or modifying the old entry, writing a new one, and saving the file.
Below is the format of the file:
{
"name": "ABC #1",
"description": "This is the description",
"image": "ipfs://NewUriToReplace/1.png",
"dna": "a56c520f57ba2a861de8c78099b4691f9dad6e87",
"edition": 1,
"date": 1641634646966,
"creator": "Team Dreamlabs",
"attributes": [
{
I want to change ABC #1 to ABC #9501 in this file, ABC #2 to ABC #9502 in the next file, and so on. How do I do that on macOS in one go?
As I understand from the example, you are adding the value 9500 to the integer after the # symbol.
Because this kind of replacement is a string operation, a loop with the sed command can be used:
for f in *.json; do sed -i.bak 's/\("name": "ABC #\)\([0-9]\)",/\1950\2",/' "$f"; done
This simply prepends 950 to a single digit; while that matches the example, it obviously would not work for numbers past #9.
Then we need a bash function:
function add_number() { old_number=$(sed -n 's/[ ]*"name": "ABC #\([0-9]*\)",/\1/p' "$1"); new_number=$((old_number+9500)); sed -i.bak "s/\(\"name\": \"ABC #\)\([0-9]*\)\",/\1${new_number}\",/" "$1"; }; for f in *.json; do add_number "$f"; done
The function add_number extracts the integer value, adds the desired number to it, and then replaces the content of the file.
sed is used again for both the extraction and the replacement.
For the extraction, the -n flag suppresses sed's default output and the p flag prints only the result of the substitution, so nothing but the captured number (no leading spaces) reaches the assignment.
For the replacement, double quotes are used so that bash can expand the variable inside the sed expression; the literal double quotes are therefore escaped.
Regarding the addition from the comment below: to make the same replacement on the line with the edition tag (reusing the same number), just add another sed substitution with its regular expression amended to fit that line.
Finally, the overall code in a cleaner form:
function add_number() {
    old_number=$(sed -n 's/[ ]*"name": "ABC #\([0-9]*\)",/\1/p' "$1")
    new_number=$((old_number+9500))
    sed -i.bak "s/\(\"name\": \"ABC #\)[0-9]*\",/\1${new_number}\",/" "$1"
    sed -i.bak "s/\(\"edition\": \)[0-9]*,/\1${new_number},/" "$1"
}
for f in *.json
do add_number "$f"
done
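Applied to the sample file above (name ABC #1, edition 1), the two substitutions would leave:
"name": "ABC #9501",
"edition": 9501,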
These previous answers helped me to write this code:
- using variables inside of sed
- assigning the variable
If you are going to manipulate your JSON files on more than just this one occasion, then you might want to consider using tools that are designed to accomplish such tasks with ease.
One popular choice could be jq which is a "lightweight and flexible command-line JSON processor" that "has zero runtime dependencies" and is also available for OS X. By using jq within your shell, the following would be one way to accomplish what you have asked for.
Adding the numeric value 9500 to the number sitting in the field called edition:
jq '.edition += 9500' file.json
Interpreting part of the string as a number, again adding 9500 to it, and recomposing the string:
jq '.name |= ((./"#" | .[1] |= "\(tonumber + 9500)") | join("#"))' file.json
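For example, run against a minimal document (illustrative values only):
echo '{"name": "ABC #1", "edition": 1}' | jq '.name |= ((./"#" | .[1] |= "\(tonumber + 9500)") | join("#"))'
{
  "name": "ABC #9501",
  "edition": 1
}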
On the whole, iterating over your files, making both changes at once, writing to a temporary file and replacing the original on success, with the value to be added passed in as an external variable:
v=9500
for f in *.json; do jq --argjson v "$v" '
    .edition += $v | .name |= ((./"#" | .[1] |= "\(tonumber + $v)") | join("#"))
' "$f" > "$f.new" && mv "$f.new" "$f"
done
Here is an online "playground for jq", set up to simulate the application of my code from above to three imaginary files of yours. Feel free to edit the jq filter and/or the input JSON in order to see what could be possible using jq.

replace comma in json file's field with jq-win

I have a problem working with a JSON file. I launch curl in an AutoIt script to download a JSON file from the web and then convert it to CSV format with jq-win:
jq-win32 -r ".[]" -c class.json>class.txt
and the JSON is in the following format:
[
{
"id":"1083",
"name":"AAAAA",
"channelNumber":8,
"channelImage":""},
{
"id":"1084",
"name":"bbbbb",
"channelNumber":7,
"channelImage":""},
{
"id":"1088",
"name":"CCCCCC",
"channelNumber":131,
"channelImage":""},
{
"id":"1089",
"name":"DDD,DDD",
"channelNumber":132,
"channelImage":""},
]
after jq-win, the file should become:
{"id":"1083","name":"AAAAA","channelNumber":8,"channelImage":""}
{"id":"1084","name":"bbbbb","channelNumber":7,"channelImage":""}
{"id":"1088","name":"CCCCCC","channelNumber":131,"channelImage":""}
{"id":"1089","name":"DDD,DDD","channelNumber":132,"channelImage":""}
and then the CSV file will be further processed by the AutoIt script and become:
AAAAA,1083
bbbbb,1084
CCCCCC,1088
DDD,DDD,1089
The JSON has around 300 records, and among them 5~6 records have a comma in a field, e.g. DDD,DDD,
so when I try to read in the CSV file with _FileReadToArray, the comma in DDD,DDD causes trouble.
My question is: can I replace the comma in the field using jq-win?
(I tried fart.exe, but it replaces every comma in the JSON file, which is not suitable for me.)
Thanks a lot.
Regards,
LAM Chi-fung
can I replace comma in the field using jq-win ?
Yes. For example, use gsub, pretty much as you’d use awk’s gsub, e.g.
gsub(","; "|")
If you want more details, please provide more details as per [mcve].
Example
With the given JSON input, the jq program:
.[]
| .name |= gsub(",";";")
| [.[]]
| map(tostring)
| join(",")
yields:
1083,AAAAA,8,
1084,bbbbb,7,
1088,CCCCCC,131,
1089,DDD;DDD,132,
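If the end goal is the name,id CSV from the question, jq's builtin @csv can produce properly quoted fields directly, which sidesteps the embedded-comma problem; a sketch (note that @csv double-quotes string fields, unlike the question's sample output):
jq-win32 -r ".[] | [.name, .id] | @csv" class.json>class.txt
would produce:
"AAAAA","1083"
"bbbbb","1084"
"CCCCCC","1088"
"DDD,DDD","1089"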

How to delete the last character of prior line with sed

I'm trying to delete a line, together with the last character of the prior line, using sed.
I have a JSON file:
{
"name":"John",
"age":"16",
"country":"Spain"
}
I would like to delete the country entry from all records; to keep the JSON syntax valid, I also have to delete the comma at the end of the prior line.
These are the patterns I'm using:
sed '/country/d' test.json
sed -n '/resolved//.$//{x;d;};1h;1!{x;p;};${x;p;}' test.json
Editor's note:
The OP later clarified the following additional requirements, which invalidated some of the existing answers:
- multiple occurrences of country properties should be removed
- across all levels of the object hierarchy
- whitespace variations should be tolerated
Using a proper JSON parser such as jq is generally the best choice (see below), but if installing a utility is not an option, try this GNU sed command:
$ sed -zr 's/,\s*"country":[^\n]+//g' test.json
{
"name":"John",
"age":"16"
}
-z splits the input into NUL-separated records, which in this case means the whole file is read at once, enabling cross-line substitutions.
-r enables extended regular expressions for a more modern syntax with more features.
s/,\s*"country":[^\n]+//g replaces every occurrence of a comma, followed by a (possibly empty) run of whitespace (which may include a newline), then "country" through the end of that line, with the empty string, i.e., it effectively removes the matched strings.
Note that this assumes that no other property or closing } follows such a country property on the same line.
To demonstrate a more robust solution based on jq:
Bertrand Martel's helpful answer contains a jq solution which, however, does not address the requirement (added later) of removing country properties anywhere in the input object hierarchy.
In versions of jq later than v1.5 (walk/1 first shipped as a builtin in jq 1.6), the following simple solution is available:
# Walk all nodes and remove a "country" property from any object.
jq 'walk(if type == "object" then del (.country) else . end)' test.json
In v1.5 and below, you can define a simplified variant of walk yourself:
jq '
  # Define recursive function walk_objects/1 that walks all objects in the
  # hierarchy.
  def walk_objects(f): . as $in |
    if type == "object" then
      reduce keys[] as $key
        ( {}; . + { ($key): ($in[$key] | walk_objects(f)) } ) | f
    elif type == "array" then map( walk_objects(f) )
    else . end;

  # Walk all objects and remove a "country" property, if present.
  walk_objects(del(.country))
' test.json
As pointed out before, you should really consider using a JSON parser to parse JSON.
That said, you can slurp the whole file, remove the newlines, and then substitute accordingly:
$ sed ':a;N;$!ba;s/\n//g;s/,"country"[^}]*//' test.json
{"name":"John","age":"16"}
Breakdown:
:a; # Define label 'a'
N; # Append next line to pattern space
$!ba; # Goto 'a' unless it's the last line
s/\n//g; # Replace all newlines with nothing
s/,"country"[^}]*// # Replace ',"country...' with nothing
This might work for you (GNU sed):
sed 'N;s/,\s*\n\s*"country".*//;P;D' file
This reads two lines into the pattern space and removes the matched string.
N.B. It allows for spaces on either side of the lines.
You can use a JSON parser like jq to parse the JSON file. The following will return the document without the country field and write the new document to result.json:
jq 'del(.country)' file.json > result.json
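Applied to the sample above, result.json would contain:
{
  "name": "John",
  "age": "16"
}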

Splitting / chunking JSON files with JQ in Bash or Fish shell?

I have been using the wonderful jq library to parse and extract JSON data to facilitate re-importing. I can extract a range easily enough, but I am unsure how to loop through the data in a script and detect the end of the file, preferably in a bash or fish shell script.
Given a JSON file that is wrapped in a "results" dictionary, how can I detect the end of the file?
From testing, I can see that I will get an empty array nested in my desired structure, but how can I detect the end-of-file condition?
jq '{ "results": .results[0:500] }' Foo.json > 0000-0500/Foo.json
Thanks!
I'd recommend using jq to split-up the array into a stream of the JSON objects you want (one per line), and then using some other tool (e.g. awk) to populate the files. Here's how the first part can be done:
def splitup(n):
  def _split:
    if length == 0 then empty
    else .[0:n], (.[n:] | _split)
    end;
  if n == 0 then empty elif n > 0 then _split else reverse|splitup(-n) end;

# For the sake of illustration:
def data: { results: [range(0,20)] };

data | .results | {results: splitup(5)}
Invocation:
$ jq -nc -f splitup.jq
{"results":[0,1,2,3,4]}
{"results":[5,6,7,8,9]}
{"results":[10,11,12,13,14]}
{"results":[15,16,17,18,19]}
For the second part, you could (for example) pipe the jq output to:
awk '{ file="file."++n; print > file; close(file); }'
A variant you might be interested in would have the jq filter emit both the filename and the JSON on alternate lines; the awk script would then read the filename as well.
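A minimal sketch of that variant (the chunk-file naming scheme is an assumption):
jq -rnc '
  def splitup(n):
    def _split:
      if length == 0 then empty
      else .[0:n], (.[n:] | _split)
      end;
    if n == 0 then empty elif n > 0 then _split else reverse|splitup(-n) end;
  def data: { results: [range(0,20)] };
  # Emit "filename" and chunk on alternate lines
  data | .results
  | foreach ({results: splitup(5)}) as $chunk
      (0; . + 1; "chunk-\(.).json", $chunk)
' | awk 'NR % 2 == 1 { file = $0; next } { print > file; close(file) }'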