How to edit a specific line of a large JSON/text file (~25 GB)?

I have a JSON file with Elasticsearch events which jq can't parse (funnily enough, the JSON was produced by jq) due to a missing comma. Here is an extract from the problematic place in the file:
"end",
"protocol"
],
"dataset": "test",
"outcome": "success"
},
"@timestamp": "2020-08-23T04:47:10.000+02:00"
}
{
"agent": {
"hostname": "fb",
"type": "filebeat"
},
"destination": {
My jq command fails at the closing brace (just above "agent") because a comma is missing after that brace, where a new event starts. I know the exact line and would like to add a comma there, but I couldn't find an efficient way to do it. Since the file is around 25 GB, opening it in nano or a similar editor is impractical. The error is: parse error: Expected separator between values at line 192388762
Does anyone know of an efficient way to add a comma there so that it looks like this?
"@timestamp": "2020-08-23T04:47:10.000+02:00"
},
{
"agent": {
Is there a command which I can tell to go to line X, column 1 and add a comma there (after column1)?

Are there brackets [] surrounding all these objects? If so, it is an array, and commas are indeed missing. But jq would not have omitted them unless the previous filter was deliberately written to behave that way. If there are no surrounding brackets (which I presume, judging by the indentation of the sample), then it is a stream of objects, which do not need commas between them. In fact, inserting a comma between them without the surrounding brackets would make the file unprocessable, as it would no longer be valid JSON.
If it is a faulty array (the former case), you may be better off using a text stream editor such as sed or awk instead of jq, since you seem to know exactly where the commas are missing ("Is there a command which I can tell to go to line X, column 1 and add a comma there?").
If it is in fact a stream of objects (the latter case), you could use jq --slurp '…' or jq -n '[inputs] | …' to turn it into an array (surrounded by brackets, with commas in between), but then the whole file (25 GB) has to fit into memory. If it doesn't, you need jq --stream '…' and must handle the document, which then has a different format, as described in the documentation for processing streams.
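The stream-editor idea can also be sketched in Python, assuming the 1-based line number of the closing brace is known. The function name append_at_line and the sample data are illustrative and not taken from the question. The file is read one line at a time, so memory use does not grow with file size.

```python
import io


def append_at_line(src, dst, line_no, suffix=","):
    """Copy src to dst line by line, appending `suffix` to line `line_no` (1-based)."""
    for current, line in enumerate(src, start=1):
        if current == line_no:
            # Preserve the original line ending and put the suffix before it.
            body = line.rstrip("\r\n")
            ending = line[len(body):]
            line = body + suffix + ending
        dst.write(line)


# Tiny in-memory demonstration with two objects, one per line.
source = io.StringIO('{"a": 1}\n{"b": 2}\n')
result = io.StringIO()
append_at_line(source, result, line_no=1)
fixed = result.getvalue()
```

For a real file, the two streams would be the input file opened for reading and a new output file opened for writing. The whole 25 GB still has to be copied once, but never held in memory.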
Illustrations:
This is an array of objects:
[
{"a": 1},
{"b": 2},
{"c": 3}
]
This is a stream of objects:
{"a": 1}
{"b": 2}
{"c": 3}
This is not valid JSON:
{"a": 1},
{"b": 2},
{"c": 3}
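The three cases above can be checked with Python's standard json module, as a rough sketch. The helper iter_objects is a hypothetical name. It uses JSONDecoder.raw_decode to read one value after another from a whitespace-separated stream. This version holds the whole text in memory, so it only illustrates the distinction and is not meant for a 25 GB file.

```python
import json


def iter_objects(text):
    """Yield each top-level JSON value from a whitespace-separated stream."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between values.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value


array_text = '[{"a": 1}, {"b": 2}, {"c": 3}]'
stream_text = '{"a": 1}\n{"b": 2}\n{"c": 3}\n'
broken_text = '{"a": 1},\n{"b": 2},\n{"c": 3}\n'

from_array = json.loads(array_text)            # one array value
from_stream = list(iter_objects(stream_text))  # three separate values

try:
    list(iter_objects(broken_text))
    broken_ok = True
except json.JSONDecodeError:
    # The stray comma after the first object is not a JSON value.
    broken_ok = False
```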

Related

Replacing a json object value with constant string

I have the JSON content below in a file. The "pos" value in it varies, and I need to replace it with a constant such as NNNN.
{
"qual_table": "TKGGU1.TCUSTORD",
"op_type": "INSERT",
"op_ts": "YYYY-MM-DDTHH:MI:SS.FFFFFFZ",
"pos": "G-AAAAAJcRAAAAAAAAAAAAAAAAAAAHAAoAAA==10687850.2.31.996",
"xid": "0.2.31.996",
"after": {
"CUST_CODE": "BILL",
"ORDER_DATE": "YYYY-MM-DD:HH:MI:SS",
"PRODUCT_CODE": "CAR",
"ORDER_ID": 765,
"PRODUCT_PRICE": 1500000,
"PRODUCT_AMOUNT": 3,
"TRANSACTION_ID": 100
}
}
I tried perl -i -pe 's/[A-Z]-[A-Z]*\.==*/ NNNN/g' filename.json, but it does not work.
Could you suggest a regular expression for this? Please note that the value is variable and has a different length every time.
jq is the de-facto standard tool for working with JSON from scripts and the command line. It makes updating a field of an object with a new value trivial:
$ jq '.pos = "NNNN"' input.json
{
"qual_table": "TKGGU1.TCUSTORD",
"op_type": "INSERT",
"op_ts": "YYYY-MM-DDTHH:MI:SS.FFFFFFZ",
"pos": "NNNN",
"xid": "0.2.31.996",
"after": {
"CUST_CODE": "BILL",
"ORDER_DATE": "YYYY-MM-DD:HH:MI:SS",
"PRODUCT_CODE": "CAR",
"ORDER_ID": 765,
"PRODUCT_PRICE": 1500000,
"PRODUCT_AMOUNT": 3,
"TRANSACTION_ID": 100
}
}
(Note that when standard output is a pipe or file or something other than a terminal, the JSON is printed in compact form on a single line.)
The equivalent perl one(ish)-liner would be something like
perl -MJSON::MaybeXS -MFile::Slurper=read_text -E '
my $json = JSON::MaybeXS->new;
my $obj = $json->decode(read_text($ARGV[0]));
$obj->{pos} = "NNNN";
say $json->encode($obj);' input.json
(Adjust as needed for your preferred JSON and file-reading modules)
perl -pe 's/"pos"\s*:\s*"\K[^"]*/NNNN/g' filename.json
should work for any length of pos as long as there is no quote character or newline in the string.
As Shawn pointed out it'd be better to use a json-manipulating tool instead of a string-manipulating one.
Otherwise you'll have to rely on assumptions.
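To illustrate the point about JSON-aware tools in one more language, here is a minimal Python sketch using only the standard json module. The sample text is a shortened copy of the record from the question. Because the value is replaced after parsing, its length and content do not matter.

```python
import json

# Shortened version of the record from the question.
text = '''{
  "op_type": "INSERT",
  "pos": "G-AAAAAJcRAAAAAAAAAAAAAAAAAAAHAAoAAA==10687850.2.31.996",
  "xid": "0.2.31.996",
  "after": {"CUST_CODE": "BILL", "ORDER_ID": 765}
}'''

record = json.loads(text)   # parse into a dict
record["pos"] = "NNNN"      # replace the value, whatever it was
output = json.dumps(record, indent=2)
```

Unlike the regular expression, this does not depend on how the file is laid out or on which characters appear inside the string.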

How to Parse / substitute value in jq for json file

How to parse / substitute a value with jq.
Json file as below:
{
"09800214851900C3": {
"label": "P7-R1-R16:S2",
"name": "Geist Upgradable rPDU",
"state": "normal",
"order": 0,
"type": "i03",
"snmpInstance": 1,
"lifetimeEnergy": "20155338",
"outlet": {},
"alarm": {
"severity": "",
"state": "none"
},
"layout": {
"0": [
"entity/total0",
"entity/phase0",
"entity/phase1",
"entity/phase2"
]
}
}
}
I want to do something like the following, but it is not working. Any ideas or leads would be appreciated.
a=09800214851900C3
jsonfile.json | jq '.${a}.label'
Your current attempt has the following problems:
jsonfile.json isn't a command, so you can't use it as the first token of a command line. You could use cat jsonfile.json | jq ..., but the preferred way to have jq work on a file is jq 'command' file
you define a variable a in your shell, but you try to reference it inside a single-quoted string, which prevents the shell from expanding it to its actual value. A shell-based solution is to use double quotes so that the variable is expanded, but it's preferable to define the variable in the context of jq itself, using a --arg varname value option
09800214851900C3 isn't considered a "simple, identifier-like key" by jq (because it starts with a digit), so the standard way of accessing the value associated with this key (.key) doesn't work and you need to use ."09800214851900C3" or .["09800214851900C3"] instead
In conclusion, I believe you will want to use the following command:
jq --arg a 09800214851900C3 '.[$a].label' jsonfile.json

Adding json array via JQ introduces unicode characters in string

I have a JSON file to which I want to append an array element, using bash and the latest jq. I am able to append it, but the resulting strings contain unicode escape characters, as can be seen below. The first element in the validators array is the original and the second is the appended one. (This is not the whole JSON file.)
"validators": [
{
"address": "85BAF568E7F89277E47D3FC8E111775A4F6992FA",
"pub_key": {
"type": "tendermint/PubKeyEd25519",
"value": "BCzCLcW7rZ9VJgAtEUoDN17qcZw8ZvpYbPsL6eOy3No="
},
"power": "10",
"name": ""
},
{
"address": "\u001b[32m\"F75E15A3949824B685A3C5BFCDEED7E3DA4277AE\"\u001b[0m\r",
"pub_key": "\u001b[37m{\u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[37m:\u001b[0m\u001b[32m\"tendermint/PubKeyEd25519\"\u001b[0m\u001b[37m,\u001b[0m\u001b[34;1m\"value\"\u001b[0m\u001b[37m:\u001b[0m\u001b[32m\"INeR51z41k6jPAEJ5rV+1TY+4sxnbIykc4bfJFmSCQ8=\"\u001b[0m\u001b[37m\u001b[37m}\u001b[0m\r",
"power": "10",
"name": "node2"
}
]
Printing the address element separately prints the element without any utf/unicode encoding chars.
{
"type": "tendermint/PubKeyEd25519",
"value": "BCzCLcW7rZ9VJgAtEUoDN17qcZw8ZvpYbPsL6eOy3No="
}
I append the element using the following command:
cat genesis.json.src | jq --arg pub_key $PK --arg name node$i --arg addr $ADDR '.validators+= [{address: $addr, pub_key: $pub_key, power:"10",name:$name}]' > genesis.json.dest
I am running macOS. Any help or suggestion would be appreciated.
As @choroba mentioned in the comments, these are ANSI colour escape sequences. I removed them by adding the -M flag to jq, which disables colours.
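To see what those sequences are, here is a small Python sketch that strips them from an already polluted value. The regular expression only covers colour (SGR) sequences of the form ESC [ numbers m, which is an assumption based on the output shown in the question. Disabling colours at the source with -M remains the cleaner fix.

```python
import json
import re

# SGR colour sequences: ESC, "[", digits separated by ";", then "m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")

# The polluted "address" value from the question, as a Python string.
polluted = "\u001b[32m\"F75E15A3949824B685A3C5BFCDEED7E3DA4277AE\"\u001b[0m\r"

# Remove the colour codes, then the trailing carriage return.
plain = ANSI_SGR.sub("", polluted).strip()

# What remains is a quoted JSON string; decode it to get the bare value.
address = json.loads(plain)
```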

Reflect vs Regex to check for empty JSON array properties in Golang

I receive a JSON array from our client whose properties are empty:
[
{},{},{},{},{}
]
Normally it looks like this e.g:
[
{"Name": "foo", "Text": "Costumer"},
{"Name": "foo", "Text": "Employer"},
{"Name": "foo", "Text": "Costumer"},
{"Name": "foo", "Text": "Emplopyer"},
{"Name": "foo", "Text": "Employer"}
]
According to my teacher, there are two possible ways to check for those empty properties:
the regexp package and the reflect package.
Which should I use for performance?
And please explain why you would choose that package over the other.
The most performant and error-proof way would be to parse the JSON tokens yourself with the json package's Decoder.Token and related methods.
This avoids the json package's normal use of reflection entirely (since you're not unmarshaling into an arbitrary struct), and it avoids error-prone regular expressions. It will likely outperform regex too, but a benchmark is needed to be sure.
But the code will be somewhat verbose and arguably ugly.

Which form is valid between starting with square and curly brackets?

I have two JSON forms. I tried to validate them with JSONLint. It shows an error for the first and validates the second.
Invalid JSON:
[ "name": {} ]
Valid JSON:
{ "name": {} }
Can anyone explain why the first one is invalid while the second one is valid?
[ starts an array initializer. Valid entries are values separated by commas. Example:
["one", 2, "three"]
{ starts an object initializer. Valid entries are name/value pairs, where each pair is a name in double quotes followed by a colon (:) followed by any valid value. Examples:
{"name": "value"}
{"name": {}}
{"name": ["one", 2, "three"]}
All of this is covered by the website and the standard.
Your first example is invalid because it's trying to define a name/value pair where a value is expected (in an array entry).
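The same check can be reproduced locally, for instance with Python's standard json module. This is a minimal sketch. The helper name is_valid_json is made up for illustration.

```python
import json


def is_valid_json(text):
    """Return True if text parses as a single JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# A name/value pair inside an array is rejected.
array_with_pair = is_valid_json('[ "name": {} ]')
# The same pair inside an object is accepted.
object_with_pair = is_valid_json('{ "name": {} }')
# An array of plain values is accepted.
array_of_values = is_valid_json('["one", 2, "three"]')
```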