How to extract data from JSON converted from a Ruby script?

How to convert a ruby file to json?
I use the above approach (rb2json0.rb is in the above link) to convert a Ruby script to JSON. But the resulting JSON is awkward to work with: it consists entirely of nested arrays, with no objects (dictionaries).
I specifically want to extract fields in update_info, e.g.,
Name
Description
License
References
Author
and to extract the fields in register_options, e.g.,
LHOST
SOURCE
FILENAME
DOCAUTHOR
Note that the extraction should not assume the field names are fixed to these specific ones, as other field names can be used in other similar files.
The output should be a two-column TSV, with the field name as the first column and the field value as the second column. For example,
Name<TAB>Microsoft Word UNC Path Injector
...
Could anybody let me know the best jq way to achieve this? Thanks.
word_unc_injector.rb is at
https://github.com/rapid7/metasploit-framework/blob/master/modules/auxiliary/docx/word_unc_injector.rb
$ rb2json0.rb < word_unc_injector.rb | jq . # too long to include all output.
[
  "program",
  [
    [
      "command",
      [
        "#ident",
        "require",
        [
...
EDIT. The full solution to this problem may be complicated, but the first step might be to extract the part corresponding to update_info. Here is the relevant JSON fragment.
...
[
  "method_add_arg",
  [
    "fcall",
    [
      "#ident",
      "update_info",
      [
        24,
        10
      ]
    ]
  ],
  [
    "arg_paren",
    [
      "args_add_block",
      [
        [
          "var_ref",
          [
            "#ident",
            "info",
            [
              24,
              22
            ]
          ]
        ],
        [
          "bare_assoc_hash",
          [
            [
              "assoc_new",
              [
                "string_literal",
                [
                  "string_content",
                  [
                    "#tstring_content",
                    "Name",
...

The following illustrates how to convert the array-oriented JSON produced by rb2json0.rb to a more object-oriented JSON, so that you can query for update_info#ident in a straightforward way.
def objectify:
  if type == "array"
  then if length >= 2 and (.[0]|type) == "string" and (.[1]|type) == "array"
       then {(.[0]): (.[1:] | map(objectify)) | objectify}
       elif length >= 3 and .[0][0:1] == "#" and (.[1]|type) == "string"
       then {(.[1] + .[0]): (.[2:] | map(objectify)) | objectify}
       else map(objectify)
       end
  else .
  end;
Illustrative query
Given the snippet shown, the following produces the output shown below:
objectify | .. | objects | .["update_info#ident"]? // empty
Output
[[24,10]]
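As for the requested two-column TSV, here is a rough sketch that works directly on the raw array-oriented output instead. It assumes (based on Ripper's conventions, and not fully shown in the fragments above) that each hash entry appears as an ["assoc_new", KEY, VALUE] node and that every string piece occurs as an ["#tstring_content", STRING, [LINE, COL]] leaf; fields built some other way (e.g. the OptString.new calls inside register_options) would need analogous handling:

# sketch only: assumes assoc_new nodes of the form ["assoc_new", KEY, VALUE]
def strings_in: [.. | arrays | select(.[0]? == "#tstring_content") | .[1]];
..
| arrays
| select(.[0]? == "assoc_new")
| [(.[1] | strings_in | join("")), (.[2] | strings_in | join(", "))]
| @tsv

Invoked with jq -r, this would emit lines such as Name<TAB>Microsoft Word UNC Path Injector for the update_info hash.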

Related

How to extract certain data using Perl from a file?

I have data that needs to be extracted from a file; the fields I need for the moment are name, location, and host. Below is an example of the extract. How would I go about getting these lines into a separate file? I have the original file and the new file I want to create as the input/output files. There are thousands of devices contained within the file, all formatted the same way as in my example.
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);
#names of files to be input output
my $inputfile = "/home/nmis/nmis_export.csv";
my $outputfile = "/home/nmis/nmis_data.csv";
open(INPUT,'<',$inputfile) or die $!;
open(OUTPUT, '>',$outputfile) or die $!;
my @data = <INPUT>;
close INPUT;
my $line="";
foreach $line (@data)
{
======Sample Extract=======
**"group" : "NMIS8",
"host" : "1.2.3.4",
"location" : "WATERLOO",
"max_msg_size" : 1472,
"max_repetitions" : 0,
"model" : "automatic",
"netType" : "lan",
"ping" : 1,
"polling_policy" : "default",
"port" : 161,
"rancid" : 0,
"roleType" : "access",
"serviceStatus" : "Production",
"services" : null,
"threshold" : 1,
"timezone" : 0,
"version" : "snmpv2c",
"webserver" : 0
},
"lastupdate" : 1616690858,
"name" : "test",
"overrides" : {}
},
{
"activated" : {
"NMIS" : 1
},
"addresses" : [],
"aliases" : [],
"configuration" : {
"Stratum" : 3,
"active" : 1,
"businessService" : "",
"calls" : 0,
"cbqos" : "none",
"collect" : 0,
"community" : "public",
"depend" : [
"N/A"
],
"group" : "NMIS8",
"host" : "1.2.3.5",
"location" : "WATERLOO",
"max_msg_size" : 1472,
"max_repetitions" : 0,
"model" : "automatic",
"netType" : "lan",
"ping" : 1,
"polling_policy" : "default",
"port" : 161,
"rancid" : 0,
"roleType" : "access",
"serviceStatus" : "Production",
"services" : null,
"threshold" : 1,
"timezone" : 0,
"version" : "snmpv2c",
"webserver" : 0
},
"lastupdate" : 1616690858,
"name" : "test2",
"overrides" : {}
},**
I would use jq for this, not Perl. You just need to query a JSON document; that's what jq is for.
The jq query I created is this one:
.[] | {name: .name, group: .configuration.group, location: .configuration.location}
This breaks down into
.[] # iterate over the array
| # create a filter to send it to
{ # that produces an object with the below key/values
  name: .name,
  group: .configuration.group,
  location: .configuration.location
}
It provides output like this:
{
  "name": "test",
  "group": "NMIS8",
  "location": "WATERLOO"
}
{
  "name": "test2",
  "group": "NMIS8",
  "location": "WATERLOO"
}
You can use this to generate a CSV:
jq -r '.[] | [.name, .configuration.group, .configuration.location] | @csv' ./file.json
Or this to generate a CSV with a header:
jq -r '["name","group","location"], (.[] | [.name, .configuration.group, .configuration.location]) | @csv' ./file.json
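Assuming the extract above comes from a complete top-level JSON array of such device objects, the header variant would print something like:

"name","group","location"
"test","NMIS8","WATERLOO"
"test2","NMIS8","WATERLOO"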
You can use the JSON distribution for this. Read the entire file in one fell swoop to put the entire JSON string into a scalar (as opposed to putting it into an array and iterating over it), then simply decode the string into a Perl data structure:
use warnings;
use strict;
use JSON;
my $file = 'file.json';
my $json_string;
{
local $/; # Locally unset the input record separator so the next read slurps the whole file
open my $fh, '<', $file or die "Can't open file $file!: $!";
$json_string = <$fh>; # Slurp in the entire file
}
my $perl_data_structure = decode_json $json_string;
As what you have there is JSON, you should parse it with a JSON parser. JSON::PP is part of the standard Perl distribution. If you want something faster, you could install something else from CPAN.
Update: I included a link to JSON::PP in my answer. Did you follow that link? If you did, you would have seen the documentation for the module. That has more information about how to use the module than I could include in an answer on SO.
But it's possible that you need a little more high-level information. The documentation says this:
JSON::PP is a pure perl JSON decoder/encoder
But perhaps you don't know what that means. So here's a primer.
JSON is a text format for storing complex data structures. The format was initially used in Javascript (the acronym stands for "JavaScript Object Notation") but it is now a standard that is used across pretty much all programming languages.
You rarely want to actually deal with JSON in a program. A JSON document is just text and manipulating that would require some complex regular expressions. When dealing with JSON, the usual approach is to "decode" the JSON into a data structure inside your program. You can then manipulate the data structure however you want before (optionally) "encoding" the data structure back into JSON so you can write it to an output file (in your case, you don't need to do that as you want your output as CSV).
So there are pretty much only two things that a Perl JSON library needs to do:
Take some JSON text and decode it into a Perl data structure
Take a Perl data structure and encode it into JSON text
If you look at the JSON::PP documentation you'll see that it contains two functions, encode_json() and decode_json() which do what I describe above. There's also an OO interface, but let's not overcomplicate things too quickly.
So your program now needs to have the following steps:
Read the JSON from the input file
Decode the JSON into a Perl data structure
Walk the Perl data structure to extract the items that you need
Write the required items into your output file (for which Text::CSV will be useful)
Having said all that, it really does seem to me that the jq solution suggested by user157251 is a much better idea.

ID lookup from an external file in JQ

I have a lookup file that maps IDs from one system onto another:
[
  {
    "idA": 2547,
    "idB": "5d0bf91d191c6554d14572a6"
  },
  {
    "idA": 2549,
    "idB": "5b0473f93d4e53db19f8c249"
  },
  {
    "idA": 2550,
    "idB": "5d0bfabc8f20917b92ff07dc"
  },
...
And I have a data file with values and an ID from one of these systems:
[
  {
    "idB": "5d0bf91d191c6554d14572a6",
    "description": "Description for 5d0bf91d191c6554d14572a6"
  },
  {
    "idB": "5d0bf49e9236c57281811cfc",
    "description": "Description for 5d0bf49e9236c57281811cfc"
  },
  {
    "idB": "5d0bfabc8f20917b92ff07dc",
    "description": "Description for 5d0bfabc8f20917b92ff07dc"
  },
...
I want to produce a new file of the descriptions with their IDs converted to the idA values in the lookup file. I tried this:
jq --slurpfile idmap ids.json 'map( {"description":.description, "id": (.idB as $b|$idmap[][]|select(.idB==$b)|.idA) } )' descriptions.json
But it produces only an empty array.
I have to double-dereference $idmap because slurping a file "binds an array of the parsed JSON values to the given global variable" -- so just doing $idmap[] throws an error, jq: error (at descriptions.json:70): Cannot index array with string "idB".
Can anyone explain what I'm doing wrong here?
Here's a concise and straightforward solution to the stated problem.
For simplicity, we'll begin by constructing a dictionary containing the relevant mapping using INDEX/2:
INDEX($idmap[]; .idB) | map_values(.idA)
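With the lookup file shown above (treating the truncated array as if it were complete), this builds:

{
  "5d0bf91d191c6554d14572a6": 2547,
  "5b0473f93d4e53db19f8c249": 2549,
  "5d0bfabc8f20917b92ff07dc": 2550
}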
Now the task is easy:
(INDEX($idmap[]; .idB) | map_values(.idA)) as $dict
| map( {description, "idA": $dict[.idB] } )
This assumes an invocation that uses --argfile idmap ids.json to avoid
the unwanted "slurping" caused by --slurpfile, but if the latter is used, then you would use $idmap[][] instead as noted in the original question.
With the samples shown, the first and third idB values in descriptions.json do have matches in the lookup file, so the corresponding idA values (2547 and 2550) appear in the output, while an unmatched idB yields null.
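For example (again treating the truncated samples as complete arrays):

jq --argfile idmap ids.json '
  (INDEX($idmap[]; .idB) | map_values(.idA)) as $dict
  | map( {description, "idA": $dict[.idB] } )' descriptions.json

[
  {
    "description": "Description for 5d0bf91d191c6554d14572a6",
    "idA": 2547
  },
  {
    "description": "Description for 5d0bf49e9236c57281811cfc",
    "idA": null
  },
  {
    "description": "Description for 5d0bfabc8f20917b92ff07dc",
    "idA": 2550
  }
]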
Variation
If the objects in descriptions.json had other keys that should be retained, then the following variant would probably be a more useful guide:
(INDEX($idmap[]; .idB) | map_values(.idA)) as $dict # or $idmap[][] as above
| map( .idA = $dict[.idB] | del(.idB) )
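For instance, if an entry in descriptions.json also carried a hypothetical extra key such as "status", the variant would preserve it:

{"idB": "5d0bf91d191c6554d14572a6", "description": "Description for 5d0bf91d191c6554d14572a6", "status": "active"}

becomes

{"description": "Description for 5d0bf91d191c6554d14572a6", "status": "active", "idA": 2547}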

select array items by any element exactly matching a specified value

Anyone know how to implement this?
I have an array whose items contain a nested array, say tagNames, and I want to select every item whose tagNames contains exactly "auto-test", not "auto-test2".
{
"servers":[
{"id":1, "tagNames": ["auto-test", "xxx"]},
{"id":2, "tagNames": ["auto-test2", "xxxx"]}
]
}
So far, I am using
echo '{"servers":[{"id":1,"tagNames":["auto-test","xxx"]},{"id":2,"tagNames":["auto-test2","xxxx"]}]}' |\
jq '[ .servers[] | select(.tagNames | contains(["auto-test"])) ]'
I got two records, but I just want the first one.
[
  {
    "id": 1,
    "tagNames": [
      "auto-test",
      "xxx"
    ]
  },
  {
    "id": 2,
    "tagNames": [
      "auto-test2",
      "xxxx"
    ]
  }
]
So I want this:
[
  {
    "id": 1,
    "tagNames": [
      "auto-test",
      "xxx"
    ]
  }
]
Any idea?
You shouldn't use contains/1 here, as it does not work the way you might expect, particularly when strings are involved: it recursively checks whether all parts of its argument are contained, so it not only checks whether the string is an element of the array, but also matches when the string is merely a substring of an element.
You'll want to write out your conditions explicitly, checking the tags against your criteria with any and all:
[.servers[] | select(any(.tagNames[]; . == "auto-test") and all(.tagNames[]; . != "auto-test2"))]
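To see the substring pitfall with contains/1 concretely:

jq -n '["auto-test2", "xxxx"] | contains(["auto-test"])'
true

which is why the second server was selected in the original attempt.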
One way would be to use index/1, e.g.
.servers[]
| select( .tagNames | index("auto-test"))
This produces:
{"id":1,"tagNames":["auto-test","xxx"]}
If you want that wrapped in an array, you could (for example) wrap the filter above in square brackets.
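That is:

[ .servers[] | select(.tagNames | index("auto-test")) ]

which yields exactly the array shown in the question.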
Another solution is to use the idiom: first(select(_)):
jq '.servers[] | first(select(.tagNames[]=="auto-test"))' file
If the first is omitted, then the same item in the servers array might be emitted more than once.
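For example, with a hypothetical server tagged twice, omitting first emits the object twice:

jq -nc '{id: 3, tagNames: ["auto-test", "auto-test"]} | select(.tagNames[] == "auto-test")'
{"id":3,"tagNames":["auto-test","auto-test"]}
{"id":3,"tagNames":["auto-test","auto-test"]}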

Need to get all key-value pairs from a JSON document whose values contain the character '/'

I have the following JSON content, from which I need to get all keys whose values contain the character /.
JSON
{ "dig": "sha256:d2aae00e4bc6424d8a6ae7639d41cfff8c5aa56fc6f573e64552a62f35b6293e",
"name": "example",
"binding": {
"wf.example.input1": "/path/to/file1",
"wf.example.input2": "hello",
"wf.example.input3":
["/path/to/file3",
"/path/to/file4"],
"wf.example.input4": 44
}
}
I know I can get the paths to all the values containing a file path using the query jq 'paths(type == "string" and contains("/"))'. This gives output like:
[ "binding", "wf.example.input1" ]
[ "binding", "wf.example.input3", 0]
[ "binding", "wf.example.input3", 1 ]
Now that I have the paths to all the elements that contain file paths as their values, is there a way to fetch both key and value, and store them as another JSON document? For the JSON above, the output should be another JSON document containing all the matched paths, like this:
{ "binding":
{ "wf.example.input1": "/path/to/file1",
"wf.example.input3": [ "/path/to/file3", "/path/to/file4" ]
}
}
The following jq filter will produce the desired output if given input that is very similar to the example, but it is far from robust and glosses over some details that are unclear from the problem description. However, it should be easy enough to modify the filter in accordance with more precise specifications:
. as $in
| reduce paths(type == "string" and test("/")) as $path ({};
    ($in | getpath($path)) as $x
    | if ($path[-1]|type) == "string"
      then .[$path[-1]] = $x
      else .[$path[-2]|tostring] += [$x]
      end )
| {binding: .}
Output:
{
  "binding": {
    "wf.example.input1": "/path/to/file1",
    "wf.example.input3": [
      "/path/to/file3",
      "/path/to/file4"
    ]
  }
}
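A more generic alternative (a sketch of my own, not part of the answer above) is to rebuild each matched path with setpath; this preserves the full nesting without hard-coding the "binding" wrapper:

. as $in
| reduce paths(type == "string" and test("/")) as $p
    ({}; setpath($p; $in | getpath($p)))

On the sample input this produces the same output as shown above, and it keeps array elements at their original indices (filling any gaps with null).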

Extract schema of nested JSON object

Let's assume this is the source json file:
{
  "name": "tom",
  "age": 12,
  "visits": {
    "2017-01-25": 3,
    "2016-07-26": 4,
    "2016-01-24": 1
  }
}
I want to get:
[
  "age",
  "name",
  "visits.2017-01-25",
  "visits.2016-07-26",
  "visits.2016-01-24"
]
I am able to extract the top-level keys using jq '. | keys' file.json, but this skips nested fields. How do I include those?
With your input, the invocation:
jq 'leaf_paths | join(".")'
produces:
"name"
"age"
"visits.2017-01-25"
"visits.2016-07-26"
"visits.2016-01-24"
If you want to include "visits", use paths. If you want the result as a JSON array, enclose the filter with square brackets: [ ... ]
If your input might include arrays, then unless you are using jq 1.6 or later, you will need to convert the integer indices to strings explicitly; also, since leaf_paths is now deprecated, you might want to use its definition instead. The result:
jq 'paths(scalars) | map(tostring) | join(".")'
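For example, applying the bracketed form to the sample input (assumed here to be in file.json) yields the array requested in the question:

jq '[paths(scalars) | map(tostring) | join(".")]' file.json
[
  "name",
  "age",
  "visits.2017-01-25",
  "visits.2016-07-26",
  "visits.2016-01-24"
]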
allpaths
To include paths to null, you could use allpaths defined as follows:
def allpaths:
  def conditional_recurse(f): def r: ., (select(. != null) | f | r); r;
  path(conditional_recurse(.[]?)) | select(length > 0);
Example:
{"a": null, "b": false} | allpaths | join(".")
produces:
"a"
"b"
all_leaf_paths
Assuming jq version 1.5 or higher, we can get to all_leaf_paths by following the strategy used in builtins.jq, that is, by adding these definitions:
def allpaths(f):
  . as $in | allpaths | select(. as $p | $in | getpath($p) | f);
def isscalar:
  . == null or . == true or . == false or type == "number" or type == "string";
def all_leaf_paths: allpaths(isscalar);
Example:
{"a": null, "b": false, "object":{"x":0} } | all_leaf_paths | join(".")
produces:
"a"
"b"
"object.x"
Some time ago, I wrote a structural-schema inference engine that
produces simple structural schemas that mirror the JSON documents under consideration,
e.g. for the sample JSON given here, the inferred schema is:
{
  "name": "string",
  "age": "number",
  "visits": {
    "2017-01-25": "number",
    "2016-07-26": "number",
    "2016-01-24": "number"
  }
}
This is not exactly the format requested in the original posting, but
for large collections of objects, it does provide a useful overview.
More importantly, there is now a complementary validator for
checking whether a collection of JSON documents matches a structural
schema. The validator checks against schemas written in
JESS (JSON Extended Structural Schemas), a superset of the simple
structural schemas (SSS) produced by the schema inference engine.
(The idea is that one can use the SSS as a starting point to add
more elaborate constraints, including recursive constraints,
within-document referential integrity constraints, etc.)
For reference, here is how the SSS for your sample.json would be produced using the "schema" module:
jq 'include "schema"; schema' source.json > source.schema.json
And to validate source.json against an SSS or JESS schema:
JESS --schema source.schema.json source.json
This does what you want, though it doesn't return the data in an array; that should be an easy modification:
https://github.com/ilyash/show-struct
You can also check out this page:
https://ilya-sher.org/2016/05/11/most-jq-you-will-ever-need/