How to convert a ruby file to json?
I use the above approach (rb2json0.rb is in the above link) to convert a ruby script to JSON. But the JSON is not well formatted as it only has arrays but not dictionaries, making it difficult to work with the JSON output.
I specifically want to extract fields in update_info, e.g.,
Name
Description
License
References
Author
and to extract the fields in register_options, e.g.,
LHOST
SOURCE
FILENAME
DOCAUTHOR
Note that the extraction should not assume the field names are fixed to these specific ones, as other field names can be used in other similar files.
The output should be a two-column TSV, with the field name as the first column and the field value as the second column. For example,
Name<TAB>Microsoft Word UNC Path Injector
...
Could anybody let me know the best jq way to achieve this? Thanks.
word_unc_injector.rb is at
https://github.com/rapid7/metasploit-framework/blob/master/modules/auxiliary/docx/word_unc_injector.rb
$ rb2json0.rb < word_unc_injector.rb | jq . # too long to include all output.
[
"program",
[
[
"command",
[
"#ident",
"require",
[
...
EDIT. The full solution of this problem may be complicated. But the first step might be to extract the part corresponding to update_info. Here is the relevant JSON fragment.
...
[
"method_add_arg",
[
"fcall",
[
"#ident",
"update_info",
[
24,
10
]
]
],
[
"arg_paren",
[
"args_add_block",
[
[
"var_ref",
[
"#ident",
"info",
[
24,
22
]
]
],
[
"bare_assoc_hash",
[
[
"assoc_new",
[
"string_literal",
[
"string_content",
[
"#tstring_content",
"Name",
...
The following illustrates how to convert the array-oriented JSON produced by rb2json0.rb to a more object-oriented JSON, in a way that you can query for update_info#ident in a straightforward way.
def objectify:
if type == "array"
then if length>=2 and (.[0]|type) == "string" and (.[1]|type) == "array"
then {(.[0]): ( .[1:] | map(objectify)) | objectify }
elif length>=3 and .[0][0:1] == "#" and (.[1]|type) == "string"
then { (.[1] + .[0]): ( .[2:] | map(objectify)) | objectify }
else map(objectify)
end
else .
end;
Illustrative query
Given the snippet shown, the following produces the output shown below:
objectify | .. | objects | .["update_info#ident"]? // empty
Output
[[24,10]]
I would like to search a JSON file for some key or value, and have it print where it was found.
For example, when using jq to print out my Firefox' extensions.json, I get something like this (using "..." here to skip long parts) :
{
"schemaVersion": 31,
"addons": [
{
"id": "wetransfer#extensions.thunderbird.net",
"syncGUID": "{e6369308-1efc-40fd-aa5f-38da7b20df9b}",
"version": "2.0.0",
...
},
{
...
}
]
}
Say I would like to search for "wetransfer#extensions.thunderbird.net", and would like an output which shows me where it was found with something like this:
{ "addons": [ {"id": "wetransfer#extensions.thunderbird.net"} ] }
Is there a way to get that with jq or with some other json tool?
I also tried to simply list the various ids in that file, and hoped that I would get it with jq '.id', but that just returned null, because it apparently needs the full path.
In other words, I'm looking for a command-line json parser which I could use in a way similar to Xpath tools
The path() function comes in handy:
$ jq -c 'path(.. | select(. == "wetransfer#extensions.thunderbird.net"))' input.json
["addons",0,"id"]
The resulting path is interpreted as "In the addons field of the initial object, the first array element's id field matches". You can use it with getpath(), setpath(), delpaths(), etc. to get or manipulate the value it describes.
Using your example with modifications to make it valid JSON:
< input.json jq -c --arg s wetransfer#extensions.thunderbird.net '
paths as $p | select(getpath($p) == $s) | null | setpath($p;$s)'
produces:
{"addons":[{"id":"wetransfer#extensions.thunderbird.net"}]}
Note
If there are N paths to the given value, the above will produce N lines. If you want only the first, you could wrap everything in first(...).
Listing all the "id" values
I also tried to simply list the various ids in that file
Assuming that "id" values of false and null are of no interest, you can print all the "id" values of interest using the jq filter:
.. | .id? // empty
I am trying to form a JSON construct using jq that should ideally look like below:-
{
"api_key": "XXXXXXXXXX-7AC9-D655F83B4825",
"app_guid": "XXXXXXXXXXXXXX",
"time_start": 1508677200,
"time_end": 1508763600,
"traffic": [
"event"
],
"traffic_including": [
"unattributed_traffic"
],
"time_zone": "Australia/NSW",
"delivery_format": "csv",
"columns_order": [
"attribution_attribution_action",
"attribution_campaign",
"attribution_campaign_id",
"attribution_creative",
"attribution_date_adjusted",
"attribution_date_utc",
"attribution_matched_by",
"attribution_matched_to",
"attribution_network",
"attribution_network_id",
"attribution_seconds_since",
"attribution_site_id",
"attribution_site_id",
"attribution_tier",
"attribution_timestamp",
"attribution_timestamp_adjusted",
"attribution_tracker",
"attribution_tracker_id",
"attribution_tracker_name",
"count",
"custom_dimensions",
"device_id_adid",
"device_id_android_id",
"device_id_custom",
"device_id_idfa",
"device_id_idfv",
"device_id_kochava",
"device_os",
"device_type",
"device_version",
"dimension_count",
"dimension_data",
"dimension_sum",
"event_name",
"event_time_registered",
"geo_city",
"geo_country",
"geo_lat",
"geo_lon",
"geo_region",
"identity_link",
"install_date_adjusted",
"install_date_utc",
"install_device_version",
"install_devices_adid",
"install_devices_android_id",
"install_devices_custom",
"install_devices_email_0",
"install_devices_email_1",
"install_devices_idfa",
"install_devices_ids",
"install_devices_ip",
"install_devices_waid",
"install_matched_by",
"install_matched_on",
"install_receipt_status",
"install_san_original",
"install_status",
"request_ip",
"request_ua",
"timestamp_adjusted",
"timestamp_utc"
]
}
What I have tried unsuccessfully thus far is below:-
json_construct=$(cat <<EOF
{
"api_key": "6AEC90B5-4169-59AF-7AC9-D655F83B4825",
"app_guid": "komacca-s-rewards-app-au-ios-production-cv8tx71",
"time_start": 1508677200,
"time_end": 1508763600,
"traffic": ["event"],
"traffic_including": ["unattributed_traffic"],
"time_zone": "Australia/NSW",
"delivery_format": "csv"
"columns_order": ["attribution_attribution_action","attribution_campaign","attribution_campaign_id","attribution_creative","attribution_date_adjusted","attribution_date_utc","attribution_matched_by","attribution_matched_to","attributio
network","attribution_network_id","attribution_seconds_since","attribution_site_id","attribution_tier","attribution_timestamp","attribution_timestamp_adjusted","attribution_tracker","attribution_tracker_id","attribution_tracker_name","
unt","custom_dimensions","device_id_adid","device_id_android_id","device_id_custom","device_id_idfa","device_id_idfv","device_id_kochava","device_os","device_type","device_version","dimension_count","dimension_data","dimension_sum","ev
t_name","event_time_registered","geo_city","geo_country","geo_lat","geo_lon","geo_region","identity_link","install_date_adjusted","install_date_utc","install_device_version","install_devices_adid","install_devices_android_id","install_
vices_custom","install_devices_email_0","install_devices_email_1","install_devices_idfa","install_devices_ids","install_devices_ip","install_devices_waid","install_matched_by","install_matched_on","install_receipt_status","install_san_
iginal","install_status","request_ip","request_ua","timestamp_adjusted","timestamp_utc"]
}
EOF)
followed by:-
echo "$json_construct" | jq '.'
I get the following error:-
parse error: Expected separator between values at line 10, column 15
I am guessing it is because of the string literal which spans to multiple lines that jq is unable to parse it.
Use jq itself:
my_formatted_json=$(jq -n '{
"api_key": "XXXXXXXXXX-7AC9-D655F83B4825",
"app_guid": "XXXXXXXXXXXXXX",
"time_start": 1508677200,
"time_end": 1508763600,
"traffic": ["event"],
"traffic_including": ["unattributed_traffic"],
"time_zone": "Australia/NSW",
"delivery_format": "csv",
"columns_order": [
"attribution_attribution_action",
"attribution_campaign",
...,
"timestamp_utc"
]
}')
Your input "JSON" is not valid JSON, as indicated by the error message.
The first error is that a comma is missing after the key/value pair: "delivery_format": "csv", but there are others -- notably, JSON strings cannot be split across lines. Once you fix the key/value pair problem and the JSON strings that are split incorrectly, jq . will work with your text. (Note that once your input is corrected, the longest JSON string is quite short -- 50 characters or so -- whereas jq has no problems processing strings of length 10^8 quite speedily ...)
Generally, jq is rather permissive when it comes to JSON-like input, but if you're ever in doubt, it would make sense to use a validator such as the online validator at jsonlint.com
By the way, the jq FAQ does suggest various ways for handling input that isn't strictly JSON -- see https://github.com/stedolan/jq/wiki/FAQ#processing-not-quite-valid-json
Along the lines of chepner's suggestion since jq can read raw text data you could just use a jq filter to generate a legal json object from your script variables. For example:
#!/bin/bash
# whatever logic you have to obtain bash variables goes here
key=XXXXXXXXXX-7AC9-D655F83B4825
guid=XXXXXXXXXXXXXX
# now use jq filter to read raw text and construct legal json object
json_construct=$(jq -MRn '[inputs]|map(split(" ")|{(.[0]):.[1]})|add' <<EOF
api_key $key
app_guid $guid
EOF)
echo $json_construct
Sample Run (assumes executable script is in script.sh)
$ ./script.sh
{ "api_key": "XXXXXXXXXX-7AC9-D655F83B4825", "app_guid": "XXXXXXXXXXXXXX" }
Try it online!
For some extreme reason, I can't use jq or other cli tool. I need to extract the value of "name" from any json matching this puppet metadata.json. format.
the json might not be properly formatted and indented but will be valid. Meaning, white spaces, and line breaks, carriage backs might be inserted in eligible places.
Note that there could be "name" elements in dependencies array.
So, how to extract the value only using standard unix commands and/or shell script without installing any application like jq or other tools?
Thank you!!
{
"name": "examplecorp-mymodule",
"version": "0.0.1",
"author": "Pat",
"license": "Apache-2.0",
"summary": "A module for a thing",
"source": "https://github.com/examplecorp/examplecorp-mymodule",
"project_page": "https://forge.puppetlabs.com/examplecorp/mymodule",
"issues_url": "https://github.com/examplecorp/examplecorp-mymodule/issues",
"tags": ["things", "stuff"],
"operatingsystem_support": [
{
"operatingsystem":"RedHat",
"operatingsystemrelease":[ "5.0", "6.0" ]
},
{
"operatingsystem": "Ubuntu",
"operatingsystemrelease": [ "12.04", "10.04" ]
}
],
"dependencies": [
{ "name": "puppetlabs/stdlib", "version_requirement": ">=3.2.0 <5.0.0" },
{ "name": "puppetlabs/firewall", "version_requirement": ">= 0.0.4" }
]
}
It's ugly, awful, horrible, not structure-aware and will give you incorrect results if you have extra contents in your input file that look similar to what you're trying to find -- but...
#!/bin/bash
# ^- NOT /bin/sh; shell-native regexes are a bash extension
contents=$(<in.json)
if [[ $contents =~ '"name":'[[:space:]]*'"'([^\"]*)'"' ]]; then
echo "Found name: ${BASH_REMATCH[1]}"
fi
Now, let's talk about some of the ways this answer is broken (and using jq would be better):
It finds the first name, even if it's not one at an outer nesting layer. That is to say, if "dependencies": [ { "name": "puppetlabs/stdlib", "version_requirement": ">=3.2.0 <5.0.0" } ] comes before "name": "examplecorp-mymodule", guess which result is being found? (The easy workarounds to this would involve making assumptions about whitespace/formatting, and are thus not proof against all possible JSON expressions of the same data).
It won't unescape contents inside your name that require, well, unescaping (think about names containing symbols encoded as &foo;).
It isn't multibyte-character aware, and thus isn't guaranteed to emit output that aligns on codepoint boundaries.
If you have a name with an escaped \" subsequence... well, guess what happens there?
Etc. It's not quite as awful as trying to parse XML with regular expressions (JSON is easier!), but it's still quite a mess.
This should work for you:
jq '.name' metadata.json
Need your expertise here!
I am trying to load a JSON file (generated by JSON dumps) into redshift using copy command which is in the following format,
[
{
"cookieId": "cb2278",
"environment": "STAGE",
"errorMessages": [
"70460"
]
}
,
{
"cookieId": "cb2271",
"environment": "STG",
"errorMessages": [
"70460"
]
}
]
We ran into the error - "Invalid JSONPath format: Member is not an object."
when I tried to get rid of square braces - [] and remove the "," comma separator between JSON dicts then it loads perfectly fine.
{
"cookieId": "cb2278",
"environment": "STAGE",
"errorMessages": [
"70460"
]
}
{
"cookieId": "cb2271",
"environment": "STG",
"errorMessages": [
"70460"
]
}
But in reality most JSON files from API s have this formatting.
I could do string replace or reg ex to get rid of , and [] but I am wondering if there is a better way to load into redshift seamlessly with out modifying the file.
One way to convert a JSON array into a stream of the array's elements is to pipe the former into jq '.[]'. The output is sent to stdout.
If the JSON array is in a file named input.json, then the following command will produce a stream of the array's elements on stdout:
$ jq ".[]" input.json
If you want the output in jsonlines format, then use the -c switch (i.e. jq -c ......).
For more on jq, see https://stedolan.github.io/jq