JQ Capitalize first letter of each word - json

I have a large JSON file that I am using jq to pare down to only those elements I need. I have that working, but some of the values are strings in all caps. Unfortunately, while jq has ascii_downcase and ascii_upcase, it does not have a built-in function for uppercasing only the first letter of each word.
I need to perform this only on brand_name and generic_name, while ensuring that the manufacturer name is also first-letter capitalized, with the exception of things like LLC, which should remain capitalized.
Here's my current jq statement:
jq '.results[] | select(.openfda.brand_name != null or .openfda.generic_name != null or .openfda.rxcui != null) | select(.openfda|has("rxcui")) | {brand_name: .openfda.brand_name[0], generic_name: .openfda.generic_name[0], manufacturer: .openfda.manufacturer_name[0], rxcui: .openfda.rxcui[0]}' filename.json > newfile.json
This is a sample output:
{
  "brand_name": "VELTIN",
  "generic_name": "CLINDAMYCIN PHOSPHATE AND TRETINOIN",
  "manufacturer": "Almirall, LLC",
  "rxcui": "882548"
}
I need the output to be:
{
  "brand_name": "Veltin",
  "generic_name": "Clindamycin Phosphate And Tretinoin",
  "manufacturer": "Almirall, LLC",
  "rxcui": "882548"
}

Suppose we are given an array of words that are to be left as is, e.g.:
def exceptions: ["LLC", "USA"];
We can then define a capitalization function as follows:
# Capitalize all the words in the input string other than those specified by exceptions:
def capitalize:
  INDEX(exceptions[]; .) as $e
  | [splits("\\b") | select(length > 0)]
  | map(if $e[.] then . else (.[:1] | ascii_upcase) + (.[1:] | ascii_downcase) end)
  | join("");
For example, given "abc-DEF ghi USA" as input, the result would be "Abc-Def Ghi USA".
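The filter can then be applied to the question's objects along these lines (a sketch; each affected field is simply piped through capitalize):
.brand_name |= capitalize
| .generic_name |= capitalize
| .manufacturer |= capitalize
With the sample input this yields "Veltin", "Clindamycin Phosphate And Tretinoin", and "Almirall, LLC", the last being protected by the exceptions list.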

Split at space characters to get an array of words, then split again at the empty string to get an array of characters. For the inner array, use ascii_downcase on all elements but the first, then put all back together using add on the inner and join with a space character on the outer array.
(.brand_name, .generic_name) |= (
  (. / " ") | map(. / "" | .[1:] |= map(ascii_downcase) | add) | join(" ")
)
{
  "brand_name": "Veltin",
  "generic_name": "Clindamycin Phosphate And Tretinoin",
  "manufacturer": "Almirall, LLC",
  "rxcui": "882548"
}
To ignore certain words from being processed, capture them with an if condition:
map_values((. / " ") | map(
  if IN("LLC", "AND") then .
  else . / "" | .[1:] |= map(ascii_downcase) | add end
) | join(" "))
{
  "brand_name": "Veltin",
  "generic_name": "Clindamycin Phosphate AND Tretinoin",
  "manufacturer": "Almirall, LLC",
  "rxcui": "882548"
}

Related

Is there a way to filter a JSON object using jq to only include those with a key matching a value from a known list?

I have a JSON array, and another text file that contains a list of values.
[
  {
    "key": "foo",
    "detail": "bar"
  },
  ...
]
I need to filter the array elements to only those that have a "key" value that is found in the list of values.
The list of values is a text file containing a single item per-line.
foo
baz
Is this possible to do using jq?
You can use the following:
jq --rawfile to_keep_file to_keep.txt '
( [ $to_keep_file | match(".+"; "g").string | { (.): true } ] | add ) as $to_keep_lkup |
map(select($to_keep_lkup[.key]))
' to_filter.json
or
(
jq -sR . to_keep.txt
cat to_filter.json
) | jq -n '
( [ input | match(".+"; "g").string | { (.): true } ] | add ) as $to_keep_lkup |
inputs | map(select($to_keep_lkup[.key]))
'
The former requires jq v1.6, the first version to provide --rawfile.
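To see what the lookup object looks like, here is a small sketch using the sample list above:
$ jq -n '[ "foo\nbaz" | match(".+"; "g").string | { (.): true } ] | add'
{
  "foo": true,
  "baz": true
}
select($to_keep_lkup[.key]) then keeps exactly those array elements whose "key" value appears in this object.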

jq - converting json to csv - how to treat "null" as string?

I have the following json file which I would like to convert to csv:
{
  "id": 1,
  "date": "2014-05-05T19:07:48.577"
}
{
  "id": 2,
  "date": null
}
Converting it to csv with the following jq produces:
$ jq -sr '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv' < test.json
"date","id"
"2014-05-05T19:07:48.577",1
,2
Unfortunately, for the line with "id" equal to "2", the date column was not set to "null" - instead, it was empty. This in turn makes MySQL error on import if it's a datetime column (it expects a literal "null" if we don't have a date, and errors on "").
How can I make jq print the literal "null", and not ""?
I'd go with:
(map(keys_unsorted) | add | unique) as $cols
| $cols,
  (.[] | [.[$cols[]]] | map(. // "null"))
| @csv
First, using keys_unsorted avoids useless sorting.
Second, [.[$cols[]]] is an important, recurrent and idiomatic pattern, used to ensure an array is constructed in the correct order without resorting to the reduce sledge-hammer.
Third, although map(. // "null") seems to be appropriate here, it should be noted that this expression will also replace false with "null", so, it would not be appropriate in general. Instead, to preserve false, one could write map(if . == null then "null" else . end).
Fourth, it should be noted that using map(. // "null") as above will also mask missing values of any of the keys, so if one wants some other behavior (e.g., raising an error if id is missing), then an alternative approach would be warranted.
The above assumes the stream of JSON objects shown in the question is "slurped", e.g. using jq's -s command-line option.
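Putting the pieces together, a complete invocation might look like this (a sketch, assuming the two sample objects are in test.json):
$ jq -sr '(map(keys_unsorted) | add | unique) as $cols
          | $cols, (.[] | [.[$cols[]]] | map(. // "null"))
          | @csv' test.json
"date","id"
"2014-05-05T19:07:48.577",1
"null",2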
Use // as alternative operator for your cell value:
jq -sr '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.] // "null")) as $rows | $cols, $rows[] | @csv' < test.json
(The whole string is pretty well explained here: https://stackoverflow.com/a/32965227/16174836)
You can "stringify" the value using tostring by changing map($row[.]) into map($row[.]|tostring):
$ cat so2332.json
{
  "id": 1,
  "date": "2014-05-05T19:07:48.577"
}
{
  "id": 2,
  "date": null
}
$ jq --slurp --raw-output '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.]|tostring)) as $rows | $cols, $rows[] | @csv' so2332.json
"date","id"
"2014-05-05T19:07:48.577","1"
"null","2"
Note that the use of tostring will cause the numbers to be converted to strings.

jq: error (at <stdin>:0): Cannot iterate over string, cannot execute unique problem

We are trying to parse a JSON file into a TSV file. We are having problems trying to eliminate duplicate Ids with unique.
JSON file
[
  {"Id": "101", "Name": "Yugi"},
  {"Id": "101", "Name": "Yugi"},
  {"Id": "102", "Name": "David"}
]
cat getEvent_all.json | jq -cr '.[] | [.Id] | unique_by(.[].Id)'
jq: error (at :0): Cannot iterate over string ("101")
A reasonable approach would be to use unique_by, e.g.:
unique_by(.Id)[]
| [.Id, .Name]
| @tsv
Alternatively, you could form the pairs first:
map([.Id, .Name])
| unique_by(.[0])[]
| @tsv
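Assuming the corrected JSON shown above is in getEvent_all.json, the first variant can be run like so (a sketch):
$ jq -r 'unique_by(.Id)[] | [.Id, .Name] | @tsv' getEvent_all.json
101	Yugi
102	David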
uniques_by/2
For very large arrays, though, or if you want to respect the original ordering, a sort-free alternative to unique_by should be considered. Here is a suitable, generic, stream-oriented alternative:
def uniques_by(stream; f):
  foreach stream as $x ({};
    ($x|f) as $s
    | ($s|type) as $t
    | (if $t == "string" then $s else ($s|tostring) end) as $y
    | if .[$t][$y] then .emit = false
      else .emit = true | (.item = $x) | (.[$t][$y] = true)
      end;
    if .emit then .item else empty end);
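For example, to deduplicate the sample array by Id while preserving the original order (a sketch, with uniques_by defined as above):
uniques_by(.[]; .Id)
| [.Id, .Name]
| @tsv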

Filter empty and/or null values with jq

I have a file with jsonlines and would like to find empty values.
{"name": "Color TV", "price": "1200", "available": ""}
{"name": "DVD player", "price": "200", "color": null}
I would like to output the empty and/or null values together with their keys:
available: ""
color: null
I think it should be something like cat myexample | jq '. | select(. == "")', but it is not working.
The tricky part here is emitting the keys without quotation marks in a way that the empty string is shown with quotation marks. Here is one solution that works with jq's -r command-line option:
to_entries[]
| select(.value | . == null or . == "")
| if .value == "" then .value |= "\"\(.)\"" else . end
| "\(.key): \(.value)"
Given the sample input above, the output is exactly as specified.
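A complete invocation might look like this (a sketch, assuming the two sample lines are in a file named myexample):
$ jq -r 'to_entries[]
         | select(.value | . == null or . == "")
         | if .value == "" then .value |= "\"\(.)\"" else . end
         | "\(.key): \(.value)"' myexample
available: ""
color: null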
Some people may find the following jq program more useful for identifying keys with null or empty string values:
with_entries(select(.value |.==null or . == ""))
With the sample input, this program would produce:
{"available":""}
{"color":null}
Adding further information, such as the input line or object number, would also make sense, e.g. perhaps:
with_entries(select(.value | . == null or . == ""))
| select(length > 0)
| {n: input_line_number} + .
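With the two-line sample and jq's -c option, this would produce something like:
{"n":1,"available":""}
{"n":2,"color":null}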
With a single with_entries(if .value == null or .value == "" then empty else . end) filter expression it's possible to filter out null and empty ("") values.
Without filtering:
echo '{"foo": null, "bar": ""}' | jq '.'
{
  "foo": null,
  "bar": ""
}
With filtering:
echo '{"foo": null, "bar": ""}' | jq 'with_entries(if .value == null or .value == "" then empty else . end)'
{}
Take a look at this snippet https://blog.nem.ec/code-snippets/jq-ignore-nulls/
jq -r '.firstName | select( . != null )' file.json

jq iterate over an array of values a subset at a time

I have JSON (that actually starts as CSV) in the form of an array of elements of the form:
{
  "field1" : "value1",
  "field2.1; Field2.2; Field2.3" : "Field2.1Value0; Field2.2Value0; Field2.3Value0; Field2.1Value1; Field2.2Value1; Field2.3Value1; ..."
}
...
I would like to iterate over the string value of the "field2.1; Field2.2; Field2.3" field, three ";"-separated items at a time, to produce an array of key-value pairs:
{
  "field1" : "value1",
  "newfield" : [
    {
      "Field2.1": "Field2.1Value0",
      "Field2.2": "Field2.2Value0",
      "Field2.3": "Field2.3Value0"
    },
    {
      "Field2.1": "Field2.1Value1",
      "Field2.2": "Field2.2Value1",
      "Field2.3": "Field2.3Value1"
    },
    ...
  ]
}
...
Note that there are actually a couple of keys that need to be expanded like this, each with a variable number of "sub-keys".
In other words, the original CSV file contains some columns that represent tuples of field values separated by semicolons.
I know how to get down to the "field2.1; Field2.2; Field2.3" value and, say, split it on ";", but then I'm stuck trying to iterate through the result three (or however many) items at a time to produce the separate 3-tuples.
The real-world context is the format of the CSV from a catalog export from the Google Play Store.
For example, Field2.1 is Locale, Field2.2 is Title and Field2.3 is Description:
jq '."Locale; Title; Description" |= split(";") '
If possible, it would be nice if the iteration were based on the number of semicolon-separated "subfields" in the key. There is another column that has a similar format for the price in each country.
The following assumes the availability of splits/1 for splitting a string based on a regex. If your jq does not have it, and if you cannot or don't want to upgrade, you could devise a workaround using split/1, which only works on strings.
First, let's start with a simple variant of the problem that does not require recycling the headers. If the following jq program is in a file (say program.jq):
# Assuming headers is an array of strings,
# create an object from an array of values:
def objectify(headers):
  . as $in
  | reduce range(0; headers|length) as $i ({}; .[headers[$i]] = $in[$i]);

# From an object of the form {key: _, value: _},
# construct an object by splitting each _ at semicolons:
def devolve:
  if .key | index(";")
  then .key as $key
  | [.value | splits("; *")] | objectify([$key | splits("; *")])
  else { (.key): .value }
  end;
to_entries | map( devolve )
and if the following JSON is in input.json:
{
  "field1" : "value1",
  "field2.1; Field2.2; Field2.3" : "Field2.1Value0; Field2.2Value0; Field2.3Value0"
}
then the invocation:
jq -f program.jq input.json
should yield:
[
  {
    "field1": "value1"
  },
  {
    "field2.1": "Field2.1Value0",
    "Field2.2": "Field2.2Value0",
    "Field2.3": "Field2.3Value0"
  }
]
It might make sense to add some error-checking or error-correcting code.
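For example, objectify could verify that the number of values matches the number of headers (a sketch, not part of the original program; the error message is illustrative):
def objectify(headers):
  . as $in
  | if length != (headers|length)
    then error("objectify: \(length) values but \(headers|length) headers")
    else reduce range(0; headers|length) as $i ({}; .[headers[$i]] = $in[$i])
    end;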
Recycling the headers
Now let's modify the above so that headers will be recycled in accordance with the problem statement.
def objectifyRows(headers):
  (headers|length) as $m
  | (length / $m) as $n
  | . as $in
  | reduce range(0; $n) as $i ( [];
      .[$i] = (reduce range(0; $m) as $h ({};
        .[headers[$h]] = $in[($i * $m) + $h] )) );

def devolveRows:
  if .key | index(";")
  then .key as $key
  | [.value | splits("; *")]
  | objectifyRows([$key | splits("; *")])
  else { (.key): .value }
  end;

to_entries | map( devolveRows )
With input:
{
  "field1" : "value1",
  "field2.1; Field2.2; Field2.3" :
    "Field2.1Value0; Field2.2Value0; Field2.3Value0; Field2.4Value0; Field2.5Value0; Field2.6Value0"
}
the output would be:
[
  {
    "field1": "value1"
  },
  [
    {
      "field2.1": "Field2.1Value0",
      "Field2.2": "Field2.2Value0",
      "Field2.3": "Field2.3Value0"
    },
    {
      "field2.1": "Field2.4Value0",
      "Field2.2": "Field2.5Value0",
      "Field2.3": "Field2.6Value0"
    }
  ]
]
This output can now easily be tweaked along the lines suggested by the OP, e.g. to introduce a new key, one could pipe the above into:
.[0] + { newfield: .[1] }
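Piping the output above into that filter yields:
{
  "field1": "value1",
  "newfield": [
    {
      "field2.1": "Field2.1Value0",
      "Field2.2": "Field2.2Value0",
      "Field2.3": "Field2.3Value0"
    },
    {
      "field2.1": "Field2.4Value0",
      "Field2.2": "Field2.5Value0",
      "Field2.3": "Field2.6Value0"
    }
  ]
}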
Functional definitions
Here are reduce-free but efficient (assuming jq >= 1.5) implementations of objectify and objectifyRows:
def objectify(headers):
  [headers, .] | transpose | map( {(.[0]): .[1]} ) | add;

def objectifyRows(headers):
  def gather(n):
    def g: if length > 0 then .[0:n], (.[n:] | g) else empty end;
    g;
  [gather(headers|length) | objectify(headers)];
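As a quick sanity check of the chunking (a sketch, with these definitions in scope):
["a1", "b1", "a2", "b2"] | objectifyRows(["x", "y"])
# => [{"x":"a1","y":"b1"},{"x":"a2","y":"b2"}]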
Here is my almost-final solution; it inserts the new key and uses the first element of the ";"-separated list as the key for sorting the array.
def objectifyRows(headers):
  (headers|length) as $m
  | (headers[0]) as $firstkey
  | (length / $m) as $n
  | . as $in
  | reduce range(0; $n) as $i ( [];
      .[$i] = (reduce range(0; $m) as $h ({};
        .[headers[$h]] = $in[($i * $m) + $h] )) );

def devolveRows:
  if .key | index(";")
  then .key as $multikey
  | [.value | splits("; *")]
  # Create a new key whose value is an array of the "splits"
  | { ($multikey): objectifyRows([$multikey | splits("; *")]) }
  # here "arbitrarily" sort by the first split key
  | .[$multikey] |= sort_by(.[[$multikey | splits("; *")][0]])
  else { (.key): .value }
  end;

to_entries | map( devolveRows )