Convert plain text with a specific format into JSON in Vim

All my university notes are in JSON format, and when I get a set of practical questions from a PDF, they are formatted like this:
1. Download and compile the code. Run the example to get an understanding of how it works. (Note that both
threads write to the standard output, and so there is some mixing up of the two conceptual streams, but this
is an interface issue, not of concern in this course.)
2. Explore the classes SumTask and StringTask as well as the abstract class Task.
3. Modify StringTask.java so that it also writes out “Executing a StringTask task” when the execute() method is
called.
4. Create a new subclass of Task called ProdTask that prints out the product of a small array of int. (You will have
to add another option in TaskGenerationThread.java to allow the user to generate a ProdTask for the queue.)
Note: you might notice strange behaviour with a naïve implementation of this and an array of int that is larger
than 7 items with numbers varying between 0 (inclusive) and 20 (exclusive); see ProdTask.java in the answer
for a discussion.
5. Play with the behaviour of the processing thread so that it polls more frequently and a larger number of times,
but “pop()”s off only the first task in the queue and executes it.
6. Remove the “taskType” member variable definition from the abstract Task class. Then add statements such as
the following to the SumTask class definition:
private static final String taskType = "SumTask";
Investigate what “static” and “final” mean.
7. More challenging: write an interface and modify the SumTask, StringTask and ProdTask classes so that they
implement this interface. Here’s an example interface:
What I would like to do is copy it into vim and execute a find and replace to convert it into this:
"1": {
"Task": "Download and compile the code. Run the example to get an understanding of how it works. (Note that both threads write to the standard output, and so there is some mixing up of the two conceptual streams, but this is an interface issue, not of concern in this course.)",
"Solution": ""
},
"2": {
"Task": "Explore the classes SumTask and StringTask as well as the abstract class Task.",
"Solution": ""
},
"3": {
"Task": "Modify StringTask.java so that it also writes out “Executing a StringTask task” when the execute() method is called.",
"Solution": ""
},
"4": {
"Task": "Create a new subclass of Task called ProdTask that prints out the product of a small array of int. (You will have to add another option in TaskGenerationThread.java to allow the user to generate a ProdTask for the queue.) Note: you might notice strange behaviour with a naïve implementation of this and an array of int that is larger than 7 items with numbers varying between 0 (inclusive) and 20 (exclusive); see ProdTask.java in the answer for a discussion.",
"Solution": ""
},
"5": {
"Task": "Play with the behaviour of the processing thread so that it polls more frequently and a larger number of times, but “pop()”s off only the first task in the queue and executes it.",
"Solution": ""
},
"6": {
"Task": "Remove the “taskType” member variable definition from the abstract Task class. Then add statements such as the following to the SumTask class definition: private static final String taskType = 'SumTask'; Investigate what “static” and “final” mean.",
"Solution": ""
},
"7": {
"Task": "More challenging: write an interface and modify the SumTask, StringTask and ProdTask classes so that they implement this interface. Here’s an example interface:",
"Solution": ""
}
After trying to figure this out during the practical (instead of actually doing the practical) this is the closest I got:
%s/\([1-9][1-9]*\)\. \(\_.\{-}\)--end--/"\1": {\r "Task": "\2",\r"Solution": "" \r},/g
The 3 problems with this are:
1. I have to add --end-- to the end of each question. I would like it to know when the question ends by looking ahead to a line which starts with [1-9][1-9]*. Unfortunately, when I search for that, it also replaces that part.
2. This keeps all the newlines within the question (which is invalid in JSON). I would like it to remove the newlines.
3. The last entry should not have a "," after it, because that would also be invalid JSON. (I don't mind this very much, as it is easy to remove the last "," manually.)
Please keep in mind that I am very bad at regular expressions, and one of the reasons I am doing this is to learn more about regex, so please explain any regex you post as a solution.

In two steps:
%s/\n/\ /g
to solve problem 2, and then:
%s/\([1-9][1-9]*\)\. \(\_.\{-}\([1-9][1-9]*\. \|\%$\)\@=\)/"\1": {\r "Task": "\2",\r"Solution": "" \r},\r/g
to solve problem 1.
You can solve problem 3 with another replace round. Also, my solution inserts an unwanted extra space at the end of the task entries. Try to remove it yourself.
Short explanation of what I have added:
\|: or;
\%$: end of file;
\@=: find but don't include in the match (a zero-width lookahead).
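The same lookahead idea can be tried outside Vim. Below is a minimal Python sketch, assuming the questions are available as one string; the function name questions_to_json is hypothetical. Python's (?=...) plays the role of Vim's \@=, re.S lets . cross newlines like \_., and \Z stands in for \%$. Because the result is built as a dict and serialized with json.dumps, it also takes care of problems 2 and 3 (newlines are collapsed, there is no trailing comma) and escapes any double quotes inside a task.

```python
import json
import re

def questions_to_json(text):
    # "N. body" up to, but not including, the next "N. " at the start of a line,
    # or the end of the input. (?=...) is a zero-width lookahead like Vim's \@=.
    pattern = re.compile(r"^(\d+)\.\s+(.*?)(?=^\d+\.\s|\Z)", re.M | re.S)
    result = {}
    for number, body in pattern.findall(text):
        # Collapse embedded newlines and repeated spaces into single spaces.
        task = " ".join(body.split())
        result[number] = {"Task": task, "Solution": ""}
    # ensure_ascii=False keeps characters such as “ ” and ’ readable.
    return json.dumps(result, indent=4, ensure_ascii=False)
```

Unlike the fragment in the question, the output is wrapped in { } and is therefore a complete JSON object.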

If each item sits on a single line, I would transform the text with a macro, which is shorter and more straightforward than :s:
I"<esc>f.s": {<enter>"Task": "<esc>A",<enter>"Solution": ""<enter>},<esc>+
Record this macro in a register, like q; then you can just replay it with something like 100@q to do the transformation.
Note that the result will leave a trailing comma at the end; just remove it.
You can also add indentation while recording the macro, so that your JSON is "pretty printed", or you can make it pretty later with another tool.

You could probably do this with one large regular expression, but that quickly becomes unmaintainable. I would break the task up into 3 steps instead:
1. Separate each numbered step into its own paragraph.
2. Put each paragraph on its own line.
3. Generate the JSON.
Taken together:
%s/^[0-9]\+\./\r&/
%s/\(\S\)\n\(\S\)/\1 \2/
%s/^\([0-9]\+\)\. *\(.*\)/"\1": {\r "Task": "\2",\r "Solution": ""\r},/
This solution also leaves a comma after the last element. This can be removed with:
$s/,//
Explanation
%s/^[0-9]\+\./\r&/ this matches a line starting with a number followed by a dot, e.g. 1., 8., 13., 131., etc., and replaces it with a newline (\r) followed by the match (&).
%s/\(\S\)\n\(\S\)/\1 \2/ this removes any newline that is flanked by non-white-space characters (\S).
%s/^\([0-9]\+\)\. *\(.*\) ... capture the number and text in \1 and \2.
... /"\1": {\r "Task": "\2",\r "Solution": ""\r},/ format text appropriately.
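For experimenting with the three steps outside Vim, here is a rough Python translation; it is a sketch rather than an exact port, and the function name three_step_convert is hypothetical. One deliberate difference: step 2 uses a look-behind and a look-ahead instead of two capture groups, because re.sub never re-examines characters it has already consumed, so with capture groups a run like "a\nb\nc" would only be half joined. Step 3 builds a dict and serializes it with json.dumps instead of writing the JSON text by hand, so there is no trailing comma to remove.

```python
import json
import re

def three_step_convert(text):
    # Step 1: insert a newline before every "N." at the start of a line,
    # turning each numbered question into its own paragraph.
    text = re.sub(r"^(?=\d+\.)", "\n", text, flags=re.M)
    # Step 2: join lines whose break is flanked by non-space characters.
    # The look-arounds keep the flanking characters out of the match.
    text = re.sub(r"(?<=\S)\n(?=\S)", " ", text)
    # Step 3: every non-empty line is now "N. task text"; build the JSON.
    result = {}
    for line in text.splitlines():
        m = re.match(r"(\d+)\. *(.*)", line)
        if m:
            result[m.group(1)] = {"Task": m.group(2), "Solution": ""}
    return json.dumps(result, indent=4, ensure_ascii=False)
```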
Alternative way using sed, awk and jq
You can perform steps one and two from above straightforwardly with sed and awk:
sed 's/^[0-9]\+\./\n&/' infile
awk '$1=$1; { print "\n" }' RS= ORS=' '
Using jq for the third step ensures that the output is valid JSON:
jq -R 'match("([0-9]+)\\. *(.*)") | .captures | {(.[0].string): { "Task": (.[1].string), "Solution": "" } }'
Here as one command line:
sed 's/^[0-9]\+\./\n&/' infile |
awk '$1=$1; { print "\n" }' RS= ORS=' ' |
jq -R 'match("([0-9]+)\\. *(.*)") | .captures | {(.[0].string): { "Task": (.[1].string), "Solution": "" } }'

Related

jq filter to ignore values in select statement based on array values

Given the following JSON input:
{
"hostname": "server1.domain.name\nserver2.domain.name\n*.gtld.net",
"protocol": "TCP",
"port": "8080\n8443\n9500-9510",
"component": "Component1",
"hostingLocation": "DC1"
}
I would like to obtain the following JSON output:
{
"hostname": [
"server1.domain.name",
"server2.domain.name",
"*.gtld.net"
],
"protocol": "TCP",
"port": [
"8080-8080",
"8443-8443",
"9500-9510"
],
"component": "Component1",
"hostingLocation": "DC1"
}
Considering:
1. That the individual values in the port array may or may not be separated by a - character (I have no control over this).
2. That if an individual value in the port array does not contain the - separator, I need to add it and then repeat the value after the - separator. For example, 8080 becomes 8080-8080, 8443 becomes 8443-8443, and so forth.
3. And finally, that if a value in the port array is already of the format value-value, I should simply leave it unmodified.
I've been banging my head against this filter all afternoon, after reading many examples both here and in the official jq online documentation. I simply can't figure out how to accommodate consideration #3 above.
The filter I have now:
{hostname: .hostname | split("\n"), protocol: .protocol, port: .port | split("\n") | map(select(. | contains("-") | not)+"-"+.), component: .component, hostingLocation: .hostingLocation}
yields the following output JSON:
{
"hostname": [
"server1.domain.name",
"server2.domain.name",
"*.gtld.net"
],
"protocol": "TCP",
"port": [
"8080-8080",
"8443-8443"
],
"component": "Component1",
"hostingLocation": "DC1"
}
As you can see above, I subsequently lose the 9500-9510 value as it already contains the - string which my filter weeds out.
If my logic does not fail me, I would need to stick an if statement within my select statement to conditionally only send array values that do not contain the string - to my select statement but leave array values that do contain the separator untouched. However, I cannot seem to figure this last piece out.
I will happily accept any alternative filter that yields the desired output, however I am also really keen on understanding where my logics fails in the above filter.
Thanks in advance to anyone spending their valuable time helping me out!
/Joel
First, we split the hostname string by a newline character (.hostname /= "\n") and do the same with the port string (.port /= "\n"). Actually, we can combine these identical operations into one: (.hostname, .port) /= "\n"
Next, for every element of the port array (.port[]) we split by any non-digit character (split("[^\\d]";"g")) resulting in an array of digit-only strings, from which we take the first element (.[0]), then a dash sign, and finally either the second element, if present, otherwise the first one again (.[1]//.[0])
With your input in a file called input.json, the following should convert it into the desired format:
jq '
(.hostname, .port) /= "\n" |
.port[] |= (split("[^\\d]";"g") | "\(.[0])-\(.[1]//.[0])")
' input.json
Regarding your considerations:
1. As we split at any non-digit character, it makes no difference what other character separates the values of a port range. If more than one character could separate them (e.g. an arrow ->, or a dash sign with spaces before and after it), simply replace the regex [^\\d] with [^\\d]+ to capture more than one non-digit character.
2. and 3. We always produce a range by including a dash sign and a second value, which, depending on whether a second item is present, is either that item or the first one again.
Regarding your approach:
Inside map you used select which evaluates to empty if the condition (contains("-") | not) is not met. As "9500-9510" does indeed contain a dash sign, it didn't survive. An if statement inside the select statement wouldn't help because even if select doesn't evaluate to empty it still doesn't modify anything, it just reproduces its input unchanged. Therefore, if select is letting through both cases (containing and not containing dash signs) it becomes useless. You could, however, work with an if statement outside of the select statement, but I considered the above solution as a simpler approach.
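To make the split-and-rejoin logic concrete, here is a rough Python equivalent of the jq filter, a sketch assuming the input has already been parsed into a dict (for example with json.load); the function name normalize_ports is hypothetical. re.split(r"\D+", ...) corresponds to split("[^\\d]+";"g"), and the conditional expression mirrors .[1]//.[0].

```python
import re

def normalize_ports(record):
    out = dict(record)  # shallow copy; the other fields pass through unchanged
    # (.hostname, .port) /= "\n": turn both newline-separated strings into lists.
    out["hostname"] = record["hostname"].split("\n")
    ports = []
    for port in record["port"].split("\n"):
        # Like split("[^\\d]+";"g"): split on runs of non-digit characters.
        parts = re.split(r"\D+", port)
        low = parts[0]
        # Like .[1]//.[0]: the second number if there is one, else the first again.
        high = parts[1] if len(parts) > 1 else low
        ports.append(f"{low}-{high}")
    out["port"] = ports
    return out
```

As in the jq version, no value is filtered out: every port goes through the same transformation, and a value that is already a range simply reproduces itself.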

Display an object if a nephew element array contains a value

Select objects based on value of variable in object using jq
That shows how to return values directly above the selection criteria but how would I get another object that was adjacent to a value above my selection criteria?
Given the data below, what jq invocation would return the French name of planets whose moon(s) have been spoiled? (this is a structural reproduction of the live data with which I am working -- which actually uses the word "value" in this way, so that's not helping)
{"kind":"solarsystem","name":"Sol",
"Planets": [
{ "kind":"habitable",
"names": { "english":"Earth","french":"Terre"},
"satellites" : [
{"name":"The Moon",
"parameters": [
{"name":"diameter", "intValue":"3476"},
{"name":"diameter_units", "value":"km"},
{"name":"unspoiled","value":"no"}]}]},
{"kind":"uninhabitable",
"names": {"english":"Mars","french":"Mars"},
"satellites" : [
{"name":"Phobos",
"parameters": [
{"name":"diameter", "intValue":"2200"},
{"name":"diameter_units", "value":"m"},
{"name":"unspoiled","value":"yes"}]},
{"name":"Deimos",
"parameters": [
{"name":"diameter", "intValue":"1200"},
{"name":"diameter_units", "value":"m"},
{"name":"unspoiled","value":"yes"}]}]}]}
The program below selects planets whose moons have all been spoiled. As each parameter is a name-value pair, we can use from_entries to transform the array of parameters into an object and retrieve the unspoiled status with just .unspoiled, and thus avoid another select to find the parameter we're interested in.
.Planets[] | select(.satellites | all(.parameters | from_entries .unspoiled == "no")) .names.french
If a single spoiled moon is enough, change all to any.
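The all/any plus from_entries idea carries over to other languages. Below is a minimal Python sketch, assuming the document has been loaded with json.load into a dict shaped like the data above; french_names is a hypothetical name. The dict comprehension roughly does what from_entries does (name becomes the key, value the value, and parameters without a "value", such as the intValue entries, map to None). Passing any instead of the default all gives the "single spoiled moon is enough" variant.

```python
def french_names(system, check=all):
    names = []
    for planet in system["Planets"]:
        spoiled = []
        for moon in planet["satellites"]:
            # Roughly from_entries: [{"name": k, "value": v}, ...] -> {k: v, ...}
            params = {p["name"]: p.get("value") for p in moon["parameters"]}
            spoiled.append(params.get("unspoiled") == "no")
        # check is all (every moon spoiled) or any (at least one spoiled moon).
        if check(spoiled):
            names.append(planet["names"]["french"])
    return names
```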
And here, also a solution for the same JSON query using an alternative tool (jtc):
In the simplest form, the following will do:
bash $ <file.json jtc -w'[value]:<no>:[-5][names][french]'
"Terre"
However, that solution will return the planet's French name once for each matching moon; e.g., for unspoiled moons it would give this:
bash $ <file.json jtc -w'[value]:<yes>:[-5][names][french]'
"Mars"
"Mars"
bash $
For the case when there are multiple moons but the name is required only once, strengthen the query like this (showcasing unspoiled moons here):
bash $ <file.json jtc -w'<satellites>l:[value]:<yes>[-5][names][french]'
"Mars"
bash $
PS. I'm a developer of the jtc unix JSON processor.
PPS. The above disclaimer is required by SO.
Update:
the answer was updated based on a discussion in the comments with @oguzismail to strengthen the structural relationship between the value and french labels, so that other (irrelevant) value matches won't trigger false positives.
If, by chance, the structural relation [-5][names] is not enough, the query can be strengthened further by inserting <unspoiled>[-1] before the [value]... lexeme.

How do I search for a string in this JSON with Python

My JSON file looks something like:
{
"generator": {
"name": "Xfer Records Serum",
....
},
"generator": {
"name": "Lennar Digital Sylenth1",
....
}
}
I ask the user for a search term, and the input is searched for in the name key only. All matching results are returned, so if I input just 's', both of the above would be returned. Please also explain how to return the names of all the objects that are generators. The simpler the method, the better. I use the json library, but if another library is required, that's not a problem.
Before switching to JSON I tried XML but it did not work.
If your goal is just to search all name properties, this will do the trick:
import re
def search_names(term, lines):
    # Matches a line such as   "name": "Xfer Records Serum",   (case-insensitive)
    name_search = re.compile(r'^\s*"name"\s*:\s*"(.*' + re.escape(term) + r'.*)",?$', re.I)
    return [m.group(1) for m in (name_search.search(line) for line in lines) if m]
with open('path/to/your.json') as f:
    lines = f.readlines()
print(search_names('s', lines))
which would return both names you listed in your example.
The search_names() function builds a regular expression that matches any line starting with "name": " (with a varying amount of whitespace), followed by your search term with any other characters around it, then terminated with " followed by an optional , and the end of the string. The term is passed through re.escape(), so characters such as . or * in it are matched literally, and the pattern is written as a raw string so that \s reaches the regex engine unchanged. The function then applies the pattern to each line from the file, filters out the non-matching lines, and returns the value of the name property (the capture group contents) for each match.
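The regex approach works line by line and ignores the JSON structure. For the second part of the question (returning the names of the objects that are generators), here is a minimal sketch using only the standard json library. It assumes the real file is valid JSON (the "...." placeholders in the question are not) and that, as in the sample, the top-level object repeats the "generator" key. A plain json.load would keep only the last duplicate, so object_pairs_hook is used to receive every (key, value) pair; the function name generator_names is hypothetical.

```python
import json

def generator_names(text, term=""):
    # object_pairs_hook receives each object's (key, value) pairs as a list, so
    # duplicate "generator" keys survive instead of overwriting each other.
    top = json.loads(text, object_pairs_hook=lambda pairs: pairs)
    names = []
    for key, value in top:
        if key != "generator":
            continue
        # Nested objects are lists of pairs too; dict() turns one back into a dict.
        name = dict(value).get("name", "")
        # Case-insensitive substring search; an empty term matches every name.
        if term.lower() in name.lower():
            names.append(name)
    return names
```

Calling generator_names(text) with no term lists every generator name; generator_names(text, "s") performs the search from the question.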

How to parse multidimensional JSON array in bash using jsawk?

I have an array like the one below. I want to parse the entire data into a bash array,
so I can get the first addressLineOne from ${bashaddr[0]}, and so on.
[
{
"id":"f0c546d5-0ce4-55ee-e043-516e0f0afdc1",
"cardType":"WMUSGESTORECARD",
"lastFour":"1682",
"cardExpiryDate":"2012-01-16",
"firstName":"robert",
"lastName":"robishaw",
"addressLineOne":"Apt venue",
"addressLineTwo":"",
"city":"oakdale",
"state":"CT",
"postalCode":"06370",
"phone":"534534",
"isDefault":false
},
{
"id":"f0c546d5-0ce0-55ee-e043-516e0f0afdc1",
"cardType":"MASTERCARD",
"lastFour":"2731",
"cardExpiryDate":"2009-08-31",
"firstName":"robert",
"lastName":"robishaw",
"addressLineOne":"119 maple ave.",
"addressLineTwo":"",
"city":"uncasville",
"state":"CT",
"postalCode":"06382",
"phone":"7676456",
"isDefault":false
},
{
"id":"f0c546d5-0ce2-55ee-e043-516e0f0afdc1",
"cardType":"MASTERCARD",
"lastFour":"6025",
"cardExpiryDate":"2011-08-31",
"firstName":"robert",
"lastName":"robishaw",
"addressLineOne":"Angeline Street",
"addressLineTwo":"",
"city":"oakdale",
"state":"CT",
"postalCode":"06370",
"phone":"7867876",
"isDefault":false
}
]
I have tried like this:
#!/bin/bash
addressLineOne="$(echo $card | jsawk 'return this.addressLineOne')"
but it gives me all of the addresses at once:
["address 1","address 2","address 3"]
Thank you.
I wrote the answer below before reading the comments, but it is exactly the same answer as @4ae1e1 provided, except that I don't use the -r flag, in case you want the values to remain quoted (e.g. when passing them as an argument somewhere else).
I know this is not jsawk, but do consider jq:
jq '.[].addressLineOne' yourfile.txt
To access a specific value, put the record number in the square brackets (starting with 0 for the first address, and so on). For example, to get the address from the third record:
jq '.[2].addressLineOne' yourfile.txt
For learning more about jq and advanced uses, check: http://jqplay.org
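If it helps to see what .[] and .[2] select, here is the same extraction as a small Python sketch, assuming the array is read from a string; address_line_one is a hypothetical name. Like jq, Python indexes from 0, so the third card is at index 2.

```python
import json

def address_line_one(text):
    # Like jq '.[].addressLineOne': one value per card, in file order.
    return [card["addressLineOne"] for card in json.loads(text)]
```

address_line_one(text)[2] then corresponds to jq '.[2].addressLineOne'.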
What you need to do is make use of the -a switch to apply some post processing and filter the output array like this:
jsawk 'return this.addressLineOne' -a 'return this[0]'
From the documentation:
-b <script> | -a <script>
Run the specified snippet of JavaScript before (-b) or after (-a)
processing JSON input. The `this` object is set to the whole JSON
array or object. This is used to preprocess (-b) or postprocess
(-a) the JSON array before or after the main script is applied.
This option can be specified multiple times to define multiple
before/after scripts, which will be applied in the order they
appeared on the command line.

Referencing JSON elements in AppleScript

I have a JSON result I am trying to work with in AppleScript, but because the top-level items are "unnamed", I can only access them by wrapping the item reference, which in this case is a number, in pipes (e.g. |1|). As a result, I can't iterate through them; they have to be hard coded (scroll down to the last code sample to see what I mean).
For example, this is the JSON I'm looking at:
{
"1": {
"name": "Tri 1"
},
"2": {
"name": "Tri 2"
},
"3": {
"name": "Tri 3"
},
"4": {
"name": "Orb Dave"
},
"5": {
"name": "Orb Fah"
}
}
With the help of JSON Helper I get the JSON to a more usable format (for AppleScript).
{|3|:{|name|:"Tri 3"}, |1|:{|name|:"Tri 1"}, |4|:{|name|:"Orb Dave"}, |2|:{|name|:"Tri 2"}, |5|:{|name|:"Orb Fah"}}
I can then use this code to get a list of "lights" (the objects in question):
set lights to (every item in theReturn) as list
repeat with n from 1 to count of lights
set light to item n of lights
log n & light
end repeat
From that, I get:
(*1, Tri 3*)
(*2, Tri 1*)
(*3, Orb Dave*)
(*4, Tri 2*)
(*5, Orb Fah*)
You may notice the result is not in the desired order. The index is the index within the list of lights, not the number that appears at the top of the object. If you look at the top two pre-formatted areas, you'll see that items 1, 2 and 3 are Tri 1, Tri 2 and Tri 3. The output is accurate for the list itself: Tri 3 does come first, Tri 1 second, and an Orb third.
What I need to do is find a way to iterate through the JSON in any order (sorted or not) and be able to line up "1" with "Tri 1", "3" with "Tri 3" and "5" with "Orb Fah". But I can't find ANY way to interact with the returned JSON that lets me reference the third light and return its name. The ONLY way I seem to be able to do it is to hard code the light indexes, like this:
log |name| of |1| of theReturn
log |name| of |2| of theReturn
log |name| of |3| of theReturn
log |name| of |4| of theReturn
log |name| of |5| of theReturn
which gives me the correct light with the correct name:
(*Tri 1*)
(*Tri 2*)
(*Tri 3*)
(*Orb Dave*)
(*Orb Fah*)
I'm thinking the problem arises because the light ID doesn't have a descriptor of sorts. I can't change that, but I need to iterate through them programmatically. Hard coding them as above is not acceptable.
Any help would be appreciated
You are dealing with a list of records here, not a list of lists. Records are key/value pairs. They do not have indexes like a list. That makes it easy if you know the keys because you just ask for the one you want. And your records have records inside them so you have 2 layers of records. Therefore if you want the value of the |name| record corresponding to |3| record then ask for it as you've discovered...
set jsonRecord to {|3|:{|name|:"Tri 3"}, |1|:{|name|:"Tri 1"}, |4|:{|name|:"Orb Dave"}, |2|:{|name|:"Tri 2"}, |5|:{|name|:"Orb Fah"}}
set record3name to |name| of |3| of jsonRecord
The downside of records in AppleScript is that there is no command to find the record keys. Other programming languages (like Objective-C) give you the tools to find the keys, but AppleScript does not. You have to know them ahead of time and use them as I showed.
If you don't know the keys ahead of time, you can either use JSON Helper to give you the results in a different form, or use a different programming language (Python, Ruby, etc.) to extract the information from the records.
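As an illustration of the "different programming language" route, here is a minimal Python sketch, assuming the raw JSON text is available to it (how it is handed over from AppleScript, e.g. through do shell script, is left out); lights_in_order is a hypothetical name. Python dicts expose their keys, so the light IDs can be read and sorted numerically instead of being hard coded.

```python
import json

def lights_in_order(text):
    lights = json.loads(text)
    # The keys are strings ("1", "2", ...); sort them as integers so "10" follows "9".
    return [(int(key), lights[key]["name"]) for key in sorted(lights, key=int)]
```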
One other option you have is to just use the json text itself without using JSON Helper. For example, if you have the json as text then you can extract the information using standard applescript commands for text objects. Your json text has the information you want on the 3rd line, the 6th, 9th etc. You could use that to your advantage and do something like this...
set jsonText to "{
\"1\": {
\"name\": \"Tri 1\"
},
\"2\": {
\"name\": \"Tri 2\"
},
\"3\": {
\"name\": \"Tri 3\"
},
\"4\": {
\"name\": \"Orb Dave\"
},
\"5\": {
\"name\": \"Orb Fah\"
}
}"
set jsonList to paragraphs of jsonText
set namesList to {}
set AppleScript's text item delimiters to ": \""
repeat with i from 3 to count of jsonList by 3
set theseItems to text items of (item i of jsonList)
set end of namesList to text 1 through -2 of (item 2 of theseItems)
end repeat
set AppleScript's text item delimiters to ""
return namesList
For each index, loop through all the items in the list looking for the one whose name matches the index:
tell application "System Events"
-- Convert the JSON file to a property list using plutil.
do shell script "plutil -convert xml1 /Users/mxn/Desktop/tri.json -o /Users/mxn/Desktop/tri.plist"
-- Read in the plist
set theItems to every property list item of property list file "/Users/mxn/Desktop/tri.plist"
set theLights to {}
-- Iterate once per item in the plist.
repeat with i from 1 to count of theItems
set theName to i as text
-- Find the item whose name is the current index.
repeat with theItem in theItems
if theItem's name is theName then
-- We found it, so add it to the results.
set theValue to theItem's value
copy {i, theValue's |name|} to the end of theLights
-- Move on to the next index.
exit repeat
end if
end repeat
end repeat
return theLights
end tell
Result:
{{1, "Tri 1"}, {2, "Tri 2"}, {3, "Tri 3"}, {4, "Orb Dave"}, {5, "Orb Fah"}}
Ideally, instead of the nested loop, we’d be able to say something like this:
set theName to i as text
set theItem to (the first item in theItems whose name is theName)
But unfortunately that produces an error.
This solution also demonstrates an alternative to JSON Helper: you can convert the JSON file to a property list using the handy plutil command line tool and use System Events' built-in support for property lists.