Mike Bostock's ndjson-split discards data?

I've gone through Mike Bostock's excellent tutorials on Command-Line Cartography and I'm confused by his use of his ndjson-split utility. That program is used to split up an array of objects in a json file, putting each object in the array on a single line. (Reference: https://github.com/mbostock/ndjson-cli)
In Part Two of the tutorial (https://medium.com/@mbostock/command-line-cartography-part-2-c3a82c5c0f3#.624i8b4iy) Mike uses ndjson-split on a GeoJSON file:
ndjson-split 'd.features' \
< ca-albers.json \
> ca-albers.ndjson
He explains:
The output here looks underwhelmingly similar to the ca-albers.json we
saw previously; the only difference is that there is one feature (one
census tract) per line.
However, it seems there is another big difference. The new file does not contain all of the data that was in the original file. Specifically, the start of the original JSON object, {"type":"FeatureCollection" ... is gone.
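For illustration, here is a minimal run (the two-feature input is invented for the example; it assumes ndjson-cli is installed):
echo '{"type":"FeatureCollection","features":[{"type":"Feature","id":1},{"type":"Feature","id":2}]}' \
  | ndjson-split 'd.features'
# {"type":"Feature","id":1}
# {"type":"Feature","id":2}
The FeatureCollection wrapper is dropped; only the elements of the features array survive, one per line.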
Mike doesn't explain why this additional key is not needed in the GeoJSON file (the resulting files work perfectly).
Anyone know why? Is this key not needed for valid GeoJSON?
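For what it's worth, the wrapper is straightforward to rebuild when a downstream tool does want a full FeatureCollection; ndjson-cli's ndjson-reduce inverts the split (a sketch, with a made-up output filename):
ndjson-reduce 'p.features.push(d), p' '{type: "FeatureCollection", features: []}' \
  < ca-albers.ndjson \
  > ca-albers-rebuilt.json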

Is it possible to extract from a map JSON file a list of a city's neighborhoods in a tree-structure format?

Forgive my ignorance, I am not experienced with JSON files. I've been trying to get a tree-structured list of all the neighborhoods and locations in the city of Cape Town, and this seems to be my last resort.
Unfortunately, I can't even open the file that can be found on this website - http://odp.capetown.gov.za/datasets/official-suburbs?geometry=18.107%2C-34.187%2C19.034%2C-33.988
Could someone tell me if it's possible to extract such a list?
I'd be forever thankful if someone could help me. Thank you in advance.
[I am making my comments an answer since I see no other suggestions and no information provided]
I am on a Unix/Linux shell, but the following tools can also be found for Windows. My solution for getting a quick list would be:
curl https://opendata.arcgis.com/datasets/8ebcd15badfe40a4ab759682aacf8439_75.geojson |\
jq '.features | .[] | .properties.OFC_SBRB_NAME'
Which gives you:
"HYDE PARK"
"SPRINGFIELD"
"NIEUW MAASTRECHT-2"
"CHARLESVILLE"
"WILDWOOD"
"MALIBU VILLAGE"
"TUSCANY GLEN"
"VICTORIA MXENGE"
"KHAYELITSHA"
"CASTLE ROCK"
"MANSFIELD INDUSTRIA"
...
Explanation:
curl https://... - curl downloads the JSON file from the API you are using
jq: can process JSON in the terminal and extract information. I do this in three steps:
.features: the GeoJSON format seems to have a standard schema; all the returned entries are in the features array
.[]: returns all elements of the array (see the jq manual)
.properties.OFC_SBRB_NAME: each element of the array has a field called "properties" which, from my understanding, carries/includes the metadata of the entry. One of those properties is OFC_SBRB_NAME, which looks like a name and is the only string in each element, so I extract this.
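A small refinement (my addition, not part of the command above): jq's -r flag emits the raw strings without quotes, which makes the list easier to post-process, and sort -u gives a sorted, de-duplicated listing:
curl -s https://opendata.arcgis.com/datasets/8ebcd15badfe40a4ab759682aacf8439_75.geojson \
  | jq -r '.features[].properties.OFC_SBRB_NAME' \
  | sort -u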
Hope it helps. If you add more detail as to which platform or language you are using, I can update the answer; the methodology should remain the same, I think.

Does anyone have a script to convert a Chrome Bookmarks file with [sub]*folders into a CSV file?

I want to be able to do Vimdiffs and Vimfolds on Bookmarks files that have been converted to CSV files, i.e. with one description and one URI per line. However, because the Bookmarks file has multiple levels of folders, the CSV file will also need fields for the different levels of folder names on each line.
I am new to jq but it seems like it should be able to do this sort of conversion?
Thanks,
Phil.
Have you tried any free tools like https://json-csv.com/
or json2csv: https://www.npmjs.com/package/json2csv ?
If neither of those works, perhaps try the following approach.
When I need to restructure data, I write a set of loops that resolve each property I want on each line of my CSV. Let's say my JSON has Name, Email, and Phone, but for some reason they all live at different object levels.
First write a loop that resolves Name, then a loop for Email, and one for Phone. At the end of the first loop call the second, and from the second call the third.
Then you can use jq -n, which lets you create JSON with no input.
So building each record would look something like jq -n --arg name "$Name" '{NewName: $name}', using --arg to pass the shell variable in safely.
Once you have clean JSON with all data points at the same level, CSV conversion is smooth.
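For the Chrome Bookmarks case specifically, a recursive jq filter can do the flattening in a single pass instead of chained loops. A minimal sketch, assuming Chrome's usual Bookmarks layout (a roots object whose folder nodes carry type/name/children and whose leaves have type "url"); the column choice and output filename are my assumptions:
jq -r '
  # walk the bookmark tree, accumulating the folder path
  def rec($path):
    if .type == "url"
    then [$path, .name, .url] | @csv          # one bookmark per CSV row
    else (.name // "") as $n | .children[]? | rec($path + "/" + $n)
    end;
  .roots[] | select(type == "object") | rec("")
' Bookmarks > bookmarks.csv
Here the folder levels are joined into one path field; emitting them as separate columns is a variation on the same recursion.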
Hope this helps

Oracle SQLcl: Spool to JSON, only include content in items array?

I'm making a query via Oracle SQLcl. I am spooling into a .json file.
The correct data is presented from the query, but the format is strange.
Starting off as:
SET ENCODING UTF-8
SET SQLFORMAT JSON
SPOOL content.json
Followed by a query, this produces a JSON file as requested.
However, how do I remove the outer structure, meaning this part:
{"results":[{"columns":[{"name":"ID","type":"NUMBER"},
{"name":"LANGUAGE","type":"VARCHAR2"},{"name":"LOCATION","type":"VARCHAR2"},{"name":"NAME","type":"VARCHAR2"}],"items": [
// Here is the actual data I want to see in the file exclusively
]
I only want to spool everything in the items array, not including that key itself.
Is this possible to set as a parameter before querying? Reading the Oracle docs has not yielded any answers, hence asking here.
That's how I handle this.
After output to some file, I use the jq command to recreate the file with only the items:
cat file.json | jq --compact-output --raw-output '.results[0].items' > items.json
Using this tool: https://stedolan.github.io/jq/
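As far as I know there is no SQLcl setting that strips the wrapper at spool time, so the post-processing step above is the whole trick. An end-to-end sketch (the connection string and script name are placeholders):
sql -S user/pass@mydb @export_content.sql    # runs the SET/SPOOL/query script above
jq --compact-output '.results[0].items' content.json > items.json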

Parsing JSON output from TheHive

I need to automatically move new cases (TheHive-Project) to LimeSurvey every 5 minutes. I have figured out the basis of the API script to add responses to LimeSurvey. However, I can't figure out how to add only new cases, and how to parse the Hive case data for the information I want to add.
So far I've been using curl to get a list of cases from hive. The following is the command and the output.
curl -su user:pass http://myhiveIPaddress:9000/api/case
[{"createdBy":"charlie","owner":"charlie","createdAt":1498749369897,"startDate":1498749300000,"title":"test","caseId":1,"user":"charlie","status":"Open","description":"testtest","tlp":2,"tags":[],"flag":false,"severity":1,"metrics":{"Time for Alert to Handler Pickup":2,"Time from open to close":4,"Time from compromise to discovery":6},"updatedBy":"charlie","updatedAt":1498751817577,"id":"AVz0bH7yqaVU6WeZlx3w","_type":"case"},{"createdBy":"charlie","owner":"charlie","title":"testtest","caseId":3,"description":"ddd","user":"charlie","status":"Open","createdAt":1499446483328,"startDate":1499446440000,"severity":2,"tlp":2,"tags":[],"flag":false,"id":"AV0d-Z0DqHSVxnJ8z_HI","_type":"case"},{"createdBy":"charlie","owner":"charlie","createdAt":1499268177619,"title":"test test","user":"charlie","status":"Open","caseId":2,"startDate":1499268120000,"tlp":2,"tags":[],"flag":false,"description":"s","severity":1,"metrics":{"Time from open to close":2,"Time for Alert to Handler Pickup":3,"Time from compromise to discovery":null},"updatedBy":"charlie","updatedAt":1499268203235,"id":"AV0TWOIinKQtYP_yBYgG","_type":"case"}]
Each field is separated by the delimiter },{.
In regards to parsing out specific information from each case, I previously tried to just use the cut command. This mostly worked until I reached "metrics"; it doesn't always work for metrics because they will not always be listed in the same order.
I have asked my boss for help, and he told me this command might get me going in the right direction to adding only new hive cases to the survey, but I'm still very lost and want to avoid asking too much again.
curl -su user:pass http://myhiveIPaddress:9000/api/case | sed 's/},{/\n/g' | sed 's/\[{//g' | sed 's/}]//g' | awk -F '"caseId":' '{print $2}' | cut -f 1 -d , | sort -n | while read line; do echo '"caseId":'$line; done
Basically, I'm in way over my head and feel like I have no idea what I'm doing. If I need to clarify anything, or if it would help for me to post what I have so far in my API script, please let me know.
Update
Here is the potential logic for the script I'd like to write (a sketch implementing it follows the list):
get the list of hive cases (curl ...)
read each case, delimited by },{
while reading each case, check /tmp/addedHiveCases to see if its caseId already exists
--> if it does not exist in the file, add the case to LimeSurvey and append the caseId to /tmp/addedHiveCases
--> if it does exist, skip to the next case
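A minimal bash sketch of that logic, letting jq handle the parsing instead of splitting on },{ (the limesurvey_add step is a hypothetical placeholder for your LimeSurvey API call):
seen=/tmp/addedHiveCases
touch "$seen"
# jq -c '.[]' prints one compact case object per line
curl -su user:pass http://myhiveIPaddress:9000/api/case | jq -c '.[]' |
while read -r case; do
  id=$(printf '%s' "$case" | jq -r '.caseId')
  # only push cases we have not recorded yet
  if ! grep -qxF "$id" "$seen"; then
    limesurvey_add "$case"    # hypothetical: your LimeSurvey API call
    printf '%s\n' "$id" >> "$seen"
  fi
done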
Why are you thinking that the fields are separated by a "},{" delimiter?
The response of the /api/case API is valid JSON that lists the cases.
Can you use a Python script to play with the API? If yes, I can help you write the script you need.

Apache NiFi - store lines into 1 file

Using Apache NiFi, I created a flow that reads a JSON file and splits it line by line in order to verify that the content is correct. After that I have 2 outputs: 1 for successful lines and 2 for unsuccessful ones, and the output is a JSON file.
For the moment, all the lines are stored in separate files, but what I want to do is store each "good" line in one file and each "bad" one in another.
What processor should I use?
The RouteText processor was designed for exactly what you are trying to do. It allows you to route lines of text to different relationships based on expressions you create. It bundles the lines from each FlowFile together for each relationship.
You can see the documentation for it here: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.RouteText/index.html
You can get an example template (doing almost exactly what you would like to do) using RouteText here: https://github.com/hortonworks-gallery/nifi-templates/blob/master/templates/SplitRouteMergeVsRouteText.xml
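For comparison only, here is the same split-validate-route idea expressed outside NiFi, in plain shell with jq as the per-line validity check (a sketch; in the actual flow RouteText does this for you, with whatever validity expression your flow already uses):
while IFS= read -r line; do
  if printf '%s\n' "$line" | jq empty >/dev/null 2>&1; then
    printf '%s\n' "$line" >> good.ndjson    # line parses as JSON
  else
    printf '%s\n' "$line" >> bad.ndjson     # line does not parse
  fi
done < input.json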