XQuery: how to count all the "spells" in JSON

I have the following JSON file:
"spells": [
{
"spell":"Aberto",
"effect":"opens objects",
"_id":"5b74ebd5fb6fc0739646754c",
"type":"Charm"
},
{
"spell":"Accio",
"effect":"Summons an object",
"__v":0,
"_id":"5b74ecfa3228320021ab622b",
"type":"Charm"
},
{
"spell":"Age Line",
"effect":"Hides things from younger people",
"__v":0,
"_id":"5b74ed2f3228320021ab622c",
"type":"Enchantment"
},
{
"spell":"Aguamenti",
"effect":"shoots water from wand",
"__v":0,
"_id":"5b74ed453228320021ab622d",
"type":"Charm"
},
{
"spell":"Alarte Ascendare",
"effect":"shoots things high in the air",
"__v":0,
"_id":"5b74ed583228320021ab622e",
"type":"Spell"
}
}
Can you help me count all the spells with XQuery where "type" = "Spell", and separately all the spells where "type" = "Charm"? The JSON file is much bigger; I just didn't want to paste the whole file here. Thank you.

It seems like a straightforward grouping and counting task, then:
declare variable $spell-types as xs:string* external := ('Spell', 'Charm');
for $spell in ?spells?*[?type = $spell-types]
group by $t := $spell?type
return $t || ' : ' || count($spell)
https://xqueryfiddle.liberty-development.net/nc4P6y2
Or, as Michael Kay has pointed out, with a given sequence of values it suffices to use
for $spell-type in $spell-types
return $spell-type || ' : ' || count(?spells?*[?type = $spell-type])
https://xqueryfiddle.liberty-development.net/nc4P6y2/1
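Note that both snippets assume the parsed JSON map is the context item, as it is in the fiddles above. If you are loading the file yourself, a self-contained query might look like this (just a sketch; the file name and the use of fn:json-doc to load it are my assumptions):
declare variable $spell-types as xs:string* external := ('Spell', 'Charm');

(: load the JSON file as an XDM map; adjust the path to your file :)
let $data := fn:json-doc('spells.json')
for $spell in $data?spells?*[?type = $spell-types]
group by $t := $spell?type
return $t || ' : ' || count($spell)
With the excerpt above this reports Charm : 3 and Spell : 1 (in implementation-dependent order).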


how to parse more complex human-oriented text output to machine-friendly style?

This is a question about how to parse "unparseable" output into JSON, or into something easily consumable as JSON. This is a "little" bit beyond trivial stuff, so I'd like to know how you solve these things in principle; it's not about this specific example only. But as an example:
We have this command, which shows data about audio inputs:
pacmd list-sink-inputs
it prints something like this:
2 sink input(s) available.
index: 144
driver: <protocol-native.c>
flags:
state: RUNNING
sink: 4 <alsa_output.pci-0000_05_00.0.analog-stereo>
volume: front-left: 15728 / 24% / -37.19 dB, front-right: 15728 / 24% / -37.19 dB
balance 0.00
muted: no
current latency: 70.48 ms
requested latency: 210.00 ms
sample spec: float32le 2ch 44100Hz
channel map: front-left,front-right
Stereo
resample method: copy
module: 13
client: 245 <MPlayer>
properties:
media.name = "UNREAL! Tetris Theme on Violin and Guitar-TnDIRr9C83w.webm"
application.name = "MPlayer"
native-protocol.peer = "UNIX socket client"
native-protocol.version = "32"
application.process.id = "1543"
application.process.user = "mmucha"
application.process.host = "vbDesktop"
application.process.binary = "mplayer"
application.language = "C"
window.x11.display = ":0"
application.process.machine_id = "720184179caa46f0a3ce25156642f7a0"
application.process.session_id = "2"
module-stream-restore.id = "sink-input-by-application-name:MPlayer"
index: 145
driver: <protocol-native.c>
flags:
state: RUNNING
sink: 4 <alsa_output.pci-0000_05_00.0.analog-stereo>
volume: front-left: 24903 / 38% / -25.21 dB, front-right: 24903 / 38% / -25.21 dB
balance 0.00
muted: no
current latency: 70.50 ms
requested latency: 210.00 ms
sample spec: float32le 2ch 48000Hz
channel map: front-left,front-right
Stereo
resample method: speex-float-1
module: 13
client: 251 <MPlayer>
properties:
media.name = "Trombone Shorty At Age 13 - 2nd Line-k9YUi3UhEPQ.webm"
application.name = "MPlayer"
native-protocol.peer = "UNIX socket client"
native-protocol.version = "32"
application.process.id = "2831"
application.process.user = "mmucha"
application.process.host = "vbDesktop"
application.process.binary = "mplayer"
application.language = "C"
window.x11.display = ":0"
application.process.machine_id = "720184179caa46f0a3ce25156642f7a0"
application.process.session_id = "2"
module-stream-restore.id = "sink-input-by-application-name:MPlayer"
Very nice. But we don't want to show the user all of this; we just want to show index (the id of the input), application.process.id, application.name and media.name, in some reasonable format. It would be great to parse it somehow into JSON, but even if I preprocess it somehow, jq is quite complex and way beyond my capabilities. I tried multiple approaches using jq, with regex and without, but I wasn't able to finish it. And I guess we cannot rely on the order or presence of all fields.
I was able to get the work "done", but it's messy, inefficient, and in particular expects no semicolons in the media name or app name. Not an acceptable solution, but it's the only thing I was able to bring to the "end".
incorrect solution:
cat exampleOf2Inputs |
grep -e "index: \|application.process.id = \|application.name = \|media.name = " |
sed "s/^[ \t]*//;s/^\([^=]*\) = /\1: /" |
tr "\n" ";" |
sed "s/$/\n/;s/index:/\nindex:/g" |
tail -n +2 |
while read A; do
    index=$(echo $A|sed "s/^index: \([0-9]*\).*/\1/");
    pid=$(echo $A|sed 's/^.*application\.process\.id: \"\([0-9]*\)\".*$/\1/');
    appname=$(echo $A|sed 's/^.*application\.name: \"\([^;]*\)\".*$/\1/');
    medianame=$(echo $A|sed 's/^.*media\.name: \"\([^;]*\)\".*$/\"\1\"/');
    echo "pid=$pid index=$index appname=$appname medianame=$medianame";
done
I grepped the interesting part, replaced newlines with semicolons, split it back into one line per input, and just extracted the data multiple times using sed. Crazy.
The output is:
pid=1543 index=144 appname=MPlayer medianame="UNREAL! Tetris Theme on Violin and Guitar-TnDIRr9C83w.webm"
pid=2831 index=145 appname=MPlayer medianame="Trombone Shorty At Age 13 - 2nd Line-k9YUi3UhEPQ.webm"
which is easily convertible to any format, but the question was about JSON, so to:
[
  {
    "pid": 1543,
    "index": 144,
    "appname": "MPlayer",
    "medianame": "UNREAL! Tetris Theme on Violin and Guitar-TnDIRr9C83w.webm"
  },
  {
    "pid": 2831,
    "index": 145,
    "appname": "MPlayer",
    "medianame": "Trombone Shorty At Age 13 - 2nd Line-k9YUi3UhEPQ.webm"
  }
]
Now I'd like to see, please, how are these things done correctly.
If the input is as reasonable as shown in the question, the following approach, which only uses jq, should be possible.
An invocation along the following lines is assumed:
jq -nR -f parse.jq input.txt
where parse.jq contains:
def parse:
  def interpret:
    if . == null then .
    elif startswith("\"") and endswith("\"")
    then .[1:-1]
    else tonumber? // .
    end;
  (capture( "(?<key>[^\t:= ]*)(: | = )(?<value>.*)" ) // null)
  | if . then .value = (.value | interpret) else . end
;

# Construct one object for each "segment"
def construct(s):
  [ foreach (s, 0) as $kv (null;
      if $kv == 0 or $kv.index
      then .complete = .accumulator | .accumulator = $kv
      else .complete = null | .accumulator += $kv
      end;
      .complete // empty ) ]
;

construct(inputs | parse | select(.) | {(.key):.value})
| map( {pid: .["application.process.id"],
        index,
        appname: .["application.name"],
        medianame: .["media.name"]} )
With the example input, the output would be:
[
  {
    "pid": "1543",
    "index": 144,
    "appname": "MPlayer",
    "medianame": "UNREAL! Tetris Theme on Violin and Guitar-TnDIRr9C83w.webm"
  },
  {
    "pid": "2831",
    "index": 145,
    "appname": "MPlayer",
    "medianame": "Trombone Shorty At Age 13 - 2nd Line-k9YUi3UhEPQ.webm"
  }
]
Brief explanation
parse parses one line. It assumes that whitespace (blank and tab characters) on each line before the key name can be ignored.
construct is responsible for grouping the lines (presented as a stream of key-value single-key objects) corresponding to a particular value of “index”. It produces an array of objects, one for each value of “index”.
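To illustrate the intermediate stream (traced by hand; this paragraph is not part of the original answer): a line such as index: 144 is captured as key "index" and value "144", which interpret turns into the number 144, so parse followed by {(.key):.value} yields {"index":144}. A properties line such as application.process.id = "1543" keeps its value as the string "1543", because interpret only strips the surrounding quotes (which is why pid ends up as a string in the output above). Lines containing neither ": " nor " = " (for example balance 0.00 or Stereo) fail the capture, become null, and are dropped by select(.).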
I don't know about "correctly", but this is what I'd do:
pacmd list-sink-inputs | awk '
  BEGIN { print "[" }
  function print_record() {
    if (count++) {
      print " {"
      printf " %s,\n", print_number("pid")
      printf " %s,\n", print_number("index")
      printf " %s,\n", print_string("appname")
      printf " %s\n", print_string("medianame")
      print " },"
    }
    delete record
  }
  function print_number(key) { return sprintf("\"%s\": %d", key, record[key]) }
  function print_string(key) { return sprintf("\"%s\": \"%s\"", key, record[key]) }
  function get_quoted_value() {
    if (match($0, /[^"]+"$/))
      return substr($0, RSTART, RLENGTH-1)
    else
      return "?"
  }
  $1 == "index:" { print_record(); record["index"] = $2 }
  $1 == "application.process.id" { record["pid"] = get_quoted_value() }
  $1 == "application.name" { record["appname"] = get_quoted_value() }
  $1 == "media.name" { record["medianame"] = get_quoted_value() }
  END { print_record(); print "]" }
' |
tac | awk '/},$/ && !seen++ {sub(/,$/,"")} 1' | tac
where the tac|awk|tac line removes the trailing comma from the last JSON object in the list.
[
  {
    "pid": 1543,
    "index": 144,
    "appname": "MPlayer",
    "medianame": "UNREAL! Tetris Theme on Violin and Guitar-TnDIRr9C83w.webm"
  },
  {
    "pid": 2831,
    "index": 145,
    "appname": "MPlayer",
    "medianame": "Trombone Shorty At Age 13 - 2nd Line-k9YUi3UhEPQ.webm"
  }
]
You could just pipe your output into:
sed -E '
s/pid=([0-9]+) index=([0-9]+) appname=([^ ]+) medianame=(.*)/{"pid": \1, "index": \2, "appname": "\3", "medianame": \4},/
1s/^/[/
$s/,$/]/
' | jq .

Conditional expression in SnapLogic

I need to check whether or not an entry is present in the data output from a REST call. The JSON output looks something like this:
{
"entity": {
"entries":[
{
"ID": "1",
"Pipeline": "Pipeline_1",
"State":"Completed"
}
],
"duration":1074,
"create_time":"2010-10-10"
}
}
I want to check whether, for example, Pipeline_1 is missing; if it is, I want the pipeline to print 'Pipeline_1 is missing', and if not, null. I have tried using the ternary (?) expression:
!$Pipeline.contains ("Pipeline_1") ? "Pipeline_1 is missing" : null && !$Pipeline.contains ("Pipeline_2") ? "Pipeline_2 is missing" : null
I'm having problems with the syntax and I just can't get it right using this method, because it only processes the first query.
I have also tried using the match method, but haven't had success with it either:
match $Pipeline {
$Pipeline!=("Pipeline_1") => 'Pipeline_1 is missing',
$Pipeline!=("Pipeline_2") => 'Pipeline_2 is missing',
_ => 'All of the pipelines have been executed successfully'
}
I have to check for multiple conditions. Any suggestions on how I should nest the conditional expressions? Thank you in advance.
Assuming that you are not splitting the array $entity.entries[*] and are processing the incoming document as is, the following is a possible solution.
Input:
{
  "entity": {
    "entries": [
      {
        "ID": "1",
        "Pipeline": "Pipeline_1",
        "State": "Completed"
      }
    ],
    "duration": 1074,
    "create_time": "2010-10-10"
  }
}
Expression:
{
"Pipeline_1": $entity.entries.reduce((a, c) => c.Pipeline == "Pipeline_1" || a, false),
"Pipeline_2": $entity.entries.reduce((a, c) => c.Pipeline == "Pipeline_2" || a, false)
}.values().reduce((a, c) => c && a, true) ? "All pipelines executed successfully" : "Pipeline(s) missing"
Output: with the input above (which contains only Pipeline_1), the expression evaluates to "Pipeline(s) missing".
If you don't want to do it in a single expression, you can use a Conditional snap instead and then process its output as you please.

Lua json schema validator

I have been looking for over 4 days now, but I haven't been able to find much support or code for a Lua-based JSON schema compiler. Mainly I have been dealing with:
ljsonschema (https://github.com/jdesgats/ljsonschema)
rjson (https://luarocks.org/modules/romaboy/rjson)
But neither of the above has been straightforward to use.
After dealing with issues on LuaRocks, I finally got ljsonschema working, but the expected schema syntax looks different from a normal JSON structure. For example: equals signs in place of colons, no double quotes around key names, etc.
ljsonschema supports:
{ type = 'object',
  properties = {
    foo = { type = 'string' },
    bar = { type = 'number' },
  },
}
I require:
{ "type" : "object",
  "properties" : {
    "foo" : { "type" : "string" },
    "bar" : { "type" : "number" }
  }
}
With rjson there is an issue with the installation location itself. Though the installation goes fine, it is never able to find the .so file while running the lua code. Plus there is not much development support that I could find.
Please help point me in the right direction, in case I am missing something.
I have the JSON schema and a sample JSON; I just need Lua code to help write a program around it.
This is to write a custom JSON Validation Plugin for Kong CE.
UPDATED:
I would like the below code to work with ljsonschema:
local jsonschema = require 'jsonschema'
-- Note: do cache the result of schema compilation as this is a quite
-- expensive process
local myvalidator = jsonschema.generate_validator{
"type" : "object",
"properties" : {
"foo" : { "type" : "string" },
"bar" : { "type" : "number" }
}
}
print(myvalidator { "foo":"hello", "bar":42 })
But I get the error : '}' expected (to close '{' at line 5) near ':'
It looks like the arguments to generate_validator and myvalidator are Lua tables, not raw JSON strings. You'll want to parse the JSON first:
> jsonschema = require 'jsonschema'
> dkjson = require('dkjson')
> schema = [[
>> { "type" : "object",
>> "properties" : {
>> "foo" : { "type" : "string" },
>> "bar" : { "type" : "number" }}}
>> ]]
> s = dkjson.decode(schema)
> myvalidator = jsonschema.generate_validator(s)
>
> json = '{ "foo": "bar", "bar": 42 }'
> print(myvalidator(json))
false wrong type: expected object, got string
> print(myvalidator(dkjson.decode(json)))
true
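Putting that fix into the snippet from the update, a standalone (non-REPL) version might look like this sketch; it parses the schema with dkjson exactly as in the transcript above:
local jsonschema = require 'jsonschema'
local dkjson = require 'dkjson'

-- the schema from the question, kept as a plain JSON string
local schema = [[
{ "type" : "object",
  "properties" : {
    "foo" : { "type" : "string" },
    "bar" : { "type" : "number" }}}
]]

-- Note: do cache the result of schema compilation, it is quite expensive
local myvalidator = jsonschema.generate_validator(dkjson.decode(schema))

print(myvalidator(dkjson.decode('{ "foo": "hello", "bar": 42 }')))  -- prints: true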
OK, I think rapidjson turned out to be helpful:
Refer to the link
Here is a sample of working code:
local rapidjson = require('rapidjson')

function readAll(file)
    local f = assert(io.open(file, "rb"))
    local content = f:read("*all")
    f:close()
    return content
end

local jsonContent = readAll("sampleJson.txt")
local sampleSchema = readAll("sampleSchema.txt")

local sd = rapidjson.SchemaDocument(sampleSchema)
local validator = rapidjson.SchemaValidator(sd)
local d = rapidjson.Document(jsonContent)

local ok, message = validator:validate(d)
if ok then
    print("json OK")
else
    print(message)
end
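For completeness (an assumption on my part, mirroring the schema and sample from earlier in the question): sampleSchema.txt would hold the JSON Schema, e.g.
{ "type" : "object",
  "properties" : {
    "foo" : { "type" : "string" },
    "bar" : { "type" : "number" }}}
and sampleJson.txt would hold the instance to validate, e.g. { "foo": "hello", "bar": 42 }. With those two files in place the script should print json OK.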

PYTHON3 | Parsing JSON | index out of range | can't access the 1st element of an array

I have this JSON (I'm not giving you the whole thing because it's freaking long, but you don't need the rest):
cve" : {
"data_type" : "CVE",
"data_format" : "MITRE",
"data_version" : "4.0",
"CVE_data_meta" : {
"ID" : "CVE-2018-9991",
"ASSIGNER" : "cve#mitre.org"
},
"affects" : {
"vendor" : {
"vendor_data" : [ {
"vendor_name" : "frog_cms_project",
"product" : {
"product_data" : [ {
"product_name" : "frog_cms",
"version" : {
"version_data" : [ {
"version_value" : "0.9.5"
} ]
}
} ]
}
} ]
}
},
What I want to do is to print the vendor name of this cve.
So, what I did is :
import json

with open("nvdcve-1.0-2018.json", "r") as file:
    data = json.load(file)

increment = 0
number_cve = data["CVE_data_numberOfCVEs"]
while increment < int(number_cve):
    print(data['CVE_Items'][increment]['cve']['CVE_data_meta']['ID'])
    print(',')
    print(data['CVE_Items'][increment]['cve']['affects']['vendor']['vendor_data'][0]['vendor_name'])
    print("\n")
    increment += 1
The reason I used a while loop is that there are a lot of CVEs in the JSON file, which is why I index with data['CVE_Items'][increment]['cve'] (and this part works fine; the line print(data['CVE_Items'][increment]['cve']['CVE_data_meta']['ID']) works well).
My error is in the print(data['CVE_Items'][increment]['cve']['affects']['vendor']['vendor_data'][0]['vendor_name']) line: Python returns a list index out of range error.
But if I'm reading this JSON correctly, vendor_data is an array with one element, so vendor_name is ['vendor_data'][0]['vendor_name'], isn't it?
The only way I found to get the vendor_name is:
for value in data['CVE_Items'][a]['cve']['affects']['vendor']['vendor_data']:
    print(value['vendor_name'])
instead of print (data['CVE_Items'][increment]['cve']['affects']['vendor']['vendor_data'][0]['vendor_name'])
And doing a for loop for just one iteration is pretty disgusting :s, but at least value is the data['CVE_Items'][a]['cve']['affects']['vendor']['vendor_data'][0] that I wanted....
Does anyone know something about this?
Make sure every CVE_Item has a vendor_data entry.
Example:
import json

with open("nvdcve-1.0-2018.json", "r") as file:
    data = json.load(file)

increment = 0
number_cve = data["CVE_data_numberOfCVEs"]
while increment < int(number_cve):
    print(data['CVE_Items'][increment]['cve']['CVE_data_meta']['ID'])
    print(',')
    if len(data['CVE_Items'][increment]['cve']['affects']['vendor']['vendor_data']) > 0:
        print(data['CVE_Items'][increment]['cve']['affects']['vendor']['vendor_data'][0]['vendor_name'])
    print("\n")
    increment += 1
Thanks to Ron Nabuurs' answer I found that my vendor_data arrays do not always have a vendor_name entry. That is why the for loop works and the direct print does not
(the for loop simply iterates over whatever is there, so it does nothing when the array is empty, instead of failing).
So what I did is :
try:
    print(data['CVE_Items'][increment]['cve']['affects']['vendor']['vendor_data'][0]['vendor_name'])
    print(',')
except:
    pass
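As an aside, here is a minimal sketch of the same idea without the manual counter or the bare except; it assumes only the structure shown in the question, where vendor_data can be an empty list:
import json

with open("nvdcve-1.0-2018.json", "r") as file:
    data = json.load(file)

for item in data['CVE_Items']:
    cve_id = item['cve']['CVE_data_meta']['ID']
    vendor_data = item['cve']['affects']['vendor']['vendor_data']
    # vendor_data can be empty, so guard before indexing into it
    vendor_name = vendor_data[0]['vendor_name'] if vendor_data else None
    print(cve_id, ',', vendor_name)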

Parsing JSON from Google Distance Matrix API with Corona SDK

So I'm trying to pull data from a JSON string (as seen below). When I decode the JSON using the code below, and then attempt to index the duration text, I get a nil return. I have tried everything and nothing seems to work.
Here is the Google Distance Matrix API JSON:
{
  "destination_addresses" : [ "San Francisco, CA, USA" ],
  "origin_addresses" : [ "Seattle, WA, USA" ],
  "rows" : [
    {
      "elements" : [
        {
          "distance" : {
            "text" : "1,299 km",
            "value" : 1299026
          },
          "duration" : {
            "text" : "12 hours 18 mins",
            "value" : 44303
          },
          "status" : "OK"
        }
      ]
    }
  ],
  "status" : "OK"
}
And here is my code:
local json = require ("json")
local http = require("socket.http")
local myNewData1 = {}
local SaveData1 = function (event)
    distanceReturn = ""
    distance = ""
    local URL1 = "http://maps.googleapis.com/maps/api/distancematrix/json?origins=Seattle&destinations=San+Francisco&mode=driving&&sensor=false"
    local response1 = http.request(URL1)
    local data2 = json.decode(response1)
    if response1 == nil then
        native.showAlert( "Data is nill", { "OK"})
        print("Error1")
        distanceReturn = "Error1"
    elseif data2 == nill then
        distanceReturn = "Error2"
        native.showAlert( "Data is nill", { "OK"})
        print("Error2")
    else
        for i = 1, #data2 do
            print("Working")
            print(data2[i].rows)
            for j = 1, #data2[i].rows, 1 do
                print("\t" .. data2[i].rows[j])
                for k = 1, #data2[i].rows[k].elements, 1 do
                    print("\t" .. data2[i].rows[j].elements[k])
                    for g = 1, #data2[i].rows[k].elements[k].duration, 1 do
                        print("\t" .. data2[i].rows[k].elements[k].duration[g])
                        for f = 1, #data2[i].rows[k].elements[k].duration[g].text, 1 do
                            print("\t" .. data2[i].rows[k].elements[k].duration[g].text)
                            distance = data2[i].rows[k].elements[k].duration[g].text
                            distanceReturn = data2[i].rows[k].elements[k].duration[g].text
                        end
                    end
                end
            end
        end
    end
end
timer.performWithDelay (100, SaveData1, 999999)
Your loops are not correct. Try this shorter solution.
Replace your whole "for i = 1, #data2 do" loop with the one below:
print("Working")
for i,row in ipairs(data2.rows) do
for j,element in ipairs(row.elements) do
print(element.duration.text)
end
end
This question was solved on Corona Forums by Rob Miracle (http://forums.coronalabs.com/topic/47319-parsing-json-from-google-distance-matrix-api/?hl=print_r#entry244400). The solution is simple:
"JSON and Lua tables are almost identical data structures. In this case your table data2 has top level entries:
data2.destination_addresses
data2.origin_addresses
data2.rows
data2.status
Now data2.rows is another table that is indexed by numbers (the [] brackets); here there is only one of them, but it's still an array entry:
data2.rows[1]
Then inside of it is another numerically indexed table called elements.
So far, to get to the element (again there is only one of them), you have:
data2.rows[1].elements[1]
then it's just accessing the remaining elements:
data2.rows[1].elements[1].distance.text
data2.rows[1].elements[1].distance.value
data2.rows[1].elements[1].duration.text
data2.rows[1].elements[1].duration.value
There is a great table printing function called print_r which can be found in the community code which is great for dumping tables like this to see their structure."
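If you don't want to dig up the community print_r, a minimal stand-in along these lines (just a sketch, not the community function) is enough to dump a decoded table like data2 and see its structure:
-- Minimal print_r-style helper: recursively prints the keys and values of a table.
local function dumpTable(t, indent)
    indent = indent or ""
    for k, v in pairs(t) do
        if type(v) == "table" then
            print(indent .. tostring(k) .. ":")
            dumpTable(v, indent .. "  ")
        else
            print(indent .. tostring(k) .. " = " .. tostring(v))
        end
    end
end

-- e.g. dumpTable(data2) right after json.decode(response1)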