Retrieve value from object in Javascript in XPATH - html

I need to extract information from HTML files. For most of them, I just need to match a particular DOM element's content or attribute, so I use XPATH expressions like //a[@class="targeturl"]/@href and the command-line tool xidel.
In a different batch of files the information I want is in a script, not so readily available:
<html>
<head><!-- ... --></head>
<body>
...
<script>
...
var o = {
"numeric": 1234,
"target": "TARGET",
"urls": "http://example.com",
// Commented pair "strings": "...",
"arrays": [
{
"more": true
}
,
{
"itgoeson": true
}
]
};
</script>
...
</body>
</html>
Note that the object containing the value I want to get is not valid JSON. However, it seems to respect one key-value pair per line.
What can I pass to xidel --xpath "???" to get this TARGET?
I've tried different things with XPATH functions, but I can't get to a solution without piping to other commands (matches tells me yes/no, replace works line by line, etc.).

Try the XPath below:
substring-before(substring-after(//script, '"target": '), ",")

What can I pass to xidel --xpath "???" to get this TARGET?
Since var o is actually JSON, I suggest you treat it as such:
-e "json(
//script/extract(
.,
'var o = (.+);',
1,'s'
)[.]
)/target"
Extract {"field1": 1234, "target": "TARGET", "morefields": "..."} from the <script> element node (the json covers several lines, so don't forget the 's' regex-flag).
Interpret the output as json by wrapping json( ) around it (or //script/...[.] ! json(.)) and select the target attribute.
[edit]
To remove the comments (beginning with //):
-e "json(
//script/replace(
extract(
.,
'var o = (.+);',
1,'s'
)[.],
'\s+//.+',
''
)
)/target"
Not the prettiest query, but it works.
[/edit]
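For comparison, the same extract, strip, and parse steps can be sketched outside xidel. This is a minimal Python sketch and not part of the original answer: the extract_target helper and the inline page sample are hypothetical, and it assumes the object is assigned to var o and that every comment is preceded by whitespace, as in the question.

```python
import json
import re

# Hypothetical sample mirroring the <script> block from the question.
page = """<script>
var o = {
    "numeric": 1234,
    "target": "TARGET",
    "urls": "http://example.com",
    // Commented pair "strings": "...",
    "arrays": [{"more": true}, {"itgoeson": true}]
};
</script>"""


def extract_target(html: str) -> str:
    # Step 1: capture everything between "var o = " and the last ";"
    # (re.DOTALL plays the role of the 's' flag in the xidel query).
    match = re.search(r"var o = (.+);", html, flags=re.DOTALL)
    if match is None:
        raise ValueError("no 'var o = ...;' assignment found")
    # Step 2: drop // comments that follow whitespace, so the "//" in
    # "http://example.com" is left alone.
    cleaned = re.sub(r"\s+//.+", "", match.group(1))
    # Step 3: parse the remainder as JSON and select the key.
    return json.loads(cleaned)["target"]


print(extract_target(page))
```

As with the xidel query, the greedy pattern runs to the last semicolon in the input, so a page with further statements after the object would need a tighter pattern.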

Related

jq - Selecting objects containing certain key

Let's say I have this JSON file below:
{
"team": {
"money": 100,
},
"group": {
"money": 200,
"snack": true,
}
}
I want to select the objects that have a "snack" key, including their parent. The current command I'm using is:
jq '..|objects|select(has("snack"))' json
This however, does not include the parent, which in this case is "group". How do I select the parent of the selected object as well?
Instead of using .., you could use paths. That is, you'd select the paths that lead to the items of interest, and work from there. So you'd start with:
paths(objects) as $p
| select(getpath($p)|has("snack"))
| $p
For the given input (after having been corrected), this would yield:
["group"]
So you might want to replace the $p in the last line by $p[-1], but it's not altogether clear how useful that would be. More useful would be getpath( $p[:-1] )
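The path-based idea can be illustrated outside jq as well. The following is a minimal Python sketch, assuming the corrected input (trailing commas removed); paths_with_key is a hypothetical helper, and like jq's paths it skips the root itself.

```python
from typing import Any, Iterator


def paths_with_key(node: Any, key: str, path: tuple = ()) -> Iterator[tuple]:
    """Yield the path to every nested object (not the root) that has `key`."""
    if isinstance(node, dict):
        if path and key in node:
            yield path
        for name, value in node.items():
            yield from paths_with_key(value, key, path + (name,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from paths_with_key(value, key, path + (index,))


# The input from the question, with the trailing commas removed.
data = {"team": {"money": 100}, "group": {"money": 200, "snack": True}}

found = list(paths_with_key(data, "snack"))
print(found)
```

Each path keeps the parent's key as its last element, which is the information that the recursive .. approach loses.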

Python: create json query

I'm trying to get Python to create JSON formatted like this:
[
{
"machine_working": true
},
{
"MachineName": "TBL165-169",
"MachineType": "Rig Test"
}
]
However, I can't seem to do it. This is the code I have currently, but it's giving me an error:
this_is_a_dict_too=[]
this_is_a_dict_too = dict(State="on",dict(MachineType="machinetype1",MachineName="MachineType2"))
File "c:\printjson.py", line 40
this_is_a_dict_too = dict(Statedsf="test",dict(MachineType="Rig Test",MachineName="TBL165-169"))
SyntaxError: non-keyword arg after keyword arg
this_is_a_dict_too = [dict(machine_working=True),dict(MachineType="machinetype1",MachineName="MachineType2")]
print(this_is_a_dict_too)
You are trying to put a dictionary inside a dictionary. The error message says that you are trying to add an element without a name (a corresponding key):
dict(a='b', b=dict(state='on'))
will work, but
dict(a='b', dict(state='on'))
won't.
What you presented is a list, so you can use
list((dict(a='b'), dict(b='a')))
Note that the example above uses two dictionaries packed into a tuple.
or
[ dict(a='b'), dict(b='a') ]
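Putting this together for the structure in the question, a minimal sketch could look like the following. The payload and text names are only illustrative; json.dumps from the standard library does the serialization and turns Python's True into JSON's true.

```python
import json

# A list holding two dictionaries, matching the desired output.
payload = [
    {"machine_working": True},
    {"MachineName": "TBL165-169", "MachineType": "Rig Test"},
]

# indent=4 pretty-prints the result; omit it for a compact string.
text = json.dumps(payload, indent=4)
print(text)
```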

How to Convert a list of tuples into a Json string

I have an Erlang list of tuples as follows:
[ {{"a"},[2],[{3,"b"},{4,"c"}],[5,"d"],[1,1],{e},["f"]} ,
{{"g"},[3],[{6,"h"},{7,"i"}],[{8,"j"}],[1,1,1],{k},["L"]} ]
I wanted this list of tuples in this form:
<<" [ {{"a"},[2],[{3,"b"},{4,"c"}],[5,"d"],[1,1],{e},["f"]} ,
{{"g"},[3],[{6,"h"},{7,"i"}],[{8,"j"}],[1,1,1],{k},["L"]}] ">>
So I tried using JSON parsing libraries in Erlang (both jiffy and jsx).
Here is what I did:
A=[ {{"a"},[2],[{3,"b"},{4,"c"}],[5,"d"],[1,1],{e},["f"]} ,
{{"g"},[3],[{6,"h"},{7,"i"}],[{8,"j"}],[1,1,1],{k},["L"]} ],
B=erlang:iolist_to_binary(io_lib:write(A)),
jsx:encode(B).
and I get the following output (here I have converted the list to a binary, since jsx accepts binaries):
<<"[{{[97]},[2],[{3,[98]},{4,[99]}],[5,[100]],[1,1],{e},[[102]]},{{[103]},
[3],[{6,[104]},{7,[105]}],[{8,[106]}],[1,1,1],{k},[[76]]}]">>
jiffy:encode(B) also gives the same output.
Can anyone help me to get the output as :
<<" [ {{"a"},[2],[{3,"b"},{4,"c"}],[5,"d"],[1,1],{e},["f"]} ,
{{"g"},[3],[{6,"h"},{7,"i"}],[{8,"j"}],[1,1,1],{k},["L"]}] ">>
instead of
<<"[{{[97]},[2],[{3,[98]},{4,[99]}],[5,[100]],[1,1],{e},[[102]]},{{[103]},
[3],[{6,[104]},{7,[105]}],[{8,[106]}],[1,1,1],{k},[[76]]}]">>
Thank you in advance
Instead of io_lib:write(A), use io_lib:format("~p", [A]). It tries to guess which lists are actually meant to be strings. (In Erlang, strings are actually lists of integers. Try it: "A" == [65])
> A=[ {{"a"},[2],[{3,"b"},{4,"c"}],[5,"d"],[1,1],{e},["f"]} ,
{{"g"},[3],[{6,"h"},{7,"i"}],[{8,"j"}],[1,1,1],{k},["L"]} ].
[{{"a"},[2],[{3,"b"},{4,"c"}],[5,"d"],[1,1],{e},["f"]},
{{"g"},[3],[{6,"h"},{7,"i"}],[{8,"j"}],[1,1,1],{k},["L"]}]
> B = erlang:iolist_to_binary(io_lib:format("~p", [A])).
<<"[{{\"a\"},[2],[{3,\"b\"},{4,\"c\"}],[5,\"d\"],[1,1],{e},[\"f\"]},\n {{\"g\"},[3],[{6,\"h\"},{7,\"i\"}],[{8,\"j\"}],[1,1,1],{k},[\"L\"]}]">>
If you don't want to see the backslashes before the double quotes, you can print the string to standard output:
> io:format("~s\n", [B]).
[{{"a"},[2],[{3,"b"},{4,"c"}],[5,"d"],[1,1],{e},["f"]},
{{"g"},[3],[{6,"h"},{7,"i"}],[{8,"j"}],[1,1,1],{k},["L"]}]
<<" [ {{"a"},[2],[{3,"b"},{4,"c"}],[5,"d"],[1,1],{e},["f"]} ,
{{"g"},[3],[{6,"h"},{7,"i"}],[{8,"j"}],[1,1,1],{k},["L"]}] ">>
This ^^ isn't a valid Erlang term, but I think what you're getting at is that you want the "listy" strings, like "a", to be printed out like "a" instead of [97]. Unfortunately, I've found this to be a serious shortcoming of Erlang. The problem is that the string literal "a" is only syntactic sugar and is identical to the term [97], so any time you output it, you're subject to the vagaries of "is this thing a string or a list of integers?" The best way I know to get out of that is to use binaries as your strings wherever possible, like <<"a">> instead of "a".

Notepad++: What is the "opposite" format of JSFormat?

I'm looking for the "opposite" of JSFormat from JSTools. Here is an example:
JSON code example:
title = Automatic at 07.02.17 & appId = ID_1 & data = {
"base": "+:background1,background2",
"content": [{
"appTitle": "Soil",
"service": {
"serviceType": "AG",
"Url": "http://test.de/xxx"
},
"opacity": "1"]
}
],
"center": "4544320.372869264,5469450.086030475,31468"
}
& context = PARAMETERS
and I need to convert it to the following format:
title=Automatic at 07.02.17 &appId=ID_1&data={"base":"+:background1,background2","content":[{"appTitle":"Soil","service":{"serviceType":"AG","Url":"http://test.de/xxx"},"opacity":"1"]}],"center":"4544320.372869264,5469450.086030475,31468"}&context=PARAMETERS
which is a decoded URL (with MIME Tools) from this html POST:
title%3DAutomatic%20at%2007.02.17%20%26appId%3DID_1%26data%3D%7B%22base%22%3A%22+%3Abackground1,background2%22,%22content%22%3A%5B%7B%22appTitle%22%3A%22Soil%22,%22service%22%3A%7B%22serviceType%22%3A%22AG%22,%22Url%22%3A%22http%3A%2F%2Ftest.de%2Fxxx%22%7D,%22opacity%22%3A%221%22%5D%7D%5D,%22center%22%3A%224544320.372869264,5469450.086030475,31468%22%7D%26context%3DPARAMETERS%0D%0A
which I have to get back to after making changes in the JSON code. To go from the second to the third format I can use URL Encode (MIME Tools), but what about reformatting from the first to the second format?
My question: Do you have any ideas on how to turn the first (JSON) format into the second (decoded URL) in Notepad++? Something like the "opposite" of JSFormat?
If I understand correctly, you basically need to put your JSON on a single line, removing newlines and spaces.
This can be achieved with these steps:
Press CTRL + H and replace runs of two or more spaces with an empty string using this regex: [ ]{2,} (remember to select the "Regular expression" radio button). If this is not exactly what you want, you can adjust the regular expression to get the desired output.
Select all your JSON with CTRL + A.
Put everything on a single line with Join Lines, CTRL + J.
You can also record a macro to automate this process and run it with a keyboard shortcut.
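If the data value is valid JSON, the same collapsing can also be done outside Notepad++. The following is a minimal Python sketch under that assumption; the pretty sample is a hypothetical, shortened, and corrected version of the one in the question, which has a stray ] after "opacity": "1" that would have to be removed first.

```python
import json

# Hypothetical, valid version of the pretty-printed "data" value.
pretty = """{
    "base": "+:background1,background2",
    "content": [{
        "appTitle": "Soil",
        "opacity": "1"
    }],
    "center": "4544320.372869264,5469450.086030475,31468"
}"""

# Compact separators drop the spaces after "," and ":" as well as all
# line breaks and indentation; spaces inside string values are kept.
compact = json.dumps(json.loads(pretty), separators=(",", ":"))
print(compact)
```

Unlike a regex replace, this round-trips through a parser, so it fails loudly on malformed input instead of silently producing a broken single line.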

How to parse multidimensional JSON array in bash using jsawk?

I have an array like the one below. I want to parse all of the data into a bash array,
so that I can get the first addressLineOne value from ${bashaddr[0]}, and so on.
[
{
"id":"f0c546d5-0ce4-55ee-e043-516e0f0afdc1",
"cardType":"WMUSGESTORECARD",
"lastFour":"1682",
"cardExpiryDate":"2012-01-16",
"firstName":"robert",
"lastName":"robishaw",
"addressLineOne":"Apt venue",
"addressLineTwo":"",
"city":"oakdale",
"state":"CT",
"postalCode":"06370",
"phone":"534534",
"isDefault":false
},
{
"id":"f0c546d5-0ce0-55ee-e043-516e0f0afdc1",
"cardType":"MASTERCARD",
"lastFour":"2731",
"cardExpiryDate":"2009-08-31",
"firstName":"robert",
"lastName":"robishaw",
"addressLineOne":"119 maple ave.",
"addressLineTwo":"",
"city":"uncasville",
"state":"CT",
"postalCode":"06382",
"phone":"7676456",
"isDefault":false
},
{
"id":"f0c546d5-0ce2-55ee-e043-516e0f0afdc1",
"cardType":"MASTERCARD",
"lastFour":"6025",
"cardExpiryDate":"2011-08-31",
"firstName":"robert",
"lastName":"robishaw",
"addressLineOne":"Angeline Street",
"addressLineTwo":"",
"city":"oakdale",
"state":"CT",
"postalCode":"06370",
"phone":"7867876",
"isDefault":false
}
]
I have tried like this:
#!/bin/bash
addressLineOne="$(echo $card | jsawk 'return this.addressLineOne')"
but it gives me all of the addresses at once:
["address 1","address 2","address 3"]
Thank you.
I wrote the answer below before reading the comments, but it is exactly the same answer that @4ae1e1 provided, except that I don't use the -r flag, in case you want the values to remain quoted (e.g. when passing them as an argument somewhere else).
I know this is not jsawk, but do consider jq:
jq '.[].addressLineOne' yourfile.txt
And to access specific values you can put record number in the square brackets (starting with 0 for the first address and so on). For example to get the address for the third record:
jq '.[2].addressLineOne' yourfile.txt
For learning more about jq and advanced uses, check: http://jqplay.org
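To make clear what the two jq filters select, here is a rough Python equivalent. It is only a sketch: cards_json is a hypothetical, shortened version of the array from the question, keeping just two fields per record.

```python
import json

# Hypothetical, shortened version of the array from the question.
cards_json = """[
    {"cardType": "WMUSGESTORECARD", "addressLineOne": "Apt venue"},
    {"cardType": "MASTERCARD", "addressLineOne": "119 maple ave."},
    {"cardType": "MASTERCARD", "addressLineOne": "Angeline Street"}
]"""

cards = json.loads(cards_json)

# Counterpart of jq '.[].addressLineOne': one value per record.
addresses = [card["addressLineOne"] for card in cards]

# Counterpart of jq '.[2].addressLineOne': the third record only.
third = cards[2]["addressLineOne"]

print(addresses)
print(third)
```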
What you need to do is make use of the -a switch to apply some post processing and filter the output array like this:
jsawk 'return this.addressLineOne' -a 'return this[0]'
From the documentation:
-b <script> | -a <script>
Run the specified snippet of JavaScript before (-b) or after (-a)
processing JSON input. The `this` object is set to the whole JSON
array or object. This is used to preprocess (-b) or postprocess
(-a) the JSON array before or after the main script is applied.
This option can be specified multiple times to define multiple
before/after scripts, which will be applied in the order they
appeared on the command line.