I am trying to use the Gson library to parse a JSON file. I want to get a list of the names and URLs of all states within the JSON. I do not understand the structure of the JSON object or how to retrieve this data, since every structure I create returns null values. A sample of the JSON structure is:
{
"states" : {
"state53" : {
"name" : "state53",
"url" : "http://cv4a.org/veterans-group-calls-accountability-va-funds-boost/",
"candidateElements" : [ {
"top" : 202,
"left" : 58,
"xpath" : "/HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[2]/DIV[1]/ARTICLE[1]/HEADER[1]/P[1]/A[1]",
"width" : 135,
"height" : 20
}, {
"top" : 1307,
"left" : 225,
"xpath" : "/HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[2]/DIV[1]/OL[1]/LI[1]/ARTICLE[1]/HEADER[1]/TIME[1]/A[1]",
"width" : 191,
"height" : 22
}, {
"top" : 1374,
"left" : 912,
"xpath" : "/HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[2]/DIV[1]/OL[1]/LI[1]/ARTICLE[1]/A[1]",
"width" : 78,
"height" : 38
}, {
"top" : 0,
"left" : 0,
"xpath" : "/HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[2]/DIV[1]/SECTION[1]/DIV[1]/P[1]/A[1]",
"width" : 169,
"height" : 18
} ],
"fanIn" : 1,
"fanOut" : 3,
"id" : 53,
"failedEvents" : [ "xpath /HTML[1]/BODY[1]/DIV[2]/DIV[1]/DIV[2]/DIV[1]/SECTION[1]/DIV[1]/P[1]/A[1]" ]
},
"state9" : {
"name" : "state9",
"url" : "http://cv4a.org/blog/#",
"candidateElements" : [ ],
"fanIn" : 1,
"fanOut" : 0,
"id" : 9,
"failedEvents" : [ ]
},
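import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.util.Map;
import java.util.Set;

public static void main(String[] args) {
    // jsonString is assumed to hold the JSON text from the question.
    // On Gson 2.8.6 and later, JsonParser.parseString(jsonString) is the non-deprecated form.
    JsonElement jsonElement = new JsonParser().parse(jsonString);
    JsonObject statesObj = jsonElement.getAsJsonObject();
    // "states" is an object keyed by state id, not an array.
    statesObj = statesObj.getAsJsonObject("states");
    final Set<Map.Entry<String, JsonElement>> statesEntries = statesObj.entrySet();
    for (Map.Entry<String, JsonElement> state : statesEntries) {
        JsonObject stateObj = state.getValue().getAsJsonObject();
        String name = stateObj.get("name").getAsString();
        String url = stateObj.get("url").getAsString();
        //....
    }
}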
Alternatively, you can create classes (such as State and CandidateElement) with fields (name, url, etc.) and use Gson's automatic serialization/deserialization. See the documentation.
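The shape of the data is the key point here: "states" is an object keyed by state id rather than an array, so it has to be walked entry by entry. As a language-neutral illustration only (Python instead of Java and Gson), here is a minimal sketch on a trimmed copy of the sample. The names json_string and names_and_urls are assumptions made for the example.

```python
import json

# Trimmed copy of the sample from the question; only the fields used here are kept.
json_string = """
{
  "states": {
    "state53": {
      "name": "state53",
      "url": "http://cv4a.org/veterans-group-calls-accountability-va-funds-boost/",
      "candidateElements": []
    },
    "state9": {
      "name": "state9",
      "url": "http://cv4a.org/blog/#",
      "candidateElements": []
    }
  }
}
"""

root = json.loads(json_string)

# "states" maps a state id to a state object, so iterate over its values.
names_and_urls = [(state["name"], state["url"]) for state in root["states"].values()]

for name, url in names_and_urls:
    print(name, url)
```

A structure that models "states" as a list would not line up with this JSON, which may be why hand-written structures come back with null values.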
I'm trying to retrieve the data from this dictionary and for some reason I cannot seem to acquire it. I'm new to parsing JSON so apologies if this is rough.
let temp = json["list"].arrayValue.map({$0["main"].dictionaryValue})
print(temp[0])
Here I am setting a value equal to the dictionary from the JSON. However, I know I need to add the key's value that I'm searching for. To be clear, I am searching for the "temp" key which in the example is equal to 28.19999...
Here is an example of the JSON:
"list" : [
{
"dt" : 1641524400,
"main" : {
"humidity" : 68,
"sea_level" : 1014,
"temp_max" : 29.260000000000002,
"feels_like" : 28.199999999999999,
"temp_min" : 28.199999999999999,
"grnd_level" : 1004,
"temp" : 28.199999999999999,
"temp_kf" : -0.58999999999999997,
"pressure" : 1014
},{
"dt" : 1641546000,
"main" : {
"pressure" : 1009,
"feels_like" : 20.93,
"temp_max" : 27.100000000000001,
"temp" : 27.100000000000001,
"humidity" : 83,
"grnd_level" : 999,
"sea_level" : 1009,
"temp_min" : 27.100000000000001,
"temp_kf" : 0
},
"sys" : {
"pod" : "n"
},
"pop" : 0.41999999999999998,
"wind" : {
"deg" : 354,
"speed" : 5.4100000000000001,
"gust" : 10.58
},
"visibility" : 6695,
"weather" : [
{
"main" : "Snow",
"id" : 600,
"description" : "light snow",
"icon" : "13n"
}
],
"snow" : {
"3h" : 0.26000000000000001
},
"clouds" : {
"all" : 100
},
"dt_txt" : "2022-01-07 09:00:00"
},
{
"dt" : 1641556800,
"main" : {
"temp_min" : 26.82,
"humidity" : 90,
"pressure" : 1008,
"temp_kf" : 0,
"temp" : 26.82,
"feels_like" : 18.879999999999999,
"sea_level" : 1008,
"temp_max" : 26.82,
"grnd_level" : 998
},
"sys" : {
"pod" : "n"
},
"pop" : 0.97999999999999998,
"wind" : {
"deg" : 310,
"gust" : 14.359999999999999,
"speed" : 7.5199999999999996
}]
Found my answer:
let temp = json["list"].arrayValue.map({$0["main"]["temp"].stringValue})
I'm trying to scrape Amazon's Goldbox page by extracting the JSON object that holds the deal details (dealDetails).
I've tried to extract all the JSON within the 40th script tag, but I ended up with 15000 lines of code.
The JSON within the page is like this:
<script type="text/javascript">(function(f) {var _np=(window.P._namespace("GoldboxMobileMason"));if(_np.guardFatal){_np.guardFatal(f)(_np);}else{f(_np);}}(function(P) {
window.gb = window.gb || {};
{
"GDS" : {
"baseRetryInterval" : 4000,
"maxRetries" : 0,
"ajaxTimeout" : 10000
}
},
{
"GD" : {
"baseRetryInterval" : 4000,
"maxRetries" : 1,
"ajaxTimeout" : 10000
}
},
{
"WD" : {
"baseRetryInterval" : 4000,
"maxRetries" : 0,
"ajaxTimeout" : 10000
}
}
"dealDetails" : {
"3b009cf9" : {
"egressUrl" : "https://www.amazon.com/Meredith-Martha-Stewart-Living/dp/B002PXW0EO",
"maxDealPrice" : "5.49",
"offerID" : 000
"maxPrevPrice" : "5.49",
"minBAmount" : "49.9",
"itemType" : "SINGLE_ITEM",
"minPercentOff" : 89,
"items" : [
]
},
"f87c994b" : {
"egressUrl" : "https://www.amazon.com/s/?url=search-
"reviewAsin" : "B073VYKTZN",
"maxListPrice" : "159.99",
"isMAP" : "0",
"displayPriority" : "0",
"isEligibleForFreeShipping" : "0",
"isPrimeEligible" : "1",
"dealID" : "f87c994b",
"description" : "Save 50% on JUVEA All Natural Talalay Latex Pillows",
"minBAmount" : "99.99",
"currencyCode" : "USD",
"minListPrice" : "129.99",
"merchantID" : "A21VHZ1TV3ZUZI",
"score" : "0",
"bKind" : "OP",
"msToFeatureEnd" : "0",
},
"responseMetadata" : {
"continueRetries" : "1",
"baseRetryInterval" : "12000"
}
};
window.gb.controller.registerWidget(widgetToRegister);
});
}));</script>
I tried using Regex but I think I'm doing it wrong:
page = requests.get(primary_url, auth=('user', 'pass'), headers=headers)
soup = BeautifulSoup(page.text, 'lxml')
data = soup.select("[type='text/javascript']")[40]
raw = "dealdetails" + "\n".join(str(data.find("script")).split("\n")[4:-3])
print(raw)
json_obj = json.loads(raw)
The end result must be:
"dealDetails" : {
"3b009cf9" : {
"egressUrl" : "https://www.amazon.com/Meredith-Martha-Stewart-Living/dp/B002PXW0EO",
"maxDealPrice" : "5.49",
"offerID" : 000
"maxPrevPrice" : "5.49",
"minBAmount" : "49.9",
"itemType" : "SINGLE_ITEM",
"minPercentOff" : 89,
"items" : [
]
},
"f87c994b" : {
"egressUrl" : "https://www.amazon.com/s/?url=search-
"reviewAsin" : "B073VYKTZN",
"maxListPrice" : "159.99",
"isMAP" : "0",
"displayPriority" : "0",
"isEligibleForFreeShipping" : "0",
"isPrimeEligible" : "1",
"dealID" : "f87c994b",
"description" : "Save 50% on JUVEA All Natural Talalay Latex Pillows",
"minBAmount" : "99.99",
"currencyCode" : "USD",
"minListPrice" : "129.99",
"merchantID" : "A21VHZ1TV3ZUZI",
"score" : "0",
"bKind" : "OP",
"msToFeatureEnd" : "0",
},
"responseMetadata" : {
"continueRetries" : "1",
"baseRetryInterval" : "12000"
}
};
My best guess is:
re.search(r'^{.*?^}', script_content, re.MULTILINE | re.DOTALL)[0]
but if the indenting is different you will need to adjust it.
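To make the regex suggestion concrete, here is a minimal sketch on a made-up, valid-JSON stand-in for the script body. The content of script_content, including the "var widgetToRegister =" assignment, is an assumption for illustration; the real page will differ. The pattern is the one above with the braces escaped.

```python
import json
import re

# Hypothetical script body: the object literal starts and ends at column 0,
# while every nested brace is indented.
script_content = """(function(P) {
window.gb = window.gb || {};
var widgetToRegister =
{
  "dealDetails" : {
    "3b009cf9" : {
      "maxDealPrice" : "5.49",
      "minPercentOff" : 89
    }
  },
  "responseMetadata" : {
    "continueRetries" : "1"
  }
}
;
});"""

# MULTILINE lets ^ match at each line start; DOTALL lets . cross newlines.
# The lazy .*? stops at the first line that begins with a closing brace.
match = re.search(r'^\{.*?^\}', script_content, re.MULTILINE | re.DOTALL)
widget = json.loads(match[0])
deal_details = widget["dealDetails"]
print(deal_details)
```

This only works when the outer braces sit at the start of a line and the inner ones do not, which is the indentation caveat mentioned above.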
fixed_str = [your json above, fixed into valid json format]
target = fixed_str.replace("dealDetails",'xxx{ "dealDetails').split("xxx") #this splits the script tag by first removing preceding irrelevant stuff
final = target[1].replace("}\n};","}}\n}xxx").split('xxx') #this splits it again by dropping trailing irrelevant stuff
json_obj = json.loads(final[0])
json_obj
And, if all works well :), it should get you your desired end result...
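A different technique from the string splitting above, offered only as a sketch: the standard-library json.JSONDecoder.raw_decode can parse one JSON value starting at a given index and ignore whatever follows it. The script_content below is a made-up, valid-JSON stand-in; the sample in the question would still need fixing first (for example the "offerID" : 000 line and the truncated URL string).

```python
import json

# Hypothetical stand-in for the text of the script tag.
script_content = """window.gb = window.gb || {};
var widgetToRegister = {
  "dealDetails" : {
    "3b009cf9" : {
      "maxDealPrice" : "5.49",
      "itemType" : "SINGLE_ITEM",
      "items" : [ ]
    }
  },
  "responseMetadata" : {
    "continueRetries" : "1"
  }
};
window.gb.controller.registerWidget(widgetToRegister);
"""

key = '"dealDetails"'
key_pos = script_content.index(key)
# First opening brace after the key is the start of its value.
brace_pos = script_content.index("{", key_pos + len(key))

# raw_decode parses a single JSON value beginning at brace_pos and returns
# the value together with the index just past its end.
deal_details, end_pos = json.JSONDecoder().raw_decode(script_content, brace_pos)
print(deal_details)
```

Because the decoder stops at the end of the value, the trailing JavaScript does not need to be stripped by hand.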
I am trying to extract values from a dictionary and return as list of tuples in Robot Framework. Would you suggest how to go about it?
my JSON content looks like this :
{
"_embedded" : {
"products" : [ {
"id" : "BMHY2IZB",
"Name" : "ANR",
"securityType" : "type1",
"_links" : {
"self" : {
"href" : "https://test.com/v1/products/BMHY2IZB"
},
"relatedproducts" : {
"href" : "https://test.com/v1/products/BMHY2IZB/related"
}
}
}, {
"id" : "FXDNZBW",
"Name" : "STREPLC",
"securityType" : "ANV",
"_links" : {
"self" : {
"href" : "https://test.com/v1/products/FXDNZBW"
},
"relatedProducts" : {
"href" : "https://test.com/v1/products/FXDNZBW/related"
}
}
} ]
},
"page" : {
"size" : 20,
"totalElements" : 2,
"totalPages" : 1,
"number" : 0
}
}
And with the below code from Robot Framework:
${fileload}=    get file    ../../Resources/Sample.json
${json}=    to json    ${fileload}
${PRD}=    get from dictionary    ${json}    _embedded
${products}=    get from dictionary    ${PRD}    products
${PRDlist}=    create list
: FOR    ${product}    IN    @{products}
\    append to list    ${PRDlist}    ${product}
log to console    ${PRDlist}
I get a response like this :
[{'id': 'BMHY2IZB', 'Name': 'ANR', 'securityType': 'type1', '_links': {'self': {'href': 'https://test.com/v1/products/BMHY2IZB'}, 'relatedproducts': {'href': 'https://test.com/v1/products/BMHY2IZB/related'}}},
 {'id': 'FXDNZBW', 'Name': 'STREPLC', 'securityType': 'ANV', '_links': {'self': {'href': 'https://test.com/v1/products/FXDNZBW'}, 'relatedProducts': {'href': 'https://test.com/v1/products/FXDNZBW/related'}}}]
But I want only selected values returned, as a list of tuples:
[('BMHY2IZB', 'ANR', 'type1'), ('FXDNZBW', 'STREPLC', 'ANV')]
This seems to work:
def APIResponse(response):
    """Return (id, Name, securityType) tuples for each product in the response."""
    # Missing keys fall back to empty containers instead of raising.
    products = response.get('_embedded', {}).get('products', [])
    result = []
    for product in products:
        result.append((product.get('id'),
                       product.get('Name'),
                       product.get('securityType')))
    return result
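For a quick check outside Robot Framework, the same extraction can be sketched as a list comprehension and run against a trimmed copy of the sample response. The name extract_product_tuples is hypothetical and the _links entries are omitted for brevity.

```python
def extract_product_tuples(response):
    """Return (id, Name, securityType) for each product under _embedded.products."""
    products = response.get("_embedded", {}).get("products", [])
    return [(p.get("id"), p.get("Name"), p.get("securityType")) for p in products]


# Trimmed copy of the sample JSON from the question, already parsed into a dict.
sample = {
    "_embedded": {
        "products": [
            {"id": "BMHY2IZB", "Name": "ANR", "securityType": "type1"},
            {"id": "FXDNZBW", "Name": "STREPLC", "securityType": "ANV"},
        ]
    },
    "page": {"size": 20, "totalElements": 2, "totalPages": 1, "number": 0},
}

product_tuples = extract_product_tuples(sample)
print(product_tuples)
```

Using .get with empty defaults means a response without _embedded yields an empty list rather than an error.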
I'm trying to store all the parsed JSON information in a dictionary variable, but it returns an empty dictionary. When I get the array value, everything works fine.
Here is my code:
let dic = json.arrayValue
for each in dic {
let data = each["data"].dictionaryValue
print (data)
let date = each["date"].stringValue
print (date)
}
Parsing the date works fine too. Note that my JSON file is not empty, because when I get the arrayValue everything is fine. Here is the output when I print each["data"].arrayValue:
[{
"factoryPrice" : 0,
"size" : 25,
"t5" : 0,
"t3" : 0,
"type" : 1,
"bongahPrice" : 2435,
"sherkat" : "",
"priceConfirmed" : 1,
"id" : 1658,
"factory" : 9,
"exist" : true,
"t1" : 0,
"provice" : 1,
"properties" : {
"طول" : "12 متری",
"info" : "",
"استاندارد" : "A2",
"standard" : "A3",
"رنگ" : "مشکی",
"نوع" : "آجدار"
},
"factoryName" : "نیشابور",
"city" : 306,
"name" : "",
"phoneNumber" : "09338810407",
"createdAt" : "2018-02-16 12:52:50",
"ownerId" : 282,
"shomareSabt" : "",
"t4" : 0,
"profileType" : 0,
"t2" : 0,
"modirName" : "آرزومند",
"bongahName" : "میلگرد تهران",
"updatedAt" : "11:36",
"weight" : 22,
"group" : 57,
"bongahAddress" : "بازاراهن شادباد بوستان بلوک B",
"bongahPhone" : "02166139083"
}]
This is only one of the arrays; I get multiple arrays in the response.
So what should I do?
The JSON you have has an array as its top-level element. This is clear from the first character: [.
You can't directly get a dictionary value for your JSON because it represents an array. That is why dictionaryValue returns an empty dictionary.
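The same array-versus-dictionary distinction, sketched in Python rather than Swift purely as an illustration. The outer shape (a list of entries, each with "date" and "data") is inferred from the question's code, and the "date" value is a placeholder, not real data.

```python
import json

# Assumed shape: a top-level array whose entries hold a "data" array of objects.
payload = json.loads("""
[
  {
    "date": "2018-02-16",
    "data": [
      {"id": 1658, "size": 25, "properties": {"standard": "A3"}}
    ]
  }
]
""")

ids = []
for entry in payload:
    data = entry["data"]
    # "data" is a list, so it must be iterated (or indexed) before any
    # dictionary-style lookup can succeed.
    assert isinstance(data, list)
    for item in data:
        ids.append(item["id"])

print(ids)
```

Each element inside "data" is the dictionary; the container around it is not.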
The JSON file:
// config
{
"is_train" : false,
"train" : {
"train_data" : "data.txt",
"save_model_path" : "svm_model.yaml",
"SVM" : {
"term_crit" : {
"method" : 1,
"iter" : 1000,
"eps" : 1e-6
},
"type" : 100,
"kernel_type" : 0,
"Cvalue" : 0.1,
"degree" : 0,
"gamma" : 0,
"coef0" : 0,
"nu" : 0,
"p" : 0,
"class_weights" : 0,
}
},
"predict" : {
"SVM" : {
"model" : "save_model.yaml",
"test_data" : "test_data.txt",
"test_ans" : "test_out.txt"
}
}
}
The problem is that when I put "predict" before "train", the parameters in "predict" are parsed correctly
(value["predict"].isNull() returns false),
but those in "train" are not, and vice versa.
So how can I parse both correctly?
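One thing worth checking, offered as an assumption rather than a diagnosis: the file as posted has a trailing comma after "class_weights" : 0 and a // comment on the first line, and neither is valid in strict JSON. Whether that explains the behaviour depends on the parser and its settings, which the question does not state. A small Python sketch of how one strict parser reacts to the trailing comma, using a cut-down version of the config:

```python
import json

# Cut-down config that keeps the trailing comma from the posted file.
strict_text = (
    '{"train": {"SVM": {"class_weights": 0,}}, '
    '"predict": {"SVM": {"model": "save_model.yaml"}}}'
)

try:
    json.loads(strict_text)
    trailing_comma_accepted = True
except json.JSONDecodeError as exc:
    # Python's json module rejects the trailing comma outright.
    trailing_comma_accepted = False
    print("rejected:", exc)

# With the comma removed, both sections parse.
cleaned = strict_text.replace("0,}", "0}")
config = json.loads(cleaned)
print(sorted(config))
```

If the parser in use is lenient in some places and strict in others, removing the comma and the comment from the file is a cheap first experiment.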