(Python)How to process complicated json data

(Python)How to process complicated json data - json

//Please excuse my poor English.
Hello everyone, I am doing a project which is about a facebook comment spider.
then I find the Facebook Graph GUI. It will return a json file that's so complicated for me.
The json file is include so many parts
then I use json.loads to get all the json code
finally it return a dict for me.
and i dont know how to access the Value
for example i want get all the id or comment.
but i can only get the 2 key of dict "data" and "pading"
so, how can i get the next key? like "id" or "comment"
and how to process this complicated data.
code
Thank you very much.

Two ways I can think of, either you know what you're looking for and access it directly or you loop over the keys, look at the value of the keys and nest another loop until you reach the end of the tree.
You can do this using a self-calling function and with the appropriate usage of jQuery.
Here is an example:
function get_the_stuff(url)
{
$.getJSON(url, function ( data ) {
my_parser(data) });
}
function my_parser(node)
{
$.each(node, function(key, val) {
if ( val && typeof val == "object" ) { my_parser(val); }
else { console.log("key="+key+", val="+val); }
});
}
I omitted all the error checking. Also make sure the typeof check is appropriate. You might need some other elseif's to maybe treat numbers, strings, null or booleans in different ways. This is just an example.
EDIT: I might have slightly missed that the topic said "Python"... sorry. Feel free to translate to python, same principles apply.
EDIT2: Now lets' try in Python. I'm assuming your JSON is already imported into a variable.
def my_parser(node, depth=0):
if type(node) == "list":
for val in node:
my_parser(val,depth+1)
elif type(node) == "dict":
for key in node:
printf("level=%i key=%s" % ( depth, key ))
my_parser(node[key], depth+1)
elif type(node) == "str":
pritnf("level=%i value=%s" % ( depth, node ))
elsif type(node) == "int":
printf("level=%i value=%i % ( depth, node ))
else:
printf("level=%i value_unknown_type" % ( depth ))

Related

how to evaluate String to identify duplicate keys

I have an input string in groovy which is not strictly JSON.
String str = "['OS_Node':['eth0':'1310','eth0':'1312']]"
My issue is to identify the duplicate "eth0" . I tried to convert this into map using Eval.me(), but it automatically removes the duplicate key "eth0" and gives me a Map.
What is the best way for me to identify the presence of duplicate key ?
Note: there could be multiple OS_Node1\2\3\ entries.. need to identify duplicates in each of them ?
Is there any JSON api that can be used? or need to use logic based on substring() ?

One way to solve this could be to cheat a little and replace colons with commas which would transform the maps into lists and then do a recursive search for duplicates:
def str = "['OS_Node':['eth0':'1310','eth0':'1312'], 'OS_Node':['eth1':'1310','eth1':'1312']]"
def tree = Eval.me(str.replaceAll(":", ","))
def dupes = findDuplicates(tree)
dupes.each { println it }
def findDuplicates(t, path=[], dupes=[]) {
def seen = [] as Set
t.collate(2).each { k, v ->
if (k in seen) dupes << [path: path + k]
seen << k
if (v instanceof List) findDuplicates(v, path+k, dupes)
}
dupes
}
when run, prints:
─➤ groovy solution.groovy
[path:[OS_Node, eth0]]
[path:[OS_Node]]
[path:[OS_Node, eth1]]
i.e. the method finds all paths to duplicated keys where "path" is defined as the key sequence required to navigate to the duplicate key.
The function returns a list of maps which you can then do whatever you wish with. Should be noted that the "OS_Node" key is with this logic treated as a duplicate but you could easily filter that out as a step after this function call.

First of all, the string you have there is not JSON - not only due to
the duplicate keys, but also by the use of [] for maps. This looks
a lot more like a groovy map literal. So if this is your custom format
and you can not do anything against it, I'd write a small parser for this,
because sooner or later edge cases or quoting problems come around the
corner.
#Grab("com.github.petitparser:petitparser-core:2.3.1")
import org.petitparser.tools.GrammarDefinition
import org.petitparser.tools.GrammarParser
import org.petitparser.parser.primitive.CharacterParser as CP
import org.petitparser.parser.primitive.StringParser as SP
import org.petitparser.utils.Functions as F
class MappishGrammerDefinition extends GrammarDefinition {
MappishGrammerDefinition() {
define("start", ref("map"))
define("map",
CP.of("[" as Character)
.seq(ref("kv-pairs"))
.seq(CP.of("]" as Character))
.map{ it[1] })
define("kv-pairs",
ref("kv-pair")
.plus()
.separatedBy(CP.of("," as Character))
.map{ it.collate(2)*.first()*.first() })
define("kv-pair",
ref('key')9
.seq(CP.of(":" as Character))
.seq(ref('val'))
.map{ [it[0], it[2]] })
define("key",
ref("quoted"))
define("val",
ref("quoted")
.or(ref("map")))
define("quoted",
CP.anyOf("'")
.seq(SP.of("\\''").or(CP.pattern("^'")).star().flatten())
.seq(CP.anyOf("'"))
.map{ it[1].replace("\\'", "'") })
}
// Helper for `def`, which is a keyword in groovy
void define(s, p) { super.def(s,p) }
}
println(new GrammarParser(new MappishGrammerDefinition()).parse("['OS_Node':['eth0':'1310','eth0':'1312'],'OS_Node':['eth0':'42']]").get())
// → [[OS_Node, [[eth0, 1310], [eth0, 1312]]], [OS_Node, [[eth0, 42]]]]

Looping through list to store variables in dictionary runs in error

What I want to do:
Get user input from HTML form, store input in variables within Django and perform calculations with variables.
To accomplish that, I use following code:
my_var = requst.POST.get('my_var')
To prevent having 'None' stored in 'my_var' when a Django page is first rendered, I usually use
if my_var == None:
my_var = 1
To keep it simple when using a bunch of variables I came up with following idea:
I store all variable names in a list
I loop through list and create a dictionary with variable names as key and user input as value
For that I wrote this code in python which works great:
list_eCar_properties = [
'car_manufacturer',
'car_model',
'car_consumption',]
dict_sample_eCar = {
'car_manufacturer' : "Supr-Duper",
'car_model' : "Lightning 1000",
'car_consumption' : 15.8,
}
dict_user_eCar = {
}
my_dict = {
'car_manufacturer' : None,
'car_model' : None,
'car_consumption' : None,
}
for item in list_eCar_properties:
if my_dict[item] == None:
dict_user_eCar[item] = dict_sample_eCar[item]
else:
dict_user_eCar[item] = my_dict[item]
print(dict_user_eCar)
Works great - when I run the code, a dictionary (dict_user_eCar) is created where user input (in this case None simulated by using a second dictionary my_dict) is stored. When User leaves input blank - Data from dict_sample_eCar is used.
Now, when I transfer that code to my Django view things don't work not as nice anymore. Code as follows:
def Verbrauchsrechner_eAuto(request):
list_eCar_properties = [
'car_manufacturer',
'car_model',
'car_consumption',
]
dict_model_eCar = {
'car_manufacturer' : "Supr-Duper",
'car_model' : "Lightning 1000",
'car_consumption' : 15.8,
}
dict_user_eCar = {
}
for item in list_eCar_properties:
dict_user_eCar[item] = dict_model_eCar[item]
context = {
'dict_user_eCar' : dict_user_eCar,
'dict_model_eCar' : dict_model_eCar,
'list_eCar_properties' : list_eCar_properties,
}
return render(request, 'eAuto/Verbrauchsrechner_eAuto.html', context = context)
Result: The page gets rendered with only the first dictionary entry. All others are left out. In this cases only car_manufacturer gets rendered to html-page.

Sorry folks - as I was reviewing my post, I realized, that I had a major srew-up at the last part's indentation:
context and return both were part of the for-loop which obviously resulted in a page-rendering after the first loop.
I corrected the code as follows:
for item in list_eCar_properties:
dict_user_eCar[item] = dict_model_eCar[item]
context = {
'dict_user_eCar' : dict_user_eCar,
'dict_model_eCar' : dict_model_eCar,
'list_eCar_properties' : list_eCar_properties,
}
return render(request, 'eAuto/Verbrauchsrechner_eAuto.html', context = context)`
Since I didn't want the time I spend to write this post to be wasted - I simply posted it anyway - even though I found the mistake myself.
Lessons learned for a Newbie in programming:
To many comments in your own code might result in a big confusion
Try to be precise and keep code neat and tidy
Do 1 and 2 before writing long posts in stackoverflow
Maybe someone else will benefit from this.

Jmeter Json Extractor with multiple conditional - failed

I am trying to create a Json Extractor and it`s being a thought activity. I have this json structure:
[
{
"reportType":{
"id":3,
"nomeTipoRelatorio":"etc etc etc",
"descricaoTipoRelatorio":"etc etc etc",
"esExibeSite":"S",
"esExibeEmail":"S",
"esExibeFisico":"N"
},
"account":{
"id":9999999,
"holdersName":"etc etc etc",
"accountNamber":"9999999",
"nickname":null
},
"file":{
"id":2913847,
"typeId":null,
"version":null,
"name":null,
"format":null,
"description":"description",
"typeCode":null,
"size":153196,
"mimeType":null,
"file":null,
"publicationDate":"2018-12-05",
"referenceStartDate":"2018-12-05",
"referenceEndDate":"2018-12-06",
"extension":null,
"fileStatusLog":{
"idArquivo":2913847,
"dhAlteracao":"2018-12-05",
"nmSistema":"SISTEMA X",
"idUsuario":999999,
"reportStatusIndicador":"Z"
}
}
}
]
What I need to do: First of all, I am using the option "Compute concatenation var" and "Match No." as -1. Because the service can bring in the response many of those.
I have to verify, if "reportStatusIndicador" = 'Z' or 'Y', if positive, I have to collect File.Id OR file.FileStatusLog.idArquivo, they are the same, I was trying the first option, in this case the number "2913847", but if come more results, I will collect all File.id`s
With this values in hands, I will continue with a for each for all File.id`s.
My last try, was this combination, after reading a lot and tried many others combinations.
[?(#...file.fileStatusLog.reportStatusIndicador == 'Z' || #...file.fileStatusLog.reportStatusIndicador == 'Y')].file.id
But my debug post processor always appears like this, empty:
filesIds=

Go for $..[?(#.file.fileStatusLog.reportStatusIndicador == 'Z' || #.file.fileStatusLog.reportStatusIndicador == 'Y')].file.id
Demo:
References:
Jayway JsonPath: Inline Predicates
JMeter's JSON Path Extractor Plugin - Advanced Usage Scenarios

I could do it with this pattern:
[?(#.file.fileStatusLog.reportStatusIndicador == 'Z' ||
#.file.fileStatusLog.reportStatusIndicador == 'Y')].file.id
filesIds_ALL=2913755,2913756,2913758,2913759,2913760,2913761,2913762,2913763,2913764,2913765,2913766,2913767,2913768,2913769,2913770

How do I search for a specific string in a JSON Postgres data type column?

I have a column named params in a table named reports which contains JSON.
I need to find which rows contain the text 'authVar' anywhere in the JSON array. I don't know the path or level in which the text could appear.
I want to just search through the JSON with a standard like operator.
Something like:
SELECT * FROM reports
WHERE params LIKE '%authVar%'
I have searched and googled and read the Postgres docs. I don't understand the JSON data type very well, and figure I am missing something easy.
The JSON looks something like this.
[
{
"tileId":18811,
"Params":{
"data":[
{
"name":"Week Ending",
"color":"#27B5E1",
"report":"report1",
"locations":{
"c1":0,
"c2":0,
"r1":"authVar",
"r2":66
}
}
]
}
}
]

In Postgres 11 or earlier it is possible to recursively walk through an unknown json structure, but it would be rather complex and costly. I would propose the brute force method which should work well:
select *
from reports
where params::text like '%authVar%';
-- or
-- where params::text like '%"authVar"%';
-- if you are looking for the exact value
The query is very fast but may return unexpected extra rows in cases when the searched string is a part of one of the keys.
In Postgres 12+ the recursive searching in JSONB is pretty comfortable with the new feature of jsonpath.
Find a string value containing authVar:
select *
from reports
where jsonb_path_exists(params, '$.** ? (#.type() == "string" && # like_regex "authVar")')
The jsonpath:
$.** find any value at any level (recursive processing)
? where
#.type() == "string" value is string
&& and
# like_regex "authVar" value contains 'authVar'
Or find the exact value:
select *
from reports
where jsonb_path_exists(params, '$.** ? (# == "authVar")')
Read in the documentation:
The SQL/JSON Path Language
jsonpath Type

How to organize big R functions?

I'm writing an R function, that is becoming quite big. It admit multiple choice, and I'm organizing it like so:
myfun <- function(y, type=c("aa", "bb", "cc", "dd" ... "zz")){
if (type == "aa") {
do something
- a lot of code here -
....
}
if (type == "bb") {
do something
- a lot of code here -
....
}
....
}
I have two questions:
Is there a better way, in order to not use the 'if' statement, for every choice of the parameter type?
Could it be more functional to write a sub-function for every "type" choice?
If I write subfunction, it would look like this:
myfun <- function(y, type=c("aa", "bb", "cc", "dd" ... "zz")){
if (type == "aa") result <- sub_fun_aa(y)
if (type == "bb") result <- sub_fun_bb(y)
if (type == "cc") result <- sub_fun_cc(y)
if (type == "dd") result <- sub_fun_dd(y)
....
}
Subfunction are of course defined elsewhere (in the top of myfun, or in another way).
I hope I was clear with my question. Thanks in Advance.
- Additional info -
I'm writing a function that applies some different filters to an image (different filter = different "type" parameter). Some filters share some code (for example, "aa" and "bb" are two gaussian filters, which differs only for one line code), while others are completely different.
So I'm forced to use a lot of if statement, i.e.
if(type == "aa" | type == "bb"){
- do something common to aa and bb -
if(type == "aa"){
- do something aa-related -
}
if(type == "bb"){
- do something bb-related -
}
}
if(type == "cc" | type == "dd"){
- do something common to cc and dd -
if(type == "cc"){
- do something cc-related -
}
if(type == "dd"){
- do something dd-related -
}
}
if(type == "zz"){
- do something zz-related -
}
And so on.
Furthermore, there are some if statement in the code "do something".
I'm looking for the best way to organize my code.

Option 1
One option is to use switch instead of multiple if statements:
myfun <- function(y, type=c("aa", "bb", "cc", "dd" ... "zz")){
switch(type,
"aa" = sub_fun_aa(y),
"bb" = sub_fun_bb(y),
"bb" = sub_fun_cc(y),
"dd" = sub_fun_dd(y)
)
}
Option 2
In your edited question you gave far more specific information. Here is a general design pattern that you might want to consider. The key element in this pattern is that there is not a single if in sight. I replace it with match.function, where the key idea is that the type in your function is itself a function (yes, since R supports functional programming, this is allowed).:
sharpening <- function(x){
paste(x, "General sharpening", sep=" - ")
}
unsharpMask <- function(x){
y <- sharpening(x)
#... Some specific stuff here...
paste(y, "Unsharp mask", sep=" - ")
}
hiPass <- function(x) {
y <- sharpening(x)
#... Some specific stuff here...
paste(y, "Hipass filter", sep=" - ")
}
generalMethod <- function(x, type=c(hiPass, unsharpMask, ...)){
match.fun(type)(x)
}
And call it like this:
> generalMethod("stuff", "unsharpMask")
[1] "stuff - General sharpening - Unsharp mask"
> hiPass("mystuff")
[1] "mystuff - General sharpening - Hipass filter"

There is hardly ever a reason not to refactor your code into smaller functions. In this case, besides the reorganisation, there is an extra advantage: the educated user of your function(s) can immediately call the subfunction if she knows where she's at.
If these functions have lots of parameters, a solution (to ease maintenance) could be to group them in a list of class "myFunctionParameters", but depends on your situation.
If code is shared between the different sub_fun_xxs, just plug that into another function that you use from within each of the sub_fun_xxs, or (if that's viable) calculate the stuff up front and pass it directly into each sub_fun_xx.

This is a much more general question about program design. There's no definitive answer, but there's almost certainly a better route than what you're currently doing.
Writing functions that handle the different types is a good route to go down. How effective it will be depends on several things - for example, how many different types are there? Are they at all related, e.g. could some of them be handled by the same function, with slightly different behavior depending on the input?
You should try to think about your code in a modular way. You have one big task to do overall. Can you break it down into a sequence of smaller tasks, and write functions that perform the smaller tasks? Can you generalize any of those tasks in a way that doesn't make the functions (much) more difficult to write, but does give them wider applicability?
If you give some more detail about what your program is supposed to be achieving, we will be able to help you more.

This is more of a general programming question than an R question. As such, you can follow basic guidelines of code quality. There are tools that can generate code quality reports from reading your code and give you guidelines on how to improve. One such example is Gendarme for .NET code. Here is a typical guideline that would appear in a report with too long methods:
AvoidLongMethodsRule

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

(Python)How to process complicated json data - json

Related

how to evaluate String to identify duplicate keys

Looping through list to store variables in dictionary runs in error

Jmeter Json Extractor with multiple conditional - failed

How do I search for a specific string in a JSON Postgres data type column?

How to organize big R functions?

Categories

Resources