Google Refine: iterate over a JSON dictionary

I've got some JSON within Google Refine - http://mapit.mysociety.org/point/4326/0.1293497,51.5464828 for the full version, but abbreviated it's like this:
{"1234": {"name": "Barking", "type": "WMC"},
 "5678": {"name": "England", "type": "EUR"}}
I only want to extract the name for the object with the (presumed unique) type WMC.
"Parse JSON in Google Refine" doesn't help; that's working with arrays, not dicts.
Any suggestions what I should be looking at to fix this?
Edit: I don't know what the initial keys are: I believe they're unique identifiers which I can't predict ahead of time.

Refine doesn't currently know how to iterate through the keys of a dict where the keys are unknown (although I'm about to implement that functionality).
The trick to getting this working with the current implementation is to convert the JSON object to a JSON array. The following GREL expression will do that, parse the result as JSON, iterate through all elements of the array, and give you the first name of type 'WMC':
filter(('['+(value.replace(/"[0-9]+":/,""))[1,-1]+']').parseJson(),v,v['type']=='WMC')[0]['name']
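To see what that does to the abbreviated value above (a hand-worked illustration, not actual Refine output): the replace() drops the numeric keys, the [1,-1] slice strips the outer braces, and the surrounding '[' and ']' turn what is left into an array, so
{"1234": {"name": "Barking", "type": "WMC"}, "5678": {"name": "England", "type": "EUR"}}
becomes
[{"name": "Barking", "type": "WMC"}, {"name": "England", "type": "EUR"}]
and filter(...)[0]['name'] then picks out "Barking".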
Use that expression with the "Add column based on this column" command to create a new WMC name column. If there's a chance that there'll be more than one name of this type and you want them all, you can add in a forEach loop and join along the lines of
forEach(filter(('['+(value.replace(/"[0-9]+":/,""))[1,-1]+']').parseJson(),v,v['type']=='WMC'),x,x['name']).join('|')
This will give you a pipe separated list of names that you can split apart using "Split multi-valued cells."
It'll be easier in the next release hopefully!

Related

Extract JSON data with Data Flow - Flatten Transformation

Following a previous question I started working on a Data Flow, with the purpose of flattening a JSON file, created as a result of an Application Insights REST query. You can find an anonymised version here.
My goal is to extract the data in the "rows" array of arrays, but I end up with the data duplicated in a cartesian manner (I had an original count of 18 rows and I end up with 324, i.e. 18*18).
I cannot understand what I am doing wrong or if it is an issue with the JSON "rows" array of arrays.
Here is my Data Flow - the Source has the "Document per line" JSON option; "Single document" raises an [unexpected character "] error, probably due to the strange formatting in the JSON:
This is the Data Preview in the Source - as you can see, there is only one "tables" node, with 18 elements in the "rows" array.
I tried to Flatten it, but I cannot map the "rows" data to a column; I cannot use something like table.rows[0].
Also, the rows data gets duplicated - 18 rows for each of the 18 rows output.
I am not sure how to get to the bottom of this, if it's the JSON format or if I am doing something wrong. From my experience it's probably the latter.
I think this is caused by your special format.
Please try this:
add a rowdata property
flatten your data
@Steve Zhao thank you! But that solution duplicated the data similarly to the original situation.
I did not manage to treat these data as JSON so I ended up thinking about it as text that can be manipulated into an array.
So I split the text by "rows:" and retrieve the second part of the split (arrays in Expression Builder start from 1):
Then I split that text into an array:
Which then I can flatten (at last):
From here on I keep splitting these values to get the data I need - I was interested in the first two and the fourth columns.
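The screenshots with the exact expressions are not reproduced above, so here is a rough sketch in Data Flow expression language of the idea (the column names body, rowsText and rowsArray are made up, and it only holds while the row values themselves contain no ']' or ','):
rowsText (derived column): split(toString(body), 'rows:')[2]
rowsArray (derived column): split(rowsText, '],[')
after flattening rowsArray: split(rowsArray, ',')[1]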

Azure | ADF | How to use a String variable to lookup a Key in an Object type Parameter and retrieve its Value

I am using Azure Data Factory. I'm trying to use a String variable to look up a key in a JSON object and retrieve its value. I can't seem to figure out how to do this in ADF.
Details:
I have defined a Pipeline Parameter named "obj", type "Object" and content:
{"values":{"key1":"value1","key2":"value2"}}
I need this pipeline to find the value named "key1" and return it as "value1", "key2" and return it as "value2"... and so on. I'm planning to use my "obj" as a dictionary to accomplish this.
Technically speaking, if I want to find the value for key2, I can use the code below, and it will return "value2":
@pipeline().parameters.obj.values.key2
What I can't figure out is how to do it using a variable (instead of the hardcoded "key2").
To clear things up: I have a for-loop and, inside it, just a Copy activity.
The purpose of the Copy activity is to copy the file named item().name, but save it in ADLS as whatever item().name translates to, according to "obj".
This is how the for-loop could be built using Python.
In ADF, I tried a lot of things (using concat, replace...), but none worked. The simplest would be this:
@pipeline().parameters.obj.values.item().name
but it throws the following error:
{"code":"BadRequest","message":"ErrorCode=InvalidTemplate, ErrorMessage=Unable to parse expression 'pipeline().parameters.obj.values.item().name'","target":"pipeline/name_of_the_pipeline/runid/run_id","details":null,"error":null}
So, can you please give any ideas how to define my expression?
I feel this must be really obvious, but I'm not getting there.....
Thanks.
Hello fellow Pythonista!
The solution in ADF is actually to reference it just as you would in Python, by enclosing the 'variable' in square brackets.
I created a pipeline with a parameter obj like yours
and, as a demo, the pipeline has a single Set Variable activity that got the value for key2 into a variable.
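As a sketch, the value of that Set Variable (and, back in your ForEach, the dynamic version you were after) would look something like this, assuming the parameter is called obj as above:
@pipeline().parameters.obj.values['key2']
@pipeline().parameters.obj.values[item().name]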
This is documented but you need X-ray vision to spot it here.
Based on your comments, this is the output of a Filter activity. The Filter activity's output is an object that contains an array named value, so you need to iterate over the "output.value":
Inside the ForEach you reference the name of the item using "item().name":
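For example (the Filter activity name FilterFiles is just a placeholder), the ForEach's Items would be set to:
@activity('FilterFiles').output.value
and inside the loop the file name is read with:
@item().name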
EDIT BASED ON MORE INFORMATION:
The task is to now take the @item().name value and use it as a dynamic property name against a JSON array. This is a bit of a challenge given the limited nature of the Pipeline Expression Language (PEL). Array elements in PEL can only be referenced by their index value, so to do this kind of complex lookup you will need to loop over the array and do some string parsing. Since you are already inside a FOR loop, and nested FOR loops are not supported, you will need to execute another pipeline to handle this process AND the Copy activity. Warning: this gets ugly, but works.
Child Pipeline
Define a pipeline with two parameters, one for the values array and one for the item().name:
When you execute the child pipeline, pass @pipeline().parameters.obj.values as "valuesArray" and @item().name as "keyValue".
You will need several string parsing operations, so create some string variables in the Pipeline:
In the Child Pipeline, add a ForEach activity. Check the Sequential box and set the Items to the valuesArray parameter:
Inside the ForEach, start by cleaning up the current item and storing it as a variable to make it a little easier to consume.
Parse the object key out of the variable [this is where it starts to get a little ugly]:
Add an IF condition to test the value of the current key to the keyValue parameter:
Add an activity to the TRUE condition that parses the value into a variable [gets really ugly here]:
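One way to realise those steps in pipeline expressions (the original screenshots are not reproduced here; the variable names ItemString and ItemKey are made up, the ForEach Items are built by stringifying and splitting valuesArray, and the whole sketch only holds while no key or value contains a comma or colon):
ForEach Items:
@split(replace(replace(string(pipeline().parameters.valuesArray), '{', ''), '}', ''), ',')
ItemString (Set Variable, strips the quotes from the current "key":"value" chunk):
@replace(item(), '"', '')
ItemKey (Set Variable):
@first(split(variables('ItemString'), ':'))
If condition:
@equals(variables('ItemKey'), pipeline().parameters.keyValue)
IterationValue (Set Variable, True branch):
@last(split(variables('ItemString'), ':'))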
Meanwhile, back at the Pipeline
At this point, after the ForEach, you will have a variable (IterationValue) that contains the correct value from your original array:
Now that you have this value, you can use that variable as a DataSet parameter in the Copy activity.
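For example, if the sink DataSet has a parameter called fileName (a made-up name), the Copy activity can set it to:
@variables('IterationValue')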

clojure.data.json/write-str: specifying a key function for placing values into aggregate arrays

Suppose I have a simple map, example-map:
(def example-map {"s" {"f" "g"}
                  "m" {"r" "q"}})
I can use clojure.data.json/write-str to JSON-ify this map as such:
(clojure.data.json/write-str example-map) =>
"{\"s\":{\"f\":\"g\"},\"m\":{\"r\":\"q\"}}"
I'd like to conditionally place some of the values into lists according to the value of their keys.
write-str provides an optional :key-fn, which applies some function to key value pairs. For example, the desired function might specify that all values associated with entries that match "s" are placed in lists.
(clojure.data.json/write-str example-map :key-function desired-function) =>
"{\"s\":[{\"f\":\"g\"}],\"m\":{\"r\":\"q\"}}"
Does anyone know how to specify such a key function that checks for membership of a key in a set and places the values associated with members into an array rendered in the output JSON?
Like your previous question, this is not a job for the JSON parser. You don't need to rely on write-time features of your JSON library to adjust the shape of your JSON maps. Instead, you have a fully functional Turing complete language at your disposal: Clojure! If the maps don't already look the way you want them to be output, then write a function that takes one Clojure map as input and produces a different one as output; then ask your JSON library to write the output map, without any special rules for fiddling with the output.
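A minimal sketch of that approach, assuming the keys whose values should be wrapped are known up front (the names wrap-keys and listify-vals are made up):
(require '[clojure.data.json :as json])

(def wrap-keys #{"s"})

(defn listify-vals
  "Wrap the value of every key in wrap-keys in a vector."
  [m]
  (reduce-kv (fn [acc k v]
               (assoc acc k (if (contains? wrap-keys k) [v] v)))
             {}
             m))

(json/write-str (listify-vals example-map))
;; => "{\"s\":[{\"f\":\"g\"}],\"m\":{\"r\":\"q\"}}"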
Now, as it happens this particular JSON library does provide an option named value-fn (not key-function as you claim) to let you modify a value in a map based on its key. So you could use that, in which case you simply need to write a function with a signature like:
(fn [k v]
  (...compute new value...))
There are many ways you could write such a function, but they are all entirely divorced from your JSON parser. If you need help writing it, mention some specific things you need help with, so you can get a clear explanation for the part of the process that is actually giving you trouble.
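For completeness, a hedged sketch of the value-fn route, checking the key against a set (again assuming clojure.data.json is aliased as json):
(json/write-str example-map
                :value-fn (fn [k v]
                            (if (contains? #{"s"} k) [v] v)))
;; => "{\"s\":[{\"f\":\"g\"}],\"m\":{\"r\":\"q\"}}"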

Deserialize an anonymous JSON array?

I got an anonymous JSON array which I want to deserialize; here is an excerpt showing the first array objects:
[
{ "time":"08:55:54",
"date":"2016-05-27",
"timestamp":1464332154807,
"level":3,
"message":"registerResourcePath ('', '/sap/bc/ui5_ui5/ui2/ushell/resources/')",
"details":"","component":"sap.ui.ModuleSystem"},
{"time":"08:55:54","date":"2016-05-27","timestamp":1464332154808,"level":3,"message":"URL prefixes set to:","details":"","component":"sap.ui.ModuleSystem"},
{"time":"08:55:54","date":"2016-05-27","timestamp":1464332154808,"level":3,"message":" (default) : /sap/bc/ui5_ui5/ui2/ushell/resources/","details":"","component":"sap.ui.ModuleSystem"}
]
I tried deserializing using CL_TREX_JSON_SERIALIZER, but it is broken and does not work with my JSON (here is why).
Then I tried /UI2/CL_JSON, but it needs a "structure" that perfectly fits the object given by the JSON object. "Structure" means in my case an internal table of objects with the attributes time, date, timestamp, level, message and details. And there was the problem: it does not properly handle references and uses the class description to describe the field assigned to the field-symbol. Since I cannot have a list of objects, but only a list of references to objects, that solution also doesn't work.
As a third attempt I tried with the CALL TRANSFORMATION as described by Horst Keller, but with this method I was not able to read in an anonymous array, and here is why
My major points:
I do not want to change the JSON, since that is what I get from sap.ui.log
I prefer to use built-in functionality and not a third-party framework
Your problem comes not from the anonymity of the array, but from the awkwardness of the SAP JSON (de)serializer, which doesn't respect the double quotes that enclose JSON attributes. The issue is thoroughly described in this answer.
If you don't want to change your JSON on the fly, the only way you have is to change the CL_TREX_JSON_DESERIALIZER class like this.
/UI5/CL_JSON_PARSER parses JSONs with unknown format.
Note that it's got "for internal use" written on it so many times that you probably should take it seriously and clone its code to keep a fixed copy.

Is {"0":{"id":1,...},"1":{"id":2,...}} another representation of a JSON list like [{"id":1,...},{"id":2,...}]?

I have a little dilemma. I have a backend/frontend application that communicates with a JSON-based REST API.
The backend is written in PHP (Symfony/jmsserializer) and the frontend in Dart.
The communication between these two has a little problem.
For most list data the backend responds with JSON like
[{"id":1,...},{"id":2,...}]
But for some it responds with
{"0":{"id":1,...},"1":{"id":2,...}}
Now my question is: should the backend respond with the latter at all, or only with the former?
Problem
You usually have a list of objects. You sometimes get an object with sub-objects as properties.
Underlying issue
JS/JSON-Lists are indexed from 0 upwards, which means that if you have a PHP-Array which does not respect this rule, json_encode will output a JS/JSON-Object instead, using the numeric indices as keys.
PHP-Arrays are ordered maps which have more features than JSON-Lists. Whenever you're using those extra features you won't be able to translate directly into JSON-Lists without losing some information (ordering, keys, skipped indices, etc.).
PHP-Arrays and JSON-Objects on the other hand are more or less equivalent in terms of features and can be correctly translated between each other without any loss of information.
Occurrence
This happens if you have an initial PHP-Array of values which respects the JS/JSON-List rules, but the keys in the list of objects are modified somehow, for example if you have a custom index order {"3":{}, "0":{}, "1":{}, "2":{}} or if you have (any) keys that are strings (i.e. not numeric).
This always happens if you want to use the numeric id of the object as the numeric index of the list, as in {"123":{"id": 123, "name": "obj"}}: even if the numeric ids are in ascending order, as long as they do not start from 0 upwards it's not a JSON-List, it's a JSON-Object.
Specific case
So my guess is that the PHP code in the backend is doing something like fetching a list of objects but is modifying something about it, like inserting them by (string) keys into the array, inserting them in a specific order, or removing some of them.
Resolution
The backend can easily fix this by using array_values($listOfObjects) before using json_encode which will reindex the entire list by numeric indices of ascending value.
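A minimal sketch of the behaviour and of that fix (the id-keyed input is a made-up example):
<?php
$byId = [
    123 => ['id' => 123, 'name' => 'first'],
    456 => ['id' => 456, 'name' => 'second'],
];

// The keys don't start at 0, so this serializes as a JSON object:
echo json_encode($byId);
// {"123":{"id":123,"name":"first"},"456":{"id":456,"name":"second"}}

// Reindexed from 0 with array_values(), it serializes as a JSON array:
echo json_encode(array_values($byId));
// [{"id":123,"name":"first"},{"id":456,"name":"second"}]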
Arrays and dictionaries are two separate types in JSON ("array" and "object" respectively), but PHP combines this functionality in a single array type.
PHP's json_encode deals with this as follows: an array that only contains sequential numeric keys starting at 0 ($array = ['cat', 'dog']) is serialized as a JSON array; an associative array that contains non-numeric (or non-sequential) keys ($array = ['cat' => 'meow', 'dog' => 'woof']) is serialized as a JSON object, including the array's keys in the output.
If you end up with an associative array in PHP, but want to serialize it as a plain array in JSON, just use this to convert it to a numerical array before JSON encoding it: $array = array_values($array);