GREL multivalued JSON

I have a column with the following contents in OpenRefine:
1. {"result":"Mango"}
2. {"result":"Banana"},{"result":"Apple"}
and I need the resulting column to be:
1. Mango
2. Banana | Apple
The expression I tried was
"forEach(value.parseJson().result).join(' | ')"
but this does not produce results.

Your second example isn't valid JSON, so I'm not sure what the parse will produce -- probably just the first object. One approach would be to wrap each cell with square brackets ([]) to turn it into a JSON array. This will give you something that forEach can operate on.

Thanks @Tom Morris.
I modified the cells to:
1. [{"result":"Mango"}]
2. [{"result":"Banana"},{"result":"Apple"}]
The solution I then used was:
forEach(value.parseJson(),v,v.result).join(' | ')
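
For readers who want to check the idea outside OpenRefine, here is a minimal Python sketch of the same two steps: wrap the cell in square brackets so it parses as a JSON array, then join each "result" value. The function name join_results is made up for illustration and is not part of GREL or OpenRefine.

```python
import json


def join_results(cell: str, sep: str = " | ") -> str:
    """Parse a comma-separated run of JSON objects and join their "result" values."""
    text = cell.strip()
    # Wrap in brackets only when the cell is not already a JSON array.
    if not text.startswith("["):
        text = "[" + text + "]"
    records = json.loads(text)
    return sep.join(record["result"] for record in records)


print(join_results('{"result":"Mango"}'))
print(join_results('{"result":"Banana"},{"result":"Apple"}'))
```

The bracket wrapping is what turns the second cell into valid JSON; without it the parser has no single top-level value to return.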

Related

Spark SQL | Any depth path ($..foo) not working with get_json_object

The following code sample should ideally return the value of the property value, but instead it returns null. Is this expected? From the Spark documentation, it seems any-depth queries are supported by Spark SQL, and the Spark test suite has an example, but unfortunately it's for a negative test case.
spark.sql("""select get_json_object('{"node":{"value":"abc"}}', '$..value') as j""").show()
Expected output - abc / Actual - null
I'm trying the any-depth path because the JSON column has a variable schema from row to row, and we're searching for a given key, say foobar, anywhere in the given JSON. So in the above example, $.value or $.foo.value for more nested values isn't feasible.
The function get_json_object does not support recursive paths, as the test shows, so you can't use it in this particular case.
If you know the schema of your JSON string, then I'd recommend using the from_json function to convert it into a struct type, which would be easier to handle.
However, you can always try using regexp_extract with some regex like:
"key"\s?:\s?"([\w\s]+)"
Example:
spark.sql("""
SELECT regexp_extract('{"node":{"value":"abc"}}', '"value"\\s?:\\s?"([\\w\\s]+)"', 1) AS value
""").show()
//+-----+
//|value|
//+-----+
//|  abc|
//+-----+
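
The regex above only matches string values made of word characters and spaces. As an alternative, here is a minimal Python sketch that parses the JSON and walks it recursively to find a key at any depth. The helper names find_key and first_value are hypothetical. In PySpark, a function like first_value could presumably be wrapped in a UDF, but that wiring is an assumption and is not shown or tested here.

```python
import json
from typing import Any, Iterator


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the key may also appear deeper down.
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def first_value(json_text: str, key: str) -> Any:
    """Return the first match, or None if the key is absent or the text is not JSON."""
    try:
        parsed = json.loads(json_text)
    except (TypeError, ValueError):
        return None
    return next(find_key(parsed, key), None)


print(first_value('{"node":{"value":"abc"}}', "value"))
```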

Using jq select elements with keys containing some string, key preserved in results

My JSON content is embedded in this link: jq-play. It is too large to include here.
Currently, I manage to get the values with
[.[keys[] | select(contains("VMIMAGE"))]]
but the key names, e.g. CP-COMPUTEENGINE-VMIMAGE-F1-MICRO, aren't present in the result. How do I get them?
It looks like you want to take a "slice" of the object by selecting just those keys containing a certain string. Using your query as a model, this can most easily be accomplished using a query of the form with_entries( select(...) ), e.g.:
.gcp_price_list
| with_entries( select(.key|contains("VMIMAGE")))
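
To make the "slice" idea concrete, here is a rough Python equivalent of with_entries(select(.key|contains("VMIMAGE"))). The sample dictionary is invented for illustration (only the first key name comes from the question), and slice_by_key is a hypothetical helper name.

```python
def slice_by_key(obj: dict, needle: str) -> dict:
    """Keep only the entries whose key contains `needle`, preserving keys and values."""
    return {key: value for key, value in obj.items() if needle in key}


# Invented sample data; the real price list is much larger.
price_list = {
    "CP-COMPUTEENGINE-VMIMAGE-F1-MICRO": {"placeholder": 1},
    "SOME-OTHER-KEY": {"placeholder": 2},
}

print(slice_by_key(price_list, "VMIMAGE"))
```

Unlike indexing with each matching key, which returns values only, filtering the key/value pairs keeps the key names in the result.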

How to use something like a doc string in Scenario outline in Gherkin?

I am doing a simple REST API test in Cucumber Java. The response is in JSON format.
The Gherkin feature file I wrote looks like:
Scenario:
Given I query service by "employees"
When I make the rest call
Then response should contain:
"""
{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}
"""
Now, since there are multiple queries to test with different parameters like "employees", "departments", etc., it's natural to write a Scenario Outline to perform the task:
Scenario Outline:
Given I query service by "<category>"
When I make the rest call
Then response should contain "<json_string_for_that_category>"
Examples:
| category   | json_string_for_that_category         |
| employee   | "json_string_expected_for_employee"   |
| department | "json_string_expected_for_department" |
where json_string_expected_for_employee is just:
{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}
copied and pasted in.
But there are problems with this approach:
1. Special characters in the JSON string need to be escaped if it is simply surrounded by " ".
2. The Scenario Outline table looks very messy.
What is a good approach to do this? Can a string variable be defined elsewhere in the feature file to store long strings, with the variable name placed in the table?
Or are there other solutions? This must be a common scenario for people comparing non-trivial data output in Cucumber.
Thanks,
For your problem 1:
Escape the double quotes with a backslash (\).
Example: \"employees\" instead of "employees"
For your problem 2:
When the inputs differ a lot in length, the table will look messy. You can pad the columns with spaces so that they line up.
Or
use a separate Java file to store all the inputs as variables and pass them to the Scenario Outline examples at execution time.
First of all, resist the temptation to use scenario outlines; generally they are not worth the hassle.
Personally I don't think you get any value in having the json string actually in the feature. So I would write something like:
Scenario: Query by category
Given there is a category
When I query the service by the category
Then I should receive a json representation of the category
This allows me to assign the responsibility of defining what a JSON representation of a category should be to code. This code could be in our step definitions, or it could even be in our application (perhaps shared with unit tests). Some of the things this code could do are:
Actually validate that the response is valid json
Validate the structure of the json matches the structure of a category
Validate values e.g. the category id
One thing this approach will do is stop you having to rewrite all your scenarios when you add an extra field to a Category.
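
As a rough illustration of the three checks listed above, here is a minimal sketch of such validation code. It is written in Python rather than as a Cucumber Java step definition, and the field names id and name, as well as the function name validate_category_json, are assumptions made for the example, not part of the original API.

```python
import json
from typing import List

# Hypothetical shape of a Category; the real fields are not given in the question.
REQUIRED_FIELDS = {"id": int, "name": str}


def validate_category_json(body: str, expected_id: int) -> List[str]:
    """Return a list of problems; an empty list means the response passed."""
    # 1. The response must be valid JSON.
    try:
        data = json.loads(body)
    except ValueError:
        return ["response is not valid JSON"]
    if not isinstance(data, dict):
        return ["response is not a JSON object"]
    # 2. The structure must contain the required fields with the right types.
    problems = []
    for field, kind in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], kind):
            problems.append(f"wrong type for field: {field}")
    # 3. Specific values, here the category id, must match.
    if not problems and data["id"] != expected_id:
        problems.append("unexpected category id")
    return problems


print(validate_category_json('{"id": 7, "name": "employees", "extra": true}', 7))
```

Because only the required fields are checked, a response that gains an extra field still passes, which is the property the last point relies on.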

For loop for every object Jq

I'm trying to compare two attributes on every object. This is the code I use for this problem:
(.result[].downloadable or .result[].playable) but this way jq computes the cartesian product. If I have 3 objects, jq returns 9 results.
I have to convert it to something like this:
(.result[1].downloadable or .result[1].playable)
(.result[2].downloadable or .result[2].playable)
(.result[3].downloadable or .result[3].playable)
How can I do that?
Change your filter so that .result[] is iterated only once, then evaluate both attributes on each object:
.result[] | .downloadable or .playable
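
The difference between the two filters can be sketched in Python: iterating the list twice independently yields every pairing, while iterating once evaluates both attributes on the same object. The sample data below is invented for illustration.

```python
from itertools import product

# Invented sample standing in for .result
result = [
    {"downloadable": True, "playable": False},
    {"downloadable": False, "playable": False},
    {"downloadable": False, "playable": True},
]

# Two independent iterations, like (.result[].downloadable or .result[].playable):
# every object is paired with every object, giving 3 * 3 values.
cartesian = [a["downloadable"] or b["playable"] for a, b in product(result, result)]

# One iteration, like .result[] | .downloadable or .playable:
# each object is visited once, giving 3 values.
per_object = [obj["downloadable"] or obj["playable"] for obj in result]

print(len(cartesian), per_object)
```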

Can topoJSON create composite properties?

Using topoJSON, is it possible to take two properties from an input shapefile and combine them into a single property in the output topoJSON file?
For example if the feature on the original shapefile has the properties 'constituency':'34' and 'ward':'90' is it possible to concatenate these into a single id property in the output JSON file 'id':'3490'?
If not, can anyone suggest an elegant way to achieve this?
Yes! This is now possible.
As of this commit, -p id=constituency+""+ward will concatenate the constituency and ward properties of the input file into an id property in the output file. The "" between constituency and ward coerces the values to strings, ensuring JavaScript doesn't simply add two integers: 30+24 gives 54, whereas 30+""+24 gives 3024.
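
The coercion point can be shown with a small Python sketch. It does not use topoJSON itself, and composite_id is a hypothetical helper: it only illustrates why the values must be treated as strings before being combined.

```python
def composite_id(properties: dict, *keys: str) -> str:
    """Concatenate the named properties as strings, in the order given."""
    return "".join(str(properties[key]) for key in keys)


props = {"constituency": 30, "ward": 24}

# Adding the raw integers sums them.
print(props["constituency"] + props["ward"])

# Converting each value to a string first concatenates them instead.
print(composite_id(props, "constituency", "ward"))
```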