JSON parsing using a string condition with JMESPath

I am facing an issue parsing JSON with JMESPath based on a string condition. I want to get the values of one key based on a condition on the string value of another key, where both keys sit at the same level of a given dictionary. However, the parent key of that dictionary is variable. I am familiar with jq, but JMESPath is new to me; the current project depends on it, so I have little choice about changing the parser.
Sample JSON below:
{
  "people": {
    "a": {"First": "James", "last": "1"},
    "b": {"First": "Jacob", "last": "2"},
    "c": {"First": "Jayden", "last": "3"},
    "d": {"First": "different", "last": "4"}
  }
}
I want to get the value of "last" wherever the value of "First" starts with "J".
I have tried the articles on the official site at http://jmespath.org/tutorial.html; however, most of them concentrate on a fixed key structure rather than on dictionaries with variable keys, so I have been unable to write a JMESPath query for this JSON.
The jq equivalent that achieves the intended result is:
.people | .[] | select (.First | startswith("J")) | .last
The closest JMESPath query I could arrive at, based on my understanding, is:
people.*[?starts_with(First,`J`)].last
However, the above query returns an empty result.
Given the sample above, the expected output is
"1", "2", "3"
I am unable to understand where I am going wrong.
It would be nice if someone could point me to a good article or help me find a solution to the above issue.
Thanks a lot.

UPDATE:
The solution is to use values(@).
Reference link:
https://github.com/jmespath/jmespath.site/issues/24
So one possible solution for the above is
people.values(@)[?starts_with(First, `J`)].last
So for any variable key, we can use values(@) to keep filtering projections further down the structure.
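To verify the fix end to end, here is a minimal sketch using the Python jmespath package (any compliant JMESPath implementation should behave the same way):

# pip install jmespath
import jmespath

data = {
    "people": {
        "a": {"First": "James", "last": "1"},
        "b": {"First": "Jacob", "last": "2"},
        "c": {"First": "Jayden", "last": "3"},
        "d": {"First": "different", "last": "4"},
    }
}

# values(@) turns the variable-keyed object into a plain list,
# which the filter projection [?...] can then iterate over.
result = jmespath.search("people.values(@)[?starts_with(First, 'J')].last", data)
print(result)  # ['1', '2', '3']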

Related

SQLite query by json object property within an array

I've recently started using SQLite with the JSON1 extension, which allows me to store and query dynamic/JSON data.
Let's take for example the following table and data structure:
# documents table:
--------------------------------------------
id | json
---+----------------------------------------
 1 | [{"id": 1}, {"id": 2}, {"id": 3}]
 2 | [{"id": 11}, {"id": 12}, {"id": 13}]
The problem I stumbled on is that there doesn't seem to be an easy way to query objects within an array without specifying an index. Or in other words, consider the following pseudo query:
SELECT *
FROM documents
WHERE json_extract(json, '$[*].id') > 1
-- expect to return all rows that have some json.*.id greater than 1
The above doesn't work because instead of [*] you have to specify a concrete array index.
One workaround could be to use json_each or json_tree, but that can get out of hand pretty quickly if you have to handle nested array objects, e.g. sub1.*.sub2.*.sub3.id.
I found that the MySQL JSON data type supports [*], but I wasn't able to find anything similar for SQLite.
Is there some "hidden" syntax to specify [*] in JSON path queries for SQLite that I'm missing, or is this a limitation of the JSON1 extension?

YAJL JSON PARSING

I am trying to parse the JSON below using YAJL. YAJLGEN generated the data structure below, but the issue I am facing is that the arrays (e.g. KEY, CUSTOMER) are not fixed. These arrays are returned for each field in the response, and I am trying to avoid defining an array for each field from the response.
Could you please advise if there is a better way to read the JSON below and parse dynamic arrays? I tried using "yajl_array_loop" and "yajl_array_elem", but I couldn't make them work in my program for some reason. Thanks in advance.
{
  "errstatus": 400,
  "errors": {
    "Key": [
      "The Key field is required."
    ],
    "Customer": [
      "The Customer field is required."
    ]
  }
}
dcl-ds jsonDoc qualified;
  errstatus packed(3) inz(0);
  dcl-ds ERRORS;
    num_KEY int(10) inz(0);
    KEY varchar(37) inz('') dim(1);
    num_CUSTOMER int(10) inz(0);
    CUSTOMER varchar(43) inz('') dim(2);
  end-ds;
end-ds;
If YAJL is not working for you, it is probably not a good choice for your case. If your JSON is not hundreds of megabytes big, you may try a DOM-like approach such as noxDB (https://github.com/sitemule/noxDB). It reads the whole JSON into memory, and you can evaluate the in-memory JSON any way you want. That seems like a much better approach for your situation.
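For illustration only (in Python rather than RPG, so the control flow is easy to see), the DOM-style approach boils down to loading the whole document and iterating over keys you do not know in advance, instead of declaring one array per field:

import json

payload = '''
{
  "errstatus": 400,
  "errors": {
    "Key": ["The Key field is required."],
    "Customer": ["The Customer field is required."]
  }
}
'''

doc = json.loads(payload)
# The field names under "errors" vary per response, so walk the
# object's keys instead of declaring KEY, CUSTOMER, ... arrays.
for field, messages in doc["errors"].items():
    for msg in messages:
        print(f"{field}: {msg}")
# Key: The Key field is required.
# Customer: The Customer field is required.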

How to extract all elements in a JSON array using Redshift?

I want to extract, from a JSON document that has more JSON objects nested inside, all the elements whose key is "title2". I have the code working on MySQL, but I cannot translate it into Redshift.
JSON structure:
{"master-title": [{"title": "a", "title2": "b"},{"title": "c", "title2: "d", "title3": "e"}], "master-title2": [{"title": "f", "title2": "g", "title3": "h"},{"title": "i", "title2": "j", "title3": "k"}]}
MySQL query (works as desired):
select id
      ,json_extract(myJSON, '$**.title2')
from myTable
MySQL output:
["b", "d","g","j"]
My problem is that on Redshift I can only define the path explicitly, as in:
JSON_EXTRACT_PATH_TEXT(myJSON, 'master-title2',0,'title')
So I can only get one element at a time instead of all of them.
Any idea how to evaluate all paths and get all the elements in a JSON array that share the same "title2" key using Redshift (same output as in MySQL)?
Thank you in advance.
Redshift has only a very rudimentary set of JSON manipulation functions (basically JSON_EXTRACT_PATH_TEXT and JSON_EXTRACT_ARRAY_ELEMENT_TEXT). It's not enough to deal with schemaless JSON.
Python UDF
If Redshift were my only means of processing data, I would give a Python UDF a try. You can code a function in imperative Python; then, with a column holding your JSON object, you just call that function on all elements to do custom extraction.
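To make that concrete, here is a minimal sketch (plain Python, with the Redshift-side CREATE FUNCTION wrapper omitted) of the extraction logic such a UDF could wrap; it reproduces what MySQL's '$**.title2' wildcard path does:

import json

def extract_all(obj, key):
    """Recursively collect every value stored under `key`."""
    found = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                found.append(v)
            found.extend(extract_all(v, key))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(extract_all(item, key))
    return found

doc = json.loads(
    '{"master-title": [{"title": "a", "title2": "b"},'
    ' {"title": "c", "title2": "d", "title3": "e"}],'
    ' "master-title2": [{"title": "f", "title2": "g", "title3": "h"},'
    ' {"title": "i", "title2": "j", "title3": "k"}]}'
)
print(extract_all(doc, "title2"))  # ['b', 'd', 'g', 'j']

Inside an actual scalar UDF you would return a single value, e.g. json.dumps(found), since Redshift UDFs return scalars.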
Unnesting JSON arrays
The other option would be to really try to understand the schema and implement the extraction using the two JSON functions mentioned before (this SO answer will give you an idea of how to explode/unnest a JSON array in Redshift). Provided your JSON is not arbitrarily nested but follows some patterns, this could work.
Regex (better don't)
Another, more desperate, approach would be to try to extract your data with regex. It could work for simple cases, but it's an easy way to shoot yourself in the foot.
Thanks for your answer.
I finally found a solution using Python; I hope it may help others.
count=[x.count("title2") for x in df['myJSON'].tolist()]
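For context, here is a runnable version of that one-liner, with made-up data standing in for the unloaded table; note that it counts occurrences of the substring "title2" per row rather than extracting the values:

import pandas as pd

# Stand-in for the JSON column pulled out of Redshift.
df = pd.DataFrame({"myJSON": [
    '{"master-title": [{"title": "a", "title2": "b"}]}',
    '{"master-title2": [{"title2": "g"}, {"title2": "j"}]}',
]})

# Count how many times each row's JSON mentions "title2".
count = [x.count("title2") for x in df["myJSON"].tolist()]
print(count)  # [1, 2]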

Manipulating (nested) JSON keys and their values using NiFi

I am currently facing an issue where I have to read a JSON file that is nested, has about 10k+ lines, and keeps mostly the same structure throughout.
I thought about creating my own custom processor that reads the JSON and replaces several matching keys/values with the ones needed. But as I am using NiFi, I assume there should be a more comfortable way, since the JSON structure itself is mostly consistent.
I already tried the ReplaceText processor as well as the JoltTransformJSON processor, but I could not figure it out. How can I transform both keys and values when needed? For example, given something like this:
{
  "id": "test"
},
{
  "id": "14"
}
It might be necessary to turn the "id" key into "Number" and map the value "test" to "3", as I am using different keys/values in my JSON files/database, so they need to match. Is there a way of doing so without having to create my own processor?
Regards,
Steve
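Not a Jolt spec, but it may help to see how small the transformation itself is. Here is a sketch of the key/value remapping in Python; the mapping tables are hypothetical, and the NiFi flow-file plumbing (e.g. inside an ExecuteScript processor) is omitted:

import json

# Hypothetical lookup tables; fill these from your own files/database.
KEY_MAP = {"id": "Number"}
VALUE_MAP = {"test": "3"}

def remap(node):
    """Walk the document and rewrite both keys and string values."""
    if isinstance(node, dict):
        return {KEY_MAP.get(k, k): remap(v) for k, v in node.items()}
    if isinstance(node, list):
        return [remap(v) for v in node]
    if isinstance(node, str):
        return VALUE_MAP.get(node, node)
    return node

records = json.loads('[{"id": "test"}, {"id": "14"}]')
print(json.dumps(remap(records)))  # [{"Number": "3"}, {"Number": "14"}]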

Searching through JSON data stored in MySQL

I have a big set of JSON data inside a database. The data wasn't supposed to be queried, so it was stored in a very messy way... this is the structure:
{
  "0": {"key": "Developer(s)", "values": ["Capcom"]},
  "1": {"key": "Publisher(s)", "values": ["Capcom"]},
  "2": {"key": "Producer(s)", "values": ["Tokuro Fujiwara"]},
  "3": {"key": "Composer(s)", "values": ["Setsuo Yamamoto"]},
  "4": {"key": "Series", "values": ["X-Men"]},
  "6": {"key": "Release", "values": ["EU:", " 1995"]},
  "7": {"key": "Mode(s)", "values": ["Single-player"]}
}
I need to query the DB to verify which records have which property (i.e. all records with a "Release" key inside, all that contain the value "Capcom" inside the Developer(s) key, etc.).
Can someone point me in the right direction? I have found only examples with simple structures (i.e. { "key": "value" }); here the key is an index number, and the value is an object with two different keys...
Should I find a way to rewrite all the data, or is there an easy approach?
P.S. I'm building a Laravel application on top of this data, so I can also use an Eloquent approach.
Thanks in advance.
As @Guy L mentioned in his answer, you can use LIKE or REGEXP, but it will be expensive.
Example:
SELECT * FROM mytable WHERE json_column LIKE '%"Release"%';
Answering:
"Should I find a way to rewrite all the data, or is there an easy approach?"
Consider how often you have to access this data.
A NoSQL database like MongoDB is a really good fit for data like this; I have been using Mongo and I am happy with it.
You can easily migrate your data to MongoDB and use an ORM similar to the Eloquent model, such as https://github.com/jenssegers/laravel-mongodb, to communicate with Mongo from your Laravel project.
Hope it helps you arrive at a solution.
Please refer to
https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html
which lets you search for a specific value at a specific place in the JSON structure.
However, if the data is NOW supposed to be searchable, I would suggest redesigning/converting the data.
Querying JSON as a string can be done using the LIKE or REGEXP operators.
But in general, since the strings are long and complex, it's really not recommended.
The best way is reloading the info into proper tables, stored the SQL way.
If your MySQL version is 5.7 or above, you can try some options with its JSON support:
https://dev.mysql.com/doc/refman/8.0/en/json.html#json-paths
Thanks for the suggestions about using the MySQL JSON-specific functions. The column is already of JSON type, so I can definitely use them, but unfortunately, even after reading the MySQL documentation, I can't figure out how to solve my problem. Honestly, I'm not a data specialist and I get confused dealing with databases, so any examples would be appreciated.
So far I've tried to mess around with queries, but I wasn't able to find a correct "selector" for the keys of my dataset; using '$[0]' returns only the first entry. I'd need some hints for creating the right syntax using json_extract, json_contains, etc.
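In case it helps, here is a hedged sketch of what those path expressions can look like for the structure above, driven from Python; the table and column names are made up, so adjust them along with the connection details:

# pip install mysql-connector-python
import mysql.connector

# Hypothetical table "games" with a JSON column "data".
conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="appdb"
)
cur = conn.cursor()

# JSON_SEARCH accepts wildcard paths: '$.*.key' inspects the "key"
# field of every numbered entry. NULL is the escape-character
# argument; the function returns a path, or NULL when nothing matches.
cur.execute(
    """
    SELECT id FROM games
    WHERE JSON_SEARCH(data, 'one', 'Release', NULL, '$.*.key') IS NOT NULL
    """
)
print(cur.fetchall())  # rows that have a "Release" property

# '$.*.values[*]' looks inside every "values" array.
cur.execute(
    """
    SELECT id FROM games
    WHERE JSON_SEARCH(data, 'one', 'Capcom', NULL, '$.*.values[*]') IS NOT NULL
    """
)
print(cur.fetchall())  # rows mentioning "Capcom" in any values array

Note that JSON_CONTAINS rejects wildcard paths, which is why JSON_SEARCH is the one to reach for here.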
Thanks everyone. In the end I've decided to fetch all the data and store it again properly.