How to extract all elements in a JSON array using Redshift? - mysql

I want to extract, from a JSON document that has more JSON objects nested inside, all the values stored under the key "title2". I have the code working on MySQL but I cannot translate it into Redshift.
JSON structure:
{"master-title": [{"title": "a", "title2": "b"},{"title": "c", "title2: "d", "title3": "e"}], "master-title2": [{"title": "f", "title2": "g", "title3": "h"},{"title": "i", "title2": "j", "title3": "k"}]}
MySQL query (works as desired):
select id,
       json_extract(myJSON, '$**.title2')
from myTable
MySQL output:
["b", "d","g","j"]
My problem is that on Redshift I can only specifically define the path as:
JSON_EXTRACT_PATH_TEXT(myJSON, 'master-title2',0,'title')
So I can only get one element instead of all of them.
Any idea how to evaluate all paths and to get all elements in a JSON array which have the same "title2" using Redshift? (same output as in MySQL)
Thank you in advance.

Redshift has only a very rudimentary set of JSON manipulation functions (basically JSON_EXTRACT_PATH_TEXT and JSON_EXTRACT_ARRAY_ELEMENT_TEXT). It's not enough to deal with schemaless JSON.
Python UDF
If Redshift were my only means of processing data, I would give a Python UDF a try. You can code the extraction in imperative Python, then call that function on the column holding your JSON objects to do the custom extraction.
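A minimal sketch of such a UDF, assuming the goal is to collect every "title2" value from the sample document (the function name, the VARCHAR(MAX) signature, and the JSON-array return format are illustrative choices, not fixed requirements):

CREATE OR REPLACE FUNCTION f_extract_title2(j VARCHAR(MAX))
RETURNS VARCHAR(MAX)
STABLE
AS $$
    import json

    def walk(node, found):
        # Recursively visit dicts and lists, collecting every value
        # stored under the key 'title2' at any nesting depth.
        if isinstance(node, dict):
            for k, v in node.items():
                if k == 'title2':
                    found.append(v)
                walk(v, found)
        elif isinstance(node, list):
            for item in node:
                walk(item, found)

    found = []
    walk(json.loads(j), found)
    return json.dumps(found)
$$ LANGUAGE plpythonu;

-- Usage, mirroring the MySQL query above:
-- SELECT id, f_extract_title2(myJSON) FROM myTable;

For the sample document this collects the four values "b", "d", "g", "j", matching the MySQL output (though dictionary iteration order is not guaranteed).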
Unnesting JSON arrays
Another option would be to really try to understand the schema and implement the extraction using the two JSON functions mentioned before (this SO answer will give you an idea of how to explode/unnest a JSON array in Redshift). Provided your JSON is not arbitrarily nested, but follows some patterns, this could work; a rough sketch of the pattern is below.
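For the sample document, the unnest pattern looks roughly like this; the inline numbers derived table and the assumption that arrays hold at most four elements are illustrative, and only the known top-level key 'master-title2' is covered:

SELECT t.id,
       JSON_EXTRACT_PATH_TEXT(
           JSON_EXTRACT_ARRAY_ELEMENT_TEXT(
               JSON_EXTRACT_PATH_TEXT(t.myJSON, 'master-title2'), n.n),
           'title2') AS title2
FROM myTable t
JOIN (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) n
  ON n.n < JSON_ARRAY_LENGTH(JSON_EXTRACT_PATH_TEXT(t.myJSON, 'master-title2'));

Each result row holds one 'title2' value ('g' and 'j' for the sample); you would repeat or UNION this for every top-level key you know about.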
Regex (better don't)
Another desperate approach would be to try to extract your data with regex - could work for simple cases, but it's an easy way to shoot yourself in the foot.

Thanks for your answer.
I finally found a solution using Python. I hope it may help some others.
# For each row, count occurrences of the substring "title2" in the JSON text
count = [x.count("title2") for x in df['myJSON'].tolist()]

Related

Read json with glue crawler return UNKNOWN classification

I have a JSON file which is of the following format:
{"result": [{"key1":"value1", "key2":"value2", "key3":"value3"}]}
When I use the crawler, the table created has classification UNKNOWN. I have done some research, and if you make a custom classifier with the JSONPath $[*] you should be able to get the whole array. Unfortunately this does not work, for me at least. I created a new crawler after creating the classifier, as it would not work if the old crawler were merely updated with the classifier.
Has anyone run into this issue and can be of help?
Your JSONPath assumes that the root is a collection, e.g.
[{"result ..},{}]
Since your root is not a collection, try a JSONPath like this:
$.result
That assumes that the whole object is the value you want; you may also want to do:
$.result[*]
That will get each entry in the result collection as a separate object.
I found a workaround.
In my Python script I select the "result" array; in other words, I do not have the "result" key any more. I can then use the classifier with the following JSONPath: $[*]. This workaround worked fine for me.
Have a nice one!

manipulating (nested) JSON keys and their values, using nifi

I am currently facing an issue where I have to read a JSON file that has mostly the same structure, has about 10k+ lines, and is nested.
I thought about creating my own custom processor which reads the JSON and replaces several matching keys/values with the ones needed. As I am trying to use NiFi, I assume that there should be a more comfortable way, as the JSON structure itself is mostly consistent.
I already tried using the ReplaceText processor as well as the JoltTransformJson processor, but I could not figure it out. How can I transform both keys and values, if needed? For example, if there is something like this:
{
"id": "test"
},
{
"id": "14"
}
It might be necessary to turn the "id" into "Number" and map "test" to "3", as I am using different keys/values in my jsonfiles/database, so they need to fit those. Is there a way of doing so without having to create my own processor?
Regards,
Steve

Searching through JSON data stored in MySql

I've a big set of JSON data inside a database. The data wasn't supposed to be queried, so it was stored in a very messy way... This is the structure:
{
"0": {"key": "Developer(s)", "values": ["Capcom"]},
"1": {"key": "Publisher(s)", "values": ["Capcom"]},
"2": {"key": "Producer(s)", "values": ["Tokuro Fujiwara"]},
"3": {"key": "Composer(s)", "values": ["Setsuo Yamamoto"]},
"4": {"key": "Series", "values": ["X-Men"]},
"6": {"key": "Release", "values": ["EU:", " 1995"]},
"7": {"key": "Mode(s)", "values": ["Single-player"]}
}
I need to query inside the DB to verify which records have which property (i.e. all records with a "Release" key inside, all that contain the value "Capcom" inside the "Developer(s)" key, etc.).
Can someone point me to the right way? I found only examples with simple structures (i.e. { "key": "value" }); here the key is an index number, and the value is an object with two different keys...
Should I find a way to rewrite all the data, or is there something easy?
p.s. I'm building a laravel application over this data, so I can also use an eloquent approach.
Thanks in advance
As @Guy L mentioned in his answer, you can use LIKE or REGEXP. But it will be expensive.
example:
SELECT * FROM table WHERE json_column LIKE '%"Release"%';
answering:
Should I find a way to rewrite all the data or there is something easy?
Consider how often you have to access this data.
A NoSQL database like MongoDB is a really good fit for data like this; I have been using Mongo and I am happy with it.
You can easily migrate your data to MongoDB and use an Eloquent-like ORM such as https://github.com/jenssegers/laravel-mongodb to communicate with Mongo from your Laravel project.
Hope it helps you arrive at a solution.
Please refer to
https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html
which allows you to search for a specific value at a specific place in the JSON structure.
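For the structure above, a sketch of what that can look like (the table name games and JSON column name data are assumptions):

-- All records that have a "Release" entry anywhere in the object:
SELECT * FROM games
WHERE JSON_SEARCH(data, 'one', 'Release', NULL, '$.*.key') IS NOT NULL;

-- All records containing the value "Capcom" in any "values" array:
SELECT * FROM games
WHERE JSON_SEARCH(data, 'one', 'Capcom', NULL, '$.*.values[*]') IS NOT NULL;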
However, I would suggest - if the data is now supposed to be searchable - redesigning/converting the data.
Querying JSON as a string can be done using the LIKE or REGEXP operators.
But in general, since the strings are long and complex, it's really not recommended.
The best way is reloading the info into proper tables, stored in an SQL way; one possible shape is sketched below.
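A sketch of what "proper tables" could mean here, with one row per key/value pair (all names are illustrative):

CREATE TABLE game_properties (
    game_id    INT          NOT NULL,
    prop_key   VARCHAR(64)  NOT NULL,  -- e.g. 'Developer(s)', 'Release'
    prop_value VARCHAR(255) NOT NULL   -- one row per entry of the "values" array
);

-- The searches from the question become plain, indexable queries:
-- SELECT DISTINCT game_id FROM game_properties WHERE prop_key = 'Release';
-- SELECT DISTINCT game_id FROM game_properties
--   WHERE prop_key = 'Developer(s)' AND prop_value = 'Capcom';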
If your MySQL version is 5.7 or above, you can try some options with its JSON support:
https://dev.mysql.com/doc/refman/8.0/en/json.html#json-paths
Thanks for the suggestions about using MySQL's JSON-specific functions. The column is already of JSON type, so I can definitely use them, but unfortunately even after reading the MySQL documentation I can't figure out how to solve my problem. Honestly, I'm not a data specialist and I'm kind of confused when dealing with databases, so any examples will be appreciated.
So far I've tried to mess around with queries, but I wasn't able to find a correct "selector" for the keys of my dataset; using '$[0]' returns only the first column. I'd need some hints for creating the right syntax using json_extract, json_contains, etc.
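For what it's worth, the '$[0]' path doesn't select the individual entries here because the top level is an object keyed by the strings "0", "1", ..., not an array, so object-style paths and wildcards are needed. A sketch (the table name games and column name data are assumptions):

SELECT JSON_EXTRACT(data, '$."0".key')  AS first_key,   -- "Developer(s)"
       JSON_EXTRACT(data, '$.*.key')    AS all_keys,    -- ["Developer(s)", "Publisher(s)", ...]
       JSON_EXTRACT(data, '$.*.values') AS all_values
FROM games;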
Thanks everyone; in the end I've decided to fetch all the data and store it again properly.

Mapping json object with a schema in nodejs

I am using nodejs.
I am having the following problem -
There is a very big JSON object which I want to map to a much smaller object with a specific format which I will be using later on.
I want to have specific schemas which I will be able to use on that big object; each schema will have its own purpose. I want to be able to customize the structure via those schemas and nowhere else.
I was searching for a library which can help me with that.
I found one via npm which is called deep-map.
https://www.npmjs.com/package/deep-map
I played with it for a bit and it seems to answer my basic needs.
But I am also going to need to do some more complex mapping.
A simplified example -
"testObj": { "myArr": [ {"type": "x", name:"test1"}, {"type": "y", name:"test2"}] }
and I need to look in myArr only for the name for which type equals x.
So basically I need some sort of for-each loop with an if condition in between.
Since deep-map uses lodash/template, I thought maybe there is a way to use the capabilities of lodash to solve this problem, but so far I didn't find how to combine those two (lodash + deep-map) to solve the more complex mapping.
I am also open to other libraries which might help me with this problem.
Lodash has an at method that will do the simple cases of this. It lets you specify things like field.subfieldThatsAnArray[1].subsubfield and get the value.
For less of a build-it-yourself solution, there are a couple of bits of tech that might be interesting here too:
JSONPath
SelectTransform
There are some others out there, but JSONPath is used in some other places in the space, and SelectTransform just looks cool.

Postgresql from Json Object to array

How can I convert a PostgreSQL-stored JSON of this form
{"Kategorie": [{"ID": "environment"}, {"ID": "economy"}]}
to get ["environment", "economy"], using only PostgreSQL's JSON-flavoured syntax? The array in the stored source here has two elements, but it may contain more (or only one), and the resulting array should contain all the value elements.
This may give you something to work with:
SELECT ARRAY(
    SELECT json_extract_path_text(x, 'ID')
    FROM json_array_elements(
        '{"Kategorie": [{"ID": "environment"}, {"ID": "economy"}]}'::json -> 'Kategorie'
    ) AS x
)
The result is a text array:
{environment,economy}
It is entirely possible that there's a cleaner way to do this :)
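One candidate for such a cleaner variant, assuming the ->> text-extraction operator (also 9.3+), aggregates the values directly:

SELECT array_agg(x ->> 'ID')
FROM json_array_elements(
    '{"Kategorie": [{"ID": "environment"}, {"ID": "economy"}]}'::json -> 'Kategorie'
) AS x;

It produces the same text array.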
The JSON operators documentation has the details. (This is 9.3+ only, 9.2 had very few utility functions.)