How can I convert a JSON value stored in PostgreSQL of this form
{"Kategorie": [{"ID": "environment"}, {"ID": "economy"}]}
to get ["environment", "economy"], using only PostgreSQL's JSON functions and operators? The array in the stored source has two elements here, but it may contain more (or only one), and the resulting array should contain all of the values.
This may give you something to work with:
SELECT ARRAY(
    SELECT json_extract_path_text(x, 'ID')
    FROM json_array_elements(
        '{"Kategorie": [{"ID": "environment"}, {"ID": "economy"}]}'::json -> 'Kategorie'
    ) AS x
);
The result is a text array:
{environment,economy}
It is entirely possible that there's a cleaner way to do this :)
The JSON operators documentation has the details. (This is 9.3+ only, 9.2 had very few utility functions.)
I've recently started using SQLite with the JSON1 extension, which allows me to store and query dynamic/JSON data.
Let's take, for example, the following table and data structure:
# documents table:
--------------------------------------------
id | json
---- ------------------------------------
1 | [{"id": 1}, {"id": 2}, {"id": 3}]
2 | [{"id": 11}, {"id": 12}, {"id": 13}]
The problem I stumbled on is that there doesn't seem to be an easy way to query objects within an array without specifying an index. In other words, consider the following pseudo query:
SELECT *
FROM documents
WHERE json_extract(json, "$[*].id") > 1
# expect to return all rows that have json.*.id greater than 1
The above doesn't work because instead of [*] you have to specify a concrete array index.
One workaround could be to use json_each or json_tree, but that gets out of hand pretty quickly if you have to handle nested arrays of objects, e.g. sub1.*.sub2.*.sub3.id
I found that the MySQL JSON data type supports [*], but I wasn't able to find anything similar for SQLite.
Is there some "hidden" syntax to specify [*] in JSON path queries for SQLite that I'm missing, or is this a limitation of the JSON1 extension?
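For reference, the json_each workaround mentioned above can be sketched for the single-level case as follows. This is only a minimal illustration using Python's sqlite3 module, assuming the underlying SQLite build includes the JSON functions; the third row is made-up sample data added so that one document fails the condition.

```python
import sqlite3

# In-memory database mirroring the documents table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, json TEXT)")
conn.executemany(
    "INSERT INTO documents (id, json) VALUES (?, ?)",
    [
        (1, '[{"id": 1}, {"id": 2}, {"id": 3}]'),
        (2, '[{"id": 11}, {"id": 12}, {"id": 13}]'),
        (3, '[{"id": 0}, {"id": 1}]'),  # hypothetical extra row that should not match
    ],
)

# json_each() expands the top-level array into one row per element;
# EXISTS keeps a document as soon as any element has id > 1.
query = """
    SELECT d.id
    FROM documents AS d
    WHERE EXISTS (
        SELECT 1
        FROM json_each(d.json) AS elem
        WHERE json_extract(elem.value, '$.id') > 1
    )
    ORDER BY d.id
"""
matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)
```

Each additional array level would need another nested json_each, which is exactly the growth in complexity the question describes.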
I want to extract, from a JSON document that has more JSON objects nested inside it, all the elements whose key is 'title2'. I have the code working on MySQL, but I cannot translate it to Redshift.
JSON structure:
{"master-title": [{"title": "a", "title2": "b"},{"title": "c", "title2": "d", "title3": "e"}], "master-title2": [{"title": "f", "title2": "g", "title3": "h"},{"title": "i", "title2": "j", "title3": "k"}]}
MySQL query (works as desired):
select id
,json_extract(myJSON, '$**.title2')
from myTable
MySQL output:
["b", "d","g","j"]
My problem is that on Redshift I can only specifically define the path as:
JSON_EXTRACT_PATH_TEXT(myJSON, 'master-title2',0,'title')
So I can only get one element instead of all of them.
Any idea how to evaluate all paths and to get all elements in a JSON array which have the same "title2" using Redshift? (same output as in MySQL)
Thank you in advance.
Redshift has only a very rudimentary set of JSON manipulation functions (basically JSON_EXTRACT_PATH_TEXT and JSON_EXTRACT_ARRAY_ELEMENT_TEXT). It's not enough to deal with schemaless JSON.
Python UDF
If Redshift were my only means of processing data, I would give a Python UDF a try. You can write the function in imperative Python and then, given a column holding your JSON object, call that function on every row to do the custom extraction.
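As a rough idea of what the body of such a function could look like, here is a minimal sketch in plain Python. The names collect_values and extract_title2 are hypothetical, and registering the function as a Redshift UDF is not shown; only the recursive extraction logic is illustrated.

```python
import json

def collect_values(node, key):
    """Return every value stored under `key`, at any depth, in document order."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            # Keep descending: the value itself may hold nested matches.
            found.extend(collect_values(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_values(item, key))
    return found

def extract_title2(raw_json):
    """Hypothetical UDF body: JSON text in, JSON array text out."""
    return json.dumps(collect_values(json.loads(raw_json), "title2"))

# Sample document taken from the question.
sample = (
    '{"master-title": [{"title": "a", "title2": "b"},'
    '{"title": "c", "title2": "d", "title3": "e"}],'
    ' "master-title2": [{"title": "f", "title2": "g", "title3": "h"},'
    '{"title": "i", "title2": "j", "title3": "k"}]}'
)
print(extract_title2(sample))  # prints ["b", "d", "g", "j"]
```

Because the function walks dicts and lists generically, it does not need to know the names of the top-level keys in advance.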
Unnesting JSON arrays
Another option would be to really try to understand the schema and implement the extraction using the two JSON functions mentioned before (this SO answer will give you an idea of how to explode/unnest a JSON array in Redshift). Provided your JSON is not arbitrarily nested but follows some pattern, this could work.
Regex (better don't)
Another desperate approach would be to try to extract your data with regex - could work for simple cases, but it's an easy way to shoot yourself in the foot.
Thanks for your answer.
I finally found a solution using Python. I hope it may help some others.
count=[x.count("title2") for x in df['myJSON'].tolist()]
I'm using PostgreSQL 9.3.6 and I have a json type column. Is there any way to find out whether we are dealing with an array or an object? I mean:
SELECT '["x", "y"]'::json
SELECT '{"0": "x", "1": "y"}'::json
Or maybe it's possible to translate one notation to another? The thing is that I have to extract values which are nested inside an array or an object.
I can call "json_each" on an object, but it fails on an array:
SELECT * FROM json_each('{"0": "x", "1": "y"}'::json)
Unfortunately I can't upgrade to 9.4... which means I can't use jsonb type or new operators to test it... I appreciate your help.
I am very new to working with JSON and have a few basic questions about it:
1) How can we open or read a JSON file from Unix? Suppose I have a metadata file in JSON; how should I fetch values from it and write updated values back to it?
2) I need an example of how to interact with it.
3) How well does the Unix shell work with JSON? Is there another technology, language, or tool that is better suited than a shell script?
Thanks,
Nikhil
JSON is just text following a specific format.
Use a text editor and follow the rules. Some editors with "JSON modes" will help with highlighting of [invalid] syntax, indenting, and brace matching.
A "Unix Shell" has nothing directly to do with JSON - how does a shell relate to text or XML files?
There are some utilities for dealing with JSON which might be of use such as jq - but it really depends on what needs to be done with the JSON (which is ultimately just text).
Json is a format to store strings, bools, numbers, lists, and dicts and combinations thereof (dicts of numbers pointing to lists containing strings etc.). Probably the data you want to store has some kind of structure which fits into these types. Consider that and think of a valid representation using the types given above.
For example, if your text configuration looks something like this:
Section
Name=Marsoon
Size=34
Contents
foo
bar
bloh
EndContents
EndSection
Section
Name=Billition
Size=103
Contents
one
two
three
EndContents
EndSection
… then this looks like a list of dicts which contain some strings and numbers and one list of strings. A valid representation of that in Json would be:
[
{
"Name": "Marsoon",
"Size": 34,
"Contents": [
"foo", "bar", "bloh"
]
},
{
"Name": "Billition",
"Size": 103,
"Contents": [
"one", "two", "three"
]
}
]
But in case you know that each such dictionary has a different Name and always the same fields, you don't have to store the field names and can use the Name as a key of a dictionary; so you can also represent it as a dict of strings pointing to lists containing numbers and lists of strings:
{
"Marsoon": [
34, [ "foo", "bar", "bloh" ]
],
"Billition": [
103, [ "one", "two", "three" ]
]
}
Both are valid representations of your original text configuration. Which one to choose depends mainly on whether you want to stay open to later changes of the data structure (the first solution is better then) or want to avoid bureaucratic overhead.
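To make the relationship between the two representations concrete, here is a small sketch that converts one into the other. The helper names to_compact and to_verbose are made up for this illustration, and it assumes every entry has exactly the fields Name, Size and Contents.

```python
# First representation: a list of dicts with explicit field names.
sections = [
    {"Name": "Marsoon", "Size": 34, "Contents": ["foo", "bar", "bloh"]},
    {"Name": "Billition", "Size": 103, "Contents": ["one", "two", "three"]},
]

def to_compact(items):
    """List of dicts -> dict keyed by Name, holding [Size, Contents]."""
    return {item["Name"]: [item["Size"], item["Contents"]] for item in items}

def to_verbose(compact):
    """Dict keyed by Name -> list of dicts with explicit field names."""
    return [
        {"Name": name, "Size": size, "Contents": contents}
        for name, (size, contents) in compact.items()
    ]

compact = to_compact(sections)
print(compact["Marsoon"])
print(to_verbose(compact) == sections)
```

The compact form only works while the Name values stay unique, which is the condition stated above.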
Such a Json can be stored as a simple text file. Use any text editor you like for this. Notice that all whitespace is optional. The last example could also be written in one line:
{"Marsoon":[34,["foo","bar","bloh"]],"Billition":[103,["one","two","three"]]}
So sometimes a computer-generated Json might be hard to read and would need an editor at least capable of handling very long lines.
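If no suitable editor is at hand, such a one-line file can be re-indented with Python's standard json module. A minimal sketch, using the one-line example from above as input:

```python
import json

compact = '{"Marsoon":[34,["foo","bar","bloh"]],"Billition":[103,["one","two","three"]]}'
data = json.loads(compact)

# indent=2 spreads the structure over several lines for human readers.
pretty = json.dumps(data, indent=2)
print(pretty)

# Compact separators drop the optional whitespace again.
minified = json.dumps(data, separators=(",", ":"))
print(minified)
```

Both strings describe the same data; only the optional whitespace differs.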
Handling such a Json file in a shell script will not be easy, simply because the shell has no notion of such complex types. The most complicated structure it can handle properly is a dict of strings pointing to strings (bash associative arrays). So I propose having a look at a more suitable language, e.g. Python. In Python you can handle all these structures quite efficiently and with very readable code:
import json

with open('myfile.json') as jsonFile:
    data = json.load(jsonFile)

# If the file holds the first example (a list of dicts):
print(data[0]['Contents'][2])  # will print "bloh"
# or, if the file holds the second example (a dict keyed by Name), use instead:
# print(data['Marsoon'][1][2])  # will print "bloh"
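Since the question also asks about updating values, here is a hedged sketch of the full read, modify, write-back cycle for the first representation. The function name update_size is invented for this example, and the demo writes to a temporary directory so that no real file is touched.

```python
import json
import os
import tempfile

def update_size(path, name, new_size):
    """Load the JSON file, change Size of the entry called `name`, save it again."""
    with open(path) as json_file:
        data = json.load(json_file)
    for section in data:
        if section["Name"] == name:
            section["Size"] = new_size
    with open(path, "w") as json_file:
        json.dump(data, json_file, indent=2)
    return data

# Demo: create a small file, update it, then read it back.
path = os.path.join(tempfile.mkdtemp(), "myfile.json")
with open(path, "w") as json_file:
    json.dump(
        [{"Name": "Marsoon", "Size": 34, "Contents": ["foo", "bar", "bloh"]}],
        json_file,
    )

update_size(path, "Marsoon", 35)
with open(path) as json_file:
    reloaded = json.load(json_file)
print(reloaded[0]["Size"])  # prints 35
```

Rewriting the whole file is the simplest approach; it is fine for small metadata files but does not protect against two processes writing at the same time.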
{"filters":
[
[
"Color",
[
[
"Blue",
629,
"t12-15=blue"
],
[
"Green",
279,
"t12-15=green"
]
]
],
[
"Style",
[
[
"Contemporary / Modern",
331,
"t6-11=contemporary+modern"
],
[
"Transitional",
260,
"t6-11=transitional"
]
]
]
]}
This looks like a 4 dimensional array to me, but when I tried to use ServiceStack.Text.JsonSerializer to deserialize it, I do not get the expected result.
Looks like the values "Color" and "Style" are not in an array per se. What kind of Json structure is this?
It is indeed an array of arrays of arrays of arrays wrapped in an object. It's quite horrible, but I don't see why JsonSerializer would choke on it.
What kind of structure? Irregular, really. An object containing a field which contains an array which contains arrays in which the first item is a string, and the second item is an array containing two items which are arrays of string, number and string items .... phew!
Nothing wrong with this at all!
To me it just looks like an object containing an array of arrays that goes 4 levels deep, so it's an object with one field that is a 4D array. If you want to get the 4D array, you'll need to get the filters field from the JSON object that is returned.
It looks approximately like what Jackson might produce when you ask it to use arrays instead of objects to represent maps.
Style and Color are just the first elements (index 0) of the arrays that they're in. This is almost certainly NOT what was intended. If anything, Style should probably be the label (object key) for the next array rather than its sibling, no? The obvious follow-up question is who/what is producing this JSON, and who/what is consuming it. As bmarguiles asks: what is your expectation here? Given that the JSON is syntactically valid, if it's not doing what you expect, what do you expect?
Edit based on your comment:
Well, since you seem to be able to rely on the fact that these are all nested arrays and that the label you're looking for is going to be at the 0th index of whatever array it is in, you can just recurse into the arrays looking for that label and then treat the array at the 1st index on the assumption that it contains the data you expect. It's ugly, but it looks like it will work (so long as the service that generates this doesn't change). C# has a JSON deserializer, Json.NET: james.newtonking.com/pages/json-net.aspx. You should just use that.
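The recursion described above can be sketched as follows. This is written in Python rather than C# purely for illustration, the function name find_labelled is made up, and it relies on the same assumption as the answer: the label sits at index 0 and its data at index 1 of the same array.

```python
import json

def find_labelled(node, label):
    """Return the item at index 1 of the first array whose index 0 equals `label`."""
    if not isinstance(node, list):
        return None
    if len(node) >= 2 and node[0] == label:
        return node[1]
    # Otherwise descend into every child and return the first hit.
    for child in node:
        result = find_labelled(child, label)
        if result is not None:
            return result
    return None

doc = json.loads("""
{"filters": [
  ["Color", [["Blue", 629, "t12-15=blue"], ["Green", 279, "t12-15=green"]]],
  ["Style", [["Contemporary / Modern", 331, "t6-11=contemporary+modern"],
             ["Transitional", 260, "t6-11=transitional"]]]
]}
""")

styles = find_labelled(doc["filters"], "Style")
print([entry[0] for entry in styles])
```

A label that does not occur anywhere yields None, so the caller can tell "missing" apart from an empty list of entries.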