How to parse json array in postgres - json

I am trying to parse out the language for different profiles that are stored in a json field named "data". They are stored in their own array in the json field like:
"languages": ["EN", "BN", "HI"]
I can call the whole array by using:
data->>'languages' as languages
but I would like to split it out into
language1 = "EN"
language2 = "BN"
language3 = "HI"
I think if possible the best solution would be to return the whole language array but exclude "EN" from it, but I'm not sure if that is possible. ex.
"languages": ["BN", "HI"]

You can use the - operator to remove the EN key:
select (data -> 'languages') - 'EN' as languages
from the_table;
Online example

Related

What's the correct JsonPath expression to search a JSON root object using Newtonsoft.Json.NET?

Most examples deal with the book store example from Stefan Gössner, however I'm struggling to define the correct JsonPath expression for a simple object (no array):
{ "Id": 1, "Name": "Test" }
To check if this json contains Id = 1.
I tried the following expression: $..?[(#.Id == 1]), but this does find any matches using Json.NET?
Also tried Manatee.Json for parsing, and there it seems the jsonpath expression could be like $[?($.Id == 1)] ?
The path that you posted is not valid. I think you meant $..[?(#.Id == 1)] (some characters were out of order). My answer assumes this.
The JSON Path that you're using indicates that the item you're looking for should be in an array.
$ start
.. recursive search (1)
[ array item specification
?( item-based query
#.Id == 1 where the item is an object with an "Id" with value == 1 at the root
) end item-based query
] end array item specification
(1) the conditions following this could match a value no matter how deep in the hierarchy it exists
You want to just navigate the object directly. Using $.Id will return 1, which you can validate in your application.
All of that said...
It sounds to me like you want to validate that the Id property is 1 rather than to search an array for an object where the Id property is 1. To do this, you want JSON Schema, not JSON Path.
JSON Path is a query language for searching for values which meet certain conditions (e.g. an object where Id == 1.
JSON Schema is for validating that the JSON meet certain requirements (your data's in the right shape). A JSON Schema to validate that your object has a value of 1 could be something like
{
"properties": {
"Id": {"const":1}
}
}
Granted this isn't very useful because it'll only validate that the Id property is 1, which ideally should only be true for one object.

How do I search for a specific string in a JSON Postgres data type column?

I have a column named params in a table named reports which contains JSON.
I need to find which rows contain the text 'authVar' anywhere in the JSON array. I don't know the path or level in which the text could appear.
I want to just search through the JSON with a standard like operator.
Something like:
SELECT * FROM reports
WHERE params LIKE '%authVar%'
I have searched and googled and read the Postgres docs. I don't understand the JSON data type very well, and figure I am missing something easy.
The JSON looks something like this.
[
{
"tileId":18811,
"Params":{
"data":[
{
"name":"Week Ending",
"color":"#27B5E1",
"report":"report1",
"locations":{
"c1":0,
"c2":0,
"r1":"authVar",
"r2":66
}
}
]
}
}
]
In Postgres 11 or earlier it is possible to recursively walk through an unknown json structure, but it would be rather complex and costly. I would propose the brute force method which should work well:
select *
from reports
where params::text like '%authVar%';
-- or
-- where params::text like '%"authVar"%';
-- if you are looking for the exact value
The query is very fast but may return unexpected extra rows in cases when the searched string is a part of one of the keys.
In Postgres 12+ the recursive searching in JSONB is pretty comfortable with the new feature of jsonpath.
Find a string value containing authVar:
select *
from reports
where jsonb_path_exists(params, '$.** ? (#.type() == "string" && # like_regex "authVar")')
The jsonpath:
$.** find any value at any level (recursive processing)
? where
#.type() == "string" value is string
&& and
# like_regex "authVar" value contains 'authVar'
Or find the exact value:
select *
from reports
where jsonb_path_exists(params, '$.** ? (# == "authVar")')
Read in the documentation:
The SQL/JSON Path Language
jsonpath Type

How to convert Mnesia query results to a JSON'able list?

I am trying to use JSX to convert a list of tuples to a JSON object.
The list items are based on a record definition:
-record(player, {index, name, description}).
and looks like this:
[
{player,1,"John Doe","Hey there"},
{player,2,"Max Payne","I am here"}
]
The query function looks like this:
select_all() ->
SelectAllFunction =
fun() ->
qlc:eval(qlc:q(
[Player ||
Player <- mnesia:table(player)
]
))
end,
mnesia:transaction(SelectAllFunction).
What's the proper way to make it convertable to a JSON knowing that I have a schema of the record used and knowing the structure of tuples?
You'll have to convert the record into a term that jsx can encode to JSON correctly. Assuming you want an array of objects in the JSON for the list of player records, you'll have to either convert each player to a map or list of tuples. You'll also have to convert the strings to binaries or else jsx will encode it to a list of integers. Here's some sample code:
-record(player, {index, name, description}).
player_to_json_encodable(#player{index = Index, name = Name, description = Description}) ->
[{index, Index}, {name, list_to_binary(Name)}, {description, list_to_binary(Description)}].
go() ->
Players = [
{player, 1, "John Doe", "Hey there"},
% the following is just some sugar for a tuple like above
#player{index = 2, name = "Max Payne", description = "I am here"}
],
JSON = jsx:encode(lists:map(fun player_to_json_encodable/1, Players)),
io:format("~s~n", [JSON]).
Test:
1> r:go().
[{"index":1,"name":"John Doe","description":"Hey there"},{"index":2,"name":"Max Payne","description":"I am here"}]

Parse complex Json string contained in Hadoop

I want to parse a string of complex JSON in Pig. Specifically, I want Pig to understand my JSON array as a bag instead of as a single chararray. I found that complex JSON can be parsed by using Twitter's Elephant Bird or Mozilla's Akela library. (I found some additional libraries, but I cannot use 'Loader' based approach since I use HCatalog Loader to load data from Hive.)
But, the problem is the structure of my data; each value of Map structure contains value part of complex JSON. For example,
1. My table looks like (WARNING: type of 'complex_data' is not STRING, a MAP of <STRING, STRING>!)
TABLE temp_table
(
user_id BIGINT COMMENT 'user ID.',
complex_data MAP <STRING, STRING> COMMENT 'complex json data'
)
COMMENT 'temp data.'
PARTITIONED BY(created_date STRING)
STORED AS RCFILE;
2. And 'complex_data' contains (a value that I want to get is marked with two *s, so basically #'d'#'f' from each PARSED_STRING(complex_data#'c') )
{ "a": "[]",
"b": "\"sdf\"",
"**c**":"[{\"**d**\":{\"e\":\"sdfsdf\"
,\"**f**\":\"sdfs\"
,\"g\":\"qweqweqwe\"},
\"c\":[{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"}]
},
{\"**d**\":{\"e\":\"sdfsdf\"
,\"**f**\":\"sdfs\"
,\"g\":\"qweqweqwe\"},
\"c\":[{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"}]
},]"
}
3. So, I tried... (same approach for Elephant Bird)
REGISTER '/path/to/akela-0.6-SNAPSHOT.jar';
DEFINE JsonTupleMap com.mozilla.pig.eval.json.JsonTupleMap();
data = LOAD temp_table USING org.apache.hive.hcatalog.pig.HCatLoader();
values_of_map = FOREACH data GENERATE complex_data#'c' AS attr:chararray; -- IT WORKS
-- dump values_of_map shows correct chararray data per each row
-- eg) ([{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }])
([{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }]) ...
attempt1 = FOREACH data GENERATE JsonTupleMap(complex_data#'c'); -- THIS LINE CAUSE AN ERROR
attempt2 = FOREACH data GENERATE JsonTupleMap(CONCAT(CONCAT('{\\"key\\":', complex_data#'c'), '}'); -- IT ALSO DOSE NOT WORK
I guessed that "attempt1" was failed because the value doesn't contain full JSON. However, when I CONCAT like "attempt2", I generate additional \ mark with. (so each line starts with {\"key\": ) I'm not sure that this additional marks breaks the parsing rule or not. In any case, I want to parse the given JSON string so that Pig can understand. If you have any method or solution, please Feel free to let me know.
I finally solved my problem by using jyson library with jython UDF.
I know that I can solve it by using JAVA or other languages.
But, I think that jython with jyson is the most simplist answer to this issue.

Erlang : JSON List to JSON List

I have a list of JSON objects (received from a nosql db) and want to remove or rename some keys. And then I want to return data as a list of JSON objects once again.
This Stackoverflow post provides a good sense of how to use mochijson2. And I'm thinking I could use a list comprehension to go through the list of JSON objects.
The part I am stuck with is how to remove the key for each JSON object (or proplist if mochijson2 is used) within a list comprehension. I can use the delete function of proplists. But I am unsuccessful when trying to do that within a list comprehension.
Here is a bit code for context :
A = <<"[{\"id\": \"0129\", \"name\": \"joe\", \"photo\": \"joe.jpg\" }, {\"id\": \"0759\", \"name\": \"jane\", \"photo\": \"jane.jpg\" }, {\"id\": \"0929\", \"name\": \"john\", \"photo\": \"john.jpg\" }]">>.
Struct = mochijson2:decode(A).
{struct, JsonData} = Struct,
{struct, Id} = proplists:get_value(<<"id">>, JsonData),
Any suggestions illustrated with code much appreciated.
You can use lists:keydelete(Key, N, TupleList) to return a new tuple list with certain tuples removed. So in the list comprehension, for each entry extract the tuple lists (or proplists), and create a new struct with the key removed:
B = [{struct, lists:keydelete(<<"name">>, 1, Props)} || {struct, Props} <- Struct].
gives:
[{struct,[{<<"id">>,<<"0129">>},
{<<"photo">>,<<"joe.jpg">>}]},
{struct,[{<<"id">>,<<"0759">>},
{<<"photo">>,<<"jane.jpg">>}]},
{struct,[{<<"id">>,<<"0929">>},
{<<"photo">>,<<"john.jpg">>}]}]
and
iolist_to_binary(mochijson2:encode(B)).
gives:
<<"[{\"id\":\"0129\",\"photo\":\"joe.jpg\"},{\"id\":\"0759\",\"photo\":\"jane.jpg\"},{\"id\":\"0929\",\"photo\":\"john.jpg\"}]">>
By the way, using the lists/* tuple lists functions are much faster, but sometimes slightly less convenient than the proplists/* functions: http://sergioveiga.com/index.php/2010/05/14/erlang-listskeyfind-vs-proplistsget_value/