Get the value from nested JSON in Postgres

I have a table called "Audio" with a column "transcript" as the following:
{"transcript": [
{"p": 0, "s": 0, "e": 320, "c": 0.545, "w": "This"},
{"p": 1, "s": 320, "e": 620, "c": 0.825, "w": "call"},
{"p": 2, "s": 620, "e": 780, "c": 0.909, "w": "is"},
{"p": 3, "s": 780, "e": 1010, "c": 0.853, "w": "being"}
...
]}
I would like to get the value of "p" where "w" matches certain keywords.
If I run the following query, it returns every "s" entry of each Audio row in which at least one "w" matches "google" or "all".
select json_array_elements(transcript->'transcript')->>'s'
from Audio,
json_array_elements(transcript->'transcript') as temp
where temp->>'w' ilike any(array['all','google'])
How could I get only value of "p" where the condition is satisfied?
Edit:
How could I get the value of "p" and its corresponding Audio ID at the same time?

Select your transcript array elements into a common table expression and match from there:
WITH transcript AS (
SELECT json_array_elements((transcript -> 'transcript')) AS line
FROM audio
)
SELECT line ->> 'p'
FROM transcript
WHERE line ->> 'w' ILIKE ANY (ARRAY ['all', 'google']);
This will select matching lines from all rows in the audio table. I'm guessing that you'll want to restrict the results to a subset of rows, in which case you'll have to narrow the query. Assuming an id column, do something like this:
WITH transcript AS (
SELECT
id,
json_array_elements((transcript -> 'transcript')) AS line
FROM audio
WHERE id = 1
)
SELECT
id,
line ->> 'p'
FROM transcript
WHERE line ->> 'w' ILIKE ANY (ARRAY ['call', 'google'])
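The logic of the query above can be sketched in plain Python, which may help when checking results by hand. This is only a model, not a replacement for the SQL: the `audio` dictionary and the `matching_positions` helper are made-up stand-ins for the table, and `lower()` only approximates how ILIKE folds case.

```python
import json

# Hypothetical stand-in for the audio table: id -> transcript JSON text.
audio = {
    1: '{"transcript": [{"p": 0, "w": "This"}, {"p": 1, "w": "call"}]}',
    2: '{"transcript": [{"p": 0, "w": "Google"}, {"p": 1, "w": "it"}]}',
}

def matching_positions(rows, keywords):
    """Return (id, p) pairs for every word equal to a keyword, ignoring case."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for audio_id, raw in rows.items():
        # Plays the role of json_array_elements(transcript -> 'transcript').
        for line in json.loads(raw)["transcript"]:
            # With no % or _ in the pattern, ILIKE acts as a
            # case-insensitive equality test.
            if line["w"].lower() in wanted:
                hits.append((audio_id, line["p"]))
    return hits

result = matching_positions(audio, ["call", "google"])
```

One difference to keep in mind: `line ->> 'p'` returns text in Postgres, while the Python model keeps "p" as an integer.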

Related

Extract value from its key using SQL

I want to extract value from a column using its key. The data is stored as a dictionary. I want the values of "hotel_id" from the column.
Data in column is stored like this: (Examples)
["search_params": {"region": "PK", "sort_by": "POPULARITY", "currency": "PKR", "hotel_id": "688867", "slot_only": false, "segment_id": "6f9a9cc5-be52-4ae4-b5d2-cc19c6753085", "occupancies": [{"rooms": 1, "adults": 2, "children": 0, "child_age_list": []}], "check_in_date": "2022-11-13"]
[ "search_result": {"hotel_dict": {"241825": {"source": 3, "hotel_id": 241825, "availability": {"rooms": {"rooms_id_list": [3187909, 3187910, 3187911, 3187912], "rooms_data_dict": {"3187909": {"room_id": 3187909, "rate_options": [{"adult": 3, "child": 0, "bar_rate": 7500, "rate_key": "542984|991065", "extra_bed": 0, "occupancy": {"2": 1}]
I tried a query, but it only works for a specific number of characters and does not ignore spaces and commas (dirty data).
SELECT substr(substr(info, instr(info, 'hotel_id') + 11),
2,
instr(substr(info, instr(info, 'hotel_id') + 1), '"') - 2
) result
from df
The result I get is:
result
0 688867
1 41825,
2 41771,
3 394910
4 ull, "
5 394910
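The fixed offsets in the substr/instr query assume a quoted value, which is why the unquoted 241825 comes back as 41825, and a null comes back as ull, ". If the column can be processed outside SQL, a regular expression tolerates both forms. A minimal Python sketch, assuming the values are always digits when present; the name `extract_hotel_id` and the shortened sample strings are illustrative only:

```python
import re

# "hotel_id", optional spaces, a colon, optional spaces,
# an optional opening quote, then the digits we want.
HOTEL_ID = re.compile(r'"hotel_id"\s*:\s*"?(\d+)')

def extract_hotel_id(info):
    """Return the first hotel_id in the text as a string, or None."""
    match = HOTEL_ID.search(info)
    return match.group(1) if match else None

quoted = '"currency": "PKR", "hotel_id": "688867", "slot_only": false'
unquoted = '{"source": 3, "hotel_id": 241825, "availability": {}}'
missing = '"hotel_id": null, "slot_only": false'

ids = [extract_hotel_id(s) for s in (quoted, unquoted, missing)]
```

Only the first occurrence per string is returned, which matches what the original query attempts.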

How to ORDER BY a JSON value?

I have this column named "data" and has some JSON in it.
What I want to do is order my SQL query by the "toptimes" value.
My actual and desired query:
"SELECT core_members.pp_thumb_photo,name,member_group_id,data FROM game_accounts.accounts INNER JOIN website_accounts.core_members ON member_id = account_id WHERE member_group_id IN (4, 7, 8, 6) ORDER BY data ->> '$[0].toptimes' ASC LIMIT 100"
My JSON code:
[ { "daily_login": { "yearday": 56, "hour": 11, "second": 33, "minute": 18, "weekday": 3, "month": 1, "monthday": 26, "timestamp": 1582715913, "year": 120, "isdst": 0 }, "toptimes": 49, "daily_login_streak": 1, "hunters": 59, "playtime": 226099647, "awards": [ ], "nickname": "RandomNick" } ]
It has to be something along these lines:
ORDER BY JSON_VALUE(data, '$[0].toptimes')
In the sample data, toptimes sits at the top level of the first array element, next to daily_login rather than inside it.
Presumably, you want:
order by data ->> '$[0].toptimes'
This will order the resultset according to the value of toptimes in the first element of your JSON array.
If you are storing a JSON object and not an array (although this is not what you showed in your sample data), then:
order by data ->> '$.toptimes'
I had this problem in MS SQL only. JSON_VALUE returns a string there, so converting it to a number before sorting helped.
SELECT TOP (1000) [Uuid],
JSON_VALUE(json, '$.likesCount') as likesCount
FROM [dbo].[Playlists]
order by CONVERT(bigint, JSON_VALUE(json, '$.likesCount')) desc
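Why the conversion matters can be seen with a small Python sketch. The sample values are invented for illustration; the point is only that text compares character by character, so a value extracted from JSON as a string sorts differently from the same value as a number.

```python
# Hypothetical likesCount values as they come out of the JSON: text.
likes_as_text = ["9", "100", "25"]

# Sorting the text descending compares character by character.
text_order = sorted(likes_as_text, reverse=True)

# Converting first (the role CONVERT(bigint, ...) plays) sorts by value.
numeric_order = sorted(likes_as_text, key=int, reverse=True)
```

Here `text_order` puts "9" first, while `numeric_order` puts "100" first.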

Sum of json array

I have json type field, something like this
data
{"age": 44, "name": "Jun"}
{"age": 19, "name": "Pablo", "attempts": [11, 33, 20]}
{"age": 33, "name": "Maria", "attempts": [77, 10]}
Some rows have an "attempts" array and some do not. When a row has this array, I need the sum of its elements in a separate column, like this:
data , sum_of_array
{"age": 44, "name": "Jun"} , (nothing here)
{"age": 19, "name": "Pablo", "attempts": [11, 33, 20]} , 64
{"age": 33, "name": "Maria", "attempts": [77, 10]} , 87
SELECT attempts.id,
sum(vals.v::integer) sum_attempts
FROM attempts
LEFT JOIN LATERAL jsonb_array_elements_text(val->'attempts') vals(v)
ON TRUE
GROUP BY attempts.id;
Use json_array_elements_text if you are using json instead of jsonb.
This works if your table has a unique id (identity) column:
SELECT your_table.*, tt.sum FROM your_table
LEFT JOIN (
select id, SUM(arrvals) as sum FROM (
select id, json_array_elements_text(CAST(your_json_column->>'attempts' AS json))::NUMERIC as arrvals from your_table
)t
group by id
) tt
ON your_table.id = tt.id
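To check what either query should return, the expected result can be modelled in a few lines of Python. This is a sketch using the three sample rows from the question; `sum_attempts` is a made-up helper name, and `None` stands in for SQL NULL.

```python
import json

rows = [
    '{"age": 44, "name": "Jun"}',
    '{"age": 19, "name": "Pablo", "attempts": [11, 33, 20]}',
    '{"age": 33, "name": "Maria", "attempts": [77, 10]}',
]

def sum_attempts(raw):
    """Sum the "attempts" array, or return None when it is missing or empty."""
    attempts = json.loads(raw).get("attempts")
    # A LEFT JOIN keeps the row and yields NULL when there is nothing to sum.
    return sum(attempts) if attempts else None

sums = [sum_attempts(r) for r in rows]
```

The row without "attempts" stays in the output with an empty sum, which is what the LEFT JOIN in both answers is for.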

Unknown duplicates from querying a nested JSON

I would like to do text search in a JSON object in a table.
I have a table called Audio that is structured like below:
id| keyword | transcript | user_id | company_id | client_id
-----------------------------------------------------------
This is the JSON data structure of transcript:
{"transcript": [
{"duration": 2390.0,
"interval": [140.0, 2530.0],
"speaker": "Speaker_2",
"words": [
{"p": 0, "s": 0, "e": 320, "c": 0.545, "w": "This"},
{"p": 1, "s": 320, "e": 620, "c": 0.825, "w": "call"},
{"p": 2, "s": 620, "e": 780, "c": 0.909, "w": "is"},
{"p": 3, "s": 780, "e": 1010, "c": 0.853, "w": "being"},
{"p": 4, "s": 1010, "e": 1250, "c": 0.814, "w": "recorded"}
]
},
{"duration": 4360.0,
"interval": [3280.0, 7640.0],
"speaker": "Speaker_1",
"words": [
{"p": 5, "s": 5000, "e": 5020, "c": 0.079, "w": "as"},
{"p": 6, "s": 5020, "e": 5100, "c": 0.238, "w": "a"},
{"p": 7, "s": 5100, "e": 5409, "c": 0.689, "w": "group"},
{"p": 8, "s": 5410, "e": 5590, "c": 0.802, "w": "called"},
{"p": 9, "s": 5590, "e": 5870, "c": 0.834, "w": "tricks"}
]
},
...
]}
What I am trying to do is to do a text search in the "w" field within "words". This is the query that I tried to run:
WITH info_data AS (
SELECT transcript_info->'words' AS info
FROM Audio t, json_array_elements(transcript->'transcript') AS transcript_info)
SELECT info_item->>'w', id
FROM Audio, info_data idata, json_array_elements(idata.info) AS info_item
WHERE info_item->>'w' ilike '%this';
Right now the table has five rows in total: four have transcript data and in the fifth the transcript is null. However, I got the following result, where even the row without data produces output:
?column? | id
----------+----
This | 2
This | 5
This | 1
This | 3
This | 4
This | 2
This | 5
I would love to know what the problem with my query is and whether there is a more efficient way of doing this.
The problem is that you make a cartesian join between table Audio on the one hand and info_data and info_item on the other hand (there is an implicit lateral join between these latter two) here:
FROM Audio, info_data idata, json_array_elements(idata.info) AS info_item
You can solve this by adding Audio.id to the CTE and then adding WHERE Audio.id = info_data.id.
It is doubtful that this is the most efficient solution (CTEs rarely are). If you just want to get those rows where the word "this" is a word in the transcript, then you are most likely better off like this:
SELECT DISTINCT id
FROM (
SELECT id, transcript_info->'words' AS info
FROM Audio, json_array_elements(transcript->'transcript') AS transcript_info) AS t,
json_array_elements(info) AS words
WHERE words->>'w' ILIKE 'this';
Note that a leading % in the pattern string is very inefficient. Since very few words in the English language other than "this" end in "this", I have taken the liberty of removing it.
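The effect of the cartesian join can be reproduced with a small Python model. The ids and word lists below are invented for illustration (row 3 stands for a row with no transcript); the point is that pairing every Audio id with every match multiplies the rows and attaches matches to ids they never came from.

```python
from itertools import product

audio_ids = [1, 2, 3]
# Words per audio row; row 3 has no transcript at all.
words_by_id = {1: ["This", "call"], 2: ["as", "this"]}

# Matches that remember which row they came from (the CTE with id added).
matches = [
    (src_id, w)
    for src_id, ws in words_by_id.items()
    for w in ws
    if w.lower() == "this"
]

# FROM Audio, info_data ...: every id is paired with every match.
buggy = [(w, audio_id) for audio_id, (_src, w) in product(audio_ids, matches)]

# Joining on the id keeps each match with its own row only.
fixed = [(w, src_id) for src_id, w in matches]
```

`buggy` has six rows and includes id 3, which has no transcript, just like the unexpected ids in the question's output; `fixed` has one row per real match.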

How can I merge records inside two JSON arrays?

I have two Postgres SQL queries returning JSON arrays:
q1:
[
{"id": 1, "a": "text1a", "b": "text1b"},
{"id": 2, "a": "text2a", "b": "text2b"},
{"id": 3, "a": "text3a", "b": "text3b"},
...
]
q2:
[
{"id": 1, "percent": 12.50},
{"id": 2, "percent": 75.00},
{"id": 3, "percent": 12.50}
...
]
I want the result to be a union of both array unique elements:
[
{"id": 1, "a": "text1a", "b": "text1b", "percent": 12.50},
{"id": 2, "a": "text2a", "b": "text2b", "percent": 75.00},
{"id": 3, "a": "text3a", "b": "text3b", "percent": 12.50},
...
]
How can this be done with SQL in Postgres 9.4?
Assuming data type jsonb and that you want to merge records of each JSON array that share the same 'id' value.
Postgres 9.5
makes it simpler with the new concatenate operator || for jsonb values:
SELECT json_agg(elem1 || elem2) AS result
FROM (
SELECT elem1->>'id' AS id, elem1
FROM (
SELECT '[
{"id":1, "percent":12.50},
{"id":2, "percent":75.00},
{"id":3, "percent":12.50}
]'::jsonb AS js
) t, jsonb_array_elements(t.js) elem1
) t1
FULL JOIN (
SELECT elem2->>'id' AS id, elem2
FROM (
SELECT '[
{"id": 1, "a": "text1a", "b": "text1b", "percent":12.50},
{"id": 2, "a": "text2a", "b": "text2b", "percent":75.00},
{"id": 3, "a": "text3a", "b": "text3b", "percent":12.50}]'::jsonb AS js
) t, jsonb_array_elements(t.js) elem2
) t2 USING (id);
The FULL [OUTER] JOIN makes sure you don't lose records without match in the other array.
The type jsonb has the convenient property to only keep the latest value for each key in the record. Hence, the duplicate 'id' key in the result is merged automatically.
The Postgres 9.5 manual also advises:
Note: The || operator concatenates the elements at the top level of
each of its operands. It does not operate recursively. For example, if
both operands are objects with a common key field name, the value of
the field in the result will just be the value from the right hand operand.
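The behaviour described in that note is the same as a shallow dictionary merge, which a short Python sketch can illustrate. The `meta` key and its contents are made up for this example and do not appear in the question's data.

```python
left = {"id": 1, "percent": 12.5, "meta": {"x": 1, "y": 2}}
right = {"id": 1, "a": "text1a", "meta": {"x": 9}}

# Top-level keys are combined; on a clash the right-hand value wins.
merged = {**left, **right}

# The nested object is replaced as a whole, not merged key by key,
# so "y" from the left-hand side is gone.
nested = merged["meta"]
```

This is why the duplicate "id" and "percent" keys collapse cleanly in the query above, and why nested objects would need separate handling.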
Postgres 9.4
It is a bit less convenient. My idea would be to extract array elements, then extract all key/value pairs, UNION both results, aggregate into a single new jsonb value per id value and finally aggregate into a single array.
SELECT json_agg(j) -- ::jsonb
FROM (
SELECT json_object_agg(key, value)::jsonb AS j
FROM (
SELECT elem->>'id' AS id, x.*
FROM (
SELECT '[
{"id":1, "percent":12.50},
{"id":2, "percent":75.00},
{"id":3, "percent":12.50}]'::jsonb AS js
) t, jsonb_array_elements(t.js) elem, jsonb_each(elem) x
UNION ALL -- or UNION, see below
SELECT elem->>'id' AS id, x.*
FROM (
SELECT '[
{"id": 1, "a": "text1a", "b": "text1b", "percent":12.50},
{"id": 2, "a": "text2a", "b": "text2b", "percent":75.00},
{"id": 3, "a": "text3a", "b": "text3b", "percent":12.50}]'::jsonb AS js
) t, jsonb_array_elements(t.js) elem, jsonb_each(elem) x
) t
GROUP BY id
) t;
The cast to jsonb removes duplicate keys. Alternatively you could use UNION to fold duplicates (for instance if you want json as result). Test which is faster for your case.
Related:
How to turn json array into postgres array?
Merging Concatenating JSON(B) columns in query
For any single jsonb element this use of the concat || operator works well for me with strip_nulls and another trick to cast the result back to jsonb (not an array).
select jsonb_array_elements(jsonb_strip_nulls(jsonb_agg(
'{
"a" : "unchanged value",
"b" : "old value",
"d" : "delete me"
}'::jsonb
|| -- The concat operator works as merge on jsonb, the right operand takes precedence
-- NOTE: it only works one JSON level deep
'{
"b" : "NEW value",
"c" : "NEW field",
"d" : null
}'::jsonb
)));
This gives the result
{"a": "unchanged value", "b": "NEW value", "c": "NEW field"}
which is properly typed jsonb
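The merge-then-strip idea can be checked against a small Python model using the same two objects. This is only a sketch: it drops nulls at the top level, whereas jsonb_strip_nulls also descends into nested objects.

```python
old = {"a": "unchanged value", "b": "old value", "d": "delete me"}
new = {"b": "NEW value", "c": "NEW field", "d": None}

# Merge with the right operand taking precedence, then drop null fields.
merged = {k: v for k, v in {**old, **new}.items() if v is not None}
```

Setting a key to null on the right-hand side is therefore a way to delete it from the merged result.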