Transform JSON array to a JSON map

I have a PostgreSQL table called datasource with a jsonb column called config. It has the following structure:
{
    "url": "some_url",
    "password": "some_password",
    "username": "some_username",
    "projectNames": [
        "project_name_1",
        ...
        "project_name_N"
    ]
}
I would like to transform the nested JSON array projectNames into a map, adding a default value for each element of the array, so that it looks like:
{
    "url": "some_url",
    "password": "some_password",
    "username": "some_username",
    "projectNames": {
        "project_name_1": "value",
        ...
        "project_name_N": "value"
    }
}
I have selected projectNames from the table using the PostgreSQL jsonb operator config #> '{projectNames}', but I have no idea how to perform the transformation.
I think I should use something like jsonb_object_agg, but that collapses all the data into a single row.
I'm using PostgreSQL 9.6 version.

You need to first unnest the array, then build a new JSON document from that. Then you can put that back into the column.
update datasource
set config = jsonb_set(config, '{projectNames}', t.map)
from (
    select id, jsonb_object_agg(pn.n, 'value') as map
    from datasource, jsonb_array_elements_text(config -> 'projectNames') as pn (n)
    group by id
) t
where t.id = datasource.id;
The above assumes that there is a primary (or at least unique) column named id. The inner select transforms the array into a map.
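For intuition, jsonb_set simply replaces whatever sits at the given path with the supplied jsonb value; a quick standalone illustration with made-up literals:
select jsonb_set('{"a": [1, 2]}'::jsonb, '{a}', '{"x": "value"}'::jsonb);
-- returns {"a": {"x": "value"}}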
Online example: http://rextester.com/GPP85654

Are you looking for something like:
t=# with c(j) as (values('{
    "url":"some_url",
    "password":"some_password",
    "username":"some_username",
    "projectNames":[
        "project_name_1",
        "project_name_N"
    ]
}'::jsonb))
, n as (select j, jsonb_array_elements_text(j->'projectNames') a from c)
select jsonb_pretty(jsonb_set(j, '{projectNames}', jsonb_object_agg(a, 'value'))) from n group by j
;
            jsonb_pretty
------------------------------------
 {                                 +
     "url": "some_url",            +
     "password": "some_password",  +
     "username": "some_username",  +
     "projectNames": {             +
         "project_name_1": "value",+
         "project_name_N": "value" +
     }                             +
 }
(1 row)
Time: 19.756 ms
If so, look at:
https://www.postgresql.org/docs/current/static/functions-aggregate.html
https://www.postgresql.org/docs/current/static/functions-json.html
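jsonb_object_agg is covered by the first link; as a tiny standalone illustration (made-up values):
select jsonb_object_agg(x, 'value') from (values ('a'), ('b')) t(x);
-- returns {"a": "value", "b": "value"}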


Unnest non-array JSON in BigQuery

I have data arriving as separate events in JSON form resembling:
{
    "id":1234,
    "data":{
        "packet1":{"name":"packet1", "value":1},
        "packet2":{"name":"packet2", "value":2}
    }
}
I'd like to unnest the data to essentially have one row per 'packet' (there may be any number of packets).
id   | name    | value
-----+---------+------
1234 | packet1 | 1
1234 | packet2 | 2
I've looked at using the unnest function with the various JSON functions but it seems limited to working with arrays. I have not been able to find a way to treat the 'data' field as if it were an array.
At the moment, I cannot change these events to store packets in an array, and ideally the unnesting should be happening within BigQuery.
1. Regular expressions
There might be other ways, but you can consider the below approach using regular expressions.
WITH sample_table AS (
  SELECT """{
    "id":1234,
    "data":{
      "packet1":{"name":"packet1", "value":1},
      "packet2":{"name":"packet2", "value":2}
    }
  }""" AS events
)
SELECT JSON_VALUE(events, '$.id') AS id, name, value
FROM sample_table,
  UNNEST(REGEXP_EXTRACT_ALL(events, r'"name":"(\w+)"')) name WITH offset
JOIN
  UNNEST(REGEXP_EXTRACT_ALL(events, r'"value":([0-9.]+)')) value WITH offset
USING (offset);
Query results:
id   | name    | value
1234 | packet1 | 1
1234 | packet2 | 2
2. JavaScript UDF
Or, you might consider the below approach using a JavaScript UDF.
CREATE TEMP FUNCTION extract_pair(json STRING)
RETURNS ARRAY<STRUCT<name STRING, value STRING>>
LANGUAGE js AS """
  // collect each packet object; the key itself is not needed
  const result = [];
  for (const [key, value] of Object.entries(JSON.parse(json))) {
    result.push(value);
  }
  return result;
""";
WITH sample_table AS (
  SELECT """{
    "id":1234,
    "data":{
      "packet1":{"name":"packet1", "value":1},
      "packet2":{"name":"packet2", "value":2}
    }
  }""" AS events
)
SELECT JSON_VALUE(events, '$.id') AS id, obj.*
FROM sample_table, UNNEST(extract_pair(JSON_QUERY(events, '$.data'))) obj;
@Jaytiger's suggestion of unnesting a regex extract led me to the following solution.
The example I showed was simplified - there are more fields within the packets. To avoid requiring a separate regex for each field name, I used a regex to split out each individual packet, and then read the JSON.
This iteration doesn't do everything in one step but works when just looking at packets.
with sample_data AS (
  SELECT """{"packet1":{"name":"packet1", "value":1},
             "packet2":{"name":"packet2", "value":2}}""" as packets
)
select
  json_value('{'||packet||'}', "$.name") name,
  json_value('{'||packet||'}', "$.value") value
from sample_data,
  unnest(regexp_extract_all(packets, r'\:{(.*?)\}')) packet
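If you need the id as well, here's a hedged sketch that reuses sample_table and events from the earlier answers (JSON_QUERY returns the data object as a string, so the same regex split applies; untested beyond this sample):
select
  json_value(events, '$.id') as id,
  json_value('{'||packet||'}', '$.name') as name,
  json_value('{'||packet||'}', '$.value') as value
from sample_table,
  unnest(regexp_extract_all(json_query(events, '$.data'), r'\:{(.*?)\}')) packet;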

PostgreSQL: How to get value in JSON where key LIKE?

I have a column statuses_json in my table containing JSON:
{
    "demoStatus": "true",
    "productionStatus": "false"
}
I would like to retrieve a value where the key is LIKE some string.
For example, if I pass in "demo", I want to retrieve the value for the key demoStatus.
Right now I am able to retrieve values when passing the exact key:
statuses_json->>'productionStatus' = 'false';
Extract the keys and run a query on it:
select *
from json_object_keys('{
"demoStatus" : "true",
"productionStatus": "false"
}') k where k like '%demo%';
I don't have a new enough version of PostgreSQL, but jsonb_path_query looks interesting, too. You can then use statuses_json->>(...) to extract the corresponding value(s):
select statuses_json from your_table
where statuses_json->>(
select prop
from json_object_keys(statuses_json) as prop
where prop like 'demo%'
) = 'false';
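For reference, on PostgreSQL 12+ the jsonb_path_query route could look roughly like this (a hedged sketch; assumes statuses_json can be cast to jsonb):
select jsonb_path_query(statuses_json::jsonb, '$.keyvalue() ? (@.key like_regex "^demo")')
from your_table;
-- each match is an object like {"id": 0, "key": "demoStatus", "value": "true"}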

MySQL/Sequelize - how to query to get data whose field is an empty array

My HTTP request returns the data below:
Users.js
[
    {
        ...
        friends: []
    },
    {
        ...
        friends: [{id: xxx, ...}, ...]
    },
    {
        ...
        friends: []
    }
]
If I want to query for all the rows whose friends array is [],
what should the query be?
select * from users where (what should I write here)
If friends is a direct column in your database of JSON array type, you can use JSON_LENGTH to find out the length of the array.
SELECT JSON_LENGTH('[1, 2, {"a": 3}]'); -- Output: 3
SELECT JSON_LENGTH('[]');               -- Output: 0
You can use the same concept to get data from your database.
SELECT *
FROM users
WHERE JSON_LENGTH(friends) = 0;
If you have nested JSON where friends is one of the keys in a given column (data), then your query would use JSON_CONTAINS:
SELECT *
FROM users
WHERE JSON_CONTAINS(data, JSON_ARRAY(), '$.friends') -- check that we have `friends` as a key in that JSON
  AND JSON_LENGTH(data, '$.friends') = 0;            -- check that it is an empty array
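As a hedged alternative on MySQL 5.7+, JSON_CONTAINS_PATH states the key-existence check more directly:
SELECT *
FROM users
WHERE JSON_CONTAINS_PATH(data, 'one', '$.friends') -- `friends` key exists
  AND JSON_LENGTH(data, '$.friends') = 0;          -- and holds an empty array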
Now you can convert it to a Sequelize query. One way is:
Model.findAll({
    where: {
        [Op.and]: [
            Sequelize.literal('RAW SQL STATEMENT WHICH WONT BE ESCAPED!!!')
        ]
    }
})
Make sure to replace Model with your user model and the literal with your predicate (here, JSON_LENGTH(friends) = 0).

T-SQL - search in filtered JSON array

SQL Server 2017.
Table OrderData has column DataProperties where JSON is stored. JSON example stored there:
{
    "Input": {
        "OrderId": "abc",
        "Data": [
            {
                "Key": "Files",
                "Value": [
                    "test.txt",
                    "whatever.jpg"
                ]
            },
            {
                "Key": "Other",
                "Value": [
                    "a"
                ]
            }
        ]
    }
}
So, it's an object with an Input object, which has a Data array of key-value pairs: objects with a Key string and a Value array of strings.
And my problem: I need to query for rows based on the values under the Files key in the example JSON, with a simple LIKE that matches %text%.
This query works:
SELECT TOP 10 *
FROM OrderData CROSS APPLY OPENJSON(DataProperties,'$.Input.Data') dat
WHERE JSON_VALUE(dat.value, '$.Key') = 'Files' and dat.[key] = 0
AND JSON_QUERY(dat.value, '$.Value') LIKE '%2%'
Problem is that this query is very slow, unsurprisingly.
How to make it faster?
I cannot create a computed column with JSON_VALUE, because I need to filter on values inside an array.
I cannot create a computed column with JSON_QUERY on "$.Input.Data" or "$.Input.Data[0].Values", because I need the specific array item whose Key == "Files".
I've searched, but it seems that you cannot create a computed column that also filters data, as in this attempt:
ALTER TABLE OrderData
ADD aaaTest AS (SELECT JSON_QUERY(dat.value, '$.Value')
                FROM OPENJSON(DataProperties,'$.Input.Data') dat
                WHERE JSON_VALUE(dat.value, '$.Key') = 'Files' and dat.[key] = 0);
Error: Subqueries are not allowed in this context. Only scalar expressions are allowed.
What are my options?
1. Add a Files column with an index and use INSERT/UPDATE triggers that populate this column on inserts/updates?
2. Create a view that "computes" this column? I can't add an index to it, so it will still be slow.
So far only option 1 has some merit, but I don't like triggers - maybe there's another option?
You might try something along these lines:
Attention: I've added a 2 (test2.txt) to satisfy your filter, and I renamed both keys to the plural "Values":
DECLARE @mockupTable TABLE(ID INT IDENTITY, DataProperties NVARCHAR(MAX));
INSERT INTO @mockupTable VALUES
(N'{
    "Input": {
        "OrderId": "abc",
        "Data": [
            {
                "Key": "Files",
                "Values": [
                    "test2.txt",
                    "whatever.jpg"
                ]
            },
            {
                "Key": "Other",
                "Values": [
                    "a"
                ]
            }
        ]
    }
}');
The query
SELECT TOP 10 *
FROM @mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
            WITH([Key] NVARCHAR(100)
                ,[Values] NVARCHAR(MAX) AS JSON) dat
WHERE dat.[Key] = 'Files'
  AND dat.[Values] LIKE '%2%';
The main difference is the WITH-clause, which is used to return the properties inside an object in a typed way and side-by-side (similar to a naked OPENJSON with a PIVOT for all columns - but much better). This avoids expensive JSON methods in your WHERE clause.
Hint: As we return the Values with NVARCHAR(MAX) AS JSON, we can continue with the nested array and proceed with something like this:
SELECT TOP 10 *
FROM @mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
            WITH([Key] NVARCHAR(100)
                ,[Values] NVARCHAR(MAX) AS JSON) dat
WHERE dat.[Key] = 'Files'
  --we read the array again with OPENJSON:
  AND 'test2.txt' IN (SELECT [Value] FROM OPENJSON(dat.[Values]));
You might use one more CROSS APPLY to add the array's values and filter this at the WHERE directly.
SELECT TOP 10 *
FROM @mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
            WITH([Key] NVARCHAR(100)
                ,[Values] NVARCHAR(MAX) AS JSON) dat
CROSS APPLY OPENJSON(dat.[Values]) vals
WHERE dat.[Key] = 'Files'
  AND vals.[Value] = 'test2.txt';
Just check it out...
This is an old question, but I would like to revisit it. There isn't any mention of how the source table is actually constructed in terms of indexing. If the original author is still around, can you confirm/deny what indexing strategy you used? For performant JSON document queries, I've found that a table using the COLUMNSTORE indexing strategy yields very performant JSON queries even with large amounts of data.
https://learn.microsoft.com/en-us/sql/relational-databases/json/store-json-documents-in-sql-tables?view=sql-server-ver15 has examples of different indexing techniques. For my personal solution I've been using COLUMNSTORE, albeit with a limited NVARCHAR document size. It's fast enough for any purposes I have, even with millions of rows of decently sized JSON documents.
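A minimal sketch of that pattern, loosely following the linked article (table and column names here are made up; NVARCHAR(MAX) columns in a clustered columnstore need SQL Server 2017, which matches the question):
CREATE TABLE OrderDataLog (
    _id BIGINT IDENTITY,             -- surrogate key
    DataProperties NVARCHAR(MAX),    -- raw JSON document
    INDEX cci CLUSTERED COLUMNSTORE  -- compresses and speeds up large scans
);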

Mapping JSON in Hive where Nested Fields Have Underscores

We are attempting to create a schema to load a massive JSON structure into Hive. We are having a problem, however, in that some fields have leading underscores in their names: at the root level this is fine, but we have not found a way to make this work for nested fields.
Sample JSON:
{
    "_id" : "319FFE15FF908EDD86B7FDEADBEEFBD8D7284128841B14AA6A966923C268DF39",
    "SomeThing" :
    {
        "_SomeField" : 22,
        "AnotherField" : 2112,
        "YetAnotherField": 1
    }
    . . . etc . . . .
Using a schema as follows:
create table testSample
(
    id string,
    something struct
    <
        somefield:int,
        anotherfield:bigint,
        yetanotherfield:int
    >
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties
(
    "mapping.id" = "_id",
    "mapping.somefield" = "_somefield"
);
This schema builds OK; however, after loading the above sample, the value of "somefield" (the nested one with the leading underscore) is always null (all the other values exist and are correct).
We've been trying a lot of syntax combinations, but to no avail.
Does anyone know the trick to map a nested field with a leading underscore in its name?
Cheers!
Answering my own question here: there is no trick because you can't.
However, there's an easy work-around: you can tell Hive to treat the names as literals upon creating the schema. If you do this, you will also need to query using the same literal syntax. In the above example, it would look like:
`_something` struct<rest_of_definitions>
without any special serde properties for it.
Then use again in query:
select stuff.`_something` from sometable;
e.g., schema:
create table testSample
(
    id string,
    something struct
    <
        `_somefield`:int,
        anotherfield:bigint,
        yetanotherfield:int
    >
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties("mapping.id" = "_id");
for an input JSON like:
{
    "_id": "someuid",
    "something":
    {
        "_somefield": 1,
        "anotherfield": 2,
        "yetanotherfield": 3
    }
}
with a query like:
select something.`_somefield`
from testSample
where something.anotherfield = 2;
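For the sample row above, this should return 1 (the _somefield value in the object whose anotherfield is 2).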