Couchbase - SELECT a subset of fields from an array of objects

I am using the travel-sample data set, and am running the following query:
SELECT id, schedule FROM `travel-sample` WHERE type = "route" LIMIT 1;
It is returning with the following results:
[
  {
    "id": 10000,
    "schedule": [
      {
        "day": 0,
        "flight": "AF198",
        "utc": "10:13:00"
      },
      {
        "day": 0,
        "flight": "AF547",
        "utc": "19:14:00"
      },
      ...
    ]
  }
]
However, I don't want to return the schedule.$.day field; i.e. I want my results to be:
[
  {
    "id": 10000,
    "schedule": [
      {
        "flight": "AF198",
        "utc": "10:13:00"
      },
      {
        "flight": "AF547",
        "utc": "19:14:00"
      },
      ...
    ]
  }
]
How can I SELECT only a subset of object fields from an array of objects?
I have tried UNNEST but I don't want to have a separate record for each schedule element - I want the schedule elements to remain nested inside the document.
I have also tried using OBJECT_REMOVE:
SELECT id, ARRAY OBJECT_REMOVE(x, 'day') FOR x in schedule END AS schedule FROM `travel-sample` WHERE type = "route" LIMIT 1;
But I want to whitelist rather than blacklist fields.

Your last attempt was close. Instead of using OBJECT_REMOVE, you can simply construct the object you want returned.
SELECT id, ARRAY {"flight": x.flight, "utc": x.utc} FOR x in schedule END AS schedule FROM `travel-sample` WHERE type = "route" LIMIT 1;
You will get the following results:
[
  {
    "id": 10000,
    "schedule": [
      {
        "flight": "AF198",
        "utc": "10:13:00"
      },
      {
        "flight": "AF547",
        "utc": "19:14:00"
      },
      ...
    ]
  }
]
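As an aside, N1QL also supports an abbreviated object-construction syntax in which the field name is inferred from the expression, so (assuming your Couchbase version supports the shorthand) the same projection can be written more tersely:
SELECT id, ARRAY {x.flight, x.utc} FOR x IN schedule END AS schedule FROM `travel-sample` WHERE type = "route" LIMIT 1;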

Related

Using Recursive feature while Flattening in Snowflake

I have a JSON string which needs to be parsed in order to retrieve particular values. Here is an example I am working with:
{
  "assignable_type": "SHIPMENT",
  "rule": {
    "rules": [
      {
        "meta_data": {},
        "rules": [
          {
            "op": "IN",
            "target": "CLIENT_FID",
            "type": "ARRAY_VALUE_ASSERTION",
            "values": [
              "flx::core:client:dbid/64171",
              "flx::core:client:dbid/76049",
              "flx::core:client:dbid/34040",
              "flx::core:client:dbid/61806"
            ]
          }
        ],
        "type": "AND"
      }
    ],
    "type": "OR"
  },
  "type": "USER_DEFINED"
}
The goal is to get the values when "target": "CLIENT_FID".
The expected output for this JSON file should be:
["flx::core:client:dbid/64171",
"flx::core:client:dbid/76049",
"flx::core:client:dbid/34040",
"flx::core:client:dbid/61806"]
Here, as we can see, rules is a list of dictionaries, and we can have nested lists, as seen in the example.
Similarly, we have another JSON file of the following type:
{
  "assignable_type": "SHIPMENT",
  "rule": {
    "rules": [
      {
        "meta_data": {},
        "rules": [
          {
            "op": "IN",
            "target": "PORT_OF_ENTRY_FID",
            "type": "ARRAY_VALUE_ASSERTION",
            "values": [
              "flx::core:port:dbid/566788",
              "flx::core:port:dbid/566931",
              "flx::core:port:dbid/561482"
            ]
          }
        ],
        "type": "AND"
      },
      {
        "meta_data": {},
        "rules": [
          {
            "op": "IN",
            "target": "PORT_OF_LOADING_FID",
            "type": "ARRAY_VALUE_ASSERTION",
            "values": [
              "flx::core:port:dbid/561465"
            ]
          },
          {
            "op": "IN",
            "target": "SHIPMENT_MODE",
            "type": "ARRAY_VALUE_ASSERTION",
            "values": [
              0
            ]
          },
          {
            "op": "IN",
            "target": "CLIENT_FID",
            "type": "ARRAY_VALUE_ASSERTION",
            "values": [
              "flx::core:client:dbid/28169"
            ]
          }
        ],
        "type": "AND"
      }
    ],
    "type": "OR"
  },
  "type": "USER_DEFINED"
}
For the second example, the expected output should be:
["flx::core:client:dbid/28169"]
As seen, we may need to read the values at different depths in the file. In order to address this, I used the following code:
/* first convert the string to a JSON object in cte1 */
with cte1 as (
    select to_json(json_string) as json_rep,
           parse_json(json_extract_path_text(json_rep, 'rule.rules')) as list_elem
    from table1
),
cte2 as (
    select split_array,
           json_extract_path_text(split_array, 'target') as target_client
    from (
        select json_rep,
               list_elem,
               t.value as split_array,
               typeof(split_array) as obj_type,
               index
        from cte1,
             table(flatten(cte1.list_elem, recursive=>true)) as t /* use the recursive feature */
    ) temp
    where split_array ilike '%"target":"client_fid"%' /* filter for rows containing this string */
      and obj_type = 'OBJECT'
)
select split_array,
       json_extract_path_text(split_array, 'values') as client_values
from cte2
where target_client = 'CLIENT_FID'; /* keep only the dictionary containing the client fid */
To handle the varying depth at which CLIENT_FID can be found, we recurse while flattening the string into rows. The output obtained for both of the above inputs is provided below.
For the first string, the actual output in client_values is:
["flx::core:client:dbid/64171",
"flx::core:client:dbid/76049",
"flx::core:client:dbid/34040",
"flx::core:client:dbid/61806"]
Similarly, for the second string the actual output is:
["flx::core:client:dbid/28169"]
As seen, the code appears to produce the correct output, but the way I filter in the final query on target_client = 'CLIENT_FID' seems very hacky. Is there a better approach to retrieving the client fid values, given that the depth at which they appear in the input can vary?
Help is appreciated.
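One way to avoid the textual ILIKE filter entirely is to filter on the flattened VARIANT itself. Below is a minimal sketch, assuming a table json_docs with a VARIANT column doc that already holds the parsed JSON (both names are illustrative):
select t.value:values as client_values
from json_docs,
     table(flatten(doc:rule:rules, recursive => true)) as t
where typeof(t.value) = 'OBJECT'       /* skip the intermediate arrays */
  and t.value:target::string = 'CLIENT_FID';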

Extract value of Tags from CloudTrail logs using Athena

I am trying to query CloudTrail logs using Athena. My goal is to find specific instances and extract them with their Tags.
The query I am using is:
SELECT eventTime,
       awsRegion,
       json_extract(responseelements, '$.instancesSet.items[0].instanceId') AS instanceId,
       json_extract(responseelements, '$.instancesSet.items[0].tagSet.items') AS TAGS
FROM cloudtrail_logs_PP
WHERE (eventName = 'RunInstances' OR eventName = 'StartInstances')
  AND requestparameters LIKE '%mytest1%'
  AND "timestamp" BETWEEN '2021/09/01' AND '2021/10/01'
ORDER BY eventTime;
Using this query, I am able to get all Tags under one column.
I want to extract only specific Tags and need help with that. How can I extract only a specific Tag?
I tried enhancing my query with json_extract(responseelements, '$.instancesSet.items[0].tagSet.items[0]'), but the order of Tags differs from log to log, so I can't rely on a fixed index position.
My JSON file in S3 looks something like this:
{
  "eventVersion": "1",
  "eventTime": "2022-05-27T18:44:29Z",
  "eventName": "RunInstances",
  "awsRegion": "us-east-1",
  "requestParameters": {
    "instancesSet": {
      "items": [{
        "imageId": "ami-1234545",
        "keyName": "DDKJKD"
      }]
    },
    "instanceType": "m5.2xlarge",
    "monitoring": {
      "enabled": false
    },
    "hibernationOptions": {
      "configured": false
    }
  },
  "responseElements": {
    "instancesSet": {
      "items": [{
        "tagSet": {
          "items": [{
            "key": "11",
            "value": "DS"
          }, {
            "key": "1",
            "value": "A"
          }]
        }
      }]
    }
  }
}
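Since the tag order varies, one option is to explode the tag array with UNNEST and filter on the tag key instead of its position. Here is a sketch against the same table and columns as the query above; the tag key 'Name' is only an illustrative example:
SELECT eventTime,
       json_extract_scalar(tag, '$.key') AS tag_key,
       json_extract_scalar(tag, '$.value') AS tag_value
FROM cloudtrail_logs_PP
CROSS JOIN UNNEST(
       CAST(json_extract(responseelements, '$.instancesSet.items[0].tagSet.items') AS ARRAY(JSON))
     ) AS t (tag)
WHERE (eventName = 'RunInstances' OR eventName = 'StartInstances')
  AND json_extract_scalar(tag, '$.key') = 'Name';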

Creating nodes from nested JSON using a neo4j query

I'm new to neo4j and I have this JSON file:
{
  "locations_connections": {
    "locations": [
      {
        "id": "aws.us-east-1",
        "longitude": 72.8777,
        "latitude": 19.0760
      },
      {
        "id": "aws.us-east-2",
        "longitude": 126.9780,
        "latitude": 37.5665
      },
      {
        "id": "aws.us-west-1",
        "longitude": 103.8517837,
        "latitude": 1.287950
      }
    ],
    "connections": [
      {
        "aws.us-west-1": [
          {
            "id": "aws.us-west-1",
            "latency": 3.16,
            "cost": 0.02
          },
          {
            "id": "aws.us-east-1",
            "latency": 53.47,
            "cost": 0.02
          },
          {
            "id": "aws.us-east-2",
            "latency": 53.47,
            "cost": 0.02
          }
        ]
      },
      {
        "aws.us-east-1": [
          {
            "id": "aws.us-east-1",
            "latency": 3.16,
            "cost": 0.02
          },
          {
            "id": "aws.us-east-2",
            "latency": 53.47,
            "cost": 0.02
          }
        ]
      },
      {
        "aws.us-east-2": [
          {
            "id": "aws.us-east-2",
            "latency": 53.47,
            "cost": 0.02
          }
        ]
      }
    ]
  }
}
After reading the JSON using the apoc.load.json(URL) procedure, what query do I write to represent this as a graph, where each node contains the name (for example aws.us-east-1) together with the longitude and latitude values, and the edges carry the latency and the cost?
I have this code:
call apoc.load.json("/file.json") yield value
UNWIND value.locations_connections.locations as loc
UNWIND value.locations_connections.connections as con
MERGE (e:Element {id: loc.id})
  ON CREATE SET e.longitude = loc.longitude, e.latitude = loc.latitude
WITH con
FOREACH (region_source IN KEYS(con) |
  FOREACH (data IN con[region_source] |
    MERGE (e1:Element {id: region_source})
    MERGE (e1)<-[:CONNECT]-(e2:Element {id: data.id, latency: data.latency, cost: data.cost})
  )
)
and the execution result is incorrect:
Added 9 labels, created 9 nodes, set 27 properties, created 6 relationships, completed after 60 ms.
I have seen this one, but it is not what I expected.
You cannot use MATCH inside a FOREACH, so when you put the MERGE with :CONNECT inside the loop, it creates duplicate nodes. This is what I did; tell us whether or not it works for you.
call apoc.load.json("/file.json") yield value
// read the json file
WITH value, value.locations_connections.locations as locs
// for loop to create the locations (or regions)
FOREACH (loc in locs | MERGE (e:Element {id:loc.id}) ON CREATE
SET e.longitude = loc.longitude, e.latitude = loc.latitude
)
// get the data for the connections
WITH value.locations_connections.connections as cons
UNWIND cons as con
// the keys and value are assigned to variables region and data
WITH KEYS(con)[0] as region_source, con[KEYS(con)[0]] as dat
// unwind is similar to a for loop
UNWIND dat as data
// look for the nodes that we want
MATCH (e1:Element {id: region_source}), (e2:Element {id: data.id})
// create the connection between regions
MERGE (e1)<-[:CONNECT {latency:data.latency, cost:data.cost}]-(e2)
RETURN e1, e2
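To sanity-check the resulting graph, a query along these lines (a quick sketch) lists every connection together with its properties:
// list each CONNECT relationship with its endpoints and edge properties
MATCH (src:Element)-[c:CONNECT]->(dst:Element)
RETURN src.id AS source, dst.id AS target, c.latency AS latency, c.cost AS cost;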

Issue with cts.jsonPropertyScopeQuery and cts.jsonPropertyValueQuery with data types and field order

I am running MarkLogic 9 as my database.
I have created the following documents in my database:
test1.json
{
  "users": [
    {
      "userId": "A",
      "value": 0
    }
  ]
}
test2.json
{
  "users": [
    {
      "userId": "A",
      "value": "0"
    }
  ]
}
test3.json
{
  "users": [
    {
      "value": 0,
      "userId": "A"
    }
  ]
}
test4.json
{
  "users": [
    {
      "value": "0",
      "userId": "A"
    }
  ]
}
I have run the following queries and recorded the results:
cts.uris("", null, cts.jsonPropertyScopeQuery(
  "users",
  cts.andQuery([
    cts.jsonPropertyValueQuery('userId', "A"),
    cts.jsonPropertyValueQuery('value', "0")
  ])
))
Result: test2.json, test4.json
cts.uris("", null, cts.jsonPropertyScopeQuery(
  "users",
  cts.andQuery([
    cts.jsonPropertyValueQuery('userId', "A"),
    cts.jsonPropertyValueQuery('value', 0)
  ])
))
Result: test3.json
I was wondering why test1.json was not returned by the second query while test3.json was. They both have the same values for their fields, just in a different order. The field order also differs between test2.json and test4.json, yet the first query returned both documents. The only difference between the two pairs that I can think of is that the field "value" occurs with two data types, integer and string.
How would I go about resolving this issue?
https://docs.marklogic.com/cts.jsonPropertyValueQuery shows the value to match as an array.
If you want to keep the variants in the data, maybe you can try something on the query side like cts.jsonPropertyValueQuery('value', ["0", 0]).
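Expanded into a full query, that suggestion would look something like the following sketch (untested against your exact index configuration):
cts.uris("", null, cts.jsonPropertyScopeQuery(
  "users",
  cts.andQuery([
    cts.jsonPropertyValueQuery('userId', "A"),
    // an array of values acts as an OR: match either the string "0" or the number 0
    cts.jsonPropertyValueQuery('value', ["0", 0])
  ])
))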

Add new fields to nested JSON array in JSONB

I have a nested JSON structure stored in a PostgreSQL table.
Table users:
id | content [JSON]
JSON:
{
  "purchases": [
    {
      "id": 1,
      "costs": [
        {
          "amount": 23
        },
        {
          "amount": 34
        }
      ]
    },
    {
      "id": 2,
      "costs": [
        {
          "amount": 42
        }
      ]
    }
  ]
}
I would like to add a field "jsonClass": "Static" to all the objects within the costs array, so that I end up with the following:
{
  "purchases": [
    {
      "id": 1,
      "costs": [
        {
          "jsonClass": "Static",
          "amount": 23
        },
        {
          "jsonClass": "Static",
          "amount": 34
        }
      ]
    },
    {
      "id": 2,
      "costs": [
        {
          "jsonClass": "Static",
          "amount": 42
        }
      ]
    }
  ]
}
I couldn't figure out how to add values to such a nested structure. Does anyone know how to achieve this? The only way I found was to cast it to text and do a string replace, which is not very performant, and I have a lot of such entries.
Unfortunately, due to having to change multiple sub-objects, I don't know of a better way than to deconstruct and then reconstruct the object. It gets pretty hairy.
UPDATE users
SET content = (
  SELECT jsonb_build_object('purchases', jsonb_agg(purchase))
  FROM (
    SELECT jsonb_build_object('id', pid, 'costs', jsonb_agg(cost)) AS purchase
    FROM (
      SELECT pid, cost || '{"jsonClass": "Static"}'::jsonb AS cost
      FROM (
        SELECT purchase->'id' AS pid,
               jsonb_array_elements(purchase->'costs') AS cost
        FROM jsonb_array_elements(content::jsonb->'purchases') AS purchase
      ) AS Q
    ) AS R
    GROUP BY pid
  ) AS S
);
Fiddle
EDIT: Sorry about all the edits, forgot to test for multiple rows. Should be good now. It might be possible to simplify it a bit more, not sure.
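Note that jsonb_agg on its own doesn't guarantee that the original array order is preserved. If order matters, here's a sketch of the same idea (same table and column assumed) that carries the original positions through WITH ORDINALITY and aggregates with ORDER BY; merging with || instead of rebuilding also keeps any extra fields a purchase object might have:
UPDATE users
SET content = (
  SELECT jsonb_build_object('purchases', jsonb_agg(purchase ORDER BY p_idx))
  FROM (
    SELECT p.p_idx,
           -- merge a rebuilt "costs" array back into the original purchase object
           p.value || jsonb_build_object(
             'costs',
             jsonb_agg(c.value || '{"jsonClass": "Static"}'::jsonb ORDER BY c.c_idx)
           ) AS purchase
    FROM jsonb_array_elements(content::jsonb->'purchases') WITH ORDINALITY AS p(value, p_idx),
         jsonb_array_elements(p.value->'costs') WITH ORDINALITY AS c(value, c_idx)
    GROUP BY p.p_idx, p.value
  ) AS S
);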