NEST in Couchbase

How can I nest data in Couchbase on a reference key, similar to what we do in MongoDB?
We have two tables in one bucket: the first table is "CHAIN" and the second table is "STORE".
I was previously a MongoDB user and am very new to Couchbase.
Please suggest how I can nest using N1QL for Couchbase.
Table 1 CHAIN
{
"chId": "chid_1",
"chName": "Walmart",
"type": "CHAIN"
}
2nd table STORE
{
"chId": "chid_1",
"csName": "store1",
"type": "STORE"
}
{
"chId": "chid_1",
"csName": "store2",
"type": "STORE"
}
I want to get the data by joining these tables as:
{
"chId": "chid_1",
"chName": "Walmart",
"type": "CHAIN",
"stores": [
{"csName": "store1", "type": "STORE"},
{"csName": "store2", "type": "STORE"}]
}

Use JOIN and GROUP BY. Also check out https://blog.couchbase.com/ansi-join-support-n1ql/
CREATE INDEX ix1 ON default (chId) WHERE type = "CHAIN";
CREATE INDEX ix2 ON default (chId) WHERE type = "STORE";
SELECT c.*, ARRAY_AGG({s.type, s.csName}) AS stores
FROM default AS c
JOIN default AS s ON c.chId = s.chId
WHERE c.type = "CHAIN" AND s.type = "STORE"
GROUP BY c;
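To make the result shape concrete, here is a minimal Python sketch that imitates what the JOIN + GROUP BY query does with the three sample documents from the question. It only models the semantics in memory: `docs` and `join_group_stores` are hypothetical names, not Couchbase APIs, and the sketch keeps input order, which you should not rely on from the real ARRAY_AGG.

```python
from collections import defaultdict

# The three sample documents from the question, all in the same bucket.
docs = [
    {"chId": "chid_1", "chName": "Walmart", "type": "CHAIN"},
    {"chId": "chid_1", "csName": "store1", "type": "STORE"},
    {"chId": "chid_1", "csName": "store2", "type": "STORE"},
]


def join_group_stores(docs):
    """Imitate: SELECT c.*, ARRAY_AGG({s.type, s.csName}) AS stores
    FROM c JOIN s ON c.chId = s.chId
    WHERE c.type = "CHAIN" AND s.type = "STORE" GROUP BY c."""
    # Bucket the projected STORE fields by chId so each chain finds its matches directly.
    stores_by_chain = defaultdict(list)
    for s in docs:
        if s["type"] == "STORE":
            stores_by_chain[s["chId"]].append({"type": s["type"], "csName": s["csName"]})

    rows = []
    for c in docs:
        if c["type"] != "CHAIN":
            continue
        stores = stores_by_chain.get(c["chId"], [])
        if stores:  # inner JOIN: a chain with no stores produces no group at all
            rows.append({**c, "stores": stores})
    return rows


print(join_group_stores(docs))
```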
You can also use ANSI NEST if you want to include the whole STORE documents:
SELECT c.*, s AS stores
FROM default AS c
NEST default AS s ON c.chId = s.chId AND s.type = "STORE"
WHERE c.type = "CHAIN";
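Unlike the GROUP BY version, NEST puts each matching STORE document into `stores` whole, and, like an inner join, it drops a CHAIN that has no matching STORE (use LEFT NEST if such chains must be kept). The Python model below illustrates both points; the extra chain `chid_2` is hypothetical test data added only to show the dropped row.

```python
# The question's documents plus a hypothetical chain that has no stores.
docs = [
    {"chId": "chid_1", "chName": "Walmart", "type": "CHAIN"},
    {"chId": "chid_2", "chName": "NoStoresYet", "type": "CHAIN"},
    {"chId": "chid_1", "csName": "store1", "type": "STORE"},
    {"chId": "chid_1", "csName": "store2", "type": "STORE"},
]


def ansi_nest(docs):
    """Imitate: SELECT c.*, s AS stores FROM c
    NEST s ON c.chId = s.chId AND s.type = "STORE" WHERE c.type = "CHAIN"."""
    rows = []
    for c in docs:
        if c["type"] != "CHAIN":
            continue
        # Every matching right-hand document is collected whole into one array.
        stores = [s for s in docs if s["type"] == "STORE" and s["chId"] == c["chId"]]
        if stores:  # plain NEST is an inner nest: no matches, no row
            rows.append({**c, "stores": stores})
    return rows


for row in ansi_nest(docs):
    print(row["chName"], [s["csName"] for s in row["stores"]])
```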

Related

Modify JSON files in GCS bucket to change the datatype of a field from String to Array (GCP)

I have a use case where we are receiving millions of JSON files into our GCS bucket, and I am creating an external table on top of the bucket. The problem is that the data type of one particular field is not consistent: some files have a string and others have an array.
My question is: can we alter the JSON to turn these strings into arrays, or is there another recommended way to handle this?
Example:
**string**:
"ing": {
"info": "abc,def",
"details": []
},
**array**:
"ing": {
"info": [
"abc,def",
"abc,efg"
],
"details": []
},
I tried adding [] around the string value, and querying the external table then works. But I need a way to efficiently alter the 1M JSON files to add the brackets.
I expect to move this data from the external table into a BigQuery table.
I hope the query below gives you a hint for handling your problem:
WITH sample_table AS (
SELECT '{"ing": {"enfo": "abc,def", "details": []}}' json UNION ALL
SELECT '{"ing": {"info": "abc,def", "details": []}}' json UNION ALL
SELECT '{"ing": {"info": ["abc,def", "abc,efg"], "details": []}}' UNION ALL
SELECT '{"ing": {"info": null, "details": []}}'
)
SELECT COALESCE(
JSON_VALUE_ARRAY(json, '$.ing.info'),
ARRAY(SELECT e FROM UNNEST([JSON_VALUE(json, '$.ing.info')]) e WHERE e IS NOT NULL)
) AS info
FROM sample_table;
Using an external table instead:
CREATE SCHEMA IF NOT EXISTS `your-project.stackoverflow`;
CREATE OR REPLACE EXTERNAL TABLE `stackoverflow.sample_table` (
json STRING
)
OPTIONS (
format = 'CSV',
field_delimiter = CHR(1),
uris = ['https://drive.google.com/open?id=1CIW3UmvYr2JAmSounOY6l5dUFUJCOJOH']
);
SELECT COALESCE(
JSON_VALUE_ARRAY(json, '$.ing.info'),
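ARRAY(SELECT e FROM UNNEST([JSON_VALUE(json, '$.ing.info')]) e WHERE e IS NOT NULL) -- BigQuery cannot return an ARRAY containing a NULL element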
) AS info
FROM `stackoverflow.sample_table`;
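The question also asks about rewriting the files themselves. If you go that way instead of normalizing at query time, the same rule the COALESCE expression applies (an array stays an array, a scalar becomes a one-element array, null or missing becomes an empty array) can be sketched in Python for newline-delimited JSON. This assumes one JSON object per line and that `info` only ever holds strings; `normalize_info` and `normalize_ndjson` are hypothetical helpers, and reading and writing the GCS objects (for example with the google-cloud-storage client) is left out.

```python
import json


def normalize_info(record):
    """Coerce record["ing"]["info"] to a list, mirroring the COALESCE query above."""
    ing = record.get("ing")
    if not isinstance(ing, dict):
        return record  # assumption: records without an "ing" object are left untouched
    info = ing.get("info")
    if info is None:
        ing["info"] = []  # null or missing -> empty array
    elif not isinstance(info, list):
        ing["info"] = [info]  # scalar string -> one-element array
    return record


def normalize_ndjson(text):
    """Rewrite newline-delimited JSON so that `info` is always an array."""
    out = [json.dumps(normalize_info(json.loads(line)))
           for line in text.splitlines() if line.strip()]
    return "\n".join(out) + "\n"


sample = '{"ing": {"info": "abc,def", "details": []}}\n'
print(normalize_ndjson(sample), end="")
```

Normalizing at query time, as in the answer, avoids rewriting a million objects; rewriting the files is mainly worth it if other consumers also need the consistent shape.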

SQL query: JSON array items by value

I have searched and can't find an example that does exactly what I am trying to do.
I have JSON similar to the following in multiple rows of my database:
{
"date": "0001-01-01T00:00:00",
"details": {
"detail": [
{
"item": "11",
"value": "xt"
},
{
"item": "12",
"value": "xy"
},
{
"item": "13",
"value": "xz"
},
{
"item": "14",
"value": "zz"
}
]
}
}
I want to write SQL that does this:
select ID
jsonColumn.value where item=11 as X
jsonColumn.value where item=12 as Y
from tbl
so that I get results like this:
----------------------
|ID |X |Y |
----------------------
|1 |xt |xy |
----------------------
I have tried using JSON_VALUE, but it seems I have to address each array item by its position, like this:
'$.details.detail[3].value'
which doesn't really work, since I need to find items by their item value.
I have also tried this:
SELECT F.ID, x.item, x.value
FROM tbl AS F
CROSS APPLY OPENJSON(F.Json, '$.details.detail')
WITH (item NVARCHAR(25) '$.item',
value NVARCHAR(MAX) '$.value') AS x
WHERE F.ID = 55;
This lets me print out all the items and values, but then I'd have to query each one separately again.
Is there a way of combining the two into one query that won't be completely inefficient?
It seems what you want is a pivot. I personally prefer conditional aggregation over the far more restrictive PIVOT operator. The JSON you supplied was invalid, so I took some liberties correcting it in my sandbox environment:
SELECT --ID,
MAX(CASE d.item WHEN 11 THEN d.[value] END) AS X,
MAX(CASE d.item WHEN 12 THEN d.[value] END) AS Y
FROM (VALUES(@JSON))V(J) --Your table; @JSON is a variable holding the JSON text
CROSS APPLY OPENJSON(V.J,'$.details')
WITH (detail nvarchar(MAX) AS JSON ) OJ
CROSS APPLY OPENJSON(OJ.detail)
WITH(item int,
[value] nvarchar(2)) d;
If you are using this against a table, and not limiting the data to a single row, you'll need to also add a GROUP BY clause on the relevant columns (ID?).
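To show what each part of the conditional aggregation contributes, here is a Python model of the same logic (not T-SQL): shredding the array plays the role of the OPENJSON calls, the `mapping` lookup plays the role of the CASE, and keeping the largest non-NULL value per ID and column plays the role of MAX with GROUP BY ID. The `rows` data and the `pivot` helper are hypothetical.

```python
import json

# Hypothetical (ID, json) rows shaped like the question's table.
rows = [
    (1, json.dumps({"date": "0001-01-01T00:00:00", "details": {"detail": [
        {"item": "11", "value": "xt"}, {"item": "12", "value": "xy"},
        {"item": "13", "value": "xz"}, {"item": "14", "value": "zz"}]}})),
]


def pivot(rows, mapping):
    """Conditional aggregation: MAX(CASE item WHEN k THEN value END) per column, per ID."""
    result = {}
    for row_id, doc in rows:
        # GROUP BY ID: one output row per ID, every column starts as NULL (None).
        out = result.setdefault(row_id, {col: None for col in mapping.values()})
        for d in json.loads(doc)["details"]["detail"]:
            col = mapping.get(int(d["item"]))
            if col is None:
                continue  # CASE with no matching WHEN yields NULL, which MAX ignores
            if out[col] is None or d["value"] > out[col]:
                out[col] = d["value"]  # MAX over the non-NULL candidates
    return result


print(pivot(rows, {11: "X", 12: "Y"}))
```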

Create an index in Couchbase for ARRAY_REMOVE

I want to execute this query:
UPDATE `bucket` SET etats= ARRAY_REMOVE( etats, etats[2])
My question is how to create an index to execute this query; I don't want to use the Couchbase primary index.
The goal of the query is to remove an element from the array 'etats'.
Example of the document:
{
"lastUpdateTime": "2019-03-31T22:02:00.164",
"origin": "origin1",
"etats": [
{
"dateTime": "2019-03-28T17:13:49.766",
"etat": "etat1",
"code": "code1"
},
{
"dateTime": "2019-03-29T15:26:48.577",
"etat": "etat2",
"code": "code2"
},
{
"dateTime": "2019-03-31T22:01:59.843",
"etat": "etat3",
"code": "code3"
}
],
"etatType": "type1"
}
You must have a WHERE clause for a secondary index to be chosen; otherwise the only option is the primary index.
In general, check that the element is present, and only then update the field.
The following query removes the objects in the array whose code value is "code2":
CREATE INDEX ix1 ON default (DISTINCT ARRAY v.code FOR v IN etats END) WHERE etatType = "type1";
UPDATE default AS d
SET d.etats = ARRAY v FOR v IN d.etats WHEN v.code != "code2" END
WHERE d.etatType = "type1" AND ANY v IN d.etats SATISFIES v.code = "code2" END;
If you really want an index for your original query only:
CREATE INDEX ix1 ON `bucket` (etatType);
UPDATE `bucket` SET etats= ARRAY_REMOVE( etats, etats[2])
WHERE etatType = "type1";
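The two UPDATE variants above remove different things, which a small Python model of the example document makes visible. ARRAY_REMOVE drops every element equal to the given value, and `etats[2]` is the third element because N1QL arrays are 0-based, so the original query removes the "etat3" entry; the filtering version removes whatever has code "code2". The helper names are hypothetical, and the sketch assumes every element has a `code` field.

```python
# The "etats" array from the example document.
etats = [
    {"dateTime": "2019-03-28T17:13:49.766", "etat": "etat1", "code": "code1"},
    {"dateTime": "2019-03-29T15:26:48.577", "etat": "etat2", "code": "code2"},
    {"dateTime": "2019-03-31T22:01:59.843", "etat": "etat3", "code": "code3"},
]


def array_remove(arr, value):
    """Model of ARRAY_REMOVE(arr, value): drop every element equal to value."""
    return [e for e in arr if e != value]


def remove_by_code(arr, code):
    """Model of: ARRAY v FOR v IN etats WHEN v.code != code END."""
    return [v for v in arr if v["code"] != code]


def any_code(arr, code):
    """Model of the WHERE guard: ANY v IN etats SATISFIES v.code = code END."""
    return any(v["code"] == code for v in arr)


print([e["etat"] for e in array_remove(etats, etats[2])])
print([e["etat"] for e in remove_by_code(etats, "code2")])
```

The ANY ... SATISFIES guard is what lets the update skip documents that would not change; it is also the predicate the array index ix1 can serve.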

Couchbase N1QL - Nest within a Nest

Nesting within a Nest.
I've adapted my need into the following restaurant example:
Desired Output:
{
"restaurant": {
"id": "restaurant1",
"name": "Foodie",
"mains": [ // < main nested in restaurant
{
"id": "main1",
"title": "Steak and Chips",
"ingredients": [ // < ingredient nested in main (...which is nested in restaurant)
{
"id": "ingredient1",
"title": "steak"
},
{
"id": "ingredient2",
"title": "chips"
}
]
},
{
"id": "main2",
"title": "Fish and Chips",
"ingredients": [
{
"id": "ingredient3",
"title": "fish"
},
{
"id": "ingredient2",
"title": "chips"
}
]
}
],
"drinks": [ you get the idea ] // < drink nested in restaurant
}
}
Example Docs:
// RESTAURANTS
{
"id": "restaurant1",
"type": "restaurant",
"name": "Foodie",
"drinkIds": [ "drink1", "drink2" ],
"mainIds": [ "main1", "main2" ]
},
// MAINS
{
"id": "main1",
"type": "main",
"restaurantIds": [ "restaurant1" ],
"title": "Steak and Chips"
},
{
"id": "main2",
"type": "main",
"restaurantIds": [ "restaurant1" ],
"title": "Fish and Chips"
},
// INGREDIENTS
{
"id": "ingredient1",
"type": "ingredient",
"title": "steak",
"mainIds": [ "main1" ]
},
{
"id": "ingredient2",
"type": "ingredient",
"title": "chips",
"mainIds": [ "main1", "main2" ]
},
{
"id": "ingredient3",
"type": "ingredient",
"title": "fish",
"mainIds": [ "main2" ]
},
// DRINKS
{ you get the idea.... }
The closest I can get without error is:
SELECT restaurant, mains, drinks
FROM default restauant USE KEYS "restaurant1"
NEST default mains ON KEYS restaurant.mainIds
NEST default drinks ON KEYS restaurant.drinkIds;
But:
1. Obviously the nested nest is missing
2. The returned order is incorrect - the drinks nest comes first instead of last
(3. Since I'm also using Sync Gateway - it returns all the "_sync" fields with every doc - can't figure out how to omit this on each doc.)
UPDATE 1: ADAPTED SOLUTION
NB: I should have specified above that a main cannot hold ingredientIds.
Based on geraldss' very helpful input below, I added a doc that tracks keys per restaurant, e.g.:
{
"id": "restaurant1-JoeBloggs",
"dinerId": "JoeBloggs",
"ingredientIds": [ "ingredient1", "ingredient2", "ingredient3" ],
"mainOrdered": [ "main1" ], // < other potential uses...
"drinkOrdered": [ "drink2" ]
}
I added this to geraldss' first solution below as a JOIN to make it available to the query, e.g.:
SELECT *
FROM
(
SELECT
r.*,
(
SELECT
drink.*
FROM default AS drink
USE KEYS r.drinkIds
) AS drinks,
(
SELECT
main.*,
(
SELECT
ingredient.*
FROM default AS ingredient
USE KEYS keyIndex.ingredientIds // < keyIndex
WHERE main.id IN ingredient.mainIds
) AS ingredients
FROM default AS main
USE KEYS r.mainIds
) AS mains
FROM default AS r
USE KEYS "restaurant1"
JOIN default AS keyIndex ON KEYS "restaurant1-JoeBloggs" // < keyIndex JOINed
) AS restaurant
;
geraldss' second solution below also looks good; unfortunately it won't work for my case, because that query finds mains only via ingredients, and for my needs a main can exist without any ingredients. EDIT: he came up with another solution; see UPDATE 2.
UPDATE 2: FINAL SOLUTION
So, again, with geraldss' help I have a solution which does not require an additional doc to track keys:
SELECT *
FROM
(
SELECT
restaurant.id, restaurant.name,
(
SELECT
drink.id, drink.title
FROM default AS drink
USE KEYS restaurant.drinkIds
)
AS drinks,
(
SELECT
main.id, main.title,
ARRAY_AGG({"title":ingredient.title, "id":ingredient.id}) AS ingredients
FROM default AS ingredient
JOIN default AS main
ON KEYS ingredient.mainIds
WHERE "restaurant1" IN main.restaurantIds
AND META(ingredient).id NOT LIKE '_sync:%' // < necessary only if using Sync Gateway
GROUP BY main
UNION ALL
SELECT
mainWithNoIngredients.id, mainWithNoIngredients.title
FROM default AS mainWithNoIngredients
UNNEST mainWithNoIngredients AS foo // < since this is being flattened the AS name is irrelevant
WHERE "restaurant1" IN mainWithNoIngredients.restaurantIds
AND mainWithNoIngredients.type="main"
AND meta().id NOT LIKE '_sync:%' // < necessary only if using Sync Gateway
AND META(mainWithNoIngredients).id NOT IN
(
SELECT RAW mainId
FROM default AS ingredient
UNNEST ingredient.mainIds AS mainId
)
)
AS mains
FROM default AS restaurant
USE KEYS "restaurant1"
)
AS restaurant
;
NB - the AND meta().id NOT LIKE '_sync:%' lines are only necessary if using Sync Gateway.
With just 1 key I can pull all the related docs - even if they are unknown to the immediate 'parent'.
Thank you geraldss.
If the mains contain ingredientIds:
SELECT *
FROM
(
SELECT
r.*,
(
SELECT
drink.*
FROM default AS drink
USE KEYS r.drinkIds
) AS drinks,
(
SELECT
main.*,
(
SELECT
ingredient.*
FROM default AS ingredient
USE KEYS main.ingredientIds
) AS ingredients
FROM default AS main
USE KEYS r.mainIds
) AS mains
FROM default AS r
USE KEYS "restaurant1"
) AS restaurant
;
EDIT: Updated to include mains not referenced by any ingredients.
If the mains do not contain ingredientIds:
SELECT *
FROM
(
SELECT
r.*,
(
SELECT
drink.*
FROM default AS drink
USE KEYS r.drinkIds
) AS drinks,
(
SELECT
main.*,
ARRAY_AGG(ingredient) AS ingredients
FROM default AS ingredient
JOIN default AS main
ON KEYS ingredient.mainIds
WHERE "restaurant1" IN main.restaurantIds
GROUP BY main
UNION ALL
SELECT
main.*
FROM default AS main
WHERE "restaurant1" IN main.restaurantIds
AND META(main).id NOT IN (
SELECT RAW mainId
FROM default AS ingredient
UNNEST mainIds AS mainId
)
) AS mains
FROM default AS r
USE KEYS "restaurant1"
) AS restaurant
;
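As a cross-check of what the second query assembles, here is a Python model that builds the nested result from the example documents. It assumes each document key equals its `id` field (the query relies on that through ON KEYS and META(main).id), leaves drinks out, and adds a hypothetical `main3` with no ingredients to exercise the UNION ALL branch. `docs` and `build_restaurant` are illustrative names only.

```python
# Example documents keyed by id; "main3" is hypothetical and has no ingredients.
docs = {
    "restaurant1": {"id": "restaurant1", "type": "restaurant", "name": "Foodie",
                    "mainIds": ["main1", "main2", "main3"]},
    "main1": {"id": "main1", "type": "main", "restaurantIds": ["restaurant1"], "title": "Steak and Chips"},
    "main2": {"id": "main2", "type": "main", "restaurantIds": ["restaurant1"], "title": "Fish and Chips"},
    "main3": {"id": "main3", "type": "main", "restaurantIds": ["restaurant1"], "title": "Salad"},
    "ingredient1": {"id": "ingredient1", "type": "ingredient", "title": "steak", "mainIds": ["main1"]},
    "ingredient2": {"id": "ingredient2", "type": "ingredient", "title": "chips", "mainIds": ["main1", "main2"]},
    "ingredient3": {"id": "ingredient3", "type": "ingredient", "title": "fish", "mainIds": ["main2"]},
}


def build_restaurant(docs, rid):
    """Nest mains (with their ingredients) inside the restaurant doc `rid`."""
    mains = []
    for m in docs.values():
        # WHERE "restaurant1" IN main.restaurantIds
        if m["type"] != "main" or rid not in m["restaurantIds"]:
            continue
        # JOIN main ON KEYS ingredient.mainIds ... GROUP BY main, ARRAY_AGG(ingredient)
        ingredients = [i for i in docs.values()
                       if i["type"] == "ingredient" and m["id"] in i["mainIds"]]
        if ingredients:
            mains.append({**m, "ingredients": ingredients})
        else:
            mains.append(dict(m))  # UNION ALL branch: main not referenced by any ingredient
    return {**docs[rid], "mains": mains}


for main in build_restaurant(docs, "restaurant1")["mains"]:
    print(main["title"], [i["title"] for i in main.get("ingredients", [])])
```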

Couchbase N1QL query: DELETE with a subquery

I have one bucket that contains 2 types of objects:
first:
{
"id": "123",
"objectNamespace": "a",
"value": "value1"
}
second:
{
"id": "234",
"objectNamespace": "b",
"value": "value2",
"association": ["123"]
}
Now I want to delete a document of type a only if it does NOT have any associations from type b.
I tried this:
DELETE FROM `bukcet_name`
WHERE objectNamespace = 'a'
AND id = "123"
AND NOT EXISTS (
SELECT *
WHERE ANY item IN bukcet_name.association
SATISFIES item = "123" END);
But this always deletes the a doc with id 123.
How can I do that?
There are a couple of mismatches between your data and your query.
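(1) You are missing a FROM clause in the subquery.
(2) Make sure you use association (the field name in your data), not associations.
(3) The bucket name is misspelled: bukcet_name instead of bucket_name.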
Here is a possible query.
DELETE FROM `bucket_name`
WHERE objectNamespace = 'a'
AND id = "123"
AND NOT EXISTS (
SELECT * FROM bucket_name b2
WHERE ANY item IN b2.association
SATISFIES item = "123" END);
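The reason the original statement always deleted the document is presumably that, without a FROM clause, the subquery never looks at the type b documents (at most it sees the document being deleted, which has no association field), so NOT EXISTS is always true. The Python model below shows the intended behavior of the corrected query; the extra document "456" is hypothetical test data, and `delete_if_unreferenced` is an illustrative name, not a Couchbase API.

```python
# The two documents from the question plus a hypothetical unreferenced "a" document.
bucket = [
    {"id": "123", "objectNamespace": "a", "value": "value1"},
    {"id": "234", "objectNamespace": "b", "value": "value2", "association": ["123"]},
    {"id": "456", "objectNamespace": "a", "value": "value3"},
]


def delete_if_unreferenced(bucket, doc_id):
    """Model of: DELETE ... WHERE objectNamespace = 'a' AND id = doc_id
    AND NOT EXISTS (SELECT * FROM bucket_name b2 WHERE ANY item IN b2.association ...)."""
    # Scan every document, as the subquery with FROM does; a missing association never matches.
    referenced = any(doc_id in b2.get("association", []) for b2 in bucket)
    if referenced:
        return list(bucket)  # NOT EXISTS is false, so nothing is deleted
    return [d for d in bucket if not (d["objectNamespace"] == "a" and d["id"] == doc_id)]


print([d["id"] for d in delete_if_unreferenced(bucket, "123")])
print([d["id"] for d in delete_if_unreferenced(bucket, "456")])
```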