How to query a deeply nested JSON value from Couchbase?

I have the following documents in a Couchbase bucket. I need to query for appversion > 3.2.1 OR appversion < 3.3.0 OR appversion = 3.4.1. How do I query these values from the nested JSON?
My JSON documents:
Document 1:
com.whatsapp_1
{
  "doc-type": "App-Metadata",
  "bundleid": "com.whatsapp",
  "value": {
    "appId": "com.whatsapp",
    "appName": "WhatsApp Messenger",
    "primaryCategoryName": "Communication"
  }
}
Document 2:
com.whatsapp_2
{
  "doc-type": "App-Lookalike",
  "bundleid": "com.whatsapp",
  "value": {
    "com.facebook.orca": 476664,
    "org.telegram.messenger.erick.lite": 423132,
    "com.viber.voip": 286410,
    "messenger.free.video.call.chat": 232830,
    "com.facebook.katana": 223000,
    "com.wChatMessenger_6210995": 219960,
    "com.facebook.talk": 187884
  }
}
Document 3:
com.whatsapp_3
{
  "doc-type": "Internal-Metadata",
  "bundleid": "com.whatsapp",
  "value": {
    "appversion": "3.4.1"
  }
}

value is a reserved keyword in N1QL, so you need to use back-ticks around it:
SELECT *
FROM sampleBucket
WHERE `doc-type` = 'Internal-Metadata' AND
      (`value`.appversion > "3.2.1" OR
       `value`.appversion < "3.3.0" OR
       `value`.appversion = "3.4.1");

To query nested entities you should use the UNNEST keyword:
https://dzone.com/articles/nesting-and-unnesting-in-couchbase-n1ql
In your case, it will be something similar to:
SELECT t.* FROM mybucket t UNNEST t.`value` v WHERE t.`doc-type` = 'Internal-Metadata' AND v.appversion = '3.2.1'
As your app versions are strings, you should use the REPLACE function to remove "." and then convert the result to a number before the comparison:
https://docs.couchbase.com/server/5.5/n1ql/n1ql-language-reference/stringfun.html#fn-str-replace
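For illustration, a minimal sketch of that idea, assuming every version component is a single digit (otherwise a version like "3.10.0" would compare incorrectly); sampleBucket is the bucket name from the question:
SELECT *
FROM sampleBucket
WHERE `doc-type` = 'Internal-Metadata'
  AND TONUMBER(REPLACE(`value`.appversion, ".", "")) > 321;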

I'm not quite sure what you want, but if you want a query that only returns document 3, this query should do it.
SELECT *
FROM sampleBucket
WHERE `value`.appversion > "3.2.1" OR `value`.appversion < "3.3.0" OR `value`.appversion = "3.4.1"
This should return only the third document. The query also assumes all app versions are of the form x.y.z where x, y, and z are single-digit numbers.
If that's not the result you are looking for, please explain more precisely what you want.
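If multi-digit components are possible, a hedged alternative is to compare the version components numerically; a sketch using SPLIT and TONUMBER, relying on N1QL comparing arrays element by element, again against sampleBucket:
SELECT *
FROM sampleBucket
WHERE `doc-type` = 'Internal-Metadata'
  AND [TONUMBER(SPLIT(`value`.appversion, ".")[0]),
       TONUMBER(SPLIT(`value`.appversion, ".")[1]),
       TONUMBER(SPLIT(`value`.appversion, ".")[2])] > [3, 2, 1];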

Related

T-SQL - search in filtered JSON array

SQL Server 2017.
Table OrderData has column DataProperties where JSON is stored. JSON example stored there:
{
  "Input": {
    "OrderId": "abc",
    "Data": [
      {
        "Key": "Files",
        "Value": [
          "test.txt",
          "whatever.jpg"
        ]
      },
      {
        "Key": "Other",
        "Value": [
          "a"
        ]
      }
    ]
  }
}
So it's an object with an Input object, which has a Data array of key-value pairs: objects with a Key string and a Value array of strings.
My problem: I need to query for rows based on the values under Files in the example JSON, with a simple LIKE that matches %text%.
This query works:
SELECT TOP 10 *
FROM OrderData CROSS APPLY OPENJSON(DataProperties,'$.Input.Data') dat
WHERE JSON_VALUE(dat.value, '$.Key') = 'Files' and dat.[key] = 0
AND JSON_QUERY(dat.value, '$.Value') LIKE '%2%'
Problem is that this query is very slow, unsurprisingly.
How to make it faster?
I cannot create a computed column with JSON_VALUE, because I need to filter in an array.
I cannot create a computed column with JSON_QUERY on "$.Input.Data" or "$.Input.Data[0].Value" - because I need the specific array item in this array with Key == "Files".
I've searched, but it seems that you cannot create a computed column that also filters data, as with this attempt:
ALTER TABLE OrderData
ADD aaaTest AS (SELECT JSON_QUERY(dat.value, '$.Value')
                FROM OPENJSON(DataProperties,'$.Input.Data') dat
                WHERE JSON_VALUE(dat.value, '$.Key') = 'Files' AND dat.[key] = 0);
Error: Subqueries are not allowed in this context. Only scalar expressions are allowed.
What are my options?
1. Add a Files column with an index and use INSERT/UPDATE triggers that populate this column on inserts/updates?
2. Create a view that "computes" this column? I can't add an index to a view, so it will still be slow.
So far only option 1 has some merit, but I don't like triggers and maybe there's another option? (A sketch of the trigger idea follows below.)
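For illustration only, a hypothetical sketch of the trigger idea from option 1, assuming OrderData has an ID primary key (the column and trigger names here are illustrative, not from the question). Note that a leading-wildcard LIKE still scans even with the index, so the extracted column mainly saves the repeated JSON parsing:
-- hypothetical sketch: keep an extracted Files column in sync with a trigger
ALTER TABLE OrderData ADD Files NVARCHAR(400) NULL;
CREATE INDEX IX_OrderData_Files ON OrderData(Files);
GO
CREATE TRIGGER trg_OrderData_Files ON OrderData
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE o
    SET Files = (SELECT JSON_QUERY(dat.value, '$.Value')
                 FROM OPENJSON(o.DataProperties, '$.Input.Data') dat
                 WHERE JSON_VALUE(dat.value, '$.Key') = 'Files')
    FROM OrderData o
    JOIN inserted i ON i.ID = o.ID;  -- assumes one 'Files' entry per document
END;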
You might try something along these lines:
Attention: I've added a 2 to make it test2.txt so it fulfills your filter, and I renamed both keys to the plural "Values":
DECLARE @mockupTable TABLE(ID INT IDENTITY, DataProperties NVARCHAR(MAX));
INSERT INTO @mockupTable VALUES
(N'{
  "Input": {
    "OrderId": "abc",
    "Data": [
      {
        "Key": "Files",
        "Values": [
          "test2.txt",
          "whatever.jpg"
        ]
      },
      {
        "Key": "Other",
        "Values": [
          "a"
        ]
      }
    ]
  }
}');
The query
SELECT TOP 10 *
FROM @mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
WITH([Key] NVARCHAR(100)
    ,[Values] NVARCHAR(MAX) AS JSON) dat
WHERE dat.[Key] = 'Files'
AND dat.[Values] LIKE '%2%';
The main difference is the WITH clause, which is used to return the properties inside an object in a typed way and side-by-side (similar to a naked OPENJSON with a PIVOT for all columns - but much better). This avoids expensive JSON methods in your WHERE clause.
Hint: As we return [Values] with NVARCHAR(MAX) AS JSON, we can continue with the nested array and might proceed with something like this:
SELECT TOP 10 *
FROM @mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
WITH([Key] NVARCHAR(100)
,[Values] NVARCHAR(MAX) AS JSON) dat
WHERE dat.[Key] = 'Files'
--we read the array again with `OPENJSON`:
AND 'test2.txt' IN(SELECT [Value] FROM OPENJSON(dat.[Values]));
You might use one more CROSS APPLY to add the array's values and filter this at the WHERE directly.
SELECT TOP 10 *
FROM @mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
WITH([Key] NVARCHAR(100)
,[Values] NVARCHAR(MAX) AS JSON) dat
CROSS APPLY OPENJSON(dat.[Values]) vals
WHERE dat.[Key] = 'Files'
AND vals.[Value]='test2.txt'
Just check it out...
This is an old question, but I would like to revisit it. There isn't any mention of how the source table is actually constructed in terms of indexing. If the original author is still around, can you confirm/deny what indexing strategy you used? For performant JSON document queries, I've found that a table using the COLUMNSTORE indexing strategy yields very performant JSON queries even with large amounts of data.
https://learn.microsoft.com/en-us/sql/relational-databases/json/store-json-documents-in-sql-tables?view=sql-server-ver15 has an example of different indexing techniques. For my personal solution I've been using COLUMNSTORE, albeit with a limited NVARCHAR document size. It's fast enough for any purpose I have, even under millions of rows of decently sized JSON documents.
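The linked article uses a layout along these lines; a minimal sketch adapted to the question's OrderData table (the constraint name and NVARCHAR size are assumptions):
-- sketch: JSON documents stored in a table with a clustered columnstore index
CREATE TABLE OrderData (
    ID INT IDENTITY CONSTRAINT PK_OrderData PRIMARY KEY NONCLUSTERED,
    DataProperties NVARCHAR(4000),
    INDEX cci CLUSTERED COLUMNSTORE
);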

How to query documents which are arrays

I'm a novice Couchbase user and I have a bucket which I've created that contains documents which are actually arrays in the form of:
{
  "key": [
    {
      "data1": "somedata1"
    },
    {
      "data2": "somedata2"
    }
  ]
}
I want to query these documents via N1QL statements and have yet to find a solution for how to do this properly. More specifically, I would like to select fields inside each sub-document that sits in the array under a certain key. For example, I would like to access key[1].data2 or key[0].data1.
How should I do it?
Couchbase has some reserved keywords that need to be escaped. In this case, key needs to be escaped. For example, if you're querying against my_bucket, then
SELECT my_bucket.`key`[0].data1 FROM my_bucket;
should return somedata1
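The same escaping works for the other array element; for example:
SELECT my_bucket.`key`[1].data2 FROM my_bucket;
should return somedata2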

N1QL Distinct Query on Nested Arrays

(Couchbase 4.5) Suppose I have the following object stored in my couchbase instance:
{
  "parentArray": [
    {
      "childArray": [{"value": "v1"}, {"value": "v2"}]
    },
    {
      "childArray": [{"value": "v1"}, {"value": "v3"}]
    }
  ]
}
Now I want to select the distinct elements from childArray, which should return an array equal to ['v1', 'v2', 'v3'].
I have a couple solutions to this. My first thought was to go ahead and use the UNNEST operation:
SELECT DISTINCT ca.value FROM `my-bucket` AS b UNNEST b.parentArray AS pa UNNEST pa.childArray AS ca WHERE _class="someclass" AND dataType="someDataType";
With this approach I get a polynomial explosion in the number of scanned elements (due to the unnesting of two arrays), and the query takes a while to complete (for my real data, on the order of 24 seconds). When I remove the UNNESTs and simply query for distinct elements on the top-level fields (those adjacent to parentArray), it takes on the order of milliseconds.
Another solution is to handle this in the application code, by simply iterating through the returned values and finding the distinct values myself. This approach is bad, because it brings too much data into the application space.
Any help please!
Thank you!
UPDATE: It looks like without a WHERE clause the UNNEST query performs fast. So do I need array indexes here?
UPDATE: Never mind the previous update, since there are no indexed elements in the WHERE clause. I do notice that if I remove the UNNEST or the WHERE then the query is fast. Moreover, looking at the EXPLAIN after adding a GSI for the compound index (_class, dataType), I can see an IndexScan on the provided index.
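For reference, such a compound GSI would be created along these lines (a sketch; the bucket name my-bucket is taken from the query above):
CREATE INDEX idx_class_datatype ON `my-bucket`(_class, dataType);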
INSERT INTO default values("3",{ "parentArray" : [ { "childArray": [{"value": 'v1'}, {"value":'v2'}] }, { "childArray": [{"value": 'v1'}, {"value": 'v3'}] } ] });
SELECT ARRAY_DISTINCT(ARRAY v.`value` FOR v WITHIN parentArray END) FROM default;
OR
SELECT ARRAY_DISTINCT(ARRAY_FLATTEN(
ARRAY ARRAY v.`value` FOR v IN ca.childArray END FOR ca IN parentArray END,
2)) FROM default;
You can also add a WHERE clause. If this needs to work across documents, use the following.
INSERT INTO default values("4",{ "parentArray" : [ { "childArray": [{"value": 'v5'}, {"value":'v2'}] }, { "childArray": [{"value": 'v1'}, {"value": 'v3'}] } ] });
SELECT ARRAY_DISTINCT(ARRAY_FLATTEN(ARRAY_AGG(ARRAY v.`value` FOR v WITHIN parentArray END),2)) FROM default;
SELECT ARRAY_DISTINCT(ARRAY_FLATTEN(ARRAY_AGG(ARRAY_FLATTEN(ARRAY ARRAY v.`value` FOR v IN ca.childArray END FOR ca IN parentArray END,2)),2)) FROM default;
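To restore the question's filter on the per-document form, a sketch (reusing the _class and dataType fields from the question's query):
SELECT ARRAY_DISTINCT(ARRAY v.`value` FOR v WITHIN parentArray END)
FROM default
WHERE _class = "someclass" AND dataType = "someDataType";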

How to parse the JSON value of a text column in Cassandra

I have a column of text type that contains a JSON value.
{
  "customer": [
    {
      "details": {
        "customer1": {
          "name": "john",
          "addresses": {
            "address1": {
              "line1": "xyz",
              "line2": "pqr"
            },
            "address2": {
              "line1": "abc",
              "line2": "efg"
            }
          }
        },
        "customer2": {
          "name": "robin",
          "addresses": {
            "address1": null
          }
        }
      }
    }
  ]
}
How can I extract the 'address1' JSON field of the column with a query?
First I am trying to fetch the JSON value; then I will go on to parsing it.
SELECT JSON customer from text_column;
With my query, I get the following error.
com.datastax.driver.core.exceptions.SyntaxError: line 1:12 no viable
alternative at input 'customer' (SELECT [JSON] customer...)
Cassandra version 2.1.13
You can't use SELECT JSON in Cassandra v2.1.x (CQL v3.2.x).
For Cassandra v2.1.x CQL v3.2.x:
The only supported operations after SELECT are:
DISTINCT
COUNT (*)
COUNT (1)
column_name AS new_name
WRITETIME (column_name)
TTL (column_name)
dateOf(), now(), minTimeuuid(), maxTimeuuid(), unixTimestampOf(), typeAsBlob() and blobAsType()
Cassandra v2.2.x CQL v3.3.x introduces SELECT JSON:
With SELECT statements, the new JSON keyword can be used to return each row as a single JSON-encoded map. The remainder of the SELECT statement's behavior is the same.
The result map keys are the same as the column names in a normal result set. For example, a statement like "SELECT JSON a, ttl(b) FROM ..." would result in a map with keys "a" and "ttl(b)". However, there is one notable exception: for symmetry with INSERT JSON behavior, case-sensitive column names with upper-case letters will be surrounded with double quotes. For example, "SELECT JSON myColumn FROM ..." would result in a map key "\"myColumn\"" (note the escaped quotes).
The map values will be JSON-encoded representations (as described below) of the result set values.
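So on Cassandra 2.2+ the original query shape would work; for example, assuming the user table from the snippet below:
SELECT JSON customer FROM user;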
If your Cassandra version is 2.1.x or below, you can use a Python-based approach.
Write a Python script using the Cassandra Python driver.
Here you have to fetch your rows first and then use the json module's loads method, which will convert your JSON text column value into a dict in Python. Then you can work with the Python dictionaries and extract the required nested keys. See the code snippet below.
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import json

if __name__ == '__main__':
    auth_provider = PlainTextAuthProvider(username='xxxx', password='xxxx')
    cluster = Cluster(['0.0.0.0'], port=9042, auth_provider=auth_provider)
    session = cluster.connect("keyspace_name")
    print("session created successfully")
    rows = session.execute('select * from user limit 10')
    for user_row in rows:
        # parse the JSON text column into a Python dict
        customer_dict = json.loads(user_row.customer)
        print(customer_dict.keys())

Possible to chain results in N1ql?

I'm currently trying to do a bit of complex N1QL for a project I'm working on. Theoretically I could do all of this processing in multiple N1QL calls, parsing the results each time, but if possible I'd like for this to be contained in one call.
What I would like to do is:
filter all documents that contain a "dataSync.test.id" field with more than 1 id
Read back all other ids in that list
Use that list to get other documents containing those ids
Get the "dataSync.test._channels" field for those documents (optionally a filter by docType might help parsing)
This would probably return a list of "dataSync.test._channels" values.
Is this possible in N1QL? It seems like it might be, but I can't get the syntax right.
My data structures look a little like
{
  "dataSync": {
    "test": {
      "_channels": [
        "RP"
      ],
      "id": [
        "dataSync_user_1015",
        "dataSync_user_1010",
        "dataSync_user_1005"
      ],
      "_lastUpdatedBy": "TEST"
    }
  },
  ...
}
{
  "dataSync": {
    "test": {
      "_channels": [
        "RSD"
      ],
      "id": [
        "dataSync_user_1010"
      ],
      "_lastUpdatedBy": "TEST"
    }
  },
  ...
}
Yes, I think you can do all of this.
The initial set of IDs with filtering can be retrieved as a subquery, and then you can get the subsequent documents by joins.
SELECT fulldoc
FROM (select meta().id as dockey from doc where a=1) as mydoc
INNER JOIN doc fulldoc ON KEYS mydoc.dockey;
There are optimizations that can be done here. Try the sequencing first to ensure you can get the job done.
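Adapted to the structure in the question, it might look something like the following sketch (the bucket name mybucket and the assumption that the id values are document keys are not confirmed by the question):
SELECT other.dataSync.test._channels
FROM (SELECT ARRAY_FLATTEN(ARRAY_AGG(d.dataSync.test.id), 1) AS ids
      FROM mybucket d
      WHERE ARRAY_LENGTH(d.dataSync.test.id) > 1) AS src
JOIN mybucket other ON KEYS src.ids;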