Couchbase Full Text search with combination of dynamic fields and N1QL - couchbase

Consider following documents and consider that is created Full-text index over following documents:
{
email : "A",
"data" : {
"dynamic_property" : "ANY_TYPE",
"dynamic_property2" : {
"property" : "searchableValue"
},
"field" : "VALUE"
}
},
{
email : "B",
"data" : {
"other_dynamic_prop" : "test-searchableValue-2",
}
},
{
email : "A",
"data" : {
"thirdDynamicProp" : {
"childProp" : "this should be searchableValue!"
}
}
}
The goal: Create N1QL query which will match all the documents with have associated given email address AND the data property contains given substring.
Basically following:
SELECT * FROM `bucket` WHERE `email` = 'A' AND `data` LIKE '%searchableValue%';
The expected result is the first and second document because matching criteria. But the query does not work because data is not a text type but is object type. If the data property would be like:
{"data" : "this should be searchableValue!" }
The query would return expected result.
The question is:
How to create such a N1QL query which would return expected result?
I know that Couchbase is not able to do compare substring in the text, but using Full-text index it should be possible since Couchbase 4.5+

Couchbase4.6 and 5.0 have more/better options (explained below). In couchbase4.5, you can use Array Indexing to solve this:
https://developer.couchbase.com/documentation/server/4.5/n1ql/n1ql-language-reference/indexing-arrays.html
https://www.couchbase.com/blog/2016/october/n1ql-functionality-enhancements-in-couchbase-server-4.5.1
For instance, using the travel-sample sample bucket, following array index, and query would do the kind of substring search you want.
create index tmp_geo on `travel-sample`(DISTINCT ARRAY x FOR x IN object_values(geo) END) where type = "airport";
select meta().id, geo from `travel-sample` where type = "airport"
and ANY x IN object_values(geo) SATISFIES to_string(x) LIKE "12%" END;
N1QL introduced a function TOKENS() in 4.6, which can help you create functional index on tokenized sub-objects (instead of array index in the above example):
https://developer.couchbase.com/documentation/server/4.6/n1ql/n1ql-language-reference/string-functions.html
https://dzone.com/articles/more-than-like-efficient-json-search-with-couchbas
And, Couchbase 5.0 developer build (https://blog.couchbase.com/2017/january/introducing-developer-builds) has N1QL function CURL(), which allows you to access any HTTP/REST endpoint as part of the N1QL query (hence, can access the FTS endpoint). See following blogs for more details & examples:
- https://blog.couchbase.com/2017/january/developer-release--curl-n1ql
- https://dzone.com/articles/curl-comes-to-n1ql-querying-external-json-data
Btw, can you clarify if you want partial tokens or only full tokens in the query?
-Prasad

Here are the specific queries based on the answer from #prasad.
Using Couchbase 4.5:
CREATE INDEX idx_email ON `bucket`( email );
SELECT *
FROM `bucket`
WHERE
`email` = 'A'
AND ANY t WITHIN `data` SATISFIES t LIKE '%searchableValue%' END;
Using Couchbase 4.6:
CREATE INDEX idx_email ON `bucket`( email );
CREATE INDEX idx_tokens ON `bucket`( DISTINCT ARRAY t FOR t IN TOKENS( `data` ) END );
SELECT *
FROM `bucket`
WHERE
`email` = 'A'
AND ANY t IN TOKENS( `data` ) SATISFIES t = 'searchableValue' END;

Related

Couchbase unable to retrieve document No index available Code 4000

I have a bucket in Couchbase called mybucket. When I select Documents and then choose my bucket, It has an option to retrieve the documents. When I choose the first one, the web platform of Couchbase shows the content of that document to me:
{
"type": "activity",
"version": "1.0.0",
....
}
So, with that, I am sure that I can see some documents that have "type" = "activity" in my bucket. However, when I want to retrieve them using the Query editor and the following N1QL query:
select * from `mybucket` where `type` = "activity" limit 10;
I get the following response:
[
{
"code": 4000,
"msg": "No index available on keyspace `default`:`mybucket` that matches your query. Use CREATE PRIMARY INDEX ON `default`:`mybucket` to create a primary index, or check that your expected index is online.",
"query": "select * from `mybucket` where `type` = \"activity\" limit 10;"
}
]
Bucket Retrieve documents uses DCP stream vs Query Editor uses N1QL which required the secondary index or primary index
Option 1) CREATE INDEX ix1 ON mybuckte(type);
OR
Option 2) CREATE PRIMARY INDEX on mybucket;

Can I index and search nested object keys in JSON on postgres?

If I have a table called configurations where rows contain a jsonb column called data with values similar to the following:
{
"US": {
"1234": {
"id": "ABCD"
}
},
"CA": {
"5678": {
"id": "WXYZ"
}
}
}
My hope is to be able to write a query akin to the following:
select * from configurations where data->'$.*.*.id' = 'WXYZ'
(Please note: I'm aware that the SQL above is not correct, treat it as pseudo.)
Questions:
What is the correct syntax to perform the query I've written above?
What type of index would I need to create to ensure I'm not scanning the entire table using any query from my previous question?
You can turn your pseudo code into real jsonpath code:
select * from configurations where data ## '$.*.*.id == "WXYZ"'
And this can use a default gin index on "data":
create index on configurations using gin (data);

T-SQL - search in filtered JSON array

SQL Server 2017.
Table OrderData has column DataProperties where JSON is stored. JSON example stored there:
{
"Input": {
"OrderId": "abc",
"Data": [
{
"Key": "Files",
"Value": [
"test.txt",
"whatever.jpg"
]
},
{
"Key": "Other",
"Value": [
"a"
]
}
]
}
}
So, it's an object with Input object, which has Data array that's KVP - full of objects with Key string and Value array of strings.
And my problem - I need to query for rows based on values in Files in example JSON - simple LIKE that matches %text%.
This query works:
SELECT TOP 10 *
FROM OrderData CROSS APPLY OPENJSON(DataProperties,'$.Input.Data') dat
WHERE JSON_VALUE(dat.value, '$.Key') = 'Files' and dat.[key] = 0
AND JSON_QUERY(dat.value, '$.Value') LIKE '%2%'
Problem is that this query is very slow, unsurprisingly.
How to make it faster?
I cannot create computed column with JSON_VALUE, because I need to filter in an array.
I cannot create computed column with JSON_QUERY on "$.Input.Data" or "$.Input.Data[0].Values" - because I need specific array item in this array with Key == "Files".
I've searched, but it seems that you cannot create computed column that also filters data, like with this attempt:
ALTER TABLE OrderData
ADD aaaTest AS (select JSON_QUERY(dat.value, '$.Value')
OPENJSON(DataProperties,'$.Input.Data') dat
WHERE JSON_VALUE(dat.value, '$.Key') = 'Files' and dat.[key] = 0 );
Error: Subqueries are not allowed in this context. Only scalar expressions are allowed.
What are my options?
Add Files column with an index and use INSERT/UPDATE triggers that populate this column on inserts/updates?
Create a view that "computes" this column? Can't add index, will still be slow
So far only option 1. has some merit, but I don't like triggers and maybe there's another option?
You might try something along this:
Attention: I've added a 2 to the text2 to fullfill your filter. And I named both to the plural "Values":
DECLARE #mockupTable TABLE(ID INT IDENTITY, DataProperties NVARCHAR(MAX));
INSERT INTO #mockupTable VALUES
(N'{
"Input": {
"OrderId": "abc",
"Data": [
{
"Key": "Files",
"Values": [
"test2.txt",
"whatever.jpg"
]
},
{
"Key": "Other",
"Values": [
"a"
]
}
]
}
}');
The query
SELECT TOP 10 *
FROM #mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
WITH([Key] NVARCHAR(100)
,[Values] NVARCHAR(MAX) AS JSON) dat
WHERE dat.[Key] = 'Files'
AND dat.[Values] LIKE '%2%';
The main difference is the WITH-clause, which is used to return the properties inside an object in a typed way and side-by-side (similar to a naked OPENJSON with a PIVOT for all columns - but much better). This avoids expensive JSON methods in your WHERE...
Hint: As we return the Value with NVARCHAR(MAX) AS JSON we can continue with the nested array and might proceed with something like this:
SELECT TOP 10 *
FROM #mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
WITH([Key] NVARCHAR(100)
,[Values] NVARCHAR(MAX) AS JSON) dat
WHERE dat.[Key] = 'Files'
--we read the array again with `OPENJSON`:
AND 'test2.txt' IN(SELECT [Value] FROM OPENJSON(dat.[Values]));
You might use one more CROSS APPLY to add the array's values and filter this at the WHERE directly.
SELECT TOP 10 *
FROM #mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
WITH([Key] NVARCHAR(100)
,[Values] NVARCHAR(MAX) AS JSON) dat
CROSS APPLY OPENJSON(dat.[Values]) vals
WHERE dat.[Key] = 'Files'
AND vals.[Value]='test2.txt'
Just check it out...
This is an old question, but I would like to revisit it. There isn't any mention of how the source table is actually constructed in terms of indexing. If the original author is still around, can you confirm/deny what indexing strategy you used? For performant json document queries, I've found that having a table using the COLUMSTORE indexing strategy yields very performant JSON queries even with large amounts of data.
https://learn.microsoft.com/en-us/sql/relational-databases/json/store-json-documents-in-sql-tables?view=sql-server-ver15 has an example of different indexing techniques. For my personal solution I've been using COLUMSTORE albeit on a limited NVARCAHR document size. It's fast enough for any purposes I have even under millions of rows of decently sized json documents.

couchbase N1ql query select with non-group by fields

I am new to couchbase and I have been going through couchbase documents and other online resources for a while but I could't get my query working. Below is the data structure and my query:
Table1:
{
"jobId" : "101",
"jobName" : "abcd",
"jobGroup" : "groupa",
"created" : " "2018-05-06T19:13:43.318Z",
"region" : "dev"
},
{
"jobId" : "102",
"jobName" : "abcd2",
"jobGroup" : "groupa",
"created" : " "2018-05-06T22:13:43.318Z",
"region" : "dev"
},
{
"jobId" : "103",
"jobName" : "abcd3",
"jobGroup" : "groupb",
"created" : " "2018-05-05T19:11:43.318Z",
"region" : "test"
}
I need to get the jobId which has the latest job information (max on created timestamp) for a given jobGroup and region (group by jobGroup and region).
My sql query doesn't help me using self-join on jobId.
Query:
/*
Idea is to pull out the job which was executed latest for all possible groups and region and print the details of that particular job
select * from (select max(DATE_FORMAT_STR(j.created,'1111-11-11T00:00:00+00:00')) as latest, j.jobGroup, j.region from table1 j
group by jobGroup, region) as viewtable
join table t
on keys meta(t).id
where viewtable.latest in t.created and t.jobGroup = viewtable.jobGroup and
viewtable.region = t.region
Error Result: No result displayed
Desired result :
{
"jobId" : "102",
"jobName":"abcd2",
"jobGroup":"groupa",
"latest" :"2018-05-06T22:13:43.318Z",
"region":"dev"
},
{
"jobId" : "103",
"jobName" : "abcd3",
"jobGroup" : "groupb",
"created" : " "2018-05-05T19:11:43.318Z",
"region" : "test"
}
If I understand your query correctly, this can be answered using 'group by' and no join. I tried entering your sample data and the following query gives the correct result:
select max([created,d])[1] max_for_group_region
from default d
group by jobGroup, region;
How does it work? It uses 'group by' to group documents by jobGroup and region, then creates a two-element array holding, for every document in the group:
the 'created' timestamp field
the document where the timestamp came from
It then applies the max function on the set of 2-element arrays. The max of a set of arrays looks for the maximum value in the first array position, and if there's a tie look at the second position, and so on. In this case we are getting the two-element array with the max timestamp.
Now we have an array [ timestamp, document ], so we apply [1] to extract just the document.
I'm seeing some inconsistencies and invalid JSON in your examples, so I'm going to do the best I can. First off, I'm using Couchbase Server 5.5 which provides the new ANSI JOIN syntax. There might be a way to do this in an earlier version of Couchbase Server.
Next, I created an index on the created field: CREATE INDEX ix_created ON bucketname(created).
Then, I use a subquery to get the latest date, aggregated by jobGroup and region. I then join the latest date from this query to the entire bucket and select the fields that (I think) you want in your desired result:
SELECT k.jobId, k.jobName, k.jobGroup, k.created AS latest, k.region
FROM (
SELECT j.jobGroup, j.region, MAX(j.created) as latestDate
FROM so j
GROUP BY j.jobGroup, j.region
) dt
LEFT JOIN so k ON k.created = dt.latestDate;
Problems with this approach:
If two documents have the exact same date, this isn't a reliable way to determine the latest. You can add a LIMIT 1 to the subquery, which would just pick one arbitrarily, or you could ORDER BY whatever your preference is.
Subquery performance: I don't know how large your data set is, but this could be pretty slow.
Requires Couchbase Server 5.5, which is currently in beta.
If you are using a different version of Couchbase Server, you may want to consider asking in the Couchbase N1QL Forums for a more expert answer.

how to check whether "knownlanguages" key contains value ="English" from json datatype of mysql using query

{
"Actor": {
"knownlanguages": [
"English"
]
}
}
This JSON is stored in JSON columntype of MySQL with name data.
My question is how to check whether knownlanguages key contains value English from JSON datatype of MySQL using query?
Easy:
SELECT * FROM table WHERE JSON_CONTAINS(json, '"English"', "$.Actor.knownlanguages")
or (depending on what you have to do):
SELECT JSON_CONTAINS(json, '"English"', "$.Actor.knownlanguages") FROM table
reference: https://dev.mysql.com/doc/refman/5.7/en/json-search-functions.html
you can use like keyword with wildcard %
SELECT * FROM table where data like "%English%"
NOTE:
this won't search in JSON field knownlanguages, but in column data