Couchbase array index not getting used in the query

I have the following document structure:
{
"customerId": "",
"schemeId": "scheme-a",
"type": "account",
"events": [
{
"dateTime": "2019-03-14T02:23:58.573Z",
"id": "72998bbf-94a6-4031-823b-6c304707ad49",
"type": "DebitDisabled",
"authorisedId": ""
},
{
"dateTime": "2018-05-04T12:40:15.439Z",
"transactionReference": "005171-15-1054-7571-60990-20180503165536",
"id": "005171-15-1054-7571-60990-20180503165536-1",
"type": "Credit",
"authorisedId": ",
"value": 34,
"funder": "funder-a"
},
{
"dateTime": "2019-03-06T04:14:54.564Z",
"transactionReference": "000000922331",
"eventDescription": {
"language": "en-gb",
"text": "
},
"id": "000000922331",
"type": "Credit",
"authorisedId": "",
"value": 16,
"funder": "funder-b"
},
{
"dateTime": "2019-03-10T04:24:17.903Z",
"transactionReference": "000001510154",
"eventDescription": {
"language": "en-gb",
"text": ""
},
"id": "000001510154",
"type": "Credit",
"authorisedId": "",
"value": 10,
"funder": "funder-c"
}
]
}
And the following indexes:
CREATE INDEX `scheme-a_customers_index`
ON `default`(`type`,`schemeId`,`customerId`)
WHERE ((`schemeId` = "scheme-a") and (`type` = "account"))
WITH { "num_replica":1 }
CREATE INDEX `scheme-a_credits_index`
ON `default`(
`type`,
`schemeId`,
`customerId`,
(distinct (array (`e`.`funder`) for `e` in `events` when ((`e`.`type`) = "Credit") end))
)
WHERE ((`type` = "scheme") and (`schemeId` = "scheme-a"))
WITH { "num_replica":1 }
I am trying to query all the customerIds and, for each, the events where type = "Credit" and funder LIKE "funder%".
Below is my query:
SELECT
customerId,
(ARRAY v.`value` FOR v IN p.events WHEN v.type = "Credit" AND v.funder like "funder%" END) AS credits
FROM default AS p
WHERE p.type = "account" AND p.schemeId = "scheme-a"
AND (ANY e IN p.events SATISFIES e.funder = "funder-a" END)
I am expecting the query to use the index scheme-a_credits_index; instead it is using scheme-a_customers_index. I can't understand why! Isn't the query supposed to use scheme-a_credits_index?

Your query doesn't have a predicate on customerId, so the query can only push two predicates to the indexer, and both indexes qualify. scheme-a_customers_index is more efficient because it has fewer entries, since it is not an array index.
You should try the following.
CREATE INDEX `ix1` ON `default`
(DISTINCT ARRAY e.funder FOR e IN events WHEN e.type = "Credit" END, `customerId`)
WHERE ((`schemeId` = "scheme-a") and (`type` = "account")) ;
SELECT
customerId,
(ARRAY v.`value` FOR v IN p.events WHEN v.type = "Credit" AND v.funder like "funder%" END) AS credits
FROM default AS p
WHERE p.type = "account" AND p.schemeId = "scheme-a"
AND (ANY e IN p.events SATISFIES e.funder LIKE "funder%" AND e.type = "Credit" END);
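To confirm which index the planner picks once ix1 exists, you can wrap the query in EXPLAIN and look at the "index" field of the IndexScan operator in the plan (a quick check, assuming the bucket and index names above):
-- run in the Query Workbench or cbq shell; the plan shows which index is chosen
EXPLAIN SELECT
customerId,
(ARRAY v.`value` FOR v IN p.events WHEN v.type = "Credit" AND v.funder LIKE "funder%" END) AS credits
FROM default AS p
WHERE p.type = "account" AND p.schemeId = "scheme-a"
AND (ANY e IN p.events SATISFIES e.funder LIKE "funder%" AND e.type = "Credit" END);
Because ix1 has the DISTINCT ARRAY expression as its leading key, the ANY predicate can be pushed into the index spans, so the plan should show a DistinctScan over an IndexScan on ix1.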

Related

How to join tables and get the json output using jooq

dslContext.select(
jsonObject(
key("id").value(CATEGORY.ID),
key("courses").value(
jsonArrayAgg(
jsonObject(
Arrays.stream(COURSE.fields())
.map(i -> key(CamelcaseConverter.snakeToCamel(i.getName())).value(
i))
.collect(
Collectors.toList())
)
)
)
)
).from(CATEGORY)
.leftJoin(COURSE_CATEGORY).on(CATEGORY.ID.eq(COURSE_CATEGORY.CATEGORY_ID))
.leftJoin(COURSE).on(COURSE.ID.eq(COURSE_CATEGORY.COURSE_ID)).fetchInto(JSONObject.class)
Output I got:
[
{
"courses": [
{
"id": 19
},
{
"id": null
}
],
"name": "Exam1",
"id": 1,
}
]
The required output is
[
{
"courses": [
{
"id": 19
}
],
"name": "Exam1",
"id": 1
},
{
"courses":[],
"name": "Exam2",
"id": 2
}
]
The query which needs to be executed is:
"select * from category left outer join course_category on category.id = course_category.category_id left outer join course on course_category.course_id = course.id"
How do I implement it?
You forgot to group by:
.groupBy(CATEGORY.ID, CATEGORY.NAME)
If you have a primary (or unique) key on CATEGORY.ID, then in MySQL, it will be sufficient to group by that
.groupBy(CATEGORY.ID)
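The underlying reason is that jsonArrayAgg renders an aggregate function, so every other projected column has to be grouped. In plain SQL terms, the statement you are building needs to look roughly like this (a sketch using MySQL-style JSON functions and only the course id column, not jOOQ's exact generated SQL):
-- aggregate all course rows of a category into one JSON array per category
select category.id,
       category.name,
       json_arrayagg(json_object('id', course.id)) as courses
from category
left outer join course_category on category.id = course_category.category_id
left outer join course on course_category.course_id = course.id
group by category.id, category.name;
In the jOOQ call chain, that corresponds to appending the .groupBy(...) shown above before .fetchInto(JSONObject.class).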

Query a JSONB object array

I made a DB Fiddle of roughly what the table looks like: https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/3382
Data in the table looks like this
[
{
"id": 1,
"form_id": 1,
"questionnaire_response": [
{
"id": "1",
"title": "Are you alive?",
"value": "Yes",
"form_id": 0,
"shortTitle": "",
"description": ""
},
{
"id": "2",
"title": "Did you sleep good?",
"value": "No",
"form_id": 0,
"shortTitle": "",
"description": ""
},
{
"id": "3",
"title": "Whats favorite color(s)?",
"value": [
"Red",
"Blue"
],
"form_id": 0,
"shortTitle": "",
"description": ""
}
]
},
{
"id": 2,
"form_id": 1,
"questionnaire_response": [
{
"id": "1",
"title": "Are you alive?",
"value": "Yes",
"form_id": 0,
"shortTitle": "",
"description": ""
},
{
"id": "2",
"title": "Did you sleep good?",
"value": "Yes",
"form_id": 0,
"shortTitle": "",
"description": ""
},
{
"id": "3",
"title": "Whats favorite color(s)?",
"value": "Black",
"form_id": 0,
"shortTitle": "",
"description": ""
}
]
},
{
"id": 3,
"form_id": 1,
"questionnaire_response": [
{
"id": "1",
"title": "Are you alive?",
"value": "Yes",
"form_id": 0,
"shortTitle": "",
"description": ""
},
{
"id": "2",
"title": "Did you sleep good?",
"value": "No",
"form_id": 0,
"shortTitle": "",
"description": ""
},
{
"id": "3",
"title": "Whats favorite color(s)?",
"value": [
"Black",
"Red"
],
"form_id": 0,
"shortTitle": "",
"description": ""
}
]
}
]
I have a query: select * from form_responses,jsonb_to_recordset(form_responses.questionnaire_response) as items(value text, id text) where (items.id = '3' AND items.value like '%Black%');
But I am unable to match on more than one object, e.g.: select * from form_responses,jsonb_to_recordset(form_responses.questionnaire_response) as items(value text, id text) where (items.id = '3' AND items.value like '%Black%') AND (items.id = '2' AND items.value like '%Yes%');
The value field in the object could be either an array or a single value, which is unpredictable. I feel like I'm close, but I'm also not sure if I'm using the correct query in the first place.
Any help would be appreciated!
EDIT
select * from form_responses where(
questionnaire_response #> '[{"id": "2", "value":"No"},{"id": "3", "value":["Red"]}]')
This seems to work, but I am not sure if it is the best way to do it.
Your current query returns one result row per item. None of these rows has both id = 3 and id = 2. If your goal is to select the entire form response, you need to use a subquery (or rather, two of them):
SELECT *
FROM form_responses
WHERE EXISTS(
SELECT *
FROM jsonb_to_recordset(form_responses.questionnaire_response) as items(value text, id text)
WHERE items.id = '3'
AND items.value like '%Black%'
)
AND EXISTS(
SELECT *
FROM jsonb_to_recordset(form_responses.questionnaire_response) as items(value text, id text)
WHERE items.id = '2'
AND items.value like '%Yes%'
);
or alternatively
SELECT *
FROM form_responses
WHERE (
SELECT value
FROM jsonb_to_recordset(form_responses.questionnaire_response) as items(value text, id text)
WHERE items.id = '3'
) like '%Black%'
AND (
SELECT value
FROM jsonb_to_recordset(form_responses.questionnaire_response) as items(value text, id text)
WHERE items.id = '2'
) like '%Yes%';
A nicer alternative would be using json path queries:
SELECT *
FROM form_responses
WHERE questionnaire_response @@ '$[*]?(@.id == "1").value == "Yes"'
AND questionnaire_response @@ '$[*]?(@.id == "3").value[*] == "Black"'
-- in one:
SELECT *
FROM form_responses
WHERE questionnaire_response @@ '$[*]?(@.id == "1").value == "Yes" && $[*]?(@.id == "3").value[*] == "Black"'
The [*] even has the correct semantics for that sometimes-string-sometimes-array value. And if you know the indices of the items with those ids, you can even simplify to
SELECT *
FROM form_responses
WHERE questionnaire_response @@ '$[0].value == "Yes" && $[2].value[*] == "Black"'
(dbfiddle demo)
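If you end up filtering on these jsonpath predicates a lot, a GIN index on the column can speed them up, since PostgreSQL's jsonb GIN operator classes also support the @@ and @? jsonpath operators. A sketch, assuming questionnaire_response is a jsonb column on form_responses as in the fiddle (the index name is arbitrary, and the planner will only use it for jsonpath shapes it can decompose):
-- jsonb_path_ops is smaller and faster for containment and jsonpath matches,
-- but does not support the key-exists operators (?, ?|, ?&)
CREATE INDEX form_responses_qr_gin
ON form_responses
USING GIN (questionnaire_response jsonb_path_ops);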

Indexes for couchbase query

Could you please suggest possible Couchbase indexes for this query, and explain how you would arrive at your choices?
SELECT Count(*) AS count
FROM `ORDER`
WHERE `_class` = "com.lbk.entities.OrderEntity"
AND (
lower(buyer.contact.firstname) LIKE '%aziz%'
OR lower(buyer.contact.lastname) LIKE '%aziz%'
OR ANY communicationchannel IN buyer.contact.communicationchannel satisfies ( communicationchannel.communicationchannelcode = 'EMAIL'
AND communicationchannel.communicationvalue = NULL )
END )
AND ordertypecode = '220'
AND (
ordercategory != 'EXCLUDED_CAT'
OR ordercategory IS NOT valued )
AND creationdatetime IN
(
SELECT raw max(o2.creationdatetime)
FROM `ORDER` o2
WHERE (
lower(o2.buyer.contact.firstname) LIKE '%aziz%'
OR lower(o2.buyer.contact.lastname) LIKE '%aziz%'
OR ANY communicationchannel IN o2.buyer.contact.communicationchannel satisfies ( communicationchannel.communicationchannelcode = 'EMAIL'
AND communicationchannel.communicationvalue = NULL )
END )
AND ANY communicationchannel IN o2.buyer.contact.communicationchannel satisfies ( communicationchannel.communicationchannelcode = 'EMAIL'
AND communicationchannel.communicationvalue IS NOT NULL ) END
AND o2.ordertypecode = '220'
AND (
o2.ordercategory != 'EXCLUDED_CAT'
OR o2.ordercategory IS NOT valued)
GROUP BY ( array item.communicationvalue FOR item IN o2.buyer.contact.communicationchannel WHEN item.communicationchannelcode = 'EMAIL'
END )
)
I have created this index, which is hit by the query:
CREATE INDEX `idx_customer` ON
`order`(`_class`, ((`buyer`.`contact`).`firstname`), ((`buyer`.`contact`).`lastname`), (DISTINCT (array
(`aoc`.`communicationvalue`) FOR `aoc` IN ((`buyer`.`contact`).`communicationchannel`) WHEN
((`aoc`.`communicationchannelcode`) = "EMAIL") end)))
but performance is poor and, according to other answers, this is mainly due to my poor index design.
The explain plan of my query is:
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "UnionScan",
"scans": [
{
"#operator": "IntersectScan",
"scans": [
{
"#operator": "IndexScan3",
"index": "idx_customer1",
"index_id": "d1463e49b12fcd45",
"index_projection": {
"primary_key": true
},
"keyspace": "order",
"namespace": "default",
"spans": [
{
"range": [
{
"high": "\"220\"",
"inclusion": 3,
"low": "\"220\""
},
{
"inclusion": 0,
"low": "null"
}
]
}
],
"using": "gsi"
},
{
"#operator": "DistinctScan",
"scan": {
"#operator": "IndexScan3",
"index": "idx_customer",
"index_id": "2132a2f8632e76f3",
"index_projection": {
"primary_key": true
},
"keyspace": "order",
"namespace": "default",
"spans": [
{
"range": [
{
"high": "\"com.lbk.entities.OrderEntity\"",
"inclusion": 3,
"low": "\"com.lbk.entities.OrderEntity\""
},
{
"inclusion": 0,
"low": "null"
}
]
}
],
"using": "gsi"
}
}
]
},
{
"#operator": "IndexScan3",
"index": "idx_customer1",
"index_id": "d1463e49b12fcd45",
"index_projection": {
"primary_key": true
},
"keyspace": "order",
"namespace": "default",
"spans": [
{
"range": [
{
"high": "\"220\"",
"inclusion": 3,
"low": "\"220\""
},
{
"inclusion": 0,
"low": "null"
}
]
}
],
"using": "gsi"
}
]
},
{
"#operator": "Fetch",
"keyspace": "order",
"namespace": "default"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Filter",
"condition": "((((((`order`.`_class`) = \"com.lbk.entities.OrderEntity\") and (((lower((((`order`.`buyer`).`contact`).`firstName`)) like \"%aziz%\") or (lower((((`order`.`buyer`).`contact`).`lastName`)) like \"%aziz%\")) or any `communicationChannel` in (((`order`.`buyer`).`contact`).`communicationChannel`) satisfies (((`communicationChannel`.`communicationChannelCode`) = \"EMAIL\") and ((`communicationChannel`.`communicationValue`) = null)) end)) and ((`order`.`orderTypeCode`) = \"220\")) and ((not ((`order`.`orderCategory`) = \"EXCLUDED_CAT\")) or ((`order`.`orderCategory`) is not valued))) and ((`order`.`creationDateTime`) in (select raw max((`O2`.`creationDateTime`)) from `order` as `O2` where ((((((lower((((`O2`.`buyer`).`contact`).`firstName`)) like \"%aziz%\") or (lower((((`O2`.`buyer`).`contact`).`lastName`)) like \"%aziz%\")) or any `communicationChannel` in (((`O2`.`buyer`).`contact`).`communicationChannel`) satisfies (((`communicationChannel`.`communicationChannelCode`) = \"EMAIL\") and ((`communicationChannel`.`communicationValue`) = null)) end) and any `communicationChannel` in (((`O2`.`buyer`).`contact`).`communicationChannel`) satisfies (((`communicationChannel`.`communicationChannelCode`) = \"EMAIL\") and ((`communicationChannel`.`communicationValue`) is not null)) end) and ((`O2`.`orderTypeCode`) = \"220\")) and ((not ((`O2`.`orderCategory`) = \"EXCLUDED_CAT\")) or ((`O2`.`orderCategory`) is not valued))) group by array (`item`.`communicationValue`) for `item` in (((`O2`.`buyer`).`contact`).`communicationChannel`) when ((`item`.`communicationChannelCode`) = \"EMAIL\") end)))"
},
{
"#operator": "InitialGroup",
"aggregates": [
"count(*)"
],
"group_keys": []
}
]
}
},
{
"#operator": "IntermediateGroup",
"aggregates": [
"count(*)"
],
"group_keys": []
},
{
"#operator": "FinalGroup",
"aggregates": [
"count(*)"
],
"group_keys": []
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "InitialProject",
"result_terms": [
{
"as": "count",
"expr": "count(*)"
}
]
},
{
"#operator": "FinalProject"
}
]
}
}
]
},
"text": "SELECT COUNT(*) AS count FROM `order` \nWHERE `_class` = \"com.lbk.entities.OrderEntity\" \nAND ( \nLOWER(buyer.contact.firstName) LIKE '%aziz%' OR LOWER(buyer.contact.lastName) LIKE '%aziz%' \nOR ANY communicationChannel IN buyer.contact.communicationChannel SATISFIES ( communicationChannel.communicationChannelCode = 'EMAIL' AND communicationChannel.communicationValue = null ) END ) \nAND orderTypeCode = '220' \nAND (orderCategory != 'EXCLUDED_CAT' OR orderCategory is not valued ) \nAND creationDateTime in (select RAW max(O2.creationDateTime) \nfrom `order` O2 WHERE ( LOWER(O2.buyer.contact.firstName) \nLIKE '%aziz%' OR LOWER(O2.buyer.contact.lastName) LIKE '%aziz%' \nOR ANY communicationChannel IN O2.buyer.contact.communicationChannel SATISFIES ( communicationChannel.communicationChannelCode = 'EMAIL' AND communicationChannel.communicationValue = null ) END ) \nAND ANY communicationChannel IN O2.buyer.contact.communicationChannel SATISFIES ( communicationChannel.communicationChannelCode = 'EMAIL' AND communicationChannel.communicationValue is not null ) END \nAND O2.orderTypeCode = '220' \nAND (O2.orderCategory != 'EXCLUDED_CAT'\nOR O2.orderCategory is not valued) \ngroup by ( ARRAY item.communicationValue FOR item IN O2.buyer.contact.communicationChannel WHEN item.communicationChannelCode = 'EMAIL' END ))"
}
When this query is executed in the query monitor, it takes about 2 seconds even though I have a small number of documents (~2000):
elapsed: 1.93s | execution: 11.93s | count: 1 | size: 34
When executed from my Spring Boot application using Spring Data, it takes double the time (4 seconds).
Thanks for your help.
It looks to me like you are scanning ORDER twice. Make sure the query is using ix1 only; if not, specify USE INDEX (a USE INDEX sketch follows the rewritten query below).
CREATE INDEX ix1 ON `ORDER`(ordertypecode, buyer.contact.firstname, buyer.contact.lastname, creationdatetime, ordercategory, buyer.contact.communicationchannel)
WHERE `_class` = "com.lbk.entities.OrderEntity";
SELECT SUM(o.cnt) AS count
FROM ( SELECT MAX([o1.creationdatetime,o1.cnt])[1] AS cnt
FROM (SELECT acomval, o2.creationdatetime, COUNT(1) AS cnt
FROM `ORDER` o2
LET acomval = (ARRAY ch.communicationvalue FOR ch IN o2.buyer.contact.communicationchannel
WHEN ch.communicationchannelcode = 'EMAIL' END )
WHERE (LOWER(o2.buyer.contact.firstname) LIKE '%aziz%'
OR LOWER(o2.buyer.contact.lastname) LIKE '%aziz%'
OR ANY ch IN o2.buyer.contact.communicationchannel
SATISFIES ( ch.communicationchannelcode = 'EMAIL' AND ch.communicationvalue IS NULL)
END
)
AND ANY ch IN o2.buyer.contact.communicationchannel
SATISFIES (ch.communicationchannelcode = 'EMAIL' AND ch.communicationvalue IS NOT NULL) END
AND o2.`_class` = "com.lbk.entities.OrderEntity"
AND o2.ordertypecode = '220'
AND ( o2.ordercategory != 'EXCLUDED_CAT' OR o2.ordercategory IS NOT VALUED)
GROUP BY acomval, o2.creationdatetime) AS o1
GROUP BY o1.acomval) AS o;
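If EXPLAIN still shows ORDER being scanned through the old idx_customer/idx_customer1 indexes (for example via an IntersectScan or UnionScan), you can pin the new index while testing with a USE INDEX hint. A minimal sketch of the hint syntax, using a simplified predicate rather than the full query above:
SELECT COUNT(*) AS count
FROM `ORDER` o2 USE INDEX (ix1 USING GSI)  -- restrict the planner to ix1 for this keyspace
WHERE o2.`_class` = "com.lbk.entities.OrderEntity"
AND o2.ordertypecode = '220';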

Couchbase N1QL array query

Document sample from my giata_properties bucket: link
Relevant JSON paste:
{
"propertyCodes": {
"provider": [
{
"code": [
{
"value": [
{
"value": "304387"
}
]
}
],
"providerCode": "hotelbeds",
"providerType": "gds"
},
{
"code": [
{
"value": [
{
"name": "Country Code",
"value": "EG"
},
{
"name": "City Code",
"value": "HRG"
},
{
"name": "Hotel Code",
"value": "91U"
}
]
}
],
"providerCode": "gta",
"providerType": "gds"
}
]
},
"name": "Arabia Azur Resort"
}
I want a query (and an index) to retrieve a document based on propertyCodes.provider.code.value.value and propertyCodes.provider.providerCode. I've managed to do each separately but I'm not sure how to merge both of them in a single query.
SELECT meta().id FROM giata_properties AS gp USE INDEX(`#primary`) WHERE ANY v WITHIN gp.propertyCodes.provider[*].code SATISFIES v.`value` = '150613' END;
SELECT meta().id FROM giata_properties AS gp USE INDEX(`#primary`) WHERE ANY v within gp.propertyCodes.provider[*].providerCode SATISFIES v = 'hotelbeds' END;
So, for example, I want to fetch the document that has propertyCodes.provider.code.value.value of 304387 and whose provider is also hotelbeds, because a code value can be duplicated across documents, but the combination of code and providerCode is unique.
Here are the query and the indexes.
The query.
SELECT META().id
FROM giata_properties AS gp
WHERE ANY p IN propertyCodes.provider SATISFIES ( ANY v WITHIN p.code SATISFIES v.`value` = '304387' END ) AND p.providerCode = 'hotelbeds' END;
The indexes.
CREATE INDEX idx_value ON giata_properties
( DISTINCT ARRAY ( DISTINCT ARRAY v.`value` FOR v WITHIN p.code END ) FOR p IN propertyCodes.provider END );
CREATE INDEX idx_providerCode ON giata_properties
( DISTINCT ARRAY p.providerCode FOR p IN propertyCodes.provider END );
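With both indexes created, the planner may use one of them (or intersect both) for the combined ANY predicate; you can check which one is chosen by running EXPLAIN on the query (just wrapping the query above, nothing new assumed):
EXPLAIN SELECT META().id
FROM giata_properties AS gp
WHERE ANY p IN propertyCodes.provider SATISFIES ( ANY v WITHIN p.code SATISFIES v.`value` = '304387' END ) AND p.providerCode = 'hotelbeds' END;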

Create a json object using array of json objects

In Postgres, say I have a schema as such:
table item {
type varchar(40)
entity_id bigint
entity_type varchar(40)
user_id bigint
}
And I want to query the table to get the info like this:
{
"typeA": {
"count": 3,
"me": true
},
"typeC": {
"count": 3,
"me": false
},
"typeE": {
"count": 3,
"me": false
},
"typeR": {
"count": 3,
"me": true
}
}
From a query where the main data is this:
SELECT ARRAY_AGG(x)
FROM
(
SELECT type,
count(*),
(CASE
WHEN (SELECT id
FROM items as i
WHERE i.entity_type = 'sometype'
AND i.entity_id = 234
AND i.user_id = 32
AND i.type = items.type) is not null
THEN true
ELSE false
END) AS me
FROM items
WHERE items.entity_type = 'sometype'
AND items.entity_id = 234
GROUP BY type
) as x
This returns an array of the info I need: type, count, and me. But I need it formatted like above, versus:
[
{
"type": "typeA",
"count": 3,
"me": true
},
{
"type": "typeC",
"count": 3,
"me": false
},
{
"type": "typeE",
"count": 3,
"me": false
},
{
"type": "typeR",
"count": 3,
"me": true
}
]
This is the current way it is formatted. I have been unable to find a way to build the JSON object I need. I was able to get JSON objects like that, but I need them all nested in one object.
Not exactly what you want, but based on PostgreSQL - Aggregate Functions, I would guess you can try json_object_agg(name, value), e.g.
SELECT JSON_OBJECT_AGG(type, x)
FROM
(
SELECT type,
count(*),
(CASE
WHEN (SELECT id
FROM items as i
WHERE i.entity_type = 'sometype'
AND i.entity_id = 234
AND i.user_id = 32
AND i.type = items.type) is not null
THEN true
ELSE false
END) AS me
FROM items
WHERE items.entity_type = 'sometype'
AND items.entity_id = 234
GROUP BY type, me
) as x
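If you also want to drop the redundant type key from each nested value, so the result matches the desired shape exactly, one option is to build the value object explicitly with jsonb_object_agg and jsonb_build_object. A sketch over the same subquery from the answer (nothing assumed beyond the items table already shown):
-- one top-level object keyed by type; each value contains only count and me
SELECT JSONB_OBJECT_AGG(x.type, JSONB_BUILD_OBJECT('count', x.count, 'me', x.me))
FROM
(
SELECT type,
count(*),
(CASE
WHEN (SELECT id
FROM items as i
WHERE i.entity_type = 'sometype'
AND i.entity_id = 234
AND i.user_id = 32
AND i.type = items.type) is not null
THEN true
ELSE false
END) AS me
FROM items
WHERE items.entity_type = 'sometype'
AND items.entity_id = 234
GROUP BY type, me
) as x;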