Couchbase Index and N1QL Query - couchbase

I have created a new bucket, FooBar, on my Couchbase server.
I have a JSON document which is a list of objects with some properties, and it is stored in my Couchbase bucket as follows:
[
{
"Venue": "Venue1",
"Country": "AU",
"Locale": "QLD"
},
{
"Venue": "Venue2",
"Country": "AU",
"Locale": "NSW"
},
{
"Venue": "Venue3",
"Country": "AU",
"Locale": "NSW"
}
]
How do I get a N1QL query to return a list of locations?
For instance, SELECT * FROM FooBar WHERE Locale = 'QLD'
Please let me know of any indexes I would need to create as well. Additionally, how can I return only results where the object is of type Location, and not some other object which may also have the 'Locale' property?
Chud
PS - I have also created some indexes, however I would like an unbiased answer on how to achieve this.

Typically you would store these as separate documents, rather than in a single document as an array of objects, which is how the data is currently shown.
Since you can mix document structures, the usual pattern to distinguish them is to have something like a 'type' field. ('type' is in no way special, just the most common choice.)
So your example would look like:
{
"Venue": "Venue1",
"Country": "AU",
"Locale": "QLD",
"type": "Location"
}
...
{
"Venue": "Venue3",
"Country": "AU",
"Locale": "NSW",
"type": "Location"
}
where each JSON object would be a separate document with a unique document ID. (If you have some predefined data you want to load, look at cbimport for how to add it to your database. There are a few different formats for doing it. You can also have it generate document IDs for you.)
Then, what @vsr wrote is correct. You'd create an index on the Locale field, restricted to documents of that type; that will be optimal for the query you want. Note you could also index Locale on every document, whatever its type, with CREATE INDEX ix1 ON FooBar(Locale); instead. In this simple case it doesn't really make a difference. Read about the query Explain feature of the admin console for help understanding and optimizing queries.
Finally, the query @vsr wrote is also correct:
SELECT * FROM FooBar WHERE type = "Location" AND Locale = "QLD";
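The restructuring described above (one document per venue, each carrying a type field) can be sketched in plain Python. This is only an illustration of the data shape, not a Couchbase client call: the function name and the key scheme (location::Venue) are assumptions made up for this sketch, and the final list comprehension merely imitates what the WHERE clause selects.

```python
import json

# The single array document from the question.
venues = [
    {"Venue": "Venue1", "Country": "AU", "Locale": "QLD"},
    {"Venue": "Venue2", "Country": "AU", "Locale": "NSW"},
    {"Venue": "Venue3", "Country": "AU", "Locale": "NSW"},
]


def split_into_documents(items, doc_type="Location", key_prefix="location"):
    """Return {document_id: document} with one entry per array element.

    The key scheme (prefix::Venue) is a hypothetical choice for this sketch.
    """
    docs = {}
    for item in items:
        doc = dict(item)        # copy so the input list is not mutated
        doc["type"] = doc_type  # discriminator field used in the WHERE clause
        docs[f"{key_prefix}::{item['Venue']}"] = doc
    return docs


documents = split_into_documents(venues)

# Imitates: WHERE type = "Location" AND Locale = "QLD"
matches = [
    d for d in documents.values()
    if d.get("type") == "Location" and d.get("Locale") == "QLD"
]

print(json.dumps(matches, indent=2))
```

Note that the comparison on type is case-sensitive here, as string comparison is in N1QL, so the stored value and the value in the query have to agree exactly.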

CREATE INDEX ix1 ON FooBar(Locale);
https://dzone.com/articles/designing-index-for-query-in-couchbase-n1ql
CREATE INDEX ix1 ON FooBar(Locale) WHERE type = "Location";
SELECT * FROM FooBar WHERE type = "Location" AND Locale = "QLD";
If it is an array and the field name is list:
CREATE INDEX ix1 ON FooBar(DISTINCT ARRAY v.Locale FOR v IN list END) WHERE type = "Location";
SELECT * FROM FooBar WHERE type = "Location" AND ANY v IN list SATISFIES v.Locale = "QLD" END;
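To make the ANY ... SATISFIES predicate above concrete, here is a rough Python imitation of what it tests on a single document. It assumes, as the answer does, that the array is stored under a field named list; the helper name is made up for this sketch and nothing here talks to Couchbase.

```python
def any_in_satisfies(doc, array_field, predicate):
    """Imitate ANY v IN <array_field> SATISFIES predicate(v) END."""
    values = doc.get(array_field)
    if not isinstance(values, list):
        return False  # in this sketch a missing or non-array field never matches
    return any(predicate(v) for v in values)


def is_qld(v):
    """The SATISFIES condition: v.Locale = "QLD"."""
    return isinstance(v, dict) and v.get("Locale") == "QLD"


doc = {
    "type": "Location",
    "list": [
        {"Venue": "Venue1", "Country": "AU", "Locale": "QLD"},
        {"Venue": "Venue2", "Country": "AU", "Locale": "NSW"},
    ],
}

# Imitates: WHERE type = "Location" AND ANY v IN list SATISFIES v.Locale = "QLD" END
selected = doc.get("type") == "Location" and any_in_satisfies(doc, "list", is_qld)
print(selected)
```

The whole document is selected when at least one element matches, which is why this form returns the full array rather than only the QLD entries.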

Related

Confusion about Couchbase keys and indexes

I have imported a dataset into Couchbase that looks like so:
{
"CLUSTER": "M1M",
"CLUSTER_NAME": "MARTIN MARIETTA",
"PRIMARY": "",
"SET_NUM": "10000163",
"SHORTENED_NAME": "MARTIN MARIETTA MATERIALS",
"TYPE": "SET",
"_class": "com.company.aad.xref.model.ClusterCodeXref"
}
I had to provide a key-generation strategy, and I made the strategy what I ultimately want my index to look like, %SET_NUM%::%TYPE%. So I have a couple of questions:
Does the key-generation automatically create a field called ID with those 2 elements, or do I need to create an ID column in my CSV dataset?
How can I create an index out of those 2 fields? I understand how to use the CREATE INDEX command with composite fields, but will that index look like the key generated by %SET_NUM%::%TYPE%? I need them to be the same, with the :: in the middle.
I hope my question is clear! Would appreciate any help.
In Couchbase, the ID/key of a document is not actually in the document itself. If you use the --generate-key template, your document would look something like:
key = "10000163::SET"
{
"CLUSTER": "M1M",
"CLUSTER_NAME": "MARTIN MARIETTA",
"PRIMARY": "",
"SET_NUM": "10000163",
"SHORTENED_NAME": "MARTIN MARIETTA MATERIALS",
"TYPE": "SET",
"_class": "com.company.aad.xref.model.ClusterCodeXref"
}
There is no designated "id" field in Couchbase. You can certainly create an id field, but it will be just like any other field.
As for an index, it depends on what kind of query you want to run. You can CREATE INDEX idx_setnumandtype ON bucketname (SET_NUM, TYPE) as you mentioned. This is going to be a useful index for queries like: SELECT b.* FROM bucketname b WHERE b.SET_NUM = 'foo' AND b.TYPE = 'bar';
But if you know those two values and just need to look up a single document, you don't necessarily need to create an index or use N1QL. You can simply do a key/value GET operation. In Java, for instance: bucket.get("10000163::SET")
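To show how the key relates to the document fields, here is a small Python sketch that expands a %FIELD% template the way the question describes. It only imitates the template idea for illustration and is not the import tool's implementation; the function name is hypothetical.

```python
import re


def generate_key(template, doc):
    """Expand %FIELD% placeholders in template with values from doc."""
    def substitute(match):
        field = match.group(1)
        return str(doc[field])  # raises KeyError if the document lacks the field
    return re.sub(r"%([^%]+)%", substitute, template)


doc = {
    "CLUSTER": "M1M",
    "SET_NUM": "10000163",
    "TYPE": "SET",
}

key = generate_key("%SET_NUM%::%TYPE%", doc)
print(key)
```

An application that knows SET_NUM and TYPE can rebuild the same string and pass it to the key/value GET, so the :: separator lives in the key only and never needs to appear in an index.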

PostgreSQL: Querying a table based on the existence of a dynamic field in an object nested inside another object which is in an array of objects

In a table items, I have a jsonb column called users. The JSON structure of users follows the following example:
[
{
"required": 1,
"agents": {
"user1": "A",
"user2": "P",
"user3": "A"
}
},
{
"required": 3,
"agents": {
"user1": "P",
"user4": "P",
"user5": "P"
}
}
]
Note that the table items has many fields, but for the sake of simplicity, we can consider that it has only an item_id and a users field. And all answers I saw here on SO provide queries for elements of objects directly inside an array.
I also wish I could rewrite the object's structure in a better way, but it's not my decision in this case :D.
I'm new to JSON queries in postgres, so I tried to write a few queries without success.
Question:
I'm trying to find a query that returns all items that have a key 'user4' inside the agents sub-object of any element in the array. Any suggestions?
Use the function jsonb_array_elements() and the ? operator:
select i.*
from items i
cross join jsonb_array_elements(users)
where value->'agents' ? 'user4'
See JSON Functions and Operators.
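As a way to reason about what that query selects, here is a Python imitation of the same predicate over in-memory rows. Items 2 and 3 are made-up data for this sketch. One thing it makes visible: the cross join yields one row per matching array element, so an item with two matching elements (item 3 here) would come back twice from the SQL unless the result is de-duplicated.

```python
items = [
    {"item_id": 1, "users": [
        {"required": 1, "agents": {"user1": "A", "user2": "P", "user3": "A"}},
        {"required": 3, "agents": {"user1": "P", "user4": "P", "user5": "P"}},
    ]},
    # Hypothetical rows added for illustration.
    {"item_id": 2, "users": [
        {"required": 1, "agents": {"user1": "A"}},
    ]},
    {"item_id": 3, "users": [
        {"required": 1, "agents": {"user4": "A"}},
        {"required": 2, "agents": {"user4": "P"}},
    ]},
]


def matching_elements(users, key):
    """Array elements whose 'agents' object has `key` as a top-level key."""
    return [
        el for el in users
        if isinstance(el.get("agents"), dict) and key in el["agents"]
    ]


def items_with_agent(rows, key):
    """One result per item, however many array elements match."""
    return [row["item_id"] for row in rows if matching_elements(row["users"], key)]


print(items_with_agent(items, "user4"))
```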

Is it possible to query JSON data in DynamoDB?

Let's say my JSON looks like this (example provided here) -
{
"year" : 2013,
"title" : "Turn It Down, Or Else!",
"info" : {
"directors" : [
"Alice Smith",
"Bob Jones"
],
"release_date" : "2013-01-18T00:00:00Z",
"rating" : 6.2,
"genres" : [
"Comedy",
"Drama"
],
"image_url" : "http://ia.media-imdb.com/images/N/O9ERWAU7FS797AJ7LU8HN09AMUP908RLlo5JF90EWR7LJKQ7##._V1_SX400_.jpg",
"plot" : "A rock band plays their music at high volumes, annoying the neighbors.",
"rank" : 11,
"running_time_secs" : 5215,
"actors" : [
"David Matthewman",
"Ann Thomas",
"Jonathan G. Neff"
]
}
}
I would like to query all movies where genres contains Drama.
I went through all of the examples but it seems that I can query only on hash key and sort key. I can't have JSON document as key itself as that is not supported.
You cannot. DynamoDB requires that all attributes you are filtering for have an index.
As you want to query independently of your main index, you are limited to Global Secondary Indexes.
The documentation lists on what kind of attributes indexes are supported:
The index key attributes can consist of any top-level String, Number, or Binary attributes from the base table; other scalar types, document types, and set types are not allowed.
Your type would be an array of Strings. So this query operation isn't supported by DynamoDB at this time.
You might want to consider other NoSQL document based databases which are more flexible like MongoDB Atlas, if you need this kind of querying functionality.
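// A scan with a filter expression. contains() is true when the genres list holds the value.
// "columnname" is a placeholder for the attribute holding the JSON; drop it if info is a top-level attribute.
String filterExpression = "contains(columnname.info.genres, :param)";
Map<String, Object> valueMap = new HashMap<>();
valueMap.put(":param", "Drama");
ItemCollection<ScanOutcome> scanResult = table
.scan(new ScanSpec()
.withFilterExpression(filterExpression)
.withValueMap(valueMap));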
One example that I took from AWS Developer Forums is as follows.
We got some hints for you from our team. Filter/condition expressions for maps have to have key names at each level of the map specified separately in the expression attributeNames map.
Your expression should look like this:
{
"TableName": "genericPodcast",
"FilterExpression": "#keyone.#keytwo.#keythree = :keyone",
"ExpressionAttributeNames": {
"#keyone": "attributes",
"#keytwo": "playbackInfo",
"#keythree": "episodeGuid"
},
"ExpressionAttributeValues": {
":keyone": {
"S": "podlove-2018-05-02t19:06:11+00:00-964957ce3b62a02"
}
}
}
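The rule quoted above (each level of the map gets its own entry in the attribute-names map) can be sketched as a small Python helper that builds the request body. It only constructs the dictionary and does not call AWS; the function name and the #k0/#k1 placeholder naming are assumptions for this sketch, and it assumes a string-typed value.

```python
def build_nested_filter(table_name, path, value):
    """Build a scan request that compares a nested map attribute to a string.

    path is the list of map keys from the top-level attribute downwards.
    """
    names = {}
    placeholders = []
    for i, key in enumerate(path):
        placeholder = f"#k{i}"    # one placeholder per level of the map
        names[placeholder] = key
        placeholders.append(placeholder)
    return {
        "TableName": table_name,
        "FilterExpression": ".".join(placeholders) + " = :v",
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": {":v": {"S": value}},
    }


request = build_nested_filter(
    "genericPodcast",
    ["attributes", "playbackInfo", "episodeGuid"],
    "podlove-2018-05-02t19:06:11+00:00-964957ce3b62a02",
)
print(request["FilterExpression"])
```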

N1QL nested json, query on field inside object inside array

I have json documents in my Couchbase cluster that looks like this
{
"giata_properties": {
"propertyCodes": {
"provider": [
{
"code": [
{
"value": [
{
"name": "Country Code",
"value": "EG"
},
{
"name": "City Code",
"value": "HRG"
},
{
"name": "Hotel Code",
"value": "91U"
}
]
}
],
"providerCode": "gta",
"providerType": "gds"
},
{
"code": [
{
"value": [
{
"value": "071801"
}
]
},
{
"value": [
{
"value": "766344"
}
]
}
],
"providerCode": "restel",
"providerType": "gds"
},
{
"code": [
{
"value": [
{
"value": "HRG03Z"
}
]
},
{
"value": [
{
"value": "HRG04Z"
}
]
}
],
"providerCode": "5VF",
"providerType": "tourOperator"
}
]
}
}
}
I'm trying to create a query that fetches a single document based on the value of giata_properties.propertyCodes.provider.code.value.value and a specific providerType.
So for example, my input is 071801 and restel, I want a query that will fetch me the document I pasted above (because it contains these values).
I'm pretty new to N1QL so what I tried so far is (without the providerType input)
SELECT * FROM giata_properties AS gp
WHERE ANY `field` IN `gp.propertyCodes.provider.code.value` SATISFIES `field.value` = '071801' END;
This returns me an empty result set. I'm probably doing all of this wrongly.
edit1:
Following geraldss's answer, I was able to achieve my goal with 2 different queries:
1st (More general) ~2m50.9903732s
SELECT * FROM giata_properties AS gp WHERE ANY v WITHIN gp SATISFIES v.`value` = '071801' END;
2nd (More specific) ~2m31.3660388s
SELECT * FROM giata_properties AS gp WHERE ANY v WITHIN gp.propertyCodes.provider[*].code SATISFIES v.`value` = '071801' END;
The bucket has around 550K documents. Currently there are no indexes other than the primary.
Question part 2
When I run either of the above queries, a result is streamed to my shell very quickly, and then I spend the rest of the query time waiting for the engine to finish iterating over all documents. I know future queries will only ever return 1 result, so I thought I could use LIMIT 1 to make the engine stop searching at the first match. I tried something like
SELECT * FROM giata_properties AS gp WHERE ANY v WITHIN gp SATISFIES v.`value` = '071801' END LIMIT 1;
But that made no difference, I get a document written to my shell and then keep waiting until the query finishes completely. How can this be configured correctly?
edit2:
I've upgraded to the latest enterprise 4.5.1-2844, I have only the primary index created on giata_properties bucket, when I execute the query along with the LIMIT 1 keyword it still takes the same time, it doesn't stop quicker.
I've also tried creating the array index you suggested but the query is not using the index and it keeps insisting on using the #primary index (even if I use USE INDEX clause).
I tried removing SELF from the index you suggested and it took a much longer time to build and now the query can use this new index, but I'm honestly not sure what I'm doing here.
So 3 questions:
1) Why doesn't LIMIT 1 with the primary index make the query stop at the first result?
2) What's the difference between the index you suggested with and without SELF? I tried to look for SELF keyword documentation but I couldn't find anything.
This is how both indexes look in Web ui
Index 1 (Your original suggestion) - Not working
CREATE INDEX `gp_idx1` ON `giata_properties`((distinct (array (`v`.`value`) for `v` within (array_star((((self.`giata_properties`).`propertyCodes`).`provider`)).`code`) end)))
Index 2 (Without SELF)
CREATE INDEX `gp_idx2` ON `giata_properties`((distinct (array (`v`.`value`) for `v` within (array_star(((self.`propertyCodes`).`provider`)).`code`) end)))
3) What would be the query for a specific giata_properties.propertyCodes.provider.code.value.value and a specific providerCode? I managed to do both separately but I wasn't successful in merging them.
Thanks for all your help dear
Here is a query without the providerType.
EXPLAIN SELECT *
FROM giata_properties AS gp
WHERE ANY v WITHIN gp.giata_properties.propertyCodes.provider[*].code SATISFIES v.`value` = '071801' END;
You can also index this in Couchbase 4.5.0 and above.
CREATE INDEX idx1 ON giata_properties( DISTINCT ARRAY v.`value` FOR v WITHIN SELF.giata_properties.propertyCodes.provider[*].code END );
Edit to answer question edits
The performance has been addressed in 4.5.x. You should try the following on Couchbase 4.5.1 and post the execution times here.
Test on 4.5.1.
Create the index.
Use the LIMIT. In 4.5.1, the limit is pushed down to the index.
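For the third question (a specific code value together with a specific providerCode), it may help to see the intended logic spelled out. The Python below is only a sketch of the semantics on one trimmed document, not a N1QL query and not a statement about how the engine evaluates it: within() imitates WITHIN by visiting every nested value at any depth, and the provider test is applied to the same provider object whose codes are searched.

```python
def within(node):
    """Yield every descendant of node (all nested values, at any depth)."""
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return
    for child in children:
        yield child
        yield from within(child)


def matches(doc, code_value, provider_code):
    """True if one provider has this providerCode and this code value."""
    providers = doc["giata_properties"]["propertyCodes"]["provider"]
    return any(
        p.get("providerCode") == provider_code
        and any(
            isinstance(v, dict) and v.get("value") == code_value
            for v in within(p.get("code", []))
        )
        for p in providers
    )


# Trimmed version of the document from the question.
doc = {"giata_properties": {"propertyCodes": {"provider": [
    {"code": [{"value": [{"name": "Country Code", "value": "EG"}]}],
     "providerCode": "gta", "providerType": "gds"},
    {"code": [{"value": [{"value": "071801"}]}, {"value": [{"value": "766344"}]}],
     "providerCode": "restel", "providerType": "gds"},
]}}}

print(matches(doc, "071801", "restel"))
```

Testing both conditions inside the same iteration over provider is what ties the code to its provider; two independent ANY clauses would also accept a document where the code belongs to a different provider.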

How to add nested json object to Lucene Index

I need a little help regarding lucene index files, thought, maybe some of you guys can help me out.
I have json like this:
[
{
"Id": 4476,
"UrlName": null,
"PhoneData": [
{
"PhoneType": "O",
"PhoneNumber": "0065898"
},
{
"PhoneType": "F",
"PhoneNumber": "0065898"
}
],
"Contact": [],
"Services": [
{
"ServiceId": 10,
"ServiceGroup": 2
},
{
"ServiceId": 20,
"ServiceGroup": 1
}
]
}
]
Adding first two fields is relatively easy:
// add lucene fields mapped to db fields
doc.Add(new Field("Id", sampleData.Id.Value.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.Add(new Field("UrlName", sampleData.UrlName.Value ?? "null" , Field.Store.YES, Field.Index.ANALYZED));
But how can I add PhoneData and Services to the index so that they stay connected to the unique Id?
For indexing JSON objects I would go this way:
Store the whole value under a payload field, named for example $json. This field would be stored but not indexed.
For each indexable property (possibly nested), create an indexable field whose name is an XPath-like expression identifying the property, for example PhoneData.PhoneType
If it is OK for all nested properties to be indexed, then it's simple: just iterate over all of them, generating an indexable field for each.
But if you don't want to index all of them (a more realistic case), how to know which property is indexable is another problem; in this case you could:
Accept from the client the path expressions of the index fields to be created when storing the document, or
Put JSON Schema into play to describe your data (assuming your JSON records have a common schema), and extend it with a custom property that would allow you to tag which properties are indexable.
I have created a library doing this (and much more) that maybe can help you.
You can check it at https://github.com/brutusin/flea-db
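The "iterate over all nested properties" step can be sketched as follows. This is plain Python that only produces (field name, value) pairs; it does not call Lucene and is not how the linked library works internally. It assumes array positions are left out of the path, so repeated values simply become repeated fields with the same name.

```python
def flatten(value, prefix=""):
    """Yield (path, scalar) pairs; arrays do not add a path segment."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten(child, path)
    elif isinstance(value, list):
        for child in value:
            yield from flatten(child, prefix)
    else:
        yield prefix, value


record = {
    "Id": 4476,
    "UrlName": None,
    "PhoneData": [
        {"PhoneType": "O", "PhoneNumber": "0065898"},
        {"PhoneType": "F", "PhoneNumber": "0065898"},
    ],
    "Contact": [],
    "Services": [
        {"ServiceId": 10, "ServiceGroup": 2},
        {"ServiceId": 20, "ServiceGroup": 1},
    ],
}

fields = list(flatten(record))
for name, val in fields:
    print(name, val)
```

Each pair would become one field added to the same Lucene document that holds Id, which is what keeps PhoneData and Services tied to that Id. One trade-off of this flattening: the pairing between a particular PhoneType and its PhoneNumber is lost, so a search cannot require both values to come from the same array element.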