What is a view in Couchbase?

I am trying to understand what exactly a Couchbase view is used for. I have gone through some material in the docs, but the 'view' concept still hasn't quite settled for me.
Are views in Couchbase analogous to views in an RDBMS?
https://docs.couchbase.com/server/6.0/learn/views/views-basics.html
A view performs the following on the Couchbase unstructured (or
semi-structured) data:
Extract specific fields and information from the data files.
Produce a view index of the selected information.
How do views and indexes work here? It seems there is a separate index for each view. So if a document is updated, are both indexes updated?
https://docs.couchbase.com/server/6.0/learn/views/views-store-data.html
In addition, the indexing of data is also affected by the view system
and the settings used when the view is accessed.
Helpful post:
Views in Couchbase

You can think of Couchbase Map/Reduce views as similar to materialized views, yes. Except that you create them with JavaScript functions (a map function and optionally a reduce function).
For example:
function(doc, meta) {
    emit(doc.name, [doc.city]);
}
This will look at every document and save a view entry for each one that contains just the city, with the name as the key.
For instance, let's suppose you have two documents;
[
    key 1 {
        "name" : "matt",
        "city" : "new york",
        "salary" : "100",
        "bio" : "lorem ipsum dolor ... "
    },
    key 2 {
        "name" : "emma",
        "city" : "columbus",
        "salary" : "120",
        "bio" : "foo bar baz ... "
    }
]
Then, when you 'query' this view, instead of full documents, you'll get:
[
    key "matt" {
        "city" : "new york"
    },
    key "emma" {
        "city" : "columbus"
    }
]
This is a very simple map. You can also use reduce functions like _count, _sum, _stats, or your own custom reduce function.
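For instance, a minimal sketch of a grouped count (an assumption for illustration: the map emits the city as the key, the view uses the built-in _count reduce, and it is queried with group=true) could be:
function(doc, meta) {
    // key on city so the reduce can count per city
    emit(doc.city, null);
}
// reduce: _count
Queried with group=true, the two example documents above would come back as something like:
[
    { "key" : "columbus", "value" : 1 },
    { "key" : "new york", "value" : 1 }
]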
The results of this view are stored alongside the data on each node (and updated whenever the data is updated). However, you should probably stay away from Couchbase views because:
Views are stored alongside the data on each node. So when reading a view, results have to be gathered from every node, combined, and returned ("scatter/gather").
JavaScript map/reduce doesn't give you all the query capabilities you might want. You can't do stuff like 'joins', for instance.
Couchbase has SQL++ (aka N1QL), which is more concise, declarative, and uses global indexes (instead of scatter/gather), so it will likely be faster and reduce strain during rebalances, etc. (a rough SQL++ sketch follows below).
Views are deprecated as of Couchbase Server 7.0 (and are not available in Couchbase Capella at all).
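For comparison, a rough SQL++ (N1QL) sketch of the same projection as the map view above (assuming a bucket named mybucket with a suitable index on name) would be:
SELECT name, city FROM `mybucket`;
or, for a keyed lookup similar to querying the view by key:
SELECT city FROM `mybucket` WHERE name = "matt";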

Related

Solr facet doesn't segment text

I am a beginner of Solr. I push the books.json into Solr, which looks like
{
    "id" : "978-0641723445",
    "cat" : ["book","hardcover"],
    "name" : "The Lightning Thief",
    "author" : "Rick Riordan",
    "series_t" : "Percy Jackson and the Olympians",
    "sequence_i" : 1,
    "genre_s" : "fantasy",
    "inStock" : true,
    "price" : 12.50,
    "pages_i" : 384
}
Then I change the schema of "name" to
<field name="name" type="text_general"/>
with everything else unchanged. The Analysis page in Solr gives correct segmentation. However, when I run the query
http://localhost:8983/solr/testCore/select?facet.field=name&facet=on&indent=on&q=*:*&wt=json
the output is not segmented:
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"name":[
"Lucene in Action, Second Edition",1,
"Sophie's World : The Greek Philosophers",1,
"The Lightning Thief",1,
"The Sea of Monsters",1]},
"facet_ranges":{},
"facet_intervals":{},
"facet_heatmaps":{}}}
Can anyone explain why?
After changing the definition of a field (unless you're only changing the "query" part of an analysis chain that has separate query and index chains), you'll have to reindex your content.
Since the facet module works on the actual tokens generated in the index, you have to clean out the old index and reindex all your content, so that each value is processed again and divided into tokens matching the behaviour you're looking for.
If all your documents are still present when you're reindexing (so all the old ids are still there), you don't have to clean out the index first, since all the old tokens will be overwritten. But to be sure you can delete everything first, then reindex your content and see the new tokens.
You can also do this while in production as long as a commit doesn't happen in between; first issue the delete, then reindex and then commit. Until the commit happens all the old data will still be available (but be aware that other threads or other code can issue a commit while you're working on the index, so be sure you're the only one issuing commits first).
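As a rough sketch of that delete-then-reindex-then-commit flow (assuming the core is named testCore and books.json is a JSON array of documents, as in the Solr examples), the whole cycle could look like:
# 1. delete everything, but do not commit yet
curl 'http://localhost:8983/solr/testCore/update?commit=false' -H 'Content-Type: application/json' -d '{ "delete": { "query": "*:*" } }'
# 2. reindex the content, still without committing
curl 'http://localhost:8983/solr/testCore/update?commit=false' -H 'Content-Type: application/json' --data-binary @books.json
# 3. commit once, so the old and new state are swapped in a single step
curl 'http://localhost:8983/solr/testCore/update?commit=true'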

CloudFormation Template - any way to get a Spot-Fleet-Request ID?

I'm attempting to create a single template that creates the following:
AWS::EC2::SpotFleet resource
2 AWS::ApplicationAutoScaling::ScalingPolicy resources (scale up, scale down)
Initially, my template included only the SpotFleet resource, and I confirmed that the stack would create without issue. Once I added the ScalingPolicy resources, the stack would roll back because there was "No scalable target registered for namespace..." So, I added an additional resource:
an AWS::ApplicationAutoScaling::ScalableTarget.
(From http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-applicationautoscaling-scalabletarget.html#cfn-applicationautoscaling-scalabletarget-resourceid)
{
    "Type" : "AWS::ApplicationAutoScaling::ScalableTarget",
    "Properties" : {
        "MaxCapacity" : Integer,
        "MinCapacity" : Integer,
        "ResourceId" : String,
        "RoleARN" : String,
        "ScalableDimension" : String,
        "ServiceNamespace" : String
    }
}
ResourceId is a required property. I have the data for all the other properties, but when researching what data is needed for the ResourceId property, I found that what I need is the spot fleet request ID (something like this: "SpotFleetRequestId": "sfr-73fbd2ce-aa30-494c-8788-1cee4EXAMPLE").
So here's the problem: since I am creating the spot fleet request in the same template as the scaling policy, I can't put the SpotFleetRequestId in manually; to my knowledge this ID is generated when the resource is created, so there's no way to anticipate it. In other templates, with other kinds of resources, I've simply used "Ref" or "Fn::GetAtt" to pass in the ARN of a resource without having to input it manually. However, there seems to be no way to do this with a SpotFleetRequestId. All the research I've done has turned up nothing, not even a single template example that uses a method like the one I'm describing; the only examples available assume that the scalable target resource already exists and the SpotFleetRequestId is known prior to creating the ScalingPolicy.
Does anyone have any idea if referring to the SpotFleetRequestID of an AWS::EC2::SpotFleet initialized in the same template is even possible? Or am I just missing something REALLY obvious?
-KungFuBilly
Turns out that if you "Ref" the logical name of the AWS::EC2::SpotFleet, it will return the request ID. Then it's a matter of using "Fn::Join" to build the right value for ResourceId. It should look something like this:
"ResourceId": {
"Fn::Join": [
"/",
[
"spot-fleet-request",
{
"Ref": "SpotFleet"
}
]
]
},
That will output: spot-fleet-request/<SpotFleetRequestId>
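For context, a sketch of the full ScalableTarget built around that Ref (the logical IDs SpotFleet and AutoScalingRole, and the capacity values, are placeholders to adapt) might look like:
"SpotFleetScalableTarget": {
    "Type": "AWS::ApplicationAutoScaling::ScalableTarget",
    "Properties": {
        "MaxCapacity": 10,
        "MinCapacity": 1,
        "ResourceId": {
            "Fn::Join": [ "/", [ "spot-fleet-request", { "Ref": "SpotFleet" } ] ]
        },
        "RoleARN": { "Fn::GetAtt": [ "AutoScalingRole", "Arn" ] },
        "ScalableDimension": "ec2:spot-fleet-request:TargetCapacity",
        "ServiceNamespace": "ec2"
    }
}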

How do I ensure Firebase database structure using anonymous auth?

I have a public-input type app using Firebase, with anonymous auth. The user data is used to create points on a map. Each anonymous user can only edit the data inside the node matching their auth id - via security rules.
However, my app depends on a certain database structure. How do I ensure my database structure/integrity using anonymous auth, since the database url is client-side readable?
I think it is possible with security and validation rules, but I'm not sure. Maybe deny child creation in a node? This would be necessary to ensure the schema is followed.
Each auth node can have many key nodes, but I would want to limit this Firebase-side. And each key node must follow the schema below (so I can pull out the geojson easily). Below is my current setup - wondering what is missing?
"features" : {
"5AGxfaK2q8hjJsmsO3PUxUs09Sz1" : {
"-KS3R4sWPdcDkrxyIFX6" : {
"geometry" : {
"coordinates" : [ -81.88247680664062, 38.884619201291905 ],
"type" : "Point"
},
"properties" : {
"color" : "#2be",
"title" : ""
},
"type" : "Feature"
},
Authentication and database schema are completely separate topics. You ensure database schema by using a combination of .write and .validate rules in your security doc, not by anything to do with your authentication provider (i.e. Anonymous authentication).
This is described in detail in our database security guide.
A quick summary:
hasChildren: specify required fields
newData: refer to the data being written
data: refer to data already in the database
.validate: specify data schema using things like newData.isString() or newData.val() == data.val() + 1
Keep in mind that .validate rules are only run for non-null values. Thus, if you want to try something like !data.exists() (i.e. you can only write to this path once and can't modify it later) or newData.exists() (i.e. you can't delete this data) then you need to specify those in a .write rule.
Refer to the guide for more detail.
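A minimal sketch of such rules for the features tree shown in the question (the exact validations are assumptions you would tighten for your own schema) might be:
{
    "rules": {
        "features": {
            "$uid": {
                ".read": true,
                // only the matching anonymous user may write under their own node
                ".write": "auth != null && auth.uid == $uid",
                "$featureId": {
                    // every feature must have exactly these children
                    ".validate": "newData.hasChildren(['geometry', 'properties', 'type'])",
                    "type": { ".validate": "newData.val() == 'Feature'" },
                    "geometry": {
                        ".validate": "newData.hasChildren(['type', 'coordinates'])",
                        "type": { ".validate": "newData.val() == 'Point'" }
                    },
                    "properties": {
                        "color": { ".validate": "newData.isString()" },
                        "title": { ".validate": "newData.isString()" }
                    },
                    // reject any children not listed above
                    "$other": { ".validate": false }
                }
            }
        }
    }
}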

How to Update Parts of a document in Couchbase

In scanning the docs I cannot find how to update part of a document.
For example, say the whole document looks like this:
{
    "Active": true,
    "Barcode": "123456789",
    "BrandID": "9f3751ef-f14f-464a-bb86-854e99cf14c0",
    "BuyCurrencyOverride": ".37",
    "BuyDiscountAmount": "45.00",
    "ID": "003565a3-4a0d-47d9-befb-0ac642cb8057"
}
but I only want to work with part of the document as I don't want to be selecting / updating the whole document in many cases:
{
    "Active": false,
    "Barcode": "999999999",
    "BrandID": "9f3751ef-f14f-464a-bb86-854e99cf14c0",
    "ID": "003565a3-4a0d-47d9-befb-0ac642cb8057"
}
How can I use N1QL to update just those fields? UPSERT completely replaces the whole document, and the UPDATE statement's behavior is not that clear to me.
Thanks
The answer to your question depends on why you want to update only part of the document (e.g., are you concerned about network bandwidth?), and how you want to perform the update (e.g., from the web console? from a program using the SDK?).
The 4.5 sub-document API, for which you provided a link in your comment, is a feature only available via the SDK (e.g., from Go or Java programs), and the goal of that feature is to reduce network bandwidth by not transmitting entire documents around. Does your use case include programmatic document modifications via the SDK? If so, then the sub-document API is a good way to go.
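For illustration, a rough sketch using the Java SDK 2.x sub-document API (assuming a bucket named mybucket and the document key from your example; check the exact method names against the SDK version you use) might look like:
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;

public class PartialUpdateExample {
    public static void main(String[] args) {
        CouchbaseCluster cluster = CouchbaseCluster.create("localhost");
        Bucket bucket = cluster.openBucket("mybucket");

        // only the listed paths are sent over the wire and changed;
        // the rest of the document is left untouched
        bucket.mutateIn("003565a3-4a0d-47d9-befb-0ac642cb8057")
              .replace("Active", false)
              .replace("Barcode", "999999999")
              .execute();

        cluster.disconnect();
    }
}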
Using the "UPDATE" statement in N1QL is a good way to change any number of documents that match a pattern for which you can specify a "WHERE" clause. As noted above, it works very similarly to the "UPDATE" statement in SQL. To use your example above, you could change the "Active" field to false in any documents where the BuyDiscountAmount was "45.00":
UPDATE `mybucket` SET Active = false WHERE BuyDiscountAmount = "45.00"
When running N1QL UPDATE queries, almost all the network traffic will be between the Query, Index, and Data nodes of your cluster, so a N1QL update does not cause much network traffic into/out-of your cluster.
If you provide more details about your use case, and why you want to update only part of your documents, I could provide more specific advice on the right approach to take.
The sub-document API introduced in Couchbase 4.5 is currently not used by N1QL. However, you can use the UPDATE statement to update parts of one or more documents.
http://developer.couchbase.com/documentation/server/current/n1ql/n1ql-language-reference/update.html
Let me know if you have any questions.
-Prasad
It is simple, like a SQL query:
update `Employee` set District='SambalPur' where EmpId="1003"
and here is the response:
{
    "Employee": {
        "Country": "India",
        "District": "SambalPur",
        "EmpId": "1003",
        "EmpName": "shyam",
        "Location": "New-Delhi"
    }
}

Database schema or framework to support keyword searches

About to add keywords/tags to one of the business objects in our database; let's call the table users. I've considered adding a tags table and a usertags join table, but I can't see an easy way to perform queries that combine AND and OR. For example, I'd like to be able to return all the users that have tag A AND B, as well as query for users with tag A OR B. OR queries are easy, but AND queries are harder.
I've even considered putting all the user records into a JSON-backed database, so I could have all the users duplicated like this:
{
    user_id: 1,
    keyword: "A",
    keyword: "B"
}
etc.
but I'm not sure how performant a database like MongoDB is when running queries like this.
Yet another option is to have a tags field on the user table, and use REGEX queries. In some ways I like this the best, since it means it's much easier to have ad hoc queries, but I'm worried about performance.
Note that the tag isn't the only field that we need to search by, so ideally we'd have a solution that supports date range searches as well as searches against other fields.
I can only really speak for MongoDB on that matter, so I'll stick to it.
Let's assume a more accurate model like
{
    _id: "foo@bar.com",
    keywords: [ "A", "B" ],
    joined: ISODate("2014-12-28T12:00:00.123Z"),
    tags: [ "C", "D" ],
    location: { type: "Point", coordinates: [ -86.9141607, 38.1200538 ] },
    notes: "Lorem ipsum dolor sic amet."
}
Performance in MongoDB is determined more or less by two factors: whether a field you query is indexed and whether the index is in RAM. In general, MongoDB tries to keep at least all indices in RAM, plus as big a subset of the data as possible. Indexing a field is quite easy. To stick with your first requirement, we index the keywords field:
db.yourCollection.ensureIndex({ keywords: 1})
What happens now is that MongoDB will create a list of keywords and a link to the respective documents. So if you do a query for keyword "A"
db.yourCollection.find({keywords: "A"})
only the documents actually containing the keyword "A" will be read and returned. This is called an index scan. If there wasn't an index on "keywords", MongoDB would have read each and every document in the collection, checked whether the keywords field contained "A", and added the matching documents to the result set, which is called a collection scan.
Now, checking for a document that has both the "A" and the "B" keyword would be rather simple:
db.yourCollection.find({ keywords: { $all: [ "A", "B" ] } })
Since we have indexed the "keywords" field, the logical check is done against the index (in RAM) and the matching documents are added to the result set.
As for regex searches, they are possible and can be quite fast for indexed fields:
db.yourCollection.find({ keywords: /^C/ })
will return all documents which contain keywords beginning with the letter "C" using an index scan. Note that only case-sensitive prefix expressions like this one can make full use of the index; a case-insensitive regex such as /^C/i forces MongoDB to scan far more index keys.
As for your requirement for doing queries on date ranges:
db.yourCollection.find({
    joined: {
        $gte: ISODate("2014-12-28T00:00:00.000Z"),
        $lt: ISODate("2014-12-29T00:00:00.000Z")
    }
})
will return all users who joined on Dec 28, 2014. Since we haven't created an index on that field yet, a collection scan will be used. Of course, you can create an index on the "joined" field.
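For instance (a small sketch; whether you also want the compound variant depends on the queries you actually run), you could index "joined" on its own or together with "keywords":
db.yourCollection.ensureIndex({ joined: 1 })
// a compound index that serves queries filtering on keywords plus a joined date range
db.yourCollection.ensureIndex({ keywords: 1, joined: 1 })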
So, let's assume you want to find all users with a keyword "A" from Santa Claus, IN:
db.yourCollection.find({
    keywords: "A",
    location: {
        $nearSphere: {
            $geometry: {
                type: "Point",
                coordinates: [ -86.9141607, 38.1200538 ]
            },
            $minDistance: 0,
            $maxDistance: 10000
        }
    }
})
This will actually fail, since $nearSphere requires a geospatial index, which we have to create first:
db.yourCollection.ensureIndex( { location : "2dsphere" } )
Now the mentioned query will work as expected.
Conclusion
Your requirements can be fulfilled by MongoDB with good performance, given proper indexing. However, you might want to dig into MongoDB's limitations.
You might want to read a bit more. Here are my suggestions:
Introduction to MongoDB
Index documentation
Data modelling introduction