I'm following along with the N1QL tutorial for Couchbase. The first step is to create an index, which is straightforward; the only part of the command I'm unclear about is the last parameter, USING GSI. Can someone explain this? Initially I thought GSI was a field specific to this bucket, but it doesn't appear in any of the documents.
CREATE PRIMARY INDEX ON `beer-sample` USING GSI;
USING GSI tells Couchbase to use an internal indexing technology called GSI (Global Secondary Indexing). GSI is the default, so you can leave out "USING GSI". It is not related to documents or data. The Couchbase documentation explains all of this in detail.
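Since GSI is the default, the two statements below should behave identically (using the tutorial's `beer-sample` bucket):

```sql
CREATE PRIMARY INDEX ON `beer-sample` USING GSI;

-- Equivalent: GSI is the default index type when USING is omitted.
CREATE PRIMARY INDEX ON `beer-sample`;
```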
Related
As I saw in the Couchbase documentation, it's possible to create an index on particular fields of a document: https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/createindex.html
CREATE INDEX country_idx ON `travel-sample`.inventory.airport(country, city)
WITH {"nodes": ["node1:8091", "node2:8091", "node3:8091"]};
Is it possible to create an index for all available fields just like GIN in PostgreSQL?
There is a statement in the section about Community-edition limitations:
In Couchbase Server Community Edition, a single global secondary index can be placed on a single node that runs the indexing service.
Could this "global index" be what I'm looking for? If so, how can I create it?
P.S. This one-node limitation doesn't make sense to me, to be honest, even for the Community Edition. What is the point of using a system that can't scale, if scalability is its purpose? Maybe I got it wrong?
By default, a Couchbase GSI index is a global secondary index: even though the data is distributed across different data nodes, the index holds entries for data from all the nodes.
The statement above means Community Edition does not support partitioned indexes or index replicas (the replica limitation can be overcome by creating duplicate indexes, i.e., indexes with the same definition but different names; during execution, index scans are load-balanced across them).
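A sketch of the duplicate-index workaround (bucket and field names are just examples):

```sql
-- Same definition, different names; the query service load-balances
-- index scans across the two equivalent indexes:
CREATE INDEX idx_country_a ON `travel-sample`(country);
CREATE INDEX idx_country_b ON `travel-sample`(country);
```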
As far as indexing all the fields of a document goes, check out adaptive indexes, but for better performance create targeted indexes.
Check out points 8 & 9 in https://blog.couchbase.com/create-right-index-get-right-performance/
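For reference, an adaptive index covering every field of every document might look like this (Couchbase 5.0+ syntax; the bucket name is just an example):

```sql
-- DISTINCT PAIRS(self) indexes all field/value pairs of each document:
CREATE INDEX ai_self ON `travel-sample`(DISTINCT PAIRS(self));
```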
Can you enable query + index service on an existing 1 node cluster?
When we run a query in the Couchbase 6.0.0 Query Workbench, an error occurs:
No index available on keyspace demo that matches your query. Use CREATE INDEX or CREATE PRIMARY INDEX to create an index, or check that your expected index is online.
So we have to enable the query and index services. Is this possible in an existing cluster?
As far as I know, this cannot be done once you've already set up a node. If you've already set up your cluster and did not select the index/query services, then you will have to set it up again (or add another node with index/query services). You aren't the first to ask, and you can learn more about this feature request here: MB-15357
The error message you're seeing, however, suggests that you DO have the index/query services set up. It simply means you haven't actually created an index. You could start by creating a primary index:
CREATE PRIMARY INDEX ON mybucketname
This is not recommended for production, but then again neither is a 1-node cluster. To learn more about creating indexes, you can check out the Couchbase documentation on Indexes and query performance.
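For example, a targeted secondary index on your `demo` keyspace (the `type` field is just an assumption about your documents) would look like:

```sql
CREATE INDEX idx_demo_type ON demo(type);
```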
I'm new to Couchbase and I'm wondering if it is possible to perform a N1QL query based on an existing View.
I'm curious if something like this would work in Couchbase:
Select * From `MY_VIEW_INSTEAD_OF_BUCKET` where id = 12
From what I've gathered, Views are a way to query the DB and N1QL is the new alternative; I'm essentially trying to combine the two.
I ask this question because I believe querying against the view will be more efficient than querying against the entire bucket.
I would refer you to the N1QL documentation. In short, you cannot query views directly. N1QL can create and use indexes using views, and it can also create and use indexes using a dedicated technology called GSI. All the details are in the documentation.
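For completeness, the view-backed indexing path looked like this in older releases (a sketch; GSI is the default and generally preferred):

```sql
CREATE PRIMARY INDEX ON `beer-sample` USING VIEW;
```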
This post says:
If you’re running Innodb Plugin on Percona Server with XtraDB you get
benefit of a great new feature – ability to build indexes by sort
instead of via insertion
However I could not find any info on this. I'd like the ability to reorganize how a table is laid out physically, similar to PostgreSQL's CLUSTER command or MyISAM's "ALTER TABLE ... ORDER BY". For example, the table "posts" has millions of rows in random insertion order; most queries use "where userid = ", and I want the rows belonging to one user to be physically close together on disk, so that common queries require little IO. Is it possible with XtraDB?
Clarification concerning the blog post
The feature you are looking at is fast index creation. It speeds up the creation of secondary indexes on InnoDB tables, but it is only used in very specific cases. For example, it is not used during OPTIMIZE TABLE, which can therefore be dramatically sped up by dropping the indexes first, then running OPTIMIZE TABLE, and then recreating the indexes with fast index creation (this is what the post you linked is about).
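A sketch of that manual workaround (table and index names are illustrative):

```sql
-- Drop secondary indexes first, so OPTIMIZE TABLE rebuilds only the data:
ALTER TABLE posts DROP INDEX idx_userid;
OPTIMIZE TABLE posts;
-- Recreating the index afterwards can use fast index creation (build by sort):
ALTER TABLE posts ADD INDEX idx_userid (userid);
```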
Some automation for the cases that can be improved by applying this feature manually, as above, was added to Percona Server as a system variable named expand_fast_index_creation. If activated, the server uses fast index creation not only in the very specific cases, but in all cases where it might help, such as OPTIMIZE TABLE (the problem mentioned in the linked blog article).
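If you are on a Percona Server version that ships this variable, enabling it might look like this (a sketch):

```sql
SET SESSION expand_fast_index_creation = ON;
OPTIMIZE TABLE posts;  -- secondary indexes now rebuilt via fast index creation
```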
Concerning your question
Your question was actually whether it is possible to store InnoDB tables in a custom order, to speed up specific kinds of queries by exploiting locality on disk.
This is not possible. InnoDB rows are stored in pages ordered by the clustered index (which is essentially the primary key). The rows/pages may end up in a chaotic order, in which case one can OPTIMIZE TABLE the InnoDB table; this command actually recreates the table in primary key order, gathering rows with nearby primary keys on the same or neighboring pages.
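To illustrate: since rows are physically clustered by the primary key, the closest approximation to the layout you want is to make userid a leading column of the primary key itself (a sketch with hypothetical column names; this changes the table's semantics, so it is not always applicable):

```sql
CREATE TABLE posts (
    userid  INT NOT NULL,
    post_id INT NOT NULL,
    body    TEXT,
    PRIMARY KEY (userid, post_id)  -- rows of one user land on nearby pages
) ENGINE=InnoDB;
```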
That is all you can force InnoDB to do. You can read the manual about the clustered index, another page in the manual as a definitive answer that this is not possible ("ORDER BY does not make sense for InnoDB tables because InnoDB always orders table rows according to the clustered index.") and the same question on dba.stackexchange, whose answers might interest you.
I am creating an asp.net *MVC* application using EF code first. I had used SQL Azure as my database, but it turns out SQL Azure is not reliable. So I am thinking of using MySQL/PostgreSQL for the database.
I wanted to know the repercussions/implications of using EF code first with MySQL/PostgreSQL in terms of performance.
Has anyone used this combo in production or knows anyone who has used it?
EDIT
I keep on getting following exceptions in Sql Azure.
SqlException: "*A transport-level error has occurred when receiving results from the server.*
(provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)"
SqlException: *"Database 'XXXXXXXXXXXXXXXX' on server 'XXXXXXXXXXXXXXXX' is not
currently available. Please retry the connection later.* If the problem persists, contact
customer support, and provide them the session tracing ID of '4acac87a-bfbe-4ab1-bbb6c-4b81fb315da'.
Login failed for user 'XXXXXXXXXXXXXXXX'."
First, your problem seems to be a network issue, perhaps with your ISP. You may want to look at getting a remote PostgreSQL or MySQL db, but I think you will run into the same problems.
Secondly, comparing MySQL and PostgreSQL performance is relatively tricky. In general, MySQL is optimized for primary-key lookups, while PostgreSQL is more generally optimized for complex use cases. This may be a bit low-level but....
MySQL InnoDB tables are basically btree indexes where the leaf node includes the table data. The primary key is the key of the index. If no primary key is provided, one will be created for you. This means two things:
select * from my_large_table will be slow, as there is no support for a physical-order scan.
Select * from my_large_table where secondary_index_value = 2 requires two index traversals, since the secondary index can only refer to the primary key values.
In contrast a selection for a primary key value will be faster than on PostgreSQL because the index contains the data.
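The difference can be sketched with a hypothetical table:

```sql
CREATE TABLE my_large_table (
    id INT PRIMARY KEY,            -- clustered index: leaf pages hold the rows
    secondary_index_value INT,
    payload VARCHAR(255),
    KEY idx_secondary (secondary_index_value)  -- leaves store primary key values
) ENGINE=InnoDB;

-- One B-tree traversal: the clustered index leaf already contains the row.
SELECT * FROM my_large_table WHERE id = 2;

-- Two traversals: idx_secondary yields id values, which are then looked up
-- in the clustered index.
SELECT * FROM my_large_table WHERE secondary_index_value = 2;
```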
PostgreSQL, by comparison, stores information in an unordered way in a series of heap pages. The indexes are separate from the data. If you want to pull by primary key, you scan the index, find the data page on which the data resides, and then pull the data. Pulling from a secondary index is no slower. Additionally, the tables are structured such that sequential disk access is possible, so a long select * from my_large_table lets the operating system's read-ahead cache speed up performance significantly.
In short, if your queries are simply joinless selection by primary key, then MySQL will give you better performance. If you have joins and such, PostgreSQL will do better.