"Lost data" after changing RAM quota in Couchbase Server - couchbase

I changed the “Index RAM Quota” in the Couchbase Server 4.5.1-2845 Community Edition settings, and now none of my queries find any data…
When I run “SELECT id FROM myBucket” I get 4 ids… The bucket contains thousands of documents, not just 4.
If I click on “Data Buckets” > “MyBucket” > “Documents” I see all the documents, but the queries seem broken.
Any ideas?!
Thank you

Have you tried adding a primary index to your bucket?
CREATE PRIMARY INDEX `bucket-name-primary-index` ON `bucket-name` USING GSI;

GIN-like index in Couchbase

According to the Couchbase documentation, it's possible to create an index on particular fields of a document: https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/createindex.html
CREATE INDEX country_idx ON `travel-sample`.inventory.airport(country, city)
WITH {"nodes": ["node1:8091", "node2:8091", "node3:8091"]};
Is it possible to create an index for all available fields just like GIN in PostgreSQL?
There is a statement in the section about Community-edition limitations:
In Couchbase Server Community Edition, a single global secondary index can be placed on a single node that runs the indexing service.
Could this "global index" be what I'm looking for? If so, how can I create it?
P.S. This one-node limitation doesn't make sense to me, to be honest, even for Community Edition. What is the point of using a system that can't scale if scalability is its purpose? Maybe I got it wrong?
By default, a Couchbase GSI index is a Global Secondary Index: even though the data is distributed across different data nodes, the index holds entries for the data from all of those nodes.
The statement above means that Community Edition does not support partitioned indexes or index replicas. The lack of replicas can be overcome by creating a duplicate index, i.e. an index with the same definition under a different name; during execution, index scans are load-balanced across the duplicates.
As for indexing all the fields of a document, check out adaptive indexes, but for better performance create targeted indexes.
See items 8 and 9 in https://blog.couchbase.com/create-right-index-get-right-performance/

Is it possible to enable query + index service on an existing 1-node cluster?

Can you enable query + index service on an existing 1 node cluster?
When we run a SELECT query in the Couchbase 6.0.0 Query Workbench, an error occurs:
No index available on keyspace demo that matches your query. Use CREATE INDEX or CREATE PRIMARY INDEX to create an index, or check that your expected index is online.
So we have to enable the query and index services. Is this possible in an existing cluster?
As far as I know, this cannot be done once you've already set up a node. If you've already set up your cluster and did not select index/query services, then you will have to set it up again (or add another node with index/query services). You aren't the first to ask, and you can learn more about this feature request here: MB-15357
The error message you're seeing, however, suggests that you DO have index/query services set up. The error message simply means you haven't actually created an index. You could start by creating a primary index:
CREATE PRIMARY INDEX ON mybucketname
This is not recommended for production, but then again neither is a 1-node cluster. To learn more about creating indexes, you can check out the Couchbase documentation on Indexes and query performance.

MEMSQL vs. MySQL

I need to start off by pointing out that by no means am I a database expert in any way. I do know how to get around to programming applications in several languages that require database backends, and am relatively familiar with MySQL, Microsoft SQL Server and now MEMSQL - but again, not an expert at databases so your input is very much appreciated.
I have been working on developing an application that has to cross reference several different tables. One very simple example of an issue I recently had, is I have to:
1) On a daily basis, pull down 600K to 1M records into a temporary table.
2) Compare what has changed between this new data pull and the old one. Record that information on a separate table.
3) Repopulate the table with the new records.
Running #2 is a query similar to:
SELECT * FROM (NEW TABLE) LEFT JOIN (OLD TABLE) ON (JOINED FIELD) WHERE (OLD TABLE.FIELD) IS NULL
In this case, I'm comparing the two tables on a given field and then pulling the information of what has changed.
In MySQL (v5.6.26, x64), my query times out. I'm running 4 vCPUs and 8 GB of RAM but note that the rest of my configuration is default configuration (did not tweak any parameters).
In MEMSQL (v5.5.8, x64), my query runs in about 3 seconds on the first try. I'm running the exact same virtual server configuration with 4 vCPUs and 8 GB of RAM, also note that the rest of my configuration is default configuration (did not tweak any parameters).
Also, in MEMSQL, I am running a single node configuration. Same thing for MySQL.
I love the fact that using MEMSQL allowed me to continue developing my project, and I'm coming across even bigger cross-table calculation queries and views that run fantastically on MEMSQL... but, in an ideal world, I'd use MySQL. I've already come across the fact that I need to use a different set of tools to manage my instance (i.e.: MySQL Workbench works relatively well with a MEMSQL server, but I actually need to build views and tables using the open source SQL Workbench and the MySQL Java adapter. Same thing for the Visual Studio MySQL connector: it works, but can be painful at times; for some reason I can add queries but can't add table adapters)... sorry, I'll submit a separate question for that :)
Considering both virtual machines have exactly the same configuration and are SSD backed, can anyone give me any recommendations on how to tweak my MySQL instance to run big queries like the one above? I understand I can also create an in-memory database, but I've read there might be some persistence issues with doing that, not sure.
Thank you!
The most likely reason this happens is that you don't have an index on the joined field in one or both tables. According to this article:
https://www.percona.com/blog/2012/04/04/join-optimizations-in-mysql-5-6-and-mariadb-5-5/
Vanilla MySQL only supports nested loop joins, which require an index to perform well (otherwise they take quadratic time).
Both MemSQL and MariaDB support so-called hash join, which does not require you to have indexes on the tables, but consumes more memory. Since your dataset is negligibly small for modern RAM sizes, that extra memory overhead is not noticed in your case.
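To make the difference concrete, here is a toy Python sketch of the two strategies applied to the "rows in the new table with no match in the old table" query. The function names and sample rows are made up for illustration, and real database engines implement these joins very differently; the sketch only shows why one approach is quadratic while the other trades memory for speed.

```python
def nested_loop_anti_join(new_rows, old_rows, key):
    """Rows of new_rows with no match in old_rows, found by scanning
    old_rows once per new row: O(len(new) * len(old)) comparisons."""
    result = []
    for new in new_rows:
        matched = False
        for old in old_rows:
            if old[key] == new[key]:
                matched = True
                break
        if not matched:
            result.append(new)
    return result


def hash_anti_join(new_rows, old_rows, key):
    """Same result, but builds an in-memory set of old keys first:
    O(len(new) + len(old)) time, at the cost of extra memory."""
    old_keys = {old[key] for old in old_rows}
    return [new for new in new_rows if new[key] not in old_keys]


# Hypothetical sample data standing in for the old and new pulls.
old_table = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
new_table = [{"id": 2, "name": "b"}, {"id": 3, "name": "c"}]

print(nested_loop_anti_join(new_table, old_table, "id"))
print(hash_anti_join(new_table, old_table, "id"))
```

An index on the joined field plays roughly the role of the set here: it lets the nested loop find a match without scanning the whole inner table.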
So all you need to do to address the issue is to add indexes on the joined field in both tables.
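As a rough illustration of that fix, the sketch below uses Python's built-in sqlite3 module rather than MySQL, so it only demonstrates the shape of the statements. The table and column names (new_data, old_data, join_field) are placeholders, not taken from the question. In MySQL, the CREATE INDEX statements and the LEFT JOIN ... IS NULL query have the same general form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_data (join_field INTEGER, payload TEXT);
    CREATE TABLE new_data (join_field INTEGER, payload TEXT);
    -- Index the joined field in both tables, as suggested above.
    CREATE INDEX idx_old_join ON old_data (join_field);
    CREATE INDEX idx_new_join ON new_data (join_field);
""")
conn.executemany("INSERT INTO old_data VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("INSERT INTO new_data VALUES (?, ?)", [(2, "b"), (3, "c")])

# Rows present in the new pull but missing from the old one.
added = conn.execute("""
    SELECT n.join_field, n.payload
    FROM new_data AS n
    LEFT JOIN old_data AS o ON n.join_field = o.join_field
    WHERE o.join_field IS NULL
""").fetchall()
print(added)
```

Running EXPLAIN on the real query before and after adding the indexes is a reasonable way to confirm that MySQL actually uses them.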
Also, please describe the issues you are facing with the open source tools when connecting to MemSQL in a separate question, or at chat.memsql.com, so that we can fix them in the next version (I work for MemSQL, and compatibility with MySQL tools is one of our priorities).

Does a person's internet connection affect the speed of sql queries or php parsing?

I started benchmarking with Zend_Db_Profiler by saving queries that take too long. For one user, this query:
SELECT chapter, `order`, topic, id, name
FROM topics
WHERE id = '1'
AND hidden = 'no'
took 2.97 seconds. I performed an Explain:
select_type  table   possible_keys  key  key_len  ref    rows  Extra
SIMPLE       topics  id             id   4        const  42    Using Where
and ran the query myself from phpMyAdmin, and it only took 0.0108 seconds. I thought that perhaps the size of the table might have an effect, as there is one column which is varchar and 8000 characters long, but it's not a part of the Select. I also just switched over to semi-dedicated hosting but can't imagine that this would have had a negative effect. Any thoughts as to how I could troubleshoot would be appreciated.
No. PHP and MySQL are server-side technologies: your server processes them, and the client has no bearing on that. If your server is slow, it will just be slower in returning the response to the client.
Sadly, your premise about the bottleneck here is not right. Also, when testing how one query behaves within your browser and then within phpMyAdmin (or any other GUI), you have to clear the query cache before trying to run the same query again. You didn't mention whether you did that.
The second part of tracking down what might be wrong includes confirming that your database's configuration variables have been set optimally, that you chose the proper storage engine, and that your indexing strategy is optimal (for example, using an INT for the primary key rather than a VARCHAR, and avoiding similar atrocities).
That means that in most cases you'd go with the InnoDB storage engine. It's free, and it's quick if optimized (the server variable innodb_buffer_pool_size does wonders when set to a proper size and when you have sufficient RAM). Since you said you use semi-dedicated hosting, you probably don't have control over those configuration variables.
Only when you're sure that
1) you're not testing the same query off of cache, and
2) you've done everything within your power to make it optimal (this includes making sure that you don't have rogue processes hogging your server),
can you assume there might be an error in communication between the server and client.
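One way to check the first point is to time the same query several times on the server itself and compare the first run with later ones. The sketch below uses Python with an in-memory SQLite database purely as a stand-in; the topics table loosely mirrors the question, while the helper name time_query and the sample data are made up. Against MySQL you would run an equivalent loop through your own database driver.

```python
import sqlite3
import time


def time_query(conn, sql, params=(), runs=5):
    """Run the query `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        durations.append(time.perf_counter() - start)
    return durations


# Hypothetical stand-in for the real topics table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topics (id INTEGER, name TEXT, hidden TEXT)")
conn.executemany(
    "INSERT INTO topics VALUES (?, ?, ?)",
    [(i % 50, f"topic {i}", "no") for i in range(5000)],
)

timings = time_query(
    conn, "SELECT id, name FROM topics WHERE id = ? AND hidden = 'no'", (1,)
)
print("first run: %.6fs, fastest run: %.6fs" % (timings[0], min(timings)))
```

A large gap between the first and the later timings hints that caching is involved; consistently fast timings on the server suggest the slowness lies elsewhere.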
As both PHP and SQL run on a server side, the user's internet connection does not affect the speed of the query.
Maybe the database server was under too much load at the time and couldn't process the query in time.

Entity Framework code first with mysql in Production

I am creating an ASP.NET MVC application using EF code first. I had used SQL Azure as my database, but it turns out SQL Azure is not reliable. So I am thinking of using MySQL/PostgreSQL for the database.
I wanted to know the repercussions/implications of using EF code first with MySQL/PostgreSQL with regard to performance.
Has anyone used this combo in production or knows anyone who has used it?
EDIT
I keep getting the following exceptions in SQL Azure.
SqlException: "A transport-level error has occurred when receiving results from the server.
(provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)"
SqlException: "Database 'XXXXXXXXXXXXXXXX' on server 'XXXXXXXXXXXXXXXX' is not
currently available. Please retry the connection later. If the problem persists, contact
customer support, and provide them the session tracing ID of '4acac87a-bfbe-4ab1-bbb6c-4b81fb315da'.
Login failed for user 'XXXXXXXXXXXXXXXX'."
First, your problem seems to be a network issue, perhaps with your ISP. You may want to look at getting a remote PostgreSQL or MySQL database, but I think you will run into the same problems.
Secondly comparing MySQL and PostgreSQL performance is relatively tricky. In general, MySQL is optimized for pkey lookups, and PostgreSQL is more generally optimized for complex use cases. This may be a bit low-level but....
MySQL InnoDB tables are basically btree indexes where the leaf node includes the table data. The primary key is the key of the index. If no primary key is provided, one will be created for you. This means two things:
select * from my_large_table will be slow as there is no support for a physical order scan.
select * from my_large_table where secondary_index_value = 2 requires two index traversals, since the secondary index can only refer to the primary key values.
In contrast a selection for a primary key value will be faster than on PostgreSQL because the index contains the data.
PostgreSQL, by comparison, stores information in an unordered way in a series of heap pages. The indexes are separate from the data. If you want to pull by primary key, you scan the index, then read the data page in which the data is found, and then pull the data. If you pull from a secondary index instead, this is not any slower. Additionally, the tables are structured such that sequential disk access is possible, so a long select * from my_large_table lets the operating system's read-ahead cache speed up performance significantly.
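The difference in lookup paths can be modelled with a toy Python sketch. Plain dictionaries stand in for btree indexes and a list stands in for heap pages; all names here are invented for illustration, and the step counts only mirror the description above, not real engine internals.

```python
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
]

# InnoDB-style: the primary key index holds the row itself,
# and a secondary index maps a value to the primary key.
clustered = {row["id"]: row for row in rows}
secondary_to_pk = {row["email"]: row["id"] for row in rows}


def innodb_by_pk(pk):
    """Return (row, steps): one traversal of the primary key index."""
    return clustered[pk], 1


def innodb_by_secondary(email):
    """Return (row, steps): secondary index first, then primary key index."""
    pk = secondary_to_pk[email]
    return clustered[pk], 2


# PostgreSQL-style: rows live in an unordered heap, and every index
# (primary or secondary) maps a value to a position in that heap.
heap = list(rows)
pk_index = {row["id"]: pos for pos, row in enumerate(heap)}
email_index = {row["email"]: pos for pos, row in enumerate(heap)}


def postgres_by_pk(pk):
    """Return (row, steps): one index traversal plus one heap fetch."""
    return heap[pk_index[pk]], 2


def postgres_by_secondary(email):
    """Return (row, steps): same path length as the primary key lookup."""
    return heap[email_index[email]], 2


print(innodb_by_pk(2), innodb_by_secondary("b@example.com"))
print(postgres_by_pk(2), postgres_by_secondary("b@example.com"))
```

In this model the InnoDB-style primary key lookup is the only path that needs a single step, which matches the point that MySQL favours primary key lookups while PostgreSQL treats all indexes alike.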
In short, if your queries are simply joinless selection by primary key, then MySQL will give you better performance. If you have joins and such, PostgreSQL will do better.