Empty N1QL result set - couchbase

On my Couchbase cluster I have a bucket (myBucket) containing 1.7 billion documents. I have a primary index on the bucket that should make myBucket fully queryable.
CREATE PRIMARY INDEX `my_primary` ON myBucket
The issue is that I cannot get ANY results from N1QL. All responses are empty. Even doing something as simple as:
SELECT * from myBucket LIMIT 1;
Winds up returning an empty set.

Can you provide some basic information about your setup: server version, document size? Also, check the logs (especially indexer.log and query.log) for any reported errors or warnings.
To sanity-check the setup, can you first try a smaller dataset, or create a partial index on a smaller amount of data and run the query using that index? Based on that we can guide you further.
-Prasad

Related

Need to know about performance: MySQL vs Couchbase

I have three tables in MySQL:
user (1K)
Campaign (6K)
CamapaignDailyUSes (70K)
If I get the data of all users with
Select User.column1, User.column2, Campaign.column1, Campaign.column2,
CamapaignDailyUSes.* from User join Campaign join CamapaignDailyUSes
it returns results in a few seconds, maybe.
But in Couchbase N1QL it takes more than 1 minute, even after creating some indexes.
What should I do about it?
How should I structure my Couchbase data?
Can you post (or mail to prasad.varakur#couchbase.com) the sample docs? Did you explore restructuring/embedding to avoid some JOINs? What is the exact N1QL query? Couchbase 4.5 onwards has two kinds of joins for better performance (leveraging indexes better) and allowing more flexibility in JOINs.
See https://developer.couchbase.com/documentation/server/4.5/n1ql/n1ql-language-reference/from.html#story-h2-3 for more info on lookup & index joins.
And what are the sizes you specify: the size of a document, or the number of documents?
If 70K is the size of a document and you are fetching all of it, then what is the expected result size (based on selectivities)?
If your results are very big, then you may want to use parameters (in 4.5.1) such as pretty=false to minimize the network overhead.
-Prasad

Does a SQL query (SELECT) continue or stop reading data from the table once it finds the value?

Greetings,
My question: does a SQL query (SELECT) continue or stop reading data (records) from the table once it finds the value I was looking for?
Reference: "In order to return data for this query, mysql must start at the beginning of the disk data file, read in enough of the record to know where the category field data starts (because long_text is variable length), read this value, see if it satisfies the where condition (and so decide whether to add to the return record set), then figure out where the next record set is, then repeat."
Link for reference: http://www.verynoisy.com/sql-indexing-dummies/#how_the_database_finds_records_normally
In general you don't know and you don't care, but you have to adapt when queries take too long to execute. When you do something like
select a,b,c from mytable where a=3 and b=5
then the database engine has a couple of options to optimize. When all these options fail, then it will do a "full table scan" - which means, it will have to examine the entire table to see which rows are eligible. When you have indices on e.g. column a then the database engine can optimize the search because it can pre-select rows where a has value 3. So, in general, make sure that you have indices for the columns that are most searched. (Perversely, some database engines get confused when you have too many indices and will fall back to a full table scan because they've lost their way...)
As to whether or not the scanning stops: in general, the database engine has to examine all data in the table (hopefully aided by indices) and won't stop after having found just one hit. If you want just the first hit, use a limit 1 clause to make sure that your result set has only one row. But then again, if you have an order by clause, the database engine cannot stop after the first hit: later rows might sort ahead of it.
Summarizing, how the db engine does its scan depends on how smart it is, what indices are available etc.. If your select queries take too long then consider re-organizing your indices, writing your select statements differently, or rebuilding the table.
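The full-table-scan vs index pre-selection behaviour described above can be watched in any engine that exposes its plan. A minimal sketch using Python's built-in sqlite3 module (the table, data, and index name are made up for illustration; other engines' planners differ in the details):

```python
import sqlite3

# In-memory table standing in for "mytable" from the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [(i % 10, i % 7, i) for i in range(1000)])

def plan(sql):
    """Return SQLite's query plan as one string (4th column is the detail)."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT a, b, c FROM mytable WHERE a = 3 AND b = 5"

before = plan(query)   # no index yet: the engine must scan the whole table
conn.execute("CREATE INDEX idx_a ON mytable (a)")
after = plan(query)    # with an index on a, rows with a = 3 are pre-selected

print(before)
print(after)
```

Running it shows the plan switching from a table scan to a search using idx_a once the index exists.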
How the RDBMS reads data from disk is something you cannot know, should not care about, and must not rely on.
The issue is too broad to get a precise answer. The engine reads data from storage in blocks, and a block can contain records that are not needed by the query at hand. If all the columns needed by the query are available in an index, the RDBMS won't even read the data file; it will only use the index. The data it needs could already be cached in memory (because it was read during the execution of a previous query). The underlying OS and the storage media also keep their own caches.
On a busy system, all these factors can lead to very different storage access patterns while running the same query several times a couple of minutes apart.
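The "index covers all needed columns" case from this answer can be observed directly. A small sketch with Python's built-in sqlite3 (all names here are illustrative, loosely echoing the category/long_text example in the question's reference):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, category TEXT, long_text TEXT)")
conn.execute("CREATE INDEX idx_cat ON t (category)")
conn.executemany("INSERT INTO t (category, long_text) VALUES (?, ?)",
                 [("cat%d" % (i % 5), "x" * 100) for i in range(200)])

# The query only needs the indexed column, so the engine can answer it
# from the index alone, without touching the table's data rows.
detail = " ".join(row[3] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT category FROM t WHERE category = 'cat1'"))
print(detail)
```

SQLite reports this as a COVERING INDEX search: the data file is never read for the query.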
Yes, it scans the entire file, unless you add something like
select * from user where id=100 limit 1
This will of course still scan the entire table if id 100 is the last record.
If id is a primary key, it is automatically indexed and the search will be optimized.
I'm sorry... I thought the table.
I will change the question and explain it in the following image;
I understand that in CASE 1 all columns must be read on each iteration.
My question is: is it the same in CASE 2, or are columns that are not selected in the query excluded from reading on each iteration?
Also, are both queries the same from a performance perspective?
To clarify:
CASE 1: the first SELECT prints all data.
CASE 2: the second SELECT prints only the columns first_name and last_name.
In CASE 2, does the MySQL server read only the columns first_name and last_name, or does it read the entire table to get those rows (first_name, last_name)?
I'm interested in how the server reads the table rows in CASE 1 and CASE 2.

How to check if a node is already indexed in the neo4j-spatial index?

I'm running the latest Neo4j v2, with the spatial plugin installed. I have managed to index almost all of the nodes I need indexed in the geo index. One of the problems I'm struggling with is: how can I easily check whether a node has already been indexed?
I can't find any REST endpoint to get this information, and it's not easy to get to it with Cypher. But I tried this query, as it seems to give me the result I want, except that the runtime is unacceptable.
MATCH (a)-[:RTREE_REFERENCE]->(b) where b.id=989898 return b;
As the geo index only stores a reference to the indexed node (in a property named id on a node connected by the RTREE_REFERENCE relationship), I figured this could be the way to go.
This query currently takes 14459 ms when run from the neo4j-shell.
My database is not big: about 41000 nodes in total that I want to add to the spatial index.
There must be a better way to do this. Any idea and or pointer would be greatly appreciated.
Since you know the ID of your data node, you can access it directly in Cypher without an index, and just check for the incoming RTREE_REFERENCE relationship:
START n=node(989898) MATCH (p)-[r:RTREE_REFERENCE]->(n) RETURN r;
As a side note, your Cypher had the syntax 'WHERE n.id=989898', but if this is an internal node ID, that will not work, since n.id will look for a property with key 'id'. For the internal node id, use 'id(n)'.
If your 'id' is actually a node property (and not the internal ID), then I think deemeetree's suggestion is better: use an index over this property.
Right now your query seems to be scouring through all the nodes in the network that have an :RTREE_REFERENCE relationship and checking the id property for each of them.
Why don't you try to instead start your search from the node id you need and then get all the paths like that?
I also don't quite understand why you need to return the node that you're defining, but anyway.
As you're running Neo4J I recommend you to add labels to your nodes (all of them in the example below):
START n=node(*) SET n:YOUR_LABEL_NAME
then create an index on the labeled node by id property.
CREATE INDEX ON :YOUR_LABEL_NAME(id)
Once you've done that, run a query like this:
MATCH (b:YOUR_LABEL_NAME {id:"989898"}), (a)-[:RTREE_REFERENCE]->(b) RETURN a,b;
That should increase the speed of your query.
Let me know if that works and please explain why you were querying b in your original question if you already knew it...

Filtered index condition is ignored by optimizer

Assume I'm running a website that shows funny cat pictures. I have a table called CatPictures with the columns Filename, Awesomeness, and DeletionDate, and the following index:
create nonclustered index CatsByAwesomeness
on CatPictures (Awesomeness)
include (Filename)
where DeletionDate is null
My main query is this:
select Filename from CatPictures where DeletionDate is null and Awesomeness > 10
I, as a human being, know that the above index is all that SQL Server needs, because the index filter condition already ensures the DeletionDate is null part.
SQL Server however doesn't know this: the execution plan for my query will not use my index.
Even when adding an index hint, it will still explicitly check DeletionDate by looking at the actual table data (and in addition complain about a missing index that would include DeletionDate).
Of course I could
include (Filename, DeletionDate)
instead, and it will work.
But it seems a waste to include that column, since this just uses up space without adding any new information.
Is there a way to make SQL Server aware that the filter condition is already doing the job of checking DeletionDate?
No, not currently.
See this connect item. It is Closed as Won't Fix. (Or this one for the IS NULL case specifically)
The connect item does provide a workaround shown below.
Posted by RichardB CFCU on 29/09/2011 at 9:15 AM
A workaround is to INCLUDE the column that is being filtered on.
Example:
CREATE NONCLUSTERED INDEX [idx_FilteredKey1] ON [dbo].[TABLE]
(
[TABLE_ID] ASC,
[TABLE_ID2] ASC
)
INCLUDE ( [REMOVAL_TIMESTAMP]) --explicitly include the column here
WHERE ([REMOVAL_TIMESTAMP] IS NULL)
Is there a way to make SQL Server aware that the filter condition is
already doing the job of checking DeletionDate?
No.
Filtered indexes were designed to solve certain problems, not ALL. Things evolve and some day, you may see SQL Server supporting the feature you expect of filtered indexes, but it is also possible that you may never see it.
There are several good reasons I can see for how it works.
What it improves on:
Storage. The index contains only keys matching the filtering condition
Performance. This follows from the above: less to write and fewer pages means faster retrieval
What it does not do:
Change the query engine radically
Putting them together, considering that SQL Server is a heavily pipelined, multi-processor parallelism capable beast, we get the following behaviour when dealing with servicing a query:
Pre-condition to the query optimizer selecting indexes: check whether a Filtered Index is applicable against the WHERE clause.
Query optimizer continues its normal work of determining selectivity from statistics, weighing up index + bookmark lookup vs clustered/heap scan depending on whether the index is covering, etc.
Threading the condition against the filtered index into the query optimizer "core" I suspect is going to be a much bigger job than leaving it at step 1.
Personally, I respect the SQL Server dev team and if it were easy enough, they might pull it into a not-too-distant sprint and get it done. However, what's there currently has achieved what it was intended to and makes me quite happy.
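As a point of contrast, some engines do match a query's predicate against a partial (filtered) index's WHERE clause. A minimal sketch in SQLite via Python's built-in sqlite3, reusing the names from the question (illustrative only, and nothing here changes what SQL Server does):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CatPictures ("
             "Filename TEXT, Awesomeness INTEGER, DeletionDate TEXT)")
# SQLite calls this a "partial index"; same filter as in the question.
conn.execute("CREATE INDEX CatsByAwesomeness "
             "ON CatPictures (Awesomeness) WHERE DeletionDate IS NULL")

# The query's WHERE clause implies the index's WHERE clause, so SQLite
# knows every index entry already satisfies DeletionDate IS NULL and
# can answer from the index without re-checking that column.
detail = " ".join(row[3] for row in conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT Awesomeness FROM CatPictures "
    "WHERE DeletionDate IS NULL AND Awesomeness > 10"))
print(detail)
```

The plan uses CatsByAwesomeness even though DeletionDate is not stored in the index, which is exactly the inference the question wishes SQL Server would make.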
I just found that "gap in functionality"; it's really sad that filtered indexes are ignored by the optimizer.
I think I'll try to use indexed views for this instead; take a look at this article:
http://www.sqlperformance.com/2013/04/t-sql-queries/optimizer-limitations-with-filtered-indexes

Is there a way to get rows_examined in MySQL without the slow log?

I'm building some profiling information for a home-grown app. I'd like the debug page to show the query sent along with how many rows were examined, without assuming that the slow_log is turned on, let alone parsing it.
Back in 2006, what I wanted was not possible. Is that still true today?
I see Peter Zaitsev has a technique where you:
Run FLUSH STATUS;
Run the query.
Run SHOW STATUS LIKE "Handler%";
and then in the output:
Handler_read_next=42250 means 42250 rows were analyzed during this scan
which sounds like, if MySQL is only examining indexes, it should give you the number. But is there a set of status vars you can poll and add up to find out how many rows were examined? Any other ideas?
It's slightly better than it was in 2006. You can issue SHOW SESSION STATUS before and after and then look at each of the Handler_read_* counts in order to be able to tell the number of rows examined.
There's really no other way. While the server protocol has a flag to say if a table scan occurred, it doesn't expose rows_examined. Even tools like MySQL's Query Analyzer have to work by running SHOW SESSION STATUS before/after (although I think it only runs SHOW SESSION STATUS after, since it remembers the previous values).
I know it's not related to your original question, but there are other expensive components to queries besides rows_examined. If you choose to do this via the slow log, you should check out this patch:
http://www.percona.com/docs/wiki/patches:microslow_innodb#changes_to_the_log_format
I can recommend looking for "Disk_tmp_table: Yes" and "Disk_filesort: Yes".
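The before/after bookkeeping for this technique is simple to script. A sketch of a hypothetical helper (the handler_delta function and the sample rows below are mine, not any MySQL API): feed it the (name, value) rows returned by SHOW SESSION STATUS LIKE 'Handler_read%' taken before and after the query.

```python
# Hypothetical helper: diff two snapshots of Handler_read_* counters,
# each given as (name, value) rows like SHOW SESSION STATUS returns.
def handler_delta(before, after):
    """Return the per-counter change between two status snapshots."""
    prior = {name: int(value) for name, value in before}
    return {name: int(value) - prior.get(name, 0) for name, value in after}

# Sample values for illustration (not real server output).
before = [("Handler_read_key", "10"), ("Handler_read_next", "0")]
after = [("Handler_read_key", "11"), ("Handler_read_next", "42250")]

delta = handler_delta(before, after)
print(delta)  # {'Handler_read_key': 1, 'Handler_read_next': 42250}
```

Summing the delta values gives an approximation of rows examined by the query run between the two snapshots, with the caveats about key reads vs row reads discussed below.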
Starting in 5.6.3, the MySQL performance_schema database also exposes statements statistics, in tables such as performance_schema.events_statements_current.
The statistics collected by statements include the 'ROWS_EXAMINED' column.
See
http://dev.mysql.com/doc/refman/5.6/en/events-statements-current-table.html
From there, statistics are aggregated to provide summaries.
See
http://dev.mysql.com/doc/refman/5.6/en/statement-summary-tables.html
From documentation:
Handler_read_rnd
The number of requests to read a row based on a fixed position. This value is high if you are doing a lot of queries that require sorting of the result. You probably have a lot of queries that require MySQL to scan entire tables or you have joins that don't use keys properly.
Handler_read_rnd_next
The number of requests to read the next row in the data file. This value is high if you are doing a lot of table scans. Generally this suggests that your tables are not properly indexed or that your queries are not written to take advantage of the indexes you have.
read_rnd* means reading actual table rows with a full scan.
Note that it will show nothing if there is an index scan combined with a row lookup; that still counts as a key read.
For the schema like this:
CREATE TABLE mytable (id INT NOT NULL PRIMARY KEY, data VARCHAR(50) NOT NULL)
INSERT
INTO mytable
VALUES …
SELECT id
FROM mytable
WHERE id BETWEEN 100 AND 200
SELECT *
FROM mytable
WHERE id BETWEEN 100 AND 200
, the two SELECT queries will both return 1 in read_key, 101 in read_next, and 0 in both read_rnd and read_rnd_next, despite the fact that actual row lookups occur in the second query.
Prepend the query with EXPLAIN. In MySQL that will show the query's execution path: which tables are examined, as well as the estimated number of rows examined for each table.
Here's the documentation.