DUPLICATE: Couchbase 4 beta “ORDER BY” performance
As the question title shows, I am seeing a huge response delay, around 13 seconds per call, when using the ORDER BY clause with Couchbase 4 (N1QL). If I don't use the ORDER BY clause, everything is fine.
My primary index is
Definition: CREATE PRIMARY INDEX `#primary` ON `default` USING GSI
and my secondary index is
Definition: CREATE INDEX `index_location_name` ON `default`(`name`) USING GSI
The N1QL query is below; req.params.filter can be any key in the location document.
"SELECT _id AS id FROM `default` WHERE type = 'location' ORDER BY " +
req.params.filter + (req.query.descending ? ' DESC' : '') + " LIMIT " +
limit + " OFFSET " + skip
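For example, with req.params.filter set to 'name', limit = 10 and skip = 0 (values chosen only for illustration), the concatenation above renders to:
SELECT _id AS id FROM `default` WHERE type = 'location' ORDER BY name LIMIT 10 OFFSET 0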
The location document in my bucket is
{
"_id": "location::370794",
"name": "Kenai Riverside Fishing",
"avgRating": 0,
"city": "Cooper Landing",
"state": "Alaska",
"country": "USA",
"zipCode": "99572",
"created": "2013-07-10T17:30:00.000Z",
"lastModified": "2015-02-13T12:34:36.923Z",
"type": "location",
}
Can anyone tell me why the ORDER BY clause causes so much delay?
I believe Couchbase is not built to handle queries that can be ordered by any arbitrary field. Since ordering is an expensive operation in Couchbase, it is always recommended to create an index on the sorting fields. Also, if the index is built in ascending order, it can't be used for descending ordering, and vice versa. Your best option with Couchbase is to create all the possible indexes, with both ascending and descending order, if feasible.
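For example, if callers often sort ascending by created (the field here is just an illustration; each req.params.filter value would need its own index), a matching index would be:
CREATE INDEX `index_location_created` ON `default`(`created`) USING GSI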
I'd also recommend that you consider whether Elasticsearch would be a better fit for your dynamic search use cases.
Related
I'm trying to use an incrementing ingest to produce a message to a topic when a table in MySQL is updated. It works using timestamp mode but doesn't seem to be working in incrementing column mode. When I insert a new row into the table, I do not see any message published to the topic.
{
"_comment": " --- JDBC-specific configuration below here --- ",
"_comment": "JDBC connection URL. This will vary by RDBMS. Consult your manufacturer's handbook for more information",
"connection.url": "jdbc:mysql://localhost:3306/lte?user=root&password=tiger",
"_comment": "Which table(s) to include",
"table.whitelist": "candidate_score",
"_comment": "Pull all rows based on an timestamp column. You can also do bulk or incrementing column-based extracts. For more information, see http://docs.confluent.io/current/connect/connect-jdbc/docs/source_config_options.html#mode",
"mode": "incrementing",
"_comment": "Which column has the timestamp value to use? ",
"incrementing.column.name": "attempt_id",
"_comment": "If the column is not defined as NOT NULL, tell the connector to ignore this ",
"validate.non.null": "true",
"_comment": "The Kafka topic will be made up of this prefix, plus the table name ",
"topic.prefix": "mysql-"
}
attempt_id is an auto-incrementing, non-null column which is also the primary key.
Actually, it's my fault. I was listening to the wrong topic.
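For anyone else hitting this: with topic.prefix set to mysql- and table.whitelist set to candidate_score, the connector publishes to the topic mysql-candidate_score. One quick way to verify (assuming a local broker on the default port) is:
kafka-console-consumer --bootstrap-server localhost:9092 --topic mysql-candidate_score --from-beginning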
I have documents with the following schema in a bucket:
{
"status": "done",
"id": 1
}
I want to select all documents that have status as done.
Assuming you're using Couchbase Server 4.x or greater, you can use a N1QL query to do this. For instance:
SELECT d.*
FROM mydocuments d
WHERE d.status = 'done'
You will also need to create an index on status (at a minimum; index design is more involved than a Stack Overflow answer can cover), like this:
CREATE INDEX ix_status ON mydocuments (status);
For more information, check out the N1QL documentation and the interactive N1QL tutorial.
I have a table that contains a JSON array column (nvarchar(max)). It has millions of rows and is expected to reach billions of rows in the future.
The table structure is like this:
[SnapshotId] - PK,
[BuildingId],
......................
[MeterData],
MeterData contains a JSON array like this:
[{
"MeterReadingId": 0,
"BuildingMeterId": 1,
"Value": 1.0
}, {
"MeterReadingId": 0,
"BuildingMeterId": 2,
"Value": 1.625
}]
I need to filter the [HourlySnapshot] table where, for example, BuildingMeterId = 255, so I wrote the query below:
SELECT *
FROM [HourlySnapshot] h
CROSS APPLY OPENJSON(h.MeterData)
WITH (BuildingMeterId int '$.BuildingMeterId') AS MeterDataJson
WHERE MeterDataJson.BuildingMeterId = 255
It works fine, but performance is bad due to the JSON parsing. I read that you can overcome the performance issue by creating indexes, so I created a clustered index like the one below:
CREATE CLUSTERED INDEX CL_MeterDataModel
ON [HourlySnapshot] (MeterData)
But I can't see any improvement in speed. Have I done it wrong? What is the best way to improve the speed?
Thanks
The combination of a computed column and an index may help. One caveat: JSON_VALUE extracts a single scalar at a fixed path, and since MeterData holds an array, the path must point at a specific element (the first one below), so this only helps when the meter you filter on sits at a known position. Casting the computed column to int keeps the comparison sargable:
ALTER TABLE [HourlySnapshot]
ADD [BuildingMeterId] AS CAST(JSON_VALUE([MeterData], '$[0].BuildingMeterId') AS int);
CREATE NONCLUSTERED INDEX IX_ParsedBuildingMeterId ON [HourlySnapshot] ([BuildingMeterId]);
This causes SQL Server to parse and index the value at insert/update time; at read time it can use the index instead of doing a full table scan.
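Once the computed column is in place, a query like the following (a sketch matching the BuildingMeterId = 255 example from the question) can seek on the index directly:
SELECT *
FROM [HourlySnapshot]
WHERE [BuildingMeterId] = 255;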
While creating an index I get this error:
[
{
"code": 3000,
"msg": "syntax error - at -",
"query_from_user": "create primary index on sample-partner"
}
]
If I change the bucket name to sample_partner, then it works. Using Couchbase 4.5 Enterprise edition.
Yeah, that's because N1QL interprets the - as a minus sign... You simply need to escape the bucket name using backticks:
CREATE PRIMARY INDEX ON `sample-partner`;
It should work that way. Remember to always escape that bucket name in all your N1QL queries and you should be fine. Or use an underscore in the bucket name as an alternative :)
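The same escaping applies to any other statement that references the bucket, for example (the type filter is only an illustration):
SELECT * FROM `sample-partner` WHERE type = 'partner' LIMIT 10;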
I'm currently testing some databases for my application. The main functionality is data aggregation (similar to this question: Data aggregation mongodb vs mysql).
I'm facing the same problem, so I created some sample test data. There are no joins on the MySQL side; it's a single InnoDB table. The data set has 1.6 million rows, and I'm doing a sum and a count on the full table, without any filter, so I can compare the performance of each engine's aggregation. All data fits in memory in both cases, and in both cases there is no write load.
With MySQL (5.5.34-0ubuntu0.12.04.1) I'm getting results consistently between 2.03 and 2.10 seconds.
With MongoDB (2.4.8, Linux 64-bit) I'm getting results consistently between 4.1 and 4.3 seconds.
If I do some filtering on indexed fields, MySQL's result time drops to between 1.18 and 1.20 seconds (the number of rows processed drops to exactly half the dataset).
If I do the same filtering on indexed fields in MongoDB, the result time drops only to around 3.7 seconds (again processing half the dataset, which I confirmed with an explain on the match criteria).
My conclusion is that:
1) My documents are designed extremely badly (which may well be true), or
2) The MongoDB aggregation framework really does not fit my needs.
The questions are: what can I do (in terms of specific MongoDB configuration, document modeling, etc.) to make Mongo's results faster? Is this a case that MongoDB is simply not suited to?
My table and document schemas:
CREATE TABLE `events_normal` (
  `origem` varchar(35) DEFAULT NULL,
  `destino` varchar(35) DEFAULT NULL,
  `qtd` int(11) DEFAULT NULL,
  KEY `idx_orides` (`origem`,`destino`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
{
"_id" : ObjectId("52adc3b444ae460f2b84c272"),
"data" : {
"origem" : "GRU",
"destino" : "CGH",
"qtdResultados" : 10
}
}
The indexed and filtered fields mentioned above are "origem" and "destino".
select sql_no_cache origem, destino, sum(qtd), count(1) from events_normal group by origem, destino;
select sql_no_cache origem, destino, sum(qtd), count(1) from events_normal where origem="GRU" group by origem, destino;
db.events.aggregate( {$group: { _id: {origem: "$data.origem", destino: "$data.destino"}, total: {$sum: "$data.qtdResultados" }, qtd: {$sum: 1} } } )
db.events.aggregate( {$match: {"data.origem":"GRU" } } , {$group: { _id: {origem: "$data.origem", destino: "$data.destino"}, total: {$sum: "$data.qtdResultados" }, qtd: {$sum: 1} } } )
Thanks!
Aggregation is not really what MongoDB was originally designed for, so it is not its fastest feature.
If you really want to use MongoDB, you could use sharding so that each shard processes its share of the aggregation (make sure to choose the shard key so that each group lands on only one shard, or you will achieve the opposite). This, however, would no longer be a fair comparison to MySQL, because the MongoDB cluster would use a lot more hardware.
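A minimal sketch of that setup in the mongo shell, assuming a database named test (the database name, like the shard key choice, is only an illustration):
sh.enableSharding("test")
// shard on the $group fields so each group lands on a single shard
sh.shardCollection("test.events", { "data.origem": 1, "data.destino": 1 })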