Couchbase Multi-Key Get with Composite Keys

I've got a problem with querying a Map/Reduce view in Couchbase for specific keys.
The view maps some documents in Couchbase, emitting a composite key and a value, and uses the built-in _stats reduce function. I'm grouping on the second part of the key (group=true&group_level=2) and the results are exactly what I want.
The issue I've got is I need to find the "reduce" result for specific document IDs which aren't necessarily sequential, so I can't use startkey and endkey.
For example, looking up the results for document IDs 2, 5, 8, 18, using &startkey=[2, null]&endkey=[18,"\u0fff"] could potentially return results for documents with IDs 3, 4, 6, 7, 9-17.
I'm looking at using the keys=[] parameter to specify the document IDs to look for, but can't work out how to do this when using a composite key.
Is this possible, and if so, how do I do it?

Turns out I was misunderstanding how this should work. After some reading, I've split the different bits of my query out into separate views, and it now works as I expect it to.
The reduce view now has a single key, rather than a composite key, which means I can query it via the keys parameter.
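For illustration, a minimal sketch of what such a map function might look like, assuming hypothetical doc_type, doc_id and score fields (none of these names are from the original question):

function (doc, meta) {
  // Emit a single, non-composite key so the view can be queried
  // with the keys parameter; the value feeds the built-in _stats reduce.
  if (doc.doc_type === "result") {
    emit(doc.doc_id, doc.score);
  }
}

Querying with ?group=true&keys=[2,5,8,18] then returns the reduced stats for exactly those document IDs.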

Related

IndexedDB - IDBKeyRange on simple index with arbitrary key list

I have an object store in an IDB that has a simple (non-compound) index on a field X. This index is not unique (many items may have the same value for X).
I'd like to query the IDB to return all items that have an X value of either "foo", "bar", or "bat".
According to the documentation, index getAll takes either a key (in my case a string) or an IDBKeyRange. However, it's not obvious to me how to construct an IDBKeyRange with an arbitrary set of keys, and get the union of all results based on those keys.
You cannot do this in a single request. indexedDB does not currently support "OR" style queries.
An alternative solution is to do one request per value: for each value, use getAll on the index, then concatenate all of the resulting arrays into a single array (possibly merging duplicates). You don't actually incur that many round trips against the DB since you are using getAll.

In setting up this index, you basically want a store of, let's say, "things", where each "thing" has a property such as "tags", where tags is an array of string values. The index you create on the "tags" property should be flagged as a multi-entry index.
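A minimal sketch of that approach, assuming an open IDBDatabase db with a "things" store and a multi-entry "tags" index (done is a placeholder for whatever consumes the results):

const keys = ["foo", "bar", "bat"];
const index = db.transaction("things")
                .objectStore("things")
                .index("tags");
const results = [];
let pending = keys.length;
for (const key of keys) {
  index.getAll(key).onsuccess = (event) => {
    results.push(...event.target.result);  // concatenate per-key matches
    if (--pending === 0) done(results);    // all requests have completed
  };
}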
There are, of course, creative hacky solutions. Here is one. Keep in mind it is completely useless if things can have different tag sets and you still want to match the ones that share some tags; it only works for exact tag-set matches, where a thing with extra tags should not be returned.

Consider each distinct set of values, ignoring order; let's call them groups. E.g. foo is 1, bar is 2, bat is 3, foo-bar is 4, foo-bat is 5, bar-bat is 6, etc. You can give each group a key, like the numerical counter value just used in the example, and store the group key as a property in the object. Each time you go to store an object, calculate its group key. You can precalculate all group keys, or develop a hash-style function that generates a particular key given a set of arbitrary string values. Sure, you pay a tiny bit more upfront at storage time and when building the request query, but you save a ton of processing because indexedDB does all the work after that. So you want a simple, fast hash. And sure, this adds complexity, but maybe it will work. Just find a simple JS hash and modify it so that you lexicographically sort the value set prior to hashing (so that a difference in value order does not cause a difference in hash value).

To explain more concretely: in the things object store, each thing object has a property called "tags-hash" with a basic index on it (not unique, not multi-entry). Each time you put a thing in the store, you calculate the value of tags-hash and set the property's value before calling put. Then, each time you want to query, you calculate the hash of the array of tags by which you wish to query and call getAll(calculated-hash-value); it will give you all things that have exactly those tags.
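A rough sketch of such an order-insensitive hash and its use (the helper name is hypothetical):

function tagsHash(tags) {
  // Sort first so ["bar", "foo"] and ["foo", "bar"] hash identically.
  const s = tags.slice().sort().join("\u0000");
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (Math.imul(h, 31) + s.charCodeAt(i)) | 0;  // simple 32-bit string hash
  }
  return h;
}

// Before put:  thing["tags-hash"] = tagsHash(thing.tags);
// To query:    index.getAll(tagsHash(["foo", "bar", "bat"]))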

in indexedDB, is it possible to make an efficient equivalent of a "where Column in (value1, value2... valueN)" call?

I want to implement this search using indexedDB:
where CustomerName in ('bob', 'fred'... 'nancy')
I can see two possibilities:
1) simply openCursor on object store, then loop through entire table, checking manually if a record is in ('bob', 'fred'... 'nancy')
2) using index, issue multiple calls to index openCursor('bob'), openCursor('fred')...
Both openCursor variants take an IDBKeyRange, which does not seem to allow searching for multiple, non-contiguous values.
Is there another, more efficient way?
The fastest way would probably be to sort the keys you're searching for, open a cursor at the first one, call IDBCursor.continue() until you have all the values for that key, and then call IDBCursor.continue(nextKey) to jump to the next key you're searching for (advance() only takes a count, not a key). Repeat until you've done all the keys. That way you get all the values with only one cursor, but can quickly skip over values you don't care about.
Either of your suggestions works.
A performance improvement to #1 would be to first sort the list of keys (e.g. using indexedDB.cmp() to implement a comparison function) and open the cursor on the first key (e.g. 'bob'). Then, as you iterate, once you see a key that's after 'bob' but before 'fred', you call continue('fred') to skip the intervening records. This avoids iterating over the records in the table you don't care about.
The latest Chrome/Firefox/Safari also support getAll(), which would be a variant on your #2 to get all the records for a given key at once (e.g. via getAll('nancy')) rather than having to iterate a cursor.
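A sketch of that skip-ahead technique, assuming an open database db with a "customers" store indexed on "CustomerName" (done is a placeholder for whatever consumes the results):

const keys = ["bob", "fred", "nancy"].sort((a, b) => indexedDB.cmp(a, b));
const index = db.transaction("customers")
                .objectStore("customers")
                .index("CustomerName");
const results = [];
let i = 0;
index.openCursor(IDBKeyRange.lowerBound(keys[0])).onsuccess = (event) => {
  const cursor = event.target.result;
  if (!cursor) return done(results);                   // scanned past the end
  while (i < keys.length && indexedDB.cmp(cursor.key, keys[i]) > 0) i++;
  if (i === keys.length) return done(results);         // past the last key
  if (indexedDB.cmp(cursor.key, keys[i]) === 0) {
    results.push(cursor.value);                        // match: collect it
    cursor.continue();                                 // same key may repeat
  } else {
    cursor.continue(keys[i]);                          // skip ahead to next key
  }
};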

MySQL --> MongoDB: Keep IDs or create mapping?

We are going to migrate our database from MySQL to MongoDB.
Some URLs pointing at our web application use database IDs (e.g. http://example.com/post/5)
At the moment I see two possibilities:
1) Keep existing MySQL IDs and use them as MongoDB IDs. IDs of new documents will get new MongoDB ObjectIDs.
2) Generate new MongoDB ObjectIDs for all documents and create a mapping with MySQLId --> MongoDBId for all external links with old IDs in it.
#2 will mess up my PHP app a little, but I could imagine that #1 would cause problems with indexes or sharding?
What is the best practice here to avoid problems?
1) Keep existing MySQL IDs and use them as MongoDB IDs. IDs of new documents will get new MongoDB ObjectIDs.
ObjectIDs are very useful when you don't want/have a natural primary key for your documents, but mixing ObjectIDs and numerical IDs as primary keys can only cause you problems later on with queries. I would suggest a different route: keep existing MySQL IDs and use them as MongoDB IDs, and create new documents with numerical IDs, as you would do for MySQL. This way you don't have to mix data types in one field.
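One common way to keep generating sequential numeric IDs on the MongoDB side is a counters collection; a sketch in mongo shell syntax (the collection and counter names are illustrative):

// Seed once with the highest migrated MySQL ID:
db.counters.insert({ _id: "postid", seq: 5000 })

// Atomically allocate the next numeric ID for each new document:
var next = db.counters.findAndModify({
  query:  { _id: "postid" },
  update: { $inc: { seq: 1 } },
  new:    true
}).seq;
db.posts.insert({ _id: next, title: "..." });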
2) Generate new MongoDB ObjectIDs for all documents and create a mapping with MySQLId --> MongoDBId for all external links with old IDs in it.
This can also work, but you would need, as you said, to map your new and old IDs. That is extra work which you can avoid if you leave your IDs unchanged.
I could imagine that #1 will cause problems with indexes or sharding?
ObjectIDs and MySQL AUTO_INCREMENT IDs are both monotonically increasing, so there wouldn't be much difference if they were used as shard keys (you would probably use hashed shard keys in that case; you can read more details here).
Edit
Which problems could occur when mixing ObjectIDs and numeric IDs?
If you're doing simple equality checks (i.e. getting the doc with {_id: 5} or {_id: ObjectId("53aeb2dcb9f8955d1a927b97")}) you will have no problems. However, range queries will be more complicated:
As an example:
db.coll.find({_id : { $gt : 5}})
This query will return only documents with numeric IDs.
This query:
db.coll.find({_id : { $gt : ObjectId("53aeb2dcb9f8955d1a927b97")}});
will return only documents that have ObjectIds.
Obviously, you can use $or to find either, but my point is that your queries won't be as straightforward as with non-mixed IDs.
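For example, a range query that covers both ID types ends up looking something like:

db.coll.find({ $or: [
  { _id: { $gt: 5 } },
  { _id: { $gt: ObjectId("53aeb2dcb9f8955d1a927b97") } }
]});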

Best approach for having unique row IDs in the whole database rather than just in one table?

I'm designing a database for a project of mine, and in the project I have many different kinds of objects.
Every object might have comments on it - which it pulls from the same comments table.
I noticed I might run into problems when two different kinds of objects have the same ID: when pulling from the comments table, they will pull each other's comments.
I could just solve it by adding an object_type column, but it will be harder to maintain when querying, etc.
What is the best approach to have unique row IDs across my whole database?
I noticed Facebook numbers their objects with really, really large numeric IDs, probably determining an object's type by id mod a trillion or some other really big number.
Though that might work, are there any more options to achieve the same thing, or is relying on big enough number ranges fine?
Thanks!
You could use something like what Twitter uses for their unique IDs.
http://engineering.twitter.com/2010/06/announcing-snowflake.html
For every object you create, you will have to make some sort of API call to this service, though.
Why not tweak your concept of object_type by integrating it into the id column? For example, an ID would be a concatenation of the object type, a separator, and an ID that is unique within that type.
This approach might scale better, as a unique ID generator for the whole database might lead to a performance bottleneck.
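A trivial sketch of such composite IDs (the type names are illustrative):

function makeId(objectType, localId) {
  return objectType + ":" + localId;  // e.g. "post:5" vs. "comment:5"
}
// Comments can then reference "post:5", which never collides with
// "comment:5" even though both local IDs are 5.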
If you only have one database instance, you can create a new table to allocate IDs:
CREATE TABLE id_gen (
  id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY
);
Now you can easily generate new unique IDs and use them to store your rows:
INSERT INTO id_gen () VALUES ();
INSERT INTO foo (id, x) VALUES (LAST_INSERT_ID(), 42);
Of course, the moment you have to shard this, you're in a bit of trouble. You could set aside a single database instance that manages this table, but then you have a single point of failure for all writes and a significant I/O bottleneck (that only grows worse if you ever have to deal with geographically disparate datacenters).
Instagram has a wonderful blog post on their ID generation scheme, which leverages PostgreSQL's awesomeness and some knowledge about their particular application to generate unique IDs across shards.
Another approach is to use UUIDs, which are extremely unlikely to exhibit collisions. You get global uniqueness for "free", with some tradeoffs:
slightly larger size: a BIGINT is 8 bytes, while a UUID is 16 bytes;
indexing pains: INSERT is slower for unsorted keys. (UUIDs are actually preferable to hashes, as they contain a timestamp-ordered segment.)
Yet another approach (which was mentioned previously) is to use a scalable ID generation service such as Snowflake. (Of course, this involves installing, integrating, and maintaining said service; the feasibility of doing that is highly project-specific.)
I use tables as object classes, rows as objects, and columns as object parameters. Everything starts with the class techname, in which every object has its own identifier, unique across the whole database. The object classes themselves are registered as objects in the object-classes table, and the parameters for each object class are linked to it.

In MySQL is it possible to auto increment in pairs?

I have match_id which always comes in pairs. Instead of incrementing as follows:
1,2,3,4...
Is it possible to increment like this:
1,1,2,2,3,3,4,4
If not, how can I guarantee that I will always have at most two identical match_ids? I've tried to implement a manual solution in code, but occasionally I get three at a time if there is a collision.
This is not a primary key. However, I want the match_id to come in pairs so that if I look up match 56, it will show the following:
match_id, user_id, score
56, 29, 434
56, 49, 516
Dividing by two is a possible solution, although it makes the queries really messy; for example, searching for match 56 becomes really convoluted.
I'm pretty sure that you cannot auto-increment in pairs like that. However, you can easily treat a column that has 1,2,3,4,... as if it had 1,1,2,2,3,3,4,4,... by using FLOOR((match_id+1)/2) instead of match_id. You can either use that expression in your data retrieval statements or else denormalize your database and define a column where you insert that value along with the auto-increment value.
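For instance, assuming a matches table whose id column holds the plain auto-increment value (table and column names are illustrative), looking up match 56 could look like this:

SELECT FLOOR((id + 1) / 2) AS match_id, user_id, score
FROM matches
WHERE FLOOR((id + 1) / 2) = 56;

-- Equivalently, and friendlier to the primary-key index,
-- since match 56 maps to rows with ids 111 and 112:
SELECT user_id, score FROM matches WHERE id IN (2*56 - 1, 2*56);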