How do I paginate with 'cursors' in N1QL? - couchbase

I need to paginate a Couchbase N1QL query.
I am aware of pagination with OFFSET, but it is more efficient to designate the start and end point.
I see documentation about startkey_docid, but none about how to use this in N1QL.
How do I paginate with cursors, or something similar, in N1QL?

Look into "keyset pagination", a general technique for improving pagination performance by using an index to seek directly to the start of the next page instead of skipping rows with OFFSET. A commonly cited article on the topic is Markus Winand's "We need tool support for keyset pagination."
For a Couchbase-specific example that uses N1QL, see Keshav Murthy's article "Database Pagination: Using OFFSET and Keyset in N1QL."
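To make the difference concrete, here is a minimal, database-free Python sketch of the two approaches (the row data and field names are invented for illustration): OFFSET pagination skips over rows, while keyset pagination seeks past the last key of the previous page, which is the kind of seek an index can do cheaply.

```python
# Rows sorted by "id", standing in for an indexed column.
rows = [{"id": i, "name": f"user{i:03d}"} for i in range(1, 101)]

def page_by_offset(rows, offset, limit):
    # Equivalent of: ... ORDER BY id LIMIT $limit OFFSET $offset
    # The database still has to walk past the first `offset` rows.
    return rows[offset:offset + limit]

def page_by_keyset(rows, last_id, limit):
    # Equivalent of: ... WHERE id > $last_id ORDER BY id LIMIT $limit
    # An index on id can jump straight past last_id.
    return [r for r in rows if r["id"] > last_id][:limit]

page1 = page_by_keyset(rows, last_id=0, limit=10)
page2 = page_by_keyset(rows, last_id=page1[-1]["id"], limit=10)
```

In N1QL or SQL, page_by_keyset corresponds to a `WHERE id > $last_id ORDER BY id LIMIT $n` query: the client carries the last key seen forward instead of a server-side cursor.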

N1QL does not support cursors. Behind the scenes, every query sent to the query engine is a separate HTTP request, so there is no continuity between queries and therefore no concept of a cursor.

Related

Proper way to implement near-match searching MySQL

I have a table on a MySQL database that has two (relevant) columns, 'id' and 'username'.
I have read that MySQL, and relational databases in general, are not optimal for searching for near matches on strings, so I wonder: what is the industry practice for implementing simple but non-exact-match search, for example when one searches for accounts by name on Facebook and non-exact matches are shown? I found Apache Lucene while researching this, but it seems to be used for indexing pages of a website, not necessarily arbitrary strings in a database table.
Is there an external tool for this use case? It seems like any SQL query for this task would require a full scan, even if it were simply looking for the inclusion of a substring.
In your situation I would recommend using Elasticsearch instead of a relational database. This search engine is a powerful tool for implementing search and analytics functionality.
Elasticsearch is also flexible and versatile, with a rich JSON-based query language and support for many different types of data.
And of course it supports near-match searching. As you said, MySQL and other relational databases are not recommended for near-match searching; that is not what they were built for.
--------------UPDATE------------
If you want to do full-text search in a relational database, it's possible, but you might have problems scaling if your number of users grows a lot. Keep in mind that Elasticsearch is robust and powerful, so you can implement many types of searches easily with it, but it can be more expensive to operate too.
When I proposed Elasticsearch I was thinking about scaling the search. I've been thinking about your problem since I answered, and I understand now that you only need a simple full-text search. To conclude: in the beginning you can use just a relational database, and move your search to Elasticsearch later if it needs to scale or your searches become more complex.
Follow this guide to do full-text search in PostgreSQL: http://rachbelaid.com/postgres-full-text-search-is-good-enough/
There's another example for MySQL: https://sjhannah.com/blog/2014/11/03/using-soundex-and-mysql-full-text-search-for-fuzzy-matching/
Like I said in the comments, it's a trade-off you must make. You can use Elasticsearch from the beginning, or you can start with another database and move to Elasticsearch in the future.
I also recommend the book Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. I'm reading it at the moment, and it will help you understand this topic.
--------------UPDATE------------
To implement near-match searching in Elasticsearch you can use a fuzzy query. The fuzzy query lets you control how lenient the matching should be. For example, consider the query below:
{
  "query": {
    "fuzzy": {
      "username": {
        "value": "julienambrosio",
        "fuzziness": 2
      }
    }
  }
}
It will match terms close to "julienambrosio", such as "julienambrosio1", "julienambrosio12" or "juliembrosio".
You can adjust the level of fuzziness to control how lenient/strict the matching should be.
Before building on this example you should study Elasticsearch a bit more. There are plenty of courses on Udemy, YouTube, etc.
You can read more in the official docs.
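For intuition, "fuzziness": 2 means a candidate term matches if it is within Levenshtein (edit) distance 2 of the query term. Here is a small self-contained Python sketch of that matching rule (the username list is invented for illustration):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def fuzzy_match(term, candidate, fuzziness=2):
    # Mirrors what a fuzzy query with "fuzziness": 2 accepts.
    return levenshtein(term, candidate) <= fuzziness

usernames = ["julienambrosio", "julienambrosio1",
             "julienambrosio12", "juliembrosio", "someoneelse"]
hits = [u for u in usernames if fuzzy_match("julienambrosio", u)]
```

This is only the conceptual model; Elasticsearch evaluates the same idea against an inverted index rather than scanning candidates one by one.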

Rails ordering of collections by proc

I want to sort a collection using a custom proc. I know Rails has the order method, but I don't believe it works with procs, so I'm using sort_by instead. Can someone go into detail about the speed I'm sacrificing, or suggest alternatives? My understanding is that the exact implementation of order depends on the adapter (in my case, MySQL), but I'm wondering if there are ways to take advantage of this to speed the sort up.
As an example, I want to do this:
Model.order(|m| m.get_priority )
but am forced to do this
Model.all.sort_by{|m| m.get_priority}
sort_by is implemented at the Ruby level and is part of Ruby, not ActiveRecord. The sorting will therefore be executed by the Ruby interpreter, not by the database.
This is not optimal, as a DBMS is generally more efficient at sorting data because it can use existing indexes.
If get_priority performs some computation outside the database, you don't have many alternatives to the code you posted, unless you cache the result of get_priority as a column on the Model table and sort on it with the ActiveRecord order method, which translates to an ORDER BY SQL clause.
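The caching idea can be sketched in a few lines of Python (the record shape and priority function are invented for illustration): compute the expensive value once when the record is written, store it, and sort on the stored value, which is exactly what an ORDER BY on a real, indexable column lets the database do.

```python
def get_priority(record):
    # Stand-in for an expensive computation done outside the database.
    return len(record["name"]) % 5

records = [{"name": n} for n in ["alice", "bob", "carolyn", "dave"]]

# On write: persist the computed value alongside the record
# (in Rails this would be a column kept up to date in a callback).
for r in records:
    r["priority"] = get_priority(r)

# On read: sort by the stored column instead of recomputing per sort.
by_priority = sorted(records, key=lambda r: r["priority"])
```

The trade-off is the usual one for denormalized values: the column must be refreshed whenever the inputs to get_priority change.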

Postgresql JSONB is coming. What to use now? Hstore? JSON? EAV?

After going through the relational DB/NoSQL research debate, I've come to the conclusion that I will be moving forward with PG as my data store. A big part of that decision was the announcement of JSONB coming to 9.4. My question is what should I do now, building an application from the ground up knowing that I want to migrate to (I mean use right now!) jsonb? The DaaS options for me are going to be running 9.3 for a while.
From what I can tell, and correct me if I'm wrong, hstore would run quite a bit faster since I'll be doing a lot of queries of many keys in the hstore column and if I were to use plain json I wouldn't be able to take advantage of indexing/GIN etc. However I could take advantage of nesting with json, but running any queries would be very slow and users would be frustrated.
So, do I build my app around the current version of hstore or json data type, "good ol" EAV or something else? Should I structure my DB and app code a certain way? Any advice would be greatly appreciated. I'm sure others may face the same question as we await the next official release of PostgreSQL.
A few extra details on the app I want to build:
- Very relational (with one exception below)
- Strong social network aspect (groups, friends, likes, timeline etc)
- Based around a single object with variable user-assigned attributes, maybe 10 or 1000+ (this is where the schema-less design need comes into play)
Thanks in advance for any input!
It depends. If you expect to have a lot of users, a very high transaction volume, or an insane number of attribute fetches per query, I would say use HSTORE. If, however, your app will start small and grow over time, or has relatively few transactions that fetch attributes, or fetches just a few per query, then use JSON. Even in the latter case, if you're not fetching many attributes but often check one or two keys in the WHERE clause of your queries, you can create a functional index to speed things up:
CREATE INDEX idx_foo_somekey ON foo((bar ->> 'somekey'));
Now, when a query has WHERE bar ->> 'somekey' in it, it should use the index.
And of course, it will be easier to use nested data and to upgrade to jsonb when it becomes available to you.
So I would lean toward JSON unless you know for sure you're going to kick your server's ass with heavy use of key fetches before you have a chance to upgrade to 9.4. But to be sure of that, I would say: do some benchmarking with anticipated query volumes now and see what works best for you.
You probably haven't given quite enough detail for a very detailed answer, but I will say this... If your data is "very relational", then I believe your best course is to build it with a good relational design. If it's just one field with "variable assigned attributes", then that sounds like a good use for hstore, which is pretty tried and true at this point. I've been doing some reading on 9.4, and jsonb sounds cool, but it won't be out for a while. I suspect that a good schema design on 9.3 plus a very targeted use of hstore will yield a good combination of performance and flexibility.

Complex filtering in rails app. Not sure complex sql is the answer?

I have an application that allows users to filter applicants based on a very large set of criteria. The criteria are each represented by boolean columns spanning multiple tables in the database. Instead of using ActiveRecord models, I thought it was best to use pure SQL and put the bulk of the work in the database. To do this I have to construct a rather complex SQL query based on the criteria the users selected and then run it through AR on the db. Is there a better way to do this? I want to maximize performance while also having maintainable and non-brittle code. Any help would be greatly appreciated.
As @hazzit said, it is difficult to answer without more details, but here are my two cents on this. Raw SQL is usually needed to perform complex operations like aggregates, calculations, etc. However, when it comes to search / filtering features, I often find raw SQL overkill and not very maintainable.
The key question here is: can you break down your problem into multiple independent filters?
If the answer is yes, then you should leverage the power of ActiveRecord and Arel. I often find myself implementing something like this in my model:
scope :a_scope, ->{ where something: true }
scope :another_scope, ->( option ){ where an_option: option }
scope :using_arel, ->{ joins(:assoc).where Assoc.arel_table[:some_field].not_eq "foo" }
# cue a bunch of scopes
def self.search( options = {} )
  relation = all
  relation = relation.a_scope if options[:an_option]
  relation = relation.another_scope( options[:another_option] ) unless options[:flag]
  # add logic as you need it
  relation
end
The beauty of this solution is that you declare a clean interface into which you can pour all the params from your checkboxes and fields, and that returns a relation. Breaking the query into multiple reusable scopes helps keep things readable and maintainable; using a search class method ties it all together and allows thorough documentation. And all in all, using Arel helps secure the app against injections.
As a side note, this does not prevent you from using raw SQL, as long as the query can be isolated inside a scope.
If this method is not suited to your needs, there's another option: use a full-fledged search / filtering solution like Sunspot. This uses another store, separate from your db, that indexes defined parts of your data for easy and performant search.
It is hard to answer this question fully without knowing more details, but I'll try anyway.
While databases are bad at quite a few things, they are very good at filtering data, especially at high volumes.
If you do the filtering in Ruby on Rails (or just about any other programming language), the system has to retrieve all of the unfiltered data from the database, which causes tons of disk I/O and network (or interprocess) traffic. It then has to go through all those unfiltered results in memory, which may be quite a burden on RAM and CPU.
If you do the filtering in the database, there is a pretty good chance that most of the records will never be retrieved from disk, handed over to RoR, or filtered there at all. The main reason indexes even exist is to avoid expensive operations and speed things up. (Yes, they also help maintain data integrity.)
To make this work, however, you may need to help the database do its job efficiently. You will have to create indexes matching your filtering criteria, and you may have to look into performance issues with certain types of queries (how to avoid temporary tables and such). However, it is definitely worth it.
That said, there actually are a few types of queries that a given database is not good at. They are few and far between, but they do exist. In those cases, an implementation in RoR might be the better way to go. Even without knowing more about your scenario, I'd say it's a pretty safe bet that your queries are not among them.

Is HBase meaningful if it's not running in a distributed environment?

I'm building an index of data, which will entail storing lots of triplets in the form (document, term, weight). I will be storing up to a few million such rows. Currently I'm doing this in MySQL as a simple table. I'm storing the document and term identifiers as string values rather than as foreign keys to other tables. I'm re-writing the software and looking for better ways of storing the data.
Looking at the way HBase works, this seems to fit the schema rather well. Instead of storing lots of triplets, I could map document to {term => weight}.
I'm doing this on a single node, so I don't care about distributed nodes etc. Should I just stick with MySQL because it works, or would it be wise to try HBase? I see that Lucene uses it for full-text indexing (which is analogous to what I'm doing). My question is really: how would a single HBase node compare with a single MySQL node? I'm coming from Scala, so might a direct Java API have an edge over JDBC plus MySQL's parsing of each query?
My primary concern is insertion speed, as that has been the bottleneck previously. After processing, I will probably end up putting the data back into MySQL for live-querying because I need to do some calculations which are better done within MySQL.
I will try prototyping both, but I'm sure the community can give me some valuable insight into this.
Use the right tool for the job.
There are a lot of anti-RDBMS, or BASE (Basically Available, Soft state, Eventually consistent), systems to choose from, as opposed to ACID (Atomicity, Consistency, Isolation, Durability) systems.
I've used traditional RDBMSs, and though you can store CLOBs/BLOBs, they do not have built-in indexes customized specifically for searching those objects.
You want to do most of the work (calculating the weighted frequency for each tuple found) when inserting a document. You might also want to do some work scoring the usefulness of each (documentId, searchWord) pair after each search, so that you can give better and better results each time. You also want to store a score or weight for each search, and weighted scores for similarity to other searches. It's likely that some searches are more common than others, and that users often don't phrase their query correctly even though they mean to do a common search. Inserting a document should also cause some change to the search weight indexes.
The more I think about it, the more complex the solution becomes. You have to start with a good design first; the more factors your design anticipates, the better the outcome.
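As a sketch of the insert-time work described above, here is how (document, term, weight) triplets might be computed in Python, using plain term frequency as the weight (the actual weighting scheme, tokenizer, and function names are illustrative choices, not something from the question):

```python
from collections import Counter

def index_document(doc_id, text):
    # Naive tokenization; a real indexer would stem and strip punctuation.
    terms = text.lower().split()
    counts = Counter(terms)
    total = len(terms)
    # weight = term frequency within the document
    return [(doc_id, term, count / total) for term, count in counts.items()]

triplets = index_document("doc1", "to be or not to be")
```

Each triplet maps directly onto a row in the MySQL table, or onto one cell of an HBase row keyed by document with term => weight columns.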
MapReduce seems like a great way of generating the tuples. If you can get a Scala job into a jar file (not sure, since I've not used Scala before and am a JVM n00b), it'd be a simple matter to send it along and write a bit of a wrapper to run it on the MapReduce cluster.
As for storing the tuples after you're done, you might also want to consider a document-based database like MongoDB if you're just storing tuples.
In general, it sounds like you're doing something more statistical with the texts... Have you considered simply using Lucene or Solr instead of writing your own?