How do I feed couchbase view output into another view? - couchbase

Say that I have a set of documents of varying types that all contain a single date field. I am looking to get max(date) for documents of a given type. However, date only exists once in each document, meaning that a single view will only be able to output max(date) for each document.
[{1,date},{1,date},{1,date},{2,date},{2,date}]
So theoretically if I was able to run an additional view on this output, I could easily get the output I need.
[{1,date},{2,date}]
I could accomplish this by querying the first view and creating a new document, which would then be used by the second view. Since now each document has multiple dates, the aggregate works.
However, this is not dynamic at all and adds a lot of overhead - basically defeating the purpose of switching from simple n1ql in the first place.
Is there any way to chain / nest / increment views in couchbase?

Jacob, you can do paging with N1QL:
SELECT max(date)
FROM testbucket
WHERE type = 'mytype'
GROUP BY grp
LIMIT 10
OFFSET 100
You could also define a view as follows:
map:
function (doc, meta) {
  // Emit one row per document: key = doc.id, value = doc.date
  emit(doc.id, doc.date);
}
reduce:
_stats
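The built-in _stats reduce function emits several useful values (sum, count, minimum, maximum and sum of squares), including the maximum, which may be what you're looking for. Note that _stats operates on numeric values, so the emitted date needs to be a number such as a timestamp.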
You can also define a custom reduce function. More details appear here:
https://developer.couchbase.com/documentation/server/current/views/views-writing.html
In general, the N1QL approach should be preferred, except for some niche cases that require the map/reduce capability of views. From the problem description, it should be possible to accomplish what you need via N1QL.
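To illustrate why no chaining of views is needed, here is a minimal sketch in plain Python (it does not use the Couchbase SDK). It mimics a map step that emits the document type as the key and a reduce step that keeps the maximum per key, which is roughly what a grouped reduce or a N1QL GROUP BY on the type would compute. The sample documents and the function names are assumptions made up for this illustration.

```python
from collections import defaultdict

# Hypothetical sample documents. ISO-8601 date strings sort correctly as text.
docs = [
    {"type": 1, "date": "2017-01-05"},
    {"type": 1, "date": "2017-03-12"},
    {"type": 1, "date": "2017-02-20"},
    {"type": 2, "date": "2016-11-30"},
    {"type": 2, "date": "2017-01-01"},
]


def map_doc(doc):
    """Map step: emit one (key, value) pair per document, keyed by type."""
    yield doc["type"], doc["date"]


def reduce_max(values):
    """Reduce step: collapse all values emitted for one key to their maximum."""
    return max(values)


def max_date_per_type(documents):
    """Group the emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_max(values) for key, values in grouped.items()}


result = max_date_per_type(docs)
print(result)
```

The point of the sketch is that grouping happens on the emitted key, so a single map/reduce pass over the documents already yields one maximum per type.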

Related

Rails - JSON column vs separate table

I'm currently working on a Ruby on Rails project in which I have objects associated with instructions: each object can have zero or more instruction objects that hold some basic data, such as title, data (string), and position (for ordering them in the UI). I tried searching Google but found no relevant answer. The instructions are specific to each object and shouldn't be used for lookup or search of any kind, so I figured I should store them as JSON within the object's own table instead of making a join table. My reasoning is that the join table would explode when there are many objects, and querying for each object's instructions would therefore get slower over time. Is that a reasonable concern for storing this data as JSON instead of using a has_many association?
Think of using JSON in an RDBMS as a form of denormalization. There are legitimate reasons to use denormalization, but you must keep in mind that it always optimizes for one type of query at the expense of other types of queries.
For example, in this case you could query your object and it would include the JSON document containing all instructions. But if you wanted to search for a specific instruction, it would be quite complex to search for the row that has a JSON document containing that instruction. Have you thought about how you would query that?
Using normalized database design, i.e. the join table you mention, allows for more flexibility in queries. You can query the object table, or you can query the instruction table. Either way, you then simply join to the other table to get the corresponding rows.
The way to make this more optimized is to use indexes on the columns you want to search. See my presentation How to Design Indexes, Really or the video.
Using JSON creates a lot of complexity that you probably haven't considered. See my presentation How to Use JSON in MySQL Wrong.
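As a rough illustration of the normalized design discussed above, the following sketch uses SQLite through Python's sqlite3 module rather than Rails and MySQL, so it only shows the shape of the schema and queries. The table and column names (objects, instructions, object_id) and the sample rows are assumptions based on the question's description, not the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE instructions (
        id        INTEGER PRIMARY KEY,
        object_id INTEGER NOT NULL REFERENCES objects(id),
        title     TEXT NOT NULL,
        data      TEXT,
        position  INTEGER NOT NULL
    );
    -- Index the lookup columns so fetching one object's instructions
    -- does not need to scan the whole table.
    CREATE INDEX idx_instructions_object ON instructions (object_id, position);
""")
conn.execute("INSERT INTO objects (id, name) VALUES (1, 'widget'), (2, 'gadget')")
conn.executemany(
    "INSERT INTO instructions (object_id, title, data, position) VALUES (?, ?, ?, ?)",
    [
        (1, "Unpack", "open the box", 1),
        (1, "Assemble", "attach part A to part B", 2),
        (2, "Charge", "plug in for two hours", 1),
    ],
)


def instructions_for(object_id):
    """Titles of one object's instructions, in display order."""
    rows = conn.execute(
        "SELECT title FROM instructions WHERE object_id = ? ORDER BY position",
        (object_id,),
    ).fetchall()
    return [title for (title,) in rows]


def objects_with_instruction(title):
    """Names of objects that have an instruction with the given title."""
    rows = conn.execute(
        "SELECT o.name FROM objects o "
        "JOIN instructions i ON i.object_id = o.id "
        "WHERE i.title = ? ORDER BY o.name",
        (title,),
    ).fetchall()
    return [name for (name,) in rows]


titles = instructions_for(1)
owners = objects_with_instruction("Charge")
```

Both directions of lookup are plain indexed queries here, whereas the second one would require inspecting every JSON document if the instructions were stored inline.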

storing rows order in mysql

I need to let admins change the display order of rows on my script's admin page.
For that, there is a default order for newly added rows (they go to the end of the list), and the admin should be able to change the position of a specific row.
I'm going to treat the rows as a doubly linked list so that rows can be repositioned.
Is it OK to use linked list method for saving the display position of mysql rows?
Is there a better method?
Should I use a separate table to store orders or it is OK to add two next & prev columns to original table?
Is it possible then to use MySQL's ORDER BY clause with this method?
Edit: I also thought of using spaced order codes (e.g. 0, 100, 200, ...) but this has a limit that may be reached
I think you'll be better off just storing the ordering position in a dedicated field, instead of trying to implement a linked list.
The issue with the linked list is that it requires some sort of list traversal to "reconstruct" the order before you can display it to the user. Normally, you'd employ a recursive query to do that, but MySQL versions before 8.0 don't support recursive queries (recursive common table expressions arrived in 8.0), so on those versions you'll either need to fiddle with stored procedures, or end up making a database round-trip for each and every list node.
All in all, just updating the order field of several rows from time to time (when you need to reorder) is probably cheaper than traversing the list every time (when you need to display it), especially if you mostly move rows by small distances. And if you introduce gaps (as you already mentioned), the number of rows that you'll actually need to update will fall dramatically, at the price of increased complexity.
You may also be able to piggy-back the order field onto the clustering mechanism offered by InnoDB.
YMMV, of course, but I'd advise benchmarking the simple order field approach on representative amounts of data before attempting to implement anything more sophisticated...
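The simple order field approach can be sketched as follows. This is a minimal illustration using SQLite via Python's sqlite3 module instead of MySQL, with a dense (gap-free) position column. The table name items, the column name position, and the helper move_row are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, position INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO items (name, position) VALUES (?, ?)",
    [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)],
)


def move_row(conn, item_id, new_pos):
    """Move one row to new_pos, shifting only the rows between old and new."""
    (old_pos,) = conn.execute(
        "SELECT position FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    if new_pos == old_pos:
        return
    with conn:  # run all updates in a single transaction
        if new_pos < old_pos:
            # Moving towards the front: rows in [new_pos, old_pos) move back by one.
            conn.execute(
                "UPDATE items SET position = position + 1 "
                "WHERE position >= ? AND position < ?",
                (new_pos, old_pos),
            )
        else:
            # Moving towards the end: rows in (old_pos, new_pos] move forward by one.
            conn.execute(
                "UPDATE items SET position = position - 1 "
                "WHERE position > ? AND position <= ?",
                (old_pos, new_pos),
            )
        conn.execute(
            "UPDATE items SET position = ? WHERE id = ?", (new_pos, item_id)
        )


def ordered_names(conn):
    """Display order is a plain ORDER BY on the position column."""
    return [name for (name,) in conn.execute("SELECT name FROM items ORDER BY position")]


move_row(conn, 4, 2)  # move "d" from position 4 to position 2
order_after_move = ordered_names(conn)
```

Reading the list is a single ordered query, and a move touches only the rows between the old and new positions, which is why short moves stay cheap.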

SSAS calculated measure: Access relational database

I recently asked a question about many-to-many relationships and how they can be used to calculate intersections that got answered pretty fine. Now, there is another nice-to-have requirement for our cube to extend that to more data. The general question remains: How many orders contain both product x and y?
However, the measure groups are now much larger, currently about 1.4 billion rows. I tried to implement that using the method described in the other post, with several hidden cross-referenced measure groups. However, this is simply too much for our hardware: the cube is reaching sizes close to 0.5 TB, and queries take several minutes to complete.
Now I would like to try another option: can I access our relational database in a calculated measure? It seems I can, using UDFs as described in this article. I could write a function in C# that queries our relational database and returns all the orders that contain the products chosen by the user. But in order to do that, I need to supply all the dimensional data the user has selected to the UDF. I also need the UDF to return the calculated value so it can be output as the result of the calculated member. Is that possible? If yes, how? The example Microsoft provides only includes a small deterministic string function as the UDF.
Here are my own results:
It seems to be possible, though with limitations. The class Microsoft.AnalysisServices.AdomdServer.Context can provide you with the current member of each hierarchy; however, this does not work with Excel-style subselects. It contains either a single member or the All member.
Another option is to get the MDX query using the DMV SELECT * FROM $System.DISCOVER_SESSIONS. That view has a column containing the last MDX query for a given session. However, in order not to overwrite your own last query, you must open a new connection instead of using the current one. The session id can be obtained through Microsoft.AnalysisServices.AdomdServer.Context.CurrentConnection.SessionID.
The second approach is OK for our use case. It does not allow you to handle axes, since the UDF has cell scope, but you don't know which cell you are in. If any of you knows anything about that last bit, please tell me. Thanks!

Filtering Data Without sql query

I am developing a fairly large enterprise-level data analysis application based on Flex 4. I often need to filter data grids based on the user's selection, which requires running a query against my database. Is there any way to filter grid data without an SQL query? That would take very little time, whereas the query currently causes a 2-3 minute delay.
If you are using ArrayCollection (or other implementation of ICollectionView), take a look at ICollectionView.filterFunction property. You can set it to what you need after user interaction and call ICollectionView.refresh() - all associated grids should automatically show filtered data then.
There are many ways to do this in ActionScript. However, since you use Flex, let's rely on the framework. The feature you are looking for is the filterFunction (see the docs):
Given a data object such as {name:"Jo", type:"employee"}, you can filter employees with:
myArrayCollection.filterFunction = function(data:Object):Boolean {
    return data.type == "employee";
};
myArrayCollection.refresh();
Your data grid should then be updated accordingly.
Of course, depending on the number of items being present in your list, this might run in a blink of an eye or be horribly slow =)

Which DB to choose for finding best matching records?

I'm storing an object in a database described by a lot of integer attributes. The real object is a little bit more complex, but for now let's assume that I'm storing cars in my database. Each car has a lot of integer attributes to describe it (e.g. maximum speed, wheelbase, maximum power) and these are searchable by the user. The user defines a preferred range for each of the attributes, and since there are a lot of attributes there most likely won't be any car matching all the attribute ranges. Therefore the query has to return a number of cars sorted by the best match.
At the moment I implemented this in MySQL using the following query:
SELECT *, SQRT( POW((a < min_a)*(min_a - a) + (a > max_a)*(a - max_a), 2) +
POW((b < min_b)*(min_b - b) + (b > max_b)*(b - max_b), 2) +
... ) AS match
WHERE a > (min_a - max_allowable_deviation) AND a < (max_a + max_allowable_deviation) AND ...
ORDER BY match ASC
where a and b are attributes of the object and min_a, max_a, min_b and max_b are user defined values. Basically the match is the square root of the sum of the squared differences between the desired range and the real value of the attribute. A value of 0 meaning a perfect match.
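The match score described above can be sketched in Python as follows. This is only an illustration of the formula, not a replacement for the SQL query. The attribute names, the sample cars and the helper names range_deviation and match_score are assumptions introduced for the example.

```python
import math


def range_deviation(value, low, high):
    """Distance from value to the range [low, high]; 0 when the value is inside."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0


def match_score(attributes, preferred_ranges):
    """Square root of the summed squared deviations; 0 means a perfect match."""
    total = 0
    for name, (low, high) in preferred_ranges.items():
        total += range_deviation(attributes[name], low, high) ** 2
    return math.sqrt(total)


# Hypothetical cars and user-defined ranges.
cars = {
    "car_a": {"max_speed": 180, "wheelbase": 2600},
    "car_b": {"max_speed": 150, "wheelbase": 2700},
    "car_c": {"max_speed": 203, "wheelbase": 2496},
}
ranges = {"max_speed": (160, 200), "wheelbase": (2500, 2800)}

# Best matches first, mirroring ORDER BY match ASC.
ranked = sorted(cars, key=lambda name: match_score(cars[name], ranges))
```

Here car_a lies inside both ranges and scores 0, car_c misses by 3 and 4 and scores 5, and car_b misses one range by 10 and scores 10.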
The table contains a couple of million records, and the WHERE clause is only there to limit the number of records the calculation is performed on. An index is placed on all of the queryable attributes and the query takes about 500 ms. I'd like to improve this number and I'm looking into ways to improve this query.
Furthermore, I am wondering whether a different database would be better suited to this job. I'd also very much like to change to a NoSQL database because of its more flexible schema options. I've been looking into MongoDB, but couldn't find a way to solve this problem efficiently (fast).
Is there any database better suited for this job than MySQL?
Take a look at R-trees. (The pages on specific variants go into a lot more detail and present pseudocode.) These data structures allow you to query by a bounding rectangle, which is what your problem of searching by ranges on each attribute amounts to.
Consider your cars as points in n-dimensional space, where n is the number of attributes that describe your car. Then, given n ranges, each describing an attribute, the problem is to find all the points contained in that n-dimensional hyperrectangle. R-trees support this query efficiently. MySQL implements R-trees for its spatial data types, but MySQL only supports two-dimensional space, which is insufficient for you. I'm not aware of any common databases that support n-dimensional R-trees off the shelf, but you can take some database with good support for user-defined tree data structures and implement R-trees yourself on top of that. For example, you can define a structure for an R-tree node in MongoDB, with child pointers. You will then implement the R-tree algorithms in your own code while letting MongoDB take care of storing the data.
Also, there's this C++ header file implementation of an R-tree, but currently it's only an in-memory structure. Though if your data set is only a few million rows, it seems feasible to just load this in-memory structure upon startup and update it whenever new cars are added (which I assume is infrequent).
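To make the hyperrectangle idea concrete, here is a small Python sketch of the two geometric tests an R-tree search relies on: whether a point lies inside a box, and whether two boxes overlap. It is deliberately not an R-tree (there is no tree, no insertion and no node splitting); it only groups points into flat buckets and skips a bucket when its bounding box misses the query, which is the pruning step an R-tree applies at every level. All names and sample values are assumptions for illustration.

```python
def point_in_box(point, box):
    """True if every coordinate lies within the matching (low, high) range."""
    return all(low <= x <= high for x, (low, high) in zip(point, box))


def boxes_overlap(box_a, box_b):
    """True if two hyperrectangles intersect in every dimension."""
    return all(
        a_low <= b_high and b_low <= a_high
        for (a_low, a_high), (b_low, b_high) in zip(box_a, box_b)
    )


def bounding_box(points):
    """Smallest hyperrectangle enclosing all the given points."""
    return [(min(coords), max(coords)) for coords in zip(*points)]


def range_query(groups, query_box):
    """Skip whole groups whose bounding box misses the query; scan the rest."""
    hits = []
    for points in groups:
        if boxes_overlap(bounding_box(points), query_box):
            hits.extend(p for p in points if point_in_box(p, query_box))
    return hits


# Hypothetical cars as (max_speed, wheelbase, max_power) points, pre-grouped.
groups = [
    [(180, 2600, 90), (175, 2650, 110)],
    [(120, 2300, 60), (130, 2350, 70)],
]
query = [(160, 200), (2500, 2800), (80, 120)]
matches = range_query(groups, query)
```

In this example the second group is rejected by a single bounding-box comparison without examining its points. A real R-tree nests such boxes hierarchically so that most of the data set is discarded the same way.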
Text search engines, such as Lucene, meet your requirements very well. They allow you to "boost" hits depending on how they were matched, e.g. you can define engine size to be considered a "better match" than wheelbase. Using Lucene is really easy and, above all, it's very fast, much faster than MySQL.
MySQL offers a plugin to provide text-based searching, but I prefer to run Lucene separately; that way it's easily scalable (being read-only, you can have multiple Lucene engines) and easily manageable.
Also check out Solr, which sits on top of Lucene and allows you to store, retrieve and search for simple Java objects (lists, arrays, etc.).
Likely, your indexes aren't helping much, and I can't think of another database technology that's going to be significantly better. A few things to try with MySQL....
I'd try putting a copy of the data in a memory table. At least the table scans will be in memory....
http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html
If that doesn't work for you or help much, you could also try a User Defined Function to optimize the calculation of the matching. Basically, this means executing the range testing in a C library you provide:
http://dev.mysql.com/doc/refman/5.0/en/adding-functions.html