How to get ordered results from Couchbase using bulk gets

I am trying to improve the performance of querying a Couchbase view by using async gets.
I have read their documentation on the proper way to do this. It goes something like:
Cluster cluster = CouchbaseCluster.create();
Bucket bucket = cluster.openBucket();
List<JsonDocument> foundDocs = Observable
    .just("key1", "key2", "key3", "key4", "key5")
    .flatMap(new Func1<String, Observable<JsonDocument>>() {
        @Override
        public Observable<JsonDocument> call(String id) {
            return bucket.async().get(id);
        }
    })
    .toList()
    .toBlocking()
    .single();
This works well and is fast, but I rely on the order of the results, so it seems I need to do some extra work to keep them ordered.
In the example above, the JsonDocument list contains all 5 documents, but their order changes randomly from call to call.
Is there an elegant way to order the results using RxJava or the Couchbase Java SDK?
The only solution I can think of is saving the results into a HashMap and then transforming the original list of ids into an ordered list of JsonDocuments by looking each id up in that HashMap.
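For reference, the HashMap fallback described above could look roughly like the sketch below. It uses only the JDK; the Doc class is a hypothetical stand-in for JsonDocument, and reorder is an illustrative name, not part of the Couchbase SDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReorderByKey {
    // Hypothetical stand-in for JsonDocument: an id plus some content.
    static final class Doc {
        final String id;
        final String content;

        Doc(String id, String content) {
            this.id = id;
            this.content = content;
        }
    }

    // Returns the fetched documents in the order of requestedIds.
    // Ids for which no document was fetched are skipped.
    static List<Doc> reorder(List<String> requestedIds, List<Doc> fetched) {
        Map<String, Doc> byId = new HashMap<>();
        for (Doc doc : fetched) {
            byId.put(doc.id, doc);
        }
        List<Doc> ordered = new ArrayList<>();
        for (String id : requestedIds) {
            Doc doc = byId.get(id);
            if (doc != null) {
                ordered.add(doc);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("key1", "key2", "key3");
        List<Doc> shuffled = Arrays.asList(
                new Doc("key3", "c"), new Doc("key1", "a"), new Doc("key2", "b"));
        for (Doc doc : reorder(ids, shuffled)) {
            System.out.println(doc.id + " -> " + doc.content);
        }
    }
}
```

The extra pass is linear in the number of ids, so the cost is small next to the network round trips.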

Instead of flatMap, you can use either:
concatMap: retains order, but waits for each inner GET to complete before firing the next one (so it can revert to sequential execution, with lower performance).
concatMapEager: subscribes to the inner Observables immediately (so every inner GET is triggered right away). It maintains order by buffering responses that arrive out of order until they can be replayed at the correct index in the sequence. Best of both worlds in terms of ordering and performance.
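To make the "start everything at once, emit in the original order" idea concrete without depending on RxJava, here is a rough JDK-only analogue built on CompletableFuture. It is not how concatMapEager is implemented; fetchAllInOrder and slowFetch are hypothetical names, and slowFetch merely simulates a GET whose latency differs per key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

final class EagerOrdered {
    // Starts the fetch for every id immediately, then collects results in id order.
    static <T> List<T> fetchAllInOrder(List<String> ids,
                                       Function<String, CompletableFuture<T>> fetch) {
        List<CompletableFuture<T>> inFlight = new ArrayList<>();
        for (String id : ids) {
            inFlight.add(fetch.apply(id)); // request is submitted right away
        }
        List<T> results = new ArrayList<>();
        for (CompletableFuture<T> future : inFlight) {
            // Blocks for this position only; later requests keep running meanwhile.
            results.add(future.join());
        }
        return results;
    }

    // Simulated GET: keys ending in a higher digit get a shorter delay.
    static CompletableFuture<String> slowFetch(String id) {
        return CompletableFuture.supplyAsync(() -> {
            long delayMillis = 10L * ('9' - id.charAt(id.length() - 1));
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "doc-" + id;
        });
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("key1", "key2", "key3");
        System.out.println(fetchAllInOrder(ids, EagerOrdered::slowFetch));
    }
}
```

Whatever order the simulated fetches finish in, the returned list follows the order of the ids, which is the property the question asks for.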

I would use the zip operator to combine all your observables; once they have all finished, it emits the documents together as one list, in the order the observables were passed in.
@Test
public void zipObservables() {
    // Each async get is already an Observable<JsonDocument>; zip keeps argument order.
    Observable<JsonDocument> oKey1 = bucket.async().get("key1");
    Observable<JsonDocument> oKey2 = bucket.async().get("key2");
    Observable<JsonDocument> oKey3 = bucket.async().get("key3");
    Observable<JsonDocument> oKey4 = bucket.async().get("key4");
    List<Observable<JsonDocument>> observables = Arrays.asList(oKey1, oKey2, oKey3, oKey4);
    // Note: zip emits nothing if any key is missing, so single() would then throw.
    List<Object> foundDocs = Observable.zip(observables, Arrays::asList)
        .toBlocking()
        .single();
}
You can see more Zip examples here https://github.com/politrons/reactive/blob/master/src/test/java/rx/observables/combining/ObservableZip.java

Limit questions

I'm making a quiz app in Flutter and have a local JSON file with
questions (around 200). How can I limit the questions to 40?
At the moment, when I open the app it shows me all of the questions.
json={results:[
{question},
]}
final jsonResponse = convert.jsonDecode(json);
final result = (jsonResponse['results'] as List).map((question)
=> QuestionModel.fromJson(question));
questions.value =
result.map((question) =>
Question.fromQuestionModel(question)).toList();
return true;
}
}
Use the sublist function after calling .toList().
This is easy with .sublist(), which returns the part of your original List from the start index (inclusive) to the end index (exclusive), so 40 items are sublist(0, 40), like this:
final result = (jsonResponse['results'] as List).map((question)
=> QuestionModel.fromJson(question));
questions.value =
result.map((question) =>
Question.fromQuestionModel(question)).toList().sublist(0, 40);
If the list might hold fewer than 40 items, .take(40).toList() avoids the RangeError that sublist would throw.
Note
If you want to keep all 200 questions and show them 40 at a time, you should use pagination. In that case, don't call sublist here; call it later, on the full list, when building the page of results that is handed to the UI.
Bonus Tip
Check out the flutter_pagewise plugin, which makes pagination a lot easier; it can be very helpful in many situations.
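The paging idea from the note can be sketched as follows. The sketch is in Java, whose List.subList uses the same convention as Dart's sublist (start inclusive, end exclusive); QuestionPager and page are hypothetical names, and clamping both indices is one possible way to handle a short last page.

```java
import java.util.ArrayList;
import java.util.List;

final class QuestionPager {
    // Returns page pageIndex (0-based) with at most pageSize items.
    // A page past the end of the list comes back empty instead of throwing.
    static <T> List<T> page(List<T> all, int pageIndex, int pageSize) {
        int from = Math.min(pageIndex * pageSize, all.size());
        int to = Math.min(from + pageSize, all.size());
        return new ArrayList<>(all.subList(from, to)); // 'to' is exclusive
    }

    public static void main(String[] args) {
        List<Integer> questions = new ArrayList<>();
        for (int i = 1; i <= 200; i++) {
            questions.add(i);
        }
        System.out.println(page(questions, 0, 40).size());
        System.out.println(page(questions, 4, 40).get(0));
    }
}
```

With 200 questions and a page size of 40 there are five full pages, indexed 0 to 4.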

Spring JpaRepository findBy...In(Collection) returns union not intersection

I have a query method in my JpaRepository
Page<Course> findDistinctCourseByAttrsInAllIgnoreCase(Set<String> a, Pageable page);
to find Course objects by their instance variable Set<String> attrs. Given a Set a with "foo" and "bar", I want to find Courses whose attrs contain BOTH "foo" and "bar", i.e. an intersection of Courses with "foo" and those with "bar". This method above returns a union.
Is there a way to do this with JpaRepository queries or do I have to make multiple calls and find the intersection myself?
In the unlikely case that you know the number of elements in a up front, you could combine multiple constraints with And:
...AttrsInAndAttrsIn...
But even if that precondition holds, it would be very ugly.
So the next best option is probably a Specification and a factory method that constructs the Specification from a Set<String> or from varargs.
Your repository needs to extend JpaSpecificationExecutor.
You would call it like this
Page<Course> findAll(matchesAll(attrs), pageable)
And the factory method would look something like this:
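Specification<Course> matchesAll(Set<String> attrs) {
    return (Root<Course> root, CriteriaQuery<?> query, CriteriaBuilder builder) -> {
        // One isMember predicate per required attribute, combined with AND.
        // Note: this comparison is case-sensitive.
        // https://docs.oracle.com/javaee/6/api/javax/persistence/criteria/CriteriaBuilder.html#isMember(E,%20javax.persistence.criteria.Expression)
        Predicate[] predicates = attrs.stream()
            .map(attr -> builder.isMember(attr, root.<Set<String>>get("attrs")))
            .toArray(Predicate[]::new);
        return builder.and(predicates);
    };
}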
Something like this should do it (assuming attrs is mapped as an element collection of strings; COUNT yields a Long, so the size parameter is declared as Long):
@Query("SELECT c FROM Course c JOIN c.attrs a WHERE LOWER(a) IN (:attributes) GROUP BY c HAVING COUNT(c) = :size")
public Page<Course> findByAllAttributes(@Param("attributes") Set<String> attributes, @Param("size") Long size, Pageable page);
and you call it like this:
Set<String> lowered = set.stream()
    .map(String::toLowerCase)
    .collect(Collectors.toSet());
Page<Course> page = findByAllAttributes(lowered, (long) lowered.size(), pageable);
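The counting trick behind GROUP BY ... HAVING COUNT = :size can be checked in memory with plain Java. This is only an illustration of the logic, not repository code: the map of course names to attribute sets and the names MatchAllAttrs, matchesAll, and lower are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class MatchAllAttrs {
    // Names of the courses whose attrs contain every wanted value, ignoring case.
    static List<String> matchesAll(Map<String, Set<String>> attrsByCourse, Set<String> wanted) {
        Set<String> wantedLower = lower(wanted);
        return attrsByCourse.entrySet().stream()
                .filter(entry -> {
                    // Count the attrs that hit the wanted set, as the IN clause would.
                    long matches = lower(entry.getValue()).stream()
                            .filter(wantedLower::contains)
                            .count();
                    // Intersection semantics: every wanted value must have matched.
                    return matches == wantedLower.size();
                })
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    static Set<String> lower(Set<String> values) {
        return values.stream()
                .map(value -> value.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Map<String, Set<String>> courses = new LinkedHashMap<>();
        courses.put("algebra", new HashSet<>(Arrays.asList("Foo", "bar")));
        courses.put("biology", new HashSet<>(Arrays.asList("foo")));
        System.out.println(matchesAll(courses, new HashSet<>(Arrays.asList("foo", "bar"))));
    }
}
```

A plain IN filter corresponds to matches > 0 (the union); requiring the count to equal the number of wanted values is what turns it into an intersection.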

Best way to cache results of method with multiple parameters - Object as key in Dictionary?

At the beginning of a method I want to check whether the method has been called with these exact parameters before, and if so, return the result that was returned back then.
At first, with one parameter, I used a Dictionary, but now I need to check 3 parameters (a String, an Object and a boolean).
I tried making a custom Object like so:
var cacheKey:Object = { identifier:identifier, type:type, someBoolean:someBoolean };
//if key already exists, return it (not working)
if (resultCache[cacheKey]) return resultCache[cacheKey];
//else: create result ...
//and save it in the cache
resultCache[cacheKey] = result;
But this doesn't work, because the second time the function is called, the new cacheKey is not the same object as the first, even though its properties are the same.
So my question is: is there a datatype that will check the properties of the object used as key for a matching key?
And what else is my best option? Create a cache for the keys as well? :/
Note there are two aspects to the technical solution: equality comparison and indexing.
The Cliff Notes version:
It's easy to do custom equality comparison
In order to perform indexing, you need to know more than whether one object is equal to another -- you need to know which object is "bigger" than the other.
If all of your properties are primitives you should squash them into a single string and use an Object to keep track of them (NOT a Dictionary).
If you need to compare some of the individual properties for reference equality, you're going to have to write a function that determines which set of properties is bigger than the other, and then make your own collection class that uses the output of that comparison function to implement its own binary-search-tree-based indexing.
If the number of unique sets of arguments is in the several hundreds or less AND you do need reference comparison for your Object argument, just use an Array and the some method to do a naive comparison to all cached keys. Only you know how expensive your actual method is, so it's up to you to decide what lookup cost (which depends on the number of unique arguments provided to the function) is acceptable.
Equality comparison
To address equality comparison it is easy enough to write some code to compare objects for the values of their properties, rather than for reference equality. The following function enforces strict set comparison, so that both objects must contain exactly the same properties (no additional properties on either object allowed) with the same values:
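public static function propsEqual(obj1:Object, obj2:Object):Boolean {
    // Every property of obj1 must exist on obj2 with an equal value.
    for(var key1:* in obj1) {
        if(obj2[key1] === undefined)
            return false;
        if(obj1[key1] != obj2[key1])
            return false;
    }
    // obj2 must not carry any property that obj1 lacks.
    for(var key2:* in obj2)
        if(obj1[key2] === undefined)
            return false;
    return true;
}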
You could speed it up by eliminating the second for loop with the tradeoff that {A:1, B:2} will be deemed equal to {A:1, B:2, C:'An extra property'}.
Indexing
The problem with this in your case is that you lose the indexing that a Dictionary provides for reference equality or that an Object provides for string keys. You would have to compare each new set of function arguments to the entire list of previously seen arguments, for example by using Array.some. I use the field currentArgs and the method compareToCurrent to avoid generating a new closure every time.
private var cachedArgs:Array = [];
private var currentArgs:Object;
function yourMethod(stringArg:String, objArg:Object, boolArg:Boolean):* {
currentArgs = { stringArg:stringArg, objArg:objArg, boolArg:boolArg };
var iveSeenThisBefore:Boolean = cachedArgs.some(compareToCurrent);
if(!iveSeenThisBefore)
cachedArgs.push(currentArgs);
}
function compareToCurrent(obj:Object):Boolean {
return someUtil.propsEqual(obj, currentArgs);
}
This means comparison will be O(n) time, where n is the ever increasing number of unique sets of function arguments.
If all the arguments to your function are primitive, see the very similar question In AS3, where do you draw the line between Dictionary and ArrayCollection?. The title doesn't sound very similar, but the solution in the accepted answer (yes, I wrote it) addresses the exact same technical issue -- using multiple primitive values as a single compound key. The basic gist in your case would be:
private var cachedArgs:Object = {};
function yourMethod(stringArg:String, objArg:Object, boolArg:Boolean):* {
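// The '|' separator keeps e.g. ("ab", "c") and ("a", "bc") from producing the same key.
var argKey:String = stringArg + '|' + objArg.toString() + '|' + (boolArg ? 'T' : 'F');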
if(cachedArgs[argKey] === undefined)
cachedArgs[argKey] = _yourMethod(stringArg, objArg, boolArg);
return cachedArgs[argKey];
}
private function _yourMethod(stringArg:String, objArg:Object, boolArg:Boolean):* {
// Do stuff
return something;
}
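The same compound-key idea, sketched in Java for comparison. The class and method names are hypothetical, the int type parameter stands in for any primitive argument, and the '|' separator assumes the identifier itself never contains that character.

```java
import java.util.HashMap;
import java.util.Map;

final class CompoundKeyCache {
    private final Map<String, String> cache = new HashMap<>();
    private int computeCalls = 0; // how often the expensive method really ran

    // Returns the cached result for this exact argument combination,
    // computing and storing it on the first call only.
    String lookup(String identifier, int type, boolean flag) {
        // The separator keeps ("ab", 1) and ("a", ...) style inputs from colliding.
        String key = identifier + '|' + type + '|' + (flag ? 'T' : 'F');
        return cache.computeIfAbsent(key, k -> expensive(identifier, type, flag));
    }

    // Stand-in for the real, costly computation.
    private String expensive(String identifier, int type, boolean flag) {
        computeCalls++;
        return identifier.toUpperCase() + "#" + type + (flag ? "!" : "");
    }

    int computeCalls() {
        return computeCalls;
    }

    public static void main(String[] args) {
        CompoundKeyCache cache = new CompoundKeyCache();
        System.out.println(cache.lookup("ab", 1, true));
        System.out.println(cache.lookup("ab", 1, true));
        System.out.println("computed " + cache.computeCalls() + " time(s)");
    }
}
```

Because the key is an ordinary string, the hash map supplies the indexing, so a lookup does not have to scan every previously seen argument set.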
If you really need to determine which reference is "bigger" than another (as the Dictionary does internally) you're going to have to wade into some ugly stuff, since Adobe has not yet provided any API to retrieve the "value" / "address" of a reference. The best thing I've found so far is this interesting hack: How can I get an instance's "memory location" in ActionScript?. Without doing a bunch of performance tests I don't know if using this hack to compare references will kill the advantages gained by binary search tree indexing. Naturally it would depend on the number of keys.

Mahout 0.7 Failed to get recommendation with a large data using MysqlJdbcDataModel

I am using Mahout to build an item-based CF recommendation engine.
I created a MahoutHelper class which has this constructor:
public MahoutHelper(String serverName, String user, String password,
String DatabaseName, String tableName) {
source = new MysqlConnectionPoolDataSource();
source.setServerName(serverName);
source.setUser(user);
source.setPassword(password);
source.setDatabaseName(DatabaseName);
source.setCachePreparedStatements(true);
source.setCachePrepStmts(true);
source.setCacheResultSetMetadata(true);
source.setAlwaysSendSetIsolation(true);
source.setElideSetAutoCommits(true);
DBmodel = new MySQLJDBCDataModel(source, tableName, "userId", "itemId",
"value", null);
similarity = new TanimotoCoefficientSimilarity(DBmodel);
}
and the recommend method is:
public List<RecommendedItem> recommendation() throws TasteException {
Recommender recommender = null;
recommender = new GenericItemBasedRecommender(DBmodel, similarity);
List<RecommendedItem> recommendations = null;
recommendations = recommender.recommend(userId, maxNum);
System.out.println("query completed");
return recommendations;
}
It uses a DataSource to build the data model. The problem is that when MySQL holds only a few rows (fewer than 100), the program works fine, but when the data grows to over 1,000,000 rows, the program hangs while computing recommendations and never makes progress. I have no idea why. By the way, I used the same data to build a FileDataModel from a .dat file, and it takes only 2-3 seconds to complete the analysis. I am confused.
Using the database directly only works for tiny data sets, maybe up to a hundred thousand data points. Beyond that, such a data-intensive computation will never run quickly against the database, because a single recommendation takes thousands of SQL queries or more.
Instead, you must load (and periodically re-load) the data into memory. You can still pull it from the database; look at ReloadFromJDBCDataModel, which wraps a JDBC data model.

Linq to SQL Stored Procedures with Multiple Results

We have followed the approach below to get the data from multiple results using LINQ To SQL
CREATE PROCEDURE dbo.GetPostByID
(
    @PostID int
)
AS
    SELECT *
    FROM Posts AS p
    WHERE p.PostID = @PostID
    SELECT c.*
    FROM Categories AS c
    JOIN PostCategories AS pc
    ON (pc.CategoryID = c.CategoryID)
    WHERE pc.PostID = @PostID
The calling method in the class that inherits from DataContext should look like:
[Database(Name = "Blog")]
public class BlogContext : DataContext
{
...
[Function(Name = "dbo.GetPostByID")]
[ResultType(typeof(Post))]
[ResultType(typeof(Category))]
public IMultipleResults GetPostByID(int postID)
{
IExecuteResult result =
this.ExecuteMethodCall(this,
((MethodInfo)(MethodInfo.GetCurrentMethod())),
postID);
return (IMultipleResults)(result.ReturnValue);
}
}
Notice that the method is decorated not only with the Function attribute that maps to the stored procedure name, but also with the ResultType attributes naming the types of the result sets that the stored procedure returns. Additionally, the method returns the untyped IMultipleResults interface:
public interface IMultipleResults : IFunctionResult, IDisposable
{
IEnumerable<TElement> GetResult<TElement>();
}
so the program can use this interface in order to retrieve the results:
BlogContext ctx = new BlogContext(...);
IMultipleResults results = ctx.GetPostByID(...);
IEnumerable<Post> posts = results.GetResult<Post>();
IEnumerable<Category> categories = results.GetResult<Category>();
In the above stored procedure we had two select queries:
1. Select query without a join
2. Select query with a join
But the second select query only returns data from one of the tables, i.e. the Categories table. Since we used a join, we want the result to contain data from both tables, i.e. from Categories as well as PostCategories.
Could anybody let me know how to achieve this using LINQ to SQL?
What is the performance trade-off of this approach compared with implementing the same thing in plain SQL?
Scott Guthrie (the guy who runs the .Net dev teams at MS) covered how to do this on his blog some months ago much better than I ever could, link here. On that page there is a section titled "Handling Multiple Result Shapes from SPROCs". That explains how to handle multiple results from stored procs of different shapes (or the same shape).
I highly recommend subscribing to his RSS feed. He is pretty much THE authoritative source on all things .Net.
Heya dude - does this work?
IEnumerable<Post> posts;
IEnumerable<Category> categories;
using (BlogContext ctx = new BlogContext(...))
{
ctx.DeferredLoadingEnabled = false; // THIS IS IMPORTANT.
IMultipleResults results = ctx.GetPostByID(...);
posts = results.GetResult<Post>().ToList();
categories = results.GetResult<Category>().ToList();
}
// Now we need to associate each category to the post.
// ASSUMPTION: Each post has only one category (1-1 mapping).
if (posts != null)
{
foreach(var post in posts)
{
int postId = post.PostId;
post.Category = categories
.Where(p => p.PostId == postId)
.SingleOrDefault();
}
}
OK, let's break this down.
First up, a nice connection inside a using block (so it's disposed of nicely).
Next, we make sure DEFERRED LOADING is off. Otherwise, when you try to do the set (e.g. post.Category = blah), it will see that it's null, lazy-load the data (i.e. do a round trip to the database), set the data, and THEN overwrite what was just dragged down from the db with the result of the Where(..) method. Phew! Summary: make sure deferred loading is off for the scope of the query.
Last, for each post, iterate and set the category from the second list.
Does that help?
EDIT
Fixed it so that it doesn't throw an enumeration error, by calling the ToList() methods.
Just curious: if a Post can have one or many Categories, is it possible, instead of using the for loop, to load Post.PostCategories with the list of Categories (one to many), all in one shot, using a JOIN?
var rslt = from p in results.GetResult<Post>()
join c in results.GetResult<Category>() on p.PostId equals c.PostID
...
p.Categories.Add(c)