Couchbase Custom Reduce behaving inconsistently

I am using Couchbase version 2.0.1 enterprise edition (build-170) and java-client version 1.2.2.
I have a custom reduce function to get the last activity of a user.
The response from the Java client is inconsistent: at times I get the correct response, but most of the time I get null values against valid keys. Even Stale.FALSE doesn't help.
The view holds around 1 million records, and the result set for the query is around 1K key-value pairs.
I am not sure what the issue could be here; it would be great if someone could help.
The reduce function is as below:
function (key, values, rereduce) {
  var currDate = 0;
  var activity = "";
  // Each value is a [timestamp, activity] pair; keep the most recent one.
  // The output has the same shape as the input values, so the same logic
  // also works on the rereduce pass.
  for (var idx in values) {
    if (currDate < values[idx][0]) {
      currDate = values[idx][0];
      activity = values[idx][1];
    }
  }
  return [currDate, activity];
}
View Query:
CouchbaseClient cbc = Couchbase.getConnection();
Query query = new Query();
query.setIncludeDocs(false);
query.setSkip(0);
query.setLimit(10000);
query.setReduce(true);
query.setGroupLevel(4);
query.setRange(startKey, endKey);
View view = cbc.getView(designDocName, viewName); // design document name and view name (placeholders)
ViewResponse response = cbc.query(view, query);
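For reference, iterating the ViewResponse with the 1.2.x Java client looks roughly like the sketch below (not from the original post). It assumes the reduce output [currDate, activity] shown above, which the client hands back as a JSON string per row (uses com.couchbase.client.protocol.views.ViewRow):
for (ViewRow row : response) {
    String key = row.getKey();     // the grouped key (group level 4)
    String value = row.getValue(); // the reduced value, e.g. "[1388534400000,\"login\"]"
    System.out.println(key + " -> " + value);
}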

It turned out there was a compatibility issue between java-client 1.2.2 and Google Gson 1.7.1, which my application was using. I switched to java-client 1.2.3 and Google Gson 2.2.4, and things are working great now.


Couchbase: MetaData.Metrics always contain default values?

I am testing Couchbase, and I am making a very simple query:
public async Task SelectRandomJobs(int nbr)
{
    IBucket bucket = await cluster.BucketAsync("myBucket");
    IScope scope = bucket.Scope("myScope");
    IQueryResult<JObject> result = await scope.QueryAsync<JObject>("SELECT * FROM myCollection WHERE Id = {id}");
    // The Metrics.* properties have default values
    Console.WriteLine(result.MetaData.Metrics.ElaspedTime);
}
The values I get are all defaults (zeros and nulls).
I was expecting ElaspedTime (misspelled in the SDK!) and ExecutionTime to be non-null. There is an AnalyticsQueryAsync method, but that did not work for me (error 24045).
Why are those values null?
-- UPDATE --
I followed the advice of Eric below, but I got the same default values.
You will need to enable metrics for this query. I have provided a code sample below with two possible ways of doing this. It is covered in our docs, but it could perhaps be easier to find or have better examples; this is something I will investigate further to see if we can make it clearer in future editions of the docs.
I have used the travel-sample dataset and tried to set the code up similar to your example so that it will be easy to implement for you.
As for why the times are null by default and the other fields are zero, that seems to just be a design decision for this class.
About the misspelling, we have filed a ticket to get the spelling corrected. Thank you for pointing that out.
using System;
using System.Threading.Tasks;
using Couchbase;
using Couchbase.Query;

namespace _3x_simple
{
    class Program
    {
        static async Task Main(string[] args)
        {
            var cluster = await Cluster.ConnectAsync("couchbase://localhost", "Administrator", "password");
            var bucket = await cluster.BucketAsync("travel-sample");
            var myScope = bucket.Scope("inventory");

            // scope path
            var options = new QueryOptions().Metrics(true);
            var queryResult = await myScope.QueryAsync<dynamic>("SELECT * FROM airline LIMIT 10;", options);

            // cluster path
            // var queryResult = await cluster.QueryAsync<dynamic>("SELECT * FROM `travel-sample`.inventory.airline LIMIT 10;", options => options.Metrics(true));

            Console.WriteLine($"Execution time before read: {queryResult.MetaData.Metrics.ExecutionTime}");

            await foreach (var row in queryResult)
            {
                Console.WriteLine(row);
            }

            Console.WriteLine($"Execution time after read: {queryResult.MetaData.Metrics.ExecutionTime}");
            Console.WriteLine("Press any key to exit...");
            Console.Read();
        }
    }
}
You won't see the execution time until after the results are read. You are seeing default values for those fields because you are trying to read that information before the async result stream has been consumed.

RDF4J SPARQL query to JSON

I am trying to move data from a SPARQL endpoint to a JSONObject, using RDF4J.
The RDF4J documentation does not address this directly (there is some info about using endpoints, less about converting to JSON, and nothing where these two cases meet).
So far I have:
SPARQLRepository repo = new SPARQLRepository(<My Endpoint>);
Map<String, String> headers = new HashMap<String, String>();
headers.put("Accept", "SPARQL/JSON");
repo.setAdditionalHttpHeaders(headers);
try (RepositoryConnection conn = repo.getConnection())
{
    String queryString = "SELECT * WHERE {GRAPH <urn:x-evn-master:mwadata> {?s ?p ?o}}";
    GraphQuery query = conn.prepareGraphQuery(queryString);
    debug("Mark 2");
    try (GraphQueryResult result = query.evaluate())
This fails with: "Server responded with an unsupported file format: application/sparql-results+json".
I figured a SPARQLGraphQuery should take the place of GraphQuery, but RepositoryConnection does not have a relevant prepare statement.
If I exchange
try (RepositoryConnection conn = repo.getConnection())
with
try (SPARQLConnection conn = (SPARQLConnection) repo.getConnection())
I run into the problem that SPARQLConnection does not generate a SPARQLGraphQuery. The closest I can get is:
SPARQLGraphQuery query = (SPARQLGraphQuery) conn.prepareQuery(QueryLanguage.SPARQL, queryString);
which gives a runtime error, as these types cannot be cast to each other.
I do not know how to proceed from here. Any help or advice is much appreciated. Thank you.
this fails because "Server responded with an unsupported file format: application/sparql-results+json"
In RDF4J, SPARQL SELECT queries are tuple queries, so named because each result is a set of bindings, which are tuples of the form (name, value). In contrast, CONSTRUCT (and DESCRIBE) queries are graph queries, so called because their result is a graph, that is, a collection of RDF statements.
Furthermore, setting additional headers for the response format as you have done here is not necessary (except in rare circumstances); the RDF4J client handles this for you automatically, based on the registered set of parsers.
So, in short, simplify your code as follows:
SPARQLRepository repo = new SPARQLRepository(<My Endpoint>);
try (RepositoryConnection conn = repo.getConnection()) {
    String queryString = "SELECT * WHERE {GRAPH <urn:x-evn-master:mwadata> {?s ?p ?o}}";
    TupleQuery query = conn.prepareTupleQuery(queryString);
    debug("Mark 2");
    try (TupleQueryResult result = query.evaluate()) {
        ...
    }
}
If you want to write the result of the query in JSON format, you could use a TupleQueryResultHandler, for example the SPARQLResultsJSONWriter, as follows:
SPARQLRepository repo = new SPARQLRepository(<My Endpoint>);
try (RepositoryConnection conn = repo.getConnection()) {
    String queryString = "SELECT * WHERE {GRAPH <urn:x-evn-master:mwadata> {?s ?p ?o}}";
    TupleQuery query = conn.prepareTupleQuery(queryString);
    query.evaluate(new SPARQLResultsJSONWriter(System.out));
}
This will write the result of the query (in this example to standard output) using the SPARQL Query Results JSON format. If you have a non-standard format in mind, you could of course also create your own TupleQueryResultHandler implementation.
For more details on the various ways in which you can process the result (including iterating, streaming, adding to a List, or just directly sending to a result handler), see the documentation on querying a repository. As an aside, the javadoc on the RDF4J APIs is pretty extensive too, so if your Java editing environment has support for displaying that, I'd advise you to make use of it.
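Since the original goal was a JSONObject, one way to bridge the two is to serialize the result into an in-memory buffer and parse that. The following is a minimal sketch (not from the original answer), assuming the org.json library is on the classpath:
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import org.eclipse.rdf4j.query.TupleQuery;
import org.eclipse.rdf4j.query.resultio.sparqljson.SPARQLResultsJSONWriter;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;
import org.json.JSONObject;

SPARQLRepository repo = new SPARQLRepository(<My Endpoint>);
try (RepositoryConnection conn = repo.getConnection()) {
    String queryString = "SELECT * WHERE {GRAPH <urn:x-evn-master:mwadata> {?s ?p ?o}}";
    TupleQuery query = conn.prepareTupleQuery(queryString);
    // Write the SPARQL JSON results into a buffer instead of System.out...
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    query.evaluate(new SPARQLResultsJSONWriter(buffer));
    // ...and parse the buffer into a JSONObject
    JSONObject json = new JSONObject(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
}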

NamedList with Deep Pagination

QueryRequest req = new QueryRequest(solrQuery);
NoOpResponseParser responseParser = new NoOpResponseParser();
responseParser.setWriterType("csv");
searcherServer.setParser(responseParser);
NamedList<Object> resp = searcherServer.request(req);
QueryResponse res = searcherServer.query(solrQuery);
responseString = (String) resp.get("response");
I use the above code to get the output in CSV format. The data I am trying to fetch is huge (in the billions), so I want to use Solr's deep pagination and fetch the CSV output in chunks. Is there a way to do this? Also, with the current version of Solr (I cannot upgrade), I have to use the above code to get CSV output.
I tried the below way to fetch the results.
searcherServer = new HttpSolrServer(url);
SolrQuery solrQuery = new SolrQuery();
solrQuery.setQuery(query);
solrQuery.set("fl","field1");
solrQuery.setParam("wt", "csv");
solrQuery.setStart(0);
solrQuery.setRows(1000);
solrQuery.setSort(SolrQuery.SortClause.asc("field2"));
The output from the above code has wt as javabin, so I cannot get the CSV output.
Any suggestions?
You have two ways.
Use the Solr export request handler (or add it) with the wt=csv parameter. Just to be clear, this is an implicit request handler, usually available even in older Solr versions and specifically designed to handle scenarios that involve exporting millions of records.
Implement deep paging correctly. I suggest Yonik's post on paging and deep paging; it is easier than you think. But after you have implemented it correctly, you will also need to create the CSV file yourself. A sketch of the cursor approach follows.
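For illustration, cursor-based deep paging with SolrJ looks roughly like the sketch below. This is not from the original answer and carries assumptions: it needs Solr 4.7+ for cursorMark (which may rule it out for a version that cannot be upgraded), a sort on the uniqueKey field (assumed here to be id), and it uses the default javabin parser, so the CSV rows would still have to be assembled from the returned documents:
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

SolrQuery q = new SolrQuery(query);
q.set("fl", "field1");
q.setRows(1000);
// cursorMark requires a sort that includes the uniqueKey field ("id" is an assumption)
q.setSort(SolrQuery.SortClause.asc("id"));
String cursorMark = CursorMarkParams.CURSOR_MARK_START; // "*"
boolean done = false;
while (!done) {
    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = searcherServer.query(q);
    // append rsp.getResults() to the CSV output here
    String nextCursorMark = rsp.getNextCursorMark();
    done = cursorMark.equals(nextCursorMark);
    cursorMark = nextCursorMark;
}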
The solution I found was:
SolrQuery solrQuery = new SolrQuery();
solrQuery.setQuery(query); // what you want to fetch
QueryResponse res = searcherServer.query(solrQuery);
int numFound = (int) res.getResults().getNumFound();
int rowsToBeFetched = (numFound > 1000 ? (int) (numFound / 6) : numFound);
for (int i = 0; i < numFound; i = i + rowsToBeFetched) {
    solrQuery.set("fl", "fieldToBeFetched");
    solrQuery.setParam("wt", "csv");
    solrQuery.setStart(i);
    solrQuery.setRows(rowsToBeFetched);
    QueryRequest req = new QueryRequest(solrQuery);
    NoOpResponseParser responseParser = new NoOpResponseParser();
    responseParser.setWriterType("csv");
    searcherServer.setParser(responseParser);
    NamedList<Object> resp = searcherServer.request(req);
    responseString = (String) resp.get("response"); // this is in CSV format
}
Pros:
Since I don't fetch the entire result at once, it was faster.
The output was CSV.
Hitting Solr multiple times isn't costly.
Cons:
The results are not unique, meaning there can be repeated data depending on what you are fetching.
To get unique results, you can use facets.
Thanks!

What causes facet errors after Hibernate Search upgrade from version 4 to 5?

Since upgrading (described below) the Facet search throws this exception.
HSEARCH000268: Facet request 'groupArchiv' tries to facet on field 'facetfieldarchiv' which either does not exists or is not configured for faceting (via @Facet). Check your configuration.
Migrating from hibernate.search.version 4.4.4 to hibernate.search.version 5.5.2
lucene-queryparser 5.3.1
JDK 1.8.x
All the indexing is done via a ClassBridge.
The field facetfieldarchiv is in the index.
All other searches are working fine.
protected List<FacetBean> searchFacets(String searchQuery, String defaultField,
        String onField, String facetGroupName)
{
    List<FacetBean> results = new ArrayList<FacetBean>();
    FullTextSession ftSession = getHibernateFulltextSession();
    org.apache.lucene.analysis.Analyzer analyzer = getAnalyzer(Archiv.class);
    QueryParser parser = new QueryParser(defaultField, analyzer);
    try
    {
        Query query = parser.parse(searchQuery);
        QueryBuilder builder = ftSession.getSearchFactory().buildQueryBuilder().forEntity(Item.class).get();
        FacetingRequest gruppeFacetingRequest = builder.facet()
                .name(facetGroupName)
                .onField(onField).discrete()
                .orderedBy(FacetSortOrder.COUNT_DESC)
                .includeZeroCounts(false)
                .maxFacetCount(99999)
                .createFacetingRequest();
        org.hibernate.search.FullTextQuery hibQuery = ftSession.createFullTextQuery(query, Item.class);
        FacetManager facetManager = hibQuery.getFacetManager();
        facetManager.enableFaceting(gruppeFacetingRequest);
        Iterator<Facet> itf1 = facetManager.getFacets(facetGroupName).iterator();
        // The error occurs here
        while (itf1.hasNext())
        {
            FacetBean bean = new FacetBean();
            Facet facetgruppe = itf1.next();
            bean.setFacetName(facetgruppe.getFacetingName());
            bean.setFacetFieldName(facetgruppe.getFieldName());
            bean.setFacetValue(facetgruppe.getValue());
            bean.setFacetCount(facetgruppe.getCount());
            results.add(bean);
        }
    } catch (Exception e)
    {
        logger.error("Error in facet search: " + e);
    }
    return results;
}
The faceting API went through an overhaul between Hibernate Search 4 and 5. In the 4.x series one could facet on any (single-valued) field without special configuration; the implementation was based on a custom Collector.
In Hibernate Search 5.x the implementation has changed and native Lucene faceting support is used. For this to work, though, the faceted fields need to be known at index time. For this the annotation @Facet was introduced, which needs to be placed on the fields used for faceting. You can find more information in the Hibernate Search online docs, or check this blog post, which gives a short summary of the changes. A minimal mapping example follows.
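To illustrate the new requirement, a property-based mapping in Hibernate Search 5 looks roughly like this. This is a sketch, not the asker's code: the entity and field names are borrowed from the question, and it assumes annotation-based mapping rather than the asker's ClassBridge:
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.search.annotations.Analyze;
import org.hibernate.search.annotations.Facet;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;

@Entity
@Indexed
public class Item {

    @Id
    private Long id;

    // In Hibernate Search 5, faceted fields must be declared at index time:
    // index them un-analyzed and mark them with @Facet.
    @Field(analyze = Analyze.NO)
    @Facet
    private String facetfieldarchiv;

    // ... getters and setters
}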
Thank you for answering.
I didn't catch that change in 5.x.
My facets are made up of several fields.
Is there a possibility to build the facets in a ClassBridge using pure Lucene? Something like:
FacetField f = new FacetField(fieldName, fieldValue);
document.add(f);
indexWriter.addDocument(document);
Thank you
pe

Username in WebTokenRequestResult is empty

In a Windows 10 UWP app I try to use WebAuthenticationCoreManager.RequestTokenAsync to get the result from a login with a Microsoft account.
I get a WebTokenRequestResult with Success. ResponseData[0] contains a WebAccount with an ID - but the UserName is empty.
The scope of the call is wl.basic - so I should get a lot of information...
I'm not sure how to retrieve extra information - and for the current test the UserName would be enough.
I checked out the universal samples - and there I found a snippet which tries to do what I'm trying: an output of webTokenRequestResult.ResponseData[0].WebAccount.UserName.
By the way - the example output is also empty.
Is this a bug - or what do I (and the MS samples) have to do to get the user's profile data (or at least the UserName)?
According to the documentation (https://learn.microsoft.com/en-us/windows/uwp/security/web-account-manager), you have to make a specific REST API call to retrieve it:
var restApi = new Uri(@"https://apis.live.net/v5.0/me?access_token=" + result.ResponseData[0].Token);
using (var client = new HttpClient())
{
    var infoResult = await client.GetAsync(restApi);
    string content = await infoResult.Content.ReadAsStringAsync();
    var jsonObject = JsonObject.Parse(content);
    string id = jsonObject["id"].GetString();
    string name = jsonObject["name"].GetString();
}
As to why the WebAccount property doesn't get set... shrugs
And FYI, the "id" returned here is entirely different from the WebAccount.Id property returned with the authentication request.