Solr grouped query pagination not working properly. [Solr, Lucene] - json

I have grouped my Solr documents by the field family.
The Solr query for getting the first 20 groups is as follows:
/select?q=*:*&group=true&group.field=family&group.ngroups=true&start=0&group.limit=1
The result of this query is 20 groups, as follows:
responseHeader: {
zkConnected: true,
status: 0,
QTime: 1260,
params: {
q: "*:*",
group.limit: "1",
start: "0",
group.ngroups: "true",
group.field: "family",
group: "true"
}
},
grouped: {
family: {
matches: 464779,
ngroups: 396324,
groups: [
{
groupValue: "__fam__ME.EA.HE.728928",
doclist: {
numFound: 1,
start: 0,
maxScore: 1,
docs: [
{
sku: "ME.EA.HE.728928",
title: "Rexton Pocket Family Hearing Instrument Fusion",
family: "__fam__ME.EA.HE.728928",
brand: "Rexton",
brandId: "6739",
inStock: false,
bulkDiscount: false,
quoteOnly: false,
cats: [
"Hearing Machine & Components",
"Health & Personal Care",
"Medical Supplies & Equipment"
],
leafCatIds: [
"6038"
],
parentCatIds: [
"6259",
"4913"
],
Type__attr__: "Pocket Family",
Type of Products__attr__: "Hearing Instrument",
price: 3790,
discount: 40,
createdAt: "2016-02-18T04:51:36Z",
moq: 1,
offerPrice: 2255,
suggestKeywords: [
"Rexton",
"Pocket Family",
"Rexton Pocket Family"
],
suggestPayload: "6038,Hearing Machine & Components",
_version_: 1548082328946868200
}
]
}
},
The thing to notice in this result is the value of ngroups, which is 396324.
But when I want the data for the last pages, I send this query to Solr:
select?q=*:*&group=true&group.field=family&group.ngroups=true&start=396320&group.limit=1
{
responseHeader: {
zkConnected: true,
status: 0,
QTime: 5238,
params: {
q: "*:*",
group.limit: "1",
start: "396320",
group.ngroups: "true",
group.field: "family",
group: "true"
}
},
grouped: {
family: {
matches: 464779,
ngroups: 396324,
groups: [ ]
}
}
}
I get 0 results when I set start to 396320, although I expected 5 documents in the result. The actual number of groups is 386887. Why is ngroups incorrect?
By the way, this issue does not occur on the local Solr server I have set up; it only shows up in SolrCloud on the test environment.

This is a known issue with grouping across distributed nodes (which is what happens in SolrCloud mode):
Grouping is supported for distributed searches, with some caveats:
Currently group.func is not supported in any distributed searches
group.ngroups and group.facet require that all documents in each group must be co-located on the same shard in order for accurate counts to be returned. Document routing via composite keys can be a useful solution in many situations.
The most direct solution is to use the family as part of the routing key, which ensures that all documents with the same family value end up on the same shard. Since the number of distinct family values seems to be very high compared to the number of shards, this should still give you a good distribution of documents across nodes.
Depending on what you're actually trying to do, there may be alternative solutions as well (if you just want a count, a JSON facet might be a good fit).
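To illustrate why co-location matters, here is a minimal Python sketch. It assumes that the distributed ngroups value is the sum of the distinct group counts reported by each shard, which is consistent with the caveat quoted above. The shard count, the crc32 hash and the sample documents are all stand-ins made up for the illustration; Solr's compositeId router uses its own hash function. The only Solr convention relied on is the composite id format "routeKey!docId".

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, for illustration only


def routed_id(family, sku):
    # compositeId convention: "<route key>!<document id>"
    return f"{family}!{sku}"


def shard_for(doc_id):
    # Route on the part before "!" if present, otherwise on the whole id.
    # crc32 is only a stand-in for the hash Solr actually uses.
    route_key = doc_id.split("!", 1)[0]
    return zlib.crc32(route_key.encode("utf-8")) % NUM_SHARDS


def summed_ngroups(docs, make_id):
    # Each shard only sees its own documents, so it counts its own
    # distinct families; the per-shard counts are then added together.
    families_per_shard = defaultdict(set)
    for doc in docs:
        families_per_shard[shard_for(make_id(doc))].add(doc["family"])
    return sum(len(families) for families in families_per_shard.values())


# Made-up sample data: 500 documents spread over 50 families.
docs = [{"sku": f"SKU-{i}", "family": f"__fam__{i % 50}"} for i in range(500)]

true_groups = len({d["family"] for d in docs})
plain = summed_ngroups(docs, lambda d: d["sku"])
routed = summed_ngroups(docs, lambda d: routed_id(d["family"], d["sku"]))

print("distinct families:", true_groups)
print("summed count, id = sku:", plain)
print("summed count, id = family!sku:", routed)
```

With the family as the route key, every family lives on exactly one shard, so the summed count matches the real number of groups. Without it, a family split over several shards is counted once per shard, which would explain an ngroups value larger than the number of groups you can actually page through.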


What result from REST endpoint is more common?

We are designing our REST API and are wondering in what form an endpoint should return data.
We have an endpoint that returns so-called "identity" objects that have different attributes.
Each identity has a unique string, e.g. UUID#cf684c35-200e-4936-8b63-e6e51b6e3569.
Which format are developers more used to?
Like this:
{
"UUID#cf684c35-200e-4936-8b63-e6e51b6e3569": {
"validity_date": 1608591121,
"visibility": "private"
},
"RFID#cf684c35-200e-4936-8b63-e6e51b6e3570": {
"validity_date": 1608591123,
"visibility": "public"
}
}
or
{
"results": [
{
"identity": "UUID#cf684c35-200e-4936-8b63-e6e51b6e3569",
"validity_date": 1608591121,
"visibility": "private"
},
{
"identity": "RFID#cf684c35-200e-4936-8b63-e6e51b6e3570",
"validity_date": 1608591123,
"visibility": "public"
}
]
}
What is your opinion?
TL;DR: I recommend using a list of objects (your second approach).
Let's map your objects onto a more familiar example: users with an id and a name.
{
"1": {
"name": "Michal"
},
"2": {
"name": "Thomas"
}
}
versus
[
{
"id": 1,
"name": "Michal"
},
{
"id": 2,
"name": "Thomas"
}
]
Both approaches can be used; I don't see any difference at the API level itself.
But let's consider how an application might provide or consume such data:
fetching a database table of users (e.g. whose birthday is next week)
showing a table of users (e.g. user name and birthday)
processing the monthly salary to employees
All three examples use a list of users, which is the second approach. Since many applications operate on lists of entities, a list is the common choice for APIs.
I think that a bare [] is better than { "results": [] }.
That said, in my opinion the second format is better because in some languages it is easier to map than the first.
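For what it's worth, converting between the two shapes is cheap on the client side, so the choice mostly comes down to what consumers find convenient. A small Python sketch (the function names are made up for this illustration; the field names are taken from the question):

```python
def map_to_list(by_identity):
    # {"<identity>": {...attrs}} -> [{"identity": "<identity>", ...attrs}]
    return [{"identity": identity, **attrs} for identity, attrs in by_identity.items()]


def list_to_map(results):
    # [{"identity": "<identity>", ...attrs}] -> {"<identity>": {...attrs}}
    return {
        item["identity"]: {k: v for k, v in item.items() if k != "identity"}
        for item in results
    }


by_identity = {
    "UUID#cf684c35-200e-4936-8b63-e6e51b6e3569": {
        "validity_date": 1608591121,
        "visibility": "private",
    },
    "RFID#cf684c35-200e-4936-8b63-e6e51b6e3570": {
        "validity_date": 1608591123,
        "visibility": "public",
    },
}

results = map_to_list(by_identity)
print(results)
print(list_to_map(results) == by_identity)
```

The list form also keeps the identity as ordinary data rather than as a key, which tends to be easier to map onto typed objects.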

Google Analytics API - RemarketingAudiences.insert only working when linkedAdAccounts is AD_WORDS

I'm writing a Google Apps Script for creating audiences in Google Analytics. I keep getting the very unhelpful error message "There was an internal error."
As per this guide, I am able to insert new audiences with the type AD_WORDS without issue. However, my current task involves duplicating audiences of type ANALYTICS.
It seems that the linkedAdAccounts attribute of the resource I submit is incorrect. The official docs mention four possible values for the type: ADWORDS_LINKS, DBM_LINKS, MCC_LINKS and OPTIMIZE. Unfortunately, no detailed explanation is given for how these work, other than for ADWORDS_LINKS.
Here is the payload which is being rejected:
{
name: "newName",
linkedViews: ["123445677"],
linkedAdAccounts: [
{
kind: "analytics#linkedForeignAccount",
internalWebPropertyId: "12345678",
status: "OPEN",
remarketingAudienceId: "aaaaaaaaaaaaaaaaaaaaa",
id: "xxxxxxxxxxxxxxxxxxxxx",
webPropertyId: "UA-1234567-1",
type: "ANALYTICS",
accountId: "12345678",
},
],
audienceType: "SIMPLE",
audienceDefinition: {
includeConditions: {
daysToLookBack: 7,
segment: "users::condition::ga:sessionDuration>60",
membershipDurationDays: 30,
isSmartList: false,
},
},
}
It turns out you can't add an id for an ANALYTICS linkedAdAccount. Just adding the following is sufficient.
linkedAdAccounts: [
{
type: "ANALYTICS",
},
],
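When duplicating an existing audience, the fix above amounts to stripping everything except the type from each ANALYTICS entry before submitting. The sketch below shows that transformation on plain dictionaries in Python; it does not call the Analytics API, and the helper name prepare_for_insert is made up for this illustration.

```python
import copy


def prepare_for_insert(audience):
    # Work on a copy so the resource fetched from the API stays untouched.
    body = copy.deepcopy(audience)
    body["linkedAdAccounts"] = [
        # Keep only the type for ANALYTICS entries; leave other entries as they are.
        {"type": account["type"]} if account.get("type") == "ANALYTICS" else account
        for account in body.get("linkedAdAccounts", [])
    ]
    return body


existing = {
    "name": "newName",
    "linkedViews": ["123445677"],
    "linkedAdAccounts": [
        {
            "kind": "analytics#linkedForeignAccount",
            "id": "xxxxxxxxxxxxxxxxxxxxx",
            "type": "ANALYTICS",
        }
    ],
}

print(prepare_for_insert(existing)["linkedAdAccounts"])
```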

Ansible trouble parsing JSON to get correct UUIDs to poweron VMs

Making an API GET call, I get the following JSON structure:
{
"metadata": {
"grand_total_entities": 231,
"total_entities": 0,
"count": 231
},
"entities": [
{
"allow_live_migrate": true,
"gpus_assigned": false,
"ha_priority": 0,
"memory_mb": 1024,
"name": "test-ansible2",
"num_cores_per_vcpu": 2,
"num_vcpus": 1,
"power_state": "off",
"timezone": "UTC",
"uuid": "e1aff9d4-c834-4515-8c08-235d1674a47b",
"vm_features": {
"AGENT_VM": false
},
"vm_logical_timestamp": 1
},
{
"allow_live_migrate": true,
"gpus_assigned": false,
"ha_priority": 0,
"memory_mb": 1024,
"name": "test-ansible1",
"num_cores_per_vcpu": 1,
"num_vcpus": 1,
"power_state": "off",
"timezone": "UTC",
"uuid": "4b3b315e-f313-43bb-941b-03c298937b4d",
"vm_features": {
"AGENT_VM": false
},
"vm_logical_timestamp": 1
},
{
"allow_live_migrate": true,
"gpus_assigned": false,
"ha_priority": 0,
"memory_mb": 4096,
"name": "test",
"num_cores_per_vcpu": 1,
"num_vcpus": 2,
"power_state": "off",
"timezone": "UTC",
"uuid": "fbe9a1ac-cf45-4efa-9d65-b3257548a9f4",
"vm_features": {
"AGENT_VM": false
},
"vm_logical_timestamp": 17
}
]
}
In my Ansible playbook I register a variable holding this content.
I need to get a list of the UUIDs of "test-ansible1" and "test-ansible2", but I'm having a hard time finding the best way to do this.
Note that I have another variable holding the list of names for which I need to look up the UUID.
The goal is to use those UUIDs to fire a poweron command for all UUIDs corresponding to specific names.
How would you do that?
I've tried a number of approaches but I can't seem to get what I want, so I'd prefer an uninfluenced opinion.
P.S.: This is what Nutanix AHV returns for a GET of all VMs through its API. There seems to be no way to get the JSON for specific VMs only; you always get all VMs.
Thanks.
Here is some Jinja2 magic for you:
- debug:
    msg: "{{ mynames | map('extract', dict(test_json | json_query('entities[].[name,uuid]'))) | list }}"
  vars:
    mynames:
      - test-ansible1
      - test-ansible2
Explanation:
test_json | json_query('entities[].[name,uuid]') reduces your original json data to a list of elements which are lists of two items – name value and uuid value:
[
[
"test-ansible2",
"e1aff9d4-c834-4515-8c08-235d1674a47b"
],
[
"test-ansible1",
"4b3b315e-f313-43bb-941b-03c298937b4d"
],
[
"test",
"fbe9a1ac-cf45-4efa-9d65-b3257548a9f4"
]
]
BTW you can use http://jmespath.org/ to test query statements.
dict(...), when applied to such a structure (a list of "tuples"), generates a dictionary:
{
"test": "fbe9a1ac-cf45-4efa-9d65-b3257548a9f4",
"test-ansible1": "4b3b315e-f313-43bb-941b-03c298937b4d",
"test-ansible2": "e1aff9d4-c834-4515-8c08-235d1674a47b"
}
Then we apply the extract filter, as described in the documentation, to fetch only the required elements:
[
"4b3b315e-f313-43bb-941b-03c298937b4d",
"e1aff9d4-c834-4515-8c08-235d1674a47b"
]
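If it helps to see the same three steps outside Jinja2, here is a rough Python equivalent: build a name-to-uuid dictionary from the entities, then look up the wanted names in order. The function name uuids_for is made up for this illustration, and the sample response is trimmed to the two fields that matter.

```python
def uuids_for(names, response):
    # Steps 1 and 2: reduce the entities to a {name: uuid} dictionary.
    by_name = {vm["name"]: vm["uuid"] for vm in response["entities"]}
    # Step 3: pick the uuid for each requested name, keeping the order of names.
    return [by_name[name] for name in names]


response = {
    "entities": [
        {"name": "test-ansible2", "uuid": "e1aff9d4-c834-4515-8c08-235d1674a47b"},
        {"name": "test-ansible1", "uuid": "4b3b315e-f313-43bb-941b-03c298937b4d"},
        {"name": "test", "uuid": "fbe9a1ac-cf45-4efa-9d65-b3257548a9f4"},
    ]
}

print(uuids_for(["test-ansible1", "test-ansible2"], response))
```

Like the extract filter, this assumes every requested name exists in the response; a missing name raises a KeyError here.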

What is correct way of sending list/tabular data JSON with REST?

I am working on a RESTful API. One of our screens shows a table with a grand total.
Below are two possible JSON responses for returning the data.
First
[
{
"name": "Richard",
"bank_balance": 3000,
"assets_worth": 4000,
"total": 7000
},
{
"name": "John",
"bank_balance": 1000,
"assets_worth": 2000,
"total": 3000
},
{
"name": "Total",
"bank_balance":4000,
"assets_worth": 6000,
"total": 10000
}
]
Second
{
"rows": [
{
"name": "Richard",
"bank_balance": 3000,
"assets_worth": 4000,
"total": 7000
},
{
"name": "John",
"bank_balance": 1000,
"assets_worth": 2000,
"total": 3000
}
],
"grand_total":
{
"name": "Total",
"bank_balance":4000,
"assets_worth": 6000,
"total": 10000
}
}
Which one is more correct with respect to REST?
REST is merely an architectural style for designing networked applications. It doesn't directly answer your question about data structuring.
Personally I would go with the first approach (just without the total row), since the grand total can be trivially calculated from the row data, resulting in something like:
[
{
name: "Richard",
bank_balance: 3000,
assets_worth: 4000,
total: 7000
},
{
name: "John",
bank_balance: 1000,
assets_worth: 2000,
total: 3000
}
]
I think the important design principle here is that your API should not be opinionated about data representation. Some applications that use your API may choose to display the data in tabular format, while others may choose a different representation. A good API caters equally well to different applications (and use cases).
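As a sketch of how trivial the client-side calculation is, the following Python snippet sums the numeric columns of the rows from the question. The function name grand_total and the choice of which fields to sum are assumptions for this illustration.

```python
NUMERIC_FIELDS = ("bank_balance", "assets_worth", "total")


def grand_total(rows):
    # Sum each numeric column over all rows.
    return {field: sum(row[field] for row in rows) for field in NUMERIC_FIELDS}


rows = [
    {"name": "Richard", "bank_balance": 3000, "assets_worth": 4000, "total": 7000},
    {"name": "John", "bank_balance": 1000, "assets_worth": 2000, "total": 3000},
]

print(grand_total(rows))
# {'bank_balance': 4000, 'assets_worth': 6000, 'total': 10000}
```

Note that this only works when the client receives all rows; if the endpoint is paginated, a server-side total (as in the second format) may still be needed.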

REST api design to retrieve summary information

I have a REST API that manages a resource which we will call Group.
A Group is similar in concept to a discussion forum in Google Groups.
I have two GET access methods which I believe need separate representations.
The first GET access method retrieves the minimal amount of information about a Group.
Given a group_id, it should return something like
{
group_id: "5t7yu8i9io0op",
group_name: "Android Developers",
is_moderated: true,
number_of_users: 34,
new_messages: 5,
icon: "http://boo.com/pic.png"
}
The second GET access method retrieves summary information that is more statistical in nature, like:
{
group_id: "5t7yu8i9io0op",
top_ranking_users: [
{ user: "george", posts: 789, rank: 1 },
{ user: "joel", posts: 560, rank: 2 } ...
],
popular_topics: [ ... ]
}
I want to separate these data access methods and I'm currently planning on this design:
GET /group/:group_id/
GET /group/:group_id/stat
Only the latter will return the statistical information about the group. What do you think about this?
I don't see a problem with your approach. Since the statistics are basically separate data, you could treat the stats as a separate resource, too, providing a URI like
GET /stat/:group_id
Additionally you can cross reference your resources (meaning a group links to the corresponding stat resource and vice versa):
GET /group/5t7yu8i9io0op
{
group_id: "5t7yu8i9io0op",
group_name: "Android Developers",
is_moderated: true,
number_of_users: 34,
new_messages: 5,
icon: "http://boo.com/pic.png",
stats: "http://mydomain.com/stat/5t7yu8i9io0op"
}
GET /stat/5t7yu8i9io0op
{
group: "http://mydomain.com/group/5t7yu8i9io0op",
top_ranking_users: [
{ user: "george", posts: 789, rank: 1 },
{ user: "joel", posts: 560, rank: 2 } ...
],
popular_topics: [ ... ]
}
It would be even better to embed the link to the statistics in the group summary:
{
group_id: "5t7yu8i9io0op",
group_name: "Android Developers",
is_moderated: true,
number_of_users: 34,
new_messages: 5,
icon: "http://boo.com/pic.png",
stats_link: "http://whatever.who/cares"
}
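A minimal Python sketch of the cross-linking idea, assuming the group and its statistics are served as separate resources. The base URL and the /group and /stat paths are the ones used in the example above; the function names are made up for this illustration.

```python
BASE_URL = "http://mydomain.com"  # example host from the answer above


def group_representation(group):
    # Add a link to the statistics resource to the group summary.
    return {**group, "stats": f"{BASE_URL}/stat/{group['group_id']}"}


def stat_representation(group_id, stats):
    # Link back from the statistics resource to the group it describes.
    return {"group": f"{BASE_URL}/group/{group_id}", **stats}


group = {
    "group_id": "5t7yu8i9io0op",
    "group_name": "Android Developers",
    "is_moderated": True,
}
stats = {"top_ranking_users": [{"user": "george", "posts": 789, "rank": 1}]}

print(group_representation(group))
print(stat_representation(group["group_id"], stats))
```

Because the client follows the link instead of building the URL itself, the statistics resource can move later without breaking consumers.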