Querying the Microsoft Academic graph by fields of study of references - JSON

I've been playing around with the Microsoft Academic API, trying to do graph queries using JSON-formatted queries. I'm at the point where I think I can produce results, but for some reason I don't get the full set of results.
The query I am attempting to perform will retrieve all papers that reference a paper that has a FieldOfStudy that is one of the ones I'm looking for. Essentially, I'm trying to find out how well cited a field of study is.
I think the query should look something like this:
{
  "path": "/paper/ReferenceIDs/reference/FieldOfStudyIDs/field",
  "paper": {
    "type": "Paper",
    "match": {
      "PublishYear": 2017
    },
    "select": ["DOI", "OriginalTitle", "PublishYear"]
  },
  "reference": {
    "type": "Paper",
    "select": "OriginalTitle"
  },
  "field": {
    "type": "FieldOfStudy",
    "select": ["Name"],
    "return": { "id": [106686826, 204641814] }
  }
}
Unfortunately, I get only an incomplete subset of results. Funnily enough, though, if I further restrict the initial node by matching on a title, I get another set of results (disjoint from the first query's result set):
{
  "path": "/paper/ReferenceIDs/reference/FieldOfStudyIDs/field",
  "paper": {
    "type": "Paper",
    "match": {
      "OriginalTitle": "cancer",
      "PublishYear": 2017
    },
    "select": ["DOI", "OriginalTitle", "PublishYear"]
  },
  "reference": {
    "type": "Paper",
    "select": "OriginalTitle"
  },
  "field": {
    "type": "FieldOfStudy",
    "select": ["Name"],
    "return": { "id": [106686826, 204641814] }
  }
}
So, what could be going on here? Is the query giving up because the very first node it hits on the broader search doesn't match the path? Is it even possible to query all papers published in a year like this?
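For reference, a minimal sketch of how a query body like this can be POSTed from Python; the endpoint URL and subscription-key header are assumptions based on the usual Cognitive Services conventions, so adjust them to whatever your own subscription actually uses:
# Sketch only: POST the graph query as JSON. The endpoint URL and the
# Ocp-Apim-Subscription-Key header are assumptions, not taken from the question.
import json
import requests

GRAPH_SEARCH_URL = "https://westus.api.cognitive.microsoft.com/academic/v1.0/graph/search?mode=json"  # assumed endpoint
SUBSCRIPTION_KEY = "<your-subscription-key>"  # placeholder

query = {
    "path": "/paper/ReferenceIDs/reference/FieldOfStudyIDs/field",
    "paper": {
        "type": "Paper",
        "match": {"PublishYear": 2017},
        "select": ["DOI", "OriginalTitle", "PublishYear"],
    },
    "reference": {"type": "Paper", "select": "OriginalTitle"},
    "field": {
        "type": "FieldOfStudy",
        "select": ["Name"],
        "return": {"id": [106686826, 204641814]},
    },
}

response = requests.post(
    GRAPH_SEARCH_URL,
    headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
             "Content-Type": "application/json"},
    data=json.dumps(query),
)
response.raise_for_status()
print(json.dumps(response.json(), indent=2))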

Related

How do I properly use deleteMany() with an $and query in the MongoDB shell?

I am trying to delete all documents in my collection infrastructure that have a type.primary property of "pipelines" and a type.secondary property of "oil."
I'm trying to use the following query:
db.infrastructure.deleteMany({$and: [{"properties.type.primary": "pipelines"}, {"properties.type.secondary": "oil"}] })
That returns: { acknowledged: true, deletedCount: 0 }
I expect my query to work because in MongoDB Compass, I can retrieve 182 documents that match the query {$and: [{"properties.type.primary": "pipelines"}, {"properties.type.secondary": "oil"}] }
My documents appear with the following structure (relevant section only):
"properties": {
  "optional": {
    "description": ""
  },
  "original": {
    "Opername": "ENBRIDGE",
    "Pipename": "Lakehead",
    "Shape_Leng": 604328.294581,
    "Source": "EIA"
  },
  "required": {
    "unit": null,
    "viz_dim": null,
    "years": []
  },
  "type": {
    "primary": "pipelines",
    "secondary": "oil"
  }
  ...
My understanding is that I just need to pass a filter to deleteMany() and that $and expects an array of objects. For some reason the two combined isn't working here.
I realized the simplest answer was the correct one -- I spelled my database name incorrectly.
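For completeness, a minimal pymongo sketch of the same delete; the connection URI and database name here are assumptions (and a misspelled database name is exactly what produces deletedCount: 0 with no error), so verify both first:
# Sketch: the same $and filter via pymongo. The URI and database name are
# assumptions; a wrong database name silently matches nothing (deletedCount: 0).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed URI
db = client["my_database"]                         # assumed database name

result = db.infrastructure.delete_many({
    "$and": [
        {"properties.type.primary": "pipelines"},
        {"properties.type.secondary": "oil"},
    ]
})
print(result.deleted_count)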

Need documentation for *.analysis.windows.net/public/reports/querydata

I am reverse engineering an app that sends queries to
SOMESERVERNAME.analysis.windows.net/public/reports/querydata via an HTTP POST of a JSON-structured query.
Some initial lines of a sample query are at the end of this message.
I can't find any documentation on this anywhere. I don't know if this is some secret API or what. I ultimately would like to ignore the aggregations altogether and just dump the raw data, which seems to sit in some flat-file-type container on the back end, but without API documentation I'm stuck with just re-running the handful of super-basic queries I've been able to intercept.
Note: this app is an embedded analytics page created with Power BI, but the only REST API I can find for Power BI has nothing to do with querying; it only covers basic object management.
Thanks!
{
  "version": "1.0.0",
  "queries": [
    {
      "Query": {
        "Commands": [
          {
            "SemanticQueryDataShapeCommand": {
              "Query": {
                "Version": 2,
                "From": [
                  {
                    "Name": "s",
                    "Entity": "Sheet1"
                  }
                ],
                "Select": [
                  {
                    "Aggregation": {
                      "Expression": {
                        "Column": {
                          "Expression": {
                            "SourceRef": {
                              "Source": "s"
                            }
                          },
                          "Property": "Total"
                        }
                      },
                      "Function": 0
                    },
                    "Name": "Sum(Sheet1.Total)"
                  }
                ],
                "Where": [
                  {
                    "Condition": {
                      "In": {
                        "Expressions": [
                          {
                            "Column": {
                              "Expression": {
                                "SourceRef": {
                                  "Source": "s"
                                }
                              },
                              "Property": "Year"
                            }
                          }
                        ],
                        "Values": [
                          [
                            {
                              "Literal": {
                                "Value": "'2018'"
                              }
                            }
                          ]
                        ]
                      }
                    }
                  },
............
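For anyone wanting to replay an intercepted request like the one above outside the browser, here is a minimal sketch; the headers below are assumptions, so copy the real ones (auth/resource key, etc.) from the request you intercepted:
# Sketch: replay an intercepted querydata request. The required headers are an
# assumption -- copy whatever the embedded report actually sent in the browser.
import json
import requests

QUERYDATA_URL = "https://SOMESERVERNAME.analysis.windows.net/public/reports/querydata"

with open("intercepted_query.json") as f:  # the JSON body shown above, saved to a file
    body = json.load(f)

headers = {
    "Content-Type": "application/json",
    # ...plus the auth/resource-key headers from the intercepted request
}

response = requests.post(QUERYDATA_URL, headers=headers, json=body)
response.raise_for_status()
print(json.dumps(response.json(), indent=2))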
I have built a client that scrapes data off a specific Power BI report using the same API; you'll probably be able to adapt it to your use case. Maybe we can even abstract the code into a more generalized Power BI client!
Having tinkered with the API for two days, I realised that there are many ways the data can be formatted:
"nested"/multidimensional data can be unflattened, flattened by 1 degree, etc.
a primary "table" of a result dataset (in data.PH) can reference others (in data.SH)
The basics are as follows:
A dataset is structured like a multidimensional table, with cells containing values.
In a set of cells, the first always has a field S that contains the schema for itself and all subsequent cells.
The schema maps a field of each cell's object to a selection from your query, e.g. the G0 field to the queried column age (see the sketch below).
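To make that concrete, here is a rough Python sketch of the idea; the exact shape of the S schema assumed here (a list of entries with N and Name keys) is my guess based only on the description above, not the real spec:
# Illustrative only: the first cell carries a schema ("S") saying which field
# (G0, G1, ...) of every cell maps to which selected column. The assumed shape
# of the schema entries ({"N": ..., "Name": ...}) is a guess, not the real spec.
def expand_cells(cells):
    schema = cells[0]["S"]
    rows = []
    for cell in cells:
        row = {}
        for entry in schema:
            row[entry["Name"]] = cell.get(entry["N"])
        rows.append(row)
    return rows

# made-up data matching the assumed shape
cells = [
    {"S": [{"N": "G0", "Name": "age"}], "G0": 23},
    {"G0": 42},
]
print(expand_cells(cells))  # -> [{'age': 23}, {'age': 42}]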
My client seems to work only with a specific type of query (SemanticQueryDataShapeCommand), a specific number of dimensions and a specific column marked as primary (via Binding.Primary). But maybe that helps! https://github.com/derhuerst/fetch-bvg-occupancy/blob/1ebb864b1ff7130f9d2f0ab031c6d78bcabdd633/lib/parse-dataset.js
The only documented way to use this API is through the ADOMD.NET or OleDb provider.
If you want to send a DAX/MDX query and retrieve data programmatically, there's a sample of how to front-end the service with a simple REST API here.

Elasticsearch filter for distinct categories

I made a simple mapping with three fields; I am analyzing one field, which is of text type, while the other fields are of keyword type.
Example fields: Category_one, Category_two, Category_three.
Now I am searching the documents.
GET _search/cat
{
  "size": 4,
  "query": {
    "match": {
      "Category_one.ngrams": {
        "query": "Nice food place in XYZ location",
        "analyzer": "standard"
      }
    },
    "aggs": {
      "distincr_values": {
        "terms": {
          "fields": "Category_two"
        }
      }
    }
  }
}
It's showing this error:
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "[match] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
        "line": 10,
        "col": 5
      }
    ],
    "type": "parsing_exception",
    "reason": "[match] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
    "line": 10,
    "col": 5
  },
  "status": 400
}
Kindly help me with this error. My main motive is to get distinct results according to the Category_two field.
Any help would be appreciated.
I believe you're getting this error because of your query structure.
Your aggregations keyword must be outside of (at the same level as) the query. At the moment your aggs is wrapped up inside the query.
Follow this structure:
GET _search/cat
{
  "size": 4,
  "query": {
    'query goes here'
  },
  "aggs": {
    'aggregations go here'
  }
}
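Applied to the query in the question, the corrected request would look roughly like this (sent from Python here; the host and index name are assumptions, and note that the terms aggregation takes a singular field key):
# Sketch: same search with "aggs" moved to the same level as "query".
# Host and index are assumptions; the terms aggregation uses a singular "field".
import json
import requests

ES_URL = "http://localhost:9200/cat/_search"  # assumed host and index

body = {
    "size": 4,
    "query": {
        "match": {
            "Category_one.ngrams": {
                "query": "Nice food place in XYZ location",
                "analyzer": "standard",
            }
        }
    },
    "aggs": {
        "distinct_values": {
            "terms": {"field": "Category_two"}
        }
    },
}

response = requests.post(ES_URL, headers={"Content-Type": "application/json"},
                         data=json.dumps(body))
print(json.dumps(response.json(), indent=2))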

Deeply nested JSON documents in Apache Solr

I have a deeply nested document (pseudo-structure shown below):
[{
  "id": "1",
  "company_id": "1",
  "company_name": "company_1",
  "departments": [{
    "dep1": [{
      "id": 40,
      "name": "xyz"
    },
    {
      "id": 41,
      "name": "xyr"
    }],
    "dep2": [{
    }]
  }],
  "employeePrograms": [{
  }]
}]
How can I index this type of document in Apache Solr?
The documentation only covers immediate child documents.
Unfortunately, I don't have huge experience with this technology, but I want to help. Here is some official documentation that might be useful: official doc, and something more specific.
If you have some uncommon issue, tell us about it (any error, or whatever) and I'll try my best to help.
Upd 1:
Solr can only maintain a 'flat' representation of the data, so what you were trying to do is not really possible. There are a number of workarounds, such as using dynamic fields or using a Solr join to link multiple data sets.
Speaking of deep nesting, I've found an example of a workaround.
If you had something like this:
"docs": [
  {
    "name": "Product Name",
    "categories": [
      {
        "name": "Category 1",
        "priority": 8
      },
      {
        "name": "Category 2",
        "priority": 6
      }
      ...
    ]
  },
You have to modify it like this to make it not deeply nested:
"docs": [
  {
    name: "Sample Product",
    categories: [
      {
        priority_category: "9_Category 1"
      },
      {
        priority_category: "5_Category 2"
      }
      ...
    ]
  },
So, once you've done something similar, check whether there are any errors anywhere.
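A rough Python sketch of that flattening step, in case it helps; the field names mirror the example above and the "_" separator is arbitrary:
# Sketch: flatten nested categories into single "priority_category" values,
# mirroring the workaround shown above (the "_" separator is arbitrary).
def flatten_doc(doc):
    return {
        "name": doc["name"],
        "priority_category": [
            f"{c['priority']}_{c['name']}" for c in doc.get("categories", [])
        ],
    }

doc = {
    "name": "Product Name",
    "categories": [
        {"name": "Category 1", "priority": 8},
        {"name": "Category 2", "priority": 6},
    ],
}
print(flatten_doc(doc))
# -> {'name': 'Product Name', 'priority_category': ['8_Category 1', '6_Category 2']}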

How can I delete an id-less entity in Orion?

The question title is pretty much self-explanatory. It is possible to create an id-less entity in Orion, and an id = .* query returns the id-less (but existing) entity normally. But how can someone delete that entity? This request obviously didn't work:
{
  "contextElements": [
    {
      "type": "",
      "isPattern": "false",
      "id": ""
    }
  ],
  "updateAction": "DELETE"
}
This is what the query returns:
{
  "contextElement": {
    "type": "",
    "isPattern": "false",
    "id": "",
    "attributes": [
      {
        "name": "temp",
        "type": "integer",
        "value": "15"
      },
      {
        "name": "pressure",
        "type": "integer",
        "value": "720"
      }
    ]
  },
  "statusCode": {
    "code": "200",
    "reasonPhrase": "OK"
  }
}
There is a known bug (now fixed) in Orion that seems to be causing your issue. Basically, Orion interprets a final "/" at the end of a URL as an empty element.
For example (as documented in the issue):
v1/contextEntityTypes queries for all types, while
v1/contextEntityTypes/ queries only for the empty type
In your particular case, something similar happens with some REST operations. If you do a GET /v1/contextEntities you will see all entities, including the empty-id one. You can query that particular entity with GET /v1/contextEntity/ (note the final "/").
However, the DELETE method doesn't seem to use the same pattern, so if you do DELETE /v1/contextEntity/ you get a "No context element found" error.
So, basically, this is another manifestation of a known bug.
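For reference, a small Python sketch of the two calls described above; the host and port are assumptions (1026 is just the usual Orion default):
# Sketch: query and attempt to delete the empty-id entity via the NGSI v1 API.
# Host and port are assumptions; 1026 is only the usual Orion default.
import requests

BASE = "http://localhost:1026"  # assumed Orion endpoint

# The trailing "/" addresses the entity whose id is the empty string.
r = requests.get(BASE + "/v1/contextEntity/")
print(r.status_code, r.json())

# As described above, DELETE doesn't follow the same trailing-slash pattern,
# so on affected Orion versions this returns "No context element found".
r = requests.delete(BASE + "/v1/contextEntity/")
print(r.status_code, r.text)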