Get first result of suggested results with Wikipedia's API

I am trying to use Wikipedia's API, but I can't find a way to get the first result when there are multiple possible ones.
For example, if I use this request https://en.wikipedia.org/w/api.php?action=query&format=json&prop=extracts%7Cimages&titles=test&redirects=1&explaintext=1&imlimit=20 it will return an article that says
Test may refer to: ...
What I want is for it to skip this page and give me directly the content of the first article listed under "Test may refer to".
Do you know if this is possible or not?
Thank you for reading :)
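One possible workaround, sketched in Python: ask the API whether the page is a disambiguation page (prop=pageprops with ppprop=disambiguation, a property set by the Disambiguator extension that English Wikipedia runs), and if it is, fall back to list=search and take the first hit that is not the disambiguation page itself. This is a heuristic swapped in for illustration, not a way to read the "may refer to" list in order: search ranking is not the same as the order of entries on that page. The helper names (page_url, search_url, is_disambiguation, first_other_hit) are hypothetical, and the HTTP fetch is left out so the parsing can be checked against canned responses.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def page_url(title):
    # The question's original query, plus prop=pageprops so the response
    # reports whether the page is a disambiguation page.
    params = {
        "action": "query", "format": "json",
        "prop": "extracts|images|pageprops", "ppprop": "disambiguation",
        "titles": title, "redirects": 1, "explaintext": 1, "imlimit": 20,
    }
    return API + "?" + urlencode(params)

def search_url(term, limit=5):
    # list=search returns hits in the search engine's relevance order.
    params = {"action": "query", "format": "json", "list": "search",
              "srsearch": term, "srlimit": limit}
    return API + "?" + urlencode(params)

def is_disambiguation(response):
    # With format=json (formatversion 1), query.pages is a dict keyed by page ID.
    pages = response.get("query", {}).get("pages", {})
    return any("disambiguation" in p.get("pageprops", {}) for p in pages.values())

def first_other_hit(search_response, skip_title):
    # First search hit whose title differs from the disambiguation page's title.
    for hit in search_response.get("query", {}).get("search", []):
        if hit["title"] != skip_title:
            return hit["title"]
    return None
```

In use: fetch page_url("test"); if is_disambiguation() is true, fetch search_url("test") and then fetch page_url(first_other_hit(..., "Test")) to get the extract you actually want.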

Related

Composition in REST and consistency of the inserted data

How do I properly design REST for a composition? I have a TestResult entity, which has TestCaseResults entities. Both support the full set of REST methods. The important fact (which I believe differs from many examples I found on the web) is that a TestResult is not consistent if it doesn't have all of its TestCaseResults. How do I properly design this in REST?
Let's say I create them as separate but dependent resources: api\testresults\ and api\testresults\1\testcaseresults. When the client wants to create a test result, it needs to POST to api\testresults, then follow the link in the response to api\testresults\1\testcaseresults, and POST all of the test case results to it. This means the test result is inconsistent until the client finishes the whole operation. Basically, there is no concept of a transaction here.
Let's say I create only api\testresults resource, and embed an array of test case results inside, like this:
{
  "Name": "Test A",
  "Results": [
    {
      "Measured": "BB",
      ...
    },
    ...
  ],
  ...
}
Then it is easier to insert, but it is still hard to work with. A simple GET to api\testresults\1\ will retrieve the test result with a large number of test case results. A GET to api\testresults\ will retrieve much more! The structure becomes complex. Furthermore, in the real world I have a few entities like TestCaseResults that belong to TestResults, so there will be a few arrays, and each could have 100-200 elements.
I could try to combine the approaches: embed the array, but also provide links to api\testresults\1\testcaseresults and support operations there as well. Maybe on GET api\testresults\1\ I could return the TestResult without its TestCaseResults, only with a link pointing to that resource, while on POST I could accept an embedded array of TestCaseResults (though I'm not sure REST allows different representations for POST and GET). But then there are two ways to insert the same information, which is confusing, and I'm still not sure it solves anything.

Your approach with api\testresults\1 and api\testresults\1\testcaseresults seems promising.
As JSON does not have a fixed structure, you can add query parameters to your URL to control whether the test case results are embedded or not.
api\testresults\1?with_results=true would mean that your caller wants to see the test case results in addition to the test result.
api\testresults\1\testcaseresults would still return the test case results for your test 1.
If you fear that the number of test case results is too large, you can add pagination parameters, which can be reused in the testcaseresults call.
api\testresults\1?with_results=true&per_page=10 would include only the first 10 results. To get more, use api\testresults\1\testcaseresults?per_page=10&page=2 and so on, since that is the dedicated endpoint.
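A minimal sketch of the two handlers described above, in Python, assuming the test result is already loaded as a plain dict. The function names, the results_link/next fields, and the dict layout are hypothetical, and the paths use forward slashes as real URLs do. The parent resource embeds at most one page of results on request, while the dedicated endpoint does the full pagination.

```python
import math

def render_test_result(test_result, with_results=False, per_page=10):
    """Hypothetical body for GET /testresults/{id}?with_results=&per_page=."""
    body = {
        "id": test_result["id"],
        "name": test_result["name"],
        # Always advertise the dedicated endpoint, embedded results or not.
        "results_link": f"/testresults/{test_result['id']}/testcaseresults",
    }
    if with_results:
        # Embed only the first page; the rest lives at results_link.
        body["results"] = test_result["results"][:per_page]
    return body

def render_case_results(test_result, page=1, per_page=10):
    """Hypothetical body for GET /testresults/{id}/testcaseresults?per_page=&page=."""
    results = test_result["results"]
    start = (page - 1) * per_page
    total_pages = math.ceil(len(results) / per_page) if results else 1
    body = {
        "page": page,
        "per_page": per_page,
        "total_pages": total_pages,
        "items": results[start:start + per_page],
    }
    if page < total_pages:
        body["next"] = (f"/testresults/{test_result['id']}/testcaseresults"
                        f"?per_page={per_page}&page={page + 1}")
    return body
```

With 25 test case results and per_page=10, page 2 returns items 10 to 19 plus a next link to page 3, and page 3 returns the last 5 with no next link.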
Cheers
Note: if you want a flexible API that still returns JSON data, you can take a look at GraphQL, the trendy approach.

Using cts query to retrieve collections associated with a given document URI - MarkLogic

I need to retrieve the collections to which a given document belongs in MarkLogic.
I know an xdmp function does that, but I need to use it in a cts query to retrieve the data and then filter records from it.
xdmp:document-get-collections("uri of document") can't be run inside a cts query to give the appropriate data.
Any idea how it can be done using a cts query?
Thanks

A few options come to mind:
Option One: Use cts:values()
cts:values(cts:collection-reference())
If you check out the documentation, you will see that you can also restrict this to certain fragments by passing a query as one of the parameters.
Update [11-10-2017]:
The comment attached to this asked for a sample of restricting the results of cts:values() to a single document (for practical purposes, I will say fragment == document).
The documentation for cts:values explains this. It is the 4th parameter, a query to restrict the results. Get to know this pattern, as it is part of many features of MarkLogic. It is your friend. The query I would use for this problem statement is a cts:document-query().
An Example:
cts:values(
  cts:collection-reference(),
  (),
  (),
  cts:document-query('/path/to/my/document')
)
Full Example:
cts:search(
  collection(),
  cts:collection-query(
    cts:values(
      cts:collection-reference(),
      (),
      (),
      cts:document-query('/path/to/my/document')
    )
  )
)[1 to 10]
Option Two: Use cts:collection-match()
If you need more control over returning just some of the collections from a document, use cts:collection-match(). Like the first option, you can restrict the results to just some fragments. However, it has the added benefit of accepting a pattern to match collection names against.
Attention:
They both return a sequence, perfect for feeding into other parts of your query. However, under the hood, I believe they work differently. The second option is run against a lexicon. The larger the list of unique collection names and the more complex your pattern, the longer it takes to resolve. I use collection-match in projects, but usually only when I can limit the possible choices by restricting the results to a smaller number of documents.

You can't do this in a single step. You have to run code first to retrieve collections associated with a document. You can use something like xdmp:document-get-collections for that. You then have to feed that into a cts query that you build dynamically:
let $doc-collections := xdmp:document-get-collections($doc-uri)
return
cts:search(collection(), cts:collection-query($doc-collections))[1 to 10]
HTH!

Are you looking for cts:collection-query()?

Insert two XML files into the same collection:
xquery version "1.0-ml";
xdmp:document-insert("/a.xml", <root><sub1><a>aaa</a></sub1></root>,
map:map() => map:with("collections", ("coll1")));
xdmp:document-insert("/b.xml", <root><sub2><a>aaa</a></sub2></root>,
map:map() => map:with("collections", ("coll1")));
Search the collection:
xquery version "1.0-ml";
let $myColl:= xdmp:document-get-collections("/a.xml")
return
cts:search(/root,
  cts:and-query((
    cts:collection-query($myColl),
    cts:element-query(xs:QName("a"), "aaa")
  ))
)

How to create RegEx with SubMatches of the same Match that capture 2 different types of output?

I'm trying to get my Jira data into Excel via the JSON REST API, using VBA, and I'm parsing the JSON output with RegEx. There are plenty of useful tutorials on the web, and after a couple of days I have a more or less working solution I'm happy with, except for one minor obstacle. Long story short:
Among many issue fields, I need the friendly Assignee name, but some issues in my projects may be unassigned, which results in TWO VERY different kinds of JSON output:
Unassigned issue:
..."assignee":null,"updated"...
Assigned issue:
"assignee":{
"self":...
<Lots of NOT needed fields here>
...
},
"displayName":"Doe, John", <-- That's what I need, name only part
"active":...
<Lots of NOT needed fields here>
...
},
"updated"...
Well, I suppose that something like:
"assignee".*?"displayName":"(.*?)"|"assignee":(.*?),"updated"
will do the job by producing TWO possible Matches, but... Is there a way to write a RegEx where EITHER output results in SubMatches of ONE Match?
I'm a total newbie to RegEx, so sorry if my wording is clumsy due to incorrectly used terms. Anyway, I hope the sample part is more or less clear, and I'll be extremely grateful for any useful suggestions.

After an hour of trial and error on regex101, I ended up with the following RegEx:
"assignee":(null|.*?"displayName":"(.*?)","active")
It's probably ugly and could be improved, but it DOES the job, and it does NOT disturb the indexes of subsequent Matches in the collection, so the rest of the code keeps working as it is now.
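To see how the single pattern yields one Match for both cases, here is a quick check using Python's re module rather than VBA's RegExp object (the pattern syntax used here behaves the same in both). Group 1 is either null or the whole assigned block, and group 2 is the display name, empty when the issue is unassigned; in VBA these are SubMatches(0) and SubMatches(1). The sample strings are assumptions: single-line JSON, as the REST API returns it (. does not match newlines), and "active" directly following "displayName", which the pattern relies on. The assignee_name helper is hypothetical.

```python
import re

# The pattern from the answer above, unchanged.
ASSIGNEE = re.compile(r'"assignee":(null|.*?"displayName":"(.*?)","active")')

def assignee_name(issue_json):
    m = ASSIGNEE.search(issue_json)
    if m is None:
        return None              # no assignee field at all
    if m.group(1) == "null":
        return "Unassigned"      # group 2 did not participate in the match
    return m.group(2)            # the friendly display name
```

Both kinds of issue produce exactly one Match, so the Match indexes for the rest of the collection stay the same.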

Extract an HTML tag that contains a string in OpenRefine?

There is not much to add to the title. It's what I'm trying to do. Any suggestions?
I reviewed the docs on GitHub and googled extensively.
The best I got is:
value.parseHtml().select('p[contains('xyz')]')
It results in a syntax error.

The 'select' syntax is based on the selector syntax in jsoup (http://jsoup.org/cookbook/extracting-data/selector-syntax).
In this case I believe the syntax you need is:
value.parseHtml().select("p:contains(xyz)")
Owen

Perhaps you missed my writeup (and WARNING) on the wiki :) here?
https://github.com/OpenRefine/OpenRefine/wiki/StrippingHTML#extract-html-attributes-text-links-with-integrated-grel-jsoup-commands
WARNING: Make sure to use .toString() suffixes when needed to output strings into Refine cells while working with the built-in HTML GREL commands (the default output is org.jsoup.nodes objects). Otherwise you'll get a preview just fine in the Expression Editor, BUT no data shown in the Refine cells when you apply it!
BTW, how could we make the docs better, and where, so that someone doesn't miss this in the future?
I even gave folks a nice example in our docs that shows using .toString() :
https://github.com/OpenRefine/OpenRefine/wiki/GREL-Other-Functions#selectelement-e-string-s

Which is a better long-term URL design?

I like Stack Overflow's URLs - specifically the forms:
/questions/{Id}/{Title}
/users/{Id}/{Name}
It's great because as the title of the question changes, the search engines will key in to the new URL but all the old URLs will still work.
Jeff mentioned in one of the podcasts, at the time the Flair feature was being announced, that he regretted some design decisions he had made when it came to these forms. Specifically, he was troubled by his pseudo-verbs, as in:
1. /users/edit/{Id}
2. /posts/{Id}/edit
It was a bit unclear which of these verb forms he ended up preferring.
Which pattern do you prefer (1 or 2) and why?

I prefer pattern 2 for the simple reason that the URL reads better. Compare:
"I want to access the USERS EDIT resource, for this ID" versus
"I want to access the POSTS resource, with this ID and EDIT it"
If you forget the last part of each URL, then in the second URL you have a nice recovery plan.
Hi, get /users/edit... what? what do you want to edit? Error!
Hi, get /posts/id... oh you want the post with this ID hmm? Cool.
My 2 pennies!

My guess would be he preferred #2.
If you put the string first, it always has to be there. Otherwise you get ugly-looking URLs like:
/users//4534905
No matter what, you need the ID of the user, so this
/user/4534905/
ends up looking better. If you want pseudo-verbs, you can add them to the end.
/user/4534905/edit

Neither. Putting a numeric ID instead of English words in the URL is hardly search-engine friendly. You are better off utilizing titles, with spaces replaced by dashes and all lowercase. So for me the correct form is:
/question/how-do-i-bake-an-apple-pie
/user/frank-krueger

I prefer the 2nd option as well.
But I still believe that the resulting URLs are ugly because there's no meaning whatsoever in them.
That's why I tend to split the URL creation into two parts:
/posts/42
/posts/42-goodbye-and-thanks-for-all-the-fish
Both URLs refer to the same document; given the latter, only the ID is used in the internal query. This way I can offer somewhat meaningful URLs without bloating my queries.
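A minimal sketch of that ID-plus-slug scheme in Python. The helper names and the slug rule (lowercase, runs of anything other than a-z and 0-9 collapsed into one dash) are assumptions for illustration; the point is that only the leading number is parsed back out for the lookup, so /posts/42 and /posts/42-anything resolve to the same post.

```python
import re

def slugify(title):
    # Lowercase, collapse every run of non-alphanumerics into a single dash,
    # and trim dashes left over at either end.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def post_url(post_id, title):
    # The URL we hand out: ID first, readable slug second.
    return f"/posts/{post_id}-{slugify(title)}"

def post_id_from_path(path):
    # Only the numeric ID is used for the lookup; the slug is decoration.
    m = re.fullmatch(r"/posts/(\d+)(?:-[a-z0-9-]*)?", path)
    return int(m.group(1)) if m else None
```

Because the slug is ignored on lookup, old links keep working after a title edit; a server could optionally redirect to post_url() when the incoming slug is stale.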

I like number 2.
Also:
/questions/foo == All questions called "foo"
/questions/{id}/foo == A question called "foo"
/users/aiden == All users called aiden
/users/{id}/aiden == A user called aiden
/users/aiden?a=edit or /users/aiden/edit == Edit the list of users called Aiden?
/users/{id}/edit or /users/{id}?a=edit is better
/rss/users/aiden == An RSS update of users called aiden
/rss/users/{id} == An RSS feed of a user's activity
/rss/users/{id}/aiden == An RSS feed of Aiden's profile changes
Personally, I don't mind GET arguments, and I think that /x/y/z should refer to a mutable resource and GET/POST/PUT should act upon it.
My 2p

/question/how-do-i-bake-an-apple-pie
/question/how-do-i-bake-an-apple-pie-2
/question/how-do-i-bake-an-apple-pie-...
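The listing above shows the catch with title-only URLs: two questions with the same title need some suffix to stay distinct. A minimal sketch of that suffixing in Python, assuming the set of already-used slugs is available; unique_slug is a hypothetical helper. The appended counter ends up playing the role of an ID anyway.

```python
def unique_slug(slug, taken):
    # The first use of a title gets the bare slug; later ones get -2, -3, ...
    if slug not in taken:
        return slug
    n = 2
    while f"{slug}-{n}" in taken:
        n += 1
    return f"{slug}-{n}"
```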
