Semantic MediaWiki: aggregation similar to SQL GROUP BY in an #ask query - mediawiki

I've implemented a page with a long list of subobjects.
Every subobject contains one article (title + URL) and N tags. I'd like to group by tag and show the count of articles related to each tag.
Something like:
SELECT tag, count(distinct article)
GROUP BY tag
I found an answer, but it's very generic, and I'd also like to document the solution for other users with the same problem.

As you know from previous answers to this question, you cannot apply a "distinct" function within an SMW #ask query.
My preferred solution is to use the "Arrays" extension, which exposes PHP array manipulation functions in wiki code. Beyond producing a "distinct" list of values, it is an irreplaceable tool for handling semantic data coming out of queries.
You can create an array with the following parser function:
{{#arraydefine: *identifier* | *data* | *delimiter* | *parameters* }}
Identifier is the variable name you want.
Data is the array content; in an SMW context, you load it with the result of a query.
Delimiter specifies the delimiter used to split the data. It has to be consistent with the delimiter chosen in the #ask query.
Parameters is where the magic happens. You can set a "unique" parameter, reducing the data to unique values and thus emulating the "distinct" function.
In your case, you may do something like:
{{#arraydefine:tags
| {{#ask:[[-Has subobject::{{FULLPAGENAME}}]]
|?Tags#-=
| mainlabel=-
|limit = 1000
}}
|,
|unique
}}
Note that SMW #ask queries are, by default, limited to 50 results; adding "limit=" raises the maximum result size.
At this point you have defined an array called "tags" containing all the distinct values of the property.
You can use the #arrayprint function for any further processing or display.
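To get back to the original goal (a per-tag article count), one possible approach is to iterate over the distinct tags with #arrayprint and run a counting #ask for each one. This is only a sketch, assuming the subobjects carry the "Tags" property used in the query above; depending on your Arrays version, you may need to route each value through a small helper template instead of embedding the #ask directly:
{{#arrayprint: tags
| <br/>
| @@@@
| @@@@: {{#ask: [[-Has subobject::{{FULLPAGENAME}}]] [[Tags::@@@@]] | format=count }}
}}
This prints each distinct tag once, followed by the number of subobjects (articles) on the page that carry it.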

Related

Regular expression to pick a row in an html table containing desired text

Sorry, but uhrm, I'd like to use regexp (actually I'd use something else but I want to do the task within a Matlab function) to pick a single row containing desired keywords within an html table.
I am using Matlab calling function regexpi (case-insensitive version of regexp), which is akin to PHP regex from what I can tell.
Ok, here's a snippet from such an html table to parse:
<tr><td>blu</td><td>value</td></tr><tr><td>findme</td><td>value</td></tr><tr><td>ble</td><td>value</td></tr>
The desired row to pick contains the word "findme".
(added:) The content of other cells and tags in the table could be anything (here "bla" is a dummy value) - the important part is the presence of "findme" and that a single row (not more) is caught (or all rows containing "findme", but such behaviour is not expected). Any paired name/value table in a Wikipedia page is a good example.
I tinkered with https://regex101.com/ using whatever I could dig up in the Matlab documentation (forward/backward looking, combinations of :, > and ?), but have failed to identify a pattern that will pick just the right row (or all those that contain the keyword "findme"). The following pattern, for instance, will pick the text but not the entire row: <tr[^>]*>[^>]*.*?(findme).*?<\/td
Pattern <tr[^>]*>(.*?findme.*?)<\/tr[^>]*> picks the row but is too greedy and picks preceding rows.
Note that the original task I had set out was to capture entire tables and then parse these, but the Matlab regexp-powered function I found for the task had trouble with nested tables (or I had trouble implementing it for the task).
The question is how to return a row containing desired keywords from an HTML table, programmatically, within a Matlab function (without calling an external program). A bonus question is how to solve the nested table issue, but maybe that's another question.
I suggest you split up the string with strsplit and use contains for the filtering, which is a lot more readable and maintainable than a regex pattern:
htmlString = ['<tr><td>blu</td><td>value</td></tr><tr><td><a ',...
'href="bla">findme</a></td><td>value</td></tr><tr><td><a ',...
'href="ble">ble</a></td><td>value</td></tr>'];
keyword = 'findme';
splitStrings = strsplit(htmlString,'<tr>');
desiredRow = ['<tr>' splitStrings{contains(splitStrings,keyword)}]
The output is:
<tr><td><a href="bla">findme</a></td><td>value</td></tr>
Alternatively you may also combine extractBetween and contains:
allRows = extractBetween(htmlString,'<tr>','</tr>');
desiredRow = ['<tr>' allRows{contains(allRows,keyword)} '</tr>']
If you must use regex:
regexp(htmlString,['<tr><td>[^>]+>' keyword '.*?<\/tr>'],'match')
Try this
%<td>(.*?)%sg
https://regex101.com/r/0Xq0mO/1

How to avoid infinite composite indexes in Firestore

I'm building a social media app, and my posts have 'tags' (which are practically infinite, as there are more than 200). I want my users to filter by both tag and date, for example:
myRef.where("tag", "==", "tagName").orderBy('date', 'asc')
BUT... I have a practically infinite number of tag names, which I don't know how to handle.
Should I create a custom map with sections of 1M size?
Should I create a custom ID with the data encoded in it?
How will I be able to mix ascending date ordering with these queries, or combine two or more filter types together?
The query you have requires an index on tag + date, not on tagName + date.
But if you want to keep a list of tags for each document, you'll want to store those in an array, and then use array-contains to check whether the document has a certain tag. To see if tagName exists in the array of String values tag, you'd query for:
myRef.where("tag", "array-contains", "tagName").orderBy('date', 'asc')
For more on this, see Better Arrays in Cloud Firestore!
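To make the suggested structure concrete, here is a minimal sketch (the collection name, sample values, and web v8-style SDK calls are illustrative, not from the original post):
// Each post stores all of its tags in a single array field.
const post = {
  text: "Hello world",
  tag: ["sports", "news"],                       // array of tag names
  date: firebase.firestore.FieldValue.serverTimestamp()
};
db.collection("posts").add(post);

// A single composite index on tag (array-contains) + date serves every tag value,
// so the number of distinct tag names no longer matters.
db.collection("posts")
  .where("tag", "array-contains", "sports")
  .orderBy("date", "asc")
  .get();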

Semantic MediaWiki - Passing a variable to a template

I have the result of a semantic query. For one of the properties, a comma separated list, I want to separate each item and pass it as a parameter to a template. However, I am struggling to find a way to do this.
For example;
Query:
{{#ask: [[Category:Something]] [[Has title::Somethingelse]]
| mainlabel=-
| ?Has property
| link=none
| format=template
| template=plainText
}}
The plainText template will receive the result, which is a comma-separated list. Now, from the plainText template, I would like to split the comma-separated list and pass each value as a parameter into another template.
I have tried using {{#arraydefine:key|values|delimiter|options}}, but when I pass {{#arrayindex:key|0}} to the template, the value is not passed; the whole array is passed, separated by 0. I have also tried using {{#vardefine: etc., but this also does not pass the variable.
My question boils down to, how to pass a variable to a template?
Thanks,
The separation needs to be done in the template.
If you use anonymous args like in http://semantic-mediawiki.org/wiki/Template:Query_output_demo, your params can be fetched with defaults like this:
{{{1|param1default}}} {{{2|param2default}}} ...
Now, one of your params is a comma-separated list. You might want to use the #explode parser function to get at the different parts of the CSV. Let's assume the second parameter holds your CSV; then:
{{#explode:{{{2}}}|,|0}}
{{#explode:{{{2}}}|,|1}}
...
will provide the fields.
For this to work you need the extension https://www.mediawiki.org/wiki/Help:Extension:ParserFunctions##explode and to enable it according to the instructions there.
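Putting it together, a minimal sketch of what the body of the plainText template could look like (the receiving template name MyOtherTemplate and its parameter names are illustrative; it keeps the assumption above that the CSV arrives as the second anonymous parameter):
{{MyOtherTemplate
| first = {{#explode:{{{2|}}}|,|0}}
| second = {{#explode:{{{2|}}}|,|1}}
}}
Each exploded part is now passed to the other template as an ordinary named parameter, which is effectively how you "pass a variable to a template" here.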

Find column values that are a prefix of a given string

I have a database table that contains URLs in a column. I want to show certain data depending on what page the user is on, defaulting to a 'parent' page if there is no direct match. How can I find the rows whose column value is a leading part (prefix) of the submitted URL?
E.g. for www.example.com/foo/bar/baz/here.html I would expect to see (after sorting on the length of the column value):
www.example.com/foo/bar/baz/here.html
www.example.com/foo/bar/baz
www.example.com/foo/bar
www.example.com/foo
www.example.com
if all those URLs are in the table of course.
Is there a built-in function, or would I need to create a procedure? Googling kept getting me to LIKE and REGEXP, which is not what I need. I figured that a single query would be much more efficient than chopping up the URL and making multiple queries (the URLs could potentially contain many path components).
Simply turn the LIKE operator around:
SELECT * FROM urls WHERE "www.example.com/foo/bar/baz/here.html" LIKE CONCAT(url, "%");
http://sqlfiddle.com/#!2/ef6ee/1
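To also get the longest (most specific) match first, as in the expected output above, you can order by the length of the stored value; a sketch using the same urls table and url column as in the query above:
SELECT * FROM urls WHERE "www.example.com/foo/bar/baz/here.html" LIKE CONCAT(url, "%") ORDER BY CHAR_LENGTH(url) DESC;
Taking only the first row of that result (e.g. with LIMIT 1) gives the closest matching page.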

MySQL - get data from custom field with read-only access to db

I have a text field with data, something like:
[{"id":10001,"timeStarted":1355729600733,"projectId":10002,"issueId":"29732,","userName":"tester","assignee":"test","status":"STARTED","shared":True,"name":"Session 4","projectName":"IDS","assigneeDisplayName":"First1 Last1"},
{"id":10002,"timeStarted":1358354188010,"projectId":10002,"issueId":"","userName":"tester","assignee":"test","status":"CREATED","shared":True,"name":"asdf98798","projectName":"IDS","assigneeDisplayName":"First Last"}]
but with many more rows (there may be 30-40 of them), and there may be 2 more different statuses (4 in total).
Is it possible to extract some data from this with only read-only access to the DB, using nothing but a MySQL query?
For example, to count the number of items with status "STARTED" and with status "CREATED".
Additional conditions may apply, e.g. where id is in a given interval.
Assuming you're using PHP, first you're better off correcting those unrecognized booleans: the data has True where JSON requires true for it to be evaluated right.
$jsStr = preg_replace_callback(
    '~(?<=[,{[])(".+?"\s*:\s*)(true|false)(?=\s*[,}\]])~i',
    function ($m) { return $m[1] . strtolower($m[2]); },
    $jsStr
);
Then to be able to process it you want to use the json_decode() function.
$parsed = json_decode($jsStr);
// see the result if you like:
// print_r($parsed);
Ultimately, if you want to extract some specific information on the client side (using JavaScript), you can use the Array filter() function, or a loop if you're not using jQuery. Otherwise you can use the jQuery filter() function with the necessary conditions.
If you want to do this in PHP, once the string has been decoded from JSON you can apply the same kind of filtering with PHP's array functions.
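For example, here is a short sketch of counting items per status with the $parsed result from above (the status names come from the sample data; without a second json_decode() argument, $parsed holds stdClass objects):
$counts = array();
foreach ($parsed as $item) {
    $status = $item->status;   // e.g. "STARTED" or "CREATED"
    $counts[$status] = isset($counts[$status]) ? $counts[$status] + 1 : 1;
}
print_r($counts);              // e.g. Array ( [STARTED] => 1 [CREATED] => 1 )

// Additional conditions, e.g. keep only items whose id is in a given interval:
$inRange = array_filter($parsed, function ($item) {
    return $item->id >= 10001 && $item->id <= 10002;
});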