Semantic Mediawiki: Aggregation similar to GROUP BY, adding the Count - mediawiki

Little by little I am trying to learn Semantic MediaWiki, almost like in a tutorial. I managed to save info (including URI, titles and tags) for each element of a list using subobjects, and then to get the list of the tags.
This is the wiki page with the list of the tags: link
Now I'd like to further explore the articles related to each tag. For example, is it possible to list the articles having the tag x? I wonder if it would be a nice idea to create a Module to parse the output of the semantic query.

The best solution here is to use the Arrays extension.
Create an array containing all tags, and make it unique to get a "distinct list".
Then print the array and, inside the print loop, run an #ask query for each tag with the count format.
{{#arraydefine:tags| {{#ask:[[-Has subobject::{{FULLPAGENAME}}]] |?Tags#-=| mainlabel=-|limit = 1000}} |,|unique}}
{{#arrayprint:tags|, |####|<nowiki/>
[[####]] ({{#ask:[[Tags::####]]|format=count}})
}}
This code prints a link to the page named after each tag value, followed by the number of subobjects that hold this tag. The solution is not optimal, since it runs one independent query per tag, but you should not have performance issues unless your wiki has very high traffic.
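For readers who think in SQL terms, the aggregation the loop above performs can be sketched outside the wiki. This is only an illustration in Python, assuming hypothetical data in which each subobject carries a title and a list of tags; the names subobjects, count_tags and articles_with_tag are made up for the example and are not part of Semantic MediaWiki.

```python
from collections import Counter

# Hypothetical data: one entry per subobject, each with its list of tags.
subobjects = [
    {"title": "Article A", "tags": ["wiki", "semantic"]},
    {"title": "Article B", "tags": ["wiki"]},
    {"title": "Article C", "tags": ["semantic", "query"]},
]


def count_tags(items):
    """Return {tag: number of subobjects holding it}, like GROUP BY tag with COUNT(*)."""
    counts = Counter()
    for item in items:
        # set() counts each subobject at most once per tag,
        # matching a count of subobjects rather than of tag occurrences.
        counts.update(set(item["tags"]))
    return dict(counts)


def articles_with_tag(items, tag):
    """Return the titles of all subobjects that hold the given tag."""
    return [item["title"] for item in items if tag in item["tags"]]
```

Here count_tags plays the role of the per-tag count query, and articles_with_tag corresponds to listing the articles that have tag x.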
Nota bene: The best practice is to create a specific template for tag pages, one that lists all articles having the tag. With the Page Forms extension, you can create each page automatically with this template, simply by running the job queue.

Related

Algorithm to develop an article extractor

I have undertaken a project to extract the main content from any webpage. For example, given the URL of a news article, it should return only the article part. The first step is getting the source code of the given URL; there are many ways to do that. After getting the HTML of the page, I keep only the part inside the <body> tag, because the article will obviously be somewhere inside the body.
After this, I select each div element and check how much text it contains. At the end, I pick the div with the most text inside it.
Another way I am considering: for each <p> element, I check its parent. At the end, I select the div that has the most <p> elements as direct children. To understand it better, see this tree: Tree of an HTML
Now, I know these methods are basic, and that is why I am asking this question. I would like suggestions from the community. What approaches do you use?
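The second heuristic in the question (pick the div with the most direct <p> children) can be sketched with Python's standard html.parser module. This is a minimal sketch, not a robust extractor: the class name ParagraphCounter and the function best_div are made up for the example, and real pages will need more tolerant parsing than this.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be put on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ParagraphCounter(HTMLParser):
    """Counts, for every <div>, how many <p> elements are its direct children."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements as (tag, node_id)
        self.p_counts = {}   # node_id of a div -> number of direct <p> children
        self.div_attrs = {}  # node_id of a div -> its attributes as a dict
        self.next_id = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # A new <p> implicitly closes a <p> that was left open.
        if tag == "p" and self.stack and self.stack[-1][0] == "p":
            self.stack.pop()
        if tag == "p" and self.stack and self.stack[-1][0] == "div":
            self.p_counts[self.stack[-1][1]] += 1
        node_id = self.next_id
        self.next_id += 1
        if tag == "div":
            self.p_counts[node_id] = 0
            self.div_attrs[node_id] = dict(attrs)
        self.stack.append((tag, node_id))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def best_div(html_text):
    """Return (attributes, p_count) of the div with the most direct <p> children."""
    parser = ParagraphCounter()
    parser.feed(html_text)
    if not parser.p_counts:
        return None
    best = max(parser.p_counts, key=parser.p_counts.get)
    return parser.div_attrs[best], parser.p_counts[best]
```

Paragraphs nested in an inner div (an ad block, for instance) are counted for that inner div only, which is what "direct child" means in the question.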
I like the idea of implementing your own 'News' crawler...
A few suggestions:
Check the source ('Right Click' > 'Inspect' at chrome) of some popular sites (e.g. The New York Times); search for common html object names, ids or classes they use to identify the different blocks in the html; for instance: divs with 'story' or 'story-body' ids.
I would go with the word count, but also use a dictionary of common phrases, which are likely to appear in a news article.
I would search for the block within 'header' and 'footer', excluding comments section or advertisements (again, by searching the values of the object id or class names).
Start your crawling from the main page, it will probably have references to the sub pages or articles - once you have the reference (e.g. a header or article name), it will help you navigate in the sub page itself.
In any case, I suggest working with the Java jsoup library; it will make your life easier. Use it with its jQuery-like selectors.
Good luck.

String to HTML conversion so that page can read HTML tags

I'm currently working on a blog using Django and SQLite for the back end. In my setup, I stored my articles in the database in this sort of form:
<p> <strong>The Time/Money Tradeoff</strong> </p> <p> As we flesh out High Life, Low Price, you will notice that sometimes we will suggest deals and solutions that may cost slightly more than their alternatives. We won’t always suggest the cheapest laptop...
On the page itself, I have this code for where I use the session data:
<p>{{request.session.article.0.blog_article}}</p>
I had assumed that the web browser would be able to read the HTML tags. However, it prints on the page in that form, with the visible <p> tags and the like. I think this is because it's stored as a Unicode string in the database and is put onto the page between two quotation marks. If I paste the HTML code onto the page, the format looks like I wanted it to look, but I want it to be an automated process (tell Django which article ID I want, it plugs the elements of the page into the template and everything looks great).
How can I get the stored article in a form where the page can see the HTML tags?
By default, Django autoescapes all strings in the template, so when you render HTML code in the template, it shows up as literal HTML code. You can use the safe filter to turn this off for a given variable:
<p>{{request.session.article.0.blog_article|safe}}</p>
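To see why the tags show up literally, the effect of autoescaping can be imitated with html.escape from Python's standard library. This is only an illustration of the effect, not Django's implementation, and the function render_value is a made-up name for the example.

```python
import html


def render_value(value, safe=False):
    """Imitate what autoescaping does to one template variable."""
    # With safe=True the markup passes through untouched, as with the |safe filter.
    return value if safe else html.escape(value)


stored = "<p> <strong>The Time/Money Tradeoff</strong> </p>"
escaped = render_value(stored)        # what the page receives by default
raw = render_value(stored, safe=True) # what the page receives with |safe
```

The escaped version contains &lt; and &gt; entities, which the browser displays as visible angle brackets instead of interpreting them as tags. Since safe switches that protection off, it is best reserved for content you trust, such as articles you wrote yourself.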

Can Go capture a click event in an HTML document it is serving?

I am writing a program for managing an inventory. It serves up HTML based on records from a PostgreSQL database, or writes to the database using HTML forms.
Different functions (adding records, searching, etc.) are accessible using <a></a> tags or form submits, which in turn call functions using http.HandleFunc(), functions then generate queries, parse results and render these to html templates.
The search function renders query results to an html table. To keep the search results page ideally usable and uncluttered I intend to provide only the most relevant information there. However, since there are many more details stored in the database, I need a way to access that information too. In order to do that I wanted to have each table row clickable, displaying the details of the selected record in a status area at the bottom or side of the page for instance.
I could try to follow the pattern that works for running the other functions, that is use <a></a> tags and http.HandleFunc() to render new content but this isn't exactly what I want for a couple of reasons.
First: There should be no need to navigate away from the search result page to view the additional details; there are not so many details that a single record's full data should not be able to be rendered on the same page as the search results.
Second: I want the whole row clickable, not merely the text within a table cell, which is what the <a></a> tags get me.
Using the id returned from the database in an attribute, as in <div id="search-result-row-id-{{.ID}}"></div> I am able to work with individual records but I have yet to find a way to then capture a click in Go.
Before I run off and write this in JavaScript, does anyone know of a way to do this strictly in Go? I am not particularly averse to using the tried-and-true JS methods, but I am curious to see if it could be done without them.
does anyone know of a way to do this strictly in Go?
As others have indicated in the comments, no, Go cannot capture the event in the browser.
For that you will need to use some JavaScript to send to the server (where Go runs) the web request for more information.
You could also push all the required information to the browser when you first serve the page and hide/show it based on CSS/JavaScript event but again, that's just regular web development and nothing to do with Go.

Adding metadata to markdown text

I'm working on software creating annotations and would like my main data structure to be based around markdown.
I was thinking of working with an existing markdown editor, but hacking it so that certain tags, i.e. [annotation-id-001]Sample text.[/annotation-id-001] did not show up as rendered HTML; the above would output Sample text. in an HTML preview and link to a separate annotation with the ID 001.
My question is, is this the most efficient way to represent this kind of metadata inside of a markdown document? Also, if a user wants to legitimately use something like "[annotation-id-001]" as text inside of their document, I assume that I would have to make that string syntax illegal, correct?
I don't know which Markdown parser you use, but you can approach your problem from different angles:
first you can "hack" an existing parser to exclude your annotation tags from "classic" parsing and include them only in a certain mode
you can also use the internal "meta-data" information proposed by certain parsers (like MultiMarkdown or MarkdownExtended) and only write your annotations like meta-data with a reference to their final place in content
or, as mentioned by mb21, you can use simple links notation like [Sample text.](#annotation-id-001) or use footnotes like [Sample text.](^annotation-id-001) and put your annotations as footnotes.
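The first option (handling the annotation tags separately from the "classic" parsing) could look like a small pre-processing pass that runs before the Markdown parser. The following is a sketch in Python under the assumption that annotation IDs are numeric, as in the question's example; the function name render_annotations and the generated link format are made up for illustration.

```python
import re

# Matches [annotation-id-NNN]...[/annotation-id-NNN]; \1 forces the closing
# tag to carry the same ID as the opening tag.
ANNOTATION = re.compile(
    r"\[annotation-id-(\d+)\](.*?)\[/annotation-id-\1\]", re.DOTALL
)


def render_annotations(text):
    """Replace paired annotation tags with links to '#annotation-id-NNN'."""
    def to_link(match):
        ann_id, body = match.group(1), match.group(2)
        return '<a href="#annotation-id-{0}">{1}</a>'.format(ann_id, body)

    return ANNOTATION.sub(to_link, text)
```

Because only a matching opening and closing pair is rewritten, a lone "[annotation-id-001]" in ordinary text is left untouched, so the string does not have to be made illegal outright.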

Mediawiki 1.16: Template documentation example usage

I'm writing template documentation for a wiki and wanted to include a working example of the template. However, I wrote the template to auto-categorize various fields and the entire template itself is also auto-categorized.
This means if I simply call on the template, it will categorize the doc page...and because the actual template page transcludes the doc page, the template page will also be categorized.
Is there a way to prevent these categories from automatically kicking in?
Something like the following should do the trick. Wrap the categorization in your template inside a parserfunction:
{{#ifeq: {{NAMESPACE}} | Help || [[Category:Some_Category]] }}
This sets the category when the template is transcluded onto a page that is not in the "Help" namespace.
Another option is to allow a parameter such as demo to avoid including the category.
If you don't mind being slightly cryptic, you could do the category in the template as {{{cat|[[Category:Some_Category]]}}}; then specifying the parameter as {{my template|cat=}} will prevent the category inclusion.
I'm not sure if I understand the question completely (what is "auto-categorize various fields"?). I am assuming here that you want to show a template "in action" on a documentation page - without attaching some categories (those categories the documentation page usually attaches to articles using this template) to the documentation page.
So
<onlyinclude>[[Category:Some_Category]]</onlyinclude>
will not do the job - as the template is in fact included. Right?
Try passing a parameter categorize=false to the template to indicate that categories are not to be attached in this case:
{{#ifeq:{{{categorize|}}}|false||[[Category:Some_Category]]}}
The double pipe after "false" means: if(categorize==false) then (empty), else [[Category:Some_Category]] - i.e. it is an equivalent construction for if(NOT(categorize==false))...
Good luck and thanks for all the fish,
Achim