MkDocs not taking the metadata (front matter) into consideration for searching?

The structure of my Markdown file is:
---
title: Mango
subtitle: King of Fruits
tags: [fruit, sweet, tasty]
---
Mango is a very tasty fruit.
## Juice
Juice can be made from mango
When I search the generated site for Mango, I get no results; when I search for Juice, I do.
My understanding is that search looks at all H1 tags, H2 tags, and so on. I want search to look at the metadata title, subtitle, and tags first, and then at H1, H2, ... and the rest of the content. Is there a configuration setting in MkDocs that accomplishes this, or is a custom solution available?

As of this writing, using metadata such as the title and subtitle from your example in search is implemented neither in MkDocs itself nor in any of the available custom themes.
This topic has repeatedly caused heated discussion between MkDocs maintainers and theme maintainers, as each side considers the feature the other's responsibility to implement.
More can be found here: https://github.com/mkdocs/mkdocs/issues/1828.
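Until that changes, one workaround, if you can add a little Python to the build, is to copy the front matter into the page body so that search indexes it like any other text. This is a minimal sketch, not an official feature: it assumes MkDocs 1.4 or later (which added the hooks setting in mkdocs.yml) and uses a hypothetical hook file name. It relies on MkDocs exposing the parsed front matter as page.meta when on_page_markdown runs. As a side effect, the title, subtitle, and tags also become visible on the rendered page.

```python
# hooks/meta_to_text.py (hypothetical file name), enabled in mkdocs.yml with:
#   hooks:
#     - hooks/meta_to_text.py
# Assumes MkDocs >= 1.4, which calls on_page_markdown after the front matter
# has been stripped from the source and parsed into page.meta.
from types import SimpleNamespace


def on_page_markdown(markdown, page, **kwargs):
    """Prepend title, subtitle and tags from the front matter as page text."""
    meta = getattr(page, "meta", None) or {}
    header = []

    title = meta.get("title")
    # Only add an H1 if the body does not already start with one.
    if title and not markdown.lstrip().startswith("# "):
        header.append(f"# {title}")

    subtitle = meta.get("subtitle")
    if subtitle:
        header.append(f"*{subtitle}*")

    tags = meta.get("tags") or []
    if isinstance(tags, str):
        tags = [tags]  # allow "tags: fruit" as well as a list
    if tags:
        header.append("Tags: " + ", ".join(str(tag) for tag in tags))

    if not header:
        return markdown  # nothing in the front matter to copy
    return "\n\n".join(header) + "\n\n" + markdown


if __name__ == "__main__":
    # Stand-in for the MkDocs Page object, using the front matter from the question.
    page = SimpleNamespace(meta={"title": "Mango",
                                 "subtitle": "King of Fruits",
                                 "tags": ["fruit", "sweet", "tasty"]})
    print(on_page_markdown("Mango is a very tasty fruit.\n", page=page))
```

The H1 check keeps pages that already have a heading from getting a second one. Drop the subtitle or tags lines if you only want some fields indexed.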

Related

How to find the Wikipedia links in infobox and other templates using SQL dumps

I want to extract the pages linked from the infobox and other templates on a page.
E.g., from this page:
https://en.wikipedia.org/wiki/DNA
I want to extract all of the links in the infobox, like "Genetics", "Introduction to Genetics", etc.
I want to do this using the SQL dumps, ideally without parsing the XML of whole pages, and without using the API.
I could not find a way.
The pagelinks table does include the links from infoboxes, but I cannot find a way to tell them apart from the page's other links.
I thought the templatelinks table might have that information, but it does not: I could not find the page IDs of the links inside infoboxes there.
Where is this information stored?
Or which tables should I look at?
I consulted previous questions:
where can I find the infobox templates used in wiki?
and the MediaWiki reference:
https://www.mediawiki.org/wiki/Manual:Templatelinks_table#Schema_summary
but could not find a solution.
That is a sidebar rather than an infobox: https://en.wikipedia.org/wiki/Template:Genetics_sidebar
I don't think there's a way of doing it other than parsing the content of the template to extract the links or using the API: e.g. https://en.wikipedia.org/w/api.php?action=query&prop=links&titles=Template:Genetics%20sidebar&pllimit=100&plnamespace=0
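Querying pagelinks directly should also work, but note that pl_title and pl_namespace describe the link target, not the page the link is on. A query that filters on pl_title = 'Genetics_sidebar' (as in https://quarry.wmcloud.org/query/71442) therefore looks for links to a main-namespace page called Genetics_sidebar and returns nothing. To list the links on the template, match the linking page through pl_from instead:
SELECT pl_namespace, pl_title
FROM pagelinks
JOIN page ON page_id = pl_from
WHERE page_namespace = 10
AND page_title = 'Genetics_sidebar'
AND pl_namespace = 0
This uses the classic pagelinks columns. Newer schema versions move the target into the linktarget table (referenced by pl_target_id), so the join changes accordingly.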
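The other option mentioned above, parsing the template's wikitext yourself (for example from the pages-articles XML dump, or from index.php?title=Template:Genetics_sidebar&action=raw), could start like this. This is a minimal sketch. The regular expression only handles plain [[Target]], [[Target|label]] and [[Target#Section|label]] links written directly in the wikitext, not links generated by nested templates or parser functions. The set of skipped namespace prefixes is an assumption.

```python
import re

# [[Target]], [[Target|label]] or [[Target#Section|label]]; captures Target.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Namespace prefixes that are not article links (assumption; extend as needed).
SKIP_PREFIXES = {"category", "file", "image"}


def extract_links(wikitext):
    """Return link targets from wikitext in order of first appearance."""
    seen = []
    for match in LINK_RE.finditer(wikitext):
        target = match.group(1).strip()
        prefix = target.split(":", 1)[0].lower() if ":" in target else ""
        if prefix in SKIP_PREFIXES or target in seen:
            continue  # skip category/file links and duplicates
        seen.append(target)
    return seen
```

For example, on "{{Sidebar|list1=[[Genetics]] - [[Gene#History|history]] [[Category:Genetics]]}}" it returns Genetics and Gene and skips the category link.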

Get an article summary from the MediaWiki API

I am looking for a MediaWiki API call that returns a short description for any query string. For example, if I search for Nicolas Cage, it should return a short description of him.
I tried http://en.wikipedia.org/w/api.php?%20format=json&action=query&titles=Nicolas%20Cage&prop=revisions&rvprop=content
I am not sure whether prop=revisions is right. My intention is to get a short description from the latest version of the page.
I also need an API call that returns the link to the Wikipedia page (web/mobile) for the query string, i.e. for Nicolas Cage, http://en.wikipedia.org/wiki/Nicolas_cage should be returned.
There is no such thing as a page summary in MediaWiki by default, but you can get the lead section of a page (section 0, everything before the first heading) like this: http://en.wikipedia.org/w/api.php?action=parse&page=Nicolas_Cage&prop=text&section=0
If the wiki has the PageSummaries extension installed, you can use it to get exactly what you are asking for (as in the example on the extension's description page).
To find pages matching a string, use the OpenSearch function, like this: http://en.wikipedia.org/w/api.php?action=opensearch&search=Nicolas%20cage&namespace=0
Edit: @Bergi points out in the comments that OpenSearch also returns a summary of the page. I had somehow missed that.
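For the second part of the question (getting the page URL), the OpenSearch response already contains it. This is a minimal Python sketch using only the standard library. It assumes the usual four-element OpenSearch JSON array [query, titles, descriptions, URLs]. The User-Agent string is a placeholder you should replace, since Wikimedia asks API clients to identify themselves. Depending on the wiki's configuration, the description entries may be empty.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"
# Placeholder User-Agent; replace with your own tool name and contact.
HEADERS = {"User-Agent": "opensearch-sketch/0.1 (you@example.org)"}


def opensearch_url(query, limit=1):
    """Build an action=opensearch request restricted to the main namespace."""
    params = {"action": "opensearch", "search": query,
              "namespace": 0, "limit": limit, "format": "json"}
    return API + "?" + urlencode(params)


def parse_opensearch(payload):
    """Turn [query, titles, descriptions, urls] into one dict per result."""
    _query, titles, descriptions, urls = payload[:4]
    return [{"title": t, "description": d, "url": u}
            for t, d, u in zip(titles, descriptions, urls)]


def search(query, limit=1):
    """Fetch and parse OpenSearch results (performs a network request)."""
    with urlopen(Request(opensearch_url(query, limit), headers=HEADERS)) as resp:
        return parse_opensearch(json.load(resp))


if __name__ == "__main__":
    for hit in search("Nicolas Cage"):
        print(hit["title"], hit["url"])
```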
Say you want the summary for the search string Nicolas Cage.
Step 1. Get the page ID: "https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=Nicolas%20Cage&format=json&srlimit=1"
Step 2. Use this page ID to get section 0 of the page:
"https://en.wikipedia.org/w/api.php?action=parse&section=0&pageid=21111&prop=text&format=json"
Step 3. Parse the result as required.
Step 3 in Python: use BeautifulSoup to select the target tags; get_text() then returns plain text.
Use rvprop to get the latest revision; for more, see the MediaWiki documentation.
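Steps 1 to 3 could be chained like this. This is a minimal sketch. It uses the standard library's html.parser in place of BeautifulSoup, so it has no third-party dependency. It keeps only the text of p elements in the lead section, dropping sup reference markers and inline style blocks. It assumes the default JSON layout, where the parsed HTML sits under parse.text["*"]. The User-Agent is a placeholder.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"
# Placeholder User-Agent; replace with your own tool name and contact.
HEADERS = {"User-Agent": "summary-sketch/0.1 (you@example.org)"}


def get_json(params):
    """GET the API with the given parameters and decode the JSON reply."""
    req = Request(API + "?" + urlencode(params), headers=HEADERS)
    with urlopen(req) as resp:
        return json.load(resp)


def first_page_id(search_json):
    """Step 1: list=search puts its hits under query.search."""
    return search_json["query"]["search"][0]["pageid"]


class ParagraphText(HTMLParser):
    """Step 3: collect the text of <p> elements, skipping <sup> and <style>."""

    def __init__(self):
        super().__init__()
        self.in_p = 0   # depth of open <p> elements
        self.skip = 0   # depth of open <sup>/<style> elements
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p += 1
            self.paragraphs.append("")
        elif tag in ("sup", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p -= 1
        elif tag in ("sup", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if self.in_p and not self.skip:
            self.paragraphs[-1] += data


def lead_text(parse_json):
    """Step 3: plain text of the lead, from action=parse&prop=text output."""
    parser = ParagraphText()
    parser.feed(parse_json["parse"]["text"]["*"])
    parser.close()
    return "\n".join(p.strip() for p in parser.paragraphs if p.strip())


def summary(query):
    """Steps 1-3 end to end (performs two network requests)."""
    hits = get_json({"action": "query", "list": "search", "srsearch": query,
                     "srlimit": 1, "format": "json"})
    parsed = get_json({"action": "parse", "pageid": first_page_id(hits),
                       "section": 0, "prop": "text", "format": "json"})
    return lead_text(parsed)
```

Skipping table cells means infobox text is dropped, which is usually what you want for a short description.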
Alternative solution:
Step 1. Get the page title as in step 1 above.
Step 2. Use the title as follows: https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Nicolas%20Cage
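The alternative solution could look like this in Python. This is a minimal sketch. It assumes the wiki has the TextExtracts extension (which provides prop=extracts, as on Wikipedia) and the default JSON layout, where query.pages is keyed by page ID. The User-Agent is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"
# Placeholder User-Agent; replace with your own tool name and contact.
HEADERS = {"User-Agent": "extract-sketch/0.1 (you@example.org)"}


def extract_params(title):
    """Parameters of the URL above; empty exintro/explaintext switch the flags on."""
    return {"format": "json", "action": "query", "prop": "extracts",
            "exintro": "", "explaintext": "", "titles": title}


def intro_from_response(data):
    """query.pages is keyed by page ID; take the (only) page's extract."""
    page = next(iter(data["query"]["pages"].values()))
    return page.get("extract", "")


def fetch_intro(title):
    """Step 2 of the alternative solution (performs a network request)."""
    url = API + "?" + urlencode(extract_params(title))
    with urlopen(Request(url, headers=HEADERS)) as resp:
        return intro_from_response(json.load(resp))
```

MediaWiki treats a boolean parameter as set whenever it is present, so the empty values reproduce exintro=&explaintext= from the URL above.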

MediaWiki Table of Contents (ToC) FileTree

I would like to turn the standard MediaWiki ToC into a FileTree structure,
where each section can be expanded and collapsed.
I want to support an unlimited number of levels.
"Expand all" and "Collapse all" links would also be nice.
The name of a MediaWiki extension and a list of URLs (sites)
that implement this type of TreeList for MediaWiki's ToC,
so I can read the code, would be very helpful.
Here are example pages that show FileTree structures,
but I don't think they apply to the MediaWiki ToC.
http://commons.wikimedia.org/wiki/Template:Category_tree_all
The following is 5 levels deep.
Films of Australia
  The Adventures of Priscilla, Queen of the Desert
    Stephan Elliott
      A Few Best Men
        A Few Best Men premiere in Sydney
http://risdpedia.net/index.php/Category:Materials
The following is 3 levels deep.
Category:Screen Printing
  Category:Screen Printing Ink
    Category:Fabric Screen Printing Ink
http://wiki.team-mediaportal.com/Wiki_Help/4_Contribute_to_Wiki/Collapsible_Lists%2F%2FTrees
http://test.wikipedia.org/wiki/User%3aKrinkle/CollapsingTestpageMw
Here are MediaWiki extensions that may do the trick,
but I can't seem to make them work:
http://www.mediawiki.org/wiki/Extension_Matrix
http://www.mediawiki.org/wiki/Extension:TocTree
http://www.mediawiki.org/wiki/Extension:Treeview
http://www.mediawiki.org/wiki/Extension:Semantic_TreeView
http://www.mediawiki.org/wiki/Extension:TreeAndMenu
http://www.mediawiki.org/wiki/Manual%3aTag_extensions
Here are some resources that mention this type of TreeList for MediaWiki,
but they have no answers yet:
https://stackoverflow.com/questions/20490034/treeview-not-working-on-sidebar-in-mediawiki
I figured out where User Preferences > Misc is located:
Special:Preferences#mw-prefsection-misc

Full URLs of images of a given page on Wikipedia (only those I see on the page)

I want to extract the full URLs of all images on the "Google" page on Wikipedia.
I have tried:
http://en.wikipedia.org/w/api.php?action=query&titles=Google&generator=images&gimlimit=10&prop=imageinfo&iiprop=url|dimensions|mime&format=json
but this way I also get images unrelated to Google, such as:
http://upload.wikimedia.org/wikipedia/en/a/a4/Flag_of_the_United_States.svg
http://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg
http://upload.wikimedia.org/wikipedia/commons/f/fe/Crystal_Clear_app_browser.png
How can I extract only the images that I actually see on the Google page?
1. Retrieve the page source: https://en.wikipedia.org/w/index.php?title=Google&action=raw
2. Scan it for substrings like [[File:Google web search.png|thumb|left|On February 14, 2012, Google updated its homepage with a minor twist. There are no red lines above the options in the black bar, and there is a tab space before the "+You". The sign-in button has also changed, it is no longer in the black bar, instead under it as a button.]]
3. Ask the API for all pictures on the page: http://en.wikipedia.org/w/api.php?action=query&titles=Google&generator=images&gimlimit=10&prop=imageinfo&iiprop=url|dimensions|mime&format=json
4. Filter out all URLs except those that match the picture names found in step 2.
Steps 2 and 4 need more explanation.
#2. The regexp /\b(File|Image):[^]|\n\r]+/ should be enough. In Ruby regexps, \b denotes a word boundary, which might be unsupported in your language of choice. The regexp I proposed matches every case I can think of: [[File:something.jpg]], gallery tags: <gallery>\nFile:one.jpg\nFile:two.jpg\n</gallery>, and templates: {{Infobox|pic = File:something.jpg}}. However, it won't match filenames that contain ]. Square brackets are not allowed in MediaWiki page titles, so this is not a problem in practice.
If you want to match only constructs like [[File:something.jpg|thumb|description]], the following regexp will work better: /\[\[(File|Image):[^]|]+/
#4. I'd strip every character matching /[^A-Za-z0-9]/ from the names on both sides before comparing them. It's easier than escaping them and, in most cases, enough.
Icons are most often included via templates, whereas pictures related to the article's subject are most often included directly ([[File:…]]). There are exceptions, though: for example, some articles include pictures with the {{Gallery}} template, and there is also the <gallery> tag, which introduces special syntax for galleries. You'll have to tune my solution to your needs, and even then it won't be perfect, but it should be good enough.
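Steps 2 and 4 could be combined like this in Python. This is a sketch under a few assumptions. It ports the first regexp above, with the ] escaped for readability. It normalizes names as described in #4. It assumes the API response has the usual query.pages layout, with the file URL in imageinfo[0].url. Fetching the raw wikitext (step 1) and the API JSON (step 3) is left to any HTTP client.

```python
import re

# Step 2: Python port of /\b(File|Image):[^]|\n\r]+/ (the ] escaped here).
FILE_RE = re.compile(r"\b(?:File|Image):([^\]|\n\r]+)")


def normalize(name):
    """Step 4: keep only ASCII letters and digits before comparing names."""
    return re.sub(r"[^A-Za-z0-9]", "", name)


def names_in_wikitext(wikitext):
    """Normalized file names referenced in the raw page source."""
    return {normalize(m.group(1)) for m in FILE_RE.finditer(wikitext)}


def visible_image_urls(wikitext, api_json):
    """Keep only API results whose file name also appears in the wikitext."""
    wanted = names_in_wikitext(wikitext)
    urls = []
    for page in api_json["query"]["pages"].values():
        # Titles look like "File:Google web search.png"; drop the prefix.
        name = page["title"].split(":", 1)[-1]
        info = page.get("imageinfo")
        if info and normalize(name) in wanted:
            urls.append(info[0]["url"])
    return urls
```

Because both sides go through the same normalize(), spaces in the wikitext and underscores or punctuation in the API titles do not prevent a match.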

How to set up content in other languages?

I would like to allow users to create content in their own languages. I am running a single MediaWiki instance, so I cannot set up one language per install.
I would like to structure the pages as follows, with the language code appended to each language version of a page:
myWiki/SomePageContent
myWiki/SomePageContent/de
myWiki/SomePageContent/fr
How can I ensure users follow this structure? Is there a MediaWiki setting that can help with this? I have no idea what the best practices are.
Best practice is to use a separate MediaWiki instance for each language and connect them with interwiki links. This way, users stay within one language and everything works as you'd expect: if you're in the English instance, a link to [[Foo]] stays in English, and only a link to [[fr:Foo]] goes to the French Foo. It's not particularly hard to set this up even with a single server and a single database; see http://www.mediawiki.org/wiki/Manual:Wiki_family. The way this appears to the user is configurable: e.g. Wikipedia uses http://en.wikipedia.org/wiki/Paris, while Wikitravel uses http://wikitravel.org/en/Paris.
If this is not possible for whatever reason, the next best thing is to set up a separate namespace for each language (e.g. "de" or "fr"), so that you can at least restrict searches to one (or more) languages. However, users of languages other than the 'main' language still have to type the language code in front of every article name and link by hand, so it's not nearly as user-friendly. See http://www.mediawiki.org/wiki/Manual:Namespace.
An easier approach for smaller wikis is a simple template. It may not be as efficient as an extension or a wiki family (which is a lot of work), but it is quite fast to set up.
Create a page under Template:Otherlang with the following code:
{{otherlang
|ru=Template:Otherlang:ru
}}
This template adds available translations for the page to the top through the use of flags.
To prevent issues, this template must be placed '''at the very beginning of a page'''.
Tip! When contributing a new translation to a document that already has other translations, please carry over the existing translations to the otherlang template of your contributed page. This way all multilingual pages are linked.
== Syntax ==
{{otherlang
| noborder=true (OPTIONAL)
| title=localized page display title
| lang=page:lang
| lang2=page:lang2
| etc...
}}
Warning! Do not include the language of the current page. This will only confuse readers.
=== Example ===
On a page called [[Template:Otherlang]]:
{{otherlang
| title=Template:Otherlang
| ru=Category:Programming:ru
}}
Note that:
* The language "en" is not included, as it is the language of the page that the template is being used on.
* title is assigned the translated name of the page, and will appear as the display title (heading) for the page. This can replace the existing {{wrongtitle}} and {{DISPLAYTITLE}} templates currently in common use.
* The English page has no suffix.
== Available Languages ==
{| class="table table-bordered" border="2" cellpadding="7"
! Language
! Syntax
! Result
|- id="en"
|English
|en=Page_name
|[[File:En.png]]
|- id="ru"
|Russian
|ru=Page_name:ru
|[[File:Ru.png]]
|}
{{#if: {{{title|}}} | {{DISPLAYTITLE:{{{title}}}}} }}{{#if: {{{en|}}} | '''[[File:En.png|alt=English|link={{{en}}}]]''' }} {{#if: {{{ru|}}} | [[File:Ru.png|alt=Русский|link={{{ru}}}]] }}
Then, in each English article, paste the following code to show a flag for each language the article is available in.
{{otherlang
| title=Tutorials/Galacticraft Getting Started Guide
| ru=Tutorials/Galacticraft_Getting_Started_Guide/ru
}}
An example of this can be found here. If you click on the Russian flag to the right you will find a Russian translation of the article.
For anyone interested, you might want to try this:
http://www.mediawiki.org/wiki/Help:Extension:Translate
When this page
myWiki/SomePageContent
is translated into German, the extension creates the translated page like this:
myWiki/SomePageContent/de
and so on.
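If translations are created by hand (for example with the template approach above, rather than the Translate extension), a small script can at least check that titles follow the Page/xx convention from the question. This is a minimal sketch. It assumes you already have the list of page titles, for example from the API's list=allpages. It also assumes a hand-maintained set of language codes; both the code set and the two-or-three-letter suffix rule are hypothetical and should be adjusted for your wiki. Legitimate subpages with short lowercase names (say, Foo/faq) will be reported as unknown languages.

```python
import re

# Hypothetical set of language codes allowed as suffixes; adjust to your wiki.
LANGS = {"de", "fr", "ru"}
# "Base/xx" or "Base/xxx" with a lowercase language-like suffix.
SUBPAGE_RE = re.compile(r"^(?P<base>.+)/(?P<lang>[a-z]{2,3})$")


def check_translation_titles(titles):
    """Group Page/xx titles under their base page and report problems."""
    existing = set(titles)
    translations = {}
    problems = []
    for title in titles:
        match = SUBPAGE_RE.match(title)
        if not match:
            continue  # not a language subpage
        base, lang = match.group("base"), match.group("lang")
        if lang not in LANGS:
            problems.append(f"{title}: unknown language code '{lang}'")
        elif base not in existing:
            problems.append(f"{title}: base page '{base}' does not exist")
        else:
            translations.setdefault(base, []).append(lang)
    return translations, problems
```

Running it periodically over the full title list gives a rough report of translations that drifted from the naming scheme.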