How should I handle autolinking in wiki page content?

How should I handle autolinking in wiki page content? - html

What I mean by autolinking is the process by which wiki links inlined in page content are generated into either a hyperlink to the page (if it does exist) or a create link (if the page doesn't exist).
With the parser I am using, this is a two step process - first, the page content is parsed and all of the links to wiki pages from the source markup are extracted. Then, I feed an array of the existing pages back to the parser, before the final HTML markup is generated.
What is the best way to handle this process? It seems as if I need to keep a cached list of every single page on the site, rather than having to extract the index of page titles each time. Or is it better to check each link separately to see if it exists? This might result in a lot of database lookups if the list wasn't cached. Would this still be viable for a larger wiki site with thousands of pages?

In my own wiki I check all the links (without caching), but my wiki is only used by a few people internally. You should benchmark stuff like this.

In my own wiki system my caching system is pretty simple - when the page is updated it checks links to make sure they are valid and applies the correct formatting/location for those that aren't. The cached page is saved as a HTML page in my cache root.
Pages that are marked as 'not created' during the page update are inserted into the a table of the database that holds the page and then a csv of pages that link to it.
When someone creates that page it initiates a scan to look through each linking page and re-caches the linking page with the correct link and formatting.
If you weren't interested in highlighting non-created pages however you could just have a checker to see if the page is created when you attempt to access it - and if not redirect to the creation page. Then just link to pages as normal in other articles.

I tried to do this once and it was a nightmare! My solution was a nasty loop in a SQL procedure, and I don't recommend it.
One thing that gave me trouble was deciding what link to use on a multi-word phrase. Say you had some text saying "I am using Stack Overflow" and your wiki had 3 pages called "stack", "overflow" and "stack overflow"....which part of your phrase gets linked to where? It will happen!

My idea would be to query the titles like SELECT title FROM articles and simply check if each wikilink is in that array of strings. If it is you link to the page, if not, you link to the create page.

In a personal project I made with Sinatra (link text) after I run the content through Markdown, I do a gsub to replace wiki words and other things (like [[Here is my link]] and whatnot) with proper links, on each checking if the page exists and linking to create or view depending.
It's not the best, but I didn't build this app with caching/speed in mind. It's a low resource simple wiki.
If speed was more important, you could wrap the app in something to cache it. For example, sinatra can be wrapped with the Rack caching.

Based on my experience developing Juli, which is an offline personal wiki with autolink, generating static HTML approach may fix your issue.
As you think, it takes long time to generate autolinked Wiki page. However, in generating static HTML situation, regenerating autolinked Wiki page happens only when a wikipage is newly added or deleted (in other words, it doesn't happen when updating wikipage) and the 'regenerating' can be done in background so that usually I don't matter how it take long time. User will see only the generated static HTML.

Related

Mediawiki - Automatic two-way links between page sections

I want my MediaWiki install to have two classes of pages. (In the users' eyes - the wiki won't have to know the difference.)
I want some pages to be on topics, and others on sources (name of book, video, etc.)
I want to have a topic page "FAA Licenses" like:
==Medical Certificates==
===3rd Class===
Required for student license, and before student solo flights. {{{link/reference/whatever generally around here to Jeppesen Book#pg27-28}}}
And a source page "Jeppesen Book" like:
==pg27-28==
{{{link to FAA Licenses#3rd Class}}}
These source pages will track the source's (book or video) content. I imagine a source page for a book to have page numbers, and for a video to have start and stop times, or section numbers. (The book or video itself won't be on the source pages.)
So, the source pages will really serve two purposes. First, it will be fairly easy to see which parts of the sources have had notes taken and put into the topic pages. (So non-linear note-taking of sources will be easy -- skipping from source to source on topics, rather than digesting an entire source at once.) Second, it will be easy from a topic page to see where to go back to for a more in-depth review.
There's two issues I'm writing about.
(1) I want the workflow to be the user edits the topic page, putting in links to source pages and sections. I want this one user-addition to automatically make the source page link back to this spot. I want the system to handle the two-way-linking, assuming the user won't be perfect.
(2) I want the user to be able to put links in the topic page to source pages and sections that might not exist yet. I'd need those links to show up as red, to indicate they need to be created. But, still, once created, I want the system to handle the two-way-linking, even if there were multiple red links to the same area. (I could see building up quite a few red links, then having an unorganized "purge" of them by creating the missing pages and sections, and don't want to have to search for all the links to the new areas.) Ideally, I'd love for these source pages to be auto-generated -- so pages and sections were made as links were made to them, and automatically deleted (or at least the backlinks removed) as links were removed to them.
I don't think the MediaWiki what links here functionality does the job. I want this to work on a per-section rather than per-page basis. And, I don't want the user to have to add to each section a "what links here tag" -- I want it to be automatic.

The extension Semantic MediaWiki will allow you to get bidirectional linking in a semi-automatic fassion.
https://www.semantic-mediawiki.org/wiki/Help:Link_Template
shows a high level example.
If you dig deeper into SMW and SemanticForms you'll find how with e.g. SemanticForms you can get a user experience that is close to what you are asking for.
See e.g. http://smw.referata.com/wiki/Discourse_DB and http://www.discoursedb.org/wiki/Main_Page for an application of these principles.

I don't think there is an easy way to do that. You could write an extension that provides a parser function that your users can enter, save the source page + source section + target page + target section in a database at links update, then use the ParserSectionCreate hook to show links based on that. Or you can create two types of templates and write a bot that keeps them in sync.

Show different Main Pages based on host name in MediaWiki

I have two domains pointing to the same wiki sharing the same database.
I would like it so that with domainA.com the main page is MainPageA and with domainB.com it is MainPageB.
The only way to change the the main page of MediaWiki that I know of is to edit MediaWiki:Mainpage, but that is stored in the MySQL database. Since both wikis are sharing the same database, both main pages get changed too.
The reason that the databases are shared is because all articles apply to both wikis, just that the logo of the wiki etc. is different.
Is there some kind of PHP conditional variable I can set to set the Main Page?

You could do this in wikicode, by making your Main Page source look something like this:
{{#switch:{{SERVERNAME}}
|domainA.com={{:Main Page for domainA.com}}
|domainB.com={{:Main Page for domainB.com}}
|#default=<span class=error>Unrecognized domain {{SERVERNAME}}.</span>
}}
or even just:
{{:Main Page for {{SERVERNAME}}}}
For more information, see Help:Magic words at mediawiki.org. (Note that the first version also requires the ParserFunctions extension.)
Ps. There might be some issues with MediaWiki's parser caching that could cause the wrong Main Page to appear. If so, a quick and dirty workaround would be to install the MagicNoCache extension and add __NOCACHE__ to the Main Page.
Pps. A better solution for cache issues might be to make sure that the different sites have separate cache keys, by adding the following line to your LocalSettings.php:
$wgRenderHashAppend .= "!$wgServer";

live content from html to html

I'm using UIWebView to display data from my organization data (publicize and legal), however, for instance, I would only want to pull specific data from the html file rather than pulling the whole URL. e.g. I want to pull the "News" section of the html and I want the user to only stay in that page, not enabling them to go into other parts of the website (e.g. home page, contact us) and allowing them to view the PDF article on the HTML file.
I've asked around and read up on DOM and screen scraping, but it seem that the data pulled are stored in a database instead.
Is there any way that I can pull just the HTML "News" section with the PDF URL into my customized HTML file and that it will be updated live (maybe every 30second it will refresh and pull information from the website so that the content and list of PDF are up to date)(e.g. added in 3new article into the main website, my customize HTML file will also refresh and pull information from website and update my article list)
If anyone can point to me a specific method that allow HTML to HTML data passing (live), that will be great and I can go do more research on it. Currently very lost and confuse as it is my first time doing this. Any help/feedback will be very much appreciated :)
EDIT: For example, google map or google search. I don't want to use the whole google webpage, just taking the important thing that i want like the search result or map display.

This will involve quite a lot of learning on your part - you'll have to learn HTML / the DOM / JavaScript and iOS/UIWebVIew.
Lets leave the live refresh part for now, I'll post another answer or edit to that later on.
That's not going to easy either (check out my earlier posting today on background execution issues that will affect you, unless the update is only to take place in the foreground
iOS Run Code Once a Day)
You will have to do something like this. And note that I've never tried this, nor seen posting of people who have on here, but in theory it should work, but there will be a lot of learning as I've said, and lots of trial and error. Its a big task when you're not familiar with these things.
1) Download the html page and load it in a UIWebView, but that UIWebView is hidden so the user's can't see it.
2) When the page has loaded its dom will be accessable.
3) You can use Javascript to access the DOM and look for the parts you want.
How you inject and run the Javascript in UIWebView can be answered in a separate question (this answer will get too long if all the exact details are included).
4) Remove the parts of the dom you are not interested in. Or use use events to make only those parts you are interested in appear, jQuery can probably help here.
5) Display the UIWebView
Alternatively the HTML could be saved to a file and string parsing could be used to search for the bits you are looking for and create a new text html file from it. I think this would get very messy, better to take advantage of the fact that UIWebView will parse the HTML page and create the dom for you.

tag pages... session variables or many static pages?

i have a website which contains many news articles. IN the database, each article has so-called "tags" which the user sees displayed alongside the article. When the user clicks on the tag, they are directed to a list of other articles also containing this tag.
Should I generate a distinct HTML page for each newly created tag, or should I create one single page and vary the content based on what tag the user clicked on using session variables????
obviously, the pages will not be completely static since I will update them everytime a new article with a matching tag is uploaded

You certainly shouldn't use session data. That is for data that needs to persist, but it set on a per user basis. Using it for per-request data will just break bookmarking and introduce race conditions.
You should have a distinct URI for each tag. It doesn't matter (from an end user perspective) if you use dynamically generated content (either via a query string, or parsing the URI in your server side code (most frameworks, e.g. Dancer, will handle this for you)) or if you use generated static pages.
Static pages make it easier to handle caching and give better performance on very high traffic systems, but tend to require a rebuild of large sections of the site if content changes. You can get similar performance improvements by using server side caching (e.g. via memcached).
Dynamic pages are usually simpler to implement.

I suggest you to create a listing page that contains title and small description of all articles containing a particular tag similar to WordPress.
For example, here is listing page for the tag jQuery:
http://sarfraznawaz.wordpress.com/tag/jquery/

I would create one page, and then rewrite the url so that it referenced the tag page so something like this
Tag element == New
tagpage.aspx
http://www.yourwebsite.com/New.aspx
this allows you to have one page to update the content with but allows each page to be indexed by Google.com.
I'm not sure what language you are using but I would look up URL rewriting
here's a link for rewriting in apache:
http://httpd.apache.org/docs/2.0/misc/rewriteguide.html
here's a link for rewriting in asp.net:
http://msdn.microsoft.com/en-us/library/ms972974.aspx

Mediawiki: configuring the entry page, adding a new page

Have a wiki installed in our organization, and want to start using it.
Failed to find the answers for the next 2 basic questions:
How do I configure the entry page to show a list of all existing pages
How do I create a new page (!). Only succeeded doing it by typing a url of an non existing page. Guess there are nicer methods for this
Thanks
Gidi

For how to show a list of all pages, look at DynamicPageList, which is part of MediaWiki. (There's a more advanced third-party version, but it's not needed for such a simple task.)
Creating a new page really is exactly as you said: Type a URL and save some edits. Most beginning editors will edit a link into a page, and then use that link to browse to the page, so that they don't accidentally forget the spelling and lose the page to the Ether. (Of course it would show up in the recently edited and other special pages.)
This is more of a webapps.stackexchange.com question though.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008