MediaWiki: do many transclusions make page loading slow?

By transclusion I mean a page like
{{template
| blahblah=
| asd =
| df=
}}
So if there are many "|" parameters, will they make the page load slowly?
Let's say page "Template:*" is
*
so that {{*}} will render a bullet.
Please compare
(Template:A and page "A page")
and
(Template:B and page "B page")
Both A page and B page will display the same thing, but which one will be faster to load if there are thousands more transclusions like this?
Template:A
* {{{a}}}
* {{{b}}}
* {{{c}}}
A page
{{A
|a=q
|b=w
|c=e
}}
Template:B
{{{a}}}
B page
{{B
|a={{*}} q <br> {{*}} w <br> {{*}} e
}}
===Question added===
@Ilmari_Karonen Thank you very much.
What if the number is nearly 1000, so that the A page is
{{A
|a1=q
|a2=w
|a3=e
....
|a999=w
|a1000=h
}}
Still, thanks to caches, "for most page views, template transclusion has no effect on performance"?
And what do you mean by "for most page views"? Do you mean when page views are low enough?
You said "the recommended way to deploy MediaWiki is either behind reverse caching proxies or using the file cache. Either of these will add an extra caching layer in front of the parser cache."
Should this be done before posting any content on MediaWiki, or doesn't it matter if I set it up after I have posted all the pages?
===What if the transclusion relationship is very complex===
@Ilmari_Karonen I have one more question. What if the transclusion relationship is very complex?
For example
Page A is
{{temp
| ~~~
| ~~~
... (quite many)
| ~~~
}}
And Template:Temp has {{Temp2}},
and Template:Temp2 is again
{{temp3
|~~~
|~~~
... (very many)
|~~~
}}
Even in such a case, for the reasons you mentioned, will numerous transclusions still not affect the loading speed of Page A?

Yes and no. Mostly no.
Yes, having lots of template transclusions on a page does slow down parsing somewhat, both because the templates need to be loaded from the DB and because they need to be reparsed every time they're used. However, there's a lot of caching going on:
Once a template is transcluded once on a given page, its source code is cached so that further transclusions of the same template on that page won't cause any further DB queries.
For templates used without parameters, MediaWiki also caches the parsed form of the template. Thus, in your example, {{*}} only needs to be parsed once.
In any case, once the page has been parsed once (typically after somebody edits it), MediaWiki caches the entire parsed HTML output and reuses it for subsequent page views. Thus, for most page views, template transclusion has no effect on performance, since the page will not need to be reparsed. (However, note that the default parser cache lifetime is fairly low. The default is OK for high-traffic wikis like Wikipedia, but for small wikis I'd strongly recommend increasing it to, say, one month, and setting the parser cache type to CACHE_DB.)
Finally, the recommended way to deploy MediaWiki is either behind reverse caching proxies or using the file cache. Either of these will add an extra caching layer in front of the parser cache.
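Concretely, both of the parser cache settings discussed here live in LocalSettings.php. A sketch (the one-month figure is the recommendation made in this answer, not a MediaWiki default):

```php
# LocalSettings.php -- parser cache tuning for a small wiki
$wgParserCacheType = CACHE_DB;               # store parsed pages in the database
$wgParserCacheExpireTime = 30 * 24 * 3600;   # keep cached parses for ~one month
```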
Edit: To answer your additional questions:
Regardless of the number of parameters, each page still contains only one template transclusion (well, except for the {{*}} transclusions on page B, but those should be efficiently cached). Thus, they should be more or less equally efficient (as in, there should not be a noticeable difference in practice).
I mean that, most of the time when somebody views the page, it will (or at least should) be served from the cache, and so does not need to be reparsed. Situations where that does not happen include when:
the time since the page was last parsed exceeds the limit specified by $wgParserCacheExpireTime (24 hours by default, but this can and IMO should be increased for most wikis),
the page has been edited since it was added to the cache, and so needs to be reparsed (this typically happens immediately after clicking the "Save page" button),
a template used on the page has been edited, requiring the page to be reparsed,
another page linked from this page has been created or deleted, requiring a reparse to turn the link from red to blue or vice versa,
the page uses a MediaWiki extension that deliberately excludes it from caching, usually because the extension inserts dynamically changing content into the page,
someone has deliberately purged the page from the cache, causing an immediate reparse, or
the user viewing the page is using an unusual language or has changed some other options in their preferences that affect page rendering, causing a separate cached version of the page to be generated for them (this version may be reused by any other user with the same set of preferences, or by the same user revisiting the page).
You can add a proxy in front of your wiki, and/or enable the file cache, at any time. Indeed, since setting up effective caching is a somewhat advanced task, you may want to get your wiki up and running without a front-end cache first before attempting it. This also lets you directly compare performance before and after setting up the cache.

Related

Way To Modify HTML Before Display using Cocoa Webkit for Internationalization

In Objective C to build a Mac OSX (Cocoa) application, I'm using the native Webkit widget to display local files with the file:// URL, pulling from this folder:
MyApp.app/Contents/Resources/lang/en/html
This is all well and good until I start to need a German version. That means I have to copy en/html as de/html, then have someone replace the wording in the HTML (and some in the Javascript (like with modal dialogs)) with German phrasing. That's quite a lot of work!
Okay, that might seem doable until this creates a headache where I have to constantly maintain multiple versions of the html folder for each of the languages I need to support.
Then the thought came to me...
Why not just replace the phrasing with template tags like %CONTINUE%, and then, before the page is rendered, intercept it and swap the tags out with strings pulled from a language plist file?
Through some API with this widget, is it possible to intercept HTML before it is rendered and replace text?
If it is possible, would it be noticeably slow such that it wouldn't be worth it?
Or, do you recommend I do a strategy where I build a generator that I keep on my workstation which builds each of the HTML folders for me from a main template, and then I deploy those already completed with my setup application once I determine the user's language from the setup application?
Through a lot of experimentation, I found an ugly way to do templating. Like I said, it's not desirable and has some side effects:
You'll see a flash on the first window load. On first load of the application window that has the WebKit widget, you'll want to hide the window until the second time the page content is displayed. I guess you'll have to use a property for that.
When you navigate, each page loads twice. It's barely noticeable, but not good enough for polished development.
I found an odd quirk with Bootstrap CSS where it made my table grid rows very large and didn't apply CSS properly for some strange reason. I might be able to tweak the CSS to fix that.
Unfortunately, I found no other event I could intercept on this except didFinishLoadForFrame. However, by then, the page has already downloaded and rendered at least once for a microsecond. It would be great to intercept some event before then, where I have the full HTML, and do the swap there before display. I didn't find such an event. However, if someone finds such an event -- that would probably make this a great templating solution.
- (void)webView:(WebView *)sender didFinishLoadForFrame:(WebFrame *)frame
{
    DOMHTMLElement *htmlNode =
        (DOMHTMLElement *)[[[frame DOMDocument] getElementsByTagName:@"html"] item:0];
    NSString *s = [htmlNode outerHTML];
    if ([s containsString:@"<!-- processed -->"]) {
        return;
    }
    NSURL *oBaseURL = [[[frame dataSource] request] URL];
    s = [s stringByReplacingOccurrencesOfString:@"%EXAMPLE%" withString:@"ZZZ"];
    s = [s stringByReplacingOccurrencesOfString:@"</head>" withString:@"<!-- processed -->\n</head>"];
    [frame loadHTMLString:s baseURL:oBaseURL];
}
The above will look at HTML that contains %EXAMPLE% and replace it with ZZZ.
In the end, I realized that this is inefficient because of the page flash and because, on long stretches of text that need a lot of replacing, it may introduce quite noticeable delay. The better way is a compile-time generator: make one HTML folder with %PARAMETERIZED_TAGS% inside instead of English text. Then create a "Run Script" step in your "Build Phase" that runs a program or script, in whatever language you like, which generates each HTML folder from the available lang-XX.plist files in a directory, where XX is a language code like 'en' or 'de'. It reads each HTML file, finds the matching parameterized tag in the lang-XX.plist file, and replaces that text with the text for that language. That way, after compilation, you have one HTML folder per language, already using your translated strings.
This is efficient because it lets you maintain one single HTML folder, instead of the extremely tedious process of creating and maintaining a separate HTML folder for each language; the compile-time generator does that for you. However, you will have to build that generator.
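The core of such a generator can be sketched in Go (the %TAG% convention and the inlined string catalogues are assumptions for illustration; the plist files suggested above would need a plist parser, or could be converted to JSON first):

```go
package main

import (
	"fmt"
	"strings"
)

// localize replaces %TAG% placeholders in an HTML template with the
// strings for one language. A real generator would read catalogues like
// lang-en.json / lang-de.json from disk; here they are inlined so the
// sketch stays self-contained.
func localize(html string, catalogue map[string]string) string {
	for tag, text := range catalogue {
		html = strings.ReplaceAll(html, "%"+tag+"%", text)
	}
	return html
}

func main() {
	tmpl := `<button>%CONTINUE%</button>`
	en := map[string]string{"CONTINUE": "Continue"}
	de := map[string]string{"CONTINUE": "Weiter"}
	fmt.Println(localize(tmpl, en)) // <button>Continue</button>
	fmt.Println(localize(tmpl, de)) // <button>Weiter</button>
}
```

Invoked from a "Run Script" build phase, the same function would loop over the HTML files on disk and write one output folder per language.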

Can Go capture a click event in an HTML document it is serving?

I am writing a program for managing an inventory. It serves up HTML based on records from a PostgreSQL database, or writes to the database using HTML forms.
Different functions (adding records, searching, etc.) are accessible using <a></a> tags or form submits, which in turn call functions using http.HandleFunc(), functions then generate queries, parse results and render these to html templates.
The search function renders query results to an HTML table. To keep the search results page usable and uncluttered, I intend to provide only the most relevant information there. However, since there are many more details stored in the database, I need a way to access that information too. In order to do that I wanted to have each table row clickable, displaying the details of the selected record in a status area at the bottom or side of the page, for instance.
I could try to follow the pattern that works for running the other functions, that is use <a></a> tags and http.HandleFunc() to render new content but this isn't exactly what I want for a couple of reasons.
First: There should be no need to navigate away from the search result page to view the additional details; there are not so many details that a single record's full data should not be able to be rendered on the same page as the search results.
Second: I want the whole row clickable, not merely the text within a table cell, which is what the <a></a> tags get me.
Using the id returned from the database in an attribute, as in <div id="search-result-row-id-{{.ID}}"></div> I am able to work with individual records but I have yet to find a way to then capture a click in Go.
Before I run off and write this in JavaScript, does anyone know of a way to do this strictly in Go? I am not particularly averse to using the tried-and-true JS methods, but I am curious to see if it could be done without them.
does anyone know of a way to do this strictly in Go?
As others have indicated in the comments, no, Go cannot capture the event in the browser.
For that you will need to use some JavaScript to send to the server (where Go runs) the web request for more information.
You could also push all the required information to the browser when you first serve the page and hide/show it based on CSS/JavaScript event but again, that's just regular web development and nothing to do with Go.

HTML page takes too long to load its content

I have a static .cfm page with a single SELECT query that displays over 3000 records (without pagination). When I try to view that page in Firefox, it takes 15 seconds to show the content. Is there any way (without pagination) to reduce the browser loading time?
Create a page that uses AngularJS to show the table. Then populate the table via an AJAX call that returns JSON.
Use fixed table layout so that the browser does not have to re-flow the content as it loads.
Don't load the data into a table at all; do the layout with divs and spans.
Optimize the SELECT query
Only select columns you need.
Avoid wildcards (*) in the SELECT clause
Don't join unnecessary tables.
You can also consider loading content dynamically via ajax.
Without seeing your code (or example code), we can't provide anything specifically tailored to your implementation of the query.
You could potentially
<cfflush>
the content, so it will start sending the response to the browser straight away, rather than building the entire page, then pushing the response back
Some other solutions are better options, especially for long term scalability and maintenance. However, if you're looking for a quick solution for now you could try breaking it up into a series of HTML tables. Every 500 records or so add this:
</table>
<cfflush>
<table...
This will ensure that the HTML rendered so far is sent to the browser (via the cfflush) while ColdFusion continues to work on the rest. Meanwhile, by closing out the table before doing so, you allow the browser to properly render that block of the content in full without it waiting for the remainder.
This is a patch, and something you should only do until you can put a more involved solution (such as JQGrid) in place.

REST/Ajax deep linking compatibility - Anchor tags vs query string

So I'm working on a web app, and I want to filter search results.
A nice restful implementation might look like this:
1. mysite.com/clothes/men/hats+scarfs
But lets say we want to ajax up the filtering, like the cool kids, and we want to retain deep linking, we might use the anchor tag and parse that with Javascript to show the correct listings:
2. mysite.com/clothes#/men/hats+scarfs
However, if someone clicks the first link with JS enabled, and then changes filters, we might get:
3. mysite.com/clothes/men/hats+scarfs#/women/shoes
Urk.
Similarly, if someone does not have JS enabled, and clicks link 2 - JS will not parse the options and the correct listings will not be shown.
Are Ajax deep links and non-Ajax links incompatible? It would seem so, as servers cannot parse the # part of a url, since it is not sent to the server.
There's a monkeywrench being thrown into this issue by Google: A proposal for making Ajax crawlable. Google is including recommendations for url structure there that may give you ideas for your own application.
Here's the wrapup:
In summary, starting with a stateful URL such as http://example.com/dictionary.html#AJAX, it could be available to both crawlers and users as http://example.com/dictionary.html#!AJAX, which could be crawled as http://example.com/dictionary.html?_escaped_fragment_=AJAX, which in turn would be shown to users and accessed as http://example.com/dictionary.html#!AJAX.
View Google's Presentation here (note: google docs presentation)
In general I think it's useful to simply turn off JavaScript and CSS entirely and browse your website and web application and see what ends up getting exposed. Once you get a sense of what's visible, you will understand what most search engines see and that in turn will show you what is and is not getting spidered.
If you go to mysite.com/clothes/men/hats+scarfs with JavaScript enabled, then your JavaScript should automatically rewrite that to mysite.com/clothes#men/hats+scarfs. When you click on a filter, the clicks should be controlled by JavaScript, meaning you'll only change the hash rather than the entire URL (as you're going to have to return false anyway).
The problem you have is for non-JS users going to your JS enabled deeplinks as the server can't determine that stuff. Unfortunately, the only thing you can do is take them to mysite.com/clothes and make them start their journey again (as far as I'm aware). You'll need to try and ensure that when people link to the site, they use the hardcoded deeplink rather than the hashed deeplink
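The initial rewrite described above can be isolated into a small helper (the function name and the /clothes prefix are assumptions; a sketch in plain JavaScript):

```javascript
// Move a path-based filter into the URL hash, so that subsequent filter
// clicks only need to change the fragment, never the whole URL.
function pathToHash(pathname) {
    var m = pathname.match(/^\/clothes\/(.+)$/);
    return m ? "/clothes#" + m[1] : pathname;
}

// e.g. on page load (in a browser):
//   if (location.pathname !== pathToHash(location.pathname)) {
//       location.replace(pathToHash(location.pathname));
//   }
```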
I don't recommend ever using the query string, as you are sending data back to the server without direct relevance to the previously specified destination. That opens a security hole, as malicious code can be manually added to the query string to attempt an XSS or buffer-overflow attack on your webserver.
I believe REST was intended to work with absolute URIs without a query string, because then you're specifying only the location of a resource, and it is that location that is descriptive and semantically relevant, in addition to the possibility of the resource being equally relevant. Even if there is no resource at the specified path, you have still instantiated a potentially unique and descriptive location that can be processed accordingly.
Users entering the site via deep links
Nonsensical links (like /clothes/men/hats#women/shoes) can be avoided if you construct your Ajax initialisation code in such a way that users who enter the site on filtered pages (e.g. /clothes/women/shoes) are taken to the /clothes page before any Ajax filtering happens. For example, you might do something like this (using jQuery):
$("a.filter")
    .each(function() {
        var href = $(this).attr("href").replace("/clothes/", "/clothes#");
        $(this).attr("href", href);
    })
    .click(function() {
        update_filter($(this).attr("href").split("#")[1]);
        return false; // prevent the default navigation to the rewritten href
    });
Users without JavaScript
As you said in the question, there's no way for the server to know about the URL fragment so filtering would not be applied for users without JavaScript enabled if they were given a link to /clothes#filter.
However, even without filtering, these links could be made more meaningful for non-JS users by using the filter strings as IDs in your /clothes page. To prevent this messing with the Ajax experience the IDs would need to be changed (or the elements removed) with JavaScript before the Ajax links were initialised.
How practical this is depends on how many categories you have and what your /clothes page contains.

Handling a form and its action: one page or two?

When doing web programming there seem to be two different ways of handling a form and its action. Is there a reason to prefer one method over the other?
Method 1: One page that displays a form or handles a submitted form. This seems simple, in that there's one form for everything. But it also means the if/else cases can become rather large.
if [form has not been submitted] {
    // display form for user input
} else {
    // handle submitted form
}
Method 2: Handle user input on one page with a form, with the form submitting to a second page.
page 1 (input.html):
<form action="./submit.html">
// display form for user input
</form>
page 2 (submit.html): Handles the input from the form.
I've seen both of these methods used. The upside of the first method is that there's only one page and one set of variables to worry about, but the page can get large. In the second method, each file is simpler and shorter, but now you have twice as many files to worry about and maintain.
I typically submit back to the same page, as you will often need to redisplay the initial page if there are validation errors.
Once the record has been successfully saved, I typically use the Post/Redirect/Get method to avoid the common multiple submit issue.
I think it depends on what kind of form it is. If it has a lot of validation, I would go with two pages.
I always use two pages, though, because I like my code to be clean. But I use AJAX whenever possible, e.g. for contact forms, since they are short, and I just post the response back into a div in the form.
Caching.
If the server has a series of static HTML pages enlivened only by AJAX (and even those requests cached server-side, per user), it reduces its load and traffic significantly. Confining the dynamic content to the targets of forms, a relatively small area, is then a boon, because a page that is the target of a POST can't be retrieved from the cache; it has to be regenerated from scratch no matter how heavily loaded the server is.
Of course this doesn't solve the question of n static pages + 1 CGI vs n static pages + m CGI.
But once you don't need to spit out sophisticated HTML, just a simple redirect, keeping things in one place may be profitable - error checking and handling, authentication, session management and so on.
OTOH, if every one of your pages is a CGI script that creates a new page on each visit, there is no reason why it can't accept form data and handle it at the same time.