I'm having an issue where pages with #ask queries aren't updating after content on other pages changes. The only way to get them to update seems to be using action=purge. Is there a maintenance script that will perform this across all pages? Which variables can I use to reduce the amount of time a page is cached? I'm having trouble determining which caches I need to adjust.
This behaviour is documented on the Semantic MediaWiki page at https://www.semantic-mediawiki.org/wiki/Help:Embedded_query_update. The documentation should explain fairly well which caches are relevant and what kind of configuration you need.
I'm new to service workers. I want to integrate service workers into my site. My motive is to improve the performance of my website, not to make it work offline. It's a real estate website.
So what I have done so far is create modular templates for my site and store them in the cache.
For example, template1:
<div>
<p>#data</p>
</div>
Whenever a fetch occurs on my page, I first call an API through AJAX to get the data, replace the #data variable in the cached response with the actual API response, and then return the new response to the browser.
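Roughly, the flow I'm describing would look something like this inside the service worker (just a sketch; template1, /api/listing and the description field are placeholder names):

self.addEventListener('fetch', event => {
  event.respondWith((async () => {
    const cache = await caches.open('template-cache');
    const cachedTemplate = await cache.match('template1');
    if (!cachedTemplate) return fetch(event.request);

    // Read the cached template and fetch the API data in parallel.
    const [templateHtml, data] = await Promise.all([
      cachedTemplate.text(),
      fetch('/api/listing').then(res => res.json()),
    ]);

    // Replace the #data placeholder with the real API response.
    const html = templateHtml.replace('#data', data.description);
    return new Response(html, { headers: { 'Content-Type': 'text/html' } });
  })());
});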
Question 1: Is this the right approach for HTML template caching?
With the above approach I'm running into challenges like loops and conditional statements in my HTML.
Question 2: Is there any way I can cache templates that contain loops and change them at run time?
Question 3: If I initially show the cached app shell to the user, is that going to affect my site's SEO ranking?
Question 4: I have to write new templates alongside the existing code, which means maintaining two codebases: one for service workers and one for normal browsers that don't support service workers. Is there any solution to this?
Regards
That's a lot of questions and I don't think service workers are necessarily your best bet.
Question 1: Personally I recommend using a framework such as KnockoutJS, Angular, Polymer, etc. for your HTML templates. These often have template caching built in.
Question 2: Instead of your current approach of replacing the variables before 'sending them to the browser', most frameworks use some form of data binding, which takes care of iterations and conditions within the browser.
Question 3: Caching the app shell would have no effect on SEO, and Google has been parsing JavaScript for a while; however, I would personally recommend that the website's content load without JavaScript, with JavaScript only used to enhance the experience. This would be the same whether you use the app-shell model or not.
Question 4: I do not understand your current setup and this would not be an ordinary scenario so you might have something wrong.
Service workers and the Cache API are ordinarily used to cache your static assets (usually fonts, CSS, JavaScript and HTML templates) and should result in improved performance, as there are fewer HTTP requests; but there are other ways to improve performance that will address all of your questions without the use of service workers.
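For completeness, the usual static-asset pattern with a service worker is to pre-cache on install and then serve from the cache with a network fallback. A minimal sketch (the cache name and file list are only examples):

const CACHE_NAME = 'static-v1';
const ASSETS = ['/css/style.css', '/js/app.js', '/fonts/main.woff2'];

// Pre-cache the static assets when the service worker is installed.
self.addEventListener('install', event => {
  event.waitUntil(caches.open(CACHE_NAME).then(cache => cache.addAll(ASSETS)));
});

// Serve cached responses when available, falling back to the network.
self.addEventListener('fetch', event => {
  event.respondWith(
    caches.match(event.request).then(cached => cached || fetch(event.request))
  );
});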
Which one is better and faster, and why?
using only one file for styling:
css/style.css
or
using several files for styling:
css/header.css
css/contact.css
css/footer.css
css/tooltip.css
The reason I'm asking is that I'm developing a site for users who have very low internet speeds (in Uganda), so I want to make it as fast as possible.
Using a single file is faster because it requires fewer HTTP requests (assuming the total amount of CSS loaded is the same).
So it's better to keep it in just one file.
Separating CSS should only be done if you want to keep, for example, IE-specific styles separate.
As per Yahoo's Performance Rules [source], it is VERY IMPORTANT to minimize HTTP requests.
From the source
Combined files are a way to reduce the number of HTTP requests by combining all scripts into a single script, and similarly combining all CSS into a single stylesheet. Combining files is more challenging when the scripts and stylesheets vary from page to page, but making this part of your release process improves response times.
It is quite inconvenient to develop using combined files, so stick to developing with multiple files, but combine them once you are deploying the system to the web.
I really recommend using boilerplate's ant build script. You can find it here.
It combines and minifies CSS.
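If you'd rather not adopt a full build tool, even a tiny Node.js script run before deployment can do the combining. A minimal sketch using the file names from the question (adjust paths to your project):

const fs = require('fs');

// Concatenate the individual stylesheets into one file for deployment.
const files = ['css/header.css', 'css/contact.css', 'css/footer.css', 'css/tooltip.css'];
const combined = files.map(f => fs.readFileSync(f, 'utf8')).join('\n');

fs.writeFileSync('css/style.css', combined);
console.log('Wrote css/style.css from ' + files.length + ' files');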
One CSS file is better than multiple CSS files because of the overhead of the end user's browser making a separate request for each file. Other things you can do to improve performance include:
Enable gzip compression on your web server (e.g. Apache) so that files are compressed before they are downloaded (see the sketch below)
Where possible, host your files geographically close to the majority of your end users
Use a CDN for your static content, such as CSS files
Use CSS sprites
Cache your content
Note that there are tools available to help you do this. See 15 ways to optimise css for more information
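As an illustration of the gzip and caching points above, here is a hypothetical Node/Express setup (the express and compression packages are real; the paths, port and max-age value are only examples):

const express = require('express');
const compression = require('compression');

const app = express();
app.use(compression()); // gzip responses before they are sent
app.use(express.static('public', { maxAge: '30d' })); // long-lived caching of static files

app.listen(3000, () => console.log('Listening on port 3000'));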
It is generally better to bundle or combine multiple CSS or JavaScript files into fewer HTTP requests. The browser then requests far fewer files, which in turn reduces the time it takes to fetch them.
With proper caching, you can save additional bandwidth and make even fewer HTTP requests.
Update:
There's a new bundling feature in ASP.NET 4.5 which you might be interested in.
It lets you keep CSS files separate at development time while serving them combined into a single resource at runtime.
One resource file is always the fastest approach since you reduce the number of HTTP requests made to fetch those files.
I would suggest using YSlow, a great extension for Firebug that analyzes web pages and suggests ways to improve their performance.
I develop an online, Flash-based multiplayer game. It is a complex game, and requires a lot of documentation to fully explain it to our users. Ideally, I would like to find MySQL-based wiki software that can provide these editable documentation pages outside of Flash (in the HTML realm) but also within Flash for convenience, so that players can refer to the information without interrupting their game or having to switch back and forth between browser tabs.

I am expecting that I would need to do a lot of the work on the Flash side myself (formatting, for example), but I would like to feel comfortable querying the wiki's database to get info directly. I guess this means that I need a wiki that is structured relatively "flat" or intuitively, so that I can do things like:
Run a MySQL query that returns a list of all the articles (their titles and IDs) in the wiki
For each article ID in the wiki, return the associated content
This may mean that I have to limit the kinds of formatting I put into the wiki -- things like tables would probably be omitted since they would be very difficult, if not impossible, for me to do on the Flash side. And that is fine!
Basically I am just looking for suggestions for wiki software that is pretty easy to use, but mostly is technically simple enough on the back-end that interfacing with it directly via MySQL is not difficult. When interfacing with the database directly, I only need to READ data. Any time the wiki would be edited or added to would be done via the wiki's actual front-end application.
Thanks for any suggestions!
MediaWiki is the best-known and best-supported MySQL-based Wiki, used for plenty of complex game documentation projects like MinecraftWiki. The database is not all that simple, but it's well documented and basic read operations aren't too hard. For example, here's how to fetch the current content of the page "MyPage":
SELECT old_text FROM page, revision, text
WHERE page.page_title = 'MyPage'
  AND page.page_latest = revision.rev_id
  AND revision.rev_text_id = text.old_id;
(And yes, old_text is the current content of the page. Don't ask me why!)
Your main problem will be figuring out how to parse MediaWiki markup; there are plenty of parsers for it, but I'm not aware of one that would work in Flash.
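If it helps, both of the reads you describe are straightforward from application code as well. A rough Node.js sketch, assuming the mysql2 package and the same (older, pre-1.35) schema as above; the connection details are placeholders:

const mysql = require('mysql2/promise');

async function main() {
  const db = await mysql.createConnection({
    host: 'localhost', user: 'wiki', password: 'secret', database: 'wikidb',
  });

  // List all articles (IDs and titles) in the main namespace.
  const [pages] = await db.execute(
    'SELECT page_id, page_title FROM page WHERE page_namespace = 0'
  );

  // Fetch the current text of a single article.
  const [rows] = await db.execute(
    `SELECT old_text FROM page, revision, text
     WHERE page.page_title = ? AND page.page_latest = revision.rev_id
       AND revision.rev_text_id = text.old_id`,
    ['MyPage']
  );

  console.log(pages.length + ' articles');
  if (rows.length) console.log(rows[0].old_text.toString());
  await db.end();
}

main();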
I'm a relative newbie to web development. I know my HTML and CSS, and am getting involved with Ruby on Rails for some other projects, which has been daunting but very rewarding.
Basically I'm wondering if there's a language/backbone/program/solution to eliminate the copypasta trivialities of HTML, with some caveats. Currently my website is hosted on a school server and unfortunately can't use Rails. Being a newbie I also don't really know what other technologies are available to me (or even what those technologies might be). I'm essentially looking for a way to auto-insert all of my header/sidebar/footer/menu information, and when those need to be updated, the rest of the pages get updated. Right now, I have a sidebar that is a tree of all of the pages on my website. When I add a page, not only do I need to update the sidebar, I have to update it for every page in my domain. This is really inefficient and I'm wondering if there is a better way.
I imagine this is a pretty widespread problem, but searching Google turns up too many irrelevant links (design template websites, tutorials, etc.). I'd appreciate any help.
Oh, and I've heard of HAML as a way to render HTML; how would it be used in this situation?
Server Side Includes.
Old as time. Supported in most hosting situations. Often forgotten in favour of hugely overcomplicated templating systems. SSI still has a place.
You use a template language.
Most often this will be processed on the server, but there are offline solutions which you run though a utility to generate complete HTML documents for uploading.
I'm rather fond of Template-Toolkit, which I usually use server-side with Catalyst, but it is also very usable before you involve a web server, via the ttree utility.
...........Wordpress?
I'd recommend Drupal. The tree structure of a menu is an in-built function, so you can basically forget about it altogether, and you can insert whatever you want into specified areas (footer, header, whatever is defined in a template). It relies on PHP and MySQL, which can be used on almost any server, and it has a moderate learning curve, so you should be able to start doing magic in little time.
When screen-scraping, what are the "gotcha"s to look out for?
The inspiration for this is: my spouse's co-worker asked me to scrape all the pages from a Blogger-hosted blog that her friend with cancer kept in her final months and this lady wanted to keep all of the posts in case the blog were ever deleted. I eventually found a free tool that was barely good enough.
One issue with scraping many Blogger pages is that there's often a navigation menu where you can click on the triangles to expand the post lists by year or month. These little buggers created insane amounts of duplicate content because you'd have the same page over and over again with different combinations of the menus being expanded/collapsed. In Blogger's case I'm not sure this is avoidable since the links are all formatted as real http links and not obvious JavaScript calls. Still, it got me thinking:
If you were to scrape a website, what kinds of potentially non-obvious things would you compensate for?
Do not use regex to scrape
While regular expressions can be good for a large variety of tasks, I find they usually fall short when parsing the HTML DOM. The problem with HTML is that the structure of a document is so variable that it is hard to extract a tag accurately (and by accurately I mean with a 100% success rate and no false positives).
What I recommend you do is use a DOM parser such as BeautifulSoup or equivalent (SimpleHTMLDom in PHP).
Some may think this is overkill, but in the end, it will be easier to maintain and also allows for more extensibility.
A regular expression could be devised to achieve the same goal, but it would be limited. For example, developing a regex to get the src and alt attributes would force the alt attribute to come after src, or the opposite; overcoming that limitation would add even more complexity to the regular expression.
Also, consider the following. To properly match an <img> tag using regular expressions and to get only the src attribute (captured in group 2), you need the following regular expression:
<\s*?img\s+?[^>]*?\s*?src\s*?=\s*?(["'])((\\?+.)*?)\1[^>]*?>
And then again, the above can fail if:
The attribute or tag name is in capitals and the i modifier is not used.
Quotes are not used around the src attribute.
Another attribute than src uses the > character somewhere in its value.
Some other reason I have not foreseen.
So again, simply don't use regular expressions to parse a DOM document.
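For comparison, a DOM parser makes the same extraction trivial and order-independent. A small sketch using the browser's built-in DOMParser (the sample markup is invented):

// Parse an HTML string and pull out every img src/alt pair.
const html = '<p>Intro</p><img alt="logo" src="/img/logo.png">';
const doc = new DOMParser().parseFromString(html, 'text/html');

const images = [...doc.querySelectorAll('img')].map(img => ({
  src: img.getAttribute('src'),
  alt: img.getAttribute('alt'),
}));
console.log(images); // [{ src: '/img/logo.png', alt: 'logo' }]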
I screen scrape a lot. Some advice:
Emulate a User-Agent string for some browser you want to use. Different websites frequently return very different results depending on what your user agent is. If they don't recognize the User-Agent they will often revert to lowest common denominator, so it's usually best to start with some recent browser. (For example the World of Warcraft Armory returns beautiful, easy to parse XML if it thinks you're a recent Firefox. If it doesn't know what you are it sends terrible HTML).
Be polite to the site you're scraping; don't hit it too hard. Your scraper will go faster if you multi-thread it, making many requests at once, but that will annoy the site owner.
Be smart about error handling. Do not write code like while (1) { makeRequest(); }. If your code or the server throws an error a loop like this will immediately fetch another request, generating another error. It can get ugly quickly. Handle errors well and consider putting in sleeps or exits if you see a lot of errors.
When developing your parsing code, test against a cached copy rather than hitting the server every time. This will make your development go faster and gives you the basis of a simple test suite.
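To tie these points together, a rough sketch of a polite fetch loop (assuming Node 18+ with its global fetch; the URLs, user-agent string and delays are placeholders):

const urls = ['https://example.com/page1', 'https://example.com/page2'];
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function scrape() {
  for (const url of urls) {
    try {
      // Identify yourself with a sensible User-Agent string.
      const res = await fetch(url, {
        headers: { 'User-Agent': 'Mozilla/5.0 (compatible; MyScraper/1.0)' },
      });
      if (!res.ok) throw new Error('HTTP ' + res.status + ' for ' + url);
      const html = await res.text();
      console.log(url, html.length); // ...parse html here and cache it to disk...
    } catch (err) {
      console.error('Request failed:', err.message);
      await sleep(10000); // back off instead of retrying in a tight loop
    }
    await sleep(2000); // be polite: pause between requests
  }
}

scrape();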
First, I'd check for an RSS feed. On Blogger, you just have to add /rss to the root URL, if I remember correctly.
Then I'd check whether there's already some tool to scrape Blogger.
Then if there's no RSS feed, and no existing tool, I'd give up and do it by hand with copy/paste. Unless we're talking 5000 pages, it's much faster and easier that way. Take it from someone who's tried.
If you have access to the actual account, Blogger has an export function.
Edit: Or of course, you could try Mechanical Turk.
As far as gotchas are concerned, it's usually a good idea to limit the number of requests made over a given period of time. Hammering a site with a lot of requests in a short space of time is a good way to have your requests rejected.
Aside from the technical considerations, make sure you're not putting yourself at legal risk. Most large sites have specific language in their terms of use that disallows programmatic access to their services via automated programs, and there are also the obvious copyright concerns.
From a technical standpoint, definitely use a DOM parser library and you'll save loads of time. Many provide the ability to read HTML into an XML structure that can be queried using XPath to find exactly what you need.
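For instance, in a browser (or any environment exposing DOMParser and XPath), such a query might look like this; the class name and markup are invented:

const html = '<div class="post-title"><a href="/p/1">First post</a></div>';
const doc = new DOMParser().parseFromString(html, 'text/html');

// XPath query for every post-title link.
const result = doc.evaluate(
  "//div[@class='post-title']/a",
  doc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null
);
for (let i = 0; i < result.snapshotLength; i++) {
  console.log(result.snapshotItem(i).textContent); // "First post"
}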
If you know someone who has access to the account, they can use Blogger's "Export blog" feature.