I have an app with a support form that customers can use to submit new issues. One of the things that's been requested is to search their public wiki and automatically create suggestions based on it (very similar to what Stack Overflow does). Since the page in question is outside of MediaWiki itself, I'm unsure how to move forward.
My question is twofold:
Is there an out of the box extension to do this sort of thing?
If not, how would you recommend I go about doing it? I've never written a MediaWiki extension, but for the experienced among you, what approach would you take?
I don't know if something like that already exists, but if I had to do it from scratch, I would definitely use the MediaWiki API. Example:
Show a list of 10 pages that contain the word MYSEARCHTEXTGOESHERE
http://mywiki.org/wiki/api.php?action=query&list=search&srsearch=MYSEARCHTEXTGOESHERE&srprop=timestamp
It wouldn't be an extension in the wiki, but rather a GET call to the wiki server from your app.
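For example, a minimal client-side sketch (the wiki URL is the placeholder from above, and it assumes the wiki returns JSON and allows anonymous cross-origin requests via origin=*):

// Query the MediaWiki search API and return up to 10 matching page titles.
async function fetchWikiSuggestions(searchText) {
  const params = new URLSearchParams({
    action: 'query',
    list: 'search',
    srsearch: searchText,
    srlimit: '10',
    format: 'json',
    origin: '*' // ask MediaWiki to send CORS headers for an anonymous request
  });
  const response = await fetch(`http://mywiki.org/wiki/api.php?${params}`);
  const data = await response.json();
  // Each hit includes a title, a snippet and a timestamp, among other fields.
  return data.query.search.map(hit => hit.title);
}

// Usage: populate the support form's suggestion list as the customer types.
fetchWikiSuggestions('MYSEARCHTEXTGOESHERE').then(titles => console.log(titles));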
I have some pages that I want to restrict to specific users, i.e., only user A and user B should be able to view a given page. How can this be done? Do I need additional extensions, or can it be done through LocalSettings.php, for example?
You could try this extension: Extension:AccessControl. I've not deployed it to my wiki sites, but a quick read of the pages suggests it might be what you're looking for.
Note, though, that as Tgr covered in the comments above, MediaWiki is not intended to operate like this, so the extension could be fundamentally flawed in its implementation. Practise due diligence if you deploy any third-party extensions. Alternatively, look into CMSs that handle access control as a core feature.
I frequently use Google to search for .NET documentation, and invariably, the highest ranked pages are for old versions of the .NET framework.
For example, I just did a Google search for "c# extern".
The first result was for Visual Studio 2005.
The second result was for Visual Studio .NET 2003.
I went through several pages and never did come across the Visual Studio 2010 page.
Interestingly, I tried the same search on Bing, Microsoft's own search engine, and Visual Studio 2005 was still the first hit. However, the second hit was the one I was looking for (Visual Studio 2010).
I realize that many documentation pages on MSDN have a menu at the top that allows you to switch versions, but I don't think it should be necessary to do this. There should be an HTML way to convince search engines that two pages are very similar, and one is newer/more relevant than the other.
Is there anything that can be done in HTML to force a documentation page for a more recent version to get a higher page rank than an essentially equivalent page for an older version?
You can't tell Google what page is preferred
(That's basically the answer to your question)
If someone googles c# extern, they will get the most relevant pages as calculated by Google's algorithms. The results differ from user to user and by location, but above all they depend on how links all over the internet point to those pages. You cannot change this with on-page optimization.
The canonical addresses mentioned by Wander Nauta are not supposed to be used in this manner. We use canonical addresses when we wish to tell Google, or any other bot, that two or more pages are the same. That is not what you were asking for: it would remove the older versions from the index entirely in favor of the page named as the canonical address.
Quoted from http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
Of all these pages with identical content, this page is the most useful. Please prioritize it in search results.
...
The rel="canonical" attribute should be used only to specify the preferred version of many pages with identical content...
To guide users to the right page I would rely, as you already described, on a good web interface so that they can easily find what they are looking for.
Google also offers sitelinks for your search results, which may or may not appear. I would say this is where you come closest to being able to direct users to the most relevant page, by your own standards, on the results page.
Quoted from https://support.google.com/webmasters/bin/answer.py?hl=en&answer=47334
...sitelinks, are meant to help users navigate your site. Our systems analyze the link structure of your site to find shortcuts that will save users time and allow them to quickly find the information they're looking for.
In Google's Webmaster Tools you have an option to optimize these links, at least somewhat.
Quoted from Google's Webmaster Tools
Sitelinks are automatically generated links that may appear under your site's search results. ...
If you don't want a page to appear as a sitelink, you can demote it.
Update
You could theoretically specify which version something on your page refers to with "microdata" or similar. By doing this you have at least told the bots that there are two items on the site with the same name but different versions. I don't think this will have any effect on the order in which your pages are listed in the search results, though. But we never know what the future holds, right?
If you check out schema.org, you'll see that CreativeWork has a property named "version" and SoftwareApplication has one named "softwareVersion".
Google uses microdata to create rich snippets. I haven't heard that Google uses it for anything else, but that doesn't, of course, mean it isn't so.
Google allows you to specify a canonical address for a specific resource, i.e. the version of a given page you want Google to prioritize. It's pretty easy to use.
However, hints like these are always suggestions. That is, the search engine is free to ignore them, if they support them at all.
For that you would need to know the actual algorithms. I'm guessing that most search engines compare how well a page matches the search, but then also take into account the number of hits a page gets. So say one page is a 98% match with 1,000 hits and another is a 96% match with 5,000 hits; the second page may still be ranked higher.
As for what you can do: search engines are "blind", so use CSS and avoid tables for layout purposes. This will let the engine get a better match with your content. As a workaround for old versions, you could redirect incoming traffic to the new version and then have a link to the old version on that page, essentially setting it up so that only following that link takes you to the old page.
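If you control the documentation server, one rough sketch of that redirect idea, assuming a Node.js/Express front end and a made-up URL scheme (neither of which comes from the question), would be:

// Permanently redirect old-version doc URLs to their new-version equivalents.
// The new page can then carry an explicit link back to the old version.
const express = require('express');
const app = express();

app.get('/docs/vs2005/:topic', (req, res) => {
  res.redirect(301, `/docs/vs2010/${req.params.topic}`);
});

app.listen(3000);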
Since at its core Google's search is based on links (the PageRank algorithm), it would certainly help if each page of the old version linked to its respective page in the new version. This might not solve the problem completely, but it would certainly help.
I have created a website powered by MediaWiki that offers documentation on the interface for a web application.
The URL for my web application may change. However, many articles on this MediaWiki site link to the application.
I would like to create a global constant somewhere, called say "WEB_APP_URL", that I can change at any time and that editors of the wiki can use to link to the application.
That way, I won't have to do a massive find and replace when my application URL changes.
Is this possible? I am working in a LAMP environment. Thank you.
I think the simplest way is to create a template. That is, you can create a page called Template:web-app URL with this wiki-text:
http://this.is/the/URL/of/the/web.app
and then editors can write things like:
The application is located at {{web-app URL}}.
or:
[{{web-app URL}} David Faux's application]
and the URL will automatically get dropped in.
(That's not the only way — you can get similar effects through internal configurations and hooks — but I think the template-based approach is the simplest.)
This all goes back to some of my original questions about trying to "index" a webpage. I was originally trying to do it specifically in Java, but now I'm opening it up to any language.
Previously I tried using HtmlUnit and other methods in Java to get the information I needed, but wasn't successful.
The information I need from the webpage is very easy to find with Firebug, and I was wondering if there is any way to duplicate what Firebug does, specifically for my needs. When I open Firebug I go to the Net tab, then to the XHR tab, and it shows a constantly updating list of requests with the information the server is sending. When I click on a request and look at the response, it has the information I need, and this all happens without ever refreshing the webpage, which is what I'm trying to achieve (not to mention that the values being returned do not show up in the HTML of the webpage).
So can anyone point me in the right direction of how they would go about this?
(I will be putting this information into a MySQL database, which is why I added it as a tag; I still don't know which language would be best to use, though.)
Edit: These requests to the server are somewhat random, and although Firebug shows the URL they come from, when I try to visit that URL in Firefox it prompts me to open something called application/json.
Jon, I am fairly certain that you are confusing several technologies here, and the simple answer is that it doesn't work like that. Firebug works specifically because it runs as part of the browser, and (as far as I am aware) runs with a more permissive set of privileges than a JavaScript script embedded in a page.
JavaScript is, for the record, different from Java.
If you are trying to log AJAX calls, your best bet is for the server-side application to log the invoking IP, user agent, cookies, and complete URI to your database on receipt. That will be far better than any client-side solution.
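As a rough sketch of that server-side logging (the Express/mysql2 stack and the ajax_log table are assumptions, not anything from the question):

// Log each incoming AJAX request's metadata to MySQL on receipt.
const express = require('express');
const mysql = require('mysql2/promise');

const app = express();
const pool = mysql.createPool({ host: 'localhost', user: 'app', database: 'logs' });

app.use(async (req, res, next) => {
  await pool.query(
    'INSERT INTO ajax_log (ip, user_agent, cookies, uri) VALUES (?, ?, ?, ?)',
    [req.ip, req.get('user-agent') || '', req.headers.cookie || '', req.originalUrl]
  );
  next();
});

app.listen(3000);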
On a note more related to your question, it is not good practice to assume that everyone has read other questions you have posted. Generally speaking, "we" have not. "We" is in quotes because, well, you know. :) It also wouldn't hurt for you to go back and accept a few answers to questions you've asked.
So, the problem is:
With someone else's web-page, hosted on someone else's server, you want to extract select information?
Using cURL, Python, Java, etc. is too painful because the data is continually updating via AJAX (requires a JS interpreter)?
Plain jQuery or iFrame intercepts will not work because of XSS security.
Ditto, a bookmarklet -- which has the added disadvantage of needing to be manually triggered every time.
If that's all correct, then there are 3 other approaches:
Develop a browser plugin... More difficult, but has the power to do everything in one package.
Develop a userscript. This is much easier to do and technologies such as Greasemonkey deal with the XSS problem.
Use a browser macro technology such as Chickenfoot. These all have plusses and minuses -- which I won't get into.
Using Greasemonkey:
Depending on the site, this can be quite easy. The big drawback, if you want to record data, is that you need your own web-server and web-application. But this server can be locally hosted on an XAMPP stack, or whatever web-application technology you're comfortable with.
Sample code that intercepts a page's AJAX data is at: Using Greasemonkey and jQuery to intercept JSON/AJAX data from a page, and process it.
Note that if the target page does NOT use jQuery, the library in use (if any) usually has similar intercept capabilities. Or, listening for DOMSubtreeModified always works, too.
If you're using a library such as jQuery, you may have an option such as the jQuery ajaxSend and ajaxComplete callbacks. These could post requests to your server to log these events (being careful not to end up in an infinite loop).
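For instance, a minimal sketch of that interception idea (the logging endpoint http://localhost/log is an assumption; it would be whatever local web app stores the data):

// Intercept the page's jQuery AJAX traffic and forward each response to a
// local logging endpoint, taking care not to log our own logging requests.
var LOG_URL = 'http://localhost/log';

$(document).ajaxComplete(function (event, xhr, settings) {
  if (settings.url.indexOf(LOG_URL) === 0) {
    return; // skip our own POSTs, or we'd loop forever
  }
  $.post(LOG_URL, {
    requestedUrl: settings.url,
    responseBody: xhr.responseText
  });
});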
I've become familiar with MediaWiki through various projects, and now I much prefer its markup over using a word processor, HTML, LaTeX, reStructuredText, etc.
The thing is that some of the servers I edit on are quite slow, or I may not even want the document to end up online.
So my question is: short of setting up my own web server and running a MediaWiki instance, what's a way to edit MediaWiki markup and preview it locally?
Is there a tool or application which does this?
The formatting does not have to be a 100% match but I would want to be able to copy and paste between this and online MediaWiki docs with minimal editing.
Try using Markitup: make a local .html page to preview your edits in real time.
If you only need to edit the occasional page, you could try LibreOffice, with the libreoffice-wiki-publisher plugin: you can just paste the entire article into a new document, save it locally, edit it and then use the "Send to... wiki" menu to save the wikitext on the remote page.
Offline editing of MediaWiki is definitely a problem worth solving; the best place to discuss it is probably the offline-l mailing list, as Kiwix needs a solution too.
To give an answer to my own question, SoloWiki is an application which fits the bill:
http://solowiki.sourceforge.net
Though I used it a few times, it hasn't been updated since version 0.3 in 2010, and the output is limited to headings, bullets, and numbering (from what I can tell).
I'm just including this answer for completeness; I don't think it's an especially good option.