In MediaWiki, perform an action on every page of a namespace through API or maintenance script

How can I programmatically, through the API or through a maintenance script, perform an action on every page in a specified namespace?
For example, I would like to add every page of the main namespace to a specified category, for maintenance purposes.

I've had to do this on a few occasions.
It looks like the way to do things now is through the Pywikibot scripts. These are tools written in Python for automating tasks on MediaWiki sites. Unlike most of the MediaWiki documentation, the PWB docs are actually pretty thorough.
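For the concrete example in the question (adding every main-namespace page to a category), a minimal Pywikibot sketch might look like the following. This is a sketch, not a tested script: it assumes Pywikibot is installed and configured (a user-config.py pointing at your wiki), and the category name is a placeholder.

    # Minimal sketch: append a category to every page in the main namespace.
    # Assumes a configured user-config.py; "Maintenance" is a placeholder name.
    import pywikibot

    site = pywikibot.Site()
    for page in site.allpages(namespace=0):
        if '[[Category:Maintenance]]' not in page.text:
            page.text += '\n[[Category:Maintenance]]'
            page.save(summary='Adding maintenance category')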
When I last had to add some text to every page, I couldn't find a bot that worked with my private instance running on my intranet, since most bots were written to work with Wikipedia.
What I ended up doing was generating a list of page URLs and feeding them to a custom script that used Selenium to automate a browser running on my machine. You can use Selenium from Java, Python, C#, or Ruby.
It's a pretty heavyweight approach, though, since you actually have to run a real browser to do the work.
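For reference, the Selenium side of that approach can be quite short. Here is a rough Python sketch; the URL list file, the edit-form element IDs, and the category text are illustrative assumptions, not guaranteed to match your wiki's markup:

    # Rough sketch: drive a real browser over a list of page URLs.
    # Element IDs (wpTextbox1, wpSave) and the URL list file are assumptions.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    for url in open('page_urls.txt'):
        driver.get(url.strip() + '?action=edit')
        box = driver.find_element(By.ID, 'wpTextbox1')
        box.send_keys('\n[[Category:Maintenance]]')
        driver.find_element(By.ID, 'wpSave').click()
    driver.quit()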
Take a look at Pywikibot.

Related

How do I upload new posts with an admin panel to a Github-hosted webpage?

I have a webpage, actually a blog, hosted with GitHub Pages. It's a simple HTML & CSS page. Normally, I create new files with my new posts in them and upload these files to my repository. However, I want to create an admin panel, mainly so that I can post easily and manage my blog (adding tags, comments, etc.). I don't know where to start or what to use. I know how to program in C and C#, so it's not a problem if I have to learn a new language.
Any help would be appreciated.
You may be able to use a headless CMS. These are normally driven by Git or some kind of API to add content to static sites such as yours, so you don't have to write any backend code. Most of them work with Markdown, though, so you may need some way to render the Markdown into your HTML.
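If you do go the Markdown route, the rendering step itself is small. A minimal sketch using Python's third-party markdown package (the file path is hypothetical):

    # Minimal sketch: render a Markdown post into an HTML fragment.
    # Requires the third-party "markdown" package (pip install markdown).
    import markdown

    with open('posts/my-post.md', encoding='utf-8') as f:
        html_fragment = markdown.markdown(f.read())
    print(html_fragment)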
Headless CMS is normally used within Jamstack projects, so I'd suggest checking that out if that is something you're interested in.
I learned that I would need server-side processing with languages like PHP or Python. However, GitHub Pages is a static site service and does not support dynamic websites. So I will either keep writing locally or consider another hosting service.

JSP / HTML UI Design

We have some JSP code to run in our pages.
There is a UI designer who constantly updates the UI but does not have a Tomcat server. We prefer it that way, because he isn't a programmer.
However, it is getting annoying to have to cut and paste the JSP-related code each time he updates the UI.
Is there any way to handle this issue? We would prefer to keep the same files, but still let him see his UI work without worrying about the JSP, so that when he checks in new files, we don't have to cut and paste our JSP-related code back in.
One example of such code, is that there are certain navigation menu items which are displayed depending on the user.
We are using Tomcat authentication. I suppose we could use AJAX to obtain the user information, but is that less secure? Everything else in the application is AJAX.
The problem here is that this person is not working with the team. Rather, he's creating work for them... and it goes both ways. Read on.
I both do and manage front-end development. If this UI person were on my team, I would force him to set up a Tomcat server. He just needs to learn some things.
In effect, when implemented properly, JSP is not much different from any other server-side markup language for views, such as Rails + ERB, PHP, .NET, etc., or even JavaScript templating engines (Mustache, Handlebars, etc.). The same conditional checks, for-loops, and auth checks - all the basic view-layer logic that is needed - are available and usable.
If he's on a Java project / team, he needs to learn the Java front-end. It's that simple.
His main tasks should be basic, and frankly, he shouldn't even need to install a Java IDE to do them. They are:
Get/push source code + analyze diffs (any source control client)
Build / deploy latest to his local environment (scripts or .bat files)
Work on the running app*
(*) The last part is where things get tricky. If you work directly on the running server and then accidentally run a fresh deploy before copying over your updates, you're screwed. If you use symlinks (which are also available in Windows), there may be files that only appear post-compilation, or locks, or sync issues when getting latest code - all creating problems.
The way I have found that works best is to work on the code repo location (pre-build) and create two scripts:
Build + deploy - stops the running server, blows out directories and caches, builds the latest code, and redeploys
Update - synchronizes the view files and any other necessary directories with the deployment target (see the sketch after this list). You must be sure to disable hot-deploy in the Tomcat config, or you'll get memory-leak errors.
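As a rough illustration of that second script, here is a minimal Python sketch of the sync step; the source and target paths are placeholders for your checkout and your exploded Tomcat webapp:

    # Minimal sketch of the "Update" script: copy view files from the repo
    # checkout into the exploded deployment without a full rebuild.
    # SRC and DEST are placeholder paths; adjust for your setup.
    import pathlib
    import shutil

    SRC = pathlib.Path('src/main/webapp')
    DEST = pathlib.Path('/opt/tomcat/webapps/myapp')

    for pattern in ('*.jsp', '*.css', '*.js'):
        for f in SRC.rglob(pattern):
            target = DEST / f.relative_to(SRC)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)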
That said, and it should be obvious by now: Java is one of the most difficult ecosystems to develop UIs for. The compiled nature and complex environment requirements make development slow and tedious, with significant dependencies on different people or systems to make a decent product.
JSP itself, while capable as described above, is almost always organized badly, with a mix of includes, tag files, partials, and frameworks - it becomes a UI person's worst nightmare. GSP (from Grails) solves a lot of the organizational issues, but requires flexibility from the dev team. Even then it is not an "ideal" solution.
JSP syntax - JSTL, c: tags, etc. - creates even greater headaches. Front-end people who do not program don't use IDEs, and therefore have no way of looking up methods, objects, parameters, etc. when writing or customizing conditional logic or loops. The dev team can help by pre-writing these out on the page, but any time changes or enhancements are needed, it requires meetings, conversations, and compromise.
In the long run, you should abstract the Java app behind a separate, more flexible, more capable front-end technology stack, using REST/JSON-based services to talk between the two. (Side note: for performance / apps at scale, make sure you are using either a custom protocol or WebSockets.)
My preference is Node.js, because front-end developers can stick with the language they know best: JavaScript/JSON. But it could be anything that your particular front-enders are comfortable with and can do design with.
The key is to eliminate bottlenecks on both the front end and the back end. Both tracks should be able to develop and iterate quickly, with the RESTful API being the key point of collaboration.
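To make the REST/JSON boundary concrete: the view layer only ever talks to JSON endpoints. This answer suggests Node.js, but here is the same idea sketched in Python/Flask purely for illustration; the endpoint and fields are made up:

    # Illustration only: a JSON endpoint the front end could consume instead
    # of embedding auth logic in JSP. The endpoint and fields are made up.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route('/api/current-user')
    def current_user():
        # In a real app this would come from the session / auth layer.
        return jsonify(name='jdoe', roles=['admin'])

    if __name__ == '__main__':
        app.run()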
Lastly, for those of you who are aspiring front-end developers / designers but only know Java (or some other server-side technology), I CHALLENGE YOU to learn something new. User-facing technologies are in a constant state of change, and more recently that change has accelerated. If you want to have UI-competitive products, you need to invest in technologies that will make them competitive.

Running a Perl/TK GUI inside a web page

We have a Perl application which contains a Perl/Tk-based GUI (some checkboxes, entry fields, etc.).
I have been asked to modify the Perl/Tk GUI part of the application so that it can run inside a web page. Is this possible?
I found this:
http://oreilly.com/openbook/webclient/ch07.html
however, it appears to create a web client and parse the HTML response to format the output, as opposed to running inside a browser.
I would like to know if it is possible to somehow incorporate a Perl/Tk GUI into a web browser, and if so, what is the best way to do so? Maybe something like a plugin (e.g. http://www.tcl.tk/software/plugin/)?
The usual way would be to rewrite your application in HTML/CSS/JavaScript. The example you show on the O'Reilly site does the opposite - it shows you how to write a Tk application that will render HTML.
A browser plugin is possible if that will provide what you need. If so, the problem is trivial, but you would need the plugin installed on every PC that needs access to your application, and it is possible that there are certain Tk facilities the plugin doesn't support. All you can do is try it.
There was a project for Netscape, mentioned in Mastering Perl/Tk, called PerlPlus, but it looks like its SourceForge page hasn't been touched in a while. The intent was to run Perl (and Perl/Tk) code in a Netscape browser.

How to customize buildbot web pages

I am trying to make some extra web pages for a test Buildbot, since I am planning to have one running for my project.
In practice, I would like a waterfall page that shows the button to build a specific builder next to the builder name, instead of on the builder page only. I would also like to have some reference documents loaded from inside the builder's work folder, and from other locations on the slave machine, using buttons to show or hide them.
I've looked at the manual and I do not see any info about how to customize or create new HTML pages that can leverage the Buildbot features (like the templates already included with Buildbot do).
I have opened some pages and see that there are some HTML files that contain non-HTML statements like
{% macro %}
{% for %}
and so on. I am not a web programmer, so I am quite clueless about what I should look for. I tried googling the word "macro" together with HTML and just got a bunch of results related to wiki customization; it does not look like Python, so I am quite lost.
Has anyone successfully made custom pages for Buildbot who could give me some pointers about what to learn?
Buildbot uses jinja2 for templating; the jinja2 homepage has some nice documentation, and this is where the non-HTML statements come from. I found Google's Chromium buildbot to be a good starting point when learning about Buildbot customization.
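To give a feel for the syntax, here is a tiny standalone jinja2 example using the same {% macro %} and {% for %} constructs; the builder names and markup are invented for illustration and are not Buildbot's actual templates:

    # Tiny jinja2 demo of {% macro %} and {% for %}. The builder names and
    # HTML are invented; this is not Buildbot's actual template code.
    from jinja2 import Template

    tmpl = Template("""
    {% macro build_button(name) -%}
      <form action="/builders/{{ name }}/force" method="post">
        <button type="submit">Build {{ name }}</button>
      </form>
    {%- endmacro %}
    {% for b in builders %}{{ build_button(b) }}{% endfor %}
    """)
    print(tmpl.render(builders=['linux', 'win32']))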
http://buildbot.net/buildbot/docs/0.8.7/developer/webstatus.html
http://jinja.pocoo.org/docs
http://src.chromium.org/viewvc/chrome/trunk/tools/build/masters/master.chromium/templates/

Super-fast screen scraping techniques? [closed]

Closed. This question does not meet Stack Overflow guidelines and is not currently accepting answers. Closed 3 years ago.
I often find myself needing to do some simple screen scraping for internal purposes (e.g. a third-party service I use only publishes reports via HTML). I have at least two or three cases of this now. I could use Apache HttpClient and create all the necessary screen-scraping code, but it takes a while. Here is my usual process:
Open up Charles Proxy on the website and see what's going on.
Start writing some Java code using Apache HttpClient, dealing with cookies and multiple requests.
Use Jericho HTML to deal with parsing the HTML.
I wish I could just "record my session" quickly and then parametrize the things that vary from session to session. Imagine just using Charles to grab all the HTTP requests and then parametrizing the relevant query-string or POST params. Voilà, I have a reusable HTTP script.
Is there anything that does this already? I remember that when I used to work at a big company, there was a tool we used called LoadRunner by Mercury Interactive that essentially had a nice way to record an HTTP session and make it reusable (for testing purposes). That tool, unfortunately, is very expensive.
HtmlUnit is a scriptable, headless browser written in Java. We use it for some extremely fault-heavy, complex web pages and it usually does a very good job.
To simplify things even more you can run it in Jython. The resultant program reads more like a transcript of how one might use a browser than hard work.
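A minimal sketch of what driving HtmlUnit from Jython can look like; the URL is a placeholder, and it assumes the HtmlUnit jars are on the Jython classpath:

    # Jython sketch: call HtmlUnit's Java API with Python syntax.
    # The URL is a placeholder; WebClient/getPage/asText are HtmlUnit's API.
    from com.gargoylesoftware.htmlunit import WebClient

    client = WebClient()
    page = client.getPage('http://example.com/report')
    print page.asText()   # dump the page's visible text
    client.closeAllWindows()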
You don't mention what you want to use this for. One solution is to simply "script" your web browser using a tool like Selenium, if having a web browser repeat your actions is an acceptable solution. You can use the Selenium IDE to record what you do and then alter the parameters.
I wish I could just "record my session" quickly and then parametrize the things that vary from session to session.
If you have Visual Studio Test Edition, its web test feature does exactly that. If you aren't using VS or want a standalone tool, I have had great success with OpenSpan. It does more than just the web; it handles Windows apps and Java too!
Selenium would be my first pick, as the IDE lets you do a lot of things the easy way by "recording" a session for you. But if you're not happy with what it provides, you can also use the Python module Beautiful Soup to programmatically walk through a website.
Python and Perl both have a module called Mechanize (WWW::Mechanize for Perl) that makes it easy to script browser behavior programmatically (filling out forms, handling cookies, etc.).
So, Python + Beautiful Soup (a great HTML/XML parser) + mechanize (browser functions) = a super easy/fast scraper.
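A small sketch of that combination; the URL, form index, and field names are invented placeholders:

    # Sketch: mechanize for browser behavior, Beautiful Soup for parsing.
    # The URL, form index, and field names are invented placeholders.
    import mechanize
    from bs4 import BeautifulSoup

    br = mechanize.Browser()
    br.open('http://example.com/login')
    br.select_form(nr=0)            # first form on the page
    br['username'] = 'me'
    br['password'] = 'secret'
    br.submit()

    soup = BeautifulSoup(br.response().read(), 'html.parser')
    for row in soup.find_all('tr'):
        print([td.get_text(strip=True) for td in row.find_all('td')])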
I used DomInspector to manually inspect the site of interest and parametrize its structure, then plain Apache HttpClient and a hand-made parser using this parametrized structure. Basically, I could extract any info from any site automatically with a little tweaking of parameters. It's similar to how a SAX parser works: all you need to tell it is at what sequence of tags you want to start grabbing the data. For example, Google has a pretty standard format for search results, so you just run to the third occurrence of 'tab' and start getting text from the first 'div' up until the closing '/div'.
Internet Explorer supports Browser Helper Objects (BHOs). They can access IE's HWND (window handle), and it's easy to scrape the pixels from there. The IWebBrowser2 COM interface also gives you access to the HTTP requests, and you can get back the parsed HTML document via IWebBrowser2::Document as IHTMLDocument / IHTMLDocument2 / IHTMLDocument3.
Using Firefox, it should be possible to implement much of this through its powerful support for add-ons and extensions; however, that wouldn't really run "headless" - it would be a real scripted browser. Also, I seem to recall reading that Google's Chrome browser uses a similar technique for automated regression testing.
I can't personally vouch for it, but there is a free Firefox plugin: DejaClick.
I installed it the other day and did some basic recording, playback, and script-editing activities with it. It pulled them off without much of a learning curve. If your end goal is to show something in a web browser, then it should suffice.
They offer web transaction monitoring services, which implies you can export the scripts for other uses, but they may be too proprietary to use outside of your web browser / their paid service.
http://www.dejaclick.com/