I have imported Wikipedia database dumps (the pages-articles.xml versions) for two languages, English and Maltese.
However, I need to be able to link an article in one language to its counterpart in the other. I am under the impression that I have to import additional tables for this. Which tables do I need to import?
Thanks in advance!
Maltese db dump repository: http://dumps.wikimedia.org/mtwiki/20121012/
English db dump repository: http://dumps.wikimedia.org/enwiki/20121001/
That information is in the langlinks table, so you will need to download langlinks.sql.gz, which is a SQL dump, not XML.
One possible issue is that those links may not be symmetric: for example, en:A may link to mt:B, but mt:B may link back to en:C. You'll have to decide what to do about that.
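Once the table is imported, a query along these lines finds the Maltese counterpart of an English article (a sketch against the standard MediaWiki schema; 'Malta' is just a placeholder title):

    -- langlinks.ll_from holds the page_id of the source page.
    SELECT ll_title
    FROM langlinks
    JOIN page ON page_id = ll_from
    WHERE page_namespace = 0      -- main (article) namespace
      AND page_title = 'Malta'
      AND ll_lang = 'mt';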
If I understand correctly, you want to create an English wiki and a Maltese wiki, and to have them link to one another.
To do this, you need to add the interlanguage prefixes for 'en' and 'mt' to the interwiki table. Here's a description of it:
https://www.mediawiki.org/wiki/Interwiki_link
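A minimal sketch of what that could look like, assuming your Maltese wiki lives at a hypothetical URL (the exact column list varies with the MediaWiki version, so check the page above):

    -- Make [[mt:Title]] on the English wiki point at the Maltese wiki.
    -- iw_local = 1 marks the prefix as usable for interlanguage links.
    INSERT INTO interwiki (iw_prefix, iw_url, iw_api, iw_wikiid, iw_local, iw_trans)
    VALUES ('mt', 'http://mt.example.org/wiki/$1', '', '', 1, 0);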
You should probably remove the links to all the other languages from the articles. Otherwise they will show up as junk at the bottom of a lot of articles.
P.S.: It's great to hear that you are working with the Maltese language! I really love it :)
Related
I am trying to get the revision history of every English Wikipedia article. I just need every editor's name and edit size (in bytes), along with the article title or ID. The Wikipedia dump of all revision history is a few TB, and my computer cannot handle it. I also tried using MediaWiki to query the revision histories, but it seems like it would take a very long time to get everything. Are there any other approaches I can try to get the information I want? Thanks.
Taking the problem the other way around, maybe you don't need to download all the data.
For example, if you plan to use SQL, you can run your queries on Wikimedia's servers without downloading anything.
Take a look at https://quarry.wmflabs.org/ and its documentation.
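As a rough sketch of the kind of query you could run there (column names follow the older revision table layout; on newer replicas the editor name may require a join through the actor table instead of rev_user_text):

    -- Title, editor name, and resulting page size in bytes for each revision.
    -- rev_len is the page size after the edit; to get the size of the edit
    -- itself, compare against the parent revision via rev_parent_id.
    SELECT page_title, rev_user_text, rev_len
    FROM revision
    JOIN page ON page_id = rev_page
    WHERE page_namespace = 0
    LIMIT 100;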
I've just set up a MediaWiki server. I wanted to export data from Wikipedia, but by default it doesn't allow a pagelink depth higher than 0. It seems that you can only raise the maximum pagelink depth by setting up your own MediaWiki and adjusting $wgExportMaxLinkDepth. Now I've done all that, but obviously my own MediaWiki has no content. So I was wondering if there is a way to bulk copy all of Wikipedia into my own server. From what I've read, this seems doable only about 100 pages at a time. If that's the case, there would be absolutely zero purpose to Special:Export in general, as you'd need to know exactly which pages you want to import before doing the export, which defeats the purpose altogether. Any help would be much appreciated.
Special:Export isn't meant for a complete export of a wiki, especially not through the web interface and with so many pages in the database. Special:Export should be used when you want to export a known page (or a small set of pages) with all its content in order to import it into another wiki, e.g. to copy a template from one wiki to another. So the Special:Export special page has a valid purpose; you're just trying to use it for a use case it wasn't developed for ;)
If you want to export all pages of a MediaWiki wiki, you should use the dumpBackup.php maintenance script (runnable from the command line) or one of the other backup scripts in the maintenance folder. For example, running php maintenance/dumpBackup.php --current from the wiki's root directory writes the current revision of every page as XML to standard output. This will ensure that you get what you want.
In the case of Wikipedia you can't run these scripts yourself (I mention them for general reference only), but the Wikimedia Foundation provides database dumps of its wikis, including Wikipedia.
"So I was wondering if there was a way to bulk copy all of wikipedia into my own server" I would recommend against this simply on the sheer size of the data & the vast number of open links (or "redlinks" or "bad links") you would be adding if you didn't actually copy it all in. A better approach is to follow all the Wikipedia conventions about page NAMING, to the punctuation mark.. then write a script that checks say once a night whether you have linked to something that is already defined in Wikipedia, and then imports ONLY THAT PAGE and adds a link up top to the EXACT VERSION OF IT that was imported. That way you only bring in what you actually reference, but your database can integrate with Wikipedia's.
This will also come in immensely handy if you have to support multiple languages, like Spanish or French, since Wikipedia links to 'the same article in another language', translating at least those concepts for you.
I'm pretty new to this, so I'm not sure if this is a simple request or not, but here goes:
I am working on a school website, and under each program page is a list of course codes. What I'm looking for is, when I click on a course code (e.g. HEL2106), to have a lightbox-type popup that displays program info about that course. What I have is a .dat file containing all the course codes and descriptions, so I would like to use some sort of HTML/CSS/JS that will pop up and display the correct info for the clicked course from the .dat file.
I'm not 100% sure on how to go about this so if anyone has any suggestions at all, that would be really helpful.
If you need any other details from me, let me know.
Thanks,
(File info: the .dat file is pretty much just a notepad document with each course code and description in sequence.)
Just to let you know, you need to search for and learn about a lot of things first.
For data access on a website, you need access to a database. If you don't know about SQL (or another query language), queries, databases, tables, or servers, then you should start there.
To read those databases, you need to write code (ASP.NET, PHP, etc.) that runs on a web server (Apache, IIS, etc.).
If you want to create a website, I recommend you start working with WordPress, Joomla, or another CMS (Content Management System) to familiarize yourself with a lot of things before jumping to the advanced stuff.
YouTube is a very good friend and teacher! :) Start by watching some tutorials there. Hope this guides you to what you need.
I have no idea what your level of experience is based on your question, so I will assume you have a basic understanding of HTML, CSS, and JS. If not, then I would recommend Exel Gamboa's answer.
It sounds like you're looking for something like http://fancybox.net/
Of course, it's typically used for displaying images, but it could easily be modified for your purpose.
Now, about your .dat file: when storing data for large websites, it's typically best to use an SQL database. This lets you access and store data in an organized manner.
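As a rough sketch, the .dat file's contents could be loaded into a table like this (the table and column names are hypothetical; HEL2106 is the example code from your question):

    -- One row per course; the page's script would look up a clicked code here.
    CREATE TABLE courses (
        course_code VARCHAR(16) PRIMARY KEY,  -- e.g. 'HEL2106'
        description TEXT NOT NULL
    );

    INSERT INTO courses (course_code, description)
    VALUES ('HEL2106', 'Course description goes here.');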
As a final recommendation, I'd take a look at using a CMS for your website (WordPress, WolfCMS, Perch, etc.).
Hope this helps.
Articles on Wikipedia get edited; they can grow, shrink, be updated, etc. What file system or database storage layout is used underneath to support this? In a database course I read a bit about variable-length records, but that seemed aimed at small strings rather than whole documents. In a file system, files can grow and shrink, and I think that's done by chaining blocks together, so that updating a file doesn't rewrite the whole file. Perhaps something similar is done here.
I am looking for specific names and terminology, maybe even how the schema is defined in MySQL (I think Wikipedia uses MySQL).
Below are links to some write-ups on Wikipedia's architecture, but I haven't been able to answer my question from them:
http://swe.web.cs.unibo.it/twiki/pub/WikiFactory/AntonelloDiMuroThesis/Wikipedia-cheapandexplosivescalingwithLAMP.pdf
http://dom.as/uc/workbook2007.pdf
Thanks,
See:
http://www.mediawiki.org/wiki/Manual:Database_layout
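In short, MediaWiki never rewrites an article in place: every edit inserts new rows. A much-simplified sketch of the relevant core tables (see the manual above for the real definitions):

    -- page points at its latest revision; revision points at the stored text.
    -- An edit adds one revision row and one text row; nothing is overwritten.
    CREATE TABLE page (
        page_id     INT PRIMARY KEY,
        page_title  VARBINARY(255),
        page_latest INT                 -- rev_id of the current revision
    );
    CREATE TABLE revision (
        rev_id        INT PRIMARY KEY,
        rev_page      INT,              -- the page this revision belongs to
        rev_text_id   INT,              -- row in text holding the content
        rev_parent_id INT               -- previous revision, for history
    );
    CREATE TABLE text (
        old_id    INT PRIMARY KEY,
        old_text  MEDIUMBLOB,           -- full text of one revision
        old_flags TINYBLOB              -- e.g. 'utf-8,gzip'
    );

Each revision stores the complete text (possibly compressed, per old_flags), not a delta, so nothing like block chaining is needed. Wikipedia itself additionally moves revision text out to external storage servers, but the schema above is the core idea.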
I was looking at the status.net source code and MySQL tables, and they seem to have HTML tags in their MySQL field values. I was just wondering: is that the right thing to do, or is it going to cause problems in the future?
It depends on where it will be used. It isn't an issue if the intention is to store arbitrary HTML there, especially if the developers and admins are the only ones who can put it there.
On the other hand, if, for example, a user of your system managed to put HTML there and used the opportunity to insert a script tag referencing their own scripts, you might very well be in big trouble (if you don't escape the strings before you render them on your site).
I would like to take the opportunity to quote my old IT teacher's favorite sentence:
Oh, it depends.
Without knowing where and why the tags are stored in the database, it's hard to say whether this is a good idea...
A database can be used for storage just like the filesystem, so in most cases it's not a problem to store HTML.
Take the articles of a WordPress blog as an example: it's definitely OK to store them in the database.
Short answer: Depends
Long answer: This practice is quite common and often unavoidable.
Think about blog posts: the HTML that marks up the content cannot be separated from the content itself.
Possible issues:
JavaScript injection: if I can inject malicious HTML into your database, I can plant links to malware, or JavaScript that helps install viruses or trojans.
There's always a trade-off.