What data storage model is used to store articles in Wikipedia? (MySQL)

Articles in Wikipedia get edited. They can grow, shrink, be updated, etc. What file system or database storage layout is used underneath to support this? In a database course I read a bit about variable-length records, but that seemed to be meant for small strings rather than whole documents. In a file system, files can grow and shrink, and I think that is done by chaining blocks together: when we update a file, the whole file is not rewritten. Perhaps something similar is done here.
I am looking for specific names and terminology, maybe even how the schema in MySQL is defined. (I think Wikipedia uses MySQL.)
Below are links to some write-ups on Wikipedia's architecture, but I have not been able to answer my question from them:
http://swe.web.cs.unibo.it/twiki/pub/WikiFactory/AntonelloDiMuroThesis/Wikipedia-cheapandexplosivescalingwithLAMP.pdf
http://dom.as/uc/workbook2007.pdf
Thanks,

See:
http://www.mediawiki.org/wiki/Manual:Database_layout
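As a rough illustration of what that manual page describes: in the classic MediaWiki layout, an article is a row in `page`, every edit adds a row to `revision`, and each saved edit stores the complete wikitext of the new revision in `text`, rather than growing a record in place or chaining blocks at the application level. The sketch below is a heavily simplified model of that idea. It uses SQLite instead of MySQL so that it runs standalone, shows only a handful of columns named after the older MediaWiki schema, and the helper functions `save_edit` and `current_text` are hypothetical. Newer MediaWiki versions add further indirection, so treat this as an assumption-laden sketch, not the real schema.

```python
import sqlite3

# Heavily simplified model of the classic MediaWiki tables.
# Assumption: only a few columns are shown; the real schema has many more.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id     INTEGER PRIMARY KEY,
    page_title  TEXT NOT NULL,
    page_latest INTEGER            -- rev_id of the current revision
);
CREATE TABLE revision (
    rev_id      INTEGER PRIMARY KEY,
    rev_page    INTEGER NOT NULL,  -- refers to page.page_id
    rev_text_id INTEGER NOT NULL   -- refers to text.old_id
);
CREATE TABLE text (
    old_id   INTEGER PRIMARY KEY,
    old_text BLOB NOT NULL         -- full wikitext of one revision
);
""")


def save_edit(conn, title, wikitext):
    """Hypothetical helper: store an edit as a new text row plus a new revision row."""
    row = conn.execute(
        "SELECT page_id FROM page WHERE page_title = ?", (title,)
    ).fetchone()
    if row is None:
        page_id = conn.execute(
            "INSERT INTO page (page_title) VALUES (?)", (title,)
        ).lastrowid
    else:
        page_id = row[0]
    # The whole text is written again; earlier revisions are left untouched.
    text_id = conn.execute(
        "INSERT INTO text (old_text) VALUES (?)", (wikitext,)
    ).lastrowid
    rev_id = conn.execute(
        "INSERT INTO revision (rev_page, rev_text_id) VALUES (?, ?)",
        (page_id, text_id),
    ).lastrowid
    conn.execute(
        "UPDATE page SET page_latest = ? WHERE page_id = ?", (rev_id, page_id)
    )
    return rev_id


def current_text(conn, title):
    """Hypothetical helper: follow page -> latest revision -> text."""
    row = conn.execute(
        """SELECT t.old_text
           FROM page p
           JOIN revision r ON r.rev_id = p.page_latest
           JOIN text t ON t.old_id = r.rev_text_id
           WHERE p.page_title = ?""",
        (title,),
    ).fetchone()
    return row[0] if row else None
```

Calling `save_edit` twice for the same title leaves two rows in `text`, and `current_text` returns the second one, which is what makes the full edit history available.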

Related

Semantic wiki: writing to a wiki like a database

I'm a newbie to semantic wikis.
I want to build a database and overview of computer components for my
organization.
I have read about the semantic wiki language but can't tell whether I can
do this in a semantic wiki or not. Please help me or point me in the right
direction.
For example, I have HDDs.
Each of them has:
- status used or unused
- if used then the computer (parent) or if the unused - storage room
- serial number
- specification
- and etc.
I also have a hierarchy of storage rooms and so on.
How can I do this in a semantic wiki?
Will each HDD have its own page?
I found that it can be done with a subobject, but a subobject can't be
shown in the page.
How can I describe it and make the description visible, or can it only be
shown with an ask query?
Maybe it can be done with something other than a subobject?
Thanks for your time.
I found an answer, but it's not what I want.
A subobject lets me write data as on a page, BUT it can't be displayed in the page. It can only be shown by a query.
It would be double work for me to first write the data in the object and then display it in some format.
It would be great if I could write some data (like the object) and have some of that data shown on the same page.
It's a pity.

Is it safe to use numbers in your web page file names?

Someone recently told me that using numbers in web page file names is not good practice. For example, say I was making a website about Samara Morgan and I had a file named 7days.html - would it be bad to start the file name with a number? Is it riskier than having numbers put later in the file name (e.g. day7.html)?
I'm just a tad confused on whether it's generally discouraged to use numbers in file names or not.
EDIT: After asking them to explain a bit more, this is what they said to me:
.... the simplest way I can explain it is that certain programming
languages and operating systems might be confused by putting the
number as the first character. In other words, it has a higher
potential for error, so it's not recommended. That being said, it IS
acceptable to use a number AFTER the first character. By the way, a
domain name (like 4chan.org) is a little different because it's not a
file.
Here are some more tips/best practices (you'll see it as #3):
https://ed.fnal.gov/lincon/tech_web_naming.shtml
I think you need to go back to this someone and ask them for more information - are they saying there's a security problem? a usability problem because of something users might want to do with it? a Search Engine Optimisation trick you're missing that would make it easier for people to find?
I can't actually think of why numbers in URLs would matter for any of these, however. It seems most likely they were thinking of SEO, because that's a constant battle between search engines (who want users to get the results they want) and publishers (who want to get their brand higher up the results) and full of half-understood experiments and dodgy advice.
It's also worth noting that URLs don't exactly have "filenames" at all - they're just a string that the browser sends to the server, and the server may or may not map to a file on disk. Look at the URL of this page, for instance - it contains enough information for the server to look up the right question in a database, plus some human-readable text which is mostly for SEO.
Your server has filenames, of course, but I can't think of any reason why having numbers in those would be a problem, let alone why it would apply particularly to web pages.
Edit based on additional information supplied:
Two things I notice about the link you've added: one, it's twenty years old; two, it includes detailed reasoning for every single point, except point 3. I can't think of any "programming languages and operating systems" that would have a problem with a leading digit. It's actually quite common in some (non-web) contexts, as a way of forcing files to be listed or run in the desired order (e.g. 01-contents.txt, 02-introduction.txt, etc).
I can imagine problems if you began the filename with a ., -, or _, because sometimes there are entrenched conventions that those are hidden, or backups, etc. Either the advice made sense 20 years ago, or the author was being overly conservative to keep the rule simple.
To be precise: your question is whether it is permissible or appropriate to begin a file name with one or more numeric characters. According to the file naming conventions of the main operating systems, this type of naming is allowed and does not present any problem of interpretation:
Windows: https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
Linux: https://www.cyberciti.biz/faq/linuxunix-rules-for-naming-file-and-directory-names/
The situation is slightly different for programming languages. The most common case is C/C++, where a variable name made up entirely of digits, or one that begins with a digit, is not a valid identifier, which is presumably why some people recommend against the practice.
(See this SO question for C/C++ variable naming examples and problems: Is it safe to use numbers in your web page file names?)
Therefore, in your case, which concerns file names, the limitations you were told about do not apply.
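To make the distinction between file names and identifiers concrete, here is a small sketch in Python, whose identifier rule about leading digits matches C/C++. The helper `can_create_file` is a hypothetical name introduced only for this illustration; it creates the file in a temporary directory so nothing is left behind.

```python
import os
import tempfile


def can_create_file(name: str) -> bool:
    """Try to create a file with the given name inside a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, name)
        with open(path, "w") as f:
            f.write("<!doctype html>")
        return os.path.isfile(path)


# A leading digit is fine for a file name...
print(can_create_file("7days.html"))  # True

# ...but not for an identifier in the programming language itself.
print("7days".isidentifier())  # False
print("day7".isidentifier())   # True
```

So the leading-digit restriction is a rule about names inside source code, not about the names of files on disk.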
No, just keep it like that; it doesn't affect anything.

HTML tags in MySQL field values: is that right?

I was looking at the status.net source code and MySQL tables, and they seem to have HTML tags in their MySQL field values. I was just wondering whether that is the right thing to do, or whether it is going to cause problems in the future?
It depends on where it will be used. It isn't an issue if the intention is to have arbitrary html there. Especially not if the developers and admins are the only ones who can put it in there.
On the other hand, if for example a user of your system managed to put it there and also used the opportunity to put in a script-tag and a reference to their own scripts you might very well be in big trouble (if you don't escape the strings before you render them on your site).
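As a minimal sketch of the "escape before you render" step, assuming the stored value is untrusted and should be shown as plain text: Python's standard `html.escape` turns markup characters into entities. The function name `render_comment` and the example URL are made up for illustration.

```python
import html


def render_comment(stored_value: str) -> str:
    """Wrap an untrusted stored string in a paragraph, escaping HTML first."""
    return "<p>" + html.escape(stored_value) + "</p>"


# A value that a malicious user managed to store in the database.
malicious = '<script src="https://evil.example/x.js"></script>'

# The script tag is rendered as visible text instead of being executed.
print(render_comment(malicious))
```

This only covers the case where the field is meant to hold plain text. If the field is meant to hold HTML (as with blog posts), escaping everything would destroy the markup, and the content would need to be restricted to trusted authors or passed through a sanitizer instead.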
I would like to take the opportunity to quote the favorite sentence of my old IT teacher:
Oh, it depends.
Without knowing where and why the tags are stored in a DB, it's hard to say whether this is a good idea...
A database can be used for storage just like the file system. So in most cases it's not a problem if you store HTML.
Let's take the articles of a WordPress blog as an example. It's definitely OK to store them in the database.
Short answer: Depends
Long answer: This practice is quite common and often unavoidable.
Think about blog posts: the HTML that marks up the content cannot be separated from the content itself.
Possible issues:
JavaScript injection. If I can inject malicious HTML code into your database, I could create links to malware or JavaScript commands that help install viruses or trojans.
There's always a trade-off.

Using Semantic MediaWiki for tabular data

Am I completely off-track to think about using Semantic MediaWiki to store (and organise, report on, etc.) 'tabular' data such as financial transactions or weather readings that would usually live in a spreadsheet or database?
It seems that one would need a separate, tiny, page for each tuple; but then, that's by design and perhaps it's perfectly okay.
I ask, simply because SMW seems like such a quick and easy way to get a collaborative data repository up and running.
Semantic MediaWiki is better suited to keeping track of factual or encyclopedic data, where you can have pages about everything you need to know about a certain topic.
For tabular or numerical data such as measurements, financial records, or sensor data, you would indeed need to create little pages for each data point, which is not practical in many cases.
However, there are extensions to MediaWiki that allow you to integrate external data sources (in MySQL databases or CSV files somewhere) with MediaWiki pages. This can give you the best of both worlds: dynamic access to and queries of tabular data, plus semantic annotations of the pages around them.
Take a look at:
http://www.mediawiki.org/wiki/Extension:External_Data
No, I don't think it's such a bad idea.
Using SemanticForms you could enter lots of little data pages quickly and easily (for example, an invoice might require additional pages for each line item, but they could all be entered from one form using the 'multiple' feature of the 'for template' form tag). So although I've never tried logging weather data in SMW, I think it would be pretty easy. I don't see what the problem would be with storing data across so many pages; it's easy enough to combine it in whatever formats you require.
Give it a go and let us know how it goes!
You can use either the Semantic Internal Objects extension (SIO) or SMW's built-in subobjects (the former works well with the already mentioned External Data extension) to store multiple semantic objects (these could be the rows of your spreadsheet) in one page.
However, unless you are really looking for a collaborative tool with semantic capabilities, I doubt SMW is the best suited piece of software for your task.
Edit (November 2015): Since SMW version 1.9, there is nothing that SIO can do that the built-in subobjects can't, so I would recommend the latter.

Randomizing pages in Wikipedia with MySQL and Perl?

I found a Perl script that handles randomizing Wikipedia articles here. The code seems to be slightly computer generated. Due to my present interest in MySQL, I thought you could possibly keep the links and related data in a database.
I know that MySQL is good at maintaining relations between tables, while it seems you can easily implement things with Perl. I find it somewhat fuzzy where to draw the line between their specialties. So:
How can you randomize Wikipedia articles with MySQL and Perl?
If you really want to know how THEY (Wikipedia) do it, have a look at this code directly from MediaWiki:
http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/includes/specials/SpecialRandompage.php
It is open source software after all ;), and that's the beauty of it.
Edit: From having a quick glance at the code, I am pretty sure they're using a field called page_random, set at row creation time. Then, since it's an indexed field, ordering by it with limit 1 is instant (with a given random offset, valid for this application, of course).
This is a very standard way to make random access quick, due to ORDER BY RAND() being extremely slow, as I mentioned in the other answer.
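The indexed random column approach described above can be sketched as follows. This is a simplified model, not MediaWiki's actual code: it uses SQLite instead of MySQL so it runs standalone, the table has only three columns, the function names are hypothetical, and the wrap-around fallback for the case where no row has a large enough `page_random` is an assumption on my part.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE page (
           page_id     INTEGER PRIMARY KEY,
           page_title  TEXT NOT NULL,
           page_random REAL NOT NULL   -- fixed random value in [0, 1)
       )"""
)
# The index is what makes the lookup cheap: no full-table sort is needed.
conn.execute("CREATE INDEX page_random_idx ON page (page_random)")


def add_page(conn, title):
    """Assign the random sort key once, when the row is created."""
    conn.execute(
        "INSERT INTO page (page_title, page_random) VALUES (?, ?)",
        (title, random.random()),
    )


def random_page(conn):
    """Pick a fresh random point and take the first row at or after it."""
    r = random.random()
    row = conn.execute(
        "SELECT page_title FROM page "
        "WHERE page_random >= ? ORDER BY page_random LIMIT 1",
        (r,),
    ).fetchone()
    if row is None:
        # Assumption: wrap around to the smallest key if nothing is >= r.
        row = conn.execute(
            "SELECT page_title FROM page ORDER BY page_random LIMIT 1"
        ).fetchone()
    return row[0] if row else None
```

The trade-off is that the distribution is not perfectly uniform (a page that follows a large gap in `page_random` values is picked more often), which is usually acceptable for a "random article" feature.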
Edit #2: I love how clean and proper MediaWiki's OOP code is. Definitely bookmarking it to show PHP newbies what good PHP code looks like (and to remind myself).
SELECT id FROM articles ORDER BY RAND() LIMIT 1
You could, of course, just link to http://en.wikipedia.org/wiki/Special:Random