I'm trying to extract the English Wikipedia articles corresponding to a list of Arabic articles. Say I have this article:
https://ar.wikipedia.org/wiki/%D8%A7%D9%84%D9%82%D8%AF%D8%B3
and I need to find its English version:
https://en.wikipedia.org/wiki/Jerusalem
The problem is that I don't have the list of English names corresponding to the Arabic names, so I can't use it with the APIs. I thought about extracting the language links and doing some processing on the result to pull out the English names, but is there an easier way using the MediaWiki APIs that you can suggest?
The easiest way to do this is through Wikidata. There's a Wikidata item for every Wikipedia page, and each item links to all the wiki pages about that particular term. So here's an example query that gives you the English name from the Arabic name:
https://www.wikidata.org/w/api.php?action=wbgetentities&sites=arwiki&titles=%D8%A7%D9%84%D9%82%D8%AF%D8%B3&languages=en
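If you want to script this over your whole list, here's a minimal Python sketch (the function name is mine; it narrows the response with the props=sitelinks and sitefilter=enwiki parameters):

```python
import requests

API = "https://www.wikidata.org/w/api.php"

def arabic_to_english_title(ar_title):
    """Look up the English Wikipedia title for an Arabic Wikipedia article
    by resolving it through its Wikidata item."""
    params = {
        "action": "wbgetentities",
        "sites": "arwiki",        # resolve the title against ar.wikipedia.org
        "titles": ar_title,
        "props": "sitelinks",     # we only need the interwiki links
        "sitefilter": "enwiki",   # ...and only the English Wikipedia one
        "format": "json",
    }
    r = requests.get(API, params=params, timeout=10)
    r.raise_for_status()
    for qid, entity in r.json().get("entities", {}).items():
        if qid == "-1":           # no such title on arwiki
            continue
        enwiki = entity.get("sitelinks", {}).get("enwiki")
        if enwiki:
            return enwiki["title"]
    return None

print(arabic_to_english_title("القدس"))  # -> Jerusalem
```

wbgetentities also accepts several titles at once separated by | (up to 50 per request), so you can batch your lookups instead of making one request per article.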
I'm pretty new to Custom Translator and I'm working on a fashion-related EN_KO project.
There are many cases where a single English term has two possible Korean translations. For example, if "fastening" relates to bags or backpacks it is 잠금, but if it relates to clothes or shoes it is 여밈.
I'd like to train the machine to recognize these differences. Could it be useful to upload a phrase dictionary? Any ideas? Thanks!
The purpose of training a custom translation system is to teach it how to translate terms in context.
The best way to teach the system how to translate is training with parallel documents of full-sentence prose: the same document in two languages. A translation memory extract in a TMX or XLIFF file is the best material, but many other document formats are suitable as well, as long as you have both languages. Have at least 10,000 sentences in both languages, upload them to http://customtranslator.ai, and build a custom system with them.
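As a quick sanity check on your training material, here is a rough standard-library Python sketch (assuming the usual tu/tuv/seg TMX layout; the file name is hypothetical) that extracts the usable EN-KO sentence pairs from a TMX export:

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # the xml:lang attribute

def tmx_pairs(tmx_path, src="en", tgt="ko"):
    """Yield aligned (source, target) sentence pairs from a TMX file."""
    for tu in ET.parse(tmx_path).iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src in segs and tgt in segs:
            yield segs[src], segs[tgt]

pairs = list(tmx_pairs("fashion_memory.tmx"))
print(len(pairs), "usable EN-KO pairs")  # aim for at least 10,000
```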
If you have documents in Korean that are representative of the terminology and style you want to achieve, but no English match, you can machine-translate them to English and add them to the training material as parallel documents. Be sure not to use the automatically translated documents in the other direction.
A phrase dictionary is of limited help, because it is unaware of context. It is useful only in bootstrapping your custom system or for very rare terms where you cannot find or create a sentence.
I'm trying to get a "category tree" from wikipedia for a project I'm working on. The problem is I only want more common topics and fields of study, so the larger dumps I've been able to find have way too many peripheral articles included.
I recently found the vital articles pages, which seem to be a collection of exactly what I'm looking for. Unfortunately, I don't really know how to extract the information from those pages, or how to filter the larger dumps to include only those categories and articles.
To be explicit, my question is: given a vital-article level (say level 4), how can I extract the tree of categories and article names for a given list (e.g. People, Arts, Physical sciences) into a CSV or similar file that I can then import into another program? I don't need the actual content of the articles, just the names (and ideally a reference to each article so I can get more information later).
I'm also open to suggestions about how to better accomplish this task.
Thanks!
Have you tried PetScan? It's a Wikimedia-based tool that lets you extract data from pages based on conditions.
You can achieve your goal by going to the tool, opening the "Templates&links" tab, and typing the page name into the "Linked from: All of these pages" field, e.g. Wikipedia:Vital_articles/Level/4/History. If you want to query more than one page, enter them in the textarea one per line.
Finally, press the "Do it!" button and the data will be generated; you can then download it from the "Output" tab.
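If you'd rather script this than click through the web UI, the same link list can be pulled with the standard MediaWiki API (prop=links) and written straight to CSV. A minimal sketch; the output file name is my own choice:

```python
import csv
import requests

API = "https://en.wikipedia.org/w/api.php"

def linked_articles(page_title):
    """Yield every main-namespace article linked from the given page,
    following the API's continuation protocol."""
    params = {
        "action": "query",
        "titles": page_title,
        "prop": "links",
        "plnamespace": 0,     # article namespace only
        "pllimit": "max",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params, timeout=10).json()
        for page in data["query"]["pages"].values():
            for link in page.get("links", []):
                yield link["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])  # resume where the last batch ended

with open("vital_level4_history.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "url"])
    for title in linked_articles("Wikipedia:Vital articles/Level/4/History"):
        writer.writerow([title, "https://en.wikipedia.org/wiki/" + title.replace(" ", "_")])
```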
I am trying to set up a multilingual encyclopedia (4 languages), where I can have both:
Articles that are translations of other languages, and
Articles that are in a specific language only.
As the wiki grows, I understand that the content of each language can vary.
However, I want to be able to work as fluently as possible between languages.
I checked this article, dating back to 2012, which has a comment from Tgr that basically condemns both solutions.
I also checked this MediaWiki help article, but it gives no explanation of the differences between the two systems.
My questions are:
1- What is the preferred option now for a multilingual wiki environment that gives the most capabilities and the best user experience, given that some of the languages I want are right-to-left and some are left-to-right?
I want internationalized category names, I need to link categories to their corresponding translations, and I want users to see the interface in the language the article is written in.
Basically, it should work as if I had four encyclopedias, with the articles linked to their corresponding translations.
2- Which system would give me a main page per language, so that English readers see an English homepage, French readers see a French homepage, etc.?
EDIT:
I have a dedicated server, so the limitation of shared hosting is not there.
Thank you very much.
The Translate extension is meant for maintaining identical translations and tracking up-to-date status while other solutions (interwiki links, Wikibase, homegrown language templates) typically just link equivalent pages together. Translate is useful for things like documentation, but comes with lots of drawbacks (for example, WYSIWYG editing becomes pretty much impossible and even source editing requires very arcane syntax). It's best used for content which is created once and then almost never changes.
You cannot get internationalized category names in a single wiki as far as I know. (Maybe if you wait a year or so... there is ongoing work to fix that, by more powerful Wikibase integration.) Large multi-language wikis like Wikimedia Commons just do that manually (create a separate category page for each category in each language).
I'm starting to develop a game (AS3), and in one step the participants have to type a word in one of 5 available languages; that word is then translated into the other 4.
For the sake of example:
I choose the word "home" in English, and then these fields are filled:
Spanish: casa
Russian: домой
German: Zuhause
French: maison
So the question is: what would be the best approach? Are there any downloadable dictionaries available for different languages, or would it be better to pull from a web service?
Something else to consider is that the translations shouldn't consist of more than one word.
I've never worked with dictionaries before, so I'd rather investigate a bit instead of starting off on the wrong foot. Thanks.
You should use property files (one key-value resource file per language). This is the best approach for a multilingual application.
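The game itself is AS3, but the idea is language-agnostic; here is a rough Python sketch (the file names and keys are made up) with one key-value file per language, looked up by a shared key:

```python
def load_properties(path):
    """Parse a simple Java-style .properties file into a dict."""
    props = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(("#", "!")):  # skip blanks and comments
                continue
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

# One file per language, e.g. words_es.properties containing the line "home=casa"
translations = {lang: load_properties(f"words_{lang}.properties")
                for lang in ("es", "ru", "de", "fr")}

print(translations["es"].get("home"))  # -> casa
```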
I'm building an online English-to-Malayalam and Malayalam-to-English dictionary.
Here is the link http://www.vanmaram.com/
That website offers the user the option to add words.
I would like to add some words.
If anyone knows where to get lists of English words, could you please point me in the right direction? It would be very helpful.
There are many open dictionary files you can find on the internet; I would recommend using the ones from OpenOffice or something similar. They also have a Malayalam one.
OpenOffice also refers to a more up-to-date word list from aspell.net.
You can also check out this MySQL-formatted English word list. I don't know how many words are missing, but it's a huge list anyway :) There are about 128-129K words in it.
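If you go with the OpenOffice/Hunspell dictionaries, the .dic files are nearly plain word lists. Here's a small Python sketch, assuming the usual Hunspell layout (an approximate entry count on the first line, then one word per line with optional /affix flags); the file name is just an example:

```python
def read_dic_words(path):
    """Read a Hunspell-style .dic file and return the bare words."""
    words = []
    with open(path, encoding="utf-8") as f:
        next(f)  # the first line is just an approximate entry count
        for line in f:
            word = line.split("/", 1)[0].strip()  # drop affix flags like "word/MS"
            if word:
                words.append(word)
    return words

words = read_dic_words("en_US.dic")  # or the Malayalam dictionary's .dic file
print(len(words), words[:5])
```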