How to get the title of a Wikipedia entity in another language through MediaWiki

I have a list of English Wikipedia titles (Wikipedia items) and want to get their Chinese titles. Is there any Python method through MediaWiki that can do this?

If you are talking about article names (usually the H1 in the HTML, or extractable from the URL), then go to PetScan and use the following settings:
Set the interface language to English (next to the globe at the top)
Categories - Wikidata
Other sources - Manual list - paste your English Wikipedia titles into the column (the first letter of each title should be capitalized; no trailing punctuation needed)
Other sources - Wiki - enwiki
Other sources - Use wiki - (the last option) - zhwiki (if you need a particular variant of Chinese, find its language code and replace the "zh")
Click on Do it! - it should generate the list below
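Since the question asks for a Python route: the MediaWiki API also exposes interlanguage links directly via prop=langlinks, so PetScan isn't strictly required. A minimal sketch, assuming the third-party `requests` package is installed (the helper names and example titles are my own, not from the original answer):

```python
# Query the English Wikipedia API for the Chinese (zhwiki) interlanguage
# link of each title. Assumes the `requests` package is installed.
import requests

API = "https://en.wikipedia.org/w/api.php"

def parse_langlinks(data):
    """Map each page title in an API response to its langlink title, or None."""
    out = {}
    for page in data.get("query", {}).get("pages", {}).values():
        links = page.get("langlinks", [])
        out[page["title"]] = links[0]["*"] if links else None
    return out

def chinese_titles(titles):
    """Look up zhwiki titles for up to 50 English Wikipedia titles at once."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": "|".join(titles),  # the API accepts up to 50 titles per call
        "lllang": "zh",              # restrict to the Chinese interlanguage link
        "lllimit": "max",
        "format": "json",
    }
    return parse_langlinks(requests.get(API, params=params, timeout=30).json())
```

Calling `chinese_titles(["London"])` would return a dict mapping each English title to its Chinese title, with None for pages that have no zhwiki link.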

MediaWiki API: search for pages in specific namespace, containing a substring (NOT a prefix)

I want to scrape pages from a list of Wikipedia categories for which there isn't a 'mother category'. In this case, dishes: I want to get a list of all of the categories like Category:Vegetable dishes and Category:Italian dishes, then scrape and tag the pages in them. I know how to list the pages in a known category, but there are hundreds of categories containing the substring "dishes", and it feels like it should be easy to list them.
However, the MediaWiki allcategories API seems to only allow search by prefix (i.e. from and to parameters), and while the old opensearch documentation mentions search by substring, this is no longer supported (see the updated API docs; it also doesn't work if I try it).
This is very doable in the Wikipedia browser UI, to the point where I think it might be quicker to just scrape search results, but I wonder if I'm missing something?
Thanks to Tgr for pointing out that I'd missed the regular search API, which allows a text search, a specified namespace, and so on.
The correct query for my instance is:
curl "https://en.wikipedia.org/w/api.php?action=query&list=search&srnamespace=14&srsearch=Dishes&format=json"
thanks!
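For completeness, the same query can be issued from Python, following the API's continuation protocol to page through all matches. A sketch assuming the `requests` package; the helper names are illustrative, not part of the original answer:

```python
# Search the Category namespace (14) for pages matching a substring,
# following `continue` tokens to page through all results.
# Assumes the `requests` package is installed.
import requests

API = "https://en.wikipedia.org/w/api.php"

def extract_titles(data):
    """Pull the page titles out of one list=search API response."""
    return [hit["title"] for hit in data.get("query", {}).get("search", [])]

def search_categories(substring):
    """Yield every category title matched by a full-text search."""
    params = {
        "action": "query",
        "list": "search",
        "srnamespace": 14,     # 14 is the Category namespace
        "srsearch": substring,
        "srlimit": "max",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params, timeout=30).json()
        yield from extract_titles(data)
        if "continue" not in data:
            break
        params.update(data["continue"])  # carries sroffset for the next page
```

`list(search_categories("Dishes"))` would then collect every matching category title across all result pages, not just the first batch.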

DNN database search and replace tool

I have a DNN (9.3.x) website with CKEditor, 2sxc etc installed.
Now old URLs need to be changed to new URLs because the domain name changed. Does anyone know a tool for searching and replacing URLs in a DNN database?
I tried the "DNN Search and Replace Tool" by Evotiva, but it only goes through native DNN database tables, leaving the tables of 2sxc and other plugins/modules untouched.
Besides that, 2sxc stores data in JSON format in its database tables, which also contains old URLs.
I'm pretty sure that the Evotiva tool can be configured to search and replace in ANY table in the DNN database.
"Easy configuration of the search targets (table/column pairs. Just point and click to add/remove items. The 'Available Targets' can be sorted, filtered, and by default all 'textual' columns of 250 characters or more are included as possible targets."
It's still a text search.
As a comment, you should try to use relative URLs and let DNN handle the domain name part.
I believe the Engage F3 module will search Text/HTML modules for replacement strings; it's open source, so you could potentially extend it to inspect additional tables.

Custom Translator - Documents are not available after uploading

After I create a project and I upload successfully a TMX in Documents (I can see it in Upload history and in Documents), when I open the project again there are no Documents available and I can't create the model.
I've seen that in the Quickstart examples there are checkboxes next to the name of the projects and of the documents but in my case they are not available.
Moreover, my TMX source language is EN-GB, and the language specification appears in "Language(s)" (e.g. English (United Kingdom) - Italian); whereas when I select the language pair in "Upload files" I can select only English, without United Kingdom or US. Do you think this could be the problem?
Does anyone have any ideas/solutions?
Thanks!
Culture is not yet supported, so your guess is right: "EN-GB" is the problem. Changing it to "EN-US" should fix it.
-Mohamed

In the google drive search api, how to group words into a phrase?

I'm using the Google Drive search api with the Files.list - search for files.
I have a query like : fullText contains 'battle of hastings'.
I'm getting results that seem to suggest it searches for the individual words rather than the phrase as a whole. I'm not completely sure, though; I'm relating the API's functionality to what can be done in a Google search on the website, so please correct me if that comparison is off.
Anyway, I really only want results for the whole phrase - ie like surrounding a phrase in Google's Search web site with double quotes. For example, if you use Google's search web site to search for "no one will have written this before", then it says 'No results found for "no one will have written this before".', but if you don't use double quotes, then you get all sort of stuff.
To summarise:
Does the query API search for the individual words and return only files containing all of them, even if they don't appear together as a phrase or in that order?
Is there a way to make it treat the words as a single phrase?
You can test this by using the "Try it" section of Files.list and consulting the search parameter documentation:
fullText - Full text of the file including title, description, and content.
contains - The content of one string is present in the other.
I tested using this
fullText contains 'daimto twitter'
It returned all of the files that contain that exact match.
By using the "Try it" facility, I found that the behaviour is similar to the search UI in Google Drive: you need to surround a phrase that should be treated as one unit with double quotes. The quotes are percent-encoded in the URL like this:
https://www.googleapis.com/drive/v2/files?maxResults=100&
q=fullText+contains+'%22flippity+floppity%22'
I'm not sure if the spaces need to be encoded like that, but I tried to emulate it as much as possible.
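To avoid hand-encoding, the quoting can be done programmatically. A small sketch in Python using only the standard library (the Drive call itself needs OAuth credentials, so only the query string is built here; the `phrase_query` helper is my own name):

```python
# Build a Drive search query that treats several words as one phrase by
# wrapping them in double quotes inside the q parameter, then URL-encode it.
from urllib.parse import urlencode

def phrase_query(phrase):
    """Return a q expression matching the words as a single phrase."""
    return "fullText contains '\"%s\"'" % phrase

params = urlencode({"maxResults": 100, "q": phrase_query("flippity floppity")})
print(params)
# → maxResults=100&q=fullText+contains+%27%22flippity+floppity%22%27
```

The inner double quotes come out as %22 and the spaces as +, matching the URL shown above, so spaces encoded that way do work.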

Trouble Getting a Locally Hosted Copy of the English Language Wiktionary to include the Translations Sections

I used MWDumper (http://www.mediawiki.org/wiki/Mwdumper) to import the XML dump of the English-language Wiktionary (specifically the file named enwiktionary-20120930-pages-meta-current.xml) into my local server.
I have found that under the Translations section (on the page for each English word), next to the name of each language, where I should be able to see the translation into that language, I instead see Template:Tø, Template:T+, or Template:T-, and I am not sure why this is.
As an experiment, I also used WikiTaxi (http://www.yunqa.de/delphi/doku.php/products/wikitaxi/index) with the exact same XML dump and did not have this problem when viewing pages in WikiTaxi.exe.
I have been searching through mediawiki.org looking for the answer, but have so far not been successful.
Okay, I found out that MWDumper did the right thing importing the XML dump; all the translations are there. I just had to click on the Template:T+, Template:T-, and Template:Tø links and create each template according to the instructions at http://www.mediawiki.org/wiki/Templates.