This is something I have been trying hard to get working. I have tried a bunch of options, including the one found here: Extracting gettext strings from Javascript and HTML files (templates). No go.
This is the sample HTML:
<h1 data-bind="text: _loc('translate this')"></h1>
The command I have tried (with the PHP and Glade language parsers):
xgettext -LPHP --force-po -o E:\Samples\poEdit\translated.po --from-code=utf-8 -k_loc E:\Samples\poEdit\html\samplePO.html
Glade seems to look only inside tags and completely skips the keyword. Has anyone solved this problem?
We eventually ended up writing a small .NET application to parse the HTML and create a JSON representation, then used Python with xgettext to create the .po file from the JavaScript.
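For anyone stuck on the same thing, here is a minimal sketch of that idea in plain Python, skipping the .NET step. It assumes the keyword is always _loc with a plain string literal: it regex-scans the HTML and writes a throwaway .js stub that xgettext can read with -L JavaScript -k_loc (recent xgettext versions list JavaScript among the supported languages; file names here are just placeholders):

# Minimal sketch: pull _loc('...') literals out of HTML and write a .js stub
# that xgettext understands. File names and the _loc keyword are assumptions.
import re
from pathlib import Path

# Matches _loc('...') or _loc("..."), including inside data-bind attributes.
LOC_CALL = re.compile(r"""_loc\(\s*(['"])(.*?)\1\s*\)""")

def extract_loc_strings(html_path):
    """Return every string literal passed to _loc() in the given HTML file."""
    text = Path(html_path).read_text(encoding="utf-8")
    return [m.group(2) for m in LOC_CALL.finditer(text)]

def write_js_stub(strings, stub_path):
    """Write the extracted strings back out as _loc() calls in a plain .js file."""
    lines = ["_loc('%s');" % s.replace("'", "\\'") for s in strings]
    Path(stub_path).write_text("\n".join(lines) + "\n", encoding="utf-8")

if __name__ == "__main__":
    write_js_stub(extract_loc_strings("samplePO.html"), "samplePO.stub.js")
    # Then: xgettext -L JavaScript -k_loc --from-code=utf-8 -o translated.po samplePO.stub.js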
I am using http://i18next.com/ to translate a static website hosted on GitHub gh-pages.
Are there any tools I can use to first extract strings from, say, the index.html file and create an index.po file that a translator can localize/internationalize, and then use a tool like http://pypi.python.org/pypi/pojson to convert this .po file to JSON to be used by i18next?
Poedit will probably do the trick. In PHP, the Gettext extension (http://es.php.net/gettext) will help. I also found a link that details the process of creating a .po file: http://www.wdmac.com/how-to-create-a-po-language-translation.
You can use po4a to create .po files from HTML and from many other formats.
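For the second half of the pipeline, if pojson does not fit, here is a rough sketch of the .po-to-JSON step using the polib library instead (polib is assumed to be installed, and the flat msgid/msgstr layout below may need reshaping into whatever resource format your i18next setup expects):

# Rough sketch: dump msgid -> msgstr pairs from a .po file as flat JSON.
# The output layout is an assumption and may need adapting for i18next.
import json
import polib

def po_to_json(po_path, json_path):
    """Write translated entries from po_path to json_path as one flat object."""
    po = polib.pofile(po_path)
    catalog = {entry.msgid: entry.msgstr for entry in po if entry.msgstr}
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(catalog, fh, ensure_ascii=False, indent=2)

po_to_json("index.po", "translation.json")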
On an Ubuntu platform, I installed the nice little Perl package
libtext-mediawikiformat-perl - Convert Mediawiki markup into other text formats
which is available on CPAN. I'm not familiar with Perl and have no idea how to go about using this library to write a Perl script that converts a MediaWiki file to an HTML file. For example, I'd like to have a script I can run such as
./my_convert_script input.wiki > output.html
(perhaps also specifying the base URL, etc.), but I have no idea where to start. Any suggestions?
I believe @amon is correct that the Perl library I reference in the question is not the right tool for the task I proposed.
I ended up using the MediaWiki API with action="parse" to convert to HTML using the MediaWiki engine itself, which turned out to be much more reliable than any of the alternative parsers I tried from that list. (I then used pandoc to convert my HTML to Markdown.) The MediaWiki API handles extraction of categories and other metadata too, and I just had to prepend the base URL to internal image and page links.
Given the page title and base URL, I ended up writing this as an R function.
wiki_parse <- function(page, baseurl, format = "json", ...) {
  require(httr)
  action <- "parse"
  # Build the api.php URL with the format, action, and page parameters
  addr <- paste(baseurl, "/api.php?format=", format, "&action=", action,
                "&page=", page, sep = "")
  # Send a User-Agent header plus any extra config the caller passes in
  config <- c(add_headers("User-Agent" = "rwiki"), ...)
  out <- GET(addr, config = config)
  # parsed_content() is from older httr releases; newer versions use content(out, "parsed")
  parsed_content(out)
}
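For anyone not working in R, here is a roughly equivalent sketch with the Python requests library (untested against every wiki; the User-Agent string and return handling simply mirror the R call above):

# Ask the MediaWiki API to render a page to HTML via action=parse.
import requests

def wiki_parse(page, baseurl, fmt="json"):
    response = requests.get(
        baseurl + "/api.php",
        params={"format": fmt, "action": "parse", "page": page},
        headers={"User-Agent": "rwiki"},
    )
    response.raise_for_status()
    return response.json()

# The rendered HTML should sit under parse -> text -> "*" in the JSON reply, e.g.
# wiki_parse("Main_Page", "https://www.mediawiki.org/w")["parse"]["text"]["*"]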
The Perl library Text::MediawikiFormat isn't really intended for stand-alone use but rather as a formatting engine inside a larger application.
The documentation on CPAN does show how to use this library, and it notes that other modules might provide better support for one-off conversions.
You could try this (untested) one-liner:
perl -MText::MediawikiFormat -e'$/=undef; print Text::MediawikiFormat::format(<>)' input.wiki >output.html
although that defeats the whole point (and the customization abilities) of this module.
I am sure someone has already come up with a better way to convert single MediaWiki files, so here is a list of alternative MediaWiki processors on the MediaWiki site. This SO question could also be of help.
Other markup languages, such as Markdown, provide better support for single-file conversions. Markdown is especially well suited for technical documents and mirrors email conventions. (Also, it is used on this site.)
The libfoo-bar-perl packages in the Ubuntu repositories are precompiled Perl modules. Usually, these would be installed via cpan or cpanm. While some of these libraries do include scripts, most don't, and aren't meant as stand-alone applications.
How can I convert an RTF file into HTML format? I have a text editor which saves files in RTF format, but I need to put the contents on my server. For that I need to convert the RTF file into HTML. I am unable to find any help with regard to Objective-C. Thanks.
I don't know of any library in Objective-C that does RTF to HTML conversion.
However, if you are able to perform the conversion server-side, that opens up a lot more possibilities, such as PHP and C# libraries, as well as GNU utilities.
For example (and you can Google for many more):
PHP: Pa Software's RTF to HTML converter (paid product), Martin Mevald's rtf2htm (old GPL'ed software), Marcus Fischer's RTF Parse class (GPL'ed code), Zend's LiveDocX, or even the alternatives to LiveDocX on SO (alternative-for-phplivedocx), etc.
C#: on SO, convert-rtf-to-html and simple-convert-rtf-to-html, or from MS, Converting-between-RTF-and-HTML.
GNU: GNU's UnRTF utility.
If you really want Objective-C, then the source code of the PHP and GNU solutions could be translated, but that will not be a trivial task. As such, I still think your best bet is to do it server-side.
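To give a feel for how small the server-side step can be, here is a sketch that shells out to GNU UnRTF from Python and captures the HTML it prints to stdout (it assumes unrtf is installed and on the PATH, and the file names are placeholders):

# Convert an RTF file to an HTML string by running the unrtf utility.
import subprocess

def rtf_to_html(rtf_path):
    result = subprocess.run(
        ["unrtf", "--html", rtf_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

with open("document.html", "w", encoding="utf-8") as fh:
    fh.write(rtf_to_html("document.rtf"))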
Is there a Ruby gem that can parse a LaTeX-formatted string into an HTML string and a PDF binary string, including a BibTeX bibliography?
I'm using Textile (RedCloth) right now in my Rails app to get formatted HTML, but I'd like to use LaTeX instead. I would also like to use a *.bib file for references. And with LaTeX it should also be easy to build a PDF file, in order to provide a PDF version of the same article (nice to have)...
I could also do it with a system call to e.g. TeX Live, but then I'd have to save the user input to a file, manage those files, and put the result back into the database, and all of that would take some time. I don't like this approach...
Is there a nice way to do it?
You could try runtex, although I do not know if it will do exactly what you want; I have not tested it.
What would be a good tool, or set of tools, to download a list of URLs and extract only the text content?
Spidering is not required, but control over the downloaded file names and threading would be a bonus.
The platform is Linux.
wget | html2ascii
Note: html2ascii can also be called html2a or html2text (and I wasn't able to find a proper man page on the net for it).
See also: lynx.
Python's Beautiful Soup allows you to build a nice extractor.
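A minimal sketch of that approach, with a small thread pool and explicit output file names thrown in for the bonus points (the URL list and naming scheme here are just placeholders):

# Download each URL in a thread and keep only the visible text.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

from bs4 import BeautifulSoup

def fetch_text(url, out_path):
    html = urlopen(url).read()
    soup = BeautifulSoup(html, "html.parser")
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(soup.get_text(separator="\n", strip=True))

urls = ["http://example.com/", "http://example.org/"]
with ThreadPoolExecutor(max_workers=4) as pool:
    for i, url in enumerate(urls):
        pool.submit(fetch_text, url, "page_%02d.txt" % i)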
I know that w3m can be used to render an HTML document and put the text content in a text file:
w3m www.google.com > file.txt for example.
For the remainder, I'm sure that wget can be used.
Look for the Simple HTML DOM parser for PHP on SourceForge. Use it to parse HTML that you have downloaded with cURL. Each DOM element will have a "plaintext" attribute which should give you only the text. I used this combination very successfully in a lot of applications for quite some time.
Perl (Practical Extraction and Report Language) is a scripting language that is excellent for this type of work. http://search.cpan.org/ contains a lot of modules that have the required functionality.
Use wget to download the required HTML and then run html2text on the output files.