Generate HTML and PDF - html

I'm thinking about how to generate a university newspaper in both PDF and HTML (a website), where every news item would contain one or more pictures. I wonder if there are any tools that approach this problem declaratively, so that inexperienced users could prepare structured data (text + pictures) and get the PDF and the website as output on their own, with no programmers' intervention. I suspect it could be some sort of XSL-FO or XML editing/processing software.
P.S. A free tool would be the best solution.
Thank you.

For this, a very good approach would be to use DocBook to write your articles, then let the tools generate the HTML and PDFs you need, with just some tuning of the look and feel of the output on your side.
For DocBook there are many available tools, but a very good one that is free for open source projects and academics is XMLMind.
If your articles are more technically oriented, then DocBook is the quasi-standard (even many publishing houses like O'Reilly use it).
Of course, in the "pure" academic domain, LaTeX is quite the standard (and allows output in a lot of formats), but it takes quite a lot of learning, and there are no true WYSIWYG tools for writing the articles. If you intend to submit the articles to research journals too, then they are very glad to accept LaTeX input.

We (Swansea University) used a content management system to achieve this, DotNetNuke in our instance.
We wanted a multipage newsletter, with a summary of each article in the newsletter and a "click for more" link to the fuller article. The content management system allowed normal users to construct the newsletter themselves; they simply created a new child site every month.
We had the newsletter emailed out: we simply grabbed the HTML from the main page and sent it to a distribution list.
Something worth considering - cost = £0

The obvious way of doing it would be to generate all your content in XML, then use either a commercial XSL-FO processor or Apache FOP to generate the PDF, and whatever your XSLT processor is to generate the HTML.
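As a rough illustration of the single-source idea, the sketch below keeps one article in XML and renders the HTML side with Python's standard library. It only stands in for an XSLT stylesheet, and the PDF side (XSL-FO through Apache FOP) is not shown. The element names (`article`, `title`, `para`, `image`) are invented for this example and are not DocBook or any other standard vocabulary.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical article format; the element names are assumptions.
ARTICLE_XML = """
<article>
  <title>Library opens late &amp; early</title>
  <image src="library.jpg" alt="The main library"/>
  <para>The library now opens at 7 am.</para>
  <para>Closing time moves to midnight.</para>
</article>
"""

def article_to_html(xml_text):
    """Render one <article> element as an HTML fragment."""
    root = ET.fromstring(xml_text)
    parts = ["<article>"]
    for child in root:
        if child.tag == "title":
            parts.append(f"<h1>{escape(child.text or '')}</h1>")
        elif child.tag == "image":
            src = escape(child.get("src", ""), quote=True)
            alt = escape(child.get("alt", ""), quote=True)
            parts.append(f'<img src="{src}" alt="{alt}">')
        elif child.tag == "para":
            parts.append(f"<p>{escape(child.text or '')}</p>")
    parts.append("</article>")
    return "\n".join(parts)

if __name__ == "__main__":
    print(article_to_html(ARTICLE_XML))
```

The point of the design is that editors only ever touch the XML; every output format is derived from it by a separate, reusable transformation.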

Very similar to Simon Thompson's answer:
You can use Drupal and its print module.

Multilingual MediaWiki installation using a Wiki Family vs. a single multilingual MediaWiki extension

I am trying to set up a multilingual encyclopedia (4 languages), where I can have both:
Articles that are translations of other languages, and
Articles that are in a specific language only.
As the wiki grows, I understand that the content of each language can vary.
However, I want to be able to work as fluently as possible between languages.
I checked this article, dating back to 2012, which has a comment from Tgr that basically condemns both solutions.
I also checked this MediaWiki Help article, but it gives no explanation of the differences between the two systems.
My questions are:
1- What is the preferred option now for a multilingual wiki environment that gives the most capabilities and the best user experience, given that some of the languages I want are right-to-left and some are left-to-right?
I want internationalized category names, I need to link the categories to their corresponding translations, and I want users to see the interface in the language that the article is written in.
So basically it should be as if I had 4 encyclopedias, but with the articles linked to their corresponding translations.
2- Which system would give me a main page per language, so that English readers would see an English homepage, French readers a French homepage, etc.?
EDIT:
I have a dedicated server, so the limitation of shared hosting is not there.
Thank you very much.
The Translate extension is meant for maintaining identical translations and tracking up-to-date status while other solutions (interwiki links, Wikibase, homegrown language templates) typically just link equivalent pages together. Translate is useful for things like documentation, but comes with lots of drawbacks (for example, WYSIWYG editing becomes pretty much impossible and even source editing requires very arcane syntax). It's best used for content which is created once and then almost never changes.
You cannot get internationalized category names in a single wiki as far as I know. (Maybe if you wait a year or so... there is ongoing work to fix that, by more powerful Wikibase integration.) Large multi-language wikis like Wikimedia Commons just do that manually (create a separate category page for each category in each language).

GEDCOM to HTML and RDF

I was wondering if anyone knew of an application that would take a GEDCOM genealogy file and convert it to HTML format for viewing and publishing on the web. I'd like to have separate html files for each individual and perhaps additional files for other content as well. I know there are some tools out there but I was wondering if anyone used any tools and could advise on this. I'm not sure what format to look for such applications. They could be Python or php files that one can edit, or even JavaScript (maybe) or just executable files.
The next issue might be appropriate for a topic in itself: export of GEDCOM to RDF. My interest here would be to align the information with specific vocabularies, such as BIO or REL, both of which extend FOAF.
Thanks,
Bruce
Like Rob Kam said, Ged2Html was the most popular such program for a long time.
GRAMPS can also create static HTML sites and has the advantage of being free software and having a native XML format which you could easily modify to fit your needs.
Several years ago, I created a simple Java program to turn GEDCOM into XML. I then used XSLT to generate HTML and RDF. The HTML I generate is pretty rudimentary, so it would probably be better to look elsewhere for that, but the RDF might be useful to you:
http://jay.askren.net/Projects/SemWeb/
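For anyone who would rather script the "one HTML file per individual" part than use a packaged tool, here is a minimal Python sketch. It relies only on the basic GEDCOM line shape (`LEVEL [@XREF@] TAG [VALUE]`) and reads just `NAME` and the birth `DATE`; the function names, the sample data, and the page layout are made up for illustration, and it returns the pages as strings instead of writing files. It does not cover the RDF side.

```python
from html import escape

SAMPLE_GEDCOM = """\
0 HEAD
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1900
0 @I2@ INDI
1 NAME Mary /Jones/
0 TRLR
"""

def parse_individuals(gedcom_text):
    """Return {xref: {"name": ..., "birth": ...}} for each INDI record."""
    people = {}
    current = None      # record being filled, or None outside INDI records
    parent_tag = None   # most recent level-1 tag, e.g. BIRT
    for raw in gedcom_text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        level = int(parts[0])
        if level == 0:
            # Individual records start with a line like: 0 @I1@ INDI
            if len(parts) == 3 and parts[2] == "INDI":
                current = people.setdefault(parts[1], {"name": "", "birth": ""})
            else:
                current = None
            parent_tag = None
            continue
        if current is None:
            continue
        tag = parts[1]
        value = parts[2] if len(parts) == 3 else ""
        if level == 1:
            parent_tag = tag
            if tag == "NAME":
                # Surnames are wrapped in slashes: John /Smith/
                current["name"] = value.replace("/", "").strip()
        elif level == 2 and parent_tag == "BIRT" and tag == "DATE":
            current["birth"] = value
    return people

def individual_pages(people):
    """Return {filename: html} with one page per individual."""
    pages = {}
    for xref, person in people.items():
        filename = xref.strip("@") + ".html"
        name = escape(person["name"] or "Unknown")
        birth = escape(person["birth"] or "unknown")
        pages[filename] = (
            f"<html><head><title>{name}</title></head>"
            f"<body><h1>{name}</h1><p>Born: {birth}</p></body></html>"
        )
    return pages

if __name__ == "__main__":
    for filename, page in individual_pages(parse_individuals(SAMPLE_GEDCOM)).items():
        print(filename, page)
```

Real GEDCOM files also carry families, continuation lines, and character-set declarations, so this is only a starting point.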
There are a number of these. All listed at http://www.cyndislist.com/gedcom/gedcom-to-web-page-conversion/
Ged2html used to be the most popular and most versatile, but is now no longer being developed. It's an executable, with output customisable through its own scripting syntax.
Family Historian http://www.family-historian.co.uk will create exactly what you are looking for, e.g. one file per person, using the built-in Web Site creator, as will a couple of the other major genealogy packages. I have not seen anything for the RDF part of your question.
I have since tried to produce a genealogy application using Semantic MediaWiki. MediaWiki is the software behind Wikipedia, and Semantic MediaWiki adds various extensions related to the Semantic Web. I thought it was very easy to use, with the forms and the ability to upload a GEDCOM, but some feedback from people into genealogy said that it appeared too technical and didn't seem to offer anything new.
So, now the issue is whether to stay with MediaWiki and make it more user friendly or create an entirely new application that allows for adding and updating data in a triple store as well as displaying. I'm not sure how to generate a family tree graphical view of the data, like on sites like ancestry.com, where one can click on a box to see details about the person and update that info or one could click on a right or left arrow around a box to navigate the tree. The data comes from SPARQL queries sent to the data set/triple store both when displaying the initial view and when navigating the tree, where an Ajax call is needed to get more data.
Bruce

How can I get started on programmatically analyzing web site content?

I've been looking for a new hobby programming project, and I think it would be interesting to dabble in ways to programmatically gather information from websites and then analyze that data to do things like aggregate or filter it. For example, I might want to write an application that takes Craigslist listings and displays only the ones matching a specific city, not just a geographical area. That's just a simple example, but you could go as advanced and sophisticated as how Google analyzes a site's content to know how to rank it.
I know next to nothing about that subject and I think it would be fun to learn more about it, or hopefully do a very modest programming project in that topic. My problem is, I know so little that I don't even know how to find more information about the subject.
What are these types of programs called? What are some useful keywords to use when searching on Google? Where can I get some introductory reading material? Are there interesting papers I should read?
All I need is someone to disabuse me of my ignorance, so that I can do some research on my own.
cURL (http://en.wikipedia.org/wiki/CURL) is a good tool to fetch a website's contents and hand it off to a processor.
If you are proficient with a particular language, see if it supports cURL. If not, PHP (php.net) may be a good place to start.
When you have retrieved a website's content via cURL, you can use the language's text processing functionality to parse the data. You can use regular expressions (http://www.regular-expressions.info/) or functions such as PHP's strstr() to find and extract the particular data you seek.
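The same fetch-then-extract pattern can be sketched in Python with only the standard library. The listing markup, the regular expression, and the function names below are invented for illustration; a real site will use different HTML, and for anything beyond simple patterns a proper HTML parser is usually more robust than regular expressions.

```python
import re
from urllib.request import urlopen

def fetch(url):
    """Download a page and return its body as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

# Hypothetical listing markup; a real site will differ.
LISTING_RE = re.compile(
    r'<li class="listing">\s*<a href="(?P<url>[^"]+)">(?P<title>[^<]+)</a>'
    r'\s*<span class="city">(?P<city>[^<]+)</span>',
    re.IGNORECASE,
)

def listings_in_city(html, city):
    """Return (title, url) pairs whose city matches, ignoring case."""
    wanted = city.strip().lower()
    return [
        (m.group("title").strip(), m.group("url"))
        for m in LISTING_RE.finditer(html)
        if m.group("city").strip().lower() == wanted
    ]

# Stand-in for the text that fetch() would return from a real page.
SAMPLE_HTML = """
<ul>
  <li class="listing"><a href="/item/1">Road bike</a> <span class="city">Oakland</span></li>
  <li class="listing"><a href="/item/2">Desk lamp</a> <span class="city">Berkeley</span></li>
  <li class="listing"><a href="/item/3">Bookshelf</a> <span class="city">oakland</span></li>
</ul>
"""

if __name__ == "__main__":
    print(listings_in_city(SAMPLE_HTML, "Oakland"))
```

With a live page, the call would look like `listings_in_city(fetch(some_url), "Oakland")`, subject to the site's terms of use.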
Programs that "scan" other sites are usually called web crawlers or spiders.
I recently completed a project that uses a Google Search Appliance (GSA), which basically crawls the whole .com domain of the web server.
The GSA is a very powerful tool that pretty much indexes all the URLs it encounters and serves the results.
http://code.google.com/apis/searchappliance/documentation/60/xml_reference.html

MySQL-based wiki that is suitable for custom applications?

I develop an online, Flash-based multiplayer game. It is a complex game, and requires a lot of documentation to fully explain it to our users. Ideally, I would like to find MySQL-based wiki software that can provide these editable documentation pages outside of Flash (in the HTML realm) but also within Flash for convenience, and so that players can refer to the information without interrupting their game or having to switch back-and-forth between browser tabs. I am expecting that I would need to do a lot of the work on the Flash side myself, as far as formatting, for example, but I would like to feel comfortable in querying the wiki's database to get info directly. I guess this means that I need a wiki that is structured relatively "flat" or intuitively so that I can do things like:
Run a MySQL query that returns a list of all the articles (their titles and IDs) in the wiki
For each article ID in the wiki, return the associated content
This may mean that I have to limit the kinds of formatting I put into the wiki -- things like tables would probably be omitted since they would be very difficult, if not impossible, for me to do on the Flash side. And that is fine!
Basically I am just looking for suggestions for wiki software that is pretty easy to use, but mostly is technically simple enough on the back-end that interfacing with it directly via MySQL is not difficult. When interfacing with the database directly, I only need to READ data. Any time the wiki would be edited or added to would be done via the wiki's actual front-end application.
Thanks for any suggestions!
MediaWiki is the best-known and best-supported MySQL-based Wiki, used for plenty of complex game documentation projects like MinecraftWiki. The database is not all that simple, but it's well documented and basic read operations aren't too hard. For example, here's how to fetch the current content of the page "MyPage":
SELECT old_text FROM page
  JOIN revision ON page.page_latest = revision.rev_id
  JOIN text ON revision.rev_text_id = text.old_id
  WHERE page.page_title = 'MyPage' AND page.page_namespace = 0;
(And yes, old_text is the current content of the page. Don't ask me why!)
Your main problem will be figuring out how to parse MediaWiki markup. There are plenty of parsers for it, but I'm not aware of anything that would work in Flash.
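The two read-only lookups from the question (list all articles, then fetch the content for one article ID) can be tried out without a MediaWiki install. The sketch below uses an in-memory SQLite database with a cut-down imitation of the older `page` / `revision` / `text` layout. This is an assumption-heavy stand-in: the real tables have many more columns, newer MediaWiki versions store revision text differently, and a live site would be queried through a MySQL client library instead of `sqlite3`. The helper names are hypothetical.

```python
import sqlite3

def make_demo_db():
    """Build a tiny in-memory imitation of the three tables involved."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT, page_latest INTEGER);
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_text_id INTEGER);
        CREATE TABLE "text" (old_id INTEGER PRIMARY KEY, old_text TEXT);
        INSERT INTO "text" VALUES (1, 'First draft'), (2, 'Current rules'), (3, 'Unit stats');
        INSERT INTO revision VALUES (10, 1, 1), (11, 1, 2), (12, 2, 3);
        INSERT INTO page VALUES (1, 'Rules', 11), (2, 'Units', 12);
    """)
    return conn

def list_articles(conn):
    """Return (page_id, page_title) for every page."""
    return conn.execute(
        "SELECT page_id, page_title FROM page ORDER BY page_id"
    ).fetchall()

def current_text(conn, page_id):
    """Return the newest revision's text for a page, or None if absent."""
    # Joining through page_latest picks only the newest revision,
    # not every historical revision of the page.
    row = conn.execute(
        'SELECT t.old_text FROM page AS p '
        'JOIN revision AS r ON r.rev_id = p.page_latest '
        'JOIN "text" AS t ON t.old_id = r.rev_text_id '
        'WHERE p.page_id = ?',
        (page_id,),
    ).fetchone()
    return row[0] if row else None

if __name__ == "__main__":
    conn = make_demo_db()
    for page_id, title in list_articles(conn):
        print(page_id, title, current_text(conn, page_id))
```

The returned text is still raw wiki markup, so the parsing problem mentioned above remains.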

PDF creation - Tags - Authoring

This is so vague it's ridiculous but who knows...
We have got this client who will not budge: they are supplying PDF files auto-generated by their own software. These files don't import into our (printing) lab management software, made by Kodak.
So I emailed Kodak the error log and relevant files and got this back:
DP2 supports the importing of PDFs from Adobe Illustrator and Quark Express.
One of the capabilities when importing PDFs as ORDER ITEMS is that the images can be modified, color corrected, or replaced. To accomplish this, the PDF is disassembled. PDFs from Illustrator and Quark contain additional information that tells us where everything goes and how, and thus enough information for us to reassemble the PDF. While other applications do generate PDFs, they don't contain this additional information.
After speaking with a 3rd party 'expert', it seems we need to consider another 3rd party 'RIP' software package that's fairly expensive. So before I go ahead I thought I'd ask if anyone has experience with this stuff?
Cheers
That's a tough one. PDFs can be created in so many different ways that it's hard to tell exactly what any given PDF may be composed of. Personally, I'd try some different PDF editors first to see if you can extract the data you need before going the expensive route.
E.g. Foxit offers a PDF editor (I think it's free, or cheap in any case).
Darknight