PDF creation

This may be too vague to answer, but who knows...
We have a client who won't budge: they're supplying PDF files auto-generated by their own software, and these files won't import into our (printing) lab management software, which is made by Kodak.
So I emailed Kodak the error log and the relevant files, and got this back:
DP2 supports the importing of PDFs from Adobe Illustrator and QuarkXPress. Among the capabilities when importing PDFs as ORDER ITEMS is that the images can be modified, color corrected, or replaced. To accomplish this, the PDF is disassembled. PDFs from Illustrator and QuarkXPress contain additional information that tells us where everything goes and how: enough information for us to reassemble the PDF. While other applications do generate PDFs, they don't contain this additional information.
After speaking with a third-party "expert", it seems we need to consider separate third-party RIP (raster image processor) software, which is fairly expensive. So before I go ahead, I thought I'd ask whether anyone here has experience with this stuff?
Cheers

That's a tough one. PDFs can be created in so many different ways that it's hard to tell exactly what any given PDF is composed of. Personally, I'd try some different PDF editors first to see if you can extract the data you need before going the expensive route.
E.g. Foxit offers a PDF editor (I think it's free, or cheap in any case).
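If you want to check programmatically whether the text in those PDFs can be pulled out at all, a few lines of Python with the pdfminer.six library will tell you. The library choice and the file name here are just my assumptions, not anything Kodak mentioned:

    # Quick check of whether a problem PDF carries an extractable text layer.
    # Requires pdfminer.six (pip install pdfminer.six); the file name is a
    # placeholder for one of the client's files.
    from pdfminer.high_level import extract_text

    text = extract_text("client_order.pdf")
    print(text[:500])  # readable output here means the text survives extraction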
Darknight

Related

MediaWiki Special:Export

I've just set up a MediaWiki server. I wanted to export data from Wikipedia, but by default it doesn't allow a pagelink_depth higher than 0. It seems you can only change the maximum pagelink_depth by setting up your own MediaWiki and adjusting $wgExportMaxLinkDepth. I've done all that, but obviously my own MediaWiki has no content. So I was wondering if there is a way to bulk copy all of Wikipedia onto my own server. From what I've read, this seems doable only about 100 pages at a time. If that's the case, Special:Export would have hardly any purpose at all, since you'd need to know exactly which pages you want to import before doing the export, which defeats the purpose altogether. Any help would be much appreciated.
Special:Export isn't meant for a complete export of a wiki, especially not through the web interface and with so many pages in the database. Special:Export is for exporting a known page (or a small set of pages) with all its content, so that it can be imported into another wiki, e.g. exporting a template from one wiki and importing it into another. So the Special:Export page has a valid purpose; you're just trying to use it for a use case it wasn't developed for ;)
If you want to export every page of a MediaWiki wiki, you should use the maintenance script dumpBackup.php (run from the command line) or one of the other backup scripts in the maintenance folder. This will ensure that you get what you want.
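For example, something like the following, run from the wiki's installation directory (--current exports only the latest revision of each page; importDump.php is the matching import script on the receiving wiki):

    php maintenance/dumpBackup.php --current > dump.xml
    php maintenance/importDump.php dump.xml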
In the case of Wikipedia you can't access these scripts (I mention them for general purposes only), but the Wikimedia Foundation provides database dumps of the Wikimedia wikis, including Wikipedia.
"So I was wondering if there was a way to bulk copy all of wikipedia into my own server" I would recommend against this simply on the sheer size of the data & the vast number of open links (or "redlinks" or "bad links") you would be adding if you didn't actually copy it all in. A better approach is to follow all the Wikipedia conventions about page NAMING, to the punctuation mark.. then write a script that checks say once a night whether you have linked to something that is already defined in Wikipedia, and then imports ONLY THAT PAGE and adds a link up top to the EXACT VERSION OF IT that was imported. That way you only bring in what you actually reference, but your database can integrate with Wikipedia's.
This will also come in immensely handy if you have to support multiple languages, such as Spanish or French, since Wikipedia has links to "the same article in another language", translating at least those concepts for you.
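A minimal sketch of such a nightly job, using two standard MediaWiki endpoints (the local API URL and the staging directory are placeholders):

    # Nightly job sketch: find pages the local wiki links to but doesn't have,
    # pull just those pages from Wikipedia, and stage them for import.
    # Assumes a ./staged directory exists.
    import requests

    LOCAL_API = "http://localhost/w/api.php"   # your own wiki (placeholder)
    WP_EXPORT = "https://en.wikipedia.org/wiki/Special:Export"

    # 1. Ask the local wiki which linked pages don't exist yet ("wanted pages").
    resp = requests.get(LOCAL_API, params={
        "action": "query", "list": "querypage", "qppage": "Wantedpages",
        "qplimit": "50", "format": "json",
    }).json()
    wanted = [row["title"] for row in resp["query"]["querypage"]["results"]]

    # 2. Fetch each missing page from Wikipedia as export XML and stage it
    #    for import (e.g. via maintenance/importDump.php).
    for title in wanted:
        xml = requests.get(f"{WP_EXPORT}/{title}").text
        safe = title.replace("/", "_")
        with open(f"staged/{safe}.xml", "w", encoding="utf-8") as out:
            out.write(xml)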

GEDCOM to HTML and RDF

I was wondering if anyone knew of an application that would take a GEDCOM genealogy file and convert it to HTML for viewing and publishing on the web. I'd like to have separate HTML files for each individual, and perhaps additional files for other content as well. I know there are some tools out there, but I was wondering if anyone has used any of them and could advise. I'm not sure what form to expect such applications to take: they could be Python or PHP files that one can edit, or even JavaScript (maybe), or just executable files.
The next issue might be appropriate for a topic in itself: export of GEDCOM to RDF. My interest here would be to align the information with specific vocabularies, such as BIO or REL, both of which extend FOAF.
Thanks,
Bruce
Like Rob Kam said, Ged2Html was the most popular such program for a long time.
GRAMPS can also create static HTML sites and has the advantage of being free software and having a native XML format which you could easily modify to fit your needs.
Several years ago, I created a simple Java program to turn GEDCOM into XML. I then used XSLT to generate HTML and RDF. The HTML I generate is pretty rudimentary, so it would probably be better to look elsewhere for that, but the RDF might be useful to you:
http://jay.askren.net/Projects/SemWeb/
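The GEDCOM-to-XML step is simple enough to sketch. Here is a rough Python illustration of the idea (my program was Java; this sketch ignores GEDCOM details such as CONT/CONC continuation lines, and the file names are placeholders):

    # GEDCOM lines look like "LEVEL [@XREF@] TAG [VALUE]"; the level numbers
    # describe nesting, so they map naturally onto an XML tree.
    import xml.etree.ElementTree as ET

    def gedcom_to_xml(path):
        root = ET.Element("gedcom")
        stack = [(-1, root)]                       # (level, element)
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip("\r\n").split(" ", 2)
                if len(parts) < 2 or not parts[0].isdigit():
                    continue                       # skip blank or odd lines
                level = int(parts[0])
                tag = parts[1]
                value = parts[2] if len(parts) > 2 else ""
                if tag.startswith("@") and value:  # e.g. "0 @I1@ INDI"
                    tag, value = value, tag        # element INDI, id @I1@
                while stack[-1][0] >= level:       # climb back to the parent
                    stack.pop()
                elem = ET.SubElement(stack[-1][1], tag)
                if value:
                    elem.text = value
                stack.append((level, elem))
        return ET.ElementTree(root)

    gedcom_to_xml("family.ged").write("family.xml", encoding="utf-8")

From there, XSLT stylesheets can take the XML to HTML and RDF.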
There are a number of these, all listed at http://www.cyndislist.com/gedcom/gedcom-to-web-page-conversion/
Ged2html used to be the most popular and most versatile, but it is no longer being developed. It's an executable, with output customisable through its own scripting syntax.
Family Historian http://www.family-historian.co.uk will create exactly what you are looking for, e.g. one file per person, using its built-in Web Site creator, as will a couple of the other major genealogy packages. I have not seen anything for the RDF part of your question.
I have since tried to produce a genealogy application using Semantic MediaWiki (MediaWiki is the software behind Wikipedia; Semantic MediaWiki extends it with various Semantic Web features). I thought it was very easy to use, with the forms and the ability to upload a GEDCOM, but feedback from people into genealogy was that it appeared too technical and didn't seem to offer anything new.
So now the issue is whether to stay with MediaWiki and make it more user friendly, or to create an entirely new application that allows adding and updating data in a triple store as well as displaying it. I'm not sure how to generate a graphical family tree view of the data, like those on sites such as ancestry.com, where one can click on a box to see and update details about that person, or click a right or left arrow beside a box to navigate the tree. The data would come from SPARQL queries sent to the triple store, both for the initial view and when navigating the tree, where an Ajax call is needed to fetch more data.
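Something like the following is the kind of query each navigation click would trigger. This is only a sketch: the endpoint URL, the person URI, and the use of rel:childOf from the REL vocabulary are placeholders.

    # One Ajax-backed step of tree navigation: given a person, fetch their
    # parents' URIs and names from the triple store.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://localhost:3030/genealogy/sparql")
    sparql.setQuery("""
        PREFIX rel:  <http://purl.org/vocab/relationship/>
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?parent ?name WHERE {
          <http://example.org/person/1> rel:childOf ?parent .
          ?parent foaf:name ?name .
        }
    """)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["parent"]["value"], row["name"]["value"])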
Bruce

Generate HTML and PDF

I'm thinking about how I'd generate a university newspaper in both PDF and HTML (a website), where every article contains one or more pictures, and I wonder if there are any tools that approach this problem declaratively, so that inexperienced users could prepare structured data (text + pictures) and get a PDF and a website as output on their own, with no programmers' intervention. I suspect the answer is some sort of XSL-FO or XML editing/processing software.
P.S. A free tool (or tools) would be the best solution.
Thank you.
For this, a very good approach would be to use DocBook
to write your articles, then let the tools generate the HTML and PDFs you need, with just some tuning of the output's look and feel on your side.
For DocBook there are many tools available, but a very good one that is free for open source and academic use is XMLMind.
If your articles are more technically oriented, DocBook is the quasi-standard (even many publishing houses, like O'Reilly, use it).
Of course, in the "pure" academic domain, LaTeX is the standard (and allows output in a lot of formats), but it takes quite a lot of effort to learn, and there are no true WYSIWYG tools for writing the articles. If you intend to submit the articles to research journals as well, they are usually glad to accept your LaTeX input.
We (Swansea University) used a content management system to achieve this: DotNetNuke, in our case.
We wanted a multi-page newsletter, with a summary on the front page and a "more" link through to the full article. The content management system allowed normal users to construct the newsletter themselves; they simply created a new child site every month.
We had the newsletter emailed out: we simply grabbed the HTML from the main page and sent it to a distribution list (see the sketch below).
Something worth considering - cost = £0
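The grab-and-mail step only takes a few lines. A Python sketch of the idea (the URL, addresses, and SMTP host are all placeholders):

    # Fetch the newsletter page and send it as an HTML email.
    import smtplib
    import urllib.request
    from email.mime.text import MIMEText

    html = urllib.request.urlopen("http://newsletter.example.ac.uk/").read().decode("utf-8")

    msg = MIMEText(html, "html")
    msg["Subject"] = "Monthly newsletter"
    msg["From"] = "newsletter@example.ac.uk"
    msg["To"] = "staff-list@example.ac.uk"

    with smtplib.SMTP("smtp.example.ac.uk") as server:
        server.send_message(msg)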
The obvious way of doing it would be to generate all your content in XML, then use either a commercial XSL-FO processor or Apache FOP to generate the PDF, and whatever XSLT processor you prefer to generate the HTML.
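A minimal sketch of that pipeline in Python, assuming lxml is installed and Apache FOP is on the PATH (the stylesheet and file names are placeholders):

    # One XML source, two outputs: XSLT for the website, XSL-FO + FOP for print.
    import subprocess
    from lxml import etree

    doc = etree.parse("issue.xml")

    # HTML for the website, via an XSLT stylesheet
    to_html = etree.XSLT(etree.parse("to-html.xsl"))
    with open("issue.html", "w", encoding="utf-8") as out:
        out.write(str(to_html(doc)))

    # PDF for print, via an XSL-FO stylesheet and FOP's command line
    subprocess.run(["fop", "-xml", "issue.xml",
                    "-xsl", "to-fo.xsl", "-pdf", "issue.pdf"], check=True)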
Very similar to Simon Thompson's answer:
You can use Drupal and its print module.

How do I convert an .hdb file? ... believed to be from an ACT! source

Any ideas?
I think the original source was a GoldMine database. Looking around, it appears the file was likely built using an application called ACT!, which I gather is a huge product that I don't really want to be deploying for a one-off file less than 5 MB in total.
So...
Anyone know of a simple tool that I can run this file through to convert it to a standard CSV or something?
It does appear (when looking at it in Notepad and Excel) to be in some sort of CSV-type format, but it's as if the data is encrypted somehow.
OK, this is weird.
I got a little confused because the data looked a complete mess; in actual fact, the mess was the data. That's what it was meant to look like.
Simply put, I opened the file in Notepad, it seemed to have a sort of pattern, so I dropped it on Excel.
Apparently Excel has no issues reading these files... strange, huh!
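If Excel can parse it, the file is most likely just delimited text, so a short Python script can turn it into a standard CSV. The file names and encoding below are guesses:

    # Guess the delimiter with csv.Sniffer and rewrite the file as plain CSV.
    import csv

    with open("export.hdb", newline="", encoding="latin-1") as src:
        dialect = csv.Sniffer().sniff(src.read(4096))  # sample the start
        src.seek(0)
        rows = list(csv.reader(src, dialect))

    with open("export.csv", "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(rows)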
I am unaware of any third-party tooling for opening these files specifically, although there is an SDK available for C# which could solve your problem with a little elbow grease.
The SDK can be acquired for free here.
There is also a developer forum which could provide some valuable resources, including training material with sample code.
Resources will be provided with the SDK.
Also, out of interest, since ACT! is a Sage product, do you have any Sage software floating about that you could use to try to access the data? Most offices have!
Failing all of the above, there is a trial version of ACT! available here.
Good luck with your problem!

How to reverse engineer binary file formats for compatibility purposes

I am working on file preparation software to enable translators to work easily and efficiently on a wide range of file formats.
As far as text-based formats (XML, PHP, resource files, ...) are concerned, my small preparation utility works fine, but a major problem for most translators is handling all kinds of proprietary binary formats (FrameMaker, Publisher, Quark...).
These files are rarely requested and need to be opened in expensive applications (few freelancers can afford to buy $20,000 worth of software just to handle a few projects per year), and even then it is not convenient to work directly in those applications anyway.
I would like to be able to read these files and extract the text in such a way that it can be translated and then re-imported into the original application with minimal effort, or even better, to recreate a valid native binary file.
Does that sound doable?
Where can I find more information on handling binary file formats, and are there useful tools for this kind of job (besides regular hex editors)?
Thanks in advance.
Of course reverse engineering is possible, but without format specs it will take a lot of work. I would look at the return on effort regarding supporting these 'rarely requested, very expensive' formats. You may be better off spending that effort improving the core functionality of your app.
Another angle is to contact the companies with these formats, explain your goal, explain that it helps their product, and if they don't see you as competition they might be willing to help.
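If you do decide to reverse engineer anyway, a practical first step beyond a hex editor is a small script that dumps a file's magic bytes and scans for embedded text, which quickly shows how much of the content is stored in the clear. A minimal Python sketch (the file name is made up):

    # First-pass look at an unknown binary: print the magic bytes, then do a
    # crude strings-style scan for runs of printable text.
    import string

    with open("sample.fm", "rb") as f:
        data = f.read()

    print("magic bytes:", data[:8].hex(" "))   # compare against known signatures

    printable = set(string.printable.encode()) - {0x0b, 0x0c}
    run = []
    for b in data:
        if b in printable:
            run.append(b)
        else:
            if len(run) >= 6:                  # only report plausible text runs
                print(bytes(run).decode("ascii"))
            run = []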
I know that you want to reverse engineer them, but since these may be proprietary file formats, you are looking at a very steep curve trying to decode them...
Some (I have written some proprietary formats for internal use before) have specific methods and objects written into them that serve some purpose other than the file contents themselves: stuff that would prove a new file is not legitimate.
Just my 2 cents, and I am no lawyer.
Maybe you could pick a cheaper application that has import features for QuarkXPress. For example, InDesign should be able to read Quark documents. Then use the importing application to export to whatever format you need, maybe with the help of a plug-in.