CHM creator with the ability to parse HTML meta keywords

I have a large number of scanned images of a monthly magazine, and I have to organize them in a searchable manner.
Users should be able to browse the magazine issue by issue, or search by predefined categories/keywords.
My current plan is to create a CHM file, since that will take less effort than building custom software.
For that I will programmatically create a separate HTML page for each image, with the image embedded in it along with the keywords for which that image should appear in search results (the keywords are stored in an Excel sheet together with the image path).
So I want a CHM creator that can parse HTML meta tags and add the keywords to the CHM keyword list.
One such program I have found is Abee CHM Maker,
but I need a free alternative.
If you have any other idea for organizing this with minimal effort, that is welcome too.

The standard (free) way to create CHM files is to use Microsoft's HTML Help Workshop:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms670169(v=vs.85).aspx
Kind regards,
Bo
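If the HTML Help Workshop route is taken, the per-image pages and the keyword index could be generated by a script, so that nothing depends on the compiler parsing meta tags. The following is only a sketch under stated assumptions: the Excel sheet is assumed to be exported to CSV with hypothetical columns `image` and `keywords` (semicolon-separated), the page naming scheme is made up, and the index layout follows the sitemap-style `.hhk` format as I remember it, so compare it against an index file produced by HTML Help Workshop before relying on it.

```python
import csv
import html
import io
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical CSV export of the Excel sheet: one row per scanned page.
SAMPLE_CSV = """image,keywords
scans/2012-01/page01.jpg,gardening;recipes
scans/2012-02/page01.jpg,recipes;travel
"""


def page_name(image_path):
    """Derive a flat HTML file name from the image path (assumed naming scheme)."""
    return "_".join(PurePosixPath(image_path).with_suffix("").parts) + ".html"


def build_page(image_path, keywords):
    """Return one HTML page embedding the image, with the keywords in a meta tag."""
    kw = html.escape(", ".join(keywords), quote=True)
    src = html.escape(image_path, quote=True)
    return (
        "<html><head>\n"
        f'<meta name="keywords" content="{kw}">\n'
        f"<title>{src}</title>\n"
        "</head><body>\n"
        f'<img src="{src}" alt="{kw}">\n'
        "</body></html>\n"
    )


def build_hhk(index):
    """Return index (.hhk) text: one entry per keyword, listing its pages."""
    lines = ["<html><body>", "<ul>"]
    for keyword in sorted(index):
        lines.append('<li><object type="text/sitemap">')
        lines.append(f'<param name="Name" value="{html.escape(keyword, quote=True)}">')
        for page in index[keyword]:
            # Each topic is a Name/Local pair following the keyword itself.
            lines.append(f'<param name="Name" value="{html.escape(page, quote=True)}">')
            lines.append(f'<param name="Local" value="{html.escape(page, quote=True)}">')
        lines.append("</object>")
    lines += ["</ul>", "</body></html>"]
    return "\n".join(lines) + "\n"


def generate(csv_text):
    """Build all pages (name -> HTML) and the index text from the CSV content."""
    pages = {}
    index = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        keywords = [k.strip() for k in row["keywords"].split(";") if k.strip()]
        name = page_name(row["image"])
        pages[name] = build_page(row["image"], keywords)
        for keyword in keywords:
            index[keyword].append(name)
    return pages, build_hhk(index)
```

Calling `generate(SAMPLE_CSV)` returns the pages and the index text; writing them to a folder and referencing the index file from the `.hhp` project is left out here.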

Free Pascal has a CHM creator package, an HTML DOM implementation, and a basic command-line compiler for CHM projects (.hhp). The creator package is independent of MS tools or any other binary blob, and is available as source. It is as portable as FPC itself (not as portable as gcc on paper, but enough in practice, with all major architectures and OSes supported).
You could build something like that. I made something similar, but instead of using meta tags I folded titles back into the TOC and index, cleaned up the HTML (TeX4ht output), and fixed links before turning it into a CHM.
But it will require some work, and if you are not familiar with Object Pascal/Delphi (the language), it might be a bridge too far. (The hours required would not compare favorably with the cost of the Abee tool, if that suits your goals.)
On the other hand, in a freely programmable system you can decide for yourself how far to automate things. I put in a lot of work once, and now all new TeX4ht output (with a certain fixed set of settings) formats nicely into CHMs.

See if this helps you (it certainly does what you need):
KEL CHM Creator: http://dumah7.wordpress.com/2009/02/17/kel-chm-creator-v-1-4-0-0/
Alternatively, I think you could add tags to each picture (right-click it -> Properties -> Details -> Tags) and use Windows Explorer to search them. I have never done this myself, but it is supposed to work.

Related

Importing pptx slides into docx document

I have literally hundreds of slides created with python-pptx. Many of these slides have charts I would like to use in a docx file. So what I would like to do is use python-docx to import these slides/charts into a docx file. Is that possible?
No, not with the current python-pptx or python-docx APIs.
Such a thing is possible of course, since the Word application will allow you to "paste" charts from PowerPoint and in fact the charts themselves are specified in DrawingML, an XML vocabulary that is shared between PowerPoint, Word, and Excel.
But to make this work with Python, you'd have to dig quite deep into the internals of both python-pptx and python-docx (although their architectures are much the same). You would probably also need to learn more about the respective XML vocabularies than you really wanted to know. So you might want to consider alternate approaches such as using win32com support for this sort of thing, especially if you are running on Windows and this is a one-time job and does not need to be hosted on a server for ongoing use.
If you thought you did want to tackle it, a good first step might be to inspect the XML related to a PowerPoint chart (located in both the slide and the chart-parts of the PPTX package) and also inspect the corresponding XML that appears in a Word (.docx) file that includes a chart. That will give you an idea of what needs to come over from the PPTX package, what transformations it may need to undergo (namespace changes perhaps) and where it would need to be added into the DOCX package, including updating relationship files and perhaps updating certain ID values to make them unique in the target package.
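As a starting point for that inspection, a .pptx file is a ZIP package, so the standard library is enough to see which slides reference which chart parts. This is only a sketch for looking around, not a way to move charts into a .docx; the function name `chart_relationships` is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the package relationship (.rels) parts.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
# Relationship type that links a slide to a chart part.
CHART_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/chart"


def chart_relationships(pptx_file):
    """Map each slide .rels part to the chart targets it references."""
    found = {}
    with zipfile.ZipFile(pptx_file) as pkg:
        for name in pkg.namelist():
            if not (name.startswith("ppt/slides/_rels/") and name.endswith(".rels")):
                continue
            root = ET.fromstring(pkg.read(name))
            targets = [
                rel.get("Target")
                for rel in root.iter(REL_NS + "Relationship")
                if rel.get("Type") == CHART_REL
            ]
            if targets:
                found[name] = targets
    return found


# Example use (the file name is a placeholder):
# for rels_part, charts in chart_relationships("deck.pptx").items():
#     print(rels_part, "->", charts)
```

The targets are relative paths such as `../charts/chart1.xml`; reading those parts with `pkg.read(...)` shows the chart XML that would have to be carried over, along with whatever that chart part's own relationships point to.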

Pretty-print Lua source-code in external file, without embedding it in the HTML file

Since my experience with HTML is fairly rudimentary (and pretty old), I am not sure if my requirement is realistic.
Let's say I have quite a few files containing Lua source code, all with the ".lua" extension and all in a particular subdirectory. What I'd like to do is create a static index.html file which, when loaded in a browser, shows the list of Lua source files in a drop-down. Once one of the files is selected, I'd like it to be loaded into an "area" on the same page and pretty-printed, i.e. syntax-highlighted in the browser. I was wondering whether I could use something like google-code-prettify for the syntax-highlighting part. Also, I am not clear whether an external Lua source file can be loaded and displayed within a certain region of the HTML page being rendered. If so, I would appreciate elaboration on the how.
A tool like LDoc can be used to accomplish a lot of what you want, much as Doxygen would be used for a C language source kit.
Both are heavily driven by inclusion of specially formatted comments that carry documentation.
I know Doxygen can fold source code into the generated document set; I don't recall whether LDoc can. Both are under active development.
It isn't necessarily a bad idea to use both tools on a project, especially if you have C source code implementing Lua modules. You could use Doxygen to build the overall document tree for your engine and C modules, and LDoc to build documentation of the Lua parts. It should be possible with a little care and configuration of both tools to get them to play well together.

Activating HTML with Haskell

I have a large pile of lecture notes in raw HTML format. I would like to add interactive content to these notes, in particular incorporating online exercises. I have some experience implementing online exercises as cgi-bin executables compiled from Haskell code running on the server, interacting with a student record file and sending suitable HTML back to the browser, using Text.XHtml to generate the content. Now I plan to integrate the notes and the exercises.
The trouble is that I don't want to spend ages manually transforming my raw HTML into Haskell code to generate exactly the raw HTML I started with. Instead, I'd like to put my Haskell code and my HTML in the same source file, with placeholders in the latter for content generated by the former. A suitable tool should then transform this file into Haskell source code for (e.g.) a cgi-bin executable which generates the corresponding page.
Before I go hacking up such a piece of kit, I thought I'd ask if there's better technology out there already. The fixed points are the large legacy lump of HTML, the need to implement the assessment of the exercises in Haskell, and the need to interact with student records on the server. The handicap is that I need to use the departmental web server and I can't reconfigure it (ok, maybe I could ask nicely): that's one of the reasons I currently use cgi-bin executables, which are just fine on our server already, but I'm open to other possibilities.
My current plan is to write a (I mean adapt an existing) preprocessor to support a special syntax for defining functions of type
Html -> ... -> Html -> Html
that looks a lot like raw HTML with splice points. Then what I do with my existing raw HTML is indent it a bit and mark the holes.
But would that be a waste of time? Please, please tell me that this question is a duplicate!
There are Haskell frameworks like Yesod and Happstack which use templating engines like you describe.
Have you looked at the haskell wiki at http://www.haskell.org/haskellwiki/HSP or
http://www.haskell.org/haskellwiki/Web/Libraries/Templating ?
They may do what you need.
You might find something to do the job here: Templating packages for Haskell.
And you should probably look into Snap, Yesod or Happstack for serving the content.
I have a large pile of lecture notes in raw HTML format. I would like to add interactive content to these notes, in particular incorporating online exercises.
There is already a system (called "ActiveHs"), written in Haskell, that lets you put lecture notes and interactive exercises in one file.
See:
http://pnyf.inf.elte.hu/fp/UsersGuide_en.xml
http://pnyf.inf.elte.hu/fp/Constructive_en.xml
I can really say that it is very well written code and completely open source!

View the innards of a .ppt file?

I need to figure out what is going on inside a client's .ppt files. What is a good way to get started?
My eventual hope is to convert it to HTML. But if I just export the .ppt to HTML, I get a lot of images (as opposed to text), which is not a Good Thing.
EDIT: software that automatically converts .ppt to HTML would be terrific, provided that it preserves as much information as possible in text format. If that doesn't exist, the next best thing would be to understand the innards of the .ppt and write my own code to do a partial conversion.
EDIT: I used OfficeConvert as recommended by Michiel Leenaars. It got me text all right. My 50-page, 8MB test file turned into 40MB of text. The fact that I got text is good. The fact that the amount went way up is moving in the wrong direction. And there is an awful lot of repetition in there. The word "style" appeared 410815 times; the word "draw" appeared 351229 times.
I think a safe way would be to use OfficeConvert to automatically convert to ODF programmatically with Microsoft Office. Run it with /? to get help. There are some dependencies (see below).
Then use a good ODF library like lpod to look inside it.
You can view some interesting code examples here.
Dependencies:
Microsoft .NET Framework Version 2.0 Redistributable Package (x86)
Primary Interop Assemblies for Office 2007 or Office 2010 (whichever you are using).
I like the Aspose products. (I'm not associated with them other than as a customer.) I've used the PPT one specifically to write code that pokes around in the insides of a PPT. Overkill if you just want to convert it to HTML, but invaluable for the sorts of things I use it for.
If you know Java, Apache has the POI project, which lets you look at the innards of a PPT file. You could get all the info you want out of the file (images, text) and then convert it to HTML however you like.
It's free too.

Programmatically generate high quality PDFs

Note: I realize this question has already been asked (with a ruby slant) here: Creating on-demand, print-quality PDFs (preferably in Ruby if feasible). BUT there was no decent answer IMHO.
So as you may have guessed, I am looking for the best approach to producing HIGH QUALITY, print-ready PDF documents programmatically. Our requirements call for design documents that define placeholders for dynamic content like images and text, i.e. some kind of template mechanism.
The suggestion has been to use Adobe's InDesign Server, but this seems like an expensive solution, not to mention a little overkill for our needs.
Are there any alternative, cheaper and more fitting solutions out there? The language of the solution doesn't really matter, as long as it can be executed on a Windows box.
My suggestion would be to look at XSL-FO or thereabouts...
You create an XML doc that describes what you want and there are various libraries and toolkits (I've used XEP from RenderX) that will convert said XML into PDF.
In real terms, what we did was take a large lump of data in XML format and use XSLT (templates, in effect) to convert the data to formatting objects, which XEP renders into something (a 500-page hotel directory with auto-generated TOC and index) that has been consumed quite happily by at least three different commercial printers. We did some other, smaller documents too from time to time.
The downside is that it's not even remotely a WYSIWYG solution: you're effectively compiling "source code" to get PDF out the back. The upside is that the base technologies are reasonably generic, even if the specific toolkits may be a bit less so.
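To give an idea of what the "source code" looks like, here is a sketch that builds a minimal XSL-FO document from a list of records. Note the swap: the answer above used XSLT for this step, while this sketch uses Python's `xml.etree.ElementTree` simply because it is self-contained. The function names, the sample data, and the page master name "A4" are all made up for illustration, and the output still has to be passed to an FO renderer (such as XEP) to become a PDF.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag, parent=None, **attrs):
    """Create an fo:<tag> element; underscores in keyword names become hyphens."""
    name = f"{{{FO_NS}}}{tag}"
    attrs = {key.replace("_", "-"): value for key, value in attrs.items()}
    if parent is None:
        return ET.Element(name, attrs)
    return ET.SubElement(parent, name, attrs)


def hotel_directory(hotels):
    """Return an XSL-FO document (as text) with one block per (name, city) pair."""
    root = fo("root")

    # Page geometry: a single A4 page master with a body region.
    masters = fo("layout-master-set", root)
    page = fo("simple-page-master", masters, master_name="A4",
              page_height="297mm", page_width="210mm", margin="20mm")
    fo("region-body", page)

    # Content flows into the body region of pages built from that master.
    sequence = fo("page-sequence", root, master_reference="A4")
    flow = fo("flow", sequence, flow_name="xsl-region-body")
    for name, city in hotels:
        block = fo("block", flow, font_size="11pt", space_after="4pt")
        block.text = f"{name}, {city}"

    return ET.tostring(root, encoding="unicode")
```

A call such as `hotel_directory([("Hotel Example", "Springfield")])` yields the FO text; the layout decisions live in the attributes, which is why the approach is template-like rather than WYSIWYG.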
You can convert XML templates to PDFs with Prince.
Prince is a computer program that converts XML and HTML into PDF documents. Prince can read many XML formats, including XHTML and SVG. Prince formats documents according to style sheets written in CSS.
I have had, and know many people who have had, much success with ReportLab, an open source Python PDF library (http://www.reportlab.org/rl_toolkit.html).
It's extremely easy to use and very quick to get started with, so it is worth trying out.
I don't know why no one has suggested using LaTeX for this. It's an extremely popular open format for document design, and it is not hard to set up a template that you can fill in with text or image content. While the reference implementation of LaTeX runs as a standalone program, if that sounds like too many moving parts for you there are wrapper libraries for Python and other languages you can call via an API.
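A fill-in template of that kind can be quite small. The sketch below uses only Python's standard library and no LaTeX wrapper; the placeholder names (`title`, `body`, `image`), the `@` delimiter, and the template itself are assumptions chosen for illustration.

```python
from string import Template


class LatexTemplate(Template):
    # LaTeX uses "$" for math, so this sketch uses "@" to mark placeholders.
    delimiter = "@"


# Characters with special meaning in LaTeX and a printable replacement for each.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}

TEMPLATE = LatexTemplate(r"""\documentclass{article}
\usepackage{graphicx}
\begin{document}
\section*{@title}
@body

\includegraphics[width=\linewidth]{@image}
\end{document}
""")


def escape_latex(text):
    """Escape dynamic text character by character so it cannot break the markup."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def render(title, body, image):
    """Fill the template; the image path is inserted as-is (assumed to be safe)."""
    return TEMPLATE.substitute(
        title=escape_latex(title),
        body=escape_latex(body),
        image=image,
    )
```

The returned text would then be written to a `.tex` file and compiled with a LaTeX engine such as `pdflatex` to produce the PDF.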
Java language and JasperReports
Java: iText
C#: iTextSharp
It depends on what you want to publish, but take a look at Pentaho Reporting:
http://reporting.pentaho.org/
rinohtype is an open-source document processor that is capable of producing high-quality print-ready PDF documents. You can use one of the built-in document templates (book, article) or define your own template. The look of document elements can be configured by means of CSS-like style sheets. The contents of your document can be parsed from reStructuredText or CommonMark files, or you can build the document tree programmatically.
Full disclosure: I am the author of rinohtype.