best practices to import text into html - html

What is the best practice for importing text into HTML from a multipage InDesign document when handing off from a designer to a non-designer? The document was designed on a Mac and is going into a CMS on a PC. Should I hand off the InDesign file, or strip the text into a Word file and supply all images plus a PDF as a go-by?

More people are likely to be able to open a PDF than an InDesign file, especially with font considerations. I prefer to get work in PDF format. I can easily extract the text, and I can pull the document into Photoshop to slice it up. You just have to make sure the quality/compression settings are right so the export doesn't muck up the JPEGs too much.

Related

Text not showing in embedded SVG

I am migrating from standard JPG and PNG files to SVG files for images, to maintain high quality. I create the images in Photoshop, save them as PSD, open them in Illustrator, save them as SVG, and finally upload them to my website. Pure vector images seem to work fine, but text is not being rendered correctly. Does anyone know what might be causing this?
I have a link with an example here http://liamhodnett.com//img/case-study/wags-whiskers/banner.svg
Thanks guys!
Convert the text to paths in Illustrator.
If the font is not installed on the user's machine, the text falls back to another font and its rendering gets weird. If you have a logo or similar, you should convert all text to paths to be safe, unless the text is of semantic importance to you.
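To check which text in an exported SVG still depends on an installed font, something like the following might help. This is only a rough sketch using Python's standard library. The function name `find_live_text` and the sample SVG are made up for illustration, and it only looks at `font-family` set directly on the `<text>` element (as an attribute or in `style`), not on parents, `<tspan>` children, or CSS classes.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def find_live_text(svg_source: str) -> list[tuple[str, str]]:
    """Return (font_family, text) for each <text> element still stored as live text."""
    root = ET.fromstring(svg_source)
    results = []
    for element in root.iter(SVG_NS + "text"):
        # font-family may be a presentation attribute or sit inside style="...".
        font = element.get("font-family")
        if font is None:
            for declaration in element.get("style", "").split(";"):
                name, _, value = declaration.partition(":")
                if name.strip() == "font-family":
                    font = value.strip()
        content = "".join(element.itertext()).strip()
        results.append((font or "(inherited or default)", content))
    return results


# Hypothetical sample: one live <text> element and one path.
sample = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="50">
  <text x="10" y="30" font-family="MyriadPro-Regular" font-size="20">Wags</text>
  <path d="M0 0 L10 10"/>
</svg>"""

for font, content in find_live_text(sample):
    print(f"live text {content!r} depends on font {font}")
```

Any element this reports is a candidate for converting to paths (outlines) in Illustrator before export. Text that has already been converted shows up as `<path>` elements and is not listed.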
You'll need to convert the SVG files into a workable webfont.
You can use a free web-service such as http://www.icomoon.io and upload/convert the files as needed, and download a working/converted zip file with all you need.
If you're running a WordPress site, I've developed a plugin that lets you upload SVG files to icomoon, download the .zip, and then upload the .zip to the plugin. From there, all the icons you've included in the .zip will be usable on your site with no code required on your end.
http://wordpress.org/plugins/svg-vector-icon-plugin/
Good luck!

Copy/paste html inline image from browser to word processors

I am experimenting with html inline images (background: playing with the idea of creating my own CMS which does not keep the images as separate files).
I can copy/paste such an image from the browser (Firefox/IE) to image processing programs like Photoshop or MS Paint, but not to word processors like MS Word or OpenOffice Writer.
Do you have any ideas about what I could do to make these inline images "copy/pastable" for word processors as well?
Example:
<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAcAAAAICAIAAAC6ZnJRAAAAqElEQVQImSXISQ6CMBQA0N7AjYkD4G+hEMLGRGVsK9PCQ7kx0USgDAV6YBe+5UPLnSpu9wzLDJrYnAVpwwNS3B5S6NLTKGxd+2vu9ImJFLd14bWJ9b7sZAa6oIpj1EXmf5/B5nXezoJMDFBzO44ZbhPrc91/I0MXdGKAJkaGFHqGVU517cvYkLGBxgRWQXuGx9JdHkHDYKo8JENz4Y7MoBNkqNxOkLn0fm81MqWZRgG5AAAAAElFTkSuQmCC" /></p>
You cannot paste these images directly into word processors. Instead, first save the image in an appropriate format (PNG/JPG) using an image editor, then copy the saved image and paste it into the word processor.
Please choose this as the best answer if you find it helpful.
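If there are many inline images, the "save it as a file first" step could be scripted instead of done by hand. Below is a minimal sketch in Python, assuming the `src` value is a base64 data URI like the one in the question. The helper names `decode_data_uri` and `save_data_uri` are made up for illustration, and the example URI is only a stand-in payload (the PNG signature bytes), not a real image.

```python
import base64
import os
import tempfile


def decode_data_uri(data_uri: str) -> tuple[str, bytes]:
    """Split a base64 data URI into its media type and raw bytes."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URI")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload)


def save_data_uri(data_uri: str, path: str) -> str:
    """Write the decoded bytes to path and return the media type."""
    media_type, raw = decode_data_uri(data_uri)
    with open(path, "wb") as handle:
        handle.write(raw)
    return media_type


# Stand-in payload: just the 8-byte PNG signature, not a complete image.
example_uri = "data:image/png;base64," + base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")

target = os.path.join(tempfile.mkdtemp(), "inline-image.png")
print(save_data_uri(example_uri, target), "written to", target)
```

The resulting file can then be inserted into Word or Writer like any other image file.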

PDF to Structured Format

I have tons of PDFs that I need to convert to some structured format that I can interpret (HTML/XML/etc.).
PDFs are in this format:
http://img840.imageshack.us/img840/5407/pdfv.png
So far I have tried a lot of programs that convert to HTML, but none of them can separate the images. They just take what amounts to a screenshot of the page without the text, use that image as a background in the HTML, and position the text over it with CSS.
Like this: http://img37.imageshack.us/img37/5015/examplelp.jpg
I have a bunch of PDFs, so processing each one's images manually is not an option. Does anyone know of a solution for this (even paid software)?
I had a similar problem a while back and ended up writing my own solution. It's called PDFX and it's free to use. It converts PDF to a structured-format XML and also renders any bitmap images (not vector graphics) found in the PDF separately.
Example input/output can be found here. You might want to give it a try.

Putting several hundred .doc pages into webpage

I have hundreds of .doc files with text that I need put on web pages.
I realize I could convert every .doc file to .txt, then use a server side include to embed the contents of each page into a webpage. This would save a lot of time because I could simply have one .php?txt=... page which will display a different .txt include depending on the link the user pressed to get there. This works perfectly content-wise.
However, all formatting is lost when it is converted to .txt (titles should be in bold)
When I convert these .doc files to .html using Microsoft Word, the ~20 line documents become bloated >300 line .htm files (probably because each paragraph is put into textboxes)
Dreamweaver's "Clean up Word HTML" helped a bit but the code was still extremely bloated.
How would you suggest going about this?
edit: I may have solved my own question, trying to embed Google docs into my page.
There is a program suite called wv (formerly mswordview). It includes a program, wvWare, that can transform Word documents to HTML.
Furthermore you can use the output from Word and send it through tidy. This corrects markup and usually can handle the mistakes made by Word.
You can try converting the Word documents to a DocBook intermediate format, then you can easily transform the DocBook with existing tools to (X)HTML.
MS Word is bloatware. Its own markup is bloated, and therefore any attempt to automatically convert it to HTML will inherit these problems. You end up with garbage like: <strong><strong></strong></strong> for no good reason.
Dreamweaver can clean it up a lot, but nothing short of strip/remarkup is going to get you clean results.
That's why most people use PDFs for this type of issue.
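For what it's worth, the strip/remarkup idea can be sketched with Python's standard `html.parser`. This is not what Dreamweaver or tidy do internally; it is just an illustration under stated assumptions. The whitelist `KEEP`, the class name `WordHtmlStripper`, and the sample markup are all made up for this example. It keeps a handful of structural tags, drops every attribute, and removes empty pairs such as `<strong><strong></strong></strong>`.

```python
import re
from html import escape
from html.parser import HTMLParser

KEEP = {"p", "strong", "b", "em", "i", "h1", "h2", "h3", "ul", "ol", "li", "br"}
VOID = {"br"}                               # kept tags that never get a closing tag
DROP_CONTENT = {"head", "style", "script"}  # discard these tags and everything inside
EMPTY_PAIR = re.compile(r"<(\w+)>\s*</\1>")


class WordHtmlStripper(HTMLParser):
    """Re-emit only whitelisted tags, with every attribute dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in KEEP and not self.skip_depth:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP and tag not in VOID and not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(escape(data, quote=False))


def clean(word_html: str) -> str:
    parser = WordHtmlStripper()
    parser.feed(word_html)
    parser.close()
    result = "".join(parser.parts)
    previous = None
    while previous != result:  # repeat until no empty pairs are left
        previous = result
        result = EMPTY_PAIR.sub("", result)
    return result.strip()


# Hypothetical sample imitating Word's "Save as HTML" output.
messy = (
    '<html><head><style>p.MsoNormal{margin:0}</style></head><body>'
    '<p class=MsoNormal style="margin:0cm"><strong><span lang=EN-US>Title</span></strong></p>'
    '<p class=MsoNormal><strong><strong></strong></strong><span>Body text<o:p></o:p></span></p>'
    '</body></html>'
)
print(clean(messy))
```

A whitelist is used rather than a blacklist because Word invents a lot of its own tags and attributes, so it is easier to name the few tags worth keeping than to list everything that should go.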
My immediate reaction would be to convert the docs to PDFs. That will normally preserve formatting quite well, and users typically have their browsers set up to view PDFs one way or another (and the few who don't are undoubtedly accustomed to being unable to view a lot of documents on a lot of sites).
Alright, thanks everyone for your suggestions, but I wanted to make this page accessible to everyone, including people without PDF viewers.
Google Docs allows you to bulk upload your text files (and converts them for you too).
You can then embed them in any HTML document using an iframe.

OCR graph paper

I would like to take a pdf of a scanned graph paper notebook (with handwriting) and turn it into a text file.
How can I do this?
Thanks
Check out an OCR library, like OCRopus. I don't think it takes PDF, so you may have to convert it to a TIFF or JPEG first.
There are OCR libraries that recognize typed text (OCRopus, tesseract, etc.).
There are also Java-based handwriting libraries. I am not sure if OCRopus has that ability. One library I was looking into for handwriting recognition was:
Online Video
Java Neural Networks
Conceivably you could take the PDF, convert it into a TIFF if need be (depending on the software), and it would give you something.
Good luck!
If the notebook is a PDF file, you could e-mail it to a Gmail account; Gmail then lets you "view" the PDF as HTML from within your browser. The pages still remain images, though.
If you would like the text out of it, OCR might work, but it may also be incapable of extracting handwritten text.