Copy/paste HTML inline image from browser to word processors

I am experimenting with HTML inline images (background: I am toying with the idea of creating my own CMS that does not keep images as separate files).
I can copy/paste such an image from the browser (Firefox/IE) to image processing programs like Photoshop or MS Paint, but not to word processors like MS Word or OpenOffice Writer.
Do you have any ideas about what I could do to make these inline images "copy/pastable" for word processors as well?
Example:
<p><img src="" /></p>
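For context, an inline image of this kind carries the image itself in the `src` attribute as a base64 `data:` URI (RFC 2397). That is what lets a CMS avoid storing images as separate files. Below is a minimal Python sketch of generating such a tag. The names `to_data_uri` and `inline_img_tag` are hypothetical and are not part of any existing CMS:

```python
import base64
import html
import mimetypes


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw image bytes as a base64 data: URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_img_tag(path: str, alt: str = "") -> str:
    """Build an <img> tag that embeds the image file instead of linking to it."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {path!r}")
    with open(path, "rb") as f:
        uri = to_data_uri(f.read(), mime_type)
    return f'<img src="{uri}" alt="{html.escape(alt)}" />'
```

For example, `to_data_uri(b"abc", "image/png")` returns `"data:image/png;base64,YWJj"`. One trade-off of this design is that base64 makes each image roughly a third larger than the original file.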

You cannot paste these inline images directly into word processors. To do this, first save the image in an appropriate format (PNG/JPG) using an image editor, then copy the saved image and paste it into the word processor.
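The manual "save it as PNG/JPG first" step from this answer can also be scripted. Below is a minimal Python sketch. It assumes the image is embedded as a base64 `data:` URI, as in the question's CMS idea. The names `decode_data_uri` and `save_data_uri` are hypothetical helpers:

```python
import base64
import mimetypes
import re
from typing import Tuple

# Matches base64 image data: URIs such as "data:image/png;base64,iVBORw0KGgo..."
DATA_URI_RE = re.compile(
    r"^data:(?P<mime>image/[\w.+-]+);base64,(?P<payload>[A-Za-z0-9+/=\s]+)$"
)


def decode_data_uri(uri: str) -> Tuple[str, bytes]:
    """Return (MIME type, raw image bytes) for a base64 image data: URI."""
    match = DATA_URI_RE.match(uri.strip())
    if match is None:
        raise ValueError("not a base64-encoded image data: URI")
    # Long data URIs are often wrapped across lines; strip that whitespace explicitly.
    payload = re.sub(r"\s+", "", match.group("payload"))
    return match.group("mime"), base64.b64decode(payload)


def save_data_uri(uri: str, stem: str) -> str:
    """Write the image to stem + a matching extension (e.g. ".png") and return the path."""
    mime_type, data = decode_data_uri(uri)
    extension = mimetypes.guess_extension(mime_type) or ".bin"
    path = stem + extension
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Calling `save_data_uri(src_value, "pasted-image")` on a PNG data URI writes `pasted-image.png`. You can then insert that file into Word or Writer like any ordinary picture.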

Related

PDF to HTML converter - Stuck

I need to put just one PDF on my website as an HTML file. I don't need to generate them on the site; I just need to add one PDF to a page and put text over it. Does anyone know the best way to convert the PDF to HTML? I have tried places like cloud converter, but they add so much extra CSS and JavaScript to the page that it is nearly impossible to work out how to put text over the content without it being covered up or behaving strangely. I just need the text to be formatted relative to itself and placed on the page as plain HTML. Is this even the right approach? Thanks!

Semantic Screenshots for Web Browsers

An awful lot of modern web traffic (particularly on social media) consists of screenshots from web browsers. These typically include some formatted text, some layout, and some bitmap/vector graphics.
It's really easy to take and share a screenshot, but it throws away lots of useful information and doesn't transfer well between devices (not to mention being far less amenable to things like screen readers for the blind and fancy data-mining). Of course the ironic part of this is that HTML/SVG is the perfect format for representing such data, and we're not using it even though it's right there.
html2canvas comes close to doing this, but it doesn't handle images properly (see some semi-related discussion here).
My question is this: how can I select a visible area in my browser and save it in a format (ideally HTML) that preserves text and images and renders to something roughly similar when rendered separately (so that it could be included as, e.g., a data iframe for sharing)?
I know that this is in general impossible, and that rendering HTML is a complicated task, but I feel like it should be possible to ask the browser something like "what elements are being rendered within these pixel coordinates?".
First:
Right-click the page, then click "Save Page As".
Save it with a name that ends in .html (or .webarchive in some scenarios; see which works best for you).
Edit the saved HTML file so it contains only the part you want (any text editor works; Sublime Text and Atom are commonly suggested).
Then:
Open it in your browser to check the result.
You may also want to find out where the CSS comes from, copy it into the same folder as your HTML file, and link the HTML file to it so that the styles are preserved.
As far as I understand, you would want to move all the CSS inline, or at least into the <head> section of the HTML file. That way you can upload it as a single file without having to keep it linked to a separate CSS file.
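The "bring all the CSS into the HTML file" step can be scripted instead of done by hand. Below is a rough Python sketch. It assumes the stylesheets are local files referenced by relative `href`s from the saved page. The name `inline_stylesheets` is hypothetical, and the regex matching of `<link>` tags is a simplification rather than a full HTML parser:

```python
import re
from pathlib import Path

LINK_RE = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
REL_RE = re.compile(r'rel\s*=\s*["\']?stylesheet["\']?', re.IGNORECASE)


def inline_stylesheets(html_path: str) -> str:
    """Return the page's HTML with local <link rel="stylesheet"> tags replaced by <style> blocks."""
    page = Path(html_path)
    base = page.parent
    source = page.read_text(encoding="utf-8")

    def replace(match):
        tag = match.group(0)
        href = HREF_RE.search(tag)
        if not REL_RE.search(tag) or href is None:
            return tag  # not a stylesheet link (e.g. a favicon): keep it unchanged
        css_file = base / href.group(1)
        if not css_file.is_file():
            return tag  # remote or missing stylesheet: leave the link as-is
        css = css_file.read_text(encoding="utf-8")
        return f"<style>\n{css}\n</style>"

    return LINK_RE.sub(replace, source)
```

The sketch does not rewrite relative `url(...)` references inside the CSS, such as fonts or background images. Those resolve relative to the CSS file's original location, so they may still need fixing by hand after inlining.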

PDF to Structured Format

I have tons of PDFs that I need to convert to some structured format that I can interpret (HTML/XML/etc.).
PDFs are in this format:
http://img840.imageshack.us/img840/5407/pdfv.png
So far I have tried a lot of software that converts to HTML, but none of it can separate the images. Each program essentially takes a screenshot of the page without the text, uses that image as the background of the HTML, and positions the text over it with CSS.
Like this: http://img37.imageshack.us/img37/5015/examplelp.jpg
I have a bunch of PDFs, so processing each one's images manually is not an option. Does anyone know of a solution for this (even paid software)?
I had a similar problem a while back and ended up writing my own solution. It's called PDFX and it's free to use. It converts PDF to a structured-format XML and also renders any bitmap images (not vector graphics) found in the PDF separately.
Example input/output can be found here. You might want to give it a try.

Putting several hundred .doc pages into webpage

I have hundreds of .doc files with text that I need to put on web pages.
I realize I could convert every .doc file to .txt, then use a server-side include to embed the contents of each file into a webpage. This would save a lot of time because I could have a single .php?txt=... page that displays a different .txt include depending on the link the user clicked to get there. This works perfectly content-wise.
However, all formatting is lost when the files are converted to .txt (titles should be in bold).
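Below is a sketch of the single-page `.txt` include idea from the question. It is written in Python rather than the poster's PHP. It assumes, hypothetically, that the first line of each file is the title and wraps that line in a heading so it renders bold. Blocks separated by blank lines become paragraphs. The name whitelist matters because passing a raw query-string value straight into an include lets visitors request arbitrary files. The name `render_txt` is hypothetical:

```python
import html
import re
from pathlib import Path

# Only plain names such as "chapter-01" are accepted, so a request like
# "?txt=../../etc/passwd" cannot reach files outside content_dir.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def render_txt(content_dir: str, name: str) -> str:
    """Return an HTML fragment for content_dir/<name>.txt.

    Hypothetical convention: the first line of the file is its title.
    """
    if SAFE_NAME.fullmatch(name) is None:
        raise ValueError(f"invalid document name: {name!r}")
    path = Path(content_dir) / f"{name}.txt"
    if not path.is_file():
        raise FileNotFoundError(str(path))
    text = path.read_text(encoding="utf-8")
    # Blocks separated by one or more blank lines become paragraphs.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    if not blocks:
        return ""
    title, _, first_rest = blocks[0].partition("\n")
    # Headings render in bold by default, which covers "titles should be in bold".
    parts = [f"<h2>{html.escape(title.strip())}</h2>"]
    if first_rest.strip():
        parts.append(f"<p>{html.escape(first_rest.strip())}</p>")
    parts.extend(f"<p>{html.escape(b)}</p>" for b in blocks[1:])
    return "\n".join(parts)
```

In a real page, `name` would come from the `txt` query parameter. The same whitelist check applies if you keep the PHP approach.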
When I convert these .doc files to .html using Microsoft Word, the ~20-line documents become bloated .htm files of more than 300 lines (probably because each paragraph is put into a text box).
Dreamweaver's "Clean up Word HTML" helped a bit but the code was still extremely bloated.
How would you suggest going about this?
Edit: I may have solved this myself; I'm trying to embed Google Docs in my page.
There is a program suite called wv (formerly mswordview). It includes a program, wvWare, that can convert Word documents to HTML.
You can also take the HTML output from Word and run it through tidy. Tidy corrects the markup and can usually handle the mistakes Word makes.
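Below is a minimal Python sketch of running Word's HTML output through HTML Tidy from a script. It assumes the `tidy` command-line tool is installed. `--word-2000` and `--clean` are Tidy options aimed at stripping Word-specific and legacy presentational markup. `tidy_command` and `run_tidy` are hypothetical helper names:

```python
import shutil
import subprocess
from typing import List


def tidy_command(src: str, dst: str) -> List[str]:
    """Build an HTML Tidy command line for cleaning Word-generated HTML."""
    return [
        "tidy",
        "-q",                  # quiet: suppress nonessential output
        "--word-2000", "yes",  # strip the extra markup Word adds on "Save as Web Page"
        "--clean", "yes",      # replace legacy presentational tags (e.g. <font>) with CSS
        "-o", dst,             # write the cleaned document to dst
        src,
    ]


def run_tidy(src: str, dst: str) -> int:
    """Run tidy and return its exit status (0 = clean, 1 = warnings only)."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("HTML Tidy ('tidy') is not installed or not on PATH")
    result = subprocess.run(tidy_command(src, dst), capture_output=True, text=True)
    # Exit status 2 means errors were found; by default Tidy then writes no output.
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return result.returncode
```

Looping `run_tidy` over every Word-generated `.htm` file in a folder handles the batch case.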
You can try converting the Word documents to a DocBook intermediate format, then you can easily transform the DocBook with existing tools to (X)HTML.
MS Word is bloatware. Its own markup is bloated, and therefore any attempt to automatically convert it to HTML will inherit these problems. You end up with garbage like: <strong><strong></strong></strong> for no good reason.
Dreamweaver can clean it up a lot, but nothing short of strip/remarkup is going to get you clean results.
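To illustrate the strip/remarkup approach, here is a rough Python sketch built on the standard library's `html.parser`. It keeps only a small whitelist of structural tags, which is an assumption about what the content needs. It drops every attribute, such as `class="MsoNormal"` and inline `style`, and discards `<head>`/`<style>` content. It then repeatedly removes empty pairs like `<strong><strong></strong></strong>`. This is not a replacement for a full cleaner, and `clean_word_html` is a hypothetical name:

```python
import re
from html import escape
from html.parser import HTMLParser

KEEP = {"p", "h1", "h2", "h3", "strong", "em", "ul", "ol", "li", "br"}  # assumed whitelist
RENAME = {"b": "strong", "i": "em"}
SKIP = {"head", "style", "script", "xml"}  # content inside these is discarded
VOID = {"br"}
EMPTY_RE = re.compile(r"<(\w+)>\s*</\1>")


class WordCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a SKIP element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
            return
        tag = RENAME.get(tag, tag)
        if self.skip_depth == 0 and tag in KEEP:
            self.out.append(f"<{tag}>")  # attributes are deliberately dropped

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        tag = RENAME.get(tag, tag)
        if self.skip_depth == 0 and tag in KEEP and tag not in VOID:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))


def clean_word_html(markup: str) -> str:
    parser = WordCleaner()
    parser.feed(markup)
    parser.close()
    cleaned = "".join(parser.out)
    # Remove empty pairs until nothing changes, so nested empty tags vanish too.
    previous = None
    while previous != cleaned:
        previous, cleaned = cleaned, EMPTY_RE.sub("", cleaned)
    return cleaned.strip()
```

Unknown tags such as Word's `<span>` and `<o:p>` are unwrapped rather than deleted, so their text survives.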
That's why most people use PDFs for this type of issue.
My immediate reaction would be to convert the docs to PDFs. That will normally preserve formatting quite well, and users typically have their browsers set up to view PDFs one way or another (and the few who don't are undoubtedly accustomed to being unable to view a lot of documents on a lot of sites).
Alright, thanks everyone for your suggestions, but I want this page to be accessible to people without PDF viewers as well.
Google Docs lets you bulk-upload your text files (and converts them for you too).
You can then embed them in an iframe in any HTML document.

best practices to import text into html

What is the best practice for handing off text from a multipage InDesign document, from designer to non-designer, for import into HTML? The document was designed on a Mac and is going into a CMS on a PC. Should I hand off the InDesign file or strip the text into a Word file? Should I supply all the images and a PDF as a go-by?
More people are likely to be able to open a PDF than an InDesign file, especially given font considerations. I prefer to receive work in PDF format: I can easily extract the text, and I can pull the document into Photoshop to slice it up. You just have to make sure the quality/compression settings are right so it doesn't muck up the JPEGs too much.