Paste from Outlook/Word/Office to Embedded Browser - html

So, we have a great application that is going well, but some of our users like to type their text in Word before pasting it into our application. When they do that, the HTML is parsed somewhat properly, but it usually contains tags from Outlook or Word that our XHTML engine just doesn't like or understand.
For example, a user types a note into Word with some minor formatting and pastes it into our HTML editor (it's just a basic web browser control with designMode turned on); the resulting source includes <_o3a_p> tags, among others.
Am I going to have to write a stripper for every type of MSO HTML tag?

I have had good luck pasting Word content into LibreOffice, and then re-selecting and copying the text out of LibreOffice into a web form.
It keeps the formatting and links, and removes all the Microsoft formatting code.

As a user that sometimes copies data from Word to a web form (I sometimes like to spellcheck first), I've found great success by first pasting into Notepad, then copying from there and pasting into the web form.
However, Word still sometimes has the last laugh. If you have "smart quotes" enabled, it turns
This is the "best" way.
into
This is the “best” way.
(Note the quotes around the word "best").
The easy way to fix this is to turn off Smart Quotes before I begin to type; I can also use Notepad to find all of the "smart quote" symbols (“ ” ‘ ’) and replace them with "normal quote" symbols (" " ' ').
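The find-and-replace step described above can also be scripted. The following is a minimal sketch in Python, assuming only the four quote characters listed above need handling; the function name `straighten_quotes` is made up for illustration.

```python
# Map the four common "smart" quote characters to their plain ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as an apostrophe)
})


def straighten_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(SMART_QUOTES)


print(straighten_quotes("This is the \u201cbest\u201d way."))
# prints: This is the "best" way.
```

Other Word substitutions (dashes, ellipses) are left untouched by this table and would need their own entries.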

The consensus seems to be that while some of the available tools are somewhat successful at automatically parsing MS Word tags, none are 100% perfect. The methods for parsing those tags depend on what framework you are using.
A regular expression would probably be a clean fix.
Some more information about this topic can be found in this blog post, which documents the same struggle you seem to be having.
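As a rough illustration of the regular-expression approach, here is a minimal Python sketch. It assumes the pasted markup uses the usual Word conventions (namespaced tags such as `<o:p>`, `class="Mso..."` attributes, `mso-*` style declarations, and conditional comments); the escaped `<_o3a_p>` form from the question would need its own pattern. The function name `strip_word_markup` is hypothetical, and regexes are not a full HTML parser, so treat this as a first pass rather than a complete cleaner.

```python
import re

# Conditional comments Word emits, e.g. <!--[if gte mso 9]> ... <![endif]-->
CONDITIONAL_COMMENT = re.compile(r"<!--\[if.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE)
# Namespaced Office tags such as <o:p>, </o:p>, <w:WordDocument>, <st1:place>
OFFICE_TAG = re.compile(r"</?[a-z][a-z0-9]*:[^>]*>", re.IGNORECASE)
# class="MsoNormal" and similar attributes
MSO_CLASS = re.compile(r'\s+class="?Mso[^"\s>]*"?', re.IGNORECASE)
# mso-* declarations inside style attributes (may also hit body text that looks the same)
MSO_STYLE = re.compile(r"mso-[a-z\-]+\s*:[^;\"']*;?\s*", re.IGNORECASE)


def strip_word_markup(html: str) -> str:
    """Remove common Word/Outlook-specific markup, keeping ordinary HTML."""
    html = CONDITIONAL_COMMENT.sub("", html)
    html = OFFICE_TAG.sub("", html)
    html = MSO_CLASS.sub("", html)
    html = MSO_STYLE.sub("", html)
    return html


pasted = '<p class="MsoNormal" style="mso-margin-top-alt:auto;color:red">Hello<o:p></o:p></p>'
print(strip_word_markup(pasted))
# prints: <p style="color:red">Hello</p>
```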

Related

Maintaining font style/formatting into a form that doesn't support html/markdown

I have looked into the previous postings to do with this area but haven't found any relevant answers as perhaps I am asking the wrong question.
On the popular design site Dribbble, there seem to be interesting formatting changes in profile names that break from the conventions of the site's styling.
A lot of people have been adding special characters (ΔδΓ etc.), which can be done by pasting them into the profile form and saving changes, yet some users have somehow managed to enter formatted versions of their name, despite the profile form not supporting HTML or Markdown. You can see an example in the images below.
An example of copying the font to Google with maintained formatting
When opening in inspector, it also shows the formatted type
How could this be done in a simple text input form that doesn't support HTML/Markdown?
These are almost certainly Unicode characters, just like these characters that you reference in your question: ΔδΓ.
For example, Unicode's mathematical alphanumeric symbols section includes symbols that look like the ones in your screenshot. Since these are separate Unicode characters there is no need for additional formatting.
Users will need to have a font that supports those characters installed locally to view them.
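To make this concrete, here is a small Python sketch that maps plain ASCII letters and digits onto the "mathematical bold" range of that Unicode block. The function name `to_math_bold` is made up for illustration, and nothing here is specific to Dribbble; the result is ordinary text that can be pasted into any field that accepts Unicode.

```python
import unicodedata


def to_math_bold(text: str) -> str:
    """Replace ASCII letters and digits with Unicode mathematical bold characters."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x1D400 + ord(ch) - ord("A")))  # MATHEMATICAL BOLD CAPITAL A..Z
        elif "a" <= ch <= "z":
            out.append(chr(0x1D41A + ord(ch) - ord("a")))  # MATHEMATICAL BOLD SMALL A..Z
        elif "0" <= ch <= "9":
            out.append(chr(0x1D7CE + ord(ch) - ord("0")))  # MATHEMATICAL BOLD DIGIT ZERO..NINE
        else:
            out.append(ch)  # leave spaces, punctuation, and other characters alone
    return "".join(out)


styled = to_math_bold("Jane Doe")
print(styled)
print(unicodedata.name(styled[0]))
# second line prints: MATHEMATICAL BOLD CAPITAL J
```

Because each styled letter is its own code point, the "formatting" survives copy and paste, but such names may be harder to search for and harder for screen readers to handle.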

Issue with apostrophes in html and pound symbols

I have a problem: some email and web designs I receive have ’ instead of ' in the text. This creates problems with rendering in some email clients, and it's difficult to catch them all manually.
Is there any type of software or online script that converts these symbols (along with the £ sign) to HTML-compatible text? Would Notepad or anything similar work?
You'll need to convert your text to HTML character entities before putting it into your email HTML. This is a common issue when you import from MS Word, as it uses characters like curly quotes, ellipses, and dashes that need converting first.
There are a whole bunch of converters out there; here are three:
Email on Acid
Web2Generators
Charset
Here is an example of something written in MS Word:
“Hello?” he said to ‘it’. Wait – I’m not finished…
This converts to this:
&ldquo;Hello?&rdquo; he said to &lsquo;it&rsquo;. Wait &ndash; I&rsquo;m not finished&hellip;
You should use the converted version in your email, or you could be lazy and just replace all instances of curly quotes with straight ones in your code. The typography is not technically accurate, but most people will not mind.
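If you would rather script the conversion than use an online tool, the following Python sketch does roughly the same job using the standard library's `html.entities` table. The function name `to_html_entities` is hypothetical, and this is not how any of the converters listed above are implemented.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace non-ASCII characters with named (or numeric) HTML entities.

    ASCII characters, including & and <, are passed through unchanged,
    so this can be run over text that already contains HTML markup.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. &rsquo; or &pound;
        else:
            out.append(f"&#{cp};")  # numeric fallback when no named entity exists
    return "".join(out)


print(to_html_entities("I\u2019m not finished\u2026 it costs \u00a35"))
# prints: I&rsquo;m not finished&hellip; it costs &pound;5
```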

Replacing �'s with quotations and apostrophes

Recently I've done some careless copying and pasting into my html documents. Because the document type is set to Strict, the quotation marks and apostrophes show up as this crazy symbol: �.
Example: Brad says, �Don�t rock the boat baby.�
I considered changing the document type from Strict (which could turn out to be the easiest thing to do) but I'm not sure if changing from Strict would have any negative repercussions.
Naturally, I need to get rid of them. The problem is that I need to replace A LOT of them from a lot of different documents. I'd use the replace feature on Textpad, but it doesn't recognize � , so I can't change it. I've been reduced to going through all of the code and doing the tedious replacing.
Does anyone know of a good way to replace these things? It could be some other software, or anything else really.
I always use TextMate on the Mac to replace them because it does recognize those characters. Try Notepad++ and a bunch of different text editors and see what you come up with.
I remember dealing with this when I first started out and my secretary got reprimanded for such a crime.
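A different approach from the editor-based one above is to re-encode the files. This is only a sketch under a loud assumption: that the files on disk still hold the original Windows-1252 bytes from the paste (0x93, 0x94, 0x92 for the curly quotes and apostrophe) and the � only appears because they are being read as UTF-8. If the files were already saved with a literal U+FFFD in place of each quote, the original characters are gone and this will not recover them. The function name `cp1252_to_utf8` is made up for illustration.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Reinterpret Windows-1252 bytes as text and re-encode them as UTF-8.

    Assumes raw really is Windows-1252; bytes that are undefined in that
    code page (such as 0x81) will raise UnicodeDecodeError.
    """
    return raw.decode("cp1252").encode("utf-8")


raw = b"Brad says, \x93Don\x92t rock the boat baby.\x94"
fixed = cp1252_to_utf8(raw)
print(fixed.decode("utf-8"))
# prints: Brad says, “Don’t rock the boat baby.”
```

For a batch of documents, the same function could be applied to each file's bytes (read and written in binary mode), ideally on copies first.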

Author HTML for MS Word

My objective is to generate HTML markup that targets MS Word. So far my finding is that if all the styles are inline on each element, the document renders properly when opened in Word. However, it is a lengthy task.
<h1 style="font-family:Arial">Inventory</h1>
This is how I try to achieve formatting. If I want to maintain a constant font across the document, I have to add font-family to every element in my HTML, as I've done above.
Later, I came across a CodeProject article: http://www.codeproject.com/KB/office/Wordyna.aspx Now I am more or less convinced that you can declare the styles globally, but the styling language and formatting are not like CSS, and I think they are proprietary to MS Word document formatting. I am looking for any tutorials or articles on this styling.
PS: I am aware of OpenXML etc. I feel it's too complex for me to implement at this point.
Word --should-- open valid HTML (read: not Microsoft's proprietary HTML-ish mess) without fail, as it's the rendering engine for Outlook when you open an HTML email. You could go to the effort of building a document entirely with inline styles (read: best practice only for Microsoft), as we do for HTML emails, but I suspect there are several different ways to skin this cat.
Personally, if I were trying to get a rich-text formatted document from HTML to Word, I'd use a tool such as PHPDocX to build a proper Word document natively; then, if I really wanted Word HTML, I could simply save it as HTML from Word. I've had to do something similar with Excel, where it will accept CSV, but the outcome is always better with XLSX, and there's a similar plugin to easily author a proper XLSX document.
If that's too difficult a route (and it's not that bad, trust me), then I'd stick to formatting that follows HTML email rules. Simple guides are all over the web, such as here. And since Outlook 2007 and later use Word's HTML rendering engine, one could deduce that Word has the same limitations listed here.
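If you do stay with the all-inline route, the repetition can at least be generated rather than typed. The following Python sketch is one way to do that; the `BASE_STYLE` dictionary and the `element` helper are hypothetical names, and whether Word honors any particular CSS property is not something this code checks.

```python
from html import escape

# Hypothetical document-wide defaults, merged into every element's style attribute.
BASE_STYLE = {"font-family": "Arial"}


def element(tag: str, text: str, **extra: str) -> str:
    """Render <tag style="...">text</tag> with BASE_STYLE plus per-element extras.

    Keyword names use underscores in place of hyphens, e.g. font_size="11pt".
    """
    style = {**BASE_STYLE, **{k.replace("_", "-"): v for k, v in extra.items()}}
    css = ";".join(f"{name}:{value}" for name, value in style.items())
    return f'<{tag} style="{escape(css, quote=True)}">{escape(text)}</{tag}>'


print(element("h1", "Inventory"))
# prints: <h1 style="font-family:Arial">Inventory</h1>
print(element("p", "Qty on hand", font_size="11pt"))
# prints: <p style="font-family:Arial;font-size:11pt">Qty on hand</p>
```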

Can I reformat HTML in Visual Studio without removing blank lines?

The HTML formatting in Visual Studio works great -- especially considering you can pick a selection and just format that. You can just select a tag or block, right click and do 'Format Selection'. You can also reformat the whole document.
However I like to use a lot of whitespace in my documents to keep things organized and the reformat HTML compresses (deletes!) this whitespace.
Are there any plugins, or external tools for formatting HTML that might make it possible to leave vertical space untouched?
Edit: Bonus points: If anybody has 2010 installed can they check if it already has this feature? If it DOESN'T have this feature I'd like to submit a feature request. Fortunately the new editor is much more extensible, but I don't know if that extends to customization of something like this.
Visual Studio 2012 preserves empty line breaks when formatting HTML. I ran across this SO post because I was looking for an option to remove empty lines!
If the whitespace in your documents has some systematic logic to it (such as a line break before and after each table tag), then you might be able to get the kind of behaviour you want when applying formatting.
Check out the options dialog at:
Tools -> Options -> Text Editor -> HTML -> Format -> "Tag specific Options"
This pretty much allows you to customize the formatting of each type of tag to a reasonably minute level. In your particular case, the "Line breaks" option might be useful, or at least relevant.
You might also like to try out a custom HTML formatter such as HTML Tidy. Many powerful editors like Notepad++ and UltraEdit have a built-in HTML Tidy module for formatting. Personally, though, I find the formatting capabilities of Visual Studio sufficient for most requirements.
The short answer is no, 2008 will always reformat it the way it has in its settings. You will have to configure the settings in Tag Specific Options to match how you like your HTML to display.
It can do up to two spaces before and after a tag, but unfortunately there's no way to maintain whitespace or formatting for comments (it only recognises start and end tags that are the same).
If you have problems with it matching your coding style, my question about this might help out.