I've noticed that when you forward an HTML email from Gmail (not sure about other providers), the HTML structure changes in the process. The forwarded HTML loses all the ids declared in the original, and some other 'cleaning' happens to the HTML too.
Can anybody explain why this happens and whether it's possible to avoid it? Or does it depend entirely on the SMTP provider?
I have an app that monitors emails in a specific inbox and tries to parse them, but as I said, when a user forwards an email to this inbox (from Gmail), the email's HTML structure gets cleaned and my code can no longer parse it because a lot of the ids are gone.
Due to this, I have to find a new way to parse what I require from the email, like using regular expressions on the plaintext section of the MIME message.
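The plaintext fallback mentioned above could look roughly like the following. This is only a sketch using Python's standard email package; the sample message and the "Order number" pattern are made up for illustration and would need to be replaced with whatever your emails actually contain.

```python
import re
from email import message_from_string
from email.policy import default

# Hypothetical forwarded message with both a plain-text and an HTML part.
RAW = """\
From: user@example.com
To: inbox@example.com
Subject: Fwd: Order
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset="utf-8"

Order number: 12345
--b1
Content-Type: text/html; charset="utf-8"

<div>Order number: <b>12345</b></div>
--b1--
"""


def extract_order_number(raw_message):
    """Return the order number found in the text/plain part, or None."""
    msg = message_from_string(raw_message, policy=default)
    # Ask only for the plain-text body; the HTML part is ignored entirely.
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return None
    # The pattern is an assumption about the message layout.
    match = re.search(r"Order number:\s*(\d+)", part.get_content())
    return match.group(1) if match else None


order_number = extract_order_number(RAW)
```

Because the plain-text part carries no markup, it is not affected by id or class stripping, although its wording can still change when a client adds forwarding headers or quote markers.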
I've searched for information on this and couldn't find anything.
Gmail's pre-processor strips the head tag, ids, and classes. That means that when you forward or reply, as far as Gmail is concerned these items never existed, so they are not included in the reply.
As Gmail removes the head tag, ids, classes and more, the best approach is to use inline CSS styles.
Tip: An inline style loses many of the advantages of a style sheet (by mixing content with presentation). Use this method sparingly.
The documentation for the <base> tag says it must be located inside the <head> and that only one <base> tag is permitted.
However, this tag successfully replaces the base URL for relative links even when it is placed somewhere inside the <body>.
The behavior was noticed in a support ticket system with many relative links. The system renders emails, and if an email's HTML code contains the <base> tag, after the email is rendered, all relative links change to the base URL specified in the tag.
The behavior is confirmed in Firefox, Chromium-based browsers, and Edge. IE11 ignores it. You can check a simple sample here.
Is it possible to protect against such behavior without changing the website's HTML markup?
Don't blindly insert an email into an HTML document. Treat it like any other potential source for an XSS attack.
If you are going to allow HTML, then run it through a DOM aware white-list based filter (e.g. HTML Purifier if you are using PHP).
<base> should not be allowed on the white-list.
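To illustrate the idea in Python rather than PHP, here is a minimal whitelist filter built on the standard html.parser module. The allowed tags, attributes and URL schemes are assumptions chosen for the example, and this is a sketch, not a hardened sanitizer; for real use, prefer a maintained library such as HTML Purifier.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumed whitelist for the example; note that <base> is deliberately absent.
ALLOWED_TAGS = {"a", "b", "br", "div", "i", "p", "span"}
ALLOWED_ATTRS = {"a": {"href"}}
ALLOWED_SCHEMES = {"", "http", "https", "mailto"}  # "" means a relative URL
VOID_TAGS = {"br"}


class WhitelistFilter(HTMLParser):
    """Re-emit only whitelisted tags and attributes; text is re-escaped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drops <base>, <script>, <style>, ...
        parts = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            # Simple scheme check; not exhaustive against obfuscated URLs.
            if urlsplit(value.strip()).scheme.lower() not in ALLOWED_SCHEMES:
                continue
            parts.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(parts)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def filter_html(source):
    parser = WhitelistFilter()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


cleaned = filter_html(
    '<base href="http://evil.example/">'
    '<p>Hi <a href="/ticket/1" onclick="x()">link</a></p>'
)
```

In this sketch the text inside a dropped element (for example the body of a script tag) still comes through as escaped text, which is harmless but ugly; a real filter would discard it.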
As @Quentin suggested, the best way to protect your HTML from unwanted HTML is to simply get rid of it before displaying it. Unfortunately, sometimes this will break things! In this particular example, if the user sent an email containing a <base> and relative links, removing the tag would break all the links.
To circumvent this, one could use an iframe. They are a very useful tool for sandboxing foreign code. This should not be used blindly though...
To answer the OP: if you have no control over the application used to read the emails, your only hope is to tinker with the emails themselves. You could create a hook on the inbox to strip out any unwanted HTML (using @Quentin's suggestion) before putting the message into the monitored inbox.
I'm the author of an email client, and one of the things I'm in the process of doing is adding support for HTML editing. The editor itself is built upon an HTML rendering control that I've written from scratch; however, it supports most HTML and CSS fairly well. The issue I'm having is formatting the reply to an HTML email with the user's reply template, which is also HTML (with a different style sheet). In other words, cleanly merging two HTML documents with their own styles without either of them being messed up by the other document's styles.
When the user replies to an HTML email, I parse out the content of the <body> tag and put it into a <div> that forms part of the reply. That div's style shows a line down the left margin to inform the user that it's quoting the original email. Gmail does the same thing. The styles from the HTML block are saved separately and then inserted into the head part of the new document.
What happens, of course, is that if the original email defined a style for, say, a link, that style affects all the links outside the original quoted area. So things like my signature at the bottom and the From/To header rendering that is part of the reply template all get that styling from the source HTML.
I'm wondering if there is an easy, straightforward solution for containing all the original styles to just the quoted part of the document. Something like namespacing? Or limited-scope styling?
The solution I've come to is to add all the incoming styles from the multiple documents to a global style sheet. Styles are matched by first checking that the count of properties is the same, then enumerating each property and comparing its value. This basically gives the software a minimum number of styles to correctly render the content. It could be really slow in a pathological case, but so far it's working well in practice.
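For illustration only, the matching step described above might be sketched like this in Python. The representation of a style as a dictionary of property names to values, and the function names, are assumptions made for the example and not how the email client actually stores styles.

```python
def same_style(a, b):
    """Two styles match if they have the same number of properties
    and every property has the same value in both."""
    if len(a) != len(b):
        return False  # cheap check first: property counts differ
    return all(name in b and b[name] == value for name, value in a.items())


def add_style(global_sheet, style):
    """Return the index of a matching style in the global sheet,
    appending the style first if no match exists."""
    # Linear scan: this is the part that could get slow in a pathological case.
    for index, existing in enumerate(global_sheet):
        if same_style(existing, style):
            return index
    global_sheet.append(dict(style))
    return len(global_sheet) - 1


sheet = []
first = add_style(sheet, {"color": "red", "font-weight": "bold"})
second = add_style(sheet, {"font-weight": "bold", "color": "red"})  # same style, different order
third = add_style(sheet, {"color": "blue"})
```

Each element then refers to a style by its index in the global sheet, so identical styles coming from different documents are stored only once.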
As an aside, recently I've noticed a lot of email clients stripping all the styles out of replies, which to be honest seems like the cheap and nasty solution to the issue, even if it does give a consistent look.
I have a quick question. I have an HTML signature for my emails, and I need to implement it in my Exchange server's transport rules.
Unfortunately the limit is 4096 characters and my HTML signature is 8950 characters.
I would like to know if there is HTML code that will access a public URL and get the necessary HTML content from it, so that I can circumvent the limitation.
Thanks for your help
It should be normal HTML stuff, I guess; Microsoft Office 365 is what I am using.
I have my company logo coming from the server. Does this mean Gmail users will not be able to see my logo? I am bringing it in via a URL.
Every time I try to place the HTML code it shows the actual signature. Any idea how I can change that?
regards
It's difficult to code HTML emails, as different email clients handle HTML in different ways.
You could
1. Reduce the spacing used in HTML tags and inline styling.
2. Modify the markup so that it uses fewer tags.
Best would be to reduce the signature content and remove unnecessary parts.
If you can't do that, you could use an image of your signature and link it with an image tag. [NOT RECOMMENDED] This won't work in Gmail, since it blocks image loading by default.
Maybe posting your signature might help?
I've seen all these emails that include pictures, divs, paragraphs and what-not inside an email.
How do I actually go about doing that?
Can anyone give me a rough explanation on how these things work? I am pasting my HTML code inside my email and it only shows text. Is there anything I should enable/disable?
(I know I will need a mailing list, but that's probably a different topic.)
The body of the message should be formatted as Multipart MIME (with the email header stating that it is formatted that way) with at least two parts: A text/html part and a text/plain part (for email clients that don't support HTML and to reduce the number of This Is A Spammer flags given the email).
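As a rough example of what that structure looks like in practice, Python's standard email package can build such a message. The addresses, subject and body text below are placeholders, and sending the result (for example with smtplib) is left out.

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, text_body, html_body):
    """Build a multipart/alternative message with plain-text and HTML parts."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # The text/plain part comes first, for clients that do not render HTML.
    msg.set_content(text_body)
    # Adding an alternative converts the message to multipart/alternative
    # and appends the text/html part after the plain-text one.
    msg.add_alternative(html_body, subtype="html")
    return msg


message = build_message(
    "me@example.com",
    "you@example.com",
    "Newsletter",
    "Hello!\nThis is the plain-text version.",
    "<html><body><h1>Hello!</h1><p>This is the <b>HTML</b> version.</p></body></html>",
)
# print(message.as_string()) shows the headers, boundaries and both parts.
```

The Content-Type header of the resulting message states that it is multipart/alternative, which is what tells the receiving client to pick the richest part it can display.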
Most graphical email clients will only allow HTML to be entered using their WYSIWYG tools. Custom HTML requires specialist software.
I'm writing a webmail product and some emails have body css that changes the background ... so when I Html.Decode() that emailbody, it's altering the CSS of the entire page.
Is there a good way to contain that problem?
You can make your CSS more specific than the email's rules. For example:
body.body is more specific than .body or body
Any styles in body.body that clash with those in the less specific selectors above will override them. But to stop the styles merging together, you'll need to define every single style.
Alternatively you can go with rewriting the CSS in the emails, which is the way most webmail/desktop email clients go these days, one way or the other. If you prefix all the rules with #emailMessage, for example, and place the email inside a <div id="emailMessage"></div> tag, all the styles in the email will only apply inside that namespace.
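A very rough sketch of that prefixing step, written in Python for illustration: the scope_css function and its regular expression are assumptions made for this example, it only handles flat rules, and mapping a bare body or html selector onto the wrapper itself is a design choice of the sketch. Nested rules such as @media blocks and CSS comments are not handled correctly, so a real implementation would want a proper CSS parser.

```python
import re

# Matches a flat "selectors { declarations }" rule; nested blocks are not supported.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def scope_css(css, scope="#emailMessage"):
    """Prefix every selector with `scope` so rules apply only inside the wrapper."""

    def rewrite(match):
        selectors = [s.strip() for s in match.group(1).split(",") if s.strip()]
        scoped = []
        for selector in selectors:
            if selector in ("body", "html"):
                # Assumption: rules for the email's body should style the wrapper.
                scoped.append(scope)
            else:
                scoped.append(f"{scope} {selector}")
        return f"{', '.join(scoped)} {{{match.group(2).strip()}}}"

    return "\n".join(rewrite(m) for m in RULE.finditer(css))


email_css = "a { color: red; }\nbody, .body { background: black; }"
scoped_css = scope_css(email_css)
```

With this input, the link color and the background end up attached to #emailMessage and its descendants, so the surrounding webmail page keeps its own styles.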
Using an iframe to display emails only introduces more problems, around accessibility and so on. Good luck.
The answer to your question is probably "iframe", but in your specific situation, writing a webmail client is going to introduce you to a wonderful new hell called "stripping css from possibly extremely invalid html generated by a large variety of clients that all have their own ideas about what kind of html should be allowed in an email".
Good luck!
A common way is to use an iframe, although I'm not sure this is applicable to your problem.
Basically it loads a different html page inside another page. Which makes it independent, but it does mean you have 2 pages to display one email.
http://www.w3schools.com/TAGS/tag_iframe.asp