Allow whitespace and line-breaks in HTML input field?

In our internal CRM we have a simple HTML textarea where you can leave notes and messages. We later email this information, but since the email is HTML, the formatting comes out all wrong.
For example, say I have the following in my MySQL table:
This is a test message!
Some line
Some more lines
When we later email this, it comes out as:
This is a test message! Some line Some more lines
This is obviously not what we want, but I don't want to add a complicated WYSIWYG editor to our CRM. Can I preserve the line breaks? If so, how?
I don't want to use <pre></pre> tags because I believe they are not supported in all email clients (I could be wrong).

You could send the message with a text/plain Content-Type header if you don't intend to use any HTML tags in it (that would mean no colors, no links, and no text formatting).
You could also go with a quick-and-dirty solution: replace every \n in your text with <br>\n.
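A minimal Python sketch of that quick-and-dirty replacement. The question doesn't say which language builds the email, so Python and the function name are assumptions. It also escapes the note with html.escape() first, an extra step not in the answer, so that characters like < or & typed into the CRM can't break the email markup.

```python
import html


def text_to_html_lines(text: str) -> str:
    """Escape user text for HTML, then turn each newline into <br> plus a newline."""
    # Escape first so characters like < and & in the note can't break the markup.
    escaped = html.escape(text)
    # Keep the real newline after <br> so the HTML source stays readable
    # and no single source line grows very long.
    return escaped.replace("\n", "<br>\n")


note = "This is a test message!\nSome line\nSome more lines"
print(text_to_html_lines(note))
```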

The problem is that HTML renders any run of whitespace as a single space. If you look at the source of the email once it's received, I'll bet the newlines are still there (if they're not, then the problem is on the email generation side).
<pre></pre> is the simplest thing you can do, I think.

A basic solution would be to replace new lines with <br>s.
A smarter one would give special consideration to multiple line breaks (e.g. treating /\n\s*\n/ as a point to end a paragraph and start a new one (</p><p>)).
The specifics would depend on the language you are using to generate the email from the MySQL data. You might want to consider something like a Markdown parser.
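As one possible version of the "smarter" approach, here is a Python sketch (the language and the function name are assumptions). It splits on the /\n\s*\n/ pattern from the answer, wraps each block in <p>, and keeps single line breaks inside a paragraph as <br>. The line-ending normalisation and the escaping are additions, not part of the answer.

```python
import html
import re


def text_to_paragraphs(text: str) -> str:
    """Wrap blank-line-separated blocks in <p> and turn single newlines into <br>."""
    # Normalise Windows (\r\n) and old Mac (\r) line endings to \n first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A newline, optional whitespace, then another newline ends a paragraph.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Escape the text, then keep single line breaks inside the paragraph.
        body = html.escape(block).replace("\n", "<br>\n")
        paragraphs.append(f"<p>{body}</p>")
    return "\n".join(paragraphs)


print(text_to_paragraphs("Hello\nworld\n\n\nSecond para"))
```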

You can send emails in two flavours: HTML and plain text. In HTML, line breaks are not rendered (just like in your browser), and it looks like that is what you are doing here.
There are two solutions: either send the emails as plain text, or convert the line breaks to <br>.

Assuming PHP is in the mix, there's the nl2br() function. Otherwise, rolling your own won't be hard.
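Python has no built-in nl2br(), so here is a hypothetical port of the behaviour PHP documents for it: a <br /> (or <br>) is inserted before each \r\n, \n\r, \n or \r, and the newline itself is kept. Handling \r\n matters because browsers submit textarea contents with CRLF line endings. Like PHP's nl2br(), this does not escape HTML, so escape user text first.

```python
import re

# The newline sequences PHP's nl2br() documents: \r\n, \n\r, \n and \r.
# Two-character sequences come first so "\r\n" counts as one line break.
_NEWLINE = re.compile(r"\r\n|\n\r|\r|\n")


def nl2br(text: str, xhtml: bool = True) -> str:
    """Insert a <br> tag before every newline sequence, keeping the newline itself."""
    tag = "<br />" if xhtml else "<br>"
    return _NEWLINE.sub(lambda m: tag + m.group(0), text)


print(repr(nl2br("a\r\nb\nc")))
```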
The root of this issue is that browsers (and mail clients, which either use an embedded browser for rendering, Outlook for example, or behave like browsers) take any amount of whitespace, newlines, carriage returns, etc. outside of tags in HTML and render it as a single space. This is so you can do things like indent your markup and still have it look sane in the browser.
You will have to insert markup to control the rendering, as has been suggested: convert newlines to <br> or <p> tags and so on, much like CMS WYSIWYG editors do. Either that, or choose a different format for your emails.

Related

Can HTML escape output natively with built-in functions?

What I mean is: is there some functionality built into HTML that you can use to escape output? For example, some sort of tag that would tell the browser that everything inside it should not be treated as regular HTML, but as plain text.
I know there are things like Google AutoEscape and Microsoft AntiXSS, but these are not built into HTML.
And if there isn't, the obvious question is: why? Since XSS is a common, well-known type of attack that devs can easily miss, why isn't there functionality built into HTML to prevent it and make things easy on the devs?
There is <pre>, which displays the whitespace of its content literally:
The HTML <pre> element (or HTML Preformatted Text) represents preformatted text. Text within this element is typically displayed in a non-proportional ("monospace") font exactly as it is laid out in the file. Whitespace inside this element is displayed as typed.
However, it's not useful protection against XSS or other attacks. Markup inside <pre> is still parsed as HTML, so an injected <script> element runs anyway, and an attacker could also simply inject a closing </pre> and then go on doing whatever they want in the rest of the code, which will be interpreted as part of the document.
Security-wise, there's no simple alternative to escaping the data you output on the server side.
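To make the difference concrete, here is a short Python sketch using the standard html.escape() (the render_comment name is hypothetical). Once the data is escaped on the server, an injected </pre> or <script> is displayed as text instead of being parsed.

```python
import html


def render_comment(comment: str) -> str:
    """Escape untrusted text on the server before placing it inside <pre>."""
    # html.escape() converts &, <, > and, by default, both quote characters.
    return "<pre>" + html.escape(comment) + "</pre>"


print(render_comment("</pre><script>x</script>"))
```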

Emails sent using PHP have intermittent missing sections

I have a test program that logs the user's answers and grade to a log file. It also emails them, in a nicely formatted HTML email, to the administrator of the tests.
For the most part, this system works. But strangely, I've noticed that different email clients are removing portions of the code. It's a table, so a lot of the code is very repetitive, and the sections that get removed are the same every time for each email client (Outlook and Gmail are the ones I've tested). So, for example, if I have a section of the table:
<tr><td style="background:#a66;text-align:center">This is the answer</td></tr>
Then it may come out as:
<tr><td style="backgrouter">This is the answer</td></tr>
And I can't find any pattern in where this happens in each file. Sometimes it's near the end, sometimes near the beginning. In some cases, if the test was particularly long, the email isn't even complete.
My PHP writes the exact same HTML to a log file on the server, and that always comes out perfectly.
What's going on? How do I fix it?
I think this is because of the 998-characters-per-line limit for email messages.
For more details, see these posts:
Reasoning behind 76 being the line length limit for MIME sections, as defined by RFC 2045? (see the answer by appleleaf)
HTML safe wrapping of long lines
My solution is to add "\r\n" between HTML tags so that no line exceeds 998 characters. This works for me.
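A minimal sketch of that approach, in Python rather than the asker's PHP (the helper names are hypothetical). It only inserts a CRLF where one tag directly follows another, which is harmless inside table markup like the rows in the question. Between inline elements such as </b><i> it can add a visible space, and a long run of text with no tags in it is still not broken up. Many mail libraries can instead apply quoted-printable or base64 transfer encoding, which keeps every line short without touching the HTML.

```python
MAX_LINE = 998  # hard per-line limit for email, not counting the CRLF


def break_between_tags(markup: str) -> str:
    """Insert a CRLF wherever one tag directly follows another ("><")."""
    return markup.replace("><", ">\r\n<")


def longest_line(text: str) -> int:
    """Length of the longest CRLF-separated line."""
    return max((len(line) for line in text.split("\r\n")), default=0)


row = '<tr><td style="background:#a66;text-align:center">This is the answer</td></tr>'
table = "<table>" + row * 200 + "</table>"
fixed = break_between_tags(table)
print(longest_line(table), longest_line(fixed))
```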
The one thing I can think to try is to append !important to each CSS declaration:
<td style="background:#a66 !important;text-align:center !important">
Oh, and I just noticed you are missing a closing ;.
That is weird!
OK, firstly, have you tried spacing and terminating all the style declarations? E.g.:
<tr><td style="background-color: #aa6666; text-align: center;">
Secondly, it may be some strange HTML interpretation that Gmail is picking up on, although I can't think of a reason for this to happen (e.g. style names or reserved function names).
Otherwise, I'm stumped. I've only ever seen this happen with Yahoo Mail, where the HTML in an email broke the Yahoo Mail layout...
I'd be tempted to use css classes and style them inside a style tag. I've never seen them break.
I am afraid your problem has a deeper cause.
Those clients process messages in obscure ways, and repeated mails sent from the same email address can end up rendered as if they were quoted from some other mail.
I would advise you to check the HTML consistency of the mails and to read up on which HTML is supported in emails.
Also make sure that your email headers declare it as an HTML email and not plain text. The charset declared in the header is also important; I would recommend UTF-8.
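As an illustration of setting those headers, here is a sketch using Python's standard email package (the asker uses PHP, so treat it as an analogue; the subject and addresses are placeholders). set_content() with subtype="html" produces a text/html Content-Type with a charset parameter. The library also picks a Content-Transfer-Encoding on its own: when the body has lines too long for 7bit, it switches to quoted-printable or base64, which sidesteps the line-length problem raised in another answer.

```python
from email.message import EmailMessage


def build_report_email(html_body: str) -> EmailMessage:
    """Build a single-part HTML email (placeholder subject and addresses)."""
    msg = EmailMessage()
    msg["Subject"] = "Test results"
    msg["From"] = "tests@example.com"
    msg["To"] = "admin@example.com"
    # subtype="html" sets Content-Type: text/html; charset="utf-8".
    # With no cte given, the library chooses the Content-Transfer-Encoding;
    # lines longer than the policy's limit make it pick quoted-printable or base64.
    msg.set_content(html_body, subtype="html", charset="utf-8")
    return msg


# One very long line of table markup, similar to the report in the question.
body = "<table>" + "<tr><td>This is the answer</td></tr>" * 100 + "</table>"
msg = build_report_email(body)
print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
```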

Text style affecting the whole site

I've got an input where the user can type either HTML or plain text. When users copy and paste text from MS Word, for example, it generates weird HTML. Then, when you view that topic, you can see the whole page's style is affected. I don't really know if the generated HTML has unclosed tags or something, but it looks like it does, and thus the style of the page is affected.
Does anybody know how to "isolate" the HTML of that div (or whatever the container is) from the whole page's style?
Short of showing the content in an IFRAME, you can't really do that. What I usually do in this situation is apply tag stripping logic to the content as it comes in. You really don't want to allow arbitrary HTML from a security perspective, but even if you don't care what your users input, you should be stripping out invalid HTML tags (Word has a habit of creating tags with weird namespace-looking things like o:p) and running something like Tidy over the result to ensure every tag is properly closed. There are a number of Tidy libraries for .NET out there; here's one.
Here's a quick cut-and-paste of how I've done this in the past. Note that the class implements an interface from the project I used it in, but you get the general idea.
Copying text from Word can include <style> tags. The only sure way to isolate these styles is to put the input control in an <iframe>.
You can either sanitize the input or display it in an IFrame.
If it were me, I'd strip all but basic formatting (e.g., bold, italics) and use Tidy. That's what I ended up doing: I strip all of Word's CSS styles and convert them into <strong>, <em>, etc.
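The answers above use Tidy on .NET. As a rough stand-in, here is a Python sketch of the "strip all but basic formatting" idea built on the standard library's html.parser. The allowlist, the renaming of b to strong and i to em, and the class and function names are illustrative choices, not something from the answers. It drops every attribute (so Word's class and style attributes go), discards <style> and <script> contents and Word tags such as o:p, and closes any tags left open. Treat it as a sketch, not a vetted security sanitizer.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED = {"p", "br", "b", "i", "strong", "em", "ul", "ol", "li"}
DROP_CONTENT = {"style", "script"}  # drop these elements *and* their contents
RENAME = {"b": "strong", "i": "em"}  # illustrative presentational-to-semantic mapping


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <style> or <script>
        self.open_tags = []  # allowed tags emitted but not yet closed

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # drop the tag, e.g. <span> or <o:p>; all attributes are ignored
        tag = RENAME.get(tag, tag)
        if tag == "br":
            self.out.append("<br>")
            return
        self.out.append(f"<{tag}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        tag = RENAME.get(tag, tag)
        if self.skip_depth or tag not in self.open_tags:
            return  # ignore stray or disallowed closing tags
        # Also close anything opened after this tag, so nesting stays balanced.
        while self.open_tags:
            last = self.open_tags.pop()
            self.out.append(f"</{last}>")
            if last == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))

    def result(self):
        # Close whatever the input left open (the unclosed-tag problem in the question).
        return "".join(self.out) + "".join(f"</{t}>" for t in reversed(self.open_tags))


def sanitize(fragment: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()
    return parser.result()


word_html = ('<style>p{color:red}</style><p class="MsoNormal"><b>Hello</b>'
             '<o:p></o:p> <span style="font-size:40pt">world')
print(sanitize(word_html))
```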

Displaying paragraphs in HTML

I'm writing a web application which needs to display stored paragraphs on the front end. The text comes from an Excel worksheet and contains control characters like indentation. I want to show the text exactly as it was in Excel. How can I do that? Thanks in advance.
Without seeing your text, my initial suggestion would be to wrap it in preformatted tags. Note that this won't preserve formatting like italics, underlines, etc., merely whitespace:
<pre>
Anything within these tags will
maintain its original formatting. Spaces, new lines, and all.
</pre>
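If the Excel text arrives with tab characters for its indentation (an assumption, since the question only says "control characters like indent"), a small Python helper can expand the tabs to spaces and escape the text before wrapping it in <pre>. Browsers render a tab in <pre> at a default width of 8 columns, which may not match what you saw in Excel. The function name and the tab width of 4 are hypothetical.

```python
import html


def excel_text_to_pre(text: str, tabsize: int = 4) -> str:
    """Expand tabs, escape the text, and wrap it in <pre>."""
    # Expanding tabs to spaces makes the indentation look the same in every browser.
    expanded = text.expandtabs(tabsize)
    # Escape so characters like < or & in a cell are shown, not parsed.
    return "<pre>" + html.escape(expanded) + "</pre>"


print(excel_text_to_pre("Total:\t<5 items\n\tindented"))
```

If you would rather not use <pre>, the CSS property white-space: pre-wrap gives the same whitespace handling on any element while still letting long lines wrap.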
It would be easier for you to save your Excel file as a web page in a new file and use the HTML of that file in your application.
You might want to check out RTF-to-HTML conversion.

Garbled html in email

Is there some sort of formatting protocol for HTML email? We have an automated system that sends reports via email, and when I look at the source I see the lines broken at a fixed length, with an "=" at the end of each line. That is, I get something like:
<html><body>some text some text some text some=
some text some text some text some text som<ta=
ble>some text some text some text some text <t=
r><td...
Does anyone have any more information on what this is?
You have to send the message as multi-part MIME. Best practices are:
Always send both an HTML and a plain-text version this way, for mail clients that don't support HTML and for people who turn off HTML in emails (there are security/spam issues with images, although a lot of clients won't auto-download images from untrusted sites anyway);
Images can be included in the message instead of as straight links. Straight links save on bandwidth but are a spam or even security issue (e.g. Internet Explorer had a buffer overrun bug with PNG images). Embedded images are referenced with cid values; and
Only use the most basic HTML. Browser HTML support varies from the primitive to the bizarre. When I looked into doing this we just couldn't get a consistent (or even acceptably different) look and feel on the handful of mail clients we investigated leading us to send our reports as attached PDFs, which are, in a lot of ways, preferable (they can easily be saved, for one).
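For the first point, here is a sketch with Python's standard email package (placeholder subject, addresses and bodies). RFC 2046 orders the parts of multipart/alternative from least to most faithful, and a client displays the last one it can render, so the plain-text version goes first and the HTML version last.

```python
from email.message import EmailMessage


def build_multipart_report(text_body: str, html_body: str) -> EmailMessage:
    """Build a multipart/alternative email with a plain-text and an HTML part."""
    msg = EmailMessage()
    msg["Subject"] = "Daily report"  # placeholder header values
    msg["From"] = "reports@example.com"
    msg["To"] = "team@example.com"
    # The plain-text version goes first...
    msg.set_content(text_body)
    # ...and add_alternative() converts the message to multipart/alternative
    # and appends the HTML version after it.
    msg.add_alternative(html_body, subtype="html")
    return msg


report = build_multipart_report("Report body", "<p>Report <b>body</b></p>")
print(report.as_string())
```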
As to your garbled message, it looks to me like your message isn't being correctly identified as HTML, so the mail client is wrapping lines of text at 70 or so characters.
Your message is being translated somehow to "quoted printable" encoding. This is probably an issue with the mail headers you're creating.
Looks like it could be quoted-printable.
How do equals signs in the HTML look? Are they replaced by =3D?
Technically there's nothing wrong with this, but it would be nice to include an alternative plain-text version for people who can't or don't want to read HTML mail (like me).
The email RFC enforces line-length limits: each line must be no longer than 998 characters and should be no longer than 78 characters, excluding the CRLF. The equals sign at the end of each line is a quoted-printable soft line break, which will be correctly decoded by any email reader that supports HTML, as long as the necessary headers are in place (Content-Type: text/html and Content-Transfer-Encoding: quoted-printable). More details on HTML email conventions can be found here.
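To see what those "=" characters are, Python's standard quopri module can encode and decode quoted-printable text. The HTML below is a made-up sample. Encoding it produces the same kind of "=" line endings as in the question, and decoding restores the original markup.

```python
import quopri

# A made-up HTML report on a single long line, with "=" inside attributes.
html_source = (
    "<html><body>" + "some text " * 12
    + '<table style="width:100%"><tr><td class="x">cell</td></tr></table>'
    + "</body></html>"
)

# Quoted-printable keeps lines at most 76 characters long: a trailing "=" marks
# a soft line break, and a literal "=" is written as "=3D".
encoded = quopri.encodestring(html_source.encode("ascii"))
print(encoded.decode("ascii"))

# A client that honours Content-Transfer-Encoding: quoted-printable undoes both
# steps and gets the original markup back.
decoded = quopri.decodestring(encoded).decode("ascii")
```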