Displaying paragraphs in HTML - html

I'm writing a web application which needs to bring the stored paragraphs into the front web. The text come from excel work sheet and contains control characters like indent. I want to show the text in the exactly manner as it was in excel. How can I do that then? Thanks in advance.

Without seeing your text to begin with, my initial suggestion would be to wrap it in pre-formatted tags. Note that this won't work for formatting like italics, underlines, etc. Merely white-space:
<pre>
Anything within these tags will
maintain its original formatting. Spaces, new lines, and all.
</pre>

It would be easier for you to save your excel file as web page into new file and use the html of this new file in your application.

You might want to check out RTF-to-HTML conversion.

Related

MS Word HTML - Is it possible to assign multiple classes to an element?

I'm working on a document printout from MS PowerApps. Best method I have found thus far is to write it in HTML and but save as a .doc file so that it opens in word online. From there, the user can save as PDF. So far, this works surprisingly well and allows for a great deal of control over the output, but one limitation I have found is that word does not seem to recognize multiple classes on a single element. This is kind of a pain as I am using a lot of tables, so I have to either create a new class for every single cell cell format I need or use inline CSS instead. Not huge issue, but it makes for messy code and time consuming updates. Is there any way to achieve this?
Edit:
File here: https://wetransfer.com/downloads/29323f5c8060a374ed23e8ff2b6e9fd320210116015928/c991f4
It's designed to open in word online but it works in desktop as long as the view mode is set to print layout and not web layout.
Edit2: I should note that I did not figure out the headers all by myself, but worked off of some code provided by Georgi Nikolov found here
You can't write HTML or CSS Code into MS Word.
MS Word is Rich Text Editor
Rich Text: Rich Text Format (RTF) is a file format that allows the exchange of text files between different editors and it has its formatting so we can't use it to write HTML.
HTML Must be written in Plain Text Editor because Plain text contains no formatting, only line breaks and spacing. Therefore no text formatting (such as font sizes and colors, bolding or italics) can be used.
some examples for Plain TextEditors that you can use to write HTML and CSS are Notepad and Notepad++

span lang="en-gb" gets generated after copying text

I copy a text from a source in a platform. It is a private platform that has a box where you can type text. There is a button where you can see the HTML source code afterwards. I copied numerous texts with no problem. When I am trying to copy-paste the above, I noticed that in the HTML code a specific tag gets produced.
<p><strong><em><span lang="en-gb">Week of the 5th of September</span></em></strong></p>
So, my question is, how is that possible. Does a text after copying it generates specific tags? So, in the copying process, some things get copied apart from the text we can see... Also, this could be happening because the source text (that is about to be copied) contains characters that are not supported from the unicode set up in the platform (web application)?
I am really curious to understand what is happening.
Based on the fact you said it had a button where you can view source, this sounds like a WYSIWIG (What you see is what you get) editor like CKeditor, TinyMCE, Froala, etc. They take standard HTML textarea elements and using Javascript and CSS convert them into more robust editors. They allow you to do simple text formatting in the textarea, upload images, view source, etc.
They are used a lot in blogs and for content editing for people that don't write code but want to be able to manage and maintain content in web sites. For instance if you type a "paragraph" of text in one of these it will automatically wrap it with the appropriate <p> tags using Javascript.
In your case you're adding content in this box, and it's simply applying the formatting to it with Javascript. It will do the same if you just type in the box, vs. copy/paste.
Here are some links to WYSIWIG editors so you can learn more about how they function:
http://ckeditor.com/
https://www.tinymce.com/
https://www.froala.com/wysiwyg-editor
Fun Fact: The editor you used when you typed your question on Stack Overflow uses one of these. https://meta.stackexchange.com/questions/121981/stackoverflow-official-wmd-editor
It`s not much information, so I‘ll take a guess:
For <strong><em>: The website could eventually use a div with the contenteditable="true" attribute (more info on mdn) as the input method. When you then paste in text from another application that already has markup like bold or italic, it‘s converted to html tags.
The <span lang="en-gb"> could come from the browser, another application or the website through analyzing the text and adding this.

Allow whitespace and line-breaks in HTML input field?

In our internal CRM we have a simple html input textarea where you can leave notes and messages. We later use this information to email this, only since that email is in HTML the formating is all wrong.
So if for example I have the following in my MYSQL table:
This is a test message!
Some line
Some more lines
If we later email this it comes out as:
This is a test message! Some line Some more lines
This is obviously not wanted but I don't want to add some complicated WYSIWYG editor to our CRM. Can I allow line-breaks? If so, how?
I don't want to use <pre></pre> tags because I believe it is not supported in all email clients (I could be wrong).
You could use text/plain header, if you don't intend on using any HTML tags in the message. (That would mean no colors, no links, and no text formatting).
You could also make a quick and dirty solution to replace all \ns in your text to <br>\n.
The problem is that html renders all whitespace as single spaces. If you look at the source of the email once it's received, I'll bet the newlines will be there (if they're not, then the problem is on the email generation side).
<pre></pre> is the simplest thing you can do, I think.
A basic solution would be to replace new lines with <br>s.
A smarter one would give special consideration to multiple line breaks (e.g. treating /\n\s*\n/ as a point to end a paragraph and start a new one (</p><p>)).
The specifics would depend on the language you are using to generate the email from the MySQL data. You might want to consider something like a Markdown parser.
You can send emails in two flavours: html and plain text. In Html, line-breaks are not processed (just like in your browser). Looks like this is what you are doing here.
Two solutions: either you send emails in plain text, or you change line-breaks to <br>.
Assuming PHP is in the mix, there's the nl2br() function. Otherwise, rolling your own won't be hard.
The root of this issue is that browsers (mail clients can either use embedded browsers for rendering - Outlook for example - or behave like browsers) will take any amount of whitespace/new line/carriage returns/etc outside of tags in HTML and render them as a single whitespace. This is so you can do things like indent your markup and still have it look sane in the browser.
You will have to insert markup in order to control the rendering as has been has suggested: convert newlines to <br> or <p> tags and so on, much like cms WYSIWYG editors do. Either that or chose a different format for your emails.

Translate HTML files to another language

I have a website with Dutch text which I want to translate to English. Is there a fast way of doing this with keeping the HTML tags(<strong>,<span>) in tact. I know I can just copy the parsed TEXT into a translator but this will remove the formatting.
I also know that at the end I have to go trough the text manually to fix some minor spelling and grammar.
Online translators are good to turn foreign text into something that can be understood, but they are useless for producing quality translations. Even if you fix obvious problems at the end, you will get an amateurish word-by-word translation. If you want your visitors to take you seriously, you should translate from scratch.
If you want to preserve the HTML formatting at the same time as translating, you will have to work directly with the HTML source and update the text yourself without touching the formatting.
You may be able to use an XML editor like XmlSpy that will let you edit text nodes directly without touching the tagging, but this requires that the HTML is actually XHTML. You may still need to translate some attributes (such as title and alt attributes).
Is a virtual traslate a good option for you? Because if you paste google translato script into your page source, it will translate your text on the site, and the formating will stay there too. http://translate.google.com/translate_tools

Text style affecting the whole site

I've got an input so the user can type either html or plain text. When the user copy & paste text from MS Word, for example, it generates a weird html. Then, when you view that topic, you can see the whole page's style is affected. I don't really know if the generated html has unclosed tags or something, but it looks like it does and thus, the style of the page is affected.
Does anybody know how to "isolate" the html of that div(or whatever the container be) from the whole page's style?
Short of showing the content in an IFRAME, you can't really do that. What I usually do in this situation is apply tag stripping logic to the content as it comes in. You really don't want to allow arbitrary HTML from a security perspective, but even if you don't care what your users input, you should be stripping out invalid HTML tags (Word has a habit of creating tags with weird namespace-looking things like o:p) and running something like Tidy over the result to ensure every tag is properly closed. There are a number of Tidy libraries for .NET out there; here's one.
Here's a quick cut-and-paste of how I've done this in the past. Note that the class implements an interface from the project I used it in, but you get the general idea.
Copying text from word can include <style> tags. The only sure way to isolate these styles is to put the input control in an <iframe>
You can either sanitize the input or display it in an IFrame.
It it were me I'd strip all but basic formatting (e.g., bold, italics) and use Tidy. That's what I end up doing, I strip and convert all the CSS styles of word into <strong>, <em>, etc.