MVC 4 - Displaying HTML vs. Straight Text - html

I have an MVC 4 view where the user can enter straight text or HTML into a text area control. When this text is displayed, I use @Html.Raw() to display it. If the user entered HTML, everything displays based on the HTML. If he/she didn't, all the line breaks are ignored and the text just runs together.
So, what I would like to do is to somehow test to see if the user entered HTML or straight text. If straight text, when displaying the text, I'd like to replace all the line break characters with an HTML break tag to maintain the formatting.
Is there a somewhat reliable way to detect if the text contains HTML?
Is there a better/easier way to do what I'm trying to do?

Is there a somewhat reliable way to detect if the text contains HTML?
Not really. The problem becomes hardest when someone writes a plain-text entry that discusses HTML.
Is there a better/easier way to do what I'm trying to do?
I quite like Stack Overflow's approach. Just use Markdown and provide clear instructions on how to use it beside the editing window.
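For the plain-text case the question describes (replacing line breaks with break tags), here is a minimal sketch. It is in Python rather than C#/Razor, purely as an illustration, and the function name plain_text_to_html is made up for this example. It assumes you have already decided the input is plain text, so it also escapes any angle brackets instead of letting them through as markup:

```python
import html


def plain_text_to_html(text: str) -> str:
    """Escape plain text and turn each line break into a <br> tag."""
    # Escape &, <, > and quotes so text that merely discusses HTML is shown literally.
    escaped = html.escape(text)
    # Normalise Windows (\r\n) and old Mac (\r) line endings to \n first.
    normalised = escaped.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.replace("\n", "<br>\n")


print(plain_text_to_html("Use <b> for bold.\nSecond line"))
```

The result is safe to output raw because every character with a special meaning in HTML has been escaped before the break tags are added.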

Related

MS Word HTML - Is it possible to assign multiple classes to an element?

I'm working on a document printout from MS PowerApps. The best method I have found thus far is to write it in HTML but save it as a .doc file so that it opens in Word Online. From there, the user can save it as a PDF. So far, this works surprisingly well and allows for a great deal of control over the output, but one limitation I have found is that Word does not seem to recognize multiple classes on a single element. This is kind of a pain as I am using a lot of tables, so I have to either create a new class for every single cell format I need or use inline CSS instead. Not a huge issue, but it makes for messy code and time-consuming updates. Is there any way to achieve this?
Edit:
File here: https://wetransfer.com/downloads/29323f5c8060a374ed23e8ff2b6e9fd320210116015928/c991f4
It's designed to open in Word Online, but it works in the desktop version as long as the view mode is set to Print Layout and not Web Layout.
Edit2: I should note that I did not figure out the headers all by myself, but worked off of some code provided by Georgi Nikolov found here
You can't write HTML or CSS code in MS Word.
MS Word is a rich text editor.
Rich text: Rich Text Format (RTF) is a file format that allows text files to be exchanged between different editors. It carries its own formatting, so we can't use it to write HTML.
HTML must be written in a plain text editor, because plain text contains no formatting, only line breaks and spacing. Therefore, no text formatting (such as font sizes and colors, bold or italics) can be used.
Some examples of plain text editors that you can use to write HTML and CSS are Notepad and Notepad++.

span lang="en-gb" gets generated after copying text

I copy a text from a source in a platform. It is a private platform that has a box where you can type text. There is a button where you can see the HTML source code afterwards. I copied numerous texts with no problem. When I am trying to copy-paste the above, I noticed that in the HTML code a specific tag gets produced.
<p><strong><em><span lang="en-gb">Week of the 5th of September</span></em></strong></p>
So, my question is: how is that possible? Does copying a text generate specific tags? So, in the copying process, are some things copied apart from the text we can see? Also, could this be happening because the source text (that is about to be copied) contains characters that are not supported by the Unicode setup of the platform (web application)?
I am really curious to understand what is happening.
Based on the fact that you said it had a button where you can view the source, this sounds like a WYSIWYG (What You See Is What You Get) editor like CKEditor, TinyMCE, Froala, etc. They take standard HTML textarea elements and use JavaScript and CSS to convert them into more robust editors. They allow you to do simple text formatting in the textarea, upload images, view source, etc.
They are used a lot in blogs and for content editing by people who don't write code but want to be able to manage and maintain content in web sites. For instance, if you type a "paragraph" of text in one of these, it will automatically wrap it with the appropriate <p> tags using JavaScript.
In your case you're adding content in this box, and it's simply applying the formatting to it with JavaScript. It will do the same if you just type in the box, vs. copy/paste.
Here are some links to WYSIWYG editors so you can learn more about how they function:
http://ckeditor.com/
https://www.tinymce.com/
https://www.froala.com/wysiwyg-editor
Fun Fact: The editor you used when you typed your question on Stack Overflow uses one of these. https://meta.stackexchange.com/questions/121981/stackoverflow-official-wmd-editor
It's not much information, so I'll take a guess:
For <strong><em>: The website could possibly use a div with the contenteditable="true" attribute (more info on MDN) as the input method. When you then paste in text from another application that already has markup like bold or italic, it's converted to HTML tags.
The <span lang="en-gb"> could come from the browser, another application or the website through analyzing the text and adding this.

Translate HTML files to another language

I have a website with Dutch text which I want to translate to English. Is there a fast way of doing this while keeping the HTML tags (<strong>, <span>) intact? I know I can just copy the parsed text into a translator, but this will remove the formatting.
I also know that at the end I will have to go through the text manually to fix some minor spelling and grammar issues.
Online translators are good to turn foreign text into something that can be understood, but they are useless for producing quality translations. Even if you fix obvious problems at the end, you will get an amateurish word-by-word translation. If you want your visitors to take you seriously, you should translate from scratch.
If you want to preserve the HTML formatting at the same time as translating, you will have to work directly with the HTML source and update the text yourself without touching the formatting.
You may be able to use an XML editor like XmlSpy that will let you edit text nodes directly without touching the tagging, but this requires that the HTML is actually XHTML. You may still need to translate some attributes (such as title and alt attributes).
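The idea of editing text nodes without touching the tagging can also be scripted. Below is a rough Python sketch using the standard library's html.parser. The names TextNodeTranslator, translate_html, translate_phrase and the tiny PHRASES glossary are all hypothetical and only stand in for whatever translation step you actually use (ideally a human translator). Start tags are copied verbatim, end tags are re-emitted in lowercase, and only text nodes are passed to the translate callback. Attributes such as title and alt are not handled here.

```python
from html.parser import HTMLParser


class TextNodeTranslator(HTMLParser):
    """Rebuild the markup unchanged, passing only text nodes to `translate`."""

    def __init__(self, translate):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.translate = translate
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())  # original tag text, verbatim

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Leave whitespace-only nodes alone so the layout of the source is kept.
        self.parts.append(self.translate(data) if data.strip() else data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def translate_html(source, translate):
    parser = TextNodeTranslator(translate)
    parser.feed(source)
    parser.close()
    return "".join(parser.parts)


# Hypothetical stand-in for a real translation step.
PHRASES = {"Welkom": "Welcome", "bij": "to", "ons": "us"}


def translate_phrase(text):
    core = text.strip()
    start = text.index(core)
    # Preserve the leading and trailing whitespace around the phrase.
    return text[:start] + PHRASES.get(core, core) + text[start + len(core):]


print(translate_html('<p><strong>Welkom</strong> bij <span class="intro">ons</span></p>',
                     translate_phrase))
```

Note that translating each text node in isolation loses sentence context whenever a sentence is split by inline tags, which is another reason a manual pass over the result is still needed.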
Is a virtual translation a good option for you? If you paste the Google Translate script into your page source, it will translate the text on your site, and the formatting will stay in place too. http://translate.google.com/translate_tools

Displaying paragraphs in HTML

I'm writing a web application that needs to display stored paragraphs on the front end. The text comes from an Excel worksheet and contains control characters such as indents. I want to show the text exactly as it appeared in Excel. How can I do that? Thanks in advance.
Without seeing your text to begin with, my initial suggestion would be to wrap it in pre-formatted tags. Note that this won't work for formatting like italics, underlines, etc. Merely white-space:
<pre>
Anything within these tags will
maintain its original formatting. Spaces, new lines, and all.
</pre>
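If the text is generated on the server, it should be escaped before being placed inside the tags, otherwise a stray < or & in the stored text will be read as markup. A small Python sketch (the function name as_preformatted is just for illustration):

```python
import html


def as_preformatted(text: str) -> str:
    """Wrap text in <pre> tags, escaping characters that HTML treats specially."""
    # Tabs, spaces and newlines are left untouched; <pre> renders them as-is.
    return "<pre>" + html.escape(text) + "</pre>"


print(as_preformatted("Item\tPrice\n\tIndented line with a < sign"))
```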
It would be easier to save your Excel file as a web page in a new file and use the HTML of that new file in your application.
You might want to check out RTF-to-HTML conversion.

Shortened HTML text and malformed tags

In my web application I intend to shorten a lengthy string of HTML formatted text if it is more than 300 characters long and then display the 300 characters and a Read More link on the page.
The issue I came across is when the 300 character limit is reached inside an HTML tag, example: (look for HERE)
<a hreHEREf="somewhere">link</a>
<a href="somewhere">liHEREnk</a>
When this happens, the entire page could become ill-formatted because everything after the HERE in the previous example is removed and the HTML tag is kept open.
I'm thinking of using CSS to hide any overflow beyond a certain limit and create the "Read More" link if the text is beyond a certain length, but this would mean including all the text on the page.
I've also thought about splitting the text at . to ensure that it's split at the end of a sentence, but that would mean I would include more characters than I needed.
Is there a better way to accomplish this?
Note: I have not specified a server side language because this is more of a general question, but I'm using ASP.NET/C# .
Extract the plaintext from the HTML, and display that. There are libraries (like the HTML Agility Pack for .NET) that make this easy, and it's not too hard to do it yourself with an XML parser. Trying to fix a truncated HTML snippet is a losing cause.
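To illustrate the approach (in Python rather than C#, and not using the HTML Agility Pack), here is a minimal sketch based on the standard library's html.parser. The names _TextExtractor and plain_text_summary are made up for this example, and the whitespace collapsing and "..." suffix are my own assumptions about what a summary should look like:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of a document."""

    def __init__(self):
        super().__init__()  # default convert_charrefs=True decodes &amp; and friends
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def plain_text_summary(markup: str, limit: int = 300) -> str:
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    # Join the text nodes and collapse runs of whitespace into single spaces.
    text = " ".join("".join(parser.chunks).split())
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + "..."


print(plain_text_summary('<p>Hello <a href="x">world</a> &amp; more</p>', limit=11))
```

The returned string is plain text, so it needs to be HTML-encoded again when it is written into the page. This simple version also keeps the contents of script and style elements, which a real implementation would probably want to skip.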
One option I can think of is to cut it off at 300 characters and make sure the last index of '<' is less than the last index of '>'. If it isn't, the cut landed inside a tag, so truncate the string right before the last instance of '<', then use a library like HTML Tidy to fix tags that are left unbalanced (like the <a> that loses its </a> in the example).
There are problems with this though. One thing being if there are 300 chars worth of nothing but HTML - your summary will be displayed as empty.
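A rough Python sketch of the index check (the function name truncate_outside_tag is hypothetical). It only guarantees that the cut does not land inside a tag; elements that are left open still have to be repaired by a tidy-style library afterwards, and it assumes no literal '<' or '>' appears inside attribute values:

```python
def truncate_outside_tag(markup: str, limit: int = 300) -> str:
    """Cut markup at `limit` characters without stopping in the middle of a tag."""
    snippet = markup[:limit]
    last_open = snippet.rfind("<")
    last_close = snippet.rfind(">")
    # A '<' after the last '>' means the cut fell inside a tag: drop that partial tag.
    if last_open > last_close:
        snippet = snippet[:last_open]
    return snippet


sample = 'Read <a href="somewhere">link</a> now'
print(repr(truncate_outside_tag(sample, 10)))  # cut inside the opening tag
print(repr(truncate_outside_tag(sample, 27)))  # cut inside the link text
```

In the second call the result still contains an unclosed <a>, which is exactly the case the tidy step is meant to handle.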
If you do not need the HTML to be displayed, it's far easier to simply extract the plain text and use that instead.
EDIT: Added using something like HTML Tidy for orphaned tags. The original answer only solved cutting things mid-tag, rather than between an opening and closing tag.