The substitute for <pre> tag? - html

I am looking for a substitute for the HTML <pre> tag. With <pre>, long lines sometimes cause layout problems at different screen resolutions, and the problem keeps getting worse. I need <pre> because the person who will be updating the website doesn't know much HTML and doesn't have the time to learn it, so it is easiest for him to simply copy and paste the text. But while <pre> makes that part easier, it causes other complications down the line. Suggestions?

Use something like Markdown or Textile on the server to generate HTML from a simplified markup language.

Why would you want a substitute? If the <pre> does what you want, use it. But I agree with you, using <pre> is not going to look good.
David has a good suggestion. Try a search on "convert text to html." If you need a WYSIWYG editor for a CMS that you are building, search for "wysiwyg html editor." TinyMCE is popular, as is the YUI 2 Rich Text Editor.
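As a rough illustration of the "convert text to html" idea, here is a minimal Python sketch using only the standard library. It is not a Markdown or Textile implementation; the function name text_to_html and the rules (blank line = new paragraph, single line break = <br>) are assumptions made for this example.

```python
import html
import re


def text_to_html(text: str) -> str:
    """Convert pasted plain text into simple HTML paragraphs (illustrative only)."""
    # Normalise line endings so text pasted from different systems behaves the same.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # One or more blank lines separate paragraphs.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    out = []
    for paragraph in paragraphs:
        # Escape <, > and & so the pasted text cannot inject markup.
        escaped = html.escape(paragraph.strip())
        # Keep single line breaks inside a paragraph as <br>.
        out.append("<p>" + escaped.replace("\n", "<br>\n") + "</p>")
    return "\n".join(out)
```

Unlike <pre>, the resulting paragraphs wrap normally, while the updater can still paste plain text without writing any HTML.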

Related

How to separate design and content in a dynamic website?

In the normal case, I can separate the text and the style, but how should I do it when the text is dynamic (editable by the admin user)? The user of course wants to use bold, italic, etc., but if I add a common HTML editor, I think I break the rule of separation, because there will be HTML elements in the text. (I could use BB codes, but it is the same.)
In the long term I think it can cause problems when I want to use the text in a non-HTML environment. Of course I can strip the HTML tags, but that is not the way I would like to go (not because it won't work, but because of the original theoretical issue).
In some cases I can break the sentences apart to solve this problem, but I think that's a bad approach, because the parts are meaningless on their own, and the text won't be as easy to edit either.
Is there any good solution for this?
That's perfectly ok.
You give the user the opportunity to set some attributes for the text (BBCodes recommended).
That is content. Then it's part of the design to interpret the attributes and style it.
For example you may provide the feature to let the user define something like [headline]MyHeadline[/headline]. This is pure content.
How to replace [headline] with HTML and how to style the resulting text is up to the design.
Edit: I recommend BBCodes to provide a closed set of features. That may be easier to deal with. You could just use them in another context and interpret them, instead of stripping out HTML.
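To make the idea concrete, here is a minimal Python sketch of interpreting a closed set of BBCode-style tags in two contexts. The tag names other than [headline], the mapping to HTML elements, and the function names are all assumptions for illustration; it also does not check that tags are balanced or properly nested.

```python
import html
import re

# Hypothetical closed tag set: BBCode name -> HTML element chosen by the design.
TAG_TO_HTML = {"headline": "h2", "b": "strong", "i": "em"}

# Matches [name] and [/name]; group 1 is "/" for closing tags.
_TAG_RE = re.compile(r"\[(/?)(\w+)\]")


def render_html(source: str) -> str:
    """Interpret the known tags as HTML; everything else is escaped text."""
    def replace(match):
        closing, name = match.group(1), match.group(2).lower()
        element = TAG_TO_HTML.get(name)
        if element is None:
            return match.group(0)  # unknown tag: leave it as literal text
        return f"<{closing}{element}>"

    # Escape first so raw HTML typed by the user is shown, not interpreted.
    return _TAG_RE.sub(replace, html.escape(source, quote=False))


def render_plain(source: str) -> str:
    """Interpret the same content for a non-HTML context by dropping known tags."""
    def replace(match):
        return "" if match.group(2).lower() in TAG_TO_HTML else match.group(0)

    return _TAG_RE.sub(replace, source)
```

The stored content stays the same in both cases; only the interpreter changes, which is the separation the answer describes.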
If the tags entered are semantic, i.e. they use an <i> tag for italic rather than style="font-style:italic", then your design and content are still separate.
Separating design and content is about separating a site's presentation from the readable code, rather than removing the markup altogether.
I'd advise you focus on Semantic HTML.

Using "wysiwyg editors" like markup input in vim

When adding markup to raw text to turn it into HTML, WYSIWYG editors let you select the piece of text you want to apply the markup to and then press something like <C-b> to get <strong> markup around it. It's very quick and useful.
I would like to know what options I have to do this using Vim's visual mode, and maybe make it usable only on html/jsp/php files or so. I have been looking for this for a long time. Does anyone have anything nice to share about this? Thanks in advance.
surround.vim should do what you want:
S<a href='/path/to/link'>
Surround would be my choice too because it is universally useful; not only for html.
Zencoding as well can be used for that with <C-y>,.

Stripping HTML but retaining block/inline structure

I would like to convert HTML to plain text but retain the minimum structure.
All sections containing content that only the browser needs to see, such as <script> and <style>, should be stripped completely.
Convert all block tags to <div> and all inline ones to <span>, or remove the inline tags completely without leaving whitespace, and turn anything delineated by block-level tags into paragraphs separated by two line breaks.
The idea is to turn arbitrary web pages into something suitable for natural-language text processing, without the artifacts left by naively removing markup, such as artificially broken-up words or unrelated blocks that look like sentences.
Any binary, library, or source in any programming language is OK.
Is there a standard source preferably machine-readable with a full list of elements defining which are block, which inline, and which are like <script> and <style> above?
The list of HTML 4 Block-level elements is here: http://htmlhelp.com/reference/html40/block.html
The most popular HTML parsing libraries for Perl are HTML::Parser which is a SAX-style parser and HTML::TreeBuilder which is more DOM-like.
Beyond that, you'll have to decide which elements are important and which are not based on what you're trying to do.
You may want to do some research yourself. Then, when you run into a problem, ask a question related to the problem. This sounds more like specification for a project that you want someone to do for you.
For starters, websites use tags for all sorts of things, and the problem is very complex. You would probably want to keep information in h# and p tags, but you may also want to keep div tag information if they use the id attribute. In short, you'd have to write rules for each website you encounter, or employ some sort of fuzzy logic.
Instead of doing it on a tag by tag basis, why not try detecting sentences and grammar, or things likely to be in headings, and choose tags that include those things while stripping out the rest?
Here's my own tool to solve this problem in Perl using HTML::Parser as a github gist: html2txt.pl
It's unfinished and perhaps slightly Windows-centric but I thought I'd share it since a few people have viewed my question here. Feel free to play with it.
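For readers who want to experiment without Perl, the approach described in the question can be sketched in Python with the standard library's html.parser. This is not a port of html2txt.pl; the BLOCK and SKIP sets below are partial, assumed lists for illustration, and the class and function names are made up for this example.

```python
from html.parser import HTMLParser

# Assumed, partial element lists; extend them from the HTML specification as needed.
BLOCK = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "li",
         "table", "tr", "td", "th", "blockquote", "pre", "br", "hr"}
SKIP = {"script", "style"}  # content only the browser needs


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []     # finished paragraphs
        self.current = []    # text pieces of the paragraph being built
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def flush(self):
        # Join pieces with no separator so removed inline tags never split a word,
        # then collapse runs of whitespace.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in BLOCK:
            self.flush()  # a block boundary ends the current paragraph

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK:
            self.flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)


def html_to_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    parser.flush()
    # Two line breaks between paragraphs, as the question asks.
    return "\n\n".join(parser.blocks)
```

Inline tags such as <b> or <span> are simply ignored, so "un<b>break</b>able" comes out as one word, while every block-level tag starts a new paragraph.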

WYSIWYG browser editor that generates *good* HTML?

I'm searching for a "suck less" WYSIWYG in-browser X?HTML editor that generates good HTML code.
(no <font>, <foo style="...">, <p></p><span></span><p><span> </span><span><span>blah</span></<span></p> and so on -- <b> and <i> etc is ok).
Should be easy-to-use as it is going to be used by people that do not know what HTML is.
Any suggestions?
Extra points for Copy-and-Paste-from-Word-readiness! :-)
(I found a lot of editors but they all create that <font> and nested <span> crap that breaks site design and bloats a site with one table up to 100kB.)
Download the current version of CKEditor and look at the XHTML output sample. It shows how to use full WYSIWYG editing without generating <font> tags or inline styles. You just need to adjust the configuration to your needs.
What about WYMEditor?
WYMeditor has been created to generate perfectly structured XHTML strict code, to conform to the W3C XHTML specifications and to facilitate further processing by modern applications.
With WYMeditor, the code can't be contaminated by visual informations like font styles and weights, borders, colors, ... The end-user defines content meaning, which will determine its aspect by the use of style sheets. The result is easy and quick maintenance of information.
I've used it a little and while it takes quite a bit of tweaking if you have very specific needs, it does work out of the box for simple XHTML editing. If you set up specially annotated CSS files then it will detect the styles you want users to use and block level elements to which they apply. You can also tell it how to display these styles in the editor (which might be different from how you want them displayed in the resulting XHTML).
Of course, it generates XHTML, not HTML, so it may not meet your exact needs.
Wikipedia has a category for them:
http://en.wikipedia.org/wiki/Category:JavaScript-based_HTML_editors
You can use Markdown with the WMD UI, it's the one used by Stack Overflow. It always produces valid HTML code.
I just recently searched for an editor to create solid documentation, whose output is suitable for Subversion diffs: https://superuser.com/questions/126621/wysiwyg-editor-for-structured-text-suitable-for-svn-versioning
The editor that was suggested - "KompoZer" - turned out to be fantastic, especially because it generates very clean HTML (in my opinion). And I say that, although I had originally preferred something leaner than HTML.
P.S. Reading your question again, I'm not sure what you mean by a "browser editor". Are you looking for an editor that can be integrated into an HTML page? KompoZer is based on a browser, but it probably cannot be integrated into an HTML page.
I recently switched one of my projects to markdown to avoid this exact issue. There's still a bit of a learning curve for the users but I haven't had to deal with the usual issues that occur when they copy/paste content from Word and wonder why it blew up.
Having said that, I prefer CKEditor over TinyMCE and the Telerik controls. I've generally found it generates somewhat cleaner HTML.
There are several WYSIWYG editors out there for embedding within your website.
WYMeditor (http://www.wymeditor.org/) looks very nice and seems to be a good fit for targeting clean and valid XHTML results.
Spaw2. Although it's kinda abandoned now.
The Apple Cocoa NSTextView class exports quite nice html, where all the fiddling is done through specifying a style sheet in the header. The Apple TextEdit editor uses this.
http://tinymce.moxiecode.com/ - easy to use, can import from Word, and can restrict formatting to predefined CSS styles to provide consistent output.
This post is 8+ years old now but still relevant...
I found an awesome github page with a curated list of WYSIWYG editors, including a few WYSIWYM ones which guarantee sane html. As of 2018, the most current and best WYSIWYM one looks like ProseMirror, or maybe ORY Editor if you're looking for something to edit entire webpages(!) in one textfield.

Text style affecting the whole site

I've got an input where the user can type either HTML or plain text. When the user copies and pastes text from MS Word, for example, it generates weird HTML. Then, when you view that topic, you can see that the whole page's style is affected. I don't really know whether the generated HTML has unclosed tags or something, but it looks like it does, and thus the style of the page is affected.
Does anybody know how to "isolate" the HTML of that div (or whatever the container is) from the whole page's style?
Short of showing the content in an IFRAME, you can't really do that. What I usually do in this situation is apply tag stripping logic to the content as it comes in. You really don't want to allow arbitrary HTML from a security perspective, but even if you don't care what your users input, you should be stripping out invalid HTML tags (Word has a habit of creating tags with weird namespace-looking things like o:p) and running something like Tidy over the result to ensure every tag is properly closed. There are a number of Tidy libraries for .NET out there; here's one.
Here's a quick cut-and-paste of how I've done this in the past. Note that the class implements an interface from the project I used it in, but you get the general idea.
Text copied from Word can include <style> tags. The only sure way to isolate these styles is to put the input control in an <iframe>.
You can either sanitize the input or display it in an IFrame.
If it were me, I'd strip all but basic formatting (e.g., bold, italics) and use Tidy. That's what I end up doing: I strip the Word CSS styles and convert them into <strong>, <em>, etc.
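A minimal Python sketch of the "strip all but basic formatting" idea, using only the standard library, might look like the following. It is not Tidy and not a security-hardened sanitizer; the whitelist, the class name Sanitizer and the function name sanitize are assumptions for illustration. It drops all attributes, removes <style> and <script> together with their content, ignores unknown tags such as Word's o:p, and closes any tags left open.

```python
import html
from html.parser import HTMLParser

# Assumed whitelist: input tag -> tag to emit. Everything else is removed.
ALLOWED = {"b": "strong", "strong": "strong", "i": "em", "em": "em",
           "p": "p", "br": "br"}
VOID = {"br"}                       # elements that never get a closing tag
DROP_CONTENT = {"style", "script"}  # removed together with their content


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # stack of emitted tags that are still open
        self.drop_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.drop_depth += 1
            return
        name = ALLOWED.get(tag)
        if name is None or self.drop_depth:
            return  # unknown tag (e.g. Word's o:p): drop the tag, keep its text
        self.out.append(f"<{name}>")  # attributes are dropped on purpose
        if name not in VOID:
            self.open_tags.append(name)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.drop_depth = max(0, self.drop_depth - 1)
            return
        name = ALLOWED.get(tag)
        if name is None or name in VOID or name not in self.open_tags:
            return  # stray closing tag: ignore it
        # Close everything opened after the matching tag so nesting stays valid.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == name:
                break

    def handle_data(self, data):
        if not self.drop_depth:
            self.out.append(html.escape(data, quote=False))


def sanitize(markup: str) -> str:
    sanitizer = Sanitizer()
    sanitizer.feed(markup)
    sanitizer.close()
    # Close whatever the pasted markup left open.
    while sanitizer.open_tags:
        sanitizer.out.append(f"</{sanitizer.open_tags.pop()}>")
    return "".join(sanitizer.out)
```

Because every emitted tag is tracked on a stack and closed, pasted content with unclosed tags can no longer spill its formatting into the rest of the page.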