Clean HTML table of formatting? - html

Anyone know of a way to clean a <table> of all formatting leaving just the basic tags and text?
I have tried Komposer, which was useless and even added more formatting rubbish of its own. I then tried Aptana, but that only seems to be a text editor, so again no use at all.
Any ideas?

If you want to clean HTML tables (e.g. when you copy them from Word or Excel to an HTML editor), you can use the online Table Cleaner at https://www.r2h.nl/tablecleaner
It strips all the formatting and returns only clean HTML code, so you will have a table without any styling.

How about using a text editor that supports find and replace using regular expressions (such as Notepad++) to remove the unwanted attributes using one regex, and the font tags using another regex?
To match the attributes you need to remove, the following regex should do the job:
( style| class| height| width)=("[A-Za-z0-9:;_ -]*"|'[A-Za-z0-9:;_ -]*'|[A-Za-z0-9:;_-]*)
To match font tags, try
<font.*font>
(I've tested these regular expressions with http://gskinner.com/RegExr/).
Edit
It turns out that Notepad++ does not support the logical OR operator in regular expressions. An alternative would be to use another text editor that does, or to write a small app/script to perform the replacements.
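
The "small app/script" route could look roughly like the Python sketch below. It is only an illustration, not a tested replacement for a real HTML parser: the attribute list is taken from the regex above, the function name clean_table and the sample markup are made up for the example, and the font pattern here is assumed to target only the tags themselves so that the text between them is kept.

```python
import re

# Attributes to drop; the list mirrors the regex in the answer above.
# Values may be double-quoted, single-quoted, or unquoted.
ATTR_RE = re.compile(
    r"""\s(?:style|class|height|width)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)

# Opening and closing <font> tags only; the enclosed text is left alone.
FONT_RE = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def clean_table(html: str) -> str:
    """Remove the listed attributes and all <font> tags (hypothetical helper)."""
    html = ATTR_RE.sub("", html)
    return FONT_RE.sub("", html)


sample = (
    '<table class="x" width=100><tr><td style="color:red">'
    '<font face="Arial">Hi</font></td></tr></table>'
)
print(clean_table(sample))  # prints: <table><tr><td>Hi</td></tr></table>
```

Because this works on raw text, it would also strip something like " class=foo" if it appeared in the visible cell text, so it is best suited to one-off cleanups where you can check the result.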

Related

How to remove specific HTML tags in Visual Studio

I need to copy/paste text from Microsoft PowerPoint into an aspx page in Visual Studio 2010. When I copy the text, it brings along several unwanted tags (style tags, span, p tags, etc.). How can I clean up that copied text in Visual Studio? I have also installed ReSharper; is it useful for removing unwanted tags? For example, I want to remove all style tags or all span tags from a document. I want to clean up/remove unwanted tags in a single command.
Once you have pasted the text, it will be pretty hard to automatically distinguish between unwanted and actual tags; perhaps a complicated Replace All with a regex would work. But there are ways to copy/paste pure text, look here: How to copy and paste code without rich text formatting?.
As of now, there is no easy way to remove specific HTML tags. I would suggest using the find and replace feature of Notepad++, where you can easily write a regular expression to replace tags. I would also suggest these tools for cleaning up your HTML:
WordToHTML, CleanHTML. I hope this helps resolve part of your concern.

Need HTML characters stripped out of excel export, but effects preserved

I'm exporting data using CF9's cfspreadsheet tags and functions, and some columns have HTML-formatted text in them. I need to strip out the HTML tags and convert entities like &lt; and &amp; to their equivalents. However, I'd also like to keep the effects of bold tags and paragraph tags if possible.
I know I can use rereplace, and others to brute force the output, but I was hoping for a more elegant solution.
Any ideas?
Thanks for the help!
I do not think such a function exists in CF. It would require some sort of html=>excel conversion of the styles. This thread says that functionality did not even exist in POI (which is used by cfspreadsheet) until recently. So my guess would be it does not exist within the CF spreadsheet functions either.
If you are willing to work lower level, you might check the latest version of POI. See if the mentioned patch is available in the main distribution. Otherwise, rereplace() sounds like the simplest approach.
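
To make the "strip tags, decode entities, but remember bold and paragraphs" idea concrete, here is a minimal sketch in Python rather than ColdFusion. It does not use cfspreadsheet or POI; it only shows the parsing half, turning HTML into (text, is_bold) runs that a spreadsheet writer could then apply as rich text. The names RunCollector and html_to_runs are invented for the example.

```python
from html.parser import HTMLParser


class RunCollector(HTMLParser):
    """Collects (text, is_bold) runs; </p> and <br> become newlines."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode &lt;, &amp;, etc.
        super().__init__(convert_charrefs=True)
        self.runs = []
        self.bold_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong"):
            self.bold_depth += 1
        elif tag == "br":
            self.runs.append(("\n", False))

    def handle_endtag(self, tag):
        if tag in ("b", "strong"):
            self.bold_depth = max(0, self.bold_depth - 1)
        elif tag == "p":
            self.runs.append(("\n", False))

    def handle_data(self, data):
        if data:
            self.runs.append((data, self.bold_depth > 0))


def html_to_runs(html):
    """Return a list of (text, is_bold) tuples for the given HTML fragment."""
    parser = RunCollector()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.runs


runs = html_to_runs("<p>Tom &amp; <b>Jerry</b> &lt;3</p>")
plain_text = "".join(text for text, _ in runs)
bold_parts = [text for text, is_bold in runs if is_bold]
```

All other tags are simply ignored, so their text is kept and their markup is dropped. How the bold runs get written into a cell depends on the spreadsheet library and is left out here.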

csharp code to remove all extraneous microsoft html formatting

Is there any way to programmatically remove all the Microsoft HTML formatting that gets put on and simply render it as regular HTML?
I want to remove all the extra tags, as I am trying to load it into TinyMCE but TinyMCE doesn't seem to be able to render it.
I've used the regular expressions from these articles:
http://tim.mackey.ie/CleanWordHTMLUsingRegularExpressions.aspx
How do I filter all HTML tags except a certain whitelist?
In my case I wanted to restrict everyone, especially those who paste from Word, to a small whitelist of tags. TinyMCE has a property "valid_elements" which does exactly this.
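
For doing the same whitelist filtering on the server before the content reaches the editor, the idea can be sketched as below. This is a Python illustration, not the C# the question asks for and not how TinyMCE implements "valid_elements"; the ALLOWED set, the class name WhitelistFilter and the helper keep_whitelisted are all assumptions made for the example.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED = {"p", "b", "i", "ul", "ol", "li", "br"}  # hypothetical whitelist
SKIP_CONTENT = {"style", "script"}  # drop these tags together with their content


class WhitelistFilter(HTMLParser):
    """Rebuilds the markup, keeping only whitelisted tags without attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes are dropped on purpose

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED and tag != "br" and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so decoded entities stay valid HTML.
            self.out.append(escape(data, quote=False))


def keep_whitelisted(html):
    """Return html with every tag outside ALLOWED removed (text is kept)."""
    parser = WhitelistFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


word_html = (
    '<style>p{color:red}</style><p class="MsoNormal">'
    '<span style="x">Hello <b>world</b></span><o:p></o:p></p>'
)
print(keep_whitelisted(word_html))  # prints: <p>Hello <b>world</b></p>
```

Comments, including Word's conditional comments, are dropped because the parser's default comment handler does nothing.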

Can I reformat HTML in Visual Studio without removing blank lines?

The HTML formatting in Visual Studio works great -- especially considering you can pick a selection and just format that. You can just select a tag or block, right click and do 'Format Selection'. You can also reformat the whole document.
However, I like to use a lot of whitespace in my documents to keep things organized, and reformatting the HTML compresses (deletes!) this whitespace.
Are there any plugins, or external tools for formatting HTML that might make it possible to leave vertical space untouched?
Edit: Bonus points: If anybody has 2010 installed can they check if it already has this feature? If it DOESN'T have this feature I'd like to submit a feature request. Fortunately the new editor is much more extensible, but I don't know if that extends to customization of something like this.
Visual Studio 2012 preserves empty line breaks when formatting HTML. I ran across this SO post because I was looking for an option to remove empty lines!
If the whitespace in your documents follows some systematic logic (such as a line break before and after each table tag), then you might be able to get the behaviour you want when applying formatting.
Check out the options dialog at:
Tools -> Options -> Text Editor -> HTML -> Format -> "Tag specific Options"
This allows you to customize the formatting of each type of tag in fine detail. In your particular case, the "Line breaks" option might be useful, or at least relevant.
You might also like to try out a custom HTML formatter such as HTML Tidy. Many powerful editors like Notepad++ and UltraEdit have a built-in HTML Tidy module for formatting. Personally, though, I find the formatting capabilities of Visual Studio sufficient for most requirements.
The short answer is no, 2008 will always reformat it the way it has in its settings. You will have to configure the settings in Tag Specific Options to match how you like your HTML to display.
It can do up to two spaces before and after a tag, but unfortunately there's no way to maintain whitespace or formatting for comments (it only recognises start and end tags that are the same).
If you have problems with it matching your coding style, my question about this might help out

Using vim for html decoration

Do you have any preferred methodology for managing html formatting tags in vim?
The best I've come up with is creating some macros to insert tags at the current cursor position - ctrl-i for <i>, ctrl-j for </i>, etc.
It would be handy to be able to, say 2w{something} to italicize 2 words, for instance, without needing to navigate the cursor to the end point. The best option I can think of would let me use the same keystrokes I use to so flexibly delete a string of text that might be word count, regex match, etc. but would insert both opening and closing tags.
Take a look at the surround.vim plugin.
I use Christian Robinson's HTML macros when I have to traffic in raw HTML.
Generally, I prefer to use reStructuredText and generate HTML.