I have a peculiar situation where I have written VBA code that performs a few find/replace actions on HTML special character codes.
I am trying to publish this code to my website in HTML format, but the parts where my code references codes like &quot; or &amp; are turning into the actual HTML characters (e.g. &).
Is there an HTML trick to prevent this from happening, so I can literally display the text &quot; or &amp; in my HTML code?
You may escape it by using a mixture of XML escaped code and plain text in HTML.
<body>
<ul>
<li>&amp;amp; = &amp;</li>
<li>&amp;apos; = &apos;</li>
<li>&amp;quot; = &quot;</li>
<li>&amp;lt; = &lt;</li>
<li>&amp;gt; = &gt;</li>
</ul>
</body>
To write the actual text value " or & in HTML code, you can use their corresponding character entity references.
For ", you can use &quot; or &#34;.
For &, you can use &amp; or &#38;.
For example, to write the text Hello & World in HTML code, you would write:
<h1>Hello &amp; World</h1>
(Output='Hello & World')
<h1> Gayan said, &quot;I love Coding!&quot;</h1>
(Output='Gayan said, "I love Coding!"')
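If the page is generated by a script rather than typed by hand, the extra layer of escaping can be applied automatically. Below is a minimal sketch in Python, assuming the standard-library html module; the function name show_literally and the sample VBA line are made up for illustration.

```python
import html


def show_literally(source_text: str) -> str:
    """Escape text so a browser displays it exactly as it was typed."""
    # html.escape turns & into &amp;, < into &lt;, > into &gt;,
    # and (with the default quote=True) " into &quot; and ' into &#x27;.
    return html.escape(source_text)


# A hypothetical line of VBA that mentions an entity code
snippet = 'Replace(text, "&quot;", Chr(34))'
print(show_literally(snippet))
# prints: Replace(text, &quot;&amp;quot;&quot;, Chr(34))
```

Pasting the printed line into the page source makes the browser show the original VBA line, with &quot; still visible as text.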
Texts and/or markups are rendered to output as-is without any html-encoding as we already expect.
For the following, the plain text with markup must be html-encoded. (We don't care about the code output here.)
#{ var theVar = "xyz"; }
some text & other text >>#theVar
So, the html in the output;
some text & other text >>xyz
So, when we want to write some static text that needs to be html-encoded we have to use constructs like;
#{ var theVar = "xyz"; }
#("some text & other text >>")#theVar
to get the following html in the output;
some text &amp; other text &gt;&gt;xyz
and for clarity when viewed in browser;
some text & other text >>xyz
So, is there a simple way of doing this? Some shortcut to html encode texts instead of using #("...") for each text which will start to look nasty when there are multiples of them.
What would be the best practice? How do you do this?
So, it is not a big concern when we specify utf-8 encoding for the document. When utf-8 encoding is used, characters do not have to be html-encoded as entity references, except for the special characters (<, >, &, ", ').
Even using & by itself is not wrong for lenient browsers, but there would be ambiguous cases to consider, like volt&. So, it would be better to html-encode all of these special characters.
See the "When to use escapes" section of the W3C article;
http://www.w3.org/International/questions/qa-escapes#use
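To illustrate the point about utf-8, here is a small sketch in Python (not the templating syntax from the question), assuming the standard-library html module. Only the five special characters are rewritten; accented letters and symbols such as the euro sign pass through unchanged. The sample sentence is invented.

```python
import html

# Text mixing non-ASCII characters with the special characters
text = 'Crème brûlée costs < 5 € & "tastes" great'

# quote=True (the default) also encodes " and '
encoded = html.escape(text, quote=True)

# encoded is now:
# Crème brûlée costs &lt; 5 € &amp; &quot;tastes&quot; great
```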
I'm writing a website using HTML. How do I use the ' character in words?
When I try, it breaks all the code that follows.
Such as - <h3 class "heading h-03">Children's play</h3>
However, 's play</h3>' and everything after it goes blue
Thanks x
In HTML, it's best to use codes. For example:
&amp; = &
Similarly, you can use &#39; (or &apos; in HTML5) for '
See ASCII HTML Codes
How would I write the entity name in HTML and not have it do its function? Example: I'm doing a tutorial and want to tell someone how to use the non-breaking space in their code (&nbsp;). So, how do I actually write out "&" "n" "b" "s" "p" ";" but have it be fluid with no spaces?
You can use &amp; instead of &
So &nbsp; will be &amp;nbsp;
You will need to write out a part of the code; in this example, I'll use the ampersand. Instead of writing &nbsp;, write out the ampersand, &amp;, and then write nbsp;. Your final result should be &amp;nbsp;, which will display &nbsp; on the webpage.
You could simply use the HTML for the ampersand, as in &amp;nbsp;, which would display what you're looking for, i.e. &nbsp;
JavaScript can be used to change the text of an HTML element. The example below writes the non-breaking space entity into a span element as literal text.
<p>A common character entity used in HTML is the non-breaking space: <span id="myid"></span></p>
<script>
// textContent is not parsed as HTML, so the entity is shown as typed
document.getElementById("myid").textContent = "&nbsp;";
</script>
I am looking for a regular expression that can convert my font tags (only with size and colour attributes) into span tags with the relevant inline css. This will be done in VB.NET if that helps at all.
I also need a regular expression to go the other way as well.
To elaborate below is an example of the conversion I am looking for:
<font size="10">some text</font>
To then become:
<span style="font-size:10px;">some text</span>
So converting the tag and putting a "px" at the end of whatever the font size is (I don't need to change/convert the font size, just stick px at the end).
The regular expression needs to cope with a font tag that only has a size attribute, only a color attribute, or both:
<font size="10">some text</font>
<font color="#000000">some text</font>
<font size="10" color="#000000">some text</font>
<font color="#000000" size="10">some text</font>
I also need another regular expression to do the opposite conversion. So for example:
<span style="font-size:10px;">some text</span>
Will become:
<font size="10">some text</font>
As before converting the tag but this time removing the "px", I don't need to worry about changing the font size.
Again this will also need to cope with the size styling, the color styling, and a combination of both:
<span style="font-size:10px;">some text</span>
<span style="color:#000000;">some text</span>
<span style="font-size:10px; color:#000000;">some text</span>
<span style="color:#000000; font-size:10px;">some text</span>
I am extracting basic HTML & text from CDATA tags in an XML file and then displaying them on a web-page. The text also appears in a rich-text editor so it can be edited/translated, and then saved back into a new XML file. The XML is then going to be read by a flash file, hence the need to use old-fashioned HTML.
The reason I want to convert this code is mainly for display purposes. In order to show the text sizes correctly and for it to work with my rich text editor they need to be converted to XHTML/inline CSS. The rich text editor will also generate XHTML/inline CSS that I need to convert 'back' to standard HTML before it is saved in the XML file.
I don't know a lot about XSLT transformation but I'm not sure that is what I need for this, or it might be more than I need right now, but please correct me if I'm wrong (and point me in the direction of any helpful links you may have on it).
I know the temptation will be to tell me a number of different ways to set up my code to do what I want but there are so many other permutations I haven't even mentioned which have forced me down this route, so literally all I want to do is convert a string containing standard HTML to XHTML/inline CSS, and then the same but the other way round.
Since some people have already given you warnings I'll skip ahead to the regex solution.
First off, I'll lay out a couple of assumptions that aren't set in stone but allow the problem to be approached as you presented it without me doing extra work:
You can use LINQ (otherwise this will need to be updated)
Font/Span tags will be in lowercase (font and span not FONT or SpAn)
Each style attribute value will be properly formatted, ending with a semi-colon ; similar to your samples
Case-sensitivity can be worked in rather simply via the RegexOptions.IgnoreCase although, in turn, the dictionary values will need to be stored as ToLower to keep everything constant when the values are later accessed. The 3rd point ensures splitting text doesn't go haywire.
Below is a sample program that demonstrates the replacements.
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module Program
    Sub Main()
        ' Sample inputs covering both directions of the conversion
        Dim inputs As String() = { _
            "<font size=""10"">some text</font>", _
            "<font color=""#000000"">some text</font>", _
            "<font size=""10"" color=""#000000"">some text</font>", _
            "<font color=""#000000"" size=""10"">some text</font>", _
            "<font size=""10"">some text</font> other text <font color=""#000000"">some text</font>", _
            "<span style=""font-size:10px;"">some text</span>", _
            "<span style=""color:#000000;"">some text</span>", _
            "<span style=""font-size:10px; color:#000000;"">some text</span>", _
            "<span style=""color:#000000; font-size:10px;"">some text</span>", _
            "<span style=""color:#000000; font-size:10px;"">some text</span> other <font color=""#000000"" size=""10"">some text</font>" _
        }

        ' Matches a whole font or span element: tag name, attributes and content
        Dim pattern As String = "<(?<Tag>font|span)\b(?<Attributes>[^>]+)>(?<Content>.+?)</\k<Tag>>"
        Dim rx As New Regex(pattern)

        For Each input As String In inputs
            Dim result As String = rx.Replace(input, AddressOf TransformTags)
            Console.WriteLine("Before: " & input)
            Console.WriteLine("After: " & result)
            Console.WriteLine()
        Next
    End Sub

    ' Match evaluator: rewrites a font tag as a span tag and vice versa
    Public Function TransformTags(ByVal m As Match) As String
        ' Collect the key="value" attribute pairs into a dictionary
        Dim rx As New Regex("(?<Key>\b[a-zA-Z]+)=""(?<Value>.+?)""")
        Dim attributes = rx.Matches(m.Groups("Attributes").Value).Cast(Of Match)() _
            .ToDictionary(Function(attribute) attribute.Groups("Key").Value, _
                          Function(attribute) attribute.Groups("Value").Value)

        If m.Groups("Tag").Value = "font" Then
            ' font -> span: size becomes font-size with a px suffix
            Dim newAttributes = String.Join("; ", attributes.Select(Function(item) _
                If(item.Key = "size", "font-size", item.Key) _
                & ":" _
                & If(item.Key = "size", item.Value & "px", item.Value)) _
                .ToArray()) _
                & ";"
            Return "<span style=""" & newAttributes & """>" & m.Groups("Content").Value & "</span>"
        Else
            ' span -> font: split the style value and turn each declaration into an attribute
            Dim newAttributes = String.Join(" ", attributes("style") _
                .Split(New Char() {";"c}, StringSplitOptions.RemoveEmptyEntries) _
                .Select(Function(s) _
                    s.Trim().Replace("px", "").Replace("font-", "").Replace(":", "=""") _
                    & """") _
                .ToArray())
            Return "<font " & newAttributes & ">" & m.Groups("Content").Value & "</font>"
        End If
    End Function
End Module
If you have any questions let me know. Some enhancements can be made if a large amount of text is expected to be processed. For example, the regex object in the TransformTags method can be moved to the class level so it isn't recreated on every transformation.
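For readers who want to try the idea outside .NET, here is a rough Python port of the same two-pattern approach. It is a sketch under the same assumptions (lowercase tags, double-quoted attributes, no nested tags), not a general HTML converter. The names TAG_RE, ATTR_RE, transform_tag and convert are my own, and the span-to-font branch splits each declaration on the colon instead of using chained string replacements.

```python
import re

# Whole element: tag name, raw attribute text, content, matching close tag
TAG_RE = re.compile(
    r'<(?P<tag>font|span)\b(?P<attrs>[^>]+)>(?P<content>.+?)</(?P=tag)>'
)
# One key="value" attribute pair
ATTR_RE = re.compile(r'\b(?P<key>[a-zA-Z]+)="(?P<value>[^"]+)"')


def transform_tag(m: re.Match) -> str:
    """Rewrite one matched font element as a span element, or the reverse."""
    attrs = dict(ATTR_RE.findall(m.group("attrs")))
    content = m.group("content")

    if m.group("tag") == "font":
        # font -> span: size becomes font-size with a px suffix
        parts = []
        for key, value in attrs.items():
            if key == "size":
                parts.append("font-size:" + value + "px")
            else:
                parts.append(key + ":" + value)
        style = "; ".join(parts) + ";"
        return '<span style="' + style + '">' + content + "</span>"

    # span -> font: each CSS declaration becomes an attribute
    parts = []
    for declaration in attrs.get("style", "").split(";"):
        if not declaration.strip():
            continue  # skip the empty piece after the last semicolon
        name, _, value = declaration.partition(":")
        name, value = name.strip(), value.strip()
        if name == "font-size":
            if value.endswith("px"):
                value = value[:-2]
            parts.append('size="' + value + '"')
        else:
            parts.append(name + '="' + value + '"')
    return "<font " + " ".join(parts) + ">" + content + "</font>"


def convert(html_text: str) -> str:
    """Apply the transformation to every font/span element in the text."""
    return TAG_RE.sub(transform_tag, html_text)
```

Calling convert('<font size="10" color="#000000">some text</font>') returns the span form, and feeding that result back into convert returns the original font tag.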
EDIT: Here's the explanation of the first pattern: <(?<Tag>font|span)\b(?<Attributes>[^>]+)>(?<Content>.+?)</\k<Tag>>
<(?<Tag>font|span)\b - opening < and matches the font or span tag and uses a named group of Tag. The \b matches a word boundary to ensure nothing beyond the tag names specified are matched.
(?<Attributes>[^>]+)> - named group, Attributes, matches everything else in the tag as long as it is not a > symbol, then it matches the closing >
(?<Content>.+?) - named group, Content, matches anything between the tag
</\k<Tag>> - matches the closing tag by back-referencing the Tag group
The second pattern is used to match key-value pairs for the attributes: (?<Key>\b[a-zA-Z]+)=""(?<Value>.+?)""
(?<Key>\b[a-zA-Z]+) - named group, Key, matches any word (alphabets) starting at a word boundary
="" - matches the equal symbol and opening quotation
(?<Value>.+?) - named group, Value, matches anything up till the closing quotation mark. It is non-greedy by specifying the ? symbol after the + symbol. It could've been [^""]+ similar to how the Attributes group was handled in the first pattern.
"" - matches the closing quotation
I don't think regular expressions are the way to go for this problem.
Stick to XML based technologies, such as XSLT to do the transformation.
You shouldn't try to parse HTML with regex. Use XML parsing instead.
I have found a solution to this issue, although it is not one that involves a regular expression (I am still very interested in the idea of creating a custom program with a GUI to accomplish this). The link below provides the easiest way to convert any deprecated font tags to inline span tags. This is a crucial and awesome tool.
http://tinymce.moxiecode.com/tryit/full.php
Clicking on html will show the html code for the message. Then you can replace that with the html that has the deprecated <font> tags and they will be converted to inline <span> tags.
It might be a good idea to explain why you need to do this, as unless there's a particular goal, this seems to turn one kind of non-semantic code into another kind of non-semantic code.
Might the time be better spent converting to separate HTML and CSS code, based on class and id attributes?
I agree with both comments saying xslt should be used for xml transformation and that style shouldn't be mixed in html... but here is a starting point for your regex (perl, I don't know any VB but it shouldn't be too far) if you're in a hurry :
's/<font(.*)size="([^ ]*)"(.*)color="([^ ]*)"(.*)<\/font>/<span$1style="font-size:$2px;color:$4"$3$5<\/span>/g'
I don't think you can do this in one regex, this one handles the case where size comes before color, you can derive the 3 missing regex from here...
How can I add a line break to the text area in an HTML page?
I use VB.NET for server-side coding.
If it's not VB you can use &#13;&#10;
(ASCII codes for CR, LF)
Add a linefeed ("\n") to the output:
<textarea>Hello
Bybye</textarea>
Will have a newline in it.
You could use \r\n, or System.Environment.NewLine.
If you're inserting text from a database or such (which one usually does), convert all "<br />"s to vbCrLf. Works great for me :)
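The same conversion is easy to sketch outside VB. The Python example below assumes the stored text uses <br> or <br /> as its line-break marker; the helper name textarea_markup and the pattern are my own, and the remaining text is escaped so that a stray < or & cannot break the markup.

```python
import html
import re

# Matches <br>, <br/> and <br /> in any letter case
BR_RE = re.compile(r"<br\s*/?>", re.IGNORECASE)


def textarea_markup(stored_text: str) -> str:
    """Build a textarea element whose content shows real line breaks."""
    # Turn the stored break markers into CR LF pairs
    with_newlines = BR_RE.sub("\r\n", stored_text)
    # Escape &, < and > in what is left; quotes are harmless inside a textarea
    return "<textarea>" + html.escape(with_newlines, quote=False) + "</textarea>"


print(repr(textarea_markup("Line 1<br />Line 2")))
# prints: '<textarea>Line 1\r\nLine 2</textarea>'
```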
In a text area, as in the form input, then just a normal line break will work:
<textarea>
This is a text area
line breaks are automatic
</textarea>
If you're talking about normal text on the page, the <br /> (or just <br> if using plain 'ole HTML4) is a line break.
However, I'd say that you often don't actually want a line break. Usually, your text is separated into paragraphs:
<p>
This is some text
</p>
<p>
This is some more
</p>
Which is much better because it gives a clue as to how your text is structured to machines that read it. Machines that read it include screen readers for the partially sighted or blind; separating text into paragraphs gives it a chance of being presented correctly to these users.
I believe this will work:
TextArea.Text = "Line 1" & vbCrLf & "Line 2"
System.Environment.NewLine could be used in place of vbCrLf if you wanted to be a little less VB6 about it.
Escape sequences like "\n" work fine, even with a text area! I passed a Java string containing "\n" to an HTML textarea and it worked just as it does on the console in Java.
Here is my method made with pure PHP and CSS :
/** PHP code */
<?php
$string = "the string with linebreaks";
$string = strtr($string,array("."=>".\r\r",":"=>" : \r","-"=>"\r - "));
?>
And the CSS :
.your_textarea_class {
  /* keep line breaks in the text and still wrap long lines */
  white-space: pre-wrap;
}
You can do the same with regex (I'm learning how to build regexes with preg_replace using an associative array, which seems to be better for adding the \n\r that makes the breaks display).