Escape HTML text - html

I am writing html files from a stack. This is a bit of a pain because for every line I have to write something like the following if the file contains quotes.
write "<div id=hidden-" & quote & myKanton & quote && "style=" & quote & "display:block;" &quote&&"class=" &quote & "popuptable" &quote& ">" & LF to file tOutputFileCH
Now I have to add a lot of html code again and I'm wondering if there is an easier way to be able to do something like:
write escaped("my html numbers and "txt" with quotes") to file
I do not need variables within the html text.

Often, people use functions like
function q theText
replace "'" with quote in theText
return theText
end q
which can be used as
write q("<div id=hidden-'" & myKanton & "' style='display:block;' class='popuptable'>" & LF) to file tOutputFileCH
You can use a string as in the example above, but you can also use any container:
get q(myVariable)
put q(it) into field 1
put q(field 1) into field 2
put q(url myUrl) into url myOtherUrl
put q(the cProperty of me) into myVar
-- etc etc etc
You can also use ´ or ` instead of ' if you change the q function.
By the way, I noticed that you don't include hidden- in the quotes. Are you sure that's correct?

HTML allows use of quotes and single quotes, so you can...
put "<div style='border:1px'>" into tHTML
LiveCode's format function allows you to escape double quotes...
put format("my html numbers and \"txt\" with quotes") into tData

It is working now. I put the html lines in a custom stack property and use that as input when writing the file. Works perfectly. It even seems to work without the q function.
write ( the cMapOverlay of stack "AfaConverter" ) & LF to file tOutputFileCH
I also tried that because a line such as
onmouseover="nhpup.popup($('#hidden-VS').html(), {'width': 400});" href="./kantone/index_kanton_VS.html"
causes trouble with q unless the function is adapted: every ' is replaced with ", which breaks the JavaScript inside the attribute.

There are some good answers here. Let me suggest another approach. You could use a quoting function, but in a slightly different way:
function q pString
return quote & pString & quote
end q
Then use the LiveCode merge() function. Merge evaluates any LiveCode expression or variable enclosed in [[ ]] and incorporates it into the enclosing quoted text:
write merge("my html numbers and [[q("txt")]]") to file

Related

Replacing HTML tags for plain tags does not work for me properly

So I have tried to replace the < angle bracket with the &lt; entity, so that I could simply display plain HTML tags on my webpage. However, when I do the following:
Visually select the line where I want to replace the bracket by pressing V
Then type: s/</&lt;/g
the problem is that it keeps the first bracket in place and only adds lt; after it.
But I want to replace the bracket in full, not just add lt;, because that won't display the plain HTML tags in the browser.
Please check the screenshot for further reference.
Why doesn't it work? What am I doing wrong?
From :help :s:
[...]
{string} can be a literal string, or something
special; see |sub-replace-special|.
[...]
Then, from :help sub-replace-special (slightly edited to reduce noise):
[...]
Otherwise these characters in {string} have a special meaning:
& replaced with the whole matched pattern
\& replaced with &
[...]
This means that & is a special character that is supposed to be used in the "replacement" part of a substitution to reuse the whole match. In this case:
:s/</&lt;/g
the whole match is < and the replacement is thus <lt;.
In order to use a literal & in the replacement, you must escape it:
:s/</\&lt;/g

Adding a signature in pdf in powerapps

I am trying to add a signature to a PDF in PowerApps using the PenInput field. I added the following to the HTML file, which is successfully converted to PDF through the flow:
<img src='"& JSON(PenInput4.Image; JSONFormat.IgnoreBinaryData) &"'
style='width:80px;height:50px'></img>
But in the PDF I see a box with an x in the top left corner.
Thank you!
The result of the JSON function already includes quotes, so you are "double-quoting" the image source. This is what you have:
<img src='"data:image/png;base64,..."' style='...'></img>
You can either use the double quotes (") that are returned by the JSON function:
"<img src=" & JSON(PenInput4.Image; JSONFormat.IncludeBinaryData) &
" style='width:80px;height:50px'></img>"
Or, if you want to use single quotes (') in your HTML document, you can keep your single quotes and remove the double quotes from the JSON output:
Set(penInputEncodedImage; JSON(PenInput4.Image; JSONFormat.IncludeBinaryData));;
...
"<img src='" &
Mid(penInputEncodedImage; 2; Len(penInputEncodedImage) - 2) &
"' style='width:80px;height:50px'></img>"
Just a final note: you are using JSONFormat.IgnoreBinaryData - the correct flag to be able to encode images should be JSONFormat.IncludeBinaryData. It seems to be working today, but that goes against the documentation so it is a bug that may be fixed someday.

encode/decodeURI for URLs with quotes

I'm having trouble displaying links to URLs with quotes in them and can't figure out a solution despite a load of examples on Stack Overflow! Here's the exact string I'm storing in my database (shows Adelaide Antarctica)
https://www.google.com/maps/place/67%C2%B007'27.3%22S+68%C2%B008'56.0%22W/#-67.1447827,-68.3886741,71373m/data=!3m1!1e3!4m5!3m4!1s0x0:0x0!8m2!3d-67.124258!4d-68.148903
When I just try putting that into a href it links to...
https://www.google.com/maps/place/67%C2%B007 (i.e. breaks at the first single quote)
But when I try using href="encodeURI(theLink)" or href="encodeURIComponent(theLink)" it links to the same thing (I even tried the decode options, in case I was thinking about it the wrong way, and had the same problem).
Does anyone have a recommendation on the best way to proceed here? I even tried the deprecated "escape" function which also won't work for me. Thanks for any thoughts at all!
(p.s. funnily enough as I'm writing this I see that even Stack Overflow's link is broken in exactly the same way - maybe it's not even possible?!)
EDIT: As requested by Clemzd - I'm using d3 to construct the links, so doing this...
anElement.append("text").html("<a href='" + myData[i].url + "'> a link name </a>");
Works great on everything but links with a single quote regardless of whether I do encodeURI(myData[i].url) or not
You use single quotes to delimit the value of the href attribute, so that value cannot contain unescaped single quotes. That's an issue with HTML markup encoding, not URL encoding.
You can either reverse your use of single and double quotes (encoded URLs cannot contain double quotes, but they can contain single quotes) or replace the single quotes in the URL with a character entity like &#39;. URL encoding as %27 would also work, but that's not an encoding that encodeURIComponent performs.
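A minimal sketch of why the encoding functions make no difference here, assuming a plain JavaScript environment: encodeURI and encodeURIComponent both leave ' untouched, so the quote still ends a single-quoted attribute. The helper name buildLink and the shortened sample URL are made up for illustration. The sketch delimits href with double quotes, which relies on the stored URL containing no literal double quote.

```javascript
// Shortened, hypothetical sample in the same shape as the stored URL.
const url = "https://www.google.com/maps/place/67%C2%B007'27.3%22S";

// Neither function escapes the single quote.
const quoteSurvivesEncodeURI = encodeURI("07'27").includes("'");
const quoteSurvivesComponent = encodeURIComponent("07'27").includes("'");

// Hypothetical helper: delimit href with double quotes instead of single quotes.
// Assumes the URL contains no literal " (here it is already encoded as %22).
function buildLink(href, label) {
  return '<a href="' + href + '">' + label + '</a>';
}

console.log(quoteSurvivesEncodeURI, quoteSurvivesComponent);
console.log(buildLink(url, "a link name"));
```

With the d3 code from the question, the result of buildLink(myData[i].url, " a link name ") could be passed to .html(...) in place of the hand-built string.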
There are many ways to solve your issue. All you need to know is that if your input may contain ' then you have to escape this character. Otherwise you will get something like anElement.append("text").html("<a href='" + https://www.google.com/maps/place/'link + "'> a link name </a>"); which can't be parsed because of the '.
If you are sure that your link will never contain ", then change your code to use " as the attribute delimiter instead.
If not, you can escape ' on the server side or the client side. For example, on the client side you can do:
function escapeHtmlAttribute(input){
// Escape & first so the entities added below are not escaped a second time
return input.replace(/&/g, "&amp;")
.replace(/'/g, "&#39;")
.replace(/"/g, "&quot;")
.replace(/</g, "&lt;")
.replace(/>/g, "&gt;");
}
And then use it like this: anElement.append("text").html("<a href='" + escapeHtmlAttribute(myData[i].url) + "'> a link name </a>");

AppleScript: substring to string or format html

I'm working on my AppleScript right now and I'm stuck here. Let's take this snippet as an example of HTML code:
<body><div>Apple don't behave accordingly <a href = "http://apple.com">apple</a></div></body>
What I need now is to return the word without the html tags. Either by deleting the bracket with everything in it or maybe there is any other way to reformat html into plain text..
The result should be:
Apple don't behave accordingly apple
Thought I would add an extra answer because of the problem I had. If you want UTF-8 characters to not get lost you need:
set plain_text to do shell script "echo " & quoted form of ("<!DOCTYPE HTML PUBLIC><meta charset=\"UTF-8\">" & html_string) & space & "| textutil -convert txt -stdin -stdout"
You basically need to add the <meta charset=\"UTF-8\"> meta tag to make sure textutil sees this as a UTF-8 document.
How about using textutil?
on run -- example (don't forget to escape quotes)
removeMarkup from "<body><div>Apple don't behave accordingly <a href = \"http://apple.com\">apple</a></div></body>"
end run
to removeMarkup from someText -- strip HTML using textutil
set someText to quoted form of ("<!DOCTYPE HTML PUBLIC>" & someText) -- fake a HTML document header
return (do shell script "echo " & someText & " | /usr/bin/textutil -stdin -convert txt -stdout") -- strip HTML
end removeMarkup
on findStrings(these_strings, search_string)
set the foundList to {}
repeat with this_string in these_strings
considering case
if the search_string contains this_string then set the end of the foundList to this_string
end considering
end repeat
return the foundList
end findStrings
findStrings({"List","Of","Strings","To","find..."}, "...in String to search")

Regular expression from font to span (size and colour) and back (VB.NET)

I am looking for a regular expression that can convert my font tags (only with size and colour attributes) into span tags with the relevant inline css. This will be done in VB.NET if that helps at all.
I also need a regular expression to go the other way as well.
To elaborate below is an example of the conversion I am looking for:
<font size="10">some text</font>
To then become:
<span style="font-size:10px;">some text</span>
So converting the tag and putting a "px" at the end of whatever the font size is (I don't need to change/convert the font size, just stick px at the end).
The regular expression needs to cope with a font tag that only has a size attribute, only a color attribute, or both:
<font size="10">some text</font>
<font color="#000000">some text</font>
<font size="10" color="#000000">some text</font>
<font color="#000000" size="10">some text</font>
I also need another regular expression to do the opposite conversion. So for example:
<span style="font-size:10px;">some text</span>
Will become:
<font size="10">some text</font>
As before converting the tag but this time removing the "px", I don't need to worry about changing the font size.
Again, this will also need to cope with the size styling, colour styling, and a combination of both:
<span style="font-size:10px;">some text</span>
<span style="color:#000000;">some text</span>
<span style="font-size:10px; color:#000000;">some text</span>
<span style="color:#000000; font-size:10px;">some text</span>
I am extracting basic HTML & text from CDATA tags in an XML file and then displaying them on a web-page. The text also appears in a rich-text editor so it can be edited/translated, and then saved back into a new XML file. The XML is then going to be read by a flash file, hence the need to use old-fashioned HTML.
The reason I want to convert this code is mainly for display purposes. In order to show the text sizes correctly and for it to work with my rich text editor they need to be converted to XHTML/inline CSS. The rich text editor will also generate XHTML/inline CSS that I need to convert 'back' to standard HTML before it is saved in the XML file.
I don't know a lot about XSLT transformation but I'm not sure that is what I need for this, or it might be more than I need right now, but please correct me if I'm wrong (and point me in the direction of any helpful links you may have on it).
I know the temptation will be to tell me a number of different ways to set up my code to do what I want but there are so many other permutations I haven't even mentioned which have forced me down this route, so literally all I want to do is convert a string containing standard HTML to XHTML/inline CSS, and then the same but the other way round.
Since some people have already given you warnings I'll skip ahead to the regex solution.
First off, I'll lay out a couple of assumptions that aren't set in stone but allow the problem to be approached as you presented it without me doing extra work:
1. You can use LINQ (otherwise this will need to be updated).
2. Font/span tags will be in lowercase (font and span, not FONT or SpAn).
3. Each style attribute value will be properly formatted, ending with a semicolon (;) as in your samples.
Case-insensitivity can be worked in rather simply via RegexOptions.IgnoreCase although, in turn, the dictionary values will need to be stored via ToLower to keep everything consistent when the values are later accessed. The 3rd point ensures splitting the text doesn't go haywire.
Below is a sample program that demonstrates the replacements.
' Requires Imports System.Linq and Imports System.Text.RegularExpressions at the top of the file
Sub Main
Dim inputs As String() = { _
"<font size=""10"">some text</font>", _
"<font color=""#000000"">some text</font>", _
"<font size=""10"" color=""#000000"">some text</font>", _
"<font color=""#000000"" size=""10"">some text</font>", _
"<font size=""10"">some text</font> other text <font color=""#000000"">some text</font>", _
"<span style=""font-size:10px;"">some text</span>", _
"<span style=""color:#000000;"">some text</span>", _
"<span style=""font-size:10px; color:#000000;"">some text</span>", _
"<span style=""color:#000000; font-size:10px;"">some text</span>", _
"<span style=""color:#000000; font-size:10px;"">some text</span> other <font color=""#000000"" size=""10"">some text</font>" _
}
Dim pattern As String = "<(?<Tag>font|span)\b(?<Attributes>[^>]+)>(?<Content>.+?)</\k<Tag>>"
Dim rx As New Regex(pattern)
For Each input As String In inputs
Dim result As String = rx.Replace(input, AddressOf TransformTags)
Console.WriteLine("Before: " & input)
Console.WriteLine("After: " & result)
Console.WriteLine()
Next
End Sub
Public Function TransformTags(ByVal m As Match) As String
Dim rx As New Regex("(?<Key>\b[a-zA-Z]+)=""(?<Value>.+?)""")
Dim attributes = rx.Matches(m.Groups("Attributes").Value).Cast(Of Match)() _
.ToDictionary(Function(attribute) attribute.Groups("Key").Value, _
Function(attribute) attribute.Groups("Value").Value)
If m.Groups("Tag").Value = "font" Then
Dim newAttributes = String.Join("; ", attributes.Select(Function(item) _
If(item.Key = "size", "font-size", item.Key) _
& ":" _
& If(item.Key = "size", item.Value & "px", item.Value)) _
.ToArray()) _
& ";"
Return "<span style=""" & newAttributes & """>" & m.Groups("Content").Value & "</span>"
Else
Dim newAttributes = String.Join(" ", attributes("style") _
.Split(New Char() {";"c}, StringSplitOptions.RemoveEmptyEntries) _
.Select(Function(s) _
s.Trim().Replace("px", "").Replace("font-", "").Replace(":", "=""") _
& """") _
.ToArray())
Return "<font " & newAttributes & ">" & m.Groups("Content").Value & "</font>"
End If
End Function
If you have any questions let me know. Some enhancements can be made if a large amount of text is expected to be processed. For example, the regex object in the TransformTags method can be moved to the class level so it isn't recreated on every transformation.
EDIT: Here's the explanation of the first pattern: <(?<Tag>font|span)\b(?<Attributes>[^>]+)>(?<Content>.+?)</\k<Tag>>
<(?<Tag>font|span)\b - matches the opening < and the font or span tag name, capturing it in a named group, Tag. The \b matches a word boundary to ensure nothing beyond the specified tag names is matched.
(?<Attributes>[^>]+)> - named group, Attributes, matches everything else in the tag as long as it is not a > symbol, then it matches the closing >
(?<Content>.+?) - named group, Content, matches anything between the tag
</\k<Tag>> - matches the closing tag by back-referencing the Tag group
The second pattern is used to match key-value pairs for the attributes: (?<Key>\b[a-zA-Z]+)=""(?<Value>.+?)""
(?<Key>\b[a-zA-Z]+) - named group, Key, matches any word (letters only) starting at a word boundary
="" - matches the equal symbol and opening quotation
(?<Value>.+?) - named group, Value, matches anything up till the closing quotation mark. It is non-greedy by specifying the ? symbol after the + symbol. It could've been [^""]+ similar to how the Attributes group was handled in the first pattern.
"" - matches the closing quotation
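For readers who want to try the two patterns outside VB.NET, here is a rough JavaScript sketch. It assumes an engine with named capture groups and String.prototype.matchAll (ES2020 or later), where the (?<Name>...) and \k<Name> syntax reads the same as in .NET; the doubled quotes of the VB string literal become single quotes. The function name parseTag is made up for illustration, and the sketch only covers the matching step, not the font/span conversion.

```javascript
// First pattern: tag name, raw attribute text, and content,
// with a back-reference to Tag for the closing tag.
const tagPattern = /<(?<Tag>font|span)\b(?<Attributes>[^>]+)>(?<Content>.+?)<\/\k<Tag>>/;
// Second pattern: key="value" pairs inside the attribute text.
const attrPattern = /(?<Key>\b[a-zA-Z]+)="(?<Value>.+?)"/g;

// Hypothetical helper: returns the pieces of the first font/span element, or null.
function parseTag(html) {
  const m = tagPattern.exec(html);
  if (m === null) return null;
  const attributes = {};
  for (const a of m.groups.Attributes.matchAll(attrPattern)) {
    attributes[a.groups.Key] = a.groups.Value;
  }
  return { tag: m.groups.Tag, attributes, content: m.groups.Content };
}

console.log(parseTag('<font size="10" color="#000000">some text</font>'));
```

Because the closing tag is matched through the back-reference, an element that opens as span and closes as font does not match at all.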
I don't think regular expressions are the way to go for this problem.
Stick to XML based technologies, such as XSLT to do the transformation.
You shouldn't try to parse HTML with regex. Use XML parsing instead.
I have found a solution to this issue. However, it is not one that involves using a regular expression, though I am very interested in the idea of creating a custom program in a GUI creation tool to accomplish this. The link below provides the easiest solution for converting any deprecated font tags to inline span tags. This is a crucial and awesome tool.
http://tinymce.moxiecode.com/tryit/full.php
Clicking on html will show the html code for the message. Then you can replace that with the html that has the deprecated <font> tags and they will be converted to inline <span> tags.
It might be a good idea to explain why you need to do this, as unless there's a particular goal, this seems to turn one kind of non-semantic code into another kind of non-semantic code.
Might the time be better spent converting to separate HTML and CSS code, based on class and id attributes?
I agree with both comments saying XSLT should be used for XML transformation and that style shouldn't be mixed into HTML... but here is a starting point for your regex (Perl; I don't know any VB, but it shouldn't be too far off) if you're in a hurry:
's/<font(.*)size="([^ ]*)"(.*)color="([^ ]*)"(.*)<\/font>/<span$1style="font-size:$2px;color:$4"$3$5<\/span>/g'
I don't think you can do this in one regex. This one handles the case where size comes before color; you can derive the 3 missing regexes from here...