Unicode, VBScript and HTML - html

I have the following radio button:
<input type="radio" value="&#39321;">香</input>
As you can see, the value is a Unicode numeric character reference. It represents the following Chinese character: 香
So far so good.
I have a VBScript that reads the value of that particular radio button and saves it into a variable. When I display the content with a message box, the Chinese character appears. Additionally, I have a variable called uniVal to which I assign the character reference of the Chinese character directly:
radioVal = < read value of radio button >
MsgBox radioVal ' yields Chinese character
uniVal = "&#39321;"
MsgBox uniVal ' yields Unicode representation
Is there a way to read the radio button value so that the Unicode reference string is preserved and NOT interpreted as the Chinese character?
Of course, I could try to recreate the reference from the character, but the methods I found in VBScript do not work correctly due to VBScript's implicit UTF-16 setting (instead of UTF-8). So the following function does not work correctly for all characters:
Function StringToUnicode(str)
result = ""
For x=1 To Len(str)
result = result & "&#"&ascw(Mid(str, x, 1))&";"
Next
StringToUnicode = result
End Function
Cheers
Chris

I found a solution:
This JavaScript function actually works:
function convert(value) {
var tstr = value;
var bstr = '';
for(var i=0; i<tstr.length; i++) { // var keeps i local instead of creating a global
if(tstr.charCodeAt(i)>127)
{
bstr += '&#' + tstr.charCodeAt(i) + ';';
}
else
{
bstr += tstr.charAt(i);
}
}
return bstr;
}
I call this function from my VBScript... :)
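One caveat with the loop above: charCodeAt works on UTF-16 code units, so a character outside the Basic Multilingual Plane, such as U+1F600, comes out as two surrogate references (&#55357;&#56832;) instead of one. The sketch below is a variation (toNumericEntities is a hypothetical name, not part of any library) that walks the string by code point with for...of and codePointAt. It assumes an ES2015+ engine; the classic JScript engine available to VBScript does not have codePointAt, so this only applies where a modern runtime is available.

```javascript
// Convert every character above U+007F into a decimal numeric character reference.
// for...of walks the string by code point, so surrogate pairs stay together.
function toNumericEntities(str) {
  let out = "";
  for (const ch of str) {
    const cp = ch.codePointAt(0); // full code point, not a UTF-16 code unit
    out += cp > 127 ? "&#" + cp + ";" : ch;
  }
  return out;
}

console.log(toNumericEntities("A香")); // A&#39321;
console.log(toNumericEntities("😀")); // &#128512;
```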

Here is a VBScript function that always returns a positive value for the Unicode code point of a given character:
Function PositiveUnicode(s)
Dim val : val = AscW(s)
If (val And &h8000) <> 0 Then
PositiveUnicode = (val And &h7FFF) + &h8000&
Else
PositiveUnicode = CLng(val)
End If
End Function
This saves you from loading two script engines just to achieve a simple operation.
"not working correctly due to VBScript's implicit UTF-16 setting (instead of UTF-8)."
This issue has nothing to do with UTF-8. It is purely the result of AscW returning a signed 16-bit Integer: characters above U+7FFF (such as 香, U+9999) come back as negative numbers.
As to why you have to recreate the &#xxxxx; encodings that you sent: this is a result of how HTML (and XML) work. A numeric character reference is a convenience that the specification does not require to remain intact. Since the document's character encoding is quite capable of representing that character, the DOM is at liberty to convert it.
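The arithmetic in PositiveUnicode can be checked outside VBScript. The JavaScript sketch below is only an illustration: ascWSigned is a hypothetical helper that mimics AscW by reinterpreting a UTF-16 code unit as a signed 16-bit value, and positiveUnicode mirrors the VBScript function.

```javascript
// Mimic AscW: code units above 0x7FFF wrap around to negative values.
function ascWSigned(ch) {
  const unit = ch.charCodeAt(0);
  return unit > 0x7fff ? unit - 0x10000 : unit;
}

// Same logic as PositiveUnicode: if the sign bit is set,
// keep the low 15 bits and add the 0x8000 bit back as a positive value.
function positiveUnicode(val) {
  return (val & 0x8000) !== 0 ? (val & 0x7fff) + 0x8000 : val;
}

const signed = ascWSigned("香");
console.log(signed); // -26215
console.log(positiveUnicode(signed)); // 39321, i.e. U+9999
```

For a negative result, adding 0x10000 (65536) gives the same answer, which may be easier to read.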

Related

Find the word and replace with html tag using regex

I have a text equation like: 10x^2-8y^2-7k^4=0.
How can I find the ^ and replace it with <sup>2</sup> throughout the string using regex? The result should be like:
10x<sup>2</sup>-8y<sup>2</sup>-7k<sup>4</sup>=0
I tried str = str.replace(/\^\s/g, "<sup>$1</sup> ") but I’m not getting the expected result.
Any ideas that can help to solve my problem?
I think you're looking for something like
\^(\d+)
It matches the ^, captures the exponent, and replaces it with
<sup>$1</sup>
See it here at regex101.
Edit:
To meet your new demands, check this fiddle. It also handles subscripts by passing a function to replace.
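The fiddle itself is not reproduced here, so the following is only a guess at the callback approach it describes: formatScripts (a hypothetical name) captures the operator and the digits after it, and the callback picks sup for ^ and sub for _.

```javascript
// Wrap digits after ^ in <sup> and digits after _ in <sub>.
function formatScripts(str) {
  return str.replace(/([\^_])(\d+)/g, function (match, op, digits) {
    const tag = op === "^" ? "sup" : "sub";
    return "<" + tag + ">" + digits + "</" + tag + ">";
  });
}

console.log(formatScripts("10x^2-8y^2-7k^4=0"));
// 10x<sup>2</sup>-8y<sup>2</sup>-7k<sup>4</sup>=0
console.log(formatScripts("a_1^2")); // a<sub>1</sub><sup>2</sup>
```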
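Your current pattern matches a caret followed by a whitespace character (space, tab, newline, etc.), and since it has no capturing group, $1 has nothing to refer to. What you want is to match a caret followed by either a single character or multiple characters wrapped in curly braces, as your string is in TeX notation.
/\^(?:(\w)|\{([^}]+)\})/g
Now, using str = str.replace(/\^(?:(\w)|\{([^}]+)\})/g, "<sup>$1$2</sup>"); should do the job: $1 holds a single-character exponent and $2 the contents of the braces, and only one of them is set for any given match.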
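To see the difference, the sketch below runs the question's pattern and a TeX-aware pattern over the same input. It uses a variant that accepts any non-brace text inside the braces and a replacement that references both groups; since only one group participates in each match, the other expands to an empty string.

```javascript
const equation = "10x^2-8y^2-7k^{10}=0";

// Question's pattern: a caret followed by whitespace, which never occurs here.
const before = equation.replace(/\^\s/g, "<sup>$1</sup> ");

// TeX-style pattern: ^x or ^{...}; $1 or $2 holds the exponent.
const after = equation.replace(/\^(?:(\w)|\{([^}]+)\})/g, "<sup>$1$2</sup>");

console.log(before); // 10x^2-8y^2-7k^{10}=0 (unchanged)
console.log(after); // 10x<sup>2</sup>-8y<sup>2</sup>-7k<sup>10</sup>=0
```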
You can make a more generic function from this expression that can wrap characters prefixed by a specific character with a specific tag.
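function wrapPrefixed(string, prefix, tagName) {
// "g" wraps every occurrence; $1 is a single character, $2 the braced content (only one is set per match)
return string.replace(new RegExp("\\" + prefix + "(?:(\\w)|\\{([^}]+)\\})", "g"), "<" + tagName + ">$1$2</" + tagName + ">");
}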
For instance, calling wrapPrefixed("1_2 + 4_{3+2}", "_", "sub"); results in 1<sub>2</sub> + 4<sub>3+2</sub>.

text field greater than or equal to having weird effects in AS3

I have a trigger set up like this:
var thiseffect:Boolean = false;
if (thistx.text >="6" && thistx.text <="12")
{ thiseffect = true; }
The trigger does not activate in this case. However, if I change the 12 in this condition to a value below 10, OR change the 6 to something greater than 10, it triggers with no problem.
I'm not really sure why that is. Has anyone encountered this before?
This isn't exactly an answer, but rather a workaround:
I have converted my text input into a number variable and the trigger activates with no problem now
var thiseffect:Boolean = false;
var mynum:Number = Number(thistx.text);
if (mynum>=6 && mynum<=12)
{ thiseffect = true; }
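A slightly more defensive version of the same workaround, sketched in JavaScript: isInRange is a hypothetical helper that also rejects empty or non-numeric text, since in JavaScript Number("") is 0 and Number("abc") is NaN. Whether ActionScript's Number() behaves identically for every input is an assumption worth checking.

```javascript
// Parse the text as a number and check it against an inclusive range.
function isInRange(text, min, max) {
  const trimmed = text.trim();
  if (trimmed === "") return false; // Number("") would be 0, not "no value"
  const n = Number(trimmed);
  return !Number.isNaN(n) && n >= min && n <= max;
}

console.log(isInRange("8", 6, 12)); // true
console.log(isInRange("13", 6, 12)); // false
console.log(isInRange("abc", 6, 12)); // false
```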
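You can use the following operators to compare strings: <, <=, !=, ==, >=, and >.
But note: when using these operators with strings, ActionScript considers the character code value of each character in the string, comparing characters from left to right.
So in your example, the text is compared character by character, not by its numeric value. thistx.text >= "6" can only hold if the first character is "6" or higher, while thistx.text <= "12" can only hold if the first character is "1" or lower (for example, "8" <= "12" is false because "8" > "1"), so both conditions can never be true at once.
trace("12" <= "6"); // evaluates true
trace("12" <= "06"); // evaluates false
Refer to the Adobe ActionScript documentation for details.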
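JavaScript compares two strings the same way (code unit by code unit, left to right), so the effect is easy to reproduce. The sketch filters a few sample inputs with the question's string comparison and with a numeric one.

```javascript
const inputs = ["6", "8", "10", "12"];

// String comparison: "8" <= "12" is false because "8" > "1".
const asStrings = inputs.filter((s) => s >= "6" && s <= "12");

// Numeric comparison, as in the Number() workaround.
const asNumbers = inputs.filter((s) => Number(s) >= 6 && Number(s) <= 12);

console.log(asStrings); // []
console.log(asNumbers); // [ '6', '8', '10', '12' ]
```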

iTextSharp HTML to PDF preserving spaces

I am using FreeTextBox.dll to get user input, and storing that input in HTML format in the database. A sample of the user's input is below:
                                                                     133 Peachtree St NE                                                                     Atlanta,  GA 30303                                                                     404-652-7777                                                                      Cindy Cooley                                                                     www.somecompany.com                                                                     Product Stewardship Mgr                                                                     9/9/2011Deidre's Company123 Test StAtlanta, GA 30303Test test.  
I want HTMLWorker to preserve the whitespace the user enters, but it strips it out. Is there a way to preserve the user's whitespace? Below is an example of how I am creating my PDF document:
Public Shared Sub CreatePreviewPDF(ByVal vsHTML As String, ByVal vsFileName As String)
Dim output As New MemoryStream()
Dim oDocument As New Document(PageSize.LETTER)
Dim writer As PdfWriter = PdfWriter.GetInstance(oDocument, output)
Dim oFont As New Font(Font.FontFamily.TIMES_ROMAN, 8, Font.NORMAL, BaseColor.BLACK)
Using output
Using writer
Using oDocument
oDocument.Open()
Using sr As New StringReader(vsHTML)
Using worker As New html.simpleparser.HTMLWorker(oDocument)
worker.StartDocument()
worker.SetInsidePRE(True)
worker.Parse(sr)
worker.EndDocument()
worker.Close()
oDocument.Close()
End Using
End Using
HttpContext.Current.Response.ContentType = "application/pdf"
HttpContext.Current.Response.AddHeader("Content-Disposition", String.Format("attachment;filename={0}.pdf", vsFileName))
HttpContext.Current.Response.BinaryWrite(output.ToArray())
HttpContext.Current.Response.End()
End Using
End Using
output.Close()
End Using
End Sub
There's a glitch in iText and iTextSharp, but you can fix it pretty easily if you don't mind downloading the source and recompiling it. You need to make changes to two files. Any changes I've made are commented inline in the code. Line numbers are based on the 5.1.2.0 code, rev 240.
The first is in iTextSharp.text.html.HtmlUtilities.cs. Look for the function EliminateWhiteSpace at line 249 and change it to:
public static String EliminateWhiteSpace(String content) {
// multiple spaces are reduced to one,
// newlines are treated as spaces,
// tabs, carriage returns are ignored.
StringBuilder buf = new StringBuilder();
int len = content.Length;
char character;
bool newline = false;
bool space = false;//Detect whether we have written at least one space already
for (int i = 0; i < len; i++) {
switch (character = content[i]) {
case ' ':
if (!newline && !space) {//If we are not at a new line AND ALSO did not just append a space
buf.Append(character);
space = true; //flag that we just wrote a space
}
break;
case '\n':
if (i > 0) {
newline = true;
buf.Append(' ');
}
break;
case '\r':
break;
case '\t':
break;
default:
newline = false;
space = false; //reset flag
buf.Append(character);
break;
}
}
return buf.ToString();
}
The second change is in iTextSharp.text.xml.simpleparser.SimpleXMLParser.cs. In the function Go (line 185), change line 248 to:
if (html /*&& nowhite*/) {//removed the nowhite check from here because that should be handled by the HTML parser later, not the XML parser
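To check what the modified EliminateWhiteSpace does to a given input without rebuilding iTextSharp, here is a line-by-line JavaScript transliteration. It is a sketch for experimentation, not iTextSharp code. As the sample calls show, runs of spaces still collapse to a single space, and spaces right after a newline are dropped.

```javascript
// Transliteration of the modified C# EliminateWhiteSpace.
function eliminateWhiteSpace(content) {
  let buf = "";
  let newline = false; // true right after a '\n' was turned into a space
  let space = false; // true right after a space was written
  for (let i = 0; i < content.length; i++) {
    const ch = content[i];
    switch (ch) {
      case " ":
        if (!newline && !space) {
          buf += ch;
          space = true;
        }
        break;
      case "\n":
        if (i > 0) {
          newline = true;
          buf += " ";
        }
        break;
      case "\r":
      case "\t":
        break; // carriage returns and tabs are ignored
      default:
        newline = false;
        space = false;
        buf += ch;
    }
  }
  return buf;
}

console.log(JSON.stringify(eliminateWhiteSpace("a  b\nc\td"))); // "a b cd"
console.log(JSON.stringify(eliminateWhiteSpace("a\n  x"))); // "a x"
```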
Thanks for the help, everyone. I was able to find a small workaround by doing the following:
vsHTML.Replace(" ", "&nbsp;").Replace(Chr(9), "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;").Replace(Chr(160), "&nbsp;").Replace(vbCrLf, "<br />")
The first Replace converts each space to &nbsp;, the second converts each tab (Chr(9)) to five &nbsp; entities, the third converts each non-breaking space character (Chr(160)) to &nbsp;, and the last converts line breaks to <br /> tags.
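The same replacement chain, sketched in JavaScript for reference (preserveWhitespace is a hypothetical name). One assumption to note: it is meant for plain text, because replacing every space would also rewrite the spaces inside tags such as <p class="x">. JavaScript's replace with a string pattern only replaces the first match, so global regular expressions are used to get the all-occurrences behaviour of .NET's String.Replace.

```javascript
// Make whitespace survive HTML rendering by turning it into entities and tags.
function preserveWhitespace(text) {
  return text
    .replace(/ /g, "&nbsp;") // every space
    .replace(/\t/g, "&nbsp;".repeat(5)) // tab -> five non-breaking spaces
    .replace(/\u00A0/g, "&nbsp;") // Chr(160), the non-breaking space character
    .replace(/\r\n/g, "<br />"); // Windows line break (vbCrLf)
}

console.log(preserveWhitespace("133 Peachtree St NE\r\nAtlanta,  GA"));
// 133&nbsp;Peachtree&nbsp;St&nbsp;NE<br />Atlanta,&nbsp;&nbsp;GA
```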
I would recommend using wkhtmltopdf instead of iText. wkhtmltopdf outputs the HTML exactly as rendered by WebKit (Google Chrome, Safari) instead of relying on iText's conversion. It is just a binary that you can call. That said, I would also check the HTML to make sure the user input still contains paragraphs and/or line breaks; they might be stripped out before the conversion.

Actionscript problems with social share encoding

I'm trying to make some "social share" buttons on my site, but the URLs I generate aren't decoded correctly by these services.
One example, for twitter:
private function twitter(e:Event):void {
var message:String = "Message with special chars âõáà";
var url:String = "http://www.twitter.com/home?status=";
var link:URLRequest = new URLRequest( url + escape(message) );
}
But when twitter opens up, the message is:
Message with special chars
%E2%F5%E1%E0
Something similar is happening with Facebook and Orkut (but these two hide the special chars).
Does anyone know why this is happening?
The problem is that the escape() function doesn't take UTF-8 encoding into account. The function you want for encoding the query string as UTF-8 is encodeURIComponent().
So, let's say you have an "ñ" (eñe in Spanish, i.e. n with a tilde). I'm using "ñ" because I remember both its code point and its UTF-8 representation, since I always use it for debugging, but the same applies to other non-ASCII characters.
Say you have the string "Año" ("year" in Spanish, by the way).
The code points (the same in both Unicode and ISO-8859-1) are:
A: 0x41
ñ: 0xf1
o: 0x6f
If you call escape(), you'll get this:
A: A
ñ: %F1
o: o
"A" and "o" don't need to be encoded. The "ñ" is encoded as "%" plus its code point, which is 0xf1.
But Twitter, Facebook, etc. expect UTF-8. A lone 0xF1 byte is not a valid UTF-8 sequence; this character must be represented as a 2-byte sequence. That is, "ñ" should be encoded as:
0xC3
0xB1
This is what encodeURIComponent does. It encodes "Año" this way:
A: A
ñ: %C3%B1
o: o
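The two bytes follow from the UTF-8 rule for code points U+0080 to U+07FF: the top 5 bits go into a 110xxxxx lead byte and the low 6 bits into a 10xxxxxx continuation byte. utf8TwoByte below is a hypothetical helper that handles only this case; TextEncoder (available in browsers and Node.js) is used to cross-check it.

```javascript
// Encode a code point in the U+0080..U+07FF range as two UTF-8 bytes.
function utf8TwoByte(codePoint) {
  if (codePoint < 0x80 || codePoint > 0x7ff) {
    throw new RangeError("code point does not use a 2-byte UTF-8 sequence");
  }
  const lead = 0xc0 | (codePoint >> 6); // 110xxxxx
  const continuation = 0x80 | (codePoint & 0x3f); // 10xxxxxx
  return [lead, continuation];
}

const hex = (bytes) => bytes.map((b) => b.toString(16).toUpperCase());
console.log(hex(utf8TwoByte(0xf1))); // [ 'C3', 'B1' ]
console.log(hex(Array.from(new TextEncoder().encode("ñ")))); // [ 'C3', 'B1' ]
```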
So, to sum up, instead of this:
var link:URLRequest = new URLRequest( url + escape(message) );
try this
var link:URLRequest = new URLRequest( url + encodeURIComponent(message) );
And it should work fine.
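JavaScript has the same two global functions, so the difference can be seen directly. Assuming ActionScript's escape() treats these Latin-1 characters the same way (which matches the %E2%F5%E1%E0 output in the question), running both over the question's message gives:

```javascript
const message = "Message with special chars âõáà";

// escape(): one %XX per character, using the Latin-1 code point.
console.log(escape(message));
// Message%20with%20special%20chars%20%E2%F5%E1%E0

// encodeURIComponent(): each character as its UTF-8 bytes.
console.log(encodeURIComponent(message));
// Message%20with%20special%20chars%20%C3%A2%C3%B5%C3%A1%C3%A0
```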

replace keyword within html string

I am looking for a way to replace keywords within an HTML string with a variable. At the moment I am using the following approach:
returnString = Replace(message, "[CustomerName]", customerName, CompareMethod.Text)
This works fine if the HTML formatting wraps the whole keyword,
e.g.
<b>[CustomerName]</b>
However if the formatting of the keyword is split throughout the word, the string is not found and thus not replaced.
e.g.
<b>[Customer</b>Name]
The formatting of the string is out of my control and isn't foolproof. With this in mind, what is the best approach to find a keyword within an HTML string?
Try using a regular expression. You can build and test your expressions on this site; I used it and it works well:
http://regex-test.com/validate/javascript/js_match
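One way such a regex could look is sketched below; replaceKeywordAcrossTags and escapeRegExp are hypothetical helpers, not part of any library. The idea is to allow any number of tags between the keyword's characters, then re-emit the tags that were swallowed after the replacement value so opening and closing tags stay paired. It assumes tags never contain a literal > inside attribute values.

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace a keyword even when tags are interleaved with its characters.
function replaceKeywordAcrossTags(html, keyword, value) {
  // Between any two characters of the keyword, allow zero or more tags.
  const pattern = Array.from(keyword).map(escapeRegExp).join("(?:<[^>]*>)*");
  // "i" mirrors the case-insensitive CompareMethod.Text used in the question.
  return html.replace(new RegExp(pattern, "gi"), function (match) {
    // Re-emit the tags found inside the match so open/close tags stay paired.
    const tags = match.match(/<[^>]*>/g) || [];
    return value + tags.join("");
  });
}

console.log(replaceKeywordAcrossTags("<b>[Customer</b>Name]", "[CustomerName]", "Alice"));
// <b>Alice</b>
```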
Use the textContent property instead of innerHTML if you're using JavaScript to access the content. That removes all tags from the content, giving you back a clean text representation of the customer's name.
For example, if the content looks like this:
<div id="name">
<b>[Customer</b>Name]
</div>
Then reading its textContent property gives:
var name = document.getElementById("name").textContent.trim();
// sets name to "[CustomerName]" without the tags (trim() drops the surrounding line breaks)
which should be easy to process. Do a regex search now if you need to.
Edit: Since you're doing this processing on the server side, process the XML recursively and collect the text of each node. Since I'm not big on VB.Net, here's some pseudocode:
getNodeText(node) {
text = ""
for each node.children as child {
if child.type == TextNode {
text += child.text
}
else {
text += getNodeText(child);
}
}
return text
}
myXml = xml.load(<html>);
print getNodeText(myXml);
Then do the replacement, or whatever else needs to be done!
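The pseudocode translates almost directly into JavaScript. The node shape used here ({ type, text, children }) is an assumption made for the sake of a runnable example; a real DOM or XML parser would supply its own node objects and type constants.

```javascript
// Concatenate the text of all descendant text nodes, depth first.
function getNodeText(node) {
  let text = "";
  for (const child of node.children) {
    if (child.type === "text") {
      text += child.text;
    } else {
      text += getNodeText(child);
    }
  }
  return text;
}

// <div><b>[Customer</b>Name]</div> as a hand-built tree (assumed node shape).
const div = {
  type: "element",
  children: [
    { type: "element", children: [{ type: "text", text: "[Customer" }] },
    { type: "text", text: "Name]" },
  ],
};

console.log(getNodeText(div)); // [CustomerName]
```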
I have found what I believe is a solution to this issue; at least in my scenario, it works.
I tweaked the HTML input to place each custom field or keyword within a div with a set id. I then loop through all of the elements in the HTML string using MSHTML and set the inner text to the correct value when a match is found.
e.g.
Imports mshtml
Function ReplaceDetails(ByVal message As String, ByVal customerName As String) As String
Dim returnString As String = String.Empty
Dim doc As IHTMLDocument2 = New HTMLDocument
doc.write(message)
doc.close()
For Each el As IHTMLElement In doc.body.all
If (el.id = "Date") Then
el.innerText = Now.ToShortDateString
End If
If (el.id = "CustomerName") Then
el.innerText = customerName
End If
Next
returnString = doc.body.innerHTML
Return returnString
End Function
Thanks for all of the input. I'm glad to have a solution to the problem.