Clipboard.GetText(TextDataFormat.Html) string length less than header EndHTML - html

Have some code that looks at HTML on the clipboard. It worked on XP but has the following problem on W7. The HTML clipboard header EndHTML value is greater than the string length of the resulting Clipboard.GetText. Both IE11 and Firefox have the same problem. To test browse to: https://stackexchange.com/
edit select all, and copy. Following code snippet to shows the problem. Maybe something to do with encoding?
Dim dto As IDataObject = Clipboard.GetDataObject()
Dim CBdata As String = ""
Dim startHTML As String = ""
Dim endHTML As String = ""
If dto.GetDataPresent(DataFormats.Html) Then
CBdata = Clipboard.GetText(TextDataFormat.Html)
Dim m As Match = Regex.Match(CBdata, "StartHTML:(\d+)")
If m.Success Then startHTML = m.Groups(1).Value
m = Regex.Match(CBdata, "EndHTML:(\d+)")
If m.Success Then endHTML = m.Groups(1).Value
Console.WriteLine("CB data length=" & CBdata.Length.ToString)
Console.WriteLine("EndHtml=" & endHTML)
Console.WriteLine("StartHtml=" & startHTML)
'To get just the html:
'CBdata.Substring(startHTML, endHTML - startHTML)
' but of course throws a subscript exception
End If

The StartHTML and EndHTML counters count bytes, not characters. So if you have a text with one UTF8 character that is encoded in two bytes, you'll get the length of 2 instead of 1.
Now, why it worked in XP and now it doesn't in Win 7. It may be a change in .NET 4.5 that I discovered:
Different behavior of DataObject.GetData(DataFormats.Html) in .NET 4.5

Related

Streamreader to String not working properly

I am getting HttpWebResponse encoded in Base64
following lines get the webresponse from API.
Dim myResp As HttpWebResponse = myReq.GetResponse()
Dim myreader As New System.IO.StreamReader(myResp.GetResponseStream)
the response which i get is something like following, however actual response is too long and i cannot paste here so i have manually stripped the actual response.
{"status":"1","data":"eyJiMmIiOlt7ImludiI6W3siaXRtcyI6W3sibnVtIjoxODAxLCJpdG1fZGV0Ijp7ImNzYW10IjowLCJzYW10Ijo4MDkuOTEsInJ0IjoxOCwidHh2YWwiOjg5OTksImNhbXQiOjgwOS45MX19XSwidmFsIjoxMDYxOC44MiwiaW52X3R5cCI6IlIiLCJwb3MiOiIyNCIsImlkdCI6IjExLTA3LTIwMTgiLCJyY2hyZyI6Ik4iLCJpbnVtIjoiUldHSjA3LzE4LzAwMDU4NCIsImNoa3N1bSI6IjVjMjNiY2M1ZTQ3ZDI0NjU5YWQzNTEzNTM1YjhiNTAzNmM4NGU0MzU5NWJiMTVjYzA4M2VkYzBiNTQzZTQ1MzcifSx7Iml0bXMiOlt7Im51bSI6MTgwMSwiaXRtX2RldCI6eyJjc2FtdCI6MCwic2FtdCI6NDE4LjUsInJ0IjoxOCwidHh2YWwiOjQ2NTAsImNhbXQiOjQxOC41fX1dLCJ2YWwiOjU0ODcsImludl90eXAiOiJSIiwicG9zIjoiMjQiLCJpZHQiOiIyNS0wNy0yMDE4IiwicmNocmciOiJOIiwiaW51bSI6IlJXR0owNy8xOC8wMDEyNjEiLCJjaGtzdW0iOiJjOGEyMjNmNmMzYjY5ODZiYzE2MmNjYjdmMDhlZTYxMTdjYTdkOWZhNmEzYTExMWY1MmVjNzllYmExMGM5MWQ3In1dLCJjZnMiOiJZIiwiY3RpbiI6IjI0QUFCQ1I3MTc2QzFaSiJ9LHsiaW52IjpbeyJpdG1zIjpbeyJudW0iOjEsIml0bV9kZXQiOnsiY3NhbXQiOjAsInNhbXQiOjMzNzUsInJ0IjoxOCwidHh2YWwiOjM3NTAwLCJjYW10IjozMzc1fX1dLCJ2YWwiOjQ0MjUwLCJpbnZfdHlwIjoiUiIsInBvcyI6IjI0IiwiaWR0IjoiMzEtMDctMjAxOCIsInJjaHJnIjoiTiIsImludW0iOiJULTAxNzcvMjAxOC0xOSIsImNoa3N1bSI6ImYzNzFmYjA0N2FjNTRlOTkwYzZjNzM5Zjk0NTgwMzZlMWQxNjE0N2IxYmQ0ZTkxY2FlNmEwN2IyOGVlYzE0YWUifV0sImNmcyI6IlkiLCJjdGluIjoiMjRBQURDSTIwMzJFMVo5In1dfQ=="}
I am not sure why above Base64 Encoded message starts with {"status":"1","data":" and then ends with "}.
Actual Base64 data starts after {"status":"1","data":"
Due to those unsupported characters at starting and ending of the stream , i first try to convert actual response to string as shown below.
Dim myResp As HttpWebResponse = myReq.GetResponse()
Dim myreader As New System.IO.StreamReader(myResp.GetResponseStream)
Actual stream response returns around 248000 characters (as per response received in POSTMAN with same API). Streamreader information in Debug mode also shows same 248000 number. But when i convert them into string with following code line, string gets slimmed to around only 32000 characters. I don't know why this is happening?
Dim myText As String = myreader.ReadToEnd
'''Then following code will remove all those unwanted characters from starting string, which are {"status":"1","data":"
Dim Final_text As String = myText.Substring(myText.Substring(0, myText.LastIndexOf("""")).LastIndexOf("""") + 1)
'''Following code will remove two characters "} from end of the string.
Final_text = Final_text.Trim().Remove(Final_text.Length - 2)
''' Now Decode this proper Base64 String to JSON format
Dim data As Byte() = Convert.FromBase64String(Final_text)
Dim decodedString As String = Encoding.UTF8.GetString(data)
Dim JsonP As JObject = JObject.Parse(decodedString)
Dim SetPointerOut As JToken = JsonP("b2b")
Two things: why converting from Stream to String cut down actual response? 248000 charters to just apprx. 32000 characters. In debug mode if i type in ?mytext.length it returns 248000 as value. But When i hover mouse and brows what is in mytext variable, it shows me around 32000 charters only.
Service provider says Response which i get from API is Base64 encoded and i have to decode it before using it as JSON. Then why do i get unsupported characters at starting of the stream (even in Postman), is it Base64 Encoded message in serialized manner?
Am I doing right process to first convert the stream to string, remove unwanted characters and then Decode it? or there is some other way around.
Ok, issue of 32768 character in debug mode of Visual Studio is it self.
VS2015 had bug in which it does not support more than 32768 characters. Read
Why strings are shown partially in the Visual Studio 2008 debugger?
and
Visual Studio Text Visualizer missing text
The method which i was using to remove extra unwanted characters from "mytext" string, still works and give result. But as #Steve suggested in comment to the question, I should parse the JSON string. I find that idea much better and correct method.
so final code is like below:
Dim myResp As HttpWebResponse = myReq.GetResponse()
Dim myreader As New System.IO.StreamReader(myResp.GetResponseStream)
Dim myText As String = myreader.ReadToEnd
Dim json As String = myText
Dim jsonResult = JsonConvert.DeserializeObject(Of Dictionary(Of String, Object))(json)
Dim jsonObject As Newtonsoft.Json.Linq.JObject = Newtonsoft.Json.Linq.JObject.Parse(json)
Dim jsonValue As JValue = jsonObject("data")
Dim Final_text As String = jsonValue.ToString
''' No need of following code as doing JSON parse above
''' Dim Final_text As String = myText.Substring(myText.Substring(0, myText.LastIndexOf("""")).LastIndexOf("""") + 1)
'''Final_text = Final_text.Trim().Remove(Final_text.Length - 2)
Dim data As Byte() = Convert.FromBase64String(Final_text)
Dim decodedString As String = Encoding.UTF8.GetString(data)
Dim JsonP As JObject = JObject.Parse(decodedString)
Dim SetPointerOut As JToken = JsonP("b2b")

Send RDLC in an email using vb.net

Okay my website generates thousands of PDFs using RDLCs but my problem is sometimes I want to email them but I don't want to attach a PDF to an email. So what I need is a way to generate the report then either convert it into text or html so I can send it as the body of an email.
Also I am using reportviewr version 11
Also I have tried exporting it as a .doc then trying to convert it to text and i have tried to export it to an excel document then tried to convert it and none of it works.
Dim warn() As Warning = Nothing
Dim streamids() As String = Nothing
Dim mimeType As String = String.Empty
Dim encoding As String = String.Empty
Dim extension As String = String.Empty
Dim bytes() As Byte
bytes = rv.LocalReport.Render("MHTML", Nothing, mimeType, encoding, extension, streamids, warn)
'Only one copy of the notice is needed
'If Not Directory.Exists(strFilePath) Then Directory.CreateDirectory(strFilePath)
Dim fs As New FileStream(strFilePath, FileMode.Create)
fs.Write(bytes, 0, bytes.Length)
fs.Close()
here is the code i'm using but it gives me an error : Specified argument was out of the range of valid values. Parameter name: format
Also i know this code works because I use the exact same thing to export the rdlc to a PDF
Ok so I solved my own problem with some Research about bytes.
Here is the code that I used to solve my problem.
What I did was exported the reportviewr as a word document and then converted all bytes to text. Then from that you end up with a whole bunch of gibberish but eventually you will find the text from your RDLC. So what I did was split the string up to where I was only left with the wording from my RDLC.
Review the code below:
Function GetRDLCText(ByVal rv As ReportViewer) As String
Dim warn() As Warning = Nothing
Dim streamids() As String = Nothing
Dim mimeType As String = String.Empty
Dim encoding As String = String.Empty
Dim extension As String = String.Empty
Dim bytes() As Byte
Dim msg() As String
bytes = rv.LocalReport.Render("WORD", Nothing, mimeType, encoding, extension, streamids, warn)
'Word is the only export that contains text from the rdlc
Dim content As String = System.Text.Encoding.Unicode.GetString(bytes)
msg = content.Split("Ù")
msg = msg(1).Split("Ѐ")
Return msg(0)
End Function
This solution is not for everyone, but it works for what I need it to do.

Encryption between ColdFusion and MS Access

I'm trying to make it possible for the data stored in a MSSQL database to be encrypted/decrypted in both Access 2013 as well as ColdFusion. The Access database uses vba to sync data to the SQL database and I've found a few possible solutions for encryption but can't seem to get the results to match the same thing encrypted in ColdFusion.
www.ebcrypt.com appears to be the easiest but when I encrypt with either Blowfish, RIJNDAEL or any of the other methods, the results are not the same as what I encrypt in ColdFusion.
I decided to try to use the native CryptoAPI but the same thing happens when I try to match what vba is doing in ColdFusion I keep getting different results.
I wonder if either the vba or ColdFusion methods I'm using are taking the key I'm passing in and transforming it so it no longer matches. I've tried setting keys manually and even generating it with ColdFusion and then setting it in the vba code to match with no luck.
ColdFusion code trying to use RC4:
<cfset test_key = "ZXNlmehY30y3ophXVJ0EJw==">
<cfset encryptedString = Encrypt("CF String",test_key, "RC4")>
<cfoutput>
Encrypted String: #encryptedString#<br />
Encryption Key: #test_key#
</cfoutput>
VBA Code with the same settings: (clsCryptoFilterBox code is here)
NOTE: It appears that this defaults to RC4, which is why I'm using that in ColdFusion above.
Dim encrypted As clsCryptoFilterBox
Set encrypted = New clsCryptoFilterBox
encrypted.Password = "ZXNlmehY30y3ophXVJ0EJw=="
encrypted.InBuffer = "CF String"
encrypted.Encrypt
MsgBox ("Encrypted: " & encrypted.OutBuffer)
EDIT: Ok, more info. I found that ColdFusion needed the key in base64 even though the variable test_key should have worked but apparently the output of a base64 encoded string is not the same as other text encoded into base64.
EDIT 2: I got it working using the Blowfish algorithm found in the file on this website.
Here is my working CF code:
<cfset test_key = toBase64("1234567812345678")>
<cfset encryptedString = Encrypt("CF String", test_key, "RC4", "HEX")>
<cfoutput>
Encrypted String: #encryptedString#<br />
Encryption Key: #test_key#
</cfoutput>
Which outputs:
Encrypted String: F8B519877DC3B7C997
Encryption Key: MTIzNDU2NzgxMjM0NTY3OA==
I had to modify the code in VBA to pad using PKCS7 but once I did that, I was able to verify that it was working correctly. If anyone is interested I could post my changes to the VBA code where I modified the padding as well as added a check on decryption to verify the data via the padding.
I found a decent Blowfish algorithm packaged in the test app found on this download site that actually works with some modifications.
It was using spaces to pad the input text which is not what ColdFusion was doing, so this was making the encrypted string turn out different. The standard encryption that CF does pads with bytes that are all the same and are set to the number of padding bytes being used.
New EncryptString() function:
Public Function EncryptString(ByVal tString As String, Optional ConvertToHEX As Boolean) As String
Dim ReturnString As String, PartialString As String * 8
Dim tPaddingByte As String
Dim tStrLen As Integer
Dim tBlocks As Integer
Dim tBlockPos As Integer
tStrLen = Len(tString)
'Divide the length of the string by the size of each block and round up
tBlocks = (-Int(-tStrLen / 8))
tBlockPos = 1
Do While tString <> ""
'Check that we are not on the last block
If tBlockPos <> tBlocks Then
'Not on the last block so the string should be over 8 bytes, no need to pad
PartialString = Left$(tString, 8)
Else
'Last block, we need to pad
'Check to see if the last block is 8 bytes so we can create a new block
If Len(tString) = 8 Then
'Block is 8 bytes so add an extra block of padding
tString = tString & String(8, Chr(8))
tPaddingByte = " " 'Not really necessary, just keeps the String() function below happy
Else
'Set the value of the padding byte to the number of padding bytes
tPaddingByte = Chr(8 - Len(tString))
End If
PartialString = Left$(tString & String(8, tPaddingByte), 8)
End If
ReturnString = ReturnString & Encrypt(PartialString)
tString = Mid$(tString, 9)
tBlockPos = tBlockPos + 1
Loop
If ConvertToHEX = True Then
EncryptString = ToHEX(ReturnString)
Else
EncryptString = ReturnString
End If
End Function
Since the padding is not just spaces, it needs to be removed on decryption but there is an easy way to do it that also makes this whole process even better. You read the last byte, and then verify the other padding bytes with it.
Public Function DecryptString(ByVal tString As String, Optional ConvertFromHEX As Boolean) As String
Dim ReturnString As String, PartialString As String * 8
Dim tPos As Integer
Dim tPadCount As Integer
If ConvertFromHEX = True Then
tString = HexToString(tString)
End If
Do While tString <> ""
PartialString = Left$(tString, 8)
ReturnString = ReturnString & Decrypt(PartialString)
tString = Mid$(tString, 9)
Loop
'Check the last byte and verify the padding and then remove it
tPadCount = ToHEX(Right(ReturnString, 1))
If tPadCount < 8 Or tPadCount > 1 Then
'Get all the padding bytes and verify them
Dim tPaddingBytes As String
tPaddingBytes = Right(ReturnString, tPadCount)
Dim i As Integer
For i = 1 To tPadCount
If Not tPadCount = Int(ToHEX(Left(tPaddingBytes, 1))) Then
MsgBox "Error while decrypting: Padding byte incorrect (" & tPadCount & ")"
GoTo Done
End If
Next i
ReturnString = Left(ReturnString, Len(ReturnString) - tPadCount)
Else
MsgBox "Error while decrypting: Last byte incorrect (" & tPadCount & ")"
End If
Done:
DecryptString = ReturnString
End Function
I was able to export the class module and can import it into any other possible projects that may need basic encryption. There is a Rijndael class that appears to be working but in a non-standard way as well that I may get around to fixing later, but for now this is what I was looking for.

VB.NET ~ how does one navigate to a website and download the html then parse out code to only display input elements?

I have tried a few things like converting HTML to XML and then using an XML navigator to get input elements but I get lost whenever I start this process.
What I am trying to do is to navigate to a website which will be loaded using textbox1.text
Then download the html and parse out the input elements like . username, password, etc and place the element by type (id or name) into the richtextbox with the attribute beside the name.
Example.
Username id="username"
Password id="password"
Any clues or how to properly execute an HTML to XML conveter, reader, parser?
Thanks
It sounds like you just need a good HTML parsing library (instead of trying to use an XML parser). The HTML Agility Pack often fits this need. There are other options as well.
Somthing like below uses a streamreader to extract the source of the page into a string result
Dim uri As String = "https://www.yourUrl.com"
Dim request As HttpWebRequest = CType(WebRequest.Create(uri), HttpWebRequest)
Dim objRequest As HttpWebRequest = WebRequest.Create(uri)
Dim result As String
objRequest.Method = "GET"
Dim objResponse As HttpWebResponse = objRequest.GetResponse()
Dim sr As StreamReader
sr = New StreamReader(objResponse.GetResponseStream())
result = sr.ReadToEnd()
sr.Close
Then use regular expression (regex) to extra the attributes needed. for example something like this
Dim pattern As String = "(?<=Username id="")\w+"
Dim m0 As MatchCollection = Regex.Matches(result, pattern, RegexOptions.Singleline)
Dim m As Match
Dim k As Integer = 0
dim strUserID as String = ""
For Each m In m0
'extract the values for username id
strUserID = m0[k].Value;
k=k+1
Next
You'll need to change the pattern so it can pick up the other attributes you want to find, but this shouldn't be difficult

Convert html to plain text in VBA

I have an Excel sheet with cells containing html. How can I batch convert them to plaintext? At the moment there are so many useless tags and styles. I want to write it from scratch but it will be far easier if I can get the plain text out.
I can write a script to convert html to plain text in PHP so if you can't think of a solution in VBA then maybe you can sugest how I might pass the cells data to a website and retrieve the data back.
Set a reference to "Microsoft HTML object library".
Function HtmlToText(sHTML) As String
Dim oDoc As HTMLDocument
Set oDoc = New HTMLDocument
oDoc.body.innerHTML = sHTML
HtmlToText = oDoc.body.innerText
End Function
Tim
A very simple way to extract text is to scan the HTML character by character, and accumulate characters outside of angle brackets into a new string.
Function StripTags(ByVal html As String) As String
Dim text As String
Dim accumulating As Boolean
Dim n As Integer
Dim c As String
text = ""
accumulating = True
n = 1
Do While n <= Len(html)
c = Mid(html, n, 1)
If c = "<" Then
accumulating = False
ElseIf c = ">" Then
accumulating = True
Else
If accumulating Then
text = text & c
End If
End If
n = n + 1
Loop
StripTags = text
End Function
This can leave lots of extraneous whitespace, but it will help in removing the tags.
Tim's solution was great, worked liked a charm.
I´d like to contribute: Use this code to add the "Microsoft HTML Object Library" in runtime:
Set ID = ThisWorkbook.VBProject.References
ID.AddFromGuid "{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}", 2, 5
It worked on Windows XP and Windows 7.
Tim's answer is excellent. However, a minor adjustment can be added to avoid one foreseeable error response.
Function HtmlToText(sHTML) As String
Dim oDoc As HTMLDocument
If IsNull(sHTML) Then
HtmlToText = ""
Exit Function
End-If
Set oDoc = New HTMLDocument
oDoc.body.innerHTML = sHTML
HtmlToText = oDoc.body.innerText
End Function
Yes! I managed to solve my problem as well. Thanks everybody/
In my case, I had this sort of input:
<p>Lorem ipsum dolor sit amet.</p>
<p>Ut enim ad minim veniam.</p>
<p>Duis aute irure dolor in reprehenderit.</p>
And I did not want the result to be all jammed together without breaklines.
So I first splitted my input for every <p> tag into an array 'paragraphs', then for each element I used Tim's answer to get the text out of html (very sweet answer btw).
In addition I concatenated each cleaned 'paragraph' with this breakline character Crh(10) for VBA/Excel.
The final code is:
Public Function HtmlToText(ByVal sHTML As String) As String
Dim oDoc As HTMLDocument
Dim result As String
Dim paragraphs() As String
If IsNull(sHTML) Then
HtmlToText = ""
Exit Function
End If
result = ""
paragraphs = Split(sHTML, "<p>")
For Each paragraph In paragraphs
Set oDoc = New HTMLDocument
oDoc.body.innerHTML = paragraph
result = result & Chr(10) & Chr(10) & oDoc.body.innerText
Next paragraph
HtmlToText = result
End Function
Here's a variation of Tim's and Gardoglee's solution that does not require setting a reference to "Microsoft HTML object library". This method is known as Late Binding and will also work in vbscript.
Function HtmlToText(sHTML) As String
Dim oDoc As Object ' As HTMLDocument
If IsNull(sHTML) Then
HtmlToText = ""
Exit Function
End If
Set oDoc = CreateObject("HTMLFILE")
oDoc.body.innerHTML = sHTML
HtmlToText = oDoc.body.innerText
End Function
Note that if you are using VBA in Access 2007 or greater, there is an Application.PlainText() method built-in that does the same thing as the code above.