Microsoft Cognitive API document size limit of 10240 bytes - json

When submitting a document to the API for key phrases, the returned JSON response has the error "A document within the request was too large to be processed. Limit document size to: 10240 bytes."
According to https://learn.microsoft.com/en-us/azure/cognitive-services/cognitive-services-text-analytics-quick-start, "The maximum size of a single document that can be submitted is 10KB, and the total maximum size of submitted input is 1MB. No more than 1,000 documents may be submitted in one call."
The document in question is a string of length 7713. The byte length using Encoding.UTF8.GetBytes() is 7763.
The entire byteArray that is submitted is of length 7965.
Smaller strings work fine, but any strings greater than length 3000 seem to have this problem. Below is the code, written in VB.NET:
' Create a JSONInput object containing the data to submit
Dim myJsonObject As New JSONInput
Dim input1 As New JSONText
input1.id = "1"
input1.text = text
myJsonObject.documents.Add(input1)
' Translate the JSONInput object to a serialized JSON string
Dim jss As New JavaScriptSerializer()
Dim jsonString As String = jss.Serialize(myJsonObject)
' Windows Cognitive Services URL
Dim request As System.Net.WebRequest = System.Net.WebRequest.Create("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/keyPhrases")
' Set the Method property of the request to POST.
request.Method = "POST"
' Add a header with the account key.
request.Headers.Add("Ocp-Apim-Subscription-Key", accountKey_TextAnalytics)
' Create POST data and convert it to a byte array.
Dim postData As String = jsonString
Dim byteArray As Byte() = Encoding.UTF8.GetBytes(postData)
' Set the ContentType property of the WebRequest.
request.ContentType = "application/json"
' Set the ContentLength property of the WebRequest.
request.ContentLength = byteArray.Length
' Get the request stream.
Dim dataStream As System.IO.Stream = request.GetRequestStream()
' Write the data to the request stream.
dataStream.Write(byteArray, 0, byteArray.Length)
' Close the Stream object.
dataStream.Close()
' Get the response.
Dim response As System.Net.WebResponse = request.GetResponse()
' Get the stream containing content returned by the server.
dataStream = response.GetResponseStream()
' Open the stream using a StreamReader for easy access.
Dim reader As New System.IO.StreamReader(dataStream)
' Read the content.
Dim responseFromServer As String = reader.ReadToEnd()
' Display the content.
Console.WriteLine(responseFromServer)
' Clean up the streams.
reader.Close()
dataStream.Close()
response.Close()
' Deserialize the json data
jss = New JavaScriptSerializer()
Dim jsonData = jss.Deserialize(Of Object)(responseFromServer)
' List of key phrases to be returned
Dim phrases As New List(Of String)
For Each item As String In jsonData("documents")(0)("keyPhrases")
    phrases.Add(item)
Next
Return phrases
My question is: what might I be doing wrong here? I keep receiving messages that my document exceeds the size limit of 10240 bytes, yet the data that I POST appears to be well under that limit.

As Assaf mentioned above, please make sure to specify UTF-8 encoding.
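One way to see why the encoding matters is to compare the byte size of the same text under different encodings. The sketch below is in Python rather than VB.NET purely for brevity. It assumes (this is not confirmed by the documentation quoted in the question) that a request with no declared charset might be measured in a wider encoding such as UTF-16. The sample string and the helper name encoded_sizes are made up for illustration; the sample is plain ASCII, so its UTF-8 size equals its character count.

```python
LIMIT_BYTES = 10240  # per-document limit quoted in the error message


def encoded_sizes(text):
    """Return the byte length of text under two common encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16-le": len(text.encode("utf-16-le")),
    }


# Hypothetical document with the same character count as in the question.
sample = "a" * 7713
sizes = encoded_sizes(sample)
for name, size in sizes.items():
    status = "over" if size > LIMIT_BYTES else "under"
    print(f"{name}: {size} bytes ({status} the limit)")
```

If something like this is what happens on the server side, declaring the charset explicitly (for example a Content-Type of application/json; charset=utf-8) would be the first thing to try.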


VB.NET: Get responses from a called API using WebClient

I am currently developing code that calls a web service by posting JSON. It works fine, and my records are posted without any issue.
My issue is displaying the response. I used UploadData to send the request. Do I need to use DownloadData to receive the response? What if I need to show the response in a MessageBox?
Note that I am expecting the response in JSON format as well. Could you at least explain the concept? My guess is that I first need to receive the response and then deserialize it.
Here is my current code. It works fine, but I can't show the response.
Public Function postData(ByVal JsonBody As String) As Boolean
    Dim webClient As New WebClient()
    Dim resByte As Byte()
    Dim resString As String
    Dim reqString() As Byte
    Try
        Dim APIusername As String = "XXXXX"
        Dim APIPassword As String = "XXXXX"
        webClient.Headers("content-type") = "application/json"
        webClient.Credentials = New System.Net.NetworkCredential(APIusername, APIPassword)
        reqString = Encoding.Default.GetBytes(JsonBody)
        resByte = webClient.UploadData(Me.urlToPost, "post", reqString)
        resString = Encoding.Default.GetString(resByte)
        Console.WriteLine(resString)
        ' Here I need to show the responses
        webClient.Dispose()
        Return True
    Catch ex As Exception
        MessageBox.Show(ex.Message)
    End Try
    Return False
End Function
I used an HTTP request instead of WebClient, and it works.
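On the concept asked about in the question: WebClient.UploadData already returns the response body as a byte array, so no separate download call is needed. The bytes only have to be decoded and deserialized. A minimal sketch of that step, written in Python for brevity, with a made-up response payload and a hypothetical helper name (the real API's fields are not shown in the question):

```python
import json


def parse_json_response(body_bytes, encoding="utf-8"):
    """Decode raw response bytes and deserialize the JSON text."""
    text = body_bytes.decode(encoding)
    return json.loads(text)


# Hypothetical response body standing in for what the upload call returns.
raw = b'{"status": "ok", "id": 42}'
result = parse_json_response(raw)
print(result["status"])
```

In the VB.NET code, the equivalent point is right after resString is assigned: that string is what would be shown in a MessageBox or passed to a JSON deserializer.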

How to send a byte array as json using Visual Basic 6?

I have a Visual Basic 6 application that should read a file into a byte array, put that into a JSON object, and make a POST request to a REST API. I have everything working except the byte array part. When I use the code below, the API just receives the string "bytes" and not the actual content of the request, which should look something like this: "JVBERi0xLjQKJeLjz9MKCj"
Private Function PostDocumentPrint() As String
    'Create http client
    Dim http As Object
    Set http = CreateObject("WinHttp.WinHttprequest.5.1")
    url = "XXX"
    http.Open "Post", url, False
    'Set request parameters
    http.SetRequestHeader "charset", "UTF-8"
    http.SetTimeouts 500, 500, 500, 500
    ' Read file into byte array
    Dim fileNum As Integer
    Dim bytes() As Byte
    fileNum = FreeFile
    Open "FileToSend.pdf" For Binary As fileNum
    ReDim bytes(LOF(fileNum) - 1)
    Get fileNum, , bytes
    Close fileNum
    ' Send the request
    Dim jsonStringPostBody As String
    jsonStringPostBody = " {""fileData"": "" " + bytes + """} "
    http.Send jsonStringPostBody
End Function
I believe I need to convert the byte array to a string somehow. For example I tried this:
Dim s As String
s = StrConv(bytes, vbUnicode)
MsgBox s
But it does not look correct.
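The expected value quoted in the question ("JVBERi0xLjQKJeLjz9MKCj") looks like Base64, which is the usual way to carry binary data inside a JSON string. Assuming the API does expect Base64 (the question does not say so explicitly), the conversion can be sketched as follows. Python is used here only to show the idea; the helper name and the sample bytes are hypothetical, and VB6 would need its own Base64 routine.

```python
import base64
import json


def build_file_payload(file_bytes):
    """Base64-encode binary data and wrap it in a JSON object."""
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return json.dumps({"fileData": encoded})


# Hypothetical stand-in for the first bytes of FileToSend.pdf.
pdf_header = b"%PDF-1.4\n"
payload = build_file_payload(pdf_header)
print(payload)
```

The encoded header starts with the same "JVBERi0xLjQK" characters as the expected value in the question, which supports the Base64 reading. A character-set conversion such as StrConv does not produce this form.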

Streamreader to String not working properly

I am getting an HttpWebResponse whose content is Base64-encoded.
The following lines get the web response from the API:
Dim myResp As HttpWebResponse = myReq.GetResponse()
Dim myreader As New System.IO.StreamReader(myResp.GetResponseStream)
The response I get looks something like the following. The actual response is too long to paste here, so I have manually trimmed it.
{"status":"1","data":"eyJiMmIiOlt7ImludiI6W3siaXRtcyI6W3sibnVtIjoxODAxLCJpdG1fZGV0Ijp7ImNzYW10IjowLCJzYW10Ijo4MDkuOTEsInJ0IjoxOCwidHh2YWwiOjg5OTksImNhbXQiOjgwOS45MX19XSwidmFsIjoxMDYxOC44MiwiaW52X3R5cCI6IlIiLCJwb3MiOiIyNCIsImlkdCI6IjExLTA3LTIwMTgiLCJyY2hyZyI6Ik4iLCJpbnVtIjoiUldHSjA3LzE4LzAwMDU4NCIsImNoa3N1bSI6IjVjMjNiY2M1ZTQ3ZDI0NjU5YWQzNTEzNTM1YjhiNTAzNmM4NGU0MzU5NWJiMTVjYzA4M2VkYzBiNTQzZTQ1MzcifSx7Iml0bXMiOlt7Im51bSI6MTgwMSwiaXRtX2RldCI6eyJjc2FtdCI6MCwic2FtdCI6NDE4LjUsInJ0IjoxOCwidHh2YWwiOjQ2NTAsImNhbXQiOjQxOC41fX1dLCJ2YWwiOjU0ODcsImludl90eXAiOiJSIiwicG9zIjoiMjQiLCJpZHQiOiIyNS0wNy0yMDE4IiwicmNocmciOiJOIiwiaW51bSI6IlJXR0owNy8xOC8wMDEyNjEiLCJjaGtzdW0iOiJjOGEyMjNmNmMzYjY5ODZiYzE2MmNjYjdmMDhlZTYxMTdjYTdkOWZhNmEzYTExMWY1MmVjNzllYmExMGM5MWQ3In1dLCJjZnMiOiJZIiwiY3RpbiI6IjI0QUFCQ1I3MTc2QzFaSiJ9LHsiaW52IjpbeyJpdG1zIjpbeyJudW0iOjEsIml0bV9kZXQiOnsiY3NhbXQiOjAsInNhbXQiOjMzNzUsInJ0IjoxOCwidHh2YWwiOjM3NTAwLCJjYW10IjozMzc1fX1dLCJ2YWwiOjQ0MjUwLCJpbnZfdHlwIjoiUiIsInBvcyI6IjI0IiwiaWR0IjoiMzEtMDctMjAxOCIsInJjaHJnIjoiTiIsImludW0iOiJULTAxNzcvMjAxOC0xOSIsImNoa3N1bSI6ImYzNzFmYjA0N2FjNTRlOTkwYzZjNzM5Zjk0NTgwMzZlMWQxNjE0N2IxYmQ0ZTkxY2FlNmEwN2IyOGVlYzE0YWUifV0sImNmcyI6IlkiLCJjdGluIjoiMjRBQURDSTIwMzJFMVo5In1dfQ=="}
I am not sure why the Base64-encoded message above starts with {"status":"1","data":" and ends with "}.
The actual Base64 data starts after {"status":"1","data":"
Because of those unsupported characters at the start and end of the stream, I first try to convert the actual response to a string, as shown below.
Dim myResp As HttpWebResponse = myReq.GetResponse()
Dim myreader As New System.IO.StreamReader(myResp.GetResponseStream)
The actual stream response contains around 248000 characters (per the response received in Postman from the same API). The StreamReader information in debug mode also shows the same 248000 figure. But when I convert it to a string with the following line, the string appears to be cut down to only around 32000 characters. I don't know why this is happening.
Dim myText As String = myreader.ReadToEnd
'''Then following code will remove all those unwanted characters from starting string, which are {"status":"1","data":"
Dim Final_text As String = myText.Substring(myText.Substring(0, myText.LastIndexOf("""")).LastIndexOf("""") + 1)
'''Following code will remove two characters "} from end of the string.
Final_text = Final_text.Trim().Remove(Final_text.Length - 2)
''' Now Decode this proper Base64 String to JSON format
Dim data As Byte() = Convert.FromBase64String(Final_text)
Dim decodedString As String = Encoding.UTF8.GetString(data)
Dim JsonP As JObject = JObject.Parse(decodedString)
Dim SetPointerOut As JToken = JsonP("b2b")
Two things. First, why does converting from Stream to String cut the actual response down from 248000 characters to approximately 32000 characters? In debug mode, if I type ?mytext.length it returns 248000. But when I hover the mouse over the mytext variable and browse its contents, it shows me only around 32000 characters.
Second, the service provider says the response I get from the API is Base64-encoded and that I have to decode it before using it as JSON. Then why do I get unsupported characters at the start of the stream (even in Postman)? Is it a Base64-encoded message in serialized form?
Am I following the right process by first converting the stream to a string, removing the unwanted characters, and then decoding it? Or is there some other way?
OK, the 32768-character issue lies in the debug mode of Visual Studio itself.
VS2015 had a bug in which the debugger does not display more than 32768 characters. See
Why strings are shown partially in the Visual Studio 2008 debugger?
and
Visual Studio Text Visualizer missing text
The method I was using to remove the extra unwanted characters from the "mytext" string still works and gives the right result. But as @Steve suggested in a comment on the question, I should parse the JSON string. I find that idea much better and more correct.
So the final code looks like this:
Dim myResp As HttpWebResponse = myReq.GetResponse()
Dim myreader As New System.IO.StreamReader(myResp.GetResponseStream)
Dim myText As String = myreader.ReadToEnd
Dim json As String = myText
Dim jsonResult = JsonConvert.DeserializeObject(Of Dictionary(Of String, Object))(json)
Dim jsonObject As Newtonsoft.Json.Linq.JObject = Newtonsoft.Json.Linq.JObject.Parse(json)
Dim jsonValue As JValue = jsonObject("data")
Dim Final_text As String = jsonValue.ToString
''' No need of following code as doing JSON parse above
''' Dim Final_text As String = myText.Substring(myText.Substring(0, myText.LastIndexOf("""")).LastIndexOf("""") + 1)
'''Final_text = Final_text.Trim().Remove(Final_text.Length - 2)
Dim data As Byte() = Convert.FromBase64String(Final_text)
Dim decodedString As String = Encoding.UTF8.GetString(data)
Dim JsonP As JObject = JObject.Parse(decodedString)
Dim SetPointerOut As JToken = JsonP("b2b")
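The shape of the response explains the "unsupported characters": the body is an ordinary JSON object whose "data" field holds a Base64 string, and that string in turn decodes to another JSON document. The two-stage decode can be sketched as below. This is a Python illustration of the same steps as the VB.NET code above, not the service provider's method; the helper name and the tiny sample payload are invented for the example.

```python
import base64
import json


def decode_envelope(response_text):
    """Parse the outer JSON, then Base64-decode and parse its 'data' field."""
    outer = json.loads(response_text)
    inner_bytes = base64.b64decode(outer["data"])
    return json.loads(inner_bytes.decode("utf-8"))


# Build a small hypothetical response with the same shape as the real one.
inner = {"b2b": [{"ctin": "EXAMPLE"}]}
encoded = base64.b64encode(json.dumps(inner).encode("utf-8")).decode("ascii")
response_text = json.dumps({"status": "1", "data": encoded})

decoded = decode_envelope(response_text)
print(decoded["b2b"])
```

Parsing the envelope first avoids the Substring and Remove arithmetic entirely, so the code keeps working if the provider adds fields or reorders them.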

How should this HTTP request for oAuth token be formatted?

In a VB .net environment, I am making the following call, trying to implement the authorize step of an OAuth process to connect to an Accelo API (a time-entry and billing type of app). I'm trying to get an access token:
Dim jsonstring = "{'Content-Type':'Application/x-www-Form-urlencoded',
'authorization':'Basic MDBhM...GbG5oLlZB'}"
Dim data = Encoding.UTF8.GetBytes(jsonstring)
Dim result_post = SendRequest(New Uri("https://ourinfo.api.accelo.com/oauth2/v0/token"), data, "application/json", "POST")
with the function defined as this:
Private Function SendRequest(uri As Uri, jsonDataBytes As Byte(), contentType As String, method As String) As String
    Dim req As WebRequest = WebRequest.Create(uri)
    req.ContentType = contentType
    req.Method = method
    req.ContentLength = jsonDataBytes.Length
    Dim stream = req.GetRequestStream()
    stream.Write(jsonDataBytes, 0, jsonDataBytes.Length)
    'stream.Close()
    Dim response = req.GetResponse().GetResponseStream()
    Dim reader As New StreamReader(response)
    Dim res = reader.ReadToEnd()
    reader.Close()
    response.Close()
    Return res
End Function
and I keep receiving an error on this line:
Dim response = req.GetResponse().GetResponseStream()
Saying
"System.Net.WebException: 'The remote server returned an error: (400) Bad Request.'"
It seems to me like a syntax error or something in the way my calling method is formed or the format of the parameters passed. I got the HTTP request code suggestion/format from here:
How to POST a JSON to a specific url using VB.NET?
and I'm following the Accelo API documentation to set the content-type and the authorization "Basic" part, where the string is encoded in Base64. This is a service application, so I'm supposed to be able to get the token in one step (no user confirmation is required). I already have a "token" from when I registered, but the API documentation still indicates I should make this call. I'm following this:
https://api.accelo.com/docs/?_ga=2.218971609.1390377756.1568376911-2053161277.1565440093#service-applications
Can anyone tell me what exactly I'm doing wrong here? I'm confused and this is the first time I'm trying to implement OAuth.
There is some example code in the API documentation that looks like this:
POST /oauth2/v0/token HTTP/1.1
Host: planet-express.api.accelo.com
Content-Type: application/x-www-form-urlencoded
Authorization: Basic {client_credentials}
grant_type=client_credentials
scope=read(staff)
And I'm not sure the difference between the = and : syntax purposes. I was not able to search and find any answers to whether I'm calling everything correctly. Should I be passing the scope and grant_type in the JSON string, or setting it as a property on "req" object in the "SendRequest" function? I know that the grant_type is supposedly required but how do I set it?
What's the token I received initially when I registered, if I'm supposed to get a token this way?
I got it working with the help of a colleague. Apparently I was confusing header data with the JSON "data," and I was not formatting the data correctly either. I should have been looking at the WebResponse too, not only the WebRequest. I changed my code to the following, which works now:
Sub SendRequestGetAccess(uri As Uri, jsonDataBytes As Byte())
    Dim req As WebRequest = WebRequest.Create(uri)
    req.Method = "POST"
    ' Credentials go in a header, not in the request body
    req.Headers.Add("Authorization", "Basic " & "MDB....")
    req.ContentType = "Application/x-www-Form-urlencoded"
    req.ContentLength = jsonDataBytes.Length
    Dim stream = req.GetRequestStream()
    stream.Write(jsonDataBytes, 0, jsonDataBytes.Length)
    stream.Close()
    Dim response As WebResponse = req.GetResponse()
    Console.WriteLine((CType(response, HttpWebResponse)).StatusDescription)
    Dim dataStream = response.GetResponseStream()
    Dim reader As StreamReader = New StreamReader(dataStream)
    Dim responseFromServer As String = reader.ReadToEnd()
    Console.WriteLine(responseFromServer)
    reader.Close()
    dataStream.Close()
    response.Close()
    'Dim firstItem = jsonResult.Item("data").Item(0).Value(Of String)("token")
    Dim j As Object = New JavaScriptSerializer().Deserialize(Of Object)(responseFromServer)
    Dim _itemvalue = j("itemkey")
End Sub
And I had to use equals signs instead of colons in my data, with & between the pairs, because the body is form-urlencoded rather than JSON:
Dim jsonstring = "grant=creds" & "&scope=read"
Dim data = Encoding.UTF8.GetBytes(jsonstring)
SendRequestGetAccess(New Uri("https...site.com/oauth2/v0/token"), data)
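To make the header/body split concrete: in the documentation excerpt quoted in the question, the lines with a colon (Content-Type, Authorization) are HTTP headers, while the lines with an equals sign (grant_type, scope) are form fields that belong in the request body, joined with & and percent-encoded. A sketch of building both parts, in Python for brevity. It assumes that {client_credentials} is the Base64 of "client_id:client_secret", as in standard HTTP Basic authentication; the source does not confirm this, and the function name and sample credentials are hypothetical.

```python
import base64
from urllib.parse import urlencode


def build_token_request(client_id, client_secret, scope):
    """Return (headers, body_bytes) for a client-credentials token request."""
    # Assumption: Basic credentials are Base64("client_id:client_secret").
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
    }
    # Form fields are name=value pairs joined with '&' and percent-encoded.
    body = urlencode({"grant_type": "client_credentials", "scope": scope})
    return headers, body.encode("utf-8")


# Made-up credentials; the scope value is the one from the quoted docs.
headers, body = build_token_request("my-id", "my-secret", "read(staff)")
print(body.decode("utf-8"))
```

Nothing here is sent over the network; the sketch only shows which values go in headers and which go in the body.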

Parsing html table containing images to datatable attribute

I used the following code to parse an HTML table's inner text into a DataTable (using Html-Agility-Pack):
Imports System.Data
Imports System.IO
Imports System.Linq
Imports System.Net
Public Sub ParseHtmlTable(ByVal HtmlFilePath As String)
    Dim webStream As Stream
    Dim webResponse = ""
    Dim req As FileWebRequest
    Dim res As FileWebResponse
    ' Request the local HTML file
    req = WebRequest.Create("file:///" & HtmlFilePath)
    req.Method = "GET" ' Request method
    res = req.GetResponse ' Send the request
    webStream = res.GetResponseStream() ' Get the response stream
    Dim webStreamReader As New StreamReader(webStream)
    Dim htmldoc As New HtmlAgilityPack.HtmlDocument
    htmldoc.LoadHtml(webStreamReader.ReadToEnd())
    Dim nodes As HtmlAgilityPack.HtmlNodeCollection = htmldoc.DocumentNode.SelectNodes("//table/tbody/tr")
    Dim dtTable As New DataTable("Table1")
    ' The first row holds the <th> header cells
    Dim Headers As List(Of String) = nodes(0).Elements("th").Select(Function(x) x.InnerText.Trim).ToList
    For Each Hr In Headers
        dtTable.Columns.Add(Hr)
    Next
    ' Skip the header row, which has no <td> cells
    For Each node As HtmlAgilityPack.HtmlNode In nodes.Skip(1)
        Dim Row = node.Elements("td").Select(Function(x) x.InnerText.Trim).ToArray
        dtTable.Rows.Add(Row)
    Next
    dtTable.WriteXml("G:\1.xml", XmlWriteMode.WriteSchema)
End Sub
How can I parse an HTML table containing images into a DataTable, either saving the images as binary or saving their links, using VB.NET?
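I finally found the answer. Images look like:
<img src="img.jpg"/>
We can use
.SelectSingleNode("./img").Attributes("src").Value
on the node that contains the image to return the image path. Note that SelectNodes returns a collection, so the src attribute has to be read from a single node, and SelectSingleNode returns Nothing when the cell has no image.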
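The idea of keeping a cell's text when it has text and its image link when it has an image can be sketched end to end. The example below uses Python's standard html.parser instead of Html Agility Pack, so it illustrates the approach rather than the library call above; the class name and the sample table are made up.

```python
from html.parser import HTMLParser


class TableImageParser(HTMLParser):
    """Collect, for every table row, the text or image src of each cell."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text/src fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        elif tag == "img" and self._cell is not None:
            src = dict(attrs).get("src")
            if src:
                self._cell.append(src)  # store the link instead of text

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append(" ".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None and data.strip():
            self._cell.append(data.strip())


# Hypothetical table with one text cell and one image cell.
html_text = (
    "<table><tr><th>Name</th><th>Logo</th></tr>"
    '<tr><td>Acme</td><td><img src="img.jpg"/></td></tr></table>'
)
parser = TableImageParser()
parser.feed(html_text)
rows = parser.rows
print(rows)
```

Storing the link keeps the table small; saving the image as binary would be a further step of reading the file at that path.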