My POST request on TIdHTTPServer contains strange characters in the JSON string

I have an issue with the JSON string I get when receiving a POST request. Currently this is the way I'm reading it:
procedure TForm1.IdHTTPServer1CommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
var
  Stream: TStream;
  S: string;
begin
  if ARequestInfo.Command = 'POST' then
  begin
    Stream := ARequestInfo.PostStream;
    if Assigned(Stream) then
    begin
      Stream.Position := 0;
      S := UTF8ToAnsi(ReadStringFromStream(Stream));
    end;
  end;
end;
I tried ReadStringFromStream() alone, and with UTF8ToAnsi() and AnsiToUTF8(), but I keep getting a string that looks like this:
'['#$A#9'{'#$A#9#9'"test":"bb",'#$A#9#9'"test":"aa"'#$A#9'}'#$A']'
I know it has something to do with encoding, but I don't know how to fix it.

You do know that the hash (#) sign denotes a character value and that the dollar ($) sign denotes a hexadecimal value, don't you? Thus #$A means character decimal 10, which happens to be the NewLine (LF) character, and #9 means character 9, which is the TAB character. There is nothing unexpected in the returned string. If you feed it into something that understands a NewLine without a preceding CarriageReturn, it will probably look as you expected.
The debugger, for example, uses the #-syntax for characters that can't otherwise be visually represented.
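For illustration, a minimal sketch (assuming a VCL application, so ShowMessage is available): the RTL's AdjustLineBreaks converts the bare LF characters to CRLF, after which the dialog shows the JSON on separate lines instead of as escape codes.
var
  S: string;
begin
  // The exact value from the question, written with the debugger's #-escapes:
  S := '['#$A#9'{'#$A#9#9'"test":"bb",'#$A#9#9'"test":"aa"'#$A#9'}'#$A']';
  // AdjustLineBreaks (SysUtils) converts bare LF (#$A) to CRLF, so a
  // Windows control or dialog renders the line breaks properly.
  ShowMessage(AdjustLineBreaks(S));
end;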

The data you showed in your example is perfectly fine, as explained by Tom B. You are looking at the string data in the debugger, where #$A is a line break and #9 is a tab character, so the actual string looks like this:
[
  {
    "test":"bb",
    "test":"aa"
  }
]
Which is valid JSON.
However, the way you are reading the data is not OK, especially if you are using a Unicode version of Delphi (2009+). You are not passing any value to the AByteEncoding parameter of ReadStringFromStream(), so it will decode the stream bytes using Indy's default encoding, which is 7-bit US-ASCII by default (see the GIdDefaultTextEncoding variable in the IdGlobal unit). JSON uses UTF-8 by default, so you will corrupt the JSON if it contains any non-ASCII characters. Using UTF8ToAnsi() after the fact won't fix that.
Your code should look like this instead:
procedure TForm1.IdHTTPServer1CommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
var
  Stream: TStream;
  S: string;
begin
  if ARequestInfo.CommandType = hcPOST then
  begin
    Stream := ARequestInfo.PostStream;
    if Assigned(Stream) then
    begin
      S := ReadStringFromStream(Stream, -1, IndyTextEncoding_UTF8);
    end;
  end;
end;
That tells Indy to decode the stream bytes from UTF-8 to UTF-16 and then return the decoded string (if you are using a non-Unicode version of Delphi, the UTF-16 data will be converted to ANSI on return, subject to the optional ADestEncoding parameter of ReadStringFromStream()).
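If clients might declare a charset other than UTF-8 in the Content-Type header, a hedged variation is to derive the encoding from the request instead of hard-coding UTF-8. This sketch assumes Indy 10's CharsetToEncoding() from the IdGlobalProtocols unit; verify the name against your Indy version:
// Honor the charset the client declared, falling back to UTF-8
// (the JSON default) when none was sent.
if ARequestInfo.CharSet <> '' then
  S := ReadStringFromStream(Stream, -1, CharsetToEncoding(ARequestInfo.CharSet))
else
  S := ReadStringFromStream(Stream, -1, IndyTextEncoding_UTF8);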

Related

Decoding and comparing JSON with accented char

I have an IntraWeb app. In the HTML template, I have JavaScript creating a JSON document.
This JSON is sent to the IntraWeb backend and I receive the JSON as:
{"order":"Razão Social"}
I parse the JSON and put "Razão Social" in a var _order.
My problem is that when I try to compare that value with a string, it fails; I am having some problem with the encoding. The line
if uppercase(_order) = 'RAZÃO SOCIAL' then
is always false.
I put a breakpoint and I can see the accented char is not OK.
s := aParams.Values['xorder'];
if s <> '' then
begin
  jso := TJSONObject.ParseJSONValue(TEncoding.UTF8.GetBytes(s), 0) as TJSONObject;
  try
    jso.TryGetValue<string>('order', _order);
  finally
    jso.Free;
  end;
end;
if uppercase(_order) = 'RAZÃO SOCIAL' then
  _order := 'Order by A.razao_social ';
UpperCase supports ASCII characters only. Instead, compare the strings case-insensitively using AnsiCompareText or AnsiSameText, which are Unicode-aware.
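Applied to the code above, a minimal sketch of the comparison (AnsiSameText performs a locale-aware, case-insensitive comparison, so no UpperCase call is needed):
// Unicode-aware, case-insensitive comparison of the parsed value.
if AnsiSameText(_order, 'RAZÃO SOCIAL') then
  _order := 'Order by A.razao_social ';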

"The input is not a valid Base-64 string" error; my JSON uses double quotation marks instead of single quotation marks

I am trying to send a JSON POST request, but my JSON seems to differ a little bit from what the site usually sends.
Here is what I am sending:
{"CodeNumberTextBox":"","txusername":"yC6IBEbznlRlKOKv8zrhiA","txpass":"pAQAyrr5u9/hK35iTIlt7Q=="}
Here is what the website sends when you click login:
{ CodeNumberTextBox:'', txusername:'yC6IBEbznlRlKOKv8zrhiA', txpass:'pAQAyrr5u9/hK35iTIlt7Q==' }
Here is the error I get when I send my JSON:
'The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters. '
I use the code below to create my JSON:
jsonRawRequest := TJSONObject.Create;
jsonRawRequest.AddPair('CodeNumberTextBox', '');
jsonRawRequest.AddPair('txusername', 'yC6IBEbznlRlKOKv8zrhiA');
jsonRawRequest.AddPair('txpass', 'pAQAyrr5u9/hK35iTIlt7Q==');
jsonRequest := TStringStream.Create(jsonRawRequest.ToString(), TEncoding.UTF8);
idHttp.Request.ContentType := 'application/json';
idHttp.Request.Referer := 'SomeURL';
idHttp.Post(URL, jsonRequest, ms); // ms is the stream that receives the response

Does SuperObject have UTF-8 support?

I have been using SuperObject for all my JSON parsing needs, and today I ran into a problem that I cannot seem to fix. I downloaded a JSON file that had an entry in it that looked like this: "place" : "café". When I tried to parse the file and show the value in a message box, the word café came out as cafÃ©, which tells me there is some kind of conversion failure going on when the file is parsed by SuperObject. So before I invest any more time in this library, I would like to know whether it supports UTF-8 and, if so, how I would go about enabling it.
BTW, the pseudo code I am using to parse the file looks something like this:
uses
  SuperObject;
...
const
  jsonstr = '{ "Place" : "café" }';
...
var
  SupOB: ISuperObject;
begin
  SupOB := SO(jsonstr);
  ShowMessage(SupOB['Place'].AsString);
end;
Is the conversion failing because I am casting the object as a string? I also tried using AsJson to see if that would have any effect, but it did not, so I am not sure what is needed to make objects like these display as they are intended, and I would appreciate some help. Finally, I have checked and verified that the original file being parsed is indeed encoded as UTF-8.
You say you are parsing a file, but your example is parsing a string. That makes a big difference, because if you are reading file data into a string first, you are likely not reading the file data correctly. Remember that Delphi strings use UTF-16 in Delphi 2009 and later, but ANSI in earlier versions; either way, not UTF-8. So if your input file is UTF-8 encoded, you must decode its data to the proper string encoding before you can parse it. cafÃ© is the UTF-8 encoded form of café being mis-interpreted as ANSI.
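For illustration, a minimal sketch for Delphi 2009+ (the file name is an assumption): load the file through TStringList with TEncoding.UTF8 so the bytes are decoded to native UTF-16 before SuperObject ever sees them.
var
  SL: TStringList;
  SupOB: ISuperObject;
begin
  SL := TStringList.Create;
  try
    // Decode the UTF-8 bytes to the native string encoding while loading.
    SL.LoadFromFile('data.json', TEncoding.UTF8);
    SupOB := SO(SL.Text);
    ShowMessage(SupOB['Place'].AsString); // displays café, not cafÃ©
  finally
    SL.Free;
  end;
end;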
Reading and writing UTF-8 encoded JSON files, tested on Delphi 2007:
function ReadSO(const aFileName: string): ISuperObject;
var
  input: TFileStream;
  output: TStringStream;
begin
  input := TFileStream.Create(aFileName, fmOpenRead, fmShareDenyWrite);
  try
    output := TStringStream.Create('');
    try
      output.CopyFrom(input, input.Size);
      Result := TSuperObject.ParseString(PWideChar(UTF8ToUTF16(output.DataString)), True, True);
    finally
      output.Free;
    end;
  finally
    input.Free;
  end;
end;

procedure WriteSO(const aFileName: string; o: ISuperObject);
var
  output: TFileStream;
  input: TStringStream;
begin
  input := TStringStream.Create(UTF16ToUTF8(o.AsJSon(True)));
  try
    output := TFileStream.Create(aFileName, fmOpenWrite or fmCreate, fmShareDenyWrite);
    try
      output.CopyFrom(input, input.Size);
    finally
      output.Free;
    end;
  finally
    input.Free;
  end;
end;
The UTF8ToUTF16 and UTF16ToUTF8 functions come from the JclConversions unit of the JEDI Code Library: http://sourceforge.net/projects/jcl/.

Does the ulkJSON library have limitations when dealing with base64 in Delphi 7?

I'm working on a project that is using Delphi 7 to consume RESTful services. We are creating and decoding JSON with the ulkJSON library. Up to this point I've been able to successfully build and send JSON containing a base64 string that exceeds 5,160 KB. I can verify that the base64 is being received by the services and verify its integrity once it's there. In addition to sending, I can also receive and successfully decode JSON with a smaller (~256 KB or less) base64 string.
However, I am experiencing some issues on the return trip when a larger (~1,024 KB+) base64 string is involved. Specifically, the problem occurs when attempting to use the following JSON format and function combination:
JSON:
{
"message" : "/9j/4AAQSkZJRgABAQEAYABgAAD...."
}
Function:
function checkResults(JSONFormattedString: string): string;
var
  jsonObject: TlkJSONobject;
begin
  // Validate that the JSON-formatted string is not empty.
  // If it is empty, inform the user/programmer and exit from this routine.
  if JSONFormattedString = '' then
  begin
    Result := 'Error: JSON returned is Null';
    Exit;
  end;
  // Now that we know the string is not empty, assume it is a
  // JSON-formatted string and attempt to parse it.
  //
  // If the string is not a valid JSON object (such as an HTTP status code),
  // inform the user/programmer that an unexpected value has been passed
  // and exit from this routine.
  try
    jsonObject := TlkJSON.ParseText(JSONFormattedString) as TlkJSONobject;
  except
    on E: Exception do
    begin
      Result := 'Error: No JSON was received from web services';
      Exit;
    end;
  end;
  // Now that the object has been parsed, check the contents.
  try
    try
      Result := jsonObject.Field['message'].Value;
    except
      on E: Exception do
        Result := 'Error: No Message received from Web Services ' + E.Message;
    end;
  finally
    jsonObject.Free; // free the parsed object exactly once
  end;
end;
As mentioned above, when using the above function I am able to get small (256 KB and less) base64 strings out of the 'message' field of a JSON object. But for some reason, if the received JSON is larger than, say, 1,024 KB, the following line seems to just stop in its tracks:
jsonObject := TlkJSON.ParseText(JSONFormattedString) as TlkJSONobject;
No errors, no results. Following the debugger, I can go into the library and see that the JSON string being passed is not considered to be JSON, despite being in the format listed above. The only difference I can find between calls that work as expected and calls that do not appears to be the size of the base64 being transmitted.
Am I missing something completely obvious, and should I be shot for my code implementation (very possible)? Have I missed some notation regarding the limitations of the ulkJSON library? Any input would be extremely helpful. Thanks in advance, Stack!
So after investigating this for hours over the course of some time, I discovered that the library was indeed working properly and there was no bug.
The issue came down to the performance of my machine, as it was taking on average 215,802 milliseconds (about 3.6 minutes) to process a moderately sized (1.2 MB) image in base64 format. This performance scaled with the size of the base64 string (faster for smaller, slower for larger).

Remove invalid UTF-8 characters from a string

I get this on json.Marshal of a list of strings:
json: invalid UTF-8 in string: "...ole\xc5\"
The reason is obvious, but how can I delete/replace such strings in Go? I've been reading the docs on the unicode and unicode/utf8 packages, and there seems to be no obvious/quick way to do it.
In Python, for example, you have methods where the invalid characters can be deleted, replaced by a specified character, or handled with a strict setting that raises an exception on invalid chars. How can I do the equivalent in Go?
UPDATE: I meant the reason for getting an exception (panic?) - an illegal character in what json.Marshal expects to be a valid UTF-8 string.
(How the illegal byte sequence got into that string is not important; the usual ways - bugs, file corruption, other programs that do not conform to Unicode, etc.)
In Go 1.13+, you can do this:
strings.ToValidUTF8("a\xc5z", "")
In Go 1.11+, it's also very easy to do the same using the Map function and utf8.RuneError like this:
fixUtf := func(r rune) rune {
	if r == utf8.RuneError {
		return -1 // a negative return tells strings.Map to drop the rune
	}
	return r
}
fmt.Println(strings.Map(fixUtf, "a\xc5z"))
fmt.Println(strings.Map(fixUtf, "posic�o"))
Output:
az
posico
For example,
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	s := "a\xc5z"
	fmt.Printf("%q\n", s)
	if !utf8.ValidString(s) {
		v := make([]rune, 0, len(s))
		for i, r := range s {
			if r == utf8.RuneError {
				// A literal U+FFFD decodes with size 3; size 1 means this
				// position is an invalid byte, so skip it.
				_, size := utf8.DecodeRuneInString(s[i:])
				if size == 1 {
					continue
				}
			}
			v = append(v, r)
		}
		s = string(v)
	}
	fmt.Printf("%q\n", s)
}
Output:
"a\xc5z"
"az"
Unicode Standard
FAQ - UTF-8, UTF-16, UTF-32 & BOM
Q: Are there any byte sequences that are not generated by a UTF? How should I interpret them?
A: None of the UTFs can generate every arbitrary byte sequence. For example, in UTF-8 every byte of the form 110xxxxx₂ must be followed with a byte of the form 10xxxxxx₂. A sequence such as <110xxxxx₂ 0xxxxxxx₂> is illegal, and must never be generated. When faced with this illegal byte sequence while transforming or interpreting, a UTF-8 conformant process must treat the first byte 110xxxxx₂ as an illegal termination error: for example, either signaling an error, filtering the byte out, or representing the byte with a marker such as FFFD (REPLACEMENT CHARACTER). In the latter two cases, it will continue processing at the second byte 0xxxxxxx₂.
A conformant process must not interpret illegal or ill-formed byte sequences as characters, however, it may take error recovery actions. No conformant process may use irregular byte sequences to encode out-of-band information.
Another way to do this, according to this answer, could be
s = string([]rune(s))
Example:
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	s := "...ole\xc5"
	fmt.Println(s, utf8.Valid([]byte(s)))
	// Output: ...ole� false

	s = string([]rune(s))
	fmt.Println(s, utf8.Valid([]byte(s)))
	// Output: ...ole� true
}
Even though the result doesn't look "pretty", it nevertheless converts the string into a valid UTF-8 encoding.