How to replace special symbols in a binary?

I am trying to read a PDF from a URL, return it as binary, and replace some characters. This works for plain text with the following code, but if the PDF contains special symbols such as the trademark or copyright sign, my web service is unable to return the result. Can someone please help me achieve this? The output must be binary:
String html="";
ByteArrayOutputStream baos = new ByteArrayOutputStream();
InputStream in = new URL(jsonobj.getString("xBody")).openStream();
int reads = in.read();
while(reads != -1){
baos.write(reads);
reads = in.read();
}
html= baos.toString();

The method baos.toString() internally calls new String(buffer), which uses the default encoding (the encoding actually being used by your system, probably not UTF-8). Try to provide the encoding explicitly, as follows:
String html = new String(baos.toByteArray(), "UTF-8");
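As a minimal sketch of why the explicit encoding matters, the following self-contained Java example decodes the same bytes twice. The class and method names are made up for illustration, and it assumes the source bytes really are UTF-8, which the question does not confirm.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetDemo {
    /** Decodes raw bytes with the charset chosen by the caller, never the platform default. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // The trademark sign (U+2122) occupies three bytes in UTF-8: E2 84 A2.
        byte[] raw = "\u2122".getBytes(StandardCharsets.UTF_8);

        String right = decode(raw, StandardCharsets.UTF_8);      // one character
        String wrong = decode(raw, StandardCharsets.ISO_8859_1); // three junk characters

        System.out.println(right.length());
        System.out.println(wrong.length());
    }
}
```

Decoding with the wrong single-byte charset turns one symbol into three unrelated characters, which is the kind of corruption a platform-default decode can introduce.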

Related

Superscript characters are resulting in junk characters in generated CSV file when using ICsvListWriter

I am trying to write superscript characters to a .csv file. I am using the method write(List<?> columns) of org.supercsv.io.ICsvListWriter. In the generated .csv file, the superscript character appears with a junk character before it.
List columns = new ArrayList();
String myString = "abcd1";
columns.add(myString.replaceAll("1", "¹"));
csvWriter.write(columns);
In the generated .csv file it appears as
abcd¹
I also tried the Unicode escape, but it does not help.
columns.add(myString.replaceAll("1", "\u00B9"));
Any suggestion here please?
I found a solution to this problem. The way the ICsvListWriter object was created needed correcting. Previously I had this code, where 'response' is an HttpServletResponse:
CsvPreference preference = new CsvPreference.Builder(CsvPreference.STANDARD_PREFERENCE).useEncoder(new DefaultCsvEncoder()).build();
ICsvListWriter csvWriter = new CsvListWriter(response.getWriter(), preference);
I changed it to this:
ServletOutputStream output = response.getOutputStream();
output.write(new byte[] { (byte)0xEF, (byte)0xBB, (byte)0xBF });
PrintWriter writer = new PrintWriter(new OutputStreamWriter(output, "UTF-8"));
CsvPreference preference = new CsvPreference.Builder(CsvPreference.STANDARD_PREFERENCE).useEncoder(new DefaultCsvEncoder()).build();
ICsvListWriter csvWriter = new CsvListWriter(writer, preference);
This fixed the issue, and all of the superscript characters now appear properly in the generated CSV file without any junk characters. No matter whether I use actual superscript characters or their Unicode escapes, this fix works.
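The core of that fix (write the UTF-8 byte order mark, then write text through a writer with an explicit UTF-8 encoding) can be sketched without a servlet container. The class and method names below are hypothetical, and Super CSV is left out to keep the example self-contained; in the real code the Writer would be passed to CsvListWriter.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class Utf8BomDemo {
    /** Writes the three-byte UTF-8 BOM, then the text encoded as UTF-8. */
    static void writeWithBom(OutputStream out, String text) throws IOException {
        out.write(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF });
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(text);
        writer.flush(); // push buffered characters through to the stream
    }

    /** Convenience wrapper that captures the output in memory. */
    static byte[] toBytes(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeWithBom(buffer, text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes("abcd\u00B9\n");
        System.out.println(bytes.length); // BOM + encoded text
    }
}
```

The BOM is what lets spreadsheet programs that would otherwise guess a legacy code page recognise the file as UTF-8.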

How to read a text file with other encoding than UTF8 or UTF16 in WinRT?

If I read a textfile using FileIO.ReadTextAsync, ReadLinesAsync or a DataReader, I can only specify a member of the UnicodeEncoding enum for the encoding. This includes for some reason only Utf8, Utf16BE and Utf16LE. How can I read a text file with another encoding (like Windows-1252 or even regular Unicode (with 2 Bytes for all characters)) then?
This may be important if Windows Store Apps share text files with Desktop applications or read text files from the internet.
Hans' comment actually gave the answer to my question. Sample for Windows-1252:
string filePath = ...
StorageFile file = await StorageFile.GetFileFromPathAsync(filePath);
IBuffer buffer = await FileIO.ReadBufferAsync(file);
byte[] fileData = buffer.ToArray();
Encoding encoding = Encoding.GetEncoding("Windows-1252");
string text = encoding.GetString(fileData, 0, fileData.Length);
@JürgenBayer buffer.ToArray() wasn't available for me.
So, instead of writing:
string text = await FileIO.ReadTextAsync(file);
I wrote:
IBuffer buffer = await FileIO.ReadBufferAsync(file);
byte[] fileData;
CryptographicBuffer.CopyToByteArray(buffer, out fileData);
Encoding encoding = Encoding.GetEncoding("Windows-1252");
string text = encoding.GetString(fileData, 0, fileData.Length);
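For comparison, the same "read raw bytes, then decode with a named encoding" pattern can be sketched in Java. This is not WinRT code; the class and method names are illustrative, and it assumes the runtime ships the windows-1252 charset, as standard JDKs do.

```java
import java.nio.charset.Charset;

final class Windows1252Demo {
    /** Decodes bytes that were written in the Windows-1252 code page. */
    static String decodeWindows1252(byte[] fileData) {
        return new String(fileData, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // In Windows-1252, 0x99 is the trademark sign and 0xA9 is the copyright sign.
        byte[] fileData = { 'T', 'M', (byte) 0x99, ' ', (byte) 0xA9 };
        System.out.println(decodeWindows1252(fileData).length());
    }
}
```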

Using as3Crypto to encrypt/decrypt with only URL Query safe chars

I was using as3Crypto with no problems
http://www.zedia.net/2009/as3crypto-and-php-what-a-fun-ride/
but it produces a string which includes an equals sign (and probably other characters that are unsafe in a URL query). Is there a way to encrypt so that the output contains only URL-query-safe characters?
Current code below:
public function encrypt(txt:String = ''):String
{
var data:ByteArray = Hex.toArray(Hex.fromString(txt));
var pad:IPad = new PKCS5;
var mode:ICipher = Crypto.getCipher(type, key, pad);
pad.setBlockSize(mode.getBlockSize());
mode.encrypt(data);
return ''+Base64.encodeByteArray(data);
}
Yes, base 64 encoding is the normal way to do this, although you must still URL escape the result, because Base64 contains unsafe characters as well ('/', '+' and '=' to be precise).
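An alternative to escaping afterwards is the URL-safe Base64 alphabet, which swaps '+' and '/' for '-' and '_'. The sketch below uses Java's java.util.Base64 rather than as3Crypto, so it only illustrates the idea; the class and method names are made up, and the receiving side must use the matching decoder.

```java
import java.util.Base64;

final class UrlSafeBase64Demo {
    /** Encodes with the URL-safe alphabet and drops the '=' padding. */
    static String encodeForUrl(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    /** Decodes text produced by encodeForUrl. */
    static byte[] decodeFromUrl(String text) {
        return Base64.getUrlDecoder().decode(text);
    }

    public static void main(String[] args) {
        // These bytes encode to "+//+" in standard Base64.
        byte[] cipherText = { (byte) 0xFB, (byte) 0xFF, (byte) 0xFE };
        System.out.println(encodeForUrl(cipherText));
    }
}
```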

Strange Base64 encode/decode problem

I'm using Grails 1.3.7. I have some code that uses the built-in base64Encode function and base64Decode function. It all works fine in simple test cases where I encode some binary data and then decode the resulting string and write it to a new file. In this case the files are identical.
But then I wrote a web service that took the base64 encoded data as a parameter in a POST call. Although the length of the base64 data is identical to the string I passed into the function, the contents of the base64 data are being modified. I spent DAYS debugging this and finally wrote a test controller that passed the data in base64 to post and also took the name of a local file with the correct base64 encoded data, as in:
data=AAA-base-64-data...&testFilename=/name/of/file/with/base64data
Within the test function I compared every byte in the incoming data parameter with the appropriate byte in the test file. I found that somehow every "+" character in the input data parameter had been replaced with a " " (space, ordinal ascii 32). Huh? What could have done that?
To be sure I was correct, I added a line that said:
data = data.replaceAll(' ', '+')
and sure enough the data decoded exactly right. I tried it with arbitrarily long binary files and it now works every time. But I can't figure out for the life of me what would be modifying the data parameter in the post to convert the ord(43) character to ord(32)? I know that the plus sign is one of the 2 somewhat platform dependent characters in the base64 spec, but given that I am doing the encoding and decoding on the same machine for now I am super puzzled what caused this. Sure I have a "fix" since I can make it work, but I am nervous about "fixes" that I don't understand.
The code is too big to post here, but I get the base64 encoding like so:
def inputFile = new File(inputFilename)
def rawData = inputFile.getBytes()
def encoded = rawData.encodeBase64().toString()
I then write that encoded string out to a new file so I can use it for testing later. If I load that file back in like so, I get the same rawData:
def encodedFile = new File(encodedFilename)
String encoded = encodedFile.getText()
byte[] rawData = encoded.decodeBase64()
So all that is good. Now assume I take the "encoded" variable and add it to a param to a POST function like so:
String queryString = "data=$encoded"
String url = "http://localhost:8080/some_web_service"
def results = urlPost(url, queryString)
def urlPost(String urlString, String queryString) {
def url = new URL(urlString)
def connection = url.openConnection()
connection.setRequestMethod("POST")
connection.doOutput = true
def writer = new OutputStreamWriter(connection.outputStream)
writer.write(queryString)
writer.flush()
writer.close()
connection.connect()
return (connection.responseCode == 200) ? connection.content.text : "error $connection.responseCode, $connection.responseMessage"
}
on the web service side, in the controller I get the parameter like so:
String data = params?.data
println "incoming data parameter has length of ${data.size()}" //confirm right size
//unless I run the following line, the data does not decode to the same source
data = data.replaceAll(' ', '+')
//as long as I replace spaces with plus, this decodes correctly, why?
byte[] bytedata = data.decodeBase64()
Sorry for the long rant, but I'd really love to understand why I had to do the "replace space with plus sign" to get this to decode correctly. Is there some problem with the plus sign being used in a request parameter?
Whatever populates params expects the request to be a URL-encoded form (specifically, application/x-www-form-urlencoded, where "+" means space), but you didn't URL-encode it. I don't know what functions your language provides, but in pseudo code, queryString should be constructed from
concat(uri_escape("data"), "=", uri_escape(base64_encode(rawBytes)))
which simplifies to
concat("data=", uri_escape(base64_encode(rawBytes)))
The "+" characters will be replaced with "%2B".
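The pseudo code above can be sketched in Java as follows. The class and method names are hypothetical, and the Charset overloads of URLEncoder.encode and URLDecoder.decode assume Java 10 or later. The second method mimics what a form parser does on the server, which shows where the spaces came from.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class FormEncodingDemo {
    /** Builds an application/x-www-form-urlencoded body with the value escaped. */
    static String buildBody(String base64) {
        return "data=" + URLEncoder.encode(base64, StandardCharsets.UTF_8);
    }

    /** Decodes a raw form value the way a form parser would. */
    static String decodeValue(String rawValue) {
        return URLDecoder.decode(rawValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String base64 = "a+b/c=";
        System.out.println(buildBody(base64));        // '+' is sent as %2B
        System.out.println(decodeValue(base64));      // unescaped '+' is read as a space
        System.out.println(decodeValue("a%2Bb%2Fc%3D")); // escaped value survives intact
    }
}
```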
You have to use a special base64encode which is also url-safe. The problem is that standard base64encode includes +, / and = characters which are replaced by the percent-encoded version.
http://en.wikipedia.org/wiki/Base64#URL_applications
I'm using the following code in php:
/**
* Custom base64 encoding. Replace unsafe url chars
*
* @param string $val
* @return string
*/
static function base64_url_encode($val) {
return strtr(base64_encode($val), '+/=', '-_,');
}
/**
* Custom base64 decode. Replace custom url safe values with normal
* base64 characters before decoding.
*
* @param string $val
* @return string
*/
static function base64_url_decode($val) {
return base64_decode(strtr($val, '-_,', '+/='));
}
Because it is a parameter to a POST you must URL encode the data.
See http://en.wikipedia.org/wiki/Percent-encoding
Quote from the Wikipedia link:
The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20"
Another hidden pitfall that everyday web developers like myself know little about.

Replacing string in html dynamically in Android

I am using "loadDataWithBaseURL(...)" to load an HTML file, stored in assets, into a WebView. The file contains the string "Loading..." and a rotating GIF. The string "Loading..." is hard-coded, so it will not be localized. How can I replace that string dynamically so that it can be localized?
Please help me to resolve this.
There are various solutions I can think of:
Load a different asset file according to the current language (get the current language using Locale.getDefault()). This way you can translate your HTML files independently.
Use placeholders in your HTML file (for instance #loading_message#), then load the asset file into a String, replace all occurrences of the placeholder with the appropriate localized message (String.replaceAll("#loading_message#", getText(R.string.loading_message).toString())), and finally load the processed HTML into the WebView using the loadData(String data, String mimeType, String encoding) function.
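To load the asset file, open it through the AssetManager (assets are not regular files on disk, so java.io.File cannot read them). From an Activity or any other Context, you can do something like this:
// getAssets() is a Context method; the file name is relative to the assets folder
InputStream is = getAssets().open("my_file.html");
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"));
StringBuilder sb = new StringBuilder();
String eachLine = br.readLine();
while (eachLine != null) {
    sb.append(eachLine);
    sb.append("\n");
    eachLine = br.readLine();
}
br.close();
// sb.toString() is your HTML file as a String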
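The placeholder substitution itself is plain string handling and can be sketched outside Android. The class name, method name, and sample HTML below are illustrative only. The sketch uses String.replace, which treats both arguments literally, whereas String.replaceAll interprets the first argument as a regular expression and gives '$' and '\' special meaning in the replacement.

```java
final class PlaceholderDemo {
    /** Replaces every literal occurrence of the placeholder with the localized message. */
    static String localize(String html, String placeholder, String message) {
        return html.replace(placeholder, message);
    }

    public static void main(String[] args) {
        String html = "<p>#loading_message#</p><img src=\"spinner.gif\">";
        // On Android the message would come from getString(R.string.loading_message).
        System.out.println(localize(html, "#loading_message#", "Chargement..."));
    }
}
```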
I had a similar problem when using the WebView to show help text that should be translated.
My solution was to add multiple translated HTML files in assets and loading them with:
webView.loadUrl("file:///android_asset/" + getResources().getString(R.string.help_file));
For more details go to: Language specific HTML Help in Android
String str = "Loading ...";
String newStr = str.substring("Loading ".length());
newStr = context.getString(R.string.loading) + newStr;
I hope the code is clear enough to convey the idea: extract the string without "Loading " and concatenate it with the localized version of the "Loading" string.