box-api Download a File: how to save raw text response to file with correct encoding

I implemented downloading a file and received the response as a raw string. When I save this data to a file, I can't open it because the file is invalid.
I was trying to download a JPG image.
I think this is the result of incorrectly converting the data to bytes.
Which encoding do I need to use?
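The short answer is that no text encoding is correct here: image data is binary, and decoding it to a string corrupts it. A minimal Python sketch of the fix (the JPG bytes are fabricated for illustration): keep the response body as bytes end to end and open the output file in binary mode.

```python
# Illustrative fake JPG header bytes -- in practice this is the raw HTTP
# response body (e.g. response.content with requests, never response.text).
jpeg_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 4

# Wrong: decoding binary data to text fails or mangles it.
# jpeg_bytes.decode("utf-8")  # raises UnicodeDecodeError

# Right: write the raw bytes directly, in binary mode ("wb", not "w").
with open("photo.jpg", "wb") as f:
    f.write(jpeg_bytes)

with open("photo.jpg", "rb") as f:
    assert f.read() == jpeg_bytes  # round-trips byte-for-byte
```

If your HTTP client hands you the body as a str, it has already decoded (and likely damaged) the data; fetch it as bytes instead.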

Related

reading arabic text from text file and save the output in json

I performed OCR on images to extract Arabic content. I stored the output in a text file using
f=open(filename,'w',encoding='utf-8')
f.write(text)
f.close()
The output in the txt file is readable. But when I read the txt file using
file=open(filename,'r',encoding='utf-8')
json[name]=file.read()
I get weird garbled escape sequences that I couldn't fix.
It turned out that the problem came from json.dump: I set ensure_ascii=False and it kept the Arabic text as-is.
Same problem, but on the JSON side.
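A minimal sketch of the ensure_ascii behavior described above (the sample Arabic string is illustrative): by default json.dumps escapes non-ASCII characters into \uXXXX sequences, which is what makes the output look garbled even though it is technically valid JSON.

```python
import json

text = "مرحبا"  # illustrative Arabic sample

# Default: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps({"name": text})

# ensure_ascii=False keeps the Arabic text readable as-is.
readable = json.dumps({"name": text}, ensure_ascii=False)

assert "\\u" in escaped
assert text in readable
```

When writing to a file, combine this with an explicit encoding, e.g. open(path, "w", encoding="utf-8").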

Check the CSV file encoding in Data Factory

I am implementing a pipeline to move csv files from one folder to another in a data lake with the condition that the CSV file is encoded in UTF8.
Is it possible to check the encoding of a csv file directly in data factory/data flow?
Actually, the encoding is set in the connection conditions of the dataset. What happens in this case, if the encoding of the csv file is different?
What happens at the database level if the csv file is staged with a wrong encoding?
Thank you in advance.
For now, we can't check the file encoding in Data Factory/Data Flow directly. We must pre-set the encoding type used to read/write text files:
Ref: https://learn.microsoft.com/en-us/azure/data-factory/format-delimited-text#dataset-properties
The Data Factory default file encoding is UTF-8.
As #wBob said, you need to do the encoding check at the code level, for example in an Azure Function or a Notebook, and call those activities from the pipeline.
HTH.
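Since the check has to happen in code anyway, here is one possible sketch of such a check in Python (the function name is illustrative, not an ADF or Azure Functions API). Caveat: a pure-ASCII file in another codepage will still pass, because ASCII is a subset of UTF-8.

```python
import codecs

def looks_like_utf8(data: bytes) -> bool:
    # A UTF-8 BOM at the start is a strong signal.
    if data.startswith(codecs.BOM_UTF8):
        return True
    # Otherwise attempt a strict decode; invalid byte sequences
    # (e.g. CP1252 accented characters) raise UnicodeDecodeError.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

You could run this over the first few kilobytes of each file in an Azure Function and route the file in the pipeline based on the result.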

Trying to load a UTF-8 CSV file with a flat file source in SSIS, keep getting errors saying it is an ANSI file

I have a SSIS data flow task that reads from a CSV file and stores the results in a table.
I am simply loading the CSV file by rows (not even separating the columns) and dumping each entire row to the database, a very simple process.
The file contains UTF-8 characters, and the file also has the UTF-8 BOM; I verified this.
Now when I load the file using a flat file connection, I have the following settings currently:
Unicode checked
Advanced editor shows the column as "Unicode text stream DT_NTEXT".
When I run the package, I get this error:
[Flat File Source [16]] Error: The data type for "Flat File
Source.Outputs[Flat File Source Output].Columns[DataRow]" is DT_NTEXT,
which is not supported with ANSI files. Use DT_TEXT instead and
convert the data to DT_NTEXT using the data conversion component.
[Flat File Source [16]] Error: Unable to retrieve column information
from the flat file connection manager.
It is telling me to use DT_TEXT, but my file is UTF-8 and it will lose its encoding, right? Makes no sense to me.
I have also tried with the Unicode checkbox unchecked, and setting the codepage to "65001 UTF-8" but I still get an error like the above.
Why does it say my file is an ANSI file?
I have opened my file in sublime text and saved it as UTF-8 with BOM. My preview of the flat file does show other languages correctly like Chinese and English combined.
When I didn't check Unicode, I would also get an error saying the flat file's error output column is DT_TEXT, and when I try to change it to Unicode text stream it gives me a popup error and doesn't allow it.
I have faced this same issue for years, and to me it seems like it could be a bug with the Flat File Connection provider in SQL Server Integration Services (SSIS). I don't have a direct answer to your question, but I do have a workaround. Before I load data, I convert all UTF-8 encoded text files to UTF-16LE (Little Endian). It's a hassle, and the files take up about twice the amount of space uncompressed, but when it comes to loading Unicode into MS-SQL, UTF-16LE just works!
As for the actual conversion step, I would say it is for you to decide what works best in your workflow. When I have just a few files I convert them one by one in a text editor, but when I have a lot of files I use PowerShell. For example,
Powershell -c "Get-Content -Encoding UTF8 'C:\Source.csv' | Set-Content -Encoding Unicode 'C:\UTF16\Source.csv'"
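If you would rather script the batch conversion in Python than PowerShell, a rough equivalent (paths are illustrative) writes UTF-16LE with an explicit BOM, matching what -Encoding Unicode produces:

```python
from pathlib import Path

def to_utf16le(src: Path, dst: Path) -> None:
    # "utf-8-sig" also strips a leading UTF-8 BOM if one is present.
    text = src.read_text(encoding="utf-8-sig")
    # Write the UTF-16LE BOM (FF FE) followed by the re-encoded payload.
    dst.write_bytes(b"\xff\xfe" + text.encode("utf-16-le"))
```

Run it over a folder with something like: for p in Path("C:/Source").glob("*.csv"): to_utf16le(p, Path("C:/UTF16") / p.name).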

ï»¿ or ? character is prepended to first column when reading csv file from s3 by using camel

The csv file is located in S3 bucket, and I am using camel aws to consume the csv file.
However, whenever the csv file is loaded locally, a ï»¿ or ? character is prepended to the first column.
For example,
original file
firstname, lastname
brian,xi
after load to local
ï»¿firstname,lastname
brian,xi
I have done research on this link : R's read.csv prepending 1st column name with junk text
however, it does not seem to work for camel.
how to read csv file from s3
use aws-s3 to consume csv file from s3 bucket such as "Exchange s3File = consumer.receive(s3Endpoint)" where s3Endpoint = "aws-s3://keys&secret?prefix=%s&deleteAfterRead=false&amazonS3Client=#awsS3client"
The characters ï»¿ are a UTF-8 BOM (hex EF BB BF). This is metadata about the file content placed at the beginning of the file (because there is no "header" or similar place where it could be stored).
If you read a file that begins with this sequence, but you read it as Windows standard encoding (CP1252) or ISO-8859-1, you get exactly these three strange characters at the beginning of the file content.
To avoid that you have to read the file as UTF-8 and BOM-aware, as suggested in #jws' comment. He also provided a link with an example of how to use a BOMInputStream to read such files correctly.
If the file is correctly read, and you write it back into a file with a different encoding like CP1252, the BOM should be removed.
So, now the question is how exactly do you read the file with Camel? If you (or a library) read it (perhaps by default) with a non-UTF-8 encoding, that explains why you get these characters in the file content.
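Outside Camel, the effect is easy to reproduce in Python: decoding the same bytes as CP1252 yields exactly the three junk characters, while a BOM-aware codec (utf-8-sig, the stdlib analogue of BOMInputStream) strips them.

```python
import codecs

# A UTF-8 file with BOM, as S3 might serve it.
raw = codecs.BOM_UTF8 + "firstname,lastname\nbrian,xi\n".encode("utf-8")

# Read with the wrong encoding: EF BB BF decodes to the three junk chars.
wrong = raw.decode("cp1252")
assert wrong.startswith("ï»¿firstname")

# Read BOM-aware: utf-8-sig consumes the BOM automatically.
right = raw.decode("utf-8-sig")
assert right.startswith("firstname")
```

So the fix on the Camel side is to make sure whatever converts the exchange body to a string does so as UTF-8 with BOM handling, not with the platform default charset.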

Sending an html file saved in file system as a response

I have developed a small Java web application where a user uploads an XML file. Once the user submits the form, the XML file is converted into an HTML file using XSLT and saved on the file system. Now I want to send this file back to the client as a response. How can I do that?
Read the bytes of the file, and send them to the HTTP response's output stream. You should also set the content type to text/html, and the content length to the length of the file.
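The same pattern can be sketched with Python's standard library (the original question is a Java servlet, so this is only an illustration, and the output file name is hypothetical): read the file's bytes, set Content-Type and Content-Length, then write the bytes to the response stream.

```python
from http.server import BaseHTTPRequestHandler

class HtmlFileHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Read the previously generated HTML file as raw bytes.
        with open("result.html", "rb") as f:  # hypothetical output path
            body = f.read()
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # stream the bytes to the client
```

In the Java servlet, the equivalent steps are response.setContentType("text/html"), response.setContentLength(...), and copying the file's bytes to response.getOutputStream().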