Opening a UTF-8 CSV containing HTML with Excel

I have a multi-language website, and I need to open a CSV in Excel so that the translation company can translate the content from English to Mandarin.
The file is UTF-8, and when I open it by double-clicking, Excel ignores the charset and some characters are garbled. When I open it through Data -> Import Text, select UTF-8, and pick my semicolon delimiter, the characters are correct, but Excel starts a new row whenever it comes across an HTML closing tag.
Any help? I'd be glad to upload the CSV somewhere so you can try for yourselves.

Throw Excel in the garbage and use OpenOffice Calc. It did the job perfectly! When I open the CSV with Calc, it automatically asks for the charset, and it had no problem keeping the HTML in one cell instead of adding a new row for each tag.

Related

Writing Chinese characters to CSV from PeopleCode (PeopleSoft)

I have hard-coded Chinese characters in PeopleCode, and they are written to a CSV file. This CSV file is attached to an email notification. However, when the user receives the email and opens the CSV attachment, the Chinese characters are shown as garbled symbols. By the way, I am using an Application Engine program that runs on PSUNX.
Does anyone have a workaround for this?
The problem appears to be that you are not writing the same character set that your recipient is opening the file with. Since you are using UTF8, your choice does support the Chinese characters.
I see you have a couple of options:
Find out the character set your recipient is using and use that character set when writing the file.
Educate the recipient that the file is in UTF8 and that they may need to open it differently. Here is a link on how to open a CSV using UTF8 in Excel.
Alright, I managed to solve it by using UTF8BOM.
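For readers outside PeopleSoft, the effect of the UTF8BOM choice can be illustrated in Python. This is only a sketch of the resulting byte layout, not PeopleCode: the function name `write_csv_with_bom` is made up for illustration, and the `utf-8-sig` codec is Python's way of writing the three-byte UTF-8 BOM (EF BB BF) at the start of the file, which Excel typically uses as the hint that the file is UTF-8.

```python
import csv


def write_csv_with_bom(path, rows):
    """Write rows to a CSV file that starts with the UTF-8 byte order mark."""
    # "utf-8-sig" emits the bytes EF BB BF before the first character;
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)
```

A call such as `write_csv_with_bom("report.csv", [["id", "name"], ["1", "中文"]])` (the file name is hypothetical) would produce a file whose first three bytes are the BOM, followed by ordinary UTF-8 text.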

CodedUI test does not read data from CSV input file

I am having difficulty mapping a CSV file with the Coded UI test method. This is most likely a stupid question but I cannot seem to find a solution for my problem, at least not one that works. I have made sure to set the property of the CSV file to Copy always.
I have also imported the CSV file by writing the following line above the test method.
[DataSource("Microsoft.VisualStudio.TestTools.DataSource.CSV", "|DataDirectory|\\Data\\login.csv", "login#csv", DataAccessMethod.Sequential), DeploymentItem("login.csv"), TestMethod]
The file name is login.csv and it resides in the Data folder.
The test compiles without any problem, but once it executes, the fields that should receive input from the CSV file are left empty and the execution is interrupted. I've tried replacing the data from the CSV file with hard-coded strings, and that works perfectly fine. The piece of code I am using to import each parameter is:
TestContext.DataRow["Username"].ToString()
Also, the CSV file contains something along the following lines:
Username,Password,Fullname
admin#mail.com,password,Admin
Can anyone point out what I am forgetting?
Update: I pinpointed the issue. It only affects the first column in the CSV file; when I try to import any of the other values, it works perfectly fine.
Some text files start with a Byte Order Mark (BOM). The CSV reader within Coded UI does not handle the BOM and treats it as part of the first field name. The screen shot below shows the debug trace of a CSV file with a BOM and that same file shown in Notepad++. The DataRow.ItemArray[...] values are as expected. The DataRow.Table.Columns.ResultsView[...] shows the field names, but the first field name includes the BOM.
This CSV file with a BOM was created in Visual Studio using Solution Explorer => Add => New item => C# => General => Text file. Previously I had created a spreadsheet with Microsoft Excel and saved it as a CSV file; that file did not have a BOM. I have also created files with Notepad++ and saved them as CSV, and they did not have a BOM. It appears that Visual Studio adds a BOM when it creates a file, but does not add one when editing an existing CSV file.
Visual Studio can create files with the correct encoding. Within "Step 2 - Create a data set" of this Microsoft page it states the text below. (Thanks also to Holistic Developer for providing very similar details in a comment.):
It is important to save the .csv file using the correct encoding. On the FILE menu, choose Advanced Save Options and choose Unicode (UTF-8 without signature) – Codepage 65001 as the encoding.
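The way a BOM ends up glued to the first field name can be reproduced outside Coded UI. The following Python sketch is only an analogy, not the Coded UI reader itself: it parses the sample data from the question twice, once decoding as plain UTF-8 (which keeps the BOM as the character U+FEFF) and once with Python's `utf-8-sig` codec (which drops a leading BOM).

```python
import csv
import io

# Sample data from the question, with a UTF-8 BOM in front (as Visual Studio may add).
raw = "\ufeffUsername,Password,Fullname\r\nadmin#mail.com,password,Admin\r\n".encode("utf-8")

# Plain "utf-8" keeps the BOM, so the first header is "\ufeffUsername".
naive = next(csv.DictReader(io.StringIO(raw.decode("utf-8"), newline="")))

# "utf-8-sig" strips a leading BOM if present and is harmless if absent.
clean = next(csv.DictReader(io.StringIO(raw.decode("utf-8-sig"), newline="")))

print("Username" in naive)   # lookup by the visible name fails
print(clean["Username"])     # lookup works once the BOM is gone
```

This matches the symptom in the update to the question: only the first column is affected, because only the first header carries the invisible prefix.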
For Visual Studio 2010, I solved the issue by selecting the "Western European (Windows) - Codepage 1252" encoding for CSV files.
Summary of steps:
In Visual Studio 2010, open the CSV file > go to the File menu > select "Advanced Save Options" > select "Western European (Windows) - Codepage 1252" > Save.
This should help.
This is not the best solution, but it's kind of a workaround. I simply set the first element to something random, and since I don't need the first element, it doesn't matter that I can't access it.
If anyone finds a correct way to solve this problem I'd be grateful for your solution.

Using UTF-8 encoding, CSV file with special properties/foreign characters not preserved when imported into MySQL (phpMyAdmin)

My table needs to support pretty much all characters (Japanese, Danish, Russian, etc.)
However, when I save the two-column table as CSV from Excel with UTF-8 encoding and then import it with phpMyAdmin with UTF-8 encoding selected, a lot of the original characters go missing (the ones with diacritics such as umlauts and accents). Also, anything following a problematic character is removed entirely. I haven't the slightest idea what is causing this problem.
EDIT: For those that come upon the same issue, I'd suggest opening your CSV file in Notepad++ and going to "Encoding > Convert to UTF-8" (not "Encode in UTF-8") first. Then import it. It will surely work.
I found an answer here:
https://help.salesforce.com/apex/HTViewSolution?id=000003837
Basically: save as a Unicode text file from Excel,
then replace all tabs with commas in a code-friendly text editor,
re-save as UTF-8,
and change the file extension from .txt to .csv.
Exporting directly from Excel to .csv causes problems with Japanese, which is why I went searching for help.
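The manual steps above can also be scripted. Here is a minimal Python sketch, assuming the Excel "Unicode Text" export is tab-separated UTF-16 that begins with a BOM; the function name `unicode_text_to_utf8_csv` is made up for illustration. Using the `csv` module rather than a plain find-and-replace means that a field which itself contains a comma gets quoted instead of being split.

```python
import csv


def unicode_text_to_utf8_csv(src_path, dst_path):
    """Convert a tab-separated UTF-16 text file into a comma-separated UTF-8 file."""
    # The "utf-16" codec reads the BOM at the start of the file to pick the byte order.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)  # comma-delimited, quotes fields when needed
        writer.writerows(reader)
```

For example, `unicode_text_to_utf8_csv("export.txt", "export.csv")` (both file names are hypothetical) covers the tab replacement, the re-save as UTF-8, and the rename in one step.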

Bengali-language text not displayed in Unicode CSV file

I have an Excel file in the Bengali language. To display the Bengali text properly I need Bengali fonts installed on the PC.
I converted the Excel file into CSV using Office 2010, but it only shows '?' marks instead of the Bengali characters. Then I used Google Docs for the conversion and had the same problem, but with unreadable characters rather than '?'s. I pasted extracts from that file into an HTML file and tried, unsuccessfully, to view it in my browser.
What should I do to get a CSV file from an .xlsx file in Bengali so that I can import that into a MySQL database?
Edit: The answer accepted in this SO question made me go to Google Docs.
According to the answers to the question Excel to CSV with UTF8 encoding, Google Docs should save CSV properly, unlike Excel, which destroys all characters that cannot be represented in the “ANSI” encoding being used. But maybe they changed this, or something went wrong, or the analysis of the situation is incorrect.
For properly encoded Bangla (Bengali) processed in MS Office programs, there should be no need for any “Bangla fonts”, since the Arial Unicode MS font (shipped with Office) contains the Bangla characters. So is the data actually in some nonstandard encoding that relies on a specially encoded font? In that case, it should first be converted to Unicode, though possibly it can be somehow managed using programs that consistently use that specific font.
In Excel, when using Save As, you can select “Unicode text (*.txt)”. It saves the data as TSV (tab-separated values) in UTF-16 encoding. You may then need to convert it to use comma as separator instead of tab, and/or from UTF-16 to UTF-8. But this only works if the original data is properly encoded.

Special characters from CSV to MySQL doesn't work?

I'm saving out a .csv file from Excel and importing it to a MySQL database (with phpMyAdmin 2.6.4-pl3).
A few fields have trademark symbols, but they show up as "ª". I thought it had something to do with the encoding of the fields in the database, but I have changed them and found no difference. UTF-8 at least shows the small 'a', while the others I have tried just convert it to a '?'. If I leave it at UTF-8 and manually go in after importing the .csv to change the 'ª' to '™', it works fine, but since I have about 150 products, that would take forever.
I think the issue is that Excel does not export the .csv file as UTF-8, so the character gets lost. I am exporting this information to a PDF so I cannot use any standard web workarounds like I have seen on other posts.
Any ideas on a way to fix this? Thanks.
MySQL allows the specification of the encoding for each database. Either change the database's encoding to something useful, like UTF-8, or convert your input data to the current database encoding.
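The second option, converting the input data, amounts to decoding the file with the encoding it was actually written in and encoding it again as the target. A minimal Python sketch follows; the helper name `reencode_to_utf8` is made up, and the choice of Mac OS Roman as the source encoding is purely an assumption for illustration. It is used here because in Mac OS Roman the ™ sign is the single byte 0xAA, which Latin-1 displays as "ª", the same symptom described in the question. The real source encoding of the export would need to be checked.

```python
def reencode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes using the encoding they were written in, then encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Assumption for illustration: the export was written in Mac OS Roman.
exported = "Widget™".encode("mac_roman")

# Misreading those bytes as Latin-1 turns the ™ into ª.
print(exported.decode("latin-1"))

# Re-encoding from the real source encoding preserves the ™.
print(reencode_to_utf8(exported, "mac_roman").decode("utf-8"))
```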
Use an OpenOffice spreadsheet to import the data into SQL instead of an Excel, CSV, or .txt file.
You can convert the Excel or CSV file into an OpenOffice spreadsheet and import that in phpMyAdmin.