Using NetLogo's csv extension to read special characters

Working with data from Guinea, I have administrative boundary names that contain special characters, for example:
Guéckédou
The CSV was apparently created or edited on a Mac, and Notepad++ detects it as ANSI and displays it correctly on my Windows machine. It works fine in Excel too, but in NetLogo, for example when printing in the Command Center, it comes out as:
Gu�ck�dou
Creating a CSV in Notepad++ with UTF-8 enforced works with the csv extension.
Not sure if it's related, but a somewhat similar problem occurs when exporting the world to CSV and opening it in Excel: it gives the following for a perfectly fine string that was added as a patch attribute using the gis extension:
Guéckédou
Is there a way to consume that CSV other than converting it to UTF-8?
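For what it's worth, if converting turns out to be the only option, the conversion can at least be scripted instead of being done by hand in Notepad++. A minimal Java sketch, assuming the file really is Windows-1252 "ANSI" as Notepad++ reports (file names are hypothetical):

    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class TranscodeCsv {
        public static void main(String[] args) throws IOException {
            Path source = Paths.get("guinea_admin.csv");       // hypothetical "ANSI" file
            Path target = Paths.get("guinea_admin_utf8.csv");  // UTF-8 copy for the csv extension

            // Decode the bytes as Windows-1252, the encoding Notepad++ labels "ANSI" on Western systems...
            String text = new String(Files.readAllBytes(source), Charset.forName("windows-1252"));

            // ...and re-encode as UTF-8, which the NetLogo csv extension reads correctly.
            Files.write(target, text.getBytes(StandardCharsets.UTF_8));
        }
    }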

Related

Freemarker CSV generation - CSV with Chinese text truncates the csv contents

I have this very weird problem. I'm using Java 8, Struts2 and Freemarker 2.3.23 to generate reports in CSV and HTML file formats (via .csv.ftl and .html.ftl templates, both saved in UTF-8 encoding), with data coming from a Postgres database.
The data contains Chinese characters. When I generate the report in HTML format, it is fine and complete and the Chinese characters are displayed properly. But when the report is generated as CSV, I have observed that:
If I run the app with the -Dfile.encoding=UTF-8 VM option, the Chinese characters are generated properly but the report is incomplete (i.e. the text is truncated near the end).
If I run the app without the -Dfile.encoding=UTF-8 VM option, the Chinese characters are displayed as question marks (?????) but the report is complete.
Also, the app uses StringWriter to write the data to the CSV and HTML templates.
So, what could be the problem? Am I hitting Java character limits? I do not see any errors in the logs either. Appreciate your help. Thanks in advance.
UPDATE:
The StringWriter returns the data in whole; however, some of the data gets lost when it is written to the OutputStream.
ANOTHER UPDATE:
Looks like the issue is with contentLength (the app is a webapp and the CSV is generated as a file download), which was being computed from the data as a String using String.length(). String.length() returns a smaller value than it should, probably because the Chinese characters encode to more than one byte each.
I was able to resolve the issue with contentLength by using String.getBytes("UTF-8").length.
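To make the length mismatch concrete, here is a small standalone sketch (the sample string is made up, not taken from the report data): String.length() counts UTF-16 code units, while Content-Length needs the number of encoded bytes.

    public class ContentLengthDemo {
        public static void main(String[] args) throws java.io.UnsupportedEncodingException {
            String row = "姓名,部门,报告";  // made-up Chinese CSV fragment

            // UTF-16 code units -- what String.length() reports.
            System.out.println(row.length());                  // 8

            // Bytes actually written when the response is UTF-8 encoded.
            System.out.println(row.getBytes("UTF-8").length);  // 20

            // Using row.length() for contentLength therefore cuts the download short;
            // the byte count is what the Content-Length header has to carry.
        }
    }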

Special characters in CSV (utf-8) file appear as ? on new laptop but not on old one (both with Excel 2016)

I regularly export CSV files from Shopware and edit them in Excel (Windows 10 + Office 2016). The special symbols appear garbled (e.g. –) but I can correct that with a "find-and-replace" macro. Annoying but workable.
However, I just got a new laptop also with Windows 10 + Office 2016 but there, the special symbols appear as white question marks on black diamonds (��). When I open the same files on the old PC I still get the good old garbled (but fixable) special symbols.
I have checked every setting I can think of but cannot find any difference between the 2 PCs. Does anyone have an idea what could be causing this and how to fix it?
Thanks!
The "garbled characters" in the old laptop are UTF-8-encoded file data decoded as (probably) Windows-1252 encoding. It seems like the new laptop is using a different default encoding.
If you export your CSV files as UTF-8 w/ BOM and Excel will display them properly without "find-and-replace". If Shopware doesn't have the option to export as UTF-8 w/ BOM, you can use an editor like NotePad++ to load the UTF-8-encoded CSV and re-save it as UTF-8 w/ BOM.
The UTF-16 encoding should also work if that is an option for export.
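If re-saving every export in Notepad++ gets tedious, the BOM can also be prepended programmatically. A minimal Java sketch with hypothetical file names, assuming the exported file is already valid UTF-8 without a BOM:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class AddUtf8Bom {
        public static void main(String[] args) throws IOException {
            Path source = Paths.get("shopware_export.csv");      // hypothetical UTF-8 file without BOM
            Path target = Paths.get("shopware_export_bom.csv");  // copy that Excel will open correctly

            byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF}; // the UTF-8 byte order mark
            byte[] data = Files.readAllBytes(source);

            try (OutputStream out = Files.newOutputStream(target)) {
                out.write(bom);   // BOM first, so Excel chooses UTF-8 instead of the ANSI code page
                out.write(data);  // then the unchanged UTF-8 content
            }
        }
    }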
The culprit was an optional beta setting under Control panel / Clock and Region / Administrative / Change System locale => Beta: Use Unicode UTF-8 for worldwide language support. Once I unchecked the box, the �� disappeared and everything was back to normal.
The next part of the solution is to open the CSV files with a text editor, e.g. Notepad, and save them with UTF-8 w/ BOM encoding. After doing that, the special characters appear correctly in Excel, eliminating the need for "find and replace".
Big thanks to Mark Tolonen + Skomisa for pointing me in the right direction.

Thai characters not shown correctly while spooling csv using sqlplus and opening in ms-excel?

I am spooling a CSV from a database using SQL*Plus on Unix; the data contains some Thai characters. After spooling, when I open the CSV in MS Excel it shows different, incorrect characters. The spooled file is in UTF-8 format. When I open it via Open > Text/CSV and change the encoding to 65001 (UTF-8), it works fine. I also added a BOM (byte order mark) character, but that didn't help. Please suggest a way to open the CSV directly in Excel, without going through Open > Text/CSV, and still get the Thai characters displayed correctly.
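One thing worth checking is whether the BOM really ended up as the very first bytes of the spooled file; if SQL*Plus writes anything before it (a heading, a blank line), Excel will not detect it. A small Java sketch to verify this, with a hypothetical file name:

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class CheckBom {
        public static void main(String[] args) throws IOException {
            try (InputStream in = Files.newInputStream(Paths.get("spool_output.csv"))) { // hypothetical spool file
                byte[] head = new byte[3];
                int read = in.read(head);

                // Excel only auto-detects UTF-8 when EF BB BF are the first three bytes of the file.
                boolean hasBom = read == 3
                        && (head[0] & 0xFF) == 0xEF
                        && (head[1] & 0xFF) == 0xBB
                        && (head[2] & 0xFF) == 0xBF;

                System.out.println(hasBom ? "UTF-8 BOM present" : "No UTF-8 BOM at start of file");
            }
        }
    }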

CodedUI test does not read data from CSV input file

I am having difficulty mapping a CSV file with the Coded UI test method. This is most likely a stupid question but I cannot seem to find a solution for my problem, at least not one that works. I have made sure to set the property of the CSV file to Copy always.
I have also imported the CSV file by writing the following line above the test method.
[DataSource("Microsoft.VisualStudio.TestTools.DataSource.CSV", "|DataDirectory|\\Data\\login.csv", "login#csv", DataAccessMethod.Sequential), DeploymentItem("login.csv"), TestMethod]
The file name is login.csv and it resides in the Data folder.
The test will compile without any problem but once the test executes the fields that should receive input from the CSV file are left empty and the execution is interrupted. I've tried replacing the data from the CSV file by using Strings and it works perfectly fine. The piece of code I am using to import each parameter is:
TestContext.DataRow["Username"].ToString()
Also, the CSV file contains something along the following lines:
Username,Password,Fullname
admin#mail.com,password,Admin
Is there anyone who can point out what it is I am forgetting?
Update: I pinpointed the issue; it seems to involve only the first column in the CSV file. When I try to import any of the other values it works perfectly fine.
Some text files start with a Byte Order Mark (BOM). The CSV reader within Coded UI does not handle the BOM and treats it as part of the first field name. Comparing a debug trace of such a CSV file against the same file in Notepad++ shows this: the DataRow.ItemArray[...] values are as expected, but DataRow.Table.Columns.ResultsView[...], which lists the field names, includes the BOM in the first field name.
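The effect is easy to reproduce outside Coded UI: any reader that decodes the file without stripping the BOM ends up with U+FEFF glued onto the first header name. A small Java sketch of the same failure mode (the sample CSV content is made up to mirror the one in the question):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class BomHeaderDemo {
        public static void main(String[] args) throws IOException {
            // Write a small CSV whose content starts with a UTF-8 BOM, like the file Visual Studio created.
            Path csv = Files.createTempFile("login", ".csv");
            Files.write(csv, "\uFEFFUsername,Password,Fullname\nadmin,password,Admin\n"
                    .getBytes(StandardCharsets.UTF_8));

            // A naive reader that does not strip the BOM sees it as part of the first field name.
            String header = Files.readAllLines(csv, StandardCharsets.UTF_8).get(0);
            String firstField = header.split(",")[0];

            System.out.println(firstField.equals("Username"));     // false
            System.out.println(firstField.charAt(0) == '\uFEFF');  // true: BOM stuck to "Username"
        }
    }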
This CSV file with a BOM was created in Visual Studio using Solution Explorer => Add => New item => C# => General => Text file. Previously I have created a spreadsheet with Microsoft Excel and saved it as a CSV file; that file did not have a BOM. I have also created files with Notepad++ and saved them as CSV, and they did not have a BOM. It appears that Visual Studio creates new files with a BOM, but when editing existing CSV files it does not add one.
Visual Studio can create files with the correct encoding. Within "Step 2 - Create a data set" of this Microsoft page it states the text below. (Thanks also to Holistic Developer for providing very similar details in a comment.):
It is important to save the .csv file using the correct encoding. On the FILE menu, choose Advanced Save Options and choose Unicode (UTF-8 without signature) – Codepage 65001 as the encoding.
For Visual Studio 2010, I could solve the issue by selecting the "Western European (Windows) - Codepage 1252" encoding for CSV files.
Summary of steps:
In Visual Studio 2010, open the CSV file > go to the File menu > select "Advanced Save Options" > select "Western European (Windows) - Codepage 1252" > Save.
This should help.
This is not the best solution, but it's kind of a workaround. I simply set the first element to something random, and since I don't need access to the first element it doesn't matter that I don't have access to it.
If anyone finds a correct way to solve this problem I'd be grateful for your solution.

Bengali-language text not displayed in Unicode CSV file

I have an Excel file in the Bengali language. To display the Bengali text properly I need Bengali fonts installed on the PC.
I converted the Excel file into CSV using Office 2010, but it only shows '?' marks instead of the Bengali characters. Then I used Google Docs for the conversion, with the same problem, except with unreadable characters rather than '?'s. I pasted extracts from that file into an HTML file and tried to view it in my browser, unsuccessfully.
What should I do to get a CSV file from an .xlsx file in Bengali so that I can import that into a MySQL database?
Edit: The answer accepted in this SO question made me go to Google Docs.
According to the answers to the question Excel to CSV with UTF8 encoding, Google Docs should save CSV properly, contrary to Excel, which destroys all characters that are not representable in the "ANSI" encoding being used. But maybe they changed this, or something went wrong, or the analysis of the situation is incorrect.
For properly encoded Bangla (Bengali) processed in MS Office programs, there should be no need for any “Bangla fonts”, since the Arial Unicode MS font (shipped with Office) contains the Bangla characters. So is the data actually in some nonstandard encoding that relies on a specially encoded font? In that case, it should first be converted to Unicode, though possibly it can be somehow managed using programs that consistently use that specific font.
In Excel, when using Save As, you can select “Unicode text (*.txt)”. It saves the data as TSV (tab-separated values) in UTF-16 encoding. You may then need to convert it to use comma as separator instead of tab, and/or from UTF-16 to UTF-8. But this only works if the original data is properly encoded.
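If you go the "Unicode text (*.txt)" route, the remaining conversion (tab to comma, UTF-16 to UTF-8) can be scripted. A minimal Java sketch with hypothetical file names; it assumes the fields themselves contain no tabs or commas, otherwise proper CSV quoting would be needed:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;
    import java.util.stream.Collectors;

    public class Utf16TsvToUtf8Csv {
        public static void main(String[] args) throws IOException {
            Path tsv = Paths.get("bengali_export.txt");  // Excel's "Unicode text": UTF-16 with BOM, tab-separated
            Path csv = Paths.get("bengali_export.csv");  // UTF-8, comma-separated, ready for MySQL import

            // Java's UTF-16 charset reads the BOM and picks the right byte order.
            List<String> lines = Files.readAllLines(tsv, StandardCharsets.UTF_16);

            List<String> converted = lines.stream()
                    .map(line -> line.replace("\t", ","))  // assumes no embedded tabs or commas in fields
                    .collect(Collectors.toList());

            Files.write(csv, converted, StandardCharsets.UTF_8);
        }
    }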