PHP: creating a CSV file with Windows encoding

I am creating CSV files with PHP, using the PHP function fputcsv to write the data.
Here is the issue:
I can open the created file normally with Excel, but I can't import the file into a shop system (in this case Shopware). It reports something like "the data could not be read".
And here comes the clue:
If I open the created file, choose "Save as", and select "CSV (comma delimited)" as the type, the resulting file can be imported into Shopware. I read about the PHP function mb_convert_encoding, which I used to encode the data, but it did not fix the problem.
I would be very glad if you can help me.
Thanks.

Thanks for your input.
I solved this problem by replacing fputcsv with fwrite. Then I just needed to add "\r\n" (thanks wmil) to the end of each line, and the generated file can be read by Shopware.
Evidently fputcsv uses \n rather than \r\n as the end-of-line character.
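For comparison, here is a minimal sketch of the same fix using Python's csv module (the rows, file name, and cp1252 encoding are illustrative assumptions, not part of the original setup):

import csv

# Sketch: force CRLF row endings and a Windows code page, since some
# importers only accept \r\n-terminated rows.
rows = [["sku", "name", "price"], ["A-1", "Widget", "9.99"]]  # sample data

with open("export.csv", "w", newline="", encoding="cp1252") as f:
    # lineterminator="\r\n" is already Python's default, spelled out here
    # on purpose; PHP's fputcsv, by contrast, ends rows with a bare \n.
    writer = csv.writer(f, lineterminator="\r\n")
    writer.writerows(rows)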

I don't think you can set the encoding with fputcsv. However, fputcsv does look at the locale setting, which you can change with setlocale.
Alternatively, you could send the file directly to the user's browser, setting the content type and charset with the header function.
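As a rough illustration of that second idea, here is a sketch using Python's standard library instead of PHP (the host, port, and sample rows are assumptions):

from wsgiref.simple_server import make_server

def app(environ, start_response):
    # Declare the charset in the Content-Type header, analogous to
    # calling header() in PHP before echoing the CSV.
    body = "col1,col2\r\nval1,val2\r\n".encode("cp1252")
    start_response("200 OK", [
        ("Content-Type", "text/csv; charset=windows-1252"),
        ("Content-Disposition", 'attachment; filename="export.csv"'),
    ])
    return [body]

make_server("localhost", 8000, app).serve_forever()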

This can't be answered without knowing more about your system. Most likely it has nothing to do with character encoding; it's probably a wrong number of columns or incorrect column headers.
If it is a character-encoding issue, your best bet is:
$new_str = mb_convert_encoding($str, 'Windows-1252', 'auto');
Also, end lines with \r\n, not just \n.
If that doesn't work, you'll need to check the software's docs.
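The equivalent conversion in Python, purely for illustration (the sample string is made up; note that Python's standard library has no counterpart to PHP's 'auto' source-encoding detection):

# Re-encode text as Windows-1252 before writing it out.
text = "Müller, Größe"                                # sample umlaut text
converted = text.encode("cp1252", errors="replace")   # Windows-1252 bytes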

Related

Golang CSV read: extraneous " in field error

I am using a simple program to read a CSV file, and I noticed that when the CSV is created with Excel on a Windows machine, the Go library fails to read it. Even when I use the cat command, it only shows me the last line in the terminal. It always results in this error: extraneous " in field.
I researched a bit and found it is related to carriage-return differences between operating systems.
But I really want to ask how to make a generic CSV reader. I tried reading the same CSV using pandas and it read successfully, but I have not been able to achieve this with my Go code.
A screenshot of the correct CSV is attached.
Your file clearly shows that you've got an extra quote at the end of the content. While programs like pandas may be fine with that, I assume it's not valid CSV, so Go returns an error.
Quick example of what's wrong with your data: https://play.golang.org/p/KBikSc1nzD
Update: after your update and a little bit of searching, I have to apologize; the carriage return does matter and seems to be the main culprit here. Go seems to be fine handling the \r\n Windows variant, but not the bare \r one. In that case, what you can do is wrap the bytes.Reader in a custom reader that replaces the \r byte with the \n byte.
Here's an example: https://play.golang.org/p/vNjzwAHmtg
Please note that the example is just that, an example; it does not handle all the possible cases where \r might be a legitimate byte.
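The same workaround translated to Python, for illustration (the file name is an assumption); the key detail is replacing \r\n first so lone \r bytes are not doubled into blank lines:

import csv
import io

# Normalize old-Mac-style bare \r line endings to \n before parsing,
# mirroring the custom-reader trick from the linked Go example.
raw = open("data.csv", "rb").read()
normalized = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
for row in csv.reader(io.StringIO(normalized.decode("utf-8"))):
    print(row)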

When COPYing to Redshift, how to deal with special characters in a CSV?

I am using a COPY with ACCEPTINVCHARS to load a CSV into Amazon Redshift.
Unfortunately I get errors like
Missing newline: Unexpected character 0x69 found at location 129
However, if I try to use the ESCAPE option as well, I get the exception
CSV is not compatible with ESCAPE
What am I supposed to do in order to COPY this into Redshift? I'm fine if the chars get replaced with ? or whatever.
Ignore the header, as the headers might not be of the same data type as your fields.
Use IGNOREHEADER AS.
Refer to the forum thread for more details:
https://forums.aws.amazon.com/thread.jspa?messageID=557452
For future generations: "CSV is not compatible with ESCAPE" is probably right, but you don't actually need the CSV keyword to load CSV, so it's worth trying to remove the CSV keyword from your COPY command.
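Putting both suggestions together, here is a hedged sketch of what the COPY might look like when issued from Python with psycopg2 (the cluster, table, S3 path, and IAM role are all placeholders): without the CSV keyword, ESCAPE is accepted alongside ACCEPTINVCHARS, and IGNOREHEADER 1 skips the header row.

import psycopg2  # common PostgreSQL driver, also used with Redshift

conn = psycopg2.connect(host="my-cluster.example.com", port=5439,
                        dbname="mydb", user="me", password="secret")
with conn, conn.cursor() as cur:
    cur.execute("""
        COPY my_table
        FROM 's3://my-bucket/data.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
        DELIMITER ','
        IGNOREHEADER 1
        ESCAPE
        ACCEPTINVCHARS;
    """)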

How to use an ASCII character for quote in COPY in cqlsh

I am uploading data from a big .csv file into Cassandra using COPY in cqlsh.
I am using Cassandra 1.2 and CQL 3.0.
However, since " is part of my data, I have to use some other character for the quote parameter, and it needs to be an extended ASCII character. I have tried various approaches, but they fail.
The following works, but I need an extended ASCII character for my purpose:
copy (<columnnames>) from <filename> with delimiter='|' and quote='"';
copy (<columnnames>) from <filename> with delimiter='|' and quote='~';
When I give quote='ß', I get the error below:
:"quotechar" must be an 1-character string
Please advise on how I can use an extended ASCII character for the quote parameter.
Thanks in advance.
A note on the COPY documentation page suggests that for bulk loading (as in your case), the json2sstable utility should be used. You can then load the sstables into your cluster using sstableloader. So I suggest that you write a script/program to convert your CSV to JSON and use these tools for your big CSV. JSON will have no problem handling any character in the ASCII table.
I had a similar problem and inspected the source code of cqlsh (it's a Python script). In my case I was generating the CSV with Python, so it was a matter of finding the right Python csv parameters.
Here's the key information from cqlsh:
csv_dialect_defaults = dict(delimiter=',', doublequote=False,
                            escapechar='\\', quotechar='"')
So if you are lucky enough to generate your .csv file from Python, it's just a matter of using the csv module with:
writer = csv.writer(open("output.csv", 'w'), **csv_dialect_defaults)
Hope this helps, even if you are not using Python.
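To make that snippet self-contained, here is a runnable sketch under the same dialect (the rows and file name are made up). With doublequote=False and escapechar='\\', a literal " inside a field is written backslash-escaped rather than doubled, which is the form cqlsh's reader expects:

import csv

csv_dialect_defaults = dict(delimiter=',', doublequote=False,
                            escapechar='\\', quotechar='"')

rows = [["id1", 'value with "quotes" inside'],  # quote char in the data
        ["id2", "plain value"]]

with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f, **csv_dialect_defaults)
    writer.writerows(rows)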

Changing the delimiter to Ctrl+A in the Python CSV module

I'm trying to write a CSV file with Ctrl+A as the delimiter. I will eventually have to write the file to Hadoop, and I'm unable to use a standard delimiter.
Currently I'm trying this:
writer = csv.writer(f, delimiter="\u0001")
for item in aList:
    writer.writerow(item)
f.close()
However, the output file doesn't appear to be written correctly when opened in Excel: some rows are condensed into one block, while others have one field in the first cell and the rest condensed into the second, and so on.
Is the error in how I set up the writer object, or am I just not familiar with separating files this way?
You can try using the nonprinting "group separator" character, which can be represented in Python code as '\035'.
See http://www.asciitable.com/index/asciifull.gif for some other nonprinting characters if you need more.
It may be helpful to include more context about why you want to use a nonstandard delimiter, and whether Excel parsing of the file is actually necessary or just a quick check that the file might be parsed properly by the target system, Hadoop.
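For what it's worth, the writer setup itself looks fine; this round-trip sketch (sample data assumed) reads the file back with the same delimiter, which is the real test here, since Excel only splits on standard delimiters and will always show control-character-delimited rows as "condensed":

import csv

rows = [["a", "b", "c"], ["1", "2", "3"]]  # sample data

with open("out.csv", "w", newline="") as f:
    csv.writer(f, delimiter="\x01").writerows(rows)  # "\x01" is Ctrl+A

with open("out.csv", newline="") as f:
    for row in csv.reader(f, delimiter="\x01"):
        print(row)  # rows come back intact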

Saving a Unicode file to another format with EmEditor distorts/changes the data. Solution?

There is a MySQL backup file, a huge file of about 3 GB. One table has a LONGBLOB column that stores JPEG image data.
The file imports successfully if done from MySQL Workbench's Data Import/Restore.
I need to open this file and extract the first few lines (about two rows of INSERTs of the table with the image data) so that I can test whether another program can import this data into another MySQL database.
I tried opening the file with EmEditor (which is good at opening large files), selecting everything up to the first INSERT statement of the script (up to about line 25, because the table in question is the first table in the backup script), and pasting the selection into a new file.
Here comes the problem:
This messes up the encoding (even though I save as UTF-8). I realize this when I try to import (restore) the new file (again using MySQL Workbench) into a MySQL database: the restore goes ahead without errors, but the JPEG images in the blob column are destroyed/corrupted.
My guess is that the encoding differs between the original file and the new file.
EmEditor does not show the encoding of the original file; there is an option to detect it, and it detects 'UTF8 Unsigned'. When saving, I save as UTF-8. I also tried saving as ANSI, ISO-8859 (the Windows default), etc., but every time I get the same result.
Do you have any solution for this particular problem? I.e., I want to cut only the first few lines of the huge backup file and save them to a new file, keeping the encoding the same, so that the images (blobs) are not changed. Is there any way this can be done with EmEditor (or do I have the wrong approach with cut and paste)? Is there any specialized software that can do this? How can I diagnose what is going wrong?
Thanks for any responses.
"this messes up the encoding (even though I save as utf8)"
UTF-8 is not a good choice for arbitrary binary data. There are many sequences of high bytes that are not valid UTF-8, so you will mangle them at some point during the load-alter-save process.
If you load the file using an encoding that maps every single byte to a unique character, and re-save it using that same encoding, you should preserve the original content(*). ISO-8859-1 is the encoding usually chosen for this purpose, since it simply maps each byte 0..0xFF to the Unicode code point with the same number.
(*: assuming the editor is binary-safe with regard to other tricky points like nulls, \n/\r, and other control characters... I believe EmEditor can be.)
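A small Python demonstration of this point, purely illustrative, showing why ISO-8859-1 round-trips arbitrary bytes while UTF-8 does not:

data = bytes(range(256))               # every possible byte value
text = data.decode("latin-1")          # ISO-8859-1: always succeeds
assert text.encode("latin-1") == data  # lossless round-trip

try:
    data.decode("utf-8")               # many high-byte sequences are invalid
except UnicodeDecodeError as exc:
    print("UTF-8 rejects arbitrary bytes:", exc)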
When opening the original file in EmEditor, try selecting the encoding Binary (ASCII View). Binary (ASCII View) will, as bobince said, map each byte to a unique character and preserve that mapping when you save the file. I think this should fix your problem.
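If the editor route stays fiddly, a few lines of Python can make the cut without ever decoding the bytes (the file names and line count are assumptions), so the blobs cannot be mangled. Note that newline bytes embedded inside a blob also count as line breaks here, so you may need a larger count to capture a whole INSERT:

# Copy the first 25 lines byte-for-byte; 'rb'/'wb' mode skips all
# encoding conversion, so blob content passes through untouched.
with open("backup.sql", "rb") as src, open("head.sql", "wb") as dst:
    for _, line in zip(range(25), src):
        dst.write(line)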