Problems importing Excel data into MySQL via CSV

I have 12 Excel files, each one with lots of data organized in 2 fields (columns): id and text.
Each Excel file uses a different language for the text field: Spanish, Italian, French, English, German, Arabic, Japanese, Russian, Korean, Chinese, Japanese and Portuguese.
The id field is a combination of letters and numbers.
I need to import each Excel file into a different MySQL table, so one table per language.
I'm trying to do it the following way:
- Save the Excel file as a CSV file
- Import that CSV in phpMyAdmin
The problem is that I'm running into all sorts of issues and can't get the files to import properly, probably because of encoding problems.
For example, with the Arabic one, I set everything to UTF-8 (the database table field and the CSV file), but when I do the import I get weird characters instead of the proper Arabic ones (if I copy them in manually, they show up fine).
Another problem is that some texts contain commas, and since the CSV file also uses commas to separate fields, the imported texts get truncated wherever there's a comma.
Yet another problem is that, when saving as CSV, the characters get messed up (the Chinese file, for example), and I can't find an option to tell Excel what encoding to use for the CSV file.
Is there any "protocol" or "rule" that I can follow to make sure that I do it the right way? Something that works for each different language? I'm trying to pay attention to the character encoding, but even with that I still get weird stuff.
Maybe I should try a different method instead of CSV files?
Any advice would be much appreciated.

OK, how did I solve all my issues? FORGET ABOUT EXCEL!!!
I uploaded the Excel files to Google Docs spreadsheets, downloaded them as CSV, and all the characters were perfect.
Then I just imported them into the corresponding fields of the tables, using the utf8_general_ci collation, and now everything is stored perfectly in the database.
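For anyone who prefers a scripted route, here is a minimal sketch of the same keep-everything-UTF-8 idea, assuming a hypothetical texts_ar table with id and text columns, placeholder credentials, and mysql-connector-python:
import csv
import mysql.connector
# Keep every step UTF-8: the file, the connection, and the table/collation.
conn = mysql.connector.connect(host="localhost", user="user", password="password",
                               database="mydb", charset="utf8mb4")
cur = conn.cursor()
with open("arabic.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)  # handles quoted fields that contain commas
    for row_id, text in reader:
        cur.execute("INSERT INTO texts_ar (id, text) VALUES (%s, %s)", (row_id, text))
conn.commit()
conn.close()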

One standard thing to do in a CSV is to enclose fields containing commas with double quotes. So
ABC, johnny can't come out, can he?, newfield
becomes
ABC, "johnny cant't come out, can he?", newfield
I believe Excel does this if you choose to save as file type CSV. A problem you'll have is that Excel's CSV export is ANSI-only. I think you need to use the "Unicode Text" save-as option and live with the tab delimiters, or convert them to commas. The Unicode Text option also quotes comma-containing values. (Checked using Excel 2007.)
EDIT: Add specific directions
In Excel 2007 (the specifics may be different for other versions of Excel)
Choose "Save As"
In the "Save as type:" field, select "Unicode Text"
You'll get a Unicode file. UCS-2 Little Endian, specifically.
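If you end up with that tab-delimited "Unicode Text" file and still want a comma-delimited UTF-8 CSV, here is a small conversion sketch (file names are placeholders; Python's utf-16 codec reads the UCS-2/UTF-16 LE byte-order mark for you):
import csv
with open("export.txt", newline="", encoding="utf-16") as src, \
     open("export.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL)  # quote only fields that need it
    for row in reader:
        writer.writerow(row)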

Related

Getting HTML-heavy data from MySQL to Excel

I've been given a MySQL database from a custom-coded CMS and I need to get its data into a CSV file for importing into Excel for further futzing.
The problem is that the data in the database has a lot of HTML code in it (<p class="foo"> and that type of thing), so exporting it as CSV gets screwed up because some of the text contains commas and other control characters.
I've looked at all the export options in phpMyAdmin but couldn't really find anything that would work.
How can I get this into Excel?
Try using MySQL Workbench; it has an Excel XML export format, and it should work with HTML inside as well.
Actually, by searching here I found the answer, though I had to take a bit of a different approach.
I used \t as the separator (the tab character), columns enclosed with ~, columns escaped by \, and put the column names in the first row.
I had to use LibreOffice to import the sheet, as Excel didn't have the flexibility needed. Got it working now!
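For reference, a short sketch of how a file exported with those settings could be read back programmatically (the file name is a placeholder):
import csv
with open("export.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f, delimiter="\t", quotechar="~", escapechar="\\")
    header = next(reader)          # column names are in the first row
    for row in reader:
        print(dict(zip(header, row)))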

import csv utf8 to phpmyadmin [bug]

I am using phpMyAdmin (version 4.6.4) as a platform to import a CSV encoded with UTF-8 into the database. I am able to import the data, but I have no idea why the first two characters of the first column in the first row go missing, and this happens every time I import a CSV.
raw: A1610011111-001,N,N,N,N,N,N,N,N,N,N,N,N,N,--
The first field is supposed to start with "A1", but in the imported data the leading "A1" is missing.
If the data has more than one row, the result is the same (only the first two characters go missing).
I am not sure what the problem is or what the solution would be. Please give me a hand with this.
Well, for anyone still searching for an answer, this is what worked for me after numerous tries.
In Excel (365), you can choose between:
- Save as CSV UTF-8 (comma separated)
- Save as CSV (separated by delimiter)
As contradictory as it may seem, when I use the first option I lose my first 2 characters, whereas with the second option I lose nothing.
So saving without the UTF-8 option seems to do the trick.
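The likely culprit is the byte-order mark that Excel's "CSV UTF-8" option writes at the start of the file. If you want to keep the UTF-8 save option, here is a small sketch that strips the BOM before importing (file names are placeholders):
with open("data.csv", encoding="utf-8-sig") as src:       # utf-8-sig drops a leading BOM if present
    content = src.read()
with open("data_nobom.csv", "w", encoding="utf-8") as dst:
    dst.write(content)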

unicode flat file can't find CRLF row delimiter

I have a CSV file that contains Unicode characters (specifically Hindi characters) that I need to import using SSIS. When I set the connection to Unicode, SSIS cannot find the CRLF delimiters. If I uncheck the Unicode checkbox, it finds the CRLF delimiters just fine.
How can I correctly import this data?
Rather than ticking the "Unicode" checkbox beside the "Locale" drop-down, leave the checkbox blank and pick "65001 (UTF-8)" option from the "Locale" drop-down instead. I discovered it just today after wasting some 30 minutes trying various combinations of encodings etc. May work in your case as well.
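If you're unsure what the file actually contains, a quick diagnostic sketch like this (file name is a placeholder) shows which BOM, if any, the file starts with and whether the rows really end in CRLF, which can help decide between the Unicode checkbox and the 65001 (UTF-8) code page:
with open("hindi.csv", "rb") as f:
    head = f.read(200)
print(head[:4])            # b'\xef\xbb\xbf' = UTF-8 BOM, b'\xff\xfe' = UTF-16/UCS-2 LE BOM
print(b"\r\n" in head)     # True if CRLF row delimiters appear in the sample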

SSIS package for export data into csv file to FTP

I'm creating an SSIS package to get a .csv file onto my local server and transfer it to FTP.
When I get my CSV onto the FTP server and open it in Excel, my data gets shifted over into other columns. Is there any internal setting I need to change?
I also tried a different text qualifier, but that still did not work.
It sounds like there may be hidden characters in your data set. If you are using commas, you may want to consider using a less common character for the delimiter, such as a pipe "|". For instance, an address may naturally have commas; if a pipe shows up in an address field, it's probably a typo, and is far less likely. Things that shift data cells are often tab characters and CRLF. You can also open your data set in a text editor like Notepad++ and choose the "Show All Characters" option under the "View -> Show Symbols" menu to see what the exact character is. If it's rampant in your data set, you can use the replace function within the Derived Column task to scrub the data as it comes out of the data source.
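A small sketch of that kind of check outside of SSIS, assuming a comma-delimited file with a placeholder name: it flags any field containing the tab, CR, or LF characters that typically shift columns when the file is opened in Excel.
import csv
with open("export.csv", newline="", encoding="utf-8") as f:
    for lineno, row in enumerate(csv.reader(f), start=1):
        for col, value in enumerate(row):
            if any(ch in value for ch in ("\t", "\r", "\n")):
                print(f"line {lineno}, column {col}: hidden character in {value!r}")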

How can I quickly reformat a CSV file into SQL format in Vim?

I have a CSV file that I need to format (i.e., turn into) a SQL file for ingestion into MySQL. I am looking for a way to add the text delimiters (single quotes) to the text, but not to the numbers, booleans, etc. I am finding it difficult because some of the text that I need to enclose in single quotes contains commas itself, making it difficult to key on the commas for search and replace. Here is an example line I am working with:
1239,1998-08-26,'Severe Storm(s)','Texas,Val Verde,"DEL RIO, PARKS",'No',25,"412,007.74"
This is a FEMA data file, with 131,246 lines, that I got off of data.gov and that I am trying to get into a MySQL database. As you can see, I need to insert a single quote after Texas and before Val Verde, so I tried:
s/,/','/3
But that only replaced the first occurrence of the comma on the first three lines of the file. Once I get past that, I will need to find a way to deal with "DEL RIO, PARKS", as that has a comma that I do not want to place a single quote around.
So, is there a "nice" way to manipulate this data to get it from plain CSV to a proper SQL format?
Thanks
CSV files are notoriously dicey to parse. Different programs export CSV in different ways, possibly including strangeness like embedding newlines within a quoted field or different ways of representing quotes within a quoted field. You're better off using a tool specifically suited to parsing CSV -- Perl, Python, Ruby and Java all have CSV parsing libraries, or there are command-line programs such as csvtool or ffe.
If you use a scripting language's CSV library, you may also be able to leverage the language's SQL import as well. That's overkill for a one-off, but if you're importing a lot of data this way, or if you're transforming data, it may be worthwhile.
I think that I would also want to do some troubleshooting to find out why the CSV import into MySQL failed.
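To make the CSV-library suggestion concrete, here is a rough sketch that reads the FEMA file with Python's csv module (which copes with "DEL RIO, PARKS" and "412,007.74" on its own) and writes out INSERT statements; the table name and the rule for deciding which values stay unquoted are assumptions for illustration:
import csv
def sql_literal(value):
    # Leave plain integers unquoted; quote everything else, doubling embedded single quotes.
    if value.isdigit():
        return value
    return "'" + value.replace("'", "''") + "'"
with open("disasters.csv", newline="", encoding="utf-8") as src, \
     open("disasters.sql", "w", encoding="utf-8") as dst:
    for row in csv.reader(src):
        dst.write("INSERT INTO disasters VALUES ("
                  + ", ".join(sql_literal(v) for v in row) + ");\n")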
I would take an approach like this:
:%s/,\("[^"]*"\|[^,"]*\)/,'\1'/g
:%s/^\("[^"]*"\|[^,"]*\)/'\1'/g
In words: look for a comma followed by either a double-quoted run of characters or (the \| alternation) a run containing no commas or double quotes, and wrap that run in single quotes.
Next, do the same for the first column in a row, anchored at the start of the line instead of after a comma.
Try the csv plugin. It allows you to convert the data into other formats. The help includes an example of how to convert the data for importing it into a database.
Just to bring this to a close, I ended up using @Eric Andres' idea, which was the MySQL LOAD DATA option:
LOAD DATA LOCAL INFILE '/path/to/file.csv'
INTO TABLE MYTABLE FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n';
The initial .csv file still took a little massaging, but not as much as if I had done it by hand.
When I commented that LOAD DATA had truncated my file, I was incorrect. I was treating the file as a typical .sql file and assumed the "ID" column I had added would auto-increment. This turned out not to be the case. I had to create a quick script that prepended an ID to the front of each line. After that, the LOAD DATA command worked for all lines in my file. In other words, all data has to be in place within the file before the load, or the load will not work.
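Here is a sketch of the kind of "quick script" described above, assuming the file simply needs a running ID prepended to every line so it matches the table's column list before LOAD DATA (file names are placeholders):
with open("file.csv", encoding="utf-8") as src, \
     open("file_with_ids.csv", "w", encoding="utf-8") as dst:
    for i, line in enumerate(src, start=1):
        dst.write(f"{i},{line}")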
Thanks again to all who replied, and to @Eric Andres for his idea, which I ultimately used.