Verifying and formatting text before importing it into MySQL

I have an issue importing some CSV/TXT files.
Here at the company we receive files from other sources (companies), and some of them arrive partially broken.
For example, take a file containing 6 columns (id, name, city, state, zipCode, phone) and 2 million lines. The first 10,000 lines of the file are OK, but in the middle, some lines have 5 or even 7 columns instead of 6.
It seems like somebody "merged" several files into this one and did not pay attention to the number of columns. So when I import it into my MySQL database table, the data comes out very messy because the columns are shifted: zipCode values show up in the state field, and so on.
I was wondering how to scan such a file before importing it into my DB, something like counting the ";" delimiters on each line. Could this be done with a regex, or what would be the best option?
My program is written in Lazarus/Pascal.

I would read the file line by line and check the columns.
If a line has the expected column count, copy it to another file (input_OK.csv).
If it doesn't, dump it in a broken-lines file (input_KO.csv).
Study the errors in input_KO.csv, correct them, then import the corrected file into the database.
IMO, a regex would take too long here.
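For illustration, here is a minimal sketch of that split in Python; the same ReadLn-style loop translates directly to Lazarus/Pascal. The file names follow the answer above, the semicolon delimiter comes from the question, and the sketch assumes no field contains a quoted ";".

# Minimal sketch: route clean lines to input_OK.csv and broken ones
# to input_KO.csv. Assumes 6 expected columns and no quoted ";".
EXPECTED_COLUMNS = 6

with open("input.csv", encoding="utf-8") as src, \
     open("input_OK.csv", "w", encoding="utf-8") as ok, \
     open("input_KO.csv", "w", encoding="utf-8") as ko:
    for line in src:
        # 6 columns means exactly 5 delimiters on the line
        if line.rstrip("\n").count(";") == EXPECTED_COLUMNS - 1:
            ok.write(line)
        else:
            ko.write(line)

A plain delimiter count per line avoids regex entirely and scales linearly with the file size, so 2 million lines is not a problem.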

Related

MySQL - importing a CSV file

I am trying to import a large Excel file containing text, values, links to websites, etc. into MySQL.
I have saved my .xlsx file as a .csv file and am trying to import it using the "table data import" tool. When I reach the screen where it asks for the encoding, I can see all the correct column headings, but only five out of my 15 rows. I think this must be due to an unrecognized character being present in the 6th row. However, I do not know how to find what this character might be.
Also, if I select different encodings, then my 6th row appears but not the rest.
So, if anyone can help me work out which characters are causing this error, I would be very grateful.
Thanks,
Sarah
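One mechanical way to locate such a character is to decode each line strictly in the encoding you expect and report the first byte that fails. Below is a sketch of that idea; "data.csv" and UTF-8 are assumptions standing in for your file and whichever encoding you selected in the import tool.

# Sketch: report lines that are not valid in the expected encoding,
# along with the offending byte and its position.
with open("data.csv", "rb") as f:
    for lineno, raw in enumerate(f, start=1):
        try:
            raw.decode("utf-8")          # swap in the encoding you expect
        except UnicodeDecodeError as e:
            print(f"line {lineno}: bad byte 0x{raw[e.start]:02x} "
                  f"at position {e.start} ({e.reason})")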

Importing ODS to phpMyAdmin, error 1117: too many columns

So I am currently testing one web application, and for that I need to import an Excel file into phpMyAdmin.
I need to import the file as an *.ods. To do that, I know I need to rename the file so that it matches the table name, and set the values in the first row to match the columns. However, whenever I try to import the file, I get error 1117: too many columns, listing all the unnecessary empty columns in my ods file (F, G, H, I, J, ...).
Is there any way to remove those columns, or have them be ignored?
A lot of things can go wrong when you're importing a spreadsheet. If your boss highlighted row 70,000 in the color "invisible" (yes kids, that's a color now), the row will stretch into infinity and give a too-many-columns error. Save as CSV and you delete all that mess, but then you have to make sure your delimiters are nice and neat, or your fields will wander into their neighbors' columns.
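If you do go the CSV route, trimming the phantom columns can be automated rather than done by hand. A small sketch, with placeholder file names, that drops trailing empty cells before import:

import csv

# Sketch: strip the trailing empty cells a spreadsheet export leaves
# behind, so the import no longer sees phantom columns F, G, H, ...
with open("export.csv", newline="") as src, \
     open("export_clean.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        while row and row[-1] == "":   # trim trailing empties only
            row.pop()
        if row:                        # skip rows that were entirely empty
            writer.writerow(row)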

Access Import CSV file

I'm trying to import a CSV file into Access. However the way the CSV file is formatted, it creates three lines for the same transaction. Is it possible to tell Access that every three lines belong to the same transaction when creating or appending the table? VBA is fine, if that's the only way possible.
Currently, during the import of one transaction, Access creates 8 fields for the first line, 29 fields for the second line, and 8 more for the third line. I would prefer either having all 45 fields for the one transaction or telling Access I only need certain fields. For example, I need only field 2 of the first line, field 11 of the second line, and field 3 of the third line for each transaction.
Thanks in advance for your help.
Not aware of any way of doing that within Access. However, CSV is a very specific format, and the way yours is laid out is incorrect according to the CSV standard: it should be a single record per line.
Use an advanced text editor such as Sublime or Notepad++ to edit the CSV file such that each newline character actually represents the end of a record.
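If a quick scripting pass outside Access is acceptable, the restructuring is small. Here is a sketch in Python that collapses each three-line group into one record, keeping the three fields named in the question; "transactions.csv" and "flattened.csv" are placeholder names, and the code assumes the file really does hold complete groups of three lines.

import csv
from itertools import islice

# Sketch: collapse each 3-line transaction into one output row,
# keeping field 2 of line 1, field 11 of line 2 and field 3 of
# line 3 (1-based in the question, hence the 0-based indices here).
with open("transactions.csv", newline="") as src, \
     open("flattened.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    while True:
        group = list(islice(reader, 3))   # next 3 lines = 1 transaction
        if len(group) < 3:
            break                         # stop at a trailing partial group
        writer.writerow([group[0][1], group[1][10], group[2][2]])

The resulting one-record-per-line file then matches what the CSV standard mentioned above expects, so it imports cleanly.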

phpMyAdmin export CSV to Excel drops data

I'm using MySQL with XAMPP and using phpMyAdmin to extract data from tables. If I choose to export to CSV, the data appears to be fine. But when I select the "CSV for MS Excel" export option, I'm losing some data in the export file. The settings are the same in both cases.
Specifically, if a field has a comma in it, it appears that at least sometimes the data after the comma is dropped. Note that the comma is contained in quotes with other text, per the standard CSV format, so the comma should not be treated as a field delimiter. The data after the comma within the field is dropped; in addition, data in the fields that follow the field with the comma is also dropped, but not necessarily for the entire record.
So, let's say record 2 has a comma in a text field in column C, such as "big spender, nice guy." What goes into column C in Excel is "big spender", with ", nice guy" being dropped. In addition, columns D, E, F and G may also lose their data, while in some cases later columns (perhaps H, I, J and K) still have the correct data in them. I'm not suggesting it always loses data for a specific number of columns, just that some columns lose data and later columns sometimes pick up again in the correct position.
I can't see a clear pattern to what gets dropped and what doesn't, just that what I describe above happened yesterday in a data set I'm using. Note that I can see the complete data in the SQL table, and if I use the straight CSV export, no data appears to be lost.
Could this be a bug? I've searched for known bugs and found none. FYI, I'm using Excel from Office 2007 on a Windows 7 machine. The original data source is SugarCRM.
Thanks so much.
Open up the CSV file phpMyAdmin made for you with a text editor, not with Excel. Find the offending row (the one with "big spender, nice guy" in it). Look to see whether it looks like this:
"whatever","whatever","big spender, nice guy", 123, 456
or
whatever,whatever,big spender, nice guy, 123, 456
If it's the second one, your columns aren't delimited properly. CSV is deceptively hard to get right because of this, and because of the possibility of this kind of text string:
Joe said, "O'Meara is a big spender and a nice guy!"
You may wish to try exporting your data as a tab-delimited rather than comma-delimited file to overcome this. You can do this by specifying ordinary, not Excel-style, CSV, and then entering
\t
where it asks you for "Columns separated with:".
Excel will be able to figure this out as it reads it.
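You can also check mechanically which rows come out the wrong width before opening the file in Excel. A sketch, assuming a comma delimiter and taking the header row's field count as the expected width; "export.csv" is a placeholder name.

import csv

# Sketch: flag rows whose parsed field count differs from the
# header's, the symptom of the unquoted-comma problem described above.
with open("export.csv", newline="") as f:
    reader = csv.reader(f)    # honors "..." quoting, as Excel should
    header = next(reader)
    for lineno, row in enumerate(reader, start=2):
        if len(row) != len(header):
            print(f"line {lineno}: {len(row)} fields, expected {len(header)}")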

Process CSV file with multiple tables in SSIS

I'm trying to figure out if it's possible to pre-process a CSV file in SSIS before importing the data into SQL Server.
I currently receive a file that contains 8 tables with different structures in one flat file.
The tables are identified by a row with the table name in it, enclosed in square brackets, e.g. [DOL_PROD].
The data is underneath, in standard CSV format: headers first and then the data.
The tables are separated by a blank line, and the pattern repeats for the next 7 tables.
[DOL_CONSUME]
TP Ref,Item Code,Description,Qty,Serial,Consume_Ref
12345,abc,xxxxxxxxx,4,123456789,abc
[DOL_ENGPD]
TP Ref,EquipLoc,BackClyLoc,EngineerCom,Changed,NewName
Is it possible to split it out into separate CSV files, or process it in a loop?
I would really like to be able to perform all of this automatically with SSIS.
Kind Regards,
Adam
You can't do that with a Flat File Source and connection manager alone.
There are two ways to achieve your goal:
You can use a Script Component as the source of the rows and to process the files; then you can do whatever you want with the file programmatically.
The other way is to read your flat file treating every row as a single column (i.e. without specifying a delimiter), and then, via Data Flow transformations, split rows, recognize table names, split flows, and so on.
I'd strongly advise you to use the Script Component, even if you have to learn .NET first, because the second option will be a nightmare :). I'd use a Flat File Source to extract lines from the file as a single column, then work on them in a Script Component, rather than reading a "raw" file directly.
Here's a resource that should get you started: http://furrukhbaig.wordpress.com/2012/02/28/processing-large-poorly-formatted-text-file-with-ssis-9/
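For a sense of scale, the split itself is small once every row is treated as a single column. Here is a sketch of that logic in Python; the same loop ports directly to a C# Script Component or Script Task. "combined.txt" is a placeholder for the incoming file, and the table-name rows are assumed to look exactly like the [DOL_CONSUME] example above.

# Sketch: split a flat file containing several [TABLE_NAME] sections
# into one CSV per table, so each can feed its own Flat File Source.
out = None
with open("combined.txt", encoding="utf-8") as src:
    for line in src:
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if out:
                out.close()
            # "[DOL_CONSUME]" -> "DOL_CONSUME.csv"
            out = open(stripped[1:-1] + ".csv", "w", encoding="utf-8")
        elif stripped and out:
            out.write(line)   # header and data rows; blank separators dropped
if out:
    out.close()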