MySQL LOAD DATA - Loading a text file with ColumnName=Value format

I've started learning SQL over the past few days, but am stuck while attempting to get my data into the table.
The data's stored in a text file with the format:
ColumnName1=SomeInteger
ColumnName2=SomeInteger
ColumnName3=SomeString
... etc
So far I've managed to create a table (which has about 150 columns that I'm hoping to split up and group separately once I know more) by stripping the =SomeValue part in Python, then wrapping the column names with CREATE TABLE in a spreadsheet. A bit messy, but it works for now.
Now I'm stuck at the following point:
LOAD DATA INFILE 'path/to/file.txt'
INTO TABLE tableName
COLUMNS TERMINATED BY '\n'
LINES STARTING BY '=';
I'm trying to get SQL to insert the data into the column names specified (in case they're not always in the same order), ignore the equals sign, and use the unique filename as my index.
I've also tried escaping the equals character with '\=', because the MySQL documentation mentions that everything before the LINES STARTING BY string should be ignored. Typing LINES STARTING BY 'ColumnName1=' manages to ignore the first instance, but it's not exactly what I want and doesn't work for the remaining lines.
I'm not averse to reading more documentation or tutorials, if someone could point me in the right direction.
Edit: To clarify how the rows are delimited: I've been given about 100,000 .ini files, each named FirstName_LastName.ini (uniqueness is guaranteed), and each file contains the data for one row. I need to bring this archaic method of account storage into the 21st century.
MySQL's LOAD DATA is rumored to be especially fast for this type of task, which is why I began looking into it as an option. I was just wondering if it's possible to manipulate it to work with data in my format, or if I'm better off putting all 100k files through a parser. I'm still open to suggestions that use SQL if there are any magicians reading this.
p.s: If anyone has better ideas for how to get my data (from this text format) into individual tables, I'd love to hear them too.

Personally, I would probably do the whole thing in Python, using the MySQLdb module (probably available in a package named something like python-mysqldb or MySQL-python in your favorite distribution). Format your data into a list of tuples and then insert it. Example from http://mysql-python.sourceforge.net/MySQLdb.html:
import MySQLdb

datalist = [("Spam and Sausage Lover's Plate", 5, 1, 8, 7.95),
            ("Not So Much Spam Plate", 3, 2, 0, 3.95),
            ("Don't Want ANY SPAM! Plate", 0, 4, 3, 5.95)]

db = MySQLdb.connect(user='dude', passwd='foo', db='mydatabase')
c = db.cursor()
c.executemany(
    """INSERT INTO breakfast (name, spam, eggs, sausage, price)
       VALUES (%s, %s, %s, %s, %s)""",
    datalist)
db.commit()  # MySQLdb does not autocommit by default
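For your ColumnName=Value files specifically, here's a rough sketch of how the parsing could feed executemany. The 'account' column, the file path, and the short column list are placeholders for illustration (tableName is the table from your question); I'm assuming one record per .ini file and that the filename, minus the extension, is your unique key:

import glob
import os
import MySQLdb

columns = ['ColumnName1', 'ColumnName2', 'ColumnName3']  # placeholder: list your real columns here

rows = []
for path in glob.glob('path/to/inis/*.ini'):
    # the filename (minus the extension) is the unique key
    record = {'account': os.path.splitext(os.path.basename(path))[0]}
    with open(path) as f:
        for line in f:
            if '=' in line:
                name, _, value = line.partition('=')
                record[name.strip()] = value.strip()
    # build each tuple in a fixed column order, regardless of the order in the file
    rows.append(tuple([record['account']] + [record.get(col) for col in columns]))

sql = "INSERT INTO tableName (account, %s) VALUES (%s)" % (
    ', '.join(columns),
    ', '.join(['%s'] * (len(columns) + 1)))

db = MySQLdb.connect(user='dude', passwd='foo', db='mydatabase')
c = db.cursor()
c.executemany(sql, rows)
db.commit()

Because each tuple is built in a fixed column order, it doesn't matter what order the ColumnName=Value lines appear in within any given file.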

Related

Issue with importing column with null values from Excel to MySQL Workbench

I have a MySQL database that I manage using MySQL Workbench. I am trying to import data from Excel (converted to CSV, of course) into an existing table, which I do quite often and generally without problems. Technically, from MySQL's perspective, MySQL Workbench handles this by generating a series of "Insert into" statements rather than as an import statement.
I'm having trouble because the particular set of data I'm importing has no values in the last column. In the database, the last column in the table expects a 2-digit number (or NULL). When I import the data and MySQL Workbench turns it into "Insert into" statements, I'd expect the value inserted in the last column to be NULL. Instead, I keep getting a 1644 error for each and every row it tries to insert.
Here's what I've tried that hasn't worked:
- Leaving the last column blank
- Filling the column with the word NULL
- Filling the column with the characters \N
- Adding one fake row (to be deleted later) that actually has a value in the last column so that it can realize there's a column there
I'm out of ideas. Ideally I would like to stick to my process of using MySQL Workbench's "import" (i.e. Insert into) feature so that I can add blocks of data from Excel, and I can't be manually editing each line in the database.
EXAMPLE (when I just leave the last column blank)
Here's what a row of data looks like in the CSV:
4209,Reading,2015,1. Fall,10/12/15,114,212,3.4,93
Here's what the auto-generated "Insert into" looks like:
INSERT INTO Tests (TestID, Subject, Year, Season, TestStartDate, Duration, RIT, StdError, Percentile, AccuratePercentile) VALUES ('4209', 'Reading', '2015', '1. Fall', '10/12/15', '114', '212', '3.4', '93', '');
Here's what the error looks like (I get one of these for each "Insert into")
ERROR 1644: 1644: Error: Trying to insert an incorrect value in Percentile/Duration/RIT/AccuratePercentile; TestID: 4209
SQL Statement:
INSERT INTO Tests (TestID, Subject, Year, Season, TestStartDate, Duration, RIT, StdError, Percentile, AccuratePercentile) VALUES ('4209', 'Reading', '2015', '1. Fall', '10/12/15', '114', '212', '3.4', '93', '')
The way you are doing the import (via the result set import) is designed only for simple cases. You cannot configure much there, but it requires only a few steps.
For a more powerful import, try the relatively new Table Import (use the context menu in the schema tree on a schema or table node). It allows you to import CSV and JSON data, and you can configure many more details (file encoding, columns to import, data types, etc.). The table import doesn't use INSERT statements but the LOAD DATA command, and hence is much faster too.
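If you'd rather script the same idea, here's a rough LOAD DATA sketch that maps the blank last column to NULL through a user variable. The file name, line terminator, and connection details are assumptions, local_infile has to be permitted on both the server and the client, and I'm assuming the blank column is still present in the CSV as a trailing empty field:

import MySQLdb

# local_infile must be enabled on both client and server for LOCAL INFILE
db = MySQLdb.connect(user='dude', passwd='secret', db='mydatabase', local_infile=1)
c = db.cursor()
c.execute("""
    LOAD DATA LOCAL INFILE 'tests.csv'
    INTO TABLE Tests
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\\r\\n'
    (TestID, Subject, Year, Season, TestStartDate, Duration, RIT,
     StdError, Percentile, @accurate_percentile)
    SET AccuratePercentile = NULLIF(@accurate_percentile, '')
""")
db.commit()

NULLIF turns an empty string into NULL, which is what the blank last column should end up as.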

insert fields from database1 into table from database2

I am using PrestaShop and have data in Zen Cart. I am matching up information and want to select the data to be inserted into a different table under different fields.
insert into presta_table1 (c1, c2, ...)
select c1, c2, ...
from zen_table1
Since a lot is different, I need to do approximately 800 records once I match up which field is which in which table.
I recently found an example:
USE datab1;
INSERT INTO datab3.prestatable (author, editor)
SELECT author_name, editor_name
FROM author, datab2.editor
WHERE author.editor_id = datab2.editor.editor_id;
It would be nice to find a way to import while avoiding duplicates.
I am unable to find examples of this.
Here is what I did to get data out of this POS (point of sale) system that uses MySQL for a database.
I found the tables with the data I needed and exported them, which came out in CSV format. I used Calc in LibreOffice to open the files and then, in another sheet, manipulated the data into the example CSV fields, and that worked well.
I had some issues with some of the data, but I used console commands to get past them; let me share those with you.
The Zen Cart description data exported from MySQL Workbench had some model numbers I needed to pull out into their own field:
43,1,"Black triple SS","Black Triple SS
12101-57 (7.5 inch)",,0
I used a command to add in " , " so I could extract the data and overlay it, essentially copy-pasting it into the Calc spreadsheet where I needed it, in the normal order.
This sed command removes the ),( in the file and replaces it with a newline.
I did a database dump, removed the starting ( and ending ), saved the result as zencart_product.csv, and then ran this in a console:
sed 's/),(/\n/g' zencart_product.csv > zencart_productNEW.csv
I had about 1000 files with $ and # in their names, so I put them all in a directory and renamed them with:
get rid of the $ symbol
rename 's/\$//g' *
get rid of the # symbol
rename 's/#//g' *
get rid of space
rename "s/\s+//g" *
I hope people stuck in some software who want their data out are able to get it out with some time and effort, and that this helps someone. Thanks.

Skip invalid data when importing .tsv using MySQL LOAD DATA INFILE

I am trying to import a bunch of .tsv files into a MySQL database. However, in some of the files there are errors in some of the rows (the files were generated by another system where data is manually entered, so these are human errors). When I use LOAD DATA INFILE to import them and the command reaches a row of bad data, it writes NULL values for those fields and then stops, whereas I need it to keep going.
The bad rows look like this:
value1, value 2, value 3
bob, 3, st
john, 4, rd
dianne4ln
jack, 7, cir
I've made sure the line terminators are correct, and have used the IGNORE and REPLACE modifiers to no avail.
Use the IGNORE modifier in your LOAD DATA statement to skip the error lines and proceed; see the MySQL documentation on LOAD DATA INFILE for details.
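If IGNORE alone doesn't give you the behavior you want, another option is to pre-filter the files before loading. A minimal sketch, assuming tab-delimited files and a known field count (the file names are placeholders):

import csv

EXPECTED_FIELDS = 3  # adjust to the number of columns in your table

with open('input.tsv', newline='') as src, open('clean.tsv', 'w', newline='') as dst:
    reader = csv.reader(src, delimiter='\t')
    writer = csv.writer(dst, delimiter='\t')
    for row in reader:
        # keep only rows with the expected number of fields; drop malformed ones
        if len(row) == EXPECTED_FIELDS:
            writer.writerow(row)

The cleaned file can then be loaded with LOAD DATA INFILE as usual.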

Unable to import 3.4GB CSV into Redshift because values contain free text with commas

So we found a 3.6GB CSV that we have uploaded onto S3 and now want to import into Redshift, then do the querying and analysis from IPython.
Problem 1:
This comma-delimited file contains free-text values that themselves contain commas, which interferes with the delimiting, so we can't upload it to Redshift.
When we tried opening the sample dataset in Excel, Excel surprisingly put the values into columns correctly.
Problem 2:
A column that is supposed to contain integers has some records containing letters to indicate some other scenario.
So the only way to get the import through is to declare this column as VARCHAR, but then we can't do calculations on it later.
Problem 3:
The datetime data type requires the value to be in the format YYYY-MM-DD HH:MM:SS, but the CSV doesn't contain the SS part, and the database rejects the import.
We can't manipulate the data on a local machine because it is too big, and we can't upload it onto the cloud for computing because it is not in the correct format.
The last resort would be to scale the instance running iPython all the way up so that we can read the big csv directly from S3, but this approach doesn’t make sense as a long-term solution.
Your suggestions?
Train: https://s3-ap-southeast-1.amazonaws.com/bucketbigdataclass/stack_overflow_train.csv (3.4GB)
Train Sample: https://s3-ap-southeast-1.amazonaws.com/bucketbigdataclass/stack_overflow_train-sample.csv (133MB)
Try using a different delimiter or use escape characters.
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_preparing_data.html
For the second issue, if you want to extract only the numbers from the column after loading it into a char field, use regexp_replace or other functions.
For the third issue, you can also load it into a VARCHAR field and then use string functions, e.g. cast(left(column_name, 10) || ' ' || right(column_name, 6) || ':00' as timestamp), when loading it into the final table from the staging table.
For the first issue, you need to find out a way to differentiate between the two types of commas - the delimiter and the text commas. Once you have done that, replace the delimiters with a different delimiter and use the same as delimiter in the copy command for Redshift.
For the second issue, you need to first figure out if this column needs to be present for numerical aggregations once loaded. If yes, you need to get this data cleaned up before loading. If no, you can directly load this as char/ varchar field. All your queries will still work but you will not be able to do any aggregations (sum/ avg and the likes) on this field.
For problem 3, you can use the TEXT(date, "yyyy-mm-dd hh:mm:ss") function in Excel to do a mass reformat of this field.
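As an alternative for the first and third problems, here's a rough pre-processing sketch using Python's csv module, which understands the quoted commas that Excel handled; it streams row by row, so file size isn't a blocker. It rewrites the file with a pipe delimiter and pads the missing seconds onto the datetime column. The file names, the datetime column index, and the pipe delimiter are all assumptions:

import csv

with open('stack_overflow_train.csv', newline='') as src, \
        open('stack_overflow_train_clean.csv', 'w', newline='') as dst:
    reader = csv.reader(src)             # handles commas inside quoted free text
    writer = csv.writer(dst, delimiter='|')
    for row in reader:
        # pad ':00' onto datetime values that look like 'YYYY-MM-DD HH:MM'
        if len(row) > 2 and len(row[2]) == 16:
            row[2] = row[2] + ':00'
        writer.writerow(row)

The cleaned file can then be loaded with COPY ... DELIMITER '|' (assuming the free text never contains a pipe).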
Let me know if this works out.

Best way to read CSV in Ruby. FasterCSV?

I have a CSV file that I want to read with Ruby and create Ruby objects to insert into a MySQL database with Active Record. What's the best way to do this? I see two clear options: FasterCSV & the Ruby core CSV. Which is better? Is there a better option that I'm missing?
EDIT: Gareth says to use FasterCSV, so what's the best way to read a CSV file using FasterCSV? Looking at the documentation, I see methods called parse, foreach, read, open... It says that foreach "is intended as the primary interface for reading CSV files." So, I guess I should use that one?
Ruby 1.9 adopted FasterCSV as its core CSV processor, so I would say it's definitely better to go for FasterCSV, even if you're still using Ruby 1.8
If you have a lot of records to import you might want to use MySQL's loader. It's going to be extremely fast.
LOAD DATA INFILE can be used to read files obtained from external sources. For example, many programs can export data in comma-separated values (CSV) format, such that lines have fields separated by commas and enclosed within double quotation marks, with an initial line of column names. If the lines in such a file are terminated by carriage return/newline pairs, the statement shown here illustrates the field- and line-handling options you would use to load the file:
LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
If the input values are not necessarily enclosed within quotation marks, use OPTIONALLY before the ENCLOSED BY keywords.
Use that to pull everything into a temporary table, then use ActiveRecord to run queries against it to delete records that you don't want, then copy from the temp table to your production one, then drop or truncate the temp. Or, use ActiveRecord to search the temporary table and copy the records to production, then drop or truncate the temp. You might even be able to do a table-to-table copy inside MySQL or append one table to another.
It's going to be tough to beat the speed of the dedicated loader, and using the database's query mechanism to process records in bulk. The step of turning a record in the CSV file into an object, then using the ORM to write it to the database adds a lot of extra overhead, so unless you have some super difficult validations requiring Ruby's logic, you'll be faster going straight to the database.
EDIT: Here's a simple CSV header to DB column mapper example:
require "csv"
data = <<EOT
header1, header2, header 3
1, 2, 3
2, 2, 3
3, 2, 3
EOT
header_to_table_columns = {
'header1' => 'col1',
'header2' => 'col2',
'header 3' => 'col3'
}
arr_of_arrs = CSV.parse(data)
headers = arr_of_arrs.shift.map{ |i| i.strip }
db_cols = header_to_table_columns.values_at(*headers)
arr_of_arrs.each do |ary|
# insert into the database using an ORM or by creating insert statements
end