I have ASCII files with a static number of characters for each line with no delimiters. I'd like to use LOAD DATA INFILE to import into my table.
Example of file:
USALALABAMA
USARARKANSAS
USFLFLORIDA
The structure for this table:
country Char(2)
state Char(2)
name Varchar(70)
CREATE TABLE `states` (
`country` char(2) COLLATE latin1_general_ci NOT NULL,
`state` char(2) COLLATE latin1_general_ci NOT NULL,
`name` varchar(70) COLLATE latin1_general_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
Is it possible to specify a start and end position for each column?
According to the documentation, you can load a fixed format file without using a temporary table.
If the FIELDS TERMINATED BY and FIELDS ENCLOSED BY values are both empty (''), a fixed-row (nondelimited) format is used. With fixed-row format, no delimiters are used between fields (but you can still have a line terminator). Instead, column values are read and written using a field width wide enough to hold all values in the field. For TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT, the field widths are 4, 6, 8, 11, and 20, respectively, no matter what the declared display width is.
The positions are derived from the column definitions, which in your case match the structure of the file. So you just need to do:
LOAD DATA INFILE 'your_file' INTO TABLE your_table
FIELDS TERMINATED BY ''
LINES TERMINATED BY '\r\n'
SET name = trim(name);
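If the load looks off, a quick sanity check (just a hypothetical query against the states table above; LENGTH is only there to spot stray padding) is:
-- Inspect what actually landed in each column after the fixed-width load
SELECT country, state, name, LENGTH(name) AS name_len
FROM states
LIMIT 5;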
First, create a temporary table and load all lines into it. Then you can copy the data from the temporary table into the main table, splitting each line into fields with SUBSTRING.
Something like this:
CREATE TEMPORARY TABLE tmp_lines
(countrystring TEXT);
LOAD DATA INFILE 'yourfilegoeshere' INTO TABLE tmp_lines
FIELDS TERMINATED BY ''
LINES TERMINATED BY '\r\n';
INSERT INTO main_table
SELECT SUBSTRING(countrystring, 1, 2),
       SUBSTRING(countrystring, 3, 2),
       SUBSTRING(countrystring, 5)
FROM tmp_lines;
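The temporary table only exists for the current session, but once the copy has finished you can drop it explicitly:
-- Clean up the helper table used for the staging step
DROP TEMPORARY TABLE tmp_lines;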
Another way to do this is to read each line into a user variable and split it directly in your load.
LOAD DATA INFILE 'yourfilegoeshere' INTO TABLE main_table
LINES TERMINATED BY '\r\n' (@_var)
SET
field1 = TRIM(SUBSTR(@_var FROM 1 FOR 2)),
field2 = TRIM(SUBSTR(@_var FROM 3 FOR 2)),
field3 = TRIM(SUBSTR(@_var FROM 5 FOR 70));
Just be sure not to specify any field separator, otherwise you will have to use more variables. Note that I'm using TRIM to clean the data in the same statement.
Related
I am storing the contents of text files in a table
CREATE TABLE Pages
(
ID int(11) unsigned NOT NULL,
Text mediumtext COMPRESSED,
PRIMARY KEY(ID)
) ENGINE=ARIA DEFAULT CHARSET=utf8 COLLATE utf8_general_ci ROW_FORMAT=DYNAMIC
I am trying to insert each file's contents directly via LOAD DATA INFILE:
LOAD DATA INFILE 'file.txt' INTO TABLE Pages
FIELDS TERMINATED BY '\0' LINES TERMINATED BY '' (Text)
SET ID=$id
The problem is that if I use LINES TERMINATED BY '' as I ideally would, it gives the error:
You can't use fixed rowlength with BLOBs; please use 'fields
terminated by'
I used '\0' assuming the null character does not exist in the text file. Although it works, is there a more standard way to do so?
I have this table:
CREATE TABLE `country` (
`name` VARCHAR(60) NOT NULL,
`code` VARCHAR(3) UNIQUE NOT NULL,
PRIMARY KEY (`code`)
);
As you can see, the primary key of this table is the code column.
When I try to select a specific code from this table that is 2 characters long, it finds nothing.
On the other hand, when I select a 3 characters long code like this:
select * from `country` where `code` = "TZA";
I get the result I want
I searched for my value in the table (for example the code "AL") and it does appear to be there.
Why is this happening and how could I make it work?
Thank you in advance!
I am importing my data from a csv file using this statement:
LOAD DATA LOCAL INFILE 'path_to_file\\countries.csv'
INTO TABLE `country`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(`name`, `code`);
I have tried selecting with a space in the end of the code and on the front of it:
select * from `country` where `code` = 'AL ';
select * from `country` where `code` = ' AL';
But they output nothing
The real solution is:
When importing this CSV file you should use:
LOAD DATA LOCAL INFILE 'path_to_file\\countries.csv'
INTO TABLE `country`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS
(`name`, `code`);
Because your lines seem to be terminated the way Windows terminates lines (\r\n). With LINES TERMINATED BY '\n', the trailing \r stays attached to the last field of every row, so the code column actually holds values like 'AL\r' instead of 'AL', which is why the exact comparison fails.
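If some rows were already loaded with the wrong terminator, one way to confirm and clean them up (just a sketch against the country table above) is:
-- A trailing 0D (carriage return) in the hex output confirms the problem
SELECT code, LENGTH(code), HEX(code) FROM country WHERE code LIKE 'AL%';
-- Strip the stray carriage return from the affected rows
UPDATE country SET code = TRIM(TRAILING '\r' FROM code);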
Each month I load new data into a MySQL database. The columns 'Event', 'Event Description' and 'Location' are standard columns. The columns which hold the data for the months are dynamic. That is, this month I am reading a file which has months 2019M01, 2019M02 and 2019M03. Next month I will read 2019M02, 2019M03, 2019M04.
Is there any way to prepare tables that can read these dynamic .csv files and take the changing column names into account?
Currently I create a new table for each month, and as we move to the next month I drop the previous table and create a new one with the correct column names for the new month.
Currently my code is as follows:
/*We are in Month January*/;
CREATE TABLE ex_1 (
`Event` TEXT,
`Event Description` TEXT,
`Location` TINYTEXT,
`2019M01` DECIMAL (15,10) NOT NULL,
`2019M02` DECIMAL (15,10) NOT NULL,
`2019M03` DECIMAL (15,10) NOT NULL
);
LOAD DATA LOCAL INFILE 'C:/Users/BPerei23/Desktop/WORK/Projects/MAPE/MI-STAT/MarketIntelligence_DS-Spine_20190124.csv' REPLACE INTO TABLE ex_1
CHARACTER SET Latin1 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;
INSERT INTO FINAL_TABLE
SELECT * FROM ex_1;
/*We are now in Month February*/;
DROP TABLE ex_1;
CREATE TABLE ex_2 (
`Event` TEXT,
`Event Description` TEXT,
`Location` TINYTEXT,
`2019M02` DECIMAL (15,10) NOT NULL,
`2019M03` DECIMAL (15,10) NOT NULL,
`2019M04` DECIMAL (15,10) NOT NULL
);
LOAD DATA LOCAL INFILE 'C:/Users/BPerei23/Desktop/WORK/Projects/MAPE/MI-STAT/MarketIntelligence_DS-Spine_20190224.csv' REPLACE INTO TABLE ex_2
CHARACTER SET Latin1 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;
INSERT INTO FINAL_TABLE
SELECT * FROM ex_2;
I am trying to import a small data set of Berlin street addresses using MySQL's LOAD DATA statement. The problem is that after the import runs, all of the beautiful ß characters in the German street names have been mangled into two-character sequences like ÃŸ.
Here's the create-table statement I used for this table:
CREATE TABLE `subway_distances` (
`STN` varchar(255) DEFAULT NULL,
`HNR` int(9) DEFAULT NULL,
`Lat` decimal(36,15) DEFAULT NULL,
`Lon` decimal(36,15) DEFAULT NULL,
`Distance` decimal(45,20) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8
... and here is my MySQL shell code:
charset utf8;
TRUNCATE TABLE subway_distances;
LOAD DATA LOCAL INFILE '/path/to/output.csv'
INTO TABLE berlin.subway_distances
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\';
SELECT * FROM subway_distances LIMIT 0,10;
I have looked at output.csv in vim, and the eszett character appears to be fine there.
I am assuming that I simply need a different encoding declaration in MySQL, but I'm not sure where to start.
I am also assuming that collation doesn't matter yet, since I'm not comparing values -- just purely trying to get a valid import.
I found an answer to this relatively quickly. It looks like I just need to specify the CHARACTER SET value in my LOAD DATA statement. So the new statement looks like this:
LOAD DATA LOCAL INFILE '/path/to/output.csv'
INTO TABLE berlin.subway_distances
CHARACTER SET 'utf8'
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\';
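To verify the fix, one option (assuming the subway_distances table above) is to look at the stored bytes; a correctly imported ß shows up as the UTF-8 byte pair C3 9F rather than a longer double-encoded sequence:
-- HEX exposes the raw bytes stored in the column
SELECT STN, HEX(STN) FROM subway_distances LIMIT 10;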
I am trying to import a csv file that is delimited by tabs.
Here is my query
LOAD DATA LOCAL INFILE 'c:/news.csv'
INTO TABLE news
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\r'
(url, storyid, title, date, details, category, author);
What happens is that only the first column (url) is loaded; the rest show NULL. I have tried LINES TERMINATED BY '\n' as well, with the same result.
Any advice?
Table structure for table `news`:
CREATE TABLE IF NOT EXISTS `news` (
`url` varchar(62) DEFAULT NULL,
`storyid` int(15) DEFAULT NULL,
`title` varchar(255) DEFAULT NULL,
`date` date DEFAULT NULL,
`details` longtext,
`category` varchar(255) DEFAULT NULL,
`author` varchar(110) DEFAULT NULL
)
It depends on the exact format of your .csv file, but for Windows .csv format I always use
LINES TERMINATED BY '\r\n'
also (again depending on the data) try
FIELDS ESCAPED BY '\\' TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '\"'
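Putting those pieces together for the news table, a complete statement might look like this (only a sketch; adjust the delimiters to match your actual file):
LOAD DATA LOCAL INFILE 'c:/news.csv'
INTO TABLE news
FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
LINES TERMINATED BY '\r\n'
(url, storyid, title, date, details, category, author);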
If you're unsure of the exact nature of the data, sometimes it is better to view it in hexadecimal to see how the lines are really terminated. I use Hexedit - http://www.hexedit.com/
Hope this helps.
Dermot
Like I said in the comments, you can use '\r\n' for a new line.
However, your csv file contains only 1 column, namely a full line of text.
That is probably also why only the first table column is filled and the rest are NULL.
LOAD DATA LOCAL INFILE 'c:/news.csv'
INTO TABLE news
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
(url, storyid, title, date, details, category, author)
This worked.
It turned out that even though it looks tab-separated, it is comma-separated. Dermot was right that you need to view the file in hexadecimal to see how it is really delimited.
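If you don't have a hex editor handy, you can also peek at the raw bytes from inside MySQL, provided the server itself can read the file (this needs the FILE privilege and a permissive secure_file_priv setting):
-- 2C = comma, 09 = tab, 0D0A = \r\n
SELECT HEX(LEFT(LOAD_FILE('c:/news.csv'), 100));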