CSV imports with no errors but missing rows - mysql

I am loading a csv file into an empty table, with success and no errors. But when I compare the number of original rows (from viewing the .csv in an external application, and from the Output Response) with the number of rows in the resulting table (from Table Inspector), it seems that not all rows are importing. Table Inspector reports that the table has 1,416,824 rows, while the original csv has 1,419,910 rows. There should be no duplicated primary keys in the data set, and in my mind the load should have errored out on those lines if there were.
Table structure:
CREATE TABLE `table1` (
`pkfield` varchar(10) NOT NULL,
`field1` varchar(3) DEFAULT NULL,
`field2` varchar(1) DEFAULT NULL,
`field3` varchar(1) DEFAULT NULL,
PRIMARY KEY (`pkfield`),
UNIQUE KEY `pkfield_UNIQUE` (`pkfield`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Import command:
LOAD DATA INFILE 'c:/table1.csv'
INTO TABLE myschema.table1
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n';
MySQL Workbench Response:
1419910 row(s) affected Records: 1419910 Deleted: 0 Skipped: 0 Warnings: 0
Sample from csv file (data mocked up):
6623950258,XYZ,A,B
5377103432,XYZ,A,B
9131144416,XYZ,A,A
1326703267,XYZ,A,B
7847786312,XYZ,A,B
1119927042,XYZ,B,B
4144055385,CDE,A,B
4563489252,XYZ,A,B
5733611912,XYZ,A,B
3309418377,XYZ,A,B
6928148128,XYZ,A,B
1152657670,XYZ,A,B
8143082292,CDE,A,B
9373340750,XYZ,A,A
3318949288,XYZ,A,B
1166427684,XYZ,A,B
5062296807,XYZ,B,A
4624323293,XYZ,A,B
3088992643,CDE,A,B
6477504847,XYZ,A,B
Any suggestions or explanations would be greatly appreciated. Kind regards.

Honestly, I'm not sure myself why the row count isn't accurate after a fresh import of a table. I think Table Inspector fetches the count from some statistics table, and to my understanding that gets updated only when the table changes by more than 10%. Perhaps this is the reason. However, the accurate number of rows can always be fetched with a traditional
select count(*) from myschema.table1;
As @nbayly said, this gives the expected result of 1,419,910 rows, which matches the number LOAD DATA reported.
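If you want the estimate itself refreshed, you can ask InnoDB to recompute the statistics that Table Inspector reads; a minimal sketch, using the table from the question:
ANALYZE TABLE myschema.table1;
-- estimated row count, as GUI tools typically display it:
SHOW TABLE STATUS FROM myschema LIKE 'table1';
-- exact row count:
SELECT COUNT(*) FROM myschema.table1;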

Honestly, I don't know why this happens, but I do know a workaround.
At first I thought it might be because of special characters present in the csv records, but even after removing those I still wasn't receiving all the records. I also noticed that the file doesn't need thousands of records for LOAD DATA to miss some; it happens even with a few hundred.
So for now the only reliable solution is to import using phpMyAdmin. Remove special characters etc. before importing, and also remove newlines from field headers etc.
phpMyAdmin seems to have some sort of parser that tokenizes the csv file and then creates SQL from those tokens, so it does not use the LOAD DATA command at all, and hence it imports correctly.
The downside is that it is through the GUI, with all the slowness of point-and-click that comes with that.

Related

MySQL bulk insert loads the data with truncation

I am trying to load data from a CSV file into a MySQL database through the bulk insert option. Below are the CREATE TABLE syntax and the CSV file:
CREATE TABLE discounts (
id INT NOT NULL ,
title VARCHAR(10) NOT NULL,
expired_date DATE NOT NULL,
amount VARCHAR(255) NOT NULL
);
CSV file format:
"475","Back","20140401","FFFF"
"476","bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb","20140901","DDD"
SQL Query :
LOAD DATA INFILE 'C:/Users/karthick/Desktop/data.csv'
INTO TABLE discounts
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n';
In the above CREATE TABLE syntax I have specified a length of 10 for the column "title", but the value in the second row of the data file exceeds that length.
When I executed the SQL query, the data loaded successfully into the MySQL database, but as the output below shows, the value in the second row got truncated for the field "title". Could you please suggest how to stop loading such a row instead of truncating it, while still loading the next consecutive rows whose data is appropriate, without terminating the whole import?
Database Output :
'475', 'Back', '2014-04-01', 'FFFF'
'476', 'bbbbbbbbbb', '2014-09-01', 'DDD'
Here is a trick you may use. Assuming the maximum title width you want to persist is 100 characters, you may create the table as follows:
CREATE TABLE discounts (
id INT NOT NULL,
title VARCHAR(101) NOT NULL, -- slightly larger than desired max width
expired_date DATE NOT NULL,
amount VARCHAR(255) NOT NULL
);
Then load your data as you were doing. Records whose titles exceed a width of 100 will in fact have a width of 101 in your database table, so you can then target such records for deletion:
DELETE
FROM discounts
WHERE LENGTH(title) > 100;
If you want, you can also now resize the title column to a width of exactly 100:
ALTER TABLE discounts MODIFY COLUMN title VARCHAR(100);
There might be a way to do this from LOAD DATA, but in general this tool is fairly simple and designed to blindly load data into a MySQL table. LOAD DATA does have the ability to transform data as it is read, but I am not sure it can reject rows outright.
As per my understanding, these are the points you want to achieve:
1) Data should not get truncated if the title length is more than the field length specified in the table structure.
2) If the title length is more, that record should get skipped during the import and the rest of the process should continue.
Answer, with the MySQL database taken into consideration:
You can make use of sql_mode TRADITIONAL ("Make MySQL behave like a 'traditional' SQL database system. A simple description of this mode is 'give an error instead of a warning' when inserting an incorrect value into a column." Reference: https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html).
After setting this mode, an error will occur during the import whenever incorrect data or an out-of-range value is about to be inserted into the table.
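For example, a minimal sketch of that approach against the table from the question (assuming a server-side, non-LOCAL load):
SET SESSION sql_mode = 'TRADITIONAL';
LOAD DATA INFILE 'C:/Users/karthick/Desktop/data.csv'
INTO TABLE discounts
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n';
-- The first over-length title now raises an error and the statement
-- aborts, instead of producing a truncation warning. Note that with
-- LOCAL, errors degrade back to warnings and values are still truncated.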
As for the next part: for out-of-range values there is no way to skip only the error rows. You can check this existing discussion: Skip error lines while loading data to mysql table from delimited file
Rows which break unique constraints, or would create duplicate records, can however be skipped using the IGNORE keyword along with LOAD DATA INFILE.
Refer: https://dev.mysql.com/doc/refman/5.5/en/load-data.html
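A sketch of the IGNORE variant, using the same statement as above; note that IGNORE skips rows that duplicate a unique key, but data-conversion problems become warnings again, so it does not help with the truncation case:
LOAD DATA INFILE 'C:/Users/karthick/Desktop/data.csv'
IGNORE INTO TABLE discounts
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n';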

MySQL: Can't SELECT values although I know they are there

I've been dealing with this problem in my MySQL database for several hours now. I work on OS X 10.8.4 and use the tool Sequel Pro to work with my database. The table I have trouble with looks like this:
CREATE TABLE `Descriptions` (
`id` int(11) unsigned zerofill NOT NULL AUTO_INCREMENT,
`company` varchar(200) DEFAULT NULL,
`overview` mediumtext,
`trade` mediumtext,
PRIMARY KEY (`id`))
ENGINE=InnoDB AUTO_INCREMENT=1703911 DEFAULT CHARSET=utf8;
I imported a csv file like this:
LOAD DATA LOCAL INFILE 'users/marc/desktop/descriptions kopie.txt'
INTO TABLE descriptions
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
(@dummy, company, overview, trade);
When I look at the data in my table now, everything looks the way I expect (SELECT * syntax). But I can't work with the data. When I try to select the company 'SISTERS', which I know exists, I get no results. Also, the fields "overview" and "trade" are not NULL when there's no data; they are just empty strings. The other tables in the database work just fine with imported data. Somehow MySQL just doesn't see the values as something to work with; it doesn't bother to read them.
What I tried so far:
- I used TextWrangler to convert the csv to txt (UTF-8) and loaded it into the database; did not work
- I changed the fields to BLOB and back to varchar/mediumtext to force MySQL to do something with the data; did not work
- I tried the Sequel Pro import function; did not change anything
- I tried to make a new table and copy the old one into it; did not change anything
- I tried to force MySQL to change the values by using CONCAT (just adding random characters which I could delete again later)
Could it have something to do with the collation settings? Could it have something to do with my regional settings (Switzerland) on OS X? Any other ideas? I would appreciate any help very much.
Kind Regards,
Marc
I was able to solve the problem. When I opened the csv in TextWrangler and let it show the invisible characters, the file was full of red reversed question marks. Those sneaky bastards had messed up everything. I don't know what they are, but they were the problem. I removed them with the "Zap Gremlins..." option.
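If you want to confirm from inside MySQL that invisible characters are the culprit, one possible check (my sketch, not part of the original fix) is to compare a value's visible text with its raw bytes:
-- a clean 'SISTERS' is 53495354455253 in hex; anything longer
-- means hidden bytes were imported along with the text
SELECT company, HEX(company), CHAR_LENGTH(company)
FROM Descriptions
WHERE company LIKE '%SISTERS%';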

mysql: import csv in existing table

Imagine you have a CSV file with only text in it and line ending \r\n.
like this:
abc
def
hij
...
xyz
You want to import this file into an existing multi-column table, where each line of text needs to go into a row of one specific (currently empty) column (let's name it needed) of the table. Like this:
| a | b | c |needed|
|foo|bar|baz|______|<-- abc
|foo|bar|baz|______|<-- def
...
|foo|bar|baz|______|<-- xyz
The data from the CSV file does not need to be inserted in any particular order. It really does not matter which row of needed gets which line from the CSV; as long as every line gets imported, everything is fine.
I've tried lots of things and it's driving me mad, but I can't figure out how this could be done. Can this be solved somehow with LOAD DATA INFILE and an update/replace/insert command? What would you do in a case like this? Is it even possible with MySQL, or do I need a custom PHP script for this?
I hope the question is clear enough. Please ask, if something is unclear to you.
OK, here you go...
Add a new column to stuff populated with 1-600,000
ALTER TABLE stuff ADD COLUMN newId INTEGER UNIQUE AUTO_INCREMENT;
Create a temp table for your CSV file
CREATE TABLE temp (
id INTEGER PRIMARY KEY AUTO_INCREMENT,
data VARCHAR(32)
);
I'm guessing at the required length of the data.
Import into the temporary table
LOAD DATA INFILE <wherever> INTO TABLE temp (data);
Add the data to stuff
UPDATE stuff AS s
JOIN temp AS t ON t.id=s.newId
SET s.needed=t.data;
Tidy up
DROP TABLE temp;
ALTER TABLE stuff DROP COLUMN newId;
I must admit that I haven't tested the LOAD DATA INFILE or the UPDATE statements, but I have checked all the table fiddling. In particular, MySQL will populate the newId column with consecutive numbers.
I don't think there will be a problem if the number of rows in stuff doesn't match the number of lines in your csv file; obviously, there will be a few rows with needed still NULL if there are fewer lines than rows.
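As a quick sanity check after the UPDATE, you can count the rows that were left unmatched:
SELECT COUNT(*) FROM stuff WHERE needed IS NULL;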
The easiest way I found was with LibreOffice Calc + Base, using Base's import/export of data.
But be careful: if you have NOT NULL options on columns and by mistake there is a cell with no data in it, that row will be skipped.
So first disable the NOT NULL options on the columns.
There is also a Microsoft Office Excel way, but I did not do it since it required installing a plugin and I was lazy.

LOAD DATA LOCAL INFILE help required

Here's my query for loading a MySQL table using a CSV file:
LOAD DATA LOCAL INFILE 'table.csv' REPLACE INTO TABLE table1 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\'' LINES TERMINATED BY 'XXX' IGNORE 1 LINES
SET date_modified = CURRENT_TIMESTAMP;
Suppose my CSV contains 500 records with 15 columns. I changed three rows and terminated them with 'XXX'. I now want to update the MySQL table with this file. My primary key is an auto-incremented value. When I run this query, all 500 rows get loaded again with the old data, and the rows I changed get added as new ones. I don't want the new ones; I want my table to be replaced with the CSV as-is. I tried changing my primary key to non-AI, and it still didn't work. Any pointers, please? Thanks.
I am making an assumption here:
1) You don't have the autonumber value in your file.
Since your primary key is not in the file, MySQL will not be able to match rows. An autonumber primary key is an artificial key and thus not part of the data; MySQL adds this artificial primary key when the row is inserted.
Let's assume your file contained some unique identifier; let's call it Identification_Number. If this number is both in the file and used by your table as the primary key, MySQL will be able to identify the rows from the file and match them to the rows in the table.
While a lot of people will only use autonumbers in a database, I always check whether there is a natural key in the data. If I identify one, I do some performance testing with this natural key in a table, and then decide on a key based on the performance metrics of both.
Hopefully I did not get your question wrong, but I suspect this might be the case.
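To make that concrete, here is a hedged sketch of the REPLACE variant once the key is in both places; apart from Identification_Number, the table layout and column names are hypothetical:
CREATE TABLE table1 (
Identification_Number VARCHAR(20) NOT NULL PRIMARY KEY,
payload VARCHAR(100),
date_modified TIMESTAMP NULL
);
LOAD DATA LOCAL INFILE 'table.csv'
REPLACE INTO TABLE table1
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\''
LINES TERMINATED BY 'XXX'
IGNORE 1 LINES
(Identification_Number, payload)
SET date_modified = CURRENT_TIMESTAMP;
-- Rows whose Identification_Number already exists are now replaced
-- in place instead of being appended as new rows.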

mysqldump table without dumping the primary key

I have one table spread across two servers running MySQL 4. I need to merge these into one server for our test environment.
These tables literally have millions of records each, and the reason they are on two servers is how huge they are. Any altering and paging of the tables would give us too big a performance hit.
Because they are on a production environment, it is impossible for me to alter them in any way on their existing servers.
The issue is that the primary key is a unique auto-incrementing field, so there are intersections.
I've been trying to figure out how to use the mysqldump command to ignore certain fields, but the --disable-keys merely alters the table, instead of getting rid of the keys completely.
At this point it's looking like I'm going to need to modify the database structure to utilize a checksum or hash for the primary key as a combination of the two unique fields that actually should be unique... I really don't want to do this.
Help!
To solve this problem, I looked up this question, found @pumpkinthehead's answer, and realized that all we need to do is find+replace the primary key in each row with NULL, so that MySQL will use the default auto_increment value instead.
(your complete mysqldump command) | sed -e "s/([0-9]*,/(NULL,/gi" > my_dump_with_no_primary_keys.sql
Original output:
INSERT INTO `core_config_data` VALUES
(2735,'default',0,'productupdates/configuration/sender_email_identity','general'),
(2736,'default',0,'productupdates/configuration/unsubscribe','1'),
Transformed Output:
INSERT INTO `core_config_data` VALUES
(NULL,'default',0,'productupdates/configuration/sender_email_identity','general'),
(NULL,'default',0,'productupdates/configuration/unsubscribe','1'),
Note: this is still a hack; for example, it will fail if your auto-increment column is not the first column. But it solves my problem 99% of the time.
If you don't care what the value of the auto_increment column will be, just load the first file, rename the table, then recreate the table and load the second file. Finally, use
INSERT newly_created_table_name (all, columns, except, the, auto_increment, column)
SELECT all, columns, except, the, auto_increment, column
FROM renamed_table_name
You can create a view of the table without the primary key column, then run mysqldump on that view.
So if your table "users" has the columns: id, name, email
CREATE VIEW myView AS
SELECT name, email FROM users
Edit: ah, I see. I'm not sure if there's any other way, then.
1) Clone your table
2) Drop the column in the clone table
3) Dump the clone table without the structure (but with the -c option to get complete inserts)
4) Import where you want
This is a total pain. I get around this issue by running something like
sed -e "s/([0-9]*,/(/gi" export.sql > expor2.sql
on the dump to get rid of the primary keys and then
sed -e "s/VALUES/(col1,col2,...etc.) VALUES/gi" LinxImport2.sql > LinxImport3.sql
for all of the columns except for the primary key. Of course, you'll have to be careful that ([0-9]*, doesn't replace anything that you actually want.
Hope that helps someone.
SELECT null as fake_pk, `col_2`, `col_3`, `col_4` INTO OUTFILE 'your_file'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM your_table;
LOAD DATA INFILE 'your_file' INTO TABLE your_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
For added fanciness, you can set a BEFORE INSERT trigger on your receiving table that sets the new primary key for each row before the insertion occurs, thereby using regular dumps and still clearing your PK. Not tested, but feeling pretty confident about it.
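A sketch of that trigger, as untested as the answer itself; incoming_table and id are hypothetical names:
-- NULL (or 0, unless NO_AUTO_VALUE_ON_ZERO is set) makes MySQL
-- assign the next auto_increment value when the row is inserted
CREATE TRIGGER clear_pk_before_insert
BEFORE INSERT ON incoming_table
FOR EACH ROW
SET NEW.id = NULL;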
Use a dummy temporary primary key:
Use mysqldump normally with --opt -c. For example, your primary key is 'id'.
Edit the output files and add a column "dummy_id" to the structure of your table, with the same type as 'id' (but not a primary key, of course). Then modify the INSERT statement and replace 'id' with 'dummy_id'. Once imported, drop the column 'dummy_id'.
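To make the edit concrete, a hypothetical before/after of the dump (table and column names are illustrative only):
-- structure, with the throwaway column added:
CREATE TABLE users (
id INT NOT NULL AUTO_INCREMENT,
dummy_id INT DEFAULT NULL, -- same type as id, but not a key
name VARCHAR(100),
PRIMARY KEY (id)
);
-- the dump's INSERT, with 'id' replaced by 'dummy_id':
INSERT INTO users (dummy_id, name) VALUES (2735,'alice'),(2736,'bob');
-- once the import is done:
ALTER TABLE users DROP COLUMN dummy_id;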
jimyi was on the right track.
This is one of the reasons why autoincrement keys are a PITA. One solution is not to delete data but to add to it:
CREATE VIEW myView AS
SELECT id*10+$x, name, email FROM users
(where $x is a single digit uniquely identifying the original database), either creating the view on the source database (which you hint may not be possible) or using an extract routine like the one described by Autocracy, or loading the data into staging tables on the test box.
Alternatively, don't create the table on the test system - instead put in separate tables for the src data then create a view which fetches from them both:
CREATE VIEW users AS
(SELECT * FROM users_on_a) UNION (SELECT * FROM users_on_b)
C.
The solution I've been using is to do a regular SQL export of the data, then remove the primary key from the INSERT statements using a regex find&replace editor. Personally I use Sublime Text, but I'm sure TextMate, Notepad++, etc. can do the same.
Then I run the query in whichever database the data should be inserted into, by copy-pasting it into HeidiSQL's query window or phpMyAdmin. If there's a LOT of data I save the INSERT query to an SQL file and use file import instead. Copy & paste with huge amounts of text often makes Chrome freeze.
This might sound like a lot of work, but I rarely spend more than a couple of minutes between the export and the import, probably a lot less than I would spend on the accepted solution. I've used this method on several hundred thousand rows without issue, but I think it would get problematic when you reach the millions.
I like the temporary table route.
create temporary table my_table_copy
select * from my_table;
alter table my_table_copy drop id;
-- use your favorite dumping method for the temporary table
Like the others, this isn't a one-size-fits-all solution (especially given the OP's millions of rows), but even at 10^6 rows it takes only several seconds to run, and it works.