SQLLDR - problem with WHEN clauses - sql-loader

I have multiple WHEN clauses in my control file. Half of the data I am loading satisfies the WHEN clauses and gets inserted into the desired table. The other half does not (which I expect), but I was expecting the rows that don't meet the WHEN conditions to be placed into a discard file, and none is created.
Any ideas?
LOAD DATA
INFILE '/u04/app/vpht_app/flat_files/icr_load/marc/sqlldr_load/CSSO_CCRBSCREDENTIALS_COMSUMER23062010160322.txt'
BADFILE '/u04/app/vpht_app/flat_files/icr_load/marc/sqlldr_load/CSSO_CCRBSCREDENTIALS_COMSUMER23062010160322.bad'
DISCARDFILE '/u04/app/vpht_app/flat_files/icr_load/marc/sqlldr_load/CSSO_CCRBSCREDENTIALS_COMSUMER23062010160322.dsc'
INSERT
INTO TABLE "DCVPAPP"."RBS_CC_CUSTOMERINFO"
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(CC_USER_NAME POSITION(24:73),
ACCOUNTID POSITION(1:12),
CUSTOMERID POSITION(14:22))
INTO TABLE "DCVPAPP"."RBS_CC_SECURITYDETAILS"
WHEN (481:481) = 'N' AND (477:479) ='0'
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(
CC_USER_NAME POSITION(24:73),
RBSPIN POSITION(75:274),
RBSPASSWORD POSITION(276:475),
fill1 filler,
fill2 filler,
fill3 filler,
fill4 filler,
FAILCODECOUNT POSITION(477:479),
FAILPASSWORDCOUNT POSITION(477:479)
)
INTO TABLE "DCVPAPP"."RBS_CC_SECURITYDETAILS"
WHEN (481:481) = 'N' AND (477:479) ='1'
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(
CC_USER_NAME POSITION(24:73),
RBSPIN POSITION(75:274),
RBSPASSWORD POSITION(276:475),
fill1 filler,
fill2 filler,
fill3 filler,
fill4 filler,
FAILCODECOUNT POSITION(477:479),
FAILPASSWORDCOUNT POSITION(477:479)
)
My table structure is:
Create table RBS_CC_CUSTOMERINFO
(
CC_USER_NAME VARCHAR2(50),
ACCOUNTID VARCHAR2(12) NOT NULL,
CUSTOMERID VARCHAR2(9) NOT NULL,
CUST_MIGRATION_STATUS VARCHAR2(1) DEFAULT 'N' NOT NULL,
CONSTRAINT pk_01 PRIMARY KEY (CC_USER_NAME)
);
Create table RBS_CC_SECURITYDETAILS
(
CC_USER_NAME VARCHAR2(50),
RBSPIN VARCHAR2(200) NOT NULL,
RBSPASSWORD VARCHAR2(200) NOT NULL,
FAILCODECOUNT NUMBER (9) NOT NULL,
FAILPASSWORDCOUNT NUMBER (9) NOT NULL,
CONSTRAINT pk_secur
FOREIGN KEY (CC_USER_NAME)
REFERENCES RBS_CC_CUSTOMERINFO(CC_USER_NAME)
);
and my sample data is below (these fields have been right-padded since they are fixed-width fields). The last record should be discarded and placed inside the discard file, since it doesn't meet any of the WHEN clause conditions, but no discard file is created. I have tried it with one WHEN clause and the discard file is created; it seems that when loading into more than one table the discard file isn't created.
ACC000000001,CUSTID213,MARC_VAF ,1234 ,pet ,0 ,N,N,FULL
ACC000000002,CUSTID214,TOBY_123 ,1352 ,bailey ,1 ,Y,N,FULL
ACC000000003,CUSTID215,KEVIN_VAF81 ,YY33OF ,water ,2 ,Y,N,FULL
ACC000000015,CUSTID227,SAM_EGD ,CARRY42 ,some password ,-3 ,Y,N,FULL
Thanks

I used SQL*Loader on your sample data, and found the following in the log file that SQL*Loader left behind:
Table "DCVPAPP"."RBS_CC_CUSTOMERINFO":
4 Rows successfully loaded.
0 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Table "DCVPAPP"."RBS_CC_SECURITYDETAILS":
0 Rows successfully loaded.
0 Rows not loaded due to data errors.
4 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Table "DCVPAPP"."RBS_CC_SECURITYDETAILS":
0 Rows successfully loaded.
0 Rows not loaded due to data errors.
4 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
In the first block, all the data was loaded because there were no WHEN clauses to fail. With the other two, all rows failed the WHEN clauses. Since the first block loaded all four rows, there was nothing to write to the discard file, so SQL*Loader didn't create one.
The WHEN clauses in the second two blocks reference parts of the data a long way past the end of your sample lines. They both use data from position 477 onwards, whereas the longest line in your sample data is only 68 characters long. Since each field has at most one trailing space, I'll assume your sample data somehow got mangled and should contain many more spaces than shown above.
Anyway, I commented out the section of your control file that inserts into RBS_CC_CUSTOMERINFO, emptied the tables and reran SQL*Loader. This time, all four rows were written to the discard file.
If you want data that matches neither of the two WHEN clauses to be written to a discard file, how about splitting the control file into two separate control files: one that loads RBS_CC_CUSTOMERINFO using the first block, and one that loads RBS_CC_SECURITYDETAILS using the other two blocks?
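For illustration, here is a minimal sketch of the split. The control file names are hypothetical, '...' abbreviates the long directory path from your control file, and the WHEN clauses and column lists are copied from it unchanged:
-- customerinfo.ctl: no WHEN clause, so every row loads and nothing is discarded
LOAD DATA
INFILE '.../CSSO_CCRBSCREDENTIALS_COMSUMER23062010160322.txt'
INSERT
INTO TABLE "DCVPAPP"."RBS_CC_CUSTOMERINFO"
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(CC_USER_NAME POSITION(24:73),
ACCOUNTID POSITION(1:12),
CUSTOMERID POSITION(14:22))
-- securitydetails.ctl: only the WHEN-filtered blocks, so a row that fails
-- both WHEN clauses has nowhere to go and lands in the discard file
LOAD DATA
INFILE '.../CSSO_CCRBSCREDENTIALS_COMSUMER23062010160322.txt'
DISCARDFILE '.../CSSO_CCRBSCREDENTIALS_COMSUMER23062010160322.dsc'
INSERT
INTO TABLE "DCVPAPP"."RBS_CC_SECURITYDETAILS"
WHEN (481:481) = 'N' AND (477:479) = '0'
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(CC_USER_NAME POSITION(24:73),
RBSPIN POSITION(75:274),
RBSPASSWORD POSITION(276:475),
fill1 FILLER, fill2 FILLER, fill3 FILLER, fill4 FILLER,
FAILCODECOUNT POSITION(477:479),
FAILPASSWORDCOUNT POSITION(477:479))
INTO TABLE "DCVPAPP"."RBS_CC_SECURITYDETAILS"
WHEN (481:481) = 'N' AND (477:479) = '1'
-- same FIELDS clause and column list as the block above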

Related

MySQL performance issue on ~3million rows containing MEDIUMTEXT?

I had a table with 3 columns and 3600K rows, using MySQL as a key-value store.
The first column, id, was VARCHAR(8) and set as the primary key. The 2nd and 3rd columns were MEDIUMTEXT. Calling SELECT * FROM table WHERE id=00000 took anywhere from 54 seconds to 3 minutes.
For testing I created a table with columns VARCHAR(8)-VARCHAR(5)-VARCHAR(5), with data randomly generated by numpy.random.randint. A SELECT took 3 seconds without a primary key. With the same random data but columns VARCHAR(8)-MEDIUMTEXT-MEDIUMTEXT, the SELECT took 15 seconds without a primary key. (Note: in the second test, the 2nd and 3rd columns actually contained very short text like '65535', but were created as MEDIUMTEXT.)
My question is: how can I achieve similar performance on my real data? (or, is it impossible?)
If you use
SELECT * FROM `table` WHERE id=00000
instead of
SELECT * FROM `table` WHERE id='00000'
you are looking for all strings that are equal to the integer 0, so MySQL has to check every row, because '0', '0000' and even ' 0' will all be cast to the integer 0. Your primary key on id will not help, and you end up with a slow full table scan. Even if you don't store values that way, MySQL doesn't know that.
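You can see that casting behaviour directly; each of these comparisons returns 1 (true):
SELECT '0' = 0, '0000' = 0, ' 0' = 0;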
The best option is, as all comments and answers pointed out, to change the datatype to int:
alter table `table` modify id int;
This will only work if your ids cast to integers are unique (so you don't have e.g. '0' and '00' in your table).
If you have any foreign keys that reference id, you have to drop them first and, before recreating them, change the datatype of the referencing columns too; see the sketch below.
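A sketch of that sequence, with hypothetical names (child, table_id, fk_child_table) standing in for your real ones:
-- drop the foreign key so both columns can be altered
ALTER TABLE child DROP FOREIGN KEY fk_child_table;
-- change the datatype on both sides
ALTER TABLE child MODIFY table_id INT;
ALTER TABLE `table` MODIFY id INT;
-- recreate the constraint
ALTER TABLE child ADD CONSTRAINT fk_child_table
FOREIGN KEY (table_id) REFERENCES `table`(id);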
If you store your values in a known format (e.g. no leading zeros, or zero-padded up to a length of 8), the second-best option is to use this exact format in your query, and include the quotes so the value is not cast to an integer. If you e.g. always pad with zeros to 8 digits, use
SELECT * FROM `table` WHERE id='00000000';
If you never add any leading zeros, still add the quotes:
SELECT * FROM `table` WHERE id='0';
With both options, MySQL can use your primary key and you will get your result in milliseconds.
If your id column contains only numbers, define it as INT, because an integer key will give you better performance (comparisons are faster).
Make the key column in your table an integer and retry. Check performance first by running a test within your DB (Workbench or the simple command line); you should get a better result.
Then, and only if needed (I doubt it, though), modify your Python to convert between integer and string when referencing the key column.

Can I create a mapping from integer values in a column to the text values they represent in SQL?

I have a table full of traffic accident data with column headers such as 'Vehicle_Manoeuvre', which contains integers; for example, 13 means the vehicle manoeuvre that caused the accident was 'overtaking moving vehicle'.
I know the mappings from integers to text as I have a (quite large) excel file with this data.
An example of what I want to know is the percentage of accidents that involved this type of manoeuvre, but I don't want to have to open the Excel file and look up the integer-to-text mappings every time I write a query.
I could manually change the integers in all the columns (write a query with all the possible mappings for each column, add the results as new columns, then delete the original columns) but this would take a long time.
Is it possible to create some type of variable (like an array whose first column holds the integers and whose second column holds the mapped text) that SQL could use to understand how the text relates to the integers, allowing me to write the query below:
SELECT COUNT(Vehicle_Manoeuvre) FROM traffictable WHERE Vehicle_Manoeuvre='overtaking moving vehicle';
rather than:
SELECT COUNT(Vehicle_Manoeuvre) FROM traffictable WHERE Vehicle_Manoeuvre=13;
even though the data in the table is still in integer form?
You would do this with a Manoeuvres reference table:
create table Manoeuvres (
ManoeuvreId int primary key,
Name varchar(255) unique
);
insert into Manoeuvres(ManoeuvreId, Name)
values (13, 'Overtaking');
You might even have such a table already, if you know that 13 has a special meaning.
Then use a join:
SELECT COUNT(*)
FROM traffictable tt JOIN
Manoeuvres m
ON tt.Vehicle_Manoeuvre = m.ManoeuvreId
WHERE m.name = 'Overtaking';
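Since you mentioned wanting percentages, the same join extends naturally. A sketch (this relies on MySQL evaluating a true comparison as 1, so AVG gives the fraction of matching rows):
-- percentage of all accidents that involved overtaking
SELECT 100 * AVG(m.Name = 'Overtaking') AS pct_overtaking
FROM traffictable tt
JOIN Manoeuvres m ON tt.Vehicle_Manoeuvre = m.ManoeuvreId;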

MySQL LOAD DATA INFILE issue with updating + inserting

I am given a rather poorly structured table that has a primary key set to auto-increment and a UNIQUE key that is just unique. Conceptually, the UNIQUE key was supposed to be the primary key, but whoever made the table didn't have the UNIQUE key's column information at the time of the table's construction.
Now we need to start doing regular updates to this table, where a provided text file contains updated rows and new rows. The challenge is to replace a row if there's a matching value in the UNIQUE key; we don't actually care about the primary key itself as long as it auto-increments.
However, the way LOAD DATA INFILE is structured, it would reset the PKs we already have, which is bad: the reason we kept the PK is that it is a foreign key to another legacy table (sigh...).
So... is there a way I can make an elegant SQL-only update script that reads the updated table in text form and just updates based on the UNIQUE key column, without screwing up the PK?
I guess a solution would be to export the table to tab form and do VLOOKUP to assign rows with the matching PK value (or NULL if it is a new row).
Any input?
Edit: Someone suggested that I LOAD DATA INFILE into a temporary table and then do INSERT/UPDATE from there. Based on what this post and that post say, here's the script I propose:
-- Create temporary table
CREATE TABLE tmp (
-- my specifications
);
-- Load into temporary table
LOAD DATA LOCAL INFILE '[my tab file]'
REPLACE INTO TABLE tmp
FIELDS TERMINATED BY '\t' ENCLOSED BY '"' LINES TERMINATED BY '\r\n';
-- Copy all the columns over except the PK column. This is for rows with existing UNIQUE values
UPDATE mytable
JOIN tmp ON mytable.unique = tmp.unique
SET mytable.col1 = tmp.col1, ..., mytable.coln = tmp.coln;
-- Now insert the rows with new UNIQUE values
INSERT IGNORE INTO mytable (col1, col2, ...)
SELECT tmp.col1, tmp.col2, ... FROM tmp;
-- Drop the temporary table now
DROP TABLE tmp;
Edit2: I updated the above query and tested it. It should work. Any opinions?
You can load the data into a new table using LOAD DATA INFILE, then use INSERT and UPDATE statements to change your table with data from the new table. That way you can link the tables however you want: by primary/unique key or by any field(s).
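As a sketch of that idea, MySQL's INSERT ... ON DUPLICATE KEY UPDATE can collapse the UPDATE and INSERT steps into one statement (the column names uniq_col, col1, col2 are placeholders for your real ones). A row whose unique value already exists is updated in place and keeps its auto-increment PK; a new unique value gets a fresh PK:
INSERT INTO mytable (uniq_col, col1, col2)
SELECT uniq_col, col1, col2 FROM tmp
ON DUPLICATE KEY UPDATE col1 = VALUES(col1), col2 = VALUES(col2);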

MySQL: reorder rows from file association

A MySQL photo gallery script requires that I provide the display order of my gallery by pairing each image title to a number representing the desired order.
I have a list of correctly ordered data called pairs_list.txt that looks like this:
#   title      (correct data in list)
--  -------
1 kmmal
2 bub14
3 ili2
4 sver2
5 ell5
6 ello1
...
So, the kmmal image will be displayed first, then the bub14 image, etc.
My MySQL table called title_order has the same titles above, but they are not paired with the right numbers:
#   title      (bad data in MySQL)
--  -------
14 kmmal
100 bub14
31 ili2
47 sver2
32 ell5
1 ello1
...
How can I make a MySQL script that will look at the correct number-title pairings in pairs_list.txt and go through each row of title_order, replacing each number with the correct one? In other words, how can I make the order in the MySQL table match that of the text file?
In pseudo-code, it might look like something like this:
Get MySQL row title
Search pair_list.txt for this title
Get the correct number-title pair in list
Replace the MySQL number with the correct number
Repeat for all rows
Thank you for any help!
If this is not a one-time task but a frequently needed function, then maybe you can use the following scenario:
create a temp table and insert all the values from pairs_list.txt into it using MySQL's LOAD DATA INFILE;
create a procedure (or an insert trigger, maybe?) on that temp table which updates your main table according to whatever was inserted;
in that procedure (or insert trigger), have a cursor fetch all values from the temp table and, for each value, update the matching row in your main table (a set-based sketch of this step follows below);
finally, delete all rows from that temp table.
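For the update step, a single set-based UPDATE can stand in for the cursor. A sketch, assuming the temp table is tmp_pairs(order_number, title) and the main table's number column is named order_number_field (both names hypothetical):
UPDATE title_order t
JOIN tmp_pairs p ON p.title = t.title
SET t.order_number_field = p.order_number;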
I'd suggest you do it this simple way:
1. Remove all primary and unique keys from the title_order table, and create a unique index (or primary key) on the title field:
ALTER TABLE title_order
ADD UNIQUE INDEX UK_title_order_title (title);
2. Use LOAD DATA INFILE with the REPLACE option to load data from the file, replacing rows that match on the unique key:
LOAD DATA INFILE 'pairs_list.txt'
REPLACE
INTO TABLE title_order
FIELDS TERMINATED BY ' '
LINES TERMINATED BY '\r\n'
IGNORE 2 LINES
(@col1, @col2)
SET order_number_field = @col1, title = TRIM(@col2);
...and specify whatever other properties you need in the LOAD DATA INFILE command.

A performance question in MySQL

I’m seeing a performance behavior in mysqld that I don’t understand.
I have a table t with a primary key id and four data columns col1, … col4.
The data are in 4 TSV files 'col1.tsv', … 'col4.tsv'. The procedure I use to ingest them is:
CREATE TABLE t (
id INT NOT NULL,
col1 INT NOT NULL,
col2 INT NOT NULL,
col3 INT NOT NULL,
col4 CHAR(12) CHARACTER SET latin1 NOT NULL );
LOAD DATA LOCAL INFILE # POP 1
'col1.tsv' INTO TABLE t (id, col1);
ALTER TABLE t ADD PRIMARY KEY (id);
SET GLOBAL hot_keys.key_buffer_size= # something suitable
CACHE INDEX t IN hot_keys;
LOAD INDEX INTO CACHE t;
DROP TABLE IF EXISTS tmpt;
CREATE TABLE tmpt ( id INT NOT NULL, val INT NOT NULL );
LOAD DATA LOCAL INFILE 'col2.tsv' INTO TABLE tmpt;
INSERT INTO t (id, col2) # POP 2
SELECT tt.id, tt.val FROM tmpt tt
ON DUPLICATE KEY UPDATE col2=tt.val;
DROP TABLE IF EXISTS tmpt;
CREATE TABLE tmpt ( id INT NOT NULL, val INT NOT NULL );
LOAD DATA LOCAL INFILE 'col3.tsv' INTO TABLE tmpt;
INSERT INTO t (id, col3) # POP 3
SELECT tt.id, tt.val FROM tmpt tt
ON DUPLICATE KEY UPDATE col3=tt.val;
DROP TABLE IF EXISTS tmpt;
CREATE TABLE tmpt ( id INT NOT NULL,
val CHAR(12) CHARACTER SET latin1 NOT NULL );
LOAD DATA LOCAL INFILE 'col4.tsv' INTO TABLE tmpt;
INSERT INTO t (id, col4) # POP 4
SELECT tt.id, tt.val FROM tmpt tt
ON DUPLICATE KEY UPDATE col4=tt.val;
Now here's the performance thing I don't understand. Sometimes the POP 2 and POP 3 INSERT INTO … SELECT … ON DUPLICATE KEY UPDATE queries run very fast, with mysqld occupying 100% of a core, and at other times mysqld bogs down at 1% CPU, reading t.MYD (table t's MyISAM data file) at random offsets.
I've had a very hard time isolating the circumstances in which it is fast and in which it is slow, but I have found one repeatable case:
In the above sequence, POP 2 and 3 are very slow. But if I create t
without col4 then POP 2 and POP 3 are very fast. Why?
And if, after that, I add col4 with an ALTER TABLE query then POP 4 runs
very fast too.
Again, when the INSERTs run slow, mysqld is bogged down in file IO
reading from random offsets in table t’s MyISAM data file. I don’t even
understand why it is reading that file.
MySQL server version 5.0.87. OS X 10.6.4 on Core 2 Duo iMac.
UPDATE
I eventually found (what I think is) the answer to this question. The mysterious difference between some inserts being slow and some fast is dependent on the data.
The clue was: when the insert is slow, mysqld is seeking on average 0.5GB between reads on t.MYD. When it is fast, successive reads have tiny relative offsets.
The confusion arose because some of the 'col?.tsv' files happen to have their rows in roughly the same order w.r.t. the id column while others are randomly ordered relative to them.
I was able to drastically reduce overall processing time by using sort(1) on the tsv files before loading and inserting them.
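In principle the same effect should be achievable inside SQL by adding an ORDER BY to each INSERT … SELECT, so rows are applied in id order (assuming, as here, that t was originally loaded in id order); I only tested the sort(1) route, though. For POP 2 that would look like:
INSERT INTO t (id, col2)
SELECT tt.id, tt.val FROM tmpt tt
ORDER BY tt.id
ON DUPLICATE KEY UPDATE col2 = tt.val;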
It's a pretty open question... here's a speculative, open answer. :)
... when the INSERTs run slow, mysqld is bogged down in file IO reading from random offsets in table t’s MyISAM data file. I don’t even understand why it is reading that file.
I can think of two possible explanations:
Even after it knows there is a primary key collision, it has to see what used to be in the field that will be updated: if that field coincidentally already holds the destination value (0 in this case), it won't perform the update, i.e. zero rows affected.
Moreover, when you update a field, I believe MySQL re-writes the whole row back to disk (if not multiple rows, due to paging), and not just that single field as one might assume.
But if I create t without col4 then POP 2 and POP 3 are very fast. Why?
If it's a fixed-row-size MyISAM table, which it looks like given the datatypes in the table, then including the CHAR field, even if it's blank, will make each row 75% larger on disk (4 bytes per INT field × 4 fields = 16 bytes, and the CHAR(12) adds another 12 bytes). So, in theory, you'll need to read/write 75% more.
Does your dataset fit in memory? Have you considered using InnoDB or Memory tables?
Addendum
If the usable/active/hot dataset goes from fitting in memory to not fitting in memory, an orders-of-magnitude decrease in performance isn't unheard of. A couple of reads:
http://jayant7k.blogspot.com/2010/10/foursquare-outage-post-mortem.html
http://www.mysqlperformanceblog.com/2010/11/19/is-there-benefit-from-having-more-memory/