loading 1 field from CSV into multiple columns in Oracle - sql-loader

I am trying to insert 1 column from CSV into 2 different oracle columns. but it looks like SQL Loader looks at least n fields from CSV to load n columns in oracle and my CTL script does not work for loading n field from CSV to n+1 column in Oracle where I am trying to load one of the field into 2 different oracle columns. Plz advise
Sample data file is:
id,name,imei,flag
1,aaa,123456,Y
my oracle table has below column
create table samp (
id number,
name varchar2(10),
imei varchar2(10),
tac varchar2(3),
flag varchar2(1) )
i need to load the imei from csv onto imei in Oracle Table and substr(imei,1,3) into tac Oracle column
my Control file is:
OPTIONS (SKIP=1)
load data
infile 'xxx.csv'
badfile 'xxx.bad'
into table yyyy
fields terminated by ","
TRAILING NULLCOLS
( id,name,imei,tac "substr(:imei,1,3)", flag)
Error from the log file:
Record 1: Rejected - Error on table yyyy, column flag
Column not found before end of logical record (use TRAILING NULLCOLS)

Ok, keep in mind the control file matches the input data by field in the order listed, THEN the name as defined is used to match to the table column.
The trick is to call the FIELD you need to use twice by something other than an actual column name, like imei_tmp and define it as BOUNDFILLER which means use it like a FILLER (don't load it) but remember it for future use. After the flag field, there are no more data fields so SQLLDR will try to match using the column names.
This is untested, but should get you started (the call to TRIM( ) may not be needed):
...
( id,
name,
imei_tmp BOUNDFILLER,
flag,
imei "trim(:imei_tmp)",
tac "substr(:imei_tmp,1,3)"
)

Related

how to map column names in a hive table and replace it with new values in hive table

I have a csv data as below where data comes every 10mins in the following format. I need to insert this data into hive by mapping column names with different column names. (columns don't come in constant order they change their order, we have total 10 columns sometimes we miss many columns like one example below below)
sample csv file :-
1 2 6 4
u f b h
a f r m
q r b c
now while inserting into hive i need to replace column names
for example
1 -> NBR
2 -> GMB
3 -> GSB
4 -> KTC
5 -> VRV
6 -> AMB
now I need to insert into hive table as below
NBR GMB GSB KTC VRV AMB
u f NULL h NULL b
a f NULL m NULL r
can anyone help me with this how to insert this values into hive
Assuming you can get column headers in you source CSV, you will need to map them from source number to their column names.
sed -i 's/1/NBR/g; s/2/GMB/g; s/3/GSB/g; s/4/KTC/g; s/5/VRV/g; s/6/AMB/g;...;...;...;...' input.csv
Since you only get an unknown subset of the total columns in your hive table, you will need to translate your CSV from
NBR,GMB,AMB,KTC
u,f,b,h
a,f,r,m
q,r,b,c
to
NBR,GMB,GSB,KTC,VRV,AMB,...,...,...,...
u,f,null,b,null,h,null,null,null,null
a,f,null,r,null,m,null,null,null,null
q,r,null,b,null,c,null,null,null,null
in order to properly insert them into your table.
From the Apache Wiki:
Values must be provided for every column in the table. The standard SQL syntax that allows the user to insert values into only some columns is not yet supported. To mimic the standard SQL, nulls can be provided for columns the user does not wish to assign a value to.
Standard Syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
Where values_row is:
( value [, value ...] )
where a value is either null or any valid SQL literal
Using LOAD DATA INPATH, even with the tblproperties("skip.header.line.count"="1") set, still requires a valid SQL literal for all columns in the table. This is why youre missing columns.
If you can not get the producer of the CSV to create a file with 1,2,...9,10 columns in order with your table columns and either consecutive commas or a null character in the data, write some kind of script to add missing column names, in the order you need them in, and the required null values in the data.
If you will have header in csv like 1,2,3,4 (as you wrote in the comment), you could use the next syntax:
insert into table (columns where you want to insert) select 1,2,3,4 (columns) from csv_table;
So, if you could know the order of csv columns, you could write easily the insert, naming only the column that you need to populate, no matter the order in the target table.
Before you could run the above insert, you should create a table that reads from csv!

How to get row ids when using LOAD LOCAL DATA INFILE?

I have MySQL database with table into which I insert from multiple files using
LOAD DATA LOCAL INFILE ... statement. I have PRIMARY KEY ID set to auto_increment. The problem is, when I want to update only part of the table.
Say I've inserted file_1, file_2, file_3 in the past and now I want to update only file_2. I imagine the process in pseudo workflow
delete old data related to file_2
insert new data from file_2
However, it is hard to determine, which data are originally from file_2. In order to find out, I've come up with this idea:
When I insert the data, I will note the ids of the rows, which I've inserted, since I am using auto_increment I can note something like from_id, to_id for each of the file. Then, when I want to update only file_x I will delete only the data with from_id <= id <= to_id (where from_id, to_id relates to the file_x).
After little bit of searching, I've found out about ##identity and last_insert_id() (see), however, when I use select last_insert_id() after LOAD DATA LOCAL INFILE I get only one id, and not the maximal id corresponding to the data, but the last added (as it is defined). I am connecting to the database from Python using mysql.connnector using
cur.execute("select last_insert_id();")
print(cur.fetchall())
# gives
# [(<some_number>,)]
So, is there a way, how to retrieve all (or at least the minimal and maximal) ids which were assigned to the data imported using the LOAD DATA LOCAL INFILE... statement as mentioned above?
If you need to remember the source of each record from the table then you better store the information in a field.
I would add a new field (src) of type TINYINT to the table and store the ID of the source (1 for file_1, 2 for file_2 a.s.o.). I assume there won't be more than 255 sources; otherwise use SHORTINT for its type.
Then, when you need to update the records imported from file_2 you have two options:
delete all the records having src = 2 then load the new records from file into the table; this is not quite an update, it is a replacement;
load the new records from file into a new table then copy from it the values you need to update the existing records.
Option #1
Deletion is an easy job:
DELETE FROM table_1 WHERE src = 2
Loading the new data and setting the value of src to 2 is also easy (it is explained in the documentation):
LOAD DATA INFILE 'file.txt'
INTO TABLE table_1
(column1, column2, column42) # Put all the columns names here
# in the same order the values appear in the file
SET src = 2 # Set values for other columns too
If there are columns in the file that you don't need then load their values into variables and simply ignore the variables. For example, if the third column from the file doesn't contain useful information you can use:
INTO TABLE table_1 (column1, column2, #unused, column42, ...)
A single variable (I called it #unused but it can have any name) can be used to load data from all the columns you want to ignore.
Option #2
The second option requires the creation of a working table but it's more flexible. It allows updating only some of the rows, based on usual WHERE conditions. However, it can be used only if the records can be identified using the values loaded from the file (with or without the src column).
The working table (let's name it table_w) has the columns you want to load from the file and is created in advance.
When it's the time to update the rows imported from file_2 you do something like this:
truncate the working table (just to be sure it doesn't contain any leftovers from a previous import);
load the data from file into the working table;
join the working table and table_1 and update the records of table_1 as needed;
truncate the working table (cleanup of the current import).
The code:
# 1
TRUNCATE table_w;
# 2
LOAD DATA INFILE 'file.txt'
INTO TABLE table_w
(column_1, column_2, column 42); # etc
# 3
UPDATE table_1 t
INNER JOIN table_w w
ON t.column_1 = w.column_1
# AND t.src = 2 # only if column_1 is not enough
SET t.column_2 = w.column_2,
t.column_42 = w.column_42
# WHERE ... you can add extra conditions here, if needed
# 4
TRUNCATE TABLE table_w

How to do something like SELECT "all columns except .."?

I have a MySQL table with data in it, called current. I import new data into a table called temp. Both these tables have auto_increment ID columns.
The table structure is not known in advance for the data import (there are various file structures that I need to import), event though the structure of current and temp will be the same.
Because of the unknown column configuration of the import files (tables created on the fly for each different file configuration), I cannot select specific columns, hence I would have to select all columns, less the ID column from table temp and import the result into table current.
I need to import into temp first, as the files can be large, and I need to do processing on the data before saving into the database, so I do not want to do any operations on the current table before I have imported the separate file first.
The ID column from the temp table prevents the insert into the current table due to duplicate key.
So I need something like this:
INSERT INTO `current`
(SELECT **ALL COLUMNS EXCEPT ID** FROM `temp`)
Any ideas on how to write the section ALL COLUMNS EXCEPT ID? Is this even possible?
There's no * except foo. You'll have to list all of the columns, except the ones you don't want.
SELECT field1, field2, ..., fieldN ...
You could do it via dynamic scripting, e.g. query information_schema for the field names, build up the field list as a string, prepare that string as query, execute it, etc...

MySQL read parameter from file for select statement

I have a select query as follows:
select * from abc where id IN ($list);
The problem with the length of variable $list, it may have a length of 4000-5000 characters, as a result of which the length of actually executed query increases and its get pretty slow.
Is there a method to store the values of $list in a file and ask MySQL to read it from that file, similar to LOAD DATA INFILE 'file_name' for insertion into table?
Yes, you can (TM)!
First step: Use CREATE TEMPORARY TABLE temp (id ... PRIMARY KEY) and LOAD DATA INFILE ... to create and fill a temporary table holding your value list
Second step: Run SELECT abc.id FROM abc INNER JOIN temp ON abc.id=temp.id
I have the strong impression this only checks out as a win, if you use the same value list quite a few times.

MySQL: reorder rows from file association

A MySQL photo gallery script requires that I provide the display order of my gallery by pairing each image title to a number representing the desired order.
I have a list of correctly ordered data called pairs_list.txt that looks like this:
# title correct data in list
-- -------
1 kmmal
2 bub14
3 ili2
4 sver2
5 ell5
6 ello1
...
So, the kimmals image will be displayed first, then the bub14 image, etc.
My MySQL table called title_order has the same titles above, but they are not paired with the right numbers:
# title bad data in MySQL
-- -------
14 kmmal
100 bub14
31 ili2
47 sver2
32 ell5
1 ello1
...
How can I make a MySQL script that will look at the correct number-title pairings from pairs_list.txt and go through each row of title_order, replacing each row with the correct number? In other words, how can I make the order of the MySQL table look like that of the text file?
In pseudo-code, it might look like something like this:
Get MySQL row title
Search pair_list.txt for this title
Get the correct number-title pair in list
Replace the MySQL number with the correct number
Repeat for all rows
Thank you for any help!
if this is not a one time task but will be frequently called function, then maybe you can have the following scenario:
create a temp table, insert all the values from pairs_list.txt into this temp table using mysql load data infile function.
create a procedure (or a insert trigger maybe?) on that temp table which would update your main table according to whatever inserted.
in that procedure (or a insert trigger), I would have a cursor getting all values from temp table and for each value from that cursor update the selected title on your main table.
delete all from that temp table
I'd suggest you to do this simple way -
1 Remove all primary and unique keys from the title_order table, and create unique index (or primary key) on title field -
ALTER TABLE title_order
ADD UNIQUE INDEX UK_title_order_title (title);
2 Use LOAD DATA INFILE with REPLACE option to load data from the file and replace -
LOAD DATA INFILE 'pairs_list.txt'
REPLACE
INTO TABLE title_order
FIELDS TERMINATED BY ' '
LINES TERMINATED BY '\r\n'
IGNORE 2 LINES
(#col1, #col2)
SET order_number_field = #col1, title = TRIM(#col2);
...specify properties you need in LOAD DATA INFILE command.