How to map column names in a Hive table and replace them with new values in a Hive table - mysql

I have CSV data that arrives every 10 minutes in the format below. I need to insert this data into Hive, mapping the incoming column names to different column names. (The columns don't arrive in a constant order; there are 10 columns in total, and often many of them are missing, as in the example below.)
Sample CSV file:
1 2 6 4
u f b h
a f r m
q r b c
Now, while inserting into Hive, I need to replace the column names, for example:
1 -> NBR
2 -> GMB
3 -> GSB
4 -> KTC
5 -> VRV
6 -> AMB
Now I need to insert into the Hive table as below:
NBR GMB GSB KTC VRV AMB
u f NULL h NULL b
a f NULL m NULL r
Can anyone help me with how to insert these values into Hive?

Assuming you can get column headers in your source CSV, you will need to map them from their source numbers to their column names. Restricting the substitution to the header line (and to whole fields) keeps the data rows untouched:
sed -i '1s/\b1\b/NBR/; 1s/\b2\b/GMB/; 1s/\b3\b/GSB/; 1s/\b4\b/KTC/; 1s/\b5\b/VRV/; 1s/\b6\b/AMB/; ...; ...; ...; ...' input.csv
Since you only get an unknown subset of the total columns in your Hive table, you will need to translate your CSV from
NBR,GMB,AMB,KTC
u,f,b,h
a,f,r,m
q,r,b,c
to
NBR,GMB,GSB,KTC,VRV,AMB,...,...,...,...
u,f,null,h,null,b,null,null,null,null
a,f,null,m,null,r,null,null,null,null
q,r,null,c,null,b,null,null,null,null
in order to properly insert them into your table.
From the Apache Wiki:
Values must be provided for every column in the table. The standard SQL syntax that allows the user to insert values into only some columns is not yet supported. To mimic the standard SQL, nulls can be provided for columns the user does not wish to assign a value to.
Standard Syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
Where values_row is:
( value [, value ...] )
where a value is either null or any valid SQL literal
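For example, with a six-column table shaped like the question's target (the table name target_table is an assumption), every column's value must be spelled out:
INSERT INTO TABLE target_table VALUES ('u', 'f', NULL, 'h', NULL, 'b');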
Using LOAD DATA INPATH, even with tblproperties("skip.header.line.count"="1") set, still requires a valid SQL literal for every column in the table. This is why you're missing columns.
If you cannot get the producer of the CSV to create a file with columns 1,2,...,9,10 in the same order as your table columns, with either consecutive commas or a null marker for missing data, write some kind of script to add the missing column names, in the order you need them, and the required null values in the data, as sketched below.
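A minimal HiveQL sketch of that approach (only the six named columns are shown for brevity; staging_csv, target_table, and the HDFS path are assumptions):

-- Staging table shaped like this batch's CSV (header 1,2,6,4 -> NBR,GMB,AMB,KTC)
CREATE TABLE staging_csv (nbr STRING, gmb STRING, amb STRING, ktc STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
TBLPROPERTIES ("skip.header.line.count"="1");

LOAD DATA INPATH '/tmp/input.csv' INTO TABLE staging_csv;

-- Pad the columns this batch is missing with NULLs, in the target's order
INSERT INTO TABLE target_table
SELECT nbr, gmb, NULL, ktc, NULL, amb
FROM staging_csv;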

If you have a header in the CSV like 1,2,3,4 (as you wrote in the comment), you can use the following syntax:
insert into table (columns where you want to insert) select 1,2,3,4 (columns) from csv_table;
So, if you know the order of the CSV columns, you can easily write the insert, naming only the columns you need to populate, no matter their order in the target table.
Before you can run the above insert, you must create a table that reads from the CSV.
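A hedged sketch of both steps, assuming an external table csv_table over the CSV directory and Hive's column-list INSERT syntax (available in recent Hive versions); all names and the location are placeholders:

CREATE EXTERNAL TABLE csv_table (c1 STRING, c2 STRING, c6 STRING, c4 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/incoming/'
TBLPROPERTIES ("skip.header.line.count"="1");

-- Name only the target columns this batch actually provides
INSERT INTO target_table (nbr, gmb, amb, ktc)
SELECT c1, c2, c6, c4 FROM csv_table;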

Related

SSIS Lookup Transformation No Match Output Only Populates Null

I am trying to use the Lookup Transformation but cannot seem to get the functionality I need out of it. I have two tables with the exact same structure:
Temp Table (input): smaller table, but it may have entries that do not exist in the other table
Reference Lookup Table: larger table, which may not have entries identical to the Temp Table
I am trying to compare the entries of the Temp Table to the entries of the Reference Lookup Table. Anything that exists in the Temp Table but not in the Lookup Table should be output to a separate table (the no-match output).
It is a very simple Data Flow, but it does not seem to perform the lookup properly. It will find "No Match" rows, but the no-match table is populated with null values for every column. I am trying to figure out why the data is losing its values.
How the Lookup is set up:
The data in the temp table is what drives your data flow; 151 rows flowed out of it.
Your lookup is going to match based on whatever criteria you specify, and you've indicated that if there is no match, the no-match data should be pushed into a table.
Since the lookup task cannot add columns to the no-match output path, this would imply your source (temp table) started out NULL across the board.
Drop a data viewer/data tap onto the data flow between the lookup and the destination, and then compare that data to your source. I suspect you're going to discover that the process that populated the temp table is at fault.
In the Lookup Transformation, on the Columns tab, you have specified that you want the value from the reference table to replace the value from the source.
This works great until you get a no-match, in which case the component does the non-intuitive (even to me, with 15+ years of working with it) thing of updating that column whether it matches or not.
Source query
SELECT 21 AS tipID, NULL AS tipYear
UNION ALL SELECT 22, 2020
UNION ALL SELECT 64263810, 2020
This adds three rows to my data flow: the first with no tipYear, and the next two rows with a year of 2020 (stamp 1 in the image below).
Lookup query
SELECT *
FROM (
    VALUES (20, 1111), (21, 2021), (22, 2022)
) D (tipID, tipYear)
This reference data will supply a year for the matches (21 and 22). In the matched path, we'll see 21 supplied with a value and 22 have its year updated (stamp 2 in the image).
For id 64263810, however, no match is found, and we'll see the initial value of 2020 replaced with the value from the (non-existent) matching row, i.e. NULL (stamp 3).
Lessons learned: if you need to use the data from the reference table but have a no-match output path, do not replace the column in the Lookup Transformation (unless your intention is to wipe out data).

Can I create a mapping from integer values in a column to the text values they represent in SQL?

I have a table full of traffic accident data with column headers such as 'Vehicle_Manoeuvre', which contains integers; for example, 13 means the manoeuvre that caused the accident was 'overtaking moving vehicle'.
I know the mappings from integers to text, as I have a (quite large) Excel file with this data.
An example of what I want to know is the percentage of accidents that involved this type of manoeuvre, but I don't want to have to open the Excel file and look up the integer-to-text mappings every time I write a query.
I could manually change the integers in all the columns (write a query with all the possible mappings for each column, add them as new columns, then delete the original columns), but this would take a long time.
Is it possible to create some kind of variable (like an array with integers in the first column and the mapped text in the second) that SQL could use to understand how the text relates to the integers, allowing me to write the query below:
SELECT COUNT(Vehicle_Manoeuvre) FROM traffictable WHERE Vehicle_Manoeuvre='overtaking moving vehicle';
rather than:
SELECT COUNT(Vehicle_Manoeuvre) FROM traffictable WHERE Vehicle_Manoeuvre=13;
even though the data in the table is still in integer form?
You would do this with a Manoeuvres reference table:
create table Manoeuvres (
    ManoeuvreId int primary key,
    Name varchar(255) unique
);
insert into Manoeuvres(ManoeuvreId, Name)
values (13, 'Overtaking');
You might even have such a table already, if you know that 13 has a special meaning.
Then use a join:
SELECT COUNT(*)
FROM traffictable tt
JOIN Manoeuvres m
    ON tt.Vehicle_Manoeuvre = m.ManoeuvreId
WHERE m.Name = 'Overtaking';
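If you want queries to read as in your first example without writing the join each time, you could wrap it in a view (the view name traffic_labeled is an assumption):

CREATE VIEW traffic_labeled AS
SELECT tt.*, m.Name AS ManoeuvreName
FROM traffictable tt
JOIN Manoeuvres m ON tt.Vehicle_Manoeuvre = m.ManoeuvreId;

-- Now the text label can be used directly:
SELECT COUNT(*) FROM traffic_labeled WHERE ManoeuvreName = 'Overtaking';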

SSIS Lookup data update

I have created an SSIS package that reads data from a CSV file and loads it into table1. Another data flow task does a lookup on table1. Table1 has columns x, y, z, a, b. Table2 has columns a, b, y, z. The lookup is done based on columns y and z: it picks up a and b from table1 and updates table2. The problem is that the data gets updated, but I get multiple rows of data, one without the update and one after the update.
I can provide a clearer explanation if needed.
Fleshing out Nick's suggestion, I would get rid of your second data flow (the one from Table 2 to Table 2).
After the first data flow that populates table1, just use an Execute SQL task that performs an UPDATE on table2, joining to table1 to get the new data.
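A hedged sketch of that UPDATE, using the question's table and column names (y and z as the join keys; a and b as the values carried over):

UPDATE t2
SET t2.a = t1.a,
    t2.b = t1.b
FROM table2 AS t2
JOIN table1 AS t1
    ON t1.y = t2.y
   AND t1.z = t2.z;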
EDIT in response to comment:
You need to use a WHERE clause that matches rows uniquely. Apparently Model_Cd is not a unique column in JLRMODEL_DIMS. If you cannot make the WHERE clause unique because of the relationship between the two tables, then you need to select either an aggregate of [Length (cm)], like MIN() or MAX(), or use TOP 1, so that you only get one row from the subquery.
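For illustration, a hedged sketch of the aggregate variant; JLRMODEL_DIMS, Model_Cd, and [Length (cm)] come from the discussion, while the source table name source_dims is an assumption:

UPDATE d
SET d.[Length (cm)] = (
    SELECT MAX(s.[Length (cm)])  -- collapse duplicate matches to a single value
    FROM source_dims AS s
    WHERE s.Model_Cd = d.Model_Cd
)
FROM JLRMODEL_DIMS AS d;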

loading 1 field from CSV into multiple columns in Oracle

I am trying to insert one field from a CSV into two different Oracle columns, but it looks like SQL*Loader expects at least n fields in the CSV to load n columns in Oracle, and my CTL script does not work for loading n fields from the CSV into n+1 columns, where I am trying to load one of the fields into two different Oracle columns. Please advise.
Sample data file is:
id,name,imei,flag
1,aaa,123456,Y
My Oracle table has the columns below:
create table samp (
    id   number,
    name varchar2(10),
    imei varchar2(10),
    tac  varchar2(3),
    flag varchar2(1)
);
I need to load imei from the CSV into the imei Oracle column and substr(imei,1,3) into the tac Oracle column.
My control file is:
OPTIONS (SKIP=1)
load data
infile 'xxx.csv'
badfile 'xxx.bad'
into table yyyy
fields terminated by ","
TRAILING NULLCOLS
( id,name,imei,tac "substr(:imei,1,3)", flag)
Error from the log file:
Record 1: Rejected - Error on table yyyy, column flag
Column not found before end of logical record (use TRAILING NULLCOLS)
OK, keep in mind that the control file matches the input data field by field, in the order listed; THEN the name as defined is used to match to the table column.
The trick is to name the FIELD you need to use twice something other than an actual column name, like imei_tmp, and define it as BOUNDFILLER, which means treat it like a FILLER (don't load it) but remember it for future use. After the flag field there are no more data fields, so SQL*Loader will try to match the remaining entries using the column names.
This is untested, but should get you started (the call to TRIM() may not be needed):
...
( id,
name,
imei_tmp BOUNDFILLER,
flag,
imei "trim(:imei_tmp)",
tac "substr(:imei_tmp,1,3)"
)

MySQL insert into table not inserting all fields

I have two existing tables in a MySQL DB. The tables have identical structures. I want to copy data from one table to another.
insert into `Table1`
select * from Table2
where department = "engineering"
The above code seemed to work, and it copied the data correctly except for one column. The "department" column did not copy over, so it was blank. All the other fields copied over correctly for all of the records.
What could be causing this? As I mentioned, both tables have identical structures, the same number of columns and everything.
Any ideas?
Note: I just realized that there are actually two columns that are not copying over. The "department" and "Category" fields come over blank. So basically, when I am inserting the data from Table2 into Table1, 12 out of 14 columns are successfully copied over, but two columns remain blank.
Below is the DESCRIBE of Table1 and Table2
The only difference I can see when I DESCRIBE both tables is that the two fields in question have a data type of enum(.....), but the values between the parentheses differ. Could this be causing the issue, and if so, is there a simple way around it? I'm thinking I might have to run an update query after the initial insert that brings the "department" and "category" fields from Table2 into Table1 by joining on the ID field.
From the docs:
If you insert an invalid value into an ENUM (that is, a string not present in the list of permitted values), the empty string is inserted instead as a special error value.
Read about ENUM.
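Since the root cause is that Table1's ENUM definitions do not permit the values coming from Table2, a hedged fix is to align the two definitions before copying. The value lists below are placeholders ('engineering' is the only value the question confirms); copy the exact ENUM(...) definitions from Table2's DESCRIBE output:

-- Placeholder value lists: replace with Table2's actual ENUM definitions
ALTER TABLE Table1
    MODIFY department ENUM('engineering', 'sales', 'support'),
    MODIFY Category   ENUM('catA', 'catB');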