How to read a file where the delimiter appears inside field values enclosed in double quotes in Ab Initio - ab-initio

I have a .dat file which is delimited by a pipe (|). Now my columns can have | in the data. I am facing issues while reading this file and loading it column by column.
DML used:
record
string("|") col1,
string("|") col2,
string("|") col3,
string("|") col4,
end
Source value:
"Col1"|"col2"|"col3"|"col4"
"units of the price | currency used" | "ABC" | "20210831" | ""
So col1 = units of the price | currency used, col2 = ABC, col3 = 20210831, col4 = null.
As per my DML, it breaks the first column into two and hence fails.
How can I read the file and load it with the correct values?

You could define the delimiter as string("\"|\"") i.e. use "|" as the delimiter instead of only | - you will end up with an extra " at the beginning and end of the record, but that is easily removed later.
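Outside Ab Initio, the idea can be sketched in Python (the sample record is assumed to have no spaces around the delimiters): splitting on the three-character sequence "|" keeps the embedded pipe intact, and the stray quote at each end of the record is stripped afterwards.

```python
# Sketch of the quoted-delimiter idea: split on the 3-character
# sequence "|" rather than on | alone, then strip the stray
# quote left at each end of the record.
record = '"units of the price | currency used"|"ABC"|"20210831"|""'

fields = record.split('"|"')          # embedded | survives the split
fields[0] = fields[0].lstrip('"')     # remove leading quote on first field
fields[-1] = fields[-1].rstrip('"')   # remove trailing quote on last field

print(fields)  # ['units of the price | currency used', 'ABC', '20210831', '']
```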

Please use the READ SEPARATED VALUES component to achieve this.

Related

Load data into hive table from CSV containing dictionary

I have a CSV file and I'm trying to load it into a Hive table.
The CSV contains a dictionary in some columns.
The file looks like this:
a,b,{'c':'d','e':'f'},g
So the table should look like this:
| col1 | col2 | col3              | col4 |
| ---- | ---- | ----------------- | ---- |
| a    | b    | {'c':'d','e':'f'} | g    |
But Hive picks up the commas inside the braces.
How do I ignore the commas inside the braces?
I'm using this to create the Hive table:
create external table mytable(
col1 string,
col2 string,
col3 string,
col4 string
)
row format delimited fields terminated by ',' stored as textfile location '/user/myuser/mydir/';
If you can enclose your columns in double quotes, you can use OpenCSVSerde to load the data correctly. Most CSV generation facilities can do this.
Your file should look like this:
"a","b","{'c':'d','e':'f'}","g"
Your script should look like this:
CREATE EXTERNAL TABLE mytable(
col1 STRING,
col2 STRING,
col3 STRING,
col4 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\t",
"quoteChar" = "\""
)
LOCATION '/your/folder/location/';
You can check this link: https://cwiki.apache.org/confluence/display/Hive/CSV+Serde. This process will work perfectly for string columns; for other data types, you will need to cast them in a subsequent step.
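As a quick sanity check of the quoting behaviour, Python's csv module applies the same rule: commas inside double-quoted fields are not treated as separators. A minimal sketch with an assumed sample line:

```python
import csv
import io

# Assumed sample line matching the double-quoted file layout above.
data = '"a","b","{\'c\':\'d\',\'e\':\'f\'}","g"\n'

rows = list(csv.reader(io.StringIO(data)))
print(rows[0])  # commas inside the quoted dictionary are preserved
```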

MySQL Error Code: 1262. Row x was truncated; it contained more data than there were input columns

I need to load file content into a table. The file contains text separated by commas. It is a very large file, and I cannot change it; it was given to me like this.
12.com,128.15.8.6,TEXT1,no1,['128.15.8.6']
23com,122.14.10.7,TEXT2,no2,['122.14.10.7']
45.com,91.33.10.4,TEXT3,no3,['91.33.10.4']
67.com,88.22.88.8,TEXT4,no4,['88.22.88.8', '5.112.1.10']
I need to load the file into a table of five columns. So for example, the last row above should be in the table as follows:
table.col1: 67.com
table.col2: 88.22.88.8
table.col3: TEXT4
table.col4: no4
table.col5: ['88.22.88.8', '5.112.1.10']
Using MySQL workbench, I created a table with five columns all are of type varchar. Then I run the following SQL command:
LOAD DATA INFILE '/var/lib/mysql-files/myfile.txt'
INTO TABLE `myscheme`.`mytable`
fields terminated BY ','
The last column (which contains commas that I do not want to split on) causes an issue.
Error:
Error Code: 1262. Row 4 was truncated; it contained more data than there were input columns
How can I overcome this problem, please?
Not that difficult, simply using LOAD DATA INFILE - note the use of a user variable.
drop table if exists t;
create table t(col1 varchar(20),col2 varchar(20), col3 varchar(20), col4 varchar(20),col5 varchar(100));
truncate table t;
load data infile 'test.csv' into table t LINES TERMINATED BY '\r\n' (@var1)
set col1 = substring_index(@var1,',',1),
col2 = substring_index(substring_index(@var1,',',2),',',-1),
col3 = substring_index(substring_index(@var1,',',3),',',-1),
col4 = substring_index(substring_index(@var1,',',4),',',-1),
col5 = concat('[',substring_index(@var1,'[',-1))
;
select * from t;
+--------+-------------+-------+------+------------------------------+
| col1   | col2        | col3  | col4 | col5                         |
+--------+-------------+-------+------+------------------------------+
| 12.com | 128.15.8.6 | TEXT1 | no1 | ['128.15.8.6'] |
| 23com | 122.14.10.7 | TEXT2 | no2 | ['122.14.10.7'] |
| 45.com | 91.33.10.4 | TEXT3 | no3 | ['91.33.10.4'] |
| 67.com | 88.22.88.8 | TEXT4 | no4 | ['88.22.88.8', '5.112.1.10'] |
+--------+-------------+-------+------+------------------------------+
4 rows in set (0.00 sec)
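The SUBSTRING_INDEX logic can be sketched in Python to show why the embedded commas in the last field survive - the first four fields are taken positionally, and col5 is everything from the first '[' onwards:

```python
# Sketch of the SUBSTRING_INDEX logic: the first four comma-separated
# fields are taken positionally, and col5 is everything from the first
# '[' onwards, so its embedded commas survive.
line = "67.com,88.22.88.8,TEXT4,no4,['88.22.88.8', '5.112.1.10']"

head, bracket_part = line.split('[', 1)   # like substring_index(line,'[',-1)
col1, col2, col3, col4 = head.rstrip(',').split(',')[:4]
col5 = '[' + bracket_part

print([col1, col2, col3, col4, col5])
```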
In this case, to avoid the problem caused by the improper presence of commas, you could import the rows into a single-column staging table (of type TEXT or MEDIUMTEXT, as you need).
Then, using LOCATE (once for the 1st comma, once for the 2nd, once for the 3rd, ...) and SUBSTRING, you can extract the columns you need from each row.
Finally, with an INSERT ... SELECT you can populate the destination table, separating the columns as you need.
This is too long for a comment.
You have a horrible data format in your CSV file. I think you should regenerate the file.
MySQL has facilities to help you handle this data, particularly the OPTIONALLY ENCLOSED BY option in LOAD DATA INFILE. The only caveat is that this allows one escape character rather than two.
My first suggestion would be to replace the field separators with another character - tab or | come to mind. Any character that is not used for values within a field.
The second is to use a double quote for OPTIONALLY ENCLOSED BY. Then replace '[' with '"[' and ']' with ']"' in the data file. Even if you cannot regenerate the file, you can pre-process it using something like sed, Perl, or Python to make this simple substitution.
Then you can use the import facilities for MySQL to load the file.
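A minimal Python sketch of that pre-processing step (the function name is illustrative):

```python
def quote_brackets(line: str) -> str:
    # Wrap the bracketed list in double quotes so MySQL's
    # OPTIONALLY ENCLOSED BY '"' treats it as a single field.
    return line.replace('[', '"[').replace(']', ']"')

line = "67.com,88.22.88.8,TEXT4,no4,['88.22.88.8', '5.112.1.10']"
print(quote_brackets(line))
# 67.com,88.22.88.8,TEXT4,no4,"['88.22.88.8', '5.112.1.10']"
```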

Update part of a string when a specific token appears in it, via MySQL

I have a table with a varchar field that contains a description of variable length. I want to update it in the form shown below (delete the part after the date prefix and before LC - 1001_ in the first case) whenever LC appears in the string.
For example if the table contained:
|col1 |
+-----+
|20161512_NL_Luxus_1_DE |
|20161217_1001_LC_YoBirthdayNo_A_CH |
|20161512_NL_SDT_4_DE|
|20170117_2003_LC_YoBirthdayYes_A_DE |
I want a query that will return:
|result|
+------+
|20161512_NL_Luxus_1_DE|
|20161217_LC_YoBirthdayNo_A_CH|
|20161512_NL_SDT_4_DE|
|20170117_LC_YoBirthdayYes_A_DE |
I tried it like:
UPDATE table1 SET col1 = CASE WHEN col1 LIKE '%LC%' THEN SUBSTRING_INDEX(SUBSTRING_INDEX(col1, '_', 1), '_', 2)
but that's not correct...
Thanks in advance!
I also first attempted to do this using SUBSTRING_INDEX, but then gave up when it appeared that it only supports single character delimiters. My fallback solution is just to use a combination of INSTR and SUBSTRING. In the query below, I concatenate together the updated col1, in the process splicing out the four digit number which you want to remove.
UPDATE table1
SET col1 = CONCAT(SUBSTRING(col1, 1, 8),                -- keep 8 digit date
                  SUBSTRING(col1, INSTR(col1, '_LC_'))) -- and everything from _LC_ onwards
WHERE col1 LIKE '%_LC_%'
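The splice can be sketched in Python: keep the 8-character date prefix, then everything from '_LC_' onwards, leaving rows without '_LC_' untouched:

```python
def splice(s: str) -> str:
    # Keep the 8-character date prefix plus everything from '_LC_'
    # onwards, dropping the numeric block between them.
    i = s.find('_LC_')
    return s[:8] + s[i:] if i != -1 else s  # rows without _LC_ are unchanged

print(splice('20161217_1001_LC_YoBirthdayNo_A_CH'))  # 20161217_LC_YoBirthdayNo_A_CH
print(splice('20161512_NL_Luxus_1_DE'))              # unchanged
```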

Passing comma delimited parameter to stored procedure in mysql

I have these fields in my table tab1:
col1 | col2 | id
--------------------------
1,2,3 | 2,3,4,5 | a1
6,7,8,9 | 2,9,11 | a2
I want to pass these fields to my stored procedure input, like WHERE col1 IN ('1,2') AND col2 IN ('3,4'),
but it is not working.
Something like this should work:
SELECT t.* FROM tab1 t
WHERE 1 IN (t.col1) AND 2 IN (t.col1) ...
AND 3 IN (t.col2) AND 4 IN (t.col2) ...
You would need to build up your query based on the inputs to your stored procedure, but otherwise this should work for you. Post your current proc?
EDIT
The find_in_set function will also work for you, but you will still have to split your inputs and call it once per number passed to the proc (i.e. the first parameter to FIND_IN_SET cannot be a comma separated list). Reference here: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_find-in-set
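A sketch of FIND_IN_SET's semantics in Python, showing why its first argument must be a single value rather than a comma-separated list:

```python
def find_in_set(needle: str, csv_str: str) -> int:
    # Mimics MySQL FIND_IN_SET: return the 1-based position of needle
    # in the comma-separated string, or 0 if it is not present.
    items = csv_str.split(',')
    return items.index(needle) + 1 if needle in items else 0

print(find_in_set('2', '1,2,3'))  # 2
print(find_in_set('4', '1,2,3'))  # 0
```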

mysql/oracle stored math formula

Is there any way to apply a math formula from a stored string in Oracle and/or MySQL?
col1 | col2 | formula
---------------------
2 | 2 | col1*col2
2 | 3 | col1+col2
SELECT * from tbl
result:
col1 | col2 | formula
---------------------
2 | 2 | 4
2 | 3 | 5
Edit: each row has a different formula.
I think what you're saying is that you want the database to parse the formula string. For example, for Oracle you could:
Add a column to the table to contain the result
Run an update statement which would call a PL/SQL function with the values of the columns in the table and the text of the formula
update {table} set formula_result = fn_calc_result (col1, col2, formula_column);
The PL/SQL function would create a string by replacing "col1" and "col2" and so forth with the actual values of those columns. You can do that with regular expressions, as long as the formulas are consistently written.
Then use
execute immediate 'select '||{formula}||' from dual' into v_return;
return v_return;
to calculate the result and return it.
Of course, you could also write your own parser. If you decide to go that way, don't forget to handle operation precedence, parentheses, and so forth.
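The same per-row idea can be sketched outside the database in Python; eval() stands in for the dynamic SQL and is for illustration only - a real implementation should use a safe expression parser:

```python
# Sketch of the per-row formula idea: substitute the column values
# into the stored expression, then evaluate it.
rows = [
    {'col1': 2, 'col2': 2, 'formula': 'col1*col2'},
    {'col1': 2, 'col2': 3, 'formula': 'col1+col2'},
]

results = [eval(r['formula'], {}, {'col1': r['col1'], 'col2': r['col2']})
           for r in rows]
print(results)  # [4, 5]
```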
I think you want a virtual column. See here for excellent article on its setup and use.
You may do it via a PL/SQL script that you can trigger automatically when inserting the data.
See http://en.wikipedia.org/wiki/PL/SQL
PL/SQL is procedural code that executes in the database itself. It's quite easy to do.