Move data from MySQL to Oracle where the data contains HTML, using SQL*Loader - mysql

Please help me move data from MySQL to Oracle where the data contains HTML, using SQL*Loader.
I have exported the MySQL data to a CSV file.
Sample CSV data:
±14044±©±1±©±1±©±1±©±MailManager Attachment±©±image001.gif±©±6416-01-11 11:30:06±©±6416-01-11 11:30:06±©±null±©±null±©±0±©±1±©±0±©±null±
±14045±©±1±©±1±©±1±©±MailManager Attachment±©±image002.jpg±©±6416-01-11 11:30:06±©±6416-01-11 11:30:06±©±null±©±null±©±0±©±1±©±0±©±null±
±14046±©±1±©±1±©±1±©±Emails±©±"
<p>"
 </p>"
<p style=""margin:0;padding:0;"">"
On 02-20-2014 13:26:49, crmtelesales#fecredit.com.vn, wrote:</p>"
<blockquote style=""border:0;margin:0;border-left:1px solid #808080;padding:0 0 0 2px;"">"
<div style=""font-size:13px;font-family:tahoma;color:rgb(0,0,0);font-weight:normal;font-style:normal;background-image:none;background-attachment:scroll;background-position:0% 0%;"">"
do not reply</div>"
<br />"
 </blockquote>"
<br />±©±2014-03-03 10:11:39±©±2014-03-03 10:11:39±©±null±©±null±©±0±©±1±©±0±©±Re: tests±
My control file:
LOAD DATA
INFILE '/home/ggt/csv/vtiger_crmentity.csv'
TRUNCATE INTO TABLE DWVTIGER.VTIGER_CRMENTITY
FIELDS TERMINATED BY "," ENCLOSED BY '|'
TRAILING NULLCOLS
(
CRMID ,
SMCREATORID ,
SMOWNERID ,
MODIFIEDBY ,
SETYPE ,
DESCRIPTION NULLIF DESCRIPTION='null',
CREATEDTIME date "yyyy-mm-dd hh24:mi:ss" ,
MODIFIEDTIME date "yyyy-mm-dd hh24:mi:ss" ,
VIEWEDTIME date "yyyy-mm-dd hh24:mi:ss" NULLIF VIEWEDTIME='null',
STATUS NULLIF STATUS='null',
VERSION ,
PRESENCE NULLIF PRESENCE='null',
DELETED ,
LABEL NULLIF LABEL='null'
)

SQL*Loader cannot load data in that HTML-laden, multi-line form. You will need to get the data out of the HTML and make a true CSV before SQL*Loader can load it: it is built to load flat files in which each row is one record and every record has the same shape (basically).
You need to read up on this: https://docs.oracle.com/cd/B28359_01/server.111/b28319/ldr_concepts.htm#g1013706
and this: https://docs.oracle.com/cd/B28359_01/server.111/b28319/ldr_control_file.htm#i1006645
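To sketch what "make a true CSV" could look like here: a minimal pre-processing pass in Python, assuming the MySQL export used '±' as the field enclosure and '©' as the field separator (as the sample suggests) and that '±' never occurs inside the HTML. It collapses the embedded newlines so each record lands on one physical line:

```python
import csv
import io

def normalize(src_text: str) -> str:
    """Re-parse a '±'-enclosed, '©'-separated export and emit conventional CSV.

    Embedded newlines (from multi-line HTML fields) are collapsed to spaces
    so that every record ends up on exactly one physical line.
    """
    reader = csv.reader(io.StringIO(src_text), delimiter='©', quotechar='±')
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator='\n')
    for record in reader:
        writer.writerow([f.replace('\r', ' ').replace('\n', ' ') for f in record])
    return out.getvalue()

# One record from the sample above, on a single line for brevity.
sample = '±14044±©±1±©±MailManager Attachment±©±image001.gif±\n'
print(normalize(sample))  # "14044","1","MailManager Attachment","image001.gif"
```

The resulting file could then be loaded with a control file using FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'. This is only a sketch; the real export may need extra care around stray '±' characters.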

Related

Athena create table from CSV file (S3 bucket) with semicolon

I'm trying to create a table from an S3 bucket which holds a CSV file. Because of regional settings the CSV uses a semicolon as the separator, and one row even contains commas.
Input CSV file:
Name;Phone;CRM;Desk;Rol
First Name;f.name;Name, First;IT;Inbel
First2 Name2;f2.name2;Name2, First2;IT;Inbel
First3 Name3;f3.name3;Name3, First3;IT;Inbel
First4 Name4;f4.name4;Name4, First4;IT;Inbel
Athena query:
CREATE EXTERNAL TABLE IF NOT EXISTS `a`.`test` (
`Name` string,
`Phone` string,
`CRM` string,
`Desk` string,
`Rol` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = ',',
'field.delim' = ','
) LOCATION 's3://***/test/'
TBLPROPERTIES ('has_encrypted_data'='false');
The output comes out as:
Name;Phone;CRM;Desk;Rol
First Name;f.name;Name First;IT;Inbel
First2 Name2;f2.name2;Name2 First2;IT;Inbel
First3 Name3;f3.name3;Name3 First3;IT;Inbel
First4 Name4;f4.name4;Name4 First4;IT;Inbel
I tried scanning the web for solutions (especially for the separator), but nothing seems to work. I don't want to change the regional settings and would love to keep the input file as is. Also, if someone knows the solution for the CRM column, it would be a bonus!
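For what it's worth, the DDL above declares ',' as the delimiter while the file is ';'-separated, so setting 'field.delim' = ';' (and 'serialization.format' = ';') in SERDEPROPERTIES should make Athena split on the right character; and once ';' is the separator, the comma in the CRM column is plain field content, which would solve the bonus too. A quick local check in Python on the sample rows illustrates the expected split:

```python
import csv
import io

# Sample rows copied from the question; with ';' as the delimiter,
# the comma inside the CRM column is ordinary field content.
data = """Name;Phone;CRM;Desk;Rol
First Name;f.name;Name, First;IT;Inbel
"""

rows = list(csv.reader(io.StringIO(data), delimiter=';'))
print(rows[1])  # ['First Name', 'f.name', 'Name, First', 'IT', 'Inbel']
```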

How to load field data that exceeds 4000 characters via a sqlldr .ctl file without changing the table column datatype from nvarchar2(2000) to nclob

I am trying to load data through sqlldr into a staging table, and some rows have more than 4000 characters in one field. I don't want to change the existing nvarchar2(2000) datatype in the staging table. How can I drop part of the received data for that column in the .ctl file so it loads into the staging table?
My control file:
load data
CHARACTERSET UTF8
APPEND INTO TABLE STAGING
FIELDS TERMINATED BY '\t'
TRAILING NULLCOLS
(uid,
linked char(4000))
and column datatypes in the table:
uid not null number(12)
linked NVARCHAR(2000)
Record 1: Rejected - Error on table STAGING, column LINKED.
Field in data file exceeds maximum length
Data file (the second column, starting with 242357, is the one causing the problem):
22 242357, 242359, 242375, 242376, 242395, 242421, 242422, 242423, 242424, 242425, 242426, 242427, 242428, 242429, 242431, 242432, 242433, 242434, 242435, 242436, 242437, 242438, 242439, 242441, 242442, 242443, 242445, 242446, 242447, 242448, 242449, 242451, 242452, 242453, 242454, 242455, 242456, 242457, 242458, 242462, 242463, 242464, 242465, 242466, 242467, 242468, 24247, 242524, 242525, 242533, 242535, 242544, 242551, 242552, 242553, 242554, 242556, 242557, 242558, 242559, 242565, 242577, 242636, 242646, 242727 ...... so on
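SQL*Loader can apply a SQL expression to a field, so something like linked CHAR(32767) "SUBSTR(:linked, 1, 2000)" in the control file should let the database do the trimming. If you would rather trim the data file itself before running sqlldr, here is a sketch in Python (hypothetical, assuming the tab-separated layout in the control file, with LINKED as the second field):

```python
MAX_LEN = 2000  # matches the NVARCHAR2(2000) column

def truncate_field(line: str, index: int = 1, max_len: int = MAX_LEN) -> str:
    """Trim one tab-separated field to at most max_len characters."""
    fields = line.rstrip('\n').split('\t')
    if len(fields) > index and len(fields[index]) > max_len:
        fields[index] = fields[index][:max_len]
    return '\t'.join(fields)

# Example: a 5000-character LINKED value is trimmed to 2000.
trimmed = truncate_field('22\t' + 'x' * 5000)
print(len(trimmed.split('\t')[1]))  # 2000
```

To use it, read the data file line by line, write truncate_field(line) to a new file, and point the .ctl INFILE at that. Note that NVARCHAR2(2000) is measured in characters, so truncating by character count is the safe interpretation here.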

PowerShell - amend date format of values in CSV column

I'm looking for a PowerShell script for the following.
I have a CSV spreadsheet which is automatically downloaded from a website; the 2nd column of the spreadsheet is a date value in the UK format dd/mm/yy (see the sample below). I'd like a script that will change the date format of all values in column 2 to the ISO format yyyy-mm-dd, which MySQL expects.
I'm importing the CSV into a MySQL database using LOAD DATA INFILE, and at present the dates are going in as 2017/02/15 when they should be 2015/02/17.
An example of the CSV file format is as follows ...
Col1,Col2,Col3,Col4
Value1,17/02/15,Value3,Value4
Value1,18/02/15,Value3,Value4
Value1,19/02/15,Value3,Value4
I need it to become ...
Col1,Col2,Col3,Col4
Value1,2015-02-17,Value3,Value4
Value1,2015-02-18,Value3,Value4
Value1,2015-02-19,Value3,Value4
One way could be the following (it replaces every dd/mm/yy date with an ISO yyyy-mm-dd date, regardless of the column; since the sample data uses two-digit years, it assumes they are all 20xx):
(Get-Content c:\file.csv) -replace '(\d{2})/(\d{2})/(\d{2})', '20$3-$2-$1'
If you want to save the result, pipe to Set-Content:
(Get-Content c:\file.csv) -replace '(\d{2})/(\d{2})/(\d{2})', '20$3-$2-$1' | Set-Content c:\file.csv
I like to use TryParseExact. It can convert many formats and lets you detect invalid values.
Example:
$dateString = "17/02/2015"
$format = "dd/MM/yyyy"
[ref]$parsedDate = Get-Date
$parsed = [DateTime]::TryParseExact($dateString, $format, [System.Globalization.CultureInfo]::InvariantCulture, [System.Globalization.DateTimeStyles]::None, $parsedDate)
$parsedDate.Value.ToString("yyyy-MM-dd")
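Both snippets above rewrite any matching date wherever it occurs; if only column 2 should change, a column-aware rewrite is safer. For comparison, a sketch of the same idea in Python (assuming, as with the regex above, that the two-digit years are all 20xx):

```python
import csv
import io
from datetime import datetime

def convert_col2(csv_text: str) -> str:
    """Rewrite column 2 from dd/mm/yy to yyyy-mm-dd, leaving other columns alone."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(next(reader))  # header passes through unchanged
    for row in reader:
        row[1] = datetime.strptime(row[1], '%d/%m/%y').strftime('%Y-%m-%d')
        writer.writerow(row)
    return out.getvalue()

sample = "Col1,Col2,Col3,Col4\nValue1,17/02/15,Value3,Value4\n"
print(convert_col2(sample))
```

Note that %y maps 15 to 2015 (Python pivots two-digit years 00-68 into the 2000s), which matches the desired output in the question.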

hive: create table / data type syntax for comma separated files

The text file is comma-separated. However, one of the columns, e.g. "Issue" with the value "Other (phone, health club, etc.)", also contains commas.
Question: What should the data type of "Issue" be? And how should I format the table (ROW FORMAT DELIMITED ... TERMINATED BY) so that the comma inside the column is handled correctly?
I had set it this way:
create table consumercomplaints (ComplaintID int,
Product string,
Subproduct string,
Issue string,
Subissue string,
State string,
ZIPcode int,
Submittedvia string,
Datereceived string,
Datesenttocompany string,
Company string,
Companyresponse string,
Timelyresponse string,
Consumerdisputed string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
location '/user/hive/warehouse/mydb/consumer_complaints.csv';
Sample data --
Complaint ID,Product,Sub-product,Issue,Sub-issue,State,ZIP code,Submitted via,Date received,Date sent to company,Company,Company response,Timely response?,Consumer disputed?
943291,Debt collection,,Cont'd attempts collect debt not owed,Debt is not mine,MO,63123,Web,07/18/2014,07/18/2014,"Enhanced Recovery Company, LLC",Closed with non-monetary relief,Yes,
943698,Bank account or service,Checking account,Deposits and withdrawals,,CA,93030,Web,07/18/2014,07/18/2014,U.S. Bancorp,In progress,Yes,
943521,Debt collection,,Cont'd attempts collect debt not owed,Debt is not mine,OH,44116,Web,07/18/2014,07/18/2014,"Vital Solutions, Inc.",Closed with explanation,Yes,
943400,Debt collection,"Other (phone, health club, etc.)",Communication tactics,Frequent or repeated calls,MD,21133,Web,07/18/2014,07/18/2014,"The CBE Group, Inc.",Closed with explanation,Yes,
I think you need to delimit your output data with a control character such as Control-A. I don't think any data type will handle this for you. Alternatively, you can write a UDF to load the data and take care of the formatting in the UDF logic.
Short of writing a SerDe, you could do two things:
escape the commas in the original data before loading, using some character, e.g. \
and then use the Hive CREATE TABLE command with ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\'
Or you can use a regex that takes care of commas enclosed within double quotes.
First apply a regex to the data as shown in the hortonworks/apache manuals:
regexp_extract(col_value, '^(?:([^,]*)\,?){1}', 1) player_id source: https://web.archive.org/web/20171125014202/https://hortonworks.com/tutorial/how-to-process-data-with-apache-hive/
Ensure that you are able to load and see your data using this expression (barring the enclosed commas).
Then modify the expression to account for enclosed commas. You can do something like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "a,\"hi, I am here\",c,d,\"ahoy, mateys\"";
String pattern = "^(?:([^\",]*|\"[^\"]*\"),?){4}";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(s);
if (m.find()) {
    System.out.println("YES-" + m.groupCount());
    System.out.println("=>" + m.group(1));
}
By changing {4} to {1}, {2}, ..., you can get the respective fields.
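As a sanity check outside Hive, a quote-aware CSV parser shows the split that regex is meant to reproduce; Python's standard library handles the quoted commas directly:

```python
import csv
import io

# The same sample line as in the Java snippet above.
line = 'a,"hi, I am here",c,d,"ahoy, mateys"'
fields = next(csv.reader(io.StringIO(line)))
print(fields)  # ['a', 'hi, I am here', 'c', 'd', 'ahoy, mateys']
```

This is the target output: the commas inside the double quotes stay part of their fields, and the quotes themselves are stripped.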

Export to csv empty string as NULL

I'm trying to export some data from MS SQL Server to a CSV file using sqsh.
Assuming the SQL statement is SELECT * FROM [dbo].[searchengines].
The resulting CSV is something like this:
field1,field2,field3,field4,field5
,,"Google",,
,,"Yahoo",,
,,"Altavista",,
,,"Lycos",,
What can I do to turn it into something like this:
field1,field2,field3,field4,field5
NULL,NULL,"Google",NULL,NULL
NULL,NULL,"Yahoo",NULL,NULL
NULL,NULL,"Altavista",NULL,NULL
NULL,NULL,"Lycos",NULL,NULL
I basically want to change fields that are empty into NULL.
Any idea?
Unfortunately the empty string in CSV output for nullable columns is hard-coded in sqsh. See src/dsp_csv.c, line 144, where the call is made:
dsp_col( output, "", 0 );
You could replace it by
dsp_fputs( "NULL", output );
and rebuild sqsh. In the next release I will come up with a more elaborate solution.
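If patching and rebuilding sqsh is not an option, the same effect can be had by post-processing the generated CSV. A sketch in Python (it assumes no quoted field itself contains consecutive commas, which would trip the pattern):

```python
import re

def empty_to_null(line: str) -> str:
    """Insert NULL into empty CSV fields: at line start, between commas, at line end.

    Caveat: a quoted field whose own text contains ',,' would be corrupted,
    so this assumes embedded commas never appear back to back.
    """
    return re.sub(r'^(?=,)|(?<=,)(?=,)|(?<=,)$', 'NULL', line)

print(empty_to_null(',,"Google",,'))  # NULL,NULL,"Google",NULL,NULL
```

Run it over each line of the file (skipping the header) before handing the CSV to whatever consumes it.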