How to load field data that exceeds 4000 characters via a sqlldr .ctl file, without changing the table column datatype from NVARCHAR2(2000) to NCLOB - sql-loader

I am trying to load data through sqlldr into a staging table, and one column's data has more than 4000 characters. I don't want to change the datatype of the existing NVARCHAR2(2000) column in the staging table. How can I load the data by dropping part of the received value for that column in the .ctl file?
My control file:
load data
CHARACTERSET UTF8
APPEND INTO TABLE STAGING
FIELDS TERMINATED BY '\t'
TRAILING NULLCOLS
(uid,
linked char(4000))
Column datatypes in the table:
uid NOT NULL NUMBER(12)
linked NVARCHAR2(2000)
The error I get is:
Record 1: Rejected - Error on table STAGING, column LINKED.
Field in data file exceeds maximum length
Data file (the second column, starting with 242357, is the one causing the problem):
22 242357, 242359, 242375, 242376, 242395, 242421, 242422, 242423, 242424, 242425, 242426, 242427, 242428, 242429, 242431, 242432, 242433, 242434, 242435, 242436, 242437, 242438, 242439, 242441, 242442, 242443, 242445, 242446, 242447, 242448, 242449, 242451, 242452, 242453, 242454, 242455, 242456, 242457, 242458, 242462, 242463, 242464, 242465, 242466, 242467, 242468, 24247, 242524, 242525, 242533, 242535, 242544, 242551, 242552, 242553, 242554, 242556, 242557, 242558, 242559, 242565, 242577, 242636, 242646, 242727 ...... so on
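One common way to handle this (a sketch, not from the original post) is to let SQL*Loader read the oversized field into a larger buffer and then truncate it with a SQL expression so the value fits the NVARCHAR2(2000) column. The CHAR(10000) buffer size below is an assumed upper bound for the incoming field; note that, depending on the database version, SQL expressions on very long character fields can still hit the 4000-byte bind limit.
load data
CHARACTERSET UTF8
APPEND INTO TABLE STAGING
FIELDS TERMINATED BY '\t'
TRAILING NULLCOLS
(uid,
linked CHAR(10000) "SUBSTR(:linked, 1, 2000)")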

Related

Athena create table from CSV file (S3 bucket) with semicolon

I'm trying to create a table from an S3 bucket which has a CSV file. Because of regional settings the CSV uses a semicolon as the separator, and one column even contains commas.
Input CSV file:
Name;Phone;CRM;Desk;Rol
First Name;f.name;Name, First;IT;Inbel
First2 Name2;f2.name2;Name2, First2;IT;Inbel
First3 Name3;f3.name3;Name3, First3;IT;Inbel
First4 Name4;f4.name4;Name4, First4;IT;Inbel
Athena query:
CREATE EXTERNAL TABLE IF NOT EXISTS `a`.`test` (
`Name` string,
`Phone` string,
`CRM` string,
`Desk` string,
`Rol` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = ',',
'field.delim' = ','
) LOCATION 's3://***/test/'
TBLPROPERTIES ('has_encrypted_data'='false');
The output comes out as:
Name;Phone;CRM;Desk;Rol
First Name;f.name;Name First;IT;Inbel
First2 Name2;f2.name2;Name2 First2;IT;Inbel
First3 Name3;f3.name3;Name3 First3;IT;Inbel
First4 Name4;f4.name4;Name4 First4;IT;Inbel
I tried scanning the web for solutions (especially for the separator), but nothing seems to work. I don't want to change the regional settings and would love to keep the input file as is. Also, if someone knows a solution for the CRM column, that would be a bonus!
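A possible fix (a sketch, not a confirmed answer): tell Athena to split on semicolons rather than commas, for example with OpenCSVSerde, and skip the header row. Because the delimiter is then ';', the comma inside the CRM values is preserved. The table name and location are taken from the question; verify the property names against the Athena documentation.
CREATE EXTERNAL TABLE IF NOT EXISTS `a`.`test` (
`Name` string,
`Phone` string,
`CRM` string,
`Desk` string,
`Rol` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ';',
'quoteChar' = '"',
'escapeChar' = '\\'
)
LOCATION 's3://***/test/'
TBLPROPERTIES ('has_encrypted_data'='false', 'skip.header.line.count'='1');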

How to handle spaces in field names in a JSON file while creating an Athena table [duplicate]

This question already has answers here:
Column name with a Space - Athena
(3 answers)
Closed 1 year ago.
My JSON file has field names with spaces in them, like "Customer ID". The JSON file sits in an S3 bucket, so when I try creating an Athena table on this JSON file, it throws an error because the field names have spaces. It loads fine when the fields with spaces are removed while loading. How do I handle this situation so the entire data set gets loaded properly?
If you have a chance, convert your JSON file to CSV. Then you can try something like:
CREATE EXTERNAL TABLE IF NOT EXISTS db.table_name (
.......
`Customer_ID` int,
.......
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ',',
'quoteChar' = '\"',
'escapeChar' = '\\' )
STORED AS TEXTFILE LOCATION 's3://location'
TBLPROPERTIES ('skip.header.line.count'='1')
The main idea is TBLPROPERTIES ('skip.header.line.count'='1'): it skips the header row in your .csv file so you can set your own custom column names instead.

How to load data into MySQL from a txt file which has both field names and values in each row?

I am a newbie in MySQL but not in programming. I am trying to load a txt file with the following data into MySQL:
{"id": "5bf573b81c5a1dcdd96ec666", "DisplayName": "Eureka", "NormalizedName": "eureka", "JournalId": "465895"}
{"id": "5bf573b81c5a1dcdd96ec667", "DisplayName": "The Artist and Journal of Home Culture", "NormalizedName": "the artist and journal of home culture", "JournalId": "1137746"}
{"id": "5bf573b81c5a1dcdd96ec668", "DisplayName": "Cumberland Law Review", "NormalizedName": "cumberland law review", "JournalId": "2978343"}
Each line is enclosed in curly braces and contains column names with their quoted values, separated by commas. I would like to have a table with these column names tabulating their respective values, without any punctuation marks. For a two-column data file, I did it manually with the following code:
DROP TABLE IF EXISTS venue_linking_pairs;
CREATE TABLE venue_linking_pairs (mid VARCHAR(40), aid VARCHAR(40)); -- optimise the field length
LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/venue_linking_pairs.txt'
INTO TABLE venue_linking_pairs
FIELDS terminated by ','
(mid, aid)
SET
mid = SUBSTRING(mid, 10, 24) ,
aid = SUBSTRING(aid, 10, 24) ;
SELECT * FROM venue_linking_pairs ORDER BY mid LIMIT 10;
I would like to do this more automatically, e.g. set the number of table columns in MySQL to match the txt data file, give them the same names as in the txt file, and strip the punctuation marks automatically, without manually checking the characters and using the substring function. I would be grateful for your help.
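One possible approach (a sketch, not from the original post, assuming MySQL 5.7+ and a hypothetical file name journals.txt): load each line into a single JSON column and let the JSON functions pull the fields out by name, instead of slicing with SUBSTRING. The journals_raw/journals table names and the output column list are illustrative.
DROP TABLE IF EXISTS journals_raw;
CREATE TABLE journals_raw (doc JSON);

-- each input line is one complete JSON object, so load the whole line into one column
LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/journals.txt'
INTO TABLE journals_raw
FIELDS TERMINATED BY '\t'  -- no tabs in the data, so the entire line lands in doc
LINES TERMINATED BY '\n'
(doc);

-- extract the attributes by name; ->> returns the unquoted text value
CREATE TABLE journals AS
SELECT doc->>'$.id'             AS id,
       doc->>'$.DisplayName'    AS DisplayName,
       doc->>'$.NormalizedName' AS NormalizedName,
       doc->>'$.JournalId'      AS JournalId
FROM journals_raw;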

Neo4j CSV import datatype error

I am trying to import genealogy data into Neo4j using a CSV file. The dates are strings such as 2012 or 19860105. However, when importing, Neo4j interprets them as LongValue, creating an error.
My import statement is either
LOAD CSV WITH HEADERS FROM 'file:///Neo4jPersonNodes1.csv' AS line FIELDTERMINATOR '|'
CREATE (:Person{RN: toInteger(line[0]),fullname: line[1],surname: line[2],name: line[3],sex: line[4],union_id: toInteger(line[5]),mn: line[6],BD: line[7],BDGed: line[8],DD: line[9],DDGed: line[10],bp_id: toInteger(line[11]),dp_id: toInteger(line[12]),BP: line[13],DP: line[14],kit: line[15]})
or, adding the toString() function
LOAD CSV WITH HEADERS FROM 'file:///Neo4jPersonNodes1.csv' AS line FIELDTERMINATOR '|'
CREATE (:Person{RN: toInteger(line[0]),fullname: toString(line[1]),surname: toString(line[2]),name: toString(line[3]),sex: toString(line[4]),union_id: toInteger(line[5]),mn: toString(line[6]),BD: toString(line[7]),BDGed: toString(line[8]),DD: toString(line[9]),DDGed: toString(line[10]),bp_id: toInteger(line[11]),dp_id: toInteger(line[12]),BP: toString(line[13]),DP: toString(line[14]),kit: toString(line[15])})
A sample of the CSV is
"RN"|"fullname"|"surname"|"name"|"sex"|"union_id"|"mn"|"BD"|"BDGed"|"DD"|"DDGed"|"bp_id"|"dp_id"|"BP"|"DP"|"kit"
"5"|"Ale Harmens Slump"|"Slump"|"Ale Harmens"|"M"|"313"|"3"|"18891223"|"23 Dec 1889"|"19890111"|"11 Jan 1989"|"23"|"4552"|"Echten, Friesland, Neth."|"Sebastopol, California"|""
The error message is:
Neo4j.Driver.V1.ClientException: 'Error when pulling unconsumed
session.run records into memory in session: Expected Long(7) to be a
org.neo4j.values.storable.TextValue, but it was a
org.neo4j.values.storable.LongValue'
I'm not sure why Neo4j does not treat the numeric string as a string.
Since your CSV file has a header row (and specified WITH HEADERS), your Cypher code must treat line as a map (whose property names match all your header names) instead of as an array.
For example, instead of line[0], you must use line.RN. If you fix all the uses of line accordingly, you should no longer get such errors.
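For example, the corrected statement might look like this (a sketch using the header names from the sample CSV):
LOAD CSV WITH HEADERS FROM 'file:///Neo4jPersonNodes1.csv' AS line FIELDTERMINATOR '|'
CREATE (:Person {
  RN: toInteger(line.RN), fullname: line.fullname, surname: line.surname, name: line.name,
  sex: line.sex, union_id: toInteger(line.union_id), mn: line.mn,
  BD: line.BD, BDGed: line.BDGed, DD: line.DD, DDGed: line.DDGed,
  bp_id: toInteger(line.bp_id), dp_id: toInteger(line.dp_id),
  BP: line.BP, DP: line.DP, kit: line.kit
})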

AWS Glue Crawler for JSONB column in PostgreSQL RDS

I've created a crawler that looks at a PostgreSQL 9.6 RDS table with a JSONB column, but the crawler identifies the column type as "string". When I then try to create a job that loads data from a JSON file on S3 into the RDS table, I get an error.
How can I map a JSON file source to a JSONB target column?
It's not quite a direct copy, but an approach that has worked for me is to define the column on the target table as TEXT. After the Glue job populates the field, I then convert it to JSONB. For example:
alter table postgres_table
alter column column_with_json set data type jsonb using column_with_json::jsonb;
Note the use of the cast for the existing text data. Without that, the alter column would fail.
The crawler will identify the JSONB column type as "string", but you can try using the Unbox class in Glue to convert this column to JSON.
Let's look at the following table in PostgreSQL:
create table persons (id integer, person_data jsonb, creation_date timestamp )
Here is an example of one record from the persons table:
ID = 1
PERSON_DATA = {
"firstName": "Sergii",
"age": 99,
"email":"Test#test.com"
}
CREATION_DATE = 2021-04-15 00:18:06
The following code needs to be added to the Glue job:
# 1. Create a dynamic frame from the catalog
df_persons = glueContext.create_dynamic_frame.from_catalog(database = "testdb", table_name = "persons", transformation_ctx = "df_persons")
# 2. In path, put the name of the JSONB column that needs to be converted to JSON
df_persons_json = Unbox.apply(frame = df_persons, path = "person_data", format = "json")
# 3. Convert from a dynamic frame to a Spark data frame
datf_persons_json = df_persons_json.toDF()
# 4. After that you can process the column as JSON, or build a data frame with all the necessary columns; each JSON element can be selected as a separate column
final_df_person = datf_persons_json.select("id", "person_data.age", "person_data.firstName", "creation_date")
You can also check the following link:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-transforms-Unbox.html
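For context, here is a minimal sketch of the job boilerplate the snippet above assumes (the standard PySpark Glue job setup; the import and initialisation pattern follows the AWS Glue examples, not the original answer):
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from awsglue.transforms import Unbox

# standard Glue job initialisation; this is where the glueContext used above comes from
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# ... dynamic frame / Unbox / toDF() code from the answer goes here ...

job.commit()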