I have an EBCDIC file, and I want the m_dump output to show the values of only the first two columns.
The m_dump output currently looks like this:
[record
id "100"
type_code "20"
frstname "abcd"
lastname "efgh"
new_line "\n"]
Alternatively, please help me create a delimited .dat file from the output of the m_dump command in Ab Initio.
The m_dump command cannot limit its output to selected columns.
If you know the file's existing delimiter, however, you can use the tr command, write only the required data to a temporary file, and then run m_dump on that file.
You can use a RUN PROGRAM component and then drop the unwanted fields with a REFORMAT connected to the output port of RUN PROGRAM.
Saving the output to a file afterwards is straightforward with OUTPUT FILE.
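If the m_dump text has already been captured, another way to get a delimited file is to post-process that text outside Ab Initio. The following is a minimal Python sketch, not an Ab Initio feature: it assumes every record starts on a `[record` line, holds one `name "value"` pair per line, and ends with `]`, as in the sample in the question. The names `parse_records` and `to_delimited` are made up for illustration.

```python
import re

# One `name "value"` pair per line; the last pair of a record ends with `]`.
FIELD_RE = re.compile(r'^(\w+)\s+"(.*)"\]?$')


def parse_records(dump_text):
    """Yield one {field: value} dict per [record ... ] block in the dump text."""
    current = None
    for line in dump_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[record"):
            current = {}
            continue
        if current is None:
            continue  # text outside a record block is ignored
        match = FIELD_RE.match(stripped)
        if match:
            current[match.group(1)] = match.group(2)
        if stripped.endswith("]"):
            yield current
            current = None


def to_delimited(dump_text, fields, delimiter="|"):
    """Return one delimited line per record, keeping only the requested fields."""
    return [
        delimiter.join(record.get(name, "") for name in fields)
        for record in parse_records(dump_text)
    ]


sample = '[record\nid "100"\ntype_code "20"\nfrstname "abcd"\nlastname "efgh"\nnew_line "\\n"]\n'
for row in to_delimited(sample, ["id", "type_code"]):
    print(row)
```

Writing the returned lines to a .dat file is then an ordinary file write. Values that themselves contain a double quote or a line break would need a more careful parser than this one.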
I currently have some Hive code that creates a table in which I declare all the column names and types. I then use "load data inpath" to load a CSV into that table so that I can join it to my database tables. The problem is that the CSV columns may sometimes be in a different order. I cannot control that, as the file is sent to me from a different source. Is there a way to create the temp table I build daily without declaring the column names, and just let it read them from the CSV? That would save me from manually reviewing the file every morning to check that the columns are in the right order.
Because the column order keeps changing, as a first step you could use a shell script to read the header columns and generate a create table script for the temp table. Then execute the generated create table string and load the file into the temp table. From the temp table, you can then load the data into the target table.
Here is a sample bash script to give you an idea.
#!/bin/bash
# File check: stop if the file is missing or empty
if [ -s "$1" ]
then
    echo "File $1 exists and is non-empty"
else
    echo "File $1 does not exist or is empty"
    exit 1
fi
create_tbl_cmd="create table tblname ("
# Get the field names from the file header
fields=$(head -n 1 "$1")
# Add string as the datatype for each field
fields="${fields//,/ string,} string)"
create_table_cmd="$create_tbl_cmd$fields"
# echo "$create_table_cmd"
# Execute $create_table_cmd using Beeline or the Hive command line
# Execute the load into the temp table
# Execute the insert from the temp table to the target table
Run the bash script above with the CSV file as its argument:
bash scriptname.sh filename.csv
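The same header-to-DDL step can be sketched in Python for anyone who prefers to drive the load from a Python job. This is only an illustration of the idea above: the function names are made up, every column is typed as string just as in the bash version, and the header is assumed to be a plain comma-separated first line.

```python
import csv


def create_table_from_header(header_line, table_name="tblname"):
    """Build a Hive create table statement that types every header column as string."""
    columns = next(csv.reader([header_line]), [])
    columns = [name.strip() for name in columns if name.strip()]
    if not columns:
        raise ValueError("header line contains no column names")
    column_defs = ", ".join(f"{name} string" for name in columns)
    return f"create table {table_name} ({column_defs})"


def create_table_for_file(csv_path, table_name="tblname"):
    """Read only the first line of the CSV file and build the statement from it."""
    with open(csv_path, newline="") as handle:
        header_line = handle.readline().rstrip("\r\n")
    return create_table_from_header(header_line, table_name)


print(create_table_from_header("id,name,amount"))
```

The generated string would still have to be submitted through Beeline or the Hive command line, exactly as in the shell version.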
I cannot quite figure out how to change the format of a column in my data file. I imported the data set with PROC IMPORT, and it guessed the format of a specific column as numeric; I would like it to be character-based.
This is where I'm currently at, and it does not change the format of my NUMBER column:
proc import
datafile = 'datapath'
out = dataname
dbms = CSV
replace
;
format NUMBER $8.
;
guessingrows = 20000
;
run;
You could import the data first and convert the column afterwards. I believe the following would work:
proc sql;
create table want as
select *,
put(Number, 4.) as CharacterVersion
from data;
quit;
You cannot change the type/format via PROC IMPORT. However, you can write a data step to read in the file and then customize everything. If you're not sure how to start with that, check the log after you run a PROC IMPORT and it will have the 'skeleton' code. You can copy that code, edit it, and run to get what you need. Writing from scratch also works using an INFILE and INPUT statement.
From the help file (search for "Processing Delimited Files in SAS"):
If you need to revise your code after the procedure runs, issue the RECALL command (or press F4) to the generated DATA step. At this point, you can add or remove options from the INFILE statement and customize the INFORMAT, FORMAT, and INPUT statements to your data.
Granted, the grammar in this section is horrific! The idea is that the IMPORT Procedure generates source code that can be recalled and modified for subsequent submission.
How do I parse a command-line parameter and use it as a variable in the script, as the name of the file to save to? I have tried the following, but it is not working:
fname:.z.X[2]
.....
...more code...
....
/Save the table into a csv file
`:(fname,".csv") 0:csv 0: table
You always need to remember left-of-right evaluation: q evaluates expressions from right to left.
In your case you are trying to write the CSV-delimited table to (fname,".csv"), which is only a string.
Further, you want to use `$ to cast to a symbol (not `:), and use hsym to create a file path (it prefixes the symbol with ":").
bash> q script.q filename
q)(hsym `$ .z.x[0],".csv") 0:csv 0: ([]10?10)
`:filename.csv
I created a shell script that builds a string containing a DB2 table-creation command. For example:
string=" db2 "CREATE TABLE foo (......... ""
My script then connects to the database and passes in the string, which db2 runs to create the table. Before the shell passes in the string, I enabled the following command option in db2:
db2 update command options using z on test-database.txt
so that all the output is saved to a text file.
However, my problem is that I want the string itself to show in the output file created by db2, just as it would when you type the statement in db2 to create a table, but it never shows in the output file. Instead, test-database.txt shows only the result, whether or not the table was created successfully, e.g.
The SQL command completed successfully.
Is there a way to make the output file show the table creation statement? Thanks in advance.
You are talking about the options for the db2clp, which has many different options.
If I understood correctly, you are writing a script (a bash script, I think) and you want to retrieve the command output. For this, you have two options:
Write the command output into a file, and then read the file.
Redirect the command output to a variable.
The first option is the easier one. It uses the z option, which writes the whole output to a file. You can change this behaviour by printing out just what you want and then redirecting the output to a file.
db2 -tf myfile.sql -z /tmp/output
VAR=$(cat /tmp/output)
The second option is a little tricky, because command substitution creates another shell, in which you should reload the db2 profile. This option uses the v option, which echoes the command text to standard output, and I hope that output is what you want to have.
VAR=$(. ~db2inst1/sqllib/db2profile ; db2 -tvf myfile.sql)
Finally, you just need to process the content of VAR, via awk, sed, grep, etc.
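If the wrapper does not have to be a shell script, the same idea (keep the statement text together with the command's output) can be sketched in Python. This is an assumption-laden illustration rather than a DB2 feature: `run_and_log` is a made-up helper, the statement and the stand-in command are hypothetical so the sketch runs without DB2 installed, and in real use the command would be something like the `db2 -tvf myfile.sql` call above, run in an environment where the db2 profile is already loaded.

```python
import os
import subprocess
import sys
import tempfile


def run_and_log(statement, command, log_path):
    """Run the command, then append the statement text and the command's output to the log."""
    result = subprocess.run(command, capture_output=True, text=True)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(statement.rstrip("\n") + "\n")  # the statement itself
        log.write(result.stdout)                  # what the command printed
        log.write(result.stderr)                  # any error text
    return result.stdout


# Hypothetical stand-in for the real db2 call, used only to make the sketch runnable.
stand_in = [sys.executable, "-c", "print('The SQL command completed successfully.')"]
log_file = os.path.join(tempfile.mkdtemp(), "test-database.txt")
run_and_log("CREATE TABLE demo (id INT)", stand_in, log_file)
with open(log_file, encoding="utf-8") as handle:
    print(handle.read())
```

The returned output can then be filtered in Python instead of with awk, sed or grep.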
For more information: http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html
I tried to load a CSV file using Pig with the following command:
A = LOAD '/USER/XYZ/PIG/FILENAME.ASC' USING PIGSTORAGE(',');
While it loaded and gave no error, cat a gave me a Directory does not exist error. I'm new to Pig and know I did something very wrong there. How do I check if it is indeed loaded? Or is loaded a misnomer, and the file just exists on the HDFS?
Next, I'd like to cut a few columns of data from the CSV file and store it in another file. How can I go about it?
I don't necessarily need the script/code, but if you could point me to the right functions that will accomplish what I want to do, that would be great. Thanks!
To see the current content of A you can use DUMP A;. To see its schema you can use DESCRIBE A;.
Once you know the schema of A you can project out the fields you want, e.g. B = FOREACH A GENERATE $0 AS foo, $4 AS bar ; to select only the 1st and 5th columns, naming them foo and bar respectively.
Storing can be done with STORE B INTO 'myoutdir' USING PigStorage('|') ; where the delimiter can be any single character you choose.
So, for example this is how the script would look while I am testing it:
A = LOAD '/USER/XYZ/PIG/FILENAME.ASC' USING PigStorage(',') ;
DESCRIBE A ;
DUMP A ;
B = FOREACH A GENERATE $0, $4;
DESCRIBE B ;
DUMP B ;
STORE B INTO 'myoutdir' USING PigStorage('|') ;
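To sanity-check what the stored output should contain, the projection can be mimicked locally on a few sample lines. The Python sketch below is not Pig code and only approximates the script above: `project_columns` is a made-up helper, it splits on the delimiter without any quote handling, and it writes an empty string where a row is too short.

```python
def project_columns(lines, indexes, in_delim=",", out_delim="|"):
    """Keep only the fields at the given positions and re-join them with out_delim."""
    projected = []
    for line in lines:
        fields = line.rstrip("\r\n").split(in_delim)
        kept = [fields[i] if i < len(fields) else "" for i in indexes]
        projected.append(out_delim.join(kept))
    return projected


# Positions 0 and 4 correspond to $0 and $4 in the FOREACH statement.
for row in project_columns(["a,b,c,d,e", "1,2,3,4,5"], [0, 4]):
    print(row)
```

Comparing a handful of lines this way against the part files that Pig writes under myoutdir is a quick check that the right columns were selected.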