How to change a CSV file delimiter

Here's a CSV file, items.txt:
item-number,item-description,item-category,cost,quantity-available
I tried to change the field separator from , to \n using awk, and I need an easy way to do it.
1) This command does not work:
awk -F, 'BEGIN{OFS="\n"} {print}' items.txt
2) This command works, but the real CSV I need to process has 15+ columns, and I don't want to list every column variable:
awk -F, 'BEGIN{OFS="\n"} {print $1,$2}' items.txt
Thanks in advance.

If your fields do not contain commas, you can use tr:
tr ',' '\n' < infile > outfile
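For the sample items.txt above (assuming the file contains just that header line), the output would be:
$ tr ',' '\n' < items.txt
item-number
item-description
item-category
cost
quantity-available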

You were close. After setting OFS to a newline, you need to make awk rebuild the record so the fields are re-joined with newlines.
$1=$1 re-evaluates the fields and joins them with OFS, which by default is a space. Since we set OFS to RS, which by default is a newline, you get the desired output.
The 1 at the end of the script is the idiomatic way of saying "print the line".
$ awk 'BEGIN{FS=",";OFS=RS}{$1=$1}1' file
item-number
item-description
item-category
cost
quantity-available

You need to get awk to rebuild the record by assigning to a field:
awk -F, 'BEGIN{OFS="\n"} {$1 = $1; print}' items.txt
Or, if you're sure the first column is always non-empty and nonzero, you could use the somewhat simpler:
awk -F, 'BEGIN{OFS="\n"} $1 = $1' items.txt
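To see why that caveat matters, here is a minimal sketch: when the first column is 0 (or empty), the value of the assignment $1 = $1 is false, so the line is silently dropped:
$ printf '0,foo\n' | awk -F, 'BEGIN{OFS="\n"} $1 = $1'
(no output; the first form with an explicit print handles such lines correctly)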


Prefix every header column with a string using awk

I have a bunch of big CSV files, and I want to prefix every header column with a fixed string. There are more than 500 columns in every file.
Suppose my header is:
number;date;customer;key;amount
I tried this awk line:
awk -F';' 'NR==1{gsub(/[^a-z_]/,"input_file.")} { print }'
but I get this (note the first column is missing the prefix and the separators are removed):
numberinput_file.dateinput_file.customerinput_file.keyinput_file.amount
expected output:
input_file.number;input_file.date;input_file.customer;input_file.key;input_file.amount
In any awk that'd be:
$ awk 'NR==1{gsub(/^|;/,"&input_file.")} 1' file
input_file.number;input_file.date;input_file.customer;input_file.key;input_file.amount
but sed exists to do simple substitutions like that, e.g. using a sed that has -E to enable EREs (e.g. GNU and BSD sed):
$ sed -E '1s/^|;/&input_file./g' file
input_file.number;input_file.date;input_file.customer;input_file.key;input_file.amount
If you're using GNU tools then you could use either of the above to change all of your CSV files at once with either of these:
awk -i inplace 'NR==1{gsub(/^|;/,"&input_file.")} 1' *.csv
sed -i -E '1s/^|;/&input_file./g' *.csv
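If you'd like backups too, GNU sed accepts a backup suffix attached directly to -i (BSD sed takes the suffix as a separate argument instead):
sed -i.bak -E '1s/^|;/&input_file./g' *.csv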
Your gsub would replace every character that is not a lowercase letter or an underscore, anywhere in the line, with the prefix, including your column separators.
The print can be abbreviated to the common idiom 1 at the very end of the script; it simply means "this condition is true; perform the default action, i.e. print the line". This is just a stylistic change.
awk -F';' 'NR==1{
sub(/^/, "input_file."); gsub(/;/, ";input_file."); }
1' filename
If you want to perform this on multiple files, probably put a shell loop around it. If you only want to concatenate everything to standard output, you can give all the files to Awk in one go (in which case you probably don't want to print the header line for any file after the first; maybe change the 1 to NR==1 || FNR != 1).
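A minimal sketch of such a loop, writing each result to a temporary file (the .tmp suffix here is an arbitrary choice) and renaming it over the original:
for f in *.csv; do
  awk -F';' 'NR==1{sub(/^/, "input_file."); gsub(/;/, ";input_file.")} 1' "$f" > "$f.tmp" &&
  mv "$f.tmp" "$f"
done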
I would use GNU AWK for this as follows. Let file.txt content be:
number;date;customer;key;amount
1;2;3;4;5
6;7;8;9;10
then
awk 'BEGIN{FS=";";OFS=";input_file."}NR==1{$1="input_file." $1}{print}' file.txt
output
input_file.number;input_file.date;input_file.customer;input_file.key;input_file.amount
1;2;3;4;5
6;7;8;9;10
Explanation: I set OFS to ; followed by the prefix. Then in the first line I add the prefix to the first column, which triggers the record rebuild. No modification is made to any other line, so they are printed as is.
(tested in GNU Awk 5.0.1)
Also with awk, using a for loop and printf:
awk 'BEGIN{FS=OFS=";"} NR==1{for (i=1; i<=NF; i++) printf "%s%s", "input_file." $i, (i<NF ? OFS : ORS)}' file
input_file.number;input_file.date;input_file.customer;input_file.key;input_file.amount

Update a CSV file to drop the first number and insert a decimal place in a particular column

I need help to perform the following. My CSV file looks like this:
900001_10459.jpg,036921,Initiated
900002_10454.jpg,027964,Initiated
900003_10440.jpg,021449,Initiated
900004_10440.jpg,016650,Initiated
900005_10440.jpg,013929,Initiated
What I need to do is generate a new CSV file as follows:
900001_10459.jpg,3692.1,Initiated
900002_10454.jpg,2796.4,Initiated
900003_10440.jpg,2144.9,Initiated
900004_10440.jpg,1665.0,Initiated
900005_10440.jpg,1392.9,Initiated
If I do this as a test:
echo '036921' | awk -v range=1 '{print substr($0,range+1)}' | sed 's/.$/.&/'
I get
3692.1
Can anyone help me incorporate that (or anything similar) to change my CSV file?
Try
sed -E 's/,0*([0-9]*)([0-9]),/,\1.\2,/' myfile.csv
Using awk and with the conditions specified in the comment, you can use:
$ awk -F, '{ printf "%s,%06.1f,%s\n", $1, $2 / 10, $3 }' data
900001_10459.jpg,3692.1,Initiated
900002_10454.jpg,2796.4,Initiated
900003_10440.jpg,2144.9,Initiated
900004_10440.jpg,1665.0,Initiated
900005_10440.jpg,1392.9,Initiated
$
With the printf format string providing the commas, there's no need to set OFS (because OFS is not used by printf).
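For example, the %06.1f conversion (zero-pad to a minimum width of 6, one digit after the decimal point) is what preserves the trailing zero in the 1665.0 row:
$ echo '900004_10440.jpg,016650,Initiated' | awk -F, '{ printf "%s,%06.1f,%s\n", $1, $2 / 10, $3 }'
900004_10440.jpg,1665.0,Initiated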
Assuming that values with leading zeros appear solely in the 2nd column, I would use GNU AWK for this task as follows. Let file.txt content be:
900001_10459.jpg,036921,Initiated
900002_10454.jpg,027964,Initiated
900003_10440.jpg,021449,Initiated
900004_10440.jpg,016650,Initiated
900005_10440.jpg,013929,Initiated
then
awk 'BEGIN{FS=",0?";OFS=","}{$2=gensub(/([0-9])$/, ".\\1", 1, $2);print}' file.txt
output
900001_10459.jpg,3692.1,Initiated
900002_10454.jpg,2796.4,Initiated
900003_10440.jpg,2144.9,Initiated
900004_10440.jpg,1665.0,Initiated
900005_10440.jpg,1392.9,Initiated
Explanation: I set the field separator (FS) to , optionally followed by 0, so a single leading zero is discarded as part of the separator. In the 2nd field I replace the last digit with . followed by that digit. Finally I print the changed line, using , as the output separator.
(tested in gawk 4.2.1)
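Note that gensub() is GNU-specific; as a portable sketch, plain sub() with the & backreference should produce the same output in any POSIX awk:
awk 'BEGIN{FS=",0?";OFS=","}{sub(/[0-9]$/, ".&", $2); print}' file.txt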
I wish to have 4 digits (including zeros), with the last (5th) digit separated from the first 4 by a decimal point.
If I understand correctly, you don't need all the digits of that field, only the last five.
Using awk you can get the last five with the substr function, and then separate the last digit from the previous 4 with a decimal point using the sub() function:
awk -F',' -v OFS=',' '{$2 = substr($2, length($2) - 4); sub(/[[:digit:]]$/, ".&", $2); print}' file
900001_10459.jpg,3692.1,Initiated
900002_10454.jpg,2796.4,Initiated
900003_10440.jpg,2144.9,Initiated
900004_10440.jpg,1665.0,Initiated
900005_10440.jpg,1392.9,Initiated

Delete rows of CSV file based on the value of a column

Here's an example of a few lines of my CSV file:
movieID,actorID,actorName,ranking
1,don_rickles,Don Rickles,3
1,jack_angel,Jack Angel,6
1,jim_varney,Jim Varney,4
1,tim_allen,Tim Allen,2
1,tom_hanks,Tom Hanks,1
1,wallace_shawn,Wallace Shawn,5
I would like to remove all rows that have a ranking of > 4. So far I've been trying to use this awk line:
awk -F ',' 'BEGIN {OFS=","} { if (($4) < 5) print }' file.csv > file_out.csv
It should print all the rows with a ranking (4th column) of less than 5 to a new file. I can't tell exactly what this line actually does, but it's not what I want. Can someone tell me where I've gone wrong with that line?
Instead of deleting the records, think of which ones you're going to print. I guess it's ranking <= 4. In idiomatic awk you can write this as:
$ awk -F, '$4<=4' file
1,don_rickles,Don Rickles,3
1,jim_varney,Jim Varney,4
1,tim_allen,Tim Allen,2
1,tom_hanks,Tom Hanks,1
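If you want to keep the header line as well, a common tweak is to special-case the first record:
$ awk -F, 'NR==1 || $4<=4' file
movieID,actorID,actorName,ranking
1,don_rickles,Don Rickles,3
1,jim_varney,Jim Varney,4
1,tim_allen,Tim Allen,2
1,tom_hanks,Tom Hanks,1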

Parsing a CSV using a conditional statement in awk

I have a CSV file, file.csv:
C75ADANXX,5,20,,AGGCAGAA,AGAGTAGA,,,,,AB
C75ADANXX,5,21,,AGGCAGAA,GTAAGGAG,,,,,AB
C75ADANXX,5,22,,AGGCAGAA,ACTGCATA,,,,,AB
C75ADANXX,5,23,,AGGCAGAA,AAGGAGTA,,,,,TC
C75ADANXX,5,24,,AGGCAGAA,CTAAGCCT,,,,,TC
C75ADANXX,5,25,,TCCTGAGC,GCGTAAGA,,,,,TC
When I run the following awk command, it prints the last column:
awk -F "," '{print $11}' file.csv
I want to extract the lines with TC, but the following command prints nothing:
awk -F "," '{if($11==TC){print$0}}' file.csv
Where am I going wrong in writing the command? Thank you.
I modified the command to:
awk -F "," '{if($11=="TC\r"){print$0}}' file.csv
The file had been copied from Windows, so each line ended with a carriage return character, which is obviously not visible when you print only the last column.
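A cleaner sketch is to strip the carriage returns up front instead of matching them in the comparison; assigning to $0 (here via sub()) makes awk re-split the fields:
awk -F "," '{sub(/\r$/, "")} $11 == "TC"' file.csv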
Change:
awk -F "," '{if($11==TC){print$0}}' file.csv
to:
awk -F "," '{if($11=="TC"){print$0}}' file.csv
or, even simpler:
awk -F, '$11=="TC"' file.csv
In if($11==TC), the variable TC was never assigned (without quotes, awk treats TC as a variable, not a string), so it is the empty string; the comparison can only ever match empty fields, which is why nothing is printed.
Try this one:
grep ",TC$" file.csv

Parse a CSV file that contains commas in the fields with awk

I have to use awk to print out 4 different columns in a CSV file. The problem is the values are in a $x,xxx.xx format. When I run the regular awk command:
awk -F, '{print $1}' testfile.csv
my output ends up looking like:
307.00
$132.34
30.23
What am I doing wrong?
"$141,818.88","$52,831,578.53","$52,788,069.53"
This is roughly the input. The file I have to parse is 90,000 rows and about 40 columns.
This is how the input is laid out, or at least the parts of it that I have to deal with. Sorry if I made you think this wasn't what I was talking about.
If the input is "$307.00","$132.34","$30.23"
I want the output to be:
$307.00
$132.34
$30.23
Oddly enough I had to tackle this problem some time ago and I kept the code around to do it. You almost had it, but you need to get a bit tricky with your field separator(s).
awk -F'","|^"|"$' '{print $2}' testfile.csv
Input
# cat testfile.csv
"$141,818.88","$52,831,578.53","$52,788,069.53"
"$2,558.20","$482,619.11","$9,687,142.69"
"$786.48","$8,568,159.41","$159,180,818.00"
Output
# awk -F'","|^"|"$' '{print $2}' testfile.csv
$141,818.88
$2,558.20
$786.48
You'll note that the "first" field is actually $2 because of the field separator ^". Small price to pay for a short 1-liner if you ask me.
I think what you're saying is that you want to split the input into CSV fields while not getting tripped up by the commas inside the double quotes. If so...
First, use "," as the field separator, like this:
awk -F'","' '{print $1}'
But then you'll still end up with a stray double-quote at the beginning of $1 (and at the end of the last field). Handle that by stripping quotes out with gsub, like this:
awk -F'","' '{x=$1; gsub("\"","",x); print x}'
Result:
echo '"abc,def","ghi,xyz"' | awk -F'","' '{x=$1; gsub("\"","",x); print x}'
abc,def
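A variant of the same idea, assuming every field is quoted: strip the outer quotes from the whole record first (assigning to $0 makes awk re-split it), so $1 then lines up with the first column:
echo '"abc,def","ghi,xyz"' | awk -F'","' '{gsub(/^"|"$/, ""); print $1}'
abc,def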
In order to let awk handle quoted fields that contain the field separator, you can use a small script I wrote called csvquote. It temporarily replaces the offending commas with nonprinting characters, and then you restore them at the end of your pipeline. Like this:
csvquote testfile.csv | awk -F, '{print $1}' | csvquote -u
This would also work with any other UNIX text processing program like cut:
csvquote testfile.csv | cut -d, -f1 | csvquote -u
You can get the csvquote code here: https://github.com/dbro/csvquote
The data file:
$ cat data.txt
"$307.00","$132.34","$30.23"
The AWK script:
$ cat csv.awk
BEGIN { RS = "," }
{ gsub("\"", "", $1); print $1 }
The execution:
$ awk -f csv.awk data.txt
$307.00
$132.34
$30.23