prefix every header column with a string using awk - csv

I have a bunch of big CSV files and I want to prefix every header column with a fixed string. There are more than 500 columns in every file.
suppose my header is:
number;date;customer;key;amount
I tried this awk line:
awk -F';' 'NR==1{gsub(/[^a-z_]/,"input_file.")} { print }'
but I get this (note the first column is missing the prefix and the separators are removed):
numberinput_file.dateinput_file.customerinput_file.keyinput_file.amount
expected output:
input_file.number;input_file.date;input_file.customer;input_file.key;input_file.amount

In any awk that'd be:
$ awk 'NR==1{gsub(/^|;/,"&input_file.")} 1' file
input_file.number;input_file.date;input_file.customer;input_file.key;input_file.amount
but sed exists to do simple substitutions like that, using a sed that has -E to enable EREs (e.g. GNU and BSD sed):
$ sed -E '1s/^|;/&input_file./g' file
input_file.number;input_file.date;input_file.customer;input_file.key;input_file.amount
If you're using GNU tools then you could change all of your CSV files at once with either of these:
awk -i inplace 'NR==1{gsub(/^|;/,"&input_file.")} 1' *.csv
sed -i -E '1s/^|;/&input_file./g' *.csv
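If you'd like backups while editing in place, GNU sed's -i accepts an optional suffix (a sketch; the .bak suffix is just illustrative):
sed -i.bak -E '1s/^|;/&input_file./g' *.csv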

Your gsub would brutally replace every character that isn't a lowercase letter or underscore anywhere in the input with the prefix - including your column separators.
The print can be abbreviated to the common idiom 1 at the very end of the script: the condition 1 is always true, so awk performs the default action, which is to print the line. This is just a stylistic change.
awk -F';' 'NR==1 {
    sub(/^/, "input_file."); gsub(/;/, ";input_file.")
  }
  1' filename
If you want to perform this on multiple files, probably put a shell loop around it. If you only want to concatenate everything to standard output, you can give all the files to Awk in one go (in which case you probably don't want to print the header line for any file after the first; maybe change the 1 to NR==1 || FNR != 1).
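For example, a minimal sketch of that shell loop (the .tmp name is just illustrative):
for f in *.csv; do   # rewrite each file via a temp file, keeping the original on failure
  awk -F';' 'NR==1 { sub(/^/, "input_file."); gsub(/;/, ";input_file.") } 1' "$f" > "$f.tmp" &&
    mv -- "$f.tmp" "$f"
done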

I would use GNU AWK in the following way. Let file.txt content be
number;date;customer;key;amount
1;2;3;4;5
6;7;8;9;10
then
awk 'BEGIN{FS=";";OFS=";input_file."}NR==1{$1="input_file." $1}{print}' file.txt
output
input_file.number;input_file.date;input_file.customer;input_file.key;input_file.amount
1;2;3;4;5
6;7;8;9;10
Explanation: I set OFS to ; followed by the prefix. Then in the first line I prepend the prefix to the first column, which triggers rebuilding of the record using OFS. No modification is done to any other line, so they are printed as is.
(tested in GNU Awk 5.0.1)

Also with awk, using a for loop and printf (this prints just the rewritten header; a variant that also handles the data lines follows the output):
awk 'BEGIN{FS=OFS=";"} NR==1{for (i=1; i<=NF; i++) printf "%s%s", "input_file." $i, (i<NF ? OFS : ORS)}' file
input_file.number;input_file.date;input_file.customer;input_file.key;input_file.amount
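If you also want the data lines passed through unchanged, one possible variant (a sketch, rebuilding the fields rather than printf-ing them):
awk 'BEGIN{FS=OFS=";"} NR==1{for (i=1; i<=NF; i++) $i="input_file." $i} 1' file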

Related

Change column if some regex expression is true with awk or sed

I have a file (let's call it data.csv) similar to this
"123","456","ud,h-match","moredata"
with many rows in the same format and embedded commas. What I need to do is look at the third column and see if it matches an expression. In this case I want to know if the third column contains "match" anywhere (which it does). If it does, I want to replace the whole column with something else, like "replaced". Relating that to the example data.csv file, I would want it to look like this:
"123","456","replaced","moredata"
Ideally, I want the file data.csv itself to be changed (time is of the essence since I have a big file) but it's also fine if you write it to another file.
Edit:
I have tried using awk:
awk -F'","' -OFS="," '{if(tolower($3) ~ "stringI'mSearchingFor"){$3="replacement"; print}else print}' file
but it doesn't change anything. If I remove the OFS portion then it works, but the output gets separated by spaces and the columns don't get enclosed in double quotes.
Depending on the answer to my question about what you mean by column, this may be what you want (uses GNU awk for FPAT):
$ awk -v FPAT='[^,]+|"[^"]+"' -v OFS=',' '$3~/match/{$3="\"replaced\""} 1' file
"123","456","replaced","moredata"
Use awk -i inplace ... if you want to do "in place" editing.
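For example, a sketch of the in-place variant (GNU awk only; -i inplace rewrites each input file):
awk -i inplace -v FPAT='[^,]+|"[^"]+"' -v OFS=',' '$3~/match/{$3="\"replaced\""} 1' data.csv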
With any awk (but slightly more fragile than the above since it leaves the leading/trailing " on the first and last fields, and has no -i inplace):
$ awk 'BEGIN{FS=OFS="\",\""} $3~/match/{$3="replaced"} 1' file
"123","456","replaced","moredata"

How to convert linefeed into literal "\n"

I'm having some trouble converting my file to a properly formatted json string.
Have been fiddling with sed for ages now, but it seems to mock me.
Am working on RHEL 6, if that matters.
I'm trying to convert this file (content):
Hi there...
foo=bar
tomàto=tomáto
url=http://www.stackoverflow.com
Into this json string:
{"text":"Hi there...\n\nfoo=bar\ntomàto=tomáto\nurl=http://www.stackoverflow.com"}
How would I replace the actual line feeds with the literal '\n' characters? This is where I'm utterly stuck!
I've been trying to convert the line feeds into ";" first and then back to a literal "\n", and I've tried loops over each row in the file, but I can't make it work...
Some help is much appreciated!
Thanks!
sed is for simple substitutions on individual lines, that is all. Since sed works line by line, your sed script doesn't see the line endings, so you can't get it to change them without jumping through hoops using arcane language constructs and convoluted logic that hasn't been necessary since the mid-1970s when awk was invented.
This will change all newlines in your input file to the string \n:
$ awk -v ORS='\\n' '1' file
Hi there...\n\nfoo=bar\ntomàto=tomáto\nurl=http://www.stackoverflow.com\n
and this will do the rest:
$ awk -v ORS='\\n' 'BEGIN{printf "{\"text\":\""} 1; END{printf "\"}\n"}' file
{"text":"Hi there...\n\nfoo=bar\ntomàto=tomáto\nurl=http://www.stackoverflow.com\n"}
or this if you have a newline at the end of your input file but don't want it to become a \n string in the output:
$ awk -v ORS='\\n' '{rec = (NR>1 ? rec ORS : "") $0} END{printf "{\"text\":\"%s\"}\n", rec}' file
{"text":"Hi there...\n\nfoo=bar\ntomàto=tomáto\nurl=http://www.stackoverflow.com"}
With GNU sed:
sed ':a;N;s/\n/\\n/;ta' file | sed 's/.*/{"text":"&"}/'
Output:
{"text":"Hi there...\n\nfoo=bar\ntomàto=tomáto\nurl=http://www.stackoverflow.com"}
Use GNU awk for this (setting RS to the regexp ^$, which can never match, makes gawk read the whole file as a single record):
awk -v RS='^$' '{gsub(/\n/,"\\n");sub(/^/,"{\"text\":\"");sub(/\\n$/,"\"}")}1' file
Output
{"text":"Hi there...\n\nfoo=bar\ntomàto=tomáto\nurl=http://www.stackoverflow.com"}
awk to the rescue! (RS='\0' reads the whole file as one record, as long as the input contains no NUL bytes)
$ awk -vRS='\0' '{gsub("\n","\\n");
print "{\"text\":\"" $0 "\"}"}' file
{"text":"Hi there...\n\nfoo=bar\ntomàto=tomáto\nurl=http://www.stackoverflow.com\n"}
This might work for you (GNU sed):
sed '1h;1!H;$!d;x;s/\n/\\n/g;s/.*/{"text":"&"}/' file
Slurp the file into memory and use pattern matching to manipulate the file to the desired format.
The simplest (and most elegant?) solution :) :
#!/bin/bash
in=$(perl -pe 's/\n/\\n/' "$1")
cat<<EOF
{"text":"$in"}
EOF
Usage:
./script.sh file.txt
Output :
{"text":"Hi there...\n\nfoo=bar\ntomàto=tomáto\nurl=http://www.stackoverflow.com\n"}

How to create a string with x numbers of the same character without looping in AWK?

I'm trying to write a CSV join program in AWK which joins the first.csv file with the second.csv file. The program now works perfectly fine, assuming the number of rows in both files is the same.
The problem arises when one of the files contains more rows than the other; in this case, I'd have to add a number of commas (which depends on the number of fields in the input files) to the file with fewer rows, so that the columns are not misplaced.
The question is: how can I create and assign strings containing different numbers of commas? For instance,
if NumberOfFields==5, then I want to create the string ",,,,," and add it to Array[i].
Here is another answer with sample code using your variable and array name.
BEGIN {
NumberOfFields=5;
i=1;
Array[i] = gensub(/0/, ",", "g", sprintf("%0*d", NumberOfFields, 0));
print Array[i];
}
Run it with awk -f x.awk, where x.awk is the above code in a text file. Be aware that it always prints at least 1 comma, even if you specify zero.
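For example, a sketch of the expected run with NumberOfFields=5 as above:
$ awk -f x.awk
,,,,,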
$ awk -v num=3 'BEGIN {var=sprintf("%*s",num,""); gsub(/ /,",",var); print var}'
,,,
$
Use an array instead of var if/when you like. Note that, unlike another solution posted, the above will work with any awk, not just gawk, and it will not print any commas if the number requested is zero:
$ awk -v num=0 'BEGIN {var=sprintf("%*s",num,""); gsub(/ /,",",var); print var}'
$
The equivalent with GNU awk and gensub() would be:
$ awk -v num=3 'BEGIN {var=gensub(/ /,",","g",sprintf("%*s",num,"")); print var}'
,,,
$
$ awk -v num=0 'BEGIN {var=gensub(/ /,",","g",sprintf("%*s",num,"")); print var}'
$

How to change csv file delimiter

here's a csv file, items.txt:
item-number,item-description,item-category,cost,quantity-available
I tried to change the field separator from , to \n using awk and I need an easy way to do it.
1) This command does not work:
awk -F, 'BEGIN{OFS="\n"} {print}' items.txt
2) This command works, but the real CSV I need to process has 15+ columns and I don't want to list every column variable:
awk -F, 'BEGIN{OFS="\n"} {print $1,$2}' items.txt
Thanks in advance.
If your fields do not contain ,'s, you may use tr:
tr ',' '\n' < infile > outfile
You were close. After setting OFS to a newline, you need to rebuild the record to get the fields joined by newlines.
$1=$1 re-evaluates the record and joins the fields with OFS, which by default is a space. Since we set OFS to RS, which by default is a newline, you get the desired output.
The 1 at the end of the statement is the idiomatic way of saying print the line.
$ awk 'BEGIN{FS=",";OFS=RS}{$1=$1}1' file
item-number
item-description
item-category
cost
quantity-available
You need to get awk to rebuild the record by assigning to a field.
awk -F, 'BEGIN{OFS="\n"} {$1 = $1; print}' items.txt
Or, if you're sure the first column is always non-empty and nonzero, you could use the somewhat simpler:
awk -F, 'BEGIN{OFS="\n"} $1 = $1' items.txt
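To see why that caveat matters, a quick sketch with an empty first field:
$ printf ',a,b\n' | awk -F, 'BEGIN{OFS="\n"} $1 = $1'
$
Here $1 is empty, so the assignment $1 = $1 evaluates as false and the whole line is silently dropped.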

parse a csv file that contains commas in the fields with awk

I have to use awk to print out 4 different columns in a csv file. The problem is that the values are in a $x,xxx.xx format. When I run the regular awk command
awk -F, '{print $1}' testfile.csv
my output ends up looking like
307.00
$132.34
30.23
What am I doing wrong?
"$141,818.88","$52,831,578.53","$52,788,069.53"
This is roughly the input. The file I have to parse has 90,000 rows and about 40 columns.
This is how the input is laid out, or at least the parts of it that I have to deal with. Sorry if I made you think this wasn't what I was talking about.
If the input is "$307.00","$132.34","$30.23"
I want the output to be in a
$307.00
$132.34
$30.23
Oddly enough I had to tackle this problem some time ago and I kept the code around to do it. You almost had it, but you need to get a bit tricky with your field separator(s).
awk -F'","|^"|"$' '{print $2}' testfile.csv
Input
# cat testfile.csv
"$141,818.88","$52,831,578.53","$52,788,069.53"
"$2,558.20","$482,619.11","$9,687,142.69"
"$786.48","$8,568,159.41","$159,180,818.00"
Output
# awk -F'","|^"|"$' '{print $2}' testfile.csv
$141,818.88
$2,558.20
$786.48
You'll note that the "first" field is actually $2 because of the field separator ^". Small price to pay for a short 1-liner if you ask me.
I think what you're saying is that you want to split the input into CSV fields while not getting tripped up by the commas inside the double quotes. If so...
First, use "," as the field separator, like this:
awk -F'","' '{print $1}'
But then you'll still end up with a stray double-quote at the beginning of $1 (and at the end of the last field). Handle that by stripping quotes out with gsub, like this:
awk -F'","' '{x=$1; gsub("\"","",x); print x}'
Result:
echo '"abc,def","ghi,xyz"' | awk -F'","' '{x=$1; gsub("\"","",x); print x}'
abc,def
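Extending the same idea to strip the quotes from every field, a sketch:
$ echo '"abc,def","ghi,xyz"' | awk -F'","' '{for (i=1; i<=NF; i++) {x=$i; gsub("\"","",x); print x}}'
abc,def
ghi,xyz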
In order to let awk handle quoted fields that contain the field separator, you can use a small script I wrote called csvquote. It temporarily replaces the offending commas with nonprinting characters, and then you restore them at the end of your pipeline. Like this:
csvquote testfile.csv | awk -F, '{print $1}' | csvquote -u
This would also work with any other UNIX text processing program like cut:
csvquote testfile.csv | cut -d, -f1 | csvquote -u
You can get the csvquote code here: https://github.com/dbro/csvquote
The data file:
$ cat data.txt
"$307.00","$132.34","$30.23"
The AWK script (note that RS = "," would also split a quoted value that itself contains a comma, like "$141,818.88" above, so this suits fields without embedded commas):
$ cat csv.awk
BEGIN { RS = "," }
{
    gsub("\"", "", $1)
    print $1
}
The execution:
$ awk -f csv.awk data.txt
$307.00
$132.34
$30.23