Suggest a way to edit a CSV file

I have a CSV file like:
1;2,3,4
5;2,3
etc
I need to get a file like:
1;12
1;13
1;14
5;52
5;53
Can I do that without deep programming, maybe with something like awk? I can do this in Perl or Python, but I think there is a simpler way.

This is a way:
$ awk 'BEGIN{FS=OFS=";"}{n=split($2, a, ","); for (i=1; i<=n; i++) print $1, $1a[i]}' file
1;12
1;13
1;14
5;52
5;53
Explanation
BEGIN{FS=OFS=";"} set input and output field separator as ;.
{n=split($2, a, ",") slice the second field based on comma. The pieces are stored in the array a[].
for (i=1; i<=n; i++) print $1, $1a[i]} loop through the pieces in a[], printing each one together with the first field in the format FIRST_FIELD;FIRST_FIELD + a[i]
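For comparison, here is a rough Python sketch of the same split-and-prefix logic, since the question mentions Python as a fallback. The function name expand_rows is made up for illustration, and the sketch assumes every non-blank line has exactly one ; followed by a comma-separated list.

```python
def expand_rows(lines):
    """Expand 'key;a,b,c' lines into 'key;keya', 'key;keyb', ... rows."""
    out = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        key, _, rest = line.partition(";")  # plays the role of FS=";" in awk
        for piece in rest.split(","):       # plays the role of split($2, a, ",")
            out.append(f"{key};{key}{piece}")
    return out


if __name__ == "__main__":
    for row in expand_rows(["1;2,3,4", "5;2,3"]):
        print(row)
```

The awk one-liner is still the shorter tool for the job; the sketch only spells out what each step does.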

awk -F '[;,]' '{ for (i = 2; i <= NF; ++i) print $1 ";" $1 $i }' file
Output:
1;12
1;13
1;14
5;52
5;53

how about:
awk -F";" '{sub(/;/,FS $1);gsub(/,/,ORS $1";"$1)}7' file
test with your data:
kent$ echo "1;2,3,4
5;2,3"|awk -F";" '{sub(/;/,FS $1);gsub(/,/,ORS $1";"$1)}7'
1;12
1;13
1;14
5;52
5;53
or:
awk -F";" 'sub(/;/,FS $1)+gsub(/,/,ORS $1";"$1)' file

You can use awk:
awk -F'[;,]' '{for(i=2;i<=NF;i++)printf "%s;%s%s\n",$1,$1,$i}' a.txt
Explanation
-F'[;,]' Split each line on , or ;
{for(i=2;i<=NF;i++)printf "%s;%s%s\n",$1,$1,$i} Iterate through the columns and produce the desired output.

Related

Convert single column to multiple, ensuring column count on last line

I would like to use AWK (Windows) to convert a text file with a single column to multiple columns - the count specified in the script or on the command line.
This question has been asked before but my final data file needs to have the same column count all the way.
Example of input:
L1
L2
L3
L4
L5
L6
L7
split into 3 columns and ";" as a separator
L1;L2;L3
L4;L5;L6
L7;; <<< here two empty fields are created after end of file, since I used just one on this line.
I tried to modify variants of the typical solution given: NR%4 {printf $0",";next} 1; and a counter, but could not quite get it right.
I would prefer not to count lines before, thereby running over the file multiple times.
You may use this awk solution:
awk -v n=3 '{
sub(/\r$/, "") # removes DOS line break, if present
printf "%s", $0(NR%n ? ";" : ORS)
}
END {
# now we need to add empty columns in last record
if (NR % n) {
for (i=1; i < (n - (NR % n)); ++i)
printf ";"
print ""
}
}' file
L1;L2;L3
L4;L5;L6
L7;;
With your shown samples, please try the following awk code, which uses an xargs + awk combination to get the output the OP needs. Note that the padding here is hard-coded for 3 columns.
xargs -n3 < Input_file |
awk -v OFS=";" '{if(NF==1){$0=$0";;"};if(NF==2){$0=$0";"};$1=$1} 1'
With awk I would do:
awk -v n=3 '
{printf("%s%s", $0, (NR%n>0) ? ";" : ORS)}
END{
if (NR%n) { # pad only when the last row is incomplete
for(i=NR%n; i<n-1; i++) printf(";")
printf ORS
}
}' file
Or, an alternative awk:
awk -v n=3 -v OFS=";" '
{ row=row ? row FS $0 : $0 } # build row of n fields
!(NR%n) {$0=row; NF=n; print; row="" } # split the fields sep by OFS
END { if (NR%n) { $0=row; NF=n; print } } # same
' file
Or you can use ruby if you want more options:
ruby -le '
n=3
puts $<.read.
split($/).
each_slice(n).
map{|sl| sl.fill(sl.size...n) { "" }; sl.join(";") }.
join($\) # By using $\ and $/ with the -l the RS and ORS is set correctly for the platform
' file
Or, realize that paste is designed to do this:
paste -d';' - - - <file
(Use a - for each column desired)
Any of those prints (with n=3):
L1;L2;L3
L4;L5;L6
L7;;
(And work correctly for other values of n...)
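If it helps to see the padding rule spelled out, here is a small Python sketch of the same idea. The name to_columns and its parameters are hypothetical, and unlike the streaming awk versions it assumes the whole input fits in memory as a list of lines.

```python
def to_columns(lines, n=3, sep=";"):
    """Join every n input lines into one row, padding the last row with empty fields."""
    values = [line.rstrip("\r\n") for line in lines]  # also drops DOS line breaks
    rows = []
    for start in range(0, len(values), n):
        chunk = values[start:start + n]
        chunk += [""] * (n - len(chunk))  # pad a short final row up to n fields
        rows.append(sep.join(chunk))
    return rows


if __name__ == "__main__":
    for row in to_columns(["L1", "L2", "L3", "L4", "L5", "L6", "L7"], n=3):
        print(row)
```

When the line count is an exact multiple of n, no padding row is produced.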

Adding a column in multiple csv file using awk

I want to add a column to multiple (500) CSV files (all with the same dimensions). Each column should act as an identifier for the individual file. I want to create a bash script using awk (I am a newbie in awk). The CSV files come with headers.
For example:
Input File1.csv
#name,#age,#height
A,12,4.5
B,13,5.0
Input File2.csv
#name,#age,#height
C,11,4.6
D,12,4.3
I want to add a new column "#ID" to both files, where the value of ID is the same within an individual file but differs between the files.
Expected Output
File1.csv
#name,#age,#height,#ID
A,12,4.5,1
B,13,5.0,1
Expected File2.csv
#name,#age,#height,#ID
C,11,4.6,2
D,12,4.3,2
Please suggest.
If you don't need to extract the id number from the filename, this should do.
$ c=1; for f in File*.csv;
do
sed -i '1s/$/,#ID/; 2,$s/$/,'$c'/' "$f";
c=$((c+1));
done
Note that this is an in-place edit. Perhaps make a backup or test first.
UPDATE
If you don't need the individual files to be updated, this may work better for you
$ awk -v OFS=, 'BEGIN {f="allFiles.csv"}
FNR==1 {c++; print $0,"#ID" > f; next}
{print $0,c > f}' File*.csv
awk -F, -v OFS=, '
FNR == 1 {
$(NF + 1) = "#ID"
i++
f = FILENAME
sub(/Input/, "Output", f)
} FNR != 1 {
$(NF + 1) = i
} {
print > f
}' Input*.csv
With GNU awk for inplace editing and ARGIND:
awk -i inplace -v OFS=, '{print $0, (FNR==1 ? "#ID" : ARGIND)}' File*.csv
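As a cross-check of what these commands do to each file, here is a minimal Python sketch of the per-file transformation. The helper name add_id_column and its parameters are made up for illustration, and it works on a list of lines rather than touching files on disk.

```python
def add_id_column(lines, file_id, header="#ID", sep=","):
    """Append `header` to the first line and `file_id` to every other line."""
    out = []
    for index, line in enumerate(lines):
        line = line.rstrip("\r\n")
        suffix = header if index == 0 else str(file_id)  # header row vs data rows
        out.append(f"{line}{sep}{suffix}")
    return out


if __name__ == "__main__":
    sample = ["#name,#age,#height", "A,12,4.5", "B,13,5.0"]
    for row in add_id_column(sample, 1):
        print(row)
```

To apply it to real files, you would call it once per file with an increasing file_id, which is the role the counter c or ARGIND plays in the answers above.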

Loop through specific range using AWK

I have a csv file with multiple sections in a sheet.
I want to loop through a specific range only. Say loop through row 10 to 100.
My below code so far loops through the whole sheet.
awk -v val1='Batch File Name' -F ',' '{for (i=1; i<=NF; i++) if ($i==val1) {print i} }' "$FILES"
To match only some rows, you can use the range pattern construct (begin pattern, end pattern):
NR==10,NR==100 { action }
This might be faster if your file is long:
awk -F, -v val1='Batch File Name' 'NR>100{exit}
NR>=10{for(i=1;i<=NF;i++) if($i==val1) print i}' files
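The early-exit idea can also be sketched in Python. This is only an illustration: the name find_columns and its parameters are hypothetical, and like -F, in the awk command it splits on every comma, so it does not handle quoted fields that contain commas.

```python
def find_columns(lines, wanted, first=10, last=100):
    """Return the 1-based column numbers of cells equal to `wanted` in rows first..last."""
    hits = []
    for row_number, line in enumerate(lines, start=1):
        if row_number > last:
            break      # same idea as NR>100{exit}: stop reading early
        if row_number < first:
            continue   # rows before the range are skipped
        fields = line.rstrip("\r\n").split(",")
        for col_number, cell in enumerate(fields, start=1):
            if cell == wanted:
                hits.append(col_number)
    return hits


if __name__ == "__main__":
    demo = ["a,b"] * 9 + ["x,Batch File Name"] + ["y,z"] * 95
    print(find_columns(demo, "Batch File Name"))
```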

Comparing split strings inside fields of two CSV files

I have a CSV file (file1) that looks something like this:
123,info,ONE NAME
124,info,ONE VARIATION
125,info,NAME ANOTHER
126,info,SOME TITLE
and another CSV file (file2) that looks like this:
1,info,NAME FIRST
2,info,TWO VARIATION
3,info,NAME SECOND
4,info,ANOTHER TITLE
My desired output would be:
1,123,NAME FIRST,ONE NAME
3,125,NAME SECOND,NAME ANOTHER
Where if the first word in comma delimited field 3 (ie: NAME in line 1) of file2 is equal to any of the words in field 3 of file1, print a line with format:
field1(file2),field1(file1),field3(file2),field3(file1)
Each file has the same number of lines and matches are only made when each has the same line number.
I know I can split fields and get the first word in field3 in Awk like this:
awk -F"," '{split($3,a," "); print a[1]}' file
But since I'm only moderately competent in Awk, I'm at a loss for how to approach a job where there are two files compared using splits.
I could do it in Python like this:
with open('file1', 'r') as f1, open('file2', 'r') as f2:
    l1 = f1.readlines()
    l2 = f2.readlines()
for i in range(len(l1)):
    line_1 = l1[i].split(',')
    line_2 = l2[i].split(',')
    field_3_1 = line_1[2].split()
    field_3_2 = line_2[2].split()
    if field_3_2[0] in field_3_1:
        one = ' '.join(field_3_1)
        two = ' '.join(field_3_2)
        print(','.join((line_2[0], line_1[0], two, one)))
But I'd like to know how a job like this would be done in Awk as occasionally I use shells where only Awk is available.
This seems like a strange task to need to do, and my example I think can be a bit confusing, but I need to perform this to check for broken/ill-formatted data in one of the files.
awk -F, -vOFS=, '
{
num1 = $1
name1 = $3
split(name1, words1, " ")
getline <"file2"
split($3, words2, " ")
for (i in words1)
if (words2[1] == words1[i]) {
print $1, num1, $3, name1
break
}
}
' file1
Output:
1,123,NAME FIRST,ONE NAME
3,125,NAME SECOND,NAME ANOTHER
You can try something along these lines, although the following prints only one match for each line in the second file:
awk -F, 'FNR==NR {
count= split($3, words, " ");
for (i=1; i <= count; i++) {
field1hash[words[i]]=$1;
field3hash[$1]=$3;
}
next;
}
{
split($3,words," ");
if (field1hash[words[1]]) {
ff1 = field1hash[words[1]];
print $1","ff1","$3","field3hash[ff1]
}
}' file1 file2
I like @ooga's answer better than this:
awk -F, -v OFS=, '
NR==FNR {
split($NF, a, " ")
data[NR,"word"] = a[1]
data[NR,"id"] = $1
data[NR,"value"] = $NF
next
}
{
n = split($NF, a, " ")
for (i=1; i<=n; i++)
if (a[i] == data[FNR,"word"])
print data[FNR,"id"], $1, data[FNR,"value"], $NF
}
' file2 file1

awk process own field value

I have a CSV file formatted like this:
Postcode,Count,Total
L1 3RT,20,345.65
I am summing the counts and totals by Postcode using awk, however I'd like to do this for the first portion of a postcode (ie L1, thus combining the values for L1 3RT and L2 4XW). Sample data and existing awk command shown below.
CM1 4QR,979,32950.8
CM1 4QS,2,145.14
CM13 1DL,115,3771
AWK line
awk 'BEGIN { FS = "," } ; {sums[$1] += $2; totals[$1] += $3} END { for (i in sums) printf("%s,%s,%i\n", i, sums[i],totals[i])}' coach.csv
I would like the output to be
CM1,981,33095.94
CM13,115,3771
The following works:
awk -F'[ ,]' '
{
sums[$1] += $3;
totals[$1] += $4;
}
END {
for (i in sums)
printf("%s,%d,%.2f\n", i, sums[i], totals[i]);
}' coach.csv
It uses two delimiters, the comma and space. It works for your sample input, but won't for more complex input that has spaces elsewhere.
You can use multiple delimiters in awk. Please try this:
awk -F'[, ]' '{sums[$1] += $3; totals[$1] += $4} END {for (i in sums) printf("%s,%d,%.2f\n", i, sums[i], totals[i])}' coach.csv
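For reference, the grouping step can be sketched in Python as well. The function name sum_by_outward_code is made up for illustration, and the sketch assumes the header line has already been removed and that every line has exactly three comma-separated fields.

```python
from collections import defaultdict


def sum_by_outward_code(lines):
    """Sum Count and Total per first part of the postcode (e.g. 'CM1' for 'CM1 4QR')."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for line in lines:
        postcode, count, total = line.rstrip("\r\n").split(",")
        area = postcode.split()[0]  # keep only the part before the space
        counts[area] += int(count)
        totals[area] += float(total)
    # round the totals to 2 decimals, like %.2f in the awk answers
    return {area: (counts[area], round(totals[area], 2)) for area in counts}


if __name__ == "__main__":
    sample = ["CM1 4QR,979,32950.8", "CM1 4QS,2,145.14", "CM13 1DL,115,3771"]
    for area, (count, total) in sum_by_outward_code(sample).items():
        print(f"{area},{count},{total:.2f}")
```

Splitting the postcode separately from the comma split avoids the problem mentioned above, where a space elsewhere in the line would shift the field numbers.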