I have a lot of txt files like this:
Title 1
Text 1 (more than one line)
And I would like to make one CSV file from all of them, so that it looks like this:
Title 1,Text 1
Title 2,Text 2
Title 3,Text 3
etc
How could I do it? I think awk is a good fit for this, but I don't know how to go about it.
May I suggest:
paste -d, file1 file2 file3
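For example, given two two-line files, paste joins corresponding lines across the files, so each file becomes a column rather than a row:
$ printf 'Title 1\nText 1\n' > file1
$ printf 'Title 2\nText 2\n' > file2
$ paste -d, file1 file2
Title 1,Title 2
Text 1,Text 2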
To handle large numbers of files, max 40 per output file (untested, but close):
echo files... | xargs -n40 > tempfile
num=1
while read -r line
do
    paste -d, $line > outfile.$num
    let num=num+1
done < tempfile
This is approximately what you posted with some improvements.
for text in *
do
    awk 'BEGIN {q="\""; print q}
    NR==1 {
        gsub(" "," ")         # why?
        gsub("Title: *","")
        print
    }
    NR>1 {
        gsub(" "," ")         # why?
        gsub("Content: *","")
        gsub(q,q q)
        print
    }
    END {print q}' "$text" >> ../final
done
Edit:
If you have a bunch of files that consist of only two lines, try this:
sed 'N;s/\n/,/' file*.txt
If the files contain more than two lines each, it will put each pair of lines on the same line, separated by a comma.
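A quick check with a two-line file:
$ printf 'Title 1\nText 1\n' > file1.txt
$ sed 'N;s/\n/,/' file1.txt
Title 1,Text 1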
Given 3 files containing the following data:
file1.txt
Heading 1
Text 1
Text 2
file2.txt
Heading 2
Text 1
file3.txt
Heading 3
Text 1
text 2
Text 3
The expected results are:
Heading 1,Text 1,Text 2
Heading 2,Text 1
Heading 3,Text 1,text 2,Text 3
This is accomplished using the program createcsv.awk below invoked as
gawk -f createcsv.awk file1.txt file2.txt file3.txt
createcsv.awk
{
    if (1 == FNR) {
        # First line of a new file: flush the previous file's csvline.
        # It is empty for the very first file and for empty files, which we can ignore.
        if (csvline != "") {
            print csvline;
        }
        csvline = "";
        delimiter = "";
    }
    csvline = csvline delimiter $0;
    if ("" == delimiter) { delimiter = "," }
}
END {
    print csvline;
}
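If you prefer a one-liner, here is a sketch with the same logic as createcsv.awk: print a newline whenever a new file starts, and prefix every line after the first within a file with a comma:
gawk 'FNR==1 && NR>1 { print "" } { printf "%s%s", (FNR>1 ? "," : ""), $0 } END { print "" }' file1.txt file2.txt file3.txt
With the three sample files above this produces the same three expected lines.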
I currently have the following script:
awk -F, 'NR==FNR { a[$1 FS $4]=$0; next } $1 FS $4 in a { printf a[$1 FS $4]; sub($1 FS $4,""); print }' file1.csv file2.csv > combined.csv
This compares columns 1 and 4 from both CSV files and outputs the matching lines from both files to combined.csv. Is it possible to output the lines from file 1 and file 2 that don't match to other files with the same awk line, or would I need to do separate parses?
File1
ResourceName,ResourceType,PatternType,User,Host,Operation,PermissionType
BIG.TestTopic,Cluster,LITERAL,Bigboy,*,Create,Allow
BIG.PRETopic,Cluster,LITERAL,Smallboy,*,Create,Allow
BIG.DEVtopic,Cluster,LITERAL,Oldboy,*,DescribeConfigs,Allow
File2
topic,groupName,Name,User,email,team,contact,teamemail,date,clienttype
BIG.TestTopic,BIG.ConsumerGroup,Bobby,Bigboy,bobby#example.com,team 1,Bobby,boys#example.com,2021-11-26T10:10:17Z,Consumer
BIG.DEVtopic,BIG.ConsumerGroup,Bobby,Oldboy,bobby#example.com,team 1,Bobby,boys#example.com,2021-11-26T10:10:17Z,Consumer
BIG.TestTopic,BIG.ConsumerGroup,Susan,Younglady,younglady#example.com,team 1,Susan,girls#example.com,2021-11-26T10:10:17Z,Producer
combined
BIG.TestTopic,Cluster,LITERAL,Bigboy,*,Create,Allow,BIG.TestTopic,BIG.ConsumerGroup,Bobby,Bigboy,bobby#example.com,team 1,Bobby,boys#example.com,2021-11-26T10:10:17Z,Consumer
BIG.DEVtopic,Cluster,LITERAL,Oldboy,*,DescribeConfigs,Allow,BIG.DEVtopic,BIG.ConsumerGroup,Bobby,Oldboy,bobby#example.com,team 1,Bobby,boys#example.com,2021-11-26T10:10:17Z,Consumer
Wanted additional files:
non matched file1:
BIG.PRETopic,Cluster,LITERAL,Smallboy,*,Create,Allow
non matched file2:
BIG.TestTopic,BIG.ConsumerGroup,Susan,Younglady,younglady#example.com,team 1,Susan,girls#example.com,2021-11-26T10:10:17Z,Producer
Again, I might be trying to do too much in one line? Would it be wiser to run another parse?
Assuming the key pairs of $1 and $4 are unique within each input file then using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 { next }
{ key = $1 FS $4 }
NR==FNR {
    file1[key] = $0
    next
}
key in file1 {
    print file1[key], $0 > "out_combined"
    delete file1[key]
    next
}
{
    print > "out_file2_only"
}
END {
    for (key in file1) {
        print file1[key] > "out_file1_only"
    }
}
$ awk -f tst.awk file{1,2}
$ head out_*
==> out_combined <==
BIG.TestTopic,Cluster,LITERAL,Bigboy,*,Create,Allow,BIG.TestTopic,BIG.ConsumerGroup,Bobby,Bigboy,bobby#example.com,team 1,Bobby,boys#example.com,2021-11-26T10:10:17Z,Consumer
BIG.DEVtopic,Cluster,LITERAL,Oldboy,*,DescribeConfigs,Allow,BIG.DEVtopic,BIG.ConsumerGroup,Bobby,Oldboy,bobby#example.com,team 1,Bobby,boys#example.com,2021-11-26T10:10:17Z,Consumer
==> out_file1_only <==
BIG.PRETopic,Cluster,LITERAL,Smallboy,*,Create,Allow
==> out_file2_only <==
BIG.TestTopic,BIG.ConsumerGroup,Susan,Younglady,younglady#example.com,team 1,Susan,girls#example.com,2021-11-26T10:10:17Z,Producer
The order of lines in out_file1_only will be shuffled by the in operator; if that's a problem, let us know, as it's an easy tweak to retain the input order.
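For the record, a minimal sketch of that tweak (the array names order and numKeys are my own): remember file1's keys in insertion order and replay them in END; the rest of tst.awk is unchanged.
NR==FNR {
    file1[key] = $0
    order[++numKeys] = key    # remember the input order of file1's keys
    next
}
END {
    for (i=1; i<=numKeys; i++) {
        if (order[i] in file1) {          # skip keys already matched and deleted
            print file1[order[i]] > "out_file1_only"
        }
    }
}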
Say I have a file of 101 lines called file100.csv, with the first row being the header. I want to split that file into 10 files of 1+10 lines named N.file100.csv, where N = 1-10, with the header added to all 10 files as the first line.
So far, I can isolate the header and split the files no problem:
head -n 1 file100.csv > tmpHeader
tail -n +2 file100.csv | awk '{filename = int((NR-1)/10)+1 ".file100.csv"; print >> filename}' -
What I'm having trouble with is prepending that header as the first row of each of the 10 resulting files.
awk 'NR==1 {a=$0; next} (NR-2)%10==0 {filename = int((NR-2)/10)+1 ".file100.csv"; print a >> filename} {print >> filename}' file100.csv
Explanation:
NR==1 {a=$0; next} reads the first line of the file and stores the header in the variable a.
(NR-2)%10==0 {filename = int((NR-2)/10)+1 ".file100.csv"; print a >> filename} generates the filename; it's the same as in your command. We only have to write the header when the filename changes, which is every 10th line, allowing for the offset introduced by the header.
{print >> filename} prints every line to the current file.
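To check that every chunk got the header, something like
head -n 1 *.file100.csv
should show the header as the first line of each of the 10 files (head prints a ==> filename <== banner before each).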
AWK can match not only on a regexp but on any boolean expression. Match the current line number NR to pick out the first line and the rest of the lines: NR == 1 { head = $0 }.
On every x-th line, generate a new file name and print the header into that file. Every 5th line in my case: NR % 5 == 2 { filename = int((NR-1)/5)+1 ".file100.csv"; print head > filename }
Print the rest of the lines into the current file: NR != 1 { print >> filename }
cat file100.csv | awk 'NR == 1 { head = $0 } NR % 5 == 2 { filename = int((NR-1)/5)+1 ".file100.csv"; print head > filename } NR != 1 { print >> filename }'
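A quick toy run (the header h and the ten data lines fabricated with seq are made up for illustration):
$ printf 'h\n%s\n' "$(seq 10)" > file100.csv
$ awk 'NR == 1 { head = $0 } NR % 5 == 2 { filename = int((NR-1)/5)+1 ".file100.csv"; print head > filename } NR != 1 { print >> filename }' file100.csv
$ cat 1.file100.csv
h
1
2
3
4
5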
I have a CSV file (file1) that looks something like this:
123,info,ONE NAME
124,info,ONE VARIATION
125,info,NAME ANOTHER
126,info,SOME TITLE
and another CSV file (file2) that looks like this:
1,info,NAME FIRST
2,info,TWO VARIATION
3,info,NAME SECOND
4,info,ANOTHER TITLE
My desired output would be:
1,123,NAME FIRST,ONE NAME
3,125,NAME SECOND,NAME ANOTHER
Where, if the first word in comma-delimited field 3 of file2 (i.e. NAME in line 1) is equal to any of the words in field 3 of file1, print a line with the format:
field1(file2),field1(file1),field3(file2),field3(file1)
Each file has the same number of lines and matches are only made when each has the same line number.
I know I can split fields and get the first word in field3 in Awk like this:
awk -F"," '{split($3,a," "); print a[1]}' file
But since I'm only moderately competent in Awk, I'm at a loss for how to approach a job where two files are compared using splits.
I could do it in Python like this:
with open('file1', 'r') as f1, open('file2', 'r') as f2:
    l1 = f1.readlines()
    l2 = f2.readlines()
    for i in range(len(l1)):
        line_1 = l1[i].split(',')
        line_2 = l2[i].split(',')
        field_3_1 = line_1[2].split()
        field_3_2 = line_2[2].split()
        if field_3_2[0] in field_3_1:
            one = ' '.join(field_3_1)
            two = ' '.join(field_3_2)
            print(','.join((line_2[0], line_1[0], two, one)))
But I'd like to know how a job like this would be done in Awk as occasionally I use shells where only Awk is available.
This seems like a strange task to need to do, and I think my example can be a bit confusing, but I need to perform this to check for broken/ill-formatted data in one of the files.
awk -F, -v OFS=, '
{
    num1 = $1
    name1 = $3
    split(name1, words1, " ")
    getline <"file2"
    split($3, words2, " ")
    for (i in words1)
        if (words2[1] == words1[i]) {
            print $1, num1, $3, name1
            break
        }
}
' file1
Output:
1,123,NAME FIRST,ONE NAME
3,125,NAME SECOND,NAME ANOTHER
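One caveat, not specific to this data: the bare getline <"file2" is unchecked, so if file2 were shorter than file1 it would silently leave $0 unchanged and re-compare the last line of file2. Testing its return value, e.g. if ((getline <"file2") <= 0) next, guards against that.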
You can try something along these lines, although the following prints only one match for each line in the second file:
awk -F, 'FNR==NR {
    count = split($3, words, " ");
    for (i=1; i <= count; i++) {
        field1hash[words[i]] = $1;
        field3hash[$1] = $3;
    }
    next;
}
{
    split($3, words, " ");
    if (field1hash[words[1]]) {
        ff1 = field1hash[words[1]];
        print $1 "," ff1 "," $3 "," field3hash[ff1]
    }
}' file1 file2
I like @ooga's answer better than this:
awk -F, -v OFS=, '
NR==FNR {
    split($NF, a, " ")
    data[NR,"word"] = a[1]
    data[NR,"id"] = $1
    data[NR,"value"] = $NF
    next
}
{
    n = split($NF, a, " ")
    for (i=1; i<=n; i++)
        if (a[i] == data[FNR,"word"])
            print data[FNR,"id"], $1, data[FNR,"value"], $NF
}
' file2 file1
I have this awk script to write a text file into a specific cell of a .csv file, but I am trying to have the text displayed vertically, not horizontally.
nawk -v r=2 -v c=3 '
BEGIN { FS=OFS="," }
FNR == NR {
    val = sprintf("%s%s%s", val, NR > 1 ? " " : "", $0)
    next
}
FNR == r {
    $c = val
}
1' file new_one.csv
Want
the
text
like
this
Can't you do something like:
val = sprintf("%s%s%s", val, NR > 1 ? "\n" : "", $0)
i.e. join the pieces with a newline instead of a space?
Assuming a csv input file like:
a,b,c
d,e,f
g,h,i
k,l,m
and the data you want vertically like:
This file has words horizontally
here's one way to modify your script:
NR==FNR {gsub(" ","\n"); val="\""$0"\""; next}
This is going to replace all the single spaces in $0 with newlines. Then the whole line is assigned to val, wrapped in double quotes per Wikipedia's CSV page.
Running this with the data files I created from the command line (using slightly different syntax than yours for FS/OFS):
awk -F"," -v r=2 -v c=3 'NR==FNR{gsub(" ","\n"); val="\""$0"\""; next} FNR==r {$c=val} 1' OFS="," vert data
a,b,c
d,e,"This
file
has
words
horizontally"
g,h,i
k,l,m
where vert is the name of the vertical data and data is the name of the csv data file. Notice that f at [2,3] has been replaced with the altered input from the vert file.
Be aware that the row/column indexing you've chosen only works if none of the fields in the data file have internal commas, and that awk isn't going to be your best friend for parsing CSV files in general.
I need to extract some numeric values from a given ASCII text file and export them to a specifically formatted CSV file. The input file has an even/odd line pattern:
SCF Done: E(UHF) = -216.432419652 A.U. after 12 cycles
CCSD(T)= -0.21667965032D+03
SCF Done: E(UHF) = -213.594303492 A.U. after 10 cycles
CCSD(T)= -0.21379841974D+03
SCF Done: E(UHF) = -2.86120139864 A.U. after 6 cycles
CCSD(T)= -0.29007031339D+01
and so on
I need the 5th-column value from the odd lines and the 2nd-column value from the even lines. They should be printed to a semicolon-separated CSV file, with 10 values in each row. So the output should look like:
-216.432419652;-0.21667965032D+03;-213.594303492;-0.21379841974D+03;-2.86120139864;-0.29007031339D+01; ...linebreak after 5 pairs of values
I started with awk '{print $5}' and awk '{print $2}', but I was not successful in creating a pattern that acts only on the even or odd lines.
Is there a simple way to do that?
The following script doesn't use a lot of the great power of awk, but will do the job for you and is hopefully understandable:
NR % 2 { printf "%s;", $5 }
NR % 2 == 0 { printf "%s;", $2 }
NR % 10 == 0 { print "" }
END { print "" }
Usage (save the above as script.awk):
awk -f script.awk input.txt
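With the six sample lines above, this prints everything on one line (NR % 10 == 0 never fires before END):
$ awk -f script.awk input.txt
-216.432419652;-0.21667965032D+03;-213.594303492;-0.21379841974D+03;-2.86120139864;-0.29007031339D+01;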
Given a file called data.txt, try:
awk '/SCF/ { printf "%s;", $5 } /CCSD/ { printf "%s;", $2 } NR % 10 == 0 { printf "\n" }' data.txt
Something like this could work:
awk '{x = NF > 3 ? $5 : $2; printf("%s;", x)} NR % 10 == 0 {print ""}' file
The ternary operator checks the number of fields (NF): a line with more than 3 fields is an SCF line, so we take $5; otherwise it is a CCSD(T) line and we take $2. printf appends a ";" to each value, and the modulo test on the built-in line counter NR prints a newline after every 10 input lines, i.e. after 5 pairs of values.
This might work for you:
tr -s ' ' ',' <file | paste -sd',\n' | cut -d, -f5,11 | paste -sd',,,,\n'
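Roughly what each stage does, assuming no leading whitespace on the input lines: tr -s ' ' ',' squeezes every run of blanks into a single comma; the first paste joins each odd/even pair of lines into one comma-separated record; cut keeps field 5 (the SCF value) and field 11 (the CCSD(T) value); the last paste joins every five records into one line. Note it emits commas rather than the requested semicolons; appending | tr ',' ';' would convert them.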