Pasting Text Vertically Into .CSV File

I have this awk script to write a text file into a specific cell of a .csv file, but I am trying to have the text displayed vertically, not horizontally.
nawk -v r=2 -v c=3 '
BEGIN { FS=OFS="," }
FNR == NR {
    val = sprintf("%s%s%s", val, NR > 1 ? " " : "", $0)
    next
}
FNR == r {
    $c = val
}
1' file new_one.csv
Want
the
text
like
this

Can't you do something like the following, accumulating with a newline instead of a space?
val = sprintf("%s%s%s", val, NR > 1 ? "\n" : "", $0)
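Note that a field containing literal newlines must also be double-quoted to remain a single CSV cell. A minimal sketch combining that accumulator with the quoting idea from the answer below (it assumes the pasted text itself contains no double quotes):
nawk -v r=2 -v c=3 '
BEGIN { FS=OFS="," }
FNR == NR {
    # accumulate the text file with embedded newlines instead of spaces
    val = sprintf("%s%s%s", val, NR > 1 ? "\n" : "", $0)
    next
}
FNR == r {
    # quote the field so the newlines stay inside one CSV cell
    $c = "\"" val "\""
}
1' file new_one.csv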

Assuming a csv input file like:
a,b,c
d,e,f
g,h,i
k,l,m
and the data you want vertically like:
This file has words horizontally
Here's one way to modify your script:
NR==FNR {gsub(" ","\n"); val="\""$0"\""; next}
This replaces every space in $0 with a newline. Then the whole line is assigned to val, wrapped in double quotes per Wikipedia's CSV page (a field containing line breaks must be quoted).
Running this with the data files I created, from the command line (using a slightly different syntax than yours for FS/OFS):
awk -F"," -v r=2 -v c=3 'NR==FNR{gsub(" ","\n"); val="\""$0"\""; next} FNR==r {$c=val} 1' OFS="," vert data
a,b,c
d,e,"This
file
has
words
horizontally"
g,h,i
k,l,m
where vert is the name of the vertical data and data is the name of the csv data file. Notice that f at [2,3] has been replaced with the altered input from the vert file.
Be aware that the row/column indexing you've chosen only works if none of the fields in the data file contain embedded commas, and that awk isn't going to be your best friend for parsing CSV files in general.
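For example, given a record with a quoted, embedded comma (hypothetical data), plain FS="," miscounts the fields:
echo 'a,"b,c",d' | awk -F, '{print NF, $3}'
4 c"
awk reports four fields, and $3 is c" rather than the second column.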


transform multiline text into csv with awk sed and grep

I run a shell command that returns a list of repeated values like this (note the indentation):
Name: vm346
    cpu 1 (12%) 6150m (76%)
    memory 1130Mi (7%) 1130Mi (7%)
Name: vm847
    cpu 6 (75%) 30150m (376%)
    memory 12980Mi (87%) 12980Mi (87%)
Name: vm848
    cpu 3500m (43%) 17150m (214%)
    memory 6216Mi (41%) 6216Mi (41%)
I am trying to transform that data like this (in csv):
vm346,1,(12%),6150m,(76%),1130Mi,(7%),1130Mi,(7%)
vm847,6,(75%),30150m,(376%),12980Mi,(87%),12980Mi,(87%)
vm848,3500m,(43%),17150m,(214%),6216Mi,(41%),6216Mi,(41%)
The problem is that any given dataset like the one above is always on more than one line.
When I pipe that into awk it drives me mad, because even if I use:
BEGIN{ FS="\n" }
to try and stitch the data together into one line, it doesn't work. No matter what I do, awk keeps the name value as a separate line above everything else.
I am sorry I haven't much code to share but I have been spinning my wheels with this for a few hours now and I am running out of ideas...
I can solve this in Perl:
perl -ane 'print join ",", @F[1 .. $#F]; print $F[0] eq "memory" ? "\n" : ","'
It should be easy to translate it to awk if you need it.
How does it work?
-a splits each line on whitespace into the @F array
-n reads the input line by line and runs the code specified after -e for each line
We print all the elements but the first one separated by commas (see join)
We then look at the first column, if it's memory, we are at the last line of the block, so we print a newline, otherwise we print a comma
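A fairly direct awk translation of that Perl one-liner might look like this (a sketch; field numbering per the data above):
awk '{
    # print fields 2..NF comma-separated; after the last field, emit a
    # newline when the first field is "memory" (end of block), else a comma
    for (i=2; i<=NF; i++)
        printf "%s%s", $i, (i==NF ? ($1=="memory" ? ORS : ",") : ",")
}' file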
With AWK, one option is to set RS to "Name: ", and ignore the first record with NR > 1, e.g.
awk -v RS="Name: " 'BEGIN{OFS=","} NR > 1 {print $1, $3, $4, $5, $6, $8, $9, $10, $11}' file
#> vm346,1,(12%),6150m,(76%),1130Mi,(7%),1130Mi,(7%)
#> vm847,6,(75%),30150m,(376%),12980Mi,(87%),12980Mi,(87%)
#> vm848,3500m,(43%),17150m,(214%),6216Mi,(41%),6216Mi,(41%)
awk '{$1=""}1' | paste -sd'  \n' - | awk '{$1=$1}1' OFS=,
Get rid of the first column, join every three rows (the paste delimiter list is two spaces followed by a newline), then squeeze the whitespace into commas. Same idea with sed:
sed 's/^ *[^ ]* *//' | paste -sd'  \n' - | sed 's/  */,/g'
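To see what's happening, here is the first block at each stage of the awk pipeline (a sketch of the intermediate output; exact spacing doesn't matter since the last stage squeezes all whitespace). After awk '{$1=""}1':
 vm346
 1 (12%) 6150m (76%)
 1130Mi (7%) 1130Mi (7%)
after paste:
 vm346   1 (12%) 6150m (76%)   1130Mi (7%) 1130Mi (7%)
and after awk '{$1=$1}1' OFS=,:
vm346,1,(12%),6150m,(76%),1130Mi,(7%),1130Mi,(7%)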
Something else:
awk -v OFS=, '
$1=="Name:" {
    sep = ors
    ors = ORS
}
{
    for (i=2; i<=NF; ++i) {
        printf "%s%s", sep, $i
        sep = OFS
    }
}
END { printf "%s", ors }'
Or, if you want to print an ORS based on the first field being "memory" (note that this program may end without printing a terminating ORS):
awk -v OFS=, '{for (i=2; i<=NF; ++i) printf "%s%s", $i, (i==NF && $1=="memory" ? ORS : OFS)}'
Something else else:
awk -v OFS=, '
index($0,$1)==1 {
    OFS = ors
    ors = ORS
}
{
    $1 = ""
    printf "%s", $0
    OFS = ofs
}
END { printf "%s", ors }
BEGIN { ofs = OFS }'
This might work for you (GNU sed):
sed -nE '/^ +\S+ +/{s///;H;$!d};x;/./s/\s+/,/gp;x;s/^\S+ +//;h' file
In overview, the sed program handles three cases: indented lines, already-gathered lines (except when the current line is the first line of the file), and non-indented lines.
Turn off implicit printing and enable extended regexps (-nE).
If the current line is indented, remove the indent, the first field and any following spaces, append the result to the hold space, and, if it is not the last line, delete it.
Otherwise, check the hold space for gathered lines and, if found, replace each run of whitespace by a comma and print the result. Then prep the current line by removing the first field and any following spaces, and replace the hold space with the result.
The solution seems logically back-to-front, but programming in this style avoids checking for end-of-file multiple times and avoids labels and gotos.
N.B. This solution will work for any number of indented lines.
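Running it on the sample input produces the expected CSV:
sed -nE '/^ +\S+ +/{s///;H;$!d};x;/./s/\s+/,/gp;x;s/^\S+ +//;h' file
vm346,1,(12%),6150m,(76%),1130Mi,(7%),1130Mi,(7%)
vm847,6,(75%),30150m,(376%),12980Mi,(87%),12980Mi,(87%)
vm848,3500m,(43%),17150m,(214%),6216Mi,(41%),6216Mi,(41%)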
Here is a Ruby one-liner to do that:
ruby -e '
s = $<.read
s.scan(/^([^ \t]+:)([\s\S]+?)(?=^\1|\z)/m).    # parse blocks
  map(&:last).                                 # get data part
  # parse and join the data fields:
  map{|block| block.split(/\n[ \t]+[^ \t]+[ \t]+/)}.
  map{|lines| lines.map(&:strip).join(" ").split().join(",")}.
  each{|l| puts "#{l}"}
' file
vm346,1,(12%),6150m,(76%),1130Mi,(7%),1130Mi,(7%)
vm847,6,(75%),30150m,(376%),12980Mi,(87%),12980Mi,(87%)
vm848,3500m,(43%),17150m,(214%),6216Mi,(41%),6216Mi,(41%)
The advantage is that this is not dependent on the number of lines or the number of fields. It is parsing data that is in blocks of the form:
START: ([ \t]+[data_with_no_space])*\n
l1 ([ \t]+[data_with_no_space])*\n
...
START:
...
It works this way:
Parse the blocks with the regex in the scan call above;
Save an array of the data elements;
Join the sub-arrays, then split into data fields;
Join with "," to make a CSV row.

How to send marked groups of text to different columns in CSV in awk?

I have a file marked like this, with "**" placed at the start of a line to indicate a new group of text. This was typed on very old hardware that doesn't have support for spreadsheets:
**1**
This is some text.
This text goes with the text above.
Here is more text in the first group.
**2**
This is some other text, but in a different group.
This text ought to go in the 2nd column of the CSV.
**3**
Here is data that goes in the 3rd column.
I need to send each group of text to a different column in a CSV. As commas are used, I use "#" as the delimiter.
Sample output:
**1**#**2**#**3**
This is some text.#This is some other text, but in a different group.#Here is data that goes in the 3rd column.
This text goes with the text above.#This text ought to go in the 2nd column of the CSV.#
Here is more text in the first group.##
I can use AWK to go from the CSV output back to the original grouped text, e.g.:
awk -F"#" '{ print $1 }' >> file.txt
awk -F"#" '{ print $2 }' >> file.txt
awk -F"#" '{ print $3 }' >> file.txt
Can awk be used to reverse this?
Since there could be yet another group of records:
...
**4**
foo1
foo2
foo3
foo4
foo5
which has more entries than the first group, you need to either make two passes over the data to figure out the maximum number of fields (to get the #s right) or store the data in an array. I chose arrays, using GNU awk and a two-dimensional array:
$ gawk '
/^\*\*/ {              # a new ** marker: start a new column
    r = 1              # reset the row counter
    f++                # advance the field (column) counter
}
{
    a[r++][f] = $0     # store each line as a[row][column]
}
END {
    for (i=1; (i in a); i++)
        for (j=1; j<=f; j++)
            printf "%s%s", a[i][j], (j==f ? ORS : "#")
}' file
Output:
**1**#**2**#**3**
This is some text.#This is some other text, but in a different group.#Here is data that goes in the 3rd column.
This text goes with the text above.#This text ought to go in the 2nd column of the CSV.#
Here is more text in the first group.##
Output with my additional 4th group in the input file:
**1**#**2**#**3**#**4**
This is some text.#This is some other text, but in a different group.#Here is data that goes in the 3rd column.#foo1
This text goes with the text above.#This text ought to go in the 2nd column of the CSV.##foo2
Here is more text in the first group.###foo3
###foo4
###foo5
Same approach as @JamesBrown (so please leave his answer as accepted), but this will work in any awk and IMHO uses slightly clearer variable names and syntax:
$ cat tst.awk
BEGIN { OFS="#" }
/^\*\*/ {
    numCols++
    rowNr = 0
}
{
    vals[++rowNr,numCols] = $0
    numRows = (numRows > rowNr ? numRows : rowNr)
}
END {
    for (rowNr=1; rowNr<=numRows; rowNr++) {
        for (colNr=1; colNr<=numCols; colNr++) {
            printf "%s%s", vals[rowNr,colNr], (colNr < numCols ? OFS : ORS)
        }
    }
}
$ awk -f tst.awk file
**1**#**2**#**3**
This is some text.#This is some other text, but in a different group.#Here is data that goes in the 3rd column.
This text goes with the text above.#This text ought to go in the 2nd column of the CSV.#
Here is more text in the first group.##
Could you please try the following.
awk '
BEGIN {
    OFS = "#"
}
/^\*\*/ {
    flag = 1
    header = (header ? header OFS : "") $0
    if (value) {
        value = value ORS
    }
    next
}
{
    if (flag) {
        ofs = ""
    } else {
        ofs = "#"
    }
    flag = ""
    value = (value ? value ofs : "") $0
}
END {
    print header ORS value "##"
}' Input_file

Comparing split strings inside fields of two CSV files

I have a CSV file (file1) that looks something like this:
123,info,ONE NAME
124,info,ONE VARIATION
125,info,NAME ANOTHER
126,info,SOME TITLE
and another CSV file (file2) that looks like this:
1,info,NAME FIRST
2,info,TWO VARIATION
3,info,NAME SECOND
4,info,ANOTHER TITLE
My desired output would be:
1,123,NAME FIRST,ONE NAME
3,125,NAME SECOND,NAME ANOTHER
Where, if the first word in comma-delimited field 3 of file2 (i.e. NAME in line 1) is equal to any of the words in field 3 of file1, print a line with the format:
field1(file2),field1(file1),field3(file2),field3(file1)
Each file has the same number of lines and matches are only made when each has the same line number.
I know I can split fields and get the first word in field3 in Awk like this:
awk -F"," '{split($3,a," "); print a[1]}' file
But since I'm only moderately competent in Awk, I'm at a loss for how to approach a job where there are two files compared using splits.
I could do it in Python like this:
with open('file1', 'r') as f1, open('file2', 'r') as f2:
    l1 = f1.readlines()
    l2 = f2.readlines()
    for i in range(len(l1)):
        line_1 = l1[i].split(',')
        line_2 = l2[i].split(',')
        field_3_1 = line_1[2].split()
        field_3_2 = line_2[2].split()
        if field_3_2[0] in field_3_1:
            one = ' '.join(field_3_1)
            two = ' '.join(field_3_2)
            print(','.join((line_2[0], line_1[0], two, one)))
But I'd like to know how a job like this would be done in Awk as occasionally I use shells where only Awk is available.
This seems like a strange task to need to do, and I think my example can be a bit confusing, but I need to perform this to check for broken/ill-formatted data in one of the files.
awk -F, -v OFS=, '
{
    num1 = $1
    name1 = $3
    split(name1, words1, " ")
    getline < "file2"
    split($3, words2, " ")
    for (i in words1)
        if (words2[1] == words1[i]) {
            print $1, num1, $3, name1
            break
        }
}
' file1
Output:
1,123,NAME FIRST,ONE NAME
3,125,NAME SECOND,NAME ANOTHER
You can try something along these lines, although the following prints only one match for each line in the second file:
awk -F, 'FNR==NR {
    count = split($3, words, " ");
    for (i=1; i <= count; i++) {
        field1hash[words[i]] = $1;
        field3hash[$1] = $3;
    }
    next;
}
{
    split($3, words, " ");
    if (field1hash[words[1]]) {
        ff1 = field1hash[words[1]];
        print $1","ff1","$3","field3hash[ff1]
    }
}' file1 file2
I like @ooga's answer better than this:
awk -F, -v OFS=, '
NR==FNR {
    split($NF, a, " ")
    data[NR,"word"] = a[1]
    data[NR,"id"] = $1
    data[NR,"value"] = $NF
    next
}
{
    n = split($NF, a, " ")
    for (i=1; i<=n; i++)
        if (a[i] == data[FNR,"word"])
            print data[FNR,"id"], $1, data[FNR,"value"], $NF
}
' file2 file1

Remove Rows From CSV Where A Specific Column Matches An Input File

I have a CSV that contains multiple columns and rows [File1.csv].
I have another CSV file (just one column) that lists specific words [File2.csv].
I want to be able to remove rows from File1 if any column matches any of the words listed in File2.
I originally used this:
grep -v -F -f File2.csv File1.csv > File3.csv
This worked, to a certain extent. The issue I ran into was with columns that had more than one word in them (e.g. word1,word2,word3). File2 contained word2, but that row was not deleted.
I tried spreading the words apart to look like this: (word1 , word2 , word3), but the original command still did not work.
How can I remove a row that contains a word from File2 even when the column holds other words as well?
One way using awk.
Content of script.awk:
BEGIN {
    ## Split the line on a double quote surrounded by spaces.
    FS = "[ ]*\"[ ]*"
}

## File with words: save them in a hash.
FNR == NR {
    words[ $2 ] = 1;
    next;
}

## File with multiple columns.
FNR < NR {
    ## Print the line unchanged if the eighth field has no interesting
    ## value or this is the first line of the file (header).
    if ( $8 == "N/A" || FNR == 1 ) {
        print $0
        next
    }

    ## Split the field of interest on commas and traverse it, searching
    ## for a word saved from the first file. Print the line only if none
    ## is found. (split() returns the element count; length() of an array
    ## is not portable awk, hence this form.)
    len = split( $8, array, /[ ]*,[ ]*/ )
    for ( i = 1; i <= len; i++ ) {
        if ( array[ i ] in words ) {
            found = 1
            break
        }
    }
    if ( ! found ) {
        print $0
    }
    found = 0
}
Assuming File1.csv and File2.csv have the content provided in the comments of Thor's answer (I suggest adding that information to the question), run the script like:
awk -f script.awk File2.csv File1.csv
With following output:
"DNSName","IP","OS","CVE","Name","Risk"
"ex.example.com","1.2.3.4","Linux","N/A","HTTP 1.1 Protocol Detected","Information"
"ex.example.com","1.2.3.4","Linux","CVE-2011-3048","LibPNG Memory Corruption Vulnerability (20120329) - RHEL5","High"
"ex.example.com","1.2.3.4","Linux","CVE-2012-2141","Net-SNMP Denial of Service (Zero-Day) - RHEL5","Medium"
"ex.example.com","1.2.3.4","Linux","N/A","Web Application index.php?s=-badrow Detected","High"
"ex.example.com","1.2.3.4","Linux","CVE-1999-0662","Apache HTTPD Server Version Out Of Date","High"
"ex.example.com","1.2.3.4","Linux","CVE-1999-0662","PHP Unsupported Version Detected","High"
"ex.example.com","1.2.3.4","Linux","N/A","HBSS Common Management Agent - UNIX/Linux","High"
You could split lines in File2.csv that contain multiple patterns into separate lines.
Below, tr converts lines containing word1,word2 into separate lines before using them as patterns. The <(...) construct acts as a temporary file/FIFO (tested in bash):
grep -v -F -f <(tr ',' '\n' < File2.csv) File1.csv > File3.csv
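For example, a File2.csv line such as word1,word2,word3 (the hypothetical multi-word column from the question) is expanded into three separate patterns before grep sees them:
$ tr ',' '\n' <<< 'word1,word2,word3'
word1
word2
word3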

Making csv from txt files

I have a lot of txt files like this:
Title 1
Text 1 (more than 1 line)
And I would like to make one csv file from all of them that it will look like this:
Title 1,Text 1
Title 2,Text 2
Title 3,Text 3
etc
How could I do it? I think awk is good for this, but I don't know how to go about it.
May I suggest:
paste -d, file1 file2 file3
To handle large numbers of files, at most 40 per output file, something like this may work (untested, but close):
echo files... | xargs -n40 > tempfile
num=1
while read -r line
do
    paste -d, $line > outfile.$num
    let num=num+1
done < tempfile
This is approximately what you posted with some improvements.
for text in *
do
    awk 'BEGIN {q="\""; print q}
    NR==1 {
        gsub(" "," ") # why?
        gsub("Title: *","")
        print
    }
    NR>1 {
        gsub(" "," ") # why?
        gsub("Content: *","")
        gsub(q, q q)
        print
    }
    END {print q}' "$text" >> ../final
done
Edit:
If you have a bunch of files that consist of only two lines, try this:
sed 'N;s/\n/,/' file*.txt
If the files contain more than two lines each then it will put each pair of lines on the same line separated by a comma.
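If each file should simply become one CSV row no matter how many lines it has, paste in serial mode may be all you need (a sketch; note it does no quoting or escaping of commas inside the text):
paste -sd, file1.txt file2.txt file3.txt
This prints one comma-joined line per input file, in argument order.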
Given 3 files containing the following data:
file1.txt
Heading 1
Text 1
Text 2
file2.txt
Heading 2
Text 1
file3.txt
Heading 3
Text 1
text 2
Text 3
The expected results are:
Heading 1,Text 1,Text 2
Heading 2,Text 1
Heading 3,Text 1,text 2,Text 3
This is accomplished using the program createcsv.awk below invoked as
gawk -f createcsv.awk file1.txt file2.txt file3.txt
createcsv.awk
{
    if (1 == FNR) {
        # First line of a new file: flush the previous file's line.
        # Nothing prints for the very first file, and empty files
        # contribute nothing.
        if (csvline != "") {
            print csvline;
        }
        csvline = "";
        delimiter = "";
    }
    csvline = csvline delimiter $0;
    if ("" == delimiter) { delimiter = "," }
}
END {
    print csvline;
}