I have several CSV files (all have the same number of rows and columns). Each file follows this format:
1 100.23 1 102.03 1 87.65
2 300.56 2 131.43 2 291.32
. . . . . .
. . . . . .
200 213.21 200 121.81 200 500.21
I need to extract columns 2, 4 and 6, and add them to a single CSV file.
I have a loop in my shell script which goes through all the CSV files, extracts the columns, and appends these columns to a single file:
# output header column
awk -F"," 'BEGIN {OFS=","}{ print $1; }' "$input" > "$output"
for f in "$1"*.csv
do
    if [[ -f "$f" ]]  # only process regular files with a .csv extension
    then
        fname=$(basename "$f")
        arr+=("$fname")  # array to store filenames
        paste -d',' "$output" <(awk -F',' '{ print $2","$4","$6; }' "$f") > temp.csv
        mv temp.csv "$output"
    fi
done
Running this produces this output:
1 100.23 102.03 87.65 219.42 451.45 903.1 ... 542.12 321.56 209.2
2 300.56 131.43 291.32 89.57 897.21 234.52 125.21 902.25 254.12
. . . . . . . . . .
. . . . . . . . . .
200 213.23 121.81 500.21 231.56 5023.1 451.09 ... 121.09 234.45 709.1
My desired output is a single CSV file that looks something like this:
1.csv 1.csv 1.csv 2.csv 2.csv 2.csv ... 700.csv 700.csv 700.csv
1 100.23 102.03 87.65 219.42 451.45 903.1 542.12 321.56 209.2
2 300.56 131.43 291.32 89.57 897.21 234.52 125.21 902.25 254.12
. . . . . . . . . .
. . . . . . . . . .
200 213.23 121.81 500.21 231.56 5023.1 451.09 ... 121.09 234.45 709.1
In other words, I need a header row containing the file names in order to identify which files the columns were extracted from. I can't seem to wrap my head around how to do this.
What is the easiest way to achieve this (preferably using awk)?
I was thinking of storing the file names in an array, inserting a header row and then printing the array, but I can't figure out the syntax.
So, based on a few assumptions:
the inputs are called "*.csv" but they're actually whitespace-separated, as they appear.
the odd-numbered input columns just repeat the row number 3 times, and can be ignored
the column headings are just the filenames, repeated 3 times each
they are input to some other program, and the numbers are left-justified anyway, so you aren't particular about the column formatting (columns lining up, decimals aligned, ...)
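# collect the input file names (assumes names without whitespace)
f=$(set -- *.csv; echo $*)
# first line: the file names; remaining lines: all files pasted side by side
(echo $f; paste $f) |
awk 'NR==1 { for (i=1; i<=NF; i++) {x=x" "$i" "$i" "$i} }  # header: each name three times
     NR > 1 { x=$1; for (i=2; i<= NF; i+=2) {x=x" "$i} }   # data: row number, then even columns
     {print x}'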
hth
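For comparison, here is a minimal Python sketch of the same idea (it is not the awk approach above). It keeps columns 2, 4 and 6 of each whitespace-separated input and puts the file name above each extracted column. The function name merge_columns, the in-memory (name, text) pairs and the empty first header cell (so the names line up over the data columns) are assumptions made for illustration, not part of the original script.

```python
def merge_columns(files, keep=(1, 3, 5)):
    """Merge selected columns of several whitespace-separated inputs.

    files: list of (name, text) pairs (hypothetical in-memory stand-ins
    for the real .csv files). keep: zero-based column indexes to extract.
    """
    header = [""]  # assumption: empty cell above the row-number column
    rows = None
    for name, text in files:
        parsed = [line.split() for line in text.splitlines() if line.strip()]
        if rows is None:
            # take the row numbers from the first file only
            rows = [[fields[0]] for fields in parsed]
        header.extend([name] * len(keep))  # one header cell per kept column
        for out, fields in zip(rows, parsed):
            out.extend(fields[i] for i in keep)
    return [header] + (rows or [])


files = [
    ("1.csv", "1 100.23 1 102.03 1 87.65\n2 300.56 2 131.43 2 291.32\n"),
    ("2.csv", "1 219.42 1 451.45 1 903.1\n2 89.57 2 897.21 2 234.52\n"),
]
merged = merge_columns(files)
for row in merged:
    print(",".join(row))
```

With real files, each text value would come from reading the file, and the file list would need an explicit numeric sort if 2.csv should come before 10.csv.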
I need a regular expression to separate the left and right parts of lines that follow this pattern . . . . . :
For example:
Media State . . . . . . . . . . . : Media disconnected
Connection-specific DNS Suffix . : alumnus.co.in
Description . . . . . . . . . . . : Microsoft ISATAP Adapter
Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
and store them in two variables.
I have written this regular expression:
regexp {([[a-z]*[0-9]*.*[0-9]*[a-z]*]*" "):([[a-z]*[0-9]*.*[0-9]*[a-z]*]*)} 6*rag5hu. . :4ku5-1a543m match a b
But it is not working.
Any help will be appreciated.
I would do this:
set text {Media State . . . . . . . . . . . : Media disconnected
Connection-specific DNS Suffix . : alumnus.co.in
Description . . . . . . . . . . . : Microsoft ISATAP Adapter
Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes}
foreach line [split $text \n] {
    if {[regexp {^(.+?)(?: \.)+ : (.+)$} $line -> name value]} {
        puts "$name => $value"
    }
}
outputs
Media State => Media disconnected
Connection-specific DNS Suffix => alumnus.co.in
Description => Microsoft ISATAP Adapter
Physical Address. => 00-00-00-00-00-00-00-E0
DHCP Enabled. => No
Autoconfiguration Enabled => Yes
This uses a non-greedy quantifier (+?), and in Tcl's regex engine that makes every quantifier in the regex non-greedy. You then need the anchors so that the captured parts contain all the text you want.
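For readers more familiar with other regex engines, here is a small Python sketch of the same pattern. In Python's re module each quantifier keeps its own greediness, so only the first group is lazy here. The helper name split_line is hypothetical, and stripping trailing dots from the name is an addition that the Tcl code above does not make.

```python
import re

# lazy name, then one or more " ." groups, then " : ", then the value
LINE_RE = re.compile(r"^(.+?)(?: \.)+ : (.+)$")


def split_line(line):
    """Return (name, value) for a 'name . . . : value' line, else None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    name, value = match.groups()
    # addition: drop the dot left behind in e.g. "Physical Address."
    return name.rstrip(". "), value


sample = [
    "Media State . . . . . . . . . . . : Media disconnected",
    "Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0",
]
for line in sample:
    print(split_line(line))
```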
Borrowing the definition of text:
package require textutil
foreach line [split $text \n] {
    lassign [::textutil::splitx [string trim $line] {\s*(?:\. )+:\s*}] a b
    puts "a: $a\nb: $b"
}
Giving the output
a: Media State
b: Media disconnected
a: Connection-specific DNS Suffix
b: alumnus.co.in
a: Description
b: Microsoft ISATAP Adapter
a: Physical Address
b: 00-00-00-00-00-00-00-E0
a: DHCP Enabled
b: No
a: Autoconfiguration Enabled
b: Yes
Documentation: foreach, lassign, package, puts, split, string, textutil (package)
I am new to Perl. I use Perl for the back-end script, HTML for the front end, and CGI as the framework. I first read some details from a flat file and display them, then I try to print the details as checkboxes. I am using the Data::Dumper module to check the values. However, the value of the last checkbox has an extra space, as shown below:
print '<form action="process.pl " method="POST" id="sel">';
print '<input type="checkbox" onClick="checkedAll()">Select All<br />';
foreach my $i (@robos) {
print '<input type="checkbox" name="sel" value="';
print $i;
print '">';
print $i;
print '<br />';
}
print '<input type="submit" value="submit">';
print '</form>';
print "response " . Dumper \$data;
However, in Process.pl the selected values are retrieved using
@server = $q->param('sel');
but dumping the selection with
print "response " . Dumper \@server;
shows [ 'ar', 'br', 'cr ' ], with an additional space after cr. I don't know what is wrong.
In my flat file I have
RMCList:ar:br:cr
I then read the details into an array and split them using
foreach my $ln (@lines)
{
    if ($ln =~ /^RMCList/)
    {
        @robos = split (/:/,$ln);
    }
} #### end of for statement
shift (@robos);
print "The robos are ", (join ',', map { '"' . $_ . '"' } @robos), "\n";
This shows that the robos are:
"ar","br","cr
"
While reading the file into @lines, you forgot to remove the newline. This newline is interpreted as a simple space in an HTML document.
Remove the newlines with the chomp function, like
chomp @lines;
before looping over the lines.
Actually, don't read the file into an array at all unless you have to access each line more than once. Otherwise, read one line at a time:
open my $infile, "<", $filename or die "Cannot open $filename: $!";
my @robos;
while (my $ln = <$infile>) {
    chomp $ln;
    @robos = split /:/, $ln if $ln =~ /^RMCList/;
}
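The same pitfall appears in any language that keeps the line terminator when reading lines. A small Python sketch, using a hypothetical helper parse_robos and made-up sample lines, shows the difference that stripping the newline makes before splitting:

```python
def parse_robos(lines, strip_newline=True):
    """Return the fields after the RMCList tag from the last matching line."""
    robos = []
    for ln in lines:
        if strip_newline:
            ln = ln.rstrip("\n")  # rough equivalent of Perl's chomp
        if ln.startswith("RMCList"):
            robos = ln.split(":")[1:]  # drop the leading RMCList tag
    return robos


# made-up file contents; each line still carries its trailing newline
sample = ["Other:x:y\n", "RMCList:ar:br:cr\n"]
with_newline = parse_robos(sample, strip_newline=False)
cleaned = parse_robos(sample)
print(with_newline)
print(cleaned)
```

Without stripping, the last field still ends in a newline, which is what later shows up as the stray space in the checkbox value.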
(Link to the picture, just in case: http://d.pr/i/SCks)
In the three browsers above, the Arabic text is displayed correctly.
In Chrome (v15), below, it is somewhat jumbled.
What seems to be the problem?
PS: I'm using FusionCharts.
EDIT:
That's how I'm writing it:
echo '<set label="' . addslashes($label) . ' - \' + value_' . $i . ' + \'% (' . $label_value . ') " value="' . $data_pie['nb'] . '" />\';' . "\n";
What I am trying to do is get the contents of a file from another server. Since I'm not familiar with Perl, nor with its modules and functions, I've gone about it this way:
my $fileContents;
if( $md5Con =~ m/\.php$/g ) {
my $ftp = Net::FTP->new($DB_ftpserver, Debug => 0) or die "Cannot connect to some.host.name: $@";
$ftp->login($DB_ftpuser, $DB_ftppass) or die "Cannot login ", $ftp->message;
$ftp->get("/" . $root . $webpage, "c:/perlscripts/" . md5_hex($md5Con) . "-code.php") or die $ftp->message;
open FILE, ">>c:/perlscripts/" . md5_hex($md5Con) . "-code.php" or die $!;
$fileContents = <FILE>;
close(FILE);
unlink("c:/perlscripts/" . md5_hex($md5Con) . "-code.php");
$ftp->quit;
}
What I thought I'd do is get the file from the server, put it on my local machine, edit the content, upload it to wherever, and then delete the temp file.
But I cannot seem to figure out how to get the contents of the file:
open FILE, ">>c:/perlscripts/" . md5_hex($md5Con) . "-code.php" or die $!;
$fileContents = <FILE>;
close(FILE);
I keep getting the error:
Use of uninitialized value $fileContents
which I'm guessing means it isn't returning a value.
Any help much appreciated.
>>>>>>>>>> EDIT <<<<<<<<<<
my $fileContents;
if( $md5Con =~ m/\.php$/g ) {
my $ftp = Net::FTP->new($DB_ftpserver, Debug => 0) or die "Cannot connect to some.host.name: $@";
$ftp->login($DB_ftpuser, $DB_ftppass) or die "Cannot login ", $ftp->message;
$ftp->get("/" . $root . $webpage, "c:/perlscripts/" . md5_hex($md5Con) . "-code.php") or die $ftp->message;
my $file = "c:/perlscripts/" . md5_hex($md5Con) . "-code.php";
{
local( $/ ); # undefine the record separator
open FILE, "<", $file or die "Cannot open:$!\n";
my $fileContents = <FILE>;
#print $fileContents;
my $bodyContents;
my $headContents;
if( $fileContents =~ m/<\s*body[^>]*>.*$/gi ) {
print $0 . $1 . "\n";
$bodyContents = $dbh->quote($1);
}
if( $fileContents =~ m/^.*<\/head>/gi ) {
print $0 . $1 . "\n";
$headContents = $dbh->quote($1);
}
$bodyTable = $dbh->quote($bodyTable);
$headerTable = $dbh->quote($headerTable);
$dbh->do($createBodyTable) or die " error: Couldn't create body table: " . DBI->errstr;
$dbh->do($createHeadTable) or die " error: Couldn't create header table: " . DBI->errstr;
$dbh->do("INSERT INTO $headerTable ( headData, headDataOutput ) VALUES ( $headContents, $headContents )") or die " error: Couldn't connect to database: " . DBI->errstr;
$dbh->do("INSERT INTO $bodyTable ( bodyData, bodyDataOutput ) VALUES ( $bodyContents, $bodyContents )") or die " error: Couldn't connect to database: " . DBI->errstr;
$dbh->do("INSERT INTO page_names (linkFromRoot, linkTrue, page_name, table_name, navigation, location) VALUES ( $linkFromRoot, $linkTrue, $page_name, $table_name, $navigation, $location )") or die " error: Couldn't connect to database: " . DBI->errstr;
unlink("c:/perlscripts/" . md5_hex($md5Con) . "-code.php");
}
$ftp->quit;
}
With the code above, print WILL print the whole file. BUT, for some reason the two regular expressions are returning false. Any idea why?
if( $fileContents =~ m/<\s*body[^>]*>.*$/gi ) {
print $0 . $1 . "\n";
$bodyContents = $dbh->quote($1);
}
if( $fileContents =~ m/^.*<\/head>/gi ) {
print $0 . $1 . "\n";
$headContents = $dbh->quote($1);
}
This is covered in section 5 of the Perl FAQ included with the standard distribution.
How can I read in an entire file all at once?
You can use Path::Class::File::slurp to do it in one step.
use Path::Class;
$all_of_it = file($filename)->slurp; # entire file in scalar
@all_lines = file($filename)->slurp; # one line per element
The customary Perl approach for processing all the lines in a file is to do so one line at a time:
open (INPUT, $file) || die "can't open $file: $!";
while (<INPUT>) {
chomp;
# do something with $_
}
close(INPUT) || die "can't close $file: $!";
This is tremendously more efficient than reading the entire file into memory as an array of lines and then processing it one element at a time, which is often—if not almost always—the wrong approach. Whenever you see someone do this:
@lines = <INPUT>;
you should think long and hard about why you need everything loaded at once. It's just not a scalable solution. You might also find it more fun to use the standard Tie::File module, or the DB_File module's $DB_RECNO bindings, which allow you to tie an array to a file so that accessing an element of the array actually accesses the corresponding line in the file.
You can read the entire filehandle contents into a scalar.
{
local(*INPUT, $/);
open (INPUT, $file) || die "can't open $file: $!";
$var = <INPUT>;
}
That temporarily undefs your record separator, and will automatically close the file at block exit. If the file is already open, just use this:
$var = do { local $/; <INPUT> };
For ordinary files you can also use the read function.
read( INPUT, $var, -s INPUT );
The third argument tests the byte size of the data on the INPUT filehandle and reads that many bytes into the buffer $var.
Use Path::Class::File::slurp if you want to read all file contents in one go.
However, more importantly, use an HTML parser to parse HTML.
open FILE, "c:/perlscripts/" . md5_hex($md5Con) . "-code.php" or die $!;
while (<FILE>) {
    # each line is in $_
}
close(FILE);
will open the file and allow you to process it line by line (if that's what you want; otherwise investigate binmode). I think the problem is that you prepend >> to the filename passed to open: that opens the file for appending, so nothing can be read from the handle. See this tutorial for more info.
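The effect of the open mode can be reproduced in other languages as well. The following Python sketch (the file name and contents are made up) opens a file for appending and then tries to read from it. Python raises an error where Perl simply returns nothing, but the cause is the same: a handle opened for appending is write-only.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # hypothetical stand-in for the downloaded "-code.php" file
    path = os.path.join(tmp, "page-code.php")
    with open(path, "w") as fh:
        fh.write("<?php echo 'hi'; ?>\n")

    # append mode: the handle can be written to, but not read
    try:
        with open(path, "a") as fh:
            fh.read()
        append_readable = True
    except io.UnsupportedOperation:
        append_readable = False

    # read mode: the contents come back as expected
    with open(path, "r") as fh:
        contents = fh.read()

print(append_readable)
print(contents, end="")
```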
I note you're also using regular expressions to parse HTML. Generally I would recommend using a parser to do this (e.g. see HTML::Parser). Regular expressions aren't suited to HTML due to HTML's lack of regularity, and won't work reliably in general cases.
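As a rough illustration of the parser-based approach, here is a Python sketch using the standard html.parser module rather than Perl's HTML::Parser. The class name SectionText and the sample page are invented for the example, and it only collects the text found inside head and body, not the raw markup.

```python
from html.parser import HTMLParser


class SectionText(HTMLParser):
    """Collect the text that appears inside <head> and inside <body>."""

    def __init__(self):
        super().__init__()
        self.current = None  # section we are currently inside, if any
        self.text = {"head": [], "body": []}

    def handle_starttag(self, tag, attrs):
        if tag in self.text:
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.text[self.current].append(data.strip())


# made-up page with newlines around the tags, as in a typical file
page = (
    "<html>\n<head>\n<title>Demo</title>\n</head>\n"
    '<body class="x">\n<p>Hello</p>\n</body>\n</html>'
)
parser = SectionText()
parser.feed(page)
parser.close()
sections = {name: " ".join(parts) for name, parts in parser.text.items()}
print(sections)
```

Newlines, attributes and tag case do not matter here, which is the main advantage over a hand-written regular expression.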
Also, if you need to edit the contents of the files, take a look at the CPAN module Tie::File.
This module relieves you of the need to create a temp file for editing the content and writing it back to the same file.
EDIT:
What you are looking for is a way to slurp the file. You may have to undefine the record separator variable $/.
The below code works fine for me:
use strict;
my $file = "test.txt";
{
    local( $/ ); # undefine the record separator
    open FILE, "<", $file or die "Cannot open:$!\n";
    my $lines = <FILE>;
    print $lines;
}
Also see the section "Traditional Slurping" in this article.
BUT, for some reason the two regular expressions are returning false. Any idea why?
. in a regular expression by default matches any character except newline. Presumably you have newlines before the </head> tag and after the <body> tag. To make . match any character including newlines, use the //s flag.
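The same behaviour can be seen in Python, where re.DOTALL plays the role of Perl's //s flag. This is a sketch on a made-up page; the capture groups are added for illustration and are not in the patterns from the question.

```python
import re

# made-up page with newlines before </head> and after <body>
page = (
    "<html>\n<head>\n<title>Demo</title>\n</head>\n"
    "<body>\n<p>Hello</p>\n</body>\n</html>\n"
)

body_pattern = r"<\s*body[^>]*>(.*)$"
head_pattern = r"^(.*)</head>"

# without DOTALL, "." stops at the first newline, so the match fails
without_flag = re.search(body_pattern, page, re.IGNORECASE)

# with DOTALL, "." also matches newlines and the rest of the page is captured
with_flag = re.search(body_pattern, page, re.IGNORECASE | re.DOTALL)
head_match = re.search(head_pattern, page, re.IGNORECASE | re.DOTALL)

print(without_flag)
print(repr(with_flag.group(1)))
print(repr(head_match.group(1)))
```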
I'm not sure what your print $0 . $1 ... code is about; you aren't capturing anything in your matches to be stored in $1, and $0 isn't a variable used for regular expression captures, it's something very different.
If you want to get the contents of the file, use
@lines = <FILE>;
Use File::Slurp::Tiny. As convenient as File::Slurp, but without the bugs.