I run a shell command that returns a list of repeated values like this (note the indentation):
Name: vm346
cpu 1 (12%) 6150m (76%)
memory 1130Mi (7%) 1130Mi (7%)
Name: vm847
cpu 6 (75%) 30150m (376%)
memory 12980Mi (87%) 12980Mi (87%)
Name: vm848
cpu 3500m (43%) 17150m (214%)
memory 6216Mi (41%) 6216Mi (41%)
I am trying to transform that data like this (in csv):
vm346,1,(12%),6150m,(76%),1130Mi,(7%),1130Mi,(7%)
vm847,6,(75%),30150m,(376%),12980Mi,(87%),12980Mi,(87%)
vm848,3500m,(43%),17150m,(214%),6216Mi,(41%),6216Mi,(41%)
The problem is that any given dataset like the one above always spans more than one line.
When I pipe that into awk it drives me mad, because even if I use:
BEGIN{ FS="\n" }
to try and stitch the data together into one line, it doesn't work. No matter what I do, awk keeps the name value as a separate line above everything else.
I am sorry I haven't much code to share, but I have been spinning my wheels with this for a few hours now and I am running out of ideas...
I can solve this in Perl:
perl -ane 'print join ",", @F[1 .. $#F]; print $F[0] eq "memory" ? "\n" : ","'
It should be easy to translate it to awk if you need it.
How does it work?
-a splits each line on whitespace into the @F array
-n reads the input line by line and runs the code specified after -e for each line
We print all the elements but the first one separated by commas (see join)
We then look at the first column, if it's memory, we are at the last line of the block, so we print a newline, otherwise we print a comma
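For example, with the sample data saved in a file (data.txt is just an illustrative name), the one-liner produces the desired CSV directly:
perl -ane 'print join ",", @F[1 .. $#F]; print $F[0] eq "memory" ? "\n" : ","' data.txt
vm346,1,(12%),6150m,(76%),1130Mi,(7%),1130Mi,(7%)
vm847,6,(75%),30150m,(376%),12980Mi,(87%),12980Mi,(87%)
vm848,3500m,(43%),17150m,(214%),6216Mi,(41%),6216Mi,(41%)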
With AWK, one option is to set RS to "Name: ", and ignore the first record with NR > 1, e.g.
awk -v RS="Name: " 'BEGIN{OFS=","} NR > 1 {print $1, $3, $4, $5, $6, $8, $9, $10, $11}' file
#> vm346,1,(12%),6150m,(76%),1130Mi,(7%),1130Mi,(7%)
#> vm847,6,(75%),30150m,(376%),12980Mi,(87%),12980Mi,(87%)
#> vm848,3500m,(43%),17150m,(214%),6216Mi,(41%),6216Mi,(41%)
awk '{$1=""}1' | paste -sd'  \n' - | awk '{$1=$1}1' OFS=,
Get rid of the first column. Join every three rows (the paste delimiter list is two spaces then a newline, so every third line ends a record). Same idea with sed:
sed 's/^ *[^ ]* *//' | paste -sd'  \n' - | sed 's/  */,/g'
Something else:
awk '
$1=="Name:" {
    sep = ors
    ors = ORS
}
{
    for (i=2; i<=NF; ++i) {
        printf "%s%s", sep, $i
        sep = OFS
    }
}
END {printf "%s", ors}'
Or if you want to print an ORS based on the first field being "memory" (note that this program may end without printing a terminating ORS):
awk '{for (i=2;i<=NF;++i) printf "%s%s",$i,(i==NF && $1=="memory" ? ORS : OFS)}'
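Both of these rely on awk's OFS (which defaults to a single space), so to get true comma-separated output invoke them with the separator set explicitly, for example awk -v OFS=, '...' file.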
something else else:
awk -v OFS=, '
index($0,$1)==1 {
    OFS = ors
    ors = ORS
}
{
    $1 = ""
    printf "%s", $0
    OFS = ofs
}
END {printf "%s", ors}
BEGIN {ofs = OFS}'
This might work for you (GNU sed):
sed -nE '/^ +\S+ +/{s///;H;$!d};x;/./s/\s+/,/gp;x;s/^\S+ +//;h' file
In overview, the sed program processes indented lines, already-gathered lines (except when the current line is the first line of the file), and non-indented lines.
Turn off implicit printing and enable extended regexps (-nE).
If the current line is indented, remove the indent, the first field and any following spaces, append the result to the hold space and if it is not the last line, delete it.
Otherwise, check the hold space for gathered lines and if found, replace one or more whitespaces by commas and print the result. Then prep the current line by removing the first field and any following spaces and replace the hold space with the result.
The solution seems logically back-to-front, but programming in this style avoids having to check for end-of-file multiple times and invoking labels and gotos.
N.B. This solution will work for any number of indented lines.
Here is a Ruby solution to do that:
ruby -e '
s=$<.read
s.scan(/^([^ \t]+:)([\s\S]+?)(?=^\1|\z)/m). # parse blocks
map(&:last). # get data part
# parse and join the data fields:
map{|block| block.split(/\n[ \t]+[^ \t]+[ \t]+/)}.
map{|lines| lines.map(&:strip).join(" ").split().join(",")}.
each{|l| puts "#{l}"}
' file
vm346,1,(12%),6150m,(76%),1130Mi,(7%),1130Mi,(7%)
vm847,6,(75%),30150m,(376%),12980Mi,(87%),12980Mi,(87%)
vm848,3500m,(43%),17150m,(214%),6216Mi,(41%),6216Mi,(41%)
The advantage is that this is not dependent on the number of lines or the number of fields. It is parsing data that is in blocks of the form:
START: ([ \t]+[data_with_no_space])*\n
l1 ([ \t]+[data_with_no_space])*\n
...
START:
...
Works this way:
Parse the blocks with the scan regex shown above;
Save an array of the data elements;
Join the sub arrays and then split into data fields;
Join(',') to make a csv.
So I have a perl module that uses a bash command to obtain the file(s) with certain "table" names. In my specific case, it is looking for tables with the name "event", but I need this to work with all names too.
Currently, I have the following code in my perl script to obtain MYI files matching the table name, and I am receiving not only event_* but also event_extra_data_* as well. For my example, I only need the 2nd table that exists in my database for event_. As my test data, I currently have
event_1459161160_0
event_1459182760_0
event_extra_data_1459182745_0
event_extra_data_1459182760_0
which are partitioned tables from the tables "event" and "event_extra_data"; the table name is the value that the $table variable holds below.
Anyway, my question is: how do I limit this to receiving only event_1459182760_0.MYI and not event_extra_data_1459182760_0.MYI, which it is currently getting?
elsif ($sql =~ /\{LAST\}/i )
{
    $cmd = 'ls -1 /var/lib/mysql/sfsnort/'.$table.'_*MYI | grep -v template | tail -n1 | cut -d"/" -f6 | cut -d"." -f1';
    $value = `$cmd`;
    print "Search Value: $value\n";
    if ($value eq "")
    {
        $sql = ""; # same as with FIRST
    }
    else
    {
        $sql =~ s/\{LAST\}/$value/g;
    }
}
Don't parse ls - there's no point, and it's prone to causing problems.
I would point out this - the glob function within perl supports a limited set of "regex-like" patterns. (Note - they aren't regexes, so don't get them mixed up.)
foreach my $filename ( glob "event_[0-9]*" ) {
#do something with $filename
}
If you're just after the last - when sorted numerically:
my ( $last ) = reverse sort glob "event_[0-9]*";
Given you have a single path, then you should be able to:
my ( $last ) = reverse sort glob "/var/lib/mysql/sfsnort/event_[0-9]*.MYI";
Note that this works assuming you're working with time() numeric values - it's doing an alphanumeric (string) sort, and on the directory names too.
If that isn't a valid assumption, you'll need a custom sort - which is quite easy, you can feed sort a subroutine to sort by.
Either:
sort { my ($a1) = $a =~ /(\d+)/; my ($b1) = $b =~ /(\d+)/; $b1 <=> $a1 }
To extract the first 'string of digits' from the path. (note - also includes directories).
Or use the -M file test:
sort { -M $a <=> -M $b }
Which will read modification time from the file (technically -M is age in days).
You can remove the reverse if you custom sort, just by swapping $a and $b.
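Putting those pieces together, a sketch of the custom numeric sort applied to the same glob (no reverse needed, since $b1 <=> $a1 already puts the newest first):
my ( $last ) = sort { my ($a1) = $a =~ /(\d+)/; my ($b1) = $b =~ /(\d+)/; $b1 <=> $a1 }
               glob "/var/lib/mysql/sfsnort/event_[0-9]*.MYI";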
Though I think this would be better done all in perl: to answer your specific question about how to get event_* but not event_extra_*, you could of course add that to your grep to filter it out, or you could use a different glob, such as ${table}_[0-9]* (the braces keep perl from interpolating a variable named $table_) if there is always an underscore and then a digit after the table name.
In perl you could do it something like the following though:
opendir( DIR, '/var/lib/mysql/sfsnort/' );
my @files = sort grep { /${table}_\d/ } readdir( DIR );
closedir( DIR );
$files[$#files] =~ /(^[^.]+)/;
my $value = $1;
I want to compare consecutive lines of the same column in a csv file and keep only the lines that respect the following conditions:
1. the first pattern is the same as the one in the previous line, and
2. the absolute difference between the values in the second column equals 1
For example, if I have these lines
aaaa;12
aaaa;13
bbbb;11
bbbb;9
cccc;9
cccc;8
I will keep only
aaaa;12
aaaa;13
cccc;9
cccc;8
The logic would work this way:
If the previous pattern is not equal to this pattern, then remember this pattern and this value as the new "previous", and move on to the next line.
Otherwise, if the difference between the previous value and this value equals 1 or -1 (awk does not have a built-in abs() function), then print the previous pattern and value, and print this line.
Take a stab at translating that into code, and come back when you have questions.
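If you get stuck, a minimal awk sketch of that logic could look like this (p1 and p2 are just names chosen here for the previous pattern and value):
awk -F ";" '$1==p1 && ($2-p2==1 || p2-$2==1) {print p1 FS p2; print} {p1=$1; p2=$2}' file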
Given:
$ echo "$test"
aaaa;12
aaaa;13
bbbb;11
bbbb;9
cccc;9
cccc;8
You can do something like:
$ echo "$test" | awk -F ";" 'function abs(v) {return v < 0 ? -v : v} $1==l1 && abs($2-l2)==1 {print l1 FS l2 RS $0} {l1=$1;l2=$2}'
aaaa;12
aaaa;13
cccc;9
cccc;8
I would like to get the line number using the grep command, but I get an error message when the search pattern is not a single word:
couldn't read file "Pattern": no such file or directory
What is the proper usage of grep here? The code is here:
set status [catch {eval exec grep -n '$textToGrep' $fileName} lineNumber]
if { $status != 0 } {
    #error
} else {
    puts "lineNumber = $lineNumber"
}
Also, if the search pattern is not matched at all, the returned value is: "child process exited abnormally".
Here is the simple test case:
set textToGrep "<BBB name=\"BBBRM\""
file contents:
<?xml version="1.0"?>
<!DOCTYPE AAA>
<AAA>
<BBB name="BBBRM" />
</AAA>
Well, I also get problems with your code even with a single-word pattern!
First of all, I don't think you need the eval command, because catch itself does an evaluation of its first argument.
Then, the problem is that you put the $textToGrep variable in exec inside single quotes ', which have no meaning to Tcl.
Therefore, if the content of textToGrep is foo, you are asking grep to search for the string 'foo'. If that string, including the single quotes, is not found in the file, you get the error.
Try to rewrite your first line with
set status [catch {exec grep -n $textToGrep $fileName} lineNumber]
and see if it works. Also, read the exec man page, which explains these problems well.
If your system has tcllib installed, you can use the fileutil::grep command from the fileutil package:
package require fileutil

set fileName data.xml
set textToGrep {<BBB +name="BBBRM"}; # Update: Add + for multi-space match
set grepResult [::fileutil::grep $textToGrep $fileName]
foreach result $grepResult {
    # Example result:
    # data.xml:4: <BBB name="BBBRM" />
    set lineNumber [lindex [split $result ":"] 1]
    puts $lineNumber

    # Update: Get the line, squeeze the spaces before name=
    set line [lindex [split $result ":"] 2]
    regsub { +name=} $line " name=" line
    puts $line
}
Discussion
When assigning the value to textToGrep, I used curly braces, thus allowing double quotes inside without having to escape them.
The result of the ::fileutil::grep command is a list of strings. Each string contains the file name, the line number, and the line itself, separated by colons.
One way to extract the line number is to first split the string (result) into pieces, using the colon as a separator. Next, I use lindex to grab the second item (index 1, since lists are zero-based).
I have updated the code to account for the case where there are multiple spaces before name=.
There are two problems here:
1. Pattern matching does not work.
2. grep exits with the error "child process exited abnormally" when the pattern is not found.
The first problem is because you are not enclosing textToGrep within double quotes (instead of single quotes). So your code should be:
[catch {exec grep -n "$textToGrep" $fileName} lineNumber]
The second problem is because of the exit status of the grep command. grep exits with a non-zero status when the pattern is not found. Here is a demonstration on a shell:
# cat file
pattern
pattern with multiple spaces
# grep pattern file
pattern
pattern with multiple spaces
# echo $?
0
# grep nopattern file
# echo $?
1
EDIT:
In your case you have special characters such as < and > (which have special meaning on a shell).
set textToGrep "<BBB name=\"BBBRM\""
regsub -all -- {<} "$textToGrep" "\\\<" textToGrep
regsub -all -- {>} "$textToGrep" "\\\>" textToGrep
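# equivalently, the escaped pattern can be written out directly: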
set textToGrep {\<BBB name="BBBRM"}
catch {exec grep -n $textToGrep $fileName} status
if {![regexp "child process" $status]} {
    puts $status
} else {
    puts "no word found"
}
I think you should use a regular expression to check for "child process", as above, to detect the no-match case. Just check whether the above code works for you; in the if statement you can process $status as you like.
With the given example (in your post) the above code works; you only need to backslash-escape the "<" in the textToGrep variable.
I have to extract data from a JSON file depending on a specific key. The data then has to be filtered (based on the key value) and separated into different fixed-width flat files. I have to develop a solution using shell scripting.
Since the data is just key:value pairs, I can extract them by processing each line in the JSON file, checking the type and writing the values to the corresponding fixed-width file.
My problem is that the input JSON file is approximately 5GB in size. My method is very basic, and I would like to know if there is a better way to achieve this using shell scripting?
A sample JSON file would look like this:
{"Type":"Mail","id":"101","Subject":"How are you ?","Attachment":"true"}
{"Type":"Chat","id":"12ABD","Mode:Online"}
The above is a sample of the kind of data I need to process.
Give this a try:
#!/usr/bin/awk -f
{
    line = ""
    gsub("[{}\x22]", "", $0)
    f = split($0, a, "[:,]")
    for (i=1; i<=f; i++)
        if (a[i] == "Type")
            file = a[++i]
        else
            line = line sprintf("%-15s", a[i])
    print line > (file ".fixed.out")
}
I made assumptions based on the sample data provided. There is a lot based on those assumptions that may need to be changed if the data varies much from what you've shown. In particular, this script will not work properly if the data values or field names contain colons, commas, quotes or braces. If this is a problem, it's one of the primary reasons that a proper JSON parser should be used. If it were my assignment, I'd push back hard on this point to get permission to use the proper tools.
This outputs lines that have type "Mail" to a file named "Mail.fixed.out" and type "Chat" to "Chat.fixed.out", etc.
The "Type" field name and field value ("Mail", etc.) are not output as part of the contents. This can be changed.
Otherwise, both the field names and values are output. This can be changed.
The field widths are all fixed at 15 characters, padded with spaces, with no delimiters. The field width can be changed, etc.
Let me know how close this comes to what you're looking for and I can make some adjustments.
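For reference, one way to run it (the script name split_by_type.awk and the input file name are just for illustration):
awk -f split_by_type.awk input_file
This writes Mail.fixed.out, Chat.fixed.out, and so on in the current directory.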
perl script
#!/usr/bin/perl -w
use strict;
use warnings;
no strict 'refs'; # for FileCache
use FileCache; # avoid exceeding system's maximum number of file descriptors
use JSON;
my $type;
my $json = JSON->new->utf8(1); #NOTE: expect utf-8 strings
while(my $line = <>) { # for each input line
    # extract type
    eval { $type = $json->decode($line)->{Type} };
    $type = 'json_decode_error' if $@;
    $type ||= 'missing_type';

    # print to the appropriate file
    my $fh = cacheout '>>', "$type.out";
    print $fh $line; #NOTE: use cache if there are too many hdd seeks
}
corresponding shell script
#!/bin/bash
#NOTE: bash is used to create non-ascii filenames correctly

__extract_type()
{
    perl -MJSON -e 'print from_json(shift)->{Type}' "$1"
}

__process_input()
{
    local IFS=$'\n'
    while read line; do # for each input line
        # extract type
        local type="$(__extract_type "$line" 2>/dev/null ||
                      echo json_decode_error)"
        [ -z "$type" ] && local type=missing_type

        # print to the appropriate file
        echo "$line" >> "$type.out"
    done
}

__process_input
Example:
$ ./script-name < input_file
$ ls -1 *.out
json_decode_error.out
Mail.out