Code Golf: recognize ascii art boxes


Came up with this a while ago while doing some data structure work, and thought it'd make a good code golf: given a two-dimensional array of characters containing ASCII art rectangles, produce a list of coordinates and sizes for the rectangles.
Any trivially convertible input or output format is fine (e.g. char**, list of strings, lines on standard input; list of four ints, struct, fixed amount +/- for the size; etc.).
Similarly, output need not be in any particular order.
You don't have to do anything useful for invalid input or malformed rectangles, but you shouldn't produce valid-looking coordinates for a rectangle that isn't in the input.
No two valid rectangles share a + (though a + may also appear outside of any rectangle).
You can assume that all rectangles are at least 3x3: each side has a - or | in it.
Examples:
"        "
"  +-+ | "
"  | | \-"
"  +-+   "
(2,1;3,3)
"+--+  +--+"
"|  |  |  |"
"+--+  +--+"
(0,0;4,3), (6,0;4,3)
"  +---+  "
"->|...|  "
"  +---+  "
(2,0;5,3)
"+-+ +--+  +--+"
"| | |  |  |  |"
"+-+ |  |  + -+"
"    |  |      "
"    +--+  +-+ "
"  +--+    |   "
"  +--+    +-+ "
(0,0;3,3), (4,0;4,5) # (2,5;4,2) is fine, but not needed

Perl, 167 165 159 chars
(156 chars if you don't count slurping stdin to @a, just remove the last 3 chars and assign a list of strings representing your input to @a)
Gets input from stdin. Newlines not significant, added for readability. Notice the use of the +++ operator ;P
map{$l=$i++;while($c=/\+-+\+/g){$w=$+[0]-2-($x=$-[0]);
$c++while$a[$l+$c]=~/^.{$x}\|.{$w}\|/;
print"($x,$l;",$w+2,",$c)\n"if$a[$c+++$l]=~/^.{$x}\+-{$w}\+/}}@a=<>
Be liberal in what you accept version, 170 chars
map{$l=$i++;while($c=/\+-*\+/g){pos=-1+pos;$w=$+[0]-2-($x=$-[0]);
$c++while$a[$l+$c]=~/^.{$x}\|.{$w}\|/;
print"($x,$l;",$w+2,",$c)\n"if$a[$c+++$l]=~/^.{$x}\+-{$w}\+/}}@a=<>
Be conservative in what you accept version, 177 chars
map{$l=$i++;while($c=/\+-+\+/g){$w=$+[0]-2-($x=$-[0]);
$c++while$a[$l+$c]=~/^.{$x}\|.{$w}\|/;print
"($x,$l;",$w+2,",$c)\n"if$c>1&&$a[$c+++$l]=~s/^(.{$x})\+(-{$w})\+/$1v$2v/}}@a=<>
Commented version:
@a=<>;   # slurp stdin into an array of lines
$l=0;    # start counting lines from zero
map{     # for each line
  while(/\+-+\+/g){   # match all box tops
    $c=1;             # initialize height
    # x coordinate, width of box - sides
    $w=$+[0]-2-($x=$-[0]);
    # increment height while there are inner parts
    # of a box with x and w coinciding with last top
    # (look into next lines of array)
    $c++ while $a[$l+$c]=~/^.{$x}\|.{$w}\|/;
    # if there is a box bottom on line + height
    # with coinciding x and w, print coords
    # (after incrementing height)
    print "($x,$l;",$w+2,",$c)\n"
      if $a[$c+++$l]=~/^.{$x}\+-{$w}\+/
  }
  $l++   # line++
}@a
Mega test case:
+--+ +-+ +-+ +++ +---+ +-+ +-+-+ +-++-+
|SO| | | | | +++ |+-+| | | | | | | || |
+--+ +-+-+-+ +++ ||+|| +-+ +-+-+ +-++-+
| | |+-+| | |
+-+-+-+ +---+ +-+
| | | |
+-+ +-+
++ +-+ ++ +-+ +- + +--+ +--+ +--+
|| +-+ ++ +-+-+ | | | | | | |
++ | | | | | | | | |
+-+ +--+ + -+ +--+ +--+

Perl - 223 222 216
Golfed version (newlines not significant):
$y=0;sub k{$s=$-[0];"($s,%i;".($+[0]-$s).",%i)"}while(<>){while(/\+-+\+/g){
if(exists$h{&k}){push@o,sprintf k,@{$h{&k}};delete$h{&k}}else{$h{&k}=[$y,2]}}
while(/\|.+?\|/g){++${$h{&k}}[1]if exists$h{&k}}++$y}print"@o\n"
Older de-golfed version:
# y starts at line zero.
$y = 0;
# Abuse Perl's dynamic scoping rules
# to get a key for the hash of current rectangles,
# which indexes rectangles by x and width,
# and is also used as a format string.
sub k {
    # The start of the current match.
    $s = $-[0];
    # $+[0] is the end of the current match,
    # so subtract the beginning to get the width.
    "($s,%i;" . ($+[0] - $s) . ",%i)"
}
# Read lines from STDIN.
while (<>) {
    # Get all rectangle tops and bottoms in this line.
    while (/\+-+\+/g) {
        # If line is a bottom:
        if (exists $h{&k}) {
            # Add to output list and remove from current.
            push @o, sprintf k, @{$h{&k}};
            delete $h{&k}
        # If line is a top:
        } else {
            # Add rectangle to current.
            $h{&k} = [$y, 2]
        }
    }
    # Get all rectangle sides in this line.
    while (/\|.+?\|/g) {
        # Increment the height of the corresponding
        # rectangle, if one exists.
        ++${$h{&k}}[1] if exists $h{&k}
    }
    # Keep track of the current line.
    ++$y
}
# Print output.
print join", ",@o
Note that this does not handle junk vertical bars to the left of the rectangles, that is:
+--+ +--+
| | | | |
+--+ +--+
Will incorrectly yield a height of 2 for both. This is because the /\|.+?\|/g pattern starts searching from the beginning of the line. Anyone have a suggestion for how to fix this?
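One possible way around the junk-bar problem, sketched in Python rather than Perl: instead of regex-matching a pair of bars from the start of the line, look directly at the two columns recorded for each open rectangle. The function name `find_boxes` and the sample layout are made up for illustration, and unlike the Perl above this sketch drops a rectangle as soon as one of its side rows is broken.

```python
import re

def find_boxes(lines):
    """Return (x, y, width, height) for every box closed by a matching bottom."""
    open_boxes = {}  # (x, width) -> row of the top edge
    found = []
    for y, line in enumerate(lines):
        for m in re.finditer(r"\+-+\+", line):
            key = (m.start(), m.end() - m.start())
            if key in open_boxes:            # matching bottom edge: close the box
                top = open_boxes.pop(key)
                found.append((key[0], top, key[1], y - top + 1))
            else:                            # otherwise treat it as a new top edge
                open_boxes[key] = y
        for (x, w), top in list(open_boxes.items()):
            if top == y:
                continue                     # opened on this row, no sides yet
            right = x + w - 1
            # Only the box's own two columns matter; bars elsewhere are ignored.
            if right >= len(line) or line[x] != "|" or line[right] != "|":
                del open_boxes[(x, w)]
    return found

# Assumed layout of the failing case: one stray bar left of two boxes.
junk = ["   +--+  +--+",
        "|  |  |  |  |",
        "   +--+  +--+"]
print(find_boxes(junk))
```

Because the side check indexes the line at x and x+width-1, a stray | anywhere else on the row cannot be paired with a real side.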

Ruby — 306 260 245 228 168
# 228 chars
g=->(s,u='-'){o=[];s.scan(/\+#{u}+\+/){o<<[$`,$`+$&].map(&:size)};o}
b=t.map{|i|i.split''}.transpose.map{|s|g[s*'','\|']}
(1...t.size).map{|i|i.times{|j|(g[t[i]]&g[t[j]]).map{|x,y|p [x,j,y-x,i-j+1]if(b[x]&b[y-1]&[[j,i+1]])[0]}}}
produces
[0, 0, 3, 3]
[4, 1, 4, 3]
[10, 3, 3, 3]
for t=
["+-+       +--+",
"| | +--+  |  |",
"+-+ |  |  + -+",
"    +--+  +-+ ",
"  +--+    | | ",
"  +--+    +-+ "]
Explanation:
# function returns info about all inclusions of "+---+" in string
# " +--+ +-+" -> [[2,5],[7,9]]
g=->(s,u='-'){o=[];s.scan(/\+#{u}+\+/){o<<[$`,$`+$&].map(&:size)};o}
# mapping transposed input with this function
b=t.map{|i|i.split''}.transpose.map{|s|g[s*'','\|']}
# earlier here was also mapping original input, but later was merged with "analyse"
# "analyse"
# take each pair of lines
(1...t.size).map{|i|i.times{|j|
# find horizontal sides of the same length on the same positions
(g[t[i]]&g[t[j]]).map{|x,y|
# make output if there are correct vertical sides
p [x,j,y-x,i-j+1]if(b[x]&b[y-1]&[[j,i+1]])[0]
}
}}
# yeah, some strange +/-1 magick included ,.)
And more straight-forward 168-chars solution!
t.size.times{|i|t[0].size.times{|j|i.times{|k|j.times{|l|p [l,k,j-l+1,i-k+1]if
t[k..i].map{|m|m[j]+m[l]}*''=~/^\+\+\|+\+\+$/&&t[i][l..j]+t[k][l..j]=~/^(\+-+\+){2}$/}}}}

JavaScript — 156 characters*
Also at http://jsfiddle.net/eR5ee/4/ (only click the link if using Firefox or Chrome) or http://jsfiddle.net/eR5ee/5/ (adapted to Safari and Opera):
var A = [
"+-+ +--+  +--+",
"| | |  |  |  |",
"+-+ |  |  + -+",
"    |  |      ",
"    +--+  +-+ ",
"  +--+    |   ",
"  +--+    +-+ "
]; // not counted
for(y=A.length;--y;)for(;m=/\+-*\+/g(A[y]);){
for(w=m[0].length,z=y;A[--z][x=m.index]+A[z][x+w-1]=="||";);
/^\+-*\+$/(A[z].substr(x,w))&&alert([x,z,w,y-z+1])}
Excluding newlines and whitespace characters, which are completely unnecessary.
Apparently, Firefox and Chrome retain the lastIndex of the first regex. It takes four more characters to keep Safari and Opera out of an infinite loop. To get Internet Explorer working, fourteen more characters would be needed to fix both the above and the error "Function expected." Apparently, "... a regular expression's exec method can be called... indirectly (with regexp(str))" (quoted from Mozilla documentation) does not apply to IE.
The code detects all rectangles 2x2 and larger if no rectangle's bottom touches any other rectangle's top or bottom or a plus or minus sign and none overlap.
The order of the numbers in each alert box (which corresponds to a rectangle) is left, top, width, height. The code does error out if a rectangle extends off the top, but all the coordinates needed have already been output by then (from the specification: "You don't have to do anything useful for invalid input or malformed rectangles...")
Since most major web browsers implement the canvas tag, in several more lines of code, drawing the detected rectangles on screen is possible. http://jsfiddle.net/MquqM/6/ works on all browsers except Internet Explorer and Opera.
Edit: eliminated an unnecessary variable assignment
Edit 2: avoid throwing errors with completely valid input (--y instead of y--), clarify the specific cases the code handles

C (204 186 Characters)
#include<stdio.h>
char H=7,W=14,*S =
"+-+ +--+  +--+"
"| | |  |  |  |"
"+-+ |  |  + -+"
"    |  |      "
"    +--+  +-+ "
"  +--+    |   "
"  +--+    +-+ ";
void main(){
#define F(a,r,t)if(*c==43){while(*(c+=a)==r);t}
char*c,*o,*e=S;while(*(c=e++))
F(1,45,F(W,'|',o=c;F(-1,45,F(-W,'|',c==e-1?
printf("%i,%i %i,%i\n",(c-S)%W,(c-S)/W,(o-c)%W+1,(o-c)/W+1):0;))))
}
The character count is for the body of main(). This code walks the string with e until it reaches the top-left corner of a potential rectangle. It then checks the edges with c, using o to keep track of the bottom-right corner.
Output of the program is:
0,0 3,3
4,0 4,5
2,5 4,2

Python 2.6 - 287 263 254
a = [
"+-+ +--+  +--+",
"| | |  |  |  |",
"+-+ |  |  + -+",
"    |  |      ",
"    +--+  +-+ ",
"  +--+    |   ",
"  +--+    +-+ "
]
l=len
r=range
w,h=l(a[0]),l(a)
[(x,y,u,v)for x in r(0,w)for y in r(0,h)for u in r(x+2,w)for v in r(y+2,h)if a[y][x]==a[v][x]==a[y][u]==a[v][u]=='+' and a[y][x+1:u]+a[v][x+1:u]=="-"*2*(u-x-1)and l([c for c in r(y+1,v-y)if a[c][x]==a[c][u]=='|'])==v-y-1]
evaluated to:
[(0, 0, 3, 3), (4, 0, 4, 5)]

Scala 2.8 - 283 273 269 257
val a = Seq(
"+-+ +--+  +--+",
"| | |  |  |  |",
"+-+ |  |  + -+",
"    |  |      ",
"    +--+  +-+ ",
"  +--+    |   ",
"  +--+    +-+ "
)
// begin golf count
val (w,h) = (a(0).size-1,a.size-1)
for (
x <- 0 to w;
y <- 0 to h;
u <- x+2 to w;
v <- y+2 to h;
if Set(a(y)(x),a(v)(x),a(y)(u),a(v)(u)) == Set(43)
&& (x+1 to u-1).forall(t => (a(y)(t)<<8|a(v)(t)) == 11565)
&& (y+1 to v-1).forall(t => (a(t)(x)<<8|a(t)(u)) == 31868)
) yield (x,y,u-x+1,v-y+1)
// end golf count
evaluates to :
Vector((0,0,3,3), (4,0,4,5))
The for expression evaluates to the answer (the Vector object), that is why I counted only this part (whitespaces removed). Let me know if this is the correct way to count.
How it works
The coordinates of all possible rectangles (actually, only >= 3x3) are generated by the for expression. These coordinates are filtered by looking for the +, - and | at the edges and corners of all rectangles (the if part of the for expression).
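The generate-and-filter idea described above can be sketched in plain, de-golfed Python. This is only an illustration, not the Scala answer itself: the name `rectangles` is made up, and all rows are assumed to be padded to the same length.

```python
def rectangles(grid):
    """Brute force: try every top-left/bottom-right pair and check the border."""
    h = len(grid)
    w = len(grid[0]) if grid else 0
    out = []
    for y in range(h):
        for x in range(w):
            if grid[y][x] != "+":
                continue
            for v in range(y + 2, h):          # bottom row, at least 3 high
                for u in range(x + 2, w):      # right column, at least 3 wide
                    corners = grid[y][u] == grid[v][x] == grid[v][u] == "+"
                    if (corners
                            and all(grid[y][t] == grid[v][t] == "-" for t in range(x + 1, u))
                            and all(grid[t][x] == grid[t][u] == "|" for t in range(y + 1, v))):
                        out.append((x, y, u - x + 1, v - y + 1))
    return out

grid = ["+-+ +--+  +--+",
        "| | |  |  |  |",
        "+-+ |  |  + -+",
        "    |  |      ",
        "    +--+  +-+ ",
        "  +--+    |   ",
        "  +--+    +-+ "]
print(rectangles(grid))
```

The 2-high box at (2,5) is skipped because the loops only consider candidates of at least 3x3, matching the remark in the explanation.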

Python 2.6 (251 characters)
I come in a bit late, anyway, some fun. Python, using regular expressions. To save a print statement and stay shorter than Fredb219's this won't print anything if you run it as a script, but typing one line at a time in the interpreter it's gonna show the result. Not really solid, it won't handle nested boxes nor most cases more complex than the ones given by DavidX. Haven't done testing, but I think it's likely to show results in a wrong order if something "strange" occurs.
import re
l,a,o=s.index("\n")+1,re.finditer,sorted
o(o(set((m.start(),m.end())for m in a(r'\+-* *-*\+',s)for n in a(r'\|.+\|',s)if(m.start()%l,m.end()%l)==(n.start()%l,n.end()%l)if m.start()+l==n.start()or m.start()-l==n.start())),key=lambda x:x[0]%l)
Input is a single string, lines (all same length) separated by a newline character. Results are the string slices of the top and the bottom for every "good" box, starting top left. It also allows "broken" boxes (i.e. with some space in the middle of one side, not without one entire side). This was just a way to fix an unwanted behavior creating many brand new side effects! :-)
input:
>>>s = """+-+ +--+ +--+
| | | | | |
+-+ | | + -+
| |
+--+ +-+
+--+ |
+--+ +-+ """
or:
>>>s = "+-+ +--+ +--+\n| | | | | |\n+-+ | | + -+\n | | \n +--+ +-+ \n +--+ | \n +--+ +-+ "
then:
>>>import re
>>>l,a,o=s.index("\n")+1,re.finditer,sorted
>>>o(o(set((m.start(),m.end())for m in a(r'\+-* *-*\+',s)for n in a(r'\|.+?\|',s)if(m.start()%l,m.end()%l)==(n.start()%l,n.end()%l)if m.start()+l==n.start()or m.start()-l==n.start())),key=lambda x:x[0]%l)
output:
[(0, 3), (30, 33), (4, 8), (64, 68), (10, 14), (40, 44)]
so (0,3) top of 1st box (30,33) bottom of same box, (4,8) top of 2nd box and so on.

F#, 297 chars
Kinda lame, but simple.
let F a=
 for x=0 to Array2D.length1 a-1 do
  for y=0 to Array2D.length2 a-1 do
   if a.[x,y]='+' then
    let mutable i,j=x+1,y+1
    while a.[i,y]<>'+' do i<-i+1
    while a.[x,j]<>'+' do j<-j+1
    printfn"(%d,%d;%d,%d)"x y (i-x+1)(j-y+1)
    a.[i,y]<-' '
    a.[x,j]<-' '
    a.[i,j]<-' '
Look for a plus. Find the one to the right of it. Find the one below it. Print out this rectangle's info, and 'null out' the pluses we already used. Since each plus is only a part of one valid rectangle, that's all we need to do.

XQuery (304 characters)
Here is my solution:
declare variable $i external;let$w:=string-length($i[1]),$h:=count($i)for$y in 1 to$h,$x in 1 to$w,$w in 2 to$w+1 -$x,$h in 1 to$h where min(for$r in (0,$h),$s in 1 to$h return (matches(substring($i[$y+$r],$x,$w),'^\+-*\+$'),matches(substring($i[$y+$s],$x,$w),'^|.*|$')))return ($x -1,$y -1,$w,$h+1,'')
You can run this (with XQSharp) by setting the variable $i to be the lines of the input
>XQuery boxes.xq "i=(' +-+','+-+-+','| | ','+-+ ')" !method=text
2 0 3 2 0 1 3 3
I suppose one could argue that declare variable $i external; is just setting up the input and so doesn't add to the count, in which case 275 characters


How can I clean a TSV file having record or fields separators in one of its fields?

Given a TSV file in which col2 contains either a field separator (FS, a tab) or a record separator (RS, a newline), escaped by being surrounded with quotes:
$ printf '%b\n' 'col1\tcol2\tcol3' '1\t"A\tB"\t1234' '2\t"CD\nEF"\t567' | \cat -vet
col1^Icol2^Icol3$
1^I"A^IB"^I1234$
2^I"CD$
EF"^I567$
+------+---------+------+
| col1 | col2 | col3 |
+------+---------+------+
| 1 | "A B" | 1234 |
| 2 | "CD | 567 |
| | EF" | |
+------+---------+------+
Is there a way in sed/awk/perl or even (preferably) miller/mlr to transform those pesky characters into spaces in order to generate the following result:
+------+---------+------+
| col1 | col2 | col3 |
+------+---------+------+
| 1 | "A B" | 1234 |
| 2 | "CD EF" | 567 |
+------+---------+------+
I cannot get miller 6.2 to make the proper transformation (tried with DSL put/gsub) because it doesn't recognize the tab or CR/LF being part of the columns which breaks the field number:
$ printf '%b\n' 'col1\tcol2\tcol3' '1\t"A\tB"\t1234' '2\t"CD\nEF"\t567' | mlr --opprint --barred --itsv cat
mlr : mlr: CSV header/data length mismatch 3 != 4 at filename (stdin) line 2.

A good library cleanly handles things like embedded newlines and quoted separators (in fields)
In a Perl script with Text::CSV
use warnings;
use strict;
use Text::CSV;
my $file = shift // die "Usage: $0 filename\n";
my $csv = Text::CSV->new( { binary => 1, sep_char => "\t", auto_diag => 1 } );
open my $fh, '<', $file or die "Can't open $file: $!";
while (my $row = $csv->getline($fh)) {
    s/\s+/ /g for @$row;  # collapse multiple spaces, tabs, newlines
    $csv->say(*STDOUT, $row);
}
Note the many other options for the constructor that can help handle various irregularities.
This can fit in a one-liner; its functional interface (with csv) is particularly well suited for that.
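For readers without Perl at hand, the same library-based idea can be sketched with Python's standard csv module, which also understands quoted fields containing tabs and newlines. The function name `clean_tsv` is made up for this example. Note that the writer only quotes fields that still need it, so the quotes around the cleaned values are dropped.

```python
import csv
import io
import re

def clean_tsv(text):
    """Parse tab-separated text and collapse whitespace runs inside each field."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Embedded tabs and newlines arrive as part of the field value.
        writer.writerow([re.sub(r"\s+", " ", field) for field in row])
    return out.getvalue()

sample = 'col1\tcol2\tcol3\n1\t"A\tB"\t1234\n2\t"CD\nEF"\t567\n'
print(clean_tsv(sample), end="")
```

When reading from a real file instead of a string, open it with newline="" so the csv module sees the embedded line breaks unchanged.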

If you run
printf '%b\n' 'col1\tcol2\tcol3' '1\t"A\tB"\t1234' '2\t"CD\nEF"\t567' | \
mlr --c2t --fs "\t" clean-whitespace
col1 col2 col3
1 A B 1234
2 CD EF 567
I'm using mlr 6.2.
A way to do it in miller 5 is to use simply the put verb:
printf '%b\n' 'col1\tcol2\tcol3' '1\t"A\tB"\t1234' '2\t"CD\nEF"\t567' | \
mlr --tsv put -S 'for (k in $*) {$[k] = gsub($[k], "\n", " ")}' then clean-whitespace

perl -MText::CSV_XS=csv -e'
   csv
      in => *ARGV,
      on_in => sub { s/\s+/ /g for @{$_[1]} },
      sep_char => "\t";
'
Or s/[\t\n]/ /g if you prefer.
Can be placed all on one line.
Input is accepted from file named by argument or STDIN.

With GNU awk for multi-char RS, RT, and gensub():
$ awk -v RS='"([^"]|"")*"' '{ORS=gensub(/[\n\t]/," ","g",RT)} 1' file
col1 col2 col3
1 "A B" 1234
2 "CD EF" 567
The above just uses RS to isolate each "..." string and saves it in RT, then replaces every \n or \t in that string with a blank and saves the result in ORS, then prints the record.
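The same isolate-then-replace trick carries over to other regex engines. A rough Python equivalent, offered as a sketch (the name `flatten_quoted` is made up here): match each "..." string with the same pattern and replace the tabs and newlines only inside the match.

```python
import re

# Same pattern as the awk RS: a quoted string, with "" as an escaped quote.
QUOTED = re.compile(r'"(?:[^"]|"")*"')

def flatten_quoted(text):
    """Replace tabs and newlines with blanks, but only inside quoted strings."""
    return QUOTED.sub(lambda m: re.sub(r"[\n\t]", " ", m.group(0)), text)

sample = 'col1\tcol2\tcol3\n1\t"A\tB"\t1234\n2\t"CD\nEF"\t567\n'
print(flatten_quoted(sample), end="")
```

Like the awk version, this keeps the surrounding quotes and leaves everything outside them untouched.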

you absolutely don't need gawk to get this done - here's one that works for mawk, gawk, or macos nawk :
INPUT
--before--
col1 col2 col3
1 "A B" 1234
2 "CD
EF" 567
CODE
{m,n,g}awk '
BEGIN {
1 __=substr((OFS=FS="\t\"")(FS)(ORS=_)\
(RS = "^$"),_+=_^=_<_,_)
}
END {
1 printbefore()
3 for (_^=_<_; _<=NF; _++) {
3 sub(/[\t-\r]+/, ($_~__)?" ":"&", $_)
}
1 print
}
1 function printbefore(_)
{
1 printf("\n\n--before--\n%s\n------"\
"------AFTER------\n\n", $+_)>("/dev/stderr")
}
OUTPUT
———AFTER (using mawk)------
col1 col2 col3
1 "A B" 1234
2 "CD EF" 567
strip out the part about printbefore() that's more for debugging purposes, then it's just
{m,n,g}awk '
BEGIN { __=substr((OFS=FS="\t\"") FS \
(ORS=_) (RS="^$"),_+=_^=_<_,_)
} END {
for(--_;_<=NF;_++) {
sub(/[\t-\r]+/, $_~__?" ":"&",$_) } print }'

AWK: How to merge CSV files and eliminate rows that contain certain values?

I have hundreds of CSV files. Each CSV file is similar to this:
| KEYWORD | NUMBER OF COMPS | AVGE M E (K) | GS/M | EST. A SE/M | C CORE |
|---------|-----------------|--------------|------|-------------|--------|
| Apples | 311 | 12 | N/A | <100 | 10 |
| Bananas | >1,200 | 737 | N/A | 490 | 88 |
| Oranges | 48 | 184 | N/A | N/A | 1 |
| Fruits | 161 | 94 | N/A | - | 6 |
(I have posted this in table format, to make it more readable, but the CSV data is at the bottom of this post).
All the CSV files have the same header row. Only the data is different.
I would like to do the following:
Merge all the CSV files together, but only have 1 header row.
Omit any rows where EST. A SE/M (Column 5) contains any of the following data: <100, N/A or -
Notes about the Data
Sometimes some or even all cells in the CSV file are wrapped in quotation marks.
Other times they are not.
Sometimes the first column (keyword) may contain multiple words or accented characters.
My code so far
This code merges all the CSV files into one with only one heading
awk '(NR == 1) || (FNR > 1)' *.csv > ^0-output.csv
This works perfectly.
However, I am not sure how to delete the unwanted rows after the merge.
So far I have this:
awk '$5 !~ /(<100|N\/A|-)/' ^0-output.csv > ^0-output.csv
But when I use this code, it just produces a blank file.
Plus, I am not sure if there is a way to integrate it in the first line, so it does everything with a single command.
Notes
Here is how the data looks in CSV format
Sample1.csv
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Apples,311,12,N/A,<100,10
Bananas,">1,200",737,N/A,490,88
Oranges,48,184,N/A,N/A,1
Fruits,161,94,N/A,-,63
Sample2.csv
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Dino,588,67,N/A,888,234
Thunder,">1,200",211,N/A,<100,77
Ninja,95,37,N/A,-,878
Sample3.csv
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Blur,84,2454,N/A,-,234
Sample4.csv
"KEYWORD","NUMBER OF COMPS","AVGE M E (K)","GS/M","EST. A SE/M","C CORE"
"hedgehog rolls ròund",32,481,N/A,"878",13
"Clever Fox jumps Hîgh",233,83,N/A,"<100",12
"Bear à lot",122,35,N/A,"-",11
"kitten hîgh life","121","673","32","N/A","15"
Please note: The actual files that the finished script will be used on will have a variety of file names. They will NOT always follow the pattern of sample 1, sample 2 etc.
Expected Output
Expected output: (CSV format)
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Bananas,">1,200",737,N/A,490,88
Dino,588,67,N/A,888,234
"hedgehog rolls ròund",32,481,N/A,"878",13
(Note: It doesn't matter if the expected output keeps the wrapping quote marks as the final CSV file is opened in Apple Numbers)
Expected output: (Readable format)
| KEYWORD | NUMBER OF COMPS | AVGE M E (K) | GS/M | EST. A SE/M | C CORE |
|---------|-----------------|--------------|------|-------------|--------|
| Bananas | >1,200 | 737 | N/A | 490 | 88 |
| Dino | 588 | 67 | N/A | 888 | 234 |
| hedgehog rolls ròund | 32 | 481 | N/A | 878 | 13 |
Environment:
I am using Mac OS X 10.14.6. I am unable to install other versions of awk.

You can simply merge the two conditions into one using &&:
awk -F, 'NR==1 || (FNR>1 && $5 !~ /^(<100|N\/A|-)$/)' *.csv > output.csv
Here $5 !~ /^(<100|N\/A|-)$/ will skip a row if $5 is <100 or - or N/A. It is important to use the regex anchors ^ and $ to avoid matching unwanted strings such as <1000 or AB-123.
It seems you also have commas inside double quotes in Sample1.csv. In that case the following gnu-awk command should work for you (FPAT keeps the surrounding quotes in the field, so the regex allows for them):
awk -v FPAT='"[^"]*"|[^,]*' '
NR == 1 || (FNR > 1 && $5 !~ /^"?(<100|N\/A|-)"?$/)' *.csv > output.csv
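For comparison, the same merge-and-filter can be sketched with Python's csv module, which handles the quoted commas without any field-pattern tricks. The names `merge_csv` and `UNWANTED` are made up for this sketch. As an aside on the blank file in the question: a shell command that reads from and redirects to the same file truncates that file before awk ever reads it, so the output should go to a different name.

```python
import csv
import io

UNWANTED = {"<100", "N/A", "-"}

def merge_csv(sources, out):
    """Copy rows from several CSV streams, keeping one header and filtering column 5."""
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for src in sources:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            continue                      # empty input, nothing to copy
        if not header_written:
            writer.writerow(header)
            header_written = True
        for row in reader:
            # The reader has already removed any wrapping quotes.
            if len(row) > 4 and row[4].strip() not in UNWANTED:
                writer.writerow(row)

a = io.StringIO('KEYWORD,COMPS,AVG,GS,EST,CORE\nApples,311,12,N/A,<100,10\nBananas,">1,200",737,N/A,490,88\n')
b = io.StringIO('"KEYWORD","COMPS","AVG","GS","EST","CORE"\n"Bear",122,35,N/A,"-",11\n"Dino",588,67,N/A,"888",234\n')
merged = io.StringIO()
merge_csv([a, b], merged)
print(merged.getvalue(), end="")
```

For real files, each source would be a file opened with newline="" (for example from glob.glob("*.csv")), and the output file should not match the input pattern.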

EDIT: As per OP's comments there could be a comma in between " too, so to handle that it's better to use FPAT; written and tested with GNU awk.
awk -v FPAT='[^,]*|"[^"]+"' '
{ sub(/\r$/,"") }
FNR==1{
if(NR==1){ print }
next
}
$5=="<100"||$5=="N/A"||$5=="-"{
next
}
1
' *.csv
Could you please try following, written and tested with GNU awk on shown samples only.
awk '
BEGIN{
FS=OFS=","
}
FNR==1{
if(NR==1){ print }
next
}
$5=="<100"||$5=="N/A"||$5=="-"{ next }
1
' *.csv
OR in case your values can contain something else also and you want to use regex to match the values which you want to neglect then try following.
awk '
BEGIN{
FS=OFS=","
}
FNR==1{
if(NR==1){ print }
next
}
$5~/<100/ || $5~/N\/A/ || $5~/-/{ next }
1
' *.csv
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS=OFS="," ##Setting field separator as comma here.
}
FNR==1{ ##Checking condition if it's the first line of the current Input_file then do following.
if(NR==1){ print } ##If its very first line of very first Input_file then print that line.
next ##next will skip all further statements from here.
}
$5=="<100"||$5=="N/A"||$5=="-"{ next } ##Checking condition if 5th field contains either <100 OR N/A OR - then skip all further statements.
1 ##awk'sh way to print the current line.
' *.csv ##Passing all .csv files to awk program from here.

It looks to me like you're only interested in testing the 2nd-last field and neither that nor the last field can contain commas so just count field numbers from the end instead of from the beginning of each line and then you don't care whether earlier fields contain commas or not. Given that, this will work using any awk:
$ awk -F',' '(NR==1) || (FNR>1 && $(NF-1)!~/^"?(<100|N\/A|-)"?$/)' *.csv
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Bananas,">1,200",737,N/A,490,88
Dino,588,67,N/A,888,234
"hedgehog rolls ròund",32,481,N/A,"878",13

error and output issues for python matrix?

I am building a function that takes a list made up of lists (ex: [['a'],['b'],['c']]) and outputs it as a table. I cannot use pretty table because I need a specific output (ex | a | b | ) with the lines and the spaces exactly alike.
Here is my function:
def show_table(table):
    if table is None:
        table=[]
        new_table=""
        for row in range(table):
            for val in row:
                new_table+= ("| "+val+" ")
        new_table+= "|\n"
    return new_table.strip("\n")
I keep getting the error:
show_table([['a'],['b'],['c']])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in show_table
TypeError: 'list' object cannot be interpreted as an integer
I'm not sure why there is an issue. I've also gotten an output error where it only outputs the first item in the first list and nothing more. Could someone explain how to use the format function to get rid of this error and output what I want correctly?
Fixed error but still failing tests:
FAIL: test_show_table_12 (main.AllTests)
Traceback (most recent call last):
File "testerl7.py", line 116, in test_show_table_12
def test_show_table_12 (self): self.assertEqual (show_table([['10','2','300'],['4000','50','60'],['7','800','90000']]),'| 10 | 2 | 300 |\n| 4000 | 50 | 60 |\n| 7 | 800 | 90000 |\n')
AssertionError: '| 10| 2| 300|\n| 4000| 50| 60|\n| 7| 800| 90000|' != '| 10 | 2 | 300 |\n| 4000 | 50 | 60 |\n| 7 | 800 | 90000 |\n'
- | 10| 2| 300|
+ | 10 | 2 | 300 |
? +++ +++ +++
- | 4000| 50| 60|
+ | 4000 | 50 | 60 |
? + ++ ++++
- | 7| 800| 90000|
+ | 7 | 800 | 90000 |
? ++++ + + +

The problem is here:
for row in range(table):
range takes 1, 2, or 3 integers as arguments. It does not take a list.
You want to use:
for row in table:
Also, check your indents; it looks like the newline addition should be indented more.

Your traceback tells you that the problem occurs on line 5:
for row in range(table):
… so something on that line is trying, without success, to interpret something else as an integer. If we take a look at the docs for range(), we see this:
The arguments to the range constructor must be integers (either built-in int or any object that implements the __index__ special method).
… but table is not an integer; it's a list. If you want to iterate over a list (or something similar), you don't need a special function – simply
for row in table:
will work just fine.
There's another problem with your function apart from the misuse of range(), which is that you've indented too much of your code. This:
    if table is None:
        table=[]
        new_table=""
        for row in range(table):
            for val in row:
                new_table+= ("| "+val+" ")
        new_table+= "|\n"
… will only execute any of the indented code if table is None, whereas what you really want is just to set table=[] if that is the case. Fixing up both those problems gives you this:
def show_table(table):
    if table is None:
        table=[]
    new_table = ""
    for row in table:
        for val in row:
            new_table += ("| " + val + " ")
        new_table += "|\n"
    return new_table.strip("\n")
(I've also changed all your indents to four spaces, and added spaces here and there, to improve the style).

Output Error for Matrix in Python?

I am trying to create a function that outputs a matrix that contains each item in a list on a separate line with lines in between. The only output I'm getting is quotations (''). I do not understand why. I think I set it all up correctly to output what is needed but there has to be something missing?
I included examples below my code.
def show_table(table):
    table=[]
    s=[[str(e) for e in row] for row in table]
    lens= [max(map(len, col)) for col in zip(*s)]
    fmt= '\t'.join('{{:{}}}'.format(x) for x in lens)
    table= [fmt.format(*row) for row in s]
    return '\n'.join(table)
show_table([['A','BB'],['C','DD']])
output:
'| A | BB |\n| C | DD |\n'
print(show_table([['A','BB'],['C','DD']]))
output:
| A | BB |
| C | DD |

The issue is on the second line where you are initialising your list to an empty list. Instead try:
    if table is None:
        table = []
Perhaps a better way to accomplish this could be:
def show_table(table):
    if table is None:
        table = []
    data = ""
    for row in table:
        for val in row:
            data += "| " + val + " "
        data += "|\n"
    return data.strip("\n")
print(show_table([['a','bb'],['c','dd']]))
Output:
| a | bb |
| c | dd |
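The column-width idea from the question can also be kept and combined with the bar layout. A minimal sketch, assuming left-justified padding (an assumption, since the question does not say how uneven columns should align) and keeping the trailing newline that the expected string in the question ends with:

```python
def show_table(table):
    """Format a list of rows as '| a | b |' lines, padding each column to its widest cell."""
    if not table:
        return ""
    cells = [[str(value) for value in row] for row in table]
    # zip(*cells) walks the table column by column.
    widths = [max(len(cell) for cell in column) for column in zip(*cells)]
    lines = []
    for row in cells:
        padded = (cell.ljust(width) for cell, width in zip(row, widths))
        lines.append("| " + " | ".join(padded) + " |")
    return "\n".join(lines) + "\n"

print(show_table([['A', 'BB'], ['C', 'DD']]), end="")
```

If the trailing newline is not wanted, dropping the final `+ "\n"` is enough.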

Code Golf: Musical Notes

The challenge
The shortest code by character count, that will output musical notation based on user input.
Input will be composed of a series of letters and numbers - letters will represent the name of the note and the number will represent the length of the note. A note is made of 4 vertical columns. The note's head will be a capital O, stem, if present will be 3 lines tall, made from the pipe character |, and the flag(s) will be made from backward slash \.
Valid note lengths are none, 1/4 of a note, 1/8 of a note, 1/16 of a note and 1/32 of a note.
| |\ |\ |\
| | |\ |\
| | | |\
O O O O O
1 1/4 1/8 1/16 1/32
Notes are placed on the staff according to their note name:
----
D ----
C
B ----
A
G ----
F
E ----
All input can be assumed to be valid and without errors - Each note separated with a white space on a single line, with at least one valid note.
Test cases
Input:
B B/4 B/8 B/16 B/32 G/4 D/8 C/16 D B/16
Output:
|\
--------------------------|---|\--------
| |\ |\ |\ | |\ |\
------|---|---|\--|\-----O----|--O----|\
| | | |\ | O |
-O---O---O---O---O----|--------------O--
|
---------------------O------------------
----------------------------------------
Input:
E/4 F/8 G/16 A/32 E/4 F/8 G/16 A/32
Output:
--------------------------------
--------------|\--------------|\
|\ |\ |\ |\
------|\--|\--|\------|\--|\--|\
| | | O | | | O
--|---|--O--------|---|--O------
| O | O
-O---------------O--------------
Input:
C E/32 B/8 A/4 B F/32 B C/16
Output:
------------------------------|\
|\ |\
----------|---|---------------|-
O | | O
---------O----|--O----|\-O------
|\ O |\
------|\--------------|\--------
|\ O
-----O--------------------------
Code count includes input/output (i.e. a full program).
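For reference, here is an ungolfed Python 3 sketch of the rules above (not a golf entry). It assumes ten output rows with the staff lines on the odd rows, which is what the sample outputs suggest; the names `render` and `FLAGS` are made up for this sketch.

```python
# Number of flags for each length suffix; whole notes have no stem at all.
FLAGS = {"": 0, "4": 0, "8": 1, "16": 2, "32": 3}

def render(line):
    """Render space-separated notes such as 'B/8' as ten rows of text."""
    rows = []
    for y in range(10):
        fill = "-" if y % 2 else " "        # staff lines sit on the odd rows
        row = ""
        for note in line.split():
            name, _, length = note.partition("/")
            rel = y - (5 - ord(name)) % 7    # 0..2: stem rows, 3: head row
            head = "O" if rel == 3 else fill
            stem = "|" if length and 0 <= rel < 3 else fill
            flag = "\\" if 0 <= rel < FLAGS[length] else fill
            row += fill + head + stem + flag  # every note is 4 columns wide
        rows.append(row)
    return rows

print("\n".join(render("B B/4 B/8 B/16 B/32 G/4 D/8 C/16 D B/16")))
```

The expression (5 - ord(name)) % 7 maps D to the top of the note's 4-row window at row 0 and E to row 6, so the head of an E lands on the bottom staff line.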

Golfscript (112 characters)
' '%:A;10,{):y;A{2/.0~|1=~:r;0=0=5\- 7%
4y@--:q' '' O'if-4q&!q*r*{16q/r<'|\\'
'| 'if}' 'if+{.32=y~&{;45}*}%}%n}%

Perl, 126 characters (115/122 with switches)
Perl in 239 226 218 216 183 180 178 172 157 142 136 133 129 128 126 chars
This 126 character solution in Perl is the result of a lengthy collaboration between myself and A. Rex.
@o=($/)x10;$/=$";map{m[/];$p=4+(5-ord)%7;
$_.=--$p?!($p&~3)*$'?16<$p*$'?" |\\":" | ":$/x4:" O ",
$|--&&y@ @-@for@o}<>;print@o
A. Rex also proposes a solution to run with the perl -ap switch. With 111(!)
characters in this solution plus 4 strokes for the extra command-line switch,
this solution has a total score of 115.
$\="$:
"x5;$p=4+(5-ord)%7,s@..@@,$\=~s@(.)\K$@--$p?
$_*!($p&~3)?"$1|".(16<$p*$_?"\\":$1).$1:$1x4:O.$1x3@gemfor@F
The first newline in this solution is significant.
Or 122 characters embedding the switches in the shebang line:
#!perl -ap
$\="$:
"x5;$p=4+(5-ord)%7,s@..@@,$\=~s@(.)\K$@--$p?$_*!($p&~3)?"$1|".(16<$p*$_?
"\\":$1).$1:$1x4:O.$1x3@gemfor@F
(first two newlines are significant).
Half-notes can be supported with an additional 12 chars:
@o=($/)x10;$/=$";map{m[/];$p=4+(5-ord)%7;
$_.=--$p?!($p&~3)*$'?16<$p*$'?" |\\":" | ":$/x4:$'>2?" @ ":" O ",
$|--&&y@ @-@for@o}<>;print@o

LilyPond - 244 bytes
Technically speaking, this doesn't adhere to the output specification, as the output is a nicely engraved PDF rather than a poor ASCII text substitute, but I figured the problem was just crying out for a LilyPond solution. In fact, you can remove the "\autoBeamOff\cadenzaOn\stemUp" to make it look even more nicely formatted. You can also add "\midi{}" after the "\layout{}" to get a MIDI file to listen to.
o=#(open-file"o""w")p=#ly:string-substitute
#(format o"~(~a"(p"2'1""2"(p"4'1""4"(p"6'1""6"(p"8'1""8"(p"/""'"(p"C""c'"(p"D""d'"(p" ""/1"(p"
"" "(ly:gulp-file"M")))))))))))#(close-port o)\score{{\autoBeamOff\cadenzaOn\stemUp\include"o"}\layout{}}
Usage: lilypond thisfile.ly
Notes:
The input must be in a file named "M" in the same directory as the program.
The input file must end in a newline. (Or save 9 bytes by having it end in a space.)
The output is a PDF named "thisfile.pdf", where "thisfile.ly" is the name of the program.
I tested this with LilyPond 2.12.2; other versions might not work.
I haven't done much in LilyPond, so I'm not sure this is the best way to do this, since it has to convert the input to LilyPond format, write it to an auxiliary file, and then read it in. I currently can't get the built-in LilyPond parser/evaluator to work. :(
Now working on an ASCII-output solution.... :)

C89 (186 characters)
#define P,putchar(
N[99];*n=N;y;e=45;main(q){for(;scanf(" %c/%d",n,n+1)>0;n
+=2);for(;y<11;q=y-(75-*n++)%7 P+q-4?e:79)P*n&&q<4&q>0?
124:e)P*n++/4>>q&&q?92:e))*n||(e^=13,n=N,y++P+10))P+e);}
Half-note support (+7 characters)
#define P,putchar(
N[99];*n=N;y;e=45;main(q){for(;scanf(" %c/%d",n,n+1)>0;n
+=2);for(;y<11;q=y-(75-*n++)%7 P+q-4?e:v<4?79:64)P*n&&q<4&q>0?
124:e)P*n++/4>>q&&q?92:e))*n||(e^=13,n=N,y++P+10))P+e);}

Python 178 characters
The 167 was a false alarm, I forgot to suppress the stems on the whole notes.
R=raw_input().split()
for y in range(10):
 r=""
 for x in R:o=y-(5-ord(x[0]))%7;b=" -"[y&1]+"O\|";r+=b[0]+b[o==3]+b[-(-1<o<3and''<x[1:])]+b[2*(-1<o<":862".find(x[-1]))]
 print r
Python 167 characters (Broken)
No room for the evil eye in this one, although there are 2 filler characters in there, so I added a smiley. This technique takes advantage of the uniqueness of the last character of the note lengths, so lucky for me that there are no 1/2 notes or 1/64 notes
R=raw_input().split()
for y in range(10):
 r=""
 for x in R:o=y-(5-ord(x[0]))%7;b=" -"[y&1]+"O\|";r+=b[0]+b[o==3]+b[-(-1<o<3)]+b[2*(-1<o<":862".find(x[-1]))]
 print r
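The "last character" trick can be checked in isolation. A small Python 3 sketch (the helper name `flag_limit` is made up here): the last character of each length is unique, so looking it up in ":862" gives the number of flag rows directly, and anything not found gives -1.

```python
def flag_limit(note):
    # ':' is only a placeholder so that '8' lands at index 1, '6' at 2, '2' at 3.
    # Whole notes ('B') and quarter notes ('B/4') are not found and give -1.
    return ":862".find(note[-1])

for note in ["B", "B/4", "B/8", "B/16", "B/32"]:
    limit = flag_limit(note)
    # Rows of the stem (0 is the top) that get a flag, as in -1<o<limit above.
    flag_rows = [o for o in range(3) if -1 < o < limit]
    print(note, limit, flag_rows)
```

A 1/2 or 1/64 note would break this, since '2' and '4' are already taken as last characters, which is the luck the text refers to.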
Python 186 characters <<o>>
Python uses the <<o>> evil eye operator to great effect here. The find() method returns -1 if the item is not found, which is why D doesn't need to appear in the 'CBAGFE' lookup string.
R=raw_input().split()
for y in range(10):
 r=""
 for x in R:o='CBAGFE'.find(x[0])+4;B=" -"[y%2];r+=B+(B,'O')[o==y]+(x[2:]and
 y+4>o>y and"|"+(B,'\\')[int(x[2:])<<o>>6+y>0]or B*2)
 print r
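To see why D can be left out of the lookup string and how the "evil eye" expression parses, here is a small Python 3 sketch. The function names `staff_row` and `has_flag` are hypothetical helpers written for this illustration; the expressions inside them are taken from the golfed code above.

```python
def staff_row(letter):
    """Row (0 = top) of the note head on the 10-row staff."""
    # find() returns -1 for "D", so D lands on row 3 without being listed.
    return "CBAGFE".find(letter) + 4


def has_flag(length, head_row, y):
    """True if a flag is drawn at row y for a note of the given length."""
    # + binds tighter than the shifts, so the golfed form
    # length<<o>>6+y parses as (length << o) >> (6 + y).
    in_stem = y + 4 > head_row > y
    return in_stem and (length << head_row) >> (6 + y) > 0


for letter in "DCBAGFE":
    print(letter, staff_row(letter))
```

For a row y that lies k rows above the head (k from 1 to 3), the shift reduces to length >> (6 - k), so an eighth note gets a flag only on the top stem row and a 1/32 note gets one on all three.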
11 extra bytes give a version with half notes:
R=raw_input().split()
for y in range(10):
 r=""
 for x in R:t='CBAGFE'.find(x[0])+4;l=x[2:];B=" -"[y%2];r+=B+(B,'#O'[l
 in'2'])[t==y]+(l and y+4>t>y and"|"+(B,'\\')[int(l)>>(6+y-t)>0]or B*2)
 print r
$ echo B B/2 B/4 B/8 B/16 B/32 G/4 D/8 C/16 D B/16| python notes.py
                              |\
------------------------------|---|\--------
      |   |   |\  |\  |\      |   |\      |\
------|---|---|---|\--|\-----#----|--O----|\
      |   |   |   |   |\  |      #        |
-O---O---#---#---#---#----|--------------#--
                          |
-------------------------#------------------

--------------------------------------------

159 Ruby chars
n=gets.split;9.downto(0){|p|m='- '[p%2,1];n.each{|t|r=(t[0]-62)%7;g=t[2..-1]
print m+(r==p ?'O'+m*2:p>=r&&g&&p<r+4?m+'|'+(g.to_i>1<<-p+r+5?'\\':m):m*3)}
puts}

Ruby 136
n=gets;10.times{|y|puts (b=' -'[y&1,1])+n.split.map{|t|r=y-(5-t[0])%7
(r==3?'O':b)+(t[1]&&0<=r&&r<3?'|'<<(r<t[2,2].to_i/8?92:b):b+b)}*b}
Ruby 139 (Tweet)
n=gets;10.times{|y|puts (b=' -'[y&1,1])+n.split.map{|t|r=y-(5-t[0])%7
(r==3?'O':b)+(t[1]&&0<=r&&r<3?'|'<<(r<141>>(t[-1]&7)&3?92:b):b+b)}*b}
Ruby 143
n=gets.split;10.times{|y|puts (b=' -'[y&1,1])+n.map{|t|r=y-(5-t[0])%7;m=t[-1]
(r==3?'O':b)+(m<65&&0<=r&&r<3?'|'<<(r<141>>(m&7)&3?92:b):b+b)}*b}
Ruby 148
Here is another way to calculate the flags: with m=ord(last character), #flags=1+m&3-(1&m/4).
A second way, #flags=141>>(m&7)&3, saves one more byte.
n=gets.split;10.times{|y|b=' -'[y&1,1];n.each{|t|r=y-(5-t[0])%7;m=t[-1]
print b+(r==3?'O':b)+(m<65&&0<=r&&r<3?'|'<<(r<141>>(m&7)&3?92:b):b+b)}
puts}
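Both flag formulas can be checked against the four note lengths that carry a stem. The sketch below is Python 3 rather than Ruby (the two languages agree on the precedence of the operators involved here, and `//` stands in for Ruby's integer `/`); `flags_a` and `flags_b` are names chosen for this illustration.

```python
def flags_a(m):
    # + and - bind tighter than &, so this is (1 + m) & (3 - (1 & (m // 4))).
    return 1 + m & 3 - (1 & m // 4)


def flags_b(m):
    # >> binds tighter than &, so this is (141 >> (m & 7)) & 3.
    # 141 is 0b10001101: a packed table of 2-bit flag counts indexed by m & 7.
    return 141 >> (m & 7) & 3


for length in ["4", "8", "16", "32"]:
    m = ord(length[-1])  # character code of the last digit
    print(length, flags_a(m), flags_b(m))
```

Both functions give 0, 1, 2 and 3 flags for lengths 4, 8, 16 and 32 respectively. Whole notes end in a letter, which is why the Ruby code guards the stem with m<65.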
Ruby 181
First try is a transliteration of my Python solution
n=gets.split;10.times{|y|r="";n.each{|x|o=y-(5-x[0])%7
r+=(b=" -"[y&1,1]+"O\\|")[0,1]+b[o==3?1:0,1]+b[-1<o&&o<3&&x[-1]<64?3:0,1]+b[-1<o&&o<(":862".index(x[-1]).to_i)?2:0,1]}
puts r}

F#, 458 chars
Reasonably short, and still mostly readable:
let s=Array.init 10(fun _->new System.Text.StringBuilder())
System.Console.ReadLine().Split([|' '|])
|>Array.iter(fun n->
    for i in 0..9 do s.[i].Append(if i%2=1 then"----"else"    ")
    let l=s.[0].Length
    let i=68-int n.[0]+if n.[0]>'D'then 7 else 0
    s.[i+3].[l-3]<-'O'
    if n.Length>1 then
        for j in i..i+2 do s.[j].[l-2]<-'|'
        for j in i..i-1+(match n.[2]with|'4'->0|'8'->1|'1'->2|_->3)do s.[j].[l-1]<-'\\')
for x in s do printfn"%s"(x.ToString())
With brief commentary:
// create 10 stringbuilders that represent each line of output
let s=Array.init 10(fun _->new System.Text.StringBuilder())
System.Console.ReadLine().Split([|' '|])
// for each note on the input line
|>Array.iter(fun n->
    // write the staff
    for i in 0..9 do s.[i].Append(if i%2=1 then"----"else"    ")
    // write note (math so that 'i+3' is which stringbuilder should hold the 'O')
    let l=s.[0].Length
    let i=68-int n.[0]+if n.[0]>'D'then 7 else 0
    s.[i+3].[l-3]<-'O'
    // if partial note
    if n.Length>1 then
        // write the bar
        for j in i..i+2 do s.[j].[l-2]<-'|'
        // write the tails if necessary
        for j in i..i-1+(match n.[2]with|'4'->0|'8'->1|'1'->2|_->3)do s.[j].[l-1]<-'\\')
// print output
for x in s do printfn"%s"(x.ToString())
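The commented F# version translates fairly directly into other languages. A rough Python 3 sketch of the same buffer-per-row approach follows. `render` and its variable names are introduced here for illustration, and the modulo used to find the top of the stem is a substitution for the F# `if n.[0]>'D'` adjustment, not the author's code.

```python
def render(notes):
    """Render note strings like "B/8" onto a 10-row ASCII staff."""
    rows = [[] for _ in range(10)]
    for note in notes:
        # Append one 4-column cell of staff to every row.
        for y, row in enumerate(rows):
            row.extend(("-" if y % 2 else " ") * 4)
        col = len(rows[0]) - 4  # first column of this note's cell
        letter, _, length = note.partition("/")
        # Row where the stem starts; the head sits 3 rows below it.
        top = (ord("D") - ord(letter)) % 7
        rows[top + 3][col + 1] = "O"
        if length:  # anything shorter than a whole note gets a stem
            for y in range(top, top + 3):
                rows[y][col + 2] = "|"
            # Same mapping as the F# match: 4 -> 0, 8 -> 1, 16 -> 2, otherwise 3.
            flags = {"4": 0, "8": 1, "16": 2}.get(length, 3)
            for y in range(top, top + flags):
                rows[y][col + 3] = "\\"
    return ["".join(row) for row in rows]


if __name__ == "__main__":
    print("\n".join(render("B B/4 D/8".split())))
```

Like the F# original, this draws every head as 'O' and does not special-case half notes.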

C 196 characters <<o>>
Borrowing a few ideas from strager. Interesting features include the n+++1 "triple +" operator and the <<o>> "evil eye" operator.
#define P,putchar
N[99];*n=N;y;b;main(o){for(;scanf(" %c/%d",n,n+1)>0;n+=2);for(;y<11;)
n=*n?n:(y++P(10),N)P(b=y&1?32:45)P((o=10-(*n+++1)%7-y)?b:79)P(0<o&o<4&&*n?'|':b)
P(*n++<<o>>6&&0<o&o<4?92:b);}

168 characters in Perl 5.10
My original solution was 276 characters, but lots and lots of tweaking reduced it by more than 100 characters!
$_=<>;
y#481E-GA-D62 #0-9#d;
s#.(/(.))?#$"x(7+$&).O.$"x($k=10).($1?"|":$")x3 .$"x(10-$2)."\\"x$2.$"x(9-$&)#ge;
s#(..)*?\K (.)#-$2#g;
print$/while--$k,s#.{$k}\K.#!print$&#ge
If you have a minor suggestion that improves this, please feel free to just edit my code.

Lua, 307 Characters
b,s,o="\\",io.read("*l"),io.write for i=1,10 do for n,l in
s:gmatch("(%a)/?(%d*)")do x=n:byte() w=(x<69 and 72 or 79)-x
l=tonumber(l)or 1 d=i%2>0 and" "or"-"o(d..(i==w and"O"or
d)..(l>3 and i<w and i+4>w and"|"or d)..(l>7 and i==w-3
and b or l>15 and i==w-2 and b or l>31 and i==w-1 and b or
d))end o"\n"end

C -- 293 characters
Still needs more compression, and it takes the args on the command line instead of reading them...
i,j,k,l;main(c,v)char **v;{char*t;l=4*(c-1)+2;t=malloc(10*l)+1;for(i=0;i<10;i
++){t[i*l-1]='\n';for(j=0;j<l;j++)t[i*l+j]=i&1?'-':' ';}t[10*l-1]=0;i=1;while
(--c){j='G'-**++v;if(j<3)j+=7;t[j*l+i++]='O';if(*++*v){t[--j*l+i]='|';t[--j*l
+i]='|';t[--j*l+i]='|';if(*++*v!='4'){t[j++*l+i+1]='\\';if(**v!='8'){t[j++*l+
i+1]='\\';if(**v!='1'){t[j++*l+i+1]='\\';}}}}i+=3;}puts(t);}
edit: fixed the E
edit: down to 293 characters, including the newlines...
#define X t[--j*l+i]='|'
#define Y t[j++*l+i+1]=92
i,j,k,l;main(c,v)char**v;{char*t;l=4*(c-1)+2;t=malloc(10*l)+1;for(i=10;i;)t[--i*
l-1]=10,memset(t+i*l,i&1?45:32,l-1);t[10*l-1]=0;for(i=1;--c;i+=3)j=71-**++v,j<3?
j+=7:0,t[j*l+i++]=79,*++*v?X,X,X,*++*v-52?Y,**v-56?Y,**v-49?Y:0:0:0:0;puts(t);}
