Get the last line with consecutive pattern - tcl

I have a pattern at the start of each line, and when the same pattern occurs on consecutive lines of the file I want to keep only the last of those lines.
Example file:
apple 1
banana 5
banana 6
apple 2
apple 5
apple 7
banana 9
Expected output:
apple 1
banana 6
apple 7
banana 9

Assuming that each line is a proper list, it's a matter of remembering the previous line and printing it when its first element differs from that of the current line.
# $fin is an already-open channel, e.g. from [open $filename]
gets $fin oldline; # Assume there's at least one line for simplicity of coding
while {[gets $fin newline] >= 0} {
    if {[lindex $newline 0] ne [lindex $oldline 0]} {
        puts $oldline; # There was a difference, so print out the old one
    }
    set oldline $newline; # Save the new line we read for the next iteration
}
puts $oldline; # The last line to be read hasn't been printed yet
Determining whether two lines share the same pattern is the main problem; with real data it's likely to be more complex than just applying lindex. This is where you get into using regexp or scan to parse the data, and how you do that is a non-trivial problem that requires actually understanding the format of the real data.
Dealing with the case of having no lines at all is a separate matter. Handle it by checking the return value of that initial gets: if it is less than zero, skip both the loop and the final puts.
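For comparison, here is a minimal sketch of the same "keep the last line of each run" logic in Python, assuming (like the Tcl code) that the pattern is the first whitespace-separated field of each line. The function name last_of_runs is hypothetical. itertools.groupby groups consecutive lines with equal keys, which replaces the "remember the previous line" bookkeeping, and an empty input naturally produces no output.

```python
from itertools import groupby

def last_of_runs(lines):
    """Return the last line of each run of consecutive lines that share a key.

    Assumption: the key is the first whitespace-separated field, mirroring
    `lindex $line 0` in the Tcl answer; blank lines get the empty key.
    """
    result = []
    for _key, run in groupby(lines, key=lambda line: (line.split() or [""])[0]):
        *_, last = run  # consume the run and keep only its final line
        result.append(last)
    return result
```

Usage, with a hypothetical file name: `with open("input.txt") as f: print("\n".join(last_of_runs(f.read().splitlines())))`.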

Related

How to do a low RAM full cross join?

I hope to perform a full self-cross join on a large data file of points. However, I cannot do the operation in an ordinary programming language because the data is too large to store in memory. I would like to find all combinations of points within the set. Below is an example of my dataset.
x y
1 9
2 8
3 7
4 6
5 5
I would like to cross join this data to generate a 25-row table containing all combinations of points. Is there a low-memory solution, perhaps with awk?
Thank you,
Nicholas Hayden
P.S. I am a novice programmer.
Perhaps do it in two steps: write the header to the output file and each column to its own file (col1 and col2), then join col1 with col2 and append the result to the output file.
awk 'NR==1{print > "cross"} NR>1 {print $1 > "col1"; print $2 > "col2"}' file
join -j9 col1 col2 -o1.1,2.1 >> cross
rm col1 col2
Obviously, make sure the temporary and final file names won't clash with existing ones.
Note that the join command on macOS doesn't have the -j option, so use the equivalent per-file options -1 and -2 instead:
join -19 -29 col1 col2 -o1.1,2.1 >> cross
In both alternatives we're asking join to use the non-existent 9th field as the key. That field is empty on every line, so every line of the first file matches every line of the second, which generates the cross product of the two files.
If memory usage weren't an issue I'd probably do this:
$ awk 'NR==1 { print; next }              # print the header
       { x[NR]=$1; y[NR]=$2 }             # read data into two arrays x and y
       END { for(i=2;i<=NR;i++)
               for(j=2;j<=NR;j++)
                 print x[i],y[j]          # print all combinations of x and y
       }' file
Keeping memory usage low requires keeping data out of memory, and that means accessing the file a lot. So while processing FILENAME for x, open the same file again through another variable (file, below) and read it record by record for y:
$ awk 'NR==1 { print; next }                 # print header
       { file=FILENAME; x=$1; nr=1           # duplicate FILENAME, keep $1, reset the counter nr
         while((getline <file) > 0)          # process file record by record
           if(nr++>1) { print x,$2 }         # print $1 of FILENAME and $2 of file
         close(file) }' file                 # close the file
x y
1 9
1 8
1 7
1 6
1 5
2 9
...
I'd probably never use that code as-is for anything useful, but maybe you can combine those two solutions into something suitable.
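The same low-memory idea (re-read the file once per outer record instead of holding it in memory) can be sketched in Python. The function name cross_join is hypothetical, and the input is assumed to be whitespace-separated with a one-line header, as in the question. Only one line per pass is held in memory; the price is one full re-read of the file for every data row.

```python
def cross_join(path, out):
    """Write the header of `path`, then every (x, y) pair, to the text stream `out`.

    Mirrors the getline-based awk answer: the outer loop streams the x column,
    and for each x the file is re-opened and streamed again for the y column.
    """
    with open(path) as outer:
        header = next(outer, None)
        if header is None:          # empty file: nothing to do
            return
        out.write(header)
        for line in outer:
            fields = line.split()
            if not fields:          # skip blank lines
                continue
            x = fields[0]
            with open(path) as inner:
                next(inner)         # skip the header line
                for other in inner:
                    cols = other.split()
                    if len(cols) >= 2:
                        out.write(f"{x} {cols[1]}\n")
```

Usage: `import sys; cross_join("file", sys.stdout)` writes the 25 pairs for the example data after the header.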

TCL -- How to store and print table of values?

I know that you can "hack" nesting associative arrays in Tcl, and I also know that with dictionaries (which I have no experience with) you can nest them pretty easily. I'm trying to find a way to store the values of a function of two variables, and then at the end print a table with the two variables as column and row headers and the function values in the cells. I can make this work, but it is neither succinct nor efficient.
Here's what should be printed. The rows are values of a and the columns are values of b (1, 2, 3, 4, 5 for simplicity):
b
1 2 3 4 5
1 y(1,1) y(1,2) y(1,3) y(1,4) y(1,5)
2 y(2,1) y(2,2) y(2,3) y(2,4) y(2,5)
a 3 y(3,1) y(3,2) y(3,3) y(3,4) y(3,5)
4 y(4,1) y(4,2) y(4,3) y(4,4) y(4,5)
5 y(5,1) y(5,2) y(5,3) y(5,4) y(5,5)
To store this, I imagine I would simply do two nested for loops over a and b and somehow store the results in nested dictionaries: one dictionary with 5 entries, one for each value of a, where each entry is another dictionary keyed by the values of b.
To print it, the only way I can think of is to explicitly print out each table line and look up each dictionary entry. I'm not very versed in output formatting with Tcl, but I can probably manage there.
Can anyone think of a more elegant way to do this?
Here are a couple of examples on how you might use the struct::matrix package.
Example 1 - Simple Create/Display
package require struct::matrix
package require Tclx
# Create a 3x4 matrix
set m [::struct::matrix]
$m add rows 3
$m add columns 4
# Populate data
$m set rect 0 0 {
    {1 2 3 4}
    {5 6 7 8}
    {9 10 11 12}
}
# Display it
puts "Print matrix, cell by cell:"
loop y 0 [$m rows] {
    loop x 0 [$m columns] {
        puts -nonewline [format "%4d" [$m get cell $x $y]]
    }
    puts ""
}
Output
Print matrix, cell by cell:
1 2 3 4
5 6 7 8
9 10 11 12
Discussion
In the first part of the script, I created a matrix and added 3 rows and 4 columns, a straightforward process.
Next, I called set rect to populate the matrix with data. Depending on your needs, you might want to look into set cell, set column, or set row. For more information, please consult the reference for struct::matrix.
When it comes to displaying the matrix, instead of using Tcl's for command, I prefer the loop command from the Tclx package, which is simpler to read and use.
Example 2 - Read from a CSV file
package require csv
package require struct::matrix
package require Tclx
# Read a matrix from a CSV file
set m [::struct::matrix]
set fileHandle [open data.csv]
::csv::read2matrix $fileHandle $m "," auto
close $fileHandle
# Displays the matrix
loop y 0 [$m rows] {
    loop x 0 [$m columns] {
        puts -nonewline [format "%4d" [$m get cell $x $y]]
    }
    puts ""
}
The data file, data.csv:
1,2,3,4
5,6,7,8
9,10,11,12
Output
1 2 3 4
5 6 7 8
9 10 11 12
Discussion
The csv package provides a simple way to read from a CSV file to a matrix.
The heart of the operation is in the ::csv::read2matrix command, but before that, I have to create an empty matrix and open the file for reading.
The code to display the matrix is the same as previous example.
Conclusion
While the struct::matrix package seems complicated at first, I only need to learn a couple of commands to get started.
Elegance is in the eye of the beholder :)
With basic core Tcl, I think you understand your options reasonably well. Both arrays and nested dictionaries have clunky edges when it comes to tabular data.
If you are willing to explore extensions (and Tcl is all about extensions), then you might consider the struct::matrix package from Tcllib, the standard Tcl library. It deals with rows and columns as key concepts. If you need to do transformations on tabular data, then I would suggest TclRAL, a relational algebra library that defines a Relation data type, handles all kinds of tabular data and provides a large number of operations on it. Alternatively, you could try something like SQLite, which also handles tabular data, provides ways to manipulate it and has robust persistent storage. The Tcl wiki will direct you to details of all of these extensions.
However, if these seem too heavyweight for your taste or if you don't want to suffer the learning curve, rolling up your sleeves and banging out an array or nested dictionary solution, while certainly being rather ad hoc, is probably not that difficult. Elegant? Well, that's for you to judge.
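To give a feel for how small the ad hoc nested-dictionary route can be, here is a sketch in Python rather than Tcl. The names tabulate, f and width are hypothetical; f stands in for the question's function y, rows are values of a and columns are values of b. The outer dictionary is keyed by a and each inner dictionary by b, and printing is just right-aligned fixed-width columns.

```python
def tabulate(f, rows, cols, width=8):
    """Store f(a, b) in a nested dict table[a][b] and render it as text,
    with the a values down the left and the b values across the top."""
    table = {a: {b: f(a, b) for b in cols} for a in rows}
    lines = [" " * width + "".join(f"{b:>{width}}" for b in cols)]
    for a in rows:
        cells = "".join(f"{table[a][b]:>{width}}" for b in cols)
        lines.append(f"{a:>{width}}" + cells)
    return table, "\n".join(lines)
```

Usage: `table, text = tabulate(lambda a, b: a * b, range(1, 6), range(1, 6)); print(text)` prints a 5x5 multiplication table, and `table[3][4]` looks up a single stored value.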
Nested lists work reasonably well for tabular data from 8.4 onwards (with multi-index lindex and lset) provided you've got compact numeric indices. 8.5's lrepeat is good for constructing an initial matrix too.
set mat [lrepeat 5 [lrepeat 5 0.0]]
lset mat 2 3 1.3
proc printMatrix {mat} {
    set height [llength $mat]
    set width [llength [lindex $mat 0]]
    for {set j 0} {$j < $width} {incr j} {
        puts -nonewline \t$j
    }
    puts ""
    for {set i 0} {$i < $height} {incr i} {
        puts -nonewline $i
        for {set j 0} {$j < $width} {incr j} {
            puts -nonewline \t[lindex $mat $i $j]
        }
        puts ""
    }
}
printMatrix $mat
You should definitely consider using the struct::matrix and report packages from Tcllib.
package require csv
package require struct::matrix
array set OPTS [subst {
    csv_input_filename {input.csv}
}]
::struct::matrix indata
set chan [open $OPTS(csv_input_filename)]
csv::read2matrix $chan indata , auto
close $chan
# prints matrix as list format
puts [join [indata get rect 0 0 end end] \n]
# prints matrix as csv format (joinmatrix returns a string, so print it)
puts [csv::joinmatrix indata]
# cleanup
indata destroy
These are one-liners that print the matrix in list format and CSV format, respectively.

What's the best way to join two lists?

I have two lists that contain some data (numeric data and/or strings).
How do I join these two lists, assuming that the lists do not contain sublists?
Which choice is preferred and why?
set first [concat $first $second]
lappend first $second
append first " $second"
It is fine to use concat and that is even highly efficient in some cases (it is the recommended technique in 8.4 and before, and not too bad in later versions). However, your second option with lappend will not work at all, and the version with append will work, but will also be horribly inefficient.
Other versions that will work:
# Strongly recommended from 8.6.1 on
set first [list {*}$first {*}$second]
lappend first {*}$second
The reason why the first of those is recommended from 8.6.1 onwards is that the compiler is able to optimise it to a direct "list-concatenate" operation.
Examples
% set first {a b c}
a b c
% set second {1 2 3}
1 2 3
% set first [concat $first $second]; # #1 is correct
a b c 1 2 3
% set first {a b c}
a b c
% lappend first $second; # #2 is wrong: appends the whole `second` list to `first` as a single element
a b c {1 2 3}
Discussion
I looked up the documentation, experimented with some lists, and found that:
Your first choice, concat is correct
lappend does not work because it treats $second as one element, not a list
append works, but you are treating your lists as string. I don't know what the implications are, but it does not communicate the intention that first and second are lists.
This may be a bit late, but I wanted to clarify:
As already stated, the standard way to merge two lists before Tcl 8.6 is concat. However, note that concat gets very inefficient with long lists, since it analyzes the lists as part of the merge: the larger the lists get, the slower they merge.
Neither append command merges lists: lappend adds to an existing list and append adds to an existing string variable. Neither has an impact on speed, since they do not analyze anything when appending.
If each list element is a single word, you can merge the lists with set first [join [lappend first $second]], but only when every element in each list is simple (i.e. contains no spaces).
To add to the other answers, I ran a rough benchmark comparing the different versions (tclsh 8.6.13).
#! /usr/bin/env tclsh
set a {1}
for {set i 0} {$i < 25} {incr i} {
    switch $argv {
        list {
            set a [list {*}$a {*}$a]
        }
        concat {
            set a [concat $a $a]
        }
        lappend {
            lappend a {*}$a
        }
        append {
            append a " $a"
        }
    }
}
Results:
./test.tcl lappend 0.28s user 0.51s system 99% cpu 0.795 total
./test.tcl list 0.22s user 0.29s system 99% cpu 0.511 total
./test.tcl append 0.04s user 0.08s system 99% cpu 0.115 total
./test.tcl concat 0.04s user 0.08s system 99% cpu 0.112 total
Note that the semantics aren't quite the same between the different versions. For example, list will re-quote list elements.

Is there any (opposite of newline) char?

I was wondering whether we could print from right to left, or bottom to top. I got this thought while trying to write a program to print the following square (for an input n; here n=4):
1 2 3 4
12 13 14 5
11 16 15 6
10 9 8 7
This could be solved in many ways, for example by storing the values in a 2D array and printing the array (in any language: Perl, C, C++, Java).
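The 2D-array approach can be sketched as follows, in Python and with a hypothetical function name spiral. It walks the four edges of a shrinking boundary (top row, right column, bottom row, left column), writing increasing values, so printing the grid afterwards is an ordinary top-to-bottom, left-to-right loop.

```python
def spiral(n):
    """Return an n x n grid filled clockwise from the top-left corner inward."""
    grid = [[0] * n for _ in range(n)]
    top, bottom, left, right = 0, n - 1, 0, n - 1
    value = 1
    while top <= bottom and left <= right:
        for c in range(left, right + 1):         # left to right along the top row
            grid[top][c] = value
            value += 1
        top += 1
        for r in range(top, bottom + 1):         # down the right column
            grid[r][right] = value
            value += 1
        right -= 1
        if top <= bottom:
            for c in range(right, left - 1, -1):  # right to left along the bottom row
                grid[bottom][c] = value
                value += 1
            bottom -= 1
        if left <= right:
            for r in range(bottom, top - 1, -1):  # up the left column
                grid[r][left] = value
                value += 1
            left += 1
    return grid
```

Usage: `print("\n".join(" ".join(f"{v:2}" for v in row) for row in spiral(4)))` prints the square from the question.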
The long answer is that you can do whatever the terminal supports. There are many kinds of terminals (or “character output devices”), and many of them support cursor motion. (You can look at the Termcap Library project to get a picture of what different terminal types do.) There is a terminal command for moving up a line, so essentially yes, you should be able to do that. After poking around in the termcap database, I came up with the following:
$ printf "\n"; printf '\e[A'; echo Foo
Foo
In other words, the \e[A string has a non-zero chance to get you one line up. On some terminals :)
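On terminals that understand ANSI/VT100-style escape sequences (an assumption; as noted, not all terminals do), the same trick extends from "one line up" to absolute cursor positioning, which lets you write cells in any order, bottom row first if you like. This Python sketch uses ESC[2J (erase display) and ESC[row;colH (cursor position, 1-based). The function name place and its ((row, col), text) cell format are hypothetical.

```python
CSI = "\x1b["  # ESC [ : Control Sequence Introducer on ANSI/VT100-style terminals

def place(cells, width=4):
    """Return one string that clears the screen, then writes each ((row, col), text)
    entry at an absolute position using the cursor-position sequence ESC[row;colH.
    Rows and columns here are 0-based; the terminal's are 1-based."""
    out = [CSI + "2J"]  # erase the whole display
    for (row, col), text in cells:
        # jump straight to the cell, so cells may be written in any order
        out.append(f"{CSI}{row + 1};{col * width + 1}H{text:>{width - 1}}")
    last_row = max((row for (row, _), _ in cells), default=-1)
    out.append(f"{CSI}{last_row + 2};1H")  # park the cursor below the drawing
    return "".join(out)
```

Usage: `print(place([((3, 0), "10"), ((0, 0), "1")]), end="")` draws the bottom-left value of the square before the top-left one.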
Basically this is possible, but not on a traditional line-based terminal. When drawing on the screen pixel by pixel, it's quite easy to solve this problem. However, ASCII does not define any real counterpart to \n.
Or maybe this could be achieved by changing the input method of the terminal to some culture that reads left to right and bottom to top.

Code Golf: Frobenius Number

Write the shortest program that calculates the Frobenius number for a given set of positive integers. The Frobenius number is the largest number that cannot be written as a sum of non-negative integer multiples of the numbers in the set.
Example: For the set of the Chicken McNugget™ sizes [6,9,20] the Frobenius number is 43, as there is no solution for the equation a*6 + b*9 + c*20 = 43 (with a,b,c >= 0), and 43 is the largest value with this property.
It can be assumed that a Frobenius number exists for the given set. If this is not the case (e.g. for [2,4]) no particular behaviour is expected.
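As a non-golfed reference point, here is a sketch in Python of the "mark every reachable total" idea that several answers use. The function name frobenius is hypothetical. Instead of scanning up to the product of the inputs, it stops once min(nums) consecutive totals are reachable (the stopping rule of the second Haskell answer), because adding copies of the smallest number then reaches every larger total. Raising ValueError when the gcd is not 1 is an added assumption; the question leaves that case unspecified.

```python
from functools import reduce
from math import gcd

def frobenius(nums):
    """Largest total that is NOT a sum of non-negative multiples of nums.

    Assumes positive integers whose gcd is 1; returns -1 if 1 is in the set.
    """
    if reduce(gcd, nums) != 1:
        # e.g. [2, 4]: every sum is even, so no Frobenius number exists
        raise ValueError("gcd of the set must be 1")
    smallest = min(nums)
    reachable = [True]  # total 0 is the empty sum
    run = 1             # length of the current streak of reachable totals
    n = 0
    while run < smallest:
        n += 1
        ok = any(k <= n and reachable[n - k] for k in nums)
        reachable.append(ok)
        run = run + 1 if ok else 0
    # the streak covers n - smallest + 1 .. n, so the last gap is just before it
    return n - smallest
```

Usage: `frobenius([6, 9, 20])` returns 43, matching the example above.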
References:
http://en.wikipedia.org/wiki/Coin_problem
http://mathworld.wolfram.com/FrobeniusNumber.html
[Edit]
I decided to accept the GolfScript version. While the MATHEMATICA version might be considered "technically correct", it would clearly take the fun out of the competition. That said, I'm also impressed by the other solutions, especially Ruby (which was very short for a general purpose language).
Mathematica 0 chars (or 19 chars counting the invoke command)
Invoke with
FrobeniusNumber[{a,b,c,...}]
Example
In[3]:= FrobeniusNumber[{6, 9, 20}]
Out[3]= 43
Is it a record? :)
Ruby 100 86 80 chars
(newline not needed)
Invoke with frob.rb 6 9 20
a=$*.map &:to_i;
p ((1..eval(a*"*")).map{|i|a<<i if(a&a.map{|v|i-v})[0];i}-a)[-1]
Works just like the Perl solution (except better:). $* is an array of command line strings; a is the same array as ints, which is then used to collect all the numbers which can be made; eval(a*"*") is the product, the max number to check.
In Ruby 1.9, you can save one more character by replacing "*" with ?*.
Edit: Shortened to 86 using Symbol#to_proc in $*.map, inlining m and shortening its calculation by folding the array.
Edit 2: Replaced .times with .map, traded .to_a for ;i.
Mathematica PROGRAM - 28 chars
Well, this is a REAL (unnecessary) program. As the other Mathematica entry shows clearly, you can compute the answer without writing a program ... but here it is
f[x__]:=FrobeniusNumber[{x}]
Invoke with
f[6, 9, 20]
43
GolfScript 47/42 chars
Faster solution (47).
~:+{0+{.1<{$}{1=}if|}/.!1):1\{:X}*+0=-X<}do];X(
Slow solution (42). Checks all values up to the product of every number in the set...
~:+{*}*{0+{.1<{$}{1=}if|}/1):1;}*]-1%.0?>,
Sample I/O:
$ echo "[6 9 20]"|golfscript frobenius.gs
43
$ echo "[60 90 2011]"|golfscript frobenius.gs
58349
Haskell 155 chars
The function f does the work and expects the list to be sorted. For example f [6,9,20] = 43
b x n=sequence$replicate n[0..x]
f a=last$filter(not.(flip elem)(map(sum.zipWith(*)a)(b u(length a))))[1..u] where
 h=head a
 l=last a
 u=h*l-h-l
P.S. Since this is my first code golf submission, I'm not sure how to handle input. What are the rules?
C#, 360 characters
using System;using System.Linq;class a{static void Main(string[]b)
{var c=(b.Select(d=>int.Parse(d))).ToArray();int e=c[0]*c[1];a:--e;
var f=c.Length;var g=new int[f];g[f-1]=1;int h=1;for(;;){int i=0;for
(int j=0;j<f;j++)i+=c[j]*g[j];if(i==e){goto a;}if(i<e){g[f-1]++;h=1;}
else{if(h>=f){Console.Write(e);return;}for(int k=f-1;k>=f-h;k--)
g[k]=0;g[f-h-1]++;h++;}}}}
I'm sure there's a shorter C# solution than this, but this is what I came up with.
This is a complete program that takes the values as command-line parameters and outputs the result to the screen.
Perl 105 107 110 119 122 127 152 158 characters
Latest edit: Compound assignment is good for you!
$h{0}=$t=1;$t*=$_ for@ARGV;for$x(1..$t){$h{$x}=grep$h{$x-$_},@ARGV}@b=grep!$h{$_},1..$t;print pop@b,"\n"
Explanation:
$t = 1;
$t *= $_ foreach(@ARGV);
Set $t to the product of all of the input numbers. This is our upper limit.
foreach $x (1..$t)
{
    $h{$x} = grep {$_ == $x || $h{$x-$_} } @ARGV;
}
For each number from 1 to $t: If it's one of the input numbers, mark it using the %h hash; otherwise, if there is a marked entry from further back (difference being anything in the input), mark this entry. All marked entries are non-candidates for Frobenius numbers.
@b=grep{!$h{$_}}(1..$t);
Extract all UNMARKED entries. These are Frobenius candidates...
print pop @b, "\n"
...and the last of these, the highest, is our Frobenius number.
Haskell 153 chars
A different take on a Haskell solution. I'm a rank novice at Haskell, so I'd be surprised if this couldn't be shortened.
m(x:a)(y:b)
 |x==y=x:m a b
 |x<y=x:m(y:b)a
 |True=y:m(x:a)b
f d=l!!s-1where
 l=0:foldl1 m[map(n+)l|n<-d]
 g=minimum d
 s=until(\n->l!!(n+g)-l!!n==g)(+1)0
Call it with, e.g., f [9,6,20].
FrobeniusScript 5 characters
solve
Sadly there does not yet exist any compiler/interpreter for this language.
No params, the interpreter will handle that:
$ echo solve > myProgram
$ frobeniusScript myProgram
6
9
20
^D
Your answer is: 43
$ exit