To commemorate the public launch of Stack Overflow, what's the shortest code to cause a stack overflow? Any language welcome.
ETA: Just to be clear on this question, seeing as I'm an occasional Scheme user: tail-call "recursion" is really iteration, and any solution which can be converted to an iterative solution relatively trivially by a decent compiler won't be counted. :-P
ETA2: I've now selected a “best answer”; see this post for rationale. Thanks to everyone who contributed! :-)
Read this line, and do what it says twice.
All these answers and no Befunge? I'd wager a fair amount that it's the shortest solution of them all:
1
Not kidding. Try it yourself: http://www.quirkster.com/iano/js/befunge.html
EDIT: I guess I need to explain this one. The 1 pushes a 1 onto Befunge's internal stack, and the lack of anything else puts the program in a loop under the rules of the language.
Using the interpreter provided, you will eventually--and I mean eventually--hit a point where the JavaScript array that represents the Befunge stack becomes too large for the browser to reallocate. If you had a simple Befunge interpreter with a smaller, bounded stack--as is the case with most of the languages below--this program would cause a more noticeable overflow faster.
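For the curious, here is a toy model of why that happens (a hypothetical Python sketch of an interpreter main loop, not the linked implementation): the one-cell playfield wraps around, so the same instruction executes forever and every pass pushes another 1.
# Toy model of the Befunge program "1": the playfield wraps around,
# so each pass pushes another 1 and the stack only ever grows,
# until the host runs out of memory.
program = "1"
stack = []
pc = 0
while True:
    if program[pc] == "1":
        stack.append(1)            # push the literal 1
    pc = (pc + 1) % len(program)   # wrap back to the same cell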
You could also try this in C#.NET:
throw new StackOverflowException();
Nemerle:
This crashes the compiler with a StackOverflowException:
def o(){[o()]}
My current best (in x86 assembly) is:
push eax
jmp short $-1
which results in 3 bytes of object code (50 EB FD). For 16-bit code, this is also possible:
call $
which also results in 3 bytes (E8 FD FF).
PIC18
The PIC18 answer given by TK results in the following instructions (binary):
overflow
PUSH
0000 0000 0000 0101
CALL overflow
1110 1100 0000 0000
0000 0000 0000 0000
However, CALL alone will perform a stack overflow:
CALL $
1110 1100 0000 0000
0000 0000 0000 0000
Smaller, faster PIC18
But RCALL (relative call) is smaller still (the target is a relative offset rather than a full address, so no need for the extra 2 bytes):
RCALL $
1101 1000 0000 0000
So the smallest on the PIC18 is a single instruction, 16 bits (two bytes). Each loop takes 2 instruction cycles; at 4 clock cycles per instruction cycle, that's 8 clock cycles per loop. The PIC18 has a 31-level stack, so on the 32nd loop it will overflow the stack, after 256 clock cycles. At 64 MHz, you would overflow the stack in 4 microseconds and 2 bytes.
PIC16F5x (even smaller and faster)
However, the PIC16F5x series uses 12-bit instructions:
CALL $
1001 0000 0000
Again, two instruction cycles per loop and 4 clocks per instruction cycle, so 8 clock cycles per loop.
However, the PIC16F5x has a two-level stack, so on the third loop it would overflow, after 24 clock cycles. At 20 MHz, it would overflow in 1.2 microseconds and 1.5 bytes.
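A quick sanity check on that arithmetic (a Python sketch; the figures are the ones stated above: 8 clocks per CALL loop, overflow on the call after the stack is full):
# Overflow time = loops * clocks-per-loop / clock rate,
# with one more loop than there are stack levels.
def overflow_time_us(stack_levels, clocks_per_loop, clock_hz):
    return (stack_levels + 1) * clocks_per_loop / clock_hz * 1e6

print(overflow_time_us(31, 8, 64e6))  # PIC18 @ 64 MHz    -> 4.0
print(overflow_time_us(2, 8, 20e6))   # PIC16F5x @ 20 MHz -> 1.2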
Intel 4004
The Intel 4004 has an 8-bit call subroutine instruction:
CALL $
0101 0000
For the curious, that corresponds to ASCII 'P'. With a 3-level stack, that takes 24 clock cycles for a total of 32.4 microseconds and one byte. (Unless you overclock your 4004 -- come on, you know you want to.)
Which is as small as the Befunge answer, but much, much faster than the Befunge code running in current interpreters.
C#:
public int Foo { get { return Foo; } }
Hoot overflow!
// v___v
let rec f o = f(o);(o)
// ['---']
// -"---"-
Every task needs the right tool. Meet the SO Overflow language, optimized to produce stack overflows:
so
TeX:
\def~{~.}~
Results in:
! TeX capacity exceeded, sorry [input stack size=5000].
~->~
.
~->~
.
~->~
.
~->~
.
~->~
.
~->~
.
...
<*> \def~{~.}~
LaTeX:
\end\end
Results in:
! TeX capacity exceeded, sorry [input stack size=5000].
\end #1->\csname end#1
\endcsname \@checkend {#1}\expandafter \endgroup \if@e...
<*> \end\end
Z-80 assembler -- at memory location 0x0000:
rst 00
one byte -- 0xC7 -- endless loop of pushing the current PC to the stack and jumping to address 0x0000.
In English:
recursion = n. See recursion.
Another PHP Example:
<?
require(__FILE__);
How about the following in BASIC:
10 GOSUB 10
(I don't have a BASIC interpreter I'm afraid so that's a guess).
I loved Cody's answer heaps, so here is my similar contribution, in C++:
template <int i>
class Overflow {
typedef typename Overflow<i + 1>::type type;
};
typedef Overflow<0>::type Kaboom;
Not a code golf entry by any means, but still, anything for a meta stack overflow! :-P
Here's my C contribution, weighing in at 18 characters:
void o(){o();o();}
This is a lot harder to tail-call optimise! :-P
Using a Windows batch file named "s.bat":
call s
JavaScript
To trim a few more characters, and to get ourselves kicked out of more software shops, let's go with:
eval(i='eval(i)');
Groovy:
main()
$ groovy stack.groovy:
Caught: java.lang.StackOverflowError
at stack.main(stack.groovy)
at stack.run(stack.groovy:1)
...
Please tell me what the acronym "GNU" stands for.
Person JeffAtwood;
Person JoelSpolsky;
JeffAtwood.TalkTo(JoelSpolsky);
Here's hoping for no tail recursion!
C - It's not the shortest, but it's recursion-free. It's also not portable: it crashes on Solaris, but some alloca() implementations might return an error here (or call malloc()). The call to printf() is necessary.
#include <stdio.h>
#include <alloca.h>
#include <sys/resource.h>
int main(int argc, char *argv[]) {
struct rlimit rl = {0};
getrlimit(RLIMIT_STACK, &rl);
(void) alloca(rl.rlim_cur);
printf("Goodbye, world\n");
return 0;
}
perl in 15 chars:
$_=sub{&$_};&$_
bash in 10 chars (the space in the function is important):
i(){ i;};i
try and put more than 4 patties on a single burger. stack overflow.
Python:
so=lambda:so();so()
Alternatively:
def so():so()
so()
And if Python optimized tail calls...:
o=lambda:map(o,o());o()
I'm selecting the “best answer” after this post. But first, I'd like to acknowledge some very original contributions:
aku's ones. Each one explores a new and original way of causing stack overflow. The idea of doing f(x) ⇒ f(f(x)) is one I'll explore in my next entry, below. :-)
Cody's one that gave the Nemerle compiler a stack overflow.
And (a bit grudgingly), GateKiller's one about throwing a stack overflow exception. :-P
Much as I love the above, the challenge is about doing code golf, and to be fair to respondents, I have to award “best answer” to the shortest code, which is the Befunge entry; I don't believe anybody will be able to beat that (although Konrad has certainly tried), so congrats Patrick!
Seeing the large number of stack-overflow-by-recursion solutions, I'm surprised that nobody has (as of current writing) brought up the Y combinator (see Dick Gabriel's essay, The Why of Y, for a primer). I have a recursive solution that uses the Y combinator, as well as aku's f(f(x)) approach. :-)
((Y (lambda (f) (lambda (x) (f (f x))))) #f)
Here's another interesting one from Scheme:
((lambda (x) (x x)) (lambda (x) (x x)))
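The same self-application trick transliterates almost character for character into Python (a sketch; CPython has no tail-call optimisation, so it raises RecursionError rather than smashing the C stack):
# ((lambda (x) (x x)) (lambda (x) (x x))) in Python:
(lambda x: x(x))(lambda x: x(x))   # raises RecursionError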
Java
Slightly shorter version of the Java solution.
class X{public static void main(String[]a){main(a);}}
xor esp, esp
ret
3 bytes:
label:
pusha
jmp label
Update
According to the (old?) Intel(?) documentation, this is also 3 bytes:
label:
call label
I am using the code given here to find out the TFLOPS of mixed-precision ops on an Nvidia Tesla T4. Its theoretical value is given as 65 TFLOPS; however, the code produces a value of 10 TFLOPS. What explanation can justify this?
This might be more of an extended comment, but hear me out ...
As pointed out in the comments, the CUDA Samples are not meant as performance measuring tools.
The second benchmark you provided does not actually use tensor cores, just normal instructions executed on the FP32 or FP64 cores:
for(int i=0; i<compute_iterations; i++){
tmps[j] = mad(tmps[j], tmps[j], seed);
}
On a Turing T4, this gives me a peak of 7.97 TFLOPS for single-precision operations, very close to the theoretical limit of 8.1 TFLOPS.
For half-precision operations I get 16.09 TFLOPS, as expected, about double the single-precision figure.
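For reference, here is where that 8.1 TFLOPS limit comes from (a back-of-the-envelope Python sketch; the core count and boost clock are the published T4 specs, and one fused multiply-add counts as two floating-point operations):
cuda_cores = 2560        # FP32 cores on a Tesla T4
boost_clock_hz = 1.59e9  # ~1590 MHz boost clock
flops_per_cycle = 2      # one FMA = 2 FLOPs
print(cuda_cores * boost_clock_hz * flops_per_cycle / 1e12)  # ~8.1 TFLOPS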
Now, on to Tensor cores. As the previously mentioned benchmark does not use them, let's look for something that does.
CUTLASS (https://github.com/NVIDIA/cutlass) is a high-performance matrix-matrix multiplication library from NVIDIA.
They provide a profiling application for all the kernels provided. If you run this on a T4, you should get output like this:
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_h1688gemm_256x128_32x2_nt_align8
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Passed
Arguments: --gemm_kind=universal --m=1024 --n=1024 --k=1024 --A=f16:column --B=f16:row --C=f16:column --alpha=1 \
--beta=0 --split_k_slices=1 --batch_count=1 --op_class=tensorop --accum=f16 --cta_m=256 --cta_n=128 \
--cta_k=32 --stages=2 --warps_m=4 --warps_n=2 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=8 --min_cc=75 \
--max_cc=1024
Bytes: 6291456 bytes
FLOPs: 2149580800 flops
Runtime: 0.0640419 ms
Memory: 91.4928 GiB/s
Math: 33565.2 GFLOP/s
As you can see, we are now actually using tensor cores and half-precision operations, with a performance of 33.5 TFLOPS. That might not be 65 TFLOPS, but for an application you can use in the real world, it is pretty good.
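As a sanity check, the Math line in that output is just the FLOPs divided by the runtime:
flops = 2149580800            # the "FLOPs" line above
runtime_s = 0.0640419e-3      # the "Runtime" line above, in seconds
print(flops / runtime_s / 1e9)   # ~33565 GFLOP/s, matching "Math"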
I am new to CUDA. I was using CUDA to find the dot product of float vectors and I came across a floating-point addition issue. In essence, the following is the simple kernel. I'm compiling with -arch=sm_50.
The basic idea is for thread 0 to add up the values of vector a.
__global__ void temp(float *a, float *b, float *c) {
if (0 == threadIdx.x && blockIdx.x == 0 && blockIdx.y ==0 ) {
float xx = 0.0f;
for (int i = 0; i < LENGTH; i++){
xx += a[i];
}
*c = xx;
}
}
When I initialize 'a' with 1000 elements of 1.0, I get the desired result of 1000.00,
but when I initialize 'a' with 1.1, I should get 1100.00xx; instead, I am getting 1099.989014. The CPU implementation simply yields 1100.000024.
I am trying to understand what the issue is here! :-(
I even tried to count the number of 1.1 elements in the a vector, and that yields 1000, as expected. I also used atomicAdd and still have the same issue.
I would be very grateful if someone could help me out here!
best
EDIT:
My biggest concern here is the disparity between the CPU result and the GPU result! I understand floats can be off by a few decimal places, but the GPU error is very significant! :-(
It is not possible to represent 1.1 exactly using the IEEE-754 floating-point representation. As @RobertCrovella mentioned in his comment, the computation performed on the CPU does not use the same IEEE-754 settings as the GPU one.
Indeed, 1.1 in single precision is stored as 0x3F8CCCCD, which is 1.10000002384185. When summing 1000 elements, the last bits get lost in rounding: one bit for the first addition, two bits after four, and so on, up to 10 bits after 1000. Depending on the rounding mode, you may effectively truncate those 10 bits for the last half of the operations, ending up summing 0x3F8CCC00, which is 1.09997558.
The result from CUDA divided by 1000 is 0x3F8CCC71, which is consistent with a computation performed entirely in 32 bits.
When compiling on the CPU, depending on optimization flags, you may be using fast math, which works at the internal register precision. If vector registers are not used, that can mean the x87 FPU, which has 80 bits of precision. In that case, the computation reads 1.1 as a float (1.10000002384185), adds it 1000 times at the higher precision, hence losing no bits to rounding, yielding 1100.00002384185, and displays 1100.000024, which is its round-to-nearest representation.
Depending on compilation flags, the equivalent computation on the CPU may require enforcing 32-bit floating-point arithmetic, which can be done using the addss instruction of the SSE2 instruction set, for example.
You can also play with the /fp: option or -mfpmath with the compiler and inspect the issued instructions. With x87 code, the fadd instruction performs the addition at 80-bit precision.
None of this has anything to do with GPU floating-point precision; it is rather a misunderstanding of the IEEE-754 norm and of the legacy x87 FPU behaviour.
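You can reproduce both numbers on the host with NumPy (a sketch, not the original code): accumulate in strict float32 like the kernel does, then at higher precision like the x87/extended CPU path effectively does.
import numpy as np

a = np.full(1000, 1.1, dtype=np.float32)

# Strict float32 sequential accumulation, like the CUDA kernel above:
s = np.float32(0)
for x in a:
    s = np.float32(s + x)
print(s)                            # ~1099.989, the GPU-style result

# Accumulate at higher precision, like the x87/extended CPU path:
print(a.astype(np.float64).sum())   # ~1100.000024, the CPU-style result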
For the following entries, what instructions do they represent respectively?
Binary: 00000001110001011000100000100001
Hexadecimal: 144FFF9D
I'm completely lost on what I'm doing here. Searching online has produced a bunch of results that make very little sense to me, but what I've gathered is that I'm basically supposed to match up the numbers to their appropriate instructions/registers. How exactly do I know what those are? Where can I find a comprehensive list? And how do I know whether it's an R, I, or J format instruction?
The first 6 bits (it is easier to work in binary) are the opcode, from which you can determine how to interpret the rest. This site should get you started: http://www.mrc.uidaho.edu/mrc/people/jff/digital/MIPSir.html
Update: Calling the first 6 bits the opcode is (to be too kind) misleading, but it is enough to tell you how to interpret the rest of the instruction; you may need to look elsewhere (typically at the end of the instruction) for the complete determination of the opcode.
There are 3 types of MIPS instructions:
R-type: the opcode must be 000000 (the first 6 bits), and the last 6 bits (the funct field) tell us which instruction it is
I-type
J-type
In this case, we have an R-type MIPS instruction, and thus:
Opcode rs rt rd shamt funct
000000 01110 00101 10001 00000 100001
addu $s1 , $t6 , $a1
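If you want to check a decode like this mechanically, the fields are just fixed-width bit slices; here is a small Python sketch using the standard R-type field layout:
word = 0b00000001110001011000100000100001  # the binary entry above

opcode = word >> 26          # 0  -> R-type, so look at funct
rs     = (word >> 21) & 31   # 14 -> $t6
rt     = (word >> 16) & 31   # 5  -> $a1
rd     = (word >> 11) & 31   # 17 -> $s1
shamt  = (word >> 6) & 31    # 0
funct  = word & 63           # 33 = 0b100001 -> addu

print(opcode, rs, rt, rd, shamt, funct)   # 0 14 5 17 0 33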
Update:
Surprised that it is being so heavily downvoted...
The question is coding-related and before asking this question I have googled for "42" in combination with:
site:msdn.microsoft.com
"code example"
"c#"
"magic number"
And I am not an expert/fan of Western culture/literature.
I also found Why are variables “i” and “j” used for counters? [duplicate], which was not closed but even protected.
I feel that everybody knows it, except me...
What is the origin of the ubiquitous magic number 42 used all over code samples and examples?
How did you come to use 42? Because I have never come across it or used it myself.
After some searching, I found the MSDN doc on it, Magic Numbers: Integers:
"Aside from a book/movie reference, developers often use this as an arbitrary value"
Well, this did not explain anything to me.
Which movies and books have I missed in all those years of being involved in development, coding, programming, and IT-related activities like requirements analysis, system administration, etc.?
Some references to some texts using code snippets with 42 (just C#-related):
Jérôme Laban. C# Async Tips and Tricks, Part 3: Tasks and the Synchronization Context
var t = Task.Delay(TimeSpan.FromSeconds(1))
.ContinueWith
(
_ => Task.Delay(TimeSpan.FromSeconds(42))
);
MSDN Asynchronous Agents Library
send(_target, 42);
Quickstart: Calling asynchronous APIs in C# or Visual Basic
Office.context.document.setSelectedDataAsync(
"<html><body>hello world</body></html>",
{coercionType: "html", asyncContext: 42},
function(asyncResult) {
write(asyncResult.status + " " + asyncResult.asyncContext);
Asynchronous Programming in C++ Using PPL
task<int> myTask = someOtherTask.then([]() { return 42; });
Boxing and Unboxing (C# Programming Guide)
Console.WriteLine(String.Concat("Answer", 42, true));
How To: Override the ToString Method (C# Programming Guide)
int x = 42;
Trace Listeners
// Use this example when debugging.
System.Diagnostics.Debug.WriteLine("Error in Widget 42");
// Use this example when tracing.
System.Diagnostics.Trace.WriteLine("Error in Widget 42");
|| Operator (C# Reference)
// The following line displays True, because 42 is evenly
// divisible by 7.
Console.WriteLine("Divisible returns {0}.", Divisible(42, 7));
// The following line displays False, because 42 is not evenly
// divisible by 5.
Console.WriteLine("Divisible returns {0}.", Divisible(42, 5));
// The following line displays False when method Divisible
// uses ||, because you cannot divide by 0.
// If method Divisible uses | instead of ||, this line
// causes an exception.
Console.WriteLine("Divisible returns {0}.", Divisible(42, 0));
Wikipedia: C Sharp (programming language)
int foo = 42; // Value type.
It's from The Hitch Hiker's Guide to the Galaxy.
In The Hitchhiker's Guide to the Galaxy (published in 1979), the
characters visit the legendary planet Magrathea, home to the
now-collapsed planet-building industry, and meet Slartibartfast, a
planetary coastline designer who was responsible for the fjords of
Norway. Through archival recordings, he relates the story of a race of
hyper-intelligent pan-dimensional beings who built a computer named
Deep Thought to calculate the Answer to the Ultimate Question of Life,
the Universe, and Everything. When the answer was revealed to be 42,
Deep Thought explained that the answer was incomprehensible because
the beings didn't know what they were asking. It went on to predict
that another computer, more powerful than itself would be made and
designed by it to calculate the question for the answer. (Later on,
referencing this, Adams would create the 42 Puzzle, a puzzle which
could be approached in multiple ways, all yielding the answer 42.)
The answer is, as people already have stated, The Hitchhiker's Guide to the Galaxy.
I made a little experiment and put a couple of numbers into the search field. It seems like 42 clearly beats its neighbors, but it can't touch round numbers like 40, 45, and 50, no matter how magical it is.
It would be interesting to do the same search in source code only.
Dude!
It's the Answer to the Ultimate Question of Life, the Universe, and Everything! As computed by the Deep Thought supercomputer, which took 7.5 million years!
http://en.wikipedia.org/wiki/The_answer_to_life_the_universe_and_everything#Answer_to_the_Ultimate_Question_of_Life.2C_the_Universe_and_Everything_.2842.29
Check this out. 42 is the ultimate answer to the ultimate question of life the universe and everything
This is from The Hitchhiker's Guide to the Galaxy and is:
The Answer to the Ultimate Question of Life, the Universe, and Everything
WikiLink
Refer to The Hitch Hiker's Guide to the Galaxy.
Write the shortest program that calculates the Frobenius number for a given set of positive numbers. The Frobenius number is the largest number that cannot be written as a sum of positive multiples of the numbers in the set.
Example: For the set of Chicken McNugget™ sizes [6,9,20] the Frobenius number is 43, as there is no solution for the equation a*6 + b*9 + c*20 = 43 (with a,b,c >= 0), and 43 is the largest value with this property.
It can be assumed that a Frobenius number exists for the given set. If this is not the case (e.g. for [2,4]) no particular behaviour is expected.
References:
http://en.wikipedia.org/wiki/Coin_problem
http://mathworld.wolfram.com/FrobeniusNumber.html
[Edit]
I decided to accept the GolfScript version. While the MATHEMATICA version might be considered "technically correct", it would clearly take the fun out of the competition. That said, I'm also impressed by the other solutions, especially Ruby (which was very short for a general purpose language).
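For readers who want a readable reference implementation rather than a golf entry, here is a straightforward dynamic-programming version in Python; like the golfed answers below, it searches up to the product of the set, and it assumes a Frobenius number exists:
from math import prod

def frobenius(nums):
    # Same loose upper bound the golfed answers use: the product of the set.
    limit = prod(nums)
    reachable = [False] * (limit + 1)
    reachable[0] = True  # the empty sum
    for i in range(1, limit + 1):
        reachable[i] = any(i >= n and reachable[i - n] for n in nums)
    return max(i for i in range(limit + 1) if not reachable[i])

print(frobenius([6, 9, 20]))  # 43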
Mathematica 0 chars (or 19 chars counting the invoke command)
Invoke with
FrobeniusNumber[{a,b,c,...}]
Example
In[3]:= FrobeniusNumber[{6, 9, 20}]
Out[3]= 43
Is it a record? :)
Ruby 100 86 80 chars
(newline not needed)
Invoke with frob.rb 6 9 20
a=$*.map &:to_i;
p ((1..eval(a*"*")).map{|i|a<<i if(a&a.map{|v|i-v})[0];i}-a)[-1]
Works just like the Perl solution (except better :). $* is the array of command-line strings; a is the same array as ints, which is then used to collect all the numbers that can be made; eval(a*"*") is the product, the maximum number to check.
In Ruby 1.9, you can save one additional character by replacing "*" with ?*.
Edit: Shortened to 86 using Symbol#to_proc in $*.map, inlining m and shortening its calculation by folding the array.
Edit 2: Replaced .times with .map, traded .to_a for ;i.
Mathematica PROGRAM - 28 chars
Well, this is a REAL (unnecessary) program. As the other Mathematica entry shows clearly, you can compute the answer without writing a program ... but here it is
f[x__]:=FrobeniusNumber[{x}]
Invoke with
f[6, 9, 20]
43
GolfScript 47/42 chars
Faster solution (47).
~:+{0+{.1<{$}{1=}if|}/.!1):1\{:X}*+0=-X<}do];X(
Slow solution (42). Checks all values up to the product of every number in the set...
~:+{*}*{0+{.1<{$}{1=}if|}/1):1;}*]-1%.0?>,
Sample I/O:
$ echo "[6 9 20]"|golfscript frobenius.gs
43
$ echo "[60 90 2011]"|golfscript frobenius.gs
58349
Haskell 155 chars
The function f does the work and expects the list to be sorted. For example f [6,9,20] = 43
b x n=sequence$replicate n[0..x]
f a=last$filter(not.(flip elem)(map(sum.zipWith(*)a)(b u(length a))))[1..u] where
h=head a
l=last a
u=h*l-h-l
P.S. Since this is my first code golf submission, I'm not sure how to handle input -- what are the rules?
C#, 360 characters
using System;using System.Linq;class a{static void Main(string[]b)
{var c=(b.Select(d=>int.Parse(d))).ToArray();int e=c[0]*c[1];a:--e;
var f=c.Length;var g=new int[f];g[f-1]=1;int h=1;for(;;){int i=0;for
(int j=0;j<f;j++)i+=c[j]*g[j];if(i==e){goto a;}if(i<e){g[f-1]++;h=1;}
else{if(h>=f){Console.Write(e);return;}for(int k=f-1;k>=f-h;k--)
g[k]=0;g[f-h-1]++;h++;}}}}
I'm sure there's a shorter C# solution than this, but this is what I came up with.
This is a complete program that takes the values as command-line parameters and outputs the result to the screen.
Perl 105 107 110 119 122 127 152 158 characters
Latest edit: Compound assignment is good for you!
$h{0}=$t=1;$t*=$_ for@ARGV;for$x(1..$t){$h{$x}=grep$h{$x-$_},@ARGV}@b=grep!$h{$_},1..$t;print pop@b,"\n"
Explanation:
$t = 1;
$t *= $_ foreach(@ARGV);
Set $t to the product of all of the input numbers. This is our upper limit.
foreach $x (1..$t)
{
$h{$x} = grep {$_ == $x || $h{$x-$_} } @ARGV;
}
For each number from 1 to $t: if it's one of the input numbers, mark it in the %h hash; otherwise, if there is a marked entry further back (with the difference being one of the inputs), mark this entry. All marked entries are non-candidates for the Frobenius number.
@b=grep{!$h{$_}}(1..$t);
Extract all UNMARKED entries. These are Frobenius candidates...
print pop @b, "\n"
...and the last of these, the highest, is our Frobenius number.
Haskell 153 chars
A different take on a Haskell solution. I'm a rank novice at Haskell, so I'd be surprised if this couldn't be shortened.
m(x:a)(y:b)
|x==y=x:m a b
|x<y=x:m(y:b)a
|True=y:m(x:a)b
f d=l!!s-1where
l=0:foldl1 m[map(n+)l|n<-d]
g=minimum d
s=until(\n->l!!(n+g)-l!!n==g)(+1)0
Call it with, e.g., f [9,6,20].
FrobeniusScript 5 characters
solve
Sadly there does not yet exist any compiler/interpreter for this language.
No params, the interpreter will handle that:
$ echo solve > myProgram
$ frobeniusScript myProgram
6
9
20
^D
Your answer is: 43
$ exit