I need to perform a function on triplets taken from an array and add the result to a Histogram, but I want to avoid permutations since the function is invariant under those [F(i,j,k) = F(j,i,k) and so on].
Normally I would code something like this:
int F(int i, int j, int k){
int temp_index;
/* Do something */
return temp_index;
}
for(int i=0;i<N;i++){
for(int j=i+1;j<N;j++){
for(int k=j+1;k<N;k++){
hist[F(i,j,k)]++;
}
}
}
As N is quite big (approx. 10^5), I would like to perform this on a GPU using CUDA.
I have written code to call this function on the GPU, but I have no idea how to prevent multiple calls for the same triple of indices. So far I launch the kernel with a 3-dimensional grid, like:
__global__ void compute_3pcf(float *theta, float *hist) {
int i,j,k;
i = blockIdx.x*blockDim.x + threadIdx.x;
j = blockIdx.y*blockDim.y + threadIdx.y;
k = blockIdx.z*blockDim.z + threadIdx.z;
if(i>=j || j>=k) return;
atomicAdd(&hist[F(i,j,k)],1);
}
int main(){
/*
Allocation of memory and cudaMemcpy
*/
dim3 grid((N+15)/16,(N+7)/8,(N+7)/8);
dim3 block(16,8,8);
//Launch on GPU
compute_3pcf<<<grid,block>>>(d_theta, d_hist);
}
However, now for each combination (i,j,k) a new thread is launched and then aborted, which seems very inefficient to me, as then only 1/6 of the threads perform the actual computation. What I would like to have is something like this:
__global__ void compute_3pcf(float *theta, float *hist) {
int i,j,k,idx;
idx = blockIdx.x*blockDim.x + threadIdx.x;
i = H_i(idx);
j = H_j(idx,i);
k = H_k(idx,j);
atomicAdd(&hist[F(i,j,k)],1);
}
int main(){
/*
Allocation of memory and cudaMemcpy
*/
long long int N_combinations = (long long)N*(N-1)*(N-2)/6; // cast first: the product overflows a 32-bit int for N ~ 10^5
long int grid = (N_combinations+1023)/1024;
int block = 1024;
//Launch on GPU
compute_3pcf<<<grid,block>>>(d_theta, d_hist);
}
However, I am unable to find the functions H_i, H_j, H_k. If anyone can tell me how I could solve or avoid this problem, I would be very thankful.
Edit: The histogram contains about 10^6 bins, so that I can not have one histogram per block in a shared memory, like in the example code for cuda. Instead, it lies in the global memory of the GPU.
[Disclaimer -- this is only a partial answer and a work in progress and answers a related problem, while only hinting at a solution to the actual question]
Before thinking about algorithms and code it is useful to understand the mathematical character of your problem. If we look at the output of your pseudocode in Python (and note that this includes the diagonal entries where the original question does not), we see this for the 5x5x5 case:
import numpy as np
N = 5
x0 = np.zeros((N,N,N), dtype=int)
idx = 1
for i in range(0,N):
    for j in range(i,N):
        for k in range(j,N):
            x0[i,j,k] = idx
            idx += 1
print(x0)
we get:
[[[ 1 2 3 4 5]
[ 0 6 7 8 9]
[ 0 0 10 11 12]
[ 0 0 0 13 14]
[ 0 0 0 0 15]]
[[ 0 0 0 0 0]
[ 0 16 17 18 19]
[ 0 0 20 21 22]
[ 0 0 0 23 24]
[ 0 0 0 0 25]]
[[ 0 0 0 0 0]
[ 0 0 0 0 0]
[ 0 0 26 27 28]
[ 0 0 0 29 30]
[ 0 0 0 0 31]]
[[ 0 0 0 0 0]
[ 0 0 0 0 0]
[ 0 0 0 0 0]
[ 0 0 0 32 33]
[ 0 0 0 0 34]]
[[ 0 0 0 0 0]
[ 0 0 0 0 0]
[ 0 0 0 0 0]
[ 0 0 0 0 0]
[ 0 0 0 0 35]]]
i.e. the unique entries form a series of stacked upper triangular matrices of decreasing sizes. As identified in comments, the number of non-zero entries is a tetrahedral number, in this case for n = 5, the tetrahedral number Tr[5] = 5*(5+1)*(5+2)/6 = 35 entries, and the non-zero entries fill a tetrahedral region of the hypermatrix in three dimensions (best illustration here). As noted in the original question, all the permutations of indices are functionally identical in the problem, meaning that there are six (3P3 = 3!) functionally identical symmetric tetrahedral regions in the cubic hypermatrix. You can confirm this yourself:
x1 = np.zeros((N,N,N), dtype=int)
idx = 1
for i in range(0,N):
    for j in range(0,N):
        for k in range(0,N):
            if (i <= j) and (j <= k):
                x1[i,j,k] = idx
                x1[i,k,j] = idx
                x1[j,i,k] = idx
                x1[j,k,i] = idx
                x1[k,i,j] = idx
                x1[k,j,i] = idx
                idx += 1
print(x1)
which gives:
[[[ 1 2 3 4 5]
[ 2 6 7 8 9]
[ 3 7 10 11 12]
[ 4 8 11 13 14]
[ 5 9 12 14 15]]
[[ 2 6 7 8 9]
[ 6 16 17 18 19]
[ 7 17 20 21 22]
[ 8 18 21 23 24]
[ 9 19 22 24 25]]
[[ 3 7 10 11 12]
[ 7 17 20 21 22]
[10 20 26 27 28]
[11 21 27 29 30]
[12 22 28 30 31]]
[[ 4 8 11 13 14]
[ 8 18 21 23 24]
[11 21 27 29 30]
[13 23 29 32 33]
[14 24 30 33 34]]
[[ 5 9 12 14 15]
[ 9 19 22 24 25]
[12 22 28 30 31]
[14 24 30 33 34]
[15 25 31 34 35]]]
Here it should be obvious that you can slice the hypermatrix along any plane and get a symmetric matrix, and that it can be constructed by a set of reflections from any of the six permutations of the same basic tetrahedral hypermatrix.
That last part is important because I am now going to focus on a different permutation from the one in your question. It is functionally the same (as shown above) but mathematically and graphically easier to visualize than the upper tetrahedron calculated by the original pseudocode in the question. Again some Python:
N = 5
nmax = N * (N+1) * (N+2) // 6
x = np.empty(nmax, dtype=object)
x2 = np.zeros((N,N,N), dtype=int)
idx = 1
for i in range(0,N):
    for j in range(0,i+1):
        for k in range(0,j+1):
            x2[i,j,k] = idx
            x[idx-1] = (i,j,k)
            idx += 1
print(x)
print(x2)
which produces
[(0, 0, 0) (1, 0, 0) (1, 1, 0) (1, 1, 1) (2, 0, 0) (2, 1, 0) (2, 1, 1)
(2, 2, 0) (2, 2, 1) (2, 2, 2) (3, 0, 0) (3, 1, 0) (3, 1, 1) (3, 2, 0)
(3, 2, 1) (3, 2, 2) (3, 3, 0) (3, 3, 1) (3, 3, 2) (3, 3, 3) (4, 0, 0)
(4, 1, 0) (4, 1, 1) (4, 2, 0) (4, 2, 1) (4, 2, 2) (4, 3, 0) (4, 3, 1)
(4, 3, 2) (4, 3, 3) (4, 4, 0) (4, 4, 1) (4, 4, 2) (4, 4, 3) (4, 4, 4)]
[[[ 1 0 0 0 0]
[ 0 0 0 0 0]
[ 0 0 0 0 0]
[ 0 0 0 0 0]
[ 0 0 0 0 0]]
[[ 2 0 0 0 0]
[ 3 4 0 0 0]
[ 0 0 0 0 0]
[ 0 0 0 0 0]
[ 0 0 0 0 0]]
[[ 5 0 0 0 0]
[ 6 7 0 0 0]
[ 8 9 10 0 0]
[ 0 0 0 0 0]
[ 0 0 0 0 0]]
[[11 0 0 0 0]
[12 13 0 0 0]
[14 15 16 0 0]
[17 18 19 20 0]
[ 0 0 0 0 0]]
[[21 0 0 0 0]
[22 23 0 0 0]
[24 25 26 0 0]
[27 28 29 30 0]
[31 32 33 34 35]]]
You can see it is a transformation of the original code, with each "layer" of the tetrahedron built from a lower triangular matrix of increasing size, rather than upper triangular matrices of successively smaller size.
When you look at the tetrahedron produced by this permutation, it should be clear that each lower triangular slice starts at a tetrahedral number within the linear array of indices, and each row within the lower triangular matrix starts at a triangular number offset relative to the start of the matrix. The indexing scheme is, therefore:
idx(i,j,k) = (i*(i+1)*(i+2)/6) + (j*(j+1)/2) + k
when data is arranged so that the kth dimension is the fastest varying in memory, and ith the slowest.
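The indexing formula can be checked against the nested loops with a short Python sketch. The helper names `tet`, `tri` and `linear_index` are made up for this illustration, and the index is zero-based (it corresponds to `idx-1` in the listing above):

```python
def tet(n):
    """n-th tetrahedral number, n*(n+1)*(n+2)/6."""
    return n * (n + 1) * (n + 2) // 6

def tri(n):
    """n-th triangular number, n*(n+1)/2."""
    return n * (n + 1) // 2

def linear_index(i, j, k):
    """Zero-based position of (i, j, k) with i >= j >= k >= 0."""
    return tet(i) + tri(j) + k

# Walk the triples in the same order as the loops above and confirm that
# the closed-form index matches a running counter.
N = 5
counter = 0
for i in range(N):
    for j in range(i + 1):
        for k in range(j + 1):
            assert linear_index(i, j, k) == counter
            counter += 1
print(counter, "triples checked")
```

The loop only confirms the formula for N = 5; it is a sanity check, not a proof.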
Now to the actual question. To calculate (i,j,k) from a given idx value would require calculating the integer cube root for i and the integer square root for j, which isn't particularly easy or performant and I would not imagine that it would offer any advantage over what you have now. However, if your implementation has a finite and known dimension a priori, you can use precalculated tetrahedral and triangular numbers and perform a lookup to replace the need to calculate roots.
A toy example:
#include <cstdio>
__constant__ unsigned int tetdata[100] =
{ 0, 1, 4, 10, 20, 35, 56, 84, 120, 165, 220, 286, 364, 455, 560, 680, 816, 969, 1140,
1330, 1540, 1771, 2024, 2300, 2600, 2925, 3276, 3654, 4060, 4495, 4960, 5456, 5984,
6545, 7140, 7770, 8436, 9139, 9880, 10660, 11480, 12341, 13244, 14190, 15180, 16215,
17296, 18424, 19600, 20825, 22100, 23426, 24804, 26235, 27720, 29260, 30856, 32509,
34220, 35990, 37820, 39711, 41664, 43680, 45760, 47905, 50116, 52394, 54740, 57155,
59640, 62196, 64824, 67525, 70300, 73150, 76076, 79079, 82160, 85320, 88560, 91881,
95284, 98770, 102340, 105995, 109736, 113564, 117480, 121485, 125580, 129766, 134044,
138415, 142880, 147440, 152096, 156849, 161700, 166650 };
__constant__ unsigned int tridata[100] =
{ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120,
136, 153, 171, 190, 210, 231, 253, 276, 300, 325, 351, 378, 406,
435, 465, 496, 528, 561, 595, 630, 666, 703, 741, 780, 820, 861,
903, 946, 990, 1035, 1081, 1128, 1176, 1225, 1275, 1326, 1378, 1431,
1485, 1540, 1596, 1653, 1711, 1770, 1830, 1891, 1953, 2016, 2080, 2145,
2211, 2278, 2346, 2415, 2485, 2556, 2628, 2701, 2775, 2850, 2926, 3003,
3081, 3160, 3240, 3321, 3403, 3486, 3570, 3655, 3741, 3828, 3916, 4005,
4095, 4186, 4278, 4371, 4465, 4560, 4656, 4753, 4851, 4950 };
__device__ unsigned int lookup(unsigned int&x, unsigned int n, const unsigned int* data)
{
int i=0;
while (n >= data[i]) i++;
x = data[i-1];
return i-1;
}
__device__ unsigned int tetnumber(unsigned int& x, unsigned int n) { return lookup(x, n, tetdata); }
__device__ unsigned int trinumber(unsigned int& x, unsigned int n) { return lookup(x, n, tridata); }
__global__ void kernel()
{
unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
unsigned int x;
unsigned int k = idx;
unsigned int i = tetnumber(x, k); k -= x;
unsigned int j = trinumber(x, k); k -= x;
printf("idx = %d, i=%d j=%d k=%d\n", idx, i, j, k);
}
int main(void)
{
cudaSetDevice(0);
kernel<<<1,35>>>();
cudaDeviceSynchronize();
cudaDeviceReset();
return 0;
}
which does the same thing as the python (note the out-of-order print output):
$ nvcc -o tetrahedral tetrahedral.cu
$ cuda-memcheck ./tetrahedral
========= CUDA-MEMCHECK
idx = 32, i=4 j=4 k=2
idx = 33, i=4 j=4 k=3
idx = 34, i=4 j=4 k=4
idx = 0, i=0 j=0 k=0
idx = 1, i=1 j=0 k=0
idx = 2, i=1 j=1 k=0
idx = 3, i=1 j=1 k=1
idx = 4, i=2 j=0 k=0
idx = 5, i=2 j=1 k=0
idx = 6, i=2 j=1 k=1
idx = 7, i=2 j=2 k=0
idx = 8, i=2 j=2 k=1
idx = 9, i=2 j=2 k=2
idx = 10, i=3 j=0 k=0
idx = 11, i=3 j=1 k=0
idx = 12, i=3 j=1 k=1
idx = 13, i=3 j=2 k=0
idx = 14, i=3 j=2 k=1
idx = 15, i=3 j=2 k=2
idx = 16, i=3 j=3 k=0
idx = 17, i=3 j=3 k=1
idx = 18, i=3 j=3 k=2
idx = 19, i=3 j=3 k=3
idx = 20, i=4 j=0 k=0
idx = 21, i=4 j=1 k=0
idx = 22, i=4 j=1 k=1
idx = 23, i=4 j=2 k=0
idx = 24, i=4 j=2 k=1
idx = 25, i=4 j=2 k=2
idx = 26, i=4 j=3 k=0
idx = 27, i=4 j=3 k=1
idx = 28, i=4 j=3 k=2
idx = 29, i=4 j=3 k=3
idx = 30, i=4 j=4 k=0
idx = 31, i=4 j=4 k=1
========= ERROR SUMMARY: 0 errors
Obviously the lookup function is only for demonstration purposes. At large sizes, either a binary search of the table or a hash-based lookup would be much faster than this linear scan. But this at least demonstrates that it seems possible to do what you envisaged, even if the problem solved and the approach are subtly different from what you probably had in mind.
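As a rough illustration of a faster table lookup, here is a minimal Python sketch that replaces the linear scan with a binary search via the standard `bisect` module. The names `TET`, `TRI` and `unrank` are assumptions for this example and do not appear in the CUDA code, and the table size of 100 entries simply mirrors the toy example:

```python
from bisect import bisect_right

SIZE = 100  # number of table entries, as in the toy example (an assumption)
TET = [n * (n + 1) * (n + 2) // 6 for n in range(SIZE)]  # tetrahedral numbers
TRI = [n * (n + 1) // 2 for n in range(SIZE)]            # triangular numbers

def unrank(idx):
    """Map a zero-based linear index back to (i, j, k) with i >= j >= k."""
    # bisect_right returns the first position whose entry is > idx,
    # so the entry just before it is the largest one <= idx.
    i = bisect_right(TET, idx) - 1
    rem = idx - TET[i]
    j = bisect_right(TRI, rem) - 1
    k = rem - TRI[j]
    return i, j, k

print(unrank(34))
```

Each lookup costs O(log n) comparisons instead of O(n); on a GPU the same idea would need a hand-written binary search over the constant-memory tables.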
Note I have no formal mathematical proofs for anything in this answer and don't claim that any of the code or propositions here are correct. Buyer beware.
After some more thought, it is trivial to extend this approach via a hybrid search/calculation routine which is reasonably efficient:
#include <iostream>
#include <vector>
#include <cstdio>
typedef unsigned int uint;
__device__ __host__ ulong tetnum(uint n) { ulong n1(n); return n1 * (n1 + 1ull) * (n1 + 2ull) / 6ull; }
__device__ __host__ ulong trinum(uint n) { ulong n1(n); return n1 * (n1 + 1ull) / 2ull; }
typedef ulong (*Functor)(uint);
template<Functor F>
__device__ __host__ uint bounded(ulong& y, ulong x, uint n1=0, ulong y1=0)
{
uint n = n1;
y = y1;
while (x >= y1) {
y = y1;
n = n1++;
y1 = F(n1);
}
return n;
}
__constant__ uint idxvals[19] = {
0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384,
32768, 65536, 131072 };
__constant__ ulong tetvals[19] = {
0, 1, 4, 20, 120, 816, 5984, 45760, 357760, 2829056, 22500864, 179481600, 1433753600,
11461636096, 91659526144, 733141975040, 5864598896640, 46914643623936, 375308558925824 };
__constant__ ulong trivals[19] = {
0, 1, 3, 10, 36, 136, 528, 2080, 8256, 32896, 131328, 524800, 2098176, 8390656, 33558528,
134225920, 536887296, 2147516416, 8590000128 };
__device__ __host__ uint lookup(ulong& x, uint n, const uint* abscissa, const ulong* data)
{
uint i=0;
while (n >= data[i]) i++;
x = data[i-1];
return abscissa[i-1];
}
__device__ uint tetnumber(ulong& x, uint n)
{
ulong x0;
uint n0 = lookup(x0, n, idxvals, tetvals);
return bounded<tetnum>(x, n, n0, x0);
}
__device__ uint trinumber(ulong& x, uint n)
{
ulong x0;
uint n0 = lookup(x0, n, idxvals, trivals);
return bounded<trinum>(x, n, n0, x0);
}
__global__ void kernel(uint3 *results, ulong Nmax)
{
ulong idx = threadIdx.x + blockIdx.x * blockDim.x;
ulong gridStride = blockDim.x * gridDim.x;
for(; idx < Nmax; idx += gridStride) {
ulong x, k1 = idx;
uint3 tuple;
tuple.x = tetnumber(x, k1); k1 -= x;
tuple.y = trinumber(x, k1); k1 -= x;
tuple.z = (uint)k1;
results[idx] = tuple;
}
}
int main(void)
{
cudaSetDevice(0);
uint N = 500;
ulong Nmax = tetnum(N);
uint3* results_d; cudaMalloc(&results_d, Nmax * sizeof(uint3));
int gridsize, blocksize;
cudaOccupancyMaxPotentialBlockSize(&gridsize, &blocksize, kernel);
kernel<<<gridsize, blocksize>>>(results_d, Nmax);
cudaDeviceSynchronize();
std::vector<uint3> results(Nmax);
cudaMemcpy(&results[0], results_d, Nmax * sizeof(uint3), cudaMemcpyDeviceToHost);
cudaDeviceReset();
// Only uncomment this if you want to see 21 million lines of output
//for(auto const& idx : results) {
// std::cout << idx.x << " " << idx.y << " " << idx.z << std::endl;
//}
return 0;
}
which does this (be aware it will emit 21 million lines of output if you uncomment the last loop):
$ module load use.own cuda9.2
$ nvcc -std=c++11 -arch=sm_52 -o tetrahedral tetrahedral.cu
$ nvprof ./tetrahedral
==20673== NVPROF is profiling process 20673, command: ./tetrahedral
==20673== Profiling application: ./tetrahedral
==20673== Profiling result:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 78.85% 154.23ms 1 154.23ms 154.23ms 154.23ms kernel(uint3*, unsigned long)
21.15% 41.361ms 1 41.361ms 41.361ms 41.361ms [CUDA memcpy DtoH]
API calls: 41.73% 154.24ms 1 154.24ms 154.24ms 154.24ms cudaDeviceSynchronize
30.90% 114.22ms 1 114.22ms 114.22ms 114.22ms cudaMalloc
15.94% 58.903ms 1 58.903ms 58.903ms 58.903ms cudaDeviceReset
11.26% 41.604ms 1 41.604ms 41.604ms 41.604ms cudaMemcpy
0.11% 412.75us 96 4.2990us 275ns 177.45us cuDeviceGetAttribute
0.04% 129.46us 1 129.46us 129.46us 129.46us cuDeviceTotalMem
0.02% 55.616us 1 55.616us 55.616us 55.616us cuDeviceGetName
0.01% 32.919us 1 32.919us 32.919us 32.919us cudaLaunchKernel
0.00% 10.211us 1 10.211us 10.211us 10.211us cudaSetDevice
0.00% 5.7640us 1 5.7640us 5.7640us 5.7640us cudaFuncGetAttributes
0.00% 4.6690us 1 4.6690us 4.6690us 4.6690us cuDeviceGetPCIBusId
0.00% 2.8580us 4 714ns 393ns 1.3680us cudaDeviceGetAttribute
0.00% 2.8050us 3 935ns 371ns 2.0030us cuDeviceGetCount
0.00% 2.2780us 1 2.2780us 2.2780us 2.2780us cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags
0.00% 1.6720us 1 1.6720us 1.6720us 1.6720us cudaGetDevice
0.00% 1.5450us 2 772ns 322ns 1.2230us cuDeviceGet
That code calculates and stores the unique (i,j,k) triples for a 500 x 500 x 500 search space (about 21 million values) in about 150 milliseconds on my GTX970. Perhaps that is of some use to you.
One possible approach is given on this Wikipedia page ("Finding the k-combination for a given number") for a closed-form solution to convert a linear index into a unique C(n,3) combination.
However, it will involve calculating square roots and cube roots, so it's "non-trivial". My rationale for even mentioning it is two-fold:
If the amount of work to be saved per-thread is substantial, then the additional burden this method proposes may be offset by that. However, for the example given, the amount of work per thread saved is just a few simple if-tests.
Processor trends are such that computation cost is dropping more rapidly than e.g. memory access cost. Since this approach involves no memory access, if future processor trends continue in this vein, this approach may become more palatable.
This approach is also distinguished by the fact that there is no iterative exhaustive table searching. However, as indicated in the other answer, for the stipulations given there, the lookup approach is almost certainly preferable to this one at present.
As indicated on the previously mentioned wiki page, the general approach will be to:
1. Find the largest C(n,3) number that is less than or equal to the current index (N). The n value associated with this C(n,3) number becomes the ordinal value of our first "choice" index n1.
2. Subtract the C(n,3) number from the current index. The process is repeated with the remainder and C(n,2). The n value associated with the largest C(n,2) number that fits within our remainder becomes our second "choice" index n2.
3. Subtract that C(n,2) number from the remainder of step 2. What is left identifies our final C(n,1) choice (C(n,1) = n = n3).
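The steps above can be sketched in Python with plain searches, before any closed form is introduced. This is only an illustration: the function name `unrank_combination` is made up here, and the upward linear search is chosen for clarity, not speed:

```python
from math import comb

def unrank_combination(index):
    """Return (n1, n2, n3) with n1 > n2 > n3 >= 0 such that
    C(n1,3) + C(n2,2) + C(n3,1) == index."""
    # Step 1: largest n1 with C(n1,3) <= index (n1 = 2 gives C = 0).
    n1 = 2
    while comb(n1 + 1, 3) <= index:
        n1 += 1
    index -= comb(n1, 3)
    # Step 2: largest n2 with C(n2,2) <= remainder (n2 = 1 gives C = 0).
    n2 = 1
    while comb(n2 + 1, 2) <= index:
        n2 += 1
    index -= comb(n2, 2)
    # Step 3: what is left is n3 itself, since C(n3,1) = n3.
    return n1, n2, index

for idx in range(5):
    print(idx, unrank_combination(idx))
```

The first five results agree with the special cases hard-coded at the top of the CUDA functor further down, e.g. index 0 maps to (2, 1, 0) and index 4 maps to (4, 1, 0).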
In order to come up with a closed-form solution to step 1, we must:
Identify the cubic equation associated with the relationship between N and C(N,3).
Use the solution of the cubic polynomial to identify N (in floating point).
Truncate the value N to get our "largest" N.
Perform an integer search around this point for the correct solution, to address floating-point issues.
A similar process can be repeated for step 2 (quadratic) and step 3 (linear).
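For the quadratic step, an exact integer variant is possible on the host. Note that this swaps in a different technique from the floating-point square root described here: the sketch below uses Python's `math.isqrt`, and the function name is an assumption for illustration. It solves n(n-1)/2 <= R for the largest n, i.e. n = floor((1 + sqrt(1 + 8R)) / 2):

```python
from math import comb, isqrt

def largest_n_with_c2_at_most(remainder):
    """Largest n such that C(n, 2) = n*(n-1)/2 <= remainder."""
    # isqrt is an exact integer square root, so no fix-up search is needed.
    return (1 + isqrt(1 + 8 * remainder)) // 2

for r in (0, 1, 2, 3, 10):
    n = largest_n_with_c2_at_most(r)
    print(r, n, comb(n, 2))
```

On a GPU there is no built-in exact integer square root, which is why a floating-point estimate followed by a short integer correction is the more natural choice there.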
I don't intend to cover all the math in particular detail, however the solution of a cubic polynomial equation in closed form can be readily found on the web (such as here) and the derivation of the governing cubic equation for step 1 is straightforward. We simply use the formula for the total number of choices already given in the question, coupled with the particular thread index:
n(n-1)(n-2)/6 = N -> n(n-1)(n-2)/6 - N = 0
rearranging:
(n^3)/6 - (n^2)/2 + n/3 - N = 0
from this we can acquire the a,b,c,d coefficients to feed into our cubic solution method.
a = 1/6, b = -1/2, c = 1/3, d = -N
(Note that N here is effectively our globally unique 1D thread index. We are solving for n, which gives us our first "choice" index.)
Studying the formula for the solution of the cubic, we note that the only item that varies among threads is the d coefficient. This allows for reduction of some arithmetic at run-time.
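To make the cubic step concrete, here is a hedged Python sketch of the same idea: a floating-point estimate from the closed-form root, followed by an integer search to absorb rounding error. With a = 1/6, b = -1/2, c = 1/3, the substitution n = t + 1 reduces the cubic to t^3 - t - 6N = 0, so only the term 3N varies per index. The function name and the step-down margin of 2 are assumptions for this illustration:

```python
from math import comb, sqrt

def largest_n_with_c3_at_most(N):
    """Largest n with C(n, 3) <= N: float estimate plus integer fix-up."""
    q = 3.0 * N                        # -d / (2a) with a = 1/6, d = -N
    disc = q * q - 1.0 / 27.0          # discriminant term of t^3 - t - 6N = 0
    s = sqrt(disc) if disc > 0.0 else 0.0
    # Real root t of the depressed cubic, then n = t + 1 (that is, -b/(3a) = 1).
    est = (q + s) ** (1.0 / 3.0) + max(q - s, 0.0) ** (1.0 / 3.0) + 1.0
    n = max(int(est) - 2, 2)           # step down a little, never below 2
    while comb(n, 3) > N:              # guard in case the estimate was too high
        n -= 1
    while comb(n + 1, 3) <= N:         # walk up to the exact answer
        n += 1
    return n

print(largest_n_with_c3_at_most(comb(100000, 3)))
```

The two `while` loops are what make the result exact; the floating-point formula only needs to land within a few integers of the true value.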
What follows then is a worked example. It is not thoroughly tested, as my aim here is to identify a solution method, not a fully tested solution:
$ cat t1485.cu
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
typedef float ct;
const int STEP_DOWN = 2;
// only float or double template types allowed
template <typename ft>
struct CN3{
__host__ __device__
int3 operator()(size_t N){
int3 n;
if (N == 0) {n.x = 2; n.y = 1; n.z = 0; return n;}
if (N == 1) {n.x = 3; n.y = 1; n.z = 0; return n;}
if (N == 2) {n.x = 3; n.y = 2; n.z = 0; return n;}
if (N == 3) {n.x = 3; n.y = 2; n.z = 1; return n;}
if (N == 4) {n.x = 4; n.y = 1; n.z = 0; return n;}
ft x, x1;
// identify n.x from cubic
// compiler computed
const ft a = 1.0/6;
const ft b = -1.0/2;
const ft c = 1.0/3;
const ft p1 = (-1.0)*(b*b*b)/(27.0*a*a*a) + b*c/(6.0*a*a);
const ft p2 = c/(3.0*a) - (b*b)/(9.0*a*a);
const ft p3 = p2*p2*p2;
const ft p4 = b/(3.0*a);
// run-time computed
//const ft d = -N;
const ft q0 = N/(2.0*a); // really should adjust constant for float vs. double
const ft q1 = p1 + q0;
const ft q2 = q1*q1;
if (sizeof(ft)==4){
x1 = sqrtf(q2+p3);
x = cbrtf(q1+x1) + cbrtf(q1-x1) - p4;
n.x = truncf(x);}
else {
x1 = sqrt(q2+p3);
x = cbrt(q1+x1) + cbrt(q1-x1) - p4;
n.x = trunc(x);}
/// fix floating-point errors
size_t tn = n.x - STEP_DOWN;
while ((tn)*(tn-1)*(tn-2)/6 <= N) tn++;
n.x = tn-1;
// identify n.y from quadratic
// compiler computed
const ft qa = 1.0/2;
//const ft qb = -qa;
const ft p5 = 1.0/4;
const ft p6 = 2.0;
// run-time computed
N = N - (((size_t)n.x)*(n.x-1)*(n.x-2))/6;
if (sizeof(ft)==4){
x = qa + sqrtf(p5+p6*N);
n.y = truncf(x);}
else {
x = qa + sqrt(p5+p6*N);
n.y = trunc(x);}
/// fix floating-point errors
if ((n.y - STEP_DOWN) <= 0) tn = 0;
else tn = n.y - STEP_DOWN;
while ((((tn)*(tn-1))>>1) <= N) tn++;
n.y = tn-1;
// identify n3
n.z = N - ((((size_t)n.y)*(n.y-1))>>1);
return n;
}
};
template <typename T>
__global__ void test(T f, size_t maxn, int3 *res){
size_t idx = threadIdx.x+((size_t)blockDim.x)*blockIdx.x;
if (idx < maxn)
res[idx] = f(idx);
}
int3 get_next_C3(int3 prev){
int3 res = prev;
res.z++;
if (res.z >= res.y){
res.y++; res.z = 0;
if (res.y >= res.x){res.x++; res.y = 1; res.z = 0;}}
return res;
}
int main(int argc, char* argv[]){
size_t n = 1000000000;
if (argc > 1) n *= atoi(argv[1]);
const int nTPB = 256;
int3 *d_res;
cudaMalloc(&d_res, n*sizeof(int3));
test<<<(n+nTPB-1)/nTPB,nTPB>>>(CN3<ct>(), n, d_res);
int3 *h_gpu = new int3[n];
int3 temp;
temp.x = 2; temp.y = 1; temp.z = 0;
cudaMemcpy(h_gpu, d_res, n*sizeof(int3), cudaMemcpyDeviceToHost);
for (size_t i = 0; i < n; i++){
if ((temp.x != h_gpu[i].x) || (temp.y != h_gpu[i].y) || (temp.z != h_gpu[i].z))
{printf("mismatch at index %zu: cpu: %d,%d,%d gpu: %d,%d,%d\n", i, temp.x,temp.y,temp.z, h_gpu[i].x, h_gpu[i].y, h_gpu[i].z); return 0;}
temp = get_next_C3(temp);}
}
$ nvcc -arch=sm_70 -o t1485 t1485.cu
$ cuda-memcheck ./t1485 2
========= CUDA-MEMCHECK
========= ERROR SUMMARY: 0 errors
$ nvprof ./t1485
==6128== NVPROF is profiling process 6128, command: ./t1485
==6128== Profiling application: ./t1485
==6128== Profiling result:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 99.35% 4.81251s 1 4.81251s 4.81251s 4.81251s [CUDA memcpy DtoH]
0.65% 31.507ms 1 31.507ms 31.507ms 31.507ms void test<CN3<float>>(float, int, int3*)
API calls: 93.70% 4.84430s 1 4.84430s 4.84430s 4.84430s cudaMemcpy
6.09% 314.89ms 1 314.89ms 314.89ms 314.89ms cudaMalloc
0.11% 5.4296ms 4 1.3574ms 691.18us 3.3429ms cuDeviceTotalMem
0.10% 4.9644ms 388 12.794us 317ns 535.35us cuDeviceGetAttribute
0.01% 454.66us 4 113.66us 103.24us 134.26us cuDeviceGetName
0.00% 65.032us 1 65.032us 65.032us 65.032us cudaLaunchKernel
0.00% 24.906us 4 6.2260us 3.2890us 10.160us cuDeviceGetPCIBusId
0.00% 8.2490us 8 1.0310us 533ns 1.5980us cuDeviceGet
0.00% 5.9930us 3 1.9970us 381ns 3.8870us cuDeviceGetCount
0.00% 2.8160us 4 704ns 600ns 880ns cuDeviceGetUuid
$
Notes:
As indicated above, I have tested it for accuracy up through the first 2 billion results.
The implementation above accounts for the fact that the solution of the cubic and quadratic equations in floating point introduces errors. These errors are "fixed" by creating a local integer search around the starting point given by the floating-point calculations, to produce the correct answer.
As indicated, the kernel above runs in ~30ms on my Tesla V100 for 1 billion results (10^9). If the methodology could correctly scale to 10^15 results, I have no reason to assume it would not take at least 0.03*10^6 seconds, or over 8 hours(!)
I haven't run the test, but I suspect that a quick benchmark of the simple case proposed in the question of simply generating the full domain (10^15) and then throwing away the ~5/6 of the space that did not apply, would be quicker.
Out of curiosity, I created an alternate test case that tests 31 out of each 32 values, across a larger space.
Here is the code and test:
$ cat t1485.cu
#include <stdio.h>
#include <math.h>
typedef float ct;
const int nTPB = 1024;
const int STEP_DOWN = 2;
// only float or double template types allowed
template <typename ft>
struct CN3{
__host__ __device__
int3 operator()(size_t N){
int3 n;
if (N == 0) {n.x = 2; n.y = 1; n.z = 0; return n;}
if (N == 1) {n.x = 3; n.y = 1; n.z = 0; return n;}
if (N == 2) {n.x = 3; n.y = 2; n.z = 0; return n;}
if (N == 3) {n.x = 3; n.y = 2; n.z = 1; return n;}
if (N == 4) {n.x = 4; n.y = 1; n.z = 0; return n;}
ft x, x1;
// identify n.x from cubic
// compiler computed
const ft a = 1.0/6;
const ft b = -1.0/2;
const ft c = 1.0/3;
const ft p1 = (-1.0)*(b*b*b)/(27.0*a*a*a) + b*c/(6.0*a*a);
const ft p2 = c/(3.0*a) - (b*b)/(9.0*a*a);
const ft p3 = p2*p2*p2;
const ft p4 = b/(3.0*a);
// run-time computed
//const ft d = -N;
const ft q0 = N/(2.0*a); // really should adjust constant for float vs. double
const ft q1 = p1 + q0;
const ft q2 = q1*q1;
if (sizeof(ft)==4){
x1 = sqrtf(q2+p3);
x = cbrtf(q1+x1) + cbrtf(q1-x1) - p4;
n.x = truncf(x);}
else {
x1 = sqrt(q2+p3);
x = cbrt(q1+x1) + cbrt(q1-x1) - p4;
n.x = trunc(x);}
/// fix floating-point errors
size_t tn = n.x - STEP_DOWN;
while ((tn)*(tn-1)*(tn-2)/6 <= N) tn++;
n.x = tn-1;
// identify n.y from quadratic
// compiler computed
const ft qa = 1.0/2;
//const ft qb = -qa;
const ft p5 = 1.0/4;
const ft p6 = 2.0;
// run-time computed
N = N - (((size_t)n.x)*(n.x-1)*(n.x-2))/6;
if (sizeof(ft)==4){
x = qa + sqrtf(p5+p6*N);
n.y = truncf(x);}
else {
x = qa + sqrt(p5+p6*N);
n.y = trunc(x);}
/// fix floating-point errors
if ((n.y - STEP_DOWN) <= 0) tn = 0;
else tn = n.y - STEP_DOWN;
while ((((tn)*(tn-1))>>1) <= N) tn++;
n.y = tn-1;
// identify n3
n.z = N - ((((size_t)n.y)*(n.y-1))>>1);
return n;
}
};
__host__ __device__
int3 get_next_C3(int3 prev){
int3 res = prev;
res.z++;
if (res.z >= res.y){
res.y++; res.z = 0;
if (res.y >= res.x){res.x++; res.y = 1; res.z = 0;}}
return res;
}
template <typename T>
__global__ void test(T f){
size_t idx = threadIdx.x+((size_t)blockDim.x)*blockIdx.x;
size_t idy = threadIdx.y+((size_t)blockDim.y)*blockIdx.y;
size_t id = idx + idy*gridDim.x*blockDim.x;
int3 temp = f(id);
int3 temp2;
temp2.x = __shfl_up_sync(0xFFFFFFFF, temp.x, 1);
temp2.y = __shfl_up_sync(0xFFFFFFFF, temp.y, 1);
temp2.z = __shfl_up_sync(0xFFFFFFFF, temp.z, 1);
temp2 = get_next_C3(temp2);
if ((threadIdx.x & 31) != 0)
if ((temp.x != temp2.x) || (temp.y != temp2.y) || (temp.z != temp2.z)) printf("%lu,%d,%d,%d,%d,%d,%d\n", id, temp.x, temp.y, temp.z, temp2.x, temp2.y, temp2.z);
}
int main(int argc, char* argv[]){
const size_t nbx = 200000000ULL;
const int nby = 100;
dim3 block(nbx, nby, 1);
test<<<block,nTPB>>>(CN3<ct>());
cudaDeviceSynchronize();
cudaError_t e = cudaGetLastError();
if (e != cudaSuccess) {printf("CUDA error %s\n", cudaGetErrorString(e)); return 0;}
printf("tested space of size: %lu\n", nbx*nby*nTPB);
}
$ nvcc -arch=sm_70 -o t1485 t1485.cu
$ time ./t1485
tested space of size: 20480000000000
real 25m18.133s
user 18m4.804s
sys 7m12.782s
Here we see that the Tesla V100 took about 25 minutes to accuracy-test a space of 20480000000000 results (about 2 * 10^13).
The year 2009 is coming to an end, and with the economy and all, we'll save our money and instead of buying expensive fireworks, we'll celebrate in ASCII art this year.
The challenge
Given a set of fireworks and a time, take a picture of the firework at that very time and draw it to the console.
The best solution entered before midnight on New Year's Eve (UTC) will receive a bounty of 500 rep. This is code golf, so the number of characters counts heavily; however so do community votes, and I reserve the ultimate decision as to what is best/coolest/most creative/etc.
Input Data
Note that our coordinate system is left-to-right, bottom-to-top, so all fireworks are launched at a y-coordinate of 0 (zero).
The input data consists of fireworks of the form
(x, speed_x, speed_y, launch_time, detonation_time)
where
x is the position (column) where the firework is launched,
speed_x and speed_y are the horizontal and vertical velocity of the firework at launch time,
launch_time is the point in time that this firework is launched,
detonation_time is the point in time that this firework will detonate.
The firework data may be hardcoded in your program as a list of 5-tuples (or the equivalent in your language), not counting towards your character count. It must, however, be easy to change this data.
You may make the following assumptions:
there is a reasonable number of fireworks (say, fewer than a hundred)
for each firework, all five numbers are integers within a reasonable range (say, 16 bits would suffice for each),
-20 <= x <= 820
-20 <= speed_x <= 20
0 < speed_y <= 20
launch_time >= 0
launch_time < detonation_time < launch_time + 50
The single additional piece of input data is the point of time which is supposed to be rendered. This is a non-negative integer that is given to you via standard input or command line argument (whichever you choose).
The idea is that (assuming your program is a python script called firework.py) this bash script gives you a nice firework animation:
#!/bin/bash
I=0
while (( 1 )) ; do
python firework.py $I
I=$(( $I + 1 ))
done
(feel free to put the equivalent .BAT file here).
Life of a firework
The life of a firework is as follows:
Before the launch time, it can be ignored.
At launch time, the rocket has the position (x, 0) and the speed vector (speed_x, speed_y).
For each time step, the speed vector is added to the position. With a little stretch applied to Newton's laws, we assume that the speed stays constant.
At detonation time, the rocket explodes into nine sparks. All nine sparks have the same position at this point in time (which is the position that the rocket would have had it not exploded), but their speeds differ. Each speed is based on the rocket's speed, with -20, 0, or 20 added to speed_x and -10, 0, or 10 added to speed_y. That's nine possible combinations.
After detonation time, gravity starts to pull: With each time step, the gravitational constant, which happens to be 2 (two), is subtracted from every spark's speed_y.
The horizontal speed_x stays constant.
For each time step after the detonation time, you first add the speed vector to the position, then subtract 2 from speed_y.
When a spark's y position drops below zero, you may forget about it.
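The life cycle above can be written out step by step. The following Python 3 sketch is one reading of the rules, not a golfed entry; the function name `particles_at` and the `(x, y, speed_x, speed_y)` tuple layout are choices made for this example:

```python
def particles_at(firework, t):
    """Return the (x, y, speed_x, speed_y) tuples of one firework at time t."""
    x0, vx, vy, launch, detonation = firework
    if t < launch:
        return []                                   # not launched yet
    if t < detonation:
        steps = t - launch                          # constant-speed rocket
        return [(x0 + vx * steps, vy * steps, vx, vy)]
    flight = detonation - launch
    bx, by = x0 + vx * flight, vy * flight          # position at detonation
    sparks = []
    for dx in (-20, 0, 20):
        for dy in (-10, 0, 10):
            x, y, sx, sy = bx, by, vx + dx, vy + dy
            alive = True
            for _ in range(t - detonation):
                x += sx                             # first move ...
                y += sy
                sy -= 2                             # ... then apply gravity
                if y < 0:
                    alive = False                   # dropped below the ground
                    break
            if alive:
                sparks.append((x, y, sx, sy))
    return sparks

print(len(particles_at((586, 7, 11, 11, 23), 33)))
```

Stepping through time like this is slower than a closed-form position, but it follows the wording of the rules directly, which makes it a convenient reference when checking a shorter solution.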
Output
What we want is a picture of the firework the way it looks at the given point in time. We only look at the frame 0 <= x <= 789 and 0 <= y <= 239, mapping it to a 79x24 character output.
So if a rocket or spark has the position (247, 130), we draw a character in column 24 (zero-based, so it's the 25th column), row 13 (zero-based and counting from the bottom, so it's line 23 - 13 = 10, the 11th line of the output).
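The mapping from a position to an output cell can be sketched as follows (Python 3; the function name `cell` is made up for this example):

```python
def cell(x, y):
    """Map a position to (column, output line), or None if outside the frame."""
    if 0 <= x <= 789 and 0 <= y <= 239:
        # Ten position units per character; lines are counted from the top.
        return x // 10, 23 - y // 10
    return None

print(cell(247, 130))
```

The range check comes first because Python's floor division would otherwise map small negative coordinates to column or row -1.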
Which character gets drawn depends on the current speed of the rocket / spark:
If the movement is horizontal*, i.e. speed_y == 0 or abs(speed_x) / abs(speed_y) > 2, the character is "-".
If the movement is vertical*, i.e. speed_x == 0 or abs(speed_y) / abs(speed_x) > 2, the character is "|".
Otherwise the movement is diagonal, and the character is "\" or "/" (you'll guess the right one).
If the same position gets drawn to more than once (even if it's the same character), we put "X" instead. So assuming you have a spark at (536, 119) and one at (531, 115), you draw an "X", regardless of their speeds.
* update: these are integer divisions, so the slope has to be at least 3, or at most 1/3, respectively
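The character rules, including the integer-division note, can be written as a small Python 3 function. The name `glyph` is hypothetical, and the choice between "/" and "\" is an assumption inferred from the direction of travel (the spec leaves it to the reader): "/" when both speed components have the same sign, "\" otherwise.

```python
def glyph(speed_x, speed_y):
    """Pick the character for a particle moving with the given speed."""
    ax, ay = abs(speed_x), abs(speed_y)
    if speed_y == 0 or ax // ay > 2:      # horizontal: slope at most 1/3
        return "-"
    if speed_x == 0 or ay // ax > 2:      # vertical: slope at least 3
        return "|"
    # Diagonal: opposite signs means moving up-left or down-right.
    return "\\" if speed_x * speed_y < 0 else "/"

print(glyph(20, 7), glyph(21, 7))
```

Because the divisions are integer divisions, a speed of (20, 7) still counts as diagonal (20 // 7 is 2), while (21, 7) is horizontal. The "X" rule for cells drawn more than once is a property of the whole frame and is not handled here.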
The output (written to standard output) is 24 lines, each terminated by a newline character. Trailing spaces are ignored, so you may, but don't need to, pad to a width of 79. The lines may not be longer than 79 characters (excluding the newline). All interior spacing must be space characters (ASCII 32).
Sample Data
Fireworks:
fireworks = [(628, 6, 6, 3, 33),
(586, 7, 11, 11, 23),
(185, -1, 17, 24, 28),
(189, 14, 10, 50, 83),
(180, 7, 5, 70, 77),
(538, -7, 7, 70, 105),
(510, -11, 19, 71, 106),
(220, -9, 7, 77, 100),
(136, 4, 14, 80, 91),
(337, -13, 20, 106, 128)]
Output at time 33:
\ | /
/ \
- | /
- | -
/ \
Output at time 77:
\
\
X
\
Output at time 93:
\ | /
\ / /
- - - \
\
/ \ \
Update: I have uploaded the expected output at the times 0 thru 99 to firework.ü-wie-geek.de/NUMBER.html, where NUMBER is the time. It includes debug information; click on a particle to see its current position, speed, etc. And yes, it's an umlaut domain. If your browser can't handle that (as obviously neither can Stack Overflow), try firework.xn---wie-geek-p9a.de.
Another update: As hinted at in the comments below, a longer firework is now available on YouTube. It was created with a modified version of MizardX' entry, with a total fireworks count of 170 (yes, that's more than the spec asked for, but the program handled it gracefully). Except for the color, the music, and the end screen, the animation can be recreated by any entry to this code golf. So, if you're geeky enough to enjoy an ASCII art firework (you know you are): Have fun, and a happy new year to all!
Here's my solution in Python:
c = [(628, 6, 6, 3, 33),
(586, 7, 11, 11, 23),
(185, -1, 17, 24, 28),
(189, 14, 10, 50, 83),
(180, 7, 5, 70, 77),
(538, -7, 7, 70, 105),
(510, -11, 19, 71, 106),
(220, -9, 7, 77, 100),
(136, 4, 14, 80, 91),
(337, -13, 20, 106, 128)]
t=input()
z=' '
s=([z]*79+['\n'])*23+[z]*79
def p(x,y,i,j):
 if 0<=x<790 and 0<=y<240:p=x/10+(23-y/10)*80;I=abs(i);J=abs(j);s[p]='X-|\\/'[[s[p]!=z,I>=3*J,J>=3*I,i*j<0,1].index(1)]
for x,i,j,l,d in c:
 T=t-l;x+=i*T
 if t>=d:e=t-d;[p(x+X*e,j*T+e*(Y-e+1),i+X,j+Y-2*e)for X in -20,0,20 for Y in -10,0,10]
 elif t>=l:p(x,j*T,i,j)
print ''.join(s)
Takes the time from the stdin and it has the nice number of 342 characters. I'm still trying to imagine how the OP got 320 :P
Edit:
This is the best I could get, 322 chars according to wc:
t=input()
s=([' ']*79+['\n'])*24
def p(x,y,i,j):
 if 790>x>-1<y<240:p=x/10+(23-y/10)*80;I=abs(i);J=abs(j);s[p]='X-|\\/'[[s[p]>' ',I>=3*J,J>=3*I,i*j<0,1].index(1)]
for x,i,j,l,d in c:
 T=t-l;x+=i*T;e=t-d
 if t>=d:[p(x+X*e,j*T+e*(Y-e+1),i+X,j+Y-2*e)for X in-20,0,20for Y in-10,0,10]
 elif t>=l:p(x,j*T,i,j)
print''.join(s),
Now that the winner is chosen – congratulations to Juan – here is my own solution, 304 characters in Python:
t=input()
Q=range
for y in Q(24):print"".join((["\\/|-"[3*(h*h>=9*v*v)or(v*v>=9*h*h)*2or h*v>0]for X,H,V,L,D in F for s,Z,K,P,u in[(t-D,t>D,t-L,G%3*20-20,G/3*10-10)for G in[Q(9),[4]][t<D]]for h,v in[[P+H,u+V-2*s*Z]]if((X+H*K+P*s)/10,23-(V*K-s*(Z*s-Z-u))/10)==(x,y)][:2]+[" ","X"])[::3][-1]for x in Q(79))
This is not really fast, because for each point in the 79x24 display, it loops through all fireworks to see if any of them is visible at this point.
Here is a version that tries to explain what's going on:
t=input()
Q=range
for y in Q(24):
    line = ""
    for x in Q(79):
        chars = [] # will hold all characters that should be drawn at (x, y)
        for X,H,V,L,D in F: # loop through the fireworks
            s = t - D
            Z = t > D
            K = t - L
            # if t < D, i.e. the rocket hasn't exploded yet, this is just [(0, 0)];
            # otherwise it's all combinations of (-20, 0, 20) for x and (-10, 0, 10)
            speed_deltas = [(G % 3 * 20 - 20, G / 3 * 10 -10) for G in [Q(9), [4]][t < D]]
            for P, u in speed_deltas:
                if x == (X + H*K + P*s)/10 and y == 23 - (V*K - s*(Z*s - Z - u))/10:
                    # the current horizontal and vertical speed of the particle
                    h = P + H
                    v = u + V - 2*s*Z
                    # this is identical to (but shorter than) abs(h) >= 3 * abs(v)
                    is_horizontal = h*h >= 9*v*v
                    is_vertical = v*v >= 9*h*h
                    is_northeast_southwest = h*v > 0
                    # a shorter way of saying
                    # char_index = (3 if is_horizontal else 2 if is_vertical else 1
                    # if is_northeast_southwest else 0)
                    char_index = 3 * is_horizontal or 2 * is_vertical or is_northeast_southwest
                    chars.append("\\/|-"[char_index])
        # chars now contains all characters to be drawn to this point. So we have
        # three possibilities: If chars is empty, we draw a space. If chars has
        # one element, that's what we draw. And if chars has more than one element,
        # we draw an "X".
        actual_char = (chars[:2] + [" ", "X"])[::3][-1] # Yes, this does the trick.
        line += actual_char
    print line
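To see why the two one-liners in the explained version work, here is a small Python 3 sketch that isolates them. The function names `pick_char` and `char_index` and the constant `CHARS` are made up for this illustration; the expressions themselves are copied from the code above.

```python
CHARS = "\\/|-"  # backslash, slash, vertical bar, dash (same order as in the answer)


def pick_char(chars):
    """Return ' ' for no hits, the single character for one hit, 'X' for two or more."""
    # Pad the first two hits with [" ", "X"], then keep every third element:
    #   0 hits -> [" ", "X"]       -> [" "]
    #   1 hit  -> [c, " ", "X"]    -> [c]
    #   2 hits -> [c, d, " ", "X"] -> [c, "X"]
    # The last element of that slice is the character to draw.
    return (chars[:2] + [" ", "X"])[::3][-1]


def char_index(h, v):
    """Index into CHARS for a particle with horizontal speed h and vertical speed v."""
    is_horizontal = h * h >= 9 * v * v  # same as abs(h) >= 3 * abs(v)
    is_vertical = v * v >= 9 * h * h    # same as abs(v) >= 3 * abs(h)
    # `or` returns its first truthy operand, so this is 3, 2, True (1) or False (0).
    return int(3 * is_horizontal or 2 * is_vertical or h * v > 0)


if __name__ == "__main__":
    print(repr(pick_char([])), repr(pick_char(["/"])), repr(pick_char(["/", "-"])))
    print(CHARS[char_index(9, 3)], CHARS[char_index(0, 5)], CHARS[char_index(6, 6)])
```

Squaring both sides avoids two `abs` calls, which is where the saving in characters comes from.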
Python:
fireworks = [(628, 6, 6, 3, 33),
(586, 7, 11, 11, 23),
(185, -1, 17, 24, 28),
(189, 14, 10, 50, 83),
(180, 7, 5, 70, 77),
(538, -7, 7, 70, 105),
(510, -11, 19, 71, 106),
(220, -9, 7, 77, 100),
(136, 4, 14, 80, 91),
(337, -13, 20, 106, 128)]
import sys
t = int(sys.argv[1])
particles = []
for x, speed_x, speed_y, launch_time, detonation_time in fireworks:
    if t < launch_time:
        pass
    elif t < detonation_time:
        x += speed_x * (t - launch_time)
        y = speed_y * (t - launch_time)
        particles.append((x, y, speed_x, speed_y))
    else:
        travel_time = t - detonation_time
        x += (t - launch_time) * speed_x
        y = (t - launch_time) * speed_y - travel_time * (travel_time - 1)
        for dx in (-20, 0, 20):
            for dy in (-10, 0, 10):
                x1 = x + dx * travel_time
                y1 = y + dy * travel_time
                speed_x_1 = speed_x + dx
                speed_y_1 = speed_y + dy - 2 * travel_time
                particles.append((x1, y1, speed_x_1, speed_y_1))
rows = [[' '] * 79 for y in xrange(24)]
for x, y, speed_x, speed_y in particles:
    x, y = x // 10, y // 10
    if 0 <= x < 79 and 0 <= y < 24:
        row = rows[23 - y]
        if row[x] != ' ': row[x] = 'X'
        elif speed_y == 0 or abs(speed_x) // abs(speed_y) > 2: row[x] = '-'
        elif speed_x == 0 or abs(speed_y) // abs(speed_x) > 2: row[x] = '|'
        elif speed_x * speed_y < 0: row[x] = '\\'
        else: row[x] = '/'
print '\n'.join(''.join(row) for row in rows)
If you remove the initial fireworks declaration, compress variable-names to single characters, and whitespace to a minimum, you can get 590 characters.
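The `travel_time * (travel_time - 1)` term above is a closed form for gravity. As a sanity check, here is a Python 3 sketch comparing it with a tick-by-tick simulation. It assumes the per-tick update used by the step-by-step answers in this thread (add the speed to the position, then subtract 2 from the vertical speed); the function names are made up for this example.

```python
def step_simulation(y0, vy0, steps):
    """Advance one tick at a time: move by the current speed, then lose 2 units of speed."""
    y, vy = y0, vy0
    for _ in range(steps):
        y += vy
        vy -= 2
    return y, vy


def closed_form(y0, vy0, steps):
    """Same result without a loop.

    The speeds used are vy0, vy0 - 2, ..., vy0 - 2*(steps - 1); their sum is
    steps*vy0 - steps*(steps - 1).
    """
    return y0 + vy0 * steps - steps * (steps - 1), vy0 - 2 * steps


if __name__ == "__main__":
    for n in (0, 1, 5, 20):
        print(n, step_simulation(170, 27, n), closed_form(170, 27, n))
```

Here `vy0` plays the role of `speed_y + dy` at the moment of detonation and `steps` the role of `travel_time`.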
C:
With all unnecessary whitespace removed (632 bytes excluding the fireworks declaration):
#define N 10
int F[][5]={628,6,6,3,33,586,7,11,11,23,185,-1,17,24,28,189,14,10,50,83,180,7,5,70,77,538,-7,7,70,105,510,-11,19,71,106,220,-9,7,77,100,136,4,14,80,91,337,-13,20,106,128};
#define G F[i]
#define R P[p]
g(x,y){if(y==0||abs(x)/abs(y)>2)return 45;if(x==0||abs(y)/abs(x)>2)return'|';if(x*y<0)return 92;return 47;}main(int A,char**B){int a,b,c,C[24][79]={},d,i,j,p=0,P[N*9][3],Q,t=atoi(B[1]),x,y;for(i=0;i<N;i++){if(t>=G[3]){a=t-G[3];x=G[0]+G[1]*a;y=G[2]*a;if(t<G[4]){R[0]=x;R[1]=y;R[2]=g(G[1],G[2]);p++;}else{b=t-G[4];y-=b*(b-1);for(c=-20;c<=20;c+=20){for(d=-10;d<=10;d+=10){R[0]=x+c*b;R[1]=y+d*b;R[2]=g(G[1]+c,G[2]+d-2*b);p++;}}}}}Q=p;for(p=0;p<Q;p++){x=R[0]/10;y=R[1]/10;if(R[0]>=0&&x<79&&R[1]>=0&&y<24)C[y][x]=C[y][x]?88:R[2];}for(i=23;i>=0;i--){for(j=0;j<79;j++)putchar(C[i][j]?C[i][j]:32);putchar(10);}}
And here's the exact same code with whitespace added for readability:
#define N 10
int F[][5] = {
628, 6, 6, 3, 33,
586, 7, 11, 11, 23,
185, -1, 17, 24, 28,
189, 14, 10, 50, 83,
180, 7, 5, 70, 77,
538, -7, 7, 70, 105,
510, -11, 19, 71, 106,
220, -9, 7, 77, 100,
136, 4, 14, 80, 91,
337, -13, 20, 106, 128
};
#define G F[i]
#define R P[p]
g(x, y) {
if(y == 0 || abs(x)/abs(y) > 2)
return 45;
if(x == 0 || abs(y)/abs(x) > 2)
return '|';
if(x*y < 0)
return 92;
return 47;
}
main(int A, char**B){
int a, b, c, C[24][79] = {}, d, i, j, p = 0, P[N*9][3], Q, t = atoi(B[1]), x, y;
for(i = 0; i < N; i++) {
if(t >= G[3]) {
a = t - G[3];
x = G[0] + G[1]*a;
y = G[2]*a;
if(t < G[4]) {
R[0] = x;
R[1] = y;
R[2] = g(G[1], G[2]);
p++;
} else {
b = t - G[4];
y -= b*(b-1);
for(c = -20; c <= 20; c += 20) {
for(d =- 10; d <= 10; d += 10) {
R[0] = x + c*b;
R[1] = y + d*b;
R[2] = g(G[1] + c, G[2] + d - 2*b);
p++;
}
}
}
}
}
Q = p;
for(p = 0; p < Q; p++) {
x = R[0]/10;
y = R[1]/10;
if(R[0] >= 0 && x < 79 && R[1] >= 0 && y < 24)
C[y][x] = C[y][x] ? 88 : R[2];
}
for(i = 23; i >= 0; i--) {
for(j = 0; j < 79; j++)
putchar(C[i][j] ? C[i][j] : 32);
putchar(10);
}
}
For Python, @MizardX's solution is nice, but clearly not codegolf-optimized -- besides the "don't really count" 333 characters of the prefix, namely:
fireworks = [(628, 6, 6, 3, 33),
(586, 7, 11, 11, 23),
(185, -1, 17, 24, 28),
(189, 14, 10, 50, 83),
(180, 7, 5, 70, 77),
(538, -7, 7, 70, 105),
(510, -11, 19, 71, 106),
(220, -9, 7, 77, 100),
(136, 4, 14, 80, 91),
(337, -13, 20, 106, 128)]
f = fireworks
### int sys argv append abs join f xrange
(the last comment is a helper for a little codegolf-aux script of mine that makes all feasible names 1-char mechanically -- it needs to be told what names NOT to minify;-), the shortest I can make that solution by squeezing whitespace is 592 characters (close enough to the 590 @MizardX claims).
Pulling out all the stops ("refactoring" the code in a codegolf mood), I get, after the prefix (I've used lowercase for single-character names I'm manually introducing or substituting, uppercase for those my codegolf-aux script substituted automatically):
import sys
Z=int(sys.argv[1])
Y=[]
e=Y.extend
for X,W,V,U,T in f:
 if Z>=U:
  z=Z-U;X+=W*z
  if Z<T:e(((X,V*z,W,V),))
  else:R=Z-T;e((X+Q*R,z*V-R*(R-1)+P*R,W+Q,V+P-2*R)for Q in(-20,0,20)for P in(-10,0,10))
K=[79*[' ']for S in range(24)]
for X,S,W,V in Y:
 X,S=X/10,S/10
 if(0<=X<79)&(0<=S<24):
  J=K[23-S];v=abs(V);w=abs(W)
  J[X]='X'if J[X]!=' 'else'-'if V==0 or w/v>2 else'|'if W==0 or v/w>2 else '\\'if W*V<0 else'/'
print '\n'.join(''.join(J)for J in K)
which measures in at 460 characters -- that's a reduction of 130, i.e. 130/590 = 22%.
Beyond 1-character names and obvious ways to minimize spacing, the key ideas include: single / for division (same as the nicer // for ints in Python 2.*), an if/else expression in lieu of an if/elif/else statement, extend with a genexp rather than a nested loop with append (allows the removal of some spaces and punctuation), not binding to a name subexpressions that occur just once, binding to a name subexpressions that would otherwise get repeated (including the .extend attribute lookup), semicolons rather than newlines where feasible (only if the separate lines would have to be indented, otherwise, counting a newline as 1 character, there is no saving).
Yep, readability suffers a bit, but that's hardly surprising in code golf;-).
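To make one of those ideas concrete, here is a small Python 3 sketch (not part of the golfed program) contrasting the nested-loop-with-`append` style with a single `extend` over a generator expression. The function names and the simplified tuple layout are assumptions for illustration only.

```python
def sparks_nested(x, y, vx, vy):
    """Readable version: nested loops with append."""
    out = []
    for dx in (-20, 0, 20):
        for dy in (-10, 0, 10):
            out.append((x, y, vx + dx, vy + dy))
    return out


def sparks_genexp(x, y, vx, vy):
    """Golf-friendly version: one extend call fed by a generator expression."""
    out = []
    out.extend((x, y, vx + dx, vy + dy) for dx in (-20, 0, 20) for dy in (-10, 0, 10))
    return out


if __name__ == "__main__":
    print(sparks_nested(0, 0, 6, 6) == sparks_genexp(0, 0, 6, 6))
```

Both produce the same nine sparks in the same order; the second form simply drops two indentation levels and the repeated `out.append`.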
Edit: after a lot more tightening, I now have a smaller program (same prefix):
Z=input()
K=[79*[' ']for S in range(24)];a=-10,0,10
def g(X,S,W,V):
 X/=10;S/=10
 if(0<=X<79)&(0<=S<24):J=K[23-S];v=abs(V);w=abs(W);J[X]=[[['/\\'[W*V<0],'|'][v>2.9*w],'-'][w>2.9*v],'X'][J[X]!=' ']
for X,W,V,U,T in f:
 if Z>=U:
  z=Z-U;X+=W*z
  if Z<T:g(X,V*z,W,V)
  else:R=Z-T;[g(X+Q*2*R,z*V-R*(R-1)+P*R,W+Q*2,V+P-2*R)for Q in a for P in a]
print'\n'.join(''.join(J)for J in K)
Still the same output, but now 360 characters -- exactly 100 fewer than my previous solution, which I've left as the first part of this answer (still well above the 320 the OP says he has, though!-).
I've taken advantage of the degree of freedom allowing the input-time value to come from stdin (input is much tighter than importing sys and using sys.argv[1]!-), eliminated the intermediate list (w/the extend calls and a final loop of it) in favor of the new function g which gets called directly and updates K as we go, found and removed some commonality, refactored the nested if/else expression into a complicated (but more concise;-) building and indexing of nested lists, used the fact that v>2.9*w is more concise than w==0 or v/w>2 (and always gives the same result in the range of values that are to be considered).
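One way to probe the `v>2.9*w` shortcut is a brute-force comparison against the integer-division test. The Python 3 sketch below is only an illustration; the function names are made up, and `//` stands in for Python 2's integer `/`.

```python
def integer_test(v, w):
    """The original check, with Python 2's integer division written as //."""
    return w == 0 or v // w > 2


def float_test(v, w):
    """The shorter replacement used in the golfed program."""
    return v > 2.9 * w


def disagreements(limit):
    """All pairs (v, w) with 0 <= v, w <= limit on which the two tests differ."""
    return [(v, w)
            for v in range(limit + 1)
            for w in range(limit + 1)
            if integer_test(v, w) != float_test(v, w)]


if __name__ == "__main__":
    print(disagreements(40))
```

The two tests agree for small speeds, but the list is not empty: for example v=32, w=11 gives 32 // 11 == 2 while 32 > 31.9, so the equivalence depends on which speed pairs actually occur in the data.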
Edit: making K (the "screen image") into a 1-D list saves a further 26 characters, shrinking the following solution to 334 (still 14 above the OP's, but closing up...!-):
Z=input()
K=list(24*(' '*79+'\n'))
a=-10,0,10
def g(X,S,W,V):
 if(0<=X<790)&(0<=S<240):j=80*(23-S/10)+X/10;v=abs(V);w=abs(W);K[j]=[[['/\\'[W*V<0],'|'][v>2.9*w],'-'][w>2.9*v],'X'][K[j]!=' ']
for X,W,V,U,T in f:
 if Z>=U:
  z=Z-U;X+=W*z
  if Z<T:g(X,V*z,W,V)
  else:R=Z-T;[g(X+Q*2*R,z*V-R*(R-1)+P*R,W+Q*2,V+P-2*R)for Q in a for P in a]
print ''.join(K),
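The 1-D screen idea can be sketched in readable Python 3 as follows. The names `make_screen` and `plot` and the constants are made up for this illustration; the index formula is the one from the golfed code, with `//` in place of Python 2's integer `/`.

```python
WIDTH, HEIGHT = 79, 24


def make_screen():
    """24 rows of 79 spaces, each followed by a newline, stored as one flat list."""
    return list(HEIGHT * (" " * WIDTH + "\n"))


def plot(screen, x, y, char):
    """Draw char at world position (x, y); y grows upwards, 10 world units per cell."""
    if 0 <= x < 10 * WIDTH and 0 <= y < 10 * HEIGHT:
        # Each row occupies WIDTH + 1 slots (79 cells plus the newline),
        # and row 0 of the list is the top of the screen.
        j = (WIDTH + 1) * (HEIGHT - 1 - y // 10) + x // 10
        screen[j] = "X" if screen[j] != " " else char


if __name__ == "__main__":
    screen = make_screen()
    plot(screen, 536, 119, "/")
    plot(screen, 531, 115, "-")  # same cell as the previous spark, so it becomes "X"
    print("".join(screen), end="")
```

Because the newlines already sit in the list, the final output is a single `join` with no per-row loop.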
Done in F# in 957* characters, and it's ugly as sin:
Array of fireworks:
let F = [(628,6,6,3,33);(586,7,11,11,23);(185,-1,17,24,28);(189,14,10,50,83);(180,7,5,70,77);(538,-7,7,70,105);(510,-11,19,71,106);(220,-9,7,77,100);(136,4,14,80,91);(337,-13,20,106,128)]
Remaining code
let M=List.map
let C=List.concat
let P=List.partition
let L t f r=(let s=P(fun(_,_,_,u,_)->not(t=u))f
(fst s, r@(M(fun(x,v,w,_,t)->x,0,v,w,t)(snd s))))
let X d e (x,y,v,w)=C(M(fun(x,y,v,w)->[x,y,v-d,w;x,y,v,w;x,y,v+d,w])[x,y,v,w-e;x,y,v,w;x,y,v,w+e])
let D t r s=(let P=P(fun(_,_,_,_,u)->not(t=u))r
(fst P,s@C(M(fun(x,y,v,w,_)->(X 20 10(x,y,v,w)))(snd P))))
let rec E t l f r s=(
let(a,m)=L t f (M(fun(x,y,v,w,t)->x+v,y+w,v,w,t)r)
let(b,c)=D t m (M(fun(x,y,v,w)->x+v,y+w,v,w-2)s)
if(t=l)then(a,b,c)else E(t+1)l a b c)
let N=printf
let G t=(
let(f,r,s)=E 0 t F [] []
let os=s@(M(fun(x,y,v,w,_)->(x,y,v,w))r)
for y=23 downto 0 do (
for x=0 to 79 do (
let o=List.filter(fun(v,w,_,_)->((v/10)=x)&&((w/10)=y))os
let l=o.Length
if l=0 then N" "
elif l=1 then
let(_,_,x,y)=o.Head
N(
if y=0||abs(x)/abs(y)>2 then"-"
elif x=0||abs(y)/abs(x)>2 then"|"
elif y*x>0 then"/"
else"\\")
elif o.Length>1 then N"X")
N"\n"))
[<EntryPointAttribute>]
let Z a=
G (int(a.[0]))
0
"Pretty" code:
let fxs = [(628,6,6,3,33);(586,7,11,11,23);(185,-1,17,24,28);(189,14,10,50,83);(180,7,5,70,77);(538,-7,7,70,105);(510,-11,19,71,106);(220,-9,7,77,100);(136,4,14,80,91);(337,-13,20,106,128)]
let movs xs =
List.map (fun (x, y, vx, vy) -> (x + vx, y + vy, vx, vy-2)) xs
let movr xs =
List.map (fun (x, y, vx, vy, dt) -> (x + vx, y + vy, vx, vy, dt)) xs
let launch t fs rs =
let split = List.partition(fun (lx, sx, sy, lt, dt) -> not (t = lt)) fs
(fst split, rs @ (List.map(fun (lx, sx, sy, lt, dt) -> (lx, 0, sx, sy, dt)) (snd split)))
let split dx dy (x,y,sx,sy) =
List.concat (List.map (fun (x,y,sx,sy)->[(x,y,sx-dx,sy);(x,y,sx,sy);(x,y,sx+dx,sy)]) [(x,y,sx,sy-dy);(x,y,sx,sy);(x,y,sx,sy+dy)])
let detonate t rs ss =
let tmp = List.partition (fun (x, y, sx, sy, dt) -> not (t = dt)) rs
(fst tmp, ss @ List.concat (List.map(fun (x, y, sx, sy, dt) -> (split 20 10 (x, y, sx, sy))) (snd tmp)))
let rec simulate t l fs rs ss =
let (nfs, trs) = launch t fs (movr rs)
let (nrs, nss) = detonate t trs (movs ss)
if (t = l) then (nfs,nrs,nss)
else
simulate (t+1) l nfs nrs nss
let screen t =
let (fs, rs, ss) = simulate 0 t fxs [] []
let os = ss @ (List.map(fun (x, y, sx, sy,_) -> (x, y, sx, sy)) rs)
for y = 23 downto 0 do
for x = 0 to 79 do
let o = List.filter(fun (px,py,_,_)->((px/10)=x) && ((py/10)=y)) os
if o.Length = 0 then printf " "
elif o.Length = 1 then
let (_,_,sx,sy) = o.Head
printf (
if sy = 0 || abs(sx) / abs(sy) > 2 then "-"
elif sx = 0 || abs(sy) / abs(sx) > 2 then "|"
elif sy * sx > 0 then "/"
else"\\"
)
elif o.Length > 1 then printf "X"
printfn ""
[<EntryPointAttribute>]
let main args =
screen (int(args.[0]))
0
Completely rewritten with new and improved logic. This is as close as I could get to Python. You can see the weakness of F# not being geared toward ad hoc scripting here, where I have to explicitly convert V and W to a float, declare a main function with an ugly attribute to get the command line args, and I have to reference the .NET System.Console.Write to get a pretty output.
Oh well, good exercise to learn a language with.
Here's the new code, at 544 bytes:
let Q p t f=if p then t else f
let K=[|for i in 1..1920->Q(i%80>0)' ''\n'|]
let g(X,S,W,V)=
if(X>=0&&X<790&&S>=0&&S<240)then(
let (j,v,w)=(80*(23-S/10)+X/10,abs(float V),abs(float W))
Array.set K j (Q(K.[j]=' ')(Q(w>2.9*v)'-'(Q(v>2.9*w)'|'(Q(W*V>0)'/''\\')))'X'))
let a=[-10;0;10]
[<EntryPointAttribute>]
let m s=
let Z=int s.[0]
for (X,W,V,U,T) in F do(
if Z>=U then
let z,R=Z-U,Z-T
let x=X+W*z
if(Z<T)then(g(x,V*z,W,V))else(for e in[|for i in a do for j in a->(x+j*2*R,z*V-R*(R-1)+i*R,W+j*2,V+i-2*R)|]do g e))
System.Console.Write K
0
Haskell
import Data.List
f=[(628,6,6,3,33),(586,7,11,11,23),(185,-1,17,24,28),(189,14,10,50,83),(180,7,5,70,77),(538,-7,7,70,105),(510,-11,19,71,106),(220,-9,7,77,100),(136,4,14,80,91),(337,-13,20,106,128)]
c=filter
d=True
e=map
a(_,_,_,t,_)=t
b(_,_,_,_,t)=t
aa(_,y,_,_)=y
ab(x,t,y,_,u)=(x,0,t,y,u)
ac(x,y,t,u,_)=[(x,y,t+20,u+10),(x,y,t,u+10),(x,y,t-20,u+10),(x,y,t+20,u),(x,y,t,u),(x,y,t-20,u),(x,y,t+20,u-10),(x,y,t,u-10),(x,y,t-20,u-10)]
g(x,y,t,u,v)=(x+t,y+u,t,u,v)
h(x,y,t,u)=(x+t,y+u,t,u-2)
i=(1,f,[],[])
j s 0=s
j(t,u,v,w)i=j(t+1,c((/=t).a)u,c((> t).b)x++(e ab.c((==t).a))u,c((>0).aa)(e h w)++(concat.e ac.c((==t).b))x)(i-1)
where x=e g v
k x y
|x==0='|'
|3*abs y<=abs x='-'
|3*abs x<=abs y='|'
|(y<0&&x>0)||(y>0&&x<0)='\\'
|d='/'
l(x,y,t,u,_)=m(x,y,t,u)
m(x,y,t,u)=(div x 10,23-div y 10,k t u)
n(x,y,_)(u,v,_)
|z==EQ=compare x u
|d=z
where z=compare y v
o((x,y,t):(u,v,w):z)
|x==u&&y==v=o((x,y,'X'):z)
|d=(x,y,t):(o((u,v,w):z))
o x=x
q _ y []
|y==23=""
|d='\n':(q 0(y+1)[])
q v u((x,y,z):t)
|u>22=""
|v>78='\n':(q 0(u+1)((x,y,z):t))
|u/=y='\n':(q 0(u+1)((x,y,z):t))
|v/=x=' ':(q(v+1)u((x,y,z):t))
|d = z:(q(v+1)u t)
p(_,_,v,w)=q 0 0((c z.o.sortBy n)((e l v)++(e m w)))
where z(x,y,_)=x>=0&&x<79&&y>=0
r x=do{z <- getChar;(putStr.p)x}
s=e(r.j i)[1..]
main=foldr(>>)(return())s
Not nearly as impressive as MizardX's, coming in at 1068 characters if you remove the f=… declaration, but hell, it was fun. It's been a while since I've had a chance to play with Haskell.
The (slightly) prettier version is also available.
Edit: Ack. Rereading, I don't quite meet the spec.: this version prints a new screen of firework display every time you press a key, and requires ^C to quit; it doesn't take a command line argument and print out the relevant screen.
Perl
Assuming firework data is defined as:
@f = (
[628, 6, 6, 3, 33],
[586, 7, 11, 11, 23],
[185, -1, 17, 24, 28],
[189, 14, 10, 50, 83],
[180, 7, 5, 70, 77],
[538, -7, 7, 70, 105],
[510, -11, 19, 71, 106],
[220, -9, 7, 77, 100],
[136, 4, 14, 80, 91],
[337, -13, 20, 106, 128]
);
$t=shift;
for(@f){
($x,$c,$d,$l,$e)=@$_;
$u=$t-$l;
next if$u<0;
$x+=$c*$u;
$h=$t-$e;
push@p,$t<$e?[$x,$d*$u,$c,$d]:map{$f=$_;map{[$x+$f*$h,($u*$d-$h*($h-1))+$_*$h,$c+$f,$d+$_-2*$h]}(-10,0,10)}(-20,0,20)
}
push@r,[($")x79]for(1..24);
for(@p){
($x,$y,$c,$d)=@$_;
if (0 <= $x && ($x=int$x/10) < 79 && 0 <= $y && ($y=int$y/10) < 24) {
@$_[$x]=@$_[$x]ne$"?'X':!$d||abs int$c/$d>2?'-':!$c||abs int$d/$c>2?'|':$c*$d<0?'\\':'/'for$r[23 - $y]
}
}
$"='';
print$.,map{"@$_\n"}@r
Compressed, it comes in at 433 characters. (see edits for history)
This is based on pieces of multiple previous answers (mostly MizardX's) and can definitely be improved upon. The guilt of procrastinating other, job-related tasks means I have to give up for now.
Forgive the edit -- pulling out all of the tricks I know, this can be compressed to 356 characters:
sub p{
($X,$=,$C,$D)=@_;
if(0<=$X&($X/=10)<79&0<=$=&($=/=10)<24){
@$_[$X]=@$_[$X]ne$"?X:$D&&abs$C/$D<3?$C&&abs$D/$C<3?
$C*$D<0?'\\':'/':'|':'-'for$r[23-$=]
}
}
@r=map[($")x79],1..24;
$t=pop;
for(@f){
($x,$c,$d,$u,$e)=@$_;
$x-=$c*($u-=$t);
$u>0?1:($h=$t-$e)<0
?p$x,-$d*$u,$c,$d
:map{for$g(-10,0,10){p$x+$_*$h,$h*(1-$h+$g)-$u*$d,$c+$_,$d+$g-2*$h}}-20,0,20
}
print@$_,$/for@r
$= is a special Perl variable (along with $%, $-, and $?) that can only take on integer values. Using it eliminates the need to use the int function.
FORTRAN 77
From the prehistoric languages department, here's my entry – in FORTRAN 77.
2570 chars including the initialization, a handful of spaces and some unnecessary whitespace, but I don't think it's likely to win for brevity. Especially since e.g. 6 leading spaces in each line are mandatory.
I called this file fireworks.ftn and compiled it with gfortran on a Linux system.
implicit integer(a-z)
parameter (n=10)
integer fw(5,n) /
+ 628, 6, 6, 3, 33,
+ 586, 7, 11, 11, 23,
+ 185, -1, 17, 24, 28,
+ 189, 14, 10, 50, 83,
+ 180, 7, 5, 70, 77,
+ 538, -7, 7, 70, 105,
+ 510, -11, 19, 71, 106,
+ 220, -9, 7, 77, 100,
+ 136, 4, 14, 80, 91,
+ 337, -13, 20, 106, 128
+ /
integer p(6, 1000) / 6000 * -1 /
character*79 s(0:23)
character z
c Transform input
do 10 r=1,n
p(1, r) = 0
do 10 c=1,5
10 p(c+1, r) = fw(c, r)
c Input end time
read *, t9
c Iterate from 1 to end time
do 62 t=1,t9
do 61 q=1,1000
if (p(1,q) .lt. 0 .or. t .lt. p(5,q)) goto 61
if (p(6,q).gt.0.and.t.gt.p(5,q) .or. t.gt.abs(p(6,q))) then
p(1,q) = p(1,q) + p(4,q)
p(2,q) = p(2,q) + p(3,q)
endif
if (t .lt. abs(p(6,q))) goto 61
if (t .gt. abs(p(6,q))) then
p(4,q) = p(4,q) - 2
elseif (t .eq. p(6,q)) then
c Detonation: Build 9 sparks
do 52 m=-1,1
do 51 k=-1,1
c Find a free entry in p and fill it with a spark
do 40 f=1,1000
if (p(1,f) .lt. 0) then
do 20 j=1,6
20 p(j,f) = p(j,q)
p(3,f) = p(3,q) + 20 * m
p(4,f) = p(4,q) + 10 * k
p(6,f) = -p(6,q)
goto 51
endif
40 continue
51 continue
52 continue
c Delete the original firework
p(1,q) = -1
endif
61 continue
62 continue
c Prepare output
do 70 r=0,23
70 s(r) = ' '
do 80 q=1,1000
if (p(1,q) .lt. 0) goto 80
if (p(5,q) .gt. t9) goto 80
y = p(1,q) / 10
if (y .lt. 0 .or. y .gt. 23) goto 80
x = p(2,q) / 10
if (x .lt. 0 .or. x .gt. 79) goto 80
if (s(y)(x+1:x+1) .ne. ' ') then
z = 'X'
elseif ((p(4,q) .eq. 0) .or. abs(p(3,q) / p(4,q)) .gt. 2) then
z = '-'
elseif ((p(3,q) .eq. 0) .or. abs(p(4,q) / p(3,q)) .gt. 2) then
z = '|'
elseif (sign(1, p(3,q)) .eq. sign(1, p(4,q))) then
z = '/'
else
z = '\'
endif
s(y)(x+1:x+1) = z
80 continue
c Output
do 90 r=23,0,-1
90 print *, s(r)
end
Here's a smaller Haskell implementation. It's 911 characters; minus the fireworks definition, it's 732 characters:
import System
z=789
w=239
r=replicate
i=foldl
main=do{a<-getArgs;p(f[(628,6,6,3,33),(586,7,11,11,23),(185,-1,17,24,28),(189,14,10,50,83),(180,7,5,70,77),(538,-7,7,70,105),(510,-11,19,71,106),(220,-9,7,77,100),(136,4,14,80,91),(337,-13,20,106,128)](read(a!!0)::Int));}
p[]=return()
p(f:g)=do{putStrLn f;p g}
f s t=i(a t)(r 24(r 79' '))s
a t f(x,s,y,l,d)=if t<l then f else if t<d then c f((x+s*u,y*u),(s,y))else i c f(map(v(t-d)(o(d-l)(x,0)(s,y)))[(g s,h y)|g<-[id,(subtract 20),(+20)],h<-[id,(subtract 10),(+10)]])where u=t-l
v 0(x,y)(vx,vy)=((x,y),(vx,vy))
v t(x,y)(vx,vy)=v(t-1)(x+vx,y+vy)(vx,vy-2)
o t(x,y)(vx,vy)=(x+(vx*t),y+(vy*t))
c f((x,y),(vx,vy))=if x<0||x>=z||y<0||y>=w then f else(take m f)++[(take n r)++[if d/=' 'then 'X'else if vy==0||abs(vx`div`vy)>2 then '-'else if vx==0||abs(vy`div`vx)>2 then '|'else if vx*vy>=0 then '/'else '\\']++(drop(n+1)r)]++(drop(m+1)f)where{s=w-y;n=x`div`10;m=s`div`10;r=f!!m;d=r!!n}
Here's the non-compressed version for the curious:
import System
sizeX = 789
sizeY = 239
main = do
args <- getArgs
printFrame (frame fireworks (read (args !! 0) :: Int))
where
fireworks = [
(628, 6, 6, 3, 33),
(586, 7, 11, 11, 23),
(185, -1, 17, 24, 28),
(189, 14, 10, 50, 83),
(180, 7, 5, 70, 77),
(538, -7, 7, 70, 105),
(510, -11, 19, 71, 106),
(220, -9, 7, 77, 100),
(136, 4, 14, 80, 91),
(337, -13, 20, 106, 128)]
printFrame :: [String] -> IO ()
printFrame [] = return ()
printFrame (f:fs) = do
putStrLn f
printFrame fs
frame :: [(Int,Int,Int,Int,Int)] -> Int -> [String]
frame specs time =
foldl (applyFirework time)
(replicate 24 (replicate 79 ' ')) specs
applyFirework :: Int -> [String] -> (Int,Int,Int,Int,Int) -> [String]
applyFirework time frame (x,sx,sy,lt,dt) =
if time < lt then frame
else if time < dt then
drawChar frame
((x + sx * timeSinceLaunch, sy * timeSinceLaunch), (sx,sy))
else
foldl drawChar frame
(
map
(
posVelOverTime (time - dt)
(posOverTime (dt - lt) (x,0) (sx, sy))
)
[
(fx sx, fy sy) |
fx <- [id,(subtract 20),(+20)],
fy <- [id,(subtract 10),(+10)]
]
)
where timeSinceLaunch = time - lt
posVelOverTime :: Int -> (Int,Int) -> (Int,Int) -> ((Int,Int),(Int,Int))
posVelOverTime 0 (x,y) (vx,vy) = ((x,y),(vx,vy))
posVelOverTime time (x,y) (vx,vy) =
posVelOverTime (time - 1) (x+vx, y+vy) (vx, vy - 2)
posOverTime :: Int -> (Int,Int) -> (Int,Int) -> (Int,Int)
posOverTime time (x,y) (vx, vy) = (x + (vx * time), y + (vy * time))
drawChar :: [String] -> ((Int,Int),(Int,Int)) -> [String]
drawChar frame ((x,y),(vx,vy)) =
if x < 0 || x >= sizeX || y < 0 || y >= sizeY then frame
else
(take mappedY frame)
++
[
(take mappedX row)
++
[
if char /= ' ' then 'X'
else if vy == 0 || abs (vx `div` vy) > 2 then '-'
else if vx == 0 || abs (vy `div` vx) > 2 then '|'
else if vx * vy >= 0 then '/'
else '\\'
]
++ (drop (mappedX + 1) row)
]
++ (drop (mappedY + 1) frame)
where
reversedY = sizeY - y
mappedX = x `div` 10
mappedY = reversedY `div` 10
row = frame !! mappedY
char = row !! mappedX
First draft in Tcl8.5 913 bytes excluding fireworks definition:
set F {
628 6 6 3 33
586 7 11 11 23
185 -1 17 24 28
189 14 10 50 83
180 7 5 70 77
538 -7 7 70 105
510 -11 19 71 106
220 -9 7 77 100
136 4 14 80 91
337 -13 20 106 128
}
namespace import tcl::mathop::*
proc @ {a args} {interp alias {} $a {} {*}$args}
@ : proc
@ = set
@ D d p
@ up upvar 1
@ < append out
@ _ foreach
@ e info exists
@ ? if
: P {s d t l} {+ $s [* $d [- $t $l]]}
: > x {= x}
: d {P x X y Y} {up $P p
= x [/ $x 10]
= y [/ $y 10]
= p($x,$y) [? [e p($x,$y)] {> X} elseif {
$Y==0||abs($X)/abs($Y)>2} {> -} elseif {
$X==0||abs($Y)/abs($X)>2} {> |} elseif {
$X*$Y<0} {> \\} {> /}]}
: r {P} {up $P p
= out ""
for {= y 23} {$y >= 0} {incr y -1} {
for {= x 0} {$x < 79} {incr x} {? {[e p($x,$y)]} {< $p($x,$y)} {< " "}}
< "\n"}
puts $out}
: s {F t} {array set p {}
_ {x X Y l d} $F {? {$t >= $l} {? {$t < $d} {= x [P $x $X $t $l]
= y [P 0 $Y $t $l]
D $x $X $y $Y} {= x [P $x $X $d $l]
= y [P 0 $Y $d $l]
= v [- $t $d]
_ dx {-20 0 20} {_ dy {-10 0 10} {= A [+ $X $dx]
= B [- [+ $Y $dy] [* 2 $v]]
= xx [P $x $A $v 0]
= yy [P $y $B $v 0]
D $xx $A $yy $B}}}}}
r p}
s $F [lindex $argv 0]
Optimized to the point of unreadability. Still looking for room to improve. Most of the compression basically uses command aliasing substituting single characters for command names. For example, function definitions are done using Forth-like : syntax.
Here's the uncompressed version:
namespace import tcl::mathop::*
set fireworks {
628 6 6 3 33
586 7 11 11 23
185 -1 17 24 28
189 14 10 50 83
180 7 5 70 77
538 -7 7 70 105
510 -11 19 71 106
220 -9 7 77 100
136 4 14 80 91
337 -13 20 106 128
}
proc position {start speed time launch} {
+ $start [* $speed [- $time $launch]]
}
proc give {x} {return $x}
proc draw {particles x speedX y speedY} {
upvar 1 $particles p
set x [/ $x 10]
set y [/ $y 10]
set p($x,$y) [if [info exists p($x,$y)] {
give X
} elseif {$speedY == 0 || abs(double($speedX))/abs($speedY) > 2} {
give -
} elseif {$speedX == 0 || abs(double($speedY))/abs($speedX) > 2} {
give |
} elseif {$speedX * $speedY < 0} {
give \\
} else {
give /
}
]
}
proc render {particles} {
upvar 1 $particles p
set out ""
for {set y 23} {$y >= 0} {incr y -1} {
for {set x 0} {$x < 79} {incr x} {
if {[info exists p($x,$y)]} {
append out $p($x,$y)
} else {
append out " "
}
}
append out "\n"
}
puts $out
}
proc show {fireworks time} {
array set particles {}
foreach {x speedX speedY launch detonate} $fireworks {
if {$time >= $launch} {
if {$time < $detonate} {
set x [position $x $speedX $time $launch]
set y [position 0 $speedY $time $launch]
draw particles $x $speedX $y $speedY
} else {
set x [position $x $speedX $detonate $launch]
set y [position 0 $speedY $detonate $launch]
set travel [- $time $detonate]
foreach dx {-20 0 20} {
foreach dy {-10 0 10} {
set speedXX [+ $speedX $dx]
set speedYY [- [+ $speedY $dy] [* 2 $travel]]
set xx [position $x $speedXX $travel 0]
set yy [position $y $speedYY $travel 0]
draw particles $xx $speedXX $yy $speedYY
}
}
}
}
}
render particles
}
show $fireworks [lindex $argv 0]
First Post hahaha
http://zipts.com/position.php?s=0
not my final submission but could not resist
Btw: 937 characters, not counting spaces (do we count spaces?)
My answer is at http://www.starenterprise.se/fireworks.html
all done in javascript.
And no, I didn't bother to make it as short as possible; I just wanted to see if I could.
Clojure
Unindented, without input output and unnecessary whitespace, it comes to 640 characters - exactly double the best value :( Thus, I'm not providing a "blank optimized" version in an attempt to win at brevity.
(def fw [
[628 6 6 3 33]
[586 7 11 11 23]
[185 -1 17 24 28]
[189 14 10 50 83]
[180 7 5 70 77]
[538 -7 7 70 105]
[510 -11 19 71 106]
[220 -9 7 77 100]
[136 4 14 80 91]
[337 -13 20 106 128]
])
(defn rr [x y u v dt g] (if (<= dt 0) [x y u v] (recur (+ x u) (+ y v) u (+ v g) (dec dt) g)))
(defn pp [t f]
(let [y 0 [x u v a d] f r1 (rr x y u v (- (min t d) a) 0)]
(if (< t a)
'()
(if (< t d)
(list r1)
(for [m '(-20 0 20) n '(-10 0 10)]
(let [[x y u v] r1]
(rr x y (+ u m) (+ v n) (- t d) -2)))))))
(defn at [x y t]
(filter #(and (= x (quot (first %) 10)) (= y (quot (second %) 10))) (apply concat (map #(pp t %) fw))))
(defn g [h]
(if (empty? h) \space
(if (next h) \X
(let [[x y u v] (first h)]
(cond
(or (zero? v) (> (* (/ u v) (/ u v)) 4)) \-
(or (zero? u) (> (* (/ v u) (/ v u)) 4)) \|
(= (neg? u) (neg? v)) \/
:else \\
)))))
(defn q [t]
(doseq [r (range 23 -1 -1)]
(doseq [c (range 0 79)]
(print (g (at c r t))))
(println)))
(q 93)