How should I pack h264 to PES

How should I pack h264 to PES - h.264

Please tell me the best case to packing SPS PPS into PES package.
I have raw h264 stream like:
SPS PPS IDR P P ... P SPS PPS IDR P...
How should I packing it into PES? I packing like:
[PES with PTS] SPS PPS IDR [PES with PTS] P [PES with PTS] ...
or I should
[PES with PTS] SPS [PES with PTS] PPS [PES with PTS] ... ?
Thanks

One PES packet per access Unit. Access Unit delimiters are required for TS.

Related

arm cortex-m33 (trustzone, silabs efm32pg22) - assembler hardfaults accessing GPIO or almost any peripherals areas, any hint?

I am just lost here with this code trying to configure on baremetal the silicon labs efm32pg22 in theirs devkit accessed through internal J-Link from segger studio (great fast ide) - I have such example blink hello world in C working from theirs simplicity studio, but was trying to achieve the same thing I did on microchip pic32 mc00 or samd21g17d easily in pure assembler, having only clocks and startup configured through gui in mplab x... well, here I tried to go to segger IDE where is NO startup/clocks config easy way, or I didnt found it yet. On hardware level, registers of such cortex beasts are different by manufacturer, in C/C++ there is some not cheap unification over cmsis - but I want only to know what minimal is needed to just have working raw GPIO after clock/startup ... Segger project is generic cortex-m for specific efm32pg22 so cortex-M33 with trust-zone security - I probably dont know what all is locked or switched off or in which state MCU is, if privileged or nonprivileged - there are 2 sets of registers mapping, but nothing works. As far as I try to "store" or even "load" on GPIO config registers (or SMU regs to query someting too) it is throw hardfault exception. All using segger ide debugger over onboard j-link. Kindly please, what I am doing wrong, whats missing here?
in C, I have only this code:
extern void blink(void);
int main ( void )
{
blink();
}
In blink.s I have this:
;#https://github.com/hubmartin/ARM-cortex-M-bare-metal-assembler-examples/blob/master/02%20-%20Bare%20metal%20blinking%20LED/main.S
;#https://sites.google.com/site/hubmartin/arm/arm-cortex-bare-metal-assembly/02---arm-cortex-bare-metal-assembly-blinking-led
;#https://mecrisp-stellaris-folkdoc.sourceforge.io/projects/blink-f0disco-gdbtui/doc/readme.html
;#https://microcontrollerslab.com/use-gpio-pins-tm4c123g-tiva-launchpad/
;#!!! ENABLE GPIO CLOCK SOURCE ON EFM32 !!!
;#https://community.silabs.com/s/share/a5U1M000000knsWUAQ/hello-world-part-2-create-firmware-to-blink-the-led?language=en_US
;#EFM32 GPIO
;#https://www.silabs.com/documents/public/application-notes/an0012-efm32-gpio.pdf
;# ARM thumb2 ISA
;#https://www.engr.scu.edu/~dlewis/book3/docs/ARM_and_Thumb-2_Instruction_Set.pdf
;#https://sciencezero.4hv.org/index.php?title=ARM:_Cortex-M3_Thumb-2_instruction_set
;#!!! https://stackoverflow.com/questions/48561243/gnu-arm-assembler-changes-orr-into-movw
;#segger assembler
;#https://studio.segger.com/segger/UM20006_Assembler.pdf
;#https://www.segger.com/doc/UM20006_Assembler.html
;#!!! unfortunatelly, we dont know here yet how to include ASM SFR defines, nor for MPLAB ARM (Harmony) !!!
;##include <xc.h>
;##include "definitions.h"
.cpu cortex-m33
.thumb
.text
.section .text.startup.main,"ax",%progbits
.balign 2
.p2align 2,,3
.global blink
//.arch armv8-m.base
.arch armv6-m
.syntax unified
.code 16
.thumb_func
.fpu softvfp
.type blink, %function
//!!! here we have manually entered GPIO PORT defines for PIC32CM
.equ SYSCFG_BASE_ADDRESS, 0x50078000
.equ SMU_BASE_ADDRESS, 0x54008000
//.equ SMU_BASE_ADDRESS, 0x5400C000
.equ CMU_BASE_ADDRESS, 0x50008000
.equ GPIO_BASE_ADDRESS, 0x5003C000 // this differs totally from both "special" infineon and microchip "standard?" cortex devices !!!
.equ DELAY, 40000
// Vector table
.word 0x20001000 // Vector #0 - Stack pointer init value (0x20000000 is RAM address and 0x1000 is 4kB size, stack grows "downwards")
.word blink // Vector #1 - Reset vector - where the code begins
// Vector #3..#n - I don't use Systick and another interrupts right now
// so it is not necessary to define them and code can start here
blink:
LDR r0, =(SYSCFG_BASE_ADDRESS + 0x200) // SYSCFG SYSCFG_CTRL
LDR r1, =0 // 0 diable address faults exceptions
ldr r1, [r0] // Store R0 value to r1
LDR r0, =(CMU_BASE_ADDRESS) // CMU CMU_SYSCLKCTRL PCLKPRESC + CLKSEL
LDR r1, =0b10000000001 // FSRCO 20MHz + PCLK = HCLK/2 = 10MHz
STR r1, [r0, 0x70] // Store R0 value to r1
LDR r0, =(CMU_BASE_ADDRESS) // CMU CMU_CLKEN0
LDR r1, [r0, 0x64]
LDR r2, =(1 << 25) // GPIO CLK EN
orrs r1, r2 // !!! HORROR !!! -- orr is not possible in thumb2 ?? only orrs !! (width suffix)
STR r1, [r0, 0x64] // Store R0 value to r1
LDR r1, [r0, 0x68]
LDR r2, =(1 << 14) // SMU CLK EN
orrs r1, r2 // !!! HORROR !!! -- orr is not possible in thumb2 ?? only orrs !! (width suffix)
STR r1, [r0, 0x68] // Store R0 value to r1
//LDR r0, =(SMU_BASE_ADDRESS) // SMU SMU_LOCK
//LDR r1, =11325013 // SMU UNLOCK CODE
//STR r1, [r0, 0x08] //Store R0 value to r1
ldr r0, =(SMU_BASE_ADDRESS) // SMU reading values, detection - AGAIN, HARD FAULTS !!!!!!!
ldr r1, [r0, 0x04]
ldr r1, [r0, 0x20]
ldr r1, [r0, 0x40]
//LDR r0, =(GPIO_BASE_ADDRESS + 0x300) // GPIO UNLOCK
//LDR r1, =0xA534
//STR r1, [r0] // Store R0 value to r1
//!! THIS BELOW IS OLD FOR SAMD , WE STILL SIMPLY CANT ENABLE GPIO !!!!
// Enable PORTA pin 4 as output
LDR r0, =(GPIO_BASE_ADDRESS) // DIR PORTA
LDR r1, =0b00000000000001000000000000000000
STR r1, [r0, 0x04] // Store R0 value to r1
LDR R2, =1
loop:
// Write high to pin PA04
LDR r0, =GPIO_BASE_ADDRESS // OUT PORTA
LDR r1, =0b10000 // PORT_PA04
STR r1, [r0, 0x10] // Store R1 value to address pointed by R0
// Dummy counter to slow down my loop
LDR R0, =0
LDR R1, =DELAY
loop0:
ADD R0, R2
cmp R0, R1
bne loop0
// Write low to PA04
LDR r0, =GPIO_BASE_ADDRESS // OUT PORTA
LDR r1, =0b00000
STR r1, [r0, 0x10] // Store R1 value to address pointed by R0
// Dummy counter to slow down my loop
LDR R0, =0
LDR R1, =DELAY
loop1:
ADD R0, R2
cmp R0, R1
bne loop1
b loop
UPDATE: well, now I tried it again in SimplicityStudio, placing blink() call after pregenerated system init:
extern void blink(void);
int main(void)
{
// Initialize Silicon Labs device, system, service(s) and protocol stack(s).
// Note that if the kernel is present, processing task(s) will be created by
// this call.
sl_system_init();
blink();
}
having this code in blink.s: - and here it works this way and blinks ...
.cpu cortex-m33
.thumb
.text
.section .text.startup.main,"ax",%progbits
.balign 2
.p2align 2,,3
.global blink
//.arch armv8-m.base
.arch armv6-m
.syntax unified
.code 16
.thumb_func
.fpu softvfp
.type blink, %function
/*
//!!! here we have manually entered GPIO PORT defines for PIC32CM
.equ SYSCFG_BASE_ADDRESS, 0x50078000
.equ SMU_BASE_ADDRESS, 0x54008000
//.equ SMU_BASE_ADDRESS, 0x5400C000
.equ CMU_BASE_ADDRESS, 0x50008000
*/
.equ GPIO_BASE_ADDRESS, 0x5003C000 // this differs totally from both "special" infineon and microchip "standard?" cortex devices !!!
.equ DELAY, 400000
// Vector table
.word 0x20001000 // Vector #0 - Stack pointer init value (0x20000000 is RAM address and 0x1000 is 4kB size, stack grows "downwards")
.word blink // Vector #1 - Reset vector - where the code begins
// Vector #3..#n - I don't use Systick and another interrupts right now
// so it is not necessary to define them and code can start here
blink:
// Enable PORTA pin 4 as output
LDR r0, =(GPIO_BASE_ADDRESS) // DIR PORTA
LDR r1, =0b00000000000001000000000000000000
STR r1, [r0, 0x04]
loop:
// Write high to pin PA04
LDR r0, =GPIO_BASE_ADDRESS // OUT PORTA
LDR r1, =0b10000 // PORT_PA04
STR r1, [r0, 0x10]
// Dummy counter to slow down my loop
LDR R0, =0
LDR R1, =DELAY
loop0:
ADD R0, R2
cmp R0, R1
bne loop0
// Write low to PA04
LDR r0, =GPIO_BASE_ADDRESS // OUT PORTA
LDR r1, =0b00000
STR r1, [r0, 0x10]
// Dummy counter to slow down my loop
LDR R0, =0
LDR R1, =DELAY
loop1:
ADD R0, R2
cmp R0, R1
bne loop1
b loop
... so NOW, I am just curious, what all is missing in pure assembly code to bring that cortex-m33 into some "easy" state, just ignoring trustzone, probably to use it similary as say, plain cortex-m3 ??
can anybody help? I am digging deeply into this datasheet/ref manual, but no luck till now ...
https://www.silabs.com/documents/public/reference-manuals/efm32pg22-rm.pdf
UPDATE AGAIN: umm, will try to figure out ... by traversing system_init C-code its clear whats going on, there are also some chip errata workarounds, but I never touched DCDC while initializing, this may be culprit...
void sl_platform_init(void)
{
CHIP_Init();
sl_device_init_nvic();
sl_board_preinit();
sl_device_init_dcdc();
sl_device_init_hfxo();
sl_device_init_lfxo();
sl_device_init_clocks();
sl_device_init_emu();
sl_board_init();
}

well, okay, manufacturer specific code generation for MCU startup IS really important and useful thing )) ... such MCUs from different manufacturers are really much different at registers level (even that all are "cortex-m" core based), that its worthless to try to configure them manually in assembly if there is enough flash available, and it mostly IS. So, till now, no luck with segger/keil/iar "generic" arm/cortex IDEs to do this properly on specific parts, so using manufacturer specific IDE to (mostly) graphically configure startup clocks and peripherals IS CRUCIAL, or at least, its really easiest way (I know, quite expensive observation after all the assembly tries... )). After then, its easy to make even pure assembly "blink" helloworld test called as extern C-function. You may be asking why I am still considering assembly if there are even CMSIS (on arm) "platform abstraction layer" C-headers at least (no, it doesnt help in abstraction, as the devices are still very different, you only have registers symbols #defines and typedefs and enums to do something in C easily, okay). But I am trying to compare some C-compiled code with handwriten assembly for some specific purpose, which needs forced optimized algorithm from scratch and its often quite easier to think/design it directly in assembly that to rely on very complexly described C-compiler optimisations (each compiler has its own LONG document how his optimisations work and at this level, C is simply still too abstract and moving target, the more, you try to write something for even different MCU architectures (think ARM cortex-m, PIC32/mips, and/or even PIC16/18 + PIC24, AVR , MSP430 ...) - while general algorithm may be described in shared pseudoassenbly to be as near to hardware as possible, withnout knowing all optimization quirks of each architecture C compiler(s) - there are often MORE different C compilers too. So, to compare C-compiler generated code with handwriten assembly you can do it, and I already tried such assembly blink on MANY VERY different architectures, in case I definitelly used mfg specific IDE to genearte startup in C, using all the GUI configurations and code generation down to always compilable empty C project, of course, having very different code size output using such generated startups. Most advanced MCUs are really very complex, mostly in clocks configuration and pins functions config and then different peripheral devices too, sure. Some similarities are possible only at single mfg level, to some extent, so MCU of single manufacturer often share similar approach, obviously. So final solution is to have startup generated and then switch to assembly immediatelly, this is feasible. Sure that in case of small flash, its further possible to optimize even startup code, but its mostly important on smallest 8bit parts, where startup IS quite easy anyway or the generated code is also small, obviously.

Why linking and calling a C function with variadic arguments in Rust causes random functions to be called? [duplicate]

This question already has an answer here:
Why does my C strlen() in Rust also count string slice inside print! macro after `s` variable?
(1 answer)
Closed last year.
I was playing with Rust FFI and I was trying to link printf C function, which has variadic arguments, with my Rust executable. After I ran the executable I witnessed some strange behavior.
This is my Rust code:
use cty::{c_char, c_int};
extern "C" {
fn printf(format: *const c_char, ...) -> c_int;
}
fn main() {
unsafe {
printf("One number: %d\n".as_ptr() as *const i8, 69);
printf("Two numbers: %d %d\n".as_ptr() as *const i8, 11, 12);
printf(
"Three numbers: %d %d %d\n".as_ptr() as *const i8,
30,
31,
32,
);
}
}
This is the output after running cargo run:
cargo run
Compiling c-bindings v0.1.0 (/home/costin/Rust/c-bindings)
Finished dev [unoptimized + debuginfo] target(s) in 0.20s
Running `target/debug/c-bindings`
One number: 1
Two numbers: 1 11376
Three numbers: 0 0 0
Two numbers: 11 12
Three numbers: 0 0 56
Three numbers: 30 31 32
I looks like the first printf calls the second and the third with random parameters, then the second printf calls the third one with random parameters because the expected output should have been:
One number: 1
Two numbers: 11 12
Three numbers: 30 31 32
Can anyone explain to me why is this strange behavior happening?

Rust strings are not null-terminated like printf expects. You'll need to either manually include \0 at the end of your format strings or use CString:
printf("One number: %d\n\0".as_ptr() as *const i8, 69);
use std::ffi::CString;
let s = CString::new("One number: %d\n").unwrap();
printf(s.as_ptr(), 69);

modular arithmetic on the gpu

I am working on the GPU algorithm which is supposed to do a lot of modular computations. Particularly, various operations on matrices in a finite field which in the long run
reduce to primitive operations like: (a*b - c*d) mod m or (a*b + c) mod m where a,b,c and d are residues modulo m and m is a 32-bit prime.
Through experimentation I learned that the performance of the algorithm is mostly limited by slow modular arithmetic because integer modulo (%) and division operations are not supported on the GPU in hardware.
I appreciate if somebody can give me an idea how to realize efficient modular computations with CUDA ?
To see how this is implemented on CUDA, I use the following code snippet:
__global__ void mod_kernel(unsigned *gout, const unsigned *gin) {
unsigned tid = threadIdx.x;
unsigned a = gin[tid], b = gin[tid * 2], m = gin[tid * 3];
typedef unsigned long long u64;
__syncthreads();
unsigned r = (unsigned)(((u64)a * (u64)b) % m);
__syncthreads();
gout[tid] = r;
}
This code is not supposed to work, I just wanted to see how modular reduction is
implemented on CUDA.
When I disassemble this with cuobjdump --dump-sass (thanks njuffa for advice!), I see the following:
/*0098*/ /*0xffffdc0450ee0000*/ BAR.RED.POPC RZ, RZ;
/*00a0*/ /*0x1c315c4350000000*/ IMUL.U32.U32.HI R5, R3, R7;
/*00a8*/ /*0x1c311c0350000000*/ IMUL.U32.U32 R4, R3, R7;
/*00b0*/ /*0xfc01dde428000000*/ MOV R7, RZ;
/*00b8*/ /*0xe001000750000000*/ CAL 0xf8;
/*00c0*/ /*0x00000007d0000000*/ BPT.DRAIN 0x0;
/*00c8*/ /*0xffffdc0450ee0000*/ BAR.RED.POPC RZ, RZ;
Note that between the two calls to bar.red.popc there is a call to 0xf8 procedure which implements some sophisticated algorithm (about 50 instructions or even more). Not surpising that mod (%) operation is slow

Some time ago I experimented a lot with modular arithmetic on the GPU. On Fermi GPUs you can use double-precision arithmetic to avoid expensive div and mod operations. For example, modular multiplication can be done as follows:
// fast truncation of double-precision to integers
#define CUMP_D2I_TRUNC (double)(3ll << 51)
// computes r = a + b subop c unsigned using extended precision
#define VADDx(r, a, b, c, subop) \
asm volatile("vadd.u32.u32.u32." subop " %0, %1, %2, %3;" : \
"=r"(r) : "r"(a) , "r"(b), "r"(c));
// computes a * b mod m; invk = (double)(1<<30) / m
__device__ __forceinline__
unsigned mul_m(unsigned a, unsigned b, volatile unsigned m,
volatile double invk) {
unsigned hi = __umulhi(a*2, b*2); // 3 flops
// 2 double instructions
double rf = __uint2double_rn(hi) * invk + CUMP_D2I_TRUNC;
unsigned r = (unsigned)__double2loint(rf);
r = a * b - r * m; // 2 flops
// can also be replaced by: VADDx(r, r, m, r, "min") // == umin(r, r + m);
if((int)r < 0)
r += m;
return r;
}
However this only works for 31-bit integer modulos (if 1 bit is not critical for you)
and you also need to precompute 'invk' beforehand. This gives absolute minimum of instructions I can achieve, ie.:
SHL.W R2, R4, 0x1;
SHL.W R8, R6, 0x1;
IMUL.U32.U32 R4, R4, R6;
IMUL.U32.U32.HI R8, R2, R8;
I2F.F64.U32 R8, R8;
DFMA R2, R2, R8, R10;
IMAD.U32.U32 R4, -R12, R2, R4;
ISETP.GE.AND P0, pt, R4, RZ, pt;
#!P0 IADD R4, R12, R4;
For description of the algorithm, you can have a look at my paper:
gpu_resultants. Other operations like (xy - zw) mod m are also explained there.
Out of curiosity, I compared the performance of the resultant algorithm
using your modular multiplication:
unsigned r = (unsigned)(((u64)a * (u64)b) % m);
against the optimized version with mul_m.
Modular arithmetic with default % operation:
low_deg: 11; high_deg: 2481; bits: 10227
nmods: 330; n_real_pts: 2482; npts: 2495
res time: 5755.357910 ms; mod_inv time: 0.907008 ms; interp time: 856.015015 ms; CRA time: 44.065857 ms
GPU time elapsed: 6659.405273 ms;
Modular arithmetic with mul_m:
low_deg: 11; high_deg: 2481; bits: 10227
nmods: 330; n_real_pts: 2482; npts: 2495
res time: 1100.124756 ms; mod_inv time: 0.192608 ms; interp time: 220.615143 ms; CRA time: 10.376352 ms
GPU time elapsed: 1334.742310 ms;
So on the average it is about 5x faster. Note also that, you might not see a speed-up if you just evaluate raw arithmetic performance using a kernel with a bunch of mul_mod operations (like saxpy example). But in real applications with control logic, synchronization barriers etc. the speed-up is very noticeable.

A high-end Fermi GPU (e.g. a GTX 580) will likely give you the best performance among shipping cards for this. You would want all 32-bit operands to be of type "unsigned int" for best performance, as there is some additional overhead for the handling of signed divisions and modulos.
The compiler generates very efficient code for division and modulo with fixed divisor As I recall it is usually around three to five machine instructions instructions on Fermi and Kepler. You can check the generated SASS (machine code) with cuobjdump --dump-sass. You might be able to use templated functions with constant divisors if you only use a few different divisors.
You should see on the order of sixteen inlined SASS instructions being generated for the unsigned 32-bit operations with variable divisor, across Fermi and Kepler. The code is limited by the throughput of integer multiplies and for Fermi-class GPUs is competitive with hardware solutions. Somewhat reduced performance is seen on currently shipping Kepler-class GPUs due to their reduced integer multiply throughput.
[Added later, after clarification of the question:]
Unsigned 64-bit division and modulo with variable divisor on the other hand are called subroutines of about 65 instructions on Fermi and Kepler. They look close to optimal. On Fermi, this is still reasonably competitive with hardware implementations (note that 64-bit integer divisions are not exactly super fast on CPUs that provide this as a built-in instruction). Below is some code that I posted to the NVIDIA forums some time back for the kind of task described in the clarification. It avoids the expensive division, but does assume that fairly large batches of operands are sharing the same divisior. It uses double-precision arithmetic, which is especially fast on Tesla-class GPUs (as opposed to consumer cards). I only did a cursory test of the code, you might want to test this more carefully before deploying it.
// Let b, p, and A[i] be integers < 2^51
// Let N be a integer on the order of 10000
// for i from 1 to N
// A[i] <-- A[i] * b mod p
/*---- kernel arguments ----*/
unsigned long long *A;
double b, p; /* convert from unsigned long long to double before passing to kernel */
double oop; /* pass precomputed 1.0/p to kernel */
/*---- code inside kernel -----*/
double a, q, h, l, rem;
const double int_cvt_magic = 6755399441055744.0; /* 2^52+2^51 */
a = (double)A[i];
/* approximate quotient and round it to the nearest integer */
q = __fma_rn (a * b, oop, int_cvt_magic);
q = q - int_cvt_magic;
/* back-multiply, representing p*q as a double-double h:l exactly */
h = p * q;
l = __fma_rn (p, q, -h);
/* remainder is double-width product a*b minus double-double h:l */
rem = __fma_rn (a, b, -h);
rem = rem - l;
/* remainder may be negative as quotient rounded; fix if necessary */
if (rem < 0.0) rem += p;
A[i] = (unsigned long long)rem;

There are tricks to efficiently perform mod operations but if only m is radix 2.
For instance, x mod y == x & (y-1), where y is 2^n. Performing bitwise operation is the fastest.
Otherwise, probably a look-up table?
Below is a link on discussion of efficient modulo implementation. You might need to implement it yourself to get the most out of it.
Efficient computation of mod

CUDA: Erroneous lmem statistics displayed for sm_20?

A CUDA kernel compiled with the option --ptxas-options=-v seems to be displaying erroneous lmem (local memory) statistics when sm_20 GPU architecture is specified. The same gives meaningful lmem statistics with sm_10 / sm_11 / sm_12 / sm_13 architectures.
Can someone clarify if the sm_20 lmem statistics need to be read differently or they are plain wrong?
Here is the kernel:
__global__ void fooKernel( int* dResult )
{
const int num = 1000;
int val[num];
for ( int i = 0; i < num; ++i )
val[i] = i * i;
int result = 0;
for ( int i = 0; i < num; ++i )
result += val[i];
*dResult = result;
return;
}
--ptxas-options=-v and sm_20 report:
1>ptxas info : Compiling entry function '_Z9fooKernelPi' for 'sm_20'
1>ptxas info : Used 5 registers, 4+0 bytes lmem, 36 bytes cmem[0]
--ptxas-options=-v and sm_10 / sm_11 / sm_12 / sm_13 report:
1>ptxas info : Compiling entry function '_Z9fooKernelPi' for 'sm_10'
1>ptxas info : Used 3 registers, 4000+0 bytes lmem, 4+16 bytes smem, 4 bytes cmem[1]
sm_20 reports a lmem of 4 bytes, which is simply not possible if you see the 4x1000 byte array being used in the kernel. The older GPU architectures report the correct 4000 byte lmem statistic.
This was tried with CUDA 3.2. I have referred to the Printing Code Generation Statistics section of the NVCC manual (v3.2), but it does not help explain this anomaly.

The compiler is correct. Through clever optimization the array doesn't need to be stored. What you do is essentially calculating result += i * i without ever storing temporaries to val.
A look at the generated ptx code won't show any differences for sm_10 vs. sm_20. Decompiling the generated cubins with decuda will reveal the optimization.
BTW: Try to avoid local memory! It is as slow as global memory.

Code Golf: Decision Tree

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
In Google Code Jam 2009, Round 1B, there is a problem called Decision Tree that lent itself to rather creative solutions.
Post your shortest solution; I'll update the Accepted Answer to the current shortest entry on a semi-frequent basis, assuming you didn't just create a new language just to solve this problem. :-P
Current rankings:
107 Perl
121 PostScript (binary)
132 Ruby
154 Arc
160 PostScript (ASCII85)
170 PostScript
192 Python
196 JavaScript
199 Common Lisp
212 LilyPond
273 Scheme
280 R
281 sed w/ bc
312 Haskell
314 PHP
339 m4 w/ bc
346 C
381 Fortran
462 Java
718 OCaml
759 F#
1554 sed
C++ not qualified for now

sed in 1554 chars (pure) / 281 (with bc)
Yes, seriously.
Usage: sed -r -f thisfile.sed < input.in > output.out
(works on GNU sed)
1d
/ /!{x
s/^$/Case #Y:/
:i
s/9Y/Y0/
ti
s/#Y/#0Y/
s/:/:0123456789/
s/(.)Y(.*):[0-9]*\1(.).*/\3\2Y:/
x
G
s/.*\n|Y//gp
z
:p
N
/[()]/s/ |\n//g
y/()/JK/
tp
H
d}
G
s/\n[^J]*/ %/
s/[^JK]*$//
:c
s/J1?([.-9]+)(.*)K/\2#\1/
/%#/by
:b
/J/s/T//
s/J([^JK]*)K/TC\1B/
tb
/ (.+) .*%\1C/{s/%[^C]*/%/
s/T.*B//
by}
s/%.*T/%/
:y
y/CB/JK/
tc
s/.\.0*\b//g
:r
/#.*#/{s/\w*#\w*$/C&B/
s/C(\w)(.*B)/\1C\2~/
s/"[^"]*/&0/g
:t
s/(\w)(C.*)(\w)B(.*~)/\1\2B\3\4\1\3/
T
s/~(10|2[01]|3[0-2]|4[0-3]|5[0-4]|6[0-5]|7[0-6]|8[0-7]|9.)/&Q/
s/(.)(.)Q/\2\1/
s/~0\w/`00/
s/~1\B/`0/
s/~22/`04/
s/~23/`06/
s/~24/`08/
s/~33/`09/
s/~25/`10/
s/~26|~34/`12/
s/~27/`14/
s/~28|~44/`16/
s/~29|~36/`18/
s/~35/`15/
s/~45/`20/
s/~37/`21/
s/~38|~46/`24/
s/~55/`25/
s/~39/`27/
s/~47/`28/
s/~56/`30/
s/~48/`32/
s/~57/`35/
s/~49|~66/`36/
s/~58/`40/
s/~67/`42/
s/~59/`45/
s/~68/`48/
s/~77/`49/
s/~69/`54/
s/~78/`56/
s/~79/`63/
s/~88/`64/
s/~89/`72/
s/~99/`81/
s/`(.)(.)/~\1'\2/
bt
:
s/(~.)'/\1/
s/..'/K&/
/K/bk
:v
s/=(,?.)'/\1/
s/,/1'/
t
s/B(.*)~/\1B"/
tr
s/"(\w*)0/A\1/g
/A.*A/{s/A[^A]*$/J&K/
:k
s/([^A])(J.*)([^A])K/\2K\1\3/
s/K(10|2[01]|3[0-2]|4[0-3]|5[0-4]|6[0-5]|7[0-6]|8[^9]|9.)/&Q/
s/(.)(.)Q/\2\1/
s/K0/=/
s/K11/=2/
s/K12/=3/
s/K13|K22/=4/
s/K14|K23/=5/
s/K15|K24|K33/=6/
s/K16|K25|K34/=7/
s/K(17|26|35|44)/=8/
s/K(18|27|36|45)/=9/
s/K(19|28|37|46|55)/W0/
s/K(29|38|47|56)/W1/
s/K(39|48|57|66)/W2/
s/K49|K58|K67/W3/
s/K59|K68|K77/W4/
s/K69|K78/W5/
s/K79|K88/W6/
s/K89/W7/
s/K99/W8/
s/W/=,/
/'/bv
s/\b=/K:/
tk
s/[:JK]A?//g
s/,/,0123456789GF/
s/(.),.*\1(.).*F/\2/
s/G/,0/
tk}
/A.*A/bv}
s/\w*C.*A//
tr
s/.*#/./
This solution omits the leading zero in front of the decimal point, and does not handle cases where the answer is 1.00. Luckily, the GCJ judge accepts the lack of a zero, and does not have any cases where the answer is 1.00.
To include the leading zero, change the last line to s/.*#/0./; and to handle a 1.00 case, append the line s/^$/1/.
Here's a solution that outsources the multiplication to bc:
1d
/ /!{x
s/\n.*//
s/.*/echo 0&+1|bc/e
x
g
s/.*/Case #&:/p
:p
N
/[()]/s/ |\n//g
y/()/JK/
tp
H
d}
G
s/\n[^J]*/ %/
s/[^JK]*$//
:c
s/J([.-9]+)(.*)K/\2*\1/
/%\*/s/.*%.(.*)/echo \1|bc -l/e
:b
/J/s/T//
s/J([^JK]*)K/TC\1B/
tb
/ (.+) .*%\1C/{s/%[^C]*/%/
s/T.*B//
b}
s/%.*T/%/
:
y/CB/JK/
tc

Perl in 107 characters
say("Case #$_:"),
$_=eval"''".'.<>'x<>,
s:[a-z]+:*(/ $&\\s/?:g,s/\)\s*\(/):/g,
eval"\$_=<>;say$_;"x<>for 1..<>
Newlines for legibility; none of them is necessary or counted in.
It uses features found only in the latest versions of Perl, so run with perl -M5.010 or later.
I used to be a Perl noob too, so this works almost the same as the ruby one. Original version 126 chars, optimizations by peutri.
Backlinks:
Word Aligned - Power Programming

LilyPond: 212 characters
Craziness! Utter ridiculousness!! LilyPond, with its built-in Scheme interpreter, manages to outdo Scheme by more than FIFTY BYTES! Holy acrobatic flying mooses in tights!!
x=#lambda
w=#read
#(letrec((v(x(a)(map a(iota(w)1))))(c(x(f q)(*(car q)(if(any list? q)(c
f((if(memq(cadr q)f)caddr cadddr)q))1)))))(v(x(i)(w)(set! #(w))(format
#t"Case #~a:
~{~y~}"i(v(x i(w)(c(v(x i(w)))#)))))))
Usage: lilypond thisfile.ly <input.in >output.out 2>/dev/null
Credit goes to cky for writing the Scheme solution this was based on, though this version is now substantially different. Seriously, though, the Scheme could be golfed a bit further...

PostScript: 170 (regular) / 160 (ASCII85) / 121 (binary)
My shortest (regular) PostScript solution so far, provided that you rename the input file to "r" (170 characters, including newlines); uses a GhostScript-specific procedure (=only):
1[/:{repeat}/!{exch token{\ exch known{/<>}if]pop]]3 index mul
!}if}(]){token pop}/?(r)(r)file([){?]}>>begin
1[{(Case #)2{=only}:(:)=[/|[def[{[/\<<[{[/}:>>def |]! =}:}for
Usage: cp input.in r; gs -q -dNOPROMPT -dNODISPLAY -dBATCH thisfile.ps > output.out
Here's a binary version of this in 121 bytes (backslashes and unprintable characters escaped):
1[/!{\x92>\x92\xab{\\\x92>\x92`\x92p{]\x92u}if]]3\x92X\x92l!}if}(]){\x92\xab\x92u}/r(r)\x928\x92A([){r]}>>\x92\r1[{(Case #)\x92v=only[/:\x928[\x923=[{[/\\<<[{[/}\x92\x83>>\x923:]! =}\x92\x83}\x92H
If characters outside the ASCII printable range are disallowed, PS has built-in ASCII85 encoding of binary sources. We therefore have the following 160-byte solution in all ASCII printable characters:
1[([){r]}/r(r)<~OuSUj0-P\*5*Dsn>`q:6#$5JU?'9>YBkCXV1Qkk'Ca"4#Apl(5.=75YP')1:5*?#0>C.bc#<6!&,:Se!4`>4SH!;p_OuQ[/1Herh>;'5D4Bm/:07B"95!G,c3aEmO4aiKGI?I,~>cvx exec

Ruby in 132
Improved by leonid. Newlines are essential.
def j
'1
'..gets
end
j.map{|c|s=j.map{gets}*''
puts"Case #%d:"%c,j.map{gets;eval s.gsub(/[a-z]+/,'*(/ \&\b/?').gsub /\)\s*\(/,'):'}}
Ruby in 136
def j;1..gets.to_i;end;j.map{|c|m=j.map{gets}*"";puts"Case ##{c}:";j.map{gets;p eval m.gsub(/[a-z]+/,'*(/ \0\s/?').gsub /\)\s*\(/,'):'}}
I just learned about *"" being equivalent to .join"". Also realised that map could be used in a few places
Ruby in 150
1.upto(gets.to_i){|c|m=eval("gets+"*gets.to_i+"''");puts"Case ##{c}:";1.upto(gets.to_i){gets;p eval m.gsub(/[a-z]+/,'*(/ \0\s/?').gsub /\)\s*\(/,'):'}}
I am just a noob to ruby, so there is probably still a lot of room for improvement

Python in 192
import re;S=re.sub;R=raw_input;I=input;c=0;exec r"c+=1;L=S('\) *\(',')or ',S('([a-z]+)','*(\' \\1 \'in a and',eval(('+R()'*I('Case #%s:\n'%c))[1:])));exec'a=R()+\' \';print eval(L);'*I();"*I()

Common Lisp, 199 bytes
Wrapped every 80 characters:
(defun r()(read))(dotimes(i(r))(format t"~&Case #~D:"(1+ i))(r)(set'z(r))(dotime
s(a(r))(r)(print(do((g(mapcar'read(make-list(r))))(p 1(*(pop c)p))(c z(if(find(p
op c)g)(car c)(cadr c))))((not c)p)))))
Spaced and indented:
(defun r () (read))
(dotimes (i (r))
(format t "~&Case #~D:" (1+ i))
(r)
(set 'z (r))
(dotimes (a (r))
(r)
(print
(do ((g (mapcar 'read (make-list (r))))
(p 1 (* (pop c) p))
(c z (if (find (pop c) g)
(car c)
(cadr c))))
((not c) p)))))

C - 346 bytes
Compile with gcc -w
#define N{int n=atoi(gets(A));for(;n--;)
T[999];F[99];char*t,*f,*a,A[99];float p(){float
d,m=1;for(;*t++^40;);sscanf(t,"%f %[^ (]",&d,A);if(*A^41){for(f=F;m**f;){for(;*f&&*f++^32;);for(a=A;*a&&*f==*a;f++,a++);m=*a||*f&64;}d*=!m*p()+m*p();}return
d;}main(I)N{printf("Case #%d:\n",I++);t=T;N
for(gets(t);*++t;);}N gets(F),t=T,printf("%f\n",p());}}}

Arc, 143 154 characters
Very similar to the CL one, but Arc sure has terse identifiers. Wrapped every 40 chars:
(for i 1((= r read))(prn"Case #"i":")(r)
(= z(r))(repeat(r)(r)(loop(= g(n-of(r)(r
))c z p 1)c(= p(*(pop c)p)c(if(pos(pop c
)g)c.0 cadr.c)))prn.p))
Indented:
(for i 1 ((= r read))
(prn "Case #" i ":")
(r)
(= z (r))
(repeat (r)
(r)
(loop (= g (n-of (r) (r))
c z
p 1)
c
(= p (* (pop c) p)
c (if (pos (pop c) g)
(c 0)
(cadr c))))
(prn p)))
Backlink: Word Aligned - Power Programming

JavaScript in 196 bytes
r='replace'
q=readline
for(n=0,t=q();t-n++;){for(print('Case #'+n+':'),d='',x=q();x--;d+=q());for(x=q();x--;)print(eval(d[r](/([a-z]+)/g,'*({'+q()[r](/ /g,':1,z')+':1}.z$1?')[r](/\) *\(/g,'):')))}
Usage: $ smjs thisfile.js <input.in
With contributions by Hyperlisk.

PHP in 314
<?php function q(){return trim(fgets(STDIN));}for($n=q($x=0);$x++<$n;){for($s=q($t='');$s--;$t.=q());echo"Case #$x:\n";for($z=q();$z--;){$l=explode(' ',q());$l[0]=0;printf("%f\n",eval('return'.preg_replace(array('/\(/','/(\w+),/','/(\d\)*),\((\d)/','/^./'),array(',(','*(in_array("$1",$l,1)?','$1:$2'),$t).';'));}}

FORTRAN - 381
Save as a.F95
Compile with f95 a.F95
#define _ ENDDO
#define A READ(t(k:l-1),*),a
#define Q j=1,n;READ"(A)",s
#define R READ*,n;DO
#define S k+SCAN(t(k:),'()')
CHARACTER(999)s,t,u;R i=1,n;t="";PRINT"('Case #'i0':')",i
R Q;t=TRIM(t)//s;_;R Q;d=1;k=1;DO;k=S;l=S-1
IF(t(l:l)>"(")EXIT;A,u;d=d*a;k=l;m=0
IF(INDEX(s," "//TRIM(u)//" ")>0)CYCLE;DO;IF(')'>t(k:k))m=m+2;m=m-1;k=k+1
IF(1>m)EXIT;k=S-1;_;_;A;d=d*a;PRINT*,d;_;_;END
By using the default format, each of the results starts with 2 spaces, but the google judge permits it. Thanks google judge!
EXPANDED VERSION
CHARACTER(999)s,t,u
READ*,n
DO i=1,n
t=""
PRINT"('Case #'I0':')",i
READ*,n
DO j=1,n
READ"(A)",s
t=TRIM(t)//s
ENDDO
READ*,n
DO j=1,n
READ"(A)",s
d=1
k=1
DO
k=k+SCAN(t(k:),'()')
l=k+SCAN(t(k:),'()')-1
IF(t(l:l)>"(")THEN
READ(t(k:l-1),*),a
d=d*a
PRINT*,d
EXIT
ELSE
READ(t(k:l-1),*),a,u
d=d*a
k=l
m=0
IF(INDEX(s," "//TRIM(u)//" ")>0)CYCLE
DO
IF(')'>t(k:k))m=m+2
m=m-1
k=k+1
IF(1>m)EXIT
k=k+SCAN(t(k:),'()')-1
ENDDO
ENDIF
ENDDO
ENDDO
ENDDO
END

Haskell, 312 characters
Here's another aproach to Haskell. I left the dirty work to the Prelude's lex. The wrapping around it is Text.ParserCombinators.ReadP. Importing it cost 36 characters on its own—ugh!
The parser is a Features -> SExp -> Cuteness function, which spares me most of the type declarations in quibble's/yairchu's solution.
import Text.ParserCombinators.ReadP
main=f(\t->do putStrLn$"Case #"++show t++":";s<-r g;r$print.fst.head.($id=<<s).readP_to_S.d.tail.words=<<g)
d x=do"("<-e;w<-e;c<-do{f<-e;y<-d x;n<-d x;u$if elem f x then y else n}<++u 1.0;e;u$c*read w
f x=do n<-g;mapM x[1..read n]
e=readS_to_P lex
r=f.const
g=getLine
u=return
It used to use Control.Monad's join, forM_ and replicateM, but it turns out it takes less space to redefine them approximately than to import.
I also abandoned the Prelude's readParen in favor of just calling lex before and after. In the current version, there is no need to verify the closing parenthesis: on a valid input it will always be there. On the other hand, it is vital to check the opening one: since the number is only converted after the whole subexpression has been read, a lot of backtracking would be needed to align to the correct parse.
On a theoretical machine with infinite memory and time to spare, the "("<- part might be dropped (4 characters' gain, 308 in total). Unless the call to read just aborts. On mine, the stack just overflows pretty fast.

Java in 467 bytes
This uses the javascript interpreter contained in java 6.
import java.util.*;class D{static{Scanner c=new
Scanner(System.in);int n=c.nextInt(),i=0,l;while(i++<n){l=c.nextInt();String
s="(";while(l-->=0)s+=c.nextLine();System.out.println("Case #"+i+":");l=c.nextInt();while(l-->0)try{c.next();System.out.println(new
javax.script.ScriptEngineManager().getEngineByName("js").eval(s.replace(")","))").replaceAll("\\) *\\(",":(").replaceAll("[a-z]+","*(/ $0 /.test('"+c.nextLine()+" ')?")));}catch(Exception
x){}}System.exit(0);}}
Thanks Varan, Chris and pfn (indirectly) for helping me shorten it.
Please see my other (even shorter!) java answer.

m4 with echo and bc, 339 bytes
This solution is a complete and utter hack, and it gives me a headache. It contains, among other things, escaped double quotes, unescaped double quotes, unescapable backquote and single quote pairs (including a nested pair seven quotes deep), unquoted regular expressions, outsourcing decimal multiplication to bc, and the use of craZy caSE to circumvent macro expansion. But it had to be done, I guess. :p
This adds an "ultimate macroizing" solution to the previous kinds of solutions (iterated loops, recursion w/ lambda mapping, labels and branches, regexp and eval, etc.)
I think a good term for this is "macroni code" :D
(wrapped every 60 characters, for clarity)
define(T,`translit($#)')define(Q,`patsubst($#)')define(I,0)Q
(T(T(T(Q(Q(Q(Q(Q(Q(T(include(A),(),<>),>\s*>,>>),>\s*<,>;),\
([a-z]+\)\s*<,`*ifElsE<rEgExp<P;``````` \1 ''''''';0>;0;<'),
^<,`defiNe<````I';iNcr<I>>\\"Case `#'I:\\"defiNe<`A'''';'),^
[0-9]*),.+ [0-9]+.*,`dEfiNE<```P';`\& '''>A'),<>;N,`(),n'),E
,e),()),.*,`syscmd(`echo "\&"|bc -l')')
Usage: $ cp input.in A; m4 thisfile.m4 > output.out
I'm an m4 n00b, though, having learned it only an hour before writing this. So there's probably room for improvement.

C++ in 698 bytes
Compile with 'g++ -o test source.cpp -include iostream -include vector -include sstream'
#define R(x,f,t) for(int x=f;x<t;x++){
#define S(x) x.size()
#define H string
#define U while
#define I if
#define D cin>>
#define X t.substr(p,S(t))
using namespace std;
int main(){int h,l,n,a,p,Y,W;D h;for(int q=1;q<=h;q++){D l;H s;char c;D c;R(i,0,l)H L;getline(cin,L);R(j,0,S(L))I (L[j]==41||L[j]==40)s+=32;s+=L[j];I(L[j]==40)s+=32;}}D a;printf("Case #%d:\n",q);R(i,0,a)H N;D N;D n;vector<H>f;R(j,0,n)D N;f.push_back(N);}H t=s;float P=1;p=0;U(p<S(t)-1){p=0;U(t[p]!=48&&t[p]!=49)p++;t=X;stringstream T(t);float V;T>>V;H F;T>>F;P*=V;I(F[0]==41)break;Y=0;R(j,0,S(f))if(F==f[j])Y=1;}p=t.find(40)+1;t=X;p=0;I(Y==0){W=1;U (W>0){I(t[p]==40)W++;I(t[p]==41)W--;p++;}t=X;p=0;}}cout<<P<<endl;}}return 0;}
EDIT: I'm sorry; I thought it was ok for the includes (eg, C works even w/o including basic libraries), while I'm sure it would be if I decleared the defines this way.
I'm not home now, and I won't be for some time: I won't be able to modify it. Just ignore my submission.

OCaml in 718 bytes
I'm an OCaml n00b, so this is probably much longer than it needs to be.
Usage: ocaml thisfile.ml <input.in >output.out
#load"str.cma";;open List;;open String;;open Str;;let x=length and
y=Printf.printf and e=global_replace and h=float_of_string and b=regexp and
k=index and r=read_line and a=read_int and w s m c=sub s(c+1)(m-c-1);;for i=1to
a()do y"Case #%d:\n"i;let t=let n=a()in let rec g d j=if j>n then d else
g(d^(r()))(j+1)in e(b" ")""(e(b"\\b")"^"(g""1))and n=a()in let rec z j=if j>n
then()else let q=tl(split(b" ")(r()))in let rec g l j s p=let o=k s '('and c=k
s ')'in if j then let f=w s c o in if contains f '('then let m=k s '^'in let
c=index_from s(m+1)'^'in g 0(mem(w s c m)q)(w s(x s)c)(h(w s m o)*.p)else h f*.p
else if o<c then g(l+1)j(w s(x s)o)p else g(l-1)(l=1)(w s(x s)c)p in y"%f\n"(g
0(0=0)t 1.);z(j+1)in z 1done

Scheme (Guile 1.8)
Here's my version at 278 bytes (with improvements from KirarinSnow to bring it down to 273), after stripping off all the newlines (except ones in string literals, of course). It only works on Guile 1.8 (since in standard Scheme, define is a syntax, not an object, but Guile represents it as an object anyway).
(define ! define)
(!(c f p w . r)(if(null? r)(* p w)(apply c f(* p w)((if(memq(car r)f)cadr caddr)r))))
(!(d . l)(map display l))
(!(r . x)(read))
(! n(r))
(do((i 1(1+ i)))((> i n))(r)(let((t(r)))(d"Case #"i":
")(do((a(r)(1- a)))((= a 0))(r)(d(apply c(map r(iota(r)))1 t)"
"))))

Pure java in 440 bytes
A shorter java solution that doesn't use any eval trick. Can be reduced to 425 by removing System.exit(0) if stderr output is ignored.
import java.util.*;enum A{_;Scanner c,d;float p(String a){return
d.nextFloat()*(d.hasNext("\\D+")?a.contains(' '+d.next()+' ')?p(a)+0*p(a):0*p(a)+p(a):1);}{c=new
Scanner(System.in);for(int n=c.nextInt(),i=0,l;i++<n;){String
s="";for(l=c.nextInt();l-->=0;)s+=c.nextLine();System.out.println("Case #"+i+":");for(l=c.nextInt();l-->0;){c.next();d=new
Scanner(s.replaceAll("[()]"," "));System.out.println(p(c.nextLine()+' '));}}System.exit(0);}}

Haskell, 514 bytes (I suck?).
Based on quibble's solution:
import Control.Monad
import Text.ParserCombinators.Parsec
data F=N|F String(Float,F)(Float,F)
r=return
f=many1 letter>>= \i->w>>d>>= \t->d>>=r.F i t
d=char '('>>w>>many1(oneOf".0123456789")>>= \g->w>>(f<|>r N)>>= \p->char ')'>>w>>r(read g,p)
w=many$oneOf" \n"
g=getLine
l=readLn
m=replicateM
main=l>>= \n->forM_[1..n]$ \t->putStrLn("Case #"++show t++":")>>l>>=(`m`g)>>=(\(Right q)->l>>=(`m`p q)).parse d"".join
z(p,f)=(p*).y f
y N _=1
y(F n t f)x=z(if n`elem`x then t else f)x
p q=fmap(drop 2.words)g>>=print.z q

C in 489 bytes
Code wrapped at 80 chars, there are actually just 3 lines.
Save in a.c and compile with: gcc -w a.c -o a
#define S int I,N;scanf("%d\n",&N);for(I=-1;++I<N;)
#define M 1000
char B[M],Z[M],Q[M]={' '},*F[M],*V;float W[M],H;int J,C,L[M],R[M];t(){V=strtok(0
," \n()");}p(){int U=C++;F[U]=0;if(!V)t();sscanf(V,"%f",W+U);t();if(V&&*V>='a')s
trcpy(Q+1,V),V=0,F[U]=strdup(strcat(Q," ")),L[U]=p(),R[U]=p();return U;}main(){S
{printf("Case #%d:\n",I+1);*B=0;{S strcat(B,gets(Z));}V=strtok(B," \n(");C=0,p()
;{S{strcat(gets(B)," ");for(J=0,H=W[0];F[J];J=strstr(B,F[J])?L[J]:R[J],H*=W[J]);
printf("%f\n",H);};}}}

F#: 759 significant chars (Wow, I'm bad at this ;) )
Minimized version
open System.Text.RegularExpressions
type t=T of float*(string*t*t)option
let rec e=function x,T(w,Some(s,a,b))->e(x,if Set.contains s x then a else b)*w|x,T(w,_)->w
let rec h x=Regex.Matches(x, #"\(|\)|\d\.\d+|\S+")|>Seq.cast<Match>|>Seq.map (fun x -> x.Value)|> Seq.toList
let rec p=function ")"::y->p y|"("::w::x::y->match x with ")"->T(float w,None),y|n->let a,f=p y in let b,g=p f in T(float w,Some(n,a,b)),g
let solve input =
Regex.Matches(input,#"(\(((?<s>\()|[^()]|(?<-s>\)))*\)(?(s)(?!)))\s+\d+\s+((\S+\s\d(.+)?\s*)+)")
|>Seq.cast<Match>
|>Seq.map(fun m->fst(p(h(m.Groups.[1].Value))), [for a in m.Groups.[3].Value.Trim().Split([|'\n'|])->set(a.Split([|' '|]))])
|>Seq.iteri(fun i (r,c)->printfn"Case #%i"(i+1);c|>Seq.iter(fun x->printfn"%.7F"(e(x, r))))
Readable version
open System.Text.RegularExpressions
type decisionTree = T of float * (string * decisionTree * decisionTree) option
let rec eval = function
| x, T(w, Some(s, a, b)) -> eval(x, if Set.contains s x then a else b) * w
| x, T(w, _) -> w
// creates a token stream
let rec tokenize tree =
Regex.Matches(tree, #"\(|\)|\d\.\d+|\S+")
|> Seq.cast<Match>
|> Seq.map (fun x -> x.Value)
|> Seq.toList
// converts token stream into a decisionTree
let rec parse = function
| ")"::xs -> parse xs
| "("::weight::x::xs ->
match x with
| ")" -> T(float weight, None), xs
| name ->
let t1, xs' = parse xs
let t2, xs'' = parse xs'
T(float weight, Some(name, t1, t2)), xs''
// uses regex to transform input file into a Seq<decisionTree, list<set<string>>, which each item in our
// list will be tested against the decisionTree
let solve input =
Regex.Matches(input, #"(\(((?<s>\()|[^()]|(?<-s>\)))*\)(?(s)(?!)))\s+\d+\s+((\S+\s\d(.+)?\s*)+)")
|> Seq.cast<Match>
|> Seq.map (fun m -> fst(parse(tokenize(m.Groups.[1].Value))), [for a in m.Groups.[3].Value.Trim().Split([|'\n'|]) -> set(a.Split([|' '|])) ])
|> Seq.iteri (fun i (tree, testCases) ->
printfn "Case #%i" (i+1)
testCases |> Seq.iter (fun testCase -> printfn "%.7F" (eval (testCase, tree)))
)

R in 280 bytes
Note: On the standard distribution of R (as of v. 2.9.2), this program does not pass the large input and fails on just Case 28 (which is nested to 99 levels), generating a "contextstack overflow". To fix this, modify the line in src/main/gram.c that reads
#define CONTEXTSTACK_SIZE 50
and replace the 50 with something like 500. Then recompile. Et voilà!
n=0
g=gsub
eval(parse(text=g('[^
]* [0-9]+( [^
]*|
)','f=c(\\1)
cat(eval(d),"
")
',g('
\\(','
cat("Case #",n<-n+1,":
",sep="")
d=expression(',g('" "','","',g(')\\s*\\(',',',g(' *("[a-z]+")\\s*\\(','*ifelse(\\1%in%f,',g('([a-z]+)','"\\1"',paste(readLines('A'),collapse='
')))))))))
Usage (requires renaming input): cp input.in A; R -q --slave -f thisfile.R >output.out

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

How should I pack h264 to PES - h.264

One PES packet per access Unit. Access Unit delimiters are required for TS.

Related

arm cortex-m33 (trustzone, silabs efm32pg22) - assembler hardfaults accessing GPIO or almost any peripherals areas, any hint?

Why linking and calling a C function with variadic arguments in Rust causes random functions to be called? [duplicate]

modular arithmetic on the gpu

CUDA: Erroneous lmem statistics displayed for sm_20?

Code Golf: Decision Tree

Categories

Resources