cuda trace emulation -Need some expert insight

cuda trace emulation -Need some expert insight - cuda

I am working on a gpu trace emulation tool in windows as part of my research work in grad school . I am working on cuda runtime trace emulation to be specific.
I use simple DLL injection using MS Detours to enable interception of the cuda runtime APIs. I store the API calls and their parameters in a trace file. I get into some problems while trying to emulate the API from my trace file(I use the word playback to denote this action)
A typical trace file begins by making calls to __cudaRegisterFatBinary and __cudaRegisterFunction. This is followed by a call to cudaMalloc.
What I did?
1) I came across the famous GPUOcelot and I found the cubin structure that Nvidia is using right now. I am using that to save the address parameter of cudaRegisterFatBinary in intercept mode and I am using the pointer in the playback for _cudaRegisterFatBinary by repopulating the structure in the memory.
2)In _cudaRegisterFunction I am not sure what the parameters hostFunction,Device Function and Device Name refer to. I mean I don't understand how I could populate it while playing back from my trace file. I am just saving the pointer from the original execution and using it to imitate the call. But there is no way of knowing whether the function goes through fine since it does not have a return value.
3)cudaMalloc following these two entry point functions return cuda error code 11. It is cuda invalid value according to the Nvidia documentation. I have no idea why this should be the case. I am assuming that something is wrong with the previous two function calls. I also have a feeling that something is wrong with implicit primary context creation by the cuda runtime. Can someone give me some insights about cuda runtime execution and point me to what might I be missing?
I know its a ton of information without any useful code. I dont know which part of the code to post here. I will do it when people start taking interest in my question and ask me specific things about my project. Initially am just hoping that I am missing something big and high level that one of you can spot.
I greatly appreciate your time and interest!

Sounds very interesting overall. Your "Error:Cuda invalid value" is could be related to the params of _cudaRegisterFunction. The param 'DeviceName' sounds like it identifies which GPU (card?) to use. Check the CUDA SDK, there are many demos that enumerate the GPUs on the system, perhaps these values are valid for 'DeviceName'. As for 'hostFunction' and 'deviceFunction', these sound like either function IDs, or perhaps function pointers. Also, you can call 'cudaGetLastError()' to test whether the function call was successful (it returns 'cudaSuccess' if everything is ok... take a look at the error logging macros in the sdk). Good luck!

Related

External FLASH slow verification with STM32cubeProgrammer

I'm working with an STM32F469 chip with a Micron MT25Q Quad_SPI Flash. To program the Flash, there needs to be an external loader program developed. That's all working, but the problem is that verification of the QSPI Flash is extremely slow.
Looking in the log file, it shows that the Flash is being programmed in 150K byte blocks. However, the verification is being done in 1K byte blocks. In addition, the chip is re-initialized before each block check. I've tried this with both through STM32cubeIDE and in STM32cubeProgrammer directly.
The external programmer program include the correct chip configuration information and specifies a 64K page size. I don't see how to get the programmer to use a larger block size. It looks like it understands what part of the SRAM is used and is using the balance of the 256K in the on-board SRAM for programming the QSPI Flash. It could use the same size for reading the data back or use the Verify() function in the external loader. It's calling Read() and then checking the data itself.
Any thought or hints?
Let me add some observations on creating a new external loader. The first observation is "Don't." If you can pick a supported external chip and pin it out to use an existing loader, then do that. STM provides just 4 example programs but they must have 50 external loaders. If the hardware design copies the schematic for a demo board that has an external loader, you should be fine and avoid doing the development work.
The external loader is not a complete executable. It provides a set of functions to do basic operations like Init(), Erase(), Read() and Write(). The trick is that there is no main() and no start-up code is run when the program starts.
The external loader is an ELF file, renamed to "*.stldr". The programming tool looks into the debug information to find the location of the functions. It then sets the registers to provide the parameters, the PC to run the function, and then let's it run. There's some super-clever work going on to make this work. The programmer looks at the returned value (R0) to see if things pass or not. It can also figure out if the function has crashed the core or otherwise timed-out.
What makes writing the external super fun is that the debugger is running the program so there's no debugger available to see what the code is doing. I settled on outputting errors, and encoded information, on the return() from the called functions to give hints as to what was happening.
The external loader isn't a "full" program. Without the startup code, lots of on-chip stuff isn't set up and some just isn't going to work. At least I couldn't figure it out. I'm not sure if it wasn't configured right or the debugger was blocking its use. Looking at the example external loaders, they are written in a very simple way and do not call the HAL or use interrupts. You'll need to provide core set-up functions to configure the clock chains. That Hal_Delay() method will never return as the timers and/or interrupts aren't working. I could never make them work and suspect the NVIC was somehow being disabled. I ended up replacing the HAL_delay() function with a for loop that spun based on core clock rate and a the instruction cycles per loop.
The app note suggests developing a stand-alone program to debug the basic capabilities. That's a good idea but a challenge. Prior to starting the external loader, I had the QSPI doing the needed operations but from a C++ application calling the HAL. Creating an external loader from that was a long exercise in stripping out and replacing functionality. A hint is that the examples are written at a register level. I'm not that good to deal directly with the QuadSPI peripheral and the chip's instruction set at the same time.
The normal start-up of a program is eliminated. Everything that's done before the main() is called (E.g., in startup_stm32f469nihx.s) is up to you. This includes setting the clock chains to boost the core clock and get the peripheral buses working. The program runs in the on-chip SRAM so any initialized variables are loaded correctly. There's no moving data needed but the stack and uninitialized data areas could/should still be zeroed.
I hope this helps someone!

Today I've faced the same issue.
I was able to improve the Verify speed with two simple steps, but Verify still much slower than programming and this is strange...
If anyone find a way to change the 1KB block read of STM32CubeProgrammer I would like to know =).
Follow the changes I made to improve a bit the performance.
Add a kind of lock in the Init Function to avoid multiple initializations. This was the most significant change because I'am checking the Flash ID in my initialization proccess. Other aproachs could be safer but this simple code snippet worked to me.
int Init(void)
{
static uint32_t lock;
if(lock != 0x43213CA5)
{
lock = 0x43213CA5;
/* Init procedure goes here */
}
return(1);
}
Cache a page instead of reading the external memory for each call. This will help more if your external memory page read has too much overhead, otherwise this idea won't give relevant results.

QEMU MIPS32 - 16550 Uart Implementation on a Custom Board

I’m trying to use QEMU to emulate a piece of firmware, but I’m having trouble getting the UART device to properly update the Line Status Register and display the input character.
Details:
Target device: Qualcomm QCA9533 (Documentation here if you're curious)
Target firmware: VxWorks 6.6 with U-Boot bootload
CPU: MIPS 24Kc
Board: mipssim (modified)
Memory: 512MB
Command used: qemu-system-mips -S -s -cpu 24Kc -M mipssim –nographic -device loader,addr=0xBF000000,cpu-num=0 -serial /dev/ttyS0 -bios target_image.bin
I have to apologize here, but I am unable to share my source. However, as I am attempting to retool the mipssim board, I have only made minor changes to the code, which are as follows:
Rebased bios memory region to 0x1F000000
Changed load_image_targphys() target address to 0x1F000000
Changed $pc initial value to 0xBF000000 (TLB remap of 0x1F000000)
Replaced the mipssim serial_init() ¬call with serial_mm_init(isa, 0x20000, env->irq[0], 115200, serial_hd(0), DEVICE_NATIVE_ENDIAN).
While it seems like serial_init() is probably the currently accepted standard, I wasn’t having any luck with remapping it. I noticed the malta board had no issues outputting on a MIPS test kernel I gave it, so I tried to mimic what was done there. However, I still cannot understand how QEMU works and I am unable to find many good resources that explain it. My slog through the source and included docs is ongoing, but in the meantime I was hoping someone might have some insight into what I’m doing wrong.
The binary loads and executes correctly from address 0xBF000000, but hangs when it hits the first UART polling loop. A look at mtree in the QEMU monitor shows that the I/O device is mapped correctly to address range 0x18020000-0x1802003F, and when the firmware writes to the Tx buffer, gdb shows the character successfully is written to memory. There’s just no further action from the serial device to pull that character and display it, so the firmware endlessly polls on the LSR waiting for an update.
Is there something I’m missing when it comes to serial/hardware interaction in QEMU? I would have assumed that remapping all of the existing functional components of the mipssim board would be enough to at least get serial communication working, especially since the target uses the same 16550 UART that mipssim does. Please let me know if you have any insights. It would be helpful if I could find a way to debug QEMU itself with symbols, but at the same time I’m not totally sure what I’d be looking for. Even advice on how to scope down the issue would be useful.
Thank you!

Well after a lot of hard work I got the UART working. The answer to the question lies within the serial_ioport_read() and serial_ioport_write() functions. These two methods are assigned as the callbacks QEMU invokes when data is read or written to the MemoryRegion for the serial device (which is initialized in serial_init() or serial_mm_init()). These functions do a bit of masking on the address (passed into the functions as addr) to determine which register is being referenced, then return the value from the SerialState struct corresponding to that register. It's surprisingly simple, but I guess everything seems simple once you've figured it out. The big turning point was the realization that QEMU effectively implements the serial device as a MemoryRegion with special functionality that is triggered on a memory operation.
Anyway, hope this helps someone in the future avoid the nightmare I went through. Cheers!

What is the use of task graphs in CUDA 10?

CUDA 10 added runtime API calls for putting streams (= queues) in "capture mode", so that instead of executing, they are returned in a "graph". These graphs can then be made to actually execute, or they can be cloned.
But what is the rationale behind this feature? Isn't it unlikely to execute the same "graph" twice? After all, even if you do run the "same code", at least the data is different, i.e. the parameters the kernels take likely change. Or - am I missing something?
PS - I skimmed this slide deck, but still didn't get it.

My experience with graphs is indeed that they are not so mutable. You can change the parameters with 'cudaGraphHostNodeSetParams', but in order for the change of parameters to take effect, I had to rebuild the graph executable with 'cudaGraphInstantiate'. This call takes so long that any gain of using graphs is lost (in my case). Setting the parameters only worked for me when I build the graph manually. When getting the graph through stream capture, I was not able to set the parameters of the nodes as you do not have the node pointers. You would think the call 'cudaGraphGetNodes' on a stream captured graph would return you the nodes. But the node pointer returned was NULL for me even though the 'numNodes' variable had the correct number. The documentation explicitly mentions this as a possibility but fails to explain why.

Task graphs are quite mutable.
There are API calls for changing/setting the parameters of task graph nodes of various kinds, so one can use a task graph as a template, so that instead of enqueueing the individual nodes before every execution, one changes the parameters of every node before every execution (and perhaps not all nodes actually need their parameters changed).
For example, See the documentation for cudaGraphHostNodeGetParams and cudaGraphHostNodeSetParams.

Another useful feature is the concurrent kernel executions. Under manual mode, one can add nodes in the graph with dependencies. It will explore the concurrency automatically using multiple streams. The feature itself is not new but make it automatic becomes useful for certain applications.

When training a deep learning model it happens often to re-run the same set of kernels in the same order but with updated data. Also, I would expect Cuda to do optimizations by knowing statically what will be the next kernels. We can imagine that Cuda can fetch more instructions or adapt its scheduling strategy when knowing the whole graph.

CUDA Graphs is trying to solve the problem that in the presence of too many small kernel invocations, you see quite some time spent on the CPU dispatching work for the GPU (overhead).
It allows you to trade resources (time, memory, etc.) to construct a graph of kernels that you can use a single invocation from the CPU instead of doing multiple invocations. If you don't have enough invocations, or your algorithm is different each time, then it won't worth it to build a graph.
This works really well for anything iterative that uses the same computation underneath (e.g., algorithms that need to converge to something) and it's pretty prominent in a lot of applications that are great for GPUs (e.g., think of the Jacobi method).
You are not going to see great results if you have an algorithm that you invoke once or if your kernels are big; in that case the CPU invocation overhead is not your bottleneck. A succinct explanation of when you need it exists in the Getting Started with CUDA Graphs.
Where task graph based paradigms shine though is when you define your program as tasks with dependencies between them. You give a lot of flexibility to the driver / scheduler / hardware to do scheduling itself without much fine-tuning from the developer's part. There's a reason why we have been spending years exploring the ideas of dataflow programming in HPC.

Runtime error: Cannot set while device is active in this process

I'm trying to implement a separable convolution filter using CUDA as part of a bigger application I'm working on. My code has multiple CUDA kernels which are called one after the other (each performing one stage of the process). The problem is I keep getting this weird error and I'm not sure what exactly does it mean or what is causing it. I also can't find anything about it on the Internet except a couple of Stack Overflow questions related to OpenGL and CUDA interoperability (which I'm not doing, i.e. I'm not using OpenGL at all).
Can someone please explain to me why such an error may occur?
Thanks.

Runtime vs. Compile time

What is the difference between run-time and compile-time?

The difference between compile time and run time is an example of what pointy-headed theorists call the phase distinction. It is one of the hardest concepts to learn, especially for people without much background in programming languages. To approach this problem, I find it helpful to ask
What invariants does the program satisfy?
What can go wrong in this phase?
If the phase succeeds, what are the postconditions (what do we know)?
What are the inputs and outputs, if any?
Compile time
The program need not satisfy any invariants. In fact, it needn't be a well-formed program at all. You could feed this HTML to the compiler and watch it barf...
What can go wrong at compile time:
Syntax errors
Typechecking errors
(Rarely) compiler crashes
If the compiler succeeds, what do we know?
The program was well formed---a meaningful program in whatever language.
It's possible to start running the program. (The program might fail immediately, but at least we can try.)
What are the inputs and outputs?
Input was the program being compiled, plus any header files, interfaces, libraries, or other voodoo that it needed to import in order to get compiled.
Output is hopefully assembly code or relocatable object code or even an executable program. Or if something goes wrong, output is a bunch of error messages.
Run time
We know nothing about the program's invariants---they are whatever the programmer put in. Run-time invariants are rarely enforced by the compiler alone; it needs help from the programmer.
What can go wrong are run-time errors:
Division by zero
Dereferencing a null pointer
Running out of memory
Also there can be errors that are detected by the program itself:
Trying to open a file that isn't there
Trying find a web page and discovering that an alleged URL is not well formed
If run-time succeeds, the program finishes (or keeps going) without crashing.
Inputs and outputs are entirely up to the programmer. Files, windows on the screen, network packets, jobs sent to the printer, you name it. If the program launches missiles, that's an output, and it happens only at run time :-)

I think of it in terms of errors, and when they can be caught.
Compile time:
string my_value = Console.ReadLine();
int i = my_value;
A string value can't be assigned a variable of type int, so the compiler knows for sure at compile time that this code has a problem
Run time:
string my_value = Console.ReadLine();
int i = int.Parse(my_value);
Here the outcome depends on what string was returned by ReadLine(). Some values can be parsed to an int, others can't. This can only be determined at run time

Compile-time: the time period in which you, the developer, are compiling your code.
Run-time: the time period which a user is running your piece of software.
Do you need any clearer definition?

(edit: the following applies to C# and similar, strongly-typed programming languages. I'm not sure if this helps you).
For example, the following error will be detected by the compiler (at compile time) before you run a program and will result in a compilation error:
int i = "string"; --> error at compile-time
On the other hand, an error like the following can not be detected by the compiler. You will receive an error/exception at run-time (when the program is run).
Hashtable ht = new Hashtable();
ht.Add("key", "string");
// the compiler does not know what is stored in the hashtable
// under the key "key"
int i = (int)ht["key"]; // --> exception at run-time

Translation of source code into stuff-happening-on-the-[screen|disk|network] can occur in (roughly) two ways; call them compiling and interpreting.
In a compiled program (examples are c and fortran):
The source code is fed into another program (usually called a compiler--go figure), which produces an executable program (or an error).
The executable is run (by double clicking it, or typing it's name on the command line)
Things that happen in the first step are said to happen at "compile time", things that happen in the second step are said to happen at "run time".
In an interpreted program (example MicroSoft basic (on dos) and python (I think)):
The source code is fed into another program (usually called an interpreter) which "runs" it directly. Here the interpreter serves as an intermediate layer between your program and the operating system (or the hardware in really simple computers).
In this case the difference between compile time and run time is rather harder to pin down, and much less relevant to the programmer or user.
Java is a sort of hybrid, where the code is compiled to bytecode, which then runs on a virtual machine which is usually an interpreter for the bytecode.
There is also an intermediate case in which the program is compiled to bytecode and run immediately (as in awk or perl).

Basically if your compiler can work out what you mean or what a value is "at compile time" it can hardcode this into the runtime code. Obviously if your runtime code has to do a calculation every time it will run slower, so if you can determine something at compile time it is much better.
Eg.
Constant folding:
If I write:
int i = 2;
i += MY_CONSTANT;
The compiler can perform this calulation at compile time because it knows what 2 is, and what MY_CONSTANT is. As such it saves itself from performing a calculation every single execution.

Hmm, ok well, runtime is used to describe something that occurs when a program is running.
Compile time is used to describe something that occurs when a program is being built (usually, by a compiler).

Compile Time:
Things that are done at compile time incur (almost) no cost when the resulting program is run, but might incur a large cost when you build the program.
Run-Time:
More or less the exact opposite. Little cost when you build, more cost when the program is run.
From the other side; If something is done at compile time, it runs only on your machine and if something is run-time, it run on your users machine.
Relevance
An example of where this is important would be a unit carrying type. A compile time version (like Boost.Units or my version in D) ends up being just as fast as solving the problem with native floating point code while a run-time version ends up having to pack around information about the units that a value are in and perform checks in them along side every operation. On the other hand, the compile time versions requiter that the units of the values be known at compile time and can't deal with the case where they come from run-time input.

As an add-on to the other answers, here's how I'd explain it to a layman:
Your source code is like the blueprint of a ship. It defines how the ship should be made.
If you hand off your blueprint to the shipyard, and they find a defect while building the ship, they'll stop building and report it to you immediately, before the ship has ever left the drydock or touched water. This is a compile-time error. The ship was never even actually floating or using its engines. The error was found because it prevented the ship even being made.
When your code compiles, it's like the ship being completed. Built and ready to go. When you execute your code, that's like launching the ship on a voyage. The passengers are boarded, the engines are running and the hull is on the water, so this is runtime. If your ship has a fatal flaw that sinks it on its maiden voyage (or maybe some voyage after for extra headaches) then it suffered a runtime error.

Following from previous similar answer of question What is the difference between run-time error and compiler error?
Compilation/Compile time/Syntax/Semantic errors: Compilation or compile time errors are error occurred due to typing mistake, if we do not follow the proper syntax and semantics of any programming language then compile time errors are thrown by the compiler. They wont let your program to execute a single line until you remove all the syntax errors or until you debug the compile time errors.
Example: Missing a semicolon in C or mistyping int as Int.
Runtime errors: Runtime errors are the errors that are generated when the program is in running state. These types of errors will cause your program to behave unexpectedly or may even kill your program. They are often referred as Exceptions.
Example: Suppose you are reading a file that doesn't exist, will result in a runtime error.
Read more about all programming errors here

Here is a quote from Daniel Liang, author of 'Introduction to JAVA programming', on the subject of compilation:
"A program written in a high-level language is called a source program or source code. Because a computer cannot execute a source program, a source program must be translated into machine code for execution. The translation can be done using another programming tool called an interpreter or a compiler." (Daniel Liang, "Introduction to JAVA programming", p8).
...He Continues...
"A compiler translates the entire source code into a machine-code file, and the machine-code file is then executed"
When we punch in high-level/human-readable code this is, at first, useless! It must be translated into a sequence of 'electronic happenings' in your tiny little CPU! The first step towards this is compilation.
Simply put: a compile-time error happens during this phase, while a run-time error occurs later.
Remember: Just because a program is compiled without error does not mean it will run without error.
A Run-time error will occur in the ready, running or waiting part of a programs life-cycle while a compile-time error will occur prior to the 'New' stage of the life cycle.
Example of a Compile-time error:
A Syntax Error - how can your code be compiled into machine level instructions if they are ambiguous?? Your code needs to conform 100% to the syntactical rules of the language otherwise it cannot be compiled into working machine code.
Example of a run-time error:
Running out of memory - A call to a recursive function for example might lead to stack overflow given a variable of a particular degree! How can this be anticipated by the compiler!? it cannot.
And that is the difference between a compile-time error and a run-time error

For example: In a strongly typed language, a type could be checked at compile time or at runtime. At compile time it means, that the compiler complains if the types are not compatible. At runtime means, that you can compile your program just fine but at runtime, it throws an exception.

In simply word difference b/w Compile time & Run time.
compile time:Developer writes the program in .java format & converts in to the Bytecode which is a class file,during this compilation any error occurs can be defined as compile time error.
Run time:The generated .class file is use by the application for its additional functionality & the logic turns out be wrong and throws an error which is a run time error

Compile time:
Time taken to convert the source code into a machine code so that it becomes an executable is called compile time.
Run time:
When an application is running, it is called run time.
Compile time errors are those syntax errors, missing file reference errors.
Runtime errors happen after the source code has been compiled into an executable program and while the program is running. Examples are program crashes, unexpected program behavior or features don't work.

Run time means something happens when you run the program.
Compile time means something happens when you compile the program.

Imagine that you are a boss and you have an assistant and a maid, and you give them a list of tasks to do, the assistant (compile time) will grab this list and make a checkup to see if the tasks are understandable and that you didn't write in any awkward language or syntax, so he understands that you want to assign someone for a Job so he assign him for you and he understand that you want some coffee, so his role is over and the maid (run time)starts to run those tasks so she goes to make you some coffee but in sudden she doesn’t find any coffee to make so she stops making it or she acts differently and make you some tea (when the program acts differently because he found an error).

Compile Time:
Things that are done at compile time incur (almost) no cost when the resulting program is run, but might incur a large cost when you build the program.
Run-Time:
More or less the exact opposite. Little cost when you build, more cost when the program is run.
From the other side; If something is done at compile time, it runs only on your machine and if something is run-time, it run on your users machine.

I have always thought of it relative to program processing overhead and how it affects preformance as previously stated. A simple example would be, either defining the absolute memory required for my object in code or not.
A defined boolean takes x memory this is then in the compiled program and cannot be changed. When the program runs it knows exactly how much memory to allocate for x.
On the other hand if I just define a generic object type (i.e. kind of a undefined place holder or maybe a pointer to some giant blob) the actual memory required for my object is not known until the program is run and I assign something to it, thus it then must be evaluated and memory allocation, etc. will be then handled dynamically at run time (more run time overhead).
How it is dynamically handled would then depend on the language, the compiler, the OS, your code, etc.
On that note however it would really depends on the context in which you are using run time vs compile time.

Here is an extension to the Answer to the question "difference between run-time and compile-time?" -- Differences in overheads associated with run-time and compile-time?
The run-time performance of the product contributes to its quality by delivering results faster. The compile-time performance of the product contributes to its timeliness by shortening the edit-compile-debug cycle. However, both run-time performance and compile-time performance are secondary factors in achieving timely quality. Therefore, one should consider run-time and compile-time performance improvements only when justified by improvements in overall product quality and timeliness.
A great source for further reading here:

we can classify these under different two broad groups static binding and dynamic binding. It is based on when the binding is done with the corresponding values. If the references are resolved at compile time, then it is static binding and if the references are resolved at runtime then it is dynamic binding. Static binding and dynamic binding also called as early binding and late binding. Sometimes they are also referred as static polymorphism and dynamic polymorphism.
Joseph Kulandai‏.

The major difference between run-time and compile time is:
If there are any syntax errors and type checks in your code,then it throws compile time error, where-as run-time:it checks after executing the code.
For example:
int a = 1
int b = a/0;
here first line doesn't have a semi-colon at the end---> compile time error after executing the program while performing operation b, result is infinite---> run-time error.
Compile time doesn't look for output of functionality provided by your code, whereas run-time does.

here's a very simple answer:
Runtime and compile time are programming terms that refer to different stages of software program development.
In order to create a program, a developer first writes source code, which defines how the program will function. Small programs may only contain a few hundred lines of source code, while large programs may contain hundreds of thousands of lines of source code. The source code must be compiled into machine code in order to become and executable program. This compilation process is referred to as compile time.(think of a compiler as a translator)
A compiled program can be opened and run by a user. When an application is running, it is called runtime.
The terms "runtime" and "compile time" are often used by programmers to refer to different types of errors. A compile time error is a problem such as a syntax error or missing file reference that prevents the program from successfully compiling. The compiler produces compile time errors and usually indicates what line of the source code is causing the problem.
If a program's source code has already been compiled into an executable program, it may still have bugs that occur while the program is running. Examples include features that don't work, unexpected program behavior, or program crashes. These types of problems are called runtime errors since they occur at runtime.
The reference

Look into this example:
public class Test {
public static void main(String[] args) {
int[] x=new int[-5];//compile time no error
System.out.println(x.length);
}}
The above code is compiled successfully, there is no syntax error, it is perfectly valid.
But at the run time, it throws following error.
Exception in thread "main" java.lang.NegativeArraySizeException
at Test.main(Test.java:5)
Like when in compile time certain cases has been checked, after that run time certain cases has been checked once the program satisfies all the condition you will get an output.
Otherwise, you will get compile time or run time error.

You can understand the code compile structure from reading the actual code. Run-time structure are not clear unless you understand the pattern that was used.

public class RuntimeVsCompileTime {
public static void main(String[] args) {
//test(new D()); COMPILETIME ERROR
/**
* Compiler knows that B is not an instance of A
*/
test(new B());
}
/**
* compiler has no hint whether the actual type is A, B or C
* C c = (C)a; will be checked during runtime
* #param a
*/
public static void test(A a) {
C c = (C)a;//RUNTIME ERROR
}
}
class A{
}
class B extends A{
}
class C extends A{
}
class D{
}

It's not a good question for S.O. (it's not a specific programming question), but it's not a bad question in general.
If you think it's trivial: what about read-time vs compile-time, and when is this a useful distinction to make? What about languages where the compiler is available at runtime? Guy Steele (no dummy, he) wrote 7 pages in CLTL2 about EVAL-WHEN, which CL programmers can use to control this. 2 sentences are barely enough for a definition, which itself is far short of an explanation.
In general, it's a tough problem that language designers have seemed to try to avoid.
They often just say "here's a compiler, it does compile-time things; everything after that is run-time, have fun". C is designed to be simple to implement, not the most flexible environment for computation. When you don't have the compiler available at runtime, or the ability to easily control when an expression is evaluated, you tend to end up with hacks in the language to fake common uses of macros, or users come up with Design Patterns to simulate having more powerful constructs. A simple-to-implement language can definitely be a worthwhile goal, but that doesn't mean it's the end-all-be-all of programming language design. (I don't use EVAL-WHEN much, but I can't imagine life without it.)
And the problemspace around compile-time and run-time is huge and still largely unexplored. That's not to say S.O. is the right place to have the discussion, but I encourage people to explore this territory further, especially those who have no preconceived notions of what it should be. The question is neither simple nor silly, and we could at least point the inquisitor in the right direction.
Unfortunately, I don't know any good references on this. CLTL2 talks about it a bit, but it's not great for learning about it.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008