Best practice to monitor program life - language-agnostic

I want to hear your opinions about program life monitoring.
Here is the scenario: you have a simple program that normally works, meaning it is well written, exceptions are handled, and so on.
How would you proceed if you wanted to ensure that this program keeps running FOREVER?
No external tools like crontab are available, but any overhead can be added.
Using another program that continuously "pings" the main program? Having the program touch a file and checking its modification time from another program?
And how do you ensure that this second program itself always works?
So, come on, tell me your opinions or best practices in this context!
As a footnote, I have to write this program in Python, but it's a general-purpose question!

In embedded systems, what is often done is a watchdog module.
A watchdog checks some location (could be a file, could be a memory location, whatever), and restarts the system under examination if the location does not meet criteria.
So your program under test might periodically write an epoch timestamp to some programname_watchdog file. This would be part of its regular loop.
Then your watchdog (in a totally different process) would check the file. If the timestamp were sufficiently outdated, the other program would be killed and restarted, since it would be deemed to have critically malfunctioned (either hung or crashed). Note that your watchdog has only some very simple logic, so its chances of failing are much lower.
I'm positive there are other ways to accomplish this as well. This is just one way.
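For concreteness, here is a minimal sketch of that file-heartbeat approach in Python (the asker's language). The file path, timeout, and command line are made-up placeholders, and error handling is deliberately kept simple:

    import os
    import subprocess
    import time

    HEARTBEAT_FILE = "/tmp/myprogram_watchdog"   # hypothetical path
    MAX_AGE = 30                                 # seconds without a heartbeat before restarting
    PROGRAM = ["python", "myprogram.py"]         # hypothetical command line

    def touch_heartbeat():
        """Called from the monitored program's own main loop (separate process)."""
        with open(HEARTBEAT_FILE, "w") as f:
            f.write(str(time.time()))

    def watchdog():
        """The watchdog process: deliberately simple so it rarely fails itself."""
        child = subprocess.Popen(PROGRAM)
        while True:
            time.sleep(MAX_AGE)
            try:
                age = time.time() - os.path.getmtime(HEARTBEAT_FILE)
            except OSError:
                age = float("inf")               # no heartbeat file yet
            if child.poll() is not None or age > MAX_AGE:
                # Crashed (process exited) or hung (stale heartbeat): restart it.
                if child.poll() is None:
                    child.kill()
                child.wait()
                child = subprocess.Popen(PROGRAM)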
edit: You have to consider the stack your system is built on. The more external dependencies you have, the more risk of failure. You also have to consider a formal proof of program correctness if you are looking for perfect operation.
The question really becomes what you are expecting from your system; what sort of failures are unacceptable and what sort of failures are expected so you can compensate for them.
This question turns into a formal-proof/hardware/software co-design issue very fast (and an expensive one, too). I'm curious to see what you are doing and what your solution is.

Like Paul Nathan said, use a watchdog.
There are a few things you can do to make things more robust though, for example:
    int lastTick;

    int RemoteProcessState()
    {
        int tick = GetRemoteTick();

        if (tick == -1)
        {
            // Process recoverable error state.
            return -1;
        }

        if (tick == -2)
        {
            // Process unrecoverable error state.
            return -1;
        }

        if (tick < 0)
        {
            // Detect if the watchdog has overflowed.
            return -1;
        }

        if (abs(abs(tick) - abs(lastTick)) > ALLOWED_PROCESS_LAG)
        {
            // Resynchronize process.
        }
        else
        {
            // Process running normally.
        }

        lastTick = tick;   // Remember the last good tick for the next comparison.
        return 0;
    }
That is a pseudocode sample from real code used in an embedded RTU for process control.
It's primitive, but it works. Not only does this ensure that the remote process is alive, but if the remote process has drifted in calculation speed (scan rates are affected by program size and complexity), it will make sure that the two processes stay synchronized.
If you want more material, start investigating the return codes used by Modbus, or how the OPC protocol manages its Quality byte.

Well, I've thought long about this problem, and two things have come up.
First, a software watchdog should be so simple that crashing is nearly impossible. For the truly obsessive, an interesting programming challenge would be to write a net of watchdogs, written in different languages, which keep each other alive and all together monitor the main process.
Challenging and interesting as that would be, it looks like a big waste of time, and the scenario starts to resemble soldiers at war.
Secondly, the application I'm developing has a hardware watchdog, which should always be present in critical operations.
So now my application has a software watchdog that refreshes the hardware one and monitors the program's life.
In the end, Paul, I completely agree with you.
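For illustration only, here is a rough sketch of that arrangement, assuming a Linux-style /dev/watchdog device (real embedded hardware may expose a completely different interface): the software watchdog checks the program's heartbeat file and only feeds the hardware watchdog while the heartbeat is fresh, so a hang eventually triggers a hardware reset.

    import os
    import time

    WATCHDOG_DEV = "/dev/watchdog"               # Linux watchdog device (assumption; usually needs root)
    HEARTBEAT_FILE = "/tmp/myprogram_watchdog"   # hypothetical heartbeat file, as in the earlier sketch
    MAX_AGE = 30                                 # seconds of silence we tolerate

    def software_watchdog():
        # Opening the device arms the hardware timer; it must be fed regularly
        # or the hardware resets the board.
        with open(WATCHDOG_DEV, "wb", buffering=0) as wd:
            while True:
                try:
                    age = time.time() - os.path.getmtime(HEARTBEAT_FILE)
                except OSError:
                    age = float("inf")
                if age <= MAX_AGE:
                    wd.write(b"\0")              # feed ("kick") the hardware watchdog
                # If the program goes silent, we simply stop feeding the timer
                # and let the hardware watchdog reset the system.
                time.sleep(5)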


AS3 - Does the amount of code in an object matter for multiple instances?

What's better?
1000 objects with 500 lines of code each
vs
1000 objects with 30 lines of code each that delegate to one manager with 500 lines of code.
Context:
I have signals in my game. Dozens and dozens of them. I like that they have better performance than the native Flash Event.
I figured that as I was going to have so many, they should be lightweight, so each signal only stores a few variables and three methods.
Variables: head, tail, isStepping, hasColdNodes, signalMgr
Methods: addListener, removeListener, dispatch
But all these signals delegate the heavy work to the signalMgr, as in:
signalMgr.addListener(this, listener, removeOnFirstCallback);
That manager handles all the doubly-linked-list stuff, and has much more code than a signal.
So, is it correct to think this way?
I figure that if I had all the management code in the signal, it would be repeated in memory every time I instantiate one.
In the context of your question this is pretty much irrelevant; neither case should produce much of a difference.
From what you say, it seems that you assume many things but did not actually check any of them. For instance, when you say "I like that they have better performance than the native Flash Event", I can only assume that you read that somewhere but never tried to verify it yourself. There are only a few cases where using a signal system can make a tiny bit of difference; in most cases signals don't bring much, and in some cases they make things worse. In the context of Flash development, signals do not bring anything more than simple convenience. Flash is an event-driven system that cannot be turned off, so using signals with it means using events plus signals, not signals alone. In the case of custom events, using delegation is much more efficient, easier to use, and doesn't require any object creation.
But the real answer to the question is even simpler: there's no point optimizing something that you don't know needs optimization. Worse, different OSes will produce different optimization needs, so anyone trying to answer a general optimization question can only fail or pretend to know.
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." (Donald Knuth)
The ultimate goal of any programmer is to write clean code. So write clean code, with only a few thoughts about optimization.

Judging whether an exception is exceptional

It's a pretty popular and well known phrase that you should "only catch/throw exceptions which are exceptional". However, how is an "exceptional" exception determined?
For example, a bad password is very routine when logging into a service, so this is not exceptional. Statistics for a web app would probably show something like one bad login attempt for every 5 attempts (from no specific user). Likewise, attempting to go to a checkout with a basket in an online store could be very common (especially for new users). However, a file not found could go either way. I usually work along the lines that if a method is missing something it needs to do its work, it should throw an exception, but then it gets a little confusing here. In some cases a file not found could be common (e.g. a file share used by many users with no tight controls), compared to a very locked-down production environment missing a file, which would be exceptional.
Is this the right way to decide whether an exception is exceptional or not? I can easily classify things like no network connection as exceptional, but some cases are hard to judge. Is it subjective?
Thanks
I think it's pretty subjective, honestly, so I prefer to avoid that method of figuring out when I should use exceptions.
Instead, I prefer to consider three things:
1. Is it likely that I might want to let the call stack unwind more than one level?
2. Is there another way (return null or an error code, etc.)? If so, do I have even the slightest performance concern?
3. If neither of those leads to a clear decision, which option is easier to read for someone who has to maintain the code?
If #1 is true, and I don't have a MAJOR performance concern, I will probably opt to use exceptions because it will speed up my development time not to have to code return codes (and manually code the logic to have them propagate up the call stack if needed). When you use exceptions, call stack unwinding is free of charge for development time.
If #2 is true, and either I'm not going more than one frame (maybe two?) up the call stack or I have a serious performance concern (in a tight loop, for example), then I'll try really hard to find another way that doesn't involve exceptions.
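To make the call-stack-unwinding point (#1) concrete, here is a small, made-up Python example (ConfigError and the function names are invented): an error raised three calls deep reaches the top-level handler without any of the intermediate functions having to check and forward an error code.

    class ConfigError(Exception):
        """Raised when configuration data is unusable (hypothetical example)."""

    def read_config(path):
        raise ConfigError(f"missing file: {path}")   # deepest frame detects the problem

    def build_settings():
        return read_config("app.cfg")                # no error-forwarding code needed here

    def start_app():
        return build_settings()                      # ...or here

    try:
        start_app()
    except ConfigError as err:
        print(f"could not start: {err}")             # handled several frames up the stack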
Exceptions are only a tool for programmers in a language which supports them. I don't believe they have to have any intrinsic value as to what is "exceptional" or not. Instead, I say use them when they are the best tool for the job.

How to make a private game server?

I have always wanted to make a private server, but I don't know how I would do this.
I know how a private server works: the game sends data packets to the server; the server takes the data, processes it, and sends data to the other connected games.
My questions are:
How do you edit the game so it will connect to your server / change game data?
How do you find out which packets do what?
The game will be something like WoW; I have not decided yet.
If you are hoping to embark on creating your own MMORPG then you have a huge task ahead of you, and, unfortunately, to put it nicely, you are probably being too ambitious, especially if you are asking these sorts of questions.
You should probably read up on client server architecture.
Also, in answer to your questions about the structure of the data being sent and how it is interpreted: that's 100% up to the people who design the system. You will want to simulate the entire game on the server(s) and not trust the clients at all.
For something as complex as an MMORPG it is really important to create a solid design for the system before anything else.
Just to be clear your intent is to create an emulated MMO server to the effect of WOW?
That's not really a trivial task and carries with it its own ethical implications.
Just getting started will require a ton of research, inspection, decoding, and extreme attention to detail.
If you are serious about it, then I would suggest looking up networking tools that can help you inspect traffic across the network and creating a scientific process for operation inspection.
Again, it should be noted this is by no means a trivial task.
This will be fairly difficult, as you do not have the communication protocol specification for the game's client/server communication.
If you want to start this, then create a server that is simply a pass-through. That is, all client requests are forwarded to the actual server. Once you have generated a large enough sample of packets to study, you can begin to dissect the meaning of each byte (possibly). Of course, if the packets are encrypted in any way (even a simple XOR encryption), you will have an even harder time trying to figure out what each byte means. You should capture a sample set using two clients running sniffers, so you can see what happens when one client does something and it needs to be sent to all clients.
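For what it's worth, a bare-bones sketch of such a pass-through proxy in Python; the port, hostname, and logging format are placeholders, and a real game may use UDP or encrypted traffic, in which case this won't get you far:

    import socket
    import threading

    LISTEN_PORT = 4000                        # where the redirected client connects (assumption)
    REAL_SERVER = ("game.example.com", 4000)  # the real server (placeholder)

    def pump(src, dst, label):
        """Forward bytes one way and log them for later analysis."""
        while True:
            data = src.recv(4096)
            if not data:
                break
            print(label, data.hex())          # raw packet dump to study offline
            dst.sendall(data)

    def handle(client):
        server = socket.create_connection(REAL_SERVER)
        threading.Thread(target=pump, args=(client, server, "C->S"), daemon=True).start()
        pump(server, client, "S->C")

    listener = socket.socket()
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("", LISTEN_PORT))
    listener.listen(5)
    while True:
        conn, _ = listener.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()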
But if I were you, I would just abandon the idea and work on something else. My two cents.
If you'd like an inside look at how games do networking, there's always Ryzom, which went open-source earlier this year. If you're creating your own MMO you can begin right there, and if you're looking to reverse-engineer one you can practice with your own client and server.

Any good threads-related job-interview questions?

When interviewing graduates I usually ask them questions about data structures, algorithms, and complexity theory. I would really like to ask a question that lets them show their familiarity with multi-threading concepts, without delving into language-specific issues.
Any good questions? The only question I could think of is how to write a Singleton that supports multi-threaded access.
I find the classic "write me a consumer-producer queue" question to be quite good. You can talk about synchronization in a hand-wavy way beforehand for five minutes or so (e.g. start with "What does Object.wait() do? What other methods on Object is it related to? Can you give me an example of when you might use these? What other concurrency techniques might you use in practice [because really, it's quite rare that actually using the wait/notify primitives is the best approach]?"). Make sure the candidate addresses (or at least makes clear he is aware of) both atomicity ("missed updates") and volatility (visibility of the new value on other threads).
Then after you've had a chat about the theory of these, get them to spend a few minutes actually writing the code for a primitive producer-consumer queue. This should be straightforward to anyone who actually understands what they were talking about above, yet it will weed out those who can "talk the talk" but don't actually understand it in practice (arguably the most dangerous group).
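For reference, a primitive bounded producer-consumer queue might look like the sketch below; it's in Python rather than Java since the question is meant to be language-agnostic, and the class and method names are arbitrary:

    import threading
    from collections import deque

    class BoundedQueue:
        """A primitive producer-consumer queue built on a condition variable."""

        def __init__(self, capacity=10):
            self._items = deque()
            self._capacity = capacity
            self._cond = threading.Condition()   # one lock guards all shared state

        def put(self, item):
            with self._cond:
                while len(self._items) >= self._capacity:   # wait in a loop, not an 'if'
                    self._cond.wait()                        # releases the lock while waiting
                self._items.append(item)
                self._cond.notify_all()                      # wake consumers waiting for data

        def get(self):
            with self._cond:
                while not self._items:
                    self._cond.wait()
                item = self._items.popleft()
                self._cond.notify_all()                      # wake producers waiting for space
                return item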
What I like about these mini-coding exercises, is that they're often easy to extend. For instance, if the candidate completes the task easily, you can ask how they would extend it for situation XXX - invent requirements that you know will push the limits of the noddy solution you asked for. This not only lets you tailor the depth of questions you're asking but gives some insight into how well the candidate handles clarification of requirements, and modifications of existing design (which is pretty important in this industry).
Here you can find some topics to discuss:
Thread implementation (kernel vs. user space)
Thread-local storage
Synchronization primitives
Deadlocks and livelocks (see the sketch just after this list)
Differences between mutex and semaphore
Use of condition variables
When not to use threads (e.g. I/O multiplexing)
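For the deadlock item above, a classic two-lock example is usually enough to drive the discussion. This toy Python version deadlocks because the two threads acquire the same pair of locks in opposite order:

    import threading
    import time

    lock_a = threading.Lock()
    lock_b = threading.Lock()

    def worker_1():
        with lock_a:
            time.sleep(0.1)      # give the other thread time to grab lock_b
            with lock_b:         # blocks forever: worker_2 holds lock_b and wants lock_a
                pass

    def worker_2():
        with lock_b:
            time.sleep(0.1)
            with lock_a:
                pass

    t1 = threading.Thread(target=worker_1)
    t2 = threading.Thread(target=worker_2)
    t1.start()
    t2.start()
    # Both threads now block forever; the standard fix is to always acquire
    # the locks in the same global order (or to use a single lock).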
Talk with them about a popular but not too well-known topic where thread handling is essential.
I recommend building a web server with them, of course only on paper or just in words. The result should look something like this: there is a main thread listening on a socket. When something arrives, it passes the socket into a pool, then returns to listening. The pool has a fixed number of slots. The request-processing threads take their jobs from the pool. Work out what's better: the worker threads checking the pool concurrently, or the listening main thread selecting a free slot/thread for each new incoming request. Try to write small pseudocode, or a diagram, for both sides of the pool handling.
Now introduce a small application: a page counter, which reports how many page requests have been made since server startup. Don't tell them that the counter must be protected against concurrent modification; let them figure out how to do this with mutexes, synchronization, or whatever (a minimal sketch of the expected outcome follows below). Maybe you could skip the web server part; the page counter app is easier to specify.
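As a sketch of where that page-counter exercise should end up (the names and numbers are invented), the key insight is that the increment is a read-modify-write and therefore needs protection:

    import threading

    page_count = 0
    count_lock = threading.Lock()

    def handle_requests(n):
        """Simulates a worker thread bumping the counter once per handled request."""
        global page_count
        for _ in range(n):
            with count_lock:       # without this, concurrent increments can be lost
                page_count += 1

    threads = [threading.Thread(target=handle_requests, args=(1000,)) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(page_count)              # reliably 8000 only because of the lock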
Another example is a chat with 2+ clients and a server: work out how to ensure that all the messages arrive in the same order for all clients. Or a reflex game: the server waits a random 1-5 seconds, then says "peek-a-boo", and the player who presses the space key first wins. Specify it for 2 players, then try to expand it to N players.
Also, be aware of NPPs. NPP stands for "non-programming programmer". These are people who can talk about programming issues, know all the three- and four-letter abbreviations (there are a lot in the Java world: EJB, JSP, XSLT, and my favourite, POJO, which means Plain Old Java Object, lol), and can understand and modify code or build similar programs from a base, but they fail at even small problems if they have to solve them by themselves, e.g. finding the element of an array nearest to a given value. Sometimes it takes months until this comes out. They perform well at interviews because they prepare for them. Maybe they don't even know that they're NPPs; this is a known effect: http://en.wikipedia.org/wiki/Dunning-Kruger_effect
It's harder to recognize the opposite kind of candidate, who hasn't heard of the trendy libraries or patterns but can learn them even during the job interview. (Personal remark: my last interview was in 1999, and it seems I won't be doing interviews anymore. I had never heard of dynamic web pages before, but I figured out the term "session" during the interview; the question was how to build a simple hangman web app. I was hired.)

How are serial generators / cracks developed?

I mean, I've always wondered how on earth somebody can develop algorithms to break/cheat the constraints on legal use in many shareware programs out there.
Just out of curiosity.
Apart from being illegal, it's a very complex task.
Speaking at a purely theoretical level, the common way is to disassemble the program you want to crack and try to find where the key or the serial code is checked.
That's easier said than done, since any serious protection scheme will check values in multiple places and will also derive critical information from the serial key for later use, so that when you think you've guessed it, the program crashes.
To create a crack you have to identify all the points where a check is done and modify the assembly code appropriately (often by inverting a conditional jump or storing constants into memory locations).
To create a keygen you have to understand the algorithm and write a program to redo the exact same calculation (I remember an old version of MS Office whose serial had a very simple rule: the sum of the digits had to be a multiple of 7, so writing the keygen was rather trivial).
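As a toy illustration of how trivial such a rule is to reverse once you know it (the digit-sum rule above is recalled from memory, and the code below is purely illustrative, not any real product's scheme):

    import random

    def is_valid(serial: str) -> bool:
        """The hypothetical check: the digits must sum to a multiple of 7."""
        return serial.isdigit() and sum(int(d) for d in serial) % 7 == 0

    def make_serial(length: int = 10) -> str:
        """A 'keygen' for that rule: try random candidates until one passes."""
        while True:
            candidate = "".join(random.choice("0123456789") for _ in range(length))
            if is_valid(candidate):
                return candidate

    print(make_serial())   # prints a serial that the toy check accepts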
Both activities require you to follow the execution of the application in a debugger and try to figure out what's happening. And you need to know the low-level API of your operating system.
Some heavily protected applications have their code encrypted so that the file can't be disassembled. It is decrypted when loaded into memory, but then they refuse to start if they detect that an in-memory debugger has been attached.
In essence it's something that requires very deep knowledge, ingenuity and a lot of time! Oh, did I mention that it's illegal in most countries?
If you want to know more, Google for the +ORC Cracking Tutorials; they are very old and probably useless nowadays, but they will give you a good idea of what it means.
Anyway, a very good reason to know all this is if you want to write your own protection scheme.
The bad guys search for the key-check code using a disassembler. This is relatively easy if you know how to do it.
Afterwards you translate the key-checking code to C or another language (this step is optional). Reversing the process of key checking gives you a key generator.
If you know assembler, it takes roughly a weekend to learn how to do this. I did it just a few years ago (never released anything, though; it was just research for my game-development job: to write a hard-to-crack key you have to understand how people approach cracking).
Nils's post deals with key generators. For cracks, usually you find a branch point and invert (or remove the condition) the logic. For example, you'll test to see if the software is registered, and the test may return zero if so, and then jump accordingly. You can change the "jump if equals zero (je)" to "jump if not-equals zero (jne)" by modifying a single byte. Or you can write no-operations over various portions of the code that do things that you don't want to do.
Compiled programs can be disassembled and with enough time, determined people can develop binary patches. A crack is simply a binary patch to get the program to behave differently.
First, most copy-protection schemes aren't terribly well advanced, which is why you don't see a lot of people rolling their own these days.
There are a few methods used to do this. You can step through the code in a debugger, which does generally require a decent knowledge of assembly. Using that you can get an idea of where in the program copy protection/keygen methods are called. With that, you can use a disassembler like IDA Pro to analyze the code more closely and try to understand what is going on, and how you can bypass it. I've cracked time-limited Betas before by inserting NOOP instructions over the date-check.
It really just comes down to a good understanding of software and a basic understanding of assembly. Hak5 did a two-part series, over the first two episodes this season, on the basics of reverse engineering and cracking. It's really basic, but it's probably exactly what you're looking for.
A would-be cracker disassembles the program and looks for the "copy protection" bits, specifically for the algorithm that determines if a serial number is valid. From that code, you can often see what pattern of bits is required to unlock the functionality, and then write a generator to create numbers with those patterns.
Another alternative is to look for functions that return "true" if the serial number is valid and "false" if it's not, then develop a binary patch so that the function always returns "true".
Everything else is largely a variant on those two ideas. Copy protection is always breakable by definition - at some point you have to end up with executable code or the processor couldn't run it.
For the serial number, you can just extract the algorithm and start throwing guesses at it, looking for a positive response. Computers are powerful; it usually only takes a little while before hits start coming out.
As for hacking, I used to be able to step through programs at a high level and look for the point where it stopped working. Then you go back to the last call that succeeded, step into it, and repeat. Back then, the copy protection was usually writing to the disk and seeing if a subsequent read succeeded (if so, the copy protection failed, because they used to burn part of the floppy with a laser so it couldn't be written to).
Then it was just a matter of finding the right call and hardcoding the correct return value from that call.
I'm sure it's still similar, but they go through a lot of effort to hide the location of the call. Last one I tried I gave up because it kept loading code over the code I was single-stepping through, and I'm sure it's gotten lots more complicated since then.
I wonder why they don't just distribute personalized binaries, where the name of the owner is stored somewhere (encrypted and obfuscated) in the binary, or better, distributed over the whole binary. AFAIK Apple does this with the music files from the iTunes store; however, there it's far too easy to remove the name from the files.
I assume each crack is different, but I would guess in most cases somebody spends a lot of time in the debugger tracing the application in question.
The serial generator takes that one step further by analyzing the algorithm that checks the serial number for validity and reverse engineers it.