I am working on a software program that has to be deployed on a client's private cloud server, where the client has root access. I can communicate with the software through a secure port.
I want to prevent the client from reverse engineering my program, or at least make it "hard enough". Below is my approach:
Write the code in Go and compile the software into a binary (possibly with obfuscation)
Make sure the program can only be started with a secret key that is sent through the secure port. The secret key can change over time.
Every time I need to start/stop the program, I send commands with the secret keys through the secure port.
I think this approach can prevent a root user from either:
Using a debugger to reverse engineer my code
Running the program repeatedly to check outputs
My question is: What are the weak spots of this design? How can a root user attack it?
I want to prevent the client from reverse engineering my program,
You can't fully prevent this when the software runs on hardware you don't own. To run the software, the CPU must see all of its instructions, and they will be stored in the computer's memory.
https://softwareengineering.stackexchange.com/questions/46434/how-can-software-be-protected-from-piracy
Code is data. When the code is runnable, a copy of that data is unprotected code. Unprotected code can be copied.
Peppering the code with anti-piracy checks makes it slightly harder, but hackers will just use a debugger and remove them. Inserting no-ops instead of calls to "check_license" is pretty easy.
(answers in https://softwareengineering.stackexchange.com/questions/46434 may be useful for you)
The hardware owner controls the OS and the memory; he can dump everything.
or at least make it "hard enough".
You can only make it a bit harder.
Write the code in Go and compile the software into a binary (possibly with obfuscation)
IDA will decompile any machine code. Using native machine code is a bit stronger than bytecode (Java, .NET, or dex), but only a bit.
Make sure the program can only be started with a secret key that is sent through the secure port. The secret key can change over time.
If a copy of the secret key (or keys) is in the code or memory of the program, the user can dump it and simulate your server. If part of your code, or part of the data the code needs to run, is stored encrypted and is deciphered with such an external key, the user can either eavesdrop on the key (after it is decoded from SSL but before it is used to decrypt the secret part of the code), or dump the decrypted code/data from memory. It is very easy to spot new executable code being created in memory, even with default preinstalled tools like strace on Linux: just search for mmap calls with the PROT_EXEC flag, e.g. strace -f -e trace=mmap,mprotect ./program.
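To make that concrete, a time-varying start key like the one described would typically be an HMAC over the current time window; here is a minimal Go sketch (the window length and the secret value are assumptions). The point is that sharedSecret has to be present in the program's memory for verification to work at all, so a root user can always dump it:

    package main

    import (
        "crypto/hmac"
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "time"
    )

    // sharedSecret must live in the binary or its memory for the check
    // to work, which is exactly what a root user can dump.
    var sharedSecret = []byte("assumed-embedded-secret")

    // keyForWindow derives the start key for a hypothetical 30-second window.
    func keyForWindow(t time.Time) string {
        mac := hmac.New(sha256.New, sharedSecret)
        fmt.Fprintf(mac, "%d", t.Unix()/30)
        return hex.EncodeToString(mac.Sum(nil))
    }

    // verifyStartKey checks a key received over the secure port.
    func verifyStartKey(received string) bool {
        return hmac.Equal([]byte(received), []byte(keyForWindow(time.Now())))
    }

    func main() {
        k := keyForWindow(time.Now())
        fmt.Println("key accepted:", verifyStartKey(k)) // true within the same window
    }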
Every time I need to start/stop the program, I can send commands with the secret keys through the secured port.
This is just a variation of an online license/antipiracy check ("phone home").
I think this approach can prevent a root user from: using a debugger to reverse engineer my code, or
No, he can attach a debugger at any time; but you can make an interactive debugger a bit harder to use if the program communicates with your server often (every 5 seconds, say). But if it communicates that often, it is better to move some part of the computation to your server; that part will be protected.
And he can still use non-interactive debuggers, tracing tools, and memory dumping. He can also run the program in a virtual machine, wait until the online check is done (using tcpdump and netstat to monitor network traffic), then take a live snapshot of the VM (there are several ways to enable "live migration" of a VM; only a short pause may be noticed by your program if it has external timing), continue to run the first copy online, and keep the snapshot for offline debugging (with all keys and decrypted code in it).
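For illustration, the frequent phone-home check mentioned above might be sketched in Go like this (the endpoint URL and the 5-second interval are assumptions); note that none of this survives a memory dump or a VM snapshot:

    package main

    import (
        "fmt"
        "net/http"
        "os"
        "time"
    )

    // heartbeat pings the vendor's server; if the check fails, the
    // program refuses to keep running. This is a hypothetical
    // mitigation only: a root user can snapshot the VM or patch
    // this check out of the binary.
    func heartbeat(url string, every time.Duration) {
        client := &http.Client{Timeout: 2 * time.Second}
        for range time.Tick(every) {
            resp, err := client.Get(url)
            if err != nil || resp.StatusCode != http.StatusOK {
                fmt.Fprintln(os.Stderr, "license check failed, exiting")
                os.Exit(1)
            }
            resp.Body.Close()
        }
    }

    func main() {
        go heartbeat("https://vendor.example.com/alive", 5*time.Second) // assumed endpoint
        select {} // stand-in for the real application work
    }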
run the program repeatedly to check outputs
Until he cracks the communications...
I built a malware analysis test lab and used Pafish to detect the analysis environment, and I want to patch some of the faults it finds. How can I hide registry keys and processes from malware's VM detection?
In Windows, there are many points where the operating system enables programs to intercept calls to operating system functions (these are called "hooks"). For example, a program can "hook" the calls to the file system functions that return the entries in a directory. Normally, a program hooks a function to monitor and measure performance, or perhaps to add an additional level of validation.
A rootkit or sandbox can use a hook to check every value returned by the function and skip any value that represents part of the rootkit. In the case of the directory enumerator, when the next file to be returned is part of the rootkit, it is skipped - the file becomes "invisible".
Similarly, a hook on the function that returns registry values can hide a registry entry that you don't want the sandboxed app to see.
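The filtering half of such a hook is straightforward; here is a minimal Go sketch of the idea (the hidden-name list is an assumption, and a real rootkit patches the OS enumeration function itself rather than wrapping a library call):

    package main

    import (
        "fmt"
        "os"
    )

    // hiddenNames is a hypothetical set of entries the hook should conceal.
    var hiddenNames = map[string]bool{"vboxguest.sys": true, "pafish.exe": true}

    // filteredReadDir wraps a directory listing and drops hidden entries,
    // mimicking what a hooked enumerator does at the OS level.
    func filteredReadDir(dir string) ([]string, error) {
        entries, err := os.ReadDir(dir)
        if err != nil {
            return nil, err
        }
        var visible []string
        for _, e := range entries {
            if !hiddenNames[e.Name()] {
                visible = append(visible, e.Name())
            }
        }
        return visible, nil
    }

    func main() {
        names, err := filteredReadDir(".")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        for _, n := range names {
            fmt.Println(n)
        }
    }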
I am experimenting with Ethereum. I have successfully set up a private testnet via the instructions on the site. However, I am having trouble adding peers from different machines. On any node I create, the admin.nodeInfo.NodeUrl parameter is undefined. I have gotten the enode address by calling admin.nodeInfo, and when I try the admin.addPeer("enode://address") command (with the public IP), it returns true, but the peers are not listed when calling admin.peers.
I read on another thread (here) that the private testnet is only local, but I am seeing plenty of documentation that suggests otherwise (here and here). I have tried the second tutorial, adding the command-line flags for a custom networkid and genesis block.
Any advice would be much appreciated. Please let me know if I can provide more details.
It is difficult to find in the available documentation, but a key function is admin.addPeer().
https://github.com/ethereum/go-ethereum/wiki/JavaScript-Console
There are a few ways you could do it, I suppose, but I have one node running on my local PC and one node running on a remote server. This saves me Ether while testing contracts and keeps me from polluting the Ethereum blockchain with junk. The key when running admin.addPeer() is to find the "enode" for each of the nodes, so that on one of the nodes you run something like admin.addPeer("enode://nodeid@ipaddress:port"). If you run admin.peers and see something other than an empty list, you were probably successful. The main thing to check is that the enode ID and IP address from admin.peers match what you were expecting.
The geth configuration settings are a little tricky as well. You will have to adapt them for your particular use, but here are some of the parameters I use:
geth --port XYZ --networkid XYZ --maxpeers X
Replace XYZ and X with the numbers you want to use, and make sure you run the same parameters when starting both nodes. There could be more parameters involved, but these should get you pretty far.
Disclaimer: I'm new to Geth myself, as well as to using computers for anything more than Facebook, so take my answer with a grain of salt. Also, I haven't given you my full command line for starting up Geth because I'm not 100% sure which of the parameters are related to a private testnet and which are not. I've only given you the ones that I'm sure are related to running a private testnet.
Also, you may find that you can't execute any transactions while running a private testnet. That's because you need one of the nodes to start mining. So run miner.start(X) when you are ready to start deploying contracts.
I apologize for this not being fully complete; I'm just passing on my experience after spending 1-2 weeks trying to figure it out myself, because the documentation isn't fully clear on how to do this. I think private testnets should probably be discouraged in the spirit of Ethereum, but in my case I run one primarily so as not to pollute the blockchain.
PS. As I was just getting ready to hit submit, I found "connecting to the network", which also sheds more light.
I need my application to behave differently on a development server than on a production server. Here are the ways I've come up with to detect which one it is running on:
Have a config file that is not version-controlled
Check the server-name/IP address against a list of known dev servers
Set some environment variable that can be read
I've used (2) on some of my projects, and that has worked well with only one dev machine, but now that we're up to about 10, it may become difficult to manage an ever-changing list.
(1) I don't like, because that's an important file and it should be version controlled.
(3) I've never tried. It requires more configuration when we set up each server, but it could be an OK solution.
Are there any others I've missed? What are the pros/cons?
(3) doesn't have to require more configuration on the servers. You could instead default to server mode, and require more configuration on the dev machines.
In general I'd always want to make the dev machines the special case, and release behavior the default. The only tricky part is that if the relevant setting is in the config file, then developers will keep accidentally checking in their modified version of the file. You can avoid this either in your version-control system (for example a checkin hook), or:
read two config files, one of which is allowed to not exist (and only exists on dev machines, or perhaps on servers set up by expert users)
read an environment variable that is allowed to not exist.
Personally I prefer to have a config override file, just because you've already got the code to load the one config file, it should be pretty straightforward to add another. Reading the environment isn't exactly difficult, of course, it's just a separate mechanism.
Some people really like their programs to be controlled by the environment (especially those who want to control them when running from scripts: they don't want to have to write a config file on the fly when it's so easy to set the environment from a script). So it might be worth using the environment from that point of view, but not just for this one setting.
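For what it's worth, the optional-override-file approach might look like this minimal Go sketch (the file names and Config fields are assumptions): production defaults live in config.json, and a config.dev.json that exists only on dev machines, and is never checked in, overrides them:

    package main

    import (
        "encoding/json"
        "fmt"
        "os"
    )

    // Config holds hypothetical settings; real fields will differ.
    type Config struct {
        Debug  bool   `json:"debug"`
        DBHost string `json:"db_host"`
    }

    // load reads the required base config, then overlays an optional
    // dev-only override file that is never checked in.
    func load() (Config, error) {
        var cfg Config
        base, err := os.ReadFile("config.json")
        if err != nil {
            return cfg, err
        }
        if err := json.Unmarshal(base, &cfg); err != nil {
            return cfg, err
        }
        if dev, err := os.ReadFile("config.dev.json"); err == nil {
            // Override file exists only on dev machines.
            if err := json.Unmarshal(dev, &cfg); err != nil {
                return cfg, err
            }
        }
        return cfg, nil
    }

    func main() {
        cfg, err := load()
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        fmt.Printf("debug=%v db=%s\n", cfg.Debug, cfg.DBHost)
    }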
Another completely different option: make dev/release mode configurable within the app, if you're logged into the app with suitable admin privileges. Whether this is a good idea might depend on whether you have the kind of devs who write debug logging messages along the lines of, "I can't be bothered to fix this, but no customer is ever going to tell the difference, they're all too stupid." If so, (a) don't allow app admins to enable debug mode and (b) re-educate your devs.
Here are a few other possibilities.
Some organizations keep development machines on one network, and production machines on another network, for example, dev.example.com and prod.example.com. If your organization uses that practice, then an application can determine its environment via the fully-qualified hostname on which it is running, or perhaps by examining some bits in its IP address.
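Under that convention, the check can be a one-liner on the fully-qualified hostname; a minimal Go sketch, assuming the domain suffixes above:

    package main

    import (
        "fmt"
        "os"
        "strings"
    )

    // isProduction assumes production hosts live under prod.example.com.
    func isProduction() bool {
        host, err := os.Hostname()
        if err != nil {
            return true // when unsure, fail toward production behavior
        }
        return strings.HasSuffix(host, ".prod.example.com")
    }

    func main() {
        fmt.Println("production?", isProduction())
    }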
Another possibility is to use an embeddable scripting language (Tcl, Lua and Python come to mind) as the syntax of your configuration file. Doing that means your configuration file can easily query environment variables (or IP addresses) and use that to drive an if-then-else statement. A drawback of this approach is the potential security risk of somebody editing a configuration file to add malicious code (for example, to delete files).
A final possibility is to start each application via a shell/Python/Perl script. The script can query its environment and then use that to drive an if-then-else statement for passing a command-line option to the "real" application.
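The same wrapper idea, sketched in Go rather than a shell script (the marker variable, flag name, and binary path are all hypothetical):

    package main

    import (
        "os"
        "os/exec"
    )

    func main() {
        // Default to production; dev machines set a marker variable.
        mode := "--mode=production"
        if os.Getenv("DEV_MACHINE") != "" { // hypothetical marker
            mode = "--mode=development"
        }
        cmd := exec.Command("./realapp", mode) // hypothetical real binary
        cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
        if err := cmd.Run(); err != nil {
            os.Exit(1)
        }
    }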
By the way, I don't like to code an environment-testing if-then-else statement as follows:
if (check-for-running-in-production) {
... // run program in production mode
} else {
... // run program in development mode
}
The above logic silently breaks if the check-for-running-in-production test has not been updated to deal with a newly added production machine. Instead, I prefer to code a bit more defensively:
if (check-for-running-in-production) {
... // run program in production mode
} else if (check-for-running-in-development) {
... // run program in development mode
} else {
print "Error: unknown environment"
exit
}
This may be a stupid question, as most of my programming consists of one-man scientific computing research prototypes and developing relatively low-level libraries. I've never programmed in the large in an enterprise environment before. I've always wondered, what are the main things that logging libraries make substantially easier than just using good old fashioned print statements or file output, simple programming logic and a few global variables to determine how verbosely things get logged? How do you know when a few print statements or some basic file output ain't gonna cut it and you need a real logging library?
Logging helps debug problems, especially when you move to production and problems occur on people's machines you can't control. The best-laid plans never survive contact with the enemy, and logging helps you track how that battle went when faced with real-world data.
Off-the-shelf logging libraries are easy to plug in and get playing in less than 5 minutes.
Log libraries allow for various levels of logging per statement (FATAL, ERROR, WARN, INFO, DEBUG, etc).
And you can turn logging up or down to get more or less information at runtime.
In highly threaded systems, logging helps sort out which thread was doing what. Log libraries can record information about threads and timestamps that ordinary print statements can't.
Most allow you to turn on only portions of the logging to get more detail. So one system can log debug information, and another can log only fatal errors.
Logging libraries allow you to configure logging through an external file so it's easy to turn on or off in production without having to recompile, deploy, etc.
3rd party libraries usually log so you can control them just like the other portions of your system.
Most libraries allow you to log portions or all of your statements to one or many files based on criteria. So you can log to both the console AND a log file.
Log libraries allow you to rotate logs, keeping several log files based on many different criteria. Say, after a log reaches 20MB, rotate to another file, and keep 10 log files around so that log data never exceeds 200MB.
Some log statements can be compiled in or out (language dependent).
Log libraries can be extended to add new features.
You'll want to start using a logging library when you want some of these features. If you find yourself changing your program to get some of these features, you might want to look into a good log library. They are easy to learn, set up, and use, and they are ubiquitous.
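As one concrete example, Go's standard log/slog package covers several of the features above with very little setup; a minimal sketch (the level and output choices are just illustrative):

    package main

    import (
        "log/slog"
        "os"
    )

    func main() {
        // Level can be raised or lowered without touching call sites.
        level := slog.LevelInfo
        logger := slog.New(slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{
            Level: level,
        }))

        logger.Info("server started", "port", 8080)
        logger.Debug("cache state", "entries", 42) // suppressed at Info level
        logger.Error("request failed", "err", "connection refused")
    }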
They are used in environments where the requirements for logging may change, but the cost of changing or deploying a new executable is high. (Even when you have the source code, adding a one-line logging change to a program can be infeasible because of internal bureaucracy.)
The logging libraries provide a framework that the program will use to emit a wide variety of messages. These can be described by source (e.g. the logger object it is first sent to, often corresponding to the class the event has occurred in), severity, etc.
During runtime, the actual delivery of the messages is controlled using an "easily" edited config file. In normal situations most messages may be suppressed altogether, but if the situation changes, it is a simple fix to enable more messages without needing to deploy a new program.
The above describes the ideal logging framework as I understand the intention; in practice I have used them in Java and Python and in neither case have I found them worth the added complexity. :-(
They're for logging things.
Or, more seriously, for saving you from having to write it yourself, giving you flexible options on where logs are stored (database, event log, text file, CSV, sent to a remote web service, delivered by pixies on a velvet cushion) and on what is logged at runtime, rather than having to redefine a global variable and then recompile.
If you're only writing for yourself then it's unlikely you need one, and it may introduce an external dependency you don't want, but once your libraries start to be used by others then having a logging framework in place may well help your users, and you, track down problems.
I know that a logging library is useful when I have more than one subsystem with "verbose logging," but where I only want to see that verbose data from one of them.
Certainly this can be achieved by having a global log level per subsystem, but for me it's easier to use a "system" of some sort for that.
I generally have a 2D logging environment too: "Info/Warning/Error" (etc.) on one axis and "AI/UI/Simulation/Networking" (etc.) on the other. With this I can easily specify the logging level that I care about seeing for each subsystem. It's not actually that complicated once it's in place; indeed, it's a lot cleaner than having if my_logging_level == DEBUG then print("An error occurred"). Plus, the logging system can stuff file/line info into the messages, and, getting totally fancy, you can redirect them to multiple targets pretty easily (file, TTY, debugger, network socket...).
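A two-axis scheme like that can be sketched in a few lines of Go (the subsystem names and thresholds are assumptions):

    package main

    import "fmt"

    type Level int

    const (
        Debug Level = iota
        Info
        Warning
        Error
    )

    // threshold is the minimum level shown per subsystem, so one
    // subsystem can be verbose while the rest stay quiet.
    var threshold = map[string]Level{
        "AI":         Debug,
        "UI":         Warning,
        "Networking": Error,
    }

    func logf(subsystem string, lvl Level, format string, args ...any) {
        minLvl, ok := threshold[subsystem]
        if !ok || lvl < minLvl {
            return
        }
        fmt.Printf("[%s] "+format+"\n", append([]any{subsystem}, args...)...)
    }

    func main() {
        logf("AI", Debug, "pathfinding took %dms", 12)   // shown
        logf("UI", Info, "button hovered")               // suppressed
        logf("Networking", Error, "socket closed early") // shown
    }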
Say there is some functionality needed for an application under development that could be achieved either by making a system call to a command-line program or by utilizing a library. Assuming efficiency is not an issue, is it bad practice to simply make a system call to a program instead of utilizing a library? What are the disadvantages of doing this?
To make things more concrete, an example of this scenario would be an application which needs to download a file from a web server, either the cURL program or the libcURL library could be used for this.
Unless you are writing code for only one OS, there is no way of knowing if your system call will even work. What happens when there is a system update or an OS upgrade?
Never use a system call if there is a library to do the same function.
I prefer libraries because of the dependency issue, namely the executable might not be there when you call it, but the library will be (assuming external library references get taken care of when the process starts on your platform). In other words, using libraries would seem to guarantee a more stable, predictable outcome in more environments than system calls would.
There are several factors to take into account. One key one is the reliability of whether the external program will be present on all systems where your software is installed. If there is a possibility that it will be missing, then maybe it is better to do it inside your program.
Weighing against that, you might consider that the extra code loaded into your program is prohibitive - you don't need the code bloat for such a seldom-used part of your application.
The system() function is convenient, but dangerous, not least because it usually invokes a shell. You may be better off calling the program more directly - on Unix, via the fork() and exec() system calls. [Note, incidentally, that a system call is very different from calling the system() function!] OTOH, you may need to worry about ensuring that all open file descriptors in your program are closed - especially if your program is some sort of daemon running on behalf of other users; that is less of a problem if you are not using special privileges, but it is still a good idea not to give the invoked program access to anything you did not intend. You may need to look at the fcntl() system call and the FD_CLOEXEC flag.
Generally, it is easier to keep control of things if you build the functionality into your program, but it is not a trivial decision.
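To make the cURL example concrete in Go terms, here is a sketch of the two approaches side by side, using the standard net/http library versus spawning the external curl binary (the URL is a placeholder):

    package main

    import (
        "fmt"
        "io"
        "net/http"
        "os"
        "os/exec"
    )

    // fetchWithLibrary uses the in-process HTTP client: no external
    // dependency, and errors come back as values you can inspect.
    func fetchWithLibrary(url string) ([]byte, error) {
        resp, err := http.Get(url)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
        return io.ReadAll(resp.Body)
    }

    // fetchWithCurl spawns the external curl program: it must be
    // installed and on PATH, and you only get its output and exit code.
    func fetchWithCurl(url string) ([]byte, error) {
        return exec.Command("curl", "-sS", url).Output()
    }

    func main() {
        body, err := fetchWithLibrary("https://example.com/") // placeholder URL
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        fmt.Printf("fetched %d bytes\n", len(body))
    }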
Security is one concern. A malicious cURL binary could cause havoc in your program. It depends on whether this is a personal program where coding speed is your main focus, or a commercial application where things like security play a factor.
System calls are much harder to make safely.
All sorts of funny characters need to be correctly encoded to pass arguments in, and the types of encoding may vary by platform or even version of the command. So making a system call that contains any user data at all requires a lot of sanity-checking and it's easy to make a mistake.
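A small Go example of the difference (the input string is hypothetical): passed through a shell, user data is re-parsed as shell syntax, while passed as a plain argument vector it stays a single literal argument. Neither command is actually run here; the sketch just prints the resulting argument lists:

    package main

    import (
        "fmt"
        "os/exec"
    )

    func main() {
        userInput := "notes.txt; rm -rf ~" // hypothetical malicious input

        // Dangerous: the whole string is handed to a shell, so the
        // "; rm -rf ~" part would execute as a second command.
        unsafeCmd := exec.Command("sh", "-c", "cat "+userInput)
        fmt.Println("via shell:", unsafeCmd.Args)

        // Safer: no shell involved; the input is one literal argument,
        // funny characters and all.
        safeCmd := exec.Command("cat", userInput)
        fmt.Println("as argv: ", safeCmd.Args)
    }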
Yeah, as mentioned above, keep in mind the difference between system calls (like fcntl() and open()) and system() calls. :)
In the early stages of prototyping a C program, I often make external calls to programs like grep and sed for manipulating files, using popen(). It's not safe, it's not secure, and it's certainly not portable, but it can let me get going quickly. That's valuable to me: it lets me focus on the really important core of the program, usually the reason I used C in the first place.
In high level languages, you'd better have a pretty good reason. :)
Instead of doing either, I'd Unix it up and build a script framework around your app, using the command line arguments and stdin.
Others have mentioned good points (reliability, security, safety, portability, etc.), but I'll throw out another: performance. Generally it is many times faster to call a library function, or even spawn a new thread, than it is to start an entire new process (and then you still have to correctly check/verify its execution and parse its output!)