Performing distributed CUDA/OpenCL-based password cracking

Is there a way to perform a distributed (as in a cluster of connected computers) CUDA/OpenCL-based dictionary attack?
For example, one computer with an NVIDIA card sharing the load of the dictionary attack with another coupled computer, thus utilizing a second array of GPUs there?
The idea is to allow for future scaling without having to replace the whole set of hardware we are using (and let's say the cloud is not an option).

This is a simple master/slave work delegation problem. The master work server hands a unit of work to any connecting slave process. Slaves work on one unit and queue one unit. When they complete a unit, they report back to the server. The time taken to exhaustively check a unit is used to estimate each slave's operations per second. Depending on your setup, I would size work units to take somewhere in the 15-60 second range. Anything that doesn't get a response by the 10-minute mark is recycled back into the queue.
For queuing, offer the current list of uncracked hashes, the dictionary range to be checked, and the permutation rules to be applied. The master server should be able to adapt queues per machine and per permutation rule set so that all machines finish their work within a minute or so of each other.
Alternatively, the coding could be made simpler if each unit of work were the same size. Even then, no machine would be idle longer than the time it takes the slowest machine to complete one unit of work. Size your work units so that the fastest machine doesn't starve for work (it shouldn't finish a unit in under five seconds, and it should always have a second unit queued). With that method, hopefully your fastest and slowest machines don't differ in speed by more than a factor of about 100.
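A minimal sketch of the master's bookkeeping, assuming the slaves talk to it over whatever transport you like; the class and field names here are purely illustrative:

```python
import time
from collections import deque
from dataclasses import dataclass

@dataclass
class WorkUnit:
    unit_id: int
    dict_start: int      # index range into the dictionary
    dict_end: int
    rules: str           # permutation rule set to apply
    issued_at: float = 0.0

class Master:
    """Hands out work units and recycles any with no response after 10 minutes."""
    RECYCLE_AFTER = 600  # seconds

    def __init__(self, units):
        self.pending = deque(units)
        self.in_flight = {}   # unit_id -> WorkUnit
        self.completed = {}   # unit_id -> operations/sec reported by the slave

    def checkout(self):
        """A slave calls this to fetch its next unit of work."""
        self._recycle_stale()
        if not self.pending:
            return None
        unit = self.pending.popleft()
        unit.issued_at = time.time()
        self.in_flight[unit.unit_id] = unit
        return unit

    def report(self, unit_id, ops_per_sec):
        """A slave calls this once a unit has been exhaustively checked."""
        if self.in_flight.pop(unit_id, None) is not None:
            self.completed[unit_id] = ops_per_sec

    def _recycle_stale(self):
        now = time.time()
        for uid, unit in list(self.in_flight.items()):
            if now - unit.issued_at > self.RECYCLE_AFTER:
                self.pending.append(self.in_flight.pop(uid))
```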

It seems to me that it would be quite easy to write your own service to do just this.
Super Easy Setup
Let's say you have some GPU-enabled program X that takes as input a hash h and a list of dictionary words D, then uses the dictionary words to try to crack the password. With one machine, you simply run X(h, D).
If you have N machines, you split the dictionary into N parts (D_1, D_2, D_3, ..., D_N). Then run X(h, D_i) on machine i.
This could easily be done using SSH. The master machine splits the dictionary up, copies a part to each of the slave machines using SCP, then connects to the slaves and tells them to run the program.
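A rough sketch of that flow, assuming passwordless SSH to the slaves; the hostnames, the hash value, and the program name X are all placeholders:

```python
import subprocess

SLAVES = ["slave1", "slave2", "slave3"]          # placeholder hostnames
HASH = "5f4dcc3b5aa765d61d8327deb882cf99"        # placeholder hash

# Split the dictionary into one part per slave.
with open("dictionary.txt") as f:
    words = f.read().splitlines()
parts = [words[i::len(SLAVES)] for i in range(len(SLAVES))]

procs = []
for host, part in zip(SLAVES, parts):
    part_file = f"dict_{host}.txt"
    with open(part_file, "w") as f:
        f.write("\n".join(part))
    subprocess.run(["scp", part_file, f"{host}:/tmp/{part_file}"], check=True)
    # X is the hypothetical GPU cracking program described above.
    procs.append(subprocess.Popen(["ssh", host, f"X {HASH} /tmp/{part_file}"]))

for p in procs:
    p.wait()
```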
Slightly Smarter Setup
When one machine cracks the password, it could easily notify the master that it has completed the task. The master then kills the programs running on the other slaves.
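Continuing the sketch above (reusing SLAVES and procs), the final wait loop could instead poll for the first slave to succeed and terminate the rest; the assumption that exit code 0 means "password found" is mine:

```python
import time

# Replace the plain wait loop from the previous sketch with this polling loop.
done = False
while not done and any(p.poll() is None for p in procs):
    for host, p in zip(SLAVES, procs):
        if p.poll() == 0:                 # assumed: exit code 0 means "found"
            print(f"{host} cracked it; stopping the others")
            for other in procs:
                if other is not p and other.poll() is None:
                    other.terminate()
            done = True
            break
    time.sleep(1)
```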

Replication Factor to use for system_auth

When using internal security with Cassandra, what replication factor do you use for system_auth?
The older docs seem to suggest it should be N, where N is the number of nodes, while the newer ones suggest we should set it to a number greater than 1. I can understand why it makes sense for it to be higher - if a partition occurs and one section doesn't have a replica, nobody can log in.
However, does it need to be all nodes? What are the downsides of setting it to all nodes?
Let me answer this question by posing another:
If (due to some unforeseen event) all of your nodes went down except for one, would you still want to be able to log into (and use) that node?
This is why I actually do ensure that my system_auth keyspace replicates to all of my nodes. You can't predict node failure, and in the interests of keeping your application running, it's better safe than sorry.
I don't see any glaring downsides in doing so. The system_auth keyspace isn't very big (mine is 20 KB), so it doesn't take up a lot of space. The only problematic scenario would be if one of the nodes is down and a write operation is made to a column family in system_auth (in which case, I think the write gets rejected, depending on your write consistency). Either way, system_auth isn't a very write-heavy keyspace, so you'll be OK as long as you don't plan on performing user maintenance during a node failure.
Setting the replication factor of system_auth to the number of nodes in the cluster should be ok. At the very least, I would say you should make sure it replicates to all of your data centers.
In case you were still wondering about this part of your question:
"The older docs seem to suggest it should be N, where N is the number of nodes, while the newer ones suggest we should set it to a number greater than 1."
I stumbled across this today in the 2.1 documentation Configuring Authentication:
Increase the replication factor for the system_auth keyspace to N (number of nodes).
Just making sure that recommendation was clear.
Addendum 20181108
So I originally answered this back when the largest cluster I had ever managed had 10 nodes. Four years later, after spending three of those managing large (100+) node clusters for a major retailer, my opinions on this have changed somewhat. I can say that I no longer agree with this statement of mine from four years ago.
This is why I actually do ensure that my system_auth keyspace replicates to all of my nodes.
A few times on mid-to-large (20-50 node) clusters, we have deployed system_auth with RFs as high as 8. It works as long as you're not moving nodes around, and assumes that the default cassandra/cassandra user is no longer in play.
The drawbacks were seen on clusters which have a tendency to fluctuate in size. Of course, clusters which change in size usually do so because of high throughput requirements across multiple providers, further complicating things. We noticed that occasionally the application teams would report auth failures on such clusters. We were able to quickly rectify these situations by running a SELECT COUNT(*) on all system_auth tables, thus forcing a read repair. But the issue would tend to resurface the next time we added or removed several nodes.
Due to issues that can happen with larger clusters which fluctuate in size, we now treat system_auth like we do any other keyspace. That is, we set system_auth's RF to 3 in each DC.
That seems to work really well. It gets you around the problems that come with having too many replicas to manage in a high-throughput, dynamic environment. After all, if RF=3 is good enough for your application's data, it's probably good enough for system_auth.
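For reference, a minimal sketch of making that change with the Python cassandra-driver; the contact point, credentials, and data center names are placeholders, and after altering the RF you still need to repair the system_auth keyspace so existing auth data reaches its new replicas:

```python
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Placeholder contact point and credentials.
auth = PlainTextAuthProvider(username="admin_user", password="admin_pass")
session = Cluster(["10.0.0.1"], auth_provider=auth).connect()

# RF=3 per data center, as described above; dc1/dc2 are placeholder DC names.
session.execute("""
    ALTER KEYSPACE system_auth
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}
""")
# Then, on each node: nodetool repair system_auth
```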
The reason the recommendation changed is that a quorum query requires responses from over half of the replicas to be fulfilled. So if you accidentally left the default cassandra user active and you have 80 nodes with system_auth replicated to all of them, you need 41 responses.
Whilst it's good practice to avoid using the superuser like that, you'd be surprised how often it's still out there.
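The arithmetic, for what it's worth (the default cassandra superuser authenticates at QUORUM, while other users authenticate at LOCAL_ONE):

```python
# QUORUM = floor(RF / 2) + 1 responding replicas.
def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1

print(quorum(80))  # 41 -- system_auth RF equal to an 80-node cluster
print(quorum(3))   # 2  -- system_auth RF=3 per data center
```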

Reliably monitor a serial port (Nortel CS1000)

I have a custom python script that monitors the call logs from a Nortel phone system. This phone system is under extremely high volume throughout the day and it's starting to appear that some records may be getting lost.
Some of you may dislike this, but I'm not interested in sharing the source code or current method in any way. I would rather consider this from a "new project" approach.
I'm looking for insight into the easiest and safest way to reliably monitor heavy data output through a serial port on Linux. I'm not limiting this to any particular set of tools or languages; I want to find out what works best for this one critical job. I'm comfortable enough parsing the data and inserting it into MySQL that we can just assume the data gets dropped to a text file.
Thank you
Well, the way that I would approach this is to have 2 threads (or processes) working.
Thread 1: The read thread
This thread does nothing but read data from the raw serial port and put the data into a local buffer/queue (In memory is preferred for speed). It should do nothing else. Depending on the clock speed of the serial connection, this should be pretty easy to do.
Thread 2: The processing thread
This thread just sleeps until there is data in the local buffer to process, then reads and processes it. That's it.
The reason for splitting it into two is so that if one is busy (say, the processing thread blocking on MySQL), it won't affect the other. After all, while the serial port is buffered by the OS, that buffer size is limited.
But then again, any local program is likely going to be way faster than the serial port can send data. Serial transfer is actually quite slow relative to the clock speed of the processor (115.2 kbps is about the limit on standard hardware). So unless you're CPU-bound (such as on an Arduino), I can't see normal conditions affecting it too much, and your choice of language really shouldn't be of too much concern (assuming modern hardware). Stick to what you know.
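A minimal sketch of that split using pyserial and a thread-safe queue; the port, baud rate, and newline-delimited record format are assumptions about your phone system:

```python
import queue
import threading

import serial  # pyserial

buf = queue.Queue()

def read_thread(port="/dev/ttyS0", baud=9600):
    """Do nothing but pull bytes off the port and queue them in memory."""
    ser = serial.Serial(port, baud, timeout=1)
    while True:
        data = ser.read(ser.in_waiting or 1)
        if data:
            buf.put(data)

def process_thread():
    """Sleep until data arrives, then parse and store it at leisure."""
    pending = b""
    while True:
        pending += buf.get()          # blocks until the reader queues bytes
        while b"\n" in pending:
            record, pending = pending.split(b"\n", 1)
            handle_record(record)     # parse, insert into MySQL, or append to a file

def handle_record(record):
    print(record.decode(errors="replace"))

threading.Thread(target=read_thread, daemon=True).start()
process_thread()
```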

Does there exist an open-source distributed logging library?

I'm talking about a library that would allow me to log events from different machines and would align these events on a "global" time axis with sufficiently high precision.
Actually, I'm asking because I've written such a thing myself in the course of a cluster computing project; I found it terrifically useful, and I was surprised that I couldn't find any analogues.
Therefore, the point is whether something like this exists (and I'd better contribute to it) or nothing exists (and I'd better write an open-source analogue of my solution).
Here are the features that I'd expect from such a library:
Independence from the clock offset between different machines
Timing precision on the order of at least milliseconds, preferably microseconds
Scalability to thousands of concurrent logging processes, with at least several megabytes of aggregated logs per second
Soft real-time operation (i.e. I don't want to collect 200 big logs from 200 machines and then compute clock offsets and merge them - I want to see what happens "live", perhaps with a small lag like 10 seconds)
Facebook's contribution in this area is called Scribe.
Excerpt:
Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups.
...
Scribe is implemented as a thrift service using the non-blocking C++ server. The installation at facebook runs on thousands of machines and reliably delivers tens of billions of messages a day.
The API is Thrift-based, so you have good platform coverage, but in case you're looking for simple integration with Java you may want to have a look at Digg's log4j appender for Scribe.
You could use log4j/log4net targeting a central syslog daemon. log4j has a built-in SyslogAppender, and in log4net you can do it as shown here. log4cpp docs here.
There are Windows implementations of Syslog around if you don't have a Unix system to hand for this.
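The same idea in Python's standard library (not one of the log4x ports mentioned above, just an illustration) is a SysLogHandler pointed at the central host; "loghost" is a placeholder:

```python
import logging
import logging.handlers

# "loghost" stands in for the central syslog daemon's address.
handler = logging.handlers.SysLogHandler(address=("loghost", 514))
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s: %(message)s"))

log = logging.getLogger("cluster.worker")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("event from this node, aggregated centrally")
```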
Use Chukwa. It's an open-source, large-scale log monitoring system.

How can I model this usage scenario?

I want to create a fairly simple mathematical model that describes usage patterns and performance trade-offs in a system.
The system behaves as follows:
clients periodically issue multi-cast packets to a network of hosts
any host that receives the packet responds with a unicast answer directly
the initiating host caches the responses for some given time period, then discards them
if the cache still holds a response the next time a request is required, data is pulled from the cache, not the network
packets are of a fixed size and always contain the same information
hosts are symmetric - any host can issue a request and respond to requests
I want to produce some simple mathematical models (and graphs) that describe the trade-offs available given some changes to the above system:
What happens when you vary the amount of time a host caches responses? How much data does this save? How many calls to the network do you avoid? (This clearly depends on activity.)
Suppose responses are also multicast, and any host that overhears another client's request can cache all the responses it hears, thereby potentially saving itself a network request - how would this affect the overall state of the system?
Now, this one gets a bit more complicated: each request-response cycle alters the state of one other host in the network, so the more activity there is, the quicker caches become invalid. How do I model the trade-off between the number of hosts, the rate of activity, the "dirtiness" of the caches (assuming hosts listen in to others' responses), and how this changes with the cache validity period? I'm not sure where to begin.
I don't really know what sort of mathematical model I need, or how I construct it. Clearly it's easier to just vary two parameters, but particularly with the last one, I've got maybe four variables changing that I want to explore.
Help and advice appreciated.
Investigate tokenised Petri nets. These seem to be an appropriate tool as they:
provide a graphical representation of the models
provide substantial mathematical analysis
have a large body of prior work and underlying analysis
are (relatively) simple mathematical models
seem to be directly tied to your problem in that they deal with constraint dependent networks that pass tokens only under specified conditions
I found a number of references (quality not assessed) by searching for "token Petri net".
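To make the idea concrete, here is a minimal place/transition (token) Petri net sketch: places hold token counts, and a transition may fire only when every one of its input places has a token. The specific places and transitions below are purely illustrative, not a model of your system:

```python
# Marking: how many tokens each place currently holds.
marking = {"request_pending": 1, "cache_valid": 0, "response_cached": 0}

# Transitions: name -> (input places, output places), all arcs with weight 1.
transitions = {
    "query_network": (["request_pending"], ["response_cached", "cache_valid"]),
    "cache_expires": (["cache_valid"], []),
}

def enabled(name):
    inputs, _ = transitions[name]
    return all(marking[p] > 0 for p in inputs)

def fire(name):
    inputs, outputs = transitions[name]
    if not enabled(name):
        raise ValueError(f"transition {name} is not enabled")
    for p in inputs:
        marking[p] -= 1
    for p in outputs:
        marking[p] += 1

fire("query_network")
print(marking)  # {'request_pending': 0, 'cache_valid': 1, 'response_cached': 1}
```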

What happens during Stand-By and Hibernation?

It just hit me the other day. What actually happens when I tell the computer to go into Stand-By or to Hibernate?
More specifically, what implications, if any, does it have on code that is running? For example, if an application is compressing some files, encoding video files, checking email, running a database query, generating reports, or just processing lots of data or doing complicated math. What happens? Can you end up with a bug in your video? Can the database query fail? Can data processing end up containing errors?
I'm asking this both out of general curiosity, but also because I started to wonder if this is something I should think about when I program myself.
You should remember that the OS (scheduler) freezes your program about a gazillion times each second. This means that your program can already function pretty well when the operating system freezes it. There isn't much difference, from your point of view, between stand-by, hibernate and context switching.
What is different is that you'll be frozen for a long time. And this is the only thing you need to think about. In most cases, this shouldn't be a problem.
If you have a network connection you'll probably need to re-establish it, and similar issues. But this just means checking for errors in all IO operations, which I'm sure you're already doing... :-)
My initial thought is that as long as your program and its ecosystem are contained within the PC that is going into stand-by or hibernation, then upon resume your program should not be affected.
However, if you are, say, updating a record in some database hosted on a separate machine, then hibernation/stand-by will likely show up as a timeout.
If your program needs to react to such a change in "power status", you can listen for the WM_POWERBROADCAST message, as mentioned on MSDN.
Stand-By keeps your "state" alive by keeping it in RAM. As a consequence if you lose power you'll lose your stored "state".
But it makes suspending and resuming quicker.
Hibernation stores your "state" on the hard disk (in a hibernation file), so if you lose power you can still come back three days later. But it's slower.
I guess a limitation with Stand-By is how much RAM you've got, but I'm sure virtual RAM must be employed by Stand-By when it runs out of standard RAM. I'll look that up though and get back!
The Wikipedia article on ACPI contains the details about the different power savings modes which are present in modern PCs.
Here's the basic idea, from how I understand things:
The basic idea is to keep the current state of the system persisted, so when the machine is brought back into operation, it can resume at the state it was before the machine was put into sleep/standby/hibernation, etc. Think of it as serialization for your PC.
In standby, the computer will keep feeding power to the RAM, as the main memory is volatile memory that needs constant refreshing to hold on to its state. This means that the hard drives, CPU, and other components can be turned off, as long as there is enough power to keep the DRAM refreshed to keep its contents from disappearing.
In hibernation, the main memory will also be turned off, so its contents must be copied to permanent storage, such as a hard drive, before the system power is turned off. Other than that, the basic premise of hibernation is no different from standby: store the current state of the machine and restore it at a later time.
With that in mind, it's probably not too likely that going into standby or hibernate will cause problems with tasks that are executing at the moment. However, it may not be a good idea to let the system suspend in the middle of network activity, as depending on the protocol, your network connection could time out and be unable to resume when the system returns to its running state.
Also, there may be some machines that just have flaky power-saving drivers, which may cause them to go to standby and never come back, but that's a completely different issue.
There are some implications for your code. Hibernation is more than just a context switch from the scheduler. Network connections will be closed, network drives or removable media might be disconnected during the hibernation, ...
I don't think your application can be notified of hibernation (but I might be wrong). What you should do is handle error scenarios (loss of network connectivity, for example) as gracefully as possible. And note that those error scenarios can occur during normal operation as well, not only when going into hibernation ...
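A small sketch of that kind of graceful handling: retrying a network operation with backoff, so a connection dropped during hibernation (or during normal operation) is simply re-established on resume. The host, port, and retry counts are arbitrary placeholders:

```python
import socket
import time

def send_with_reconnect(host, port, payload, retries=5):
    """Retry a send after transient failures, e.g. a connection that died
    while the machine was suspended or hibernating."""
    for attempt in range(retries):
        try:
            with socket.create_connection((host, port), timeout=10) as sock:
                sock.sendall(payload)
                return True
        except OSError:
            time.sleep(2 ** attempt)   # back off, then try again after resume
    return False
```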