Does there exist an open-source distributed logging library? - language-agnostic

I'm talking about a library that would allow me to log events from different machines and would align these events on a "global" time axis with sufficiently high precision.
Actually, I'm asking because I've written such a thing myself in the course of a cluster computing project, I found it terrifically useful, and I was surprised that I couldn't find any analogues.
Therefore, the point is whether something like this exists (and I better contribute to it) or nothing exists (and I better write an open-source analogue of my solution).
Here are the features that I'd expect from such a library:
Independence on the clock offset between different machines
Timing precision on the order of at least milliseconds, preferably microseconds
Scalability to thousands of concurrent logging processes, with at least several megabytes of aggregated logs per second
Soft real-time operation (t.i. I don't want to collect 200 big logs from 200 machines and then compute clock offsets and merge them - I want to see what happens "live", perhaps with a small lag like 10s)

Facebook's contribution in the matter is called 'Scribe'.
Excerpt:
Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups.
...
Scribe is implemented as a thrift service using the non-blocking C++ server. The installation at facebook runs on thousands of machines and reliably delivers tens of billions of messages a day.
The API is Thrift-based, so you have a good platform coverage, but in case you're looking for simple integration for Java you may want to have a look at Digg's log4j appender for Scribe.

You could use log4j/log4net targeting a central syslog daemon. log4j has a builtin SyslogAppender, and in log4net you can do it as shown here. log4cpp docs here.
There are Windows implementations of Syslog around if you don't have a Unix system to hand for this.

Use Chukwa, Its Open source and Large scale Log Monitoring System

Related

looking for simple cluster configuration

I am using compute engine for embarrassingly parallel scientific calculations. Some of my calculations require a single core and some require 64-cores machines. I am currently using my own scripts: I have a qsub-like command that creates a new instance with the required number of cores, booting it from a custom image with the pre-installed software, connects to a storage bucket via gcsfuse, runs the required command and then kills the instance after it's done.
Do I really need to do all of that with my own scripts, or is there any tool that I should use instead? I'd much rather use some ready made tool for all of the management.
My usage fluctuates widely (hundreds of cores in parallel for 3 hours, then 2 days with nothing, etc). So I don't want constant sized machines: I like to be billed by the minute for my computations.
You may want to use auto-scaling feature for managed instance group in Google Compute Engine(GCE). This feature adds more instances to your instance group when there is more load (upscaling), and removes instances when there is less load (downscaling). Moreover, you can define autoscaling policy based upon CPU utilization, or Load balancer utilization or request per seconds. Please refer autoscaler decisions document to understand decisions that autoscaler might make when scaling instance groups.

How to understand a role of a queue in a distributed system?

I am trying to understand what is the use case of a queue in distributed system.
And also how it scales and how it makes sure it's not a single point of failure in the system?
Any direct answer or a reference to a document is appreciated.
Use case:
I understand that queue is a messaging system. And it decouples the systems that communicate between each other. But, is that the only point of using a queue?
Scalability:
How does the queue scale for high volumes of data? Both read and write.
Reliability:
How does the queue not becoming a single point of failure in the system? Does the queue do a replication, similar to data-storage?
My question is not specified to any particular queue server like Kafka or JMS. Just in general.
Queue is a mental concept, the implementation decides about 1 + 2 + 3
A1: No, it is not the only role -- a messaging seems to be main one, but a distributed-system signalling is another one, by no means any less important. Hoare's seminal CSP-paper is a flagship in this field. Recent decades gave many more options and "smart-behaviours" to work with in designing a distributed-system signalling / messaging services' infrastructures.
A2: Scaling envelopes depend a lot on implementation. It seems obvious that a broker-less queues can work way faster, that a centralised, broker-based, infrastructure. Transport-classes and transport-links account for additional latency + performance degradation as the data-flow volumes grow. BLOB-handling is another level of a performance cliff, as the inefficiencies are accumulating down the distributed processing chain. Zero-copy ( almost ) zero-latency smart-Queue implementations are still victims of the operating systems and similar resources limitations.
A3: Oh sure it is, if left on its own, the SPOF. However, Theoretical Cybernetics makes us safe, as we can create reliable systems, while still using error-prone components. ( M + N )-failure-resilient schemes are thus achievable, however the budget + creativity + design-discipline is the ceiling any such Project has to survive with.
my take:
I would be careful with "decouple" term - if service A calls api on service B, there is coupling since there is a contract between services; this is true even if the communication is happening over a queue, file or fax. The key with queues is that the communication between services is asynchronous. Which means their runtimes are decoupled - from practical point of view, either of systems may go down without affecting the other.
Queues can scale for large volumes of data by partitioning. From clients point of view, there is one queue, but in reality there are many queues/shards and number of shards helps to support more data. Of course sharding a queue is not "free" - you will lose global ordering of events, which may need to be addressed in you application.
A good queue based solution is reliable based on replication/consensus/etc - depends on set of desired properties. Queues are not very different from databases in this regard.
To give you more direction to dig into:
there an interesting feature of queues: deliver-exactly-once, deliver-at-most-once, etc
may I recommend Enterprise Architecture Patterns - https://www.enterpriseintegrationpatterns.com/patterns/messaging/Messaging.html this is a good "system design" level of information
queues may participate in distributed transactions, e.g. you could build something like delete a record from database and write it into queue, and that will be either done/committed or rolledback - another interesting topic to explore

Choosing a TSDB for one-off smart-home installation

I'm building a one-off smart-home data collection box. It's expected to run on a raspberry-pi-class machine (~1G RAM), handling about 200K data points per day (each a 64-bit int). We've been working with vanilla MySQL, but performance is starting to crumble, especially for queries on the number of entries in a given time interval.
As I understand it, this is basically exactly what time-series databases are designed for. If anything, the unusual thing about my situation is that the volume is relatively low, and so is the amount of RAM available.
A quick look at Wikipedia suggests OpenTSDB, InfluxDB, and possibly BlueFlood. OpenTSDB suggests 4G of RAM, though that may be for high-volume settings. InfluxDB actually mentions sensor readings, but I can't find a lot of information on what kind of resources are required.
Okay, so here's my actual question: are there obvious red flags that would make any of these systems inappropriate for the project I describe?
I realize that this is an invitation to flame, so I'm counting on folks to keep it on the bright and helpful side. Many thanks in advance!
InfluxDB should be fine with 1 GB RAM at that volume. Embedded sensors and low-power devices like Raspberry Pi's are definitely a core use case, although we haven't done much testing with the latest betas beyond compiling on ARM.
InfluxDB 0.9.0 was just released, and 0.9.x should be available in our Hosted environment in a few weeks. The low end instances have 1 GB RAM and 1 CPU equivalent, so they are a reasonable proxy for your Pi performance, and the free trial lasts two weeks.
If you have more specific questions, please reach out to us at influxdb#googlegroups.com or support#influxdb.com and we'll see hwo we can help.
Try VictoriaMetrics. It should run on systems with low RAM such as Raspberry Pi. See these instructions on how to build it for ARM.
VictoriaMetrics has the following additional benefits for small systems:
It is easy to configure and maintain since it has zero external dependencies and all the configuration is done via a few command-line flags.
It is optimized for low CPU usage and low persistent storage IO usage.
It compresses data well, so it uses small amounts of persistent storage space comparing to other solutions.
Did you try with OpenTSDB. We are using OpenTSDB for almost 150 houses to collect smart meter data where data is collected every 10 minutes. i.e is a lot of data points in one day. But we haven't tested it in Raspberry pi. For Raspberry pi OpenTSDB might be quite heavy since it needs to run webserver, HBase and Java.
Just for suggestions. You can use Raspberry pi as collecting hub for smart home and send the data from Raspberry pi to server and store all the points in the server. Later in the server you can do whatever you want like aggregation, or performing statistical analysis etc. And then you can send results back to the smart hub.
ATSD supports ARM architecture and can be installed on a Raspberry Pi 2 to store sensor data. Currently, Ubuntu or Debian OS is required. Be sure that the device has at least 1 GB of RAM and an SD card with high write speed (60mb/s or more). The size of the SD card depends on how much data you want to store and for how long, we recommend at least 16GB, you should plan ahead. Backup batter power is also recommended, to protect against crashes and ungraceful shutdowns.
Here you can find an in-depth guide on setting up a temperature/humidity sensor paired with an Arduino device. Using the guide you will be able to stream the sensor data into ATSD using MQTT or TCP protocol. Open-source sketches are included.

Architecture advice for EventMachine and MySQL

We are writing a real-time game in EventMachine/Ruby. We're using ActiveRecord with MySQL for storing the game objects.
When we start the server we plan to load all the game objects into memory. This will allow for us to avoid any blocking/slow SQL queries with ActiveRecord.
However, we still need to persist the data in the database in case the server crashes, of course.
What are our options for doing so? I could use EM.Defer but I have no idea how many concurrent players that could handle since the thread pool is limited to 20.
Currently I'm thinking using Resque with Redis would be the best bet. Do everything with the objects in memory, and whenever there is a save that needs to occur for the database, fire off a job and add it to the Resque queue.
Any advice?
Threadpool size can be tweaked - see EventMachine.threadpool_size
Each server process (apache...) will spawn its own EventMachine reactor and its own EM.defer threadpool, so if you use a forking server (a mongrel farm, passenger, ...) you don't need to go crazy on the threadpool size
See EM-Synchrony by Ilja Grigorik (https://github.com/igrigorik/em-synchrony) - you should be able to simplify your code with it
Afaik, mysql has a non-blocking driver that you can use freely with EM, EM::Synchrony supports it http://www.igvita.com/2010/04/15/non-blocking-activerecord-rails/ - this would mean you don't need EM.defer at all!
Take a look at Thin - https://github.com/macournoyer/thin/ - it's non-blocking EM-based webserver that supports Rails
Having said all this, writing evented code is a bitch - forget about stack traces and make sure you're running benchmark tests often as anything blocking your reactor will block the entire application.
Also, this all applies to MRI Ruby ONLY. If you mean to use jruby... You're bound to get into trouble as thread-safety of eventmachine seems to be largely due to GIL of MRI Ruby and standard patterns don't work (many aspects of it can be made to work with this fork https://github.com/WebtehHR/eventmachine/tree/v1.0.3_w_fix which fixes some issues EM has with JRuby)
Unfortunately, guys from https://github.com/eventmachine/eventmachine are kind of not very active, the project currently has 200+ issues and almost 60 open pull requests which is why I've had to use a separate fork to continue playing with my current project - this still means EM is an awesome project just don't expect problems you encounter to be quickly fixed so do your best to not go out of the trodden path of EM use.
Another problem with JRuby is that EM::Synchrony imposes a heavy performance penalty because JRuby doesn't have implemented fibers as of 1.7.8 but rather maps them to Java native threads which are MUCH slower
Also, have you considered messaging with something like RabbitMQ (it has a synchronous https://github.com/ruby-amqp/bunny, and evented driver https://github.com/ruby-amqp/amqp) as a possibility to communicate game objects between clients and perhaps reduce overhead on the database / distributed memory store that you had in mind?
Redis/Resque seem good, but if all the jobs will need to do is simple persistance, and if there will be A LOT of such calls, you might want to consider beanstalkd - it has A LOT faster but simpler queue then Resque and you can probably make this even faster if you don't really need activerecord to dump attribute hashes into the database, see delayed_jobs vs resque vs beanstalkd?
A couple years and a failed project later, some thoughts:
avoid eventmachine if any way possible, there's a plethora of opportunities to peg your CPU nowadays with YARV/MRI Ruby on a IO constrained application and without wasting memory.
My favorite approach for a web application at this time is use Puma with multiple processes and threads.
Have in mind that GIL in YARV only affects the Ruby interpreter code, not the IO operations, meaning that on a IO constrained application you can add threads and see better utilization of a single core,
add more processes and you see better utilization of many cores :) On Heroku 1x worker we run 2 processes with 4 threads each and this pegs our CPU potential to the top in benchmark meaning the application is no longer IO bound, but CPU bound and doing so without unacceptable memory losses.
When we needed super-fast responses we were troubled by the DB write operation times which did not affect the response to client, so we did asynchronous database writes using sidekiq / resque,
In hindsight you could even do celluloid or concurrent-ruby for asynchronous IO reads/writes (think DB writes, cache visits etc), it's less overhead and infrastructure but harder to debug and problem solve in production - my worst nightmare being an async operation failing silently with no error trace in our Errors console (an exception in exception handling for example)
End result is that your application experiences the same sort of benefits you used to get from using eventmachine (elimination of the IO bound, full utilization of CPU without huge memory footprint, parallel non-blocking IO) without resorting to writing reactor code which is a complete bitch to do as explained in my 2013 post

Performing a distributed CUDA/OpenCL based password cracking

Is there a way to perform a distributed (as in a cluster of a connected computers) CUDA/openCL based dictionary attack?
For example, if I have a one computer with some NVIDIA card that is sharing the load of the dictionary attack with another coupled computer and thus utilizing a second array of GPUs there?
The idea is to ensure a scalability option for future expanding without the need of replacing the whole set of hardware that we are using. (and let's say cloud is not an option)
This is a simple master / slave work delegation problem. The master work server hands out to any connecting slave process a unit of work. Slaves work on one unit and queue one unit. When they complete a unit, they report back to the server. Work units that are exhaustively checked are used to estimate operations per second. Depending on your setup, I would adjust work units to be somewhere in the 15-60 second range. Anything that doesn't get a response by the 10 minute mark is recycled back into the queue.
For queuing, offer the current list of uncracked hashes, the dictionary range to be checked, and the permutation rules to be applied. The master server should be able to adapt queues per machine and per permutation rule set so that all machines are done their work within a minute or so of each other.
Alternately, coding could be made simpler if each unit of work were the same size. Even then, no machine would be idle longer than the amount of time for the slowest machine to complete one unit of work. Size your work units so that the fastest machine doesn't enter a case of resource starvation (shouldn't complete work faster than five seconds, should always have a second unit queued). Using that method, hopefully your fastest machine and slowest machine aren't different by a factor of more than 100x.
It would seem to me that it would be quite easy to write your own service that would do just this.
Super Easy Setup
Let's say you have some GPU enabled program X that takes a hash h as input and a list of dictionary words D, then uses the dictionary words to try and crack the password. With one machine, you simply run X(h,D).
If you have N machines, you split the dictionary into N parts (D_1, D_2, D_3,...,D_N). Then run P(x,D_i) on machine i.
This could easily be done using SSH. The master machine splits the dictionary up, copies it to each of the slave machines using SCP, then connects to the slaves and tells them to run the program.
Slightly Smarter Setup
When one machine cracks the password, they could easily notify the master that they have completed the task. The master then kills the programs running on the other slaves.