Choosing a TSDB for one-off smart-home installation - mysql

I'm building a one-off smart-home data collection box. It's expected to run on a raspberry-pi-class machine (~1G RAM), handling about 200K data points per day (each a 64-bit int). We've been working with vanilla MySQL, but performance is starting to crumble, especially for queries on the number of entries in a given time interval.
As I understand it, this is basically exactly what time-series databases are designed for. If anything, the unusual thing about my situation is that the volume is relatively low, and so is the amount of RAM available.
A quick look at Wikipedia suggests OpenTSDB, InfluxDB, and possibly BlueFlood. OpenTSDB suggests 4G of RAM, though that may be for high-volume settings. InfluxDB actually mentions sensor readings, but I can't find a lot of information on what kind of resources are required.
Okay, so here's my actual question: are there obvious red flags that would make any of these systems inappropriate for the project I describe?
I realize that this is an invitation to flame, so I'm counting on folks to keep it on the bright and helpful side. Many thanks in advance!

InfluxDB should be fine with 1 GB RAM at that volume. Embedded sensors and low-power devices like Raspberry Pi's are definitely a core use case, although we haven't done much testing with the latest betas beyond compiling on ARM.
InfluxDB 0.9.0 was just released, and 0.9.x should be available in our Hosted environment in a few weeks. The low end instances have 1 GB RAM and 1 CPU equivalent, so they are a reasonable proxy for your Pi performance, and the free trial lasts two weeks.
If you have more specific questions, please reach out to us at influxdb#googlegroups.com or support#influxdb.com and we'll see hwo we can help.

Try VictoriaMetrics. It should run on systems with low RAM such as Raspberry Pi. See these instructions on how to build it for ARM.
VictoriaMetrics has the following additional benefits for small systems:
It is easy to configure and maintain since it has zero external dependencies and all the configuration is done via a few command-line flags.
It is optimized for low CPU usage and low persistent storage IO usage.
It compresses data well, so it uses small amounts of persistent storage space comparing to other solutions.

Did you try with OpenTSDB. We are using OpenTSDB for almost 150 houses to collect smart meter data where data is collected every 10 minutes. i.e is a lot of data points in one day. But we haven't tested it in Raspberry pi. For Raspberry pi OpenTSDB might be quite heavy since it needs to run webserver, HBase and Java.
Just for suggestions. You can use Raspberry pi as collecting hub for smart home and send the data from Raspberry pi to server and store all the points in the server. Later in the server you can do whatever you want like aggregation, or performing statistical analysis etc. And then you can send results back to the smart hub.

ATSD supports ARM architecture and can be installed on a Raspberry Pi 2 to store sensor data. Currently, Ubuntu or Debian OS is required. Be sure that the device has at least 1 GB of RAM and an SD card with high write speed (60mb/s or more). The size of the SD card depends on how much data you want to store and for how long, we recommend at least 16GB, you should plan ahead. Backup batter power is also recommended, to protect against crashes and ungraceful shutdowns.
Here you can find an in-depth guide on setting up a temperature/humidity sensor paired with an Arduino device. Using the guide you will be able to stream the sensor data into ATSD using MQTT or TCP protocol. Open-source sketches are included.

Related

Compute Engine - Automatic scale

I have one VM Compute Engine to host simple apps. My apps is growing and the number of users too.
Now my users work basicaly from 08:00 AM to 07:00 PM, in this period the usage os CPU and Memory is High and the speed of work is very important.
I'm preparing to expand the memory and processor in the next days, but i search a more scalable and cost efective way.
Is there a way for automatic add resources when i need and reduce after no more need?
Thanks
The cost of running your VMs is directly related to a number of different factors i.e. the type of network in use (premium vs standard), the machine type, the boot disk image you use (premium vs open-source images) and the region/zone where your workloads are running, among other things.
Your use case seems to fit managed instance groups (MIGs). With MIGs you essentially configure a template for VMs that share the same attributes. During the configuration of your MIG, you will be able to specify the CPU/memory limit beyond which the MIG autoscaler will kick off. When your CPU/memory reading goes below that threshold, MIG scales your VMs down to the number of instances specified in your template.
You can also use requests per second as a threshold for autoscaling and I would recommend you explore the docs to know more about it.
See docs

looking for simple cluster configuration

I am using compute engine for embarrassingly parallel scientific calculations. Some of my calculations require a single core and some require 64-cores machines. I am currently using my own scripts: I have a qsub-like command that creates a new instance with the required number of cores, booting it from a custom image with the pre-installed software, connects to a storage bucket via gcsfuse, runs the required command and then kills the instance after it's done.
Do I really need to do all of that with my own scripts, or is there any tool that I should use instead? I'd much rather use some ready made tool for all of the management.
My usage fluctuates widely (hundreds of cores in parallel for 3 hours, then 2 days with nothing, etc). So I don't want constant sized machines: I like to be billed by the minute for my computations.
You may want to use auto-scaling feature for managed instance group in Google Compute Engine(GCE). This feature adds more instances to your instance group when there is more load (upscaling), and removes instances when there is less load (downscaling). Moreover, you can define autoscaling policy based upon CPU utilization, or Load balancer utilization or request per seconds. Please refer autoscaler decisions document to understand decisions that autoscaler might make when scaling instance groups.

OBDII Based Lock/Unlock and Engine Start/Stop

I want to know what it takes to build a device that can Lock/Unlock door and Engine Start/Stop for vehicles using OBDII. Is it possible? The idea is to make them app connected using Bluetooth Low Energy/ or 3G that connects with he car.
If not possible via ODBII, then what is the best way to do it?
I did some search to see if there is an device that can do such a thing of the shelf which can be controlled using APIs/SDKs but all were propriety and not open for integration. Any suggestions?
Keep in mind that the OBD II only supports the emission related data of a vehicle. But since the CAN BUS is a serial network so you might get all the data using OBD II port and not the OBD!
Lock/Unlock system is usually part of LIN system and not OBD which is a cheaper version of OBD (I think!).
CAN BUS has no encryption therefore it is possible to read all the CAN BUS traffic but the meaning of each ECU data packet is not known and manufacturers keeps them under heavy control due to the security. So if you want to have a start/stop you probably should hack the system or anyhow find the translation of each ECU data packet.
At the End if you are not a hacker with some network and vehicle knowledge, the probability is not too much for you to write an App that you expected!

Does there exist an open-source distributed logging library?

I'm talking about a library that would allow me to log events from different machines and would align these events on a "global" time axis with sufficiently high precision.
Actually, I'm asking because I've written such a thing myself in the course of a cluster computing project, I found it terrifically useful, and I was surprised that I couldn't find any analogues.
Therefore, the point is whether something like this exists (and I better contribute to it) or nothing exists (and I better write an open-source analogue of my solution).
Here are the features that I'd expect from such a library:
Independence on the clock offset between different machines
Timing precision on the order of at least milliseconds, preferably microseconds
Scalability to thousands of concurrent logging processes, with at least several megabytes of aggregated logs per second
Soft real-time operation (t.i. I don't want to collect 200 big logs from 200 machines and then compute clock offsets and merge them - I want to see what happens "live", perhaps with a small lag like 10s)
Facebook's contribution in the matter is called 'Scribe'.
Excerpt:
Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups.
...
Scribe is implemented as a thrift service using the non-blocking C++ server. The installation at facebook runs on thousands of machines and reliably delivers tens of billions of messages a day.
The API is Thrift-based, so you have a good platform coverage, but in case you're looking for simple integration for Java you may want to have a look at Digg's log4j appender for Scribe.
You could use log4j/log4net targeting a central syslog daemon. log4j has a builtin SyslogAppender, and in log4net you can do it as shown here. log4cpp docs here.
There are Windows implementations of Syslog around if you don't have a Unix system to hand for this.
Use Chukwa, Its Open source and Large scale Log Monitoring System

How to measure current load of MySQL server?

How to measure current load of MySQL server? I know I can measure different things like CPU usage, RAM usage, disk IO etc but is there a generic load measure for example the server is at 40% load etc?
mysql> SHOW GLOBAL STATUS;
Found here.
The notion of "40% load" is not really well-defined. Your particular application may react differently to constraints on different resources. Applications will typically be bound by one of three factors: available (physical) memory, available CPU time, and disk IO.
On Linux (or possibly other *NIX) systems, you can get a snapshot of these with vmstat, or iostat (which provides more detail on disk IO).
However, to connect these to "40% load", you need to understand your database's performance characteristics under typical load. The best way to do this is to test with typical queries under varying amounts of load, until you observe response times increasing dramatically (this will mean you've hit a bottleneck in memory, CPU, or disk). This load should be considered your critical level, which you do not want to exceed.
is there a generic load measure for example the server is at 40% load ?
Yes! there is:
SELECT LOAD_FILE("/proc/loadavg")
Works on a linux machine. It displays the system load averages for the past 1, 5, and 15 minutes.
System load averages is the average number of processes that are either in a runnable or uninterruptable state. A process in a runnable state is either using the CPU or waiting to use the CPU.
A process in uninterruptable state is waiting for some I/O access, eg waiting for disk. The averages are taken over the three time intervals. Load averages are not normalized for the number of
CPUs in a system, so a load average of 1 means a single CPU system is loaded all the time while on a 4 CPU system it means it was idle 75% of the time.
So if you want to normalize you need to count de number of cpu's also.
you can do that too with
SELECT LOAD_FILE("/proc/cpuinfo")
see also 'man proc'
with top or htop you can follow the usage in Linux realtime
On linux based systems the standard check is usually uptime, a load index is returned according to metrics described here.
aside from all the good answers on this page (SHOW GLOBAL STATUS, VMSTAT, TOP...) there is also a very simple to use tool written by Jeremy Zawodny, it is perfect for non-admin users. It is called "mytop". more info # http://jeremy.zawodny.com/mysql/mytop/
Hi friend as per my research we have some command like
MYTOP: open source program written using PERL language
MTOP: also an open source program written on PERL, It works same as MYTOP but it monitors the queries which are taking longer time and kills them after specific time.
Link for details of above command