Google Colaboratory disconnects after 10-15 minutes - deep-learning

I am trying to train my deep learning model on Google Colab, where they offer a free K80 GPU. I learned that it can be used for 12 hours at a time and then you have to reconnect to it. But my connection is lost after 10-15 minutes and I cannot reconnect (it stays stuck on Initializing). What's the issue here?

This proved to be a network issue at my university. The university has a login portal to access the internet, and bypassing it solved the problem.

I have been running a vision training model and it disconnects and stops sometime overnight, after running for hours and perhaps as long as 12 hours. I also trained the model using the CPU and got the same result, although with fewer epochs completed. I have searched for the time limit on the CPU runtime without success. The training program uses tf.train.Saver to write checkpoints during training, which allows training to restart from a checkpoint when it is disrupted.
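As a rough illustration of that checkpoint-and-resume pattern, here is a minimal sketch using the TF 1.x-style tf.train.Saver mentioned above; the checkpoint directory, step count, and save interval are placeholders, and the actual model/training op would go where the comments indicate:

```python
import os
import tensorflow as tf  # TF 1.x-style API, matching the Saver-based workflow described above

# ... build the model graph here ...
global_step = tf.train.get_or_create_global_step()
saver = tf.train.Saver(max_to_keep=5)
os.makedirs("./checkpoints", exist_ok=True)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Resume from the newest checkpoint if one exists (hypothetical directory).
    latest = tf.train.latest_checkpoint("./checkpoints")
    if latest:
        saver.restore(sess, latest)

    for step in range(10000):
        # sess.run(train_op)  # the actual training step goes here
        if step % 1000 == 0:
            # Periodic saves mean a dropped Colab session only loses
            # the work done since the last checkpoint.
            saver.save(sess, "./checkpoints/model", global_step=step)
```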

Related

Google Colab keeps on crashing because it runs out of memory

I'm trying to build a ResNet200D transfer-learning model for a Kaggle competition, but I'm unable to train it on Google Colab since it runs out of memory even with a batch size of 1, on CPU as well as GPU. I'm not sure where the memory is being used up, since other participants say they've been able to train the model with a batch size of 16. If anyone could look at the notebook and leave suggestions, that would be really helpful.
Google Colab Notebook
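For what it's worth, a sketch of two common memory-reduction tricks for this kind of model is below. This is not the poster's notebook; it assumes PyTorch and timm's resnet200d, uses a synthetic stand-in for the real DataLoader, and the class count is a placeholder:

```python
import torch
import timm

NUM_CLASSES = 10  # placeholder; depends on the competition

# Mixed precision plus gradient accumulation: both reduce peak GPU memory,
# and accumulation recovers a larger effective batch size from per-step batches of 1.
model = timm.create_model("resnet200d", pretrained=False, num_classes=NUM_CLASSES).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()
criterion = torch.nn.CrossEntropyLoss()
accum_steps = 16  # effective batch = accum_steps * per-step batch of 1

# Synthetic stand-in for the real DataLoader, just to keep the sketch self-contained.
loader = [(torch.randn(1, 3, 224, 224), torch.randint(0, NUM_CLASSES, (1,)))
          for _ in range(accum_steps)]

for step, (images, labels) in enumerate(loader):
    with torch.cuda.amp.autocast():
        loss = criterion(model(images.cuda()), labels.cuda()) / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```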

Why has the CPU load dropped in the last few days?

Does anybody have a hint? I didn't change anything on the machine (except for security updates), and the sites hosted there didn't see a significant change in connections.
Maybe Google changed something in their infrastructure? Coincidentally, there was an issue with the Cloud DNS ManagedZone these days: they charged me $920 for half a month of usage, which was an error (they counted thousands of weeks of usage), and they recently corrected it back to $0.28. Maybe some process was using Cloud DNS by mistake and thus consuming CPU power, and they have now corrected it?
I'd like to hear what is happening from someone who knows what is going on inside Google Cloud. Thank you.
CPU utilization reporting is now more accurate from a VM guest perspective as it doesn't include virtualization layer overhead anymore. It has nothing to do with Cloud DNS.
See this issue for some extra context:
https://code.google.com/p/google-compute-engine/issues/detail?id=281

Choosing a TSDB for one-off smart-home installation

I'm building a one-off smart-home data collection box. It's expected to run on a raspberry-pi-class machine (~1G RAM), handling about 200K data points per day (each a 64-bit int). We've been working with vanilla MySQL, but performance is starting to crumble, especially for queries on the number of entries in a given time interval.
As I understand it, this is basically exactly what time-series databases are designed for. If anything, the unusual thing about my situation is that the volume is relatively low, and so is the amount of RAM available.
A quick look at Wikipedia suggests OpenTSDB, InfluxDB, and possibly BlueFlood. OpenTSDB suggests 4G of RAM, though that may be for high-volume settings. InfluxDB actually mentions sensor readings, but I can't find a lot of information on what kind of resources are required.
Okay, so here's my actual question: are there obvious red flags that would make any of these systems inappropriate for the project I describe?
I realize that this is an invitation to flame, so I'm counting on folks to keep it on the bright and helpful side. Many thanks in advance!
InfluxDB should be fine with 1 GB RAM at that volume. Embedded sensors and low-power devices like Raspberry Pis are definitely a core use case, although we haven't done much testing with the latest betas beyond compiling on ARM.
InfluxDB 0.9.0 was just released, and 0.9.x should be available in our Hosted environment in a few weeks. The low end instances have 1 GB RAM and 1 CPU equivalent, so they are a reasonable proxy for your Pi performance, and the free trial lasts two weeks.
If you have more specific questions, please reach out to us at influxdb#googlegroups.com or support#influxdb.com and we'll see how we can help.
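To give a sense of the write/query workflow, here is a minimal sketch using the influxdb-python client against a local InfluxDB instance; the database, measurement, and tag names are made up for illustration:

```python
from influxdb import InfluxDBClient  # `pip install influxdb`

client = InfluxDBClient(host="localhost", port=8086, database="smarthome")
client.create_database("smarthome")

# Write one sensor reading (measurement/tag names are hypothetical).
client.write_points([{
    "measurement": "power",
    "tags": {"sensor": "meter1"},
    "fields": {"value": 1234},
}])

# Count entries in a time window -- the query the MySQL setup was struggling with.
result = client.query("SELECT count(value) FROM power WHERE time > now() - 1d")
print(list(result.get_points()))
```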
Try VictoriaMetrics. It should run on systems with low RAM such as Raspberry Pi. See these instructions on how to build it for ARM.
VictoriaMetrics has the following additional benefits for small systems:
It is easy to configure and maintain since it has zero external dependencies and all the configuration is done via a few command-line flags.
It is optimized for low CPU usage and low persistent storage IO usage.
It compresses data well, so it uses a small amount of persistent storage space compared to other solutions.
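A minimal sketch of talking to a single-node VictoriaMetrics instance from Python, assuming the default port 8428 and made-up measurement/tag names; ingestion uses the Influx line protocol endpoint and the count query uses MetricsQL:

```python
import requests

BASE = "http://localhost:8428"

# Ingest one reading via the Influx line protocol endpoint.
requests.post(f"{BASE}/write", data="sensor,room=kitchen temperature=21.5")

# Count the points stored for that series over the last 24 hours.
resp = requests.get(
    f"{BASE}/api/v1/query",
    params={"query": 'count_over_time(sensor_temperature{room="kitchen"}[24h])'},
)
print(resp.json())
```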
Did you try OpenTSDB? We are using OpenTSDB for almost 150 houses to collect smart meter data, with readings every 10 minutes, i.e. a lot of data points per day. But we haven't tested it on a Raspberry Pi; for a Raspberry Pi, OpenTSDB might be quite heavy since it needs to run a web server, HBase, and Java.
Just a suggestion: you could use the Raspberry Pi as a collecting hub for the smart home and send the data from the Pi to a server, storing all the points on the server. Later, on the server, you can do whatever you want, like aggregation or statistical analysis, and then send the results back to the smart hub.
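A sketch of what forwarding a reading from the Pi to OpenTSDB's HTTP API on such a server could look like; the server hostname, metric, and tag names are placeholders, and OpenTSDB is assumed to be running with its HTTP API on the default port 4242:

```python
import time
import requests

datapoint = {
    "metric": "home.power.consumption",   # illustrative metric name
    "timestamp": int(time.time()),
    "value": 1234,
    "tags": {"house": "42", "sensor": "meter1"},
}

# POST the point to OpenTSDB's HTTP API on the server.
resp = requests.post("http://tsdb-server:4242/api/put", json=datapoint)
resp.raise_for_status()
```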
ATSD supports the ARM architecture and can be installed on a Raspberry Pi 2 to store sensor data. Currently, Ubuntu or Debian is required. Make sure the device has at least 1 GB of RAM and an SD card with a high write speed (60 MB/s or more). The size of the SD card depends on how much data you want to store and for how long, so plan ahead; we recommend at least 16 GB. Backup battery power is also recommended, to protect against crashes and ungraceful shutdowns.
Here you can find an in-depth guide on setting up a temperature/humidity sensor paired with an Arduino device. Using the guide you will be able to stream the sensor data into ATSD using MQTT or TCP protocol. Open-source sketches are included.
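For the TCP route, a minimal sketch of streaming one reading over ATSD's plain-TCP network API; the host, port (8081 is assumed here as the network-API port), entity, and metric names are placeholders:

```python
import socket
import time

# One "series" command per reading; ms: carries the Unix timestamp in milliseconds.
command = f"series e:raspberry-1 m:temperature=23.4 ms:{int(time.time() * 1000)}\n"

with socket.create_connection(("atsd-host", 8081), timeout=5) as sock:
    sock.sendall(command.encode("ascii"))
```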

Is hosting my multiplayer HTML5 game on a free heroku dyno hurting my network performance?

I've recently built a multiplayer game in HTML5 using the TCP-based WebSockets protocol for the networking. I already have taken steps in my code to minimize lag (using interpolation, minimizing the number of messages sent/message size), but I occasionally run into issues with lag and choppiness that I believe are happening due to a combination of packet loss and TCP's policy of in-order delivery.
To elaborate - my game sends out frequent websocket messages to players to update them on the position of the enemy players. If a packet gets dropped/delayed, my understanding is that it will prevent later packets from being received in a timely manner, which causes the enemy players to appear frozen in the same spot and then zoom to the correct location once the delayed packet is finally received.
I confess that my understanding of networking/bandwidth/congestion is quite weak. I've been wondering whether running my game on a single free Heroku dyno, which is basically a VM running on another virtualized server (Heroku dynos run on EC2 instances), could be exacerbating this problem. Do Heroku dynos, and multi-tenant servers in general, tend to have worse network congestion due to noisy neighbors or other reasons?
Yes. You don't get dedicated networking performance from Heroku instances. Some classes of EC2 instances in a VPC can have "Enhanced Networking" enabled which is supposed to help give you dedicated performance.
Ultimately, though, the best thing to do before jumping to a new solution is benchmarking. Benchmark what level of throughput and latency you can get from a Heroku dyno, then benchmark an Amazon instance to see what kind of difference it makes.
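A minimal sketch of such a benchmark using the Python websockets library; the endpoint URL is hypothetical and the server is assumed to echo messages back, so the measured value is a rough round-trip time rather than game-level lag:

```python
import asyncio
import time
import websockets  # `pip install websockets`

async def measure_rtt(uri: str, samples: int = 50) -> None:
    """Send small messages and print round-trip times, a rough proxy for
    the latency/jitter players would see from a given host."""
    async with websockets.connect(uri) as ws:
        for _ in range(samples):
            start = time.perf_counter()
            await ws.send("ping")
            await ws.recv()  # assumes the server echoes the message back
            print(f"rtt: {(time.perf_counter() - start) * 1000:.1f} ms")
            await asyncio.sleep(0.1)

# Run once against the Heroku deployment, then against an EC2 instance, and compare.
asyncio.run(measure_rtt("wss://your-game.herokuapp.com/echo"))
```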

Perfmon - Refresh rate of power meter

I'm writing a tool to collect information about power consumption of notebooks. I need to measure the current power consumption, and I use Perfmon to do so. But I found a strange bug.
Here is the typical graph of power consumption (this is "Power Meter" - "Power" - "_Total"):
Measurements are updated about once every 10-15 seconds.
But if I run Everest (or AIDA64), its Power Management tab updates much more often and the results are more accurate:
Measurements are updated about once every 1-2 seconds.
I do not understand what happens when we run Everest. I really need to get accurate data.
Do you have any ideas?
I would really appreciate any suggestions in this regard.
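For reference, the same counter can be sampled outside the Perfmon GUI with the built-in typeperf tool. This is a minimal sketch assuming the counter path shown above; note that it only reports whatever value the power meter driver exposes, so it cannot by itself force a faster refresh:

```python
import subprocess

# Sample "\Power Meter(_Total)\Power" once per second, 30 times.
subprocess.run([
    "typeperf",
    r"\Power Meter(_Total)\Power",
    "-si", "1",    # sampling interval in seconds
    "-sc", "30",   # number of samples to collect
])
```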