I am training a few deep learning models on Google Colab with runtime type set to TPU. The RAM and disk status shows that I have used most of my disk storage on Colab. Is there a way to reset it? Or to delete something to free up some more disk space? I know that I can change to GPU which will give me a lot more disk space, however, my models take forever to change, so I would really like to stay with TPU. Thanks in advance!
A few places you might delete with rm -rf to reclaim some space:
5.6G from /usr/local/lib/python2.7
5.3G from /swift
3.0G from /usr/local/cuda-10.1
3.0G from /usr/local/cuda-10.0
2.1G from /tensorflow-2.0.0
1.3G from /usr/local/lib/python3.6/dist-packages/torch
788M from /opt/nvidia
474M from /usr/local/lib/python3.6/dist-packages/pystan
423M from /usr/local/lib/python3.6/dist-packages/spacy
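The exact paths and sizes above are from one runtime image and may well differ on yours, so check what is actually taking the space before deleting anything. A rough sketch to run in a cell (paths are examples):
!df -h /                      # how full the disk is overall
!du -sh /swift /tensorflow-* /usr/local/cuda-* /usr/local/lib/python2.7   # see what is big
!rm -rf /usr/local/cuda-10.0 /usr/local/cuda-10.1   # e.g. drop the CUDA toolkits on a TPU runtime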
I don't think there is a way to get more space than is available when you first open the Colab notebook. What is already there is there for a reason: it is needed to run your environment. You can still try to remove existing files at your own risk by running the Linux remove command in a cell, like so:
!rm <path>
Otherwise, you'll have to switch to GPU, which offers a whole lot more space at the expense of longer training time. Another option might be to pay for an upgrade, but I don't know whether that only gives you more TPU time or whether it increases your RAM as well.
We have been using IMX6ULL processors along with a Quectel 4G module on our custom-made boards. The 4G module can be initialised and brought up, and the ppp0 interface can be initialised as well, which does give us internet connectivity. However, when we start downloading files (of about 10 MB - 200 MB), the download begins to stall at irregular intervals. While the download is stalled, the ppp0 interface is still up but we lose internet connectivity, so we have to kill pppd and re-initialise ppp0.
We have tried different variations of the ppp0 initialisation scripts we could get our hands on, but the issue persisted. However, recently, when we wanted to dump the traffic on the ppp0 interface using tcpdump in order to analyse it, we observed that the download no longer stalls, and we also saw much better 4G throughput. We have still not been able to figure out why this is the case. Any inputs or guidance would be of great help.
P.S.: The kernel version we have been using is 4.1.15, but we have observed similar behaviour with the 5.4.70 kernel too.
Thanks in advance
Regards
Nitin
Check the 4G network first with AT+COPS? and AT+CSQ.
Does the module disconnect from the base station?
Rather than killing pppd and setting up ppp0 again, first try AT+CFUN=0 followed by AT+CFUN=1 to restart the network registration.
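If you do not have a terminal program on the board, a rough way to poke the AT port from a shell is sketched below. The device node is an assumption: on Quectel modules the AT interface often shows up as /dev/ttyUSB2, but check dmesg on your board.
stty -F /dev/ttyUSB2 115200 raw -echo
printf 'AT+COPS?\r' > /dev/ttyUSB2; timeout 2 cat /dev/ttyUSB2   # registered operator
printf 'AT+CSQ\r' > /dev/ttyUSB2; timeout 2 cat /dev/ttyUSB2     # signal quality
printf 'AT+CFUN=0\r' > /dev/ttyUSB2; timeout 5 cat /dev/ttyUSB2  # radio off
printf 'AT+CFUN=1\r' > /dev/ttyUSB2; timeout 5 cat /dev/ttyUSB2  # radio back on, re-register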
Also, for the 4G module, Quectel provides a tool named quectel-CM to set up the internet connection; it performs better than PPP.
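Roughly, the quectel-CM workflow looks like the sketch below. It typically drives the module over QMI through the qmi_wwan kernel driver rather than PPP over a serial port, and it needs udhcpc in the rootfs. The APN is a placeholder and the exact flags may differ between quectel-CM releases, so check the README that ships with it.
./quectel-CM -s your.apn &   # your.apn is a placeholder for your carrier's APN
ifconfig wwan0               # the data interface comes up as wwan0 instead of ppp0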
By the way, have you checked the memory usage and the CPU status?
I'm going to set up a full Ethereum node on my PC here with geth --syncmode=full
I have to buy an SSD drive for that. My question is: will a 1TB SSD be enough, or do I have to buy an even bigger (= more expensive) SSD drive?
P.S. I've searched the internet and didn't find recent information about it...
This is what you need:
https://etherscan.io/chart2/chaindatasizefast
Geth has 3 modes: light, fast and full. Running fast is fine. If you want to learn more about them, read this answer.
According to this article, once Geth is done with fast sync it switches to full sync. With a Parity archive node approaching 2TB (source), you can expect at least that much in disk space. Running a stable node is a challenge, so you may want to look into QuikNode (who can run a node in the cloud for you).
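As a rough sketch of what that could look like in practice (the data directory is just an example path on your SSD; geth defaults to ~/.ethereum if you omit --datadir):
geth --syncmode=full --datadir /mnt/ssd/ethereum
# in another shell, watch how much of the drive the chain actually takes
du -sh /mnt/ssd/ethereum/geth/chaindata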
Is there an easy way to increase the RAM available to KNIME through a config file or through menu options?
I am constantly running into "heap space" errors during execution; by default it limits the number of categorical variables to 1,000, and it has difficulty displaying charts with more than n values (~10,000).
Example error:
ERROR Decision Tree Learner 0:65 Execute failed: Java heap space
Thanks!
Sure, you can edit knime.ini (in the knime or knime_<version> folder) and change the row starting with -Xmx (I think by default it is 2048m, i.e. two GiB). Just do not allocate so much memory that it causes the OS to swap, as Java does not play well with swapping.
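For example, the line could become something like the one below; 8192m is only an illustration, pick a value that leaves headroom for your OS, and restart KNIME afterwards for it to take effect.
-Xmx8192m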
(Displaying too many variables might still be slow; maybe you could aggregate them somehow.)
Does anybody have a hint? I didn't change anything on the machine (except for the security updates), and the sites hosted there didn't see a significant change in connections.
Maybe Google changed something in their infrastructure? Coincidentally, there was an issue with the Cloud DNS ManagedZone around the same time: they charged me $920 for half a month's usage, and it was an error (they counted thousands of weeks of usage), so they recently changed it back to $0.28. Maybe there was some process that mistakenly used Cloud DNS and consumed CPU power, and they have now corrected it?
I'd like to hear from someone who knows what is going on at Google Cloud. Thank you.
CPU utilization reporting is now more accurate from a VM guest perspective as it doesn't include virtualization layer overhead anymore. It has nothing to do with Cloud DNS.
See this issue for some extra context:
https://code.google.com/p/google-compute-engine/issues/detail?id=281
I recently migrated between 2 servers (the newer one has lower specs), and it freezes all the time even though there is no load on the server. Below are my specs:
HP DL120G5 / Intel Quad-Core Xeon X3210 / 8GB RAM
free -m output:
total used free shared buffers cached
Mem: 7863 7603 260 0 176 5736
-/+ buffers/cache: 1690 6173
Swap: 4094 412 3681
As you can see, there are 412 MB used in swap while almost 80% of the physical RAM is available.
I don't know if this should cause any trouble, but almost no swap was used on my old server, so I'm thinking this does not seem right.
I have a cPanel license, so I contacted their support and they noted that I have high iowait. And yes, when I ran sar I noticed it sometimes exceeds 60%; most often it's around 20%, but sometimes it reaches 60% or even 70%.
I don't really know how to diagnose that. I suspected my drive was slow and that this might be causing the latency, so I ran a test using dd and the speed was 250 MB/s, so I think the transfer speed is OK; plus, the hardware is supposed to be brand new.
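(For reference, the dd test was roughly along these lines; the flags and the target path are indicative rather than a verbatim copy of what I typed.)
dd if=/dev/zero of=/home/ddtest.tmp bs=1M count=1024 conv=fdatasync
rm -f /home/ddtest.tmp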
The high load usually happens when I use gzip or tar to extract files (backing up or restoring a cPanel account).
One important thing to mention is that top reports MySQL using 100% to 125% of the CPU, and sometimes much more. If I trace the mysql process, I keep getting this error continually:
setsockopt(376, SOL_IP, IP_TOS, [8], 4) = -1 EOPNOTSUPP (Operation not supported)
I don't know what that means, nor did I find any useful information googling it.
For what it's worth, I forgot to mention that it's a web hosting server, so it has the standard setup for web hosting (Apache, PHP, MySQL, etc.).
So how do I properly diagnose this issue and find the solution, and what might the possible causes be?
As you may have realized by now, the free -m output shows 7603MiB (~7.6GiB) USED, not free.
You're out of memory and it has started swapping, which will drastically slow things down. Since most applications are unaware that the virtual memory is now coming from much slower disk, the system may very well appear to "hang" with no feedback describing the problem.
From your description, the first process I'd kill in order to regain control would be MySQL. If you have ssh/rsh/telnet connectivity to this box from another machine, you may have to log in from there in order to get a usable command line to kill it from.
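A minimal sketch of what that could look like (the hostname is a placeholder, and the exact way to stop MySQL varies between distributions and cPanel versions):
ssh root@your-server            # placeholder hostname
top -b -n 1 | head -20          # confirm what is actually eating CPU and memory
/etc/init.d/mysql stop          # or: kill <pid of mysqld> if the init script hangs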
My first thought (hypothesis?) for what's happening is...
MySQL is trying to do something that is not supported as this machine is currently configured. It could be a missing library, an unset environment variable, or any number of things.
That operation allocates some memory, but it is failing and not cleaning up the allocation when it fails. If this were a shell script, it could be fixed by putting a trap command at the beginning that runs a function to release the memory and clean up.
The code is written to keep retrying on failure, so it rapidly uses up all your memory. Referring back to the shell-script illustration (sketched below), the trap function might also prompt to ask whether you really want to keep retrying.
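To make the shell-script analogy concrete, purely as an illustration (do_the_thing is a made-up command; this is not something you can apply to MySQL directly):
#!/bin/sh
# illustration only: release what was allocated instead of leaking it on failure
cleanup() {
    rm -f /tmp/work.$$          # free whatever the script set up
}
trap cleanup EXIT INT TERM

until do_the_thing; do          # hypothetical operation that keeps failing
    printf 'keep retrying? (y/n) '
    read answer
    [ "$answer" = "y" ] || exit 1
done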
Not a complete answer, but hopefully it will help.