What is the issue with the RapidMiner HBOS process?

I have a dataset of more than 1 million transactions, and I am running the HBOS algorithm on Windows with 32 GB of RAM.
The issue is that we are getting an Out of Memory error.
Can anyone help?

The HBOS algorithm can be quite memory intensive, and the memory needed grows with the number of attributes used. So first of all, reducing the number of attributes might help.
But I couldn't reproduce your error. Perhaps you should adjust the max memory used by RapidMiner (under Settings -> Preferences -> System). The JVM always needs some overhead, so running RapidMiner with a 30 GB maximum on a 32 GB machine should be safe.

Related

TensorFlow strange memory usage

I'm on an Ubuntu 19.10 machine (with KDE desktop environment) with 8GB of RAM, an i5 8250u and an MX130 gpu (2GB VRAM), running a Jupyter Notebook with tensorflow-gpu.
I was just training some models to test their memory usage, and I can't make any sense of what I'm seeing. I used KSysGuard and NVIDIA System Monitor (https://github.com/congard/nvidia-system-monitor) to monitor my system during training.
As soon as I hit "train", NVIDIA System Monitor shows memory usage at 100% (or near it, around 95-97%), while GPU utilization looks fine.
Still in NVIDIA System Monitor, the process list shows "python" occupying only around 60 MB of VRAM.
In KSysGuard, python's memory usage is always around 700 MB.
There might be some explanation for that; the real problem is that the GPU's memory usage hits 90% with a model of literally 2 neurons (densely connected, of course), just like a model with 200 million parameters does. I'm using a batch size of 128.
I thought about that mess, and if I'm not wrong, a model with 200 million parameters should occupy 200,000,000 × 4 bytes × 128, which would be about 102 GB.
That means I'm definitely wrong about something, but I can't keep that riddle to myself, so I decided to give you the chance to solve it ;D
PS: English is not my main language.
TensorFlow by default allocates all available VRAM on the target GPU. There is an experimental feature called memory growth that lets you control that: it stops the initialization process from allocating all VRAM up front and instead allocates it as the need arises.
https://www.tensorflow.org/api_docs/python/tf/config/experimental/set_memory_growth
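As a side note on the estimate in the question: parameter memory does not scale with the batch size, only activations do, which is why 200 million float32 parameters cost well under 1 GiB by themselves. A quick back-of-the-envelope sketch (the 4096-wide layer is an arbitrary illustrative assumption):

```python
# Rough memory estimate: a model's weights vs. its activations.
# Assumptions: float32 (4 bytes) everywhere; the "activation" figure is
# for one hypothetical layer of width 4096.

BYTES_PER_FLOAT32 = 4

def weight_memory_bytes(n_params):
    """Weights are stored once, independent of the batch size."""
    return n_params * BYTES_PER_FLOAT32

def activation_memory_bytes(batch_size, layer_width):
    """Activations scale with the batch size."""
    return batch_size * layer_width * BYTES_PER_FLOAT32

# 200 million parameters -> roughly 0.75 GiB of weights, not 100+ GB:
weights_gib = weight_memory_bytes(200_000_000) / 1024**3
print(f"weights: {weights_gib:.2f} GiB")

# A batch of 128 through a 4096-wide layer is tiny by comparison:
acts_mib = activation_memory_bytes(128, 4096) / 1024**2
print(f"activations: {acts_mib:.2f} MiB")  # 2 MiB
```

The near-100% figure the question observed is simply the default eager allocation described above, not the model's actual footprint.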

Running text classification - CNN on GPU

Based on this GitHub project, https://github.com/dennybritz/cnn-text-classification-tf, I want to classify my datasets on Ubuntu 16.04 on a GPU.
To run on the GPU, I changed line 23 of text_cnn.py to this: with tf.device('/gpu:0'), tf.name_scope("embedding"):
My first dataset, for the training phase, has 9,000 documents and is about 120 MB in size, and
the second one has 1,300 documents and is about 1 MB.
After running on my Titan X server's GPU, I got errors.
Please guide me: how can I solve this issue?
Thanks.
You are getting an Out of Memory error, so the first thing to try is a smaller batch size
(the default is 64). I would start with:
./train.py --batch_size 32
Most of the memory is used to hold the embedding parameters and the convolution parameters, so I would suggest reducing:
EMBEDDING_DIM
NUM_FILTERS
BATCH_SIZE
Try embedding_dim=16, batch_size=16, and num_filters=32; if that works, increase them 2x at a time.
Also, if you are using a Docker virtual machine to run TensorFlow, you might be limited to 1 GB of memory by default even though your machine has 16 GB; see here for more details.
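To see which of those knobs dominates memory, here is a rough parameter count for this model. The vocabulary size of 20,000 is an assumed figure (it depends on your dataset); the other defaults match the repo's flags:

```python
# Back-of-the-envelope parameter count for the cnn-text-classification-tf
# model. Assumption: vocab_size=20_000 (dataset dependent). The defaults
# embedding_dim=128, filter_sizes=(3,4,5), num_filters=128 match the repo.

def param_count(vocab_size, embedding_dim=128, filter_sizes=(3, 4, 5),
                num_filters=128):
    # Embedding table: one vector per vocabulary entry.
    embedding = vocab_size * embedding_dim
    # Each conv filter spans (filter_size x embedding_dim x 1), plus a bias
    # per output channel.
    convs = sum(fs * embedding_dim * 1 * num_filters + num_filters
                for fs in filter_sizes)
    return embedding, convs

emb, convs = param_count(vocab_size=20_000)
print(f"embedding params: {emb:,}")    # 2,560,000
print(f"conv params:      {convs:,}")  # 196,992
```

Under these assumptions the embedding table dwarfs the convolutions, which is why lowering EMBEDDING_DIM (and the batch size, for activations) usually helps first.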

Increase RAM available for Knime?

Is there an easy way to increase the RAM available in Knime through a config file or through menu options?
I am constantly running into "heap space" errors during execution; by default it limits the number of categorical variables to 1,000, and it has difficulty displaying charts with more than n values (~10,000).
Example error:
ERROR Decision Tree Learner 0:65 Execute failed: Java heap space
Thanks!
Sure, you can edit knime.ini (in the knime or knime_&lt;version&gt; folder) and change the row starting with -Xmx (I think by default it is 2048m, i.e. two GiB). Do not allocate so much memory that it would cause the OS to swap, though, as Java does not play well with swapping.
(Displaying too many variables might still be slow, maybe you could aggregate them somehow.)
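For example, to allow up to 8 GiB of heap, the -Xmx row in knime.ini would become the line below (8192m is just an illustrative value; pick one that leaves headroom for the OS):

```ini
-Xmx8192m
```

Restart KNIME after editing the file for the change to take effect.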

How to Properly Recover from Memory Errors in GPU?

Consumer-grade Nvidia GPUs are expected to have about 1-10 soft memory errors per week.
If you somehow manage to detect an error on a system without ECC
(e.g. if the results were abnormal) what steps are necessary and sufficient to recover from it?
Is it enough to just reload all of the data to the GPU (cuda.memcpy_htod in PyCuda),
or do you need to reboot the system? What about the "kernel", rather than data?
A soft memory error (meaning incorrect results due to noise of some kind) shouldn't require a reboot. Just rewind to some known good position, reload the data to the GPU, and proceed.
Of course, it depends on what was located in the memory that was corrupted. I have accidentally overwritten memory on GPUs that required a reboot to fix, so it seems that could happen if memory is randomly corrupted as well. I think the GPU drivers reside partially in GPU memory.
For critical calculations, one can guard against soft memory errors by running the same calculation twice (including memory copies, etc) and comparing the result.
Since compute cards with ECC are often more than twice as expensive as graphics cards, it may be less expensive to purchase two graphics cards, run the same calculations on both, and compare all results. That has the added benefit of doubling the calculation speed for non-critical calculations.
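The run-twice-and-compare guard can be sketched in a few lines. This is only an illustration: `compute` here is a stand-in for the kernel launch plus memory copies, and in a real setup the two runs would go to two different devices:

```python
# Sketch: guard a critical calculation against soft memory errors by
# running it twice and accepting the result only when both runs agree.
# `compute` stands in for the real GPU work (copies + kernel launch).

def checked_run(compute, inputs, retries=3):
    """Run `compute` twice; retry from known-good inputs on mismatch."""
    for _ in range(retries):
        a = compute(inputs)
        b = compute(inputs)  # second, independent run
        if a == b:           # agreement -> accept the result
            return a
    raise RuntimeError("results kept disagreeing; suspect hardware")

# Deterministic toy workload: both runs agree on the first try.
result = checked_run(lambda xs: sum(x * x for x in xs), [1, 2, 3])
print(result)  # 14
```

For floating-point results that may legitimately differ across devices, the equality check would be replaced by a tolerance comparison.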

MySQL thread_concurrency, innodb_thread_concurrency, should I use or not?

I have a dedicated server with 24 CPUs and 32 GB of RAM.
This server serves a website and MySQL.
I don't know what the difference is between those two variables, if there is any.
I also don't know whether I should use them, because from what I've read, those variables might be ignored depending on the OS or MySQL version.
So should I use them?
Please read the MySQL Performance Blog carefully, select decent initial values, monitor the performance of your server during the busy hours of the day, and tune accordingly.
There are no simple answers, because your workload is uniquely yours.
Off the top of my head, your balance of CPU and RAM seems wrong. I would expect 1-4 cores for 64 GB of RAM, or 24 cores with the maximum RAM you can get, perhaps 192 GB: CPU needs to be provisioned for the query rate, while RAM is for the active/hot dataset size. I can imagine a weird workload where your CPU/RAM ratio makes sense, but I'm not sure InnoDB is in fact the best solution for such a workload.
Coming back to your question: "thread_concurrency doesn't do what you expect"; in short, most likely you should not use it. innodb_thread_concurrency is just a cutoff; I'd say that if your workload is all hot (i.e. MySQL doesn't use much disk), it should not be higher than the number of cores. Do read up on the blog; these settings are not as simple as they seem.
You may also want to pay attention to: the thread cache, the InnoDB buffer pool, the additional memory pool, the heap table size, the sort/key buffer sizes, flush-log-at-transaction-commit, and the log file size. And probably a few more I can't think of right now.
The thread_concurrency option in MySQL is mainly for Solaris systems and is deprecated as of version 5.6, so tuning it might be a waste of time.
thread_concurrency
Also please read: https://www.percona.com/blog/2012/06/04/thread_concurrency-doesnt-do-what-you-expect/
The innodb_thread_concurrency can be adjusted for performance, but I've found no performance increases using it.
I found the best information at https://www.percona.com/blog/. Advice from other sources can easily leave MySQL misconfigured or even inoperable.
(According to manual "thread_concurrency" variable is usable only for Solaris OS)
This will depend on a number of issues, the operating system, the scheduler options, the I/O subsystem, and the number and type of CPUs, as well as the type and number of queries being run.
The only way you can tell for certain on your system is to adjust the value of innodb_thread_concurrency and benchmark typical workloads. A reasonable starting range is from 0 up to twice the number of CPU cores available (48 in your case). You could then increase it until the point at which the system starts to become CPU bound, and then throttle it back a bit.
This doesn't take into account the disk activity that your transactions will generate; from there you can look at disk I/O and make further adjustments.
Setting this to 0 means unlimited, so that by default there is no limit on the number of concurrently executing threads.
http://dev.mysql.com/doc/refman/5.5/en/innodb-performance-thread_concurrency.html
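Putting the advice in this thread together, a my.cnf starting point for a 24-core box might look like the sketch below. Every number here is a guess to be benchmarked against your own workload, not a recommendation:

```ini
[mysqld]
# 0 = unlimited; otherwise start near the core count and benchmark.
innodb_thread_concurrency      = 24
# The single most impactful InnoDB setting: size it for the hot dataset,
# leaving room for the OS and the web server on a shared 32 GB box.
innodb_buffer_pool_size        = 16G
thread_cache_size              = 64
# 1 = full durability; 2 trades a little durability for write throughput.
innodb_flush_log_at_trx_commit = 1
```

Change one variable at a time and re-run your benchmark, as the answers above suggest.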