I configured -spark.executor.cores=16 in the leads file, but when I start SnappyData I get:
WARNING: daemonize() not available in base implementation snappydata
I also tried
export SPARK_WORKER_CORES=16
but it is not reflected in the core usage; SnappyData still uses all the cores available on my server.
I recently built a project using MySQL 8.0.12, configured with the "Development Computer" option during installation.
I developed the system on my PC, which has an i5 CPU with 8 GB RAM.
On my PC, the mysqld.exe process consumes around 10% CPU and 20 MB of memory while a continuous query is running.
I then deployed this system to the client PC, which has an Atom CPU with 8 GB RAM. Also using a fresh install of MySQL 8.0.12.
For some reason, even when idle, the mysqld.exe process consumes 300 MB of memory, and CPU usage goes up to 60% during a continuous query.
Both systems run on 64-bit Windows 10.
Obviously the two computers differ in speed, but I doubt the CPU itself is the issue, since the memory consumption already differs at idle.
What might be wrong with MySQL on the Atom-based PC? Why does it behave so differently? CPU usage aside, it seems very strange to me that the idle memory consumption is so different.
Is there any workaround for these issues?
I'm running a JBoss application that we have successfully configured to use Huge Pages/Large Pages on cloud platforms other than GCE, but I'm having problems on GCE. I'm seeing the error:
Java HotSpot(TM) 64-Bit Server VM warning: Failed to reserve large pages memory req_addr: 0x00000005f0000000 bytes: 8858370048 (errno = 12).
when I start the JVM. These are Ubuntu 14.04 based systems, configured just like similar systems on which we have had huge pages working before, so I'm wondering whether some setting at the virtual machine level isn't correct for large pages to work. Does anyone have any suggestions?
It turned out I simply wasn't allocating enough huge pages to cover the heap I had configured for the JVM, so there isn't a problem with GCE. (The error above asks for 8,858,370,048 bytes; with the default 2 MiB huge page size that works out to roughly 4,225 huge pages, i.e. vm.nr_hugepages needs to be at least that.)
As you know, since CUDA 5.5, Hyper-Q (on NVIDIA GPUs) allows multiple MPI processes to run simultaneously on a single GPU and share its resources, subject to resource availability.
Hyper-Q can be activated by a driver command (i.e., nvidia-cuda-mps-control -d) before running the application.
Considering that Hyper-Q does not necessarily benefit an application's performance (and in some cases can even harm it), is there any way to deactivate Hyper-Q (or even activate it) by issuing some driver command from within the application? In other words, is it possible to start Hyper-Q from within the program (via any runtime/driver command you're aware of)?
Thanks in advance,
Iman
Hyper-Q cannot be turned on/off. This is a hardware feature of Kepler cc3.5 and newer GPUs.
The CUDA MPS server can be turned on/off. The method of turning it on and off is described in section 4.1.1 of the documentation. In a nutshell, excerpting:
nvidia-cuda-mps-control -d # Start daemon as a background process.
echo quit | nvidia-cuda-mps-control # Shut the daemon down.
There's nothing to prevent you from issuing these commands from a non-CUDA application (for example via system()).
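For instance, here is a minimal sketch of a launcher script (Python purely for illustration; ./my_cuda_app is a placeholder for your own binary, and a C/C++ host program could issue the same commands through system()) that brings the MPS control daemon up, runs the application under it, and shuts the daemon down afterwards:

import subprocess

# Start the MPS control daemon (same as: nvidia-cuda-mps-control -d).
subprocess.run(["nvidia-cuda-mps-control", "-d"], check=True)

try:
    # Run the CUDA application (or mpirun ...) while MPS is active.
    subprocess.run(["./my_cuda_app"], check=True)
finally:
    # Shut the daemon down only after the connected processes have exited
    # (same as: echo quit | nvidia-cuda-mps-control).
    subprocess.run(["nvidia-cuda-mps-control"], input=b"quit\n", check=True)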
Regarding CUDA applications:
After the CUDA MPS server is turned on, CUDA applications (or MPI processes) can be launched and "connected" to an instance of the MPS server.
Once these applications (or MPI processes) are launched and connected to the MPS server, the MPS server should not be shut down until those applications/processes are terminated.
Therefore, an application that has launched and connected to an MPS server should not manipulate the state of that server.
I'm writing a program that manages data entered by users. I plan to open a test version to the public and have no idea how many users there may be.
I want my program to detect when memory is getting low so that I know when to buy more server space, and so that I can automatically restrict data entry when necessary. What's a good way to detect a memory shortage? Temporarily allocate throwaway memory until I get an exception? Is there a better way?
This may be best accomplished outside of your application using a performance monitoring tool. Windows Server can be configured to do this for you; see this question. There are other tools out there that help you monitor your servers, and I advise you to use an existing system unless you absolutely have to do this with Python.
If you absolutely must do this using Python, then have a look at the psutil library:
psutil (python system and process utilities) is a cross-platform library for retrieving information on running processes and system utilization (CPU, memory, disks, network) in Python. It is useful mainly for system monitoring, profiling and limiting process resources and management of running processes. It implements many functionalities offered by command line tools such as: ps, top, lsof, netstat, ifconfig, who, df, kill, free, nice, ionice, iostat, iotop, uptime, pidof, tty, taskset, pmap. It currently supports Linux, Windows, OSX, FreeBSD and Sun Solaris, both 32-bit and 64-bit architectures, with Python versions from 2.4 to 3.4. PyPy is also known to work.
You may combine this with the email package to send the alerts.
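As a rough sketch (the threshold, addresses, and SMTP host below are placeholders, not anything prescribed by psutil), a periodic check could look like this:

import smtplib
from email.message import EmailMessage

import psutil

# Hypothetical threshold: alert once less than 10% of system memory is available.
AVAILABLE_PERCENT_THRESHOLD = 10.0

def check_memory_and_alert():
    mem = psutil.virtual_memory()
    available_percent = mem.available / mem.total * 100

    if available_percent < AVAILABLE_PERCENT_THRESHOLD:
        msg = EmailMessage()
        msg["Subject"] = "Low memory warning"
        msg["From"] = "monitor@example.com"   # placeholder sender
        msg["To"] = "admin@example.com"       # placeholder recipient
        msg.set_content(
            f"Only {available_percent:.1f}% of memory is available "
            f"({mem.available // (1024 * 1024)} MB of {mem.total // (1024 * 1024)} MB)."
        )
        with smtplib.SMTP("localhost") as smtp:  # placeholder SMTP relay
            smtp.send_message(msg)

if __name__ == "__main__":
    check_memory_and_alert()

You could run something like this from cron or a background thread on a fixed interval, rather than waiting for an allocation failure inside the application itself.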
What exactly does the NVIDIA CUDA driver do, from the perspective of using CUDA?
The driver passes the kernel code to the GPU, along with the execution configuration (#threads, #blocks)... and what else?
I saw a post saying that the driver should be aware of the number of available SMs.
But isn't that unnecessary? Once the kernel is passed to the GPU, the GPU's scheduler just needs to spread the work across the available SMs...
The GPU isn't a fully autonomous device; it requires a lot of help from the host driver to do even the simplest things. As I understand it, the driver contains at least:
JIT compiler/optimizer (PTX assembly code can be compiled by the driver at runtime, the driver will also recompile code to match the execution architecture of the device if required and possible)
Device memory management
Host memory management (DMA transfer buffers, pinned and mapped host memory, unified addressing model)
Context and runtime support (so code/heap/stack/printf buffer memory management), dynamic symbol management, streams, etc.
Kernel "grid level" scheduler (includes managing multiple simultaneous kernels on architectures that support it)
Compute mode management
Display driver interop (for DirectX and OpenGL resource sharing)
That probably represents the bare minimum that is required to get some userland device code onto a GPU and running via the host side APIs.