Networked CUDA GPU

Is it possible for multiple low-end computers to each make CUDA calls to a GPU located on a central server in a request/response over the cloud scenario? To make it as if these low-end computers possess a "virtual" GPU.

I had a similar problem to solve.
The database lived on the low-end machine, and I had a cluster of GPUs at my disposal on the local network.
I made a small client (on the low-end machine) to parse the database, serialize the data with Google Protocol Buffers, and send it to the server over zmq sockets. For data distribution you can use asynchronous publisher/subscriber sockets.
On the server side you deserialize the data and run the CUDA program for the calculations (it can also be a daemonized application, so you don't have to fire it up yourself every time).
Once the data is ready on the server, you can issue a synchronous message (request/reply socket) from the client, and when the server receives the message it calls a function wrapper for the CUDA kernel.
If you need to process the results back on the client you can follow the reverse route to send the data back to the client.
If the data is already on the server, it's even easier. You only need the request/reply socket to send a message and call the function.
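A minimal sketch of that request/reply step in Python with pyzmq (the server address, port, and the run_kernel wrapper are placeholders for illustration, not part of the original setup):

    # client.py - runs on the low-end machine
    import zmq

    context = zmq.Context()
    socket = context.socket(zmq.REQ)
    socket.connect("tcp://gpu-server:5555")  # hypothetical server address/port

    socket.send(b"run")     # synchronous request: blocks until the reply arrives
    result = socket.recv()  # serialized results coming back from the server
    print("received %d bytes of results" % len(result))

    # server.py - daemonized on the GPU machine
    import zmq

    def run_kernel():
        """Placeholder for the wrapper that launches the CUDA kernel
        and returns the serialized results."""
        return b"...results..."

    context = zmq.Context()
    socket = context.socket(zmq.REP)
    socket.bind("tcp://*:5555")

    while True:
        socket.recv()              # wait for the client's request
        socket.send(run_kernel())  # run the computation, reply with the results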
Check the zmq manual; it has a lot of examples in many programming languages.

Related

Can I edit packets from my server before they reach my Client?

I made a simple instant message chat client and server on TCP, both running on Adobe AIR. It works great, and it was an interesting way to learn basic network programming.
My Question: Is it possible to change the data in the packet sent from the Chat Server before it arrives at the Client without using the Server or Client to do so? Like perhaps a program?
I am new to Network programming so I apologize if this is a dumb question.
Your question is very broad. So the answer is broad as well. Yes. It's possible.
For that you need to get the packets between the client and server to pass through a third program. There are quite a lot of ways to achieve that. Here's a non-exhaustive list:
First, on your own machines (client/server) you could get access to the packets from the operating system using various low-level APIs, for instance iptables+nfqueue on Linux or the Windows Filtering Platform on Windows.
Second, you could get access to the packets by intentionally having them pass through some proxy program, which may or may not reside on the same machine as the client or the server (see the sketch after this list).
Third, you could get access to the packets by picking them up from the network itself. For instance, you could set up some Linux machine as a router and have it sit between the client and the server (as long as they're not on the same machine). That Linux machine will now have access to all of the packets that pass through it, and it can pass them to various user-space programs using hooks such as the previously mentioned nfqueue.
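A minimal sketch of the second option, a man-in-the-middle TCP proxy in Python (the addresses, the port numbers, and the rewrite rule are made up for illustration; you would point the chat client at the proxy instead of the real server):

    # proxy.py - forwards client <-> server traffic and rewrites it on the way
    import socket
    import threading

    LISTEN_ADDR = ("0.0.0.0", 9000)       # the client connects here (hypothetical port)
    SERVER_ADDR = ("chat.example", 8000)  # the real chat server (hypothetical address)

    def pipe(src, dst, transform):
        """Copy bytes from src to dst, applying transform to each chunk."""
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(transform(data))
        dst.close()

    def rewrite(data):
        # Example rewrite rule: edit the payload before it reaches the client.
        return data.replace(b"hello", b"HELLO")

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(LISTEN_ADDR)
    listener.listen(1)

    while True:
        client, _ = listener.accept()
        server = socket.create_connection(SERVER_ADDR)
        # Server -> client traffic is rewritten; client -> server passes through as-is.
        threading.Thread(target=pipe, args=(server, client, rewrite)).start()
        threading.Thread(target=pipe, args=(client, server, lambda d: d)).start()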

Why is spark filling the tmp (spark.local.dir) in the machine that submits jobs?

I have a Spark 1.2.1 cluster set up in standalone mode with a master and a few slaves. I then let my data scientists enjoy the cluster's power.
All is working fine. However, the dedicated server that my data scientists use to submit Spark jobs has its spark.local.dir filling up gradually.
Given that this machine sits outside of the cluster, being neither the master nor a worker/slave, I wouldn't think the local spark.local.dir is used by Spark in any way. (And why would it? It only shows the logs.)
I could not find good documentation covering this. Does anybody have an idea?
There is not enough information about your setup to be sure, but I am guessing that the jobs are launched in client mode, where the driver runs on your client node.
From the Spark docs:
In client mode, the driver is launched in the same process as the client that submits the application. In cluster mode, however, the driver is launched from one of the Worker processes inside the cluster, and the client process exits as soon as it fulfills its responsibility of submitting the application without waiting for the application to finish.
I am guessing that in client mode the application's driver (on your client machine) needs plenty of scratch space to manage the workers, and that is what is gradually filling spark.local.dir.
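If submitting in cluster mode is not an option, a minimal sketch of redirecting the driver's scratch space in PySpark (the master URL and the path are placeholders; spark.local.dir has to be set before the SparkContext is created):

    # Point the driver's scratch space at a disk with room to spare.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("spark://master:7077")            # standalone master (placeholder)
            .setAppName("my-job")
            .set("spark.local.dir", "/data/spark-tmp"))  # hypothetical scratch directory

    sc = SparkContext(conf=conf)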

Arduino Yun - Uploading Sensor Data to Mysql on External Server

Can the Arduino Yún connect to MySQL on an external server and store sensor data on it? If yes, how?
Technically you can.
You could write or port a MySQL driver to the Arduino, but the small amount of memory and processing power will not make that easy.
With the Yún you could also install some libraries and program an app running on the Linux side that forwards incoming data from the serial port to your MySQL database.
Another option:
Use an intermediary app to save incoming data to your database. The interface could be a typical HTTP API or a publish/subscribe broker.
For simplicity I would recommend going with the third option; a sketch follows.
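A minimal sketch of such an intermediary in Python, using Flask and PyMySQL (the endpoint, table layout, and credentials are made up for illustration; the Yún would POST its readings to this service):

    # bridge.py - receives sensor readings over HTTP and stores them in MySQL
    import pymysql
    from flask import Flask, request

    app = Flask(__name__)

    def db():
        # Hypothetical credentials and schema: sensors(sensor VARCHAR, value DOUBLE)
        return pymysql.connect(host="db.example", user="arduino",
                               password="secret", database="telemetry")

    @app.route("/reading", methods=["POST"])
    def reading():
        sensor = request.form["sensor"]
        value = float(request.form["value"])
        conn = db()
        with conn.cursor() as cur:
            cur.execute("INSERT INTO sensors (sensor, value) VALUES (%s, %s)",
                        (sensor, value))
        conn.commit()
        conn.close()
        return "ok\n"

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)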

GPUDirect RDMA transfer from GPU to remote host

Scenario:
I have two machines, a client and a server, connected with Infiniband. The server machine has an NVIDIA Fermi GPU, but the client machine has no GPU. I have an application running on the GPU machine that uses the GPU for some calculations. The result data on the GPU is never used by the server machine, but is instead sent directly to the client machine without any processing. Right now I'm doing a cudaMemcpy to get the data from the GPU to the server's system memory, then sending it off to the client over a socket. I'm using SDP to enable RDMA for this communication.
Question:
Is it possible for me to take advantage of NVIDIA's GPUDirect technology to get rid of the cudaMemcpy call in this situation? I believe I have the GPUDirect drivers correctly installed, but I don't know how to initiate the data transfer without first copying it to the host.
My guess is that it isn't possible to use SDP in conjunction with GPUDirect, but is there some other way to initiate an RDMA data transfer from the server machine's GPU to the client machine?
Bonus: If someone has a simple way to test whether I have the GPUDirect dependencies correctly installed, that would be helpful as well!
Yes, it is possible with supporting networking hardware. See the GPUDirect RDMA documentation.

Synchronization and time keeping of multiple applications

How would I implement a system that will keep 20 applications running on a closed network to stay synchronized whilst performing various tasks?
Each application will be identical, on an identical machine. These machines will have a socket connection to the master application that will issue TCP commands to the units such as Play:"Video1.mp4". It is vital that these videos are played at the same time and keep time with each other.
The only difference between each unit is that the window will be offset on the desktop, so that each one has a different view port on the application - as this will be used in a multi-projector set up.
Any solutions/ideas would be greatly appreciated.
I did something like this some years ago: 5 computers running 5 instances of the same Flash app. Every app displayed a "slice" of the same huge application, and everything needed to be synchronized to within fractions of a second.
I used a simple Python script (running on a 6th machine) that sent OSC messages over the local network. The Flash apps listened to these packets through FLOSC and sent status messages back to the Python script.
The setup ran at the Whitney Museum (NY) and at the Palais de Tokyo (Paris), so I'm quite confident about the solution :) I hope it helps you.
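A minimal sketch of the same pattern using only the Python standard library: a master that broadcasts a timestamped play command over UDP so every unit schedules the same start time (the port, the message format, and the two-second lead are made up for illustration, and the units' clocks are assumed to be NTP-synced):

    # master.py - broadcasts a timestamped command to every unit on the LAN
    import json
    import socket
    import time

    BROADCAST = ("255.255.255.255", 9100)  # hypothetical port for the units

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

    # Tell every unit to start the same video two seconds from now, so
    # slower receivers still have time to preload before the shared deadline.
    command = {"cmd": "play", "file": "Video1.mp4", "start_at": time.time() + 2.0}
    sock.sendto(json.dumps(command).encode(), BROADCAST)

    # unit.py - each of the 20 machines waits for the shared start time
    import json
    import socket
    import time

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", 9100))

    data, _ = sock.recvfrom(4096)
    command = json.loads(data)
    time.sleep(max(0.0, command["start_at"] - time.time()))  # clocks assumed NTP-synced
    print("playing", command["file"])  # hand off to the real video player here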
You have to keep track of the latest state in your master application and broadcast any newly updated data to all connected clients. After any update from any client, the master has to push the updated data back out to every connected client.
In FMS (Flash Media Server) a remote shared object is used to maintain data centrally across the network-connected applications. When any client sends an update, an OnSync event is fired in every client application and the data is synchronized through the FMS remote shared object. This is the kind of flow you have to develop for proper synchronization of data across the network.
You can also use an RPC system to sync data between all applications connected to the master: a client issues an RPC to the master application to send its data update, and the master then issues RPCs to all of the other clients connected to it.