adaptive load balancing with gnu parallel - google-compute-engine

Is there some way to run gnu parallel with a dynamically changing list of remote hosts? The dynamism isn't intermittent or irregular -- I'm looking for a way to use the Google compute engine autoscaling feature to smoothly scale up to a max number of hosts and have gnu parallel dispatch jobs as these hosts come alive. I guess I could create a fake job to trigger autoscaling, so that multiple hosts launch and register themselves in some central host file. Any ideas on how best to manage this?

From man parallel:
--slf filename
File with sshlogins. The file consists of sshlogins on
separate lines.
:
If the sshloginfile is changed it will be re-read when a
job finishes though at most once per second. This makes it
possible to add and remove hosts while running.
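A minimal sketch of the "central host file" idea from the question, in Python. It rewrites the sshloginfile atomically (write to a temporary file, then rename), so parallel should never read a half-written list. The file name and the worker host names are assumptions for illustration; how you discover live hosts (for example by polling the autoscaler) is left out.

```python
import os
import tempfile


def write_sshloginfile(path, hosts):
    """Atomically replace the sshloginfile with one sshlogin per line."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on
    # one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".slf-")
    try:
        with os.fdopen(fd, "w") as tmp:
            for host in sorted(set(hosts)):  # de-duplicate, stable order
                tmp.write(host + "\n")
        os.replace(tmp_path, path)  # readers see the old or the new file
    except BaseException:
        os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    # Hypothetical host names; in practice these would come from
    # whatever registers your autoscaled instances.
    write_sshloginfile("hosts.txt", ["worker-1", "worker-2"])
```

You would then start the run with something like parallel --slf hosts.txt ... and call write_sshloginfile again whenever the set of live hosts changes; per the man page excerpt above, the file is re-read when a job finishes.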


Single CANOE configuration to handle multiple(2) sub canoe configurations

I want a single CANoe configuration that prompts the user to select the CAN speed of interest. I have developed two separate CANoe configurations, one for high-speed CAN (500 kbps) and another for CAN FD (2000 kbps), and each has its own set of simulated nodes. I don't want to keep separate configurations; instead I want one configuration that loads the respective configuration when the user selects a CAN speed.
Can I integrate these two separate configurations into one, so that selecting HS-CAN displays its simulated nodes in the simulation setup, and selecting CAN FD displays its simulated nodes instead?
Well, it depends on how dirty you want it to be...
If you're not using VN89xx in standalone mode, there is no "standard" way to have a "master" configuration and then load other configs.
However, you can call CANoe from within CANoe, with SysExecCmd canoe32.exe /a /f "config" to launch CANoe and then immediately exit CANoe from within CAPL. That's dirty but should work (see the CANoe and CAPL help for references). Oh, and don't forget to reopen the master configuration after the other configuration has finished. This has load times, though...
You can also manually integrate both simulations into the same configuration on separate CAN buses. You must then ensure that only one bus can run at a time. If you have two piggies: fine. Wire them together externally and you're done. If you don't, each time you change bus (via a CAPL script, stopping and starting the respective simulation), the CAN parameters must be reset to the required specification. Then, in the hardware configuration, set both channels to the same transceiver and ignore the popup telling you that this is a bad idea (works only with CAN).
Worst would be, of course, to implement everything onto the same bus in CANoe and handle it there.
Best case would be: don't have such requirements, or use external tools. Or users that know what they're doing. CANoe is almost fully scriptable via ActiveX, if you want to go down that rabbit hole...
Sorry to say but this is generally one of the things you want to avoid. ;-)

What would be an ideal way to share writable volume across containers for a web server?

The application in question is Wordpress, I need to create replicas for rolling deployment / scaling purposes.
It seems I can't create more than one instance of the same container if it uses a persistent volume (GCP term):
The Deployment "wordpress" is invalid: spec.template.spec.volumes[0].gcePersistentDisk.readOnly: Invalid value: false: must be true for replicated pods > 1; GCE PD can only be mounted on multiple machines if it is read-only
What are my options? There will be occasional writes and many reads. Ideally writable by all containers. I'm hesitant to use the network file systems as I'm not sure whether they'll provide sufficient performance for a web application (where page load is rather critical).
One idea I have is, create a master container (write and read permission) and slaves (read only permission), this could work - I'll just need to figure out the Kubernetes configuration required.
In https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistent-volumes you can see a table with the possible storage classes that allow ReadWriteMany (the option you are looking for).
AzureFile (not suitable if you are using GCP)
CephFS
Glusterfs
Quobyte
NFS
PortworxVolume
The one I've tried is NFS. I had no issues with it, but I guess you should also consider potential performance issues. However, if the writes are only occasional, it shouldn't be much of an issue.
I think what you are trying to solve is having a central location for WordPress media files. In that case this would be a better solution: https://wordpress.org/plugins/gcs/
Make your Kubernetes workload truly stateless and you can scale horizontally.
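For reference, here is a rough sketch of what the NFS variant might look like, written as a small Python script that emits the manifests as JSON (kubectl accepts JSON as well as YAML). The volume name, NFS server address, export path and size are all made-up placeholders, and I'm assuming you already have an NFS server reachable from the cluster.

```python
import json


def nfs_volume_manifests(name, server, path, size="10Gi"):
    """Build a PersistentVolume and matching claim with ReadWriteMany."""
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteMany"],  # writable from many pods
            "nfs": {"server": server, "path": path},
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "",  # bind statically, no dynamic provisioner
            "volumeName": name,      # bind to the volume defined above
            "resources": {"requests": {"storage": size}},
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [pv, pvc]}


if __name__ == "__main__":
    # Placeholder values; replace with your own NFS server and export.
    manifests = nfs_volume_manifests(
        "wordpress-content", "10.0.0.2", "/exports/wordpress"
    )
    print(json.dumps(manifests, indent=2))
```

Piping the output into kubectl apply -f - should create both objects; the Deployment then mounts the claim by name in every replica.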
You can use Regional Persistent Disk. It can be mounted to many nodes (hence pods) in RW mode. These nodes can be spread across two zones within one region. Regional PDs can be backed by standard or SSD disks. Just note that as of now (September 2018) they are still in beta and may be subject to backward-incompatible changes.
Check the complete spec here:
https://cloud.google.com/compute/docs/disks/#repds

How do sites like codepad.org and ideone.com sandbox your program?

I need to compile and run user-submitted scripts on my site, similar to what codepad and ideone do. How can I sandbox these programs so that malicious users don't take down my server?
Specifically, I want to lock them inside an empty directory and prevent them from reading or writing anywhere outside of that, from consuming too much memory or CPU, or from doing anything else malicious.
I will need to communicate with these programs via pipes (over stdin/stdout) from outside the sandbox.
codepad.org has something based on geordi, which runs everything in a chroot (i.e. restricted to a subtree of the filesystem) with resource restrictions, and uses the ptrace API to restrict the untrusted program's use of system calls. See http://codepad.org/about .
I've previously used Systrace, another utility for restricting system calls.
If the policy is set up properly, the untrusted program would be prevented from breaking anything in the sandbox or accessing anything it shouldn't, so there might be no need to put programs in separate chroots and create and delete them for each run. That would provide another layer of protection, though, which probably wouldn't hurt.
Some time ago I was searching for a sandbox solution to use in an automated assignment evaluation system for CS students. Much like everything else, there is a trade-off between the various properties:
Isolation and access control granularity
Performance and ease of installation/configuration
I eventually decided on a multi-tiered architecture, based on Linux:
Level 0 - Virtualization:
By using one or more virtual machine snapshots for all assignments within a specific time range, it was possible to gain several advantages:
Clear separation of sensitive from non-sensitive data.
At the end of the period (e.g. once per day or after each session) the VM is shutdown and restarted from the snapshot, thus removing any remnants of malicious or rogue code.
A first level of computer resource isolation: each VM has limited disk, CPU and memory resources and the host machine is not directly accessible.
Straight-forward network filtering: By having the VM on an internal interface, the firewall on the host can selectively filter the network connections.
For example, a VM intended for testing students of an introductory programming course could have all incoming and outgoing connections blocked, since students at that level would not have network programming assignments. At higher levels the corresponding VMs could e.g. have all outgoing connections blocked and allow incoming connection only from within the faculty.
It would also make sense to have a separate VM for the Web-based submission system - one that could upload files to the evaluation VMs, but do little else.
Level 1 - Basic operating-system constraints:
On a Unix OS that would contain the traditional access and resource control mechanisms:
Each sandboxed program could be executed as a separate user, perhaps in a separate chroot jail.
Strict user permissions, possibly with ACLs.
ulimit resource limits on processor time and memory usage.
Execution under nice to reduce priority over more critical processes. On Linux you could also use ionice and cpulimit - I am not sure what equivalents exist on other systems.
Disk quotas.
Per-user connection filtering.
You would probably want to run the compiler as a slightly more privileged user, with more memory and CPU time, access to compiler tools and header files, etc.
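To make the ulimit point a bit more concrete, here is a minimal sketch (Linux/Unix only) that starts an untrusted program with CPU-time and address-space limits and talks to it over stdin/stdout pipes. It uses Python's standard resource and subprocess modules; the function name and the default limits are my own picks for illustration. It covers only the resource-limit part of Level 1: no separate user, chroot, quotas or connection filtering.

```python
import resource
import subprocess
import sys


def _limit(kind, value):
    """Set soft and hard limit to value, never above the current hard limit."""
    _soft, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def run_limited(argv, stdin_data, cpu_seconds=2,
                memory_bytes=512 * 1024 * 1024, wall_seconds=10):
    """Run argv with rlimits applied; return (returncode, stdout bytes)."""
    def apply_limits():
        # Runs in the child after fork() and before exec().
        _limit(resource.RLIMIT_CPU, cpu_seconds)   # CPU time in seconds
        _limit(resource.RLIMIT_AS, memory_bytes)   # address space in bytes

    proc = subprocess.run(
        argv,
        input=stdin_data,          # sent to the child's stdin pipe
        capture_output=True,       # stdout/stderr come back via pipes
        preexec_fn=apply_limits,
        timeout=wall_seconds,      # wall-clock guard for sleeping programs
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    child = "import sys; sys.stdout.write(sys.stdin.read().upper())"
    print(run_limited([sys.executable, "-c", child], b"hello"))
```

A program that spins forever gets killed by the kernel once it exceeds its CPU allowance, which shows up as a negative return code (the signal number). The wall-clock timeout is there because RLIMIT_CPU does not count time spent sleeping.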
Level 2 - Advanced operating-system constraints:
On Linux I consider that to be the use of a Linux Security Module, such as AppArmor or SELinux to limit access to specific files and/or system calls. Some Linux distributions offer some sandboxing security profiles, but it can still be a long and painful process to get something like this working correctly.
Level 3 - User-space sandboxing solutions:
I have successfully used Systrace on a small scale, as mentioned in this older answer of mine. There are several other sandboxing solutions for Linux, such as libsandbox. Such solutions may provide more fine-grained control over the system calls that may be used than LSM-based alternatives, but can have a measurable impact on performance.
Level 4 - Preemptive strikes:
Since you will be compiling the code yourself, rather than executing existing binaries, you have a few additional tools in your hands:
Restrictions based on code metrics; e.g. a simple "Hello World" program should never be larger than 20-30 lines of code.
Selective access to system libraries and header files; if you don't want your users to call connect() you might just restrict access to socket.h.
Static code analysis; disallow assembly code, "weird" string literals (i.e. shell-code) and the use of restricted system functions.
A competent programmer might be able to get around such measures, but as the cost-to-benefit ratio increases they would be far less likely to persist.
Level 0-5 - Monitoring and logging:
You should be monitoring the performance of your system and logging all failed attempts. Not only would you be more likely to interrupt an in-progress attack at a system level, but you might be able to make use of administrative means to protect your system, such as:
calling whatever security officials are in charge of such issues.
finding that persistent little hacker of yours and offering them a job.
The degree of protection that you need and the resources that you are willing to expend to set it up are up to you.
I am the developer of libsandbox mentioned by @thkala, and I do recommend it for use in your project.
Some additional comments on @thkala's answer:
it is fair to classify libsandbox as a user-land tool, but libsandbox does integrate standard OS-level security mechanisms (i.e. chroot, setuid, and resource quota);
restricting access to C/C++ headers, or static analysis of users' code, does NOT prevent system functions like connect() from being called. This is because user code can (1) declare function prototypes by themselves without including system headers, or (2) invoke the underlying, kernel-land system calls without touching wrapper functions in libc;
compile-time protection also deserves attention because malicious C/C++ code can exhaust your CPU with infinite template recursion or pre-processing macro expansion.

Simple scalable work/message queue with delay

I need to set up a job/message queue with the option to set a delay for the task so that it's not picked up immediately by a free worker, but after a certain time (can vary from task to task). I looked into a couple of linux queue solutions (rabbitmq, gearman, memcacheq), but none of them seem to offer this feature out of the box.
Any ideas on how I could achieve this?
Thanks!
I've used BeanstalkD to great effect, using the delay option on inserting a new job to wait several seconds till the item becomes available to be reserved.
If you are doing longer-term delays (more than say 30 seconds), or the jobs are somewhat important to perform (albeit later), then it also has a binary logging system so that any daemon crash would still have a record of the job. That said, I've put hundreds of thousands of live jobs through Beanstalkd instances and the workers that I wrote were always more problematic than the server.
You could use an AMQP broker (such as RabbitMQ) and have an "agent" (e.g. a Python process built using python-amqplib) that sits on an exchange and intercepts specific messages (specific routing_key); once a timer has elapsed, it sends the message back to the exchange with a different routing_key.
I realize this means "translating/mapping" routing keys but it works. Working with RabbitMQ and python-amqplib is very straightforward.
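Both answers boil down to the same idea: hold a job until its ready time, and only then let a worker see it. Here is a tiny in-memory sketch of that behaviour in Python. To be clear, this is neither Beanstalkd nor RabbitMQ code; the class and method names are made up, and there is no persistence or networking. It only illustrates the semantics of a per-job delay.

```python
import heapq
import itertools
import time


class DelayQueue:
    """Jobs become reservable only after their individual delay has passed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock                # injectable for testing
        self._heap = []                    # entries: (ready_at, seq, job)
        self._seq = itertools.count()      # tie-breaker keeps FIFO order

    def put(self, job, delay=0.0):
        ready_at = self._clock() + delay
        heapq.heappush(self._heap, (ready_at, next(self._seq), job))

    def reserve(self):
        """Return the next ready job, or None if nothing is ready yet."""
        if self._heap and self._heap[0][0] <= self._clock():
            return heapq.heappop(self._heap)[2]
        return None


if __name__ == "__main__":
    queue = DelayQueue()
    queue.put("send-reminder", delay=0.2)
    print(queue.reserve())   # not ready yet
    time.sleep(0.25)
    print(queue.reserve())   # ready now
```

A worker would poll reserve() in a loop (or sleep until the earliest ready time). In the AMQP variant, the "agent" plays the role of this queue and re-publishes each message under the second routing key once it is ready.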

How to get your code ready for Loadbalancing

As we did this in the past, I'd like to gather useful information for everyone moving to loadbalancing, as there are issues which your code must be aware of.
We moved from one apache server to squid as reverse proxy/loadbalancer with three apache servers behind.
We are using PHP/MySQL, so issues may differ.
Things we had to solve:
Sessions
We moved from "default" php sessions (files) to distributed memcached-sessions. Simple solution, has to be done. This way, you also don't need "sticky sessions" on your loadbalancer.
Caching
To our non-distributed apc-cache per webserver, we added another memcached-layer for distributed object caching, and replaced all old/outdated filecaching systems with it.
Uploads
Uploads go to a shared (nfs) folder.
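To illustrate why the shared session store removes the need for sticky sessions, here is a toy sketch in Python (our real setup is PHP with memcached, so treat this purely as an illustration). The SharedSessionStore class is a dict-backed stand-in for memcached, and all names are invented for the example.

```python
class SharedSessionStore:
    """Stand-in for a memcached cluster reachable from every web node."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        value = self._data.get(key)
        return dict(value) if value is not None else None  # copy, like a network read

    def set(self, key, value):
        self._data[key] = dict(value)


class WebNode:
    """One web server behind the load balancer."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id):
        key = "session:" + session_id
        session = self.store.get(key) or {"hits": 0}
        session["hits"] += 1
        session["last_node"] = self.name
        self.store.set(key, session)
        return session["hits"]


if __name__ == "__main__":
    store = SharedSessionStore()
    nodes = [WebNode("web%d" % i, store) for i in (1, 2, 3)]
    # Round-robin: every request of the same user lands on a different node.
    for request_number in range(4):
        node = nodes[request_number % len(nodes)]
        print(node.name, node.handle("abc123"))
```

The hit counter keeps increasing even though each request is served by a different node, because no node keeps session state locally.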
Things we optimized for speed:
Static Files
Our main NFS runs a lighttpd, serving (also user-uploaded) images. Squid is aware of that and never queries our apache-nodes for images, which gave a nice performance boost. Squid is also configured to cache those files in ram.
What did you do to get your code/project ready for loadbalancing, any other concerns for people thinking about this move, and which platform/language are you using?
When doing this:
For http nodes, I push hard for a single system image (ocfs2 is good for this) and use either pound or crossroads as a load balancer, depending on the scenario. Nodes should have a small local disk for swap and to avoid most (but not all) headaches of CDSLs.
Then I bring Xen into the mix. If you place a small, temporal amount of information on Xenbus (i.e. how much virtual memory Linux has actually promised to processes per VM aka Committed_AS) you can quickly detect a brain dead load balancer and adjust it. Oracle caught on to this too .. and is now working to improve the balloon driver in Linux.
After that I look at the cost of splitting the database usage for any given app across sqlite3 and whatever db the app wants natively, while realizing that I need to split the db so posix_fadvise() can do its job and not pollute kernel buffers needlessly. Since most DBMS services want to do their own buffering, you must also let them do their own clustering. This really dictates the type of DB cluster that I use and what I do to the balloon driver.
Memcache servers then boot from a skinny initrd, again while the privileged domain watches their memory and CPU use so it knows when to boot more.
The choice of heartbeat / takeover really depends on the given network and the expected usage of the cluster. It's hard to generalize that one.
The end result is typically 5 or 6 physical nodes with quite a bit of memory booting a virtual machine monitor + guests while attached to mirrored storage.
Storage is also hard to describe in general terms.. sometimes I use cluster LVM, sometimes not. The not will change when LVM2 finally moves away from its current string based API.
Finally, all of this coordination results in something like Augeas updating configurations on the fly, based on events communicated via Xenbus. That includes ocfs2 itself, or any other service where configurations just can't reside on a single system image.
This is really an application specific question .. can you give an example? I love memcache, but not everyone can benefit from using it, for instance. Are we reviewing your configuration or talking about best practices in general?
Edit:
Sorry for being so Linux centric ... it's typically what I use when designing a cluster.