How to set up embarrassingly parallel jobs on GCE

What is the simplest way to set up embarrassingly parallel jobs on GCE? I have used Slurm on other clusters but would like to avoid the configuration and installation issues. Is there something available out of the box on Google Cloud?

The easiest option is probably not to use GCE directly! Google Cloud has a few other products that automatically manage the configuration and installation of parallel solutions for you:
Dataflow, based on Apache Beam
Dataproc, based on Hadoop (and Spark)
BigQuery, if your job can be expressed as a SQL query.
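To illustrate what "embarrassingly parallel" means for these services, here is a local Python stand-in (not Dataflow code): each input is handled by a function that depends only on that input, so the inputs can be spread over any number of workers. In Beam, a per-item function like this is what you would pass to `beam.Map`; here a standard-library thread pool stands in for the workers. `simulate` and `run_all` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed):
    """Hypothetical per-task work: the result depends only on `seed`."""
    value = seed
    for _ in range(10_000):
        value = (value * 1_103_515_245 + 12_345) % 2**31
    return value


def run_all(seeds, max_workers=4):
    """Apply `simulate` to every seed independently; results keep input order.

    Threads stand in for workers here. For CPU-bound Python code, separate
    processes or machines (e.g. Dataflow workers) give real parallelism.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(simulate, seeds))


if __name__ == "__main__":
    # No task reads another task's output, so order of execution does not matter.
    print(run_all(range(8)))
```

The property that matters is that no task reads another's output; that is what lets a managed service split the inputs across machines for you.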

Related

What options do we have to build our own MySQL private cloud?

My company wants to build a private database cloud service for internal use.
The requirements are:
Users can easily request a new MySQL instance and terminate it.
Each MySQL instance is isolated from the others.
One solution we have considered is simply creating a separate user and schema for each user, similar to what cPanel does.
But I wonder whether there is a better option available.
Honestly, I don't think putting everybody on a single big MySQL instance is a good idea.
First, we can't do much about resource management. Second, I am afraid that a problem with the database (for example, it fails to start) would take down everybody.
To minimize the risk of a single point of failure, I am looking for something like Amazon RDS or Azure Database for MySQL; what we want is very similar.
Does anybody know how they do that? Is there an open-source or commercial product we can use or buy?
Thank you.
Without knowing why you need this, it is hard to give you a good answer. If Amazon RDS or Azure Database for MySQL look like a good fit, I would suggest using them: building such a service yourself and making sure it scales well will probably cost a lot more money.
Sure, you could set up Kubernetes or HashiCorp Nomad and deploy MySQL containers there, but you would need to learn how these tools work, work out how to run MySQL in a scalable way, and build some kind of UI to easily launch and stop MySQL instances.
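To give a sense of what deploying MySQL containers on Kubernetes involves, here is a minimal sketch of just one piece: a Python function (hypothetical name) that emits a headless Service and a single-replica StatefulSet for one isolated MySQL instance, each with its own PersistentVolumeClaim. It assumes a pre-created Secret named `<name>-credentials` with a `root-password` key (a hypothetical naming convention) and uses the official `mysql` image. Backups, resource limits, monitoring, and the self-service UI are all left out, which is exactly the extra work described above.

```python
import json


def mysql_instance_manifests(name, storage="10Gi", image="mysql:8.0"):
    """Build a Service + StatefulSet for one isolated MySQL instance.

    Assumes a Secret f"{name}-credentials" with key "root-password" exists.
    """
    labels = {"app": "mysql", "instance": name}
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "clusterIP": "None",  # headless Service, used as the StatefulSet's serviceName
            "selector": labels,
            "ports": [{"name": "mysql", "port": 3306}],
        },
    }
    statefulset = {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "serviceName": name,
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "mysql",
                        "image": image,
                        "ports": [{"name": "mysql", "containerPort": 3306}],
                        "env": [{
                            "name": "MYSQL_ROOT_PASSWORD",
                            "valueFrom": {"secretKeyRef": {
                                "name": f"{name}-credentials",
                                "key": "root-password",
                            }},
                        }],
                        "volumeMounts": [{"name": "data", "mountPath": "/var/lib/mysql"}],
                    }],
                },
            },
            # Each instance gets its own volume, so its data survives pod restarts.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [service, statefulset]}


if __name__ == "__main__":
    print(json.dumps(mysql_instance_manifests("team-a-db"), indent=2))
```

Piping the printed JSON into `kubectl apply -f -` would create the instance. Deleting the StatefulSet and Service stops it, but the PersistentVolumeClaim (`data-team-a-db-0` here) is kept by default and has to be deleted separately.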

Kubernetes: web interface to start a pod

Background:
As a back-office service for our actuaries, a daily CronJob runs a pod.
Inside the pod, fairly complex future simulations take place.
The pod has two containers, an application server and a db server.
The process has a few variables, which are fed into the pod.
This is done through ConfigMaps and container environment variables.
When the pod finishes after approximately 10 hours, it copies the resulting database to another database and then it's done.
It runs daily because market data changes daily, and because we also test our new codebase every day.
Great value, high degree of standardisation, fully automated.
So far so good.
But it uses the same configuration every time it runs.
Now what?
Our mathematicians would like to be able to start the pod with their own configuration data,
for example from a web page with configurable input fields.
Question:
Is there an existing Kubernetes framework implementing this?
"Provide a webpage with configurable input fields which are transformed into configmaps and env variables starting the pod"?
Sure, not too difficult to write.
But part of the reason we do cloud-native computing is to reuse existing solutions to general problems instead of writing them ourselves whenever possible.
Thanks for any hints in advance.
They can start a Kubernetes Job for one-time tasks. Apart from the Google Cloud Console UI, I'm not aware of a UI where you can configure fields for a ConfigMap. Maybe you could write a custom Python script that launches these Jobs.
https://kubernetes.io/docs/concepts/workloads/controllers/job/
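A minimal sketch of such a script, assuming the web form (or any other front end) hands it a dictionary of parameters: it builds a ConfigMap from those parameters and a Job whose container loads every key as an environment variable through `envFrom`, and prints JSON that `kubectl apply -f -` accepts. The run id, image name, and parameter names are hypothetical, and the sketch uses a single container, whereas the pod in the question has an application server and a database server.

```python
import json
import re


def simulation_run_manifests(run_id, params,
                             image="registry.example.com/simulation:latest"):
    """Build a ConfigMap with the user's parameters and a Job that reads them.

    `run_id`, `image`, and the parameter names are hypothetical placeholders.
    """
    # Object names must be lowercase alphanumerics and '-' (DNS-1123 label style).
    if not re.fullmatch(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?", run_id):
        raise ValueError(f"invalid run id: {run_id!r}")
    name = f"simulation-{run_id}"
    config_map = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        # ConfigMap data values must be strings.
        "data": {key: str(value) for key, value in params.items()},
    }
    job = {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed run
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "simulation",
                        "image": image,
                        # Every ConfigMap key becomes an environment variable.
                        "envFrom": [{"configMapRef": {"name": name}}],
                    }],
                },
            },
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [config_map, job]}


if __name__ == "__main__":
    manifests = simulation_run_manifests("run-42", {"SCENARIO": "stress", "YEARS": 30})
    print(json.dumps(manifests, indent=2))
```

Keys loaded through `envFrom` should be valid environment variable names. Setting `backoffLimit` to 0 is a design choice so that a failed 10-hour run is reported instead of restarted; the Kubernetes default is 6 retries.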

Should I run MySQL on Google Cloud Run (or any database)?

I've been researching the new option of running Docker containers on Google Cloud Run, but I can't find any advice on whether one should run MySQL on Cloud Run. I know it isn't a web service, and I understand that Google's official GCP documentation would probably just tell people to use Cloud SQL (their SQL offering), but I haven't found any discussion online about running MySQL on Cloud Run, so I thought I'd ask here.
Would cold-start times hurt the performance of such a setup (assuming the data is stored in a Cloud Storage bucket)?
Running a SQL database is not a good fit for Cloud Run.
First, the contract between the deployed container and Cloud Run is that the container must run an HTTP server listening on the port given by the PORT environment variable (8080 by default). That's not how MySQL works: it speaks its own client/server protocol, by default on port 3306.
Second, the container is limited to the filesystem that was included in the container image, and that same image is instantiated many times over as the service handles load. There is no way to persist the data written to MySQL. You could store read-only data in the image that only changes when a new image is published, but that's not really what you would use a relational database for.
Cloud Run is really good at operating HTTP/web services in a serverless and scalable way. These web services typically make use of other APIs and services deployed to Google Cloud, or of third-party services. It's not meant to offer persistent, scalable, ACID-compliant database services; that is a whole different problem space.
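To make the HTTP contract concrete, a minimal sketch of the kind of process Cloud Run expects, using only Python's standard library: it reads the port from the `PORT` environment variable (falling back to 8080) and answers plain HTTP requests. A MySQL server instead waits for clients speaking its own binary protocol, which is why it does not fit this model. `HelloHandler`, `resolve_port`, and `make_server` are hypothetical names.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET request with a short plain-text body."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def resolve_port(env=None):
    """Return the port from $PORT, falling back to 8080 if it is unset."""
    env = os.environ if env is None else env
    return int(env.get("PORT", "8080"))


def make_server(port=None):
    """Create an HTTP server bound to all interfaces on the resolved port."""
    return HTTPServer(("0.0.0.0", resolve_port() if port is None else port), HelloHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```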

Kubernetes cluster on GCE from existing instances/instance group

I have a Kubernetes computation cluster running on GCE and am reasonably happy so far. I know that if I create a Kubernetes cluster, I'll see the nodes as VM instances and the cluster as an instance group. I would like to do it the other way around: create the instances/instance group myself and make a Kubernetes cluster out of them, so they can be managed by Kubernetes. The reason is that I want to try making the nodes preemptible, which might better fit my workload.
So the question is: how do I set up a Kubernetes cluster with preemptible nodes? I can do one or the other now, but not both together.
There is a patch out for review at the moment (#12384) that adds a configuration option to mark the nodes in the instance group as preemptible. If you are willing to build from head, this should be available as a configuration option in the next couple of days. In the meantime, you can see from the patch how easy it is to modify the GCE startup scripts to make your VMs preemptible.
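On the GCE side, making node VMs preemptible comes down to creating them from an instance template with the preemptible option set. A sketch in Python that assembles (but does not run) the corresponding `gcloud` commands; the name prefix, zone, machine type, and the `node-startup.sh` path are placeholders, and that startup script stands in for whatever node bootstrap (kubelet setup, joining the cluster) your cluster scripts provide, which this sketch does not cover.

```python
import shlex


def preemptible_group_commands(prefix, size, zone,
                               machine_type="e2-standard-4",
                               startup_script="node-startup.sh"):
    """Build (but do not run) gcloud commands for a group of preemptible VMs.

    All arguments are placeholders; `startup_script` is a hypothetical file that
    would have to bootstrap each VM as a Kubernetes node.
    """
    template = f"{prefix}-template"
    create_template = [
        "gcloud", "compute", "instance-templates", "create", template,
        f"--machine-type={machine_type}",
        # Preemptible VMs can be stopped by GCE at any time and run at most 24 hours.
        "--preemptible",
        f"--metadata-from-file=startup-script={startup_script}",
    ]
    create_group = [
        "gcloud", "compute", "instance-groups", "managed", "create",
        f"{prefix}-group",
        f"--template={template}",
        f"--size={size}",
        f"--zone={zone}",
    ]
    return [create_template, create_group]


if __name__ == "__main__":
    for command in preemptible_group_commands("k8s-nodes", 3, "us-central1-a"):
        print(shlex.join(command))
```

Putting the preemptible VMs in a managed instance group, rather than creating them one by one, means the group tries to recreate preempted instances when capacity becomes available.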

Which is the better approach: MySQL on a GCE VM or Google Cloud SQL?

I want to use a small MySQL database to store some data that I am going to compute on a GCE VM (using Talend).
After storing the data in MySQL, I want to connect to it from Excel and update some records.
What is the best approach: installing MySQL on the VM or using Google Cloud SQL?
Kind Regards
Only you can decide what better fits your needs, but you may consider the following:
Local MySQL pros:
Faster performance. This could matter if you generate a lot of queries: you would need a bigger Cloud SQL instance to get similar speed.
Lower costs.
Cloud SQL pros:
High reliability. Data is backed up without the need to take snapshots.
You can stop or delete the GCE instance and keep the database running.
Easier and faster to scale if required.
Easily add a read replica.
Less load on the GCE instance.
Sincerely,
Paolo
For better performance, you can run your MySQL server on a virtual machine. I tried it with the same specifications as a Cloud SQL instance (1 CPU, 3.75 GB memory) and it ran much better.