How often do Google Cloud preemptible instances get preempted (roughly)? - google-compute-engine

I see that Google Cloud may terminate preemptible instances at any time, but have any unofficial, independent studies been reported, showing "preempt rates" (number of VMs preempted per hour), perhaps sampled in several different regions?
Given how little information I'm finding (as with similar questions), even anecdotes such as "Looking back over the past 6 months, I generally see 3% - 5% of instances preempted per hour in us-west1" would be useful (I presume this can be monitored similarly to instance count metrics in AWS).
Clients occasionally want to shove their existing, non-fault-tolerant code into the cloud for "cheap" (despite best practices), and without an expected failure rate to plan around, they're often blindsided after being drawn in by preemptible pricing. I'd like to share some typical experiences from the GCP community, even though individual experiences vary, to help set safe expectations.

Regarding "unofficial, independent studies", "even anecdotes such as:", and "Clients occasionally want to shove their existing, non-fault-tolerant code in the cloud for "cheap"": it should be said that no architect or sysadmin in their right mind would place production workloads with a defined SLA into an execution environment that offers no SLA. Hence the topic is rather speculative.
For those who are keen, Google provides a preemption rate expectation:
For reference, we've observed from historical data that the average preemption rate varies between 5% and 15% per day per project, on a seven-day average, occasionally spiking higher depending on time and zone. Keep in mind that this is an observation only: preemptible instances have no guarantees or SLAs for preemption rates or preemption distributions.
Besides that, there are interesting edutainment approaches to the task of "how to make the inapplicable applicable".
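If you do want to measure your own project's rate (as the question suggests), preemptions are recorded as compute.instances.preempted operations, so you can count them with gcloud. Below is a minimal sketch, assuming the gcloud CLI is installed and authenticated; the project ID is a placeholder, and the operation history is only retained for a limited time, so treat the count as a sample:

```python
# Rough sketch: count preemption events for a project via gcloud.
import json
import subprocess

def count_preemptions(project: str) -> int:
    """Count compute.instances.preempted operations still present in
    gcloud's operation history (retention is limited)."""
    result = subprocess.run(
        [
            "gcloud", "compute", "operations", "list",
            "--project", project,
            "--filter", "operationType=compute.instances.preempted",
            "--format", "json",
        ],
        capture_output=True, text=True, check=True,
    )
    return len(json.loads(result.stdout))

if __name__ == "__main__":
    n = count_preemptions("my-project-id")  # hypothetical project ID
    print(f"preemption events in recent operation history: {n}")
```

Divide that count by the number of preemptible instances you had running over the same window to get a rough per-hour or per-day rate for your own workload.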

Related

Scheduling Optimization for multi-step growth modeling

I recently had a Gaussian Process machine learning program built for my production department. This GP system has built a massive MySQL database that provides growth durations for each of the organisms we grow (lab environment) and the predicted yield for each of those combinations of growth steps.
I would like to build an optimization program in python (preferably) to assist me in scheduling what organisms to grow, when to grow them, and for how long at each step.
Here is some background:
4 steps to the process
Plate step (organism is plated; growth is started)
Seed step (organism transferred from plate to seed phase)
Incubation step (organism is transferred from seed to incubation phase)
Harvest step (organism is harvested; yield collected)
There are multiple organisms (>50) grown per year. Each has its own numerical ID.
There is finite space to grow organisms at the incubation step
There is infinite space to grow organisms at the plate and seed steps.
Multiple 'lots' of the same organism are typically grown at a time. A lot is predefined by the number of containers being used at the incubation step.
Different organisms have very different maximum yields. Some yield 2000 grams max and others 600 g max.
The MySQL server has every combination of # of days at each step for each organism, and the predicted yield for that combination. This data is what needs to be used for optimization.
The massive challenge we run into is scheduling which organisms to grow when. With the GP process, we know the theoretical maximums (and they work!), but it's hard to put them into practice due to constraints (see below).
Here would be my constraints:
Only one organism can be harvested per day.
No steps can be started on weekends. Organisms can grow over the weekend, but we can't start a new step on a weekend
If multiple 'lots' are being grown of the same mold, the plate and seed start dates should be the same for every 'lot'.
- What this typically looks like in practice is:
- plate and seed steps start on the same day
- next, incubation steps start day-after-day for as many lots as being made
- finally, harvests occur in the same pattern (day-after-day)
- Therefore, what you typically get is an identical # of days in the plate phase, an identical # of incubation days, and a differing # of seed days.
Objective Function: I don't know how to articulate this perfectly, but very broadly we need to maximize the yields for each organism. However, there needs to be a time balance too as the space to grow the organisms is finite and the time we have to grow them is finite as well.
I have created a metric known as lot-weeks that tries to capture that. It is a measure of the number of weeks (at the incubation phase) needed to grow the expected annual demand of a specific organism, based on the predicted yield from the SQL server. Therefore, a potential objective function would be to minimize the lot-weeks for each organism.
This is obviously more of a broad ask for help. I don't have a specific request. If this is not appropriate for this forum, I can take my question elsewhere. I feel comfortable with the scope of the project and can figure out how to write the code over time but I need assistance with what tools to use and what's possible.
I've seen that pyomo may be helpful but I also wanted to check here first. Thank you
I've tried looking into using Pyomo but stopped due to the complexity and didn't want to learn all of it if it wasn't appropriate for the problem.
Edit: This was too broad, I apologize. I've created another post with more concrete examples. Thank you for all that helped.
This is really too broad of a question for this forum, and it may likely get closed. That said...
You have a framework here that you could develop an optimization in. The database part is irrelevant. For an effective optimization model, what you really need is a known relationship between the variables and the outcomes, for instance, days in incubation ==> size of harvest. It sounds like you have that.
This isn't an entry level model you are describing. Do you have any resources to help? A local university that might have grad students looking for projects in the field, or such?
As you develop this, you should start small and focus the model on the key issues here... if they aren't known, then perhaps that is the place to start. For instance, perhaps the key issue is management of planting times vis-a-vis the weekends (that is one model). Or perhaps the key issue is the management of the limited space for growth and the inability to achieve steps on the weekend just kinda works itself out. (That is another model for space management.) Try one that seems to address key management questions. Start very small and see if you can get something working as a proof of concept. If this is your first foray into linear programming, you will need help. You might also start with an introductory textbook on LP.
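To make "start very small" concrete, here is a minimal Pyomo sketch of the flavor of model involved, since Pyomo is what the question mentions. Everything in it is hypothetical: the lot names, the 8-week horizon, the made-up yield table, and the one-week incubation assumption. The real yields would come from the MySQL data, and the weekend and harvest-sequencing rules would be added as further constraints:

```python
# Toy assignment model: pick an incubation start week per lot, subject to
# finite incubation capacity, maximizing total predicted yield.
import pyomo.environ as pyo

lots = ["A1", "A2", "B1"]            # hypothetical lots
weeks = list(range(8))               # hypothetical 8-week horizon
capacity = 2                         # incubation slots available per week
# hypothetical predicted yield if lot l incubates in week w
pred_yield = {(l, w): 100 + 10 * w for l in lots for w in weeks}

m = pyo.ConcreteModel()
m.x = pyo.Var(lots, weeks, domain=pyo.Binary)  # x[l, w] = 1 if lot l uses week w

# each lot is scheduled exactly once
m.assign = pyo.Constraint(lots, rule=lambda m, l: sum(m.x[l, w] for w in weeks) == 1)
# finite incubation space in any week
m.cap = pyo.Constraint(weeks, rule=lambda m, w: sum(m.x[l, w] for l in lots) <= capacity)

m.obj = pyo.Objective(
    expr=sum(pred_yield[l, w] * m.x[l, w] for l in lots for w in weeks),
    sense=pyo.maximize,
)

pyo.SolverFactory("cbc").solve(m)    # any installed MILP solver (cbc, glpk, ...)
for l in lots:
    for w in weeks:
        if pyo.value(m.x[l, w]) > 0.5:
            print(f"lot {l} incubates in week {w}")
```

Each business rule tends to map to one small constraint of this shape: the one-harvest-per-day rule is another "<= 1" sum over day-indexed binaries, and weekends are handled by fixing the start variables for those days to zero.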

Compute Engine - Automatic scale

I have one Compute Engine VM hosting simple apps. My apps are growing, and so is the number of users.
My users work basically from 08:00 AM to 07:00 PM; during this period CPU and memory usage is high, and speed of work is very important.
I'm preparing to expand the memory and CPU in the next few days, but I'm looking for a more scalable and cost-effective approach.
Is there a way to automatically add resources when I need them and remove them when I no longer do?
Thanks
The cost of running your VMs is directly related to a number of different factors, e.g. the type of network in use (premium vs standard), the machine type, the boot disk image you use (premium vs open-source images) and the region/zone where your workloads are running, among other things.
Your use case seems to fit managed instance groups (MIGs). With MIGs you essentially configure a template for VMs that share the same attributes. When configuring your MIG, you specify the CPU/memory utilization threshold beyond which the MIG autoscaler kicks in and adds instances. When the reading stays below that threshold, the autoscaler scales the group back in, down to the minimum number of instances you configured.
You can also use requests per second as a threshold for autoscaling and I would recommend you explore the docs to know more about it.
See docs
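For a rough idea of the moving parts, a MIG with CPU-based autoscaling can be stood up with a handful of gcloud commands. A minimal sketch with hypothetical names, machine type, zone, and thresholds; verify the flags against the current docs:

```sh
# hypothetical names/values throughout
gcloud compute instance-templates create web-template \
    --machine-type=e2-medium

gcloud compute instance-groups managed create web-mig \
    --zone=us-central1-a --template=web-template --size=1

gcloud compute instance-groups managed set-autoscaling web-mig \
    --zone=us-central1-a \
    --min-num-replicas=1 --max-num-replicas=5 \
    --target-cpu-utilization=0.75 --cool-down-period=90
```

With this in place you pay for one small VM overnight, and the group grows toward five instances only while daytime demand keeps CPU utilization above the target.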

Some total newbie questions on NFT and Ethereum

I'm interested in the conceptual topic of creating rights management systems on the Ethereum blockchain, with digital assets represented by NFTs.
I am just reading up on how to write programs that run on Ethereum, but I have some very basic questions just to get started.
I read that NFTs are created on the Ethereum blockchain. I don't really understand if that is the same blockchain on which the currency Ether is maintained? It seems like the ledger will become impossibly huge if every currency transaction and every digital asset (and copy thereof) that migrates to Ethereum is stored in one single giant ledger, and each miner on the chain has to download the entire ledger to a single machine in order to validate transactions. Have I got a big misunderstanding there? I know there is talk about "sharding" in the future, but it seems like that isn't coming very soon.
Cost of running a smart contract on the blockchain? Assuming we are talking about the same blockchain, from what I can see the price of "gas" is quite high. I'm reading that an ETH transfer from one party to another costs 21,000 gas, about $0.03 today. Just trying to understand the basics: how much does it cost to create an NFT? And roughly how much does it cost to execute a simple function on the blockchain (without loops)? Let's say the equivalent of a five-statement function that takes a few simple params, reads a few blocks, doesn't write to the blockchain, but just performs some simple math and a few if statements and returns a string. Does that also cost, like, more than a penny? Is the ETH2 switch from proof of work to proof of stake going to bring those costs down by orders of magnitude?
Any good resources or references on how to write programs that create and manipulate NFTs on Ethereum? Most of what I have seen in the bookstores seems to cover financial transactions with Ether.
Yes, it's the same blockchain.
You can see in the stats that a full node (which stores the current state) currently takes about 400 GB, and an archive node (which stores historical states as well) takes about 6.6 TB.
My observation is that most web apps using blockchain data don't verify, and instead trust a third-party service running a node (such as Infura). And I believe that most end users or businesses who want/need to verify usually have the capacity to store 400+ GB and are able to scale.
But whether this amount of data is okay or "impossibly huge", I'll leave to your judgment. :)
Deployment of a token smart contract usually costs between 500k and 3M gas. My estimate is that most token contracts with basic features, compiled with an optimizer, cost around 1M gas to deploy. With current prices of ~200 Gwei/gas and $1800/ETH, that's about $350. But I remember that just a few months ago the average gas price was ~20 Gwei and ETH cost $500, so the same deployment would have been around $10. So yes, the cost of deploying a contract is very volatile.
A simple function that performs validations and transformations in memory is going to cost the base 21k plus a few hundred gas. (Working with memory data is cheap gas-wise; accessing storage is much more expensive.) So at current prices that's around $7; a few months ago it could have been $0.25.
As for the question of whether ETH2.0 is going to bring gas prices down: my opinion is that L2 (which should be released earlier than PoS) is going to have some effect on the price, since it allows for sidechain transactions (similar to the Lightning Network on Bitcoin). But this is a development forum, so I'm not going to dive deeper into price speculation.
I recommend the OpenZeppelin docs, where they cover their open-source implementations of the ERC standards (including ERC-721 NFTs), or googling the topic you're interested in and reading articles that catch your eye (at least that's my current approach).
And if you're new to Solidity in general, I recommend at least a few chapters of the CryptoZombies tutorial. In my opinion, the first few chapters are great and you'll learn a lot, but then the quality slowly fades.
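The gas arithmetic above generalizes into a one-liner worth keeping around. A small sketch; the numbers are this answer's example values, not current market prices:

```python
# Convert a gas amount to USD: gas * gas_price (Gwei) -> ETH -> USD.
def gas_to_usd(gas_used: int, gas_price_gwei: float, eth_usd: float) -> float:
    eth_spent = gas_used * gas_price_gwei * 1e-9  # 1 Gwei = 1e-9 ETH
    return eth_spent * eth_usd

# Example values from the answer above (not current market prices):
print(gas_to_usd(21_000, 200, 1800))     # simple transfer: ~$7.56
print(gas_to_usd(1_000_000, 200, 1800))  # 1M-gas contract deployment: ~$360
```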

Inter data center traffic of Google Compute VMs

Does anyone know if there is a limit on network traffic among VMs in different data centers in Google Compute Engine?
Specifically, are there any performance limits if VMs in different DCs are frequently (every 5 ms) communicating with each other?
Thanks in advance.
I'm sure that there are some performance limits, but they should be fairly high if you're within the same region (>100 Mbps, possibly >1 Gbps). Between regions, bandwidth is likely to be somewhat more variable, but I'd expect it to be >100 Mbps on the same continent.
Note that there are also egress fees for traffic between VMs in different GCP zones, so you might want to pay attention to the total data transferred; 130Mbps would be around 1GB every minute, or $6/hour.
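As a sanity check on that estimate, the conversion is simple enough to script. A sketch; the $/GB rate is a placeholder, so look up the current inter-zone or inter-region price for your actual topology:

```python
# Back-of-the-envelope egress cost for a sustained traffic rate.
def egress_cost_per_hour(mbps: float, usd_per_gb: float) -> float:
    gb_per_hour = mbps / 8 / 1000 * 3600  # Mbit/s -> GB/hour (decimal GB)
    return gb_per_hour * usd_per_gb

# 130 Mbps at a hypothetical $0.10/GB, matching the answer's ~$6/hour:
print(egress_cost_per_hour(130, 0.10))   # ~5.85
```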

Stress test cases for web application

What stress test cases are there, other than finding the maximum number of users that can log into the web application before performance degrades and it eventually crashes?
This question is hard to answer thoroughly since it's too broad.
Anyway, many stress tests depend on the type and execution flow of your workload. There's an entire subject (taught as a graduate course) dedicated to queueing theory and resource optimization. Most of it can be summarized as follows:
If you have a resource (be it a GPU, CPU, memory bank, mechanical or solid-state disk, etc.), it can serve a number of users/requests per second and takes an X amount of time to complete one unit of work. Make sure you don't exceed its limits.
Some systems can also be studied with a probabilistic approach (Little's Law is one of the most fundamental results in these cases).
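Little's Law says L = λ·W: the average number of requests in the system equals the arrival rate times the average time each request spends in the system. A small sketch with made-up numbers, useful for sizing a test:

```python
# Little's Law: L = lambda * W
arrival_rate = 200.0    # hypothetical arrival rate, requests/second
avg_response_s = 0.25   # hypothetical average time in system, seconds

concurrency = arrival_rate * avg_response_s
print(f"expected in-flight requests: {concurrency:.0f}")  # 50

# Inverted: virtual users needed to sustain 200 req/s when each user
# also "thinks" for 5 s between requests.
think_time_s = 5.0
users_needed = arrival_rate * (avg_response_s + think_time_s)
print(f"virtual users needed: {users_needed:.0f}")        # 1050
```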
There are a lot of reasons for load/performance testing, many of which may not be important to your project goals. For example:
- What is the performance of a system at a given load? (load test)
- How many users can the system handle while still meeting a specific set of performance goals? (load test)
- How does the performance of a system change over time under a certain load? (soak test)
- When will the system crash under increasing load? (stress test)
- How does the system respond to hardware or environment failures? (stress test)
I've got a post on some common motivations for performance testing that may be helpful.
You should also check out your web analytics data and see what people are actually doing.
It's not enough to simply simulate X number of users logging in. Find the scenarios that represent the most common user activities (anywhere from 2 to 20 scenarios).
Also, make sure you're not just hitting your cache on reads. Add some randomness / diversity in the requests.
I've seen stress tests where all the users were requesting the same data which won't give you real world results.
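To make the randomness/diversity point concrete, here is a minimal sketch using Locust, a Python load-testing tool. The endpoints, task weights, and ID range are hypothetical; the point is simply that each virtual user requests different data, so reads don't all land on the cache:

```python
# Run with: locust -f this_file.py --host=https://example.com (hypothetical host)
import random
from locust import HttpUser, task, between

class BrowsingUser(HttpUser):
    wait_time = between(1, 5)  # think time between requests, seconds

    @task(3)  # weight 3: browsing is the more common activity
    def view_product(self):
        # hypothetical endpoint; random IDs diversify the working set
        self.client.get(f"/products/{random.randint(1, 10_000)}",
                        name="/products/[id]")  # group stats under one label

    @task(1)
    def search(self):
        term = random.choice(["red", "blue", "large", "cheap"])
        self.client.get(f"/search?q={term}", name="/search")
```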