Where to store key-value pairs for runtime retrieval from within Cloud Function? - google-cloud-functions

In a Cloud Function I need to retrieve a bunch of key-value pairs to process. Right now I'm storing them as json-file in Cloud Storage.
Is there any better way?
Environment variables don't suit because (a) there are too many key-value pairs, (b) the same function may need different sets of key-value pairs depending on the incoming parameters, and (c) those values can change over time.
BigQuery seems like overkill, especially since some of the values have a few levels of nesting.
Thanks!

You can use Memorystore, but note that it's not persistent; see the FAQ:
Cloud Memorystore for Redis provides a fully managed in-memory data
store service built on scalable, secure, and highly available
infrastructure managed by Google. Use Cloud Memorystore to build
application caches that provides sub-millisecond data access. Cloud
Memorystore is compatible with the Redis protocol, allowing easy
migration with zero code changes.
Serverless VPC Access enables you to connect from the Cloud Functions environment directly to your Memorystore instances.
Note: Some resources, such as Memorystore instances, require connections to come from the same region as the resource.
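As a rough sketch (assuming a Serverless VPC Access connector is already configured and using the redis-py client; the host variable, key naming, and handler name are illustrative), a Python Cloud Function could fetch a set of key-value pairs from Memorystore like this:
# Hypothetical sketch: read a set of key-value pairs from Memorystore (Redis).
import json
import os
import redis

# REDIS_HOST is an illustrative environment variable holding the
# Memorystore instance's private IP, reachable via the VPC connector.
_client = redis.Redis(host=os.environ.get("REDIS_HOST", "10.0.0.3"),
                      port=6379, decode_responses=True)

def handler(request):
    # Pick the KV set based on an incoming parameter, e.g. ?set=pricing
    set_name = request.args.get("set", "default")
    pairs = _client.hgetall("config:" + set_name)  # one Redis hash per KV set
    # ... process the pairs ...
    return json.dumps(pairs)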
Update
For persistent storage you could use Firestore.
See the tutorial Use Cloud Firestore with Cloud Functions.
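For example, a minimal sketch of reading a document of (possibly nested) key-value pairs from Firestore inside a Python Cloud Function could look like this (the collection, document, and handler names are illustrative):
# Hypothetical sketch: load a nested set of key-value pairs from Firestore.
import json
from google.cloud import firestore

db = firestore.Client()  # created once and reused across invocations

def handler(request):
    # Choose the document (i.e. the KV set) based on an incoming parameter.
    set_name = request.args.get("set", "default")
    snapshot = db.collection("config").document(set_name).get()
    pairs = snapshot.to_dict() or {}  # nested maps come back as nested dicts
    # ... process the pairs ...
    return json.dumps(pairs)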

Related

Pub/Sub communication between TwinCAT 3 and Azure

I am new to this field. My situation is: I have a Beckhoff PLC running TwinCAT 3 software. I am using OPC UA to upload data to an OPC UA server and then send the data to the cloud (an Azure SQL database) through Azure IoT Hub. I want to set up pub/sub communication. As a next step, I will analyze the data with Power BI and display it on several Power BI mobile clients with different types of information. The problem is that I am a bit confused about how pub/sub communication applies to this setup. I have read about MQTT and AMQP, but do I need to write code to apply pub/sub communication? Thanks!
Azure IoT Hub is a Pub/Sub service. You can subscribe multiple stream processors to the data that hits the hub, and each one will see the entire stream. These stream processors can be implemented in custom code, perhaps with an Azure Function, but also with Logic Apps or Azure Stream Analytics.
You can set up OPC UA servers on both the PLC and in the cloud. Each can subscribe to objects on the other for two-way exchange. Otherwise, make the OPC UA objects available on the PLC, then subscribe to them from your cloud service.
Of course you will need to enable all the necessary ports and handle certificate exchange.
If you are using the Beckhoff OPC UA server, you annotate the required variables / structs with attributes. See the documentation.
If you want to use MQTT instead, you will need to write some code using the MQTT library for TwinCAT. You will also need to set up your broker and, again, handle security. There are decent examples for the main providers in the Beckhoff documentation for the MQTT library.
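To illustrate the pub/sub pattern itself, here is a generic, hypothetical Python sketch using the paho-mqtt 1.x client API (not the TwinCAT MQTT library); the broker address and topic are placeholders:
# Generic MQTT pub/sub sketch (paho-mqtt 1.x API); broker and topic are placeholders.
import json
import time
import paho.mqtt.client as mqtt

BROKER = "broker.example.com"
TOPIC = "plant/plc1/telemetry"

def on_message(client, userdata, msg):
    # Every client subscribed to TOPIC receives each published message.
    print("received:", json.loads(msg.payload))

subscriber = mqtt.Client()
subscriber.on_message = on_message
subscriber.connect(BROKER, 1883)
subscriber.subscribe(TOPIC)
subscriber.loop_start()          # handle network traffic in a background thread

publisher = mqtt.Client()
publisher.connect(BROKER, 1883)
publisher.publish(TOPIC, json.dumps({"temperature": 21.5}))
time.sleep(1)                    # give the subscriber a moment to receive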

Azure Cosmos DB - its use as a queue

I have been doing some reading on Azure Cosmos DB (the new Document DB) and noticed that it allows Azure functions to be executed when data is written to the db.
Typically I would have written to a Service Bus queue, processed the message with an Azure Function, and stored the message in the document database for history.
I wanted some guidance on good practice for Cosmos DB.
It depends on your use case: your throughput requirements, what processing you will be doing on the data, how transient the data is, whether it needs to be globally distributed, and so on.
Yes, Cosmos DB can ingest data at a very high rate and its storage can scale elastically too. Azure Functions are certainly a viable option for processing the change feed in Cosmos DB.
Here is more information: https://learn.microsoft.com/en-us/azure/cosmos-db/serverless-computing-database
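As a rough, hypothetical sketch of that pattern (Python v1 programming model; the cosmosDBTrigger binding with the monitored container, connection setting, and lease container lives in function.json, and all names are illustrative):
# Sketch of an Azure Function fed by the Cosmos DB change feed.
# The "documents" parameter is bound via a cosmosDBTrigger entry in function.json.
import logging
import azure.functions as func

def main(documents: func.DocumentList) -> None:
    for doc in documents:
        # Each inserted or updated document arrives here; process it,
        # e.g. write history records or fan out further work.
        logging.info("Processing changed document: %s", doc.to_json())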

Can SAN/NAS storage attach to cloud server?

Currently my boss has asked my team to relocate our database to a cloud server (Windows). Besides that, he also asked us to attach SAN/NAS storage to that server for better speed/performance. The problem is that we have no experience with SAN/NAS storage.
The question is: can SAN/NAS storage be attached to a cloud server? If it can, is this good practice? We are currently using MySQL for our database.
Thanks
Are we talking about a private or a public cloud (AWS, Azure)? Though there are storage arrays that can proxy cloud storage, I don't think there are products to attach an on-site storage array to a server in a public cloud.
The reason you would want to use, say, a SAN is performance - minimal latency. Imagine the connection between a storage array in a separate data center and a cloud server over TCP/IP, possibly far apart. The latency would make it unusable for, e.g., a high-transaction workload and defeat the purpose of a storage array.
If you are talking about a private cloud (VMware-orchestrated or OpenStack), then it might be possible via RDM (VMware) or Cinder (probably a Cinder storage node). I think Azure is adding a feature where you can integrate part of your local infrastructure into Azure as an availability zone, so there might be possibilities there.

Share a persistent disk between Google Compute Engine VMs

From Google's documentation:
It is possible to attach a persistent disk to more than one instance. However, if you attach a persistent disk to multiple instances, all instances must attach the persistent disk in read-only mode. It is not possible to attach the persistent disk to multiple instances in read-write mode.
If you attach a persistent disk in read-write mode and then try to attach the disk to subsequent instances, Google Compute Engine returns an error.
So, I need a shared persistent disk as a frontend for all my Compute Engine instances; how can I write to this shared disk?
My guess (and hope) is that a read/write persistent disk can be attached to only one Compute Engine instance, but that the same disk can be shared read-only with other VMs - is that right?
Let's say I have 2 Compute Engine VMs and 2 persistent disks;
is this flow possible?
compute1: read/write disk1 and read-only disk2
compute2: read/write disk2 and read-only disk1
Update: this is available as of 2020-06-16
As per another answer by Matthew Lenz, the functionality for creating multi-writer persistent disks is available, but it's still in alpha status (even though it's documented as being in the beta track) and requires special per-project enablement.
Note: This GitHub issue notes that the functionality is still in alpha, even though it's labelled as beta. You can submit feedback via Cloud Console to request it for your project if you'd like to get early access to this functionality, but it's not guaranteed to be enabled.
Assuming your project has the permissions to use this feature (or the feature becomes public-access), note that it comes with some caveats:
--multi-writer
Create the disk in multi-writer mode so that it can be attached with read-write access to multiple VMs. Can only be used with zonal SSD persistent disks. Disks in multi-writer mode do not support resize and snapshot operations.
You can use this via:
$ gcloud beta compute disks create DISK_NAME --multi-writer [...]
Note the caveats:
zonal SSD persistent disks only
no disk resizing
no snapshots
If these trade-offs are not acceptable to you, see the original answer (below) which has a long list of recommended storage alternatives for sharing data between multiple GCE VMs.
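For illustration only, creating such a disk and attaching it to two VMs could look roughly like this (the disk, VM, and zone names are placeholders; the attach-disk commands are ordinary gcloud usage rather than anything specific to the multi-writer feature):
$ gcloud beta compute disks create shared-data --multi-writer --type=pd-ssd --size=100GB --zone=us-central1-a
$ gcloud compute instances attach-disk vm-1 --disk=shared-data --zone=us-central1-a
$ gcloud compute instances attach-disk vm-2 --disk=shared-data --zone=us-central1-a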
Original answer (valid prior to 2020-06-16)
No, this was not possible at the time, as the documentation you cited said when this answer was written (it has since been updated):
However, if you attach a persistent disk to multiple instances, all instances must attach the persistent disk in read-only mode.
The documentation has been re-arranged since then; the new docs are at a different URL but with the same content:
You can attach a non-root persistent disk to more than one virtual machine instance in read-only mode, which allows you to share static data between multiple instances. Sharing static data between multiple instances from one persistent disk is cheaper than replicating your data to unique disks for individual instances.
If you attach a persistent disk to multiple instances, all of those instances must attach the persistent disk in read-only mode. It is not possible to attach the persistent disk to multiple instances in read-write mode. If you need to share dynamic storage space between multiple instances, connect your instances to Cloud Storage or create a network file server.
If you have a persistent disk with data that you want to share between multiple instances, detach it from any read-write instances and attach it to one or more instances in read-only mode.
which means you cannot have one instance have write access while another has read-only access.
If you want to share data between them, you need to use something other than Persistent Disk. Below are some possible solutions.
You can use any of the following hosted/managed services:
Google Cloud Filestore — perhaps closest to what you're looking for, as it provides an NFSv3 file system
You can also use Elastifile on GCP as a fully-managed service; note that GCP acquired Elastifile in July 2019
Google Cloud Datastore
Google Cloud Storage, which you can use via the GCS API (JSON or XML) or mount as a file system using gcsfuse (see the mount sketch after these lists)
Google Cloud Bigtable
Google Cloud SQL
Alternatively, you can run your own:
self-managed or third-party-managed file server solutions, including NetApp and Panzura
self-managed Elastifile storage deployment (for fully-managed, see previous section for the link)
database (whether SQL or NoSQL)
distributed filesystem such as Ceph, GlusterFS, OrangeFS, ZFS, etc.
file server such as NFS or SAMBA
single VM as a data storage node, and use sshfs to create a FUSE mount from other VMs that want to access that data
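For the Cloud Storage option mentioned above, a rough gcsfuse sketch (bucket and mount-point names are illustrative) would be:
$ gcsfuse --implicit-dirs my-shared-bucket /mnt/shared    # mount the bucket as a file system
$ cp results.csv /mnt/shared/run-42/                      # ordinary file operations go to GCS
$ fusermount -u /mnt/shared                               # unmount when finished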
GCP has alpha functionality for 'multi-writer' persistent disks. It's been in alpha for quite a long time, so who knows if it'll make it to beta or GA any time soon. Here is a link to the documentation: https://cloud.google.com/sdk/gcloud/reference/beta/compute/disks/create#--multi-writer
EDIT: 2020-06-16. This has been promoted to beta.

Storage options for diskless servers [closed]

I am trying to build a neural network simulation running on several high-CPU diskless instances. I am planning to use a persistent disk to store my simulation code and training data and mount it on all server instances. It is basically a map-reduce kind of task (several nodes working on the same training data; the results from all nodes need to be collected into one single results file).
My only question now is: what are my options for (permanently) saving the simulation results of the different servers (either at some points during the simulation or once at the end)? Ideally, I would love to write them to the single persistent disk mounted on all servers, but this is not possible because I can only mount it read-only on more than one server.
What is the smartest (and cheapest) way to collect all simulation results of all servers back to one persistent disk?
Google Cloud Storage is a great way to permanently store information in the Google Cloud. All you need to do is enable that product for your project, and you'll be able to access Cloud Storage directly from your Compute Engine virtual machines. If you create your instances with the 'storage-rw' service account, access is even easier because you can use the gsutil command built into your virtual machines without needing to do any explicit authorization.
To be more specific, go to the Google Cloud Console, select the project with which you'd like to use Compute Engine and Cloud Storage, and make sure both of those services are enabled. Then use the 'storage-rw' service account scope when creating your virtual machine. If you use gcutil to create your VM, simply add the --storage_account_scope=storage-rw flag (there's also an intuitive way to set the service account scope if you're using the Cloud Console to start your VM). Once your VM is up and running, you can use the gsutil command freely without worrying about interactive login or OAuth steps. You can also script your usage by integrating any desired gsutil requests into your application (gsutil will also work in a startup script).
More background on the service account features of GCE can be found here.
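For example, with the storage-rw scope in place, each worker could push its results to a bucket and a single node could gather them later, roughly like this (the bucket name and paths are illustrative):
$ gsutil cp /tmp/results-node01.out gs://my-simulation-results/run-42/    # on each worker
$ gsutil -m cp gs://my-simulation-results/run-42/* /mnt/collected/        # gather on one machine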
Marc's answer is definitely best for long-term storage of results. Depending on your I/O and reliability needs, you can also set up one server as an NFS server, and use it to mount the volume remotely on your other nodes.
Typically, the NFS server would be your "master node", and it can serve both binaries and configuration. Workers would periodically re-scan the directories exported from the master to pick up new binaries or configuration. If you don't need a lot of disk I/O (you mentioned neural simulation, so I'm presuming the data set fits in memory, and you only output final results), it can be acceptably fast to simply write your output to NFS directories on your master node, and then have the master node backup results to some place like GCS.
The main advantage of using NFS over GCS is that NFS offers familiar filesystem semantics, which can help if you're using third-party software that expects to read files off filesystems. It's pretty easy to sync down files from GCS to local storage periodically, but does require running an extra agent on the host.
The disadvantages of setting up NFS are that you probably need to sync UIDs between hosts, NFS can be a security hole (I'd only expose NFS on my private network, not to anything outside 10/8), and it will require installing additional packages on both client and server to set up the shares. Also, NFS will only be as reliable as the hosting machine, while an object store like GCS or S3 is implemented with redundant servers and possibly even geographic diversity.
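As a rough sketch of that setup (the export path, network range, and hostnames are illustrative, and the NFS server/client packages are assumed to be installed already):
# On the master node: export a results directory to the private network.
$ echo '/srv/results 10.0.0.0/8(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
$ sudo exportfs -ra
# On each worker node: mount the export and write results into it.
$ sudo mount -t nfs master-node:/srv/results /mnt/results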
If you want to stay in the Google product space, how about Google Cloud Storage?
Otherwise, I've used S3 and boto for these kinds of tasks.
As a more general option, you're asking for some sort of general object store. Google, as noted in previous responses, makes a nice package, but nearly all cloud providers provide some storage option. Make sure your cloud provider has BOTH key options -- a volume store, a store for data similar to a virtual disk, and an object store, a key/value store. Both have their strengths and weaknesses. Volume stores are drop-in replacements for virtual disks. If you can use stdio, you can likely use a remote volume store. The problem is, they often have the structure of a disk. If you want anything more than that, you're asking for a database. The object store is a "middle ground" between the disk and the database. It's fast, and semi-structured.
I'm an OpenStack user myself -- first, because it does provide both storage families, and second, it's supported by a variety of vendors, so, if you decide to move from vendor A to vendor B, your code can remain unchanged. You can even run a copy of it on your own machines (Go to www.openstack.org) Note however, OpenStack does like memory. You're not going to run your private cloud on a 4GB laptop! Consider two 16GB machines.