Is it possible to run IoT-Agent for Ultralight 2.0 without a MongoDB link (with an in-memory device registry)? - fiware

While configuring the IoT Agent for Ultralight 2.0 there is the option to set the Docker variable IOTA_REGISTRY_TYPE, which controls whether IoT device info is held in memory or in a database (MongoDB by default). Documentation that I'm referencing.
Firstly, I would like to set it to memory: what would that imply?
Could the data be preserved only in some allocated part of memory within the Docker environment? Could I then omit further variables in the configuration file, like IOTA_MONGO_HOST (the hostname of MongoDB, used for holding device information)?
The architecture of my system has a Raspberry Pi running the IoT Agent and a VM running the Orion Context Broker and MongoDB. Both are reachable, since they can see each other on the LAN. Is it necessary for MongoDB to be the same database for the IoT Agent and the Orion Context Broker if they are linked?
Is it possible to run the IoT Agent with memory-only persistence of device information (instead of a database)? Will it have any effect on the rest of the running infrastructure, besides the obvious lack of persisted device data?

Firstly, I would like to set it to memory: what would that imply?
There would be no need for a MongoDB database attached to the IoT Agent, but there would be no persistence of provisioned devices in the event of a failure or restart, so nothing to fall back on during disaster recovery.
Could the data be preserved only in some allocated part of memory within the Docker environment?
No
Could I then omit further variables in the configuration file, like IOTA_MONGO_HOST (the hostname of MongoDB, used for holding device information)?
The Docker ENV parameters are merely overrides of the values found in the config.js within the enabler itself, so any of the ENV variables can be omitted if you are using the defaults.
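As a concrete illustration, here is a minimal sketch of starting the agent with the in-memory registry and no IOTA_MONGO_* variables at all. It assumes the fiware/iotagent-ul Docker image and the Docker SDK for Python; the Orion host address and the port numbers are placeholders for your own LAN setup.
```python
# Sketch: run the Ultralight IoT Agent with an in-memory device registry,
# so no IOTA_MONGO_* variables are set at all.
# Assumes the Docker SDK for Python (pip install docker) and the
# fiware/iotagent-ul image; host addresses and ports are placeholders.
import docker

client = docker.from_env()

client.containers.run(
    "fiware/iotagent-ul",
    name="iot-agent",
    detach=True,
    environment={
        "IOTA_REGISTRY_TYPE": "memory",   # hold device info in memory only
        "IOTA_CB_HOST": "192.168.1.20",   # Orion Context Broker on the LAN VM
        "IOTA_CB_PORT": "1026",
        "IOTA_NORTH_PORT": "4041",        # provisioning (north) port
        "IOTA_HTTP_PORT": "7896",         # Ultralight-over-HTTP port
        # no IOTA_MONGO_HOST / IOTA_MONGO_PORT / IOTA_MONGO_DB needed
    },
    ports={"4041/tcp": 4041, "7896/tcp": 7896},
)
```
Anything left out simply falls back to the defaults in config.js.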
Is it necessary for MongoDB to be the same database for the IoT Agent and the Orion Context Broker if they are linked?
The IoT Agent and Orion can run entirely separately and usually would use separate MongoDB instances. At least this would be the case in a properly architected production environment.
The Step-by-Step Tutorials lump everything together on one Docker engine for simplicity; a proper architecture has been sacrificed to keep the narrative focused on the learning goals. You don't need two MongoDB instances to handle fewer than 20 dummy devices.
When deploying to a production environment, try looking at the SmartSDK Recipes in order to scale up to a proper architecture:
see: https://smartsdk.github.io/smartsdk-recipes/
Is it possible to run the IoT Agent with memory-only persistence of device information (instead of a database)? Will it have any effect on the rest of the running infrastructure, besides the obvious lack of persisted device data?
I haven't checked this, but there may be a slight difference in performance, since memory access should be slightly faster. The trade-off is that you will lose the provisioned state of all devices if a failure occurs. If you need to invest in disaster recovery then MongoDB is the way to go; periodically back up your database so you can always return to a last-known-good state.
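If you do go the MongoDB route, the periodic backup can be as simple as a scheduled mongodump. A rough sketch only: the host is the LAN VM from the question, and the database name iotagentul is an assumption, so check your own IOTA_MONGO_DB setting.
```python
# Sketch: dump the IoT Agent's MongoDB once a day so a last-known-good
# state can be restored. Assumes mongodump is installed; the host address
# and the database name "iotagentul" are assumptions (check IOTA_MONGO_DB).
import subprocess
import time
from datetime import datetime

while True:
    stamp = datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
    subprocess.run(
        ["mongodump",
         "--host", "192.168.1.20",
         "--db", "iotagentul",
         "--out", f"/backups/iotagent-{stamp}"],
        check=True,
    )
    time.sleep(24 * 60 * 60)  # once a day; a cron job works just as well
```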

Related

Should I use k8s StatefulSets directly or mysql-operator to deploy a master-slave MySQL cluster?

So I want to deploy a master-slave MySQL cluster in k8s. I found 2 ways that seem popular:
The first one is to use statefulsets directly from k8s official document: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/
The second one is to use operator, i.e. https://github.com/oracle/mysql-operator
Which way is most commonly used?
Also, with StatefulSets, if my MySQL master dies, will k8s automatically promote a slave to be the master?
Lastly, when my backend app performs an operation (CRUD) against the MySQL cluster, how does k8s know which pod to route to, i.e. that a write operation can only be sent to the master while reads can be sent to any replica?
Users can deploy and maintain a set of highly available MySQL services in k8s based on StatefulSets, but the process is relatively complex. It requires users to familiarize themselves with various k8s resource objects, learn many MySQL operational details and maintain a set of complex management scripts. Kubernetes Operators are designed to reduce the threshold for deploying complex applications on k8s.
An Operator hides the orchestration details of a complex application and greatly reduces the threshold for using it in k8s. If you need to deploy other complex applications, we recommend that you use an Operator.
Speaking about master election while using a StatefulSet:
Promoting a candidate slave to master is not an automatic process; you have to configure this manually using XtraBackup. Here is more information: setting_up_replication.
Take a look: cloning-existing-data, starting-replication, mysql-statefulset-operator.
Useful tools: vitess for better MySQL networking management, and percona-xtradb-cluster, which provides superior performance, scalability and instrumentation.
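On the routing question: Kubernetes itself does not inspect SQL to decide where a query goes; your application (or a proxy in front of it) chooses the endpoint. With the StatefulSet approach from the linked tutorial you get a stable DNS name per pod (mysql-0.mysql for the master) plus a load-balanced mysql-read service for the replicas, so a client can split reads and writes along these lines. This is only a sketch: the service names, the root user with an empty password and the demo test database follow that tutorial, and PyMySQL is an assumed client library.
```python
# Sketch: read/write split against the replicated-MySQL StatefulSet from the
# Kubernetes tutorial. Service names (mysql-0.mysql, mysql-read), the root
# user with an empty password and the "test" database all follow that
# tutorial; PyMySQL (pip install pymysql) is an assumed client library.
import pymysql

# Writes must go to the master pod, addressed by its stable DNS name.
writer = pymysql.connect(host="mysql-0.mysql", user="root", password="", database="test")
# Reads can go to any replica via the load-balanced read service.
reader = pymysql.connect(host="mysql-read", user="root", password="", database="test")

cur = writer.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS notes (id INT, body TEXT)")
cur.execute("INSERT INTO notes VALUES (1, 'hello')")
writer.commit()

cur = reader.cursor()
cur.execute("SELECT body FROM notes WHERE id = 1")
print(cur.fetchone())

writer.close()
reader.close()
```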

Are custom metadata values for GCE instance stored securely?

I was wondering if custom metadata for Google Compute Engine VM instances is an appropriate place to store sensitive information for configuring apps that run on the instance.
So we use Container-Optimized OS images to run microservices. We configure the containers with environment variables for things like creds for DB connections and other systems we integrate with.
The VMs are treated as ephemeral for each CD deployment, and the best I have come up with so far is to create an instance template whose custom metadata is loaded from a config file I keep on my local machine; the metadata is then made available to a systemd unit when the VM starts up (via cloud-config).
In essence, the environment variable values (some containing creds, and which don't change very much) are uploaded by me and then pulled from the VM instance metadata server when a new VM is fired up. So I'm just wondering if there are any significant security concerns with this approach...
Many thanks for your help
According to the Compute Engine documentation:
Is metadata information secure?
When you make a request to get information from the metadata server, your request and the subsequent metadata response never leaves the physical host running the virtual machine instance.
Since the request and response do not leave the physical host, you will not be able to access the metadata from another VM or from outside Google Cloud Platform. However, any user with access to the VM will be able to query the metadata server and retrieve the information.
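To make that concrete, this is roughly how any process on the instance reads a custom metadata value; the only requirement is the Metadata-Flavor header, so there is no authentication barrier beyond being on the VM. The attribute name db_password is a hypothetical example.
```python
# Sketch: read a custom metadata value from inside a GCE instance.
# The attribute name "db_password" is a hypothetical example.
import urllib.request

url = ("http://metadata.google.internal/computeMetadata/v1/"
       "instance/attributes/db_password")
req = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # any process or user on the VM can do this
```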
Based on the information you provided, storing credentials for a test or staging environment in this manner would be acceptable. However, if this is a production system with customer data or information important to the business, I would keep the credentials in a secure store that tracks access. The data in the metadata server is not encrypted, and accesses are not logged.

Is it recommended to run clustered database with Kubernetes in production environment?

Is it reasonable to use Kubernetes for a clustered database such as MySQL in production environment?
There are example configurations such as the mysql galera example. However, most examples do not make use of persistent volumes. As far as I've understood, persistent volumes must reside on some shared file system, as defined here: Kubernetes types of persistent volumes. A shared file system will not guarantee that the database files of the pod will be local to the machine hosting the pod; they will be accessed over the network, which is rather slow. Moreover, there are issues with MySQL and NFS, for example.
This might be acceptable for a test environment. However, what should I do in a production environment? Is it better to run the database cluster outside Kubernetes and run only application servers with Kubernetes?
The Kubernetes project introduced PetSets, a new pod management abstraction intended to run stateful applications. It is an alpha feature at present (as of version 1.4) and moving rapidly. The various issues to be resolved on the way to beta are listed here. Quoting from the section on when to use PetSets:
A PetSet ensures that a specified number of "pets" with unique identities are running at any given time. The identity of a Pet is comprised of:
a stable hostname, available in DNS
an ordinal index
stable storage: linked to the ordinal & hostname
In addition to the above, it can be coupled with several other features which help one deploy clustered stateful applications and manage them. Coupled with dynamic volume provisioning for example, it can be used to provision storage automatically.
There are several YAML configuration files available (such as the ones you referenced) using ReplicaSets and Deployments for MySQL and other databases which may be run in production and are probably being run that way as well. However, PetSets are expected to make it a lot easier to run these types of workloads, while supporting upgrades, maintenance, scaling and so on.
You can find some examples of distributed databases with petsets here.
The advantage of provisioning persistent volumes which are networked and non-local (such as GlusterFS) is realized at scale. However, for relatively small clusters, there is a proposal to allow for local storage persistent volumes in the future.

Why is Spark filling up the tmp dir (spark.local.dir) on the machine that submits jobs?

I have a spark 1.2.1 cluster set up in standalone mode with a master and a few slaves. I then let my data scientists enjoy the cluster's power.
All is working fine. However, the dedicated server that my data scientists use to submit Spark jobs is having its spark.local.dir filled up gradually.
Given that this machine sits outside of the cluster (it is neither the master nor a worker/slave), I wouldn't have thought that the local spark.local.dir would be used by Spark in any way. (And why would it be? It only shows the logs.)
I could not find a good doc detailing this. Does anybody have an idea?
There is not enough information about your setup to be sure, but I am guessing that the jobs are launched in client mode, where the driver runs on your client node.
From the spark docs:
In client mode, the driver is launched in the same process as the client that submits the application. In cluster mode, however, the driver is launched from one of the Worker processes inside the cluster, and the client process exits as soon as it fulfills its responsibility of submitting the application without waiting for the application to finish.
I am guessing that in client mode the application's driver (on your client machine) needs plenty of scratch space to manage the workers in that case.
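If the jobs have to stay in client mode, one mitigation is to point the driver's scratch space at a larger disk that you clean out regularly. A sketch only: the master URL and the /data/spark-scratch path are placeholders, and spark.local.dir can equally be set in spark-defaults.conf on the submitting machine.
```python
# Sketch: redirect the driver-side scratch directory when submitting in
# client mode. The master URL and /data/spark-scratch are placeholders.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("example-job")
        .setMaster("spark://spark-master:7077")
        .set("spark.local.dir", "/data/spark-scratch"))  # driver scratch dir

sc = SparkContext(conf=conf)
print(sc.parallelize(range(100)).sum())
sc.stop()
```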

Can I install MySQL on the VMs provided in Azure Cloud Services?

From what I gather, the only way to use a MySQL database with Azure Websites is to use ClearDB, but can I install MySQL on the VMs provided in Azure Cloud Services? And if so, how?
This question might get closed and moved to ServerFault (where it really belongs). That said: ClearDB provides MySQL-as-a-Service in Azure. It has nothing to do with what you can install in your own Virtual Machines. You can absolutely do a VM-based MySQL install (or any other database engine that you can install on Linux or Windows). In fact, the Azure portal even has a tutorial for a MySQL installation on OpenSUSE.
If you're referring to installing in web/worker roles: This simply isn't a good fit for database engines, due to:
the need to completely script/automate the install with zero interaction (which might take a long time). This includes all necessary software being downloaded/installed to the vm images every time a new instance is spun up.
the likely inability for a database cluster to cope with arbitrary scale-out (the typical use case for web/worker roles). Database clusters may or may not work well when a scale-out occurs (adding an additional vm). Same thing when scaling in (removing a vm).
less-optimal attached-storage configuration
inability to use Linux VMs
So, assuming you're still OK with Virtual Machines (vs stateless Cloud Service VMs), you'll need to carefully plan your deployment, with decisions such as:
Distro (Ubuntu, CentOS, etc). Azure-supported Linux distro list here
Selecting proper VM size (the DS series provide SSD attached disk support; the G series scale to 448GB RAM)
Azure Storage attached disks being non-Premium or Premium (premium disks are SSD-backed, durable disks scaling to 1TB/5000 IOPS per disk, up to 32 disks per VM depending on VM size)
Virtual network configuration (for multi-node cluster)
Accessibility of database cluster (whether your app is in the vnet or accesses it through a public endpoint; and if the latter, setting up ACL's)
Backup / HA / DR planning
Someone else mentioned using a pre-built VM image from VM Depot. Just realize that, if you go that route, you're relying on someone else to configure the database engine install for you. This may or may not be optimal for what you're trying to achieve. And the images may or may not be up-to-date with the latest versions, patches, etc.
Of course, what I wrote applies to any database engine you install in your own virtual machines, where a service provider (such as ClearDB) tends to take care of most of these things for you.
If you are talking about standard VMs then you can use a pre-built image from VM Depot for that.
If you are talking about web or worker roles (PaaS) I wouldn't recommend it, but if you really want to you could. You would need to fully script the install of the solution on the host. The only downside (and it's a big one) is that the role instance will be moved to a new host at some point, which means your MySQL data files would be lost. If you backed up frequently and were happy to lose some data, then this option may work for you.
I think the main question is "what do you want to achieve?". As I see it, you want to use a PaaS solution with Web Apps or Cloud Services and you need a MySQL database. If so, you have two options (both technically feasible, as David Makogon said). The first is to deploy your own (single) server with MySQL and connect to it from the outside (internet side). The second is to create a MySQL server or cluster and connect your application internally within an Azure virtual network. With Cloud Services this is simple, but with Web Apps it is not: you must create a VPN gateway in an Azure VM and connect your Web App to this gateway. In this way you will have an internal connection from your application to your own MySQL cluster.