Migrating previously collected datasets to FIWARE backend

I have at hand the task of migrating previously collected environmental datasets (weather, air quality, noise, etc.) from sensors deployed in different locations, and stored in several tables of a MySQL database, to my instance of the FIWARE Orion Context Broker, so that they are persisted to the FIWARE backend.
The challenges are many:
the data isn't stored according to FIWARE standards, so it must be transformed according to the FIWARE data models.
not all tables are good candidates for being transformed into an Entity.
some Entities need to have field values from several tables as attributes. For instance, an AirQualityObserved Entity type would have attributes drawn from these tables: airquality, co, co2, no2 and deployment. Mapping these attributes to a particular Entity type is therefore a challenge.
As this is a one-time upload (not live data), I am thinking of two possibilities to go about it.
Add an LwM2M client that keeps sending data to an IoT Agent, which eventually passes it on to the Orion CB, until the last record.
Create a Python script that "pretends" to be a contextProvider to the Orion instance, sending data (say, every 5 seconds) until the last record.
I have not come across a case in my literature search that addresses such a situation. Are there any recommendations from the FIWARE Foundation for situations similar to this?
How would you suggest mapping data fields to an Entity's attributes when they actually need to be combined from several tables?

Using an IoT Agent makes sense when you have live data (I mean, a real device sending information to the FIWARE platform). However, you say this is a one-time upload, so the Python script option seems better in this case.
(A little terminological comment here: your script will take the role of a context producer. A context provider is a different actor, related to registrations and query/update forwarding. See this piece of documentation for additional detail.)
With regard to mapping data fields to an Entity's attributes, I don't have any particular suggestion. It is just a matter of analyzing the data model (i.e., the entity attributes) and finding out how to set that information from the data in your tables.
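For illustration, here is a minimal sketch of such a one-time upload script, assuming an Orion v2 instance at localhost:1026. The table and column names in the join are hypothetical and would need to be adapted to the real schema:

import requests
import mysql.connector

ORION = "http://localhost:1026"

conn = mysql.connector.connect(host="localhost", user="user",
                               password="secret", database="sensors")
cur = conn.cursor(dictionary=True)

# Combine fields from several tables into one AirQualityObserved entity.
# Join conditions are assumptions about the schema, not taken from it.
cur.execute("""
    SELECT a.id, a.observed_at, c.value AS co, n.value AS no2,
           d.latitude, d.longitude
    FROM airquality a
    JOIN co         c ON c.airquality_id = a.id
    JOIN no2        n ON n.airquality_id = a.id
    JOIN deployment d ON d.id = a.deployment_id
""")

for row in cur:
    entity = {
        "id": f"urn:ngsi:AirQualityObserved:{row['id']}",
        "type": "AirQualityObserved",
        "dateObserved": {"type": "DateTime",
                         "value": row["observed_at"].isoformat()},
        "CO": {"type": "Number", "value": row["co"]},
        "NO2": {"type": "Number", "value": row["no2"]},
        "location": {"type": "geo:json",
                     "value": {"type": "Point",
                               "coordinates": [row["longitude"],
                                               row["latitude"]]}},
    }
    # For a bulk load there is no need for a 5-second pause between records.
    r = requests.post(f"{ORION}/v2/entities", json=entity, timeout=10)
    r.raise_for_status()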

Related

How to deal with NGSI-LD tenants? Create? List? Delete?

I have a hard time trying to find concrete information about how to deal with tenants in a NGSI-LD Context Broker.
The ETSI specification defines multi-tenancy but it seems it doesn't specify any operations to create a tenant, list available tenants and remove a tenant.
I presume each broker is free to implement multi-tenancy in its own way, but I've searched for the tenant keyword in broker documentation (at least for Orion-LD, Stellio and Scorpio) with no success.
Thanks to this Stack Overflow post I've successfully created a tenant in Orion-LD.
I'd like to know if there are some tenant operations (documented or undocumented) exposed by brokers.
Especially any facility to remove a tenant, along with all resources that have been created "inside" it.
Thanks.
So, first of all, tenants are created "on the fly". If a request comes in to create an entity, subscription, or registration, and the HTTP header "NGSILD-Tenant" specifies a tenant that does not already exist, then the tenant is created and the entity/sub/reg is created "under it". The HTTP header is used for ALL operations. jsonldContexts are different: those are "omnipresent" and exist regardless of the tenant used.
GET operations (like all other operations) can use the NGSILD-Tenant header to indicate on which tenant the GET is to be performed, but if that tenant does not exist, it will naturally not be created - an error is returned instead.
There are no official endpoints to list tenants (nor to delete them - that would be a bit dangerous!), but in the case of Orion-LD, I implemented an endpoint for debugging purposes: GET /ngsi-ld/ex/v1/tenants. You can use that one if you please. Just remember that no other NGSI-LD broker supports that endpoint.
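As a small hedged sketch (assuming Orion-LD at localhost:1026; the tenant name and entity id are made up): creating an entity under a new tenant, then listing tenants via the Orion-LD-only debug endpoint:

import requests

BASE = "http://localhost:1026"

# Creating this entity implicitly creates the "school-a" tenant on the fly.
requests.post(f"{BASE}/ngsi-ld/v1/entities",
              headers={"Content-Type": "application/ld+json",
                       "NGSILD-Tenant": "school-a"},
              json={"@context": "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld",
                    "id": "urn:ngsi-ld:Building:b1",
                    "type": "Building"})

# Debug endpoint specific to Orion-LD; other NGSI-LD brokers do not support it.
print(requests.get(f"{BASE}/ngsi-ld/ex/v1/tenants").json())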

Design database schema to support multi-tenancy in MySQL

I'm working on school manager software in ASP that connects to a MySQL DB. The software works great when I deploy it on a local machine for each user (school), but I want to migrate the software to the Azure cloud. The users will have an account to connect to the same app, but each school's data must not mix with any other school's data. My problem is to find the best way to deploy and manage the database.
Must I deploy one DB for each school, or put all schools' data in the same DB?
I'm not sure my solutions are the best ways. I don't want, for example, a STUDENT table containing students for school X, school Y, and so on.
Please help me find the best solution.
There are multiple possible ways to design a schema to support multi-tenancy. The simplicity of the design depends on the use case.
Separate the data of every tenant (school) physically, i.e., one schema contains data related to only a specific tenant.
Pros:
Easy for A/B testing. You can release updates which require database changes to some tenants and over time make them available to others.
Easy to move the database from one data center to another. You can support different backup SLAs for different customers.
Per-tenant, database-level customization is easy. Adding a new table for a customer, or modifying/adding a field, becomes easy.
Third party integrations are relatively easy, e.g., connecting your data with Google Data Studio.
Scaling is relatively easy.
Retrieving data from one tenant is easy, without worrying about mixing up foreign key values.
Cons:
When you have to modify any field/table, your application code needs to handle cases where the alterations are not yet completed in some databases.
Retrieving analytics across customers becomes difficult; designing queries for usage analysis becomes harder.
When integrating with other database systems, especially NoSQL, you will need more resources. E.g., indexing data in Elasticsearch for every tenant will require an index per tenant, and if there are thousands of customers, this will result in thousands of shards.
Common data across tenants needs to be copied into every database.
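To make the physical approach concrete, a minimal sketch (database names and credentials are placeholders): the application resolves the tenant to its own database at connection time, so queries need no tenant filter.

import mysql.connector  # assumes the mysql-connector-python package

def connect_for_school(school_db: str):
    # One database (schema) per school, e.g. "school_x", "school_y".
    return mysql.connector.connect(host="localhost", user="app",
                                   password="secret", database=school_db)

cur = connect_for_school("school_x").cursor()
cur.execute("SELECT id, name FROM student")  # only school_x data lives here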
Separate the data for every tenant (school) logically, i.e., one schema contains data for all the tenants.
Pros:
Software releases are simple.
Easy to query usage analytics across multiple tenants.
Cons:
Scaling is relatively tricky; you may need database sharding.
Maintaining the logical isolation of data for every tenant in all the tables requires more attention and may cause data corruption if not handled at the application level carefully.
Designing database systems for the application that support multiple regions is complicated.
Retrieving data from a single tenant is difficult. (Remember: all the records will be associated with some other records using foreign keys.)
This is not a comprehensive list. These points are based on my experience working with both types of design. Both designs are common and are used by many organizations, depending on the use case.
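For contrast, a sketch of the logical approach under the same illustrative names: every shared table carries a tenant_id column, and every query must filter on it at the application level.

# Assumed shared schema (illustrative, not from the question):
#   CREATE TABLE student (id INT PRIMARY KEY,
#                         tenant_id INT NOT NULL,  -- the school a row belongs to
#                         name VARCHAR(100),
#                         KEY idx_tenant (tenant_id));
def students_for_school(conn, tenant_id: int):
    cur = conn.cursor()
    # Forgetting this WHERE clause is exactly how tenant data leaks,
    # hence the "requires more attention" caveat above.
    cur.execute("SELECT id, name FROM student WHERE tenant_id = %s",
                (tenant_id,))
    return cur.fetchall()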

Firestore options given that it does not support namespaces

Is there anything out there (a cloud JSON datastore offering that supports mobile apps) as performant and feature-rich as GCP Firestore in native mode, but that provides namespaces?
The lack of this expected feature in Firestore native mode (or any database, for that matter) is a deal killer all around, given the nature of the software development cycle, the need for multiple environments (dev, test, prod), continuous deployment and delivery pipelines, JSON data models that use namespaces, and much more.
If not, what are you doing on your really big Firestore project to create development, test, and integration environments or areas that people can work in, or to support separately running related applications in production, each needing its own namespace or set of collections? How do you avoid creating and managing a bazillion projects, Firestore native instances, and service accounts (each project/Firestore instance needs a service account .json file to be created, distributed to developers, and securely stored, and each additional instance adds more management overhead)? And how do you do this without having to run Firestore in GCP Datastore mode, where you lose all the advantages, features and main selling points that led you to choose Firestore to support your app in the first place?
Optional Reading: Background / Context:
I recently joined a new project and was asked to create a JSON data model for the various services (independently running programs) that comprise the whole, and also to set up sample data for multiple runtime environments like 'dev1', 'dev2', 'test' and 'prod', where the data model might be in flux, or different in 'dev' or 'test' for a period, until the next production deployment of an updated data model. I have done this in the past with GCP Datastore, and with other databases of all types (NoSQL and not NoSQL).
At the time, the JSON document store (database) had not been chosen; it was undecided. While working on the data model and the plan for multiple environments to support development efforts, I was told that the chosen datastore would be Firestore. Subsequently, in the process of trying to implement the basic CRUD operations to insert sample data, and to create separate sandbox areas in the same Firestore instance where 'dev1' and 'dev2' could work and be as destructive as they want within their own areas without affecting each other, I found that namespaces are not supported in native mode (and the client wants native mode and what is offered there; otherwise they will look at another product for the implementation).
And so now, where we thought we would need only two projects with a Firestore instance each, if we stick with Firestore in native mode across the board we would need thirty-six instances. Because of this, I am seeking input as to what is being done out there to avoid or minimize so many projects/instances. I have a solution to submit to the company that involves not using Firestore, but thought I would ask before abandoning it. We need the ability to segregate, isolate, partition and compartmentalize data for common software development lifecycle needs and for our own application runtime needs, while in each of these environments matching the production infrastructure as much as possible.
Our need for namespaces has nothing to do with supporting multiple clients or multi-tenancy, which is often cited in the Google documentation I have found (seemingly as the primary purpose, and the only use case, for namespaces); historically, that is just one less frequently implemented use case of namespaces, out of many hundreds more.
We only want a maximum of two projects and database instances (two Firestore native instances):
Production
Everything else under the sun that is not production: 'dev1', 'dev2', 'test1', 'test2', 'tmp-test-whatever'
With any database product, you should need only one database instance, with a mechanism to support segregation and isolation of data and data definitions, creating as many namespaces as you want within that database. Some database products refer to each namespace as a database. I want to distinguish here between the runtime software, which I am calling the "database" or "database instance", and the area where data is defined and contained (the namespace).
Oracle, PostgreSQL and others call these segregated areas "schemas". Other data formats, XML and many more, support the notion of "namespaces" so as to provide isolation and avoid collisions between definitions with the same name.
Google Datastore supports namespaces (they say for multi-tenancy), but in such a way that each namespace is not isolated and protected as it is with other database products. Any person that has access to that Datastore instance can do anything with ALL namespaces, and there is no way to create a service account that entirely restricts, or limits, access to a particular namespace.
With this Firestore-backed project in production, there will be multiple separately running services at any one time hitting what we had hoped would be a single Firestore instance in native mode. Some will run on mobile, some on another VM instance (web-app-initiated CRUD operations on various collections/documents). All services are part of the same product. Some of these separate services have collections with the same name.
Example: a 'users' collection:
{ service1:          <== 'service1' is the namespace; it has multiple collections, 'users' being just one example
  { users:
    { user:
      { login: <login_name>,
        <other fields>: ...
      }
    }
  }
}
Now another namespace, which also has a 'users' collection, with a different data definition and a different set of data from the above:
{ service2:          <== 'service2' is the namespace
  { users:
    { user:
      { first_name: <first_name>,
        last_name: <last_name>,
        <other fields>: ...
      }
    }
  }
}
----
and other services that have their own collections.
Other use cases for namespaces, as I have mentioned above:
an environment namespace, like 'dev' or 'test', for use when modifying any collection, such as adding to or reworking the data model during development.
a unit test we want to write that would insert data into a unique namespace devoted temporarily to just that test; data would be inserted, the test would run, and at the end all data belonging to that temporary namespace would be deleted.
a namespace used by the mobile app portion of the product
a namespace to support the web app portion of the product (we are trying to use one datastore product for the entire product)
a namespace environment for CI to do various things
I proposed something that would work for the data model in Firestore native mode, but it is very undesirable and kludgy: having the service name and environment in the collection name (dev1_service1_users, dev1_service2_users, and so on) to distinguish them and avoid collisions.
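For what it's worth, a minimal sketch of that kludge (assuming the google-cloud-firestore Python client; the environment, service and field names are made up):

from google.cloud import firestore

db = firestore.Client()

def scoped_collection(env: str, service: str, name: str):
    # Emulate a namespace by prefixing the collection name,
    # e.g. "dev1_service1_users".
    return db.collection(f"{env}_{service}_{name}")

scoped_collection("dev1", "service1", "users") \
    .document("some_user").set({"login": "jdoe"})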
Firestore native gives you one namespace, which they call default, but it is not a namespace at all; it is the complete absence of one.
The only solution I see is to not use Firestore, but some other JSON datastore that would get us close to what Firestore native offers - a solution we would install, update and manage on a VM in the cloud, managing all that infrastructure (load balancing, and much more) ourselves.
I will post the direction we take, for anyone interested, or having a similar problem or discussion.

Microservice Architecture design

I have a few doubts about microservice architecture.
Let's say there are microservices A, B and C.
A maintains the context of a job, apart from the other things it does, and B and C work to fulfill that job by doing their respective tasks for it.
Here I have questions.
1. DB design
I am talking about SQL here.
The usage of foreign keys simplifies a lot of things.
But as I understand microservice architecture, every microservice maintains its own data, and that data has to be queried from the owning service if required.
Does this mean no foreign keys referring to tables in other microservices?
2. Data Flow
As I see it, there are two ways. All the queries are done using a jobId that is maintained uniquely across all microservices for a job.
Client requests go directly to each individual service for a task. To get a summary of the job, the client queries the individual microservices, collects the data and presents it to the user.
Do everything through a coordinating microservice: client requests go to service A, and in turn service A gathers info from all the other microservices for that jobId and passes it to the user.
Which of the above two should be followed, and why?
You're correct in thinking that microservices should ideally have their own data structures so they can be deployed independently. However, there are several design patterns that help you, and that doesn't necessarily translate into "no FKs". Please refer to:
Database per service
Sagas
API Composition
CQRS
The patterns listed above answer both your questions.
Does it mean no foreign keys referring to tables in another microservices?
Not in the database sense. One microservice may hold the IDs of remote entities but should not assume anything about the remote microservice's persistence (i.e., the database type; it could be anything from SQL to NoSQL).
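As an illustration (table, column and service names are hypothetical), service B might store the jobId minted by service A as a plain column, with no cross-service foreign key, and call A's API when it needs job details:

# Assumed local schema in service B (no FK to service A's tables):
#   CREATE TABLE task (id INT PRIMARY KEY,
#                      job_id VARCHAR(36) NOT NULL,  -- ID owned by service A
#                      status VARCHAR(20));
import requests

def job_for_task(job_id: str) -> dict:
    # B treats the job as an opaque remote entity and asks A for it.
    resp = requests.get(f"http://service-a/jobs/{job_id}", timeout=5)
    resp.raise_for_status()
    return resp.json()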
Which of the above two has to be followed and why?
This really depends. There are two types of architecture: choreography and orchestration. Both of them are good. Which one to use? Only you can decide. Here are a few blog posts about them:
Microservices — When to React Vs. Orchestrate
Benefits of Microservices - Choreography over Orchestration, Low Coupling and High Cohesion
Also, the solution to this SO question might be useful.
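For instance, a hedged sketch of the orchestration/API-composition style (endpoints are made up): service A aggregates the per-task data from B and C for a jobId and returns one summary to the client:

import requests

def job_summary(job_id: str) -> dict:
    # Service A composes the response from B's and C's task data.
    tasks_b = requests.get(f"http://service-b/jobs/{job_id}/tasks",
                           timeout=5).json()
    tasks_c = requests.get(f"http://service-c/jobs/{job_id}/tasks",
                           timeout=5).json()
    return {"jobId": job_id, "tasks": tasks_b + tasks_c}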

What are best practices for partitioning DocumentDB across accounts?

I am developing an application that uses DocumentDB to store customer data. One of the requirements is that we segregate customer data by geographic region, so that US customers' data is stored within the US, and European customers' data lives in Europe.
The way I planned to achieve this is to have two DocumentDB accounts, since an account is associated with a data centre/region. Each account would then have a database, and a collection within that database.
I've reviewed the DocumentDB documentation on client- and server-side partitioning (e.g. 1, 2), but it seems to me that the built-in partitioning support will not be able to deal with multiple regions. Even though an implementation of IPartitionResolver could conceivably return an arbitrary collection self-link, the partition map is associated with the DocumentClient and therefore tied to a specific account.
Therefore it appears I will need to create my own partitioning logic and maintain two separate DocumentClient instances - one for the US account and one for the Europe account. Are there any other ways of achieving this requirement?
Azure's best practices on data partitioning say:
All databases are created in the context of a DocumentDB account. A single DocumentDB account can contain several databases, and it specifies in which region the databases are created. Each DocumentDB account also enforces its own access control. You can use DocumentDB accounts to geo-locate shards (collections within databases) close to the users who need to access them, and enforce restrictions so that only those users can connect to them.
So, if your intention is to keep the data near the user (and not just keep it stored separately), your only option is to create different accounts. Luckily, billing is not per account but per collection.
DocumentDB's resource model gives the impression that you cannot (at least out of the box) mix DocumentDB accounts. It doesn't look like partition keys are of any use either, since partitioning can happen only within the same account.
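To illustrate the two-account approach, a sketch using the modern azure-cosmos Python SDK rather than the original DocumentClient/IPartitionResolver API; the endpoints, keys, and database/collection names are placeholders:

from azure.cosmos import CosmosClient

clients = {
    "US": CosmosClient("https://us-account.documents.azure.com", "<us-key>"),
    "EU": CosmosClient("https://eu-account.documents.azure.com", "<eu-key>"),
}

def container_for(region: str):
    # One database with one collection per account, as described above.
    return (clients[region]
            .get_database_client("customers")
            .get_container_client("profiles"))

def upsert_customer(region: str, doc: dict):
    # The application-level "partition resolver": route by customer region.
    container_for(region).upsert_item(doc)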
Maybe this sample will help you or give you some hints.