Orion-LD tenant issue in a multiple-instance environment - fiware

In my environment, multiple Orion-LD instances are running on a Kubernetes cluster.
The environment consists of two Orion-LD (0.8.0) instances, one MongoDB instance, and a LoadBalancer in front of Orion-LD.
I created an entity with a new tenant by using "NGSILD-Tenant" header.
Next, when I tried to retrieve it with "GET /entities", sometimes the retrieval succeeded and sometimes it failed.
The error message was as follows:
{
  "type": "https://uri.etsi.org/ngsi-ld/errors/NonExistingTenant",
  "title": "No such tenant",
  "detail": "Tenant01"
}
It seems that one OrionLD instance can recognize the new tenant, but the other cannot.
What is a possible cause of this issue?
Thanks.

OK, this seems to be a problem in the broker. Please create an issue on Orion-LD's GitHub: https://github.com/FIWARE/context.Orion-LD/issues.
I recently implemented tenant checks for retrievals. It's OK to create new tenants on the fly (entity create operations), but for queries the tenant must already exist, and the list of known tenants is kept in RAM. That means only the broker instance that created the entity knows about the tenant, which completely explains your problem.
I didn't think about this use case, but you are absolutely right.
I will have to improve the way I check for "tenant exists" for retrieval operations.
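To illustrate the idea of the fix, here is a rough sketch in Go, purely for illustration (Orion-LD itself is not written in Go, and tenantExistsInDB is a hypothetical lookup): fall back to the shared database when a tenant is missing from the broker's in-RAM list, instead of rejecting the query outright.

package tenants

// in-RAM list: only contains tenants this broker instance has seen itself.
var tenantCache = map[string]bool{}

// tenantExistsInDB stands in for a lookup against the shared MongoDB, which
// knows about tenants created by any broker instance.
func tenantExistsInDB(tenant string) bool {
	return false // placeholder
}

func tenantExists(tenant string) bool {
	if tenantCache[tenant] {
		return true
	}
	if tenantExistsInDB(tenant) { // another broker may have created it
		tenantCache[tenant] = true // remember it locally for next time
		return true
	}
	return false // genuinely unknown tenant -> NonExistingTenant error
}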

So, as it seemed, the bug was mine and it has been fixed and accepted (just to clarify this now "non-issue")

Related

Microservices centralized database model

Currently we have several microservices; each has its own database model and migrations, provided by the GORM Golang package. We also have a big old MySQL database, which goes against microservices principles, but we can't replace it. I'm afraid that when the number of microservices starts to grow, we will get lost among the many database models. When I add a new column in a microservice, I just type service migrate in the terminal (there is a CLI for the run and migrate commands), and it refreshes the database.
What is the best practice for managing this? For example, if I have 1000 microservices, no one will type service migrate whenever someone updates the models. I'm thinking about a centralized database service where we just add a new column, and it stores all the models along with all migrations. The only problem is: how will the services find out about database model changes? This is how we store, for example, a user in a service:
type User struct {
	ID       uint           `gorm:"column:id;not null" sql:"AUTO_INCREMENT"`
	Name     string         `gorm:"column:name;not null" sql:"type:varchar(100)"`
	Username sql.NullString `gorm:"column:username;not null" sql:"type:varchar(255)"`
}

func (u *User) TableName() string {
	return "users"
}
Depending on your use cases, MySQL Cluster might be an option. The two-phase commits used by MySQL Cluster make frequent writes impractical, but if write performance isn't a big issue, I would expect MySQL Cluster to work out better than connection pooling or queuing hacks. Certainly worth considering.
If I'm understanding your question correctly, you're trying to keep using one MySQL instance with many microservices.
There are a few ways to make an SQL system work:
1. You could create a microservice type that handles data inserts/reads from the database, taking advantage of connection pooling, and have the rest of your services do all their data reads/writes through these services. This will definitely add a bit of extra latency to all your reads/writes and will likely be problematic at scale.
2. You could look for a multi-master SQL solution (e.g. CitusDB) that scales easily; you can use a central schema for your database and just make sure to handle edge cases for data insertion (de-duping etc.).
3. You could use data-streaming architectures like Kafka or AWS Kinesis to transfer your data to your microservices and make sure they only deal with data through these streams. This way, you decouple your database from your data.
In my opinion, the best way to approach this is #3 (a rough sketch follows below). This way, you won't have to think about your storage at the computation layer of your microservice architecture.
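To make option #3 a bit more concrete, here is a minimal Go sketch of a consumer using the github.com/segmentio/kafka-go client; the broker address, topic, and consumer-group names are placeholders, not anything from the question.

package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Each microservice subscribes to the change stream instead of reading MySQL directly.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"}, // placeholder broker address
		GroupID: "user-service",             // one consumer group per microservice
		Topic:   "user-events",              // placeholder topic carrying data changes
	})
	defer r.Close()

	for {
		msg, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err)
		}
		// Apply the change to the service's own local store / read model here.
		log.Printf("key=%s value=%s", msg.Key, msg.Value)
	}
}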
Not sure what service you're using for your microservices, but StdLib enforces a few conventions (e.g. only transferring data through HTTP) that help folks wrap their heads around it all. AWS Lambda also works very well with Kinesis as a source to launch the function, which could help with the #3 approach.
Disclaimer: I'm the founder of StdLib.
If I understand your question correctly it seems to me that there may be multiple ways to achieve this.
One solution is to have a schema version somewhere in the database that your microservices periodically check. When your database schema changes, you increase the schema version. If a service then notices that the database schema version is higher than the service's current schema version, it can migrate the schema in code, which GORM allows (a sketch follows below).
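A rough sketch of that version check, assuming GORM v1 and a hypothetical single-row table (schema_infos under GORM's default naming) holding the current version; the DSN and poll interval are placeholders, and User is the model from the question.

package main

import (
	"log"
	"time"

	"github.com/jinzhu/gorm"
	_ "github.com/jinzhu/gorm/dialects/mysql"
)

// SchemaInfo maps to a hypothetical single-row table holding the current schema version.
type SchemaInfo struct {
	ID      uint `gorm:"primary_key"`
	Version int
}

// watchSchema polls the central version and re-runs this service's GORM
// auto-migration whenever the version moves past what it last applied.
func watchSchema(db *gorm.DB, interval time.Duration, models ...interface{}) {
	lastApplied := 0
	for range time.Tick(interval) {
		var info SchemaInfo
		if err := db.First(&info).Error; err != nil {
			log.Println("schema check failed:", err)
			continue
		}
		if info.Version > lastApplied {
			// AutoMigrate adds missing tables/columns for the given models;
			// it never drops or changes existing ones.
			if err := db.AutoMigrate(models...).Error; err != nil {
				log.Println("migration failed:", err)
				continue
			}
			lastApplied = info.Version
		}
	}
}

func main() {
	// Placeholder DSN; in practice this comes from the service's config.
	db, err := gorm.Open("mysql", "user:pass@tcp(mysql:3306)/app?parseTime=true")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Blocks; real code would run this next to the service's HTTP server.
	watchSchema(db, time.Minute, &User{})
}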
Other options depend on how you run your microservices. For example, if you run them on an orchestration platform (e.g. Kubernetes), you could put the migration code somewhere that runs when your service initializes (see the sketch below). Then, once you update the schema, you can force a rolling refresh of your containers, which would in turn trigger the migration.
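A minimal sketch of that second option, again assuming GORM v1 and the User model from the question; the DSN is a placeholder.

package main

import (
	"log"

	"github.com/jinzhu/gorm"
	_ "github.com/jinzhu/gorm/dialects/mysql"
)

func main() {
	// Placeholder DSN; on Kubernetes this would come from env vars or a Secret.
	db, err := gorm.Open("mysql", "user:pass@tcp(mysql:3306)/app?parseTime=true")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Runs on every container start, so a rolling restart of the Deployment
	// re-applies the migration across the cluster.
	if err := db.AutoMigrate(&User{}).Error; err != nil {
		log.Fatal(err)
	}

	// ... start the service's HTTP server here ...
}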

AWS RDS showing outdated data for Multiple AZ instance (MySQL)

My RDS instance was showing outdated data temporarily.
I ran a SELECT query on my data. I then ran a query to delete data from a table and another to add new data to the table. I ran a SELECT query and it was showing the old data.
I ran the SELECT query AGAIN and THEN it finally showed me the new data.
Why would this happen? I never had these issues locally or on my normal single-AZ instances. Is there a way to avoid this happening?
I am running MySQL 5.6.23
According to the Amazon RDS Multi-AZ FAQs, this might be expected.
Specifically this:
You may observe elevated latencies relative to a standard DB Instance deployment in a single Availability Zone as a result of the synchronous data replication performed on your behalf.
Of course, it depends on the frequency of the delays and the amount of extra latency you're seeing, but one option would be to contact AWS support if the issue is frequently reproducible.
As embarrassing as this is... it was an issue in our Spring Java code and not AWS.
A method modified a database entity object. The method itself wasn't transactional but was called from a transactional context which would persist any changes on entities to the database.
It looked like it was rolling back changes, but it was actually just overwriting data. My guess is that it overwrote the data a while ago, so until someone tried to modify it we just assumed it was the correct data.

Error accessing Cosmos through Hive

Literally from:
https://ask.fiware.org/question/84/cosmos-error-accessing-hive/
As the answer in the quoted FIWARE Q&A entry suggests, the problem is fixed by now; see https://ask.fiware.org/question/79/cosmos-database-privacy/. However, it seems that other issues have arisen related to the solution. Namely, over an SSH connection, typing the hive command results in the following error: https://cloud.githubusercontent.com/assets/13782883/9439517/0d24350a-4a68-11e5-9a46-9d8a24e016d4.png. The HiveQL queries work fine (through SSH) regardless of the error message.
When launching exactly the same HiveQL queries remotely (each of them worked flawlessly two weeks ago), the request times out even with absurd time windows (10 minutes). The most basic commands ('use $username;', 'show tables;') also time out.
(The thrift client is: https://github.com/garamon/php-thrift-hive-client)
Since Cosmos usage is an integral part of our project, it is of utmost importance to know whether this is a temporary issue caused by the fixes or a permanent change in remote availability (I could not identify relevant changes in the documentation).
Apart from fixing the issue you mention, we moved to a HiveServer2 deployment instead of the old Hive server (or HiveServer1), which had several performance drawbacks due, indeed, to the usage of Thrift (in particular, only one connection could be served at a time). HiveServer2 now allows for parallel queries.
That being said, most probably the client you are using is no longer valid, since it was likely designed specifically to work with a HiveServer1 instance. The good news is that there seem to be several other client implementations for HS2 using PHP, such as https://github.com/QwertyManiac/hive-hs2-php-thrift (this is the first entry I found when searching Google).
What is true is that this is not officially documented anywhere (it is only mentioned in this other Stack Overflow question). So, nice catch! I'll add it immediately.

How to perform targeted select queries on main DB instance when using Amazon MySQL RDS and Read replica?

I'm considering using Amazon MySQL RDS with Read Replicas. The only thing bothering me is replica lag and eventual inconsistency. For example, imagine the case where a user modifies his profile (the UPDATE is performed on the main DB instance) and then refreshes the page to see the changed info (the SELECT might be performed on a replica which has not yet received the changes due to replica lag).
By accident, I found an Amazon article which mentions that it's possible to perform targeted queries. To me it sounds like we can add some parameter or other to tell Amazon to execute the SELECT on the main DB instance instead of on a replica. The user profile example is quite trivial, but the same problem occurs in more realistic cases, for example checkout, where a user performs several steps and needs to see updated info on the next screens. Yes, the application could cache the entire data set on its own, but it would be great if anybody knows how to perform targeted queries on the main DB instance.
I read the link you referenced and didn't find any mention of "target" or anything like that.
But this line might be what you're referring to:
Otherwise, you should spread out the load and read from one of the Read Replicas. You can make this decision on a query-by-query basis within your application. You will probably want to maintain some sort of registry of available Read Replicas within your application, choosing from among them on a round-robin or randomly distributed basis.
If so, then I interpret that line to mean that you can balance reads in your application by just picking one server from a pool and hitting that one, but it would all be in your application logic; a rough sketch of what that could look like follows below.
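For illustration only, here is a minimal Go sketch of that kind of query-by-query routing with database/sql; the DSNs, hostnames, and queries are placeholders, not anything from the question or the AWS article.

package main

import (
	"database/sql"
	"log"
	"math/rand"

	_ "github.com/go-sql-driver/mysql"
)

// DB keeps one handle to the primary (writer) and a pool of read replicas.
type DB struct {
	primary  *sql.DB
	replicas []*sql.DB
}

// Writer returns the main instance: use it for writes and for reads that must
// see data the user just changed (profile right after an UPDATE, checkout steps, etc.).
func (d *DB) Writer() *sql.DB { return d.primary }

// Reader returns a randomly chosen replica for reads that can tolerate replica lag.
func (d *DB) Reader() *sql.DB { return d.replicas[rand.Intn(len(d.replicas))] }

func mustOpen(dsn string) *sql.DB {
	db, err := sql.Open("mysql", dsn)
	if err != nil {
		log.Fatal(err)
	}
	return db
}

func main() {
	db := &DB{
		primary: mustOpen("user:pass@tcp(primary.example.internal:3306)/app"),
		replicas: []*sql.DB{
			mustOpen("user:pass@tcp(replica-1.example.internal:3306)/app"),
			mustOpen("user:pass@tcp(replica-2.example.internal:3306)/app"),
		},
	}

	// Read-after-write goes to the writer; lag-tolerant reads go to a replica.
	var name string
	_ = db.Writer().QueryRow("SELECT name FROM users WHERE id = ?", 42).Scan(&name)
	var count int
	_ = db.Reader().QueryRow("SELECT COUNT(*) FROM users").Scan(&count)
}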

Providing the right Sequelize.js object to connect to the database (multi-tenant application)

First, some background:
We are trying to create a multi-tenant application. We first thought of using the MEAN stack and creating separate collections for each tenant (e.g. order_tenant1, order_tenant2, etc.), but then we went through some blogs that advised against this approach. Second, we felt the need for transactions as a core requirement of our DB, which opened us up to RDBMSs like MySQL and MariaDB. We then stumbled upon a blog which explained the approach in a lot of detail: it says to create views to get, update, and insert tenant-related data, with the view's parameter defined through the connection string. As we are using Node.js, I found Sequelize.js, an ORM for MySQL, which is quite good.
The actual problem:
In my experience with the MEAN stack, we define the Mongo connection in the server.js file; the application establishes those connections at startup and keeps them alive.
How can I have multiple Sequelize.js objects (or, for that matter, any database connection objects) connecting to the database according to the tenant the user belongs to, and provide the right object to the application to carry on with the business logic?
1) Should I create a new connection object on every request the application gets and then close it after the request is processed?
2) Or is there a better way to handle this in Node, Express, or Sequelize.js?
Edited:
We have decided to use the row-based approach with a tenant_id column, as described in the blog above, but I am struggling with how to maintain different connection objects to the database through Sequelize.js. That is, if a user belonging to tenant id 1 sends a request to the application, they need to be served with an object, say "db", which is a Sequelize object that communicates with the database and is created using tenant 1's details in its connection string. Likewise, a user belonging to tenant id 2 needs to be served with the same object, i.e. "db", but created using tenant 2's details in its connection string, since I want to maintain a different connection string (database connection object) for every tenant I have to serve.
Multi-tenancy can be implemented as row-based, schema-based, or database-based. Other than that 2010 article, I think you will find few if any other recommendations for database-based multi-tenancy. Systems were not designed to talk to tens or thousands of databases, so things will keep failing on you. The basic thing you're trying to avoid is SQL injection attacks that reveal other users' data, but the proper way to avoid those is by sanitizing user inputs, which you need to do no matter what.
I highly recommend going with a normal row-based multi-tenancy approach as opposed to schema-based, as described in https://devcenter.heroku.com/articles/heroku-postgresql#multiple-schemas and in this original article: http://railscraft.tumblr.com/post/21403448184/multi-tenanting-ruby-on-rails-applications-on-heroku
Updated:
Your updated question still isn't clear about the difference between database-based and row-based multi-tenancy. You want row-based, which means you can set up a single Sequelize connection string exactly like the examples, since you'll only have a single database.
Then, your queries to the database will look like:
User.find({ where: { userid: 538 } }).complete(function(err, user) {
  console.log(user.values);
});
The multi-tenancy is provided by the userid attribute. I would urge you to do a lot more reading about databases, ORMs, and typical patterns before getting started. I think you will find the additional up-front investment pays dividends versus starting development before you fully understand how ORMs typically work.