Firestore options given that it does not support namespaces
Is there any cloud JSON datastore offering out there, supporting mobile apps, that is as performant and feature-rich as GCP Firestore in native mode, but also provides namespaces?
The lack of this feature in native mode (expected of Firestore, or any database for that matter) is a deal killer all around, given the nature of the software development cycle: multiple environment needs (dev, test, prod), continuous deployment and delivery pipelines, JSON data models that use namespaces, and much more.
If not, what are you doing on your really big Firestore projects to create development, test, and integration environments that people can work in, or to support separately running related applications in production, each needing its own namespace or set of collections? How do you do this without creating and managing a bazillion projects, Firestore native instances, and service accounts (each project/Firestore instance needs a service account .json file to be created, distributed to developers, and securely stored, and each additional instance adds more management overhead)? And how do you avoid running Firestore in Datastore mode, where you lose all the advantages, features, and main selling points that led you to choose Firestore to support your app in the first place?
Optional Reading: Background / Context:
I recently joined a new project and was asked to create a JSON data model for the various services (independently running programs) that comprise the whole, and also to set up sample data for multiple runtime environments like 'dev1', 'dev2', 'test', and 'prod', where the data model might be in flux, or different in 'dev' or 'test' for a period, until the next production deployment of an updated data model. I have done this in the past with GCP Datastore and with other databases of all types (NoSQL and not NoSQL).
At the time, the JSON document store (database) had not been chosen; it was undecided. While working on the data model and the plan for multiple environments to support development efforts, I was told that the datastore would be Firestore. Subsequently, while trying to implement basic CRUD operations to insert sample data and to create separate sandbox areas in the same Firestore instance, where 'dev1' and 'dev2' could work and be as destructive as they want within their own areas without affecting each other, I found that namespaces are not supported in native mode (and the client wants native mode and what is offered there; otherwise they will look at another product for implementation).
And so now, where we thought we would need only two projects, each with a Firestore instance, if we stick with Firestore in native mode across the board we would need thirty-six instances. Because of this, I am seeking input on what is being done out there to avoid or minimize so many projects/instances. I have a solution to submit to the company that involves not using Firestore, but thought I would ask before abandoning it. We need the ability to segregate, isolate, partition, and compartmentalize data for common software development lifecycle needs and for our own application runtime needs, all the while matching the production infrastructure in each of these environments as closely as possible.
Our need for namespaces has nothing to do with supporting multiple clients or multi-tenancy, which the Google documentation I have found cites as seemingly the primary, and only, use case. Historically, that is one of the less frequently implemented use cases of namespaces, out of many hundreds more.
We want at most two projects and database instances (two Firestore native instances):
Production
Everything else under the sun that is not production: 'dev1', 'dev2', 'test1', 'test2', 'tmp-test-whatever'
With any database product, you should need only one database instance, with a mechanism to support segregation and isolation of data and data definitions, creating as many namespaces as you want within that database. Some database products refer to each namespace as a database. I want to distinguish here between the runtime software, which I am calling the "database" or "database instance", and the area where data is defined and contained (the namespace).
Oracle, PostgreSQL, and others call these segregated areas "schemas". Other data formats, XML and many more, support the notion of "namespaces" to provide isolation and avoid collisions between definitions with the same name.
Google Datastore supports namespaces, they say for multi-tenancy, but in such a way that each namespace is not isolated and protected as it is with other database products. Anyone with access to a Datastore instance can do anything with ALL namespaces, and there is no way to create a service account that entirely restricts, or limits, access to a particular namespace.
With this Firestore-backed project in production, there will be multiple separately running services at any one time hitting what we had hoped would be a single Firestore instance in native mode. Some will run on mobile; some will run on another VM instance (web-app-initiated CRUD operations on various collections/documents). All services are part of the same product. Some of these separate services have collections with the same name.
Example: a 'users' collection:
{ "service1": {        <== 'service1' is the namespace; it has multiple collections, 'users' being just one example.
    "users": {
      "user": {
        "login": <login_name>,
        <other fields>
      }
    }
  }
}
Now another namespace that also has a 'users' collection, with a different data definition and a different set of data from the above:
{ "service2": {        <== 'service2' is the namespace
    "users": {
      "user": {
        "first_name": <first_name>,
        "last_name": <last_name>,
        <other fields>
      }
    }
  }
}
----
And there are other services that have their own collections.
Other use cases for namespaces, as I have mentioned above:
an environment, like 'dev' or 'test' for example, for use when modifying any collection, such as adding to or reworking the data model during development.
a unit test that inserts data into a unique namespace devoted temporarily to just that test; the data would be inserted, the test would run, and at the end all data belonging to that temporary namespace would be deleted.
a namespace used by the mobile app portion of the product.
a namespace to support the web app portion of the product (we are trying to use one datastore product for the entire product).
a namespace environment for CI to do various things in.
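The unit-test use case above, in the absence of real namespaces, reduces to a unique prefix plus cleanup. Here is an in-memory Python sketch of that pattern; the dict stands in for the datastore, and all names are hypothetical:

```python
import uuid

def with_temp_namespace(store, test_body):
    """Run `test_body` against a throwaway namespace, then delete its data.

    `store` maps namespaced collection names to document lists, standing in
    for the datastore; a unique prefix plays the role of the namespace.
    """
    ns = f"tmp-test-{uuid.uuid4().hex[:8]}"
    try:
        test_body(ns)
    finally:
        # Delete everything belonging to the temporary namespace,
        # even if the test body raised.
        for key in [k for k in store if k.startswith(ns + "_")]:
            del store[key]

store = {}

def body(ns):
    store[f"{ns}_users"] = [{"login": "alice"}]
    assert len(store[f"{ns}_users"]) == 1

with_temp_namespace(store, body)
assert store == {}  # all temporary test data removed
```

With real namespaces, the cleanup would be a single namespace drop instead of a prefix scan.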
I proposed something that would work for the data model in Firestore native mode, but it is very undesirable and kludgy: encoding the service name and environment into the collection name, e.g. dev1_service1_users, dev1_service2_users, and so on, to distinguish them and avoid collisions.
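If forced down that road, the naming kludge can at least be centralized in one helper so the composition rule lives in a single place. A minimal Python sketch, with hypothetical function and segment names:

```python
import re

# Restrict segments to lowercase names so a typo or stray character
# doesn't silently create a new collection. The rule is an assumption,
# not a Firestore requirement.
_SEGMENT = re.compile(r"^[a-z][a-z0-9-]*$")

def collection_name(env, service, collection):
    """Compose a fully qualified collection name like 'dev1_service1_users'
    from environment, service, and logical collection name."""
    for part in (env, service, collection):
        if not _SEGMENT.match(part):
            raise ValueError(f"invalid name segment: {part!r}")
    return f"{env}_{service}_{collection}"

# collection_name('dev1', 'service1', 'users') -> 'dev1_service1_users'
```

Every CRUD call would then go through this helper rather than hard-coding prefixed names, which at least keeps the eventual migration to a real namespace mechanism in one place.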
Firestore native gives you one namespace, which they call "default", but it is not a namespace at all; it is the complete absence of one.
The only solution I see is to not use Firestore, but some other JSON datastore that would get us close to what Firestore native offers: a solution we would install, update, and manage on a VM in the cloud, along with all the infrastructure around it (load balancing, and much more).
I will post the direction we take, for anyone interested, or having a similar problem or discussion.
Related
Migrating previously collected datasets to FIWARE backend
I have at hand the task of migrating previously collected environmental datasets (weather, air quality, noise, etc.) from sensors deployed in different locations, stored in several tables of a MySQL database, to my instance of FIWARE Orion CB, and thus persisted to the FIWARE backend. The challenges are many: the data isn't stored in FIWARE standards, so it must be transformed according to the FIWARE data models; not all tables are good candidates for being transformed to an Entity; some Entities need to have field values from several tables as attributes. For instance, defining the AirQualityObserved Entity type would take attributes from these tables: airquality, co, co2, no2, and deployment. So mapping these attributes to a particular Entity type is a challenge. As this is a one-time upload (not live data), I am thinking of two possibilities: add an LwM2M client to keep sending data to an IoTAgent, eventually passed to Orion CB, until the last record; or create a Python script that "pretends" to be a contextProvider to the Orion instance, sending data (say every 5 sec) until the last record. I have not come across a case in my literature search that addresses such a situation. Are there any recommendations from the FIWARE Foundation for situations similar to this? How would you suggest mapping data fields to an Entity's attributes when they actually need to be combined from several tables?
IOTA usage makes sense when you have live data (I mean, a real device sending information to the FIWARE platform). However, you say this is a one-time upload, so the Python script option seems better in this case. (A little terminological comment here: your script will take the role of context producer. A context provider is a different actor, related to registrations and query/update forwarding. See this piece of documentation for additional detail.) With regard to the mapping of data fields to an Entity's attributes, I don't have any particular suggestion. This is just a matter of analyzing the data model (i.e. entity attributes) and finding how to set that information from your data in the tables.
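For the mapping itself, combining rows from several tables into one entity comes down to plain dictionary composition in the Python script. A sketch under stated assumptions: the table rows have already been fetched into dicts, and the column and attribute names here are illustrative, not an official FIWARE data model:

```python
def build_air_quality_entity(entity_id, airquality, co, co2, no2, deployment):
    """Combine rows from several source tables into one NGSI-style entity.

    Each argument is a dict representing a row already fetched from MySQL.
    Field names (e.g. 'observed_at', 'value', 'location') are assumptions
    about the source schema, chosen for illustration.
    """
    return {
        "id": entity_id,
        "type": "AirQualityObserved",
        # One attribute pulled from each contributing table.
        "CO": {"type": "Number", "value": co["value"]},
        "CO2": {"type": "Number", "value": co2["value"]},
        "NO2": {"type": "Number", "value": no2["value"]},
        "dateObserved": {"type": "DateTime", "value": airquality["observed_at"]},
        "location": {"type": "geo:json", "value": deployment["location"]},
    }
```

The script would then POST each built entity to Orion's update endpoint, one record at a time, until the last row.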
mySQL authoritative database - combined with Firebase
We have built a LAMP-stack API application via PHP Laravel. This currently uses a local mySQL instance. We have mostly implemented views in AngularJS. In order to use Firebase, we need to sync data between the authoritative store in mySQL and anything relevant that exists in Firebase, as close to real-time as possible. This means that other parts of the app which are not real-time and don't use Firebase can also serve up fresh content that's very recently been entered into the system. I know that Firebase is essentially a noSQL database in the cloud. My question is: how do I write a wrapper or a means to sync the canonical version of my Firebase into my database of record, mySQL?

Update to answer - our final decision - ditching Firebase as an option:
We have decided against this, as we can easily have a socket.io instance on the same server with an extremely low-latency connection to mySQL, so that the two can remain in sync. There's no need to go across the web when resources and endpoints can exist on localhost. It also gives us the option to run our app without any internet connection, which is important if we sell an on-premise appliance to large companies. A noSQL sync platform like Firebase is really just a temporary store that makes reads/writes faster in semi-real-time. If they attempt to get into the "we also persist everything for you" business, that's a whole different ask with much more commitment required. The guarantee of eventual consistency between mySQL and Firebase is more important to get right first, to prevent problems down the line. Also, an RDBMS is essential to our app; it's the only way to attack a lot of data-heavy problems in our analytics/data mappings. There are very strong reasons most of the world still uses an RDBMS like mySQL, and you can make those very reliable too, through Amazon RDS and Google Cloud SQL.
There's no specific problem beyond scaling real-time sync that Firebase actually solves for us, which other open source frameworks don't already solve. If their JS lib actually handled offline scenarios (when you START offline) elegantly, I might have considered it, but it doesn't do that yet. So, YMMV - but in our specific case, we're not considering Firebase for the reasons given above.
The entire topic is incredibly broad, definitely too broad to provide a simple answer to. I'll stick to the use-case you provided in the comments: imagine that you have a checklist stored in mySQL, comprised of some attributes and a set of steps, with the steps stored in another table. When someone updates this checklist on Firebase, how would I sync mySQL as well? If you insist on combining Firebase and mySQL for this use-case, I would:
Set up your Firebase as a work queue: var ref = new Firebase('https://my.firebaseio.com/workqueue')
Have the client push a work item into Firebase: ref.push({ task: 'id-of-state', newState: 'newstate' })
Set up a (nodejs) server that monitors the work queue (ref.on('child_added', ...)), updates the item in the mySQL database, and removes the task from the queue.
See this github project for an example of a work queue on top of Firebase: https://github.com/firebase/firebase-work-queue
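The server side of that work queue reduces to a simple drain loop. Here is an in-memory Python sketch of the pattern, with the Firebase listener and the MySQL table replaced by a list and a dict as stand-ins:

```python
def drain_work_queue(queue, database):
    """Apply queued state changes to the database of record, then remove them.

    `queue` is a list of work items like {'task': <id>, 'newState': <state>};
    in the real setup these would arrive via ref.on('child_added').
    `database` is a dict keyed by task id, standing in for the mySQL table.
    """
    while queue:
        item = queue.pop(0)  # take the oldest work item, i.e. remove it from the queue
        # Equivalent to: UPDATE checklist SET state = newState WHERE id = task
        database[item["task"]] = item["newState"]
    return database
```

The key property of the pattern is that the client only ever writes to the queue, so mySQL stays the single writer's authoritative store.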
notifying applications on db INSERT
Consider an application with two components, possibly running on separate machines:
Producer - inserts records into a database, but does little to no reading from the database. Multiple instances may be running concurrently.
Consumer - must be notified when a record is inserted into the database by an instance of the producer. May also have multiple instances.
What is the best way to perform the notifications, assuming that producers will be inserting 10-100 records into the database per second at peak times? The database technology is currently MySQL, but this is not necessarily set in stone. I can see a few different ways:
Use something like a MySQL message queue to "push" INSERT notifications to subscribers (consumers). Producers would have no knowledge that this was occurring.
Have producers interact with an intermediate layer that performs the INSERT and pushes notifications to a message queue that consumers are subscribed to.
Have consumers poll the database frequently to check for new additions (seems like a bad idea).
As far as coupling is concerned: is it a good idea to have two relatively separate application components perform direct queries on a shared database, or should one component "own" the database while the other component interacts with the DB indirectly, via calls to the owning component?
I like the second proposed solution (the intermediate layer), as it separates the notification from the database work, and could possibly be part of a two-phase commit XA transaction. If the consumers need the database content in addition to the notification, that can be accomplished via MySQL replication. This could also address the coupling question, as the consumer components could have read-only access to their replicated instances. Using a messaging solution would also address any potential bottlenecks in the database-only solution, as it would separate the notification and storage into separate processes. Depending on the language, you have a number of choices for the message distribution. If you're using Java, I'd actually recommend JGroups rather than JMS, as it's somewhat easier to configure. If Java isn't your language of choice, Apache's Active MQ supports a number of languages for interfacing. Apache's Qpid is an AMQP implementation that also supports a number of languages (Java, C++, Python, Ruby, etc.) Other messaging options could include XMPP, STOMP, or RestMS implementations.
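The intermediate-layer idea (the second option in the question) can be sketched as an insert function that also publishes a notification. A minimal Python sketch, where a list and a queue.Queue stand in for the MySQL table and the message broker (the real versions would be a DB driver call and, e.g., a JMS/AMQP publish):

```python
import queue

def make_inserter(db_rows, notifications):
    """Return the producer-facing insert function of the intermediate layer.

    db_rows: list standing in for the MySQL table.
    notifications: queue.Queue standing in for the message broker that
    consumers subscribe to.
    """
    def insert(record):
        db_rows.append(record)    # the INSERT into the database of record
        notifications.put(record) # push the notification to subscribers
    return insert

rows = []
notes = queue.Queue()
insert = make_inserter(rows, notes)
insert({"id": 1, "payload": "hello"})
# A consumer receives the record without ever polling the database:
assert notes.get_nowait() == {"id": 1, "payload": "hello"}
```

Because producers only call `insert`, they stay decoupled from both the storage and the notification mechanism, which is the coupling benefit the answer describes.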
Can MS Enterprise Library Logging be used for multiple applications?
I'm wondering if it's (a) possible and (b) good practice to log multiple applications to a single log instance. I have several ASP.NET apps and I would like to aggregate all exceptions to a centralized location that can be queried as part of an Enterprise Dashboard app. I'm using both the EL logging block and the EL exception handling block, along with the Database Trace Listener. I would like to see exceptions across all apps logged to a single db. Any comments, best practice guidelines or answers would be extremely welcome.
Yes, it is definitely possible to store multiple application logs in a central location using EL. An Enterprise Dashboard application that lets you view exceptions across applications and tiers, and provides reporting, is a great reason to centralize your logging. So I'll say yes to question b as well.

Possible Issues/Negatives
I'm assuming that you are using the Database Trace Listener, since you mention that in your question. If a large number of applications logging a large number of log entries is combined with users querying the (potentially large) log database, there is the potential for performance to degrade (since the logging is done synchronously), which could impact your application performance.

Another Approach
To mitigate against this possibility, I would investigate using the Distributor Service to log asynchronously. In that model, all of the applications would log to a message queue (using the MSMQ Trace Listener). A separate service then polls the queue and forwards the log entries to a trace listener (in your case a Database Trace Listener), which persists the messages in your dashboard database. This setup is more complicated, but it does seem to align with what you are trying to achieve, and it has other benefits such as asynchronous processing and the ability to log even if the dashboard database is down (e.g. for maintenance).

Other Considerations
You may also want to think about standardizing some LogEntry properties across applications. For example, LogEntry doesn't really have an "application" property, so you could add an ExtendedProperty to represent the application name. Or you may standardize on a specific format for the Message property, so that various pieces of information can be pulled out of the message and stored in separate database columns for easier searching and categorization.
Should I move client configuration data to the server?
I have a client software program used to launch alarms through a central server. At first it stored configuration data in registry entries; now it uses a configuration XML file. This configuration information consists of alarm number, alarm group, hotkey combinations, and such. The client connects to a server using a TCP socket, which it uses to communicate this configuration to the server. In the next generation of this program, I'm considering moving all configuration information to the server, which stores all of its information in a SQL database. I envision using some form of web interface to communicate with the server and set up the clients, rather than the current method, which is to either configure the client software on the machine through a control panel, or on install to either push out an XML file or pass command line parameters to the MSI. I'm thinking now the only information I would want to specify on install would be the path to the server. Each workstation would be identified by computer name and configured through the server. Are there any problems or potential drawbacks to this approach? The main goal is to centralize configuration and make it easier to make changes later, because our software is usually managed by one or two people at most.
Other than allowing the client to function offline (if such a possibility makes sense for your application), there doesn't appear to be any drawback to moving the configuration to a centralized location. Indeed, even with a centralized location, a feature can be added to the client to cache the last known configuration, for use when the client is offline. If you implement a [centralized] database design, I suggest considering storing configuration parameters in an Entity-Attribute-Value (EAV) structure, as this schema is particularly well suited to parameters. In particular, it allows easy addition and removal of particular parameters, and it allows handling parameters as a list (paving the way for a list-oriented display in the UI as well, and therefore no changes needed in the UI when new types of parameters are introduced). Another reason why configuration parameter collections and EAV schemas work well together is that even with very many users and configuration points, the configuration data remains small enough that it doesn't suffer some of the limitations of EAV with "big" tables.
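The EAV idea boils down to a three-column table (entity, attribute, value). A minimal in-memory Python sketch of the access pattern, with all names chosen for illustration:

```python
def set_param(store, entity, attribute, value):
    """Insert or update one configuration parameter.

    `store` is a list of (entity, attribute, value) tuples, standing in for
    a three-column EAV table in the server's SQL database.
    """
    # Replace any existing row for this (entity, attribute) pair.
    store[:] = [r for r in store if not (r[0] == entity and r[1] == attribute)]
    store.append((entity, attribute, value))

def params_for(store, entity):
    """Return all parameters for one client workstation as a dict,
    ready for a list-oriented display in the UI."""
    return {attr: val for ent, attr, val in store if ent == entity}

rows = []
set_param(rows, "WORKSTATION-7", "alarm_group", "night-shift")
set_param(rows, "WORKSTATION-7", "hotkey", "Ctrl+F9")
set_param(rows, "WORKSTATION-7", "hotkey", "Ctrl+F10")  # update, not duplicate
# params_for(rows, "WORKSTATION-7") -> {'alarm_group': 'night-shift', 'hotkey': 'Ctrl+F10'}
```

Adding a new type of parameter is just another row, with no schema change and no UI change, which is the flexibility argued for above.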
The only thing that comes to mind is the security of the information, though you probably have that issue in either case. A database would probably also be easier to interface with, as everything would be in one spot.