What strategy/technology should I use for this kind of replication? - mysql

I am currently facing a problem that I have not yet figured out a good solution for, so I hope to get some advice from you all.
My problem is as shown in the picture:
The Core Database is where all the clients connect to manage live data; it is very large and busy all the time.
The Feature Database is not used as often, but it needs some of the live data (maybe 5%) from the Core Database. Requests to this server take longer and consume more resources.
My current solution:
I use database replication between the Core Database and the Feature Database, and it works fine. But
the problem is that I waste a lot of disk space storing unwanted data.
(Filtering while replicating does not work with my database schema.)
Using a queueing system would not keep the data live on time, as there are many requests to the Core Database.
Please suggest some ideas if you have dealt with this before.
Thanks,
Pang

What you describe is a classic data integration task. You can use any data integration tool to extract data from your core database and load it into the feature database. You can schedule your data integration jobs anywhere from near real-time to whatever time-frame suits you.
I used Talend in my mid-size (10GB) semi-scientific PostgreSQL database integration project. It worked beautifully.
You can also try SQL Server Integration Services (SSIS). This tool is very powerful as well. It works with all top-notch RDBMSs.
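If you would rather hand-roll the job than adopt a full ETL tool, a minimal sketch of a scheduled extract/load step is below. The table and column names (`orders`, `updated_at`) and the connection details are hypothetical; the idea is simply to pull the small subset of rows the feature database needs and upsert them on a schedule (e.g. from cron).

```python
# Minimal extract/load sketch (hypothetical tables and hosts, not a full ETL tool).
# Deletes are not handled; a real job would also track its own "last run" timestamp.
import pymysql

core = pymysql.connect(host="core-db.example.com", user="etl", password="...", database="core")
feature = pymysql.connect(host="feature-db.example.com", user="etl", password="...", database="feature")

def copy_recent_orders(since):
    """Pull rows changed since the last run and upsert them into the feature DB."""
    with core.cursor() as src, feature.cursor() as dst:
        src.execute(
            "SELECT id, customer_id, total, updated_at FROM orders WHERE updated_at >= %s",
            (since,),
        )
        for row in src.fetchall():
            dst.execute(
                "REPLACE INTO orders (id, customer_id, total, updated_at) VALUES (%s, %s, %s, %s)",
                row,
            )
    feature.commit()
```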

If all you're worrying about is disk space, I would stick with the solution you have right now. 100GB of disk space is less than a dollar, these days - for that money, you can't really afford to bring a new solution into the system.
Logically, there's also a case to be made for keeping the filtering in the same application: keeping the responsibility for knowing which records are relevant inside the app itself, rather than in some mysterious integration layer, will reduce overall solution complexity. Only accept the additional complexity of a special integration layer if you really need to.

Related

AWS DMS to replicate transactional data to data warehouse on an ongoing basis

I'm hoping someone can tell me if I'm absolutely crazy before I go too far down this path. I have an application with MySQL as the backend. I needed to create more robust reporting, and I opted to build a data warehouse in pgsql. The challenge is I don't want the DW to be updated just once or twice a day. I'd like it to be near real time (some lag is expected and not a problem).
I looked at AWS glue and a few other options and finally settled on DMS as a method of replicating the data from the MySQL source to the pgsql target db for staging. I then set up trigger functions that will basically manipulate the inserted/updated data in the pgsql db, landing it in the data warehouse. The application is also connected to the DW and can pull reports and dashboard metrics from the DW as needed.
I've built a proof of concept and it seems to work, but it's really only me hitting the application at the moment, so I'm not sure if it will hold up if I were to proceed with this idea and put it in production.
I currently have a dms.t2.small replication instance (engine version 2.4.4) running at about 15-20% CPU utilization. I don't have it configured for Multi AZ currently.
I'm seeing combined CDCLatencyTarget/CDCLatencySource values averaging about 9 seconds. I think if that holds true it wouldn't be unbearable, although the less time the better. I'd say if it gets up over a minute we may start to see complaints.
I know that DMS is more meant for migrations, so I'd like to know if I'm just doing this in a really stupid way, or if this is a more or less valid use case? Are there issues with DMS that I am unaware of that will cause me to later regret this decision?
Also, I'd love any ideas you have for how I can put safeguards in place to ensure that source and target stay synced or if they don't that I'm made aware of it, or something that would allow it to self-heal.
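One possible safeguard, sketched roughly below, is a scheduled job that compares row counts between the MySQL source and the PostgreSQL target and alerts on drift. This would complement rather than replace DMS's own validation features, and the connection details and table names here are hypothetical.

```python
# Rough consistency check: compare row counts per table between source and target.
import pymysql
import psycopg2

def table_counts_match(table):
    src = pymysql.connect(host="mysql-source.example.com", user="check", password="...", database="app")
    tgt = psycopg2.connect(host="pg-target.example.com", user="check", password="...", dbname="staging")
    try:
        with src.cursor() as c:
            c.execute(f"SELECT COUNT(*) FROM {table}")  # table names come from our own list below
            source_count = c.fetchone()[0]
        with tgt.cursor() as c:
            c.execute(f"SELECT COUNT(*) FROM {table}")
            target_count = c.fetchone()[0]
        return source_count == target_count
    finally:
        src.close()
        tgt.close()

# e.g. run from cron and alert (email, SNS, ...) when a table drifts:
for t in ("orders", "customers"):
    if not table_counts_match(t):
        print(f"WARNING: {t} row counts differ between source and target")
```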

Django and mysql on different servers performance

I have a Django app that needs to be extremely fast, and it works well for now.
So my question is: is it better to put the Django app on one server and MySQL on another, or to run both on one server?
I ask because of the communication between them.
I use DigitalOcean, and both are currently on one server.
It depends how well the application is written.
Poorly written Django will generate a lot of queries, so it may be beneficial to have it on the same server. Well-written Django should leverage the database to do the heavy lifting, in which case it's better to have it on a separate server so that it can be tuned for a database workload. (In general, having a separate database server is the way to go.)
The best thing to do would be to add Django debug toolbar to your application and see if it is generating a lot of queries or not, and tune the application from there.
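For reference, a minimal sketch of wiring up django-debug-toolbar in a development settings/urls module (assuming a recent Django version; adjust to your own project layout):

```python
# settings.py (development only) -- standard debug-toolbar wiring for a hypothetical project
INSTALLED_APPS += ["debug_toolbar"]
MIDDLEWARE += ["debug_toolbar.middleware.DebugToolbarMiddleware"]
INTERNAL_IPS = ["127.0.0.1"]  # the toolbar only renders for these client IPs

# urls.py
from django.urls import include, path

urlpatterns += [
    path("__debug__/", include("debug_toolbar.urls")),
]
```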
You have a couple of options, but let's stick to these two.
One server for everything
Good for setting up an application quickly, as it is the simplest setup possible, but it offers little in the way of scalability and component isolation.
There are a lot of pros: it's fast and simple to work with, and it doesn't suffer from network latency problems. The main con: you cannot scale horizontally.
Server for web application and server for database.
First of all, I would recommend using Postgres, since the latest version (9.6) can parallelize a single query across multiple cores, which can make it considerably faster than MySQL for heavy queries.
This setup is still fairly quick to get going, and it keeps the application and the database from fighting over the same system resources.
Pros: the application and the database do not compete for resources (RAM / CPU / I/O).
It may also increase security by removing the database from the DMZ.
Cons: it is harder to set up, and when network latency between the two servers is high, queries may take longer to execute.
To sum up: I would use the first option for small and medium applications that do not handle a lot of requests.
I would consider moving the DB to another server (or servers) once the application serves thousands of users per day.

SQLite3 database per customer

Scenario:
Building a commercial app consisting of a RESTful backend in Symfony2 and a frontend in AngularJS.
This app will never be used by many customers (if I get to sell 100, that would be fantastic; hopefully many more, but in any case it will never be massive).
I want a multi-tenant structure for the database, with one schema per customer (they store sensitive information for their customers).
I'm aware of the problems with updating schemas, but I will have to live with that.
Today I have a MySQL demo database that I will clone each time a new customer purchases the app.
There is no relationship between my customers, so I don't need to communicate with multiple shards for any query
For one customer, several devices may be using the app at the same time, but there won't be massive write operations in the DB.
My question
Trying to set up some functional tests for the backend API, I read about having a dedicated SQLite database for loading test data, which seems to be a good idea.
However, I wonder whether it's also a good idea to switch from MySQL to SQLite3 as the main database for the application, and whether it's common practice to have one dedicated SQLite3 database PER CLIENT. I've never used SQLite, and I have no idea whether the process of updating a schema and replicating the changes across all the databases works the same way as for other RDBMSs.
Is this a correct scenario for SQLite?
Any suggestion (aka tutorial) in how to achieve this?
[I wonder] if it's a common practice to have one dedicated SQLite3 database PER CLIENT
Only if the database is deployed along with the application, like on a phone. Otherwise I've never heard of such a thing.
I've never used SQLite, and I have no idea whether the process of updating a schema and replicating the changes across all the databases works the same way as for other RDBMSs.
SQLite is a SQL database and responds to ALTER TABLE and the like. As for updating all the schemas, you'll have to re-run the update for all schemas.
Schema syncing is usually handled by an outside utility; usually your ORM will have something. Some are server-agnostic, some only support specific servers. There are also dedicated database change management tools such as Sqitch.
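As a rough illustration of what "re-run the update for all schemas" looks like when each tenant has its own SQLite file, here is a minimal sketch; the file layout, table, and column are hypothetical, and a real setup would track a schema version per database rather than catching errors.

```python
# Apply one schema change to every per-tenant SQLite file under tenants/.
import glob
import sqlite3

MIGRATION = "ALTER TABLE customers ADD COLUMN phone TEXT"  # example change only

for path in glob.glob("tenants/*.db"):
    conn = sqlite3.connect(path)
    try:
        conn.execute(MIGRATION)
        conn.commit()
        print(f"migrated {path}")
    except sqlite3.OperationalError as exc:
        # e.g. the column already exists; real migrations need a version table
        print(f"skipped {path}: {exc}")
    finally:
        conn.close()
```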
However, I wonder whether it's also a good idea to switch from MySQL to SQLite3 as the main database for the application...
SQLite's main advantage is not requiring you to install and run a server. That makes sense for quick projects or where you have to deploy the database, like a phone app. For a server-based application, there's no problem with having a database server. SQLite's very restricted set of SQL features becomes a disadvantage. It will also likely run slower than a server database for anything but the simplest queries.
Trying to set up some functional tests for the backend API, I read about having a dedicated SQLite database for loading test data, which seems to be a good idea.
Under no circumstances should you test with a different database than the production database. Databases do not all implement SQL the same, MySQL is particularly bad about this, and your tests will not reflect reality. Running a MySQL instance for testing is not much work.
This separate schema thing claims three advantages...
Extensibility (you can add fields whenever you like)
Security (a query cannot accidentally show data for the wrong tenant)
Parallel Scaling (you can potentially split each schema onto a different server)
What they're proposing is equivalent to having a separate, customized copy of the code for every tenant. You wouldn't do that; it's obviously a maintenance nightmare. Code at least has the advantage of version control systems with branching and merging. I know of only one database change management tool that supports branching: Sqitch.
Let's imagine you've made a custom change to tenant 5's schema. Now you have a general schema change you'd like to apply to all of them. What if the change to 5 conflicts with this? What if the change to 5 requires special data migration different from everybody else? Now let's imagine you've made custom changes to ten schemas. A hundred. A thousand? Nightmare.
Different schemas will require different queries. The application will have to know which schema each tenant is using, there will have to be some sort of schema version map you'll need to maintain. And every different possible query for every different possible schema will have to be maintained in the application code. Nightmare.
Yes, putting each tenant in a separate schema is more secure, but that only protects against writing bad queries or including a query builder (which is a bad idea anyway). There are better ways to mitigate the problem, such as the view filter suggested in the docs. There are many other ways an attacker can access tenant data that this doesn't address: gain a database connection, gain access to the filesystem, sniff network traffic. I don't see the small security gain being worth the maintenance nightmare.
As for scaling, the article is ten years out of date. There are far, far better ways to achieve parallel scaling than to coarsely put schemas on different servers. There are entire databases dedicated to this idea. Fortunately, you don't need any of this! Scaling won't be a problem for you until you have tens of thousands to millions of tenants. The idea of front-loading your design with a schema maintenance nightmare for a hypothetical big parallel scaling problem is putting the cart so far before the horse, it's already at the pub having a pint.
If you want to use a relational database, I would recommend PostgreSQL. It has a very rich SQL implementation, it's fast and scales well, and it has something that renders this whole idea of separate schemas moot: a built-in JSON type. This can be used to implement the "extensibility" mentioned in the article. Each table can have a meta column using the JSON type, into which you can throw any extra data you like. The application does not need special queries; the meta column is always there. PostgreSQL's JSON operators make working with the meta data very easy and efficient.
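A small sketch of that meta-column idea using psycopg2; the table and field names are illustrative only:

```python
# One shared table for all tenants; per-tenant "extra fields" live in a jsonb column.
import json
import psycopg2

conn = psycopg2.connect(dbname="app", user="app", password="...", host="localhost")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS customers (
            id     serial PRIMARY KEY,
            tenant integer NOT NULL,
            name   text NOT NULL,
            meta   jsonb NOT NULL DEFAULT '{}'
        )
    """)
    # Extra data goes into meta instead of per-tenant schema changes.
    cur.execute(
        "INSERT INTO customers (tenant, name, meta) VALUES (%s, %s, %s)",
        (5, "Acme Ltd", json.dumps({"loyalty_tier": "gold"})),
    )
    # Query with PostgreSQL's JSON operators; ->> extracts a value as text.
    cur.execute("SELECT name FROM customers WHERE meta->>'loyalty_tier' = %s", ("gold",))
    print(cur.fetchall())
```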
You could also look into a NoSQL database. There are plenty to choose from and many support custom schemas and parallel scaling. However, it's likely you will have to change your choice of framework to use one that supports NoSQL.

database strategy to provide data analytics

I provide a solution that handles operations for brick and mortar shops. My next step is to provide analytics for my customers.
As I am in the starting phase I am hoping to find a free way to do it myself instead of using third party solutions. I am not expecting a massive scale at this point but I would like to get it done right instead of running queries off the production database.
For performance reasons, I am thinking I should run the analytics queries against separate tables in the same database. A cron job would run every night to replicate the data from the production tables to the analytics tables.
Is that the proper way to do this?
The other option I have in mind is to run the analytics from a different database (as opposed to just separate tables). I am using Amazon RDS with MySQL, if that makes any difference.
It depends on how much analytics you want to provide.
I am a DWH manager and would start off with a small (free) BI (Business Intelligence) solution.
Your production DB and analytics DB should always be separate.
Take a look at Pentaho Data Integration (Community Edition). It's a free ETL tool that will help you get your data from your production database into your analytics database, and it can also perform transformations.
Check out some free reporting software like Jaspersoft to provide a reporting platform for your customers (if that's what you want; otherwise just use Excel).
BI never wants to throw away data. If you think the data in your analytics DB is going to grow large (2 TB+), don't use MySQL but rather PostgreSQL; MySQL does not handle very large datasets well.
If you are really serious about this, read "The Datawarehouse Toolkit" by Ralph Kimball. That will set you up with some basic Data Warehouse knowledge.
Amazon RDS provides something called a read replica, which automatically performs replication and is optimised for reads.
I like this solution for its convenience. Downside: the price tag.
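If you go the read-replica route, creating one is a one-off operation; a minimal sketch with boto3 is below (the instance identifiers, class, and region are placeholders):

```python
# Create an RDS read replica of an existing MySQL instance.
import boto3

rds = boto3.client("rds", region_name="us-east-1")
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="shop-analytics-replica",   # name for the new replica
    SourceDBInstanceIdentifier="shop-production",    # existing production instance
    DBInstanceClass="db.t3.small",
)
```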

MySQL AND Filemaker Pro?

I have a client that wants to use Filemaker for a few things in their office, and may have me building a web app.
The last time I used, or thought about, or even heard of, FileMaker was about 10 years ago, and I seem to remember that I don't want to use it as the back end of a sophisticated web app, so I am thinking of trying to sell them on MySQL.
However, will their Filemaker database talk to MySQL? Any idea how best to talk them down from Filemaker?
You may have a hard time talking them out of FileMaker, because it was actually a pretty clever tool for making small, in-house database applications, and it had a very loyal user base. But you're right--it's not a good tool for making a web application.
I had a similar problem with a client who was still using a custom dBase IV application. Fortunately, Perl's CPAN archive has modules for talking to anything. So I wrote a script that exported the entire dBase IV database every night, and uploaded it into MySQL as a set of read-only tables.
Unfortunately, this required taking MySQL down for 30 minutes every night. (It was a big database, and we had to convert free-form text to HTML.) So we switched to PostgreSQL, and performed the entire database update as a single transaction.
But what if you need read-write access to the FileMaker database? In that case, you've got several choices, most of them bad:
Build a bi-directional synchronization tool.
Get rid of FileMaker entirely. If the client's FileMaker databases are trivial, this may be relatively easy. I'd begin by writing a quick-and-dirty clone of their most important databases and demoing it to them in a web browser.
The client may actually be best served by a FileMaker-based web application. If so, refer them to Google.
But how do you sell the client on a given choice? It's probably best to lay out the costs and benefits of each choice, and let the client decide which is best for their business. You might lose the job, but you'll maintain a reputation for honest advice, and you won't get involved in a project that's badly suited to your client.
We develop solutions with both FileMaker and PHP/MySQL. Our recommendation is to build the web app on a technology optimised for web apps, such as MySQL.
Having said that, FileMaker does have a solid PHP API so if the web app has relatively lightweight demands (e.g. in house use) then use that and save yourself the trouble of synchronisation.
FileMaker's ESS technology lets FileMaker use an SQL database as the backend data source, which gives you two options:
Use ESS as a nice tight way to synchronise right within FileMaker - that way you'd have a "native" data source to work with within the FileMaker solution per se.
Use ESS to allow FileMaker to be used as a reporting/data mining/casual query and edit tool directly on the MySQL tables - it works sweet.
We've found building a sophisticated application in FileMaker with ESS/MySQL backend to be very tricky, so whether you select 1 or 2 from above depends on how sophisticated and heavy duty that FileMaker usage is.
Otherwise, SyncDek has a good reputation as a third party solution for automating Synchronisation.
I've been tackling similar problems and found a couple of solutions that emk hasn't mentioned...
FileMaker can link to external SQL data sources (ESS), so you can use ODBC to connect to a MySQL (or other) database and share data. You can find more information here. We tried it and found it to be pretty slow, to be honest.
Syncdek is a product that claims to allow you to perform data replication and data transmission between Filemaker, MySQL and other structured sources.
It is possible to use FileMaker's Instant Web Publishing as a web service that your app can push and pull data through. We found a couple of wrappers for this in Python and PHP.
You can put a trigger in the FileMaker database so that every time a record (or a part of a record you are interested in) is changed, a web service is called that updates a MySQL or memcached copy of that data for your website to access; a rough sketch of such a service is below.
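A sketch of what that web service could look like, with Flask receiving the change notification and upserting the row into MySQL; the endpoint, payload fields, and credentials are all hypothetical:

```python
# Tiny webhook the FileMaker script trigger could POST to on each record change.
from flask import Flask, jsonify, request
import pymysql

app = Flask(__name__)

@app.route("/fm/contact-changed", methods=["POST"])
def contact_changed():
    payload = request.get_json(force=True)  # e.g. {"id": 42, "name": "...", "email": "..."}
    conn = pymysql.connect(host="localhost", user="web", password="...", database="site")
    try:
        with conn.cursor() as cur:
            cur.execute(
                "REPLACE INTO contacts (id, name, email) VALUES (%s, %s, %s)",
                (payload["id"], payload["name"], payload["email"]),
            )
        conn.commit()
    finally:
        conn.close()
    return jsonify(status="ok")
```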
I found that people like FileMaker because it gives them a very visual interface onto their data; it's very easy to make quite large self-contained applications without much development knowledge. But when it comes to collaboration between many users, or presenting this data in a format other than the FileMaker application, we found performance to be a real problem.