Best solution for automated collection of data from remote MySQL servers - mysql

I have done extensive research, i feel that i have good candidates but i still lack enough knowledge to decide which one i should implement, ideally i would like to hear from someone that actually implemented a solution to a similar problem.
The Problem
Our project consists of a community of distributed nodes (25 nodes). The nods run on Linux computers, and are installed in the typical residential setting (behind a NAT), with wide dispersion geographically and ISP wise.
Our software on the node collects a variety of its own unique data which is logged to MySQL DB on the local host (node) which is not WAN accessible directly. We also have a Web interface for each node that uses the local node DB to allow the local node user to visualize certain data and parameters; this is only accessible on the LAN.
We typically set-up and maintain an open port for SSH from our labs to each node. All remote DB on nodes have the exact same schema but completely different data. We need an automated way to collect all data from all the nodes and get them to our WAN accessible lab servers (Windows 7 servers, but can be Linux if it provide a better solution). We have narrowed the option as follow:
Solutions:
Create a .bat script that sequentially connect to each node over SSH to import data.
Use the web interface that runs on each node to periodically query the local db then save that data to a central MySQL server. I know i can connect to two db in PHP. seems to be doable here.
Use the MySQL supported “slave-master” replication setup which will duplicate all remote databases on the server.
Use the MySQL supported federated engine setup which will link local tables to remote ones.
Questions:
Are all these viable solutions?
Any major cons i should be aware of for the viable ones?
is there better solutions available (paid or otherwise) ?

Related

How to properly use databases in development?

I'm struggling with finding out how to properly test stuff on my local PC and then transfer that over to production.
So here is my situation:
I got a project in NodeJS/typescript, and I'm using Prisma in it for managing my database. On my server I just run a MySQL database, and for testing on my PC I always just used SQLite.
But now that I want to use Prisma Migrate (because it's highly recommended to do so in production) I can't because I use different databases on my PC vs on my Server. Now here comes my question, what is the correct way to test with a database during development?
Should I just connect to my server and make a test database there? Use VS Code's SSH coding function to code directly on the server and connect to the database? Install MySQL on my PC? Like, what's the correct way to do it?
Always use the same brand and same version database in development and testing that you will eventually deploy to. There are compatibility differences between brands, i.e. an SQL query that works on SQLite does not necessarily work the same on MySQL, and vice-versa. Even data types and schema definitions aren't all the same between different SQL products.
If you use different SQL databases in development and production, you will waste a bunch of time and increase your gray hair debugging problems in production, as you insist, "it works on my machine."
This is avoidable!
When I develop on my local computer, I usually have an instance of MySQL Server running in a Docker container on my laptop.
I assume any test data on my laptop is temporary. I can easily recreate schema and data at any time, using scripts that are checked into my source control repo, so I don't worry about losing any data. In fact, I feel no hesitation to drop it and recreate it several times a week.
So if I need to upgrade the local database version to match an upgrade on production, I just delete the Docker container and its data, pull the new Docker image version, initialize a new data dir, and reload my test data again.
Every step is scripted, even the Docker pull.
The caveat to my practice is that you can't necessarily duplicate the software if you use cloud databases, for example Amazon Aurora. There's no way to run an Aurora-compatible instance on your laptop (and don't believe the salespeople that Aurora is fully compatible with MySQL; it's not). So you could run a small Aurora instance in a development VPC and connect to that from your app development environment. At least if your internet connection is reliable enough.
By the way, a similar rule applies to all the other technology you use in development. The version of Node.js, Prisma, other NPM dependencies, http and cache servers, etc. Even the operating system might be the source of compatibility issues, but you may have to develop in a Virtual Machine to match the OS to production exactly.
At one past job, I did help the developer team create what we called the "golden image" which was a pre-configured VM with all our software dependencies installed, and we used this golden image for both the developer sandbox VM, and also an AMI from which we launched the production Amazon EC2 instances. So all the developers were guaranteed to have a test environment that matched production exactly. After that, if they had code problems, they could fix it in development and have a much higher confidence it would work after deploying to production.

How to share MySQL database with other team members

I'm a newbie to web development. My team at school is using J2EE and MySQL to develop a web app that will be deployed on AWS. We use GitHub for version control.
I am just wondering if I use MySQL from my terminal to add tables into the local "test" database, how can my teammates have access to them? Should I deploy the database somewhere or maybe create the tables in code so that my teammates can automatically have the tables in their local database when they run the code? But how can the data already stored in the database be shared then?
Sorry to have this naive question, I tried to do some research online but it seems that the results are more advanced and about PHP not J2EE... It will also be great if you can recommend some good resource for me to read through since I believe this is a very fundamental concept that I should know.
You can maintain the database's schema in your code so it can be committed to source control and accessed by the others. This is a good practice regardless of how you use a test database for the development.
Your team members will not be able to easily access your local database. For a distributed development environment it would be best to host your test database on a remote server, such as on an EC2 instance in a public subnet or in RDS. Then you can pass along the database's connection information (host, port) and credentials to the other team members.
Pay attention to the security group when creating the database either in EC2 or RDS. You can open it up to the world (0.0.0.0) or narrow it to just your team members' IP addresses to tighten security. Otherwise the team members will not be able to connect to the database.
It's hard for your team members to access your local database from another computer. It's a lot better to host your database on a remote server, such as a EC2 on AWS or Computer Engine on GCP. Then you MySQL database can be accessed by anyone with an authorized connection and whitelisted IP address.
Another solution is using a cloud-based data warehouse like a Snowflake or Acho Studio. Once you have the MySQL database connected to the DW, your teammates should be able to access the tables you've authorized them to see.
This way you can also share your SQL queries with your teammates so they can run them against the MySQL server themselves.
You can create a test environment using VM or containers and share it with your team members. You should pay attention to how to keep track of the changes in these test environments as well. The following answer describes how a db with schema can be shared using a docker image. You can version control these images so you can track the changes.

Can I install MySQL on the VMs provided in Azure Cloud Services?

From what I gather, the only way to use a MySQL database with Azure websites is to use Cleardb but can I install MySQL on VMs provided in Azure Cloud Services. And if so how?
This question might get closed and moved to ServerFault (where it really belongs). That said: ClearDB provides MySQL-as-a-Service in Azure. It has nothing to do with what you can install in your own Virtual Machines. You can absolutely do a VM-based MySQL install (or any other database engine that you can install on Linux or Windows). In fact, the Azure portal even has a tutorial for a MySQL installation on OpenSUSE.
If you're referring to installing in web/worker roles: This simply isn't a good fit for database engines, due to:
the need to completely script/automate the install with zero interaction (which might take a long time). This includes all necessary software being downloaded/installed to the vm images every time a new instance is spun up.
the likely inability for a database cluster to cope with arbitrary scale-out (the typical use case for web/worker roles). Database clusters may or may not work well when a scale-out occurs (adding an additional vm). Same thing when scaling in (removing a vm).
less-optimal attached-storage configuration
inability to use Linux VMs
So, assuming you're still ok with Virtual Machines (vs stateless Cloud Service vm's): You'll need to carefully plan your deployment, with decisions such as:
Distro (Ubuntu, CentOS, etc). Azure-supported Linux distro list here
Selecting proper VM size (the DS series provide SSD attached disk support; the G series scale to 448GB RAM)
Azure Storage attached disks being non-Premium or Premium (premium disks are SSD-backed, durable disks scaling to 1TB/5000 IOPS per disk, up to 32 disks per VM depending on VM size)
Virtual network configuration (for multi-node cluster)
Accessibility of database cluster (whether your app is in the vnet or accesses it through a public endpoint; and if the latter, setting up ACL's)
Backup / HA / DR planning
Someone else mentioned using a pre-built VM image from VM Depot. Just realize that, if you go that route, you're relying on someone else to configure the database engine install for you. This may or may not be optimal for what you're trying to achieve. And the images may or may not be up-to-date with the latest versions, patches, etc.
Of course, what I wrote applies to any database engine you install in your own virtual machines, where a service provider (such as ClearDB) tends to take care of most of these things for you.
If you are talking about standard VMs then you can use a pre-built images on VMDepot for that.
If you are talking about web or worker roles (PaaS) I wouldn't recommend it, but if you really want to you could. You would need to fully script the install of the solution on the host. The only downside (and it's a big one) you would have would be the that the host will be moved to a new host at some point which would mean your MySQL data files would be lost - if you backed up frequently and were happy to lose some data then this option may work for you.
I think, that the main question is "what You want to achieve?". As I see, You want to use PaaS solution with Web Apps or Cloud Service and You need a MySQL database. If Yes, You have two options (both technically as David Makogon said). First one is to deploy Your own (one) server with MySQL and connect to it from the outside (internet side). Second solution is to create one MySQL server or cluster and connect Your application internally in Azure virtual network. WIth Cloud Service it is simple but with Web App it is not. You must create VPN gateway in Azure VM and connect Your Web App to this gateway. In this way You will have internal connection wfrom Your application to Your own MySQL cluster.

Single Store CRM application to Multi Store CRM - Single Local database server to Multi Store Multi location

My company has Desktop application developed in vb.net using devexpress controls. Back End database is MySQL.
Company is in retailing and have 2 retail stores in in same city. Both stores always stay busy and customers are always in waiting at the counter. Basically, it is desktop based CRM application which has lot of modules inside it apart from invoice/Receipt module, it has other modules like Delivery module, installation module, Service/Repair module, Account Receivable module and many other modules used by various back office departments of the company. Other resources/hardware such as Barcode Printer, Receipt Printer, and Barcode scanner are connected to the CRM on Desktop PC.
Currently, there are around 55 clients always connected to server and using application.
Problem:
Till couple of weeks back, company had no issue using this desktop application and single MySQL server as all clients were connected via LAN or WLAN.
Now situation has changed, and new requirement has raised: Company has planned to open new stores at very far distance. Such stores cannot be connected to current central database via LAN or WLAN. Each new branch would have around 20-30 clients, say “Branch Clients”
Also, there would be field executive who will be working from their laptop. Say “Remote Clients”. They will just have 3G internet connection on their laptop.
Thought 1: Install desktop application at all branch PCs, and connect them to central MySQL database server over the internet.
Not possible: Connection over the internet would be very slow for fetching such huge data. Data is really huge For, e.g. if client opens “Customer Master”, then there would be more than 600,000 rows which takes lot of bandwidth and time to open over the internet. And there are many more such modules which loads lot of data.
Also, in case of losing internet connection, clients would not able to operate the application. Customer waiting in line to make receipt would go crazy if they have to wait for long.
Thought 2: Install new MySQL server at branch store, all the desktop PCs then would be connected to that local branch server. And then that local branch server would be connected to central server via MySQL replication option.
Not possible: Since MySQL replication has limitation of only one way replication, we cannot implement this structure. Application requires to move data from central server to branch server and from Branch to Central in real-time. Also, MySQL replication engineering has limitation to replicate only with one server only. In that case, we cannot replicate with multiple branch stores. There is an option of cluster server, but company cannot afford licensing cost.
Thought 3: Somebody suggested me that I should transfer entire desktop application into Web Application and get cloud server for database.
Not possible: I think looking at current requirement (fast access), environment (retail store-pos) and hardware (printers, scanners) connected to client - it is not advisable to have web application and cloud database server. Also in the event of no internet, entire store would go down.
Thought 4: Somebody suggested me that I should move from MySQL server to MSSQL and keep desktop application as it is. MSSQL has capability to sync with multiple servers in real-time over the internet. It has no limitation like MySQL’s one way replication and only one replication connection.
I guess, to make faster and constant database connection, installing local branch server is highly required. But I don’t know how those different branch servers could be connected to central server.
My Questions:
• What is the best way to resolve above issues in given condition and successfully fulfill the company’s requirement? Faster and constant connection to database server. And also real-time updates between all branches and central server. If internet connection is down, then delay in real-time update is acceptable but clients should not be affected from work.
• Would migration from MySQL to MSSQL resolve the issue? Because data migration is not issue as there are many tools available which converts the database from one platform to other. But issue is - application is very huge having hundreds of query written for MySQL. I guess I have to change those all queries also, because queries are not same for MySQL and MSSQL. Do I have to change all the queries or just the few percentage queries? Or if there is any tool available which convert queries from MySQL to MSSQL query.
• In general, how such small-medium retail store company have their infrastructure and application setup? Let me know some ideas.

Microsoft Sync Framework mark client database as up to date

I've developed an application using the Microsoft Sync Framework 2.1 SDK and my current deployment method has been:
Make a backup of the unprovisioned database from a development machine and restore it on the server.
Provision the server followed by provisioning the client
Sync the databases
Take a backup of the synced database on the development machine and use that for the client installations. It is included in an InstallShield package as an SQL/Server backup that I restore on the client machine.
That works but on the client machine now I would also like to create a seperate test database using the same SQL/Server backup without doubling the size of the installation. That also works but of course because the client test version is no longer synced with the test version on the server it attempts to download all records which takes many hours over slower Internet connections.
Because integrity of the test database is not critical I'm wondering if there's a way to essentially mark it as 'up to date' on the client machine without too much network traffic?
After looking at the way the tracking tables work I'm not sure this is even possible without causing other clients to either upload or download everything. Maybe there is an option to upload only from a client that I've missed? That would suit this purpose fine.
Everytime you take a backup of a provisioned database and restore it to initialize another client or replica, make sure you run PerformPostRestoreFixup after you restore and before you sync it for the first time.
After further analysis of the data structures used by Sync Framework I determined there would be no acceptable way to achieve the result I was seeking without sending a significant of data between the client and server that would have approached what was required to do a 'proper' sync.
Instead I ended up including a seperate test database backup along with the deployment so that the usual PerformPostRestoreFixup could be performed followed by a sync in the normal manner the same as I was handling the live database.