How to copy a MySQL database excluding customer data?

I have a production database (MySQL, on AWS RDS) that holds customer information, including names, emails, and bank account details. Some of it is encrypted, some of it is not.
We want to set up an environment that can be used regularly for automated testing. We want the test environment's database to be the same as production, except with the customer data replaced.
We want to do this in a way where the customer data never leaves the production environment. We don't mind creating an "intermediate" environment that may initially contain some customer data but then gets it removed. From the intermediate environment, we'd transfer the cleaned database to the testing environment.
I'd appreciate any guidance, since I'm way out of my depth here.

There is no easy, automated solution to this that I am aware of. You need to replicate your data to a different system and have that replication process scrub your data for you. A few options come to mind:
You could write a batch processor that dumps the DB to disk, loads it into a secondary server (a staging/scrubbing environment), and runs a series of cleanup scripts (see the sketch after this list). Then you can dump the sanitized data back out.
You could write a database trigger that fires on the events you care about and maintains a staging table of sanitized data.
You could write a test data generator that uses the patterns of your production data to generate fake data in a testing table (see the second sketch below). There are many tools that can help with this, both open source and commercial.
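To make the first option concrete, here is a minimal sketch of what the cleanup step might look like. The table and column names (`customers`, `full_name`, `email`, `bank_account`) are hypothetical; adapt them to your actual schema, and run this only against the staging copy, never production.

```sql
-- Hypothetical scrub script: run on the STAGING copy after loading
-- the production dump, and before re-dumping for the test environment.
UPDATE customers
SET full_name    = CONCAT('Customer ', id),
    email        = CONCAT('customer', id, '@example.invalid'),
    bank_account = NULL;

-- Sanity check before the final dump: no real emails should remain.
SELECT COUNT(*) AS remaining
FROM customers
WHERE email NOT LIKE '%@example.invalid';
```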
Personally, I lean towards the last option because it's the safest and can be used in many places: a local dev machine, a CI/CD system, a shared staging environment, and so on. I do believe there is a strong case for sending a copy of a subset of production data to canary systems as part of a rollout strategy, though: effectively testing your release with live data before pointing it at your live database.
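For the generator route (the last option), even plain SQL can fabricate plausible rows. A sketch for MySQL 8.0+, again assuming a hypothetical `customers` table; dedicated tools will produce far more realistic data than this:

```sql
-- Generate 500 fake customers directly in the test database;
-- no production data is involved at any point. (500 stays well under
-- MySQL's default cte_max_recursion_depth of 1000.)
INSERT INTO customers (full_name, email)
WITH RECURSIVE seq (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 500
)
SELECT CONCAT('Customer ', n), CONCAT('customer', n, '@example.invalid')
FROM seq;
```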

Related

SQL database engine to carry with Git project

I'm developing a small experimental project using Node.js and MySQL. The database has only about 20 tables, with fewer than 100 records in each. I work on this project in two different environments and would like to make it easier to keep the databases updated, both schema and data. So I'd like to replace the MySQL engine with another that better fits these requirements:
SQL;
Compatible with Node.js;
Easy to install/access/carry;
Free.
I think SQLite could solve my problem, but I'm not sure how the single database file will behave when managed by Git. Another option would be an online database, but I don't know of any that is SQL and free.
Do you have any suggestions?
I guess your application must initialize a database the first time you start it up after cloning the repository. That is, it must create some tables and load rows into them.
You can handle this with SQLite by saving the database file as a binary git object. That should work OK, at least until the next version of SQLite comes out and breaks database file compatibility.
But a better way is to create a SQL file to do the database initialization, and store that in git. It contains the CREATE TABLE and INSERT operations necessary to set up your database. (If you're wise you will write that SQL code so it works on both SQLite and MySQL: you'll be able to switch database servers in the future.)
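A minimal sketch of such an init script, with hypothetical table names. The trick to staying portable is sticking to syntax both engines accept and avoiding engine-specific features such as AUTO_INCREMENT:

```sql
-- schema.sql: runs unchanged on both SQLite and MySQL.
CREATE TABLE IF NOT EXISTS users (
    id    INTEGER PRIMARY KEY,
    name  VARCHAR(100) NOT NULL,
    email VARCHAR(255) NOT NULL
);

-- Seed data; run once, after the presence check described below.
INSERT INTO users (id, name, email) VALUES
    (1, 'Alice', 'alice@example.com'),
    (2, 'Bob',   'bob@example.com');
```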
Then, when your application first runs, it opens the database and checks whether your tables are present. If they are not, your application creates them by running your SQL file.
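The presence check itself is a single query, though the catalog you ask differs per engine. A sketch, assuming the hypothetical `users` table above:

```sql
-- SQLite: returns one row if the table already exists.
SELECT name FROM sqlite_master
WHERE type = 'table' AND name = 'users';

-- MySQL equivalent:
SELECT table_name FROM information_schema.tables
WHERE table_schema = DATABASE() AND table_name = 'users';
```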
I guess you also want to share information inserted into the database when the application runs in more than one place. Obviously a shared database server is a good SQL way to do this.
Without a shared server, a good way to do this is to build some kind of "save" operation into your application: it writes out SQL INSERT statements for the shared information, which you can then send to another location or commit to Git.
Free public shared database servers? Your best bet is probably to run MySQL on a free-tier virtual machine from one of the server-rental ("cloud") services. Both AWS and Azure offer free tiers, and DigitalOcean charges US$5 per month for a small virtual machine.

Shared database options in scrum team

I have a broad question I would like some advice on.
I'm a developer on a three-person scrum team working on the same application (a C# Web Forms application). For years, the process has been that developers run a local IIS site on their machines and point the application at a shared database. Using the shared database has many benefits over local SQL instances for our business and minimizes each developer's setup and maintenance overhead.
This process has worked for years (mainly because everyone was in the office), but now half of us work from home or remotely. SQL performance through the VPN is atrocious, and it slows our local sites down by an order of magnitude when SQL calls go through the VPN. This slowness has really hurt my productivity, and I'm looking for options.
I've thought of the following solutions:
Run SQL Server locally
Set up replication
Try to point a SQL Server instance at a UNC path for the .mdf files (if this is possible)
Tweak VPN settings to speed up MS SQL calls
I'm doubtful there is a way to speed up the traffic going through the VPN, but I'd be open to any ideas.
RDP into my machine at work when I work from home (I truly hate doing this).
Below are the technical specs of our setup:
MS-SQL 2008 R2
C# Web Forms
N-Tier Architecture
Have any other development teams had this issue before? If so, what were some of the solutions?
Thanks
Regardless of the technology you are using, this situation happens to many developers who work remotely.
Your three options are viable, but consider this:
Running a local database server introduces the possibility of environmental errors. If you have 3 developers, you now have 3 different databases. Even if the structure of the DB itself never changes, the test data will certainly differ from one developer to the next. At some point you may have to reconcile the "master" version of the test data with each local instance.
Tweaking the VPN settings is certainly worth looking into, but I cannot help you with that; I have no idea how you would do it.
RDPing into your remote machine could be the best option. It guarantees you have the same setup whether you work from home or from the office. I have worked at two different large companies, and that's how we did it in both places. May I ask why you hate this? It's true that it sometimes doesn't work or is very slow.
You may end up doing all of these options. Tweaking your network speed is always good, no matter how you work. You could keep local instances for development, then synchronize your work with the repository and RDP to your office machine to test your new code against the master instance of the database. This way you don't have to work over RDP all day, and you end up testing your code twice, in two different environments. That's good, right?

Which database should I choose? MySQL or MongoDB?

I'm working on a project somewhat similar to WhatsApp, except that I'm not implementing the chat function.
The only thing I need to store in the database is user information, which won't be very large, and I also need an offline database in the mobile app that is kept in sync with the server database.
Currently I use MySQL for my server database, and I'm thinking of using JSON for the syncing between the mobile app and the server. Then I found that MongoDB has natural support for JSON, which made me wonder whether I should switch to MongoDB.
So here are my questions:
Should I switch to MongoDB, or should I stick with MySQL? The data for each user won't be too large, and I do have some requirements for data consistency. But MongoDB's JSON support is somewhat attractive.
I'm not familiar with the syncing procedure. I did some digging, and it appears that JSON is a good choice, but what data should I put into the JSON files?
I actually flagged this as too broad and as attracting primarily opinion-based answers, but I'll give it a go anyhow and try to stay objective.
First of all, you have two separate questions here:
What database system should I use?
How do I sync between app and server?
The first one is easily answered, because it doesn't really matter: both are good options for storing data. MySQL is mature and stable, and MongoDB, although newer, has very good reviews; I don't know of any known problems that would prevent it from being used. So pick the database you find easy to use.
Now for the second, I'll first put in a disclaimer: entire books have been written about data synchronization between multiple entities, and after all this time it is still the subject of PhDs.
I would advise against synchronizing directly between the mobile app and the database, because that requires the database credentials to be contained within the app. Mobile apps can and will be decompiled, and credentials extracted, which would compromise your entire database. So you'll probably want to create some API that first does device/user authentication and then applies the changes to the database.
This already means that using MongoDB for the sake of its JSON support would probably be a bad idea.
Now, JSON itself is just a format for representing data with some structure, just like XML. As such, it's not a method of synchronization but of transport.
For synchronizing data it's important that you know the source of truth.
If you have 1 device <-> 1 record, it's easy, because the device is the source of truth: the only mutations that take place are presumably done by the user on the device.
If you have n devices <-> 1 record, it becomes a whole lot more annoying. If you want to allow a device to change the state while offline, you'll need some tricks to synchronize the data when the device comes back online. But this is probably too complex and situation-dependent a question to answer on SO.
If, however, you force the device to always propagate changes to the database immediately, then the database will always contain the most up-to-date record, i.e. the truth. The downside is that part of the app will not be functional when offline.
If offline updates don't change the state but merely add new records, then you can push those to the server when the device comes online; a sketch follows below. But keep in mind you won't be able to order these events.
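A minimal sketch of that append-only case on the MySQL side (5.7+ for the JSON column; the `events` table and its columns are hypothetical): if the client generates the key, replaying the same batch after a dropped connection is harmless.

```sql
-- Hypothetical append-only log for records created offline.
CREATE TABLE events (
    event_id   CHAR(36) PRIMARY KEY,    -- client-generated UUID
    user_id    INT NOT NULL,
    payload    JSON NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP  -- arrival time, not event order
);

-- INSERT IGNORE makes the push idempotent: a re-sent event is a no-op.
INSERT IGNORE INTO events (event_id, user_id, payload)
VALUES ('6f1c2a3e-0000-4000-8000-000000000001', 42,
        '{"type": "profile_update", "name": "Alice"}');
```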

Is there an open source alternative to MS SQL Compact Edition for Remote Data Application?

We currently use SQL CE databases on the client machines, which then synchronise their data to a central server using the merge replication/RDA functionality of MS SQL Server. The amount of data involved is small, and the central server is often idle ~95% of the time; it's really only active when data is incoming, which is typically synchronised on a daily/weekly basis.
The SQL Server Standard licensing costs for this are large relative to the server's workload and the amount of data we're talking about (on the order of hundreds of MB at most). What I'd like to know is whether there's an open source alternative (MySQL or similar) that we could use as the backend data storage for our .NET application. My background is Windows server administration, so I'm relatively new to Linux, but I'm happy to give it a go and learn some new skills, as long as it won't be prohibitively difficult. If there are any other alternatives, that would be great too.
Well, this is quite an open-ended question, so I'm going to give you some guidelines on what you can start researching.
Client-side embedded databases.
MySQL can be embedded, but from my understanding MySQL as an embedded server might be overkill for a client. There is, however, a stack of alternatives. One such option would be the Berkeley DB system. There are other alternatives as well. Keep in mind you don't want a FULL SQL server on the client side; you are looking for something lightweight. You can read about Berkeley DB here: http://en.wikipedia.org/wiki/Berkeley_DB and about alternatives here: Single-file, persistent, sorted key-value store for Java (alternative to Berkeley DB). They mention SQLite, which might be just up your alley. So, in short, there is a whole stack of open source tools you can use here.
Back-end databases. MySQL will do the job very well, and so will PostgreSQL. PostgreSQL seemed to support more enterprise features the last time I looked, though that might have changed. These two are your main players in the SQL server market as far as open source is concerned. Either one will do fine in your scenario. Both PostgreSQL and MySQL run on Windows as well, so you don't have to install Linux, though I would suggest investing the time in Linux as I have; it is well worth the effort, and the peace of mind you get is good.
There is one major sticking point if you switch over to MySQL/PostgreSQL: the RDA/replication technology you currently use will not be supported by these databases, and you will need to look at how to implement this, probably from scratch. So while the back-end and even front-end DBs can be replaced, replicating the data will be a little more problematic, but NOT impossible.
Go play with these technologies, do some tests, and then decide how you will replace that replication.

I would like to create a database with the goal of populating this database with comprehensive inventory information obtained via a shell script

I would like to create a database and populate it with comprehensive inventory information obtained via a shell script from each client machine. My shell script currently writes this information to a single CSV file located on a server, over an SSH connection. Of course, if this script were run on multiple machines at once, it would likely cause issues, as each client would potentially try to write to the CSV at the same time.
In the beginning, the inventory was all I was after; however, after more thought I began to wonder whether much more might be possible once I had gathered this information. If it were contained in a database, I might be able to use it to initialize other processes based on the information about a specific machine or a group of "like" machines. It is important to note that I already manage a multitude of processes by identifying specific machine information; however, pulling that information from a database after matching a unique identifier could (in my mind) greatly improve efficiency. It would also allow for more of a server-side approach, cutting down on the majority of the client-side scripting: instead of gathering this information from the client machine at each client startup, I would already have it in a central database, allowing a server to use the information and kick off specific events.
I am completely new to SQL and am not certain it is 100% necessary. Is it? For now I have decided to download and install both PostgreSQL and MySQL on separate Macs for testing. I am also fairly new to Stack Overflow and apologize up front if this is an inappropriate question or style of question. Any help, including a redirection, would be greatly appreciated.
I do not expect a step-by-step answer by any means; I'm just hoping for a generic "proceed...", "this indeed can be done...", or "don't bother, there is a much easier solution."
As I come from the PostgreSQL world, I highly recommend it for its strong enterprise-level features and high standards compliance.
I always prefer to have a database for each project I'm doing, for the following benefits:
Normalized data is easier to process and build reports on;
Performance of database queries will be much better thanks to the caching done by the DB engine, indexes on your data, and optimized query paths;
You can greatly improve machine-data processing by using SQL/MED, which allows querying external data sources directly from the database. Have a look at the Multicorn project and the examples it provides.
Should you need to deliver any kind of reports to your management, the DB will be your friend, while doing this outside the DB would be overly complicated.
In short: go for the database!
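Since the answer recommends PostgreSQL, here is a sketch of what the inventory side might look like. Everything is hypothetical (table name, columns, the serial number as unique identifier), but it shows the key win over the shared CSV: concurrent clients writing at the same time are handled safely by the server.

```sql
-- Hypothetical inventory table: one row per client machine, keyed on
-- a unique hardware identifier so repeated runs simply update the row.
CREATE TABLE machines (
    serial_no   TEXT PRIMARY KEY,
    hostname    TEXT NOT NULL,
    os_version  TEXT,
    ram_mb      INTEGER,
    reported_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Each client's shell script could run this via psql instead of
-- appending to the CSV; concurrent INSERTs need no file locking.
INSERT INTO machines (serial_no, hostname, os_version, ram_mb)
VALUES ('C02XX0AAJGH5', 'mac-042', '14.4.1', 16384)
ON CONFLICT (serial_no) DO UPDATE
SET hostname    = EXCLUDED.hostname,
    os_version  = EXCLUDED.os_version,
    ram_mb      = EXCLUDED.ram_mb,
    reported_at = now();
```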