Granular 'up to the minute' data recoverability of MySQL database data

I operate a web-based online game with a MySQL backend. Every day, many writes are performed against hundreds of related tables holding user data.
Frequently a user's account will become compromised. I would like the ability to restore the user's data to a certain point in time prior to the attack without affecting any other user data.
I'm aware of binary logging in MySQL, but as far as I know that gives whole-database recovery up to a certain point in time. I would like a more granular solution, i.e. being able to specify which tables, which rows, etc.
What should I be looking into? What are the general best-practices?

If you create and use audit tables (populated through triggers), you can always get back to the data for one particular user in any table.
Be sure to write your general restore script before you need it, though. It's much easier to plug a user ID into a script you already have available than to sit there looking at the audit tables wondering how to do it again.
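As a rough sketch of that idea (the player_items table and its columns are invented for illustration), an audit table populated by a trigger could look like this:

    -- Hypothetical live table: player_items(user_id, item_id, quantity)
    CREATE TABLE player_items_audit (
      audit_id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
      action     ENUM('INSERT','UPDATE','DELETE') NOT NULL,
      user_id    INT UNSIGNED NOT NULL,
      item_id    INT UNSIGNED NOT NULL,
      quantity   INT NOT NULL,
      KEY idx_user_time (user_id, changed_at)
    ) ENGINE=InnoDB;

    DELIMITER //
    CREATE TRIGGER trg_player_items_audit_upd
    AFTER UPDATE ON player_items
    FOR EACH ROW
    BEGIN
      -- keep the pre-update values so one user's rows can be rolled back later
      INSERT INTO player_items_audit (action, user_id, item_id, quantity)
      VALUES ('UPDATE', OLD.user_id, OLD.item_id, OLD.quantity);
    END//
    DELIMITER ;

Your restore script then only has to replay the audit rows for a single user_id up to the chosen timestamp.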

MySQL (or any other RDBMS that I'm aware of) is not able to do that by itself. Therefore you should implement that yourself in your application layer.

Without external modules, this is not possible.
As thejh suggested in the comments, revisions would be a good solution. If you only need to work with user data, create a table that resembles the user table plus an additional timestamp (or similar), and run a cron job once a week/day/... that copies recently modified user data (tracked via additional flags/dates in the actual user table) into this table.
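A minimal sketch of that revisions idea, with invented table and column names:

    -- Hypothetical live table: users(user_id, name, gold, updated_at)
    CREATE TABLE users_history (
      revision_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      revision_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
      user_id     INT UNSIGNED NOT NULL,
      name        VARCHAR(64) NOT NULL,
      gold        INT NOT NULL,
      KEY idx_user_rev (user_id, revision_at)
    ) ENGINE=InnoDB;

    -- Run from cron once a day/week: copy recently modified rows
    INSERT INTO users_history (user_id, name, gold)
    SELECT user_id, name, gold
    FROM users
    WHERE updated_at >= NOW() - INTERVAL 1 DAY;

Restoring one user is then a matter of selecting their latest acceptable revision and writing it back to the live table.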

Related

Is it good to store MySQL event logs in MongoDB?

Earlier in our database design, we used to create mandatory fields for each table, and a few important fields were:
created_by
created_time
created_by_ip
updated_by
updated_time
updated_by_ip
Now it's an era of schema-less design. We prefer MongoDB or some other write-oriented database.
My question here is:
Is it a good practice to maintain logs in a separate database?
Do we need to create a separate log table for each MySQL table (considering MongoDB), or is it okay to have a single MongoDB audit table for all MySQL tables?
What needs to be considered when querying the results from MongoDB?
What should the structure of the MongoDB audit table be?
Any other alternatives to store logs?
Consider a situation where we want to delete a registered user who has not authenticated within a specified time (max of 48 hrs). If all the time logs are handled in MongoDB, how can we query that from MySQL?
You usually want this (audit?) data next to the real data, and definitely not in a different DB engine, as the number of partial errors to support becomes quite a nightmare (e.g. someone registered but you failed to insert the audit data - is this OK? Should the account become orphaned? What happens if the app goes down half way?).
Systems that have this separation usually use messaging and 2 different listeners are responsible for storing the data and storing the audit (e.g. one in a relational DB and the other in an event store). In this way you have a higher chance of achieving eventual consistency.
Edit
There are a few options around using messaging, and the assumption here is that both sources of data must be in sync (or as close as possible). Please bear in mind that I still think that storing data+audit together is by far the simplest and most sensible approach.
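To make the "data and audit together" option concrete, here is a rough sketch (table and column names invented) of writing both rows in a single MySQL transaction, so there is no partial-failure window:

    START TRANSACTION;

    INSERT INTO users (email, created_time, created_by_ip)
    VALUES ('alice@example.com', NOW(), '203.0.113.7');

    INSERT INTO user_audit (user_id, action, created_time, created_by_ip)
    VALUES (LAST_INSERT_ID(), 'REGISTER', NOW(), '203.0.113.7');

    COMMIT;  -- either both rows exist or neither does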
Using messaging, your app can emit a message on certain events (e.g. user created). Then 2 different listeners react to this message. One listener stores the data in one DB engine; Another listener stores the audit data. The problem with this approach is that you might need to ensure ordering on the messages, which makes it really slow.
Another (scary) approach is to use distributed (XA) transactions between MySQL and a messaging system (as mongo doesn't support transactions). Then the data to MySQL and the message would be committed together, and a listener can receive the audit data and store it in mongo.
I need to emphasize that the 2 approaches above are horrible and should never be implemented.
There are more sensible approaches, but they might require a different tech stack. For example, using Event Sourcing + CQRS you can store the events (with the audit data) and store the final read models without the audit data.

Best database model for saas application (1 db per account VS 1 db for everyone)

Quick question: I'm developing SaaS software (an ERP).
I designed it with 1 database per account for these reasons:
I do a lot of personalisation and need to add specific table columns for each account.
Easier to manage DB backups (and reload data!)
Less risky: sometimes I need to run SQL queries on a table; in case of an error with a bad query (update/delete...), only one customer is affected instead of all of them.
Bad point: I'm ending up with hundreds of databases...
I'm hiring a company to manage my servers, and they said that it's better to have only one database with a few tables and to put all the data in the same tables with a column such as id_account. I'm very, very surprised by this, so I'm wondering... what are your ideas?
Thanks!
Frederic
In the environment I currently work in, we handle millions of records from numerous clients. Our solution is to use schemas to segregate each individual client. A schema allows you to partition your clients into separate virtual databases while staying inside a single DB. Each schema has an exact copy of the tables from your application.
The upside:
Segregated client data
data from a single client can be easily backed up, exported or deleted
Programming is still the same, but you have to select the schema before db calls
Moving clients to another db or standalone server is a lot easier
adding specific tables per client is easier (see below)
single instance of the database running
tuning the db affects all tenants
The downside:
Unless you manage your shared schema properly, you may duplicate data
Migrations are repeated for every schema
You have to remember to select the schema before db calls
I'm hard pressed to add many negatives... I guess I may be biased.
Adding Specific Tables: Why would you add client-specific tables if this is SaaS and not custom software? Better to use a Postgres DB with an hstore column and store as much searchable data as you like.
Schemas are ideal for multi-tenant databases.
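In MySQL a schema is just a synonym for a database, so the idea looks roughly like this (tenant and table names invented):

    -- One schema per client, each holding the same set of tables
    CREATE SCHEMA tenant_acme;
    CREATE TABLE tenant_acme.orders (
      order_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      total    DECIMAL(10,2) NOT NULL
    ) ENGINE=InnoDB;

    -- "Select the schema before db calls": switch the default schema...
    USE tenant_acme;
    SELECT COUNT(*) FROM orders;

    -- ...or qualify the table name explicitly
    SELECT COUNT(*) FROM tenant_acme.orders;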
A lot of what I am telling you depends on your software stack, the capabilities of your developers, and the backend DB you selected (all of which you neglected to mention).
Your hardware guys should not decide your software architecture. If they do, you are likely shooting yourself in the foot before you even get out of the gate. Get a good senior software architect; the grief they will save you will likely save your business.
I hope this helps...
Bonne Chance

Best MySQL table structure for INSERT only?

I have a website on a shared host, where I expect a lot of visitors. I don't need a database for reading (everything presented on the pages is hardcoded in PHP) but I would like to store data that my users enter, so for writing only. In fact, I only store this to do a statistical analysis on it afterwards (on my local computer, after downloading it).
So my two questions:
Is MySQL a viable option for this? The site is meant to run on shared hosting with PHP/MySQL available, so I cannot really use other fancy packages, but if e.g. writing to a file would be better for this purpose, that's possible too. As far as I understand, appending a line to a huge file is a relatively expensive operation. On the other hand, 100+ users simultaneously connecting to a MySQL database is probably also a heavy load, even if each just performs 1 cheap INSERT query.
If MySQL is a good option, how should the table best be configured? Currently I have one InnoDB table with a primary key id that auto-increments (next to, of course, the columns storing the data). This is a general-purpose configuration, so maybe there are more optimized ways given that I only need to write to the table, and not read from it?
Edit: I mainly fear that the website will go viral as soon as it's released, so I expect the users to visit in a very short timeframe. And of course I would not like to lose the data that they enter due to an overloaded database.
MySQL is a perfectly reasonable choice for this. Probably much better than a flat file, since you say you want to aggregate and analyze this data later. Doing so with a flat file might take a long time, especially if the file is large. Additionally, RDBMSes are built for aggregation and dataset manipulation, and are ideal for creating report data.
Put whatever data columns you want in your table, and some kind of identifier to track a user, in addition to your existing row key. IP address is a logical choice for user tracking, or a magic cookie value could potentially work. It's only a single table, you don't need to think too hard about it. You may want to add nonclustered indexes on columns you'll frequently filter on for reports, e.g. IP address, access date, etc.
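A plain InnoDB table along those lines could be as simple as the sketch below (column names are only an example); the reporting indexes can even be added later, since every secondary index slows inserts a little:

    CREATE TABLE submissions (
      id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      ip_address VARBINARY(16) NOT NULL,  -- holds IPv4 or IPv6 via INET6_ATON()
      created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
      answer     VARCHAR(255) NOT NULL
    ) ENGINE=InnoDB;

    -- Just before the offline analysis, add whatever indexes the reports need
    ALTER TABLE submissions
      ADD INDEX idx_ip (ip_address),
      ADD INDEX idx_created (created_at);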
I mainly fear that the website will go viral as soon as it's released, so I expect the users to visit in a very short timeframe. And of course I would not like to lose the data that they enter due to an overloaded database.
RDBMS such as MySQL are explicitly designed to handle heavy loads, assuming appropriate hardware backing. Don't sweat it.

Multi-Tenant Database design - Database for each user

I am working on a web application that will require users to have their own set of private data. My original plan was to create a stores table, a users table, and a user_stores intersecting table. Then I would, in the stores table, save the database name for that store (and create each store-specific database with an application user and password so the web application could always log in).
Each store would have similar data (users, products, shipping methods, etc), and I know I can use foreign key references to tie everything together in one giant database. However, being that the data is very specific and potentially proprietary, would it be better to use my original design, or make a single database with everyone's data in there?
I am thinking that, for scaling reasons, separate databases would be better because we could put the more active accounts on their own (or more powerful) database servers, and simply add a server location field in the stores table if we needed to. Additionally, it may be more secure because we could add per-store login information to each database and only give each store access to its own data (preventing one user from editing another user's stuff). My question is: are there concerns that I am missing? Just about every post I have read about this says not to use the method I am thinking of, and I am no DBA. Any input would be helpful.
Additional Information:
This will be hosted on a Dedicated Server that I will have root access to. I can create as many MySQL databases as I need to.
I would use a single database for sure. Use the following to get started. There are several reasons to go with a single DB, but the biggest reason of all is to save yourself from a maintenance nightmare: if you have to change the schema, you will have a mess on your hands.
http://msdn.microsoft.com/en-us/library/aa479086.aspx
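Very roughly, the shared-database layout that article describes keys every table by the account, something like this (names are illustrative):

    CREATE TABLE stores (
      store_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      name     VARCHAR(100) NOT NULL
    ) ENGINE=InnoDB;

    CREATE TABLE products (
      product_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      store_id   INT UNSIGNED NOT NULL,
      name       VARCHAR(100) NOT NULL,
      KEY idx_store (store_id),
      CONSTRAINT fk_products_store
        FOREIGN KEY (store_id) REFERENCES stores (store_id)
    ) ENGINE=InnoDB;

    -- Every query is scoped by the tenant
    SELECT name FROM products WHERE store_id = 42;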
In a multi-tenant database, database designers think about querying, cost, data isolation and protection, maintenance, and disaster recovery.
Multi-tenant solutions range from one database per tenant ("shared nothing") to one row per tenant ("shared everything"). This SO answer summarizes the tradeoffs. If you're designing a database that falls under some kind of regulatory environment (HIPAA, FERPA, etc.), that regulatory environment might trump all other considerations.
One database per tenant is a defensible decision in some cases. It's not clear whether that's the best answer in your case, though.

Generate general schema of a huge unknown database

I am required to make a general schema of a huge database that I have never used.
The problem is that I do not know how or where to start, because, leaving the size aside, I have no idea what each table is for. I can guess at some of them, but for the majority the generic field names tell me nothing.
Do you have any advice? What could I do?
There is no documentation about the database and the creators are not able to help me because they are in another company now.
Thank you very much in advance.
This isn't going to be easy.
Start by gathering any documentation, notes, etc. that exist. Also, it'll greatly help to have a thorough understanding of the type of data being stored, and of the application. Keep ample notes of your discoveries, and build the documentation that should have been built before.
If your database contains declared foreign keys, you can start there, and at least get down the relationships between the tables, keeping in mind that this may be incomplete. As @John Watson points out, if the relationships are declared, there are tools to do this for you.
Check for stored functions and procedures, including triggers. Though these are somewhat uncommon in MySQL databases. Triggers especially will often yield clues ("every update to table X inserts a new row to table Y" -> "table Y is probably a log or audit table").
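Declared foreign keys and triggers can both be listed from information_schema; for example (substitute your schema name for 'mydb'):

    -- Declared foreign keys: which column points at which table
    SELECT TABLE_NAME, COLUMN_NAME, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
    FROM information_schema.KEY_COLUMN_USAGE
    WHERE TABLE_SCHEMA = 'mydb'
      AND REFERENCED_TABLE_NAME IS NOT NULL;

    -- Triggers and their bodies
    SELECT TRIGGER_NAME, EVENT_OBJECT_TABLE, ACTION_TIMING,
           EVENT_MANIPULATION, ACTION_STATEMENT
    FROM information_schema.TRIGGERS
    WHERE TRIGGER_SCHEMA = 'mydb';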
Some of the tables are hopefully obvious, and if you know what is related to them, you may be able to start figuring out those related tables.
Hopefully you have access to application code, which you can grep and read to find clues. Access to a test environment which you can destroy repeatedly will be useful too ("what happens if I change this in the app, where does the database change?"; "what happens if I scramble these values?"; etc.). You can dump tables and use diff on them, provided you dump them ordered by primary or unique key.
Doing queries like SELECT DISTINCT foo FROM table can help you see what different things can be in a column.
If it's possible to start from a mostly-empty database (e.g., the minimum needed to get the app to run), you can observe what changes as you add data through the app. It's much quicker to dump the database when it's small, and the same goes for diffing it and reading through the output. Some things are easier to understand in a tiny database, but some things are more difficult. When you have a huge dataset and a column is always 3, you can be much more confident it always is.
You can watch SQL traffic from the application(s) to get an idea of what tables and columns they access for each function, and how they join them. Watching SQL traffic can be done in application-specific ways (e.g., DBI trace) or server-specific ways (turn on the general query log) or with a packet tracer like Wireshark or tcpdump. Which is appropriate is going to depend on the environment you're working in. E.g., if you have to do this on a production system, you probably want Wireshark. If you are doing this in dev/test, the disadvantage of the MySQL query log is that all the apps may very well be mixed together, and if multiple people are hitting the apps it'll get confusing. The app-specific log probably won't suffer from this, but of course the app may not have that.
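If you go the general query log route, it can be switched on at runtime and even written to a table so you can query it (this is heavyweight, so dev/test only):

    -- Send the general log to the mysql.general_log table and enable it
    SET GLOBAL log_output = 'TABLE';
    SET GLOBAL general_log = 'ON';

    -- ...exercise one feature of the application, then see what it ran
    SELECT event_time, argument
    FROM mysql.general_log
    ORDER BY event_time DESC
    LIMIT 50;

    SET GLOBAL general_log = 'OFF';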
Keep in mind the various ways data can be stored. For example, all of the following could mean May 1, 1980 (a quick way to test such guesses is sketched after the list):
1980-05-01 — As a DATE, TIMESTAMP, or text.
2444360.5 — Julian day (with time, specifies midnight)
44360 — Modified Julian day
326001600 — UNIX timestamp (with time, specifies midnight) assuming local time is US Eastern Time (seconds since Jan 1 1970 UTC)
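A quick way to test a guess like the UNIX-timestamp one is to convert a sample of the column and see whether plausible dates come out (table and column names invented):

    -- If signup_ts really is a UNIX timestamp, these should look like sane dates
    SELECT signup_ts, FROM_UNIXTIME(signup_ts) AS as_datetime
    FROM accounts
    LIMIT 20;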
There may be things in the database which are denormalized, and some of them may be denormalized incorrectly. E.g., you may be wondering "why does this user have a first name Bob in one table, and a first name Joe in another?" and the answer is "data corruption".
There may be columns that aren't used. There may be entire tables that aren't used. Despite this, they may still have data from older versions of the app (or other, no-longer-in-use apps), queries run from the MySQL console, etc.
There may be things which aren't visible in the application anywhere, but are used. Their purpose may be completely non-obvious without knowledge of the algorithms implemented in the app(s). For example, a search function in an app may store all kinds of precomputed information about the documents to search and their connections. Worse, these tables may only be updated by batch jobs, so changing a document won't touch them (making you mistakenly believe they have nothing to do with documents). Then, you come in the next morning, and the table is mysteriously very different. Though, in the search case, a query log when running search will tell you.
Try using the free MySQL Workbench (it's specific to MySQL).
I have reverse engineered databases this way and also ended up with great Entity Relationship Diagrams!
I've worked with SQL for 20 years and this product really is great (it's free, from the mysql folks themselves).
It can have occasional problems, crashes, etc. (at least it did on Ubuntu 10), but they've been relatively rare and far outweighed by the benefits! It's also actively developed, so bugs are actually fixed on an ongoing basis.
Assuming that nobody bothered to declare foreign keys in the table definitions, and the database belongs to an application which is in use, after grabbing the current schema the next step for me would be to enable logging of all queries (hoping that the app does NOT go through a trivial ORM like [x]hibernate) to identify joins and data semantics.
This Perl script may be helpful.