Database schema/design for storing metrics - mysql

For clarification, I don't want to store metrics about the database itself; rather, I want to build a database to store metrics from the various controls we measure at my organization, for easy reporting. A little background: as a manager, I pull metrics from various applications, including our two ticketing systems (yeah, I know), our phone system, alerts from our event management software (e.g., Nagios), etc. I report on these weekly and keep an Excel spreadsheet with the historical data. The spreadsheet is really, really big and inflexible.
I'm not new to writing web apps, but I'm still new to database design. I want to build an app with some awesome Rickshaw JavaScript graphs of the historical data (and to wow the senior management team with crazy colors and interactivity).
Where do I start with the database? I could create one table for all metrics, but then how would I separate them into the various types? (For instance, phone metrics include abandon rate, total inbound calls, total outbound calls, total time on call, average talk time, average hold time, max hold time, etc.) That would be one messy, unorganized table.
I could create one table for each type (phone, ticket, event, etc.), but that seems like it would make it hard to add metrics later.
I'm hoping someone here has some experience and can give me some pointers on which direction to head.
PS: It will need to be SQLite or MySQL, just due to the resources I have available at this time.

A MySQL design for such a system can take the following into account:
Create a table for each type of metrics group; for example, an entity of the ticketing system can be a single ticket.
If a ticket is connected to a single user, you may include the user name in the ticket table. Otherwise, to keep it flexible, I would create a table for each connected element. For example, if a ticket is assigned to staff and has multiple telephone calls associated with it, you would need a calls table and a staff table.
To map multiple items, create mapping tables, for example stafftickets and ticketcalls, to associate staff with multiple tickets and tickets with multiple calls.
Once you have defined these entities, you can sit down with phpMyAdmin and create tables that will work.
For the charting side of things, use D3.js: have the server output JSON, and use JavaScript (or json2) to bind it to your graphs.
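A minimal sketch of the table-per-entity layout with the two mapping tables, using Python's built-in sqlite3 module so it runs without a MySQL server (the asker mentions SQLite as an option anyway). Only the stafftickets and ticketcalls names come from the answer; the columns, sample rows, and the per-staff report are hypothetical. The last step serializes the query result as JSON, the shape a D3.js or Rickshaw chart would be fed.

```python
import json
import sqlite3

# In-memory SQLite keeps the sketch self-contained; the DDL should port to MySQL
# with small type changes (e.g. INT AUTO_INCREMENT, VARCHAR instead of TEXT).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tickets (id INTEGER PRIMARY KEY, opened_on TEXT NOT NULL, status TEXT NOT NULL);
CREATE TABLE calls   (id INTEGER PRIMARY KEY, called_on TEXT NOT NULL, talk_seconds INTEGER NOT NULL);

-- Mapping tables: staff <-> tickets and tickets <-> calls are many-to-many.
CREATE TABLE stafftickets (
    staff_id  INTEGER NOT NULL REFERENCES staff(id),
    ticket_id INTEGER NOT NULL REFERENCES tickets(id),
    PRIMARY KEY (staff_id, ticket_id)
);
CREATE TABLE ticketcalls (
    ticket_id INTEGER NOT NULL REFERENCES tickets(id),
    call_id   INTEGER NOT NULL REFERENCES calls(id),
    PRIMARY KEY (ticket_id, call_id)
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO staff VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO tickets VALUES (?, ?, ?)",
                 [(10, "2013-01-07", "closed"), (11, "2013-01-07", "open"), (12, "2013-01-14", "closed")])
conn.executemany("INSERT INTO calls VALUES (?, ?, ?)",
                 [(100, "2013-01-07", 240), (101, "2013-01-08", 60), (102, "2013-01-14", 300)])
conn.executemany("INSERT INTO stafftickets VALUES (?, ?)", [(1, 10), (1, 11), (2, 12)])
conn.executemany("INSERT INTO ticketcalls VALUES (?, ?)", [(10, 100), (10, 101), (12, 102)])

# Tickets handled and total talk time per staff member, joined through the mapping tables.
rows = conn.execute("""
SELECT s.name,
       COUNT(DISTINCT st.ticket_id)     AS tickets,
       COALESCE(SUM(c.talk_seconds), 0) AS talk_seconds
FROM staff s
JOIN stafftickets st     ON st.staff_id = s.id
LEFT JOIN ticketcalls tc ON tc.ticket_id = st.ticket_id
LEFT JOIN calls c        ON c.id = tc.call_id
GROUP BY s.id, s.name
ORDER BY s.name
""").fetchall()

# JSON like this is what a D3.js (or Rickshaw) chart would be fed from the server.
payload = json.dumps([{"staff": n, "tickets": t, "talk_seconds": sec} for n, t, sec in rows])
print(payload)
```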

Related

Design database schema to support multi-tenancy in MySQL

I'm working on school management software in ASP that connects to a MySQL database. The software works great when I deploy it on a local machine for each user (school), but I want to migrate it to the Azure cloud. Users will have an account to connect to the same app, but one school's data must not mix with other schools' data. My problem is finding the best way to deploy and manage the database.
Should I deploy one DB for each school?
Or put all schools' data in the same DB?
I'm not sure either of my solutions is the best way.
I don't want, for example, a single STUDENT table containing the students of school X, school Y, ...
Please help me find the best solution.
There are multiple possible ways to design a schema that supports multi-tenancy. Which design is simplest depends on the use case.
Separate the data of every tenant (school) physically, i.e., one schema must contain data related to only a specific tenant.
Pros:
Easy for A/B Testing. You can release updates which require database changes to some tenants and over time make it available for others.
Easy to move a database from one data center to another, and to support different backup SLAs for different customers.
Per-tenant, database-level customization is easy: adding a new table for a customer, or modifying/adding a field, is straightforward.
Third party integrations are relatively easy, e.g., connecting your data with Google Data Studio.
Scaling is relatively easy.
Retrieving data from one tenant is easy without worrying about the mixing up foreign key values.
Cons:
When you have to modify any field/table, then your application code needs to handle cases where the alterations are not completed in some databases.
Retrieving analytics across customers becomes difficult. Designing Queries for usage analysis becomes harder.
When integrating with other database systems, especially NoSQL, you will need more resources. For example, indexing data in Elasticsearch for every tenant will require an index per tenant, and if there are thousands of customers, that will result in thousands of shards.
Common data shared across tenants needs to be copied into every database.
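To make the one-database-per-tenant option concrete, here is a small sketch in which SQLite files stand in for per-school MySQL databases. The file naming (school_<id>.db), the MIGRATIONS list, and the use of SQLite's PRAGMA user_version as a per-tenant schema version are illustrative assumptions, not something from the answer; the demo reproduces the first con above, where one tenant has received a schema change and another has not.

```python
import sqlite3
import tempfile
from pathlib import Path

# Ordered list of schema changes; index i upgrades a tenant DB from version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE student ADD COLUMN grade INTEGER",
]

def tenant_connection(base_dir: Path, school_id: str) -> sqlite3.Connection:
    """Open (creating if needed) the database file that holds one school's data only."""
    return sqlite3.connect(base_dir / f"school_{school_id}.db")

def migrate(conn: sqlite3.Connection, target: int) -> int:
    """Apply pending migrations up to `target` and return the tenant's schema version."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    for v in range(version, target):
        conn.execute(MIGRATIONS[v])
        # Record progress after each step so a half-migrated tenant can resume later.
        conn.execute(f"PRAGMA user_version = {v + 1}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]

base = Path(tempfile.mkdtemp())
school_a = tenant_connection(base, "a")
school_b = tenant_connection(base, "b")
version_a = migrate(school_a, 2)   # has the new column
version_b = migrate(school_b, 1)   # not yet: app code must cope with both versions

school_a.execute("INSERT INTO student (name, grade) VALUES ('Ana', 5)")
school_b.execute("INSERT INTO student (name) VALUES ('Ben')")
```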
Separate the data of every tenant (school) logically, i.e., one schema contains data for all the tenants.
Pros:
Software releases are simple.
Easy to query usage analytics across multiple tenants.
Cons:
Scaling is relatively tricky. May need database sharding.
Maintaining the logical isolation of data for every tenant in all the tables requires more attention and may cause data corruption if not handled at the application level carefully.
Designing database systems for the application that support multiple regions is complicated.
Retrieving data from a single tenant is difficult. (Remember: all the records will be associated with some other records using foreign keys.)
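For the shared-schema option, one common way to get the logical isolation described above is to put a tenant key on every tenant-owned table and make foreign keys include it. The sketch below assumes a school_id column and a small TenantScope wrapper (both hypothetical) so application code cannot forget the tenant filter; the composite foreign key makes the database itself reject an enrollment that points at another school's student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE school (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Every tenant-owned table carries school_id and includes it in its key.
CREATE TABLE student (
    school_id INTEGER NOT NULL REFERENCES school(id),
    id        INTEGER NOT NULL,
    name      TEXT    NOT NULL,
    PRIMARY KEY (school_id, id)
);
CREATE TABLE enrollment (
    school_id  INTEGER NOT NULL,
    student_id INTEGER NOT NULL,
    course     TEXT    NOT NULL,
    -- Composite FK: an enrollment can only reference a student of the SAME school.
    FOREIGN KEY (school_id, student_id) REFERENCES student (school_id, id)
);
""")
conn.executemany("INSERT INTO school VALUES (?, ?)", [(1, "School X"), (2, "School Y")])

class TenantScope:
    """All reads and writes go through an object bound to one school_id."""

    def __init__(self, conn, school_id):
        self.conn, self.school_id = conn, school_id

    def add_student(self, student_id, name):
        self.conn.execute("INSERT INTO student (school_id, id, name) VALUES (?, ?, ?)",
                          (self.school_id, student_id, name))

    def enroll(self, student_id, course):
        self.conn.execute("INSERT INTO enrollment VALUES (?, ?, ?)",
                          (self.school_id, student_id, course))

    def student_names(self):
        cur = self.conn.execute("SELECT name FROM student WHERE school_id = ? ORDER BY id",
                                (self.school_id,))
        return [name for (name,) in cur]

x, y = TenantScope(conn, 1), TenantScope(conn, 2)
x.add_student(1, "Ana")
y.add_student(1, "Ben")      # same local id, different tenant: no clash
x.enroll(1, "Math")
try:
    y.enroll(2, "Art")       # student 2 does not exist in school 2
    cross_tenant_blocked = False
except sqlite3.IntegrityError:
    cross_tenant_blocked = True
```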
This is not a comprehensive list; it is based on my experience working with both types of design. Both designs are common and are used by many organizations, depending on the use case.

Storing Visualizations and Analysis in Database

I am currently working on a web application that will allow users to analyze and visualize data. For example, one use case is that the user performs a Principal Component Analysis and stores it. There can be other such analyses, like a volcano plot, heatmap, etc.
I would like to store these analyses and visualizations in a back-end database. The challenge I am facing is how to design a relational database schema that does this efficiently. Here are some of my concerns:
The data associated with the project will already be stored in a normalized manner so that it can be recalled. I would not like to store it again with the visualization.
At the same time, the user should be able to see the original data behind a visualization. For example, what data was fed to the PCA algorithm? The user might not use all the data associated with the project for the PCA; they could be running it on just a subset of the project's data.
The number of visualizations in the web app will grow over time. If I need to design an involved schema every time a new visualization is added, it could slow down overall development.
With these in mind, I am wondering if I should try to solve this with a relational database like MySQL at all. Or should I look at MongoDB? More generally, how do I think about this problem? I tried looking for some blogs/tutorials online but couldn't find much that was useful.
The first thing to do, before thinking about technical design (including the choice between a relational and a NoSQL platform), is to build a data model that clearly describes the structure of and relations between your data in a platform-independent way. I see the following interesting points to solve there:
How is a visualisation related to the data objects it visualizes? When the visualisation just displays the data of one object type (let's say the number of sales per month), this is trivial. But if it covers more than one object type (the number of sales per month, product category, and country), you will have to decide to which of them to link it. There is no single correct solution for this, but it depends on the requirements from the users' view: From which origins will they come to find this visualisation? If they always come from the same origin (let's say the country), it will be enough to link the visuals to that object type.
How will you handle inserts, deletes, and updates of the basic data after the point in time the visualisation was generated? If no such operations relevant to the visuals are possible, it's easy: just store the selection criteria (country = "Austria", product category = "Toys") with the visual, and everyone will know its meaning. If, however, the basic data can change, you should implement a data model that historizes that data, i.e. one that can reconstruct the data values on which the original visual was based. Of course, before deciding on this, you need to clarify the requirements: if the basic data changes, will the original visual still be of interest, or will it need to be regenerated to reflect the changes?
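A sketch of the "store the selection criteria with the visual" idea, assuming the basic data does not change afterwards. The sale and visual tables, the JSON selection and params columns, and the column whitelist are hypothetical; keeping the visual type and its settings as data in one generic table is one possible answer to the asker's worry about designing a new schema for every new visualization type.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale (id INTEGER PRIMARY KEY, month TEXT, country TEXT, category TEXT, amount REAL);
-- One generic table for all kinds of visual: the type and its settings are data,
-- so a new visualization type needs no new table.
CREATE TABLE visual (
    id          INTEGER PRIMARY KEY,
    visual_type TEXT NOT NULL,             -- e.g. 'pca', 'heatmap', 'volcano'
    selection   TEXT NOT NULL,             -- JSON: which basic data was fed in
    params      TEXT NOT NULL DEFAULT '{}' -- JSON: algorithm settings
);
""")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?, ?)", [
    (1, "2020-01", "Austria", "Toys", 10.0),
    (2, "2020-01", "Austria", "Books", 5.0),
    (3, "2020-02", "Austria", "Toys", 12.0),
    (4, "2020-01", "Germany", "Toys", 8.0),
])

# Only whitelisted column names are ever interpolated into SQL; values are bound.
ALLOWED_FILTERS = {"month", "country", "category"}

def select_basic_data(conn, selection):
    """Re-run a visual's stored selection criteria against the basic data."""
    cols = sorted(selection)
    unknown = set(cols) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filter columns: {unknown}")
    where = " AND ".join(f"{c} = ?" for c in cols) or "1 = 1"
    sql = f"SELECT id, month, amount FROM sale WHERE {where} ORDER BY id"
    return conn.execute(sql, [selection[c] for c in cols]).fetchall()

conn.execute("INSERT INTO visual (visual_type, selection) VALUES (?, ?)",
             ("bar_chart", json.dumps({"country": "Austria", "category": "Toys"})))
stored = json.loads(conn.execute("SELECT selection FROM visual WHERE id = 1").fetchone()[0])
inputs = select_basic_data(conn, stored)
print(inputs)
```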
Neither question is made simpler or more complicated by using a NoSQL database.
No matter what the outcome of those requirements and data modeling efforts are, I would stick to the following principles:
Separate the visuals from the basic data, even if a visual is closely related to just one set of basic data. Reason: the visuals are just a consequence of the basic data and can be recalculated if they get lost, so the requirements for, e.g., data backup will be stricter for the basic data than for the visuals.
Don't store basic data redundantly to show the basis for each single visual. A timestamp logic with each record of basic data, together with the timestamp of the generated visual will serve the same purpose with less effort and storage volume.
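The timestamp logic can be sketched as a history table: instead of overwriting a row, the current version is closed (valid_to) and a new version is added, and the visual stores only its generated_at time. The table and column names below are hypothetical, and SQLite is used so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Every version of a basic-data row is kept; valid_to IS NULL marks the current one.
CREATE TABLE measurement (
    sample_id  INTEGER NOT NULL,
    value      REAL    NOT NULL,
    valid_from TEXT    NOT NULL,
    valid_to   TEXT,
    PRIMARY KEY (sample_id, valid_from)
);
CREATE TABLE visual (id INTEGER PRIMARY KEY, generated_at TEXT NOT NULL);
""")

# Timestamps are ISO-8601 strings in one fixed format, so string comparison = time order.
def set_value(conn, sample_id, value, at):
    """Close the current version (if any) and add a new one instead of overwriting."""
    conn.execute("UPDATE measurement SET valid_to = ? WHERE sample_id = ? AND valid_to IS NULL",
                 (at, sample_id))
    conn.execute("INSERT INTO measurement VALUES (?, ?, ?, NULL)", (sample_id, value, at))

def data_as_of(conn, ts):
    """Reconstruct the basic data exactly as it was at timestamp ts."""
    return conn.execute("""
        SELECT sample_id, value FROM measurement
        WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)
        ORDER BY sample_id""", (ts, ts)).fetchall()

set_value(conn, 1, 0.5, "2020-01-01T00:00:00")
set_value(conn, 2, 1.5, "2020-01-01T00:00:00")
conn.execute("INSERT INTO visual VALUES (1, '2020-02-01T00:00:00')")
set_value(conn, 1, 0.9, "2020-03-01T00:00:00")   # basic data changes after the visual

generated_at = conn.execute("SELECT generated_at FROM visual WHERE id = 1").fetchone()[0]
print(data_as_of(conn, generated_at))            # inputs of the visual
print(data_as_of(conn, "2020-03-01T00:00:00"))   # current data
```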

MySQL for a payment system?

I'm trying to create a payment system for my website, a marketplace for 3D printing blueprints. Users buy credits on my website. When a user purchases a 3D printing blueprint uploaded by another user, a new row is created in the 'purchased' table and credit is deducted in the user credit table. Here's the important part: my gut tells me to use the event scheduler to mark rows of 'purchased' as paid every month and wire the sum of money earned by each seller. My worry is that the table will grow indefinitely as the months go by.
Is this the right implementation?
Or can I somehow create a new table each month that holds transactions for only this month?
Is there a NoSQL equivalent to this?
Stripe.com or Braintree.com might be good options for you.
It is not advisable to create or roll your own payments implementation. These established services not only handle the PCI compliance aspect of payments, but they also have direct support for the use case you're asking about.
To answer your question further: inserts into this MySQL table, or iterating across it for batch processing, probably won't be an issue. Querying, on the other hand, will become more onerous as the data set gets very large.
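For the batch-processing side, here is a hedged sketch of the monthly settlement the asker describes (mark purchases as paid, total the credits per seller), using SQLite in place of MySQL. The payout table, the paid_at column, and the index are assumptions. This only covers internal credit bookkeeping; actually wiring money to sellers is exactly what the answer suggests leaving to a payment provider.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchased (
    id           INTEGER PRIMARY KEY,
    seller_id    INTEGER NOT NULL,
    credits      INTEGER NOT NULL,
    purchased_at TEXT    NOT NULL,   -- ISO-8601 date
    paid_at      TEXT                -- NULL until the row is included in a payout
);
-- Lets the monthly batch find unpaid rows without scanning the whole history.
CREATE INDEX idx_purchased_unpaid ON purchased (paid_at, purchased_at);
CREATE TABLE payout (
    seller_id  INTEGER NOT NULL,
    period_end TEXT    NOT NULL,
    credits    INTEGER NOT NULL,
    PRIMARY KEY (seller_id, period_end)  -- re-running a period fails instead of paying twice
);
""")

def run_monthly_payout(conn, period_end):
    """Mark all unpaid purchases before period_end as paid, then total them per seller."""
    with conn:  # one transaction: flags and payout rows commit (or roll back) together
        conn.execute("UPDATE purchased SET paid_at = ? WHERE paid_at IS NULL AND purchased_at < ?",
                     (period_end, period_end))
        # Summing exactly the rows just stamped keeps the payout consistent with the flags.
        conn.execute("""INSERT INTO payout (seller_id, period_end, credits)
                        SELECT seller_id, ?, SUM(credits) FROM purchased
                        WHERE paid_at = ? GROUP BY seller_id""", (period_end, period_end))
    return conn.execute("SELECT seller_id, credits FROM payout WHERE period_end = ? "
                        "ORDER BY seller_id", (period_end,)).fetchall()

conn.executemany("INSERT INTO purchased (id, seller_id, credits, purchased_at) VALUES (?, ?, ?, ?)",
                 [(1, 10, 5, "2024-01-03"), (2, 10, 7, "2024-01-20"),
                  (3, 11, 4, "2024-01-28"), (4, 10, 9, "2024-02-02")])
january = run_monthly_payout(conn, "2024-02-01")
print(january)
```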
You can use partitioning in MySQL and partition by date, but I doubt this is something you should spend your time on at this point. Wait until your site blows up and is super popular, then come back and update your schema and configuration to meet your actual usage demands.
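If you do reach that point, monthly range partitioning looks roughly like the statement generated below. The Python helper is hypothetical and only builds the SQL string; it assumes purchased_at is a DATE or DATETIME column. Note that MySQL requires every unique key on a partitioned table, including the primary key, to contain the partitioning column, so the primary key would have to become something like (id, purchased_at).

```python
from datetime import date

def monthly_partition_ddl(table: str, column: str, first_month: date, months: int) -> str:
    """Build a MySQL statement that range-partitions `table` by month of `column`."""
    parts = []
    year, month = first_month.year, first_month.month
    for _ in range(months):
        # Each partition holds rows strictly before the first day of the next month.
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        parts.append(f"  PARTITION p{year:04d}{month:02d} "
                     f"VALUES LESS THAN (TO_DAYS('{next_year:04d}-{next_month:02d}-01'))")
        year, month = next_year, next_month
    parts.append("  PARTITION pmax VALUES LESS THAN MAXVALUE")  # catch-all for later rows
    return (f"ALTER TABLE {table}\n"
            f"PARTITION BY RANGE (TO_DAYS({column})) (\n" + ",\n".join(parts) + "\n);")

ddl = monthly_partition_ddl("purchased", "purchased_at", date(2024, 11, 1), 3)
print(ddl)
```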
It's worth noting that you'll also want to take regular backups of something as important as payment information. Typically, you'd also see at least one replica for something this critical.
Again I don't think you should try and solve this yourself. Just pay for a service that does this for you and focus on building the best 3d blueprint marketplace.

Unifying Database Storage

First of all, sorry for the question title; I couldn't think of anything better.
I have an interesting problem.
There are three web applications:
1. ApplicationA => example.com -> hosted in Germany
2. ApplicationB => example2.net -> hosted in Australia
3. ApplicationC => anotherexample.com -> hosted in United States
All of them are completely free; however, the owner is planning to implement some paid options. The main issue is that the applications are hosted on separate servers, in three different locations.
Now, if the owner wants to implement any paid options, he needs to create a unified invoicing system (as the invoice numbering order needs to be correct).
So, we have situation:
1. user buys a premium option on example.com
2. another user buys a premium option on example2.net
3. third and fourth users buy extra options on anotherexample.com
So we have 4 invoices, and their numbering should be as follows: 2011/01, 2011/02, 2011/03, 2011/04.
As mentioned above, the main issue is unifying the invoicing system, as the applications use different databases and are hosted on different servers. Of course, invoices should be stored in each application's own database.
Theoretically, we have only one issue: invoice numbers. Obviously, we need to create a unified system for invoice storage.
There might be a few possible issues:
there might be a lot of API requests to the invoicing system
every single invoice needs to be stored in the database
while creating every single invoice in every external application, we need to query the invoicing system for the latest invoice number.
I'm really interested in your approaches and suggestions. Any input is highly appreciated.
First, I would have the independent invoicing systems in example.com, example2.net and anotherexample.com all have their own internal primary keys for the invoices generated from within each of these systems. Each system should have its own independent copy of the invoicing logic because you don't want an outage on one server knocking out invoicing on every server.
Whenever you have a distributed system where local copies are creating records for something that will be amalgamated later, it's a good idea to use a GUID as the local primary key, or, if you have a philosophical objection to GUIDs as PKs, create a GUID as a candidate key. This way, you can bring together invoices from all of your systems (and any future ones) without worrying about key collisions, and you'll be able to track the combined records back to the source records, should you ever have to do that.
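A minimal sketch of a local invoice table that keeps an ordinary surrogate primary key and adds a GUID as a candidate key, using Python's uuid module and SQLite. Column names are hypothetical; invoice_number is left NULL here because, in the scheme described in this answer, sequential numbers come later from the central system.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE invoice (
    id             INTEGER PRIMARY KEY,   -- local surrogate key, only meaningful on this site
    guid           TEXT NOT NULL UNIQUE,  -- candidate key, unique across every site
    amount_cents   INTEGER NOT NULL,
    invoice_number TEXT                   -- filled in later, once a number is issued
)""")

def create_local_invoice(conn, amount_cents):
    """Create an invoice with no coordination with any other site."""
    guid = str(uuid.uuid4())  # random (version 4) UUID; collisions are practically impossible
    conn.execute("INSERT INTO invoice (guid, amount_cents) VALUES (?, ?)", (guid, amount_cents))
    return guid

g1 = create_local_invoice(conn, 999)
g2 = create_local_invoice(conn, 2500)
```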
Next, you'll need an integrated invoice system where all of the invoice details are collected periodically. To facilitate this, you need processes on each local invoicing system pumping their own records up to the centralized system. Keep a flag on the invoices in the local systems to identify which invoices have been successfully uploaded - or if you have very high volumes, use a work-list table containing the invoice keys that still need transmitting instead of a flag right on the local invoice table.
The centralized invoice system will also want to have a source code on the combined invoice table so that you can easily tell which website created the invoice originally.
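The upload process with a per-invoice flag and a source code on the central table could look like this. Two in-memory SQLite databases stand in for one site's database and the central one; the table names, the uploaded flag column, and pump() are hypothetical. Writing to the central system first and flagging locally second means a crash in between only causes a harmless re-send, which INSERT OR IGNORE absorbs.

```python
import sqlite3
import uuid

def make_local_db():
    """One website's own database with a local invoice table."""
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE invoice (
        guid         TEXT PRIMARY KEY,
        amount_cents INTEGER NOT NULL,
        uploaded     INTEGER NOT NULL DEFAULT 0)""")  # flag: 1 once the central copy exists
    return db

central = sqlite3.connect(":memory:")
central.execute("""CREATE TABLE combined_invoice (
    guid         TEXT PRIMARY KEY,   -- same GUID as the source record
    source       TEXT NOT NULL,      -- which website created the invoice
    amount_cents INTEGER NOT NULL)""")

def pump(local, source):
    """Copy not-yet-uploaded invoices to the central system, then flag them locally."""
    rows = local.execute("SELECT guid, amount_cents FROM invoice WHERE uploaded = 0").fetchall()
    with central:
        # INSERT OR IGNORE makes re-sending harmless if we crash before flagging.
        central.executemany("INSERT OR IGNORE INTO combined_invoice VALUES (?, ?, ?)",
                            [(guid, source, amount) for guid, amount in rows])
    with local:
        local.executemany("UPDATE invoice SET uploaded = 1 WHERE guid = ?",
                          [(guid,) for guid, _ in rows])
    return len(rows)

de_site, au_site = make_local_db(), make_local_db()
de_site.executemany("INSERT INTO invoice (guid, amount_cents) VALUES (?, ?)",
                    [(str(uuid.uuid4()), 500), (str(uuid.uuid4()), 700)])
au_site.execute("INSERT INTO invoice (guid, amount_cents) VALUES (?, ?)", (str(uuid.uuid4()), 300))

sent = pump(de_site, "example.com") + pump(au_site, "example2.net")
```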
As to invoice numbers, I'm assuming from your question that the customer is a bit fussy about having proper sequencing of your invoice numbers. You can have the centralized system generate these numbers for you using a web service to pick up the next invoice ID. If the centralized service is down for any reason, you can still give the customer an "order reference" i.e. the GUID and just hold back on the invoice number until it can be generated through the central server. This should satisfy your customer's need for tight sequential invoice numbers while preserving the ability to operate multiple sites on multiple servers.
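The central number generator itself, minus the web-service layer, might look like the sketch below. It assumes one counter row per year and records which order reference (the site's GUID) received which number, so a retried request gets the same number instead of burning a new one. SQLite's BEGIN IMMEDIATE stands in for whatever row locking the real central database would use (in MySQL, for example, SELECT ... FOR UPDATE inside a transaction).

```python
import sqlite3

# isolation_level=None: autocommit mode, so the explicit BEGIN/COMMIT below are ours.
central = sqlite3.connect(":memory:", isolation_level=None)
central.execute("CREATE TABLE invoice_counter (year INTEGER PRIMARY KEY, last_seq INTEGER NOT NULL)")
central.execute("CREATE TABLE issued_number (number TEXT PRIMARY KEY, order_ref TEXT NOT NULL UNIQUE)")

def next_invoice_number(order_ref: str, year: int) -> str:
    """Issue the next gap-free number for `year`; asking twice for one order returns the same number."""
    central.execute("BEGIN IMMEDIATE")  # take the write lock up front so callers cannot race
    try:
        row = central.execute("SELECT number FROM issued_number WHERE order_ref = ?",
                              (order_ref,)).fetchone()
        if row is None:
            central.execute("INSERT OR IGNORE INTO invoice_counter VALUES (?, 0)", (year,))
            central.execute("UPDATE invoice_counter SET last_seq = last_seq + 1 WHERE year = ?", (year,))
            seq = central.execute("SELECT last_seq FROM invoice_counter WHERE year = ?",
                                  (year,)).fetchone()[0]
            row = (f"{year}/{seq:02d}",)
            central.execute("INSERT INTO issued_number VALUES (?, ?)", (row[0], order_ref))
        central.execute("COMMIT")
        return row[0]
    except Exception:
        central.execute("ROLLBACK")
        raise

# The order references would be the sites' GUIDs; short strings keep the demo readable.
numbers = [next_invoice_number(ref, 2011) for ref in ("order-a", "order-b", "order-c", "order-d")]
print(numbers)
```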
If your customer actually doesn't care about tight sequencing of the invoice numbers, then another alternative is to have the central system generate blocks of reserved invoice numbers and allot them to each website. When the website is getting low on its allotment, it asks the central server for another block. This gives you breathing room in the sequences in case there are communication difficulties.
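The block-allocation alternative can be sketched with two small classes. Everything here is hypothetical and in memory: reserve_block() stands in for the call to the central server, whose counter would have to be persisted in practice, and the block size and low-water mark are arbitrary. The demo shows the trade-off: numbers are unique, but not in strict global order across sites.

```python
class CentralAllocator:
    """Central server: hands out disjoint, consecutive blocks of invoice numbers."""

    def __init__(self, block_size):
        self.block_size = block_size
        self._next_start = 1

    def reserve_block(self):
        start = self._next_start
        self._next_start += self.block_size
        return range(start, start + self.block_size)

class SiteAllocator:
    """One website: draws numbers from its reserved blocks, refilling when running low."""

    def __init__(self, central, low_water):
        self.central, self.low_water = central, low_water
        self.pending = list(central.reserve_block())

    def next_number(self):
        if len(self.pending) <= self.low_water:
            # Ask for the next block before running dry; the leftover numbers are
            # breathing room if the central server is briefly unreachable.
            self.pending.extend(self.central.reserve_block())
        return self.pending.pop(0)

central = CentralAllocator(block_size=5)
de_site = SiteAllocator(central, low_water=1)   # reserves 1-5
au_site = SiteAllocator(central, low_water=1)   # reserves 6-10
de_numbers = [de_site.next_number() for _ in range(6)]
au_first = au_site.next_number()
```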
In my opinion, it would be better to use an encoded invoice number. This way, you won't need to worry about number order getting mixed up.
For example, invoices from Germany can be prefixed with the country domain, like de297, de298, etc.
And going one step further, I'd incorporate the year as well. Thus it would reset at the beginning of each year and still maintain no conflicts, while at the same time keep the invoice number within a small length.
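A sketch of the encoded scheme: each site owns its prefix, and the sequence restarts every year, so no central coordination is needed. The exact format (lowercase country code, two-digit year, dash, sequence) is an illustrative choice, and the in-memory counter would live in each site's own database in practice.

```python
from collections import defaultdict

class EncodedNumbering:
    """Per-site numbering: country prefix + 2-digit year + a sequence that restarts yearly."""

    def __init__(self, country_code):
        self.prefix = country_code.lower()
        self._last_seq = defaultdict(int)  # year -> last sequence number issued

    def issue(self, year):
        self._last_seq[year] += 1
        return f"{self.prefix}{year % 100:02d}-{self._last_seq[year]}"

de_site, au_site = EncodedNumbering("DE"), EncodedNumbering("au")
issued = [de_site.issue(2011), de_site.issue(2011), au_site.issue(2011), de_site.issue(2012)]
print(issued)
```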

Organizing a MySQL Database

I'm developing an application that will require me to create my first large-scale MySQL database. I'm currently having a difficult time wrapping my mind around the best way to organize it. Can anyone recommend any reading materials that show different ways to organize a MySQL database for different purposes?
I don't want to try getting into the details of what I imagine the database's main components will be because I'm not confident that I can express it clearly enough to be helpful at this point. That's why I'm just looking for some general resources on MySQL database organization.
The way I learned to work these things out is to stop and make associations.
In more object oriented languages (I'm assuming you're using PHP?) that force OO, you learn to think OO very quickly, which is sort of what you're after here.
My workflow is like this:
Work out what data you need to store. (Customer name etc.)
Work out the main objects you're working with (e.g. Customer, Order, Salesperson etc), and assign each of these a key (e.g. Customer ID).
Work out which data connects to which objects. (Customer name belongs to a customer)
Work out how the main objects connect to each other (Salesperson sold order to Customer)
Once you have these, you have a good object model of what you're after. The next step is to look at the connections. For example:
Each customer has only one name.
Each product can be sold multiple times to anybody
Each order has only one salesperson and one customer.
Once you've worked that out, you want to try something called normalization, which is the art of getting this collection of data into a set of tables while minimizing redundancy. (The idea is that a one-to-one attribute (customer name) is stored in the table with the customer ID; a many-to-one or one-to-many relationship is stored as a foreign key on the "many" side, and a many-to-many relationship gets a separate linking table.)
That's pretty much the gist of it. If you ask, I'll scan an example sheet from my workflow for you.
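Applied to the example objects above, the normalized result might look like the following SQLite sketch (the table and column names are hypothetical). Names sit with their entity, each order carries many-to-one foreign keys to its customer and salesperson, and the many-to-many relationship between orders and products gets its own linking table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- One-to-one attributes (a customer's single name) live in the entity's own table.
CREATE TABLE customer    (customer_id    INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE salesperson (salesperson_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product     (product_id     INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Each order has exactly one customer and one salesperson: many-to-one foreign keys.
CREATE TABLE sales_order (               -- "order" is a reserved word in SQL
    order_id       INTEGER PRIMARY KEY,
    customer_id    INTEGER NOT NULL REFERENCES customer(customer_id),
    salesperson_id INTEGER NOT NULL REFERENCES salesperson(salesperson_id)
);

-- A product can be sold many times and an order can hold many products:
-- many-to-many, so it gets a separate linking table.
CREATE TABLE order_item (
    order_id   INTEGER NOT NULL REFERENCES sales_order(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Raj")])
conn.execute("INSERT INTO salesperson VALUES (1, 'Sam')")
conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO sales_order VALUES (?, ?, ?)", [(1, 1, 1), (2, 2, 1)])
conn.executemany("INSERT INTO order_item VALUES (?, ?, ?)", [(1, 1, 2), (1, 2, 1), (2, 1, 5)])

bought = conn.execute("""
SELECT c.name AS customer, p.name AS product, SUM(oi.quantity) AS qty
FROM order_item oi
JOIN sales_order o ON o.order_id = oi.order_id
JOIN customer c    ON c.customer_id = o.customer_id
JOIN product p     ON p.product_id = oi.product_id
GROUP BY c.customer_id, p.product_id
ORDER BY customer, product
""").fetchall()
```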
Maybe I can provide some advice based on my own experience:
unless you have a very specific need (like full-text indexes on MySQL versions before 5.6, where only MyISAM supported them), use the InnoDB table engine (transactions, row locking, etc.)
specify the default encoding; utf8mb4 is usually a good choice (MySQL's older utf8 charset stores only up to 3 bytes per character)
fine-tune the server parameters (key_buffer_size for MyISAM, innodb_buffer_pool_size for InnoDB, etc.; there is a lot of material online)
draw your DB schema by hand, and discuss it with colleagues and programmers
define data types based not only on the programs' usage, but also on the join queries (joins are faster when the joined columns have the same type)
create indexes based on the expected queries, also to be discussed with programmers
plan a backup solution (based on DB replication, scripts, etc.)
user management and access: grant only the necessary access rights, and create a read-only user for most queries, which do not need write access
define the server scale: disks (RAID?), memory, CPU
Here are also some tips for using and creating a database.
I can recommend the first chapter of the book An Introduction to Database Systems; it may help you organize your ideas. I also strongly recommend normalizing to 4th normal form rather than 5th; this is very important.
If I could only give you one piece of advice, that would be to generate test data at similar volumes as production and benchmark the main queries.
Just make sure the data distribution is realistic. (Not all people are named "John", and not all people have unique names. Not all people give their phone number, and most people won't have 10 phone numbers either.)
Also, make sure that the test data doesn't fit into RAM (unless you expect the production data volumes to do too).
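A hedged sketch of generating test data with a skewed distribution in plain Python. The name pool, the 40% share of common names, and the phone-count weights are made-up numbers for illustration; in practice you would tune them to match what you expect in production, raise n until the data no longer fits in RAM, bulk-load the rows (for example with executemany), and then benchmark the main queries.

```python
import random
import string

# Hypothetical name pool: a few very common names plus a long tail of rarer ones,
# so names repeat realistically (neither everyone "John" nor everyone unique).
COMMON_NAMES = ["John", "Maria", "Wei", "Fatima", "Carlos"]

def make_people(n, seed=42):
    """Return n (id, name, phones) tuples with a skewed, realistic-ish distribution."""
    rng = random.Random(seed)  # seeded so every benchmark run loads the same data
    tail = ["".join(rng.choices(string.ascii_lowercase, k=6)).title()
            for _ in range(max(1, n // 2))]
    people = []
    for pid in range(1, n + 1):
        name = rng.choice(COMMON_NAMES) if rng.random() < 0.4 else rng.choice(tail)
        # Many people give no phone, most give one, very few give several.
        phone_count = rng.choices([0, 1, 2, 3], weights=[30, 55, 12, 3])[0]
        phones = [f"+1555{rng.randrange(10**7):07d}" for _ in range(phone_count)]
        people.append((pid, name, phones))
    return people

people = make_people(10_000)
no_phone_share = sum(1 for _, _, phones in people if not phones) / len(people)
print(f"{len(people)} people, {len({name for _, name, _ in people})} distinct names, "
      f"{no_phone_share:.1%} without a phone")
```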