New Table vs. New Schema - mysql

Suppose I have a schema with many related tables: users, cities, items, purchases, etc. I now want a table in my database that contains solely event logging data for my internal support. The rows of information in the logging table are self-contained, not at all relational, and unrelated to my other tables. Is it better to create a new table in my existing schema, or to create an entirely new schema? Is one method preferred over the other? Is there a cost associated with one over the other?

In my opinion, it all depends on the size of your database. If you are managing dozens of tables with millions of rows of data, you will probably have an easier time isolating these logging tables in their own schema/database. If you are just managing a small app, don't worry: put everything into one database/schema. If your database is large, or you anticipate it becoming large, break them out. Once they are split into separate entities, you can manage the communication between the multiple databases/schemas with the many tools available.

In my opinion, if the data is unrelated, it belongs in a different schema. There's likely to be a very small overhead associated with creating a new schema, as opposed to having everything in a single schema, but I wouldn't have thought it was worth worrying about.
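To make the "separate schema" option concrete, here is a minimal sketch. It is not MySQL: it uses Python's built-in sqlite3 module, with ATTACH DATABASE standing in for a second schema. The names support_logs and event_log and the column layout are invented for the example.

```python
import sqlite3

# SQLite stands in for MySQL here; ATTACH plays the role of a second schema.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS support_logs")

# Application table in the main schema.
conn.execute("CREATE TABLE main.users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Self-contained logging table in its own schema: no foreign keys to app tables.
conn.execute(
    """
    CREATE TABLE support_logs.event_log (
        id INTEGER PRIMARY KEY,
        logged_at TEXT NOT NULL,
        level TEXT NOT NULL,
        message TEXT NOT NULL
    )
    """
)
conn.execute(
    "INSERT INTO support_logs.event_log (logged_at, level, message) VALUES (?, ?, ?)",
    ("2024-01-01T12:00:00", "INFO", "support ticket opened"),
)

log_count = conn.execute("SELECT COUNT(*) FROM support_logs.event_log").fetchone()[0]

# The main schema still lists only the application tables.
main_tables = [
    row[0]
    for row in conn.execute("SELECT name FROM main.sqlite_master WHERE type = 'table'")
]
```

The logging table is always addressed with its schema prefix, so it can later be backed up, purged, or moved without touching the application tables.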

Related

Creating relationship between table column and another database

Say I have an application that adds companies. I have a table like this:
create table companies(
id int primary key, -- column type assumed; it was omitted
name varchar(255) -- length assumed; it was omitted
)
For each company, I have to store loads of business-type information. Since there is so much of it, I've decided to create one database per company to avoid collisions, very slow performance and complicated queries. The database name would be the company name in my companies table.
My problem is that I would like to give the company name and the database a one-to-one relationship so they stay in sync with each other. Is that possible? If not, is there a better approach besides creating a database per company?
This is an elaboration on my comment.
Databases are designed to handle tables with millions, even billions of rows. I am guessing that your data is not that big.
Why do you want to store all the data for a single entity in a single table? Here are some reasons:
You can readily run queries across different companies.
You can define foreign key relationships between entities.
If you change the data structure, you can do it in one place.
It can be much more efficient in terms of space. Databases generally store data on data pages, and partially filled pages will eat up lots of space.
You have a single database for backup and recovery purposes.
A where clause to select a single company's data is not particularly "complicated".
(Note: This is referring to "entities", a database term. Data for the companies can still be spread across multiple tables.)
For performance, you can then adjust the data model, add indexes, and partition tables. This is sufficient for the vast majority of applications that run on databases.
There are a handful of situations where a separate database per company/client is needed. Here are some I've encountered:
1. You are told this is what you have to do.
2. Each client really is customized, so there is little common data structure among them.
3. Security requirements specify that data must be in different databases or even on different servers (this is a "good reason" for 1.).
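A minimal sketch of the single-database layout described above, using Python's built-in sqlite3 module as a stand-in for MySQL. The invoices table, its columns, and the sample rows are hypothetical; only companies comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE companies (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- Hypothetical business table shared by every company.
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        company_id INTEGER NOT NULL REFERENCES companies(id),
        amount_cents INTEGER NOT NULL
    );
    CREATE INDEX idx_invoices_company ON invoices(company_id);
    """
)
conn.executemany(
    "INSERT INTO companies (id, name) VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO invoices (company_id, amount_cents) VALUES (?, ?)",
    [(1, 1000), (1, 2500), (2, 400)],
)

# One company's data: a plain WHERE clause on the indexed column.
acme_total = conn.execute(
    "SELECT SUM(amount_cents) FROM invoices WHERE company_id = ?", (1,)
).fetchone()[0]

# A query across companies, which one database per company would make awkward.
totals = conn.execute(
    """
    SELECT c.name, SUM(i.amount_cents)
    FROM companies AS c
    JOIN invoices AS i ON i.company_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
```

The company name lives in exactly one row, so there is nothing to keep in sync with a database name.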

Creating table for each user?

I'm programming a web application in which each user stores their tasks. (like a to-do application). Tasks will be stored in a table (for example: userstasks).
Which one is better?
1- userstasks table has a column named user_id that defines who created task?
2- a new table (e.g. usernametasks) will be created for each registered user that stores all their tasks?
P.S.: There are lots of users!
You always start with the simplest thing that works and stick with it until it's proven to be a performance problem. What you're talking about with #2 is termed "premature optimization". You only go down that road when #1 is having severe performance problems.
When you split data across different users, your ability to query across all users is severely diminished. For all intents, users will be living in different worlds. Reporting is nearly impossible.
For most applications that have a lot of reads, millions of records is not an issue. It's write-heavy applications that need special attention like this, or those with massive scale, like Reddit or Twitter. Since you're not making one of those, stick with the properly normalized structure first.
"Lots of users" probably means tens or hundreds of thousands. On a properly tuned MySQL instance that's not a big deal. If you need more scale, spin up some read-only secondary servers to spread out the load or look at using MySQL Cluster.
I would go with option 1 (a table called tasks with a user_id foreign key) in the short run, assuming that a task can't have more than one user. If a task can belong to several users, you'll need a join table instead. Also look into declaring an actual foreign key constraint, since this enforces referential integrity in the data itself.
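Option 1 might look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the column names and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE
    );
    -- One shared table; user_id records who created each task.
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        title TEXT NOT NULL
    );
    CREATE INDEX idx_tasks_user ON tasks(user_id);
    """
)
conn.executemany(
    "INSERT INTO users (id, username) VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)
conn.executemany(
    "INSERT INTO tasks (user_id, title) VALUES (?, ?)",
    [(1, "buy milk"), (1, "call mom"), (2, "pay rent")],
)

# One user's tasks: filter on the indexed user_id column.
alice_tasks = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM tasks WHERE user_id = ? ORDER BY id", (1,)
    )
]

# The foreign key rejects a task that points at a nonexistent user.
try:
    conn.execute("INSERT INTO tasks (user_id, title) VALUES (?, ?)", (99, "orphan"))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Reporting across all users stays a single query against tasks, which a table per user would not allow.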

DB table organization by entity, or vertically by level of data?

I hope the title is clear, please read further and I will explain what I mean.
We are having a disagreement with our database designer about high-level structure. We are designing a MySQL database and we have a trove of data that will become part of it. Conceptually, the data is complex: there are dozens of different types of entities (representing a variety of real-world entities, you could think of them as product developers, factories, products, inspections, certifications, etc.), each with associated characteristics and with relationships to each other.
I am not an experienced DB designer but everything I know tells me to start by thinking of each of these entities as a table (with associated fields representing characteristics and data populating them), to be connected as appropriate given the underlying relationships. Every example of DB design I have seen does this.
However, the data is currently in a totally different form. There are four tables, each representing a level of data. A top level table lists the 39 entity types and has a long alphanumeric string tying it to the other three tables, which represent all the entities (in one table), entity characteristics (in one table) and values of all the characteristics in the DB (in one table with tens of millions of records.) This works - we have a basic view in php which lets you navigate among the levels and view the data, etc. - but it's non-intuitive, to say the least. The reason given for having it this way is that it makes the size of the DB smaller, shortens query time and makes expansion easier. But it's not clear to me that the size of the DB means we should optimize this over, say, clarity of organization.
So the question is: is there ever a reason to structure a DB this way, and what is it? I find it difficult to get a handle on the underlying data - you can't, for example, run through a table in traditional rows-and-columns format - and it hides connections. But a more "traditional" structure with tables based on entities would result in many more tables, definitely more than 50 after normalization. Which approach seems better?
Many thanks.
OK, I will go ahead and answer my own question based on comments I got and more research they led me to. The immediate answer is yes, there can be a reason to structure a DB with very few tables and with all the data in one of them: this is an Entity-Attribute-Value (EAV) database. These are characterized by:
A very unstructured approach: each fact or data point is just dumped into a big table with the characteristics necessary to understand it. This makes it easy to add more data, but it can be slow and/or difficult to get it out. An EAV is optimized for adding data and for organizational flexibility, and the price is that access is slower and queries are harder to write.
A "long and skinny" format, lots of rows, very few columns.
Because the data is "self encoded" with its own characteristics, it is often used in situations when you know there will be lots of possible characteristics or data points but that most of them will be empty ("sparse data"). A table approach would have lots of empty cells, but an EAV doesn't really have cells, just data points.
In our particular case, we don't have sparse data. But we do have a situation where flexibility in adding data could be important. On the other hand, while I don't think that speed of access will be that important for us because this won't be a heavy-access site, I would worry about the ease of creating queries and forms. And most importantly, I think this structure would be hard for us DB noobs to understand and control, so I am leaning towards the traditional model - sacrificing flexibility and maybe ease of adding new data in favor of clarity. Also, people seem to agree that large numbers of tables are OK as long as they are really called for by the data relationships. So, decision made.
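To illustrate the trade-off, here is a small sketch that stores the same two records both ways. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the table names, attributes, and sample values are invented for the example and are not the real schema described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- EAV: one long, skinny table holding every data point.
    CREATE TABLE eav (entity_id INTEGER, attribute TEXT, value TEXT);
    -- Traditional: one table per entity type, one column per characteristic.
    CREATE TABLE factories (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    """
)
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [
        (1, "name", "North Plant"), (1, "city", "Leeds"),
        (2, "name", "South Plant"), (2, "city", "Lyon"),
    ],
)
conn.executemany(
    "INSERT INTO factories VALUES (?, ?, ?)",
    [(1, "North Plant", "Leeds"), (2, "South Plant", "Lyon")],
)

# EAV read: the data points must be pivoted back into columns.
eav_rows = conn.execute(
    """
    SELECT entity_id,
           MAX(CASE WHEN attribute = 'name' THEN value END) AS name,
           MAX(CASE WHEN attribute = 'city' THEN value END) AS city
    FROM eav
    GROUP BY entity_id
    ORDER BY entity_id
    """
).fetchall()

# Traditional read: the rows already have the shape we want.
table_rows = conn.execute(
    "SELECT id, name, city FROM factories ORDER BY id"
).fetchall()
```

Adding a new characteristic to the EAV table is just another INSERT, while the traditional table needs a schema change. Every read of the EAV table, though, needs a pivot like the one above.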

SQL one-to-one relationships vs flattening

I'm using a standard SQL database and I'm trying to figure out whether or not to flatten a table or make it more "object-oriented". To me, smaller tables are easier to read but it would require joining tables and having one-to-one relationships. Is this generally a good way of doing things or is it frowned on in the SQL world?
I have a table which has the following attributes:
MYTABLE
- ID
- NAME
- LABEL
- CREATED_TS
- MODIFIED_TS
- CREATED_USER
- MODIFIED_USER
To me, the created/modified fields would be their own object. There are actually a few more fields as well so it's not really just this small. I would think that creating another table called "MYTABLE_MODINFO" or something like that which would have the CREATED and MODIFIED fields and they would be joined when data from them was needed. These tables aren't high access tables, they wouldn't have tons of queries per minute or even hundreds of rows in them, so I don't think efficiency would be much of an issue.
So mainly what I'm wondering is would this be a generally accepted design or should you generally keep your table structures flat?
You should create audit information in the same table. The reason is that this data is part of the row and is a one to one relationship, so there is no point in branching it apart.
If you want to store the audit info (audit tracking/history), then you can create another table, however in most cases I have seen this built by "duplicating" data and creating a surrogate key and mappings back to the original row. The reason I list duplicating in quotes is because auditing inherently requires duplication of the old data...if it is linked and changeable after being written, then it is not really an audit.
Just my two cents. If it does not make sense, then I can provide some examples. But, the gist is that each row will only ever have one current piece of modification information, so why break it out if it will never have more than one?
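Here is one way the two ideas above could look, as a rough sketch. It uses Python's built-in sqlite3 module as a stand-in for your SQL database. The mytable_history table, its columns, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Current audit info stays in the row it describes.
    CREATE TABLE mytable (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        label TEXT,
        created_ts TEXT NOT NULL,
        modified_ts TEXT NOT NULL,
        created_user TEXT NOT NULL,
        modified_user TEXT NOT NULL
    );
    -- Hypothetical history table: a copy of the old row with its own surrogate key.
    CREATE TABLE mytable_history (
        history_id INTEGER PRIMARY KEY,
        mytable_id INTEGER NOT NULL,
        name TEXT NOT NULL,
        label TEXT,
        modified_ts TEXT NOT NULL,
        modified_user TEXT NOT NULL
    );
    """
)
conn.execute(
    "INSERT INTO mytable VALUES (1, 'widget', 'old label', "
    "'2024-01-01', '2024-01-01', 'ann', 'ann')"
)

# Before changing the row, copy its current state into the history table.
conn.execute(
    """
    INSERT INTO mytable_history (mytable_id, name, label, modified_ts, modified_user)
    SELECT id, name, label, modified_ts, modified_user FROM mytable WHERE id = 1
    """
)
conn.execute(
    "UPDATE mytable SET label = 'new label', modified_ts = '2024-02-01', "
    "modified_user = 'bob' WHERE id = 1"
)

current = conn.execute(
    "SELECT label, modified_user FROM mytable WHERE id = 1"
).fetchone()
history = conn.execute(
    "SELECT label, modified_user FROM mytable_history WHERE mytable_id = 1"
).fetchall()
```

The row always carries exactly one set of current modification fields, and the history table only grows when you actually need the old values.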
Avoid a 'one to one' in the database: you'll lose performance, scalability, and independence. Can you imagine what happens if you want to store 2 pictures per ID? Will you create another field, or will you repeat the row? It's easier to create a relationship so that you have more freedom when you want to upgrade. Please review these tutorials.
http://www.youtube.com/watch?v=Onzm-PxSjtE
http://folkworm.ceri.memphis.edu/ew/SCHEMA_DOC/comparison/erd.htm
http://www.visual-paradigm.com/product/vpuml/provides/dbmodeling.jsp
Besides that, you should normalize the DB to be sure that everything is in the best shape possible. Remember that the most important thing is to take what you need and adapt it.
http://databases.about.com/od/specificproducts/a/normalization.htm
http://www.youtube.com/watch?v=xzeuBwHkKxw
In my view, RDBMS design is not the same as an object-oriented approach. The fields in your example do not belong to a different object domain; they are data that belongs to your record. Since the table will not see tons of queries, there is no overhead to worry about, so you should keep them in the same table for auditing purposes, which is also easier to work with.

Database separation - MySQL

I have a main MySQL db set up, and a class to handle the queries to it. It runs real nice. I am building a custom advertising system on my site and I'm wondering if there is any benefit to creating a separate database all together to handle that system?
Are there any pitfalls to doing it either way?
Option #1 - one DB for main website function, one DB for advertising system
Option #2 - one DB for both main website function and advertising system
Well, you need a new connection for every database you use, and you also need a new instance of your DB class; both cost some (minimal) memory. I personally see no reason why you would need or want to do this. If you just want to separate the two things, maybe you could use a prefix like "adv_" for the advertisement tables.
Edit: another problem could come up if you ever want to combine (e.g. join) data of the two databases - you will have a much easier time if you do not use multiple databases.
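A quick sketch of the prefix idea, using Python's built-in sqlite3 module as a stand-in for MySQL. The users and adv_campaigns tables, their columns, and the sample rows are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Main website table.
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL
    );
    -- Advertising table: same database, set apart by the adv_ prefix.
    CREATE TABLE adv_campaigns (
        id INTEGER PRIMARY KEY,
        owner_id INTEGER NOT NULL REFERENCES users(id),
        title TEXT NOT NULL
    );
    """
)
conn.execute("INSERT INTO users (id, username) VALUES (1, 'alice')")
conn.execute("INSERT INTO adv_campaigns (owner_id, title) VALUES (1, 'Spring sale')")

# Combining website and advertising data is an ordinary join on one connection.
campaigns = conn.execute(
    """
    SELECT u.username, c.title
    FROM adv_campaigns AS c
    JOIN users AS u ON u.id = c.owner_id
    """
).fetchall()
```

The prefix keeps the advertising tables grouped together when you list tables, without giving up joins against the main site's data.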
Johnnietheblack, there is no easy answer here, and not even one right answer: different tables need different approaches, and sometimes you have to throw away an academic/more "secure" database model to improve performance & scalability.
It's always a matter of trade-offs. Based on my personal experience, I have some thoughts to share with you:
When you separate tables in different databases, you have more work to do in your data abstraction layers to keep referential integrity (you have to do the DB chores...) and to link information. On the other hand, it's easier to manage the databases (indexes, data files, query tunings, etc.).
Tables with a high insert rate and low maintenance (update/delete), where referential integrity is not that important, like log tables, are good candidates to be put in a separate database: although the I/O from inserts is heavy, the records don't change over time, they are rarely retrieved, and their indexes tend to be pretty simple (date/time and some other attribute). I had one case where the log table was so big (millions of records) that at one point a single insert was taking almost 1 second. Since it got 500 thousand new records each day, the problem snowballed: we could not stop the system to tune the damn thing because that would take too long, and the system was shutting down because this log table was used everywhere and was impacting the business (75% of the procedures used this table).
Databases can eat THOUSANDS of records for breakfast, so for small tables (fewer than 1000 records) you generally don't need to worry; just watch the big ones (more than 5000). I have a DBA friend who simply does not create indexes for performance on most tables: he ran some tests and discovered that their SQL Server was changing the query plan to TABLE SCANS for most of the tables. But be careful here: this is strong medicine!
Try to think in SaaS terms when deciding whether a new set of tables should be put together inside one database: does your advertising system need to be tightly integrated with your website, or can it be a separate component, reusable by other components? If it is the latter, you should think about using separate databases, to minimize the impact when you update the schema, do maintenance on the new tables, etc.
There are so many other cases, but alas, we have so little time... The important thing here is to keep an open mind and try to forget a little bit about academically perfect third-normal-form database models. Hope it helps!