Save an array as a string or use multiple tables? - mysql

I am new to database design and I am using PHP and Laravel, with MySQL as my DB layer.
I want to allow people to create and save training programs. Each program can have multiple weeks, each week has 7 days, each day can have many training sessions, each session has 3 phases, and each phase has many exercises.
I already have a table of exercises, and I initially thought I should just build an array of the data and store that. Unfortunately, MySQL does not support arrays, so it would have to be stored as a string in the programs table. Now I am thinking that a table for each object (weeks, days, sessions), all related in some way to the programs table, might be a better way to go.
In the near future I would like people to be able to mark each exercise/session/day/week as completed, so which solution might make that easier?
If the array is the best option, should I switch from MySQL to Postgres for its array functionality, or is saving the array as a string accepted practice?
Thanks

In my view it is better to use multiple tables instead of storing an array as a string.
Think of your needs.
Plan your database design in such a way that if you want to add tables or columns later, you face no difficulties. I also advise you to normalize your database; here are some good links on database normalization:
Normalization of database, Four ways to normalize your database

It is good practice to use a separate table for each entity: weeks, days, sessions, phases, exercises. Depending on your needs, you could also consider another option such as MongoDB.
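As a rough illustration of the multi-table approach, here is a minimal sketch of such a schema (all table and column names are hypothetical; the completed flags support the per-item completion tracking mentioned in the question):

CREATE TABLE programs (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL
);

CREATE TABLE weeks (
  id INT AUTO_INCREMENT PRIMARY KEY,
  program_id INT NOT NULL,
  week_number INT NOT NULL,
  completed TINYINT(1) NOT NULL DEFAULT 0,
  FOREIGN KEY (program_id) REFERENCES programs(id)
);

CREATE TABLE days (
  id INT AUTO_INCREMENT PRIMARY KEY,
  week_id INT NOT NULL,
  day_number TINYINT NOT NULL, -- 1..7
  completed TINYINT(1) NOT NULL DEFAULT 0,
  FOREIGN KEY (week_id) REFERENCES weeks(id)
);

CREATE TABLE sessions (
  id INT AUTO_INCREMENT PRIMARY KEY,
  day_id INT NOT NULL,
  completed TINYINT(1) NOT NULL DEFAULT 0,
  FOREIGN KEY (day_id) REFERENCES days(id)
);

CREATE TABLE phases (
  id INT AUTO_INCREMENT PRIMARY KEY,
  session_id INT NOT NULL,
  phase_number TINYINT NOT NULL, -- 1..3
  FOREIGN KEY (session_id) REFERENCES sessions(id)
);

-- Many-to-many link to the existing exercises table:
CREATE TABLE phase_exercises (
  phase_id INT NOT NULL,
  exercise_id INT NOT NULL,
  completed TINYINT(1) NOT NULL DEFAULT 0,
  PRIMARY KEY (phase_id, exercise_id),
  FOREIGN KEY (phase_id) REFERENCES phases(id)
);

With this layout, marking an exercise, session, day, or week as completed is a single UPDATE of one row, which is exactly what the string-array approach makes hard.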

Related

Best practices for transforming a relational database to a non relational?

I have a MySQL Database and I need to create a Mongo Database (I don't care about keeping any data).
So are there any good practices for designing the structure (mongoose.Schema) based on the relational tables of MySQL?
For example, the SQL side has a users table and a courses table with a 1:n relation. Should I also create two collections in MongoDB, or would it be better to create a new field courses: [] inside the user document and create only the user collection?
The schema definition should be driven by the use cases of the application.
Under which conditions is the data accessed and modified? Which is the leading entity?
e.g. When a user is loaded, do you always also want to know the user's courses? This would be an argument for embedding.
Can you update a course without knowing all of its users, e.g. update the name of a course? Do you want to list an overview of all courses? This would be an argument for extracting courses into their own collection.
So there is no general guideline for such a migration, because the use cases cannot be derived from the schema definition alone.
If you don't care about data, the best approach is to redesign it from scratch.
NoSQL databases differ from RDBMSs in many ways, so a direct mapping will hardly be efficient and in many cases is not possible at all.
The first thing you need to answer for yourself (and probably mention in the question) is why you need to change databases in the first place. There are different kinds of problems that Mongo can solve better than SQL, and they require different data models. None of them come for free, so you will need to understand the tradeoffs.
You can start from a very simple rule: in SQL you model your data after your business objects and describe the relations between them; in Mongo you model your data after the queries you need to answer. As soon as you grasp this idea, it will let you ask answerable questions.
It may be worth reading https://www.mongodb.com/blog/post/building-with-patterns-a-summary as a starting point.
An old yet still quite useful read: https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1 Just keep in mind it was written a long time ago, when Mongo did not have many of its v4+ features. Nevertheless, it describes the philosophy of Mongo data modelling with simple examples, and that hasn't changed much since then.

Use a JSON column or a separate table to store a list of things of the same type in MySQL

A User can have many Posts. I'm wondering whether I should store the IDs of the posts in a JSON column in the User table, or create a separate table with a composite primary key (user_id, post_id)?
If there are 10 million users and each user has 100 posts, the separate-table method will have 1 billion records, which doesn't seem very good for scaling or efficiency.
Which method would be the better choice in this case? Does one have better performance than the other?
The answer depends on how you query the data. For example, do you ever need to fetch an individual post out of the list for a user? Or do anything else besides treat the full list of posts as one string? If so, then do not use JSON; use the separate table.
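For reference, a minimal sketch of the separate-table approach (hypothetical names):

-- Junction table: one row per (user, post) pair. The composite
-- primary key doubles as the index for "all posts of user X" lookups.
CREATE TABLE user_posts (
  user_id BIGINT NOT NULL,
  post_id BIGINT NOT NULL,
  PRIMARY KEY (user_id, post_id)
);

-- Fetch an individual user's posts without parsing any JSON:
SELECT post_id FROM user_posts WHERE user_id = 42;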
MySQL can actually support billions of rows in a table. I was responsible for one database at my last job that had over 5 billion rows. We used a mix of partitioning and indexing to help queries.
That said, any website with 10 million active users does not use just one MySQL Server instance for all the data; they split the data over many instances.
How to split data in this way has to be done very carefully, though. Do it only after you know exactly how your data will be queried. And after you have done everything else possible to optimize the database for the queries you run (indexing, caching, etc.).
Scalable Internet Architectures is a difficult, complex topic. There is no single, simple answer like "JSON vs. separate table" that will solve all your problems. There are many ways you can solve different bottlenecks of scalability.

MySQL Database Design. How to use dates in a relational database?

I'm in the process of building a database compiling various news articles from various sources and I'm trying to work out the most efficient way to catalogue the dates the articles were written.
As it stands I'm thinking of having one table with all the dates on, then pointing the article entries in the DB to the appropriate entry in the date table.
Obviously several problems with that spring to mind, not least of which is the incredibly unwieldy and excessively long list of dates I would have to create.
Is there a more efficient way of doing this? Bearing in mind that several tables within my database will be using the dates table.
Thanks in advance, and thank you for reading my essay...
As Ethan has suggested, simply store the date in the same table as the article; there is no need for a lookup.
You may wish to store some details of an article separately from the body text, to potentially speed up searches, but I would start simple: store all relevant data in one table and only resort to partitioning if things are running slowly (avoid premature optimisation).
In addition to your main tables, I would advise you to create an auxiliary calendar table to assist in date-based queries. Contrary to what you said:
...the incredibly unwieldy and excessively long list of dates I would have to create.
A table containing dates for the next 50 years is only just over 18k rows, not much at all.
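A minimal sketch of such a calendar table (hypothetical names; populate it once with a script):

CREATE TABLE calendar (
  cal_date DATE PRIMARY KEY,
  day_of_week TINYINT NOT NULL,
  cal_month TINYINT NOT NULL,
  cal_year SMALLINT NOT NULL,
  is_weekend TINYINT(1) NOT NULL
);

-- 50 years * ~365.25 days/year is about 18,263 rows. Example use:
SELECT a.*
FROM articles a
JOIN calendar c ON c.cal_date = DATE(a.published_at)
WHERE c.cal_year = 2012 AND c.is_weekend = 1;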
If you are going to be working with dates a lot I'd recommend you take a look at Developing Time-Oriented Database Applications in SQL by Richard Snodgrass, it's an excellent resource.
Use the MySQL DATETIME type. Don't create a separate table for dates. MySQL is optimized for storing that type in each record and performing operations on it.
Creating a separate table for dates is more used for OLAP than OLTP, and I assume you are building an OLTP system.
The biggest problem with creating a separate table is that you would need to have a record for every possible time that might be used, and then you need to perform a lookup on that table to find the foreign key. That just is not a good idea at all.
Seeing as you are asking this question, I don't believe that you would need to get any more optimized than simply storing the datetime in each row. If you were building something where it would matter, you would probably know better than I would anyways.
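A minimal sketch of the direct approach (hypothetical names):

CREATE TABLE articles (
  id INT AUTO_INCREMENT PRIMARY KEY,
  source VARCHAR(255) NOT NULL,
  title VARCHAR(255) NOT NULL,
  published_at DATETIME NOT NULL,
  INDEX idx_published_at (published_at)
);

-- Range queries use the index directly; no lookup table or join needed:
SELECT id, title
FROM articles
WHERE published_at BETWEEN '2012-01-01' AND '2012-01-31 23:59:59';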

Storing JSON in database vs. having a new column for each key

I am implementing the following model for storing user related data in my table - I have 2 columns - uid (primary key) and a meta column which stores other data about the user in JSON format.
uid | meta
--------------------------------------------------
1   | {name: ['foo'],
    |  emailid: ['foo@bar.com', 'bar@foo.com']}
--------------------------------------------------
2   | {name: ['sann'],
    |  emailid: ['sann@bar.com', 'sann@foo.com']}
--------------------------------------------------
Is this a better way (performance-wise, design-wise) than the one-column-per-property model, where the table would have many columns like uid, name, emailid?
What I like about the first model is that you can add as many fields as you want; there is no limitation.
Also, I was wondering, now that I have implemented the first model, how do I perform a query on it? For example, how do I fetch all the users who have a name like 'foo'?
Question - Which is the better way to store user-related data in a database (keeping in mind that the number of fields is not fixed): JSON or column-per-field? Also, if the first model is implemented, how do I query the database as described above? Should I use both models, storing the data that may be searched by a query in separate columns and the other data as JSON?
Update
Since there won't be too many columns on which I need to perform searches, is it wise to use both models: key-per-column for the data I need to search, and JSON for the rest (in the same MySQL database)?
Updated 4 June 2017
Given that this question/answer have gained some popularity, I figured it was worth an update.
When this question was originally posted, MySQL had no support for JSON data types and the support in PostgreSQL was in its infancy. Since 5.7, MySQL now supports a JSON data type (in a binary storage format), and PostgreSQL JSONB has matured significantly. Both products provide performant JSON types that can store arbitrary documents, including support for indexing specific keys of the JSON object.
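As an illustration of that indexing support in MySQL (a minimal sketch following the question's example; generated columns require MySQL 5.7+):

CREATE TABLE users (
  uid INT AUTO_INCREMENT PRIMARY KEY,
  meta JSON,
  -- Generated column extracting one JSON key so it can be indexed:
  name VARCHAR(255) GENERATED ALWAYS AS
    (JSON_UNQUOTE(JSON_EXTRACT(meta, '$.name[0]'))) STORED,
  INDEX idx_name (name)
);

-- "All users whose name is like 'foo'" can now use the index:
SELECT uid FROM users WHERE name LIKE 'foo%';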
However, I still stand by my original statement that your default preference, when using a relational database, should still be column-per-value. Relational databases are still built on the assumption that the data within them will be fairly well normalized. The query planner has better optimization information when looking at columns than when looking at keys in a JSON document. Foreign keys can be created between columns (but not between keys in JSON documents). Importantly: if the majority of your schema is volatile enough to justify using JSON, you might want to at least consider whether a relational database is the right choice.
That said, few applications are perfectly relational or document-oriented. Most applications have some mix of both. Here are some examples where I personally have found JSON useful in a relational database:
When storing email addresses and phone numbers for a contact, where storing them as values in a JSON array is much easier to manage than multiple separate tables
Saving arbitrary key/value user preferences (where the value can be boolean, textual, or numeric, and you don't want to have separate columns for different data types)
Storing configuration data that has no defined schema (if you're building Zapier, or IFTTT and need to store configuration data for each integration)
I'm sure there are others as well, but these are just a few quick examples.
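For instance, the contact case from the first bullet might look like this (a hypothetical sketch using MySQL 5.7+ JSON functions):

CREATE TABLE contacts (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  -- e.g. ["alice@example.com", "alice@work.example"]
  emails JSON
);

-- Does this contact list a given address?
SELECT id
FROM contacts
WHERE JSON_CONTAINS(emails, JSON_QUOTE('alice@example.com'));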
Original Answer
If you really want to be able to add as many fields as you want with no limitation (other than an arbitrary document size limit), consider a NoSQL solution such as MongoDB.
For relational databases: use one column per value. Putting a JSON blob in a column makes it virtually impossible to query (and painfully slow when you actually find a query that works).
Relational databases take advantage of data types when indexing, and are intended to be implemented with a normalized structure.
As a side note: this isn't to say you should never store JSON in a relational database. If you're adding true metadata, or if your JSON is describing information that does not need to be queried and is only used for display, it may be overkill to create a separate column for all of the data points.
Like most things, "it depends". Storing data in columns or JSON isn't right or wrong, good or bad, in and of itself. It depends on what you need to do with it later. What is your predicted way of accessing this data? Will you need to cross-reference other data?
Other people have answered the technical trade-offs pretty well.
Not many people have discussed that your app and features evolve over time, and how this data storage decision impacts your team.
One of the temptations of using JSON is avoiding schema migrations, so if the team is not disciplined, it's very easy to stick yet another key/value pair into a JSON field. There's no migration for it, no one remembers what it's for, and there is no validation on it.
My team used JSON alongside traditional columns in Postgres, and at first it was the best thing since sliced bread. JSON was attractive and powerful, until one day we realized that the flexibility came at a cost, and it suddenly became a real pain point. Sometimes that point creeps up really quickly, and then it becomes hard to change because we've built so many other things on top of this design decision.
Over time, as we added new features, having the data in JSON led to more complicated-looking queries than if we had stuck to traditional columns. So then we started fishing certain key values back out into columns so that we could make joins and comparisons between values. Bad idea. Now we had duplication. A new developer would come on board and be confused: which value should I save back into, the JSON field or the column?
The JSON fields became junk drawers for little pieces of this and that. With no data validation at the database level, and no consistency or integrity between documents, all that responsibility was pushed into the app instead of getting hard type and constraint checking from traditional columns.
Looking back, JSON allowed us to iterate very quickly and get something out the door. It was great. However, after we reached a certain team size, its flexibility also allowed us to hang ourselves with a long rope of technical debt, which then slowed down subsequent feature development. Use with caution.
Think long and hard about the nature of your data. It's the foundation of your app. How will the data be used over time? And how is it likely TO CHANGE?
Just tossing it out there, but WordPress has a structure for this kind of stuff (at least WordPress was the first place I observed it; it probably originated elsewhere).
It allows limitless keys and is faster to search than using a JSON blob, but not as fast as some of the NoSQL solutions.
uid | meta_key | meta_val
----------------------------------
1   | name     | Frank
1   | age      | 12
2   | name     | Jeremiah
3   | fav_food | pizza
.................
EDIT
For storing history/multiple keys
uid | meta_id | meta_key | meta_val
----------------------------------------------------
1   | 1       | name     | Frank
1   | 2       | name     | John
1   | 3       | age      | 12
2   | 4       | name     | Jeremiah
3   | 5       | fav_food | pizza
.................
and query via something like this:
select meta_val from `table` where meta_key = 'name' and uid = 1 order by meta_id desc
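A minimal sketch of the underlying table (hypothetical names; the composite index is what keeps the meta_key lookups fast):

CREATE TABLE user_meta (
  meta_id INT AUTO_INCREMENT PRIMARY KEY,
  uid INT NOT NULL,
  meta_key VARCHAR(64) NOT NULL,
  meta_val TEXT,
  INDEX idx_uid_key (uid, meta_key)
);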
The drawback of the approach is exactly what you mentioned:
it makes it VERY slow to find things, since each time you need to perform a text-search on it.
A value-per-column lookup, by contrast, matches against the whole value and can use an index.
Your approach (JSON-based data) is fine for data you don't need to search by and just need to display along with your normal data.
Edit: Just to clarify, the above applies to classic relational databases. NoSQL databases use JSON internally and are probably a better option if that is the desired behavior.
Basically, the first model you are using is called document-based storage. You should have a look at popular NoSQL document-based databases like MongoDB and CouchDB. Basically, in document-based DBs you store data as JSON documents, and then you can query those documents.
The second model is the popular relational database structure.
If you want to use a relational database like MySQL, then I would suggest you use only the second model. There is no point in using MySQL and storing data as in the first model.
To answer your second question: there is no way to query for a name like 'foo' if you use the first model.
It seems that you're mainly hesitating over whether to use a relational model or not.
As it stands, your example would fit a relational model reasonably well, but the problem may of course come when you need to make this model evolve.
If you only have one (or a few pre-determined) levels of attributes for your main entity (user), you could still use an Entity Attribute Value (EAV) model in a relational database. (This also has its pros and cons.)
If you anticipate that you'll get less structured values that you'll want to search using your application, MySQL might not be the best choice here.
If you were using PostgreSQL, you could potentially get the best of both worlds. (This really depends on the actual structure of the data here... MySQL isn't necessarily the wrong choice either, and the NoSQL options can be of interest, I'm just suggesting alternatives.)
Indeed, PostgreSQL can build indexes on (immutable) functions (which MySQL can't, as far as I know), and in recent versions you could use PLV8 on the JSON data directly to build indexes on specific JSON elements of interest, which would improve the speed of your queries when searching for that data.
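For example, a PostgreSQL expression index on a JSON key might look like this (a hypothetical sketch; the ->> operator is built in since Postgres 9.3, so PLV8 isn't needed for simple extraction):

-- Index the extracted key so WHERE clauses on it can use the index:
CREATE INDEX idx_users_meta_name ON users ((meta ->> 'name'));

SELECT uid FROM users WHERE meta ->> 'name' = 'foo';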
EDIT:
Since there won't be too many columns on which I need to perform searches, is it wise to use both models: key-per-column for the data I need to search, and JSON for the rest (in the same MySQL database)?
Mixing the two models isn't necessarily wrong (assuming the extra space is negligible), but it may cause problems if you don't make sure the two data sets are kept in sync: your application must never change one without also updating the other.
A good way to achieve this would be to have a trigger perform the automatic update, by running a stored procedure within the database server whenever an update or insert is made. As far as I'm aware, the MySQL stored procedure language probably lacks support for any sort of JSON processing. Again, PostgreSQL with PLV8 support (and possibly other RDBMSs with more flexible stored procedure languages) would be more useful (updating your relational column automatically using a trigger is quite similar to updating an index in the same way).
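A sketch of that idea in modern PostgreSQL (hypothetical table and column names; this uses the built-in ->> operator, which newer Postgres versions provide without PLV8):

-- Keep the searchable "name" column in sync with the JSON document.
CREATE FUNCTION sync_name_from_meta() RETURNS trigger AS $$
BEGIN
  NEW.name := NEW.meta ->> 'name';
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- EXECUTE FUNCTION requires Postgres 11+; older versions use EXECUTE PROCEDURE.
CREATE TRIGGER users_sync_name
BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION sync_name_from_meta();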
Short answer: you have to mix the two.
Use JSON for data you are not going to create relations with, such as contact data, addresses, or product variants.
Sometimes joins on tables will be an overhead, say for OLAP. Suppose I have two tables, ORDERS and ORDER_DETAILS. To get all the order details, we have to join the two tables, and this makes the query slower as the number of rows grows into the millions or so; a left/right join is slower still than an inner join.
I think that if we add a JSON string/object to the respective ORDERS entry, the JOIN can be avoided, and report generation will be faster...
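A sketch of that denormalization (hypothetical names; the details column carries a copy of the line items purely for read-heavy reporting):

CREATE TABLE orders (
  order_id INT AUTO_INCREMENT PRIMARY KEY,
  customer_id INT NOT NULL,
  ordered_at DATETIME NOT NULL,
  -- Denormalized line items, e.g. [{"sku": "A1", "qty": 2, "price": 9.99}]
  details JSON
);

-- Report without joining ORDER_DETAILS:
SELECT order_id, ordered_at, JSON_LENGTH(details) AS line_item_count
FROM orders
WHERE ordered_at >= '2017-01-01';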
You are trying to fit a non-relational model into a relational database. I think you would be better served using a NoSQL database such as MongoDB. There is no predefined schema, which fits in with your requirement of having no limitation on the number of fields (see the typical MongoDB collection example). Check out the MongoDB documentation to get an idea of how you'd query your documents, e.g.
db.mycollection.find(
    {
        name: 'sann'
    }
)
As others have pointed out, queries will be slower. I'd suggest adding at least an '_id' field and querying by that instead.

MongoDB or MySQL database

I have a question about deciding whether to use a MySQL database or a Mongo database. The problem is that my decision depends heavily on these things:
I want to select records between two dates (a period)
However, is this possible?
My application won't do any complex queries, just basic CRUD. It has Facebook integration, so in the current setup I sometimes have to JOIN the users table.
Either DB will allow you to filter between dates and I wouldn't use that requirement to make the decision. Some questions you should answer:
Do you need to store your data in a relational system, like MySQL? Relational databases are better at cross-entity joins.
Will your data be very complicated, but you will only make simple queries (e.g. by an ID)? If so, MongoDB may be a better fit, as storing and retrieving complex data is a cinch.
Who will be querying the data, and from where? MySQL uses SQL for querying, which is a much more widely known skill than Mongo's JSON query syntax.
These are just three questions to ask. In order to make a recommendation, we'll need to know more about your application.
MySQL (SQL) or MongoDB (NoSQL): both can work for your needs, but the idea behind choosing an RDBMS vs. NoSQL is the requirements of your application.
If your application cares about speed, no relations between the data are necessary, and your data schema changes very frequently, you can choose MongoDB; it is faster since no joins are needed, and every piece of data is stored as a document.
Otherwise, go for MySQL.
If you are looking for range queries in MongoDB - yes, Mongo supports those. For date-based range queries, have a look at this: http://cookbook.mongodb.org/patterns/date_range/
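For comparison, the equivalent period filter on the MySQL side is a plain range query (hypothetical table and column names; an index on the date column keeps it fast):

-- "Records between two dates" in MySQL:
SELECT *
FROM events
WHERE created_at BETWEEN '2013-01-01' AND '2013-12-31 23:59:59';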