Storing custom MySQL MetaData - best practices

I have a fairly large database containing a number of different tables representing different product types (eg. cars; baby strollers).
I'm using a website built with PHP to access the data and display it, and I allow users to filter the data (typical online product database sort of stuff).
I'm not sure if I went about storing my metadata the correct way. I'm using XML to do a lot of stuff, which requires making a product type table in MySQL first, and then adding information about each of the columns in that table in my big XML "column attribute" file. So I'll have the name of each column listed in the XML table with information about the column. I store localized names for the column in the XML file, and indicate what type of information about the product is being stored in the column (e.g. Is a column showing a dimension (to be listed in the product dimensions area) or a feature (for the features area)).
First off, am I way off base storing all this custom metadata in XML?
Secondly, if I should be storing some of it in MySQL (and I think I should be moving some of it there), what's the best way to do that? I see that I can make column "comments" in MySQL....are those standard fare for databases? If I move to Oracle some day, would I lose all my comment info? I'm not thinking of moving much information to the database, and some of it could be accomplished by just adding a little identifier to my column names (e.g. number_of_wheels becomes number_of_wheels_quantity, length becomes length_dimension)
Any advice from the database design gurus out there would be vastly appreciated. Thanks :)

First off, am I way off base storing all this custom metadata in XML?
Yes. XML is a great markup for transporting data in a nearly human-readable format, but a poor one for storing it. Searching through XML is costly, and I don't know of a (good) way to have a query search through XML stored in a field in the DB. You are probably better off with a table that stores these things directly; you can easily convert them into XML, if you need to, after you query them from the DB. I think in your case a table with just two columns, "ColumnName" and "MetaData", would be all you need, populated with values as per your example:
ColumnName | MetaData
--------------------------------------------------------------------------------
colDimension | a dimension (to be listed in the product dimensions area)
colFeature | a feature (for the features area)
--------------------------------------------------------------------------------
This scheme will resolve your comments conundrum as well: you can add another field to the above table to store the comments in, which makes them much more accessible to your middle tier (PHP in your case) if you ever want to display them.
I had to make a few assumptions as to intent and existing data and whatnot, so if I'm wrong about anything, let me know why it doesn't work for you and I'll respond with some corrections or other pointers.
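To make the table idea above concrete, here is a minimal sketch. It uses Python's built-in sqlite3 only so that it runs anywhere; the same tables would work in MySQL. The names column_meta, column_label and columns_for are my own assumptions, not anything from your setup, and I split the localized names into a second table because you mentioned storing one name per language.

```python
import sqlite3

# SQLite stands in for MySQL here; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE column_meta (
    table_name  TEXT NOT NULL,
    column_name TEXT NOT NULL,
    category    TEXT NOT NULL,   -- 'dimension' or 'feature'
    comment     TEXT,            -- replaces the MySQL column comment
    PRIMARY KEY (table_name, column_name)
);
CREATE TABLE column_label (
    table_name  TEXT NOT NULL,
    column_name TEXT NOT NULL,
    locale      TEXT NOT NULL,
    label       TEXT NOT NULL,
    PRIMARY KEY (table_name, column_name, locale)
);
""")
conn.executemany("INSERT INTO column_meta VALUES (?, ?, ?, ?)", [
    ("cars", "length", "dimension", "overall length"),
    ("cars", "number_of_wheels", "feature", None),
])
conn.executemany("INSERT INTO column_label VALUES (?, ?, ?, ?)", [
    ("cars", "length", "en", "Length"),
    ("cars", "length", "fr", "Longueur"),
    ("cars", "number_of_wheels", "en", "Number of wheels"),
])

def columns_for(table, category, locale):
    """Return (column_name, label) pairs for one display area of one product table."""
    rows = conn.execute(
        """SELECT m.column_name, l.label
           FROM column_meta m
           JOIN column_label l
             ON l.table_name = m.table_name AND l.column_name = m.column_name
           WHERE m.table_name = ? AND m.category = ? AND l.locale = ?
           ORDER BY m.column_name""",
        (table, category, locale),
    )
    return rows.fetchall()

print(columns_for("cars", "dimension", "fr"))
```

With this in place the column names themselves can stay as they are (length, number_of_wheels), since the category lives in the metadata table rather than in a suffix.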

As I understand it, your goal is to keep the metadata in one place, right?
I'd suggest the freely available tool MySQL Workbench. It lets you create an ER (or EER) diagram, keep the whole structure in it, and at any point sync with the server and restore the structure. You can also back those structures up. There is a bit of a learning curve if you aren't already using it, but in the end it's a very helpful tool for keeping the structure organized.

Related

How should I organize user data with several rows in MySQL?

I am currently developing a quiz-app, that keeps track of user data, such as :
the sets they've studied by the ID of the specific set ([1,2,6,12])
the friends they have by the id of the user ([1,2,3,4])
their schedule(
{"2022-07-03 00:00:00":{"551":{"type":"Flashcards","setid":"1"},"552":{"type":"Flashcards","setid":"1"},"553":{"type":"Flashcards","setid":"1"},"554":{"type":"Flashcards","setid":"1"},"555":{"type":"Flashcards","setid":"1"},"556":{"type":"Flashcards","setid":"1"},"557":{"type":"Flashcards","setid":"6"},"558":{"type":"Flashcards","setid":"6"},"559":{"type":"Flashcards","setid":"6"},"560":{"type":"Flashcards","setid":"6"}}})
every individual day they've logged in (["05/15/2022","05/17/2022","05/18/2022","05/19/2022","05/22/2022","05/23/2022","05/24/2022","05/25/2022","05/28/2022","05/29/2022","05/30/2022","05/31/2022","06/02/2022","06/05/2022","06/07/2022","06/08/2022","06/10/2022","06/11/2022","06/13/2022","06/14/2022","06/15/2022","06/17/2022","06/18/2022","06/19/2022","06/20/2022","06/22/2022","06/24/2022","06/25/2022","06/26/2022","06/28/2022","06/29/2022","06/30/2022","07/01/2022","07/02/2022"])
Note: there is quite a lot of other types of information that is stored inside the table.
All of this aforementioned information is collected in a mysql table called "users", which has rows for each user, with accompanying data (as mentioned above).
It has recently come to my attention that MySQL has data limits for the amount of data that can be represented in a given row (around 65K bytes). If I continue to represent data this way, I believe that at scale (assume a user uses the app for 5 years, imagine the amount of data inside the "every individual day they've logged in" table), I will face MySQL's data limits and it may cause problems in the future.
Here's a picture, showing how the information is represented inside of the table "users"
How would I better represent this type of table? Should I use multiple tables inside an SQL database? How should I format it? Do I not have to worry about the data limit, and should I continue saving data in this way?
Thanks.
If I understand this correctly, you are packing way too much information in each row. The structure of your data is not being represented in a way that allows MySQL to do what it is good at. You are just creating big buckets for each user and stashing them in MySQL.
To make this work better, you can either create tables to store each relationship (this is, after all, a relational database) like user_login, user_friend_requests, and so on. The direct answer to your question is that each cell in your table should be a table itself.
OR, you can embrace the blob, and use something like mongodb, which is much more suited for storing and retrieving the data in a way that fits your mindset. Since you don't do any real queries on the data, a NoSQL solution would probably fit you better.
So the "right" answer to your question is "modify your schema to store this data better, or switch your database to match the way you want to store the data."
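As a rough illustration of the "one table per relationship" option, here is a small sketch. It runs on Python's built-in sqlite3 purely for convenience; the table names user_login and user_friend and the helper functions are assumptions of mine, not taken from the question's schema.

```python
import sqlite3

# SQLite stands in for MySQL; table and function names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE user_login (
    user_id    INTEGER NOT NULL REFERENCES users(id),
    login_date TEXT NOT NULL,            -- ISO date, one row per user per day
    PRIMARY KEY (user_id, login_date)
);
CREATE TABLE user_friend (
    user_id   INTEGER NOT NULL REFERENCES users(id),
    friend_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (user_id, friend_id)
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])

def record_login(user_id, day):
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    conn.execute("INSERT OR IGNORE INTO user_login VALUES (?, ?)", (user_id, day))

def login_count(user_id):
    row = conn.execute(
        "SELECT COUNT(*) FROM user_login WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0]

for day in ("2022-05-15", "2022-05-17", "2022-05-17"):  # duplicate day is ignored
    record_login(1, day)
conn.execute("INSERT INTO user_friend VALUES (1, 2)")

print(login_count(1))
```

Each login is now one small row, so five years of daily logins is a couple of thousand rows per user rather than one ever-growing cell, and the per-row size limit stops being a concern for this data.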
However, having said all that, since it seems you are storing JSON in those cells, you can use the JSON data type (max size 1GB but better if you don't use so much - see https://dev.mysql.com/blog-archive/how-large-can-json-documents-be), or LONGTEXT, 4GB. (Assuming you are running in a 64-bit environment - see Maximum length for MySQL type text).
The JSON data type actually has some pretty cool features.

MySQL, objects and superclasses

I'm having some troubles with designing my website. I'm trying to use OOP design in the way I design my site and using MySQL to store the objects in JSON format.
So I'm creating a MySQL table inside my database. The table is going to contain a primary key (PK) and what I call page type (pageType, ex: homePage, aMessage, aTutorial, etc). This means that I will have several different pages that have different page formats (pageFormats, ex: headerArea, contentArea, footerArea, etc). So depending on the pageType object that was requested, the query would then go to pageFormat table to retrieve the desired divs.
So for example, we have a AJAX request that says the pageType is set to homePage. The request would then go to the pageFormat table and see which divs the homePage is allowed and then return them. I then of course would write them to the document and continue on loading the page with desired content and so forth.
I am just having trouble going from my UML / documentation to actual development of this idea. So if someone could help me with this it'd be greatly appreciated. The part I find hardest to work out is how to set up the pageType and pageFormat tables in the MySQL database.
The returned types would be in JSON form so the scripts of my page would be able to format them correctly. So that leads me to my second question of what is the best way to store JSON objects in a MySQL table that are going to be divs? Would it be TINYTEXT? Because I don't plan on having large amount of text in there?
Then my last question would be, what would be the best table type? I'm having trouble with selecting this as well.
I have referenced http://dev.mysql.com/doc/refman/5.1/en/storage-engines.html for the table types to try and help me though I'm still unsure.
I also have been reading http://www.agiledata.org/essays/mappingObjects.html#BasicConcepts to understand how to implement relational databases and mapping objects to them. Is there any other good reads that I should look into?
Thanks for any help / direction.
If you want to store your objects as JSON documents, I recommend using a NoSQL database like MongoDB or CouchDB instead of MySQL, since they let you store JSON directly.
With MySQL, the most common approach is to use an ORM layer to bridge the impedance mismatch between a relational database management system and an object-oriented language. By storing a JSON-like object in an RDBMS, you are forcing your dynamic-schema document to fit in a scalar value, which in my opinion is not the best solution at all.
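If you do stay with MySQL, one possible relational layout for the pageType / pageFormat lookup described in the question is sketched below. It uses Python's built-in sqlite3 so it can be run as-is; the tables page_type and page_format, the position column and the function divs_as_json are assumptions for illustration. The divs are stored as plain rows and only turned into JSON when answering the request.

```python
import json
import sqlite3

# SQLite stands in for MySQL; every name below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page_type (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE page_format (
    page_type_id INTEGER NOT NULL REFERENCES page_type(id),
    div_name     TEXT NOT NULL,
    position     INTEGER NOT NULL,      -- order of the div on the page
    PRIMARY KEY (page_type_id, div_name)
);
""")
conn.executemany("INSERT INTO page_type VALUES (?, ?)",
                 [(1, "homePage"), (2, "aMessage")])
conn.executemany("INSERT INTO page_format VALUES (?, ?, ?)", [
    (1, "headerArea", 1), (1, "contentArea", 2), (1, "footerArea", 3),
    (2, "contentArea", 1),
])

def divs_as_json(page_type):
    """Look up the divs allowed for a page type and serialize them for the AJAX reply."""
    rows = conn.execute(
        """SELECT f.div_name
           FROM page_format f
           JOIN page_type t ON t.id = f.page_type_id
           WHERE t.name = ?
           ORDER BY f.position""",
        (page_type,),
    )
    return json.dumps({"pageType": page_type, "divs": [r[0] for r in rows]})

print(divs_as_json("homePage"))
```

Stored this way, no text column has to hold JSON at all, so the TINYTEXT question goes away for this part of the design.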

Storing JSON in database vs. having a new column for each key

I am implementing the following model for storing user related data in my table - I have 2 columns - uid (primary key) and a meta column which stores other data about the user in JSON format.
uid | meta
--------------------------------------------------
1 | {name:['foo'],
| emailid:['foo@bar.com','bar@foo.com']}
--------------------------------------------------
2 | {name:['sann'],
| emailid:['sann@bar.com','sann@foo.com']}
--------------------------------------------------
Is this a better way (performance-wise, design-wise) than the one-column-per-property model, where the table will have many columns like uid, name, emailid.
What I like about the first model is, you can add as many fields as possible there is no limitation.
Also, I was wondering, now that I have implemented the first model. How do I perform a query on it, like, I want to fetch all the users who have name like 'foo'?
Question - Which is the better way to store user related data (keeping in mind that number of fields is not fixed) in database using - JSON or column-per-field? Also, if the first model is implemented, how to query database as described above? Should I use both the models, by storing all the data which may be searched by a query in a separate row and the other data in JSON (is a different row)?
Update
Since there won't be too many columns on which I need to perform search, is it wise to use both the models? Key-per-column for the data I need to search and JSON for others (in the same MySQL database)?
Updated 4 June 2017
Given that this question/answer have gained some popularity, I figured it was worth an update.
When this question was originally posted, MySQL had no support for JSON data types and the support in PostgreSQL was in its infancy. Since 5.7, MySQL now supports a JSON data type (in a binary storage format), and PostgreSQL JSONB has matured significantly. Both products provide performant JSON types that can store arbitrary documents, including support for indexing specific keys of the JSON object.
However, I still stand by my original statement that your default preference, when using a relational database, should still be column-per-value. Relational databases are still built on the assumption that the data within them will be fairly well normalized. The query planner has better optimization information when looking at columns than when looking at keys in a JSON document. Foreign keys can be created between columns (but not between keys in JSON documents). Importantly: if the majority of your schema is volatile enough to justify using JSON, you might want to at least consider whether a relational database is the right choice.
That said, few applications are perfectly relational or document-oriented. Most applications have some mix of both. Here are some examples where I personally have found JSON useful in a relational database:
When storing email addresses and phone numbers for a contact, where storing them as values in a JSON array is much easier to manage than multiple separate tables
Saving arbitrary key/value user preferences (where the value can be boolean, textual, or numeric, and you don't want to have separate columns for different data types)
Storing configuration data that has no defined schema (if you're building Zapier, or IFTTT and need to store configuration data for each integration)
I'm sure there are others as well, but these are just a few quick examples.
Original Answer
If you really want to be able to add as many fields as you want with no limitation (other than an arbitrary document size limit), consider a NoSQL solution such as MongoDB.
For relational databases: use one column per value. Putting a JSON blob in a column makes it virtually impossible to query (and painfully slow when you actually find a query that works).
Relational databases take advantage of data types when indexing, and are intended to be implemented with a normalized structure.
As a side note: this isn't to say you should never store JSON in a relational database. If you're adding true metadata, or if your JSON is describing information that does not need to be queried and is only used for display, it may be overkill to create a separate column for all of the data points.
Like most things "it depends". It's not right or wrong/good or bad in and of itself to store data in columns or JSON. It depends on what you need to do with it later. What is your predicted way of accessing this data? Will you need to cross reference other data?
Other people have answered pretty well what the technical trade-offs are.
Not many people have discussed that your app and features evolve over time and how this data storage decision impacts your team.
One of the temptations of using JSON is to avoid migrating the schema, so if the team is not disciplined, it's very easy to stick yet another key/value pair into a JSON field. There's no migration for it, no one remembers what it's for, and there is no validation on it.
My team used JSON alongside traditional columns in Postgres, and at first it was the best thing since sliced bread. JSON was attractive and powerful, until one day we realized that the flexibility came at a cost and it was suddenly a real pain point. Sometimes that point creeps up really quickly, and then it becomes hard to change because we've built so many other things on top of this design decision.
Over time, as we added new features, having the data in JSON led to more complicated-looking queries than we would have needed had we stuck to traditional columns. So we started fishing certain key values back out into columns so that we could make joins and comparisons between values. Bad idea. Now we had duplication. A new developer would come on board and be confused: which is the value I should be saving back into, the JSON one or the column?
The JSON fields became junk drawers for little pieces of this and that. No data validation on the database level, no consistency or integrity between documents. That pushed all that responsibility into the app instead of getting hard type and constraint checking from traditional columns.
Looking back, JSON allowed us to iterate very quickly and get something out the door. It was great. However, after we reached a certain team size its flexibility also allowed us to hang ourselves with a long rope of technical debt, which then slowed down subsequent feature work. Use with caution.
Think long and hard about what the nature of your data is. It's the foundation of your app. How will the data be used over time. And how is it likely TO CHANGE?
Just tossing it out there, but WordPress has a structure for this kind of stuff (at least WordPress was the first place I observed it, it probably originated elsewhere).
It allows limitless keys, and is faster to search than using a JSON blob, but not as fast as some of the NoSQL solutions.
uid | meta_key | meta_val
----------------------------------
1 name Frank
1 age 12
2 name Jeremiah
3 fav_food pizza
.................
EDIT
For storing history/multiple keys
uid | meta_id | meta_key | meta_val
----------------------------------------------------
1 1 name Frank
1 2 name John
1 3 age 12
2 4 name Jeremiah
3 5 fav_food pizza
.................
and query via something like this:
select meta_val from `table` where meta_key = 'name' and uid = 1 order by meta_id desc
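For anyone who wants to try this layout, here is a small runnable sketch using Python's built-in sqlite3 (standing in for MySQL). The table name user_meta, the index, and the helpers latest and history are my own assumptions; the rows are the ones from the table above.

```python
import sqlite3

# SQLite stands in for MySQL; user_meta and the helper names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE user_meta (
    meta_id  INTEGER PRIMARY KEY,
    uid      INTEGER NOT NULL,
    meta_key TEXT NOT NULL,
    meta_val TEXT NOT NULL
)""")
# An index on (uid, meta_key) keeps per-user key lookups from scanning the table.
conn.execute("CREATE INDEX idx_user_meta ON user_meta (uid, meta_key)")
conn.executemany("INSERT INTO user_meta VALUES (?, ?, ?, ?)", [
    (1, 1, "name", "Frank"),
    (2, 1, "name", "John"),
    (3, 1, "age", "12"),
    (4, 2, "name", "Jeremiah"),
    (5, 3, "fav_food", "pizza"),
])

def history(uid, key):
    """All values stored for a key, newest first."""
    rows = conn.execute(
        "SELECT meta_val FROM user_meta "
        "WHERE meta_key = ? AND uid = ? ORDER BY meta_id DESC",
        (key, uid),
    )
    return [r[0] for r in rows]

def latest(uid, key):
    """Most recent value for a key, or None if the user has no such key."""
    values = history(uid, key)
    return values[0] if values else None

print(latest(1, "name"), history(1, "name"))
```

Note that every value is stored as text here, so numeric comparisons (age > 10, say) need a cast in the query.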
The drawback of the approach is exactly what you mentioned:
it makes it VERY slow to find things, since each time you need to perform a text search on it.
With one value per column, the query instead matches the whole string.
Your approach (JSON based data) is fine for data you don't need to search by, and just need to display along with your normal data.
Edit: Just to clarify, the above goes for classic relational databases. NoSQL use JSON internally, and are probably a better option if that is the desired behavior.
Basically, the first model you are using is called document-based storage. You should have a look at popular NoSQL document databases like MongoDB and CouchDB. In document-based DBs, you store data as JSON documents and can then query those documents.
The Second model is the popular relational database structure.
If you want to use a relational database like MySQL, then I would suggest you only use the second model. There is no point in using MySQL and storing data as in the first model.
To answer your second question, there is no good way to query for a name like 'foo' if you use the first model.
It seems that you're mainly hesitating whether to use a relational model or not.
As it stands, your example would fit a relational model reasonably well, but the problem may come of course when you need to make this model evolve.
If you only have one (or a few pre-determined) levels of attributes for your main entity (user), you could still use an Entity Attribute Value (EAV) model in a relational database. (This also has its pros and cons.)
If you anticipate that you'll get less structured values that you'll want to search using your application, MySQL might not be the best choice here.
If you were using PostgreSQL, you could potentially get the best of both worlds. (This really depends on the actual structure of the data here... MySQL isn't necessarily the wrong choice either, and the NoSQL options can be of interest, I'm just suggesting alternatives.)
Indeed, PostgreSQL can build indexes on (immutable) functions (which MySQL can't as far as I know), and in recent versions you could use PLV8 on the JSON data directly to build indexes on specific JSON elements of interest, which would improve the speed of your queries when searching for that data.
EDIT:
Since there won't be too many columns on which I need to perform
search, is it wise to use both the models? Key-per-column for the data
I need to search and JSON for others (in the same MySQL database)?
Mixing the two models isn't necessarily wrong (assuming the extra space is negligible), but it may cause problems if you don't make sure the two data sets are kept in sync: your application must never change one without also updating the other.
A good way to achieve this would be to have a trigger perform the automatic update, by running a stored procedure within the database server whenever an update or insert is made. As far as I'm aware, the MySQL stored procedure language probably lacks support for any sort of JSON processing. Again, PostgreSQL with PLV8 support (and possibly other RDBMSs with more flexible stored procedure languages) should be more useful (updating your relational column automatically using a trigger is quite similar to updating an index in the same way).
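If a trigger isn't an option, a simpler alternative (my suggestion, not something the database does for you) is to route every write through one application function that derives the searchable column from the JSON document. The sketch below uses Python's built-in sqlite3 in place of MySQL; the table layout and the names save_user and find_by_name are assumptions for illustration.

```python
import json
import sqlite3

# SQLite stands in for MySQL; table layout and function names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT NOT NULL, meta TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_users_name ON users (name)")

def save_user(uid, meta):
    """Single write path: the searchable column is always derived from the document."""
    name = meta["name"][0]
    with conn:  # commits on success, rolls back if the statement fails
        # INSERT OR REPLACE is SQLite syntax; in MySQL this would be REPLACE INTO
        # or INSERT ... ON DUPLICATE KEY UPDATE.
        conn.execute(
            "INSERT OR REPLACE INTO users (uid, name, meta) VALUES (?, ?, ?)",
            (uid, name, json.dumps(meta)),
        )

def find_by_name(pattern):
    """Search on the real column, then hand back the decoded documents."""
    rows = conn.execute(
        "SELECT uid, meta FROM users WHERE name LIKE ? ORDER BY uid", (pattern,)
    )
    return [(uid, json.loads(meta)) for uid, meta in rows]

save_user(1, {"name": ["foo"], "emailid": ["foo@bar.com", "bar@foo.com"]})
save_user(2, {"name": ["sann"], "emailid": ["sann@bar.com", "sann@foo.com"]})
print(find_by_name("foo%"))
```

This only holds as long as nothing else writes to the table directly, which is exactly the guarantee a trigger would give you and application code cannot.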
Short answer:
you have to mix the two.
Use JSON for data that you are not going to build relations on, such as contact data, addresses, or product variables.
Sometimes joins on a table are an overhead, for example in OLAP. Say I have two tables, ORDERS and ORDER_DETAILS. To get all the order details we have to join the two tables, and this makes the query slower as the number of rows grows into the millions or so. A left/right join can also be slower than an inner join.
I think that if we add a JSON string/object to the respective ORDERS entry, the JOIN is avoided, and report generation will be faster.
You are trying to fit a non-relational model into a relational database, I think you would be better served using a NoSQL database such as MongoDB. There is no predefined schema which fits in with your requirement of having no limitation to the number of fields (see the typical MongoDB collection example). Check out the MongoDB documentation to get an idea of how you'd query your documents, e.g.
db.mycollection.find(
{
name: 'sann'
}
)
As others have pointed out queries will be slower. I'd suggest to add at least an '_ID' column to query by that instead.

DB Structure - Storage of relationships

This is a complex problem, so I'm going to try to simplify it.
I have a MySQL instance on my server hosting a number of schemas for different purposes. The schemas are structured generally (not perfectly) in an EAV fashion. I need to transition information into and out of that structure on a regular basis.
Example1: in order to present the information on a webpage, I get the information, stick it into a complex object, which I then pass via json to the webpage, where I convert the json into a complex javascript object, which I then present with knockoutjs and similar things.
Conclusion: This resulted in a lot of logic being put into multiple places so that I could associate the values on the page with the values in the database.
Example2: in order to allow users to import information from a pdf, I have a lot of information stored in pdf form fields. In this case, I didn't write the pdf though, so the form fields aren't named in such a way that all of this logic is easy enough to write 3 or more times for CRUD.
Conclusion: This resulted in my copying a list of the pdf form fields to a table in the database, so that I could then somehow associate them with where their data should be placed. The problem that arose is that the fields on the pdf would need to associate with schema.table.column and the only way I found to store that information was via a VARCHAR
Neither example involves a small amount of data (something like 6 tables in Example1 and somewhere around 1400 PDF form fields in Example2). Given Example1 and the resulting logic being stored in multiple places, it seemed logical to build Example2, where I could store the relationships between data in the database, where they could be accessed and changed consistently by all the methods involved.
Now, it's quite possible I'm just being stupid and all of my googling hasn't turned up an easy way to associate this data with the correct schema.table.column. If that is the case, then telling me the right way to do that is the simple answer here.
However, and this is where I get confused. I have always been told that you never want to store information about a database in the database, especially not as strings (varchar). This seems wrong on so many levels and I just can't figure out if I'm being stupid, and it's better to follow Example1 or if there's some trick here that I've missed about database structure.
Not sure where you got "... never ... store information about a database in the database". With an EAV model it is normal to store the metamodel (the entity types and their allowable attributes) in the database itself so that it is self-describing. If you had to change the metamodel, would you rather change code or a few rows in a table?
The main drawback to EAV databases is that you lose the ability to do simple joins. Join-type operations become much more complex. Like everything else in life, you make tradeoffs depending on your requirements. I have seen self-describing EAV architectures used very successfully.
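To show what "storing the metamodel in the database" can look like for the PDF case, here is a minimal sketch. It uses Python's built-in sqlite3 in place of MySQL, and the tables attribute_def and pdf_field_map, the sample field names and the function target_for are all hypothetical. Instead of a free-form "schema.table.column" VARCHAR, each PDF field points at a row in the attribute definition table through a foreign key, so a mapping to a non-existent attribute is rejected by the database.

```python
import sqlite3

# SQLite stands in for MySQL; all table, column and field names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE attribute_def (              -- the metamodel: allowed attributes
    id          INTEGER PRIMARY KEY,
    entity_type TEXT NOT NULL,
    attr_name   TEXT NOT NULL,
    UNIQUE (entity_type, attr_name)
);
CREATE TABLE pdf_field_map (              -- one row per PDF form field
    pdf_field    TEXT PRIMARY KEY,
    attribute_id INTEGER NOT NULL REFERENCES attribute_def(id)
);
""")
conn.executemany("INSERT INTO attribute_def VALUES (?, ?, ?)", [
    (1, "person", "first_name"),
    (2, "person", "birth_date"),
])
conn.executemany("INSERT INTO pdf_field_map VALUES (?, ?)", [
    ("Text1", 1),
    ("Text17", 2),
])

def target_for(pdf_field):
    """Resolve a PDF form field to its (entity_type, attr_name), or None if unmapped."""
    return conn.execute(
        """SELECT a.entity_type, a.attr_name
           FROM pdf_field_map m
           JOIN attribute_def a ON a.id = m.attribute_id
           WHERE m.pdf_field = ?""",
        (pdf_field,),
    ).fetchone()

print(target_for("Text17"))
```

Renaming an attribute then means updating one row in attribute_def, and every PDF mapping follows automatically.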

Implementing a database structure for generic objects

I'm building a PHP/MySQL website and I'm currently working on my database design. I do have some database and MySQL experience, but I've never structured a database from scratch for a real world application which hopefully is going to get some good traffic, so I'd love to hear advice from people who've already done it, in order to avoid common mistakes. I hope my explanations are not too confusing.
What I need
In my application, the user should be able to write a post (title + text), then create an "object" (which can be anything, like a video, or a song, etc.) and attach it to the post. The site has a list of predefined object types the user can create, and I should be able to add new types in the future. The user should also have the ability to see the object's details in a dedicated page and add a comment to it - the same applies to posts.
What I tried
I created an objects table with these fields: oid, type, name and date. This table contains records for anything the user should be able to add comments to (i.e. posts and objects). Then I created a postmeta table which contains additional post data (such as text, author, last edit date, etc.), a videometa table for data about the "video" object (URL, description, etc.), and so on. A postobject table (pid,oid) links objects to posts. Additionally, there's a comments table which contains the comment text, the author and the ID of the object it refers to.
Since the list of object types is predefined and is probably not going to change (though I still need the ability to add a type easily at any time without changing the app's code structure or the database design), and it is relatively small, it's not a problem to create a "meta" table for each type and make a corresponding PHP class in my application to handle it.
Finally, a page on the site needs to show a list of all the posts including the objects attached to it, sorted by date. So I get all the records from the objects table with type "post" and join it with postmeta to get the post metadata. Then I query postobject to get all the objects attached to this post, and comments to get all the comments.
The questions
Does this make any sense? Is it any good to design a database in this way for a real world site? I need to join quite a few tables to get all the data I need, and the objects table is going to become huge since it contains almost every item (only the type, name and creation date, though) - this is to keep the database and the app code flexible, but does it work in the real world, or is it too expensive in the long term? Am I thinking about it in the wrong way with this kind of OOP approach?
More specifically: suppose I need to list all the posts, including their attached objects and metadata. I would need to join these tables, at least: posts, postmeta, postobject and {$objecttype}meta (not to mention an users table to get all posts by a specific user, for example). Would I get poor performance doing this, even if I'm using only numeric indexes?
Also, I considered using a NoSQL database (MongoDB) for this project (thanks to Stuart Ellis's advice). Apparently it seems much more suitable since I need some flexibility here. But my doubt is: metadata for my objects includes a lot of references to other records in the database. So how would I avoid data duplication if I can't use JOIN? Should I use DBRef and the techniques described here? How do they compare to MySQL JOINs used in the structure described above in terms of performance?
I hope these questions make sense. This is my first project of this kind and I just want to avoid making huge mistakes before I launch it and find out I need to rework the design completely.
I'm not a NoSQL person, but I wonder whether this particular case might actually be handled best with a document database (MongoDB or CouchDB). Various types of objects with metadata attached sounds like the kind of scenario that MongoDB is designed for.
FWIW, you've got a couple of issues with your table and field naming that might bite you later. For example, type and date are rather generic, and also reserved words. You've also mixed singular and plural table names, which will throw any automatic object mapping.
Whichever database you use, it's a good idea to find an existing set of database naming conventions and apply it from the start - this will help you avoid subtle issues and ensure that your naming stays consistent. I tend to use the Rails naming conventions ATM, because they are well-known and fairly sensible.
Or you could store the object contents as a file, outside of the database, if you're concerned about the database space.
If you store anything in the database, you already have the object type in objects; so you could just add object_contents table with a long binary field to store the object. You don't need to create a new table for each new type.
I've seen a lot of JOINs in real-world web applications (5 to 10 per query). The objects table may get large, but that's what indices are for. So far, I don't see anything wrong with your database. BTW, one thing felt strange to me: one post, one object, and separate comments for each? No ability to mix pictures with text?
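For a feel of what the "list all posts with their attached objects" query looks like against the structure from the question, here is a small sketch. It runs on Python's built-in sqlite3 rather than MySQL, the sample rows are made up, and I've called the date column created (an assumption) to sidestep the generic-name issue mentioned in another answer.

```python
import sqlite3

# SQLite stands in for MySQL; sample data and the column name "created" are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (
    oid     INTEGER PRIMARY KEY,
    type    TEXT NOT NULL,
    name    TEXT NOT NULL,
    created TEXT NOT NULL
);
CREATE TABLE postmeta (
    oid    INTEGER PRIMARY KEY REFERENCES objects(oid),
    body   TEXT NOT NULL,
    author TEXT NOT NULL
);
CREATE TABLE postobject (
    pid INTEGER NOT NULL REFERENCES objects(oid),
    oid INTEGER NOT NULL REFERENCES objects(oid),
    PRIMARY KEY (pid, oid)
);
-- Lets the post listing filter by type and sort by date without a full scan.
CREATE INDEX idx_objects_type_created ON objects (type, created);
""")
conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", [
    (1, "post", "Hello", "2011-01-02"),
    (2, "video", "Clip", "2011-01-02"),
    (3, "post", "Older", "2011-01-01"),
])
conn.executemany("INSERT INTO postmeta VALUES (?, ?, ?)", [
    (1, "first post text", "ann"),
    (3, "older post text", "bob"),
])
conn.execute("INSERT INTO postobject VALUES (1, 2)")

def list_posts():
    """Posts newest first, each with its attached object (NULLs when none is attached)."""
    return conn.execute(
        """SELECT p.oid, p.name, m.author, o.type, o.name
           FROM objects p
           JOIN postmeta m        ON m.oid = p.oid
           LEFT JOIN postobject po ON po.pid = p.oid
           LEFT JOIN objects o     ON o.oid = po.oid
           WHERE p.type = 'post'
           ORDER BY p.created DESC, p.oid""",
    ).fetchall()

for row in list_posts():
    print(row)
```

The per-type tables (videometa and so on) would still need a second query or one LEFT JOIN each, which is where the join count in the question starts to climb.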