MySQL Relational Database with Large Data Sets Unique to Each User - mysql

I am working on a project which involves building a social network-style application allowing users to share inventory/product information within their network (for sourcing).
I am a decent programmer, but I am admittedly not an expert with databases; even more so when it comes to database design. Currently, user/company information is stored via a relational database scheme in MySQL which is working perfectly.
My problem is that while my relational scheme works brilliantly for user/company information, it is confusing me on how to implement inventory information. The issue is that each "inventory list" will definitely contain differing attributes specific to the product type, but identical to the attributes of each other product in the list. My first thought was to create a table for each "inventory list". However, I feel like this would be very messy and would complicate future attempts at KDD. I also (briefly) considered using a 'master inventory' and storing the information (e.g. the variable categories and data as a JSON string. But I figured JSON strings MySQL would just become a larger pain in the ass.
My question is essentially how would someone else solve this problem? Or, more generally, sticking with principles of relational database management, what is the "correct" way to associate unique, large data sets of similar type with a parent user? The thing is, I know I could easily jerry-build something that would work, but I am genuinely interested in what the consensus is on how to solve this problem.
Thanks!

I would check out this post: Entity Attribute Value Database vs. strict Relational Model Ecommerce
The way I've always seen this done is to make a base table for inventory that stores universally common fields. A product id, a product name, etc.
Then you have another table that has dynamic attributes. A very popular example of this is Wordpress. If you look at their data model, they use this idea heavily.
One of the good things about this approach is that it's flexible. One of the major negatives is that it's slow and can produce complex code.
I'll throw out an alternative of using a document database. In that case, each document can have a different schema/structure and you can still run queries against them.

Related

Best practices for transforming a relational database to a non relational?

I have a MySQL Database and I need to create a Mongo Database (I don't care about keeping any data).
So are there any good practices for designing the structure (mongoose.Schema) based on the relational tables of MySQL ?
For example, the SQL has a table users and a table courses with relation 1:n, should I also create two collections in MongoDB or would it be better to create a new field courses: [] inside user document and create only the user collection ?
The schema definition should be driven by the use cases of the application.
Under which conditions is data accessed and modified. Which is the leading entity.
e.g. When a user is loaded do you always also want to know the courses of the user? This would be an argument for embedding.
Can you update courses without knowing all of its users, e.g. update the name of a course? Do you want to list an overview of all courses? This would be an argument for extracting into an own collection.
So there is no general guideline for such migration as only from the schema definition, the use cases cannot be derived.
If you don't care about data, the best approach is to redesign it from scratch.
NoSQLs differ from RDBMS in many ways so direct mapping will hardly be efficient and in many cases not possible at all.
First thing you need to answer to yourself (and probably to mention in the question) is why you need to change database in the first place. There are different kind of problems that Mongo can solve better than SQL and they require different data models. None of them come for free so you will need to understand the tradeoffs.
You can start from the very simple rule: in SQL you model your data after your business objects and describe relations between them, in Mongo you model data after queries that you need to respond to. As soon as you grasp the idea it will let you ask answerable questions.
It may worth reading https://www.mongodb.com/blog/post/building-with-patterns-a-summary as a starting point.
An old yet still quite useful https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1 Just keep in mind it was written long time ago when mongo did not have many of v4+ features. Nevertheless it describes philosophy of mongo data modelling with simple examples.It didn't change much since then.

DB Structure - Storage of relationships

This is a complex problem, so I'm going to try to simplify it.
I have a mysql instance on my server hosting a number of schemas for different purposes. The schemas are structured generally (not perfectly) in a EAV fashion. I need to transition information into and out of that structure on a regular basis.
Example1: in order to present the information on a webpage, I get the information, stick it into a complex object, which I then pass via json to the webpage, where I convert the json into a complex javascript object, which I then present with knockoutjs and similar things.
Conclusion: This resulted in a lot of logic being put into multiple places so that I could associate the values on the page with the values in the database.
Example2: in order to allow users to import information from a pdf, I have a lot of information stored in pdf form fields. In this case, I didn't write the pdf though, so the form fields aren't named in such a way that all of this logic is easy enough to write 3 or more times for CRUD.
Conclusion: This resulted in my copying a list of the pdf form fields to a table in the database, so that I could then somehow associate them with where their data should be placed. The problem that arose is that the fields on the pdf would need to associate with schema.table.column and the only way I found to store that information was via a VARCHAR
Neither of the examples are referring to a small amount of data (something like 6 tables in example 1 and somewhere around 1400 pdf form fields in example 2). Given Example1 and the resulting logic being stored multiple places, it seemed logical to build Example2, where I could store the relationships between data in the database where they could be accessed and changed consistently and for all involved methods.
Now, it's quite possible I'm just being stupid and all of my googling hasn't come across that there's an easy way to associate this data with the correct schema.table.column If this is the case, then telling me the right way to do that is the simple answer here.
However, and this is where I get confused. I have always been told that you never want to store information about a database in the database, especially not as strings (varchar). This seems wrong on so many levels and I just can't figure out if I'm being stupid, and it's better to follow Example1 or if there's some trick here that I've missed about database structure.
Not sure where you got "... never ... store information about a database in the database". With an EAV model it is normal to store the metamodel (the entity types and their allowable attributes) in the database itself so that it is self-describing. If you had to change the metamodel, would you rather change code or a few rows in a table?
The main drawback to EAV databases is that you lose the ability to do simple joins. Join-type operations become much more complex. Like everything else in life, you make tradeoffs depending on your requirements. I have seen self-describing EAV architectures used very successfully.

What is the difference between a Relational and Non-Relational Database?

MySQL, PostgreSQL and MS SQL Server are relational database systems, and NoSQL, MongoDB, etc. are non-relational DBMSs.
What are the differences between the two types of system?
Hmm, not quite sure what your question is.
In the title you ask about Databases (DB), whereas in the body of your text you ask about Database Management Systems (DBMS). The two are completely different and require different answers.
A DBMS is a tool that allows you to access a DB.
Other than the data itself, a DB is the concept of how that data is structured.
So just like you can program with Oriented Object methodology with a non-OO powered compiler, or vice-versa, so can you set-up a relational database without an RDBMS or use an RDBMS to store non-relational data.
I'll focus on what Relational Database (RDB) means and leave the discussion about what systems do to others.
A relational database (the concept) is a data structure that allows you to link information from different 'tables', or different types of data buckets. A data bucket must contain what is called a key or index (that allows to uniquely identify any atomic chunk of data within the bucket). Other data buckets may refer to that key so as to create a link between their data atoms and the atom pointed to by the key.
A non-relational database just stores data without explicit and structured mechanisms to link data from different buckets to one another.
As to implementing such a scheme, if you have a paper file with an index and in a different paper file you refer to the index to get at the relevant information, then you have implemented a relational database, albeit quite a simple one. So you see that you do not even need a computer (of course it can become tedious very quickly without one to help), similarly you do not need an RDBMS, though arguably an RDBMS is the right tool for the job. That said there are variations as to what the different tools out there can do so choosing the right tool for the job may not be all that straightforward.
I hope this is layman terms enough and is helpful to your understanding.
Relational databases have a mathematical basis (set theory, relational theory), which are distilled into SQL == Structured Query Language.
NoSQL's many forms (e.g. document-based, graph-based, object-based, key-value store, etc.) may or may not be based on a single underpinning mathematical theory. As S. Lott has correctly pointed out, hierarchical data stores do indeed have a mathematical basis. The same might be said for graph databases.
I'm not aware of a universal query language for NoSQL databases.
Most of what you "know" is wrong.
First of all, as a few of the relational gurus routinely (and sometimes stridently) point out, SQL doesn't really fit nearly as closely with relational theory as many people think. Second, most of the differences in "NoSQL" stuff has relatively little to do with whether it's relational or not. Finally, it's pretty difficult to say how "NoSQL" differs from SQL because both represent a pretty wide range of possibilities.
The one major difference that you can count on is that almost anything that supports SQL supports things like triggers in the database itself -- i.e. you can design rules into the database proper that are intended to ensure that the data is always internally consistent. For example, you can set things up so your database asserts that a person must have an address. If you do so, anytime you add a person, it will basically force you to associate that person with some address. You might add a new address or you might associate them with some existing address, but one way or another, the person must have an address. Likewise, if you delete an address, it'll force you to either remove all the people currently at that address, or associate each with some other address. You can do the same for other relationships, such as saying every person must have a mother, every office must have a phone number, etc.
Note that these sorts of things are also guaranteed to happen atomically, so if somebody else looks at the database as you're adding the person, they'll either not see the person at all, or else they'll see the person with the address (or the mother, etc.)
Most of the NoSQL databases do not attempt to provide this kind of enforcement in the database proper. It's up to you, in the code that uses the database, to enforce any relationships necessary for your data. In most cases, it's also possible to see data that's only partially correct, so even if you have a family tree where every person is supposed to be associated with parents, there can be times that whatever constraints you've imposed won't really be enforced. Some will let you do that at will. Others guarantee that it only happens temporarily, though exactly how long it can/will last can be open to question.
The relational database uses a formal system of predicates to address data. The underlying physical implementation is of no substance and can vary to optimize for certain operations, but it must always assume the relational model. In layman's terms, that's just saying I know exactly how many values (attributes) each row (tuple) in my table (relation) has and now I want to exploit the fact accordingly, thoroughly and to it's extreme. That's the true nature of the beast.
Since we're obviously the generation that has had a relational upbringing, if you look at NoSQL database models from the perspective of the relational model, again in layman's terms, the first obvious difference is that no assumptions about the number of values a row can contain is ever made. This is really oversimplifying the matter and does not cleanly apply to the intricacies of the physical models of every NoSQL database, but it's the pinnacle of the relational model and the first assumption we have to leave behind or, if you'd rather, the biggest leap we have to make.
We can agree to two things that are true for every DBMS: it can store any kind of data and has enough mathematical underpinnings to make it possible to manage the data in any way imaginable. The reality is that you'll never want to make the mistake of putting any of the two points to the test, but rather just stick with what the actual DBMS was really made for. In layman's terms: respect the beast within!
(Please note that I've avoided comparing the (obviously) well founded standards revolving around the relational model against the many flavors provided by NoSQL databases. If you'd like, consider NoSQL databases as an umbrella term for any DBMS that does not completely assume the relational model, in exclusion to everything else. The differences are too many, but that's the principal difference and the one I think would be of most use to you to understand the two.)
Try to explain this question in a level referring to a little bit technology
Take MongoDB and Traditional SQL for comparison, imagine the scenario of posting a Tweet on Twitter. This tweet contains 9 pictures. How do you store this tweet and its corresponding pictures?
In terms of traditional relationship SQL, you can store the tweets and pictures in separate tables, and represent the connection through building a new table.
What's more, you can set a field which is an image type, and zip the 9 pictures into a binary document and store it in this field.
Using MongoDB, you could build a document like this (similar to the concept of a table in relational SQL):
{
"id":"XXX",
"user":"XXX",
"date":"xxxx-xx-xx",
"content":{
"text":"XXXX",
"picture":["p1.png","p2.png","p3.png"]
}
Therefore, in my opinion, the main difference is about how do you store the data and the storage level of the relationships between them.
In this example, the data is the tweet and the pictures. The different mechanism about storage level of relationship between them also play a important role in the difference between both.
I hope this small example helps show the difference between SQL and NoSQL (ACID and BASE).
Here's a link of picture about the goals of NoSQL from the Internet:
http://icamchuwordpress-wordpress.stor.sinaapp.com/uploads/2015/01/dbc795f6f262e9d01fa0ab9b323b2dd1_b.png
The difference between relational and non-relational is exactly that. The relational database architecture provides with constraints objects such as primary keys, foreign keys, etc that allows one to tie two or more tables in a relation. This is good so that we normalize our tables which is to say split information about what the database represents into many different tables, once can keep the integrity of the data.
For example, say you have a series of table that houses information about an employee. You could not delete a record from a table without deleting all the records that pertain to such record from the other tables. In this way you implement data integrity. The non-relational database doesn't provide this constraints constructs that will allow you to implement data integrity.
Unless you don't implement this constraint in the front end application that is utilized to populate the databases' tables, you are implementing a mess that can be compared with the wild west.
First up let me start by saying why we need a database.
We need a database to help organise information in such a manner that we can retrieve that data stored in a efficient manner.
Examples of relational database management systems(SQL):
1)Oracle Database
2)SQLite
3)PostgreSQL
4)MySQL
5)Microsoft SQL Server
6)IBM DB2
Examples of non relational database management systems(NoSQL)
1)MongoDB
2)Cassandra
3)Redis
4)Couchbase
5)HBase
6)DocumentDB
7)Neo4j
Relational databases have normalized data, as in information is stored in tables in forms of rows and columns, and normally when data is in normalized form, it helps to reduce data redundancy, and the data in tables are normally related to each other, so when we want to retrieve the data, we can query the data by using join statements and retrieve data as per our need.This is suited when we want to have more writes, less reads, and not much data involved, also its really easy relatively to update data in tables than in non relational databases. Horizontal scaling not possible, vertical scaling possible to some extent.CAP(Consistency, Availability, Partition Tolerant), and ACID (Atomicity, Consistency, Isolation, Duration)compliance.
Let me show entering data to a relational database using PostgreSQL as an example.
First create a product table as follows:
CREATE TABLE products (
product_no integer,
name text,
price numeric
);
then insert the data
INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', 9.99);
Let's look at another different example:
Here in a relational database, we can link the student table and subject table using relationships, via foreign key, subject ID, but in a non relational database no need to have two documents, as no relationships, so we store all the subject details and student details in one document say student document, then data is getting duplicated, which makes updating records troublesome.
In non relational databases, there is no fixed schema, data is not normalized. no relationships between data is created, all data mostly put in one document. Well suited when handling lots of data, and can transfer lots of data at once, best where high amounts of reads and less writes, and less updates, bit difficult to query data, as no fixed schema. Horizontal and vertical scaling is possible.CAP (Consistency, Availability, Partition Tolerant)and BASE (Basically Available, soft state, Eventually consistent)compliance.
Let me show an example to enter data to a non relational database using Mongodb
db.users.insertOne({name: ‘Mary’, age: 28 , occupation: ‘writer’ })
db.users.insertOne({name: ‘Ben’ , age: 21})
Hence you can understand that to the database called db, and there is a collections called users, and document called insertOne to which we add data, and there is no fixed schema as our first record has 3 attributes, and second attribute has 2 attributes only, this is no problem in non relational databases, but this cannot be done in relational databases, as relational databases have a fixed schema.
Let's look at another different example
({Studname: ‘Ash’, Subname: ‘Mathematics’, LecturerName: ‘Mr. Oak’})
Hence we can see in non relational database we can enter both student details and subject details into one document, as no relationships defined in non relational databases, but here this way can lead to data duplication, and hence errors in updating can occur therefore.
Hope this explains everything
In layman terms it's strongly structured vs unstructured, which implies that you have different degrees of adaptability for your DB.
Differences arise in indexation particularly as you need to ensure that a certain reference index can link to a another item -> this a relation. The more strict structure of relational DB comes from this requirement.
To note that NosDB apaprently provides both relational and non relational DBs and a way to query both http://www.alachisoft.com/nosdb/sql-cheat-sheet.html

Implementing a database structure for generic objects

I'm building a PHP/MySQL website and I'm currently working on my database design. I do have some database and MySQL experience, but I've never structured a database from scratch for a real world application which hopefully is going to get some good traffic, so I'd love to hear advices from people who've already done it, in order to avoid common mistakes. I hope my explanations are not too confusing.
What I need
In my application, the user should be able to write a post (title + text), then create an "object" (which can be anything, like a video, or a song, etc.) and attach it to the post. The site has a list of predefined object types the user can create, and I should be able to add new types in the future. The user should also have the ability to see the object's details in a dedicated page and add a comment to it - the same applies to posts.
What I tried
I created an objects table with these fields: oid, type, name and date. This table contains records for anything the user should be able to add comments to (i.e. posts and objects). Then I created a postmeta table which contains additional post data (such as text, author, last edit date, etc.), a videometa table for data about the "video" object (URL, description, etc.), and so on. A postobject table (pid,oid) links objects to posts. Additionally, there's a comments table which contains the comment text, the author and the ID of the object it refers to.
Since the list of object types is predefined and is probably not going to change (though I still need the ability to add a type easily at any time without changing the app's code structure or the database design), and it is relatively small, it's not a problem to create a "meta" table for each type and make a corresponding PHP class in my application to handle it.
Finally, a page on the site needs to show a list of all the posts including the objects attached to it, sorted by date. So I get all the records from the objects table with type "post" and join it with postmeta to get the post metadata. Then I query postobject to get all the objects attached to this post, and comments to get all the comments.
The questions
Does this make any sense? Is it any good to design a database in this way for a real world site? I need to join quite a few tables to get all the data I need, and the objects table is going to become huge since it contains almost every item (only the type, name and creation date, though) - this is to keep the database and the app code flexible, but does it work in the real world, or is it too expensive in the long term? Am I thinking about it in the wrong way with this kind of OOP approach?
More specifically: suppose I need to list all the posts, including their attached objects and metadata. I would need to join these tables, at least: posts, postmeta, postobject and {$objecttype}meta (not to mention an users table to get all posts by a specific user, for example). Would I get poor performance doing this, even if I'm using only numeric indexes?
Also, I considered using a NoSQL database (MongoDB) for this project (thanks to Stuart Ellis advice). Apparently it seems much more suitable since I need some flexibility here. But my doubt is: metadata for my objects includes a lot of references to other records in the database. So how would I avoid data duplication if I can't use JOIN? Should I use DBRef and the techniques described here? How do they compare to MySQL JOINs used in the structure described above in terms of performance?
I hope these questions do make any sense. This is my first project of this kind and I just want to avoid to make huge mistakes before I launch it and find out I need to rework the design completely.
I'm not a NoSQL person, but I wonder whether this particular case might actually be handled best with a document database (MongoDB or CouchDB). Various type of objects with metadata attached sounds like the kind of scenario that MongoDB is designed for.
FWIW, you've got a couple of issues with your table and field naming that might bite you later. For example, type and date are rather generic, and also reserved words. You've also mixed singular and plural table names, which will throw any automatic object mapping.
Whichever database you use, it's a good idea to find an existing set of database naming conventions and apply it from the start - this will help you avoid subtle issues and ensure that your naming stays consistent. I tend to use the Rails naming conventions ATM, because they are well-known and fairly sensible.
Or you could store the object contents as a file, outside of the database, if you're concerned about the database space.
If you store anything in the database, you already have the object type in objects; so you could just add object_contents table with a long binary field to store the object. You don't need to create a new table for each new type.
I've seen a lot of JOIN's in real world web application (5 to 10). Objects table may get large, but that's indices are for. So far, I don't see anything wrong in your database. BTW, what felt strange to me - one post, one object, and separate comments for each? No ability to mix pictures with text?

Database schema - organise by object or data?

I'm refactoring a horribly interwoven db schema, it's not that it's overly normalised; just grown ugly over time and not terribly well laid out.
There are several tables (forum boards, forum posts, idea posts, blog entries) that share virtually identical data structures and composition, but are seperated simply because they represent different "objects" from the applications perspective. My initial reaction is to put everything that has the same data structure into the same table, and use a "type" column to distinguish data when performing a select.
Am I setting myself up for a fall by adopting this "all into one" approach and allowing (potentially) so many parts of the application to access the same table? FYI, I can't see this database growing to more than ~20mb over the next year or so...
There's basically three ways to store an object inheritance hierarchy in a relational database. Each has their own pros and cons. See:
http://www.martinfowler.com/eaaCatalog/singleTableInheritance.html
http://www.martinfowler.com/eaaCatalog/classTableInheritance.html
http://www.martinfowler.com/eaaCatalog/concreteTableInheritance.html
The book is great too. Luck would have it that chapter 3 - "Mapping to Relational Databases" - is available freely as a sample chapter. You can read more about the tradeoffs in there.
I used to dislike this "all into one" approach, but after I was forced to use it on a complex project a few years ago, I became a fan. If you index the table correctly, performance should be OK. You'll want an index on the type column to speed up your sort by type operations, for instance.
I now usually recommend that you use a single table to store similar objects. The only question then, is, do you want to use subtables to store data that's specific to a certain type of object? The answer to this question really depends on how different the structure of each object type is, and how many object types you'll have. If you have 50 object types with vastly differing structures, you may want to consider storing just the consistent object parts in the main table and creating a sub table for each object type.
In your example, however, I think you'd be fine just putting it all into a single table.
For more info, see here: http://www.agiledata.org/essays/mappingObjects.html
Don't lean too much on the "applications perspective", it tends to vary over time anway. Often databases are accessed by different applications too, and it usually outlives them all ...
When simliar objects are stored in different tables the reason may be that they actually represent the same domain object, but in a different state, or in a different step in a workflow. Then it often makes sense to keep them in one table and add some simple attributes to flag the state. If the workflow, or whatever it is changes, it's easier to change the database and application too, you may not need to add more tables or classes.