Database structure - is mySQL the right choice? - mysql

We are currently planning the database structure of a quite complex e-commerce web app that has flexibility as its main cornerstone.
Our app features a large amount of data (products) and we have run into a slight headache trying to keep performance high without compromizing normalization rules in the database, or leaving our highly beloved flexibility concept behind when integrating product options (also widely known as product attributes or parameters).
Based on various references and sources available, we have made up lists on pros and cons of all major and well known database patterns to solve this. After comparing these, we have come up with two final alternatives:
EAV (Entity-attribute-value model) :
Pros: Database is used for all sorting.
Cons: All related queries will include a number of joins between multiple tables in order to complete the collection of data.
SLOB (Serialized LOB, also known as Facade?) :
Pros: Very flexible. Keeping the number of necessary joins low compared to a EAV design pattern. Easy to update/add/remove data from each product but hard to keep data integrity without additional tables.
Cons: All sorting will be done by the application instead of the database. Will use lots of performance (memory?) when big datasets is processed by a large number of users.
Our main questions:
Which pattern/structure would you use, or maybe even a different solution?
Is there better databases besides mySQL available nowadays to accomplish what we want?
Thanks a lot!
Reference: How to design a product table for many kinds of product where each product has many parameters

Why limit yourself to one model? It's very possible that you'll be better off with two different models where each one meets a specific goal very well.
Assuming, as is often the case, that the two don't have to be absolutely and instantaneously in sync, you might easily end up with much better overall performance. What kind of hard requirements would you have on synchronization? Milliseconds up to a minute?
Udi Dahan has some good information on command query responsibility separation (CQRS) that's relevant. See also a couple of other articles. InfoQ also has very relevant video of Greg Young from QCon08.
EDIT: Here's another video (by Udi Dahan) that discusses, among other things, the benefits of multiple models.

MySQL performs very well even for very large datasets. I use it at a financial services SaaS company and it has always worked well. I have also use SQL Server and Oracle for very large applications and MySQL performs no better or worse on whole. My focus is more the business layer, though, and you may get more detailed opinions from people closer to the DB.
When selecting a pattern, keep in mind that it's much more straightforward to scale the application tier than the data tier (easy and cheap to add application servers). Performing many joins for common operations can cause a real performance bottleneck.
I would suggest you prototype both approaches so that you can both get more familiar with each of them, and benchmark their performance in your specific environment.
Additionally, you may want to look into alternatives to SQL that attempt to achieve a pattern similar to the ones you outline. A friend at a very large, well-known Internet company is starting to use Project Voldemort. He prefers it over similar efforts mostly due to the very active community.

from your solution, it seems you don't want to use a relational model, so perhaps it's better not to use a relational database, take a look at these alternatives: http://nosql-database.org/ btw SQLServer has nice SLOB features in the form of xml fields (can be indexed an queried through XQuery)

Related

EAV vs Document DB vs XML Column for a system with arbitrary entities?

Despite reading an awful lot of varying opinions and advice online and in SO I still cannot really decide the best solution to solve my current requirements.
In essence I need to make a system where objects can be arbitarily defined with any number of properties. The applicaiton tracks the where abouts and state of these objects and I cannot possibly know at compile time what the full gambit of these objects will be (besides this will be sold to many companies to track what they will).
The data itself WILL have some forms of relation to each other. The biggest of which will be the notion of a location hierarchy; think Country->Province->Town->Postcode->Building->Room->Locker->Object
There will be some other Parent->Child relations in the data too. For example an instance of a car has an instance of an engine, has an instance of a piston.
The history of objects and data will be important. What the state of the object has been at various times and places will be a heavily used feature of this system. Being able to retrieve the full history for reporting will be important too.
The options as I see them:
EAV - Entity Attribute Value (or hybrid of) in SQL
Pros:
It's relational and normalised
Querying is powerful
The relational and hierarchical parts of the data fit this paradigm
History achieved by storing dates against properties
Cons:
Query complexity
90% of the time every attribute for an object will be required giving
a serious number of joins
Most likely there will be pivots everywhere
Others?
Relational with XML catch all columns:
Pros:
Relational goodness, ORMs etc, etc (all of above)
How to store the history?
Cons:
The vast majority of the attributes will be in this XML column
(say>70%)
Slow queries?
Others?
Document DB
Pros:
Open Schema
History is as simple as retrieving older docs
Cons:
I've got a fair amount of relational data!
Query support (I'm not well enough versed to say what the pros and
cons are of each Document DB tech)
Others?
As you can tell the most of my experience has come in the form of relational DBs (SQL). Add to that I have already prototyped a similar solution with and EAV/Relational hybrid in SQL and found it an utter pain when things got even remotely complex.
I'm tech agnostic; I have front to back end experience in lots of techs and not adverse to learning anything new.
What are the thoughts on my situation; the long and short of it is each of the above is a valid way to solve the problem but I'm keen to hear what other people think and have experienced so I can try to avoid any costly blind sides.

Performance Analysis of CouchDB

I am developing a Discussion Forum for my University. For this to manipulate the data i m using CouchDB as database.
I m finding difficulty in designing the structure of my db, in order to maximize the performance of my db.
I want to discuss what is the good practice of designing a document database.
Either we should make only one database as SQL and make 'n' no. of documents in the database.
Or we can make more no of database in order to flatten my db structure.This also reduce the more no. of documents to be developed.
The questions you need to ask are simply this: "How do you want to get data out of your database?"
Database design hinges around the queries to be made, not what is available to be stored.
This is especially important for Document DBs like Couch, since, while it does have a flexible schema, it does not have flexible indexing. By that I mean that because of the granularity of the data, it's quite like that later on, when you need to ask a question that it was not designed to answer, answering that question may well be very expensive. It's much, much cheaper to design your views and other constructs early, when there is little data in the data base rather than later after you have thousands or millions of rows.
RDBMS's, since they tend to have a finer granularity of data, tend to be more nimble to new queries and such later in life. Document DBs, not so much.
So think through your use cases up front, and design around those, and design those early on, it's much less painless now than later.
It's hard to tell the right way to approach modeling your data since you don't give much information. Generally though you want to keep as much data as possible in one database as this allows you to index it together (indexes cannot span more than one database).
Also, since there is no schema enforcement in the database, you can create different types of records in each database. For example, there is nothing wrong with have both user information and forum entries in the same database.
Last, you will most likely want to keep messages and their replies in different records. This is an old but still relevant discussion on this topic: http://www.cmlenz.net/archives/2007/10/couchdb-joins
Cheers.

Is mongo appropriate to use alongside MySQL?

I can't discuss things in great detail due to an NDA, but I'm hoping an overview of the system being built can help you in aiding me in making a decision concerning our databases.
I'm building an app that will help vendors compete to gain clientele by making strategic offers based on records of inventory/purchase from the storefronts.
One side of the app is for the store owners to see presented offers, network, etc. I've got that going with a standard php/MySQL setup.
My question is concerning the records of inventory. We are talking millions of records here nearly immediately. The sample data I'm using is roll up of four of their managers (they have dozens) over the course of a year or two and it had over 500k rows with about 30 or more columns. When we get scores of stores with all of their managers it will be massive, at least compared to anything I've worked with as of yet.
The vendors will have a side of the product in which they can search through these records and make competitive offers based off of it.
Is the sheer size a good reason to use something like mongo? Or is it more a matter of how the data is laid out / what it consists of? Or some other element that I'm not considering?
And, if not mongo/nosql, then is there some other methodology or technology that such large data stores would benefit from me using (sharding, amazon cloud database, etc).
Thanks
Answers ...
Q: Is the sheer size a good reason to use something like mongo?
A: I think so. Mongo was built from the ground up to scale in a massive way. You have replica sets and sharding that can help you scale. They also have features to make sure your data gets stored in the appropriately geographically distributed data centers.
Q: Or is it more a matter of how the data is laid out / what it consists of?
A: Mongo is a document database and you're right, the data models will be different. You have to think of data in a denormalized way instead of normalized. Just like any technology, there are pros and cons to storing things as documents.
Some pros: Schema management is a breeze. Data more naturally fits objects in your application. Don't have to pay the price of complicated/slow joins.
Some cons: Schemas can be inconsistent - you have to manage it. Data is repeated, which is not managed means it can become inconsistent.
In general I think Mongo would be a good choice to deal with that scale. Mongo has a new aggregation framework that brings a lot of SQL concepts to queries on documents. Easier to make complex queries. Also Mongo has map/reduce to run any kind of query you might have.
After using Mongo daily for about a year, I've really enjoyed the support around it as a product and the general ease of setting it up and working with it.

MongoDB, MySQL or something third for my project?

Ok guys.
I've begun developing a little sparetime project that might become big someday. Before I really get started, I want to be certain that I'm starting with the right setup. So I come to you.
I'm making a service, which will work mostly as a todolist/project planner.
In this system there will be an amount of users and an amount of tasks. Each task can be assigned to multiple users, and each user can have multiple tasks (many to many relation).
Until now I was planning to use MySQL, but a friend of mine, who is part of the project, sugested MongoDB instead. He tells me that it would increase performance and be more scaleable.
On the other hand I'm thinking that in order to either get all tasks assigned to a specific user, or all users assigned to a specifik task, one would need to use joins, which MongoDB doesnt have (or have in a cumbersome way as far as I have understood).
Now my question to you is "Which DB system would you suggest. MySQL or MongoDB or a third option? And why?"
Thank you for your time and your assistance.
Morten
We use MySQL at IGN to store person relationships (many-to-many like your use case), and have about 5M records in the relationship table. We have 4 MySQL servers in a cluster and the reads are distributed across 3 MySQL slaves. BTW you can always denormalize to optimize reads and penalizing writes among other things based on the read/write heavyness of your system.
We use the DAO pattern with Spring, so its fairly easy for us to swap DB providers through configuration (and by writing a Mongo/MySQL DAO Implementation as applicable). We have moved activities (like in Social Media) to Mongo almost a year ago but the person relationships are living happily in MySQL.
The comment to your post by Jonas says it all,
If need be, you can always scale later.
This.
I am very much of the mindset that If you don't have scaling problems, don't worry too much (if at all) about scaling problems. Why not use what is easiest, smartest and cleanest to deliver the features clients pay for (in my case at least!) This approach saves a lot of time and energy and is the proper one for 9 projects out of 10.
Learning a technology because it scales is great. Being tied to an unlearned technology and unknown technology because it scales in an upcoming project, is not as great. There are many other factors than scalability, when using 3rd party stuff.
MySQL would seem to be a good choice MySQL being more mature and having loads of client libraries, ORM's and other timesaving technologies. MySQL can handle millions (billions if you have the ram) of rows. I have yet to encounter a project it could not handle, and I have seen some pretty impressive datasets!
Of course, when you will need performance, sure maybe you will find yourself ripping out orm and sql generating code to replace with your own hand tweaked queries, but that day is way down the line and chances are, that day will never even come.
Mongodb, although it is real cool I am sorry to say may well bring you issues having nothing to do with scaling.
My 2 cents, happy coding!
MySQL
Either would likely work for your purposes, but your database seems relatively rigid in its structure, something which SQL deals well with. As such, I would recommend MySQL. A many-to-many relationship is relatively easy to implement and access, as well.
You may take a tiny bit of a performance hit, but in my experience, this is generally not extremely noticeable with smaller scale applications (i.e. databases with less than millions of entries). I do agree with #Jonas Elfström's comment, however: you should have an abstraction layer between your application and the database, so that should scaling become an issue, you can address it without too many problems.
Stick with a relational database, it can handle many to many relationships and is fully featured for backup and recovery, high availability and importantly you will find that every developer you need is familiar with it. There are plenty of documented methods for scaling a relational database.
Pick an open source databases either MySQL or Postgres dependant upon which your team is most familiar with and how it integrates into the rest of your infrastructure stack.
Make sure you design your data model correctly most importantly the relationships between the entities.
Good luck!

do document-oriented databases have integrity?

I'm coming from a MySQL background, and I'm interested in document-oriented databases, specifically CouchDB. One of the things I'm interested in is data integrity. How do document-oriented databases handle this? For instance, in RDBMSes, there are ways to prevent duplication of records, or guaranteeing that if you have one bit of information, you will have another, or else none at all.
I guess more broadly, my question is, what types of problems are RDBMSes cut out for, compared to problems that DODBes are used for? I looked on some of the other stackoverflow questions for an explanation, but didn't find any good ones.
Also, with my databases at work, I do a lot of reporting, with summing and averaging values, and historical trending. Is this something appropriate for document-oriented databases?
Most of the document-databases have only support very limited integrity or no integrity checks. They rely on the application to ensure that the data is correct. I can tell you how it is in CouchDB.
To the second part. I think RDBMS do very well at reporting and analyzing data. The fact that you can run complex queries on the data with joins, aggregations, functions etc make RDBMS a very powerful reporting-tool.
Document-databases do really well for storing the 'live' application-data. It very easy to store an retrieve object-graph into document-databases. The schema-free design makes it easy to extends the model for new application features. However this only works if you can split your application-data into nice documents. Otherwise you loose a lot of the elegance.
If you want to do mostly reporting, I would prefer a RDBMS. When to store lots of flat, simple records it very easy to do reporting on it. The tooling etc. is perfect for reporting. However when want to do reporting on complex structured data, you probably still better of with another database desgine than a RDBMS.
However this doesn't mean you need to limit yourself to RDBMS. You could combine the two technologies. Imagine a blog-software. You store the 'live' application data like blog-posts and comments into the documentdatabase. Data for reporting like click- and login-statistics is stored in a RDBMS. See also Rob Conerys post.