Database design - empty fields [closed] - mysql

I am currently debating an issue with my dev team. They believe that empty fields are bad news. For instance, if we have a customer details table that stores data for customers from different countries, each country has a slightly different address configuration plus 1-2 extra fields, e.g. French customer details may also store an entry code, floor/level, and title fields (madame, etc.), while South Africa would have a security number. And so on.
Given that we're talking about minor variances my idea is to put all of the fields into the table and use what is needed on each form.
My colleague believes we should have a separate table with the extra data, e.g. customer_info_fr. But this seems to totally defeat the purpose of a combined table in the first place.
The argument is that empty fields / columns are bad - but I'm struggling to find justification in terms of database design principles for or against this argument, or a preferred solution.
Another option is a separate mini EAV table that stores extra data with parent_id, key, val fields. Or to serialise extra data into an extra_data column in the main customer_data table.
I think I am confused because what I'm discussing is not covered by 3NF which is what I would typically use as a reference for how to structure data.
So, my question specifically:
If you have slight variances in data for each record (1-2 different fields for instance) what is the best way to proceed?

There is definitely a school of thought which holds that NULL fields are bad, in and of themselves. Relational theory demands that databases consist of facts, and NULLs are the absence of fact. So, a rigorously designed database would have no nullable columns.
Your colleague is proposing something which is on the road to 6th Normal Form, where all the tables consist of a primary key and at most one other column. Except that in such a schema we wouldn't have tables called customer_info_fr - that's not normalised. Many countries might include ENTRY_CODE in their addresses, so we would need address_entry_codes and address_floor_numbers. Not to mention address_building_number and address_building_name, as some places are identified by number and others by name.
It's completely accurate and truthful as a logical design. Alas, from a physical perspective it is Teh Suck! The simplest query - select * from addresses - becomes a multi-table join, and outer joins at that. Nullable columns are a way of reconciling ugly design with the hard truth that "you cannae break the laws of physics". Nullable columns allow us to combine disjoint data sets into a single table, albeit at the cost of handling nulls (they can affect data retrieval, index usage, maths, etc.).
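To make the physical cost concrete, here is a rough sketch (the table and column names are invented for illustration, not taken from any real schema) of what fetching one address looks like under each approach:

-- Fully decomposed design: every optional attribute costs an outer join.
SELECT a.address_id,
       a.street,
       ec.entry_code,
       fn.floor_number
FROM   addresses a
LEFT JOIN address_entry_codes   ec ON ec.address_id = a.address_id
LEFT JOIN address_floor_numbers fn ON fn.address_id = a.address_id;

-- Single table with nullable columns: one read, no joins,
-- at the cost of entry_code / floor_number being NULL on most rows.
SELECT address_id, street, entry_code, floor_number
FROM   addresses;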
Some designs attempt to get around the use of nulls by applying magic values. That is, if we don't know the correct value for some column we inject a default value which is a value but also means "unknown". A common instance of this is date '9999-12-31' as an open-ended TO_DATE in a FROM-TO date range. As long as everybody understands and adheres to the convention it's not a problem. It becomes a problem when some tables have date '9999-12-01' or date '9999-01-31' instead.
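For example (a sketch assuming a FROM_DATE/TO_DATE pair on a hypothetical rates table), the convention keeps "currently valid" queries simple, but only while every row uses exactly the same magic date:

-- Open-ended rows store TO_DATE = '9999-12-31' instead of NULL.
SELECT * FROM rates
WHERE  CURRENT_DATE BETWEEN from_date AND to_date;

-- The NULL alternative has to test for the open end explicitly.
SELECT * FROM rates
WHERE  from_date <= CURRENT_DATE
AND    (to_date >= CURRENT_DATE OR to_date IS NULL);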
This is why magic values are not a robust solution. Consumers of our data need to know that -1 is the value we use for DofQ in our stock control system when we don't know the real value. But at least it's obviously not a valid value. Choosing say 20 as a magic value is deadly because it could be a real DofQ: we can no longer tell the actual values from the "don't knows".
So, given a choice between nulls and magic values, choose nulls.

I'd be interested in your colleague's justification as to why empty fields are bad. As far as I'm aware, empty or null fields aren't bad in and of themselves. If you have a lot of empty data values for a column that you are planning on putting an important index on, you may want to consider other options. This actually goes for any column where you have a lot of duplicate values and need an index, as duplicates lower the cardinality of the column, making the index less useful. In your case, I don't see it being an issue.
For this kind of data, you're likely using a VARCHAR or some kind of TEXT column anyway, which are variable length fields in the database. It doesn't matter if your field is full of data or empty, you're still going to incur the overhead of a variable-length column (which isn't worth worrying about in normal circumstances). So again, there's no difference to the RDBMS.
From the sounds of what you're designing, I think if you came up with a generic method of handling address variances in a single table, it would be the way to go. Your code and structure would be much simpler at the negligible (in my opinion) cost of some empty data fields.

That's what nullable fields are for: "Data not available/applicable".
SQL has a different notion of null than most programming languages, so SQL's null is often a misunderstood concept.
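A quick illustration of the difference (against a hypothetical customers table): a comparison with NULL is neither true nor false, so rows with a NULL entry_code quietly fall out of both branches:

-- Neither of these returns the rows where entry_code is NULL:
SELECT * FROM customers WHERE entry_code = 'A13';
SELECT * FROM customers WHERE entry_code <> 'A13';

-- NULLs have to be asked about explicitly:
SELECT * FROM customers WHERE entry_code IS NULL;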

Whatever you do, do not go down the EAV route. This is a prescription for a poorly performing database, far, far worse than a few empty fields.
If you must have separate related tables for the different situations, a lot will depend on how different the entities are and how they will be queried. If you will be querying across categories, you will find that joining to a bunch of tables to get all the data you may or may not need is a nightmare (I don't know if Germany will be in my result set, so I join to the Germany details table - oops, didn't need to). It can be far simpler to handle nulls than to try to figure out which of many tables you need to join to (and to always remember to left join to those tables).
However, if you will never be querying across the entities and the fields make sense separately, then put them in a separate table.

Nulls invariably add complexity to a data model because the behaviour of null in SQL rarely matches the maths, logic or reality that you intended to model with it. In other words, some queries return incorrect results, which you then need to compensate for with additional logic.
All information can be represented accurately without nulls. Since nulls add complexity it is sound design practice to begin your data model without them and then only add a null where you find some special reason to do so or where some database feature or limitation forces a null upon you.

I wouldn't overthink it. NULLs can be used, but developers need to be careful using them.
I would prefer to have the Address be a long Text field in the database for any website that deals with multiple countries.
Most websites have Address Line 1, Address Line 2, Postal/ZIP Code, City, State/Region, Country... anything more than that (like EAV) would be overkill.
I wouldn't mind having the user interface show different labels near the text boxes for each country.
Entry code, floor/level, title fields, security number, and so on should fit in the address lines; the label near them, or a tip in the UI, can indicate this.


What is more efficient: a table with 100 columns and fewer rows, or 5 columns and 30 times more rows?

Edit 1:
Because a few good people have pointed out that my question isn't very clear, I thought I would rewrite it and make it clearer now.
So basically, I am making an app which allows users to create their own forms with their own set of input fields, with data like name, type, etc. After a user creates and publishes a form, whenever there is an entry in the form the data gets saved into the DB, of course. Because the form itself is dynamic, I need a way to save this data.
My first choice was JSONizing it and saving that. But because I cannot do any SQL queries on the data if I save it in JSON format, I am eliminating this option.
Then the simple method is storing it in a table like (id, rowid, columnname, value), where I keep the rowid the same for all of one submission's data. But this way, if a form contains 30 fields, after 100 entries my DB would have 3000 rows. So in the long run it would get huge, and I think queries will get slow when there are millions of rows in the table.
Then I got the idea of a table like (id, rowid, column1, column2...column100), where I save all the inputs in the form into a single row. This way it adds only 1 row per submit, and it's easier to query too. I will store the actual column names and map them to the right column (number) from there. This is my idea; column100 because 100 is the maximum number of inputs the user can add to his form.
So my question is whether my idea is good, or whether I should stick to the classic table.
If I've understood your question, you need to design a database structure to store data whose schema you don't know in advance.
This is hard - there's no "efficient" solution in relational databases that I'm aware of.
Option 1 would be to look at a non-relational (NoSQL) solution instead.
I won't elaborate the benefits and drawbacks, as they are highly dependent on which NoSQL option you choose.
It's worth noting that many relational engines (including MySQL) allow you to store and query structured data formats like JSON. I've not used this feature in MySQL myself, but similar functionality in SQL Server performs very well.
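As a rough sketch of that idea (MySQL 5.7+ JSON column; the table name, key names and values are placeholders, not part of your design):

CREATE TABLE form_entries (
  id      INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  form_id INT UNSIGNED NOT NULL,
  entry   JSON NOT NULL
);

-- Querying inside the stored document:
SELECT id
FROM   form_entries
WHERE  JSON_UNQUOTE(JSON_EXTRACT(entry, '$.name')) LIKE 'K%'
AND    CAST(JSON_UNQUOTE(JSON_EXTRACT(entry, '$.power')) AS UNSIGNED) >= 22;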
Within relational databases, the common solution is an "Entity/Attribute/Value" (EAV) schema. This is sorta like your option 2.
EAV designs can theoretically store an unlimited number of columns, and an unlimited number of rows - but common queries quickly become impossible. In your sample data, finding all records where the name begins with a K and the power is at least 22 turns into a very complex SQL query. It also means the application needs to enforce rules of uniqueness, mandatory/optional data attributes, and data transformation from one format to another.
From a performance point of view, this doesn't really scale to complex queries. This is because every clause in your "where" needs a self join, and indexes won't have a big impact on searches for non-text data (searching for the number "greater than 20" is not the same as searching for the text "greater than 20").
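To see why, here is roughly what the "name starts with K and power is at least 22" search looks like against an EAV table (the table and column names are illustrative):

SELECT e_name.entity_id
FROM   eav e_name
JOIN   eav e_power ON e_power.entity_id = e_name.entity_id
WHERE  e_name.attribute  = 'name'  AND e_name.value LIKE 'K%'
AND    e_power.attribute = 'power' AND CAST(e_power.value AS UNSIGNED) >= 22;

Every extra condition means another self join, and every value is stored as text, so the numeric comparison needs a cast and cannot use an ordinary index.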
Option 3 is, indeed, to make the schema logic fit into a limited number of columns (your option 1).
It means you have a limitation on the number of columns, and you still have to manage mandatory/optional, uniqueness etc. in the application. However, querying the data should be easier - finding accounts where the name starts with K and the power is at least 22 is a fairly straightforward exercise.
You do have a lot of unused columns, but that doesn't really impact performance much - disk space is so cheap that all the wasted space is probably less space than you carry around in your smart phone.
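For comparison, the same search against the wide table is a single statement against one table (a sketch, assuming your stored mapping says the name lives in column1 and the power in column2):

SELECT id, rowid
FROM   form_entries_wide
WHERE  column1 LIKE 'K%'
AND    CAST(column2 AS UNSIGNED) >= 22;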
If I understand your requirement, what I would do is create a many-to-many relationship, something like this (a DDL sketch follows the listing):
(tbl1) form:
- id
- field1
- field2
(tbl2) user_added_fields:
- id
- field_name
(tbl3) form_table_user_added_fields:
- form_id (fk)
- user_added_fields_id (fk)
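A rough DDL sketch of that layout (column types and lengths are assumptions for illustration):

CREATE TABLE form (
  id     INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  field1 VARCHAR(255),
  field2 VARCHAR(255)
);

CREATE TABLE user_added_fields (
  id         INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  field_name VARCHAR(255) NOT NULL
);

CREATE TABLE form_table_user_added_fields (
  form_id              INT UNSIGNED NOT NULL,
  user_added_fields_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (form_id, user_added_fields_id),
  FOREIGN KEY (form_id)              REFERENCES form (id),
  FOREIGN KEY (user_added_fields_id) REFERENCES user_added_fields (id)
);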
This may not completely solve your requirements, but I hope it gives you a hint. Happy coding! :)

"Merging" Multiple Database Tables [closed]

I've read through multiple questions here on SO regarding merging multiple databases into one; however, they primarily all deal with uniform schemas/tables. My apologies if I'm repeating a question.
I have an assortment of database tables that are all similar, but not identical. For example, imagine ten databases with ten "User" tables. All contain a userid (we'll use this for reference). Most contain username and email columns. Some will contain other columns, such as skype, msn, phone, etc., that exist in only a few of the other tables, or in no other tables.
I want to merge this content into one database, with the prerequisite that, moving forward, additional databases that also contain unique columns will also need to be merged into the new database.
I've been looking at EAV tables, and was considering something along the lines of (continuing with the example above) a master user table that has a newly-assigned user id (id), an originating database reference of some type (database_id), and the originating user id (native_user_id). I'd then have a separate properties table with a primary key (id), an entity key (user_id), an attribute column (attribute), and a value column (value).
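In DDL terms, the design described above would look roughly like this (column types are assumptions):

CREATE TABLE users (
  id             INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  database_id    INT UNSIGNED NOT NULL,  -- originating database
  native_user_id INT UNSIGNED NOT NULL   -- user id in that database
);

CREATE TABLE user_properties (
  id        INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  user_id   INT UNSIGNED NOT NULL,
  attribute VARCHAR(64) NOT NULL,        -- e.g. 'skype', 'msn', 'phone'
  value     TEXT,
  FOREIGN KEY (user_id) REFERENCES users (id)
);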
The issue at hand is that almost everything I've read recommends against EAV tables while implying there are better ways to go about this. However, I've not actually found any material that covers what this method would be.
So, my questions:
Are EAV Tables really that bad?
What major practical downfalls should I plan ahead for if I go the EAV table route (any examples from personal experience would be swell)?
What alternatives exist for handling this type of scenario besides EAV tables (while accommodating future attributes without tedious ALTER TABLE commands)?
I used EAV in a project to address requirements similar to yours: lack of a universal data model in the messy real world.
In my case, EAV allowed incremental change as the company grew by acquisition, which in turn caused continual expansion, refinement, or generalization of the data model. The project ultimately failed because management withdrew support for it.
I learned that EAV presents itself to management and users as needlessly complex unless you do the work to create concise views to hide the complexity while preserving the completeness of the data. I also learned that EAV imposes a demand to fill in the "missing answers" in a meaningful way. It isn't enough to say that every answer to a question that wasn't asked in database X is "NULL". Sometimes that is not the right answer. "NULL" becomes a synonym for "I don't know; the attribute didn't exist in this database so no-one ever decided what the value should be".
This is a fairly broad question, eh?
If you have your tables already in SQL I suggest you try experimenting with this sort of UNION ALL query.
SELECT 'one' AS dbid,
id AS id,
first AS first_name,
last AS last_name
FROM first_table
UNION ALL
SELECT 'two' AS dbid,
member_id AS id,
fname AS first_name,
lname AS last_name
FROM members
Etcetera. The idea is to use a UNION ALL query to try to coerce your various sources of information into a single result set, and figure out which of your values from those various sources are somehow conformable. If the lion's share of your data is conformable -- that is, you can simply move it over into appropriate columns in your new tables, you'll avoid the worst pitfalls of EAV storage.
Once you have done that, you can use EAV style storage for your remaining information.
I hope this helps you plan this migration a bit.

Designing a database : Which is the better approach? [closed]

I am designing a database and am wondering which approach should I use. I am going to describe the database I intend to design and the possible approaches that I can use to store the data in the tables.
Please recommend which approach I should use and why?
About the data:
A) I have seven attributes that need to be taken care of. These are just examples and not the actual ones I intend to store. Let me call them:
1) Name
2) DOB (modified: I had earlier put age in here)
3) Gender
4) Marital Status
5) Salary
6) Mother Tongue
7) Father's Name
B) There will be a minimum of 10000 rows in the table and they can go up from there in the long term
C) The number of attributes can change over the period of time. That is, new attributes can be added to the existing dataset. No attributes will ever be removed.
Approach 1
Create a table with 7 attributes and store the data as it is. Add new columns if and when new attributes need to be added.
Pro: Easier to read the data and information is well organized
Con: There can be a lot of null values in certain rows for certain attributes for which values are unknown.
Approach 2
Create a table with 3 attributes. Let them be called :
1) Attr_Name: stores the attribute name, e.g. name, age, gender, etc.
2) Attr_Value: stores the value for the above attribute, e.g. Tom, 25, Male
3) Unique ID: uniquely identifies the (Name, Value) pair in the database, e.g. SSN
So, in approach 2, in case new attributes need to be added for certain rows, we can just add them to the hashmap we have created without worrying about null values.
Pro: Hashmap structure. Eliminates nulls.
Con: Data is not easy to read. Information cannot be easily grasped.
D) The Question
Which is the better approach?
I feel that approach 1 is the better approach, because it's not too tough to handle null values, the data is well organized, and it's easy to grasp this kind of data. Please suggest which approach I should use and why?
Thanks!
This is a typical narrow table (attribute based) vs. wide table discussion. The problem with approach #2 is that you are probably going to have to pivot the data, to get it into a form the user can work with (back into a wide view format). This can be very resource intensive as the number of rows grows, and as the number of attributes grows. It's also hard to look at the table, in raw table view, and see what's going on.
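In MySQL that pivot is typically done with conditional aggregation, roughly like this (a sketch against a hypothetical attribute table; note that every new attribute means editing the query):

SELECT entity_id,
       MAX(CASE WHEN attr_name = 'Name'   THEN attr_value END) AS name,
       MAX(CASE WHEN attr_name = 'Gender' THEN attr_value END) AS gender,
       MAX(CASE WHEN attr_name = 'Salary' THEN attr_value END) AS salary
FROM   person_attributes
GROUP  BY entity_id;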
We have had this discussion many times at our company. We have some tables that lend themselves very well to an attribute-type schema. We've always decided against it because of the necessity to pivot the data and the inability to view the data and have it make sense (but this is the lesser of the two problems for us - we just don't want to pivot millions of rows of data).
BTW, I wouldn't store age as a number. I would store the birth date, if you have it. Also, I don't know what 'Mother Tongue' refers to, but, if it's the language the mother speaks, I would store this as a FK to a master language table. It's more efficient and lessens the problem of bad data because of a misspelled language.
Your second option is one of the worst design mistakes you can make. This should only be done when you have hundreds of attributes that change constantly and are in no way the same from object to object (such as medical lab tests). If you need to do that, then do not under any circumstances use a relational database to do it. NoSQL databases handle EAV designs far better than relational ones.
Another problem with design 2 is that it becomes almost impossible to have good data integrity, as you cannot correctly enforce FKs and data types or add constraints to the data. Since this kind of integrity should never be enforced only in the application (things other than the application often affect the data), this factor alone is enough to make your second idea foolish and foolhardy.
The first design will perform better in general. It will be easier to write queries, and it will force you to think about what needs to change when you add an attribute (this is a plus, not a minus) instead of having to design to always show all attributes whether you need them or not. If you would have a lot of nulls, then add a related table rather than more columns (you can have one-to-one related tables). Usually in this case you might have something that you know only a subset of the records will have, and they often fall into groupings by subject fairly naturally. For instance, you might have general people-related attributes (name, phone, email, address) that belong in one table. Then you might have student-related attributes that belong in a separate table and teacher-related attributes that belong in a third table. Or you might have things you need for all insurance policies and separate tables for vehicle insurance, health insurance, house insurance and life insurance.
There is a third design possibility. If you have a set of attributes you know up front then put them in one table and have an EAV table only for attributes that cannot be determined at design time. This is the common pattern when the application wants to have the flexibility for the user to add customer specific data fields.
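A minimal sketch of that hybrid pattern (names are illustrative): the attributes known up front are ordinary columns, and only the customer-specific extras go into the side table.

CREATE TABLE person (
  person_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  name      VARCHAR(100) NOT NULL,
  dob       DATE,
  gender    CHAR(1)
);

CREATE TABLE person_extra_attribute (
  person_id  INT UNSIGNED NOT NULL,
  attr_name  VARCHAR(64)  NOT NULL,
  attr_value VARCHAR(255),
  PRIMARY KEY (person_id, attr_name),
  FOREIGN KEY (person_id) REFERENCES person (person_id)
);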
I don't think anyone can really determine which one is better immediately, but here are a couple of things to think about:
Do you have sample data? If yes, then see if there will be a lot of nulls; if not, just go with option 1.
Do you have a good sense of how the attributes will grow? For instance, looking at the attributes you listed above, you may not know all of the values, but the attributes all do exist - so in theory you could fill the table. If you will have a lot of sparse data, then #2 may work.
When you do get new types of data, can you group them into another table and use a foreign key? For instance, if you want to capture the address, you could always have an address table that references your initial table.
What type of queries do you plan on using? It's much harder to query a key-value table than a "normal one" (not super hard, just harder - if you're comfortable using implied joins and the like to normalize the data then it's probably not a big deal).
Overall I'd be really careful before you implemented #2 - I've done it for certain specialized cases (metrics gathering where I have dozens of different metrics and don't really want to maintain dozens of different tables) but in general it's more trouble than it's worth.
For something like this I'd just create one table, and either add columns as you go along, or just create new tables for new data structures if necessary.

mysql table with 40+ columns

I have 40+ columns in my table and I have to add a few more fields like current city, hometown, school, work, uni, college...
This user data will be pulled for many matching users who are mutual friends (joining the friend table with the other user's friends to see mutual friends), who are not blocked, and who are not already friends with the user.
The above request is a little complex, so I thought it would be a good idea to put the extra data in the same user table for fast access, rather than adding more joins to the query, which would slow it down further. But I wanted to get your suggestions on this.
My friend told me to add the extra fields, which won't be searched on, as serialized data in one field.
ERD Diagram:
My current table: http://i.stack.imgur.com/KMwxb.png
If i join into more tables: http://i.stack.imgur.com/xhAxE.png
Some Suggestions
Nothing wrong with this table and its columns.
Follow the approach in "MySQL: Optimize table with lots of columns" - which serializes the extra fields into one field that isn't searchable.
Create another table and put most of the data there (this makes the joins harder, since I already have 3 or more tables to join to pull the records for users, e.g. friends, user, checking mutual friends).
As usual - it depends.
Firstly, there is a maximum number of columns MySQL can support, and you don't really want to get there.
Secondly, there is a performance impact when inserting or updating if you have lots of columns with an index (though I'm not sure if this matters on modern hardware).
Thirdly, large tables are often a dumping ground for all data that seems related to the core entity; this rapidly makes the design unclear. For instance, the design you present shows 3 different "status" type fields (status, is_admin, and fb_account_verified) - I suspect there's some business logic that should link those together (an admin must be a verified user, for instance), but your design doesn't support that.
This may or may not be a problem - it's more a conceptual, architecture/design question than a performance/will-it-work thing. However, in such cases, you may consider creating tables to reflect the related information about the account, even if it doesn't have an x-to-many relationship. So, you might create "user_profile", "user_credentials", "user_fb", "user_activity", all linked by user_id.
This makes it neater, and if you have to add more facebook-related fields, they won't dangle at the end of the table. It won't make your database faster or more scalable, though. The cost of the joins is likely to be negligible.
Whatever you do, option 2 - serializing "rarely used fields" into a single text field - is a terrible idea. You can't validate the data (so dates could be invalid, numbers might be text, not-nulls might be missing), and any use in a "where" clause becomes very slow.
A popular alternative is "Entity/Attribute/Value" or "Key/Value" stores. This solution has some benefits - you can store your data in a relational database even if your schema changes or is unknown at design time. However, they also have drawbacks: it's hard to validate the data at the database level (data type and nullability), it's hard to make meaningful links to other tables using foreign key relationships, and querying the data can become very complicated - imagine finding all records where the status is 1 and the facebook_id is null and the registration date is greater than yesterday.
Given that you appear to know the schema of your data, I'd say "key/value" is not a good choice.
I would advise running some tests. Try it both ways and benchmark it. Nobody will be able to give you a definitive answer because you have not shared your hardware configuration, sample data, sample queries, how you plan on using the data, etc. Here is some information that you may want to consider.
Use The Database as it was intended
A relational database is designed specifically to handle data. Use it as such. When written correctly, joining data in a well written schema will perform well. You can use EXPLAIN to optimize queries. You can log SLOW queries and improve their performance. Databases have been around for years, if putting everything into a single table improved performance, don't you think that would be all the buzz on the internet and everyone would be doing it?
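For instance (standard MySQL tooling; the query itself is just a placeholder):

-- See how MySQL plans to execute a query and which indexes it will use:
EXPLAIN
SELECT u.*
FROM   users u
JOIN   friends f ON f.friend_id = u.id
WHERE  f.user_id = 42;

-- Log statements that take longer than two seconds so they can be tuned later:
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;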
Engine Types
How will inserts be affected as the row count grows? Are you using MyISAM or InnoDB? You will most likely want to use InnoDB so you get row-level locking rather than table-level locking. Make sure you are using the correct engine type for your tables. Get the information you need to understand the pros and cons of both. The wrong engine type can kill performance.
Enhancing Performance using Partitions
Find ways to enhance performance. For example, as your datasets grow you could partition the data. Data partitioning will improve the performance of a large dataset by keeping slices of the data in separate partitions, allowing you to run queries on parts of large datasets instead of all of the information.
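As a sketch, range-partitioning a large table by year might look like this (MySQL requires the partitioning column to be part of every unique key, hence the composite primary key):

CREATE TABLE user_activity (
  user_id    INT UNSIGNED NOT NULL,
  created_at DATE NOT NULL,
  action     VARCHAR(50),
  PRIMARY KEY (user_id, created_at)
)
PARTITION BY RANGE (YEAR(created_at)) (
  PARTITION p2012 VALUES LESS THAN (2013),
  PARTITION p2013 VALUES LESS THAN (2014),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);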
Use correct column types
Consider using UUID primary keys for portability and future growth. If you use proper column types, it will improve the performance of your queries.
Do not serialize data
Using serialized data is the worst way to go. When you use serialized fields, you are basically using the database as a file management system. It will save and retrieve the "file", but then your code will be responsible for unserializing, searching, sorting, etc. I just spent a year trying to unravel a mess like that. It's not what a database was intended to be used for. Anyone advising you to do that is not only giving you bad advice, they do not know what they are doing. There are very few circumstances where you would use serialized data in a database.
Conclusion
In the end, you have to make the final decision. Just make sure you are well informed and educated on the pros and cons of how you store data. The last piece of advice I would give is to find out what heavy users of mysql are doing. Do you think they store data in a single table? Or do they build a relational model and use it the way it was designed to be used?
When you say "I am going to put everything into a single table", you are saying that you know more about performance and can make better choices for optimization in your code than the team of developers that constantly work on MySQL to make it what it is today. Consider weighing your knowledge against the cumulative knowledge of the MySQL team and the DBAs, companies, and members of the database community who use it every day.
At a certain point you should look at the "short row model", also known as entity-key-value stores, as well as the traditional "long row model".
If you look at the schema used by WordPress you will see that there is a table wp_posts with 23 columns and a related table wp_postmeta with 4 columns (meta_id, post_id, meta_key, meta_value). The meta table is a "short row model" table that allows WordPress to have an infinite collection of attributes for a post.
Neither the "long row model" nor the "short row model" is always the best model; often the best choice is a combination of the two. As @nevillek pointed out, searching and validating "short row" data is not easy, and fetching data can involve pivoting, which is annoyingly difficult in MySQL and Oracle.
The "long row model" is easier to validate, relate and fetch, but it can be very inflexible and inefficient when the data is sparse. Some rows may have only a few of the values non-null. Also you can't add new columns without modifying the schema, which could force a system outage, depending on your architecture.
I recently worked on a financial services system that had over 700 possible facts for each instrument, most of which had fewer than 20 facts. This could have been built by setting up dozens of tables, each for a particular asset class, or as a table with 700 columns, but we chose to use a combination of a table with about 20 columns containing the most popular facts and a 4-column table which contained the other facts. This design was efficient but was difficult to access, so we built a few table functions in PL/SQL to assist with this.
I have a general comment for you,
Think about it: if you put anything more than 10-12 columns in a table, even if it makes sense to put them in one table, I guess you are going to pay the price in the short, medium and long term.
Your 3 tables approach seems to be better than the 1 table approach, but consider making those into 5-6 tables rather than 3 tables because you still can.
Move currently, currently_position, currently_link from user-table and work from user-profile into a new table with your primary key called USERWORKPROFILE.
Move the locale information from user-profile into a new USERPROFILELOCALE table, because it is generic in nature.
And yes, all your generic attributes in all the tables should be int and not varchar.
For instance, City needs to move out to a new table called LIST_OF_CITIES with cityid.
And your attribute city should change from varchar to int and point to cityid in LIST_OF_CITIES.
Do not worry about performance issues; the more tables you have, the better the performance, because you are handing the performance work to the database engine instead of taking it all into your own hands.

Meta table VS table with many fields, on a large scale. Performance-wise

Here is a concrete example:
WordPress stores user information (meta) in a table called wp_usermeta, where you get the meta_key field (e.g. first_name) and the meta_value (John).
However, after only 50 or so users, the table already packs about 1219 records.
So, my question is: on a large scale, performance-wise, would it be better to have a table with all the meta as fields, or a table like WordPress uses with all the meta as rows?
Indexes are properly set in both cases. There is little to no need to add new metas. Keep in mind that a table like wp_usermeta must use a text/longtext field type (large footprint) in order to accommodate any type of data that could be entered.
My assumptions are that the WordPress approach is only good when you don't know what the user might need. Otherwise:
retrieving all the meta requires more I/O because the fields aren't stored in a single row. The field isn't optimised.
You can't really have an index on the meta_value field without suffering from major drawbacks (indexing a longtext ? unless it's a partial index...but then, how long?)
Soon, your database is cluttered with many rows, slowing your searches even for the most precise meta.
Developer-friendliness is absent. You can't really do a join query to get everything you need and display it properly.
I may be missing a point though. I'm not a database engineer, and I know only the basics of SQL.
You're talking about Entity-Attribute-Value.
- Entity = User, in your Wordpress Example
- Attribute = 'First Name', 'Last Name', etc
- Value = 'John', 'Smith', etc
Such a schema is very good at allowing a dynamic number of Attributes for any given Entity. You don't need to change the schema to add an Attribute. Depending on the queries, the new attributes can often be used without changing any SQL at all.
It's also perfectly fast enough at retrieving those attributes values, provided that you know the Entity and the Attribute that you're looking for. It's just a big fancy Key-Value-Pair type of set-up.
It is, however, not so good where you need to search the records based on the Value contents. Such as: get me all users called 'John Smith'. Trivial to ask in English. Trivial to code against a 'normal' table: first_name = 'John' AND last_name = 'Smith'. But non-trivial to write in SQL against EAV, and the relative performance is awful (get all the Johns, then all the Smiths, then intersect them to get Entities that match both).
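Roughly, the two versions of that lookup look like this (a sketch using the wp_usermeta layout from the question; the wide users table is hypothetical):

-- 'Normal' wide table:
SELECT * FROM users
WHERE  first_name = 'John' AND last_name = 'Smith';

-- EAV: one self join per attribute, and the values are untyped text.
SELECT fn.user_id
FROM   wp_usermeta fn
JOIN   wp_usermeta ln ON ln.user_id = fn.user_id
WHERE  fn.meta_key = 'first_name' AND fn.meta_value = 'John'
AND    ln.meta_key = 'last_name'  AND ln.meta_value = 'Smith';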
There is a lot said about EAV on-line, so I won't go in to massive detail here. But a general rule of thumb is: If you can avoid it, you probably should.
Depends on the number of names packed into wp_usermeta on average.
Text field searches are notoriously slow. Indexes are generally faster.
But some data warehouses index the crap out of every field and Wordpress might be doing the same thing.
I would vote for meta as a field not a row.
Good SQL, good night.
Mike
Examples from two major pieces of software in the GPL arena illustrate how big a difference there is between the two designs:
Wordpress & oScommerce
Both have their flaws and strengths, both are massively dominant in their respective areas, and a lot of things are done with them. But one of the fundamental and biggest differences between them is their approach to database table design. Of course, when comparing these, their code architecture also plays a role in how fast they do searches, but both are hampered by their own drawbacks and boosted by their own advantages, so the comparison is more or less accurate for production environments.
Wordpress uses EAV. The general data (called posts with different post types) is stored as the main entity, and all else is stored in post meta tables. Some fundamental data is stored in the main table, like revisions, post type etc, but almost all the rest is stored in metas.
VERY good for adding, modifying data, and therefore functionality.
But try a search with a complex SQL join which needs to pick up 3-4 different values from the meta table and get the resulting set. It's an iron dog. Search comes out VERY slow depending on the data you are searching for.
This is one of the reasons why you don't see many WordPress plugins which need to host complex data, and the ones which actually do tend to create their own tables.
oScommerce, on the other hand, keeps almost all product-related data in the products table, and the majority of oScommerce mods modify this table and add their fields. There is a products_attribute table; however, this is also rather flat and doesn't have any meta design. It's just linked to products by product ids.
As a result, despite being aged spaghetti code from a very long time ago, oScommerce comes up with stunningly fast search results even when you search for complicated and combined product criteria. Actually, most of oScommerce's normal display code (like what it shows on the product details page) comes from quite complicated SQL pulling data from around 2-3 tables in complicated join statements. A comparably much simpler SQL query with even one join can make WordPress duke it out with the database.
Therefore the conclusion is rather plain: EAV is very good for easy extension and modification of data for extended functionality (i.e. as in WordPress). Flat, big monolithic tables are MUCH better for databases which will represent complicated records and will have complicated searches with multiple criteria run on them.
It's a question of application.
From what I've seen, the EAV model doesn't affect performance, unless you need the null values. In that case you should join with the table that holds all the type_meta.
I don't agree with the answer of Dems.
If you want to build the full name of the user, you don't ask for every name that matches the name.
For that you should use a 5th or 6th NF.
Or you may even have a table of the user entity where you have:
id
username
password
salt
and there you go. That's the base, and for all the user "extra" data you should have user_meta and user_type_meta entities joined to the user.