Modelling a "one to one or two" relationship - mysql

I have an uncommon database design problem I'm not sure how to handle properly. There is a table called profile storing website users' public profile information. However, every profile can belong to either a single person or a couple, so I need an additional child table called person to store person-specific data. Every profile entity must have at least one but no more than two person child entities.
What is the best (in terms of being "kosher" and/or performance) way to model such a relationship? Should I go with a regular one-to-many and enforce the number of children programmatically or with stored procedures? Or should I just create two foreign key fields in the parent table and allow NULL for one of them? Maybe there's another way I can't think of?
Edit: Additional info in response to Gordon's questions
A person can be related to only one profile and there can't be a person without a profile. Perhaps the name person is confusing, as it may suggest that a person has a profile, while in fact it's the profile that has person information.
In case of couple profiles both persons are equal. Due to the site's specifics, the limit of 2 will never change; however, it should be possible to add or remove a person (to make a single-person profile a couple profile and vice versa), but there can never be fewer than 1 or more than 2 persons.
The person data would never be fetched without the profile data but the profile data could sometimes be fetched without the person data.

1)
The solution with two fields:
PRO: Allows you to precisely restrict both the minimal and maximal number of people per profile.
CON: Would allow a profile-less person.
CON: Would require 2 indexes (1 on each field) to efficiently get the profile of a given person, taking additional space and potentially slowing down INSERT/UPDATE/DELETE.
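A minimal DDL sketch of this option (all table and column names here are illustrative assumptions, not taken from the question):

CREATE TABLE person (
    person_id INT PRIMARY KEY
    -- person-specific fields...
);

CREATE TABLE profile (
    profile_id INT PRIMARY KEY,
    person1_id INT NOT NULL,
    person2_id INT NULL,  -- NULL for single-person profiles
    -- InnoDB creates an index on each FK column, matching the CON above:
    FOREIGN KEY (person1_id) REFERENCES person (person_id),
    FOREIGN KEY (person2_id) REFERENCES person (person_id)
    -- other profile fields...
);

-- Finding the profile of a given person has to probe both fields:
-- SELECT * FROM profile WHERE person1_id = ? OR person2_id = ?;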
2)
But if you are willing to enforce the minimal number at the application level, you might be better off with something like this:
CHECK(PERSON_NO = 1 OR PERSON_NO = 2)
Characteristics:
CON: Allows a person-less profile.
PRO: Restricts maximal number of people per profile, yet easy to change by just modifying the CHECK.
PRO: If you keep the identifying relationship as above, it doesn't require additional indexes and is clustering-friendly (persons of the same profile can be stored physically close together, minimizing I/O during JOIN).
On the other hand, if you have a separate key PERSON_ID (or similar), then an additional index on {PROFILE_ID, PERSON_NO} would be necessary to efficiently enforce the key constraint on these fields too.
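A minimal sketch of this option (names are illustrative; note that MySQL only enforces CHECK constraints from version 8.0.16 onward — on older versions the PERSON_NO rule would have to be enforced by the application or a trigger):

CREATE TABLE profile (
    profile_id INT PRIMARY KEY
    -- profile fields...
);

CREATE TABLE person (
    profile_id INT NOT NULL,
    person_no  TINYINT NOT NULL CHECK (person_no IN (1, 2)),
    -- person-specific fields...
    PRIMARY KEY (profile_id, person_no),  -- identifying, clustering-friendly key
    FOREIGN KEY (profile_id) REFERENCES profile (profile_id)
);

With InnoDB the table is clustered on the primary key, so both persons of a profile end up physically adjacent, as described above.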
3)
Theoretically, you could even combine the two approaches and avoid both profile-less persons and person-less profiles:
(PERSON1_ID is not NULL-able, PERSON2_ID is NULL-able)
However, this leads to circular references, requiring deferred constraints to resolve, which are unfortunately not supported by MySQL.
4)
And finally, you could just take a brute-force approach and simply place fields of both persons in the profile table (and make one of these sets not NULL-able and the other NULL-able).
Out of all these possibilities, I'd probably go with 2).

You essentially have two options, as you mention in your question. You can store two fields in the table. Or, you can have a second table that has the mapping information.
Here are some additional questions to help you answer the question:
Can a person have their own profile and a profile as part of a couple?
Are both people on a profile "equal" or is one the "master" and the other an "alternate"?
When you fetch profile information, will you always be including information about all people on the profile?
Can you have persons without profiles?
In this case, I just have a sneaking suspicion that the limit of "2" may change in the future. This suggests storing the mapping in a separate table, since increasing "2" by adding a field is a problem in terms of modifying existing code. In other words, create a separate table, person-profile, that maps persons to profiles. In MySQL, you can always gather the person-level information using GROUP_CONCAT().
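A sketch of that mapping table and a GROUP_CONCAT() rollup (table and column names are assumptions):

CREATE TABLE person_profile (
    person_id  INT NOT NULL,
    profile_id INT NOT NULL,
    PRIMARY KEY (person_id),  -- each person belongs to exactly one profile
    FOREIGN KEY (profile_id) REFERENCES profile (profile_id)
);

-- One row per profile, with its person ids gathered into a single column:
SELECT p.profile_id, GROUP_CONCAT(pp.person_id) AS person_ids
FROM profile p
JOIN person_profile pp ON pp.profile_id = p.profile_id
GROUP BY p.profile_id;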
One case where it is better to put such similar fields in the same table is when one is clearly preferred and the other is the alternate. In that case, you end up doing a lot of "COALESCE(preferred, alternate)"-type logic.

Related

Best table structure for users with different roles

We are working on a website which will feature about 5 different user roles, each with different properties. In the current version of the database schema we have a single users table which holds all the users, and all of their properties.
The problem is that the properties that we need differ per user role. All users have the same basic properties, like a name, e-mail address and password. But on top of that, the properties differ per role. Some have social media links, others have invoice addresses, etc. In total there may be up to 60 columns (properties), of which only a portion is used by each user role.
In total we may have about 250,000 users in the table, of which the biggest portion (about 220,000) will be of a single user role (and use about 20 of the 60 columns). The other 30,000 users are divided over four other roles and use a subset of the other 40 columns.
What is the best database structure for this, both from a DB and a development perspective? My idea is to have a base users table, and then extend on that with tables like users_moderators, but this may lead to a lot of JOINed queries. A way to prevent this is by using VIEWs, but I've read some (outdated?) articles claiming that VIEWs may hurt performance, like: http://www.mysqlperformanceblog.com/2007/08/12/mysql-view-as-performance-troublemaker/.
Does the 'perfect' structure even exist? Any suggestions? Or isn't this really a problem at all, and should we just put all users in a single big table?
There are two different ways to go about this. One is called "Single Table Inheritance". This is basically the design you ask for comments on. It's pretty fast because there are no joins. However, NULLs can affect throughput to a small degree, because fat rows take a little longer to bring into memory than thinner rows.
An alternative design is called "Class Table Inheritance". In this design, there is one table for the super class and one table for each subclass. Non key attributes go into the table where they pertain. Often, a design called "Shared Primary Key" can be used with this design. In shared primary key, the key ids of the subclass tables are copies of the id from the corresponding row in the superclass table.
It's a little work at insert time, but it pays for itself when you go to join data.
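A sketch of Class Table Inheritance with Shared Primary Key, under assumed table and column names:

CREATE TABLE users (
    user_id  INT PRIMARY KEY,
    name     VARCHAR(100) NOT NULL,
    email    VARCHAR(100) NOT NULL,
    password CHAR(60) NOT NULL
);

CREATE TABLE users_moderators (
    user_id         INT PRIMARY KEY,  -- shared primary key: copied from users
    moderated_forum VARCHAR(100),
    FOREIGN KEY (user_id) REFERENCES users (user_id)
);

-- Role-specific data is then a single PK-to-PK join away:
SELECT u.name, m.moderated_forum
FROM users u
JOIN users_moderators m ON m.user_id = u.user_id;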
You should look up all three of these in SO (they have their own tags) or out on the web. You'll get more details on the design, and an indication of how well each design fits your case.
The 'perfect' structure for such cases, in my opinion, is the party-role-relationship model. Search for Len Silverston's books about data models. It looks quite complicated at the beginning, but it gives great flexibility...
The biggest question is the practicability of adopting the perfect solution. Nobody except you can answer that. Refactoring is never an easy or fast task, so if, say, your project's lifetime is 1 year, spending 9 months paying off 'technical debt' sounds more like a waste of time/effort/etc.
As for the performance of joins, having proper indexes usually solves potential issues. If not, you can always implement a materialized view; even though MySQL doesn't have such an option out of the box, you can design it yourself and refresh it in different ways (for instance, using triggers, or launching a refresh procedure periodically/on demand).
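A minimal sketch of such a hand-rolled materialized view, assuming a users table with a role_id column and a roles table (all names here are assumptions; a complete version would need INSERT and DELETE triggers as well):

CREATE TABLE users_with_role_mv (
    user_id   INT PRIMARY KEY,
    name      VARCHAR(100),
    role_name VARCHAR(50)
);

-- Keep the view fresh when a user row changes:
CREATE TRIGGER users_mv_refresh AFTER UPDATE ON users
FOR EACH ROW
    REPLACE INTO users_with_role_mv (user_id, name, role_name)
    SELECT NEW.user_id, NEW.name, r.name
    FROM roles r
    WHERE r.role_id = NEW.role_id;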
table user
table roles
table permissions
table userRole
table userPermission
table RolesPermissions
Each role has its permissions in the RolesPermissions table.
Each user can also have a permission without a role (an extension...).
So in PHP you just have to merge the arrays of permissions from the user's roles with the user's extended permissions...
And in your "acl" class you check whether your user has the permission to view or process a web page or a system process...
I think you don't need to worry about speed so much here, because it will be a one-time thing only: on user login, store the ACL in the session and get it from there next time.
JOINs are not so bad. If you have your indexes and foreign keys in the right places with the InnoDB engine, it will be really fast.
I would use one table for users with a role_id, a second table with roles, a third table for resources, and one table to link it all together, plus an enabled flag.

MySQL design regarding a web

I am tackling a problem in class to design a MySQL representation of a website that stores a list of events associated with a person. So, this table (or tables) would have 2 columns, one of which is the person's name and the other is the event. However, a person will generally have anywhere from 30-1000 events, so this table, which we plan to have for our entire undergraduate class of 6000 students, will have millions of entries. Is there a better way to store this in MySQL that will take less space, but will still be able to retrieve individual events and the list of people that attended just as easily as if it were a table of two columns?
Yes, there is a technique called many-to-many, which essentially breaks your one table into three. This is fitting when you consider that there are indeed exactly three entities being modeled (a good sanity check):
Person
Event
A Person's association with an Event
You model this as three tables, with the first two having essentially two columns each: one with a unique index (called the "primary key"), and the second being a semantic name (person name, event name). Note that you can also add any number of columns to these two tables with only a linear increase in storage (most likely your first move will be to add a date column to the event table).
The third table is the interesting one: it contains only 2 columns, each numeric, both of which are references to the other tables (each row is simply (person_id, event_id)). We term these "foreign keys".
This structure means a few things:
No matter how many events someone goes to, that someone is only represented once.
The same goes for events, no matter how many attendees.
The attendance is a "first-class" entity, and can grow to include its own attributes (e.g. "role").
This structure is called many-to-many because each person may attend many events, and each event may have many attendees.
The quintessential feature of the design is that no single piece of domain knowledge is repeated; only "keys" are repeated as necessary to model the real-world domain. (For example, in your first design, accounting for a name change would require an unknown quantity of updates, and might lead to data anomalies, avoidance of which is a primary concern of database normalization.)
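A minimal sketch of the three tables and a typical lookup (all names and columns are illustrative):

CREATE TABLE person (
    person_id INT PRIMARY KEY AUTO_INCREMENT,
    name      VARCHAR(100) NOT NULL
);

CREATE TABLE event (
    event_id   INT PRIMARY KEY AUTO_INCREMENT,
    name       VARCHAR(100) NOT NULL,
    event_date DATE
);

CREATE TABLE attendance (
    person_id INT NOT NULL,
    event_id  INT NOT NULL,
    PRIMARY KEY (person_id, event_id),
    FOREIGN KEY (person_id) REFERENCES person (person_id),
    FOREIGN KEY (event_id)  REFERENCES event (event_id)
);

-- Everyone who attended a given event:
SELECT p.name
FROM person p
JOIN attendance a ON a.person_id = p.person_id
WHERE a.event_id = ?;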
Don't worry about "space". This isn't the 1970s and we're not going to run out of columns on punch cards to store data. You should be concerned with expressing your requirements in the proper, most normalized data structure. With proper indexing there shouldn't be a problem, not with this volume of data.
Remember indexes need to be defined on anything you will include as part of a WHERE clause, and sometimes you may need to add additional indexes for large lists fetched with ORDER BY and LIMIT.
Whenever possible or practical use an integer identifier instead of a string. These are stored as a small number of bytes, typically 4, compared with a variable length string which is typically at least the length of the string in bytes plus 1.
A properly normalized database will use numerical identifiers for things anyway, so this kind of thing isn't a huge concern. The only time you go against this, or deliberately de-normalize your data, is when you have a legitimate performance problem that cannot be easily solved using some other method.
As always, test your schema by generating large amounts of dummy data and see how it performs. Since you have a good idea of the requirements in advance, do some testing at those levels, and then, to be on the safe side, try 2x, 5x and 10x the data to see how much flexibility your design has. It's okay to have performance limitations so long as you know at what kind of scale you'll experience them.
MySQL and relational databases in general were designed specifically to handle this sort of problem. Handling millions of entries is not a problem. Complex queries may take a couple of seconds, but will perform remarkably well.
It is the best design to store one event per row. The way you are going about it sounds like the best way. Good luck.

How to handle massive storage of records in database for user authorization purposes?

I am using Ruby on Rails 3.2.2 and MySQL. I would like to know if it is "advisable" / "desirable" to store, in a database table related to a class, all records related to two other classes for each "combination" of their instances.
That is, I have User and Article models. In order to store all user-article authorization objects, I would like to implement an ArticleUserAuthorization model so that
given N users and M articles there are N*M ArticleUserAuthorization records.
This way, I can declare and use ActiveRecord::Associations as follows:
class Article < ActiveRecord::Base
  has_many :user_authorizations, :class_name => 'ArticleUserAuthorization'
  has_many :users, :through => :user_authorizations
end

class User < ActiveRecord::Base
  has_many :article_authorizations, :class_name => 'ArticleUserAuthorization'
  has_many :articles, :through => :article_authorizations
end
However, the above approach of storing all combinations will result in a big database table containing billions billions billions of rows!!! Furthermore, ideally speaking, I am planning to create all authorization records when a User or an Article object is created (that is, I am planning to create all previously mentioned "combinations" at once or, better, in "delayed" batches... either way, this process creates other billions billions of database table rows!!!) and to do the reverse when destroying (by deleting billions billions of database table rows!!!). Furthermore, I am planning to read and update those rows at once when a User or Article object is updated.
So, my doubts are:
Is this approach "advisable" / "desirable"? For example, what kind of performance problems may occur? or, is a bad "way" / "prescription" to admin / manage databases with very large database tables?
How may / could / should I proceed in my case (maybe by "re-thinking" entirely how to handle user authorizations in a better way)?
Note: I would use this approach because, in order to retrieve only "authorized objects" when retrieving User or Article objects, I think I need "atomic" user authorization rules (that is, one user authorization record for each user and article object) since the system is not based on user groups like "admin", "registered" and so on. So, I thought that the availability of an ArticleUserAuthorization table avoids having to run methods related to user authorizations (note: those methods involve some MySQL querying that could worsen performance - see my previous question for a sample "authorization" method implementation) on each retrieved object, by "simply" accessing / joining the ArticleUserAuthorization table so as to retrieve only "user authorized" objects.
The fact of the matter is that if you want article-level permissions per user, then you need a way to relate Users to the Articles they can access. At minimum, this necessitates N*A rows (where A is the number of uniquely permissioned articles).
The 3NF approach to this would be, as you suggested, to have a UsersArticles set... which would be a very large table (as you noted).
Consider that this table would be accessed a whole lot...
This seems to me like one of the situations in which a slightly denormalized approach (or even NoSQL) is more appropriate.
Consider the model that Twitter uses for their user follower tables:
Jeff Atwood on the subject
And High Scalability Blog
A sample from those pieces is a lesson learned at Twitter: querying followers from a normalized table puts tremendous stress on a Users table. Their solution was to denormalize followers so that a user's followers are stored on their individual user settings.
Denormalize a lot. Single-handedly saved them. For example, they store all a user's friend IDs together, which prevented a lot of costly joins.
- Avoid complex joins.
- Avoid scanning large sets of data.
I imagine a similar approach could be used to serve article permissions and avoid a tremendously stressed UsersArticles single table.
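Transplanted to this problem, a purely hypothetical denormalized cache might look like the following; it trades write-time maintenance for read speed, as in the Twitter example (the table and column names are invented):

CREATE TABLE user_article_cache (
    user_id     INT PRIMARY KEY,
    article_ids MEDIUMTEXT  -- packed list such as '3,7,65,78', rebuilt whenever permissions change
);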
You don't have to re-invent the wheel. ACL (Access Control List) frameworks have dealt with this same kind of problem for ages now, and most efficiently, if you ask me. You have resources (Article), or even better, resource groups (Article Category/Tag/etc.). On the other hand you have users (User) and user groups. Then you would have a relatively small table which maps resource groups to user groups, and another relatively small table which holds exceptions to this general mapping. Alternatively, you can have rule sets to satisfy for accessing an article. You can even have dynamic groups like authors_friends, depending on your user-user relations.
Just take a look at any decent ACL framework and you would have an idea how to handle this kind of problem.
If there really is the prospect of "a big database table containing billions billions billions of rows" then perhaps you should craft a solution for your specific needs around a (relatively) sparsely populated table.
Large database tables pose a significant performance challenge in how quickly the system can locate the relevant row or rows. Indexes and primary keys are really needed here; however, they add to the storage requirements and also consume CPU cycles to maintain as records are added, updated, and deleted. Even so, heavy-duty database systems also have partitioning features (see http://en.wikipedia.org/wiki/Partition_(database) ) that address such row-location performance issues.
A sparsely populated table can probably serve the purpose assuming some (computable or constant) default can be used whenever no rows are returned. Insert rows only wherever something other than the default is required. A sparsely populated table will require much less storage space and the system will be able to locate rows more quickly. (The use of user-defined functions or views may help keep the querying straightforward.)
If you really cannot make a sparsely populated table work for you, then you are quite stuck. Perhaps you can make that huge table into a collection of smaller tables, though I doubt that's of any help if your database system supports partitioning. Besides, a collection of smaller tables makes for messier querying.
So let's say you have millions or billions of Users who or may not have certain privileges regarding the millions or billions of Articles in your system. What, then, at the business level determines what a User is privileged to do with a given Article? Must the User be a (paying) subscriber? Or may he or she be a guest? Does the User apply (and pay) for a package of certain Articles? Might a User be accorded the privilege of editing certain Articles? And so on and so forth.
So let's say a certain User wants to do something with a certain Article. In the case of a sparsely populated table, a SELECT on that grand table UsersArticles will either return 1 row or none. If it returns a row, then one immediately knows the ArticleUserAuthorization, and can proceed with the rest of the operation.
If no row, then maybe it's enough to say the User cannot do anything with this Article. Or maybe the User is a member of some UserGroup that is entitled to certain privileges to any Article that has some ArticleAttribute (which this Article has or has not). Or maybe the Article has a default ArticleUserAuthorization (stored in some other table) for any User that does not have such a record already in UsersArticles. Or whatever...
The point is that many situations have a structure and a regularity that can be used to help reduce the resources needed by a system. Human beings, for instance, can add two numbers with up to 6 digits each without consulting a table of over half a trillion entries; that's taking advantage of structure. As for regularity, most folks have heard of the Pareto principle (the "80-20" rule - see http://en.wikipedia.org/wiki/Pareto_principle ). Do you really need to have "billions billions billions of rows"? Or would it be truer to say that about 80% of the Users will each only have (special) privileges for maybe hundreds or thousands of the Articles - in which case, why waste the other "billions billions billions" (rounded :-P).
You should look at hierarchical role-based access control (RBAC) solutions. You should also consider sensible defaults.
Are all users allowed to read an article by default? Then store the deny exceptions.
Are all users not allowed to read an article by default? Then store the allow exceptions.
Does it depend on the article whether the default is allow or deny? Then store that in the article, and store both allow and deny exceptions.
Are articles put into issues, and issues collected into journals, and journals collected into fields of knowledge? Then store authorizations between users and those objects.
What if a User is allowed to read a Journal but is denied a specific Article? Then store User-Journal:allow, User-Article:deny and the most specific instruction (in this case the article) takes precedence over the more general (in this case the default, and the journal).
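A sketch of the defaults-plus-exceptions idea at the article level (journal-level rules omitted for brevity; all names are assumptions):

CREATE TABLE article (
    article_id   INT PRIMARY KEY,
    default_rule ENUM('allow', 'deny') NOT NULL DEFAULT 'allow'
);

CREATE TABLE article_exception (
    user_id    INT NOT NULL,
    article_id INT NOT NULL,
    rule       ENUM('allow', 'deny') NOT NULL,  -- overrides the article default
    PRIMARY KEY (user_id, article_id)
);

-- Effective permission: the most specific instruction wins.
SELECT COALESCE(e.rule, a.default_rule) AS effective_rule
FROM article a
LEFT JOIN article_exception e
       ON e.article_id = a.article_id AND e.user_id = ?
WHERE a.article_id = ?;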
Shard the ArticleUserAuthorization table by user_id. The principle is to reduce the effective dataset size on the access path. Some data will be accessed more frequently than others, and it will be accessed in a particular way; on that path the size of the result set should be small. Here we achieve that by sharding. You can also optimize that path further, perhaps with an index if it is a read workload, caching, etc.
This particular shard is useful if you want all the articles authorized by a user.
If you want to query by article as well, then duplicate the table and shard it by article_id too. When we have this second sharding scheme, we have denormalized the data. The data is now duplicated and the application will need to do extra work to maintain consistency. Writes will also be slower; use a queue for writes.
The problem with sharding is that queries across shards are ineffective; you will need a separate reporting database. Pick a sharding scheme and think about recomputing shards.
For truly massive databases, you would want to split it across physical machines. eg. one or more machines per user's articles.
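An illustrative pair of denormalized copies, one per access path (on a single server these are just two clusterings; across machines you would shard each by its leading column — all names are invented):

CREATE TABLE auth_by_user (       -- answers "which articles may this user read?"
    user_id    INT NOT NULL,
    article_id INT NOT NULL,
    PRIMARY KEY (user_id, article_id)
);

CREATE TABLE auth_by_article (    -- answers "which users may read this article?"
    article_id INT NOT NULL,
    user_id    INT NOT NULL,
    PRIMARY KEY (article_id, user_id)
);

-- Writes go to both copies (ideally via a queue); reads pick the copy
-- whose leading key column matches the query.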
Some NoSQL suggestions are:
Relationships are graphs, so look at graph databases; particularly
https://github.com/twitter/flockdb
Redis, by storing the relationship in a list.
A column-oriented database like HBase; you can treat it like a sparse nested hash.
All this depends on the size of your database and the types of queries.
EDIT: modified answer. The question previously had has_one relationships. Also added NoSQL suggestions 1 & 2.
First of all, it is good to think about default values and behaviors and not store them in the database. For example, if by default, a user cannot read an article unless specified, then, it does not have to be stored as false in the database.
My second thought is that you could have a users_authorizations column in your articles table and an articles_authorizations column in your users table. Those two columns would store user IDs and article IDs in the form 3,7,65,78,29,78. For the articles table, for example, this would mean users with IDs 3,7,65,78,29,78 can access the article. Then you would have to modify your queries to retrieve users that way:
@article = Article.find(34)
@users = User.find(@article.user_authorizations.split(','))
Each time an article or a user is saved or destroyed, you would have to create callbacks to update the authorization columns.
class User < ActiveRecord::Base
  after_save :update_articles_authorizations

  def update_articles_authorizations
    # ...
  end
end
Do the same for Article model.
Last thing: if you have different types of authorizations, don't hesitate to create more columns, like user_edit_authorization.
With these combined techniques, the quantity of data and hits to the DB are minimal.
Reading through all the comments and the question, I still doubt the validity of storing all the combinations. Think about the question in another way - who will populate that table? The author of the article, or a moderator, or someone else? And based on what rule? You can imagine how difficult that is. It's impossible to populate all the combinations.
Facebook has a similar feature. When you write a post, you can choose whom you want to share it with. You can select 'Friends', 'Friends of Friends', 'Everyone' or a custom list. The custom list allows you to define who will be included and excluded. So, in the same way, you only need to store the special cases, like 'include' and 'exclude', and all the remaining combinations fall into the default case. By doing this, N*M can be reduced significantly.

'Many to two' relationship

I am wondering about a 'many to two' relationship. The child can be linked to either of two parents, but not both. Is there any way to enforce this? Also, I would like to prevent duplicate entries in the child.
A real world example would be phone numbers, users and companies. A company can have many phone numbers, a user can have many phone numbers, but ideally the user shouldn't provide the same phone number as the company as there would be duplicate content in the DB.
This question suggests that you don't fully understand entity relationships (no rudeness intended), of which there are four (technically only three) types, listed below:
One to One
One to Many
Many to One
Many to Many
One to One (1:1):
In this case a table has been broken up into two parts, for purposes of complying with normalisation, or more usually the open-closed principle.
Normalisation compliance: You might have a business rule that each customer has only one account. Technically, you could in this case say customer and account could all be in the same table, but this breaks the rules of normalisation, so you split them and make a 1:1.
Open-closed principle compliance: A customer table might have id, first and last names, and address. Later, someone decides to add a date of birth and, with it, the ability to calculate age along with a bunch of other much-needed fields. This is an oversimplified example of one to one, but the main use for it is to extend your database without breaking existing code. Much code written (sadly) is tightly coupled to the database, so changes in the structure of a table will break the code. Adding a 1:1 like this will extend the table to meet new requirements without modifying the original, thereby allowing old code to continue functioning normally and new code to make use of the new DB features.
The downside of normalisation and extending tables using 1:1 relationships in this way is performance. Often, on heavily used systems, the first target for increasing database performance is de-normalising and combining such tables into a single table, and optimising the indexes, thus removing the need to use joins and read from multiple tables. Normalisation / de-normalisation is neither a good nor a bad thing, as it depends on the needs of the system. Most systems start off normalised, changing back when needed, but this change needs to be done very carefully, as mentioned: if code is tightly coupled to the DB structure, it will almost certainly cause the system to fail.

That is, when you combine two tables, one ceases to exist, and all the code that references the now-nonexistent table fails until it is modified. (In DB terms, imagine relationships connecting to either of the tables in the 1:1: when you remove one of those tables, you break those relationships, and the structure has to be greatly modified to compensate. Unfortunately, such bad designs are much easier to spot in the DB world than in the software world; in code, you usually don't notice something went wrong until it all falls apart, unless the system is properly designed with separation of concerns in mind.)
It's the closest thing you can get to inheritance in object-oriented programming, but it's not quite the same.
One to Many (1:M) / Many to One (M:1):
These two relationships (hence why four becomes three) are the most popular relationship types. They are both the same type of relationship; the only thing that changes is your point of view. An example: a customer has many phone numbers, or alternately, many phone numbers can belong to a customer.
In object-oriented programming this would be considered composition. It's not inheritance; rather, you are saying one item is composed of many parts. This is usually represented with arrays / lists / collections etc. inside of classes, as opposed to an inheritance structure.
Many to Many (M:M):
This type of relationship cannot be implemented directly in a relational database. For this reason we need to break it down into two one-to-many relationships with an "association" table joining them. The many side of each of the two one-to-many relationships is always on the association / link table.
For your example, the person who said you need a many-to-many is correct, because a "two to many" is effectively a many (meaning more than one) to many relationship. This is the only way you would get your system to work, unless you intend to research the field of relational calculus to find some new type of relationship that would allow this.
Also, for such relationships (M:M) you have two choices: either create a compound key in the linker table, so the combination of fields becomes a unique entry (if you are interested in DB optimisation this is the slower choice, but it takes less space), or create a third field with an auto-generated id column and make that the primary key (for DB optimisation, this is the faster choice, but it takes more space). A sketch of both follows.
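A generic sketch of the two keying choices (table and column names are placeholders):

-- (a) Compound primary key: the pair itself is the key.
CREATE TABLE link_compound (
    left_id  INT NOT NULL,
    right_id INT NOT NULL,
    PRIMARY KEY (left_id, right_id)
);

-- (b) Surrogate key: an auto-generated id, with the pair kept unique.
CREATE TABLE link_surrogate (
    link_id  INT PRIMARY KEY AUTO_INCREMENT,
    left_id  INT NOT NULL,
    right_id INT NOT NULL,
    UNIQUE KEY (left_id, right_id)
);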
In your example specifically above...
A real world example would be phone numbers, users and companies. A company can have many phone numbers, a user can have many phone numbers, but ideally the user shouldn't provide the same phone number as the company as there would be duplicate content in the DB.
This would be a many to many relationship with the phone number table as the linker table between companies and users. As explained, to ensure no phone number is repeated, you simply set it as the primary key or use another primary key and set the phone number field to unique.
For these kinds of questions, it really comes down to how you phrase them. To overcome the confusion and see the solution, rephrase the problem as follows. Start by asking: is it a one to one? If the answer is no, move on. Next ask: is it a one to many? If the answer is no, move on. The only option remaining is many to many. Be careful though: make sure you have considered the first two questions carefully before moving on. Many inexperienced database people overcomplicate issues by defining a one to many as a many to many. Once again, the most popular type of relationship by far is one to many (I would say 90%), with many to many and one to one splitting the remaining 10% roughly 7/3 respectively. But those figures are just my personal perspective, so don't go quoting them as industry-standard statistics. My point is to make extra, extra sure it is definitely not a one to many before choosing many to many. It is worth the extra effort.
So now, to find the linker table between the two, decide which two are your main tables and what fields need to be shared between them. In this case, the company and user tables both need to share the phone. Hence you need to make a new phone table as the linker.
The warning alarm of misunderstanding should sound as soon as you decide none of the three work for you. This should be enough to tell you that you simply are not phrasing the relationship question correctly. You will get better at it as time passes, but it is an essential skill and really should be mastered as soon as possible, for your own sanity.
Of course, you could also go to an object-oriented database, which would allow a range of other relationships called "hierarchical" relationships. That's great if you are thinking of becoming a programmer too. But I wouldn't recommend it, as it is going to make your head hurt when you start finding ways to combine the various types of relationships. Especially given there is not much need, since nearly all databases in the world consist of just those three types of relationships, unless they are something super-duper special.
Hope this was a reasonable answer. Thanks for taking the time to read it.
Just make phone number a key in your contact numbers table.
For your phone number example, you would put the phone number in a table by itself, with an ID.
Then you link to that phone_id from each of users and companies.
For your parents example, you don't link the child to parent - instead you link the parent to the child. OR, you put both parents in the same table, and the child just links to one of them.
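A sketch of the phone-table approach described just above (names are assumptions; the UNIQUE constraint prevents duplicate numbers, while keeping a number from belonging to both a user and a company would still need an application-level check or a trigger):

CREATE TABLE phone (
    phone_id INT PRIMARY KEY AUTO_INCREMENT,
    number   VARCHAR(20) NOT NULL UNIQUE  -- no duplicate numbers in the DB
);

CREATE TABLE user_phone (
    user_id  INT NOT NULL,
    phone_id INT NOT NULL,
    PRIMARY KEY (user_id, phone_id),
    FOREIGN KEY (phone_id) REFERENCES phone (phone_id)
);

CREATE TABLE company_phone (
    company_id INT NOT NULL,
    phone_id   INT NOT NULL,
    PRIMARY KEY (company_id, phone_id),
    FOREIGN KEY (phone_id) REFERENCES phone (phone_id)
);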

What is the most efficient method of keeping track of each user's "blocked users" in a MySQL Database?

What is the most efficient method of managing blocked users for each user so they don't appear in search results on a PHP/MySQL-run site?
This is the way I am currently doing it and I have a feeling this is not the most efficient way:
Create a BLOB for each user on their main user table that gets updated with the unique User IDs of each user they block. So if User IDs 313, 563, and 732 are blocked by a user, their BLOB simply contains "313,563,732". Then, whenever a search result is queried for that user, I include the BLOB contents like so: "AND UserID NOT IN (313,563,732)", so that the blocked User IDs don't show up for that user. When a user "unblocks" someone, I remove that User ID from their BLOB.
Is there a better way of doing this (I'm sure there is!)? If so, why is it better and what are the pros and cons of your suggestion?
Thanks, I appreciate it!
You are saving relationships in a relational database in a way that it does not understand. You will not have the benefit of foreign keys etc.
My recommended way to do this would be to have a separate table for the blocked users:
create table user_blocked_users (user_id int, blocked_user_id int);
Then when you want to filter the search result, you can simply do it with a subquery:
select * from user u where ?searcherId not in (select b.blocked_user_id from user_blocked_users b where b.user_id = u.id)
You may want to start out that way, and then optimize it with queries, caches or other things if necessary - but do that last. First, create a consistent and correct data model that you can work with.
Some of the pros of this approach:
You will have a correct data model of your block relations
With foreign keys, you will keep your data model consistent
The cons of this approach:
In your case, none that I can see
The cons of your approach:
It will be slow and not scalable, as BLOB contents cannot be indexed and have to be parsed and scanned in their entirety
Your data model will be hard to maintain and you will not have the benefit of foreign keys
You are looking for a cross reference table.
You have a table containing user IDs and "Blocked" user IDs, then you SELECT blockid FROM blocked WHERE uid=$user and you have a list of user ids that are blocked, which you can filter through a where clause such as WHERE uid NOT IN(SELECT blockid FROM blocked WHERE uid=$user)
Now you can block multiple users per user, and the other way round, with all the speed of an actual database.
You are looking for a second table joined in a many-to-many relationship. Check this post:
Many-to-Many Relationships in MySQL
The "Pros" are numerous. You are handling your data with referential integrity, which has incalculable benefits down the road. The issue you described will be followed by others in your application, and some of those others will be more unmanageable than this one.
The "Cons" are that
You will have to learn how referential data works (but that's ahead anyway, as I say)
You will have more tables to deal with (ditto)
You will have to learn more about CRUD, which is difficult ... but, just part of the package.
What you are currently using is not regarded as a good practice for relational database design, however, like with anything else, there are cases when that approach can be justified, albeit restrictive in terms of what you can accomplish.
What you could do is, like J V suggested, create a cross reference table that contains mappings of user relationships. This allows you to, among other things, skip unnecessary queries, make use of table indexes and possibly most importantly, it gives you far greater flexibility in the future.
For instance, you can add a field to the table that indicates the type/status of the relationship (i.e. blocked, friend, pending approval, etc.), which would allow a much more complex system to be developed easily.
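A sketch of that extended cross-reference table and the corresponding search filter (all names are assumptions):

CREATE TABLE user_relationship (
    user_id       INT NOT NULL,
    other_user_id INT NOT NULL,
    status        ENUM('blocked', 'friend', 'pending') NOT NULL,
    PRIMARY KEY (user_id, other_user_id)
);

-- Search filter: exclude everyone this user has blocked.
SELECT u.*
FROM user u
WHERE u.id NOT IN (
    SELECT r.other_user_id
    FROM user_relationship r
    WHERE r.user_id = ? AND r.status = 'blocked'
);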