Social web application database design: how can I improve this schema? - mysql

Background
I am developing a social web app for poets and writers, allowing them to share their poetry, gather feedback, and communicate with other poets. I have very little formal training in database design, but I have been reading books, SO, and online DB design resources in an attempt to ensure performance and scalability without over-engineering.
The database is MySQL, and the application is written in PHP. I'm not sure yet whether we will be using an ORM library or writing SQL queries from scratch in the app. Besides the web application, the Solr search server and maybe a messaging client will interact with the database.
Current Needs
The schema I have thrown together below represents the primary components of the first version of the website. Initially, users can register for the site and do any of the following:
Create and modify profile details and account settings
Post, tag and categorize their writing
Read, comment on and "favorite" other users' posts
"Follow" other users to get notifications of their activity
Search and browse content and get suggested posts/users (though we will be using the Solr search server to index DB data and run these types of queries)
Schema
Here is what I came up with on MySQL Workbench for the initial site. I'm still a little fuzzy on some relational databasey things, so go easy.
Questions
In general, is there anything I'm doing wrong or can improve upon?
Is there any reason why I shouldn't combine the ExternalAccounts table into the UserProfiles table?
Is there any reason why I shouldn't combine the PostStats table into the Posts table?
Should I expand the design to include the features we are doing in the second version just to ensure that the initial schema can support it?
Is there anything I can do to optimize the DB design for Solr indexing/performance/whatever?
Should I be using more natural primary keys, like Username instead of UserID, or zip/area code instead of a surrogate LocationID in the Locations table?
Thanks for the help!

In general, is there anything I'm doing wrong or can improve upon?
Overall, I don't see any big flaws in your current setup or schema.
What I'm wondering about is your split into 3 User* tables. I get what your intention was (keeping different user-related things separate), but I don't know if I would go with exactly the same thing. If you plan on displaying only data from the User table on the site, this is fine, since the other info is not needed multiple times on the same page. But if users need to display their real name (like John Doe instead of doe55), this will slow things down as the data grows, since you may require joins. Having the Preferences separate seems like a personal choice; I have no argument for or against it.
Your many-to-many tables do not need an additional surrogate PK (e.g. PostFavoriteID). A combined primary key on both PostID and UserID is enough, since PostFavoriteID is never used anywhere else. This goes for all join tables.
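As a minimal sketch (the table and column names here are guesses based on the diagram), a favorites join table with a composite primary key would look like this:

CREATE TABLE PostFavorites (
  PostID INT NOT NULL,
  UserID INT NOT NULL,
  PRIMARY KEY (PostID, UserID),
  FOREIGN KEY (PostID) REFERENCES Posts (PostID),
  FOREIGN KEY (UserID) REFERENCES Users (UserID)
);

The composite key also doubles as the index for "all users who favorited this post" lookups; add a separate index on UserID if you often query in the other direction.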
Is there any reason why I shouldn't combine the ExternalAccounts table into the UserProfiles table?
As with the previous answer, I don't see a clear advantage or disadvantage. I might put both in the same table, since the NULL (or maybe better, -1) values would not bother me.
Is there any reason why I shouldn't combine the PostStats table into the Posts table?
I would put them into the same table, using a trigger to handle incrementing the ViewCount column.
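One caveat: MySQL triggers can't fire on SELECTs, so a pure view counter still needs an explicit UPDATE from the application. For counters driven by writes, though, a trigger works; here is a sketch that assumes a PostComments table and a CommentCount column on Posts (both names are hypothetical):

-- keep a comment counter on Posts up to date
CREATE TRIGGER trg_post_comment_count
AFTER INSERT ON PostComments
FOR EACH ROW
  UPDATE Posts SET CommentCount = CommentCount + 1 WHERE PostID = NEW.PostID;

-- a view counter would instead be bumped by the application on each page view:
UPDATE Posts SET ViewCount = ViewCount + 1 WHERE PostID = ?;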
Should I expand the design to include the features we are doing in the second version just to ensure that the initial schema can support it?
You are using a normalised schema, so any additions can be done at any time.
Is there anything I can do to optimize the DB design for Solr indexing/performance/whatever?
Can't tell you; I haven't done it yet. But I know that Solr is very powerful and flexible, so I think you should be fine.
Should I be using more natural primary keys, like Username instead of UserID, or zip/area code instead of a surrogate LocationID in the Locations table?
There are many threads here on SO discussing this. Personally, I like a surrogate key better (or another unique numeric key, if available), since it makes queries easier and faster; an int is looked up more quickly. If you allow a change of username/email/whatever-your-PK-is, then massive updates are required. With a surrogate key, you don't need to bother.
What I would also do is add things like created_at and last_accessed_at (best done via triggers or procedures, IMO) to have some stats already available. This can really give you valuable stats.
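A sketch of adding those columns, assuming a Users table (last_accessed_at is a DATETIME here because older MySQL versions allow DEFAULT CURRENT_TIMESTAMP on only one TIMESTAMP column per table):

ALTER TABLE Users
  ADD COLUMN created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  ADD COLUMN last_accessed_at DATETIME NULL;

-- maintained by the application (or a login procedure) on each login:
UPDATE Users SET last_accessed_at = NOW() WHERE UserID = ?;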
Further strategies to increase performance would be things like memcache, counter caches, partitioned tables, and so on. Such things can be discussed when you are really overrun by users, because there may be things/technologies/techniques that are very specific to your problem.

I'm not clear what's going on with your User* tables - they're set up as if they're 1:1 but the diagram reflects 1-to-many (the crow's foot symbol).
The ExternalAccounts and UserSettings could be normalised further (in which case they would then be 1-to-many!), which will give you a more maintainable design - you wouldn't need to add further columns to your schema for additional External Account or Notification Types (although this may be less scalable in terms of performance).
For example:
CREATE TABLE ExternalAccounts (
  UserId INT NOT NULL,
  AccountType VARCHAR(45) NOT NULL,
  AccountIdentifier VARCHAR(45) NOT NULL,
  PRIMARY KEY (UserId, AccountType)
);
This will allow you to store LinkedIn, Google, etc. accounts in the same structure.
Similarly, further Notification Types can be readily added using a structure like:
CREATE TABLE UserSettings (
  UserId INT NOT NULL,
  NotificationType VARCHAR(45) NOT NULL,
  NotificationFlag ENUM('on','off') NOT NULL,
  PRIMARY KEY (UserId, NotificationType)
);
hth

Related

Is it good or bad practice to have multiple foreign keys in a single table, when the other tables can be connected using joins?

Let's say I wanted to make a database that could be used to keep track of bank accounts and transactions for a user. A database that can be used in a Checkbook application.
If i have a user table, with the following properties:
user_id
email
password
And then I create an account table, which can be linked to a certain user:
account_id
account_description
account_balance
user_id
And to go the next step, I create a transaction table:
transaction_id
transaction_description
is_withdrawal
account_id // The account to which this transaction belongs
user_id // The user to which this transaction belongs
Is having the user_id in the transaction table a good option? It would make the query cleaner if I wanted to get all the transactions for each user, such as:
SELECT * FROM transactions
JOIN users ON users.user_id = transactions.user_id
Or, I could just trace back to the users table from the account table
SELECT * FROM transactions
JOIN accounts ON accounts.account_id = transactions.account_id
JOIN users ON users.user_id = accounts.user_id
I know the first query is much cleaner, but is that the best way to go?
My concern is that by having this extra (redundant) column in the transaction table, I'm wasting space, when I can achieve the same result without said column.
Let's look at it from a different angle: from where will the query or series of queries start? If you have customer info, you can get account info and then transaction info, or just transactions-per-customer. You need all three tables for meaningful information. If you have account info, you can get transaction info and a pointer to the customer, but to get any customer info you need to go to the customer table, so you still need all three tables. If you have transaction info, you could get account info, but that is meaningless without customer info; or you could get customer info without account info, but transactions-per-customer is useless noise without account data.
Either way you slice it, the information you need for any conceivable use is split up between three tables and you will have to access all three to get meaningful information instead of just a data dump.
Having the customer FK in the transaction table may provide you with a way to make a "clean" query, but the result of that query is of doubtful usefulness. So you've really gained nothing. I've worked writing Anti-Money Laundering (AML) scanners for an international credit card company, so I'm not being hypothetical. You're always going to need all three tables anyway.
Btw, the fact that there are FKs in the first place tells me the question concerns an OLTP environment. An OLAP environment (data warehouse) doesn't need FKs or any other data integrity checks, as warehouse data is static; it originates from an OLTP environment where the data integrity checks have already been made, so there you can denormalize to your heart's content. So let's not give answers applicable to an OLAP environment to a question concerning an OLTP environment.
You should not use two foreign keys in the same table. This is not a good database design.
A user makes transactions through an account. That is how it is logically done; therefore, this is how the DB should be designed.
Using joins is how this should be done. You should not use the user_id key as it is already in the account table.
The wasted space is unnecessary and is a bad database design.
In my opinion, if you have a simple many-to-many relation, just use a composite primary key made of the two foreign keys, and that's all.
Otherwise, if you have a many-to-many relation with extra columns, use one surrogate primary key and two foreign keys. It's easier to manage such a table as a single entity, which is just what Doctrine does. Generally speaking, simple many-to-many relations are rare; they are useful just for linking two tables.
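A sketch of the second case, a many-to-many relation with extra columns (all names here are hypothetical): a surrogate id serves as the primary key, while a unique constraint still enforces the pairing:

CREATE TABLE memberships (
  id INT AUTO_INCREMENT PRIMARY KEY,
  user_id INT NOT NULL,
  group_id INT NOT NULL,
  joined_at DATETIME NOT NULL,              -- the "extra column"
  UNIQUE KEY uq_user_group (user_id, group_id),
  FOREIGN KEY (user_id) REFERENCES users (user_id),
  FOREIGN KEY (group_id) REFERENCES user_groups (group_id)
);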
Denormalizing is usually a bad idea. In the first place, it is often not faster from a performance standpoint. What it does is put data integrity at risk, and it can create massive problems if you end up changing from a 1-1 relationship to a 1-many.
For instance, what is to say that each account will have only one user? In your table design, that is all you would get, which is something I find suspicious right off the bat. Accounts in my system can have thousands of users. So that is the first place I question your model. Did you actually think in terms of whether the relationships would be 1-1 or 1-many? Or did you just make an assumption? Data models are NOT easy to adjust after you have millions of records; you need to do far more planning for the future in database design, and far more thinking about the data needs over time, than you do in application design.
But suppose you have a one-to-one relationship now. Three months after you go live, you get a new account that needs to have 3 users. Now you have to remember all the places you denormalized in order to properly fix the data. This can create much confusion, as inevitably you will forget some of them.
Further, even if you never need to move to a more robust model, how are you going to maintain this if the user_id changes, as they often do? Now, in order to keep data integrity, you need a trigger to maintain the data as it changes. Worse, if the data can be changed from either table, you could get conflicting changes. How do you handle those?
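To make that burden concrete, here is a sketch of the kind of trigger the denormalized design forces on you (table names from the question; this illustrates the maintenance cost, it is not a recommendation):

-- keep the redundant transactions.user_id in sync on insert
CREATE TRIGGER trg_transactions_set_user
BEFORE INSERT ON transactions
FOR EACH ROW
  SET NEW.user_id = (SELECT a.user_id FROM accounts a
                     WHERE a.account_id = NEW.account_id);

-- and you would still need a second trigger on accounts to propagate
-- user_id changes into existing transaction rows.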
So you have created a maintenance mess and possibly risked your data integrity, all to write "cleaner" code and save yourself all of ten seconds writing a join? You gain nothing in terms of the things that are important in database development, such as performance, security, or data integrity, and you risk a lot. How short-sighted is that?
You need to stop thinking in terms of "cleaner code" when developing for databases. Often the best code for a query is the most complex-looking, because it is the most performant, and that is critical for databases. Don't project object-oriented coding techniques onto database development; they are two very different things with very different needs. You need to start thinking in terms of how this will play out as the data changes, which you clearly are not doing, or you would not even consider doing such a thing. You need to think more about the meaning of the data and less about the "principles of software development", which are taught as if they apply to everything but in reality do not apply well to databases.
It depends. If you can get the data fast enough, use the normalized version (where user_id is NOT in the transaction table). If you are worried about performance, go ahead and include user_id. It will use up more space in the database by storing redundant information, but you will be able to return the data faster.
EDIT
There are several factors to consider when deciding whether or not to denormalize a data structure. Each situation needs to be considered uniquely; no answer is sufficient without looking at the specific situation (hence the "It depends" that begins this answer). For the simple case above, denormalization would probably not be an optimal solution.

mysql table with 40+ columns

I have 40+ columns in my table and I have to add a few more fields like current city, hometown, school, work, uni, college...
This user data will be pulled for many matching users who are mutual friends (joining the friend table with other users' friends to find mutual friends), who are not blocked, and who are not already friends with the user.
The above request is a little complex, so I thought it would be a good idea to put the extra data in the same user table for fast access, rather than adding more joins, which would slow the query down further. But I wanted to get your suggestions on this.
My friend told me to store the extra fields, which won't be searched on, as serialized data in one field.
ERD Diagram:
My current table: http://i.stack.imgur.com/KMwxb.png
If i join into more tables: http://i.stack.imgur.com/xhAxE.png
Some Suggestions
nothing wrong with this table and columns
follow this approach: MySQL: Optimize table with lots of columns - which serializes the extra, non-searchable fields into one field
create another table and put most of the data there (this makes joins harder, as I already have 3 or more tables to join to pull the records for users, e.g. friends, user, mutual-friend checks)
As usual - it depends.
Firstly, there is a maximum number of columns MySQL can support, and you don't really want to get there.
Secondly, there is a performance impact when inserting or updating if you have lots of columns with an index (though I'm not sure if this matters on modern hardware).
Thirdly, large tables are often a dumping ground for all data that seems related to the core entity; this rapidly makes the design unclear. For instance, the design you present shows 3 different "status" type fields (status, is_admin, and fb_account_verified) - I suspect there's some business logic that should link those together (an admin must be a verified user, for instance), but your design doesn't support that.
This may or may not be a problem - it's more a conceptual, architecture/design question than a performance/will it work thing. However, in such cases, you may consider creating tables to reflect the related information about the account, even if it doesn't have an x-to-many relationship. So, you might create "user_profile", "user_credentials", "user_fb", "user_activity", all linked by user_id.
This makes it neater, and if you have to add more facebook-related fields, they won't dangle at the end of the table. It won't make your database faster or more scalable, though. The cost of the joins is likely to be negligible.
Whatever you do, option 2 - serializing "rarely used fields" into a single text field - is a terrible idea. You can't validate the data (so dates could be invalid, numbers might be text, not-nulls might be missing), and any use in a "where" clause becomes very slow.
A popular alternative is "Entity/Attribute/Value" or "Key/Value" stores. This solution has some benefits - you can store your data in a relational database even if your schema changes or is unknown at design time. However, they also have drawbacks: it's hard to validate the data at the database level (data type and nullability), it's hard to make meaningful links to other tables using foreign key relationships, and querying the data can become very complicated - imagine finding all records where the status is 1 and the facebook_id is null and the registration date is greater than yesterday.
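To see how complicated, here is a sketch of that exact query against a hypothetical user_attributes(user_id, attr_key, attr_value) table:

SELECT u.user_id
FROM users u
JOIN user_attributes s
  ON s.user_id = u.user_id AND s.attr_key = 'status' AND s.attr_value = '1'
LEFT JOIN user_attributes f
  ON f.user_id = u.user_id AND f.attr_key = 'facebook_id'
JOIN user_attributes r
  ON r.user_id = u.user_id AND r.attr_key = 'registered_at'
WHERE f.user_id IS NULL                                      -- "facebook_id is null" means the row is absent
  AND r.attr_value > DATE_SUB(CURDATE(), INTERVAL 1 DAY);    -- comparing a string column to a date: another pitfall

Every extra condition costs another self-join, and none of the values are typed.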
Given that you appear to know the schema of your data, I'd say "key/value" is not a good choice.
I would advise running some tests. Try it both ways and benchmark it. Nobody will be able to give you a definitive answer, because you have not shared your hardware configuration, sample data, sample queries, how you plan on using the data, etc. Here is some information that you may want to consider.
Use The Database as it was intended
A relational database is designed specifically to handle data. Use it as such. When written correctly, joining data in a well written schema will perform well. You can use EXPLAIN to optimize queries. You can log SLOW queries and improve their performance. Databases have been around for years, if putting everything into a single table improved performance, don't you think that would be all the buzz on the internet and everyone would be doing it?
Engine Types
How will inserts be affected as the row count grows? Are you using MyISAM or InnoDB? You will most likely want to use InnoDB so you get row-level locking instead of table-level locking. Make sure you are using the correct engine type for your tables. Get the information you need to understand the pros and cons of both. The wrong engine type can kill performance.
Enhancing Performance using Partitions
Find ways to enhance performance. For example, as your datasets grow, you could partition the data. Data partitioning will improve the performance of a large dataset by keeping slices of the data in separate partitions, allowing you to run queries on parts of a large dataset instead of all of the information.
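A sketch of range partitioning, assuming a hypothetical activity-log table partitioned by year:

CREATE TABLE user_activity (
  id INT NOT NULL AUTO_INCREMENT,
  user_id INT NOT NULL,
  created_at DATE NOT NULL,
  PRIMARY KEY (id, created_at)    -- MySQL requires the partition key in every unique key
)
PARTITION BY RANGE (YEAR(created_at)) (
  PARTITION p2011 VALUES LESS THAN (2012),
  PARTITION p2012 VALUES LESS THAN (2013),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

Queries that filter on created_at will then only touch the relevant partitions.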
Use correct column types
Consider using UUID primary keys for portability and future growth. Using proper column types will improve the performance of your database.
Do not serialize data
Using serialized data is the worst way to go. When you use serialized fields, you are basically using the database as a file management system. It will save and retrieve the "file", but then your code is responsible for unserializing, searching, sorting, etc. I just spent a year trying to unravel a mess like that. It's not what a database was intended to be used for. Anyone advising you to do that is not only giving you bad advice, they do not know what they are doing. There are very few circumstances where you would use serialized data in a database.
Conclusion
In the end, you have to make the final decision. Just make sure you are well informed and educated on the pros and cons of how you store data. The last piece of advice I would give is to find out what heavy users of mysql are doing. Do you think they store data in a single table? Or do they build a relational model and use it the way it was designed to be used?
When you say "I am going to put everything into a single table", you are saying that you know more about performance and can make better choices for optimization in your code than the team of developers that constantly work on MySQL to make it what it is today. Consider weighing your knowledge against the cumulative knowledge of the MySQL team and the DBAs, companies, and members of the database community who use it every day.
At a certain point you should look at the "short row model", also known as the entity-key-value store, as well as the traditional "long row model".
If you look at the schema used by WordPress you will see that there is a table wp_posts with 23 columns and a related table wp_post_meta with 4 columns (meta_id, post_id, meta_key, meta_value). The meta table is a "short row model" table that allows WordPress to have an infinite collection of attributes for a post.
Neither the "long row model" or the "short row model" is the best model, often the best choice is a combination of the two. As #nevillek pointed out searching and validating "short row" is not easy, fetching data can involve pivoting which is annoyingly difficult in MySql and Oracle.
The "long row model" is easier to validate, relate and fetch, but it can be very inflexible and inefficient when the data is sparse. Some rows may have only a few of the values non-null. Also you can't add new columns without modifying the schema, which could force a system outage, depending on your architecture.
I recently worked on a financial services system that had over 700 possible facts for each instrument, though most instruments had fewer than 20. This could have been built by setting up dozens of tables, each for a particular asset class, or as a table with 700 columns, but we chose a combination: a table with about 20 columns containing the most popular facts, and a 4-column table containing the other facts. This design was efficient but difficult to access, so we built a few table functions in PL/SQL to assist with this.
I have a general comment for you,
Think about it: if you put more than 10-12 columns in a table, even when it seems to make sense, I suspect you are going to pay the price in the short, medium, and long term.
Your 3 tables approach seems to be better than the 1 table approach, but consider making those into 5-6 tables rather than 3 tables because you still can.
Move currently, currently_position, and currently_link from the user table, and work from user-profile, into a new table called USERWORKPROFILE with its own primary key.
Move the locale information from user-profile into a new USERPROFILELOCALE table, because it is generic in nature.
And yes, all your generic attributes in all the tables should be int and not varchar.
For instance, City needs to move out to a new table called LIST_OF_CITIES with cityid.
And your attribute city should change from varchar to int and point to cityid in LIST_OF_CITIES.
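A sketch of that change (LIST_OF_CITIES and cityid come from the suggestion above; user_profile and city_name are guesses):

CREATE TABLE LIST_OF_CITIES (
  cityid INT AUTO_INCREMENT PRIMARY KEY,
  city_name VARCHAR(100) NOT NULL UNIQUE
);

-- then replace the varchar city column with an int FK:
ALTER TABLE user_profile
  ADD COLUMN cityid INT,
  ADD FOREIGN KEY (cityid) REFERENCES LIST_OF_CITIES (cityid);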
Do not worry about performance issues; the more tables you have, the better the performance, because you are effectively handing the work over to the database engine instead of taking it all into your own hands.

Best table structure for users with different roles

We are working on a website which will feature about 5 different user roles, each with different properties. In the current version of the database schema we have a single users table which holds all the users, and all of their properties.
The problem is that the properties that we need differ per user role. All users have the same basis properties, like a name, e-mail address and password. But on top of that the properties differ per role. Some have social media links, others have invoice addresses, etc. In total there may be up to 60 columns (properties), of which only a portion are used by each user role.
In total we may have about 250,000 users in the table, of which the biggest portion (about 220,000) will be of a single user role (and use about 20 of the 60 columns). The other 30,000 users are divided over the four other roles and use a subset of the other 40 columns.
What is the best database structure for this, both from a DB and a development perspective? My idea is to have a base users table, and then extend on that with tables like users_moderators, but this may lead to a lot of JOINed queries. A way to prevent this is by using VIEWs, but I've read some (outdated?) articles saying that VIEWs may hurt performance, like: http://www.mysqlperformanceblog.com/2007/08/12/mysql-view-as-performance-troublemaker/.
Does the 'perfect' structure even exist? Any suggestions? Or isn't this really a problem at all, and should we just put all users in a single big table?
There are two different ways to go about this. One is called "Single Table Inheritance". This is basically the design you ask for comments on. It's pretty fast because there are no joins. However, NULLs can affect throughput to a small degree, because fat rows take a little longer to bring into memory than thinner rows.
An alternative design is called "Class Table Inheritance". In this design, there is one table for the super class and one table for each subclass. Non key attributes go into the table where they pertain. Often, a design called "Shared Primary Key" can be used with this design. In shared primary key, the key ids of the subclass tables are copies of the id from the corresponding row in the superclass table.
It's a little work at insert time, but it pays for itself when you go to join data.
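A minimal sketch of class table inheritance with a shared primary key, using hypothetical names (a moderators subclass of users):

CREATE TABLE users (
  user_id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  email VARCHAR(100) NOT NULL
);

CREATE TABLE moderators (
  user_id INT PRIMARY KEY,          -- shared primary key: a copy of users.user_id
  moderation_area VARCHAR(45),
  FOREIGN KEY (user_id) REFERENCES users (user_id)
);

-- joining generalized and specialized data is then a simple PK-to-PK join:
SELECT u.name, m.moderation_area
FROM users u
JOIN moderators m ON m.user_id = u.user_id;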
You should look up all three of these in SO (they have their own tags) or out on the web. You'll get more details on the design, and an indication of how well each design fits your case.
The 'perfect' structure for such cases, in my opinion, is the party-role-relationship model. Search for Len Silverston's books about data models. It looks quite complicated at the beginning, but it gives great flexibility...
The biggest question is the practicability of adopting the perfect solution. Nobody except you can answer that. Refactoring is never an easy or fast task, so if your project lifetime is 1 year, spending 9 months paying off 'technical debt' sounds more like a waste of time and effort.
As for the performance of joins, having proper indexes usually solves potential issues. If not, you can always implement a materialized view; even though MySQL doesn't have that option out of the box, you can design it yourself and refresh it in different ways (for instance, using triggers, or launching a refresh procedure periodically or on demand).
table user
table roles
table permissions
table userRole
table userPermission
table RolesPermissions
Each role has its permissions in the RolesPermissions table.
Each user can also have a permission without the role (an extension).
So in PHP you just have to merge the array of permissions from the user's roles with the array of extended user-specific permissions.
And in your "acl" class, you check whether the user has permission to view a page or run a system process.
I think you don't need to worry about speed here so much.
Because it will be a one-time thing only, i.e. on user login, store the ACL in the session and get it from there next time.
JOINs are not so bad. If you have your indexes and foreign keys in right places with InnoDB engine it will be really fast.
I would use one table for users and role_id. Second table with roles. Third table for resources, and one to link it all together + enabled flag.

(Somewhat) complicated database structure vs. simple — with null fields

I'm currently choosing between two different database designs: one complicated, which separates the data better, and one simpler. The complicated design will require more complex queries, while the simpler one will have a couple of null fields.
Consider the examples below:
Complicated:
Simpler:
The above examples are for separating regular users and Facebook users (they will eventually access the same data, but log in differently). In the first example, the data is clearly separated. The second example is way simpler, but will have at least one null field per row: facebookUserId will be null for a normal user, while username and password will be null for a Facebook user.
My question is: what's preferred? Pros/cons? Which one is easiest to maintain over time?
First, what Kirk said. It's a good summary of the likely consequences of each alternative design. Second, it's worth knowing what others have done with the same problem.
The case you outline is known in ER modeling circles as "ER specialization". ER specialization is just different wording for the concept of subclasses. The diagrams you present are two different ways of implementing subclasses in SQL tables. The first goes under the name "Class Table Inheritance". The second goes under the name "Single Table Inheritance".
If you do go with Class table inheritance, you will want to apply yet another technique, that goes under the name "shared primary key". In this technique, the id fields of facebookusers and normalusers will be copies of the id field from users. This has several advantages. It enforces the one-to-one nature of the relationship. It saves an extra foreign key in the subclass tables. It automatically provides the index needed to make the joins run faster. And it allows a simple easy join to put specialized data and generalized data together.
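A sketch of that layout using the table names from the question (column types are guesses):

CREATE TABLE users (
  id INT AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(100) NOT NULL
);

CREATE TABLE facebookusers (
  id INT PRIMARY KEY,               -- a copy of users.id (shared primary key)
  facebookUserId BIGINT NOT NULL,
  FOREIGN KEY (id) REFERENCES users (id)
);

CREATE TABLE normalusers (
  id INT PRIMARY KEY,               -- likewise a copy of users.id
  username VARCHAR(45) NOT NULL,
  password VARCHAR(255) NOT NULL,
  FOREIGN KEY (id) REFERENCES users (id)
);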
You can look up "ER specialization", "single-table-inheritance", "class-table-inheritance", and "shared-primary-key" as tags here in SO. Or you can search for the same topics out on the web. The first thing you will learn is what Kirk has summarized so well. Beyond that, you'll learn how to use each of the techniques.
Great question.
This applies to any abstraction you might choose to implement, whether in code or database. Would you write a separate class for the Facebook user and the 'normal' user, or would you handle the two cases in a single class?
The first option is the more complicated. Why is it complicated? Because it's more extensible. You could easily include additional authentication methods (a table for Twitter IDs, for example), or extend the Facebook table to include... some other facebook specific information. You have extracted the information specific to each authentication method into its own table, allowing each to stand alone. This is great!
The trade off is that it will take more effort to query, it will take more effort to select and insert, and it's likely to be messier. You don't want a dozen tables for a dozen different authentication methods. And you don't really want two tables for two authentication methods unless you're getting some benefit from it. Are you going to need this flexibility? Authentication methods are all similar - they'll have a username and password. This abstraction lets you store more method-specific information, but does that information exist?
The second option is just the reverse of the first: easier, but how will you handle future authentication methods, and what if you need to add some authentication-method-specific information?
Personally I'd try to evaluate how important this authentication component is to the system. Remember YAGNI - you aren't gonna need it - and don't overdesign. Unless you need that extensibility that the first option provides, go with the second. You can always extract it at a later date if necessary.
This depends on the database you are using. For example, Postgres has table inheritance that would be great for your example; have a look here:
http://www.postgresql.org/docs/9.1/static/tutorial-inheritance.html
Now if you do not have table inheritance you could still create views to simplify your queries, so the "complicated" example is a viable choice here.
Now, if you had infinite time, I would go for the first one (for this simple example, and preferably with table inheritance).
However, this makes things more complicated and so will cost you more time to implement and maintain. If you have many table hierarchies like this, it can also have a performance impact (as you have to join many tables). I once developed a database schema that made excessive use of such hierarchies (conceptually). We finally decided to keep the hierarchies conceptually but flatten them in the implementation, as it had become so complex that it was not maintainable anymore.
When you flatten the hierarchy, you might consider not using null values, as these can also make things a lot harder (alternatively, you can use a -1 or something similar).
Hope these thoughts help you!
Warning bells are ringing loudly with the presence of the two very similar tables facebookusers and normalusers. What if you get a 3rd type? Or a 10th? This is insane.
There should be one user table with an attribute column to show the type of user. A user is a user.
Keep the data model as simple as you possibly can. Don't build too much kung fu into it via the data structure. Leave that for the application, which is far easier to alter than a database!
Let me dare to suggest a third option. You could introduce 1 (or 2) tables that cater for extensibility. I personally try to avoid designs that pollute an entity model with non-uniformly applicable columns. Have the third table (after the fashion of the EAV model) hold a many-to-one relationship with your users table to cater for multiple/variable user-related fields.
I'm not sure what your current/short-term needs are, but re-engineering your app later to cater for, say, Twitter or LinkedIn users might be painful. You can abstract the content of the facebookUserId column into an attribute table like so:
CREATE TABLE user_attr (
  id INT AUTO_INCREMENT PRIMARY KEY,
  user_id INT NOT NULL,
  login_id VARCHAR(100),    -- type/length are guesses
  FOREIGN KEY (user_id) REFERENCES users (id)
);
Now, the above definition is ambiguous enough to handle your current needs. If done right, the EAV should look more like this:
CREATE TABLE user_attr (
  id INT AUTO_INCREMENT PRIMARY KEY,
  user_id INT NOT NULL,
  login_id VARCHAR(100),                 -- type/length are guesses
  login_id_type INT NOT NULL,            -- FK to an attribute table listing login types
  login_id_status TINYINT(1) NOT NULL,   -- simple boolean flag to set the validity of a given login
  FOREIGN KEY (user_id) REFERENCES users (id)
);
Here login_id_type is a foreign key to an attribute table listing the various login types you currently support. This gives you and your users flexibility, in that your users can have multiple logins using different external services without you having to change much of your existing system.

What is the most efficient method of keeping track of each user's "blocked users" in a MySQL Database?

What is the most efficient method of managing blocked users for each user so they don't appear in search results on a PHP/MySQL-run site?
This is the way I am currently doing it and I have a feeling this is not the most efficient way:
Create a BLOB for each user on their main user table that gets updated with the unique user IDs of each user they block. So if user IDs 313, 563, and 732 are blocked by a user, their BLOB simply contains "313,563,732". Then, whenever a search result is queried for that user, I include the BLOB contents like so: "AND UserID NOT IN (313,563,732)", so that the blocked user IDs don't show up for that user. When a user "unblocks" someone, I remove that user ID from their BLOB.
Is there a better way of doing this (I'm sure there is!)? If so, why is it better and what are the pros and cons of your suggestion?
Thanks, I appreciate it!
You are saving relationships in a relational database in a way that it does not understand. You will not have the benefit of foreign keys etc.
My recommended way to do this would be to have a separate table for the blocked users:
create table user_blocked_users (user_id int, blocked_user_id int);
Then when you want to filter the search result, you can simply do it with a subquery:
select * from user u where ?searcherId not in (select b.blocked_user_id from user_blocked_users b where b.user_id = u.id)
You may want to start out that way, and then optimize it with queries, caches or other things if necessary - but do it last. First, build a consistent and correct data model that you can work with.
Some of the pros of this approach:
You will have a correct data model of your block relations
With foreign keys, you will keep your data model consistent
The cons of this approach:
In your case, none that I can see
The cons of your approach:
It will be slow and not scalable, as the BLOB contents cannot be indexed and must be fetched and parsed on every query
Your data model will be hard to maintain and you will not have the benefit of foreign keys
You are looking for a cross reference table.
You have a table containing user IDs and "blocked" user IDs; then you SELECT blockid FROM blocked WHERE uid=$user and you have a list of user IDs that are blocked, which you can filter through a where clause such as WHERE uid NOT IN (SELECT blockid FROM blocked WHERE uid=$user)
Now you can block multiple users per user, and the other way round, with all the speed of an actual database.
You are looking for a second table joined in a many-to-many relationship. Check this post:
Many-to-Many Relationships in MySQL
The "Pros" are numerous. You are handling your data with referential integrity, which has incalculable benefits down the road. The issue you described will be followed by others in your application, and some of those others will be more unmanageable than this one.
The "Cons" are that
You will have to learn how referential data works (but that's ahead anyway, as I say)
You will have more tables to deal with (ditto)
You will have to learn more about CRUD, which is difficult ... but, just part of the package.
What you are currently using is not regarded as a good practice for relational database design, however, like with anything else, there are cases when that approach can be justified, albeit restrictive in terms of what you can accomplish.
What you could do is, like J V suggested, create a cross-reference table that contains mappings of user relationships. This allows you to, among other things, skip unnecessary queries, make use of table indexes and, possibly most importantly, gain far greater flexibility in the future.
For instance, you can add a field to the table that indicates the type/status of the relationship (i.e. blocked, friend, pending approval, etc.), which would allow a much more complex system to be developed easily.
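A sketch of such a table (the names here are hypothetical):

CREATE TABLE user_relationships (
  user_id INT NOT NULL,
  other_user_id INT NOT NULL,
  relationship ENUM('blocked','friend','pending') NOT NULL,
  PRIMARY KEY (user_id, other_user_id),
  FOREIGN KEY (user_id) REFERENCES users (user_id),
  FOREIGN KEY (other_user_id) REFERENCES users (user_id)
);

The same table then answers block checks, friend lists, and pending requests with one indexed lookup each.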