Defining a two-way link - mysql

I have a users table, and I want to define a "friends" relationship between two arbitrary users.
Up until now, I've used two different methods for this:
The friends table contains user1 and user2. Searching for users involves a query that looks like
... WHERE @userid IN (`user1`,`user2`), which is not terribly efficient.
The friends table contains from and to fields. Initiating a friend request creates a row in that direction, and if it is accepted then a second row is inserted with the opposite direction. There is additionally a status column that indicates that this has happened, making the search something like:
... WHERE `user1`=@userid AND `status`=1
I'm not particularly satisfied with either of these solutions. The first one feels messy with that IN usage, and the second seems bloated having two rows to define a single link.
So that's why I'm here. What would you suggest for such a link? Note that I don't need any more information saved with it, I just need two user IDs associated with each other, and preferably some kind of status like ENUM('pending','accepted','blocked'), but that's optional depending on what the best design for this is.

There are in general two approaches:
Store each friend pair once, storing the friend with the least id first.
CREATE TABLE
friend
(
l INT NOT NULL,
g INT NOT NULL,
PRIMARY KEY
(l, g),
KEY (g)
)
Store each friend pair twice, both ways:
CREATE TABLE
friend
(
user INT NOT NULL,
friend INT NOT NULL,
PRIMARY KEY
(user, friend)
)
To store additional fields like friendship status, acceptance dates etc. you usually utilize a second table, for reasons I'll describe below.
To retrieve a list of friends for each user, you do:
SELECT CASE @myuserid WHEN l THEN g ELSE l END
FROM friend
WHERE l = @myuserid
OR
g = @myuserid
or
SELECT g
FROM friend
WHERE l = @myuserid
UNION
SELECT l
FROM friend
WHERE g = @myuserid
for the first solution; and
SELECT friend
FROM friend
WHERE user = @myuserid
To check if two users are friends, you issue this:
SELECT NULL
FROM friend
WHERE (l, g) =
(
CASE WHEN @user1 < @user2 THEN @user1 ELSE @user2 END,
CASE WHEN @user1 > @user2 THEN @user1 ELSE @user2 END
)
or
SELECT NULL
FROM friend
WHERE (user, friend) = (@user1, @user2)
Storage-wise, the two solutions are almost the same. The first (least/greatest) solution stores half as many rows; however, for it to work fast you should have a secondary index on g, which in fact has to store g plus the part of the table's primary key that is not in the secondary index (that is, l). Thus, each record is effectively stored twice: once in the table itself, and once again in the index on g.
Performance-wise, the solutions are almost the same too. The first one, though, requires two index seeks followed by index scans (for "all friends"), while the second needs just one index seek, so for the L/G solution the I/O amount might be slightly higher. This might be mitigated a little by the fact that the one single index may become one level deeper than two independent ones, so the initial search may take one more page read. This may slow down the "are they friends" query a little for the "both pairs" solution, compared to L/G.
As for the additional table for extra data, you most probably want it because it's usually used much less than the two queries I described above (and usually only for history purposes).
Its layout also depends on the kind of queries you are using. Say, if you want "show my last ten friendships", then you may want to store the timestamp in "both pairs" so that you don't have to do filesorts, etc.
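As a rough illustration of the first (least/greatest) layout, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL, purely so it runs without a server. The table and column names (friend, l, g) follow the answer above; the helper names add_friendship, friends_of and are_friends are hypothetical, and INSERT OR IGNORE is SQLite syntax (MySQL spells it INSERT IGNORE).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE friend (
        l INTEGER NOT NULL,          -- lesser user id of the pair
        g INTEGER NOT NULL,          -- greater user id of the pair
        PRIMARY KEY (l, g)
    );
    CREATE INDEX friend_g ON friend (g);
""")

def add_friendship(a, b):
    """Store the pair once, always with the smaller id first (hypothetical helper)."""
    conn.execute("INSERT OR IGNORE INTO friend (l, g) VALUES (?, ?)",
                 (min(a, b), max(a, b)))

def friends_of(uid):
    """Union of two lookups: rows where uid is l, and rows where uid is g."""
    rows = conn.execute(
        "SELECT g FROM friend WHERE l = ? UNION SELECT l FROM friend WHERE g = ?",
        (uid, uid))
    return sorted(r[0] for r in rows)

def are_friends(a, b):
    """Single primary-key lookup on the normalized pair."""
    row = conn.execute("SELECT 1 FROM friend WHERE l = ? AND g = ?",
                       (min(a, b), max(a, b))).fetchone()
    return row is not None

add_friendship(3, 1)
add_friendship(1, 4)
add_friendship(4, 1)   # same pair in the other direction: ignored as a duplicate
print(friends_of(1))   # [3, 4]
print(are_friends(4, 1), are_friends(3, 4))  # True False
```

Normalizing the pair in one helper keeps the ordering rule in a single place, so callers never need to know which id is smaller.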

Consider the following schema:
CREATE TABLE `users` (
`uid` int(10) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(30) NOT NULL,
PRIMARY KEY (`uid`)
);
INSERT INTO `users` (`uid`, `username`) VALUES
(1, 'h2ooooooo'),
(2, 'water'),
(3, 'liquid'),
(4, 'wet');
CREATE TABLE `friends` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`uid_from` int(10) unsigned NOT NULL,
`uid_to` int(10) unsigned NOT NULL,
`status` enum('pending','accepted','blocked') NOT NULL,
PRIMARY KEY (`id`),
KEY `uid_from` (`uid_from`),
KEY `uid_to` (`uid_to`)
);
INSERT INTO `friends` (`id`, `uid_from`, `uid_to`, `status`) VALUES
(1, 1, 3, 'accepted'), -- h2ooooooo sent a friend request to liquid - accepted
(2, 1, 2, 'pending'), -- h2ooooooo sent a friend request to water - pending
(3, 4, 1, 'pending'), -- wet sent a friend request to h2ooooooo - pending
(4, 4, 2, 'pending'), -- wet sent a friend request to water - pending
(5, 3, 4, 'accepted'); -- liquid sent a friend request to wet - accepted
I'd use something like the following:
SELECT
fu.username as `friend_username`,
fu.uid as `friend_uid`
FROM
`users` as `us`
LEFT JOIN
`friends` as `fr`
ON
(fr.uid_from = us.uid OR fr.uid_to = us.uid)
LEFT JOIN
`users` as `fu`
ON
(fu.uid = fr.uid_from OR fu.uid = fr.uid_to)
WHERE
fu.uid != us.uid
AND
fr.status = 'accepted'
AND
us.username = 'liquid'
Result:
friend_username | friend_uid
----------------|-----------
h2ooooooo | 1
wet | 4
Here us would be the user you want to query for friends, and fu would be the user's friends. You could easily change the WHERE clause to select the user in whatever way you want. The status could be changed to pending (and the join should then be on uid_to only) if you want to find friend requests that the user hasn't answered.
DEMO ON SQLFIDDLE
The EXPLAIN if we use us.uid to match the user (as it's indexed):

Performance considerations aside, another option might be a "friend" table in which one row represents a friend (does not matter which way around), together with a view which produces two result rows (one in each direction) for any friend row. In use, it would simplify queries because it could be used in the same way as the "two row" solution while only requiring one data row per "friendship".
The only drawback could be performance... depending on how the query optimizer works.
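A minimal sketch of that idea, again using Python's built-in sqlite3 as a stand-in for MySQL. The names friendship, user_a, user_b and friend_both_ways are made up for illustration; the point is only that a UNION ALL view can present one stored row per friendship as two directed rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
conn.executescript("""
    -- One stored row per friendship; table and column names are illustrative.
    CREATE TABLE friendship (
        user_a INTEGER NOT NULL,
        user_b INTEGER NOT NULL,
        PRIMARY KEY (user_a, user_b)
    );
    -- The view exposes every friendship in both directions.
    CREATE VIEW friend_both_ways AS
        SELECT user_a AS user_id, user_b AS friend_id FROM friendship
        UNION ALL
        SELECT user_b AS user_id, user_a AS friend_id FROM friendship;
    INSERT INTO friendship (user_a, user_b) VALUES (1, 3), (4, 1), (3, 4);
""")

# Queried exactly like the "two row" layout: filter on one column only.
rows = conn.execute(
    "SELECT friend_id FROM friend_both_ways WHERE user_id = ? ORDER BY friend_id",
    (1,)).fetchall()
print([r[0] for r in rows])  # [3, 4]
```

Whether the filter is pushed down into both halves of the view depends on the optimizer, which is the performance caveat mentioned above.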

I tried to be creative, here are some results.
Easier drawn than said,
A simple trigger on table friends would do a nice service, ordering (user1,user2) without forgetting who requested friendship.
CREATE TRIGGER `friends_insert` BEFORE INSERT ON friends
FOR EACH ROW BEGIN
DECLARE X INT UNSIGNED;
IF NEW.user1 > NEW.user2 THEN
SET X = NEW.user1;
SET NEW.user1 = NEW.user2;
SET NEW.user2 = X;
SET NEW.invited_by = 1;
END IF;
END$$
Finally, let's say a user U has id = x. We can say U divides the users table into two parts: users with id < x and users with id > x. Before inserting a tuple into the friends table, we order its ids, so the same information is never explicitly written twice.
We obtain friends of our user U (id = x) through union of U's friends with id < x and ones with id > x:
SELECT user1 AS `friend_id` FROM friends
WHERE user1<@id AND user2=@id
UNION
SELECT user2 AS `friend_id` FROM friends
WHERE user2>@id AND user1=@id;
The main goal here is query performance. Dividing in these two cases would help MySQL to use the right index for each situation.
[ Time for questions & disagreement. Perhaps you want the complete SQL; it's shown here ]

You could try something like this SQLFiddle: http://sqlfiddle.com/#!2/219dae/3/0
Here is the code:
The SCHEMA:
-- This is the users table:
CREATE TABLE users
(
u_id int auto_increment,
username varchar(20),
PRIMARY KEY (u_id)
);
INSERT INTO users (username)
VALUES ('user1'),
('user2'),
('user3'),
('user4'),
('user5');
-- This is the friends table:
CREATE TABLE friends
(
f_id int auto_increment,
r_name varchar(20), -- the name of the user that requests for friendship
a_name varchar(20), -- the name of the user that answers the friendship request
status varchar(20), -- the status of the request
PRIMARY KEY (f_id)
);
-- below, user1 sends friend requests to user2, user3, user4 and user5; and receives one from user2:
INSERT INTO friends (r_name, a_name, status)
VALUES ('user1','user2', 'pending');
INSERT INTO friends (r_name, a_name, status)
VALUES ('user1','user3', 'pending');
INSERT INTO friends (r_name, a_name, status)
VALUES ('user1','user4', 'pending');
INSERT INTO friends (r_name, a_name, status)
VALUES ('user1','user5', 'pending');
INSERT INTO friends (r_name, a_name, status)
VALUES ('user2','user1', 'pending');
-- user1 accepts user2 request to be his friend:
UPDATE friends
SET status='accepted'
WHERE a_name='user1' AND r_name='user2';
-- user3 accepts user1 request to be his friend:
UPDATE friends
SET status='accepted'
WHERE a_name='user3' AND r_name='user1';
and the SELECT:
-- here we select all friend requests that the user1 received and all friend requests that he made
SELECT r_name, a_name, status FROM users
INNER JOIN friends ON users.username=friends.a_name
WHERE username='user1'
UNION
SELECT r_name, a_name, status FROM users
INNER JOIN friends ON users.username=friends.r_name
WHERE username='user1'

Related

complex SQL query - one table

I am new to SQL.
I was wondering if there is a way to form a complex (I think) query of a certain form, regarding a single table - or a simple query for the same effect.
Let's say I have a table of voice actor candidates, with different attributes (columns) - name and characteristics.
Let's say I have two different actor evaluators (Stewie and Griffin), and all the candidates were evaluated by at least one of them (one, or both). The evaluators evaluate the actors, and the table is built.
The rows in the table are per-evaluation, not per-person, meaning that some candidates have two separate rows, one from each evaluation.
The evaluator's name is also an attribute, a column.
Can I make a query that will choose all candidates that were evaluated by both evaluators? (and let's say show all these rows, an even number then)
(There is no attribute "evaluated by both" - that's the core)
I think it should find all rows with evaluator Stewie, then search the entire table for rows with the corresponding candidates' names, and get those with evaluator Griffin.
Summary
A table with people - names and characteristics. One or two rows per person. Each row was filled according to a different observer. There is an attribute "Is Nice". How to find all people that were observed by two observers, one marked "Yes" and one "No" under "Is Nice"?
Update
It will take me some time to check all the answers (as I don't have much experience yet), and I will update with what worked for me.
Can I make a query that will choose all candidates that were evaluated
by both evaluators?
(and let's say show all these rows, an even number then)
There are multiple ways to do this. You can check the existence of other evaluator's evaluation, using EXISTS:
SELECT * FROM Candidate AS C1 WHERE EXISTS (SELECT * FROM Candidate AS C2 WHERE C1.id = C2.id AND C1.evaluator != C2.evaluator)
Or, you could join the table to itself: (The checks for evaluators should be changed as appropriate)
SELECT C1.candidateName FROM Candidate AS C1 JOIN Candidate AS C2 USING (id) WHERE C1.evaluator = 'Stewie' AND C2.evaluator = 'Griffin'
How to find all people that were observed by two observers, one marked
"Yes" and one "No" under "Is Nice"?
For this one, you add another condition to the queries above, that checks if one evaluation was "Yes" and the other one was "No".
You seem to want group by and having. Since a person cannot have more than two rows, and there are only two distinct possible values for isnice (yes or no), we can phrase the query as:
select name
from people
group by name
having max(isnice) <> min(isnice)
This filters for names that have (at least) two different values in isnice. Starting from the above assumptions, this is sufficient to ensure that the person was evaluated more than once, and that isnice has (at least) two different values.
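To see the HAVING filter at work, here is a small sketch using Python's built-in sqlite3 in place of a real server. The sample rows and the evaluator column are invented for illustration; the people table and the isnice column follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used only to make the query runnable
conn.executescript("""
    CREATE TABLE people (name TEXT, evaluator TEXT, isnice TEXT);
    INSERT INTO people VALUES
        ('Ann',  'Stewie',  'Yes'),
        ('Ann',  'Griffin', 'No'),   -- Ann: evaluated twice, answers differ
        ('Bob',  'Stewie',  'Yes'),
        ('Bob',  'Griffin', 'Yes'),  -- Bob: evaluated twice, answers agree
        ('Cleo', 'Griffin', 'No');   -- Cleo: evaluated once
""")

names = [r[0] for r in conn.execute(
    "SELECT name FROM people GROUP BY name HAVING MAX(isnice) <> MIN(isnice)")]
print(names)  # ['Ann']
```

Bob and Cleo drop out because MAX and MIN coincide whenever a group has only one distinct isnice value.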
So, I read the problem very carefully, and came up with my own solution.
Please check whether the code below is what you were really asking for.
--Create Candidates Table
CREATE TABLE tbl_candidates
(
c_id INT PRIMARY KEY NOT NULL IDENTITY(1,1),
c_name VARCHAR(30),
)
--Create Evaluators Table
CREATE TABLE tbl_evaluators
(
e_id INT PRIMARY KEY NOT NULL IDENTITY(1,1),
e_name VARCHAR(30),
)
--Create Evaluations Table
CREATE TABLE tbl_evaluations
(
ee_id INT PRIMARY KEY NOT NULL IDENTITY(1,1),
ee_title VARCHAR(30) NOT NULL,
ee_remarks VARCHAR(30) NOT NULL,
ee_date date NOT NULL,
c_id INT FOREIGN KEY (c_id) REFERENCES tbl_candidates(c_id) NOT NULL,
e_id1 INT FOREIGN KEY (e_id1) REFERENCES tbl_evaluators(e_id) NOT NULL,
e_id2 INT FOREIGN KEY (e_id2) REFERENCES tbl_evaluators(e_id),
IsNice VARCHAR(4)
)
--Populate data & check to verify
INSERT INTO tbl_candidates (c_name) VALUES ('Sam') , ('Smith')
SELECT * FROM tbl_candidates
INSERT INTO tbl_evaluators (e_name) VALUES ('Stewie'),('Griffin')
SELECT * FROM tbl_evaluators
INSERT INTO tbl_evaluations
(ee_title,ee_remarks,ee_date,c_id,e_id1,e_id2,IsNice)
VALUES
('Some Title','Some Comment','2020-6-12',1,1,NULL,'No'),
('Some Title','Some Comment','2020-6-12',2,1,2,'Yes'),
('Some Title','Some Comment','2020-6-12',3,2,NULL,'No')
--finally comparing whether we have the matching data of our input vs tables combined data display
select * from tbl_evaluations
select ee_id,ee_title,c_name,ee_remarks,e1.e_name,e2.e_name,ee_date,IsNice from tbl_evaluations ee
left join tbl_candidates c on c.c_id = ee.c_id left join tbl_evaluators e1 on e1.e_id = ee.e_id1 left join tbl_evaluators e2 on e2.e_id = ee.e_id2
See the result proof :
This is surely not the best way to write it, but my first thought is
SELECT * FROM evaluations
WHERE PrName IN (
SELECT PrName
FROM evaluations
WHERE IsNice ='No')
AND PrName IN (
SELECT PrName
FROM evaluations
WHERE IsNice ='Yes')

How can I get all business data AS WELL as if current user is following them?

In mysql how can I write a query that will fetch ALL business data, and at the same time (or not if it is better another way) check if user is following that business? I have the following relationship table to determine if a user is following a business (status=1 would mean that person is following):
CREATE TABLE IF NOT EXISTS `Relationship_User_Follows_Business` (
`user_id` int(10) unsigned NOT NULL,
`business_id` int(10) unsigned NOT NULL,
`status` tinyint(3) unsigned NOT NULL DEFAULT '0' COMMENT '1=following, 0=not following'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
ALTER TABLE `Relationship_User_Follows_Business`
ADD UNIQUE KEY `unique_user_business_id` (`user_id`,`business_id`);
Assume business table just holds data on different businesses like name, phone number, etc. I would want to return all of the business data in my query (Business.*). I want to append the status (0 or 1) to the end of each business row to determine if the user is following that business. I have tried the following query but it does not work because it is narrowing the results to only show a business if there is a relationship row. I wish to show ALL businesses regardless if a relationship row exists or not because I only create the relationship row if a user follows:
SELECT Business.*, Relationship_User_Follows_Business.status FROM Business, Relationship_User_Follows_Business WHERE 104=Relationship_User_Follows_Business.user_id AND Business.id=Relationship_User_Follows_Business.business_id
Note that I am using 104 as a test user id. The user id would normally be dependent on user, not a static 104.
You are looking for a LEFT JOIN rather than an INNER JOIN: a LEFT JOIN keeps all the records from the master table along with any matching rows from the details table. Also, avoid the implicit (comma-separated) join syntax and use explicit join syntax:
SELECT Business.*, Relationship_User_Follows_Business.status
FROM Business
LEFT JOIN Relationship_User_Follows_Business
ON Business.id = Relationship_User_Follows_Business.business_id
AND Relationship_User_Follows_Business.user_id = 104
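With this join, businesses the user does not follow come back with a NULL status, since no relationship row exists for them. The sketch below (Python's built-in sqlite3 standing in for MySQL, with made-up business names and a second user 7) shows that behaviour and, as one optional tweak that is not part of the answer above, wraps the column in COALESCE to turn NULL into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE Business (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Relationship_User_Follows_Business (
        user_id INTEGER NOT NULL,
        business_id INTEGER NOT NULL,
        status INTEGER NOT NULL DEFAULT 0,
        UNIQUE (user_id, business_id)
    );
    INSERT INTO Business VALUES (1, 'Bakery'), (2, 'Garage'), (3, 'Florist');
    -- User 104 follows business 1; business 2 is followed by someone else.
    INSERT INTO Relationship_User_Follows_Business VALUES (104, 1, 1), (7, 2, 1);
""")

rows = conn.execute("""
    SELECT b.id, b.name, COALESCE(r.status, 0) AS status
    FROM Business AS b
    LEFT JOIN Relationship_User_Follows_Business AS r
           ON b.id = r.business_id AND r.user_id = ?
    ORDER BY b.id
""", (104,)).fetchall()
print(rows)  # [(1, 'Bakery', 1), (2, 'Garage', 0), (3, 'Florist', 0)]
```

Note that the user_id condition sits in the ON clause; moving it to WHERE would filter out the unmatched businesses again.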

How to select posts created by me or my friends in a news feed?

I've been trying to do this for quite a while now and then I figured that it is the SQL query that I cannot get it working properly in the first place.
I am trying to create a simple news feed page similar to Facebook where you can see the post you made on your profile as well as your friends profile and also you can see posts which your friends made on your profile as well as their own profile.
SELECT
friendsTbl.profile_one,
friendsTbl.profile_two,
user_profile.u_id as my_uid,
user_profile.f_name as my_fname,
user_profile.l_name as my_lname,
friend_profile.u_id as friend_uid,
friend_profile.f_name as friend_fname,
friend_profile.l_name as friend_lname,
profilePostTbl.post as post,
profilePostTbl.post_to as posted_profile,
profilePostTbl.post_by as posted_by
FROM friendsTbl
LEFT JOIN profileTbl AS user_profile ON user_profile.profile_name = IF(friendsTbl.profile_one = 'john123', friendsTbl.profile_one, friendsTbl.profile_two)
LEFT JOIN profileTbl AS friend_profile ON friend_profile.profile_name = IF(friendsTbl.profile_one = 'john123', friendsTbl.profile_two, friendsTbl.profile_one)
LEFT JOIN profilePostTbl ON (post_by = IF(profilePostTbl.post_to = friendsTbl.profile_one,profilePostTbl.post_to, profilePostTbl.post_by))
WHERE friendsTbl.profile_one = 'john123' OR friendsTbl.profile_two = 'john123'
Here's a fiddle: http://sqlfiddle.com/#!2/a10f39/1
For this example, john123 is the user that is currently logged in and is friends with hassey, smith and joee and therefore only those posts must show in the news feed which john123 posted on his own or his friend's post and the ones that his friends posted on their own profile as well as on john123's profile.
This question is a follow-up to PHP Sub-query to select all user's profile who are friends with me in Friends table.
I know you've already accepted an answer, but I was half-way writing this so I decided to post it anyway.
I'm going to go a little bit back before hopefully answering your question. When developing applications and constructing databases, you should ALWAYS try to structure things as descriptively and compactly as possible. It would be really awkward to have a variable/column named color and store encrypted user passwords there (weird, right?). There are some standard database naming conventions which, when followed, make life a lot easier, especially when developing complicated applications. I would advise you to read some blogs regarding the naming conventions. A good starting point may be this one.
I fully realize that with the suggested changes below you might need to partially/fully rewrite the application code you've written so far, but it's up to you if you really want things working better.
Let's begin by fixing the database structure. By the looks of it, you're doing an application similar to facebook's newsfeed. In this case, using FOREIGN KEYS is pretty much mandatory so you could guarantee some data consistency. The example database schema below shows how you can achieve that.
-- Application users are stored here.
CREATE TABLE users (
user_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
first_name VARCHAR(255),
last_name VARCHAR(255),
profile_name VARCHAR(255)
) ENGINE=InnoDb;
-- User friendship relations go here
CREATE TABLE friends (
friend_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
profile_one INT NOT NULL,
profile_two INT NOT NULL,
FOREIGN KEY (profile_one) REFERENCES users (user_id),
FOREIGN KEY (profile_two) REFERENCES users (user_id)
) ENGINE=InnoDb;
-- User status updates go here
-- This is what will be displayed on the "newsfeed"
CREATE TABLE statuses (
status_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
author_id INT NOT NULL,
recipient_id INT NOT NULL,
message TEXT,
-- created date ?
-- last updated date ?
FOREIGN KEY (author_id) REFERENCES users (user_id),
FOREIGN KEY (recipient_id) REFERENCES users (user_id)
) ENGINE=InnoDb;
-- Replies to user statuses go here. (facebook style..)
-- This will be displayed as the response of a user to a certain status
-- regardless of the status's author.
CREATE TABLE replies (
reply_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
status_id INT NOT NULL,
author_id INT NOT NULL,
message TEXT,
FOREIGN KEY (status_id) REFERENCES statuses (status_id),
FOREIGN KEY (author_id) REFERENCES users (user_id)
) ENGINE=InnoDb;
Now that this is fixed, we could proceed with the next step - selecting the newsfeed for john123 (who has user_id=1). This can be achieved with the query below:
SET @search_id:=1; -- this variable contains the currently logged in user_id so that we don't need to replace the value more than once in the whole query.
SELECT
statuses.*,
author.first_name AS author_first_name,
author.last_name AS author_last_name,
recipient.first_name AS recipient_first_name,
recipient.last_name AS recipient_last_name
FROM statuses
JOIN users AS author ON author.user_id = statuses.author_id
JOIN users AS recipient ON recipient.user_id = statuses.recipient_id
WHERE (statuses.author_id = @search_id OR statuses.recipient_id = @search_id)
ORDER BY status_id ASC
And here you could see it in action in an sqlfiddle. As you can see, just by structuring the database better, I've eliminated the need of a sub-query (which is what EXISTS / NOT EXISTS do according to the docs and EXPLAIN). Furthermore the above SQL code would be a lot easier to maintain and extend.
Anyway, I hope you find this useful.
I think you need to start with your posts and use EXISTS to limit result to the relevant ones:
SELECT * FROM
profilePostTbl post
WHERE
-- This post is by a friend of current user or his own post.
EXISTS (SELECT * FROM friendsTbl WHERE
post.post_by IN (profile_one, profile_two)
AND 'john123' IN (profile_one, profile_two))
-- This post is addressing the current user
OR post_to = 'john123';
If you want to render names of post authors and addressees, just join them to the post:
SELECT post.*,
post_by.u_id as by_uid,
post_by.f_name as by_fname,
post_by.l_name as by_lname,
post_to.u_id as to_uid,
post_to.f_name as to_fname,
post_to.l_name as to_lname
FROM
profilePostTbl post INNER JOIN
profileTbl post_by ON post.post_by = post_by.profile_name INNER JOIN
profileTbl post_to ON post.post_to = post_to.profile_name
WHERE
-- This post is by a friend of current user or his own post.
EXISTS (SELECT * FROM friendsTbl WHERE
post.post_by IN (profile_one, profile_two)
AND 'john123' IN (profile_one, profile_two))
-- This post is addressing the current user
OR post_to = 'john123';
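To make the EXISTS filter concrete, here is a small runnable sketch with Python's built-in sqlite3 standing in for MySQL. The table and column names follow the question; the sample rows (including the non-friend mary) and the alias p are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE friendsTbl (profile_one TEXT, profile_two TEXT);
    CREATE TABLE profilePostTbl (post TEXT, post_by TEXT, post_to TEXT);
    INSERT INTO friendsTbl VALUES
        ('john123', 'hassey'), ('smith', 'john123'), ('mary', 'hassey');
    INSERT INTO profilePostTbl VALUES
        ('p1', 'john123', 'john123'),  -- own post
        ('p2', 'hassey',  'hassey'),   -- friend posting on own profile
        ('p3', 'smith',   'john123'),  -- friend posting on john123's profile
        ('p4', 'mary',    'mary'),     -- not a friend of john123
        ('p5', 'mary',    'john123');  -- non-friend, but addressed to john123
""")

me = "john123"
feed = [r[0] for r in conn.execute("""
    SELECT p.post
    FROM profilePostTbl AS p
    WHERE EXISTS (SELECT 1 FROM friendsTbl
                  WHERE p.post_by IN (profile_one, profile_two)
                    AND ? IN (profile_one, profile_two))
       OR p.post_to = ?
    ORDER BY p.post
""", (me, me))]
print(feed)  # ['p1', 'p2', 'p3', 'p5']
```

One thing the sample makes visible: the user's own posts pass the EXISTS test only because the user appears in at least one friendsTbl row.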

How to optimise change history data for MySQL

The previous table this data was stored in approached 3-4gb, but the data wasn't compressed before/after storage. I'm not a DBA so I'm a little out of my depth with a good strategy.
The table is to log changes to a particular model in my application (user profiles), but with one tricky requirement: we should be able to fetch the state of a profile at any given date.
Data (single table):
id, username, email, first_name, last_name, website, avatar_url, address, city, zip, phone
The only two requirements:
be able to fetch a list of changes for a given model
be able to fetch state of model on a given date
Previously, all of the profile data was stored for a single change, even if only one column was changed. But to get a 'snapshot' for a particular date was easy enough.
My first couple of solutions in optimising the data structure:
(1) only store changed columns. This would drastically reduce data stored, but would make it quite complicated to get a snapshot of data. I'd have to merge all changes up to a given date (could be thousands), then apply that to a model. But that model couldn't be a fresh model (only changed data is stored). To do this, I'd have to first copy over all data from current profiles table, then to get snapshot apply changes to those base models.
(2) store whole of data, but convert to a compressed format like gzip or binary or whatnot. This would remove ability to query the data other than to obtain changes. I couldn't, for example, fetch all changes where email = ''. I would essentially have a single column with converted data, storing the whole of the profile.
Then, I would want to use relevant MySQL table options, like ARCHIVE to further reduce space.
So my question is, are there any other options which you feel are a better approach than 1/2 above, and, if not, which would be better?
First of all, I wouldn't worry at all about a 3GB table (unless it grew to this size in a very short period of time). MySQL can take it. Space shouldn't be a concern, keep in mind that a 500 GB hard disk costs about 4 man-hours (in my country).
That being said, in order to lower your storage requirements, create one table for each field of the table you want to monitor. Assuming a profile table like this:
CREATE TABLE profile (
profile_id INT PRIMARY KEY,
username VARCHAR(50),
email VARCHAR(50) -- and so on
);
... create two history tables:
CREATE TABLE profile_history_username (
profile_id INT NOT NULL,
username VARCHAR(50) NOT NULL, -- same type as profile.username
changedAt DATETIME NOT NULL,
PRIMARY KEY (profile_id, changedAt),
CONSTRAINT profile_id_username_fk
FOREIGN KEY profile_id_fkx (profile_id)
REFERENCES profile(profile_id)
);
CREATE TABLE profile_history_email (
profile_id INT NOT NULL,
email VARCHAR(50) NOT NULL, -- same type as profile.email
changedAt DATETIME NOT NULL,
PRIMARY KEY (profile_id, changedAt),
CONSTRAINT profile_id_fk
FOREIGN KEY profile_id_email_fkx (profile_id)
REFERENCES profile(profile_id)
);
Every time you change one or more fields in profile, log the change in each relevant history table:
START TRANSACTION;
-- lock all tables
SELECT @now := NOW()
FROM profile
JOIN profile_history_email USING (profile_id)
WHERE profile_id = [a profile_id]
FOR UPDATE;
-- update main table, log change
UPDATE profile SET email = [new email] WHERE profile_id = [a profile_id];
INSERT INTO profile_history_email VALUES ([a profile_id], [new email], @now);
COMMIT;
You may also want to set appropriate AFTER triggers on profile so as to populate the history tables automatically.
Retrieving history information should be straightforward. In order to get the state of a profile at a given point in time, use this query:
SELECT
(
SELECT username FROM profile_history_username
WHERE profile_id = [a profile_id] AND changedAt = (
SELECT MAX(changedAt) FROM profile_history_username
WHERE profile_id = [a profile_id] AND changedAt <= [snapshot date]
)
) AS username,
(
SELECT email FROM profile_history_email
WHERE profile_id = [a profile_id] AND changedAt = (
SELECT MAX(changedAt) FROM profile_history_email
WHERE profile_id = [a profile_id] AND changedAt <= [snapshot date]
)
) AS email;
You can't compress the data without having to uncompress it in order to search it, which is going to severely damage performance. If the data really is changing that often (i.e. more than an average of 20 times per record) then it would be more efficient for storage and retrieval to structure it as a series of changes:
Consider:
CREATE TABLE profile (
id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY (id)
);
CREATE TABLE profile_data (
profile_id INT NOT NULL,
attr ENUM('username', 'email', 'first_name'
, 'last_name', 'website', 'avatar_url'
, 'address', 'city', 'zip', 'phone') NOT NULL,
value VARCHAR(255),
starttime DATETIME DEFAULT CURRENT_TIMESTAMP,
endtime DATETIME,
PRIMARY KEY (profile_id, attr, starttime),
INDEX(profile_id),
FOREIGN KEY (profile_id) REFERENCES profile(id)
);
When you add a new value for an existing record, set an endtime in the masked record.
Then to get the value at a date $T:
SELECT p.id, attr, value
FROM profile p
INNER JOIN profile_data d
ON p.id=d.profile_id
WHERE $T>=starttime
AND $T<=IF(endtime IS NULL,$T, endtime);
Alternately just have a start time, and:
SELECT p.id, attr, value
FROM profile p
INNER JOIN profile_data d
ON p.id=d.profile_id
WHERE $T>=starttime
AND NOT EXISTS (SELECT 1
FROM profile_data d2
WHERE d2.profile_id=d.profile_id
AND d2.attr=d.attr
AND d2.starttime>d.starttime
AND d2.starttime<=$T);
(which will be even faster with the MAX concat trick).
But if the data is not changing with that frequency then keep it in the current structure.
You need a slowly changing dimension:
I will do this only for e-mail and telephone so you get the idea. Note that I use two keys: one that is unique within the table, and another that identifies the user the row concerns. That is, the table key identifies the record, and the user key identifies the user:
table_id, user_id, email, telephone, created_at, inactive_at, is_current
1, 1, mario@yahoo.it, 123456, 2012-01-02, 2013-04-01, no
2, 2, erik@telecom.de, 123457, 2012-01-03, 2013-02-28, no
3, 3, vanessa@o2.de, 1234568, 2012-01-03, null, yes
4, 2, erik@telecom.de, 123459, 2012-02-28, null, yes
5, 1, super.mario@yahoo.it, 654321, 2013-04-01, 2013-04-02, no
6, 1, super.mario@yahoo.it, 123456, 2013-04-02, null, yes
most recent state of the database
select * from FooTable where inactive_at is null
or
select * from FooTable where is_current = 'yes'
All changes to mario (mario is user_id 1)
select * from FooTable where user_id = 1;
All changes between 1 jan 2013 and 1 of may 2013
select * from FooTable where created_at between '2013-01-01' and '2013-05-01';
and you need to compare with the old versions (with the help of a stored procedure, Java or PHP code... your choice)
select * from FooTable where inactive_at between '2013-01-01' and '2013-05-01';
if you want you can do a fancy sql statement
select f1.table_id, f1.user_id,
case when f1.email = f2.email then 'NO_CHANGE' else concat(f1.email , ' -> ', f2.email) end,
case when f1.telephone = f2.telephone then 'NO_CHANGE' else concat(f1.telephone , ' -> ', f2.telephone) end
from FooTable f1 inner join FooTable f2
on(f1.user_id = f2.user_id)
where f2.created_at in
(select max(f3.created_at) from FooTable f3 where f3.user_id = f1.user_id
and f3.created_at < f1.created_at)
and f1.created_at between '2013-01-01' and '2013-05-01' ;
As you can see, it is quite a juicy query to compare each user row with the previous row for that user...
the state of the database on 2013-03-01
select * from FooTable where table_id in
(select max(table_id) from FooTable where inactive_at <= '2013-03-01' group by user_id
union
select table_id from FooTable where inactive_at is null group by user_id having count(table_id) =1 );
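An as-of lookup can also be phrased directly on the validity interval. The sketch below is one possible formulation, not the author's query: it assumes each row is valid from created_at (inclusive) until inactive_at (exclusive, or open-ended when NULL). It runs on Python's built-in sqlite3 as a stand-in for MySQL, uses a subset of the sample rows above, and state_on is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE FooTable (
        table_id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        email TEXT,
        telephone TEXT,
        created_at TEXT NOT NULL,   -- ISO dates compare correctly as strings
        inactive_at TEXT            -- NULL while the row is current
    );
    INSERT INTO FooTable VALUES
        (1, 1, 'mario@yahoo.it',       '123456',  '2012-01-02', '2013-04-01'),
        (3, 3, 'vanessa@o2.de',        '1234568', '2012-01-03', NULL),
        (5, 1, 'super.mario@yahoo.it', '654321',  '2013-04-01', '2013-04-02'),
        (6, 1, 'super.mario@yahoo.it', '123456',  '2013-04-02', NULL);
""")

def state_on(day):
    """Return the table_ids of the rows that were in effect on the given day."""
    rows = conn.execute(
        "SELECT table_id FROM FooTable "
        "WHERE created_at <= ? AND (inactive_at IS NULL OR inactive_at > ?) "
        "ORDER BY table_id", (day, day))
    return [r[0] for r in rows]

print(state_on("2013-03-01"))  # [1, 3]
print(state_on("2013-04-01"))  # [3, 5]
```

Treating inactive_at as exclusive means a row stops applying on the very day its successor is created, so each user yields at most one row per date as long as the intervals do not overlap.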
I think this is the easiest way to implement what you want... you could implement a relational model spread over many tables, but then it would be a pain in the arse to query it.
Your database is not that big; I work every day with an even bigger one. Now tell me, is the money you save on a new server worth the time you spend on a super-complex relational model?
BTW if the data changes too fast, this approach cannot be used...
BONUS: optimization:
create indexes on created_at, inactive_at, user_id and the pair
perform partition (both horizontal and vertical)
If you put all changes in separate tables and later need the state on some date, you can join them and compare dates. For example, if you want the state on the 1st of July, you can run a query with the condition that the date is less than or equal to the 1st of July, order it descending so the latest change comes first, and limit the count to 1. That way the joins will produce exactly the state as it was on the 1st of July. In this manner you can even figure out the most frequently updated module.
Also, if you want to keep all the data flat, try range partitioning by month; that way MySQL will handle it pretty easily.
Note: by date I mean storing the Unix timestamp of the date; it's much easier to compare.
I'll offer one more solution just for variety.
Schema
PROFILE
id INT PRIMARY KEY,
username VARCHAR(50) NOT NULL UNIQUE
PROFILE_ATTRIBUTE
id INT PRIMARY KEY,
profile_id INT NOT NULL FOREIGN KEY REFERENCES PROFILE (id),
attribute_name VARCHAR(50) NOT NULL,
attribute_value VARCHAR(255) NULL,
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
replaced_at DATETIME NULL
For all attributes you are tracking, simply add PROFILE_ATTRIBUTE records when they are updated, and mark the previous attribute record with the DATETIME it was replaced at.
Select Current Profile
SELECT *
FROM PROFILE p
LEFT JOIN PROFILE_ATTRIBUTE pa
ON p.id = pa.profile_id
WHERE p.username = 'username'
AND pa.replaced_at IS NULL
Select Profile At Date
SELECT *
FROM PROFILE p
LEFT JOIN PROFILE_ATTRIBUTE pa
ON p.id = pa.profile_id
WHERE p.username = 'username'
AND pa.created_at < '2013-07-01'
AND '2013-07-01' <= IFNULL(pa.replaced_at, NOW())
When Updating Attributes
Insert the new attribute
Update the previous attribute's replaced_at value
It would probably be important that the created_at for a new attribute match the replaced_at for the corresponding old attribute. This would be so that there is an unbroken timeline of attribute values for a given attribute name.
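The two update steps can be sketched as follows, using Python's built-in sqlite3 as a stand-in for MySQL. The set_attribute helper is hypothetical, and the column types are simplified to what SQLite offers; the idea shown is just that both statements share one timestamp and one transaction, so the timeline has no gap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE profile_attribute (
        id INTEGER PRIMARY KEY,
        profile_id INTEGER NOT NULL,
        attribute_name TEXT NOT NULL,
        attribute_value TEXT,
        created_at TEXT NOT NULL,
        replaced_at TEXT
    );
""")

def set_attribute(profile_id, name, value, now):
    """Close the current record and open the new one with the same timestamp."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "UPDATE profile_attribute SET replaced_at = ? "
            "WHERE profile_id = ? AND attribute_name = ? AND replaced_at IS NULL",
            (now, profile_id, name))
        conn.execute(
            "INSERT INTO profile_attribute "
            "(profile_id, attribute_name, attribute_value, created_at) "
            "VALUES (?, ?, ?, ?)",
            (profile_id, name, value, now))

set_attribute(1, "email", "old@example.com", "2013-01-01 00:00:00")
set_attribute(1, "email", "new@example.com", "2013-07-01 00:00:00")
history = conn.execute(
    "SELECT attribute_value, created_at, replaced_at "
    "FROM profile_attribute ORDER BY created_at").fetchall()
print(history)
```

After the second call, the old record's replaced_at equals the new record's created_at, and only the newest record still has a NULL replaced_at.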
Advantages
Simple two-table architecture (I personally don't like a table-per-field approach)
Can add additional attributes with no schema changes
Easily mapped into ORM systems, assuming an application lives on top of this database
Could easily see the history for a certain attribute_name over time.
Disadvantages
Integrity is not enforced. For example, the schema doesn't restrict on multiple NULL replaced_at records with the same attribute_name... perhaps this could be enforced with a two-column UNIQUE constraint
Let's say you add a new field in the future. Existing profiles would not select a value for the new field until they save a value to it. This is opposed to the value coming back as NULL if it were a column. This may or may not be an issue.
If you use this approach, be sure you have indexes on the created_at and replaced_at columns.
There may be other advantages or disadvantages. If commenters have input, I'll update this answer with more information.

add friends logic in database with efficiency

I'm looking for a way to store a friends list in a database. I'm thinking of storing an array of user IDs.
Example: user a's row in the friends table would contain an array of friends' user IDs, like 1,2,4,6,77,44 etc.
I want to know whether this will be an efficient way of doing this. If not, what logic should ideally be implemented for a large community?
You likely need a separate many-to-many join table. If both the user and their friends reside in the same user table, the join table could look like this:
user_id - id of user the friend lookup is being done for, is foreign key to user_id field in user table
friend_id - id of friend associated with user, also is a foreign key to user_id field in user table
You would have a compound primary key across both fields, ensuring that each user_id to friend_id combination is unique.
This would be sample CREATE TABLE statement:
CREATE TABLE `friends` (
`user_id` INT(11) NOT NULL,
`friend_id` INT(11) NOT NULL,
PRIMARY KEY (`user_id`, `friend_id`)
)
ENGINE = INNODB;
And sample data may look like this:
INSERT INTO `friends` (`user_id`, `friend_id`)
VALUES
(1, 2), (1, 3), (1, 4), (2, 1), (2, 10), (4, 1), (4, 20);
Then say you wanted to do a lookup of all the user table data for the friends associated with a particular user (i.e. the logged in user). You could query that data like this:
SELECT users.*
FROM users
INNER JOIN friends ON users.user_id = friends.friend_id
WHERE friends.user_id = [The ID of current user]
No, no, no!
Don't store multiple values in one db field. It will cause you a lot of problems.
You could use a structure like this, for instance:
friends (user_id, friend_id)
which indicates 2 friends.