I have objects stored in a database; each one is some text with properties.
Each text has a rating. I need to store this rating and prevent any one user from raising it more than once. If I store the text ID and user ID in a separate table and count all records with the relevant text ID, I end up with too many records in the table.
There are two ways:
You can use a many-to-many relationship, i.e. a separate table named something like 'user_likes' with user_id and like_id columns that together form the primary key (a composite key, which lets a user like a given like_object only once).
Another way, which some high-traffic websites use: each record in the user table has a likes column that holds a serialized array or JSON, whatever. Before updating this column, your application retrieves the data and looks for the particular like_object_id; if it isn't there, you update your database. Note that in this case all the work of keeping the data consistent (for instance, a like_object_id that exists in some user record but not in the like_object table) must be implemented in your application code, not in the database.
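A minimal sketch of this read-check-write approach, using Python's built-in sqlite3 module as a stand-in for MySQL. The table names user and like_object follow the answer; the rating column, the JSON encoding and the like() helper are hypothetical. The duplicate check lives in application code, as described above, so two concurrent requests for the same user could both pass it; in MySQL/InnoDB you would typically lock the user row first (for example with SELECT ... FOR UPDATE inside a transaction).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each user row carries a JSON array of liked object IDs.
conn.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY, likes TEXT NOT NULL DEFAULT '[]')")
conn.execute("CREATE TABLE like_object (id INTEGER PRIMARY KEY, rating INTEGER NOT NULL DEFAULT 0)")
conn.execute("INSERT INTO user (user_id) VALUES (1)")
conn.execute("INSERT INTO like_object (id) VALUES (100)")


def like(conn, user_id, object_id):
    """Add object_id to the user's likes and bump its rating; False if already liked."""
    with conn:  # commit both UPDATEs together, or roll back on error
        (raw,) = conn.execute("SELECT likes FROM user WHERE user_id = ?", (user_id,)).fetchone()
        likes = json.loads(raw)
        if object_id in likes:  # the application, not the database, does this check
            return False
        likes.append(object_id)
        conn.execute("UPDATE user SET likes = ? WHERE user_id = ?", (json.dumps(likes), user_id))
        conn.execute("UPDATE like_object SET rating = rating + 1 WHERE id = ?", (object_id,))
    return True
```

Calling like(conn, 1, 100) twice returns True and then False, and the rating stays at 1.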
P.S. Sorry for my English; I tried to explain as best I could.
If I store the text ID and user ID in a separate table and count all records with the relevant text ID, I end up with too many records in the table.
How do you know what is too many records?
Some of the MySQL tables I support have billions of rows. If they need more than that, they split the data to multiple MySQL servers. 1 million rows is not a problem for a MySQL database.
If you want to limit the data so each user can "like" a given text only once, you must store the data separately for each user. This is also true if a user can "unlike" a text they had previously liked.
CREATE TABLE likes (
user_id BIGINT UNSIGNED NOT NULL,
post_id BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (user_id, post_id),
KEY (post_id, user_id)
);
This example table uses its primary key constraint to ensure each user can like a given post only once. The second index, on (post_id, user_id), helps optimize queries for the likes on a specific post.
The two BIGINT columns take only 16 bytes per row, plus InnoDB's per-row overhead and the size of the secondary index. I filled an InnoDB table with over 1 million rows, and it uses about 60MB (Data_length plus Index_length).
mysql> show table status\G
Name: likes
Engine: InnoDB
Rows: 1046760
Data_length: 39419904
Index_length: 23658496
It's common to store databases on terabyte-sized storage these days, so a 60MB table doesn't seem too large.
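To see the primary key doing the duplicate check on the likes table, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL (SQLite's INTEGER replaces BIGINT UNSIGNED; add_like() and like_count() are hypothetical helper names). A second like from the same user violates the primary key, which the helper turns into False; in MySQL the same situation surfaces as a duplicate-key error, or can be silenced with INSERT IGNORE.

```python
import sqlite3

# SQLite stand-in for the MySQL likes table (INTEGER replaces BIGINT UNSIGNED).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE likes (
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, post_id)
    )
""")
# Secondary index so counting the likes of one post does not scan the whole table.
conn.execute("CREATE INDEX likes_post_user ON likes (post_id, user_id)")


def add_like(conn, user_id, post_id):
    """Record a like; return False if this user already liked this post."""
    try:
        with conn:
            conn.execute("INSERT INTO likes (user_id, post_id) VALUES (?, ?)", (user_id, post_id))
        return True
    except sqlite3.IntegrityError:  # primary key violation means a duplicate like
        return False


def like_count(conn, post_id):
    """Count likes for one post (served by the (post_id, user_id) index)."""
    return conn.execute("SELECT COUNT(*) FROM likes WHERE post_id = ?", (post_id,)).fetchone()[0]
```

An "unlike" is then just a DELETE of the matching (user_id, post_id) row.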
I store the likes with the post itself, but I'm not sure about its performance, since none of my websites has reached a very heavy load.
But I do the following:
Post {
id int;
likes_count int; // likes count, to retrieve it quickly
likes string; // IDs of the users who liked this post, comma-separated
}
When a user likes a post (using Ajax):
the UI updates immediately to show that the user liked the post;
Ajax sends a request to the server with the post ID and the user ID, and the post data is updated as follows:
post.likes_count += 1;
post.likes += userId + ',' ;
When the user reloads the page, the app checks whether their ID is in likes; if it is, the post appears as liked.
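One detail worth handling on the server: checking whether the ID is in likes with a plain substring test gives false positives, because "1," is a substring of "11,". A minimal sketch, assuming the post is a dict with the likes_count and likes fields shown above (parse_likes() and like_post() are hypothetical names), that splits the string and compares whole IDs before incrementing the counter:

```python
def parse_likes(likes):
    """Split a comma-separated string such as "3,17,42," into a set of int IDs."""
    return {int(part) for part in likes.split(",") if part}


def like_post(post, user_id):
    """Apply one like to a post dict; ignore repeat likes from the same user."""
    if user_id in parse_likes(post["likes"]):  # whole-ID comparison, not substring
        return False
    post["likes_count"] += 1
    post["likes"] += f"{user_id},"
    return True
```

This still reads and rewrites the whole string on every like, and it does nothing to stop two simultaneous requests from overwriting each other's update to the same post.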
I am making a small system to clean up the database. Every person who visits the site gets put in the db, but if they don't register, they should be removed from the database by a cron job (or similar) once more than 2 days have passed since their first visit. The date is stored in MySQL as a timestamp and looks like this: 2013-06-05 01:18:43.
So what I thought about doing was the following:
$STH = $DBH->prepare("DELETE FROM user WHERE type = 0 AND joindate < ?");
$STH->execute(array(date('Y-m-d H:i:s', time() - $userLife)));
This way, the timestamp has the same format as in MySQL (Y-m-d H:i:s). I'm using $userLife so I can easily adjust this value at the beginning of my script.
The problem, however, is that I also need to run queries against other tables containing this user_id, for example the table pages:
id | user_id | level | time | views
In this table, it is possible that there are multiple instances of user_id.
Can this be done in a single query, or do I need to loop through all the users first, run the DELETE queries on the 3 other tables for each user, and only after that loop delete the users themselves?
Ideally, you'd define the relationships with FOREIGN KEY constraints and ON DELETE CASCADE, which automagically deletes all the related data for you. If that's not possible for some reason (you're stuck with MyISAM tables, for instance), you can simply JOIN the related tables in a single DELETE (yes, you can delete from more than one table at once). If it's your first time doing that, try it on a test database, and certainly not in production.
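A minimal sketch of the ON DELETE CASCADE route, using Python's sqlite3 module as a stand-in for MySQL with InnoDB. The column names follow the question where it gives them; the user table's user_id key and the purge_visitors() helper are assumptions. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and the cutoff is formatted as YYYY-MM-DD HH:MM:SS so that string comparison matches chronological order.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("""CREATE TABLE user (
    user_id INTEGER PRIMARY KEY,   -- assumed name for the user key
    type INTEGER NOT NULL,         -- 0 = unregistered visitor, as in the question
    joindate TEXT NOT NULL         -- 'YYYY-MM-DD HH:MM:SS', like the MySQL value
)""")
conn.execute("""CREATE TABLE pages (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES user (user_id) ON DELETE CASCADE,
    level INTEGER, time TEXT, views INTEGER
)""")


def purge_visitors(conn, now, user_life=timedelta(days=2)):
    """Delete unregistered users older than user_life; their pages rows cascade away."""
    cutoff = (now - user_life).strftime("%Y-%m-%d %H:%M:%S")
    with conn:
        cur = conn.execute("DELETE FROM user WHERE type = 0 AND joindate < ?", (cutoff,))
    return cur.rowcount  # counts deleted users only, not cascaded rows
```

With the constraint in place, the cron job needs just this one DELETE against the user table.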
I am trying to design a notifications architecture where each notification has a UID and needs to be delivered to multiple users. Each user device has a local cache of the latest notifications. When the user device comes online it always checks for any new notifications and pulls all of them meant for that user. The device keeps the UID of the latest notification it synced and uses that UID to fetch newer notifications from the server.
I am wondering about the best way to implement this in MySQL tables so that it scales to more than 500K users.
I have a notification details table where the notification UID is the auto-increment primary key. I would like suggestions for the user mapping table, which could look like this (ignoring the foreign key constraints):
CREATE TABLE user_notifications_mapping (
user_id INT UNSIGNED NOT NULL,
notification_id BIGINT UNSIGNED NOT NULL,
UNIQUE KEY (user_id, notification_id)
) ENGINE=InnoDB;
but I am skeptical about whether it would perform well for a query like
SELECT notification_id FROM user_notifications_mapping WHERE user_id = <user-id> AND notification_id > <last-notification-uid>
If the table is properly indexed (here the UNIQUE KEY on (user_id, notification_id) already covers this query), this design is very suitable. Assuming that only a "small" number of notifications will be returned to a given device on synchronisation, a medium-range server will be able to handle hundreds of such requests per second, even if the table is huge (millions of rows).
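A small sketch of the sync query against that table, using Python's sqlite3 module as a stand-in for MySQL. Declaring the pair as the primary key of a WITHOUT ROWID table stores rows clustered by (user_id, notification_id), much as InnoDB does (when no primary key is defined, InnoDB uses the first UNIQUE index whose columns are all NOT NULL as its clustered index), so the query becomes a single range scan. The fetch_new() helper and its limit parameter, which caps how many IDs a device pulls per round trip, are hypothetical additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The composite key doubles as the index for "this user's notifications after X".
conn.execute("""CREATE TABLE user_notifications_mapping (
    user_id INTEGER NOT NULL,
    notification_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, notification_id)
) WITHOUT ROWID""")


def fetch_new(conn, user_id, last_seen, limit=100):
    """Return notification IDs newer than last_seen for one user, oldest first."""
    rows = conn.execute(
        "SELECT notification_id FROM user_notifications_mapping "
        "WHERE user_id = ? AND notification_id > ? "
        "ORDER BY notification_id LIMIT ?",
        (user_id, last_seen, limit),
    )
    return [r[0] for r in rows]
```

The device then stores the last ID it received and passes it as last_seen on the next sync.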
Now, this table is going to grow very large. But I believe a given notification needs to be sent to a given device only once, so I would consider removing (or archiving to another table) the records of this table once a notification has been sent. Conceptually, the table then becomes something like pending_notifications.
[edit]
Given the new information, this table is likely to grow beyond a practical size, so you need a different approach. For example, there is probably a way to group your notifications (e.g. they are of a given type, or they originate from a given entity in your application). The same concept can be applied to your users: maybe you want some notifications to be sent to, e.g., all "customers" or all "administrators".
The underlying idea is to establish the n-n relationship between two entities of smaller cardinality. You wouldn't model the case "some users receive some notifications" but rather "some user groups receive some types of notifications".
Example:
notification can be an "Announcement", a "Notice" or a "Warning" (notification type)
users can be "Administrators" or "Customers" (user group)
Then the notifications_mapping table would look like this:
+-----------------------+
| notifications_mapping |
+-----------------------+
| notification_type |
| group_id |
+-----------------------+
And the corresponding query could be:
SELECT notifications.notification_id
FROM notifications_mapping AS map
JOIN user ON user.group_id = map.group_id
JOIN notifications ON notifications.type = map.notification_type
WHERE user.user_id = <user-id> AND notifications.notification_id > <last-notification-uid>
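A runnable sketch of the grouped model, using Python's sqlite3 module in place of MySQL; the sample groups, types and IDs are invented for illustration, and fetch_new() is a hypothetical helper. The mapping table stays tiny (one row per group/type pair), and each user's notifications are derived through the joins. If a user could belong to several groups, the query would need SELECT DISTINCT to avoid returning the same notification twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, group_id TEXT NOT NULL);
CREATE TABLE notifications (notification_id INTEGER PRIMARY KEY, type TEXT NOT NULL);
CREATE TABLE notifications_mapping (
    notification_type TEXT NOT NULL,
    group_id TEXT NOT NULL,
    PRIMARY KEY (notification_type, group_id)
);
-- Hypothetical sample data: which groups receive which notification types.
INSERT INTO notifications_mapping VALUES
    ('Announcement', 'Customers'), ('Announcement', 'Administrators'),
    ('Warning', 'Administrators');
""")


def fetch_new(conn, user_id, last_seen):
    """Notifications newer than last_seen that reach this user through its group."""
    rows = conn.execute("""
        SELECT n.notification_id
        FROM notifications_mapping AS map
        JOIN user AS u ON u.group_id = map.group_id
        JOIN notifications AS n ON n.type = map.notification_type
        WHERE u.user_id = ? AND n.notification_id > ?
        ORDER BY n.notification_id
    """, (user_id, last_seen))
    return [r[0] for r in rows]
```

In this sample data a "Notice" reaches nobody, because no group is mapped to that type.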
I had to implement the following into my database:
The activities that users engage in. Each activity can have a name with up to 80 characters, and only distinct activities should be stored. That is, if two different users like “Swimming”, then the activity “Swimming” should only be stored once as a string.
Which activities each individual user engages in. Note that a user can have more than one hobby!
So I have to implement tables for this purpose and I must also make any modifications to existing tables if and as required and implement any keys and foreign key relationships needed.
All this must be stored with minimal amount of storage, i.e., you must choose the appropriate data types from the MySQL manual. You may assume that new activities will be added frequently, that activities will almost never be removed, and that the total number of distinct activities may reach 100,000.
So I already have a 'User' table with 'user_id' as my primary key.
MY SOLUTION TO THIS:
Create a table called 'Activities' with 'activity_id' as PK (mediumint(5)) and 'activity' storing the hobby name (varchar(80)); then create another table called 'Link' that uses the 'user_id' FK from the User table and the 'activity_id' FK from the 'Activities' table to record which activities each user likes to do.
Is my approach to this question right? Is there another way I can do this to make it more efficient?
How would I show if one user pursues more than one activity in the foreign key table 'Link'?
Your idea is correct, and probably the only sensible way; it's called a many-to-many relationship.
Just to reiterate, you're proposing a user table with a userid and an activity table with an activityid.
To form the relationship you'll have a 3rd table, which doesn't need its own surrogate primary key column; however, you should index both columns (userid and activityid), for example with a composite primary key on the pair, which also prevents duplicate links.
In your logic, when someone enters an activity name, look it up in the activity table (a UNIQUE index on the name column keeps this fast). If it doesn't exist, add it, get back the new activityid, and then add an entry to the user_activity table linking the activityid to the userid.
If it already exists, just add an entry linking that activityid to the userid.
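A sketch of that get-or-create flow, using Python's sqlite3 module as a stand-in for MySQL (table and column names are assumptions; INSERT OR IGNORE is SQLite's counterpart to MySQL's INSERT IGNORE). The UNIQUE constraint on the name does the existence check, and the composite primary key on user_activity stops the same link from being stored twice. A user who pursues several activities simply has several rows in user_activity. In MySQL, a MEDIUMINT UNSIGNED activity_id (3 bytes, up to 16,777,215) comfortably covers 100,000 activities; SQLite ignores the VARCHAR(80) length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY);
CREATE TABLE activity (
    activity_id INTEGER PRIMARY KEY,
    name VARCHAR(80) NOT NULL UNIQUE       -- each distinct activity stored once
);
CREATE TABLE user_activity (
    user_id INTEGER NOT NULL REFERENCES user (user_id),
    activity_id INTEGER NOT NULL REFERENCES activity (activity_id),
    PRIMARY KEY (user_id, activity_id)     -- a user can't list the same activity twice
);
CREATE INDEX user_activity_by_activity ON user_activity (activity_id);
""")


def add_activity(conn, user_id, name):
    """Link user_id to the named activity, creating the activity row if needed."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO activity (name) VALUES (?)", (name,))
        (activity_id,) = conn.execute(
            "SELECT activity_id FROM activity WHERE name = ?", (name,)).fetchone()
        conn.execute("INSERT OR IGNORE INTO user_activity VALUES (?, ?)", (user_id, activity_id))
    return activity_id
```

Two users adding "Swimming" share one activity row, and each gets their own link row.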
So your approach is right. As for your final question, a user who pursues several activities simply has several rows in the linking table; search for 'many-to-many relationships' if you need more information.
My web application allows a user to define from 1 up to 30 emails (could be anything else).
Which of these options is best?
1) ...store the data inside only one column using a separator, like this:
[COLUMN emails] peter#example.com,mary#example.com,john#example.com
Structure:
emails VARCHAR(1829)
2) ...or save the data using distinct columns, like this:
[COLUMN email1] peter#example.com
[COLUMN email2] mary#example.com
[COLUMN email3] john#example.com
[...]
Structure:
email1 VARCHAR(60)
email2 VARCHAR(60)
email3 VARCHAR(60)
[...]
email30 VARCHAR(60)
Thank you in advance.
Depends on how you are going to use the data and how fixed the amount of 30 is. If it is an advantage to quickly query for the 3rd address or filter using WHERE clauses and such: use distinct fields; otherwise it might not be worth the effort of creating the columns.
Having the data in a database still has the advantage of concurrent access by several users.
Number two is the better option, without question. If you do the first one (comma separated), then it negates the advantages of using a RDBMS (you can't run an efficient query on your emails in that case, so it may as well be a flat file).
Number 2 is better than number 1.
However, you should consider another option of getting a normalized structure where you have a separate emails table with a foreign key to your user record. This would allow you to define an index if you wanted to search by email to find a user and place a constraint ensuring no duplicate emails are registered - if you wanted to do that.
Neither one is a very good option.
Option 1 is a poor idea because it makes looking a user up by email a complex, inefficient task. To find one email, you are effectively required to perform a substring search on the email field of every user record (e.g. LIKE with a leading wildcard, or FIND_IN_SET()), which an ordinary index can't help with.
Option 2 is really a WORSE idea, IMO, because it makes any surrounding code a huge pain to write. Suppose, again, that you need to look up all users who have a value X. You now need to enumerate 30 columns and check each one to see if that value exists. Painful!
Storing data in this manner -- 1-or-more of some element of data -- is very common in database design, and as Adam has previously mentioned, is best solved in MOST cases by using a normalized data structure.
A correct table structure, written in MySQL since this was tagged as such, might look like:
Users table:
CREATE TABLE user (
user_id int auto_increment,
...
PRIMARY KEY (user_id)
);
Emails table:
CREATE TABLE user_email (
user_id int NOT NULL,
email varchar(60) NOT NULL,
FOREIGN KEY (user_id) REFERENCES user (user_id) ON DELETE CASCADE
);
The FOREIGN KEY statement is optional -- the design will work without it; however, that line causes the database to enforce the relationship (with InnoDB; MyISAM parses FOREIGN KEY clauses but ignores them). For example, if you attempt to insert a record into user_email with a user_id of 10, there MUST be a corresponding user record with a user_id of 10, or the query will fail. The ON DELETE CASCADE tells the database that if you delete a record from the user table, all user_email records associated with it will also be deleted (you may or may not want this behavior).
This design of course also means that you need to perform a join when you retrieve a user record. A query like this:
SELECT user.user_id, user_email.email FROM user LEFT JOIN user_email ON user.user_id = user_email.user_id WHERE <your where clause>;
Will return one row for EACH user_email address stored in the system. If you have 5 users and each user has 5 email addresses, the above query will return 25 rows.
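A quick way to check the row count, sketched with Python's sqlite3 module standing in for MySQL and invented sample data (users 1 to 5 with five addresses each, user 6 with none). Because the query uses a LEFT JOIN, user 6 also appears once, with NULL as the email, so the result has 26 rows rather than 25.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY);
CREATE TABLE user_email (
    user_id INTEGER NOT NULL REFERENCES user (user_id) ON DELETE CASCADE,
    email VARCHAR(60) NOT NULL
);
""")
# Hypothetical data: users 1-5 get five addresses each; user 6 has none.
conn.executemany("INSERT INTO user VALUES (?)", [(i,) for i in range(1, 7)])
conn.executemany(
    "INSERT INTO user_email VALUES (?, ?)",
    [(u, f"user{u}.addr{n}@example.com") for u in range(1, 6) for n in range(1, 6)],
)

# One result row per stored address, plus one NULL row for a user without any.
rows = conn.execute("""
    SELECT user.user_id, user_email.email
    FROM user LEFT JOIN user_email ON user.user_id = user_email.user_id
""").fetchall()
```

Switching to an inner JOIN would drop user 6 and return exactly 25 rows.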
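Depending on your application, you may want to get one row per user but still have access to all the emails. In that case you might try an aggregate function like GROUP_CONCAT, which returns a single row per user with a comma-delimited list of that user's emails. Note that MySQL truncates GROUP_CONCAT results to group_concat_max_len bytes (1024 by default), which 30 addresses of up to 60 characters can exceed, so you may need to raise that setting: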
SELECT user.user_id, GROUP_CONCAT(user_email.email) AS user_emails FROM user LEFT JOIN user_email ON user.user_id = user_email.user_id WHERE <your where clause> GROUP BY user.user_id;
Again, depending on your application, you may want to add an index to the email column.
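A sketch of the GROUP_CONCAT variant plus an index on email, again with Python's sqlite3 module standing in for MySQL; the sample addresses and helper names are hypothetical, and making the index UNIQUE is an optional choice that also blocks the same address from being registered twice. SQLite's GROUP_CONCAT does not guarantee element order (MySQL's accepts an ORDER BY inside the call), so the helper sorts each list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY);
CREATE TABLE user_email (
    user_id INTEGER NOT NULL REFERENCES user (user_id) ON DELETE CASCADE,
    email VARCHAR(60) NOT NULL
);
-- UNIQUE index: fast lookup by address, and no address registered twice.
CREATE UNIQUE INDEX user_email_email ON user_email (email);
INSERT INTO user VALUES (1), (2);
INSERT INTO user_email VALUES
    (1, 'peter@example.com'), (1, 'mary@example.com'), (2, 'john@example.com');
""")


def user_for_email(conn, email):
    """Find the owner of an address via the index; None if nobody has it."""
    row = conn.execute("SELECT user_id FROM user_email WHERE email = ?", (email,)).fetchone()
    return row[0] if row else None


def emails_per_user(conn):
    """One row per user with a comma-separated list, turned into sorted Python lists."""
    rows = conn.execute("""
        SELECT user.user_id, GROUP_CONCAT(user_email.email) AS user_emails
        FROM user LEFT JOIN user_email ON user.user_id = user_email.user_id
        GROUP BY user.user_id
    """)
    return {uid: sorted(emails.split(",")) if emails else [] for uid, emails in rows}
```

The lookup by address is exactly the query that options 1 and 2 from the question make awkward.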
Finally, there ARE some situations where you do not want a normalized database design, and a single-column design with delimited text might be more appropriate, although those situations are few and far between. For most normal applications, this type of normalized design is the way to go and will help it perform and scale better.
I have a table for many-to-many relationship of users (three columns: relationship_id, user_id, user_id). How can I keep the relationships unique when the table accepts any entry? When I have a row of
22 11 43
How can I prevent an INSERT of (next_id, 11, 43) and, more importantly, (next_id, 43, 11)? Once user 11 has requested a relationship with user 43, user 43 must not be able to request a relationship with user 11.
How can I check two columns before INSERT?
And my problem is even more serious, as I have two tables (requests and relationships). A row from the requests table is deleted and inserted into relationships upon user approval. The reason for using two tables is that many pending requests would make the relationships table, which is queried regularly to display a user's friends, much longer.
When INSERTing a request from user 11 to user 43, what is the fastest, most efficient way to check for existing 11 43 and 43 11 rows in both tables, requests and relationships?
Any time I have used a "linking table" to keep track of many-to-many relationships, I always DELETE any existing relationships before INSERTing the new ones. However, this may not work in your case because your linking table contains the surrogate key relationship_id. If you drop the surrogate key from the linking table and use a stored procedure, you can do everything you listed.
Identifying Duplicates
Create a View using CASE logic
CREATE VIEW vFriendRequests
AS
SELECT
CASE
WHEN ID_1 < ID_2 THEN ID_1
ELSE ID_2
END AS RequestId_1,
CASE
WHEN ID_1 < ID_2 THEN ID_2
ELSE ID_1
END AS RequestId_2
FROM Friend_Requests_Table;
Then you can do a select distinct from this view to get only the unique sets of requests.
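The same view can be tried out with Python's sqlite3 module standing in for MySQL; the sample rows are invented. (11, 43), (43, 11) and a repeated (11, 43) all collapse into a single normalized pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Friend_Requests_Table (ID_1 INTEGER NOT NULL, ID_2 INTEGER NOT NULL);
-- Same CASE logic as the view above: always put the smaller ID first.
CREATE VIEW vFriendRequests AS
SELECT
    CASE WHEN ID_1 < ID_2 THEN ID_1 ELSE ID_2 END AS RequestId_1,
    CASE WHEN ID_1 < ID_2 THEN ID_2 ELSE ID_1 END AS RequestId_2
FROM Friend_Requests_Table;
-- Hypothetical sample rows, including both orderings of the same pair.
INSERT INTO Friend_Requests_Table VALUES (11, 43), (43, 11), (11, 43), (5, 7);
""")

# SELECT DISTINCT over the normalized view leaves one row per unordered pair.
unique_pairs = conn.execute(
    "SELECT DISTINCT RequestId_1, RequestId_2 FROM vFriendRequests ORDER BY 1, 2"
).fetchall()
```

Here unique_pairs holds the two distinct pairs, (5, 7) and (11, 43).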
Here are some options to achieve what you want (which to choose or combine depends mainly on your data model, client architecture and storage engine):
create a unique composite index on both user_id columns
revoke INSERT on relationships and implement a stored procedure for inserting (it can do any checks you want), OR implement a BEFORE INSERT trigger that does what you want
if the order of the user_ids is not relevant, change the INSERT code to always sort both IDs before INSERTing (for example via the stored procedure approach)
This way you don't need to check explicitly; the index does all the work for you
create a fourth column idcomb in relationships with a UNIQUE INDEX and a BEFORE INSERT trigger that takes both user_ids, sorts them, concatenates them with a - in between, and assigns the result to the idcomb column... this way all the work is done by the index and no change on the client side is needed (when a duplicate is inserted, it just comes back with an error)
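A sketch of the sort-then-insert option combined with a unique composite index, using Python's sqlite3 module as a stand-in for MySQL. The user_low/user_high column names and the add_relationship() helper are hypothetical. The CHECK constraint guarantees that only sorted pairs are stored (and rejects self-links); note that MySQL enforces CHECK constraints only from 8.0.16 on, while older versions parse and ignore them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: the pair is stored sorted, so (11, 43) and (43, 11) collide.
conn.execute("""CREATE TABLE relationships (
    relationship_id INTEGER PRIMARY KEY,
    user_low  INTEGER NOT NULL,
    user_high INTEGER NOT NULL,
    CHECK (user_low < user_high),      -- rejects unsorted pairs and self-links
    UNIQUE (user_low, user_high)       -- the index does the duplicate check
)""")


def add_relationship(conn, a, b):
    """Insert the pair in sorted order; False if it already exists (or a == b)."""
    low, high = sorted((a, b))
    try:
        with conn:
            conn.execute(
                "INSERT INTO relationships (user_low, user_high) VALUES (?, ?)", (low, high))
        return True
    except sqlite3.IntegrityError:  # UNIQUE or CHECK constraint failed
        return False
```

The same sorting rule applied to the requests table would let one indexed lookup per table answer the "11 43 or 43 11" question.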