Suppose I need to build a cache for user likes, since many users could hit the like or dislike button in a short period of time.
My idea is to create a MySQL table with the fields user_id, liked_object_id, liked_object_type, and create_time, using (user_id, liked_object_id, liked_object_type) as a composite primary key. When a user hits the like button, I insert a record into the table, and when a user cancels the like, I delete that row. This MySQL table is for persisting the data.
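For concreteness, a minimal sketch of that persistence table, created from Python (the mysql-connector-python library, the table name user_likes, and the column types are assumptions, not from the question):

    import mysql.connector

    # Connection details here are placeholders.
    conn = mysql.connector.connect(host="localhost", user="app", password="secret", database="app")
    cur = conn.cursor()

    # The composite primary key makes (user_id, liked_object_id, liked_object_type)
    # unique, so the same user cannot like the same object twice.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS user_likes (
            user_id           BIGINT UNSIGNED NOT NULL,
            liked_object_id   BIGINT UNSIGNED NOT NULL,
            liked_object_type TINYINT UNSIGNED NOT NULL,
            create_time       DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (user_id, liked_object_id, liked_object_type)
        )
    """)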
Then I put Redis in: when a user hits the like button, I first add a hash object to Redis and put the key of that hash into a set called "cached_keys", so I can grab those hash objects more easily. If the user cancels the like, the key of the cancelled hash is moved from "cached_keys" to another set called "deleted_keys", which holds the keys of all cancelled likes. I do this so I can track which likes were added or deleted when I write all the data from Redis back to the MySQL table after a certain period of time. Apart from that periodic persistence to MySQL, I only query and write against Redis.
There are other problems, like querying all likes of a certain object type for a given user. To do this, I need to create many Redis sets, each namespaced by a combination of object_type and user_id, store all the corresponding object_ids in them, and keep them in sync with the other sets like "cached_keys" and "deleted_keys". A similar case is querying all users who liked a certain object.
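A minimal sketch of that Redis write path using redis-py; apart from "cached_keys" and "deleted_keys", the key names here are assumptions:

    import time
    import redis

    r = redis.Redis()

    def like(user_id, obj_id, obj_type):
        key = f"like:{user_id}:{obj_type}:{obj_id}"
        r.hset(key, mapping={
            "user_id": user_id,
            "liked_object_id": obj_id,
            "liked_object_type": obj_type,
            "create_time": int(time.time()),
        })
        r.sadd("cached_keys", key)                      # remember keys to flush to MySQL later
        r.srem("deleted_keys", key)                     # in case this like was cancelled earlier
        r.sadd(f"likes:{obj_type}:{user_id}", obj_id)   # index set per (object_type, user)
        r.sadd(f"likers:{obj_type}:{obj_id}", user_id)  # index set per object: who liked it

    def unlike(user_id, obj_id, obj_type):
        key = f"like:{user_id}:{obj_type}:{obj_id}"
        r.smove("cached_keys", "deleted_keys", key)     # mark as deleted for the next flush
        r.srem(f"likes:{obj_type}:{user_id}", obj_id)
        r.srem(f"likers:{obj_type}:{obj_id}", user_id)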
Another problem is join queries. Since the data is cached in Redis, it's difficult to do paged join queries against the MySQL likes table, for example querying a list of objects sorted by their like count. I have to load all the objects from MySQL into memory, fetch the like counts from Redis and sort them, and then page the results. And if I want to query objects of one type that a given user has liked, I have to get all the keys from Redis and use an IN clause.
Doing it like this doesn't feel like the right way. Every time the persistence job runs, it writes all the records to the database at once, which doesn't feel like it will scale well, and neither do the join queries.
How could I improve this, or should I do it in a completely different way? Thanks a lot.
I have a central database containing millions of IDs. And I have a group of users (50-100 users), all being able to request extraction of IDs from this big database.
At the moment, when a user sends a GET request, I SELECT 100 IDs, then update them with the flag USED and return the 100. The problem is, if I get too many requests at the same time, multiple users will receive the same IDs (because I don't lock the DB when doing the SELECT and then the UPDATE).
If I lock the database my problem will be solved, but it will also be slower.
What other alternative I have?
Thanks!
Look ahead another step... What if a "user" gets 100 rows, then keels over dead? Do you have a way to release those 100 for someone else to work on?
You need an extra table to handle "check out" and "check in". Also, use that table to keep track of the "next" 100 to assign to a user.
When a user checks out the 100, a record of that is stored in the table, together with a timestamp and "who" checked them out. If they don't "check them back in" within, say, an hour, then you assign that 100 to another user.
Back on something more mundane... How to pick 100. If there is an auto_increment id with no gaps, then use simple math to chunk up the list. If there are a lot of gaps, then use SELECT id FROM tbl WHERE id > $leftoff ORDER BY id LIMIT 100, 1 to get the end of the next 100.
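A rough sketch of such a check-out table and the one-hour reassignment (the table and column names are assumptions):

    import mysql.connector

    conn = mysql.connector.connect(host="localhost", user="app", password="secret", database="app")
    cur = conn.cursor()

    cur.execute("""
        CREATE TABLE IF NOT EXISTS id_checkout (
            chunk_start    BIGINT UNSIGNED NOT NULL,  -- first id of the chunk of 100
            chunk_end      BIGINT UNSIGNED NOT NULL,  -- last id of the chunk of 100
            checked_out_by VARCHAR(64) NULL,          -- "who" checked it out
            checked_out_at DATETIME NULL,             -- when, for the one-hour timeout
            checked_in     TINYINT(1) NOT NULL DEFAULT 0,
            PRIMARY KEY (chunk_start)
        )
    """)

    # Hand a chunk that was checked out over an hour ago (and never checked in)
    # to another user instead of letting it sit forever.
    cur.execute("""
        UPDATE id_checkout
           SET checked_out_by = %s, checked_out_at = NOW()
         WHERE checked_in = 0
           AND checked_out_at < NOW() - INTERVAL 1 HOUR
         LIMIT 1
    """, ("another_user",))
    conn.commit()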
If each user has their own key, you could pull from the millions of IDs starting from their key*10000. For example, user #9 would first get IDs #90000 to #90099, then #90100 to #90199 next time.
You could set the IDs as "Used" before they get sent back, so one user requesting IDs multiple times will never get duplicates. This needn't lock the database for other users.
If they don't request keys more than 100 times before the database can update, this should avoid collisions. You might need to add logic to allow users who request often not to run out, like by having a pool of IDs that can repopulate their supply, but that depends on particulars that aren't clear from the original question.
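A small sketch of that scheme: user N draws 100 IDs at a time from the range starting at N*10000, and the rows are flagged before being returned. The table/column names (ids, id, used) are assumptions, and the IDs are assumed to be dense inside each range.

    BATCH = 100
    RANGE_PER_USER = 10_000

    def next_batch(cur, conn, user_key):
        start = user_key * RANGE_PER_USER
        end = start + RANGE_PER_USER - 1
        # First id in this user's range that has not been handed out yet.
        cur.execute("SELECT MIN(id) FROM ids WHERE used = 0 AND id BETWEEN %s AND %s",
                    (start, end))
        first = cur.fetchone()[0]
        if first is None:
            return []  # this user's range is exhausted; it would need repopulating
        last = first + BATCH - 1
        # Flag first, then read back: a repeated request moves on to the next 100,
        # so the same user never gets duplicates, and no other user shares this range.
        cur.execute("UPDATE ids SET used = 1 WHERE id BETWEEN %s AND %s", (first, last))
        conn.commit()
        cur.execute("SELECT id FROM ids WHERE id BETWEEN %s AND %s", (first, last))
        return [row[0] for row in cur.fetchall()]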
I'm working on a project that involves many users and their login/logout times (and a summary of them), to be able to keep track of their presence.
My question is: what is the best way to store that data (if we're talking about hundreds or maybe thousands of users)?
1. Make a DB that contains a table for each user, where each table holds all the dates and hours?
2. Make one big table which contains all this data?
Thanks.
A table for each user is a weird approach.
Make a table for ALL users, which is the correct way to go.
Then make a table called actions with the user_id as a FOREIGN KEY, and two more columns: type and time.
When the user logs in, add a new row to the actions table with type = 1 (login), and when they log out, add a row with type = 2 (logout).
Using numbers instead of strings is better since it keeps the table smaller.
Repeating the same string is costly.
The type column should be an INT type.
The time column can have CURRENT_TIMESTAMP as its default value, so it records when the action happened.
See an example fiddle with the schema and query.
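Since the fiddle itself isn't reproduced here, a minimal sketch of that schema and the login insert (exact names and types are assumptions):

    import mysql.connector

    conn = mysql.connector.connect(host="localhost", user="app", password="secret", database="app")
    cur = conn.cursor()

    cur.execute("""
        CREATE TABLE IF NOT EXISTS users (
            id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
            name VARCHAR(100) NOT NULL
        )
    """)
    cur.execute("""
        CREATE TABLE IF NOT EXISTS actions (
            id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
            user_id INT UNSIGNED NOT NULL,
            type    INT NOT NULL,                              -- 1 = login, 2 = logout
            time    TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (user_id) REFERENCES users(id)
        )
    """)

    # Record a login for user 42.
    cur.execute("INSERT INTO actions (user_id, type) VALUES (%s, %s)", (42, 1))
    conn.commit()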
I am in the process of creating a simple activity stream for my app.
The current technology layer and logic is as follows:
**All data relating to an activity is stored in MySQL, and an array of activity IDs is kept in Redis for every user.**
The user performs an action, the activity is stored directly in an 'activities' table in MySQL, and a unique 'activity_id' is returned.
An array of this user's 'followers' is retrieved from the database, and for each follower I push this new activity_id into their list in Redis.
When a user views their stream, I retrieve the array of activity IDs from Redis based on their user ID. I then perform a simple MySQL WHERE IN($ids) query to get the actual activity data for all of these activity IDs.
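A compact sketch of that flow with redis-py and a MySQL cursor; the key name stream:<user_id>, the followers table, and the data column are assumptions:

    import redis

    r = redis.Redis()

    def publish_activity(cur, conn, user_id, data):
        cur.execute("INSERT INTO activities (user_id, data) VALUES (%s, %s)", (user_id, data))
        activity_id = cur.lastrowid
        conn.commit()
        # Fan out: push the new activity_id onto every follower's Redis list.
        cur.execute("SELECT follower_id FROM followers WHERE followed_id = %s", (user_id,))
        for (follower_id,) in cur.fetchall():
            r.lpush(f"stream:{follower_id}", activity_id)  # newest first
        return activity_id

    def read_stream(cur, user_id, limit=50):
        ids = [int(x) for x in r.lrange(f"stream:{user_id}", 0, limit - 1)]
        if not ids:
            return []
        placeholders = ",".join(["%s"] * len(ids))
        cur.execute(f"SELECT * FROM activities WHERE id IN ({placeholders})", ids)
        return cur.fetchall()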
This kind of setup should, I believe, be quite scalable, as the queries will always be very simple IN queries. However, it presents several problems.
Removing a follower - If a user stops following someone, we need to remove all activity_ids that correspond to that user from their Redis list. This requires looping through all IDs in the Redis list and removing the ones that correspond to the removed user. This strikes me as quite inelegant; is there a better way of managing this?
'Archiving' - I would like to keep the Redis lists to a maximum length of, say, 1000 activity_ids, and also frequently prune old data from the MySQL activities table to prevent it from growing to an unmanageable size. Obviously this can be achieved by removing old IDs from the user's stream list when we add a new one. However, I am unsure how to go about archiving this data so that users can view very old activity data should they choose to. What would be the best way to do this? Or am I simply better off enforcing this limit completely and preventing users from viewing very old activity data?
To summarise: what I would really like to know is if my current setup/logic is a good/bad idea. Do I need a total rethink? If so what are your recommended models? If you feel all is okay, how should I go about addressing the two issues above? I realise this question is quite broad and all answers will be opinion based, but that is exactly what I am looking for. Well formed opinions.
Many thanks in advance.
1. doesn't seem so difficult to perform (no looping). If the per-follower stream were kept in a table, say stream(user_id, activity_id), removing everything from user 2 in user 1's stream would be a single multi-table DELETE:

    DELETE stream
      FROM stream
      JOIN activities ON stream.activity_id = activities.id
     WHERE activities.user_id = 2   -- the unfollowed user
       AND stream.user_id = 1;      -- the follower whose stream is being cleaned
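Translating that idea to the actual Redis list from the question (this adaptation is not part of the answer above): ask MySQL which activity IDs belong to the unfollowed user, then LREM exactly those from the follower's stream, so the application never scans the whole list. The key name stream:<user_id> and redis-py are assumptions.

    import redis

    r = redis.Redis()

    def remove_followed_user(cur, follower_id, unfollowed_id):
        cur.execute("SELECT id FROM activities WHERE user_id = %s", (unfollowed_id,))
        pipe = r.pipeline()
        for (activity_id,) in cur.fetchall():
            # count=0 removes every occurrence of this id from the list.
            pipe.lrem(f"stream:{follower_id}", 0, activity_id)
        pipe.execute()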
2. I'm not really sure about archiving. You could create archive tables every period and move old activities from the main table into an archive table periodically. A single, properly normalized activity table ought to be able to get pretty big, though. (Make sure any "large" activity stores its payload in a separate table; the main activity table should stay "narrow", since it is expected to have a lot of rows.)
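A rough sketch of that periodic move; the archive table is assumed to have the same structure as activities, and the created_at column and 90-day cutoff are assumptions as well:

    def archive_old_activities(cur, conn):
        cur.execute("""
            INSERT INTO activities_archive
            SELECT * FROM activities
            WHERE created_at < NOW() - INTERVAL 90 DAY
        """)
        cur.execute("""
            DELETE FROM activities
            WHERE created_at < NOW() - INTERVAL 90 DAY
        """)
        conn.commit()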
I have 200 users, and each user will eventually have a "reviewINFO" table with certain data.
Each user will have a review every 3 to 4 months.
So for every review, it creates a new row inside the "reviewINFO" table.
This is where I'm stuck. I'm not sure if I need to serialize a table inside each row or not.
Example:
(each row links to its own table)
"USER1reviewINFO" - row1 -> USER1table1
                  - row2 -> USER1table2
                  - row3 -> USER1table3
                  - row4 -> USER1table4
                  - row5 -> USER1table5
"USER2reviewINFO" - row1 -> USER2table1
                  - row2 -> USER2table2
                  - row3 -> USER2table3
                  - row4 -> USER2table4
                  - row5 -> USER2table5
Using this method it will create a couple of thousand rows within two years, and I think it's harder to manage.
"USERxtablex" is a table with a dynamic number of rows holding children's names, ages, and a boolean.
What I'm thinking of doing is serializing each USERxtable into its corresponding row.
Please help, as I would not like to make this complicated or inefficient.
Generally, you should never have to serialize data of this nature into a table row to accomplish your goal (which I am assuming is an implicit link between a user and a review).
What you need to do is key the reviews by a user_id such that all the reviews are packaged in one table, and relate numerically back to the users table.
Assuming you have an AUTO_INCREMENT primary key in the user table, all you would need is a user_id field in the reviews table that represents what user the review relates to. There is no need for a separate structure for each user, if that's what you are suggesting. Reviews can have date fields as well, so you can perform queries for a specific year or window of time.
You can then use a JOIN query to select out your data set relating to a particular user or review, and apply the usual WHERE clause to determine what result set you want to fetch.
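A small sketch of that single reviews table and a join back to users; the data columns, names, and date window are assumptions, with user_id as the link:

    import mysql.connector

    conn = mysql.connector.connect(host="localhost", user="app", password="secret", database="app")
    cur = conn.cursor()

    # One reviews table for everybody; user_id relates each row back to the users
    # table (assumed to already exist with an AUTO_INCREMENT id and a name column).
    cur.execute("""
        CREATE TABLE IF NOT EXISTS reviews (
            id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
            user_id     INT UNSIGNED NOT NULL,   -- which user this review belongs to
            review_date DATE NOT NULL,
            child_name  VARCHAR(100) NOT NULL,
            child_age   TINYINT UNSIGNED NOT NULL,
            some_flag   TINYINT(1) NOT NULL
        )
    """)

    # All reviews for one user within a window of time, joined back to the user row.
    cur.execute("""
        SELECT u.name, r.review_date, r.child_name, r.child_age, r.some_flag
          FROM users u
          JOIN reviews r ON r.user_id = u.id
         WHERE u.id = %s AND r.review_date >= %s
    """, (1, "2013-01-01"))
    print(cur.fetchall())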
What is the best way to fetch records from a MySQL table when more than one client is connected, with the clients retrieving records concurrently and periodically?
Everyone should get new messages as new records enter the table, but old messages should not be retrieved again.
Current Table Structure:
MessageId, Message, DatePosted, MessageFromID
Thanks
Your problem can be translated to: how can each client know which records to read and which not?
There are two completely different approaches to that, with very different properties:
Let the client care for that
Let the server care for it
Model #1 would quite simply require that you:
Use something like an AUTO_INCREMENT on some field, if your MessageID is not guaranteed to be incrementing
On the server give each client not only the messages, but also the ID
Have the client keep this ID and use it as a filter for the next poll (see the sketch below)
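A minimal sketch of Model #1; the messages table name is an assumption, the columns come from the table structure above, and cur is a DB-API cursor:

    last_seen = 0  # the client keeps this between polls

    def poll(cur):
        global last_seen
        cur.execute(
            "SELECT MessageId, Message, DatePosted, MessageFromID "
            "FROM messages WHERE MessageId > %s ORDER BY MessageId",
            (last_seen,),
        )
        rows = cur.fetchall()
        if rows:
            last_seen = rows[-1][0]  # remember where we left off
        return rows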
Model #2 needs you to:
Have another table with ClientID and MessageID columns
Whenever a client gets a message, create a record there
Use the non-existence of such a record as the polling filter (sketched below)
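And a minimal sketch of Model #2 with the extra ClientID/MessageID table (called deliveries here, an assumed name):

    def poll(cur, conn, client_id):
        # Only messages this client has no delivery record for.
        cur.execute(
            """
            SELECT m.MessageId, m.Message, m.DatePosted, m.MessageFromID
              FROM messages m
             WHERE NOT EXISTS (
                   SELECT 1 FROM deliveries d
                    WHERE d.ClientID = %s AND d.MessageID = m.MessageId
             )
            """,
            (client_id,),
        )
        rows = cur.fetchall()
        # Record the delivery so these messages are not returned next time.
        cur.executemany(
            "INSERT INTO deliveries (ClientID, MessageID) VALUES (%s, %s)",
            [(client_id, row[0]) for row in rows],
        )
        conn.commit()
        return rows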