Member action table data model suggestion - MySQL

I'm trying to add an action table, but I'm currently at odds as to how to approach the problem. Before I go into more detail:
We have members who can do different actions on our website:
add an image
update an image
rate an image
post a comment on image
add a blog post
update a blog post
comment on a blog post
etc, etc
The action table allows our users to "Watch" other members' activities if they want to add them to their watch list.
I've created a table called member_actions with the following columns:
[UserID] [actionDate] [actionType] [refID]
[refID] can be a reference either to the image ID in the DB, to a blog post ID, or to the id column of another actionable table (e.g. event).
[actionType] is an ENUM column with action names such as imgAdd, imgUpdate, blogAdd, blogUpdate, etc.
[actionDate] decides which records get deleted every 90 days, so we won't be keeping the actions forever.
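That purge can then be a single statement run from a weekly job, e.g.:
DELETE FROM member_actions
WHERE actionDate < NOW() - INTERVAL 90 DAY;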
The current MySQL query I came up with is:
SELECT act.*,
       img.Title, img.FileName, img.Rating, img.isSafe, img.allowComment AS allowImgComment,
       blog.postTitle, blog.firstImageSRC AS blogImg, blog.allowComments AS allowBlogComment,
       event.Subject, event.image AS eventImg, event.stimgs, event.ends,
       imgrate.Rating AS userImgRating
FROM member_actions act
LEFT JOIN member_img img ON act.actionType IN ('imgAdd', 'imgUpdate')
    AND img.imgID = act.refID AND img.isActive AND img.isReady
LEFT JOIN member_blogpost blog ON act.actionType IN ('blogAdd', 'blogUpdate')
    AND blog.id = act.refID AND blog.isPublished AND blog.isPublic
LEFT JOIN member_event event ON act.actionType IN ('eventAdd', 'eventUpdate')
    AND event.id = act.refID AND event.isPublished
LEFT JOIN img_rating imgrate ON act.actionType = 'imgRate'
    AND imgrate.UserID = act.UserID AND imgrate.imgID = act.refID
LEFT JOIN member_favorite imgfav ON act.actionType = 'imgFavorite'
    AND imgfav.UserID = act.UserID AND imgfav.imgID = act.refID
LEFT JOIN img_comment imgcomm ON act.actionType IN ('imgComment', 'imgCommentReply')
    AND imgcomm.imgID = act.refID
LEFT JOIN blogpost_comment blogcomm ON act.actionType IN ('blogComment', 'blogCommentReply')
    AND blogcomm.blogPostID = act.refID
ORDER BY act.actionDate DESC
LIMIT XXXXX, 20
OK, so basically, given that I'll be deleting actions older than 90 days every week or so, would it make sense to go with this query for displaying the member action history?
Or should I add a new text column to the member_actions table called [actionData], where I can store a few details in JSON or XML format for fast querying of the member_actions table?
It adds to the table size but reduces query complexity, and the table will be purged of old entries periodically.
The assumption is that eventually we'll have no more than a few hundred thousand members, but I'm concerned about the size of the member_actions table with its text [actionData] column that will contain some action-specific details.
I'm leaning towards the [actionData] model, but any recommendations or considerations would be appreciated.
Another consideration is that the table entries for an img or blog could get deleted, so I could have an action but no referenced record, which certainly adds to the problem.
Thanks in advance.

Because you are dealing with user interface issues, performance is key. All those joins will take time, even with indexes. And querying the database is likely to lock records in all the tables (or indexes), which can slow down inserts.
So, I lean towards denormalizing the data by maintaining the text in the record.
However, a key consideration is whether the text can be updated after the fact. That is, you will load the data when it is created; can it then change? Maintaining the data in light of changes (which could involve triggers and stored procedures) could introduce a lot of additional complexity.
If the data is static, this is not an issue. As for table size, I don't think you should worry too much. Databases are designed to manage memory: the table is maintained in a page cache, which should hold the pages for currently active members. You can always increase memory size, and 100,000 users is well within the realm of today's servers.
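If you do denormalize, a minimal sketch of the [actionData] route (MySQL 5.7+ for the JSON type; the snapshot keys below are made up for illustration):
ALTER TABLE member_actions ADD COLUMN actionData JSON NULL;

-- Snapshot the display details at insert time, so the feed query needs no joins:
INSERT INTO member_actions (UserID, actionDate, actionType, refID, actionData)
VALUES (42, NOW(), 'imgAdd', 1001,
        JSON_OBJECT('Title', 'Sunset', 'FileName', 'sunset.jpg', 'isSafe', 1));
On older MySQL versions a plain TEXT column holding the same JSON string works too; you just lose validation and the JSON_* functions.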

I'd be wary of this approach: as you add kinds of actions that you want to monitor, the join list is going to keep growing (and so are the sparse extra columns in the SELECT statement).
I don't think it would be that scary to have a couple of extra columns in this table, and this query sounds like it would run fairly frequently, so making it efficient seems like a good idea.

Related

How to design a MySQL table that tracks the Status of each Asset, as well as every old Status?

I would like to create a table that tracks the status of each asset as well as each past status. Basically I want to keep a log of all status changes.
Do I create a timestamp for each updated status and have every update be its own separate row, linked back to the asset through the assetid? Then sort by the timestamp to get these statuses in order? I can see this table getting unwieldy if there are tons of rows for each asset and the table grows linearly over time.
This is for a MySQL database.
Here is an example of how I have designed a database table for tracking/logging purposes.
Columns:
auto-increment PK (if you don't have a better PK)
timestamp
tracked object id (asset_id in your case)
event type (you probably don't need this, but it is explained below)
content (this could also be named status in your case)
My example is very simplified, but the main idea is to insert each record as its own row. Create the table with proper primary keys and indexes to get good search performance.
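A sketch of that table in MySQL (all names are illustrative):
CREATE TABLE asset_status_log (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- surrogate PK
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    asset_id   INT UNSIGNED NOT NULL,                    -- tracked object id
    event_type VARCHAR(32) NULL,                         -- optional, see below
    content    VARCHAR(255) NOT NULL,                    -- the status value
    PRIMARY KEY (id),
    KEY idx_asset_time (asset_id, created_at)            -- fast per-asset history reads
);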
Using this structure you should be able to search by asset or by status, get the latest changes, and so on. The exact structure depends on your needs, so I have usually modified it to fit them.
Don't worry too much about the event column. I put it here because most of these implementations are based on event sourcing. Here is a link to one article that explains it: http://scottlobdell.me/2017/01/practical-implementation-event-sourcing-mysql/
I suggest reading more about event sourcing to see whether that design could work in your case. Look only at the database example, since it is similar to mine.
As the result, you have a journal of status changes. How to handle/read the data and show the results is then up to your code.
About the linear growth: I would say it is not a big problem. Of course, if you have more information about what "tons of rows" means, then ask. I have not seen any scaling problems. The same structure works very well with relational and with NoSQL databases. MySQL also has features to optimize that kind of structure if the size of the data becomes a problem.
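For example, the two most common reads against such a table could look like this (same illustrative names as the sketch above):
-- Full status history for one asset, newest first:
SELECT created_at, content
FROM asset_status_log
WHERE asset_id = 123
ORDER BY created_at DESC, id DESC;

-- Current status only:
SELECT content
FROM asset_status_log
WHERE asset_id = 123
ORDER BY created_at DESC, id DESC
LIMIT 1;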

Best way to do a query with a large number of possible joins

On the project I'm working on we have an activity table and each activity can be linked to one of about 20 different "activity details" tables...
e.g. If the activity was of type "work", then it would have a corresponding activity_details_work record, if it was of type "sick leave" then it would have a corresponding activity_details_sickleave record and so on.
Currently we are loading the activities and then for each activity we have a separate query to go fetch the activity details from the relevant table. This obviously doesn't scale well if you have thousands of activities.
So my initial thought was to have a single query which fetches the activities and joins the details in one go e.g.
SELECT * FROM activity
LEFT JOIN activity_details_1_work ON ...
LEFT JOIN activity_details_2_sickleave ON ...
LEFT JOIN activity_details_3_travelwork ON ...
...etc...
LEFT JOIN activity_details_20_yearleave ON ...
But this will result in each record having hundreds of fields, most of which are empty, and that feels nasty.
Lazy-loading the details isn't really an option either as the details are almost always requested in the core logic, at least for the main types anyway.
Is there a super clever way of doing this that I'm not thinking of?
Thanks in advance
My suggestion is to define a view for each ActivityType, tailored specifically to that activity.
Then add an index on the Activity table led by the ActivityType field. Make it the clustered index unless there is an overwhelming need for some other one to be clustered (or performance benchmarking shows another clustering choice to be more performant).
Is there a particular reason why this degree of denormalization was designed in? Is that reason well known?
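A sketch of one such view, assuming the details tables carry an activity_id foreign key (the detail columns here are hypothetical):
CREATE VIEW activity_work AS
SELECT a.*, w.hours_worked, w.project_id  -- detail columns are made up
FROM activity a
JOIN activity_details_1_work w ON w.activity_id = a.activity_id
WHERE a.ActivityType = 'work';
The application then selects from the view matching the activity's type and never touches the other 19 detail tables.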
Chances are your activity tables look like (date_from, date_to, with_who, descr) or something to that effect. As Pieter suggested, consider tossing a type varchar or enum field in there, so as to deal with a single details table.
If there are rational reasons to keep the tables apart, consider adding triggers that maintain boolean/tinyint fields (has_work, has_sickleave, etc.), or a bit string (has_activities_of_type, where the first position amounts to has_work, the next to has_sickleave, etc.); see the trigger sketch below.
Either way, you'll probably be better off fetching the activity's details in one or more separate queries, if only to avoid field name collisions.
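A sketch of such a trigger, assuming an activity_id foreign key in the details tables and a has_work tinyint flag on activity:
CREATE TRIGGER trg_flag_work
AFTER INSERT ON activity_details_1_work
FOR EACH ROW
UPDATE activity SET has_work = 1
WHERE activity_id = NEW.activity_id;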
I don't think enum is the way to go, because, as you say, there might be thousands of activities, and altering your activity table would then become an issue.
There is no point doing a left join on a large number of tables either.
So the options that you have are:
See this; the first comment might be useful.
I am guessing that your activity table has a field called activity_type_id.
Build a table called activity_types containing the fields activity_type_id, activity_name, and activity_details_table_name. Then query in the following way:
SELECT act.*, atype.activity_details_table_name
FROM activity act
INNER JOIN activity_types atype USING (activity_type_id);
This query gives you the table name on which to query for the details.
This way you can add any new activity type just by adding a row in the activity_types table.
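The lookup table itself can be as simple as:
CREATE TABLE activity_types (
    activity_type_id            INT UNSIGNED NOT NULL AUTO_INCREMENT,
    activity_name               VARCHAR(64) NOT NULL,
    activity_details_table_name VARCHAR(64) NOT NULL,  -- which table holds the details
    PRIMARY KEY (activity_type_id)
);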

How to efficiently design MySQL database for my particular case

I am developing a forum in PHP and MySQL, and I want to make it as efficient as I can.
I have made these two tables
tbl_threads
tbl_comments
Now, the problem is that there is a Like and a Dislike button under each comment. I have to store the user_name of whoever clicked the Like or Dislike button, along with the comment_id. I made a user_likes column and a user_dislikes column in tbl_comments to store comma-separated user_names. But on this forum I have read that this is not an efficient way, and I have been advised to create a third table to store the likes and dislikes so my database design complies with 1NF.
But the problem is, if I make a third table tbl_user_opinion with two fields like this:
1. comment_id
2. type (like or dislike)
will I have to run as many SQL queries as there are comments on my page to get the like and dislike data for each comment? Won't that be inefficient? I think there is some confusion on my part here; can someone clarify?
There are two ways to solve this. The first, "clean" one is to build your "like" table and do COUNT(*)'s on the appropriate column.
The second one is to store a counter in each comment, indicating how many up- and down-votes it has received.
If you want to check whether a specific user has voted on a comment, you only have to check one entry, which you can easily handle as its own query and merge the two result sets outside of your database (for this, use a query returning the comment_id and the kind of vote the user cast in a specific thread).
Your approach with a comma-separated list is not very performant, because you cannot parse it without higher-level logic or a huge amount of string parsing. If you have a database, use it!
("One piece of information, one dataset!")
The comma-separated list violates the principle of atomicity, and therefore 1NF. You'll have a hard time maintaining referential integrity and, for the most part, querying as well.
Here is one way to do it in a normalized fashion:
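A sketch of the two vote tables, using the composite primary key {COMMENT_ID, USER_ID} described in the footnote below:
CREATE TABLE UP_VOTE (
    COMMENT_ID INT NOT NULL,
    USER_ID    INT NOT NULL,
    PRIMARY KEY (COMMENT_ID, USER_ID),  -- also the InnoDB clustering key
    FOREIGN KEY (COMMENT_ID) REFERENCES COMMENT (COMMENT_ID)
);
-- DOWN_VOTE has the identical structure.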
This is very clustering-friendly: it groups up-votes belonging to the same comment physically close together (ditto for down-votes), making the following query rather efficient:
SELECT
    COMMENT.COMMENT_ID,
    <other COMMENT fields>,
    COUNT(DISTINCT UP_VOTE.USER_ID) - COUNT(DISTINCT DOWN_VOTE.USER_ID) AS SCORE
FROM COMMENT
LEFT JOIN UP_VOTE ON COMMENT.COMMENT_ID = UP_VOTE.COMMENT_ID
LEFT JOIN DOWN_VOTE ON COMMENT.COMMENT_ID = DOWN_VOTE.COMMENT_ID
WHERE COMMENT.COMMENT_ID = <whatever>
GROUP BY COMMENT.COMMENT_ID, <other COMMENT fields>;
[SQL Fiddle]
Please measure on realistic amounts of data whether that works fast enough for you. If not, denormalize the model and cache the total score in the COMMENT table, keeping it current through triggers every time a new row is inserted into or deleted from the *_VOTE tables.
If you also need to get which comments a particular user voted on, you'll need indexes on *_VOTE {USER_ID, COMMENT_ID}, i.e. the reverse of the primary/clustering key above.1
1 This is one of the reasons why I didn't go with just one VOTE table containing an additional field that can be either 1 (for up-vote) or -1 (for down-vote): it's less efficient to cover with secondary indexes.
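In MySQL those secondary indexes could be created as (index names are arbitrary):
CREATE INDEX UP_VOTE_BY_USER ON UP_VOTE (USER_ID, COMMENT_ID);
CREATE INDEX DOWN_VOTE_BY_USER ON DOWN_VOTE (USER_ID, COMMENT_ID);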

Using Redis as a Key/Value store for activity stream

I am in the process of creating a simple activity stream for my app.
The current technology layer and logic is as follows:
All data relating to an activity is stored in MySQL, and an array of all activity IDs is kept in Redis for every user.
A user performs an action; the activity is stored directly in an 'activities' table in MySQL, and a unique 'activity_id' is returned.
An array of this user's 'followers' is retrieved from the database, and for each follower I push this new activity_id onto their list in Redis.
When a user views their stream, I retrieve the array of activity IDs from Redis based on their userid. I then perform a simple MySQL WHERE IN($ids) query to get the actual activity data for all those activity IDs.
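So the MySQL side of that read is just an IN lookup (the IDs shown are placeholders):
SELECT *
FROM activities
WHERE id IN (101, 202, 303);  -- IDs come from the user's Redis list
-- Re-order in application code to match the Redis list order if needed.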
This kind of setup should, I believe, be quite scalable, as the queries will always be very simple IN queries. However, it presents several problems.
Removing a follower: if a user stops following someone, we need to remove all activity_ids that correspond to that user from their Redis list. This requires looping through all IDs in the Redis list and removing the ones that correspond to the removed user. This strikes me as quite inelegant; is there a better way of managing this?
Archiving: I would like to cap the Redis lists at a maximum of, say, 1000 activity_ids, and also frequently prune old data from the MySQL activities table to prevent it from growing to an unmanageable size. Obviously this can be achieved by removing old IDs from the user's stream list when we add a new one. However, I am unsure how to go about archiving this data so that users can view very old activity should they choose to. What would be the best way to do this? Or am I simply better off enforcing this limit completely and preventing users from viewing very old activity data?
To summarise: what I would really like to know is if my current setup/logic is a good/bad idea. Do I need a total rethink? If so what are your recommended models? If you feel all is okay, how should I go about addressing the two issues above? I realise this question is quite broad and all answers will be opinion based, but that is exactly what I am looking for. Well formed opinions.
Many thanks in advance.
(1) doesn't seem so difficult to perform (no looping), written as a MySQL multi-table DELETE:
-- Treats the user's stream as a table named Redis
-- with user_id and activity_id columns:
DELETE Redis
FROM Redis
JOIN activities ON Redis.activity_id = activities.id
    AND activities.user_id = 2  -- the unfollowed user
    AND Redis.user_id = 1;      -- the follower whose stream is pruned
(2) I'm not really sure about archiving. You could create archive tables every period and move old activities from the main table to an archive table periodically. A single, properly normalized activity table ought to be able to get pretty big, though. (Make sure any "large" activity stores its payload in a separate table; the main activity table should stay "narrow", since it is expected to have a lot of entries.)
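A sketch of that rotation, assuming an activities_archive table with the same structure and a created_at timestamp column (both hypothetical):
-- Copy old rows out, then remove them from the main table:
INSERT INTO activities_archive
SELECT * FROM activities
WHERE created_at < NOW() - INTERVAL 90 DAY;

DELETE FROM activities
WHERE created_at < NOW() - INTERVAL 90 DAY;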

How slow is the LIKE query on MySQL? (Custom fields related)

Apologies if this is redundant, and it probably is, I gave it a look but couldn't find a question here that fell in with what I wanted to know.
Basically we have a table with about 50,000 rows, and it's expected to grow much bigger than that. We need to allow admin users to add custom data to an item based on its category, and users can just pick which of the admin-defined fields they want to add info to.
Initially I had gone with an item_categories_fields table which pairs up entries from item_fields with item_categories, so admins can add custom fields and reuse them across categories for consistency. item_fields has a relationship to item_field_values, which links values with fields; this is how we handled things in .NET. The project is using CakePHP though, and we're just learning as we go, so it can get a bit annoying at times.
I'm however thinking of maybe just adding an item_custom_fields table that is essentially the item_id and a text field that stores XMLish formatted data. This is just for the values of the custom fields.
No problems if I want to fetch the item by its id as the required data is stored in the items table, but what if I wanted to do a search based on a custom field? Would a
SELECT * FROM item_custom_fields
WHERE custom_data LIKE '%<material>Plastic</material>%'
(user-input-related issues aside) be practical if I wanted to fetch items made of plastic in this case? How slow would that be?
Thanks.
Edit: I was afraid of that; realistically this thing will be around 400k rows for that one table at launch. Thanks, guys.
Any LIKE pattern that starts with % cannot use an index on the column, so the query will scan the whole table to find the result.
The response time depends heavily on your machine and the size of the table, but it definitely won't be efficient in any shape or form.
Your previous/existing solution (if well indexed) should be quite a bit faster.
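To illustrate the difference (table names follow the question; the field_id value and column names are hypothetical):
-- Leading wildcard: no index can be used, so the whole table is scanned:
SELECT item_id FROM item_custom_fields
WHERE custom_data LIKE '%<material>Plastic</material>%';

-- The normalized field/value design can hit an index on (field_id, value) instead:
SELECT item_id FROM item_field_values
WHERE field_id = 7 AND value = 'Plastic';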