SQL: Seeking advice on table structure etc - mysql

First, the conditions. I have people who each belong to a "small" group (in other words, everyone has a "small_group_id"). "Small" groups in turn form "big" groups (in other words, a small group may or may not have a "big_group_id", depending on whether it belongs to a bigger group).
I want to create a table structure (to be used from PHP) for storing and displaying the following two things:
Public messages (anyone, registered or not, can see them). Only the author of a message can edit or delete it. This is the easy part :)
Private messages, WITH a definition of how private each one is. That means a private message should have properties for a) which small groups can see it and b) which big groups can see it (which implies that all members of those big groups have the right to see it).
Basically, the challenge for me is how to design, and later work with, the visibility of those private messages.
My first thought was a table like: msgID, msgBody, small_groups_list, big_group_list, authorID. So I would store in 'small_groups_list' something like 'id_1; id_4; id_10', and similarly for big groups. But then I'm not sure how to search through such stored lists when, e.g., a person belonging to small_group_id = 10 is supposed to see that message. Also, what should the definitions/types of the columns small_groups_list and big_group_list be?
Perhaps there is a better way to store and use such things?
That is why I'm here. What would be better practices for such requirements?
(It is going to be implemented on MySQL.)
Thank you in advance.
[edit]
I'm pretty inexperienced in SQL and DB things. Please take that into account when answering.

First: Don't denormalize your data with "array" columns. That makes it a horror to query, and even worse to update.
Instead, you need two separate tables: small_group_visibility and big_group_visibility. Each of these two tables will consist of msgID and groupID. Basically, it's a many-to-many relationship that's pointing out to both the group and the message it is concerned with.
This is a pretty common database pattern.
To query for messages to be displayed, imagine that we have a user whose small groups are (1, 2, 3) and whose big groups are (10, 20).
SELECT DISTINCT m.msgID, m.msgSubject, m.msgBody -- and so on
FROM messages m
LEFT JOIN small_group_visibility sg
    ON sg.msgID = m.msgID
LEFT JOIN big_group_visibility bg
    ON bg.msgID = m.msgID
WHERE
    sg.groupID IN (1, 2, 3) OR
    bg.groupID IN (10, 20);
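To try the two visibility tables end to end, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows, the column types and the helper name visible_messages are assumptions made for illustration, not part of the answer above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (msgID INTEGER PRIMARY KEY, msgBody TEXT, authorID INTEGER);
CREATE TABLE small_group_visibility (
    msgID   INTEGER REFERENCES messages(msgID),
    groupID INTEGER,
    PRIMARY KEY (msgID, groupID)
);
CREATE TABLE big_group_visibility (
    msgID   INTEGER REFERENCES messages(msgID),
    groupID INTEGER,
    PRIMARY KEY (msgID, groupID)
);
-- Hypothetical sample data.
INSERT INTO messages VALUES
    (1, 'for small group 10', 7),
    (2, 'for big group 20', 7),
    (3, 'for small group 99', 8);
INSERT INTO small_group_visibility VALUES (1, 10), (3, 99);
INSERT INTO big_group_visibility VALUES (2, 20), (1, 20);
""")

def visible_messages(conn, small_ids, big_ids):
    """Return (msgID, msgBody) rows visible to a user in the given groups.

    Assumes both id lists are non-empty, because MySQL rejects an empty IN ().
    """
    small_marks = ",".join("?" * len(small_ids))
    big_marks = ",".join("?" * len(big_ids))
    sql = f"""
        SELECT DISTINCT m.msgID, m.msgBody
        FROM messages m
        LEFT JOIN small_group_visibility sg ON sg.msgID = m.msgID
        LEFT JOIN big_group_visibility bg ON bg.msgID = m.msgID
        WHERE sg.groupID IN ({small_marks}) OR bg.groupID IN ({big_marks})
        ORDER BY m.msgID
    """
    return conn.execute(sql, [*small_ids, *big_ids]).fetchall()

rows = visible_messages(conn, [10], [20])
```

Message 1 is reachable through both a small and a big group, and DISTINCT keeps it from appearing twice.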

Related

cons of storing comma separated value of ids for custom sort order

We're working on a web application (Ruby/Rails + Backbone, jQuery, Javascript) where a user can manage a booklist and drag and drop books to rearrange their order within the list, which has to be persisted.
We have books and a custom collection of books called booklist, for which we have two tables: book and booklist. Since a book could belong to multiple booklists, and a booklist consists of multiple books, they have an m x n relationship, and we have an additional table to store the mapping. Let's say we use this for all purposes. Now when the user wants to re-order the books in her bookshelf, we'd need to store that order.
I can totally see why storing ids in a column is evil, no doubts about it. But what if we have the tables normalized, and for all other cases we go through the standard operations?
There are quite a few approaches on storing an additional order column. But still it seems like bad design to store the ids of the books in a booklist in a comma separated list in the booklist table, even assuming that integrity is maintained.
We'd never run into this...
SELECT * FROM users WHERE... OH F#$%CK -
Yes it's bad, you can't order, count, sum (etc) or even do a simple report without depending
on a top level language.
because we'd simply be selecting books based on the booklist id using the join table like the standard approach. (In any case, we're only getting the books as an array as part of the backbone booklist model)
So what if we retrieve the booklist and the books for the booklist, and do the sorting programmatically on the client side (in this case in Javascript), based on the CSV column?
It appears to be a simple solution because:
Every time the user reorders a book, we simply store all the ids in this one column freshly again. (A user will have at the most 20 to 30 books in a booklist).
We could of course simply ignore invalid ids, i.e. books that have been deleted after the booklist had been created.
What are the disadvantages of this approach, which seems to be simpler than maintaining the sort order and updating other columns every time an order is changed, or using a float or weightage, etc.?
As far as I know, it really violates the rules of an RDBMS, which causes many difficulties when applying JOINs.
Hope it will help you.
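For comparison, here is a minimal sketch of the alternative the question mentions: an order column on the mapping table. It uses Python's built-in sqlite3 module as a stand-in for the real database, and the names booklist_book, position, reorder and titles_in_order are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE booklist (id INTEGER PRIMARY KEY, name TEXT);
-- Mapping table with a hypothetical "position" column for the custom order.
CREATE TABLE booklist_book (
    booklist_id INTEGER REFERENCES booklist(id),
    book_id     INTEGER REFERENCES book(id),
    position    INTEGER NOT NULL,
    PRIMARY KEY (booklist_id, book_id)
);
INSERT INTO book VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO booklist VALUES (1, 'shelf');
INSERT INTO booklist_book VALUES (1, 1, 0), (1, 2, 1), (1, 3, 2);
""")

def reorder(conn, booklist_id, ordered_book_ids):
    """Rewrite every position of one booklist in a single transaction."""
    with conn:
        conn.executemany(
            "UPDATE booklist_book SET position = ? "
            "WHERE booklist_id = ? AND book_id = ?",
            [(pos, booklist_id, book_id)
             for pos, book_id in enumerate(ordered_book_ids)],
        )

def titles_in_order(conn, booklist_id):
    """Return the titles of a booklist sorted by the stored position."""
    rows = conn.execute(
        "SELECT b.title FROM booklist_book bb "
        "JOIN book b ON b.id = bb.book_id "
        "WHERE bb.booklist_id = ? ORDER BY bb.position",
        (booklist_id,),
    )
    return [r[0] for r in rows]

reorder(conn, 1, [3, 1, 2])
order_after = titles_in_order(conn, 1)
```

With 20 to 30 books per list, rewriting all positions on each drop is a handful of small updates, and because the query joins against book, ids of deleted books drop out without any client-side filtering.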

Best way to do a query with a large number of possible joins

On the project I'm working on we have an activity table and each activity can be linked to one of about 20 different "activity details" tables...
e.g. If the activity was of type "work", then it would have a corresponding activity_details_work record, if it was of type "sick leave" then it would have a corresponding activity_details_sickleave record and so on.
Currently we are loading the activities and then for each activity we have a separate query to go fetch the activity details from the relevant table. This obviously doesn't scale well if you have thousands of activities.
So my initial thought was to have a single query which fetches the activities and joins the details in one go e.g.
SELECT * FROM activity
LEFT JOIN activity_details_1_work ON ...
LEFT JOIN activity_details_2_sickleave ON ...
LEFT JOIN activity_details_3_travelwork ON ...
...etc...
LEFT JOIN activity_details_20_yearleave ON ...
But this will result in each record having hundreds of fields, most of which are empty, and that feels nasty.
Lazy-loading the details isn't really an option either as the details are almost always requested in the core logic, at least for the main types anyway.
Is there a super clever way of doing this that I'm not thinking of?
Thanks in advance
My suggestion is to define a view for each ActivityType, that is tailored specifically to that activity.
Then add an index on the Activity table lead by the ActivityType field. Cluster said index unless there is an overwhelming need for some other to be clustered (or performance benchmarking shows some other clustering selection to be more performant).
Is there a particular reason why this degree of denormalization was designed in? Is that reason well known?
Chances are your activity tables are like (date_from, date_to, with_who, descr) or something to that effect. As Pieter suggested, consider tossing in a type varchar or enum field in there, so as to deal with a single details table.
If there are rational reasons to keep the tables apart, consider adding triggers that maintain boolean/tinyint fields (has_work, has_sickleave, etc), or a bit string (has_activites_of_type where the first position amounts to has_work, the next to has_sickleave, etc.).
Either way, you'll probably be better off by fetching the activity's details in one or more separate queries -- if only to avoid field name collisions.
I don't think an enum is the way to go: as you say, there might be thousands of activities, so altering your activity table would become an issue.
There is no point in doing a left join on a large number of tables either.
So the options that you have are:
See this. The first comment might be useful.
I am guessing that your activity table has a field called activity_type_id.
Build a table called activity_types containing the fields activity_type_id, activity_name and activity_details_table_name. First, query in the following way:
SELECT a.*, t.activity_details_table_name
FROM activity a
INNER JOIN activity_types t USING (activity_type_id);
This query gives you the name of the table to query for the details.
This way you can add any new activity type just by adding a row to the activity_types table.
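The lookup-table idea can be sketched as follows, assuming the details are then fetched with one query per details table rather than one per activity. Python's built-in sqlite3 module stands in for MySQL here; the two details tables, their columns and the helper load_details are hypothetical.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity_types (
    activity_type_id            INTEGER PRIMARY KEY,
    activity_name               TEXT,
    activity_details_table_name TEXT
);
CREATE TABLE activity (
    activity_id      INTEGER PRIMARY KEY,
    activity_type_id INTEGER REFERENCES activity_types(activity_type_id)
);
-- Two hypothetical details tables with made-up columns.
CREATE TABLE activity_details_work (activity_id INTEGER PRIMARY KEY, hours REAL);
CREATE TABLE activity_details_sickleave (activity_id INTEGER PRIMARY KEY, doctor_note INTEGER);
INSERT INTO activity_types VALUES
    (1, 'work', 'activity_details_work'),
    (2, 'sick leave', 'activity_details_sickleave');
INSERT INTO activity VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO activity_details_work VALUES (1, 8.0), (2, 4.5);
INSERT INTO activity_details_sickleave VALUES (3, 1);
""")

def load_details(conn):
    """Fetch details for all activities with 1 + (number of types) queries."""
    # Step 1: one query tells us which details table each activity uses.
    by_table = defaultdict(list)
    for activity_id, table in conn.execute(
        "SELECT a.activity_id, t.activity_details_table_name "
        "FROM activity a INNER JOIN activity_types t USING (activity_type_id)"
    ):
        by_table[table].append(activity_id)

    # Step 2: one query per details table, not one per activity.
    # The table name comes from our own activity_types table, never from user input.
    details = {}
    for table, ids in by_table.items():
        marks = ",".join("?" * len(ids))
        cur = conn.execute(
            f'SELECT * FROM "{table}" WHERE activity_id IN ({marks})', ids
        )
        cols = [c[0] for c in cur.description]
        for row in cur:
            record = dict(zip(cols, row))
            details[record["activity_id"]] = record
    return details

details = load_details(conn)
```

Each record only carries the columns of its own details table, so there are no wide rows full of empty fields.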

How to efficiently design MySQL database for my particular case

I am developing a forum in PHP MySQL. I want to make my forum as efficient as I can.
I have made these two tables
tbl_threads
tbl_comments
Now, the problem is that there is a Like and a Dislike button under each comment. I have to store the user_name that clicked the Like or Dislike button together with the comment_id. I have made a column user_likes and a column user_dislikes in tbl_comments to store the comma-separated user_names. But on this forum, I have read that this is not an efficient way. I have been advised to create a third table to store the likes and dislikes, so that my database design complies with 1NF.
But the problem is, If I make a third table tbl_user_opinion and make two fields like this
1. comment_id
2. type (like or dislike)
will I then have to run as many SQL queries as there are comments on my page to get the like and dislike data for each comment? Will that not be inefficient? I think there is some confusion on my part here. Can someone clarify this?
You have a Relational Scheme like this:
There are two ways to solve this. The first one, the "clean" one, is to build your "like" table and do COUNT(*) on the appropriate column.
The second one would be to store a counter in each comment, indicating how many ups and downs there have been.
If you want to check whether a specific user has voted on a comment, you only have to check one entry, which you can easily handle as its own query and merge the two results outside of your database (for this, use a query that returns the comment_id and the kind of vote the user has cast for each comment in a specific thread).
Your approach with a comma-separated list does not perform well, because you cannot query it without a lot of string parsing in a higher-level language. If you have a database, use it!
("One Information - One Dataset"!)
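To address the worry about running one query per comment: with a separate opinion table, a single grouped query can return the counts for every comment of a thread. The following is a minimal sketch using Python's built-in sqlite3 module in place of MySQL; the user_name and thread_id columns, the sample rows and the helper opinion_counts are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_comments (comment_id INTEGER PRIMARY KEY, thread_id INTEGER, body TEXT);
-- One row per (comment, user); "type" is either 'like' or 'dislike'.
CREATE TABLE tbl_user_opinion (
    comment_id INTEGER REFERENCES tbl_comments(comment_id),
    user_name  TEXT,
    type       TEXT CHECK (type IN ('like', 'dislike')),
    PRIMARY KEY (comment_id, user_name)
);
INSERT INTO tbl_comments VALUES
    (1, 100, 'first'), (2, 100, 'second'), (3, 200, 'other thread');
INSERT INTO tbl_user_opinion VALUES
    (1, 'ann', 'like'), (1, 'bob', 'like'), (1, 'cat', 'dislike'),
    (3, 'ann', 'like');
""")

def opinion_counts(conn, thread_id):
    """Return (comment_id, likes, dislikes) for every comment in a thread."""
    sql = """
        SELECT c.comment_id,
               SUM(CASE WHEN o.type = 'like' THEN 1 ELSE 0 END)    AS likes,
               SUM(CASE WHEN o.type = 'dislike' THEN 1 ELSE 0 END) AS dislikes
        FROM tbl_comments c
        LEFT JOIN tbl_user_opinion o ON o.comment_id = c.comment_id
        WHERE c.thread_id = ?
        GROUP BY c.comment_id
        ORDER BY c.comment_id
    """
    return conn.execute(sql, (thread_id,)).fetchall()

counts = opinion_counts(conn, 100)
```

The LEFT JOIN keeps comments that nobody has voted on, and the primary key on (comment_id, user_name) stops a user from voting twice on the same comment.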
The comma-separated list violates the principle of atomicity, and therefore the 1NF. You'll have a hard time maintaining referential integrity and, for the most part, querying as well.
Here is one way to do it in a normalized fashion:
This is very clustering-friendly: it groups up-votes belonging to the same comment physically close together (ditto for down-votes), making the following query rather efficient:
SELECT
    COMMENT.COMMENT_ID,
    <other COMMENT fields>,
    COUNT(DISTINCT UP_VOTE.USER_ID) - COUNT(DISTINCT DOWN_VOTE.USER_ID) SCORE
FROM COMMENT
LEFT JOIN UP_VOTE
    ON COMMENT.COMMENT_ID = UP_VOTE.COMMENT_ID
LEFT JOIN DOWN_VOTE
    ON COMMENT.COMMENT_ID = DOWN_VOTE.COMMENT_ID
WHERE
    COMMENT.COMMENT_ID = <whatever>
GROUP BY
    COMMENT.COMMENT_ID,
    <other COMMENT fields>;
Please measure on realistic amounts of data whether that works fast enough for you. If not, then denormalize the model and cache the total score in the COMMENT table, and keep it current through triggers every time a new row is inserted into or deleted from the *_VOTE tables.
If you also need to get which comments a particular user voted on, you'll need indexes on *_VOTE {USER_ID, COMMENT_ID}, i.e. the reverse of the primary/clustering key above. [1]
[1] This is one of the reasons why I didn't go with just one VOTE table containing an additional field that can be either 1 (for up-vote) or -1 (for down-vote): it's less efficient to cover with secondary indexes.
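The trigger-maintained cache could look roughly like the sketch below. It runs on Python's built-in sqlite3 module, whose trigger syntax differs slightly from MySQL's (MySQL needs FOR EACH ROW and, in the client, a changed DELIMITER); the SCORE column and the trigger names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COMMENT (COMMENT_ID INTEGER PRIMARY KEY, SCORE INTEGER NOT NULL DEFAULT 0);
CREATE TABLE UP_VOTE (COMMENT_ID INTEGER, USER_ID INTEGER, PRIMARY KEY (COMMENT_ID, USER_ID));
CREATE TABLE DOWN_VOTE (COMMENT_ID INTEGER, USER_ID INTEGER, PRIMARY KEY (COMMENT_ID, USER_ID));

-- Keep the cached SCORE in step with inserts and deletes on the vote tables.
CREATE TRIGGER up_vote_ai AFTER INSERT ON UP_VOTE BEGIN
    UPDATE COMMENT SET SCORE = SCORE + 1 WHERE COMMENT_ID = NEW.COMMENT_ID;
END;
CREATE TRIGGER up_vote_ad AFTER DELETE ON UP_VOTE BEGIN
    UPDATE COMMENT SET SCORE = SCORE - 1 WHERE COMMENT_ID = OLD.COMMENT_ID;
END;
CREATE TRIGGER down_vote_ai AFTER INSERT ON DOWN_VOTE BEGIN
    UPDATE COMMENT SET SCORE = SCORE - 1 WHERE COMMENT_ID = NEW.COMMENT_ID;
END;
CREATE TRIGGER down_vote_ad AFTER DELETE ON DOWN_VOTE BEGIN
    UPDATE COMMENT SET SCORE = SCORE + 1 WHERE COMMENT_ID = OLD.COMMENT_ID;
END;
""")

# Hypothetical activity: three up-votes, one down-vote, one up-vote retracted.
conn.execute("INSERT INTO COMMENT (COMMENT_ID) VALUES (1)")
conn.executemany("INSERT INTO UP_VOTE VALUES (1, ?)", [(10,), (11,), (12,)])
conn.execute("INSERT INTO DOWN_VOTE VALUES (1, 20)")
conn.execute("DELETE FROM UP_VOTE WHERE COMMENT_ID = 1 AND USER_ID = 12")

cached_score = conn.execute(
    "SELECT SCORE FROM COMMENT WHERE COMMENT_ID = 1"
).fetchone()[0]

# The normalized query from the answer, used here as a cross-check.
computed_score = conn.execute("""
    SELECT COUNT(DISTINCT UP_VOTE.USER_ID) - COUNT(DISTINCT DOWN_VOTE.USER_ID)
    FROM COMMENT
    LEFT JOIN UP_VOTE ON COMMENT.COMMENT_ID = UP_VOTE.COMMENT_ID
    LEFT JOIN DOWN_VOTE ON COMMENT.COMMENT_ID = DOWN_VOTE.COMMENT_ID
    WHERE COMMENT.COMMENT_ID = 1
    GROUP BY COMMENT.COMMENT_ID
""").fetchone()[0]
```

Reading cached_score is a primary-key lookup, while computed_score shows that the cache agrees with the normalized data.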

Database model for individual users seeing notice

I am looking for the best solution for the way the mySQL db should be set up for my app.
My app works like a noticeboard with two sections, "New Notices" and "Seen Notices".
Now when a user has viewed a notice, they click a button and it moves from New to Seen. But ONLY for this person.
Each person will have all of the notices viewable - but not necessarily in the same sections - as users will view them at different times and check them off as seen at different times.
My guess is to have one table "Notices" for all notices, and a separate table called "Seen" with the columns "UserID" and "noticeID". This means that for each notice it will need to consult the "Seen" table to find out which section it should be shown in. Is this ideal or is there another way?
Having a table with NoticeID and UserID is correct, I'd also add viewed date.
You can use 3 tables
Users
Notices
SeenNotices(maybe not the best name)
In the SeenNotices table have three columns: UserID, NoticesID, HaveSeen. The HaveSeen column will tell you if the user has seen it.
The way you are thinking should work, although over time you'll end up with a very big 'Seen' table, which does not scale well. An easy alternative is to use an 'Unseen' table instead. This way the table gets smaller as people view the notices, and you can also delete very old entries (old notices may no longer be relevant, so it doesn't matter if they are not shown as unseen to the user).
Using the 'unseen' table your query will look like this:
SELECT n.notice_id, n.notice_msg,
       IF(u.user_id IS NULL, 'seen', 'new') AS status
FROM notice n
LEFT JOIN unseen u
    ON u.user_id = $user_id AND u.notice_id = n.notice_id;
-- No WHERE on u.user_id: it would filter out the seen notices, for which u.* is NULL.
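For the 'Seen' variant from the question, a minimal sketch could look like this. It uses Python's built-in sqlite3 module instead of MySQL; the viewed_at column follows the "viewed date" suggestion above, and the helpers mark_seen and board are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Notices (noticeID INTEGER PRIMARY KEY, msg TEXT);
CREATE TABLE Seen (
    UserID    INTEGER,
    noticeID  INTEGER REFERENCES Notices(noticeID),
    viewed_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (UserID, noticeID)
);
INSERT INTO Notices VALUES (1, 'Fire drill'), (2, 'New printer'), (3, 'Holiday');
""")

def mark_seen(conn, user_id, notice_id):
    """Record that a user has seen a notice; repeated clicks are ignored.

    INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    """
    conn.execute(
        "INSERT OR IGNORE INTO Seen (UserID, noticeID) VALUES (?, ?)",
        (user_id, notice_id),
    )

def board(conn, user_id):
    """Return every notice with a per-user status of 'new' or 'seen'."""
    sql = """
        SELECT n.noticeID, n.msg,
               CASE WHEN s.UserID IS NULL THEN 'new' ELSE 'seen' END AS status
        FROM Notices n
        LEFT JOIN Seen s ON s.noticeID = n.noticeID AND s.UserID = ?
        ORDER BY n.noticeID
    """
    return conn.execute(sql, (user_id,)).fetchall()

mark_seen(conn, 42, 2)
mark_seen(conn, 42, 2)  # second click on the same notice changes nothing
board_for_42 = board(conn, 42)
```

The user filter sits in the ON clause rather than in a WHERE clause, so notices without a matching Seen row still come back and are labelled 'new'.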

Implementing Comments and Likes in database

I'm a software developer. I love to code, but I hate databases... Currently, I'm creating a website on which a user will be allowed to mark an entity as liked (like in FB), tag it and comment.
I'm stuck on the database table design for handling this functionality. The solution is trivial if we only need this for one type of thing (e.g. photos). But I need to enable this for 5 different things (for now, but I also assume that this number can grow as the whole service grows).
I found some similar questions here, but none of them have a satisfying answer, so I'm asking this question again.
The question is how to properly, efficiently and flexibly design the database, so that it can store comments for different tables, likes for different tables and tags for them. A design pattern as an answer would be best ;)
Detailed description:
I have a table User with some user data, and 3 more tables: Photo with photographs, Articles with articles, Places with places. I want to enable any logged user to:
comment on any of those 3 tables
mark any of them as liked
tag any of them with some tag
I also want to count the number of likes for every element and the number of times that particular tag was used.
1st approach:
a) For tags, I will create a table Tag [TagId, tagName, tagCounter], then I will create many-to-many relationship tables: Photo_has_tags, Place_has_tag, Article_has_tag.
b) The same goes for comments.
c) I will create the tables LikedPhotos [idUser, idPhoto], LikedArticles [idUser, idArticle], LikedPlace [idUser, idPlace]. The number of likes will be calculated by queries (which, I assume, is bad). And...
I really don't like this design for the last part; it smells bad to me ;)
2nd approach:
I will create a table ElementType [idType, TypeName == some table name] which will be populated by the administrator (me) with the names of tables that can be liked, commented or tagged. Then I will create tables:
a) LikedElement [idLike, idUser, idElementType, idLikedElement] and the same for Comments and Tags with the proper columns for each. Now, when I want to make a photo liked I will insert:
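typeId = SELECT idType FROM ElementType WHERE TypeName = 'Photo'
INSERT INTO LikedElement (idUser, idElementType, idLikedElement) VALUES (userId, typeId, photoId)
and for places:
typeId = SELECT idType FROM ElementType WHERE TypeName = 'Place'
INSERT INTO LikedElement (idUser, idElementType, idLikedElement) VALUES (userId, typeId, placeId)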
and so on... I think that the second approach is better, but I also feel like something is missing in this design as well...
Lastly, I also wonder where the best place is to store the counter for how many times an element was liked. I can think of only two ways:
in element (Photo/Article/Place) table
by select count().
I hope that my explanation of the issue is more thorough now.
The most extensible solution is to have just one "base" table (connected to "likes", tags and comments), and "inherit" all other tables from it. Adding a new kind of entity involves just adding a new "inherited" table - it then automatically plugs into the whole like/tag/comment machinery.
Entity-relationship term for this is "category" (see the ERwin Methods Guide, section: "Subtype Relationships"). The category symbol is:
Assuming a user can like multiple entities, the same tag can be used for more than one entity, but a comment is entity-specific, your model could look like this:
BTW, there are roughly 3 ways to implement the "ER category":
All types in one table.
All concrete types in separate tables.
All concrete and abstract types in separate tables.
Unless you have very stringent performance requirements, the third approach is probably the best (meaning the physical tables match 1:1 the entities in the diagram above).
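Since the diagram is not reproduced here, the following is a rough sketch of what the third approach (a base table plus one table per concrete type) might look like. It uses Python's built-in sqlite3 module in place of MySQL, and every table, column and helper name in it is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base table: every likeable/commentable thing gets one row here.
CREATE TABLE entity (entity_id INTEGER PRIMARY KEY);

-- "Inherited" tables share the primary key of the base table.
CREATE TABLE photo   (entity_id INTEGER PRIMARY KEY REFERENCES entity(entity_id), url TEXT);
CREATE TABLE article (entity_id INTEGER PRIMARY KEY REFERENCES entity(entity_id), title TEXT);

-- Likes and comments reference only the base table.
CREATE TABLE user_like (
    user_id   INTEGER,
    entity_id INTEGER REFERENCES entity(entity_id),
    PRIMARY KEY (user_id, entity_id)
);
CREATE TABLE comment (
    comment_id INTEGER PRIMARY KEY,
    user_id    INTEGER,
    entity_id  INTEGER REFERENCES entity(entity_id),
    body       TEXT
);
""")

def new_entity(conn):
    """Create a base row and return its id."""
    return conn.execute("INSERT INTO entity DEFAULT VALUES").lastrowid

def add_photo(conn, url):
    entity_id = new_entity(conn)
    conn.execute("INSERT INTO photo VALUES (?, ?)", (entity_id, url))
    return entity_id

def add_article(conn, title):
    entity_id = new_entity(conn)
    conn.execute("INSERT INTO article VALUES (?, ?)", (entity_id, title))
    return entity_id

def like(conn, user_id, entity_id):
    conn.execute("INSERT INTO user_like VALUES (?, ?)", (user_id, entity_id))

def like_count(conn, entity_id):
    return conn.execute(
        "SELECT COUNT(*) FROM user_like WHERE entity_id = ?", (entity_id,)
    ).fetchone()[0]

photo_id = add_photo(conn, "cat.jpg")
article_id = add_article(conn, "On cats")
like(conn, 1, photo_id)
like(conn, 2, photo_id)
like(conn, 1, article_id)
```

Adding a new kind of entity means adding one more table that references entity; user_like and comment do not change.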
Since you "hate" databases, why are you trying to implement one? Instead, solicit help from someone who loves and breathes this stuff.
Otherwise, learn to love your database. A well designed database simplifies programming, engineering the site, and smooths its continuing operation. Even an experienced d/b designer will not have complete and perfect foresight: some schema changes down the road will be needed as usage patterns emerge or requirements change.
If this is a one man project, program the database interface into simple operations using stored procedures: add_user, update_user, add_comment, add_like, upload_photo, list_comments, etc. Do not embed the schema into even one line of code. In this manner, the database schema can be changed without affecting any code: only the stored procedures should know about the schema.
You may have to refactor the schema several times. This is normal. Don't worry about getting it perfect the first time. Just make it functional enough to prototype an initial design. If you have the luxury of time, use it some, and then delete the schema and do it again. It is always better the second time.
This is a general idea.
Please don't pay much attention to the styling of the field names, but rather to the relations and structure.
This pseudocode will get all the comments of photo with ID 5
SELECT * FROM actions
WHERE actions.id_Stuff = 5
    AND actions.typeStuff = 'photo'
    AND actions.typeAction = 'comment'
This pseudocode will get all the likes or users who liked photo with ID 5
(you may use count() to just get the amount of likes)
SELECT * FROM actions
WHERE actions.id_Stuff = 5
    AND actions.typeStuff = 'photo'
    AND actions.typeAction = 'like'
As far as I understand, several tables are required, with a many-to-many relation between them:
A table which stores the user data, such as name, surname and birth date, with an identity field.
A table which stores the data types. These types may be photos, shares, links. Each type must have its own table; therefore, there is a relation between those individual tables and this table.
Each different data type has its own table, for example status updates, photos, links.
The last table is for the many-to-many relation, storing an id, user id, data type and data id.
Look at the access patterns you are going to need. Do any of them seem to be made particularly difficult or inefficient by one design choice or the other?
If not, favour the one that requires fewer tables.
In this case:
Add comment: you either pick a particular many-to-many table or insert into a common table with a known, specific identifier for the kind of thing being commented on. I think client code will be slightly simpler in your second case.
Find comments for an item: here a common table seems slightly easier, since we just have a single query parameterised by the type of entity.
Find comments by a person about one kind of thing: a simple query in either case.
Find all comments by a person about all things: this seems a little gnarly either way.
I think your "discriminated" approach, option 2, yields simpler queries in some cases and doesn't seem much worse in the others so I'd go with it.
Consider using a table per entity for comments etc. More tables mean better sharding and scaling. Handling many similar tables is not a problem in any framework I know.
One day you'll need to optimize reads from such a structure. You can easily create aggregating tables over the base ones and lose a bit on writes.
One big table with dictionary may become uncontrollable one day.
Definitely go with the second approach, where you have one table and store the element type for each row; it will give you a lot more flexibility. Basically, when something can logically be done with fewer tables, it is almost always better to go with fewer tables. One advantage that comes to mind for your particular case: suppose you want to delete all liked elements of a certain user. With your first approach you need to issue one query per element type, but with the second approach it can be done with a single query. Or consider adding a new element type: with the first approach it involves creating a new table for each new type, but with the second approach you don't need to create any new table...
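A minimal sketch of that second approach, using the ElementType and LikedElement tables from the question, is shown below. Python's built-in sqlite3 module stands in for MySQL; the UNIQUE constraint and the helper functions are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ElementType (idType INTEGER PRIMARY KEY, TypeName TEXT UNIQUE);
CREATE TABLE LikedElement (
    idLike         INTEGER PRIMARY KEY,
    idUser         INTEGER NOT NULL,
    idElementType  INTEGER NOT NULL REFERENCES ElementType(idType),
    idLikedElement INTEGER NOT NULL,
    -- Assumption: a user can like a given element only once.
    UNIQUE (idUser, idElementType, idLikedElement)
);
INSERT INTO ElementType (TypeName) VALUES ('Photo'), ('Place');
""")

def like(conn, user_id, type_name, element_id):
    """Insert a like, resolving the type name to its id in the same statement."""
    conn.execute(
        "INSERT INTO LikedElement (idUser, idElementType, idLikedElement) "
        "SELECT ?, idType, ? FROM ElementType WHERE TypeName = ?",
        (user_id, element_id, type_name),
    )

def count_likes(conn, type_name, element_id):
    return conn.execute(
        "SELECT COUNT(*) FROM LikedElement le "
        "JOIN ElementType et ON et.idType = le.idElementType "
        "WHERE et.TypeName = ? AND le.idLikedElement = ?",
        (type_name, element_id),
    ).fetchone()[0]

def delete_likes_of_user(conn, user_id):
    """One statement removes a user's likes across all element types."""
    return conn.execute(
        "DELETE FROM LikedElement WHERE idUser = ?", (user_id,)
    ).rowcount

like(conn, 1, "Photo", 5)
like(conn, 2, "Photo", 5)
like(conn, 1, "Place", 5)
photo_likes = count_likes(conn, "Photo", 5)
```

One trade-off to keep in mind: the database cannot enforce that idLikedElement points at an existing photo or place, because the target table depends on the row's type.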