Adding extra column to reduce complex queries - mysql

Let's look at two tables, for example tables Post and User
In Post table, column user_id is foreign key.
Table Post
-id
-user_id
-post
Table User
-userID
-username
If we want to get poster's username, we have to use join query and get it from User table.
Wouldn't it be easier that we add extra column in Post table for storing posters' usernames in order to simplify SQL queries. In this case, Post table would have user_id and username columns (username column is redundant) but that would eliminate join queries for catching posters' usernames.
What is the best choice, to store usernames in Post table or not

Creating a view is an opition: and ust query that view. However, views in msql don't offer as much benefit in terms of performance as they once did.
It might be that just running the join is fine. Depending on the amount of data you will eventually store you may need to consider other means to improve performance such as partitioning tables etc etc. - but that is for down-the-line

Adding redundant data to the DB is not a good choice.
If you really have to simplify your select queries then create a view for that case. That way the data is in one place and you don't need to join every time. But a simple join is actually not that big a deal.

Related

Modelling ownership in MySQL

I have a table Things and I want to add ownership relations to a table Users. I need to be able to quickly query the owners of a thing and the things a user owns. If I know that there will be at most 50 owners, and the pdf for the number of owners will probably look like this, should I rather
add 50 columns to the Things table, like CoOwner1Id, CoOwner2Id, …, CoOwner50Id, or
should I model this with a Ownerships table which has UserId and ThingId columns, or
would it better to create a table for each thing, for example Thing8321Owners with a row for each owner, or
perhaps a combination of these?
The second choice is the correct one; you should create an intermediate table between the table Things and the table Owners (that contains the details of each owner).
This table should have the thing_id and the owner_id as the primary key.
So finally, you well have 3 tables:
Things (the things details and data)
Owner (the owners details and data)
Ownerships (the assignment of each thing_id to an owner_id)
Because in a relational DB you should not have any redundant data.
You should definitely go with option 2 because what you are trying to model is a many to many relationship. (Many owners can relate to a thing. Many things can relate to an owner.) This is commonly accomplished using what I call a bridging table. (Which exactly what option 2 is.) It is a standard technique in a normalized database.
The other two options are going to give you nightmares trying to query or maintain.
With option 1 you'll need to join the User table to the Thing table on 50 columns to get all of your results. And what happens when you have a really popular thing that 51 people want to own?
Option 3 is even worse. The only way to easily query the data is to use dynamic sql or write a new query each time because you don't know which Thing*Owners table to join on until you know the ID value of the thing you're looking for. Or you're going to need to join the User table to every single Thing*Owners table. Adding a new thing means creating a whole new table. But at least a thing doesn't have a limit on the number of owners it could possibly have.
Now isn't this:
SELECT Users.Name, Things.Name
FROM Users
INNER JOIN Ownership ON Users.UserId=Ownership.UserId
INNER JOIN Things ON Things.ThingId=Ownership.ThingId
much easier than any of those other scenarios?

Best way to do a query with a large number of possible joins

On the project I'm working on we have an activity table and each activity can be linked to one of about 20 different "activity details" tables...
e.g. If the activity was of type "work", then it would have a corresponding activity_details_work record, if it was of type "sick leave" then it would have a corresponding activity_details_sickleave record and so on.
Currently we are loading the activities and then for each activity we have a separate query to go fetch the activity details from the relevant table. This obviously doesn't scale well if you have thousands of activities.
So my initial thought was to have a single query which fetches the activities and joins the details in one go e.g.
SELECT * FROM activity
LEFT JOIN activity_details_1_work ON ...
LEFT JOIN activity_details_2_sickleave ON ...
LEFT JOIN activity_details_3_travelwork ON ...
...etc...
LEFT JOIN activity_details_20_yearleave ON ...
But this will result in each record having 100's of fields, most of which are empty and that feels nasty.
Lazy-loading the details isn't really an option either as the details are almost always requested in the core logic, at least for the main types anyway.
Is there a super clever way of doing this that I'm not thinking of?
Thanks in advance
My suggestion is to define a view for each ActivityType, that is tailored specifically to that activity.
Then add an index on the Activity table lead by the ActivityType field. Cluster said index unless there is an overwhelming need for some other to be clustered (or performance benchmarking shows some other clustering selection to be more performant).
Is there a particular reason why this degree of denormalization was designed in? Is that reason well known?
Chances are your activity tables are like (date_from, date_to, with_who, descr) or something to that effect. As Pieter suggested, consider tossing in a type varchar or enum field in there, so as to deal with a single details table.
If there are rational reasons to keep the tables apart, consider adding triggers that maintain boolean/tinyint fields (has_work, has_sickleave, etc), or a bit string (has_activites_of_type where the first position amounts to has_work, the next to has_sickleave, etc.).
Either way, you'll probably be better off by fetching the activity's details in one or more separate queries -- if only to avoid field name collisions.
I don't think enum is the way to go, because as you say there might be 1000's of activities, then altering your activity table would become an issue.
There is no point doing a left join on a large number of tables either.
So the options that you have are :
See this The first comment might be useful.
I am guessing that your activity table has a field called activity_type_id.
Build a table called activity_types containing fields activity_type_id, activity_name, activity_details_table_name. First query in the following way
activity
inner join
activity_types
using( activity_type_id )
This query gives you the table name on which to query for the details.
This way you can add any new activity type just by adding a row in the activity_types table.

How to efficiently design MySQL database for my particular case

I am developing a forum in PHP MySQL. I want to make my forum as efficient as I can.
I have made these two tables
tbl_threads
tbl_comments
Now, the problems is that there is a like and dislike button under the each comment. I have to store the user_name which has clicked the Like or Dislike Button with the comment_id. I have made a column user_likes and a column user_dislikes in tbl_comments to store the comma separated user_names. But on this forum, I have read that this is not an efficient way. I have been advised to create a third table to store the Likes and Dislikes and to comply my database design with 1NF.
But the problem is, If I make a third table tbl_user_opinion and make two fields like this
1. comment_id
2. type (like or dislike)
So, will I have to run as many sql queries as there are comments on my page to get the like and dislike data for each comment. Will it not inefficient. I think there is some confusion on my part here. Can some one clarify this.
You have a Relational Scheme like this:
There are two ways to solve this. The first one, the "clean" one is to build your "like" table, and do "count(*)'s" on the appropriate column.
The second one would be to store in each comment a counter, indicating how many up's and down's have been there.
If you want to check, if a specific user has voted on the comment, you only have to check one entry, wich you can easily handle as own query and merge them two outside of your database (for this use a query resulting in comment_id and kind of the vote the user has done in a specific thread.)
Your approach with a comma-seperated-list is not quite performant, due you cannot parse it without higher intelligence, or a huge amount of parsing strings. If you have a database - use it!
("One Information - One Dataset"!)
The comma-separate list violates the principle of atomicity, and therefore the 1NF. You'll have hard time maintaining referential integrity and, for the most part, querying as well.
Here is one way to do it in a normalized fashion:
This is very clustering-friendly: it groups up-votes belonging to the same comment physically close together (ditto for down-votes), making the following query rather efficient:
SELECT
COMMENT.COMMENT_ID,
<other COMMENT fields>,
COUNT(DISTINCT UP_VOTE.USER_ID) - COUNT(DISTINCT DOWN_VOTE.USER_ID) SCORE
FROM COMMENT
LEFT JOIN UP_VOTE
ON COMMENT.COMMENT_ID = UP_VOTE.COMMENT_ID
LEFT JOIN DOWN_VOTE
ON COMMENT.COMMENT_ID = DOWN_VOTE.COMMENT_ID
WHERE
COMMENT.COMMENT_ID = <whatever>
GROUP BY
COMMENT.COMMENT_ID,
<other COMMENT fields>;
[SQL Fiddle]
Please measure on realistic amounts of data if that works fast enough for you. If not, then denormalize the model and cache the total score in the COMMENT table, and keep it current it through triggers every time a new row is inserted to or deleted from *_VOTE tables.
If you also need to get which comments a particular user voted on, you'll need indexes on *_VOTE {USER_ID, COMMENT_ID}, i.e. the reverse of the primary/clustering key above.1
1 This is one of the reasons why I didn't go with just one VOTE table containing an additional field that can be either 1 (for up-vote) or -1 (for down-vote): it's less efficient to cover with secondary indexes.

MySQL: Join 1 master table to 1000 sub tables

I have a problem.
I have one master table of urls. The UrlId is primary key. Starting from 10000. In this table i have all urls addresses i have found. Then i have 1000 tables in a different database (so i cannot use foreign keys). They are all identical to eachother like urldata.100 up to urldata.999. (Here i store meta data etc)
Suppose i want to use the left function in a join to join each master record for instance
Id 10000 - joins to table urldata.100
Id 11034 - joins to table urldata.110
So i wanna take out the first 3 numbers from each UrlId from the master table in a massive join.
Can this be achieved?
This is a REALLY REALLY bad design, as it restricts both scalability, as well as the performance of the database.
If you really want to store your meta data for different UrlIds, you just need to create one table. Call it UrlMeta, or something similar.
The structure would be (UrlID, MetaName, MetaValue). This structure can store as much meta data as you want as (MetaName, MetaValue) pairs.
After you manage this, your queries would become extremely simple, as you only need to join 2 tables, instead of 1001!

How to store data in mysql, to get the fastest performance?

I'm thinking about it, which of the following two query types would give me the fastest performance for a user messaging module inside my site:
The first one i thought about is a multi table setup, which has a connection table, and a main table. The connection table holds the connection between accounts, and the messaging table.
In this case a query would look like following, to get some data of the author, and the messages he has sent:
SELECT m.*, a.username
FROM messages AS m
LEFT JOIN connection_table
ON (message_id = m.id)
LEFT JOIN accounts AS a
ON (account_id = a.id)
WHERE m.id = '32341'
Inserting into it is a little bit more "complicated".
My other idea, and in my thought the better solution of this problem is that i store the data i would use in a connection table in the same table where is store the data of the mail. Sounds like i would get lots of duplicated entries, but no, because i have a field which has text type and holds user ids like this: *24*32*249*
If I want to query them, i use the mysql LIKE method. Deleting is an other problem, but for this i have one more field where i store who has deleted the post.
Sad about that i don't know how to join this.
So what would you recommend? Are there other ways?
Sounds like you are using an n:m relation.. if yes, don't put a list of ids in a single column but create a mapping table containing two columns - the primary key of table1 and the primary key of table2. Then selecting, inserting and deleting will all be easy and still cheap.
I wonder how many messages will be send to multiple recipients? It might just be easier to have it all in one table - MessageID, SentFrom, SentTo, Message, and dup it for multiple people. This obviously makes it extremely easy to query.
Definately avoid storing multiple ID's in one field and using LIKE - that'll be a performance killer - go with ThiefMasters suggestion if you want something like that.