This might not be possible but would be amazingly awesome if it were. I have the following basic table structure:
groups
group_id
other_stuff
users
user_id
other_stuff
users_to_groups
user_id
group_id
other_stuff
events
event_id
group_id (where the event belongs)
other_stuff
events is actually a set of tables for the various actions a user can perform on the site.
I would like to be able to perform a query on of the tables and have it return a result something like:
event_type event_id info_columns ...
user_join user_to_group_id
photo_upload photo_id
comment comment_id
where the value in the event_type column would be one generated by the query based on the table name of the source content.
I know I can do this using multiple queries and then piecing them together in PHP, but I was thinking that maybe there is a way to do it entirely in MySQL. Is something like this even possible? If so, what are the basic steps to make it happen?
if you have a number of select statements, and you get the data you want in each, then of course you can join them.
Others have asked similar questions, mysql-join-most-recent-matching-record-from-one-table-to-another
Now, given you're only asking for the latest event. your pseudo code goes
select <userinfo>
from users
join groups
on user=group
join (select lastest event by group
from events
) as tmp
on group=tmp
That way you do 1 query, you hand off that work to the database
Related
I am having a database design issue and i'm still pretty new to MySQL so I thought I would ask here. What would be the best way to get data for a chronological feed from multiple tables? For example a user does many things, they vote, comment, rate, ask questions. I save all this information in their respective tables "tblVote", "tblRate" etc, now the tricky part. a user can follow a user or many, so say you follow 3-4 people. Following allows you to see their interactions, voting, rating, commenting, asking questions etc in your feed (like facebook or something similar).
What would be the best way to get all the information from all 5 tables for every person they follow and then sort all of that chronologically? I Am assuming my current method (foreach follower grab all votes, comments, ratings etc and sort all would be terrible)
My working theory, so my working idea is to create a Interaction table, that has a column for the users id, the id of the other tables entry, and a type reference. so for example
User ID | InteractionID | Type
9 1232 Comment
10 80 Rating
9 572 Vote
Then you could just go ahead and grab all Interactions for each of the people they follow, sort that and then say grab the top 10? and query the individual databases to get the full info (time of comment, text of comment etc)
A many to many relationship exists between User and Follower. Since, Follower is also another user, this becomes a recursive many-to-many. When you decompose this relationship, you get a Association table or a gerund.
User_Follower {id, userid_fk, followerid_fk}
Both the userid_fk and followerid_fk are referencing to the User table.
Now, I am assuming you have a One-to-many relationship between User-tblRate, User-tblVote, User-tblPost etc.
So, you can write a join something like this:
select p.postTitle, p.postTag, ...,
c.commentId, c.commentData, ...
from (tblUser u INNER JOIN tblPost p
ON (p.userid = u.userid)) INNER JOIN tblComment c
ON (c.userid =
u.userid)
where u.userid in
(select userid_fk from user_follower where followerid_fk = 100)
orderby p.datetime_col ASC, c.datetime_col ASC
LIMIT 10;
100 is the user you want to get the information for.
The idea is that you just have one association table linking the User and Follower, then use simple joins to get the data for all the followees
I am developing web application where I have to implement 'Likes' system as facebook has. Application will have a few categories of products that customer can 'like'. So I have started to create database, but I stuck on one obstacle. As I understand there are two ways of doing this:
First. Create one database table with fields of "id, user_id, item_category, item_id". When user click 'like' button information will be saved in this table with various categories of products (item_category).
Second. Create several tables for certain categories of item. For instance, "tbl_item_category_1, tbl_item_category_2, tbl_item_category_3" with fields of "user_id, item_id".
Would be great to get more insight about best practices of this kind database structures. Which works faster? and more logical/practical? I will use only several categories of items.
I would go with the first version with a table structure similar to this:
User Table: PK id
id
username
Category Table: PK id
id
categoryname
Like Table: PK both user_id and catgory_id
user_id
category_id
Here is a SQL Fiddle with demo of table structure with two sample queries to give the Total Likes by user and Total Likes by category
The second one - creating multiple tables is a terrible idea. If you have 50-100 categories trying to query those tables would be horrible. It would become completely unmanageable.
If you have multiple tables trying to get a the total likes would be:
Select count(*)
from category_1
JOIN category_2
ON userid = userid
join category_3
ON userid = userid
join .....
Use one table, no question.
The first method is the correct one. Never make multiple tables for item categories, it makes maintaining your code a nightmare, and makes queries ugly.
In fact, the general rule is that anything that is dynamic (i.e. it changes) should not be stored as a set of static objects (e.g. tables). If you think you might add a new type of 'something' later on, then you need a 'something' types table.
For example, imagine trying to get a count of how many items a user has liked. With the first method, you can just do SELECT COUNT(*) FROM likes WHERE user_id = 123, but in the second method you'd need to do a JOIN or UNION, which is bad for performance and bad for maintainability.
The first method is the correct one. Because you dont know how many categories you will be having and it is very difficult to get the data.
I want to create a table where my users can associate a friendship between one another. Which at the same time this table will work in conjunction to what I would to be a one-to-many relation between various other tables I am attempting to work up.
Right now I am thinking of something like this
member_id, friend_id, active, date
member_id would be the column of the user making the call, friend_id would be the column of the friend they are attempting to tie to, active would be a toggle of sorts 0 = pending, 1 = active, date would just be a logged date of the last activity on that particular row.
Now my confusion is if I were to query I would typically query for member_id then base the rest of the query off of associated friend_id's to display data accordingly to the right people. So with this logic of sorts in mind, that makes me think I would have to have 2 rows per request. One where its the member_id who's requesting and the friend_id of the request inserted into the table, then one thats the opposite so I could query accordingly every time. So in essences its like double dipping for every one action requested to this particular table I need to make 2 like actions to make it work.
Which in all does not make sense to me as far as optimization goes. So in all my question is what is the proper way to handle data for relations like this? Or am I actually thinking sanely about this being an approach to handling it?
If a friendship is always mutual, then you can choose between data redundancy (i.e. both directions having a row) for the sake of simpler queries, or learn to live with slightly more complex queries. I'd personally avoid data redundancy unless there is a compelling reason otherwise - you're not just wasting space and performance, but you'll need to be careful when enforcing it - a simple CHECK is incapable of referencing other rows and depending on your DBMS a trigger may be limited in what it can do with a mutating table.
An easy way ensure to only one row per friendship is to always insert the lower value in member_id and higher value in friend_id (make a constraint CHECK (member_id < friend_id) to enforce it). Then, when you query, you'll have search in both directions - for example, finding all friends of the given person (identified by person_id) would look something like this:
SELECT *
FROM
person
WHERE
id <> :person_id
AND (
id IN (
SELECT friend_id
FROM friendship
WHERE member_id = :person_id
)
OR
id IN (
SELECT member_id
FROM friendship
WHERE friend_id = :person_id
)
)
BTW, in this scheme, you'd probably want to rename member_id and friend_id to, say, friend1_id and friend2_id...
Two ways to look at it:
WHERE ((friend_id = x AND member_id = y) OR (friend_id = y AND member_id = x))
would allow you to query by simply stating one side of the relationship. If both sides are added, this method would still work without causing duplicate rows to be returned.
Conversely, adding both sides of the relationship, so that your queries consist of
WHERE friend_id = x AND member_id = y
not only makes queries easier to write, but also easier to plan (meaning better DB performance).
My vote is for the latter option.
Beautiful - there's no problem with your table as-is.
ALSO:
I'm not sure if this cardinality is "one to many", or "many to many":
http://en.wikipedia.org/wiki/Cardinality_%28data_modeling%29
Q: I were to query I would typically query for member_id then base the
rest of the query off of associated friend_id's to display data
accordingly to the right people
A: Frankly, I don't see any problem querying "member to friend", or "friend to member" (or any other combinations - e.g. friends who share friends). Again, it looks good.
Introduce a helper table like:
users
user_id, name, ...
friendship
user_id, friend_id, ....
select u.name as user, u2.name as friend from users u
inner join friendship f on f.user_id = u.user_id
inner join users u2 on u2.user_id = f.friend_id
I think this is pretty similar to what you have, just putting a query as an example.
Please forgive my ignorance here. SQL is decidedly one of the biggest "gaps" in my education that I'm working on correcting, come October. Here's the scenario:
I have two tables in a DB that I need to access certain data from. One is users, and the other is conversation_log. The basic structure is outlined below:
users:
id (INT)
name (TXT)
conversation_log
userid (INT) // same value as id in users - actually the only field in this table I want to check
input (TXT)
response (TXT)
(note that I'm only listing the structure for the fields that are {or could be} relevant to the current challenge)
What I want to do is return a list of names from the users table that have at least one record in the conversation_log table. Currently, I'm doing this with two separate SQL statements, with the one that checks for records in conversation_log being called hundreds, if not thousands of times, once for each userid, just to see if records exist for that id.
Currently, the two SQL statements are as follows:
select id from users where 1; (gets the list of userid values for the next query)
select id from conversation_log where userid = $userId limit 1; (checks for existing records)
Right now I have 4,000+ users listed in the users table. I'm sure that you can imagine just how long this method takes. I know there's an easier, more efficient way to do this, but being self-taught, this is something that I have yet to learn. Any help would be greatly appreciated.
You have to do what is called a 'Join'. This, um, joins the rows of two tables together based on values they have in common.
See if this makes sense to you:
SELECT DISTINCT users.name
FROM users JOIN conversation_log ON users.id = converation_log.userid
Now JOIN by itself is an "inner join", which means that it will only return rows that both tables have in common. In other words, if a specific conversation_log.userid doesn't exist, it won't return any part of the row, user or conversation log, for that userid.
Also, +1 for having a clearly worded question : )
EDIT: I added a "DISTINCT", which means to filter out all of the duplicates. If a user appeared in more than one conversation_log row, and you didn't have DISTINCT, you would get the user's name more than once. This is because JOIN does a cartesian product, or does every possible combination of rows from each table that match your JOIN ON criteria.
Something like this:
SELECT *
FROM users
WHERE EXISTS (
SELECT *
FROM conversation_log
WHERE users.id = conversation_log.userid
)
In plain English: select every row from users, such that there is at least one row from conversation_log with the matching userid.
What you need to read is JOIN syntax.
SELECT count(*), users.name
FROM users left join conversion_log on users.id = conversation_log.userid
Group by users.name
You could add at the end if you wanted
HAVING count(*) > 0
I'm working on redesigning some parts of our schema, and I'm running into a problem where I just don't know a good clean way of doing something. I have an event table such as:
Events
--------
event_id
for each event, there could be n groups or users associated with it. So there's a table relating Events to Users to reflect that one to many relationship such as:
EventUsers
----------
event_id
user_id
The problem is that we also have a concept of groups. We want to potentially tie n groups to an event in addition to users. So, that user_id column isn't sufficient, because we need to store potentially either a user_id or a group_id.
I've thought of a variety of ways to handle this, but they all seem like a big hack. For example, I could make that a participant_id and put in a participant_type column such as:
EventUsers
----------
event_id
participant_id
participant_type
and if I wanted to get the events that user_id 10 was a part of, it could be something like:
select event_id
from EventUsers
where participant_id = 10
and participant_type = 1
(assuming that somewhere participant_type 1 was defined to be a User). But I don't like that from a philosophical point of view because when I look at the data, I don't know what the number in participant_id means unless I also look at the value in particpant_type.
I could also change EventUsers to be something like:
EventParticipants
-----------------
event_id
user_id
group_id
and allow the values of user_id and group_id to be NULL if that record is dealing with the other type of information.
Of course, I could just break EventUsers and we'll call it EventGroups into 2 different tables but I'd like to keep who is tied to an event stored in one single place if there's a good logical way to do it.
So, am I overlooking a good way to accomplish this?
Tables Events, Users and Groups represent the basic entities. They are related by EventUsers, GroupUsers and EventGroups. You need to union results together, e.g. the attendees for an event are:
select user_id
from EventUsers
where event_id = #event_id
union
select GU.user_id
from EventGroups as EG inner join
GroupUsers as GU on GU.group_id = EG.group_id
where EG.event_id = #event_id
Don't be shy about creating additional tables to represent different types of things. It is often easier to combine them, e.g. with union, than to try to sort out a mess of vague data.
Of course, I could just break EventUsers and we'll call it EventGroups into 2 different tables
This is the good logical way to do it. Create a junction table for each many-to-many relationship; one for events and users, the other for events and groups.
There's no correct answer to this question (although I'm sure if you look hard enough you'll finds some purists that believe that their approach is the correct one).
Personally, I'm a fan of the second approach because it allows you to give columns names that accurately reflect the data they contain. This makes your SELECT statements (in particular when it comes to joining) a bit easier to understand. Yeah, you'll end up with a bunch of NULL values in the column that is unused, but that's not really a big deal.
However, if you'll be joining on this table a lot, it might be wise to go with the first approach, so that the column you join on is consistently the same. Also, if you anticipate new types of participant being added in the future, which would result in a third column in EventParticipants, then you might want to go with the first approach to keep the table narrow.