Find row based on set of foreign rows - mysql

I've got 3 tables for a chat system:
conversation: this table holds an id and a timestamp
conversation_participant: tells me which users are in the chat
conversation_message: the messages sent within this chat (also holds sender_id)
This setup works quite well for me, but now I want to do a 'reverse' lookup. I know which users are going to start a new chat and I want to know if this specific group already has a conversation in the DB. Is there a way to find a row based on a dynamic set of foreign rows?
(preferably without pivot-like tricks)
Or is my database design flawed and should I alter that?
CONVERSATION
id int auto_increment
start timestamp
CONVERSATION_PARTICIPANT
conversation_id int (foreign key to conversation)
participant_id int (foreign key to users table)
CONVERSATION_MESSAGE
id int auto_increment
conversation_id int (foreign key to conversation)
author_id int
time timestamp
message text

This assumes you:
have a comma-delimited list of participant_ids -> $list
know the number of participants -> $qty
use one of the participants in the WHERE clause -> $participant
replace those pseudo-variables with real values
You can see the code here: http://sqlfiddle.com/#!2/e90f2/11
Code:
SELECT conversation_participant.conversation_id AS conversation_id
     , SUM(IF(members.participant_id IN ($list),1,0)) AS member_count
     , COUNT(*) AS total
FROM conversation_participant
JOIN conversation_participant AS members
  ON members.conversation_id = conversation_participant.conversation_id
WHERE conversation_participant.participant_id = $participant
GROUP BY conversation_participant.conversation_id
HAVING member_count = total
   AND member_count = $qty;
FYI: The purpose of the WHERE clause is to limit the number of potential conversations.
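For anyone who wants to try the idea without a MySQL server, here is a minimal sketch of the same self-join using Python's sqlite3 module. The helper name find_conversation and the sample rows are my own assumptions, not part of the fiddle, and CASE stands in for MySQL's IF() because SQLite is used here.

```python
import sqlite3

def find_conversation(conn, participant_ids):
    """Return ids of conversations whose participant set equals participant_ids."""
    ids = sorted(set(participant_ids))
    placeholders = ",".join("?" for _ in ids)
    sql = f"""
        SELECT cp.conversation_id
        FROM conversation_participant AS cp
        JOIN conversation_participant AS members
          ON members.conversation_id = cp.conversation_id
        WHERE cp.participant_id = ?              -- narrows the candidates ($participant)
        GROUP BY cp.conversation_id
        HAVING SUM(CASE WHEN members.participant_id IN ({placeholders})
                        THEN 1 ELSE 0 END) = COUNT(*)   -- no outsiders in the chat
           AND COUNT(*) = ?                              -- nobody from the list missing ($qty)
    """
    rows = conn.execute(sql, [ids[0], *ids, len(ids)]).fetchall()
    return [row[0] for row in rows]

# Hypothetical sample data: three conversations with overlapping members.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE conversation_participant (
        conversation_id INTEGER NOT NULL,
        participant_id  INTEGER NOT NULL,
        PRIMARY KEY (conversation_id, participant_id)
    );
    INSERT INTO conversation_participant VALUES
        (1, 10), (1, 20),
        (2, 10), (2, 20), (2, 30),
        (3, 20), (3, 30);
""")

print(find_conversation(conn, [10, 20]))
```

The values are bound as parameters rather than pasted into the string, which is worth doing in the MySQL version too if $list comes from user input.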

Related

Merging rows and updating related foreign key

I have a current users table. A distinct user is defined as when the email and phoneNumber together are unique. Currently the table looks like this:
And another table called giftee_info which has the foreign key on column userId to users.id:
The users table is going to be parsed out into 2 tables: users and user_metadata. Now a distinct user will be defined by the phoneNumber. So you can see in the data above, users with id's 4 and 5 are the same, and have to be merged.
The new users table will look like:
And the new user_metadata table will look like this:
Note how the 4th row has userId of 4, so users 4 and 5 have merged to one user.
And giftee_info will look like this:
See how the 3rd row in giftee_info contains userId 4, as the user with id 5 has been merged into one user.
The data I've provided is basic. In reality there are many rows, and a user with the same number may have 5 different email addresses (and so is treated as 5 separate users in the current schema).
The part I'm having most trouble with is updating giftee_info. So any rows with userId's that have been merged down into one user need to be updated. How can I do this?
Since phonenumber can be NULL, I'm using externalid as the unique identifier below.
Start by creating the new users table from the distinct phone numbers in the old users table:
CREATE TABLE new_users (id INT PRIMARY KEY AUTO_INCREMENT, externalid VARCHAR(32), phonenumber VARCHAR(32))
SELECT DISTINCT externalid, phonenumber
FROM users;
Then put all the emails into the user_metadata table, by joining the old and new users tables to get the emails along with the new user IDs.
CREATE TABLE user_metadata (id INT PRIMARY KEY AUTO_INCREMENT, userid INT, email VARCHAR(100), subscribe INT(1))
SELECT nu.id AS userid, u.email, 0 AS subscribe
FROM new_users AS nu
JOIN users AS u ON nu.externalid = u.externalid;
Now you can update giftee_info by replacing the old user IDs with the new user IDs.
UPDATE giftee_info AS g
JOIN users AS u ON g.userid = u.id
JOIN new_users AS nu ON nu.externalid = u.externalid
SET g.userid = nu.id;
Once this is all done you can rename the tables so new_users is now users.
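To check the remapping step on a small data set, here is a rough sketch using Python's sqlite3 module. Two things are my assumptions rather than part of the answer: the merged user keeps the lowest old id (so 4 and 5 collapse to 4, as in the question), and the multi-table UPDATE is written as a correlated subquery because SQLite is used here. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical old data: users 4 and 5 share an externalid.
    CREATE TABLE users (id INTEGER PRIMARY KEY, externalid TEXT, email TEXT);
    CREATE TABLE giftee_info (id INTEGER PRIMARY KEY, userid INTEGER);
    INSERT INTO users VALUES (4, 'ext-a', 'a@example.com'),
                             (5, 'ext-a', 'a2@example.com'),
                             (6, 'ext-b', 'b@example.com');
    INSERT INTO giftee_info VALUES (1, 4), (2, 5), (3, 6);

    -- One new user per distinct externalid, keeping the lowest old id.
    CREATE TABLE new_users (id INTEGER PRIMARY KEY, externalid TEXT);
    INSERT INTO new_users (id, externalid)
        SELECT MIN(id), externalid FROM users GROUP BY externalid;

    -- Repoint each giftee_info row from its old user id to the merged id.
    UPDATE giftee_info
    SET userid = (
        SELECT nu.id
        FROM users AS u
        JOIN new_users AS nu ON nu.externalid = u.externalid
        WHERE u.id = giftee_info.userid
    );
""")

merged = conn.execute("SELECT id, userid FROM giftee_info ORDER BY id").fetchall()
print(merged)
```

Running this against a copy of the real tables first is a cheap way to confirm no giftee_info row ends up pointing at an id that no longer exists.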

Database Structure for MessageBoardApp

OK, I was told my last question was too broad, so I'll try to be more precise this time.
I need a database structure for a message board app.
I have 2 ideas but don't know which scales better.
Is there another possibility that is even better than my ideas?
There are Users, Messages and Groups.
All messages belong to at least one group but can be in more.
Users subscribe to at least one Group and can then see all Messages of all Groups they belong to.
Users can create Groups at will.
Scale(theoretically):
Users: Millions
Messages: Billions
Groups: More than Users
I have two ideas right now:
Idea 1:
Table Users:
ID
All personal information...
GroupsSubscribed (string with all group IDs?)
LastUpdate (Date)
Table Messages:
ID
ImageURI
Text
Creator
Date
Groups
Answer_Messages_IDs (String with all IDs of messages that are an answer to this message)
Table Groups:
ID
GroupName
LastUpdate (Date)
Idea:
Message Get:
App gets every X seconds the Group-LastUpdate (DB call: Group)
If Group-LastUpdate > User-LastUpdate ->
Select all Messages where Groups contain Group and Date > LastUpdate (DB call: Messages)
Message Write:
App writes Message belonging to more Groups
Save Message in Message Table (DB call)
Update Group Table LastUpdate (DB call)
-----------------
Idea 2:
Table Users:
ID
All personal information...
GroupsSubscribed (string with all group IDs?)
NewMessages (string with MessageIDs?)
Table Messages:
ID
ImageURI
Text
Creator
Date
Groups
Answer_Messages_IDs (String with all IDs of messages that are an answer to this message)
Table Groups:
ID
GroupName
UserIDs (string with all user IDs)
Idea:
Message Get:
App gets every X seconds the User-NewMessages(DB call: Users)
If User-NewMessages != "" ->
Select all Messages where ID in List of NewMessages (DB call: Messages)
Message Write:
App writes Message belonging to more Groups
Save Message in Message Table (DB call: Messages)
Get Groups-UserIDs for every Group (DB call: Groups)
Update every User with new Message ID (DB call: Users)
This is an exercise in database normalization, as @Paul Spiegel indicates above.
You would create something like the following:
Users
UserID PK
ImageURI
... personal user informational columns ...
Messages
MessageID PK
Text
UserID FK -> Users(UserID) // Message Author (Creator)
Date
Replies
MessageID FK -> Messages(MessageID)
ReplyID FK -> Messages(MessageID)
PK (MessageID, ReplyID)
Groups
GroupID PK
Name
Description
UserID FK -> Users(UserID) // Group Moderator -- I'm just adding this one in for fun.
User_Groups
UserID FK -> Users(UserID)
GroupID FK -> Groups(GroupID)
PK (UserID, GroupID)
Message_Groups
MessageID FK -> Messages(MessageID)
GroupID FK -> Groups(GroupID)
PK (MessageID, GroupID)
I moved ImageURI from Messages to Users on the assumption that it is a user avatar. If it is really something else associated with a message, then move it back.
There are three application integrity rules in addition to the PKs and FKs already included.
AIR #1 - The existence of a row in Messages implies at least one matching row in Message_Groups.
AIR #2 - The existence of a row in Users implies at least one matching row in User_Groups.
AIR #3 - A given ReplyID can only appear once in Replies. This keeps adjacency list semantics preventing a generalized many-to-many association and enforcing a hierarchical association.
The database join logic and application code is left as an exercise to the reader.
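As a starting point for that exercise, here is one possible sketch of the "Message Get" step against the junction tables above, written with Python's sqlite3 module so it can be run as-is. The function name messages_for_user, the sample rows and the ISO date strings are assumptions of mine. Only the three tables the query touches are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User_Groups (UserID INTEGER, GroupID INTEGER,
                              PRIMARY KEY (UserID, GroupID));
    CREATE TABLE Messages (MessageID INTEGER PRIMARY KEY, Text TEXT,
                           UserID INTEGER, Date TEXT);
    CREATE TABLE Message_Groups (MessageID INTEGER, GroupID INTEGER,
                                 PRIMARY KEY (MessageID, GroupID));
    -- Hypothetical data: user 1 follows groups 100 and 200, user 2 follows 300.
    INSERT INTO User_Groups VALUES (1, 100), (1, 200), (2, 300);
    INSERT INTO Messages VALUES (1, 'hello', 2, '2020-01-01'),
                                (2, 'both groups', 2, '2020-01-02'),
                                (3, 'elsewhere', 1, '2020-01-03');
    INSERT INTO Message_Groups VALUES (1, 100), (2, 100), (2, 200), (3, 300);
""")

def messages_for_user(conn, user_id, since):
    """Messages newer than `since` in any group the user subscribes to."""
    sql = """
        SELECT DISTINCT m.MessageID, m.Text   -- DISTINCT: a message may sit in several groups
        FROM User_Groups AS ug
        JOIN Message_Groups AS mg ON mg.GroupID = ug.GroupID
        JOIN Messages AS m ON m.MessageID = mg.MessageID
        WHERE ug.UserID = ? AND m.Date > ?
        ORDER BY m.MessageID
    """
    return conn.execute(sql, (user_id, since)).fetchall()

print(messages_for_user(conn, 1, '2019-12-31'))
```

With this shape there is no need for the LastUpdate or NewMessages columns from the two ideas in the question; the client only has to remember the timestamp of its last fetch.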

MySql Join Tables with List of Indexes in one table's Column?

I have a MySQL database with two tables, Users and Games. Basically each user has a unique ID and a set of games that they are playing, and each game has an ID and a HasStarted flag (when HasStarted = false, players can join the game). The table was set up so that each user's list of games is a string of ID numbers separated by semicolons (e.g. 1;3;5; means they are part of games 1, 3 and 5). CREATE examples are at the bottom.
What I would like to do is fetch all the games that are joinable AND that they are not already a part of.
I tried using
SELECT * FROM games INNER JOIN users ON games.GID = users.gamelist
WHERE games.status = 0;
but that only matches the games based on the first game in their list. What I'm going to do for now is just fetch all the joinable games into an array and then get the player's list of games and drop those but I was wondering if there was a way to fetch them all in one query. Does anyone know if I can do something like this?
CREATE TABLE users
(UID INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(UID),
games TINYTEXT);
CREATE TABLE games
(GID INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(GID),
has_started BIT);
You REALLY should alter your schema to have a users_to_games table to normalize your data. The setup could look like this:
users
------
uid - primary key
[other fields specific to a single user]
games
------
gid - primary key
has_started
[other fields specific to a single game]
users_to_games
--------------
uid - first field in compound primary key
gid - second field in compound primary key
There would be one record in users_to_games for every game a particular user is involved in.
Now, in order to get the list of games that have not started and that the user has not joined, you could do this:
SELECT gid, [any other fields from game table you want for display]
FROM games
WHERE has_started = 0
AND gid NOT IN (SELECT DISTINCT gid FROM users_to_games WHERE uid = ?)
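Here is a small runnable sketch of that query using Python's sqlite3 module, in case you want to see it work before touching your schema. The function name joinable_games and the sample rows are hypothetical, and has_started is stored as a plain integer because SQLite is used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (gid INTEGER PRIMARY KEY, has_started INTEGER NOT NULL);
    CREATE TABLE users_to_games (uid INTEGER, gid INTEGER, PRIMARY KEY (uid, gid));
    -- Hypothetical data: game 2 has already started.
    INSERT INTO games VALUES (1, 0), (2, 1), (3, 0), (4, 0);
    INSERT INTO users_to_games VALUES (7, 1), (7, 2), (8, 3);
""")

def joinable_games(conn, uid):
    """Games that have not started and that `uid` is not already part of."""
    sql = """
        SELECT gid FROM games
        WHERE has_started = 0
          AND gid NOT IN (SELECT gid FROM users_to_games WHERE uid = ?)
        ORDER BY gid
    """
    return [row[0] for row in conn.execute(sql, (uid,))]

print(joinable_games(conn, 7))
```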

How to set up two MySQL data fields so one or the other can be null but not both?

I have 5 MySQL data fields for a votes table:
post id
poll id
vote id
voter
voteid
You can vote in a poll or vote for a post. If you vote in a poll, the post/person field will be null. If you vote for a post, the vote field will be null.
I want to set up the table so it will allow you to make either the post id or vote id null, but not both. I'm using phpmyadmin to manage my database.
Any ideas?
I have to agree with jmilloy above: the best thing to do is to create separate tables. This is an example of how this would work:
Table structure and sample data:
CREATE TABLE post (
post_id INT AUTO_INCREMENT PRIMARY KEY,
vote_id INT
);
CREATE TABLE poll (
poll_id INT AUTO_INCREMENT PRIMARY KEY,
vote_id INT
);
CREATE TABLE voter(
vote_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(30)
);
INSERT INTO post (vote_id) VALUES(1),(2),(3),(6);
INSERT INTO poll (vote_id) VALUES(3),(5),(4),(7);
INSERT INTO voter(name) VALUES ('bob'),
('Jack'),
('Joe'),
('Shara'),
('Hillary'),
('Steven'),
('Sandra');
To retrieve the voter who has voted for a post you have to use a JOIN. This is an example of how it would look if you want to find the voter for post number 2, using the post and voter tables created above:
SELECT post.post_id, voter.name
FROM post
JOIN voter
ON voter.vote_id = post.vote_id
WHERE post.post_id = 2;
Some explanation: if you have a poll and a vote, you have a many-to-many relationship, i.e. one voter can vote in more than one poll and one poll can have more than one voter. To bridge between the vote and poll tables you use a bridge table. This table contains all the poll number and vote combinations. So when you want to know who has voted in a particular poll, you link the poll_id with the poll_id in the poll_vote table. The result is then matched with the vote table using the vote_id in the poll_vote table and the vote table. Hope this helps. Good luck with your project.
Look at the MySQL CREATE TABLE syntax http://dev.mysql.com/doc/refman/5.1/en/create-table.html
Notice that NOT NULL or NULL are part of a column definition. The default is NULL. This can only be applied to columns, not pairs of columns.
The solution here is to make two separate tables, one for post votes and one for poll votes. Then you can put the relevant fields in each table. This will also save you space, and make your data less error prone.
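If you would rather keep a single table, a table-level CHECK constraint can express the rule across two columns. Keep in mind that MySQL only enforces CHECK constraints from version 8.0.16 onwards; older versions parse them and silently ignore them. The sketch below uses Python's sqlite3 module (which does enforce CHECK) just to show the behaviour; the votes layout and the try_insert helper are my assumptions, not the asker's actual table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE votes (
        vote_id INTEGER PRIMARY KEY,
        voter   TEXT NOT NULL,
        post_id INTEGER,
        poll_id INTEGER,
        -- Either column may be NULL, but not both at once.
        CHECK (post_id IS NOT NULL OR poll_id IS NOT NULL)
    )
""")

def try_insert(voter, post_id, poll_id):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute(
            "INSERT INTO votes (voter, post_id, poll_id) VALUES (?, ?, ?)",
            (voter, post_id, poll_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("bob", 1, None))    # vote for a post
print(try_insert("joe", None, None)) # neither set
```

If a vote must belong to exactly one of the two, the condition could be tightened to (post_id IS NULL) <> (poll_id IS NULL).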

Database design - primary key naming conventions

I am interested to know what people think about (AND WHY) the following 3 different conventions for naming database table primary keys in MySQL?
-Example 1-
Table name: User,
Primary key column name: user_id
-Example 2-
Table name: User,
Primary key column name: id
-Example 3-
Table name: User,
Primary key column name: pk_user_id
Just want to hear ideas and perhaps learn something in the process :)
Thanks.
I would go with option 2. To me, "id" itself seems sufficient enough.
Since the table is User so the column "id" within "user" indicates that it is the identification criteria for User.
However, I must add that naming conventions are all about consistency.
There is usually no right or wrong as long as there is a consistent pattern and it is applied across the application. That's probably the more important factor in how effective the naming conventions will be and how far they go towards making the application easier to understand and hence maintain.
I always prefer the option in example 1, in which the table name is (redundantly) used in the column name. This is because I prefer to see ON user.user_id = history.user_id than ON user.id = history.user_id in JOINs.
However, the weight of opinion on this issue generally seems to run against me here on Stack Overflow, where most people prefer example 2.
Incidentally, I prefer UserID to user_id as a column naming convention. I don't like typing underscores, and the use of the underscore as the common SQL single-character-match character can sometimes be a little confusing.
ID is the worst PK name you can have in my opinion. TablenameID works much better for reporting so you don't have to alias a bunch of columns named the same thing when doing complex reporting queries.
It is my personal belief that columns should only be named the same thing if they mean the same thing. The customer ID does not mean the same thing as the order ID, and thus they should conceptually have different names. When you have many joins and a complex data structure, it is also easier to maintain when the PK and FK have the same name. It is harder to spot an error in a join when you have ID columns. For instance, suppose you joined four tables, all of which have an ID column. In the last join you accidentally used the alias for the first table and not the third one. If you used OrderID, CustomerID etc. instead of ID, you would get an error because the first table doesn't contain that column. If you use ID, it would happily join incorrectly.
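A cut-down, two-table version of that scenario can be reproduced with Python's sqlite3 module. The table names (customer_a, orders_a, customer_b, orders_b) and the runs helper are made up for the illustration; the point is only that the same slip is accepted with generic id columns and rejected with descriptive ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Variant A: every primary key is called id.
    CREATE TABLE customer_a (id INTEGER PRIMARY KEY);
    CREATE TABLE orders_a (id INTEGER PRIMARY KEY, customer_id INTEGER);
    -- Variant B: primary keys carry the table name.
    CREATE TABLE customer_b (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders_b (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
""")

def runs(sql):
    """Return True if the statement is accepted, False if a column is unknown."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.OperationalError:
        return False

# The slip: joining on the order's key where the customer's key was meant.
generic_ok = runs("SELECT * FROM customer_a c JOIN orders_a o ON c.id = o.id")
# The same slip with descriptive names: customer_b has no order_id column.
descriptive_ok = runs(
    "SELECT * FROM customer_b c JOIN orders_b o ON c.order_id = o.order_id"
)
print(generic_ok, descriptive_ok)
```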
I tend to go with the first option, user_id.
If you go with id, you usually end up with a need to alias excessively in your queries.
If you go with more_complicated_id, then you either must abbreviate, or you run out of room, and you get tired of typing such long column names.
2 cents.
I agree with @InSane and like just Id. And here's why:
If you have a table called User, and a column dealing with the user's name, do you call it UserName or just Name? The "User" seems redundant. If you have a table called Customer, and a column called Address, do you call the column CustomerAddress?
Though I have also seen where you would use UserId, and then if you have a table with a foreign key to User, the column would also be UserId. This allows for the consistency in naming, but IMO, doesn't buy you that much.
In response to Tomas' answer, there will still be ambiguity assuming that the PK for the comment table is also named id.
In response to the question, Example 1 gets my vote. [table name]_id would actually remove the ambiguity.
Instead of
SELECT u.id AS user_id, c.id AS comment_id FROM user u JOIN comment c ON u.id=c.user_id
I could simply write
SELECT user_id, comment_id FROM user u JOIN comment c ON u.user_id=c.user_id
There's nothing ambiguous about using the same ID name in both WHERE and ON. It actually adds clarity IMHO.
I've always appreciated Justinsomnia's take on database naming conventions. Give it a read: http://justinsomnia.org/2003/04/essential-database-naming-conventions-and-style/
I would suggest example 2. That way there is no ambiguity between foreign keys and primary keys, as there is in example 1. You can do for instance
SELECT * FROM user, comment WHERE user.id = comment.user_id
which is clear and concise.
The third example is redundant in a design where all id's are used as primary keys.
OK so forget example 3 - it's just plain silly, so it's between 1 and 2.
the id for PK school of thought (2)
drop table if exists customer;
create table customer
(
id int unsigned not null auto_increment primary key, -- my names are id, cid, cusid, custid ????
name varchar(255) not null
)engine=innodb;
insert into customer (name) values ('cust1'),('cust2');
drop table if exists orders;
create table orders
(
id int unsigned not null auto_increment primary key, -- my names are id, oid, ordid
cid int unsigned not null -- hmmm what shall i call this ?
)engine=innodb;
insert into orders (cid) values (1),(2),(1),(1),(2);
-- so if i do a simple give me all of the customer orders query we get the following output
select
c.id,
o.id
from
customer c
inner join orders o on c.id = o.cid;
id id1 -- big fan of column names like id1, id2, id3 : they are sooo descriptive
== ===
1 1
2 2
1 3
1 4
2 5
-- so now i have to alias my columns like so:
select
c.id as cid, -- shall i call it cid or custid, customer_id whatever ??
o.id as oid
from
customer c
inner join orders o on c.id = o.cid; -- cid here but id in customer - where is my consistency ?
cid oid
== ===
1 1
2 2
1 3
1 4
2 5
the tablename_id prefix for PK/FK name school of thought (1)
(feel free to use an abbreviated form of tablename i.e cust_id instead of customer_id)
drop table if exists customer;
create table customer
(
cust_id int unsigned not null auto_increment primary key, -- pk
name varchar(255) not null
)engine=innodb;
insert into customer (name) values ('cust1'),('cust2');
drop table if exists orders;
create table orders
(
order_id int unsigned not null auto_increment primary key,
cust_id int unsigned not null
)engine=innodb;
insert into orders (cust_id) values (1),(2),(1),(1),(2);
select
c.cust_id,
o.order_id
from
customer c
inner join orders o on c.cust_id = o.cust_id; -- ahhhh, cust_id is cust_id is cust_id :)
cust_id order_id
======= ========
1 1
2 2
1 3
1 4
2 5
so you see the tablename_ prefix or abbreviated tablename_prefix method is ofc the most
consistent and easily the best convention.
I don't disagree with what most of the answers note: just be consistent. However, I wanted to add that one benefit of the redundant approach with user_id is that it allows use of the USING syntactic sugar. If it weren't for this factor, I think I'd personally opt to avoid the redundancy.
For example,
SELECT *
FROM user
INNER JOIN subscription ON user.id = subscription.user_id
vs
SELECT *
FROM user
INNER JOIN subscription USING(user_id)
It's not a crazy significant difference, but I find it helpful.
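One side effect worth knowing: with USING, SELECT * returns the shared column once instead of twice. Here is a quick sketch with Python's sqlite3 module; the user and subscription tables are renamed to have a user_id primary key so both forms can run against the same data, and the extra columns and sample rows are assumptions of mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subscription (subscription_id INTEGER PRIMARY KEY,
                               user_id INTEGER, plan TEXT);
    INSERT INTO user VALUES (1, 'ann');
    INSERT INTO subscription VALUES (10, 1, 'pro');
""")

# Explicit ON: both copies of user_id appear in SELECT *.
on_cur = conn.execute(
    "SELECT * FROM user INNER JOIN subscription "
    "ON user.user_id = subscription.user_id"
)
on_columns = [d[0] for d in on_cur.description]

# USING: the join column is returned only once.
using_cur = conn.execute(
    "SELECT * FROM user INNER JOIN subscription USING (user_id)"
)
using_columns = [d[0] for d in using_cur.description]

print(on_columns)
print(using_columns)
```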