Non-unique many-to-many table design - mysql

I'm implementing a voting system for a PHP project which uses MySQL. The important part is that I have to store every voting action separately for statistical reasons. The users can vote for many items multiple times, and every vote has a value (think of it like a donation kinda stuff).
So far I have a table votes in which I'm planning to store the votes with the following columns:
user_id - ID of the voting user, foreign key from users table
item_id - ID of the item which the user voted for, foreign key from items table
count - # of votes spent
created - date and time of voting
I'll need to get things out of the table like: Top X voters for an item, all the items that a user has voted for.
My questions are:
Is this table design suitable for the task? If it is, how should I index it? If not, where did I go wrong?
Would it be more rewarding to create another table beside this one, which has unique rows for the user-item relationship (not storing every vote separately, but update the count row)?

Each base table holds the rows that make a true statement from some fill-in-the-(named-)blanks statement aka predicate.
-- user [user_id] has name ...
-- User(user_id, ...)
SELECT * FROM User
-- user [user_id] voted for item [item_id] spending [count] votes on [created]
-- Votes(user_id, item_id, count, created)
SELECT * FROM Votes
(Notice how the shorthand for the predicate is like an SQL declaration for its table. Notice how in the SQL query a base table predicate becomes the table's name.)
Top X voters for an item, all the items that a user has voted for.
Is this table design suitable for the task?
That query can be asked using that design. But only you can know what queries "like" that one are. You have to define sufficient tables/predicates to describe everything you care about in every situation. If Votes records the history of all relevant info about all events then it must be suitable. The query "all the items that user User has voted for" returns rows satisfying predicate
-- user User voted for item [item] spending some count on some date.
-- for some count & created,
user User voted for item [item_id] spending [count] votes on [created]
-- for some count & created, Votes(User, item_id, count, created)
-- for some user_id, count & created,
Votes(user_id, item_id, count, created) AND user_id = User
SELECT item_id FROM Votes WHERE user_id = User
(Notice how in the SQL the condition turns up in the WHERE and the columns you keep are the ones that you care about.)
If it is, how should I index it?
MySQL automatically indexes primary keys. Generally, index column sets that you JOIN ON, otherwise test, GROUP BY or ORDER BY. See the MySQL 5.7 Reference Manual, section 8.3, "Optimization and Indexes".
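As a rough illustration of that indexing advice applied to the two queries from the question, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The index names are made up, and the planner output is SQLite's, so treat it as a way to reason about which column sets to index rather than as a prediction of what MySQL's EXPLAIN will print.

```python
import sqlite3

# SQLite stands in for MySQL here; index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE votes (user_id INTEGER, item_id INTEGER, "count" INTEGER, created TEXT)'
)
# Serves "all the items that a user has voted for" (WHERE user_id = ?).
conn.execute("CREATE INDEX idx_votes_user_item ON votes (user_id, item_id)")
# Serves "top X voters for an item" (WHERE item_id = ? GROUP BY user_id).
conn.execute("CREATE INDEX idx_votes_item_user ON votes (item_id, user_id)")


def plan(sql, params=()):
    """Return the detail strings of SQLite's query plan, joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


items_plan = plan("SELECT item_id FROM votes WHERE user_id = ?", (1,))
voters_plan = plan(
    'SELECT user_id, SUM("count") FROM votes WHERE item_id = ? GROUP BY user_id',
    (10,),
)
print(items_plan)
print(voters_plan)
```

In MySQL with InnoDB, declaring the foreign keys already creates an index on each referencing column if none exists, so composite indexes like these would be an addition to those, worth checking with EXPLAIN on real data.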
Would it be more rewarding to create another table beside this one, which has unique rows for the user-item relationship
If you mean a user-item table holding the rows where "for some count & created, [user_id] voted for [item_id] spending [count] votes on [created]", and you still want all the individual votings, then you still need Votes, and that user-item table is just SELECT DISTINCT user_id, item_id FROM Votes. But if you want to ask about people who haven't voted, you need more.
(not storing every vote separately, but update the count row)
If you don't care about individual votings then you can have a table with user, item and the sum of count for user-item groups. But if you want Votes then that user-item-sum table is expressible in terms of Votes using GROUP BY user_id, item_id & SUM(count).
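A minimal sketch of that derivation, again using Python's sqlite3 module as a stand-in for MySQL. The sample rows and the helper name top_voters are invented for illustration. The column name count is double-quoted here because it clashes with the function name; in MySQL you would use backticks instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE votes (user_id INTEGER, item_id INTEGER, "count" INTEGER, created TEXT)'
)
# Made-up sample data: every voting action is its own row.
conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?, ?)",
    [
        (1, 10, 5, "2024-01-01 10:00:00"),
        (1, 10, 3, "2024-01-02 11:00:00"),
        (2, 10, 7, "2024-01-02 12:00:00"),
        (2, 20, 1, "2024-01-03 09:00:00"),
    ],
)

# The user-item-sum table, expressed as a query over the individual votes.
user_item_totals = conn.execute(
    'SELECT user_id, item_id, SUM("count") AS total FROM votes '
    "GROUP BY user_id, item_id ORDER BY user_id, item_id"
).fetchall()


def top_voters(item_id, limit):
    """Top `limit` voters for one item, by total votes spent."""
    return conn.execute(
        'SELECT user_id, SUM("count") AS total FROM votes '
        "WHERE item_id = ? GROUP BY user_id ORDER BY total DESC LIMIT ?",
        (item_id, limit),
    ).fetchall()


print(user_item_totals)
print(top_voters(10, 2))
```

Whether to also store these totals in a separate table is then a caching decision, not a modelling one: the totals can always be rebuilt from Votes.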

Related

Alternatives to junction table?

I'm designing a relational database tables for storing data about eCommerce scenario where I need to store
List of Products purchased by a user
List of users who purchased a particular product.
It's a many-to-many relationship.
So far I could only think of doing this.
create a table for storing orders
table recordorders(
userID, // foreign key from users table
productID, // foreign key from products table
dateofpurchase,
quantity,
price_per_unit,
total_amount
)
It will act like a junction table.
Is this a good approach, and are there any methods other than a junction table that are more effective and efficient for querying?
Your bullets describe two tables, not one. Your junction table is not properly described as two lists. It is a set of order info rows. The junction table you gave holds rows where "user [userID] purchased product [productID] on ...". Ie it records order info. (Combinations of user, product, date, etc of orders.) Given a user or product, you can get the corresponding bullet table by querying the order info table.
However your table probably needs another column that is a unique order id. Otherwise it cannot record that there are two orders that are alike in all those columns. (Eg if the same person buys the same product on the same date in the same quantity, price and total.) Ie its rows probably aren't 1:1 with orders. That's why above I called it an order info table rather than an order table. It records that some order had those properties; but it doesn't record distinct orders if there can be orders with the same info. It's really a many-to-many-to-etc (for every column) association. That is why an order id gets picked as a unique name for an order as further info. This new table would be called an entity table, not a junction or association table. It holds rows where "in order [id] user [user] purchased ...".
PS An order is usually something that can be characterized as an association on/among/between an order id, user, set of order-lines (product, quantity, price & total), and other stuff (date, grand total, etc). The orders are usually relationally characterized by an order entity table on order id with its user, date etc plus an order-line association table on order ids and their order-line info.
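One possible shape for that split, sketched with Python's sqlite3 module standing in for the real database. The table and column names (orders, order_lines, price_per_unit_cents and so on) are assumptions for illustration, not something taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Entity table: one row per order, identified by order_id.
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    date_of_purchase TEXT NOT NULL
);
-- Association table: one row per product within an order.
CREATE TABLE order_lines (
    order_id INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    price_per_unit_cents INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
""")

# Two orders that are alike in every column except the id stay distinct.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 123, "2024-05-01"), (2, 123, "2024-05-01")],
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?, ?)",
    [(1, 987, 2, 950), (2, 987, 2, 950)],
)

order_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE user_id = ?", (123,)
).fetchone()[0]

# "List of products purchased by a user" is still just a query.
products = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT ol.product_id FROM orders o "
        "JOIN order_lines ol ON ol.order_id = o.order_id "
        "WHERE o.user_id = ?",
        (123,),
    )
]
print(order_count, products)
```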
PPS It's time for you to read a book about information modeling and database design.
You don't "store" those two things in a table (Junction, or otherwise), you discover them from the raw ("Fact") data:
Using your proposed table:
List of Products purchased by a user:
SELECT productID
FROM recordorders
WHERE userID = 123;
List of users who purchased a particular product:
SELECT userID
FROM recordorders
WHERE productID = 987;

Mysql setup for multiple users with large number of individual options

I'm building a study tool and I'm not sure of the best way to go about structuring my database.
Basically, I have a simple but big table with around 50,000 pieces of information in it.
info (50'000 rows)
    id
    info_text
user
    id
    name
    email
    password
    etc
What I want is for the students to be able to mark each item as studied or to be studied (basically on and off), so that they can tick off each item when they have revised it.
I want to build the tool to cope with thousands of users and was wondering what the most efficient/easiest way of setting up the database and associated queries would be.
At the moment I would lean towards just having one huge table with a two-column primary key, the user id and the id of the info they have studied, and then doing some sort of JOIN so I could pull back only the items that they have left to study.
user_info
    user_id
    info_id
Thanks in advance
Here is one way to model this situation:
The STUDIED table, which sits between USER and ITEM, has a composite primary key on USER_ID and ITEM_ID, so a combination of the two must be unique, even though individually they don't have to be.
A user (with given USER_ID) has studied a particular item (with given ITEM_ID) only if there is a corresponding row in the STUDIED table (with these same USER_ID and ITEM_ID values).
Conversely, the user has not studied the item, if and only if the corresponding row in STUDIED is missing. To pull all items a given user hasn't studied, you can do something like this:
SELECT * FROM ITEM
WHERE NOT EXISTS (
    SELECT * FROM STUDIED
    WHERE
        USER_ID = <given_user_id>
        AND ITEM.ITEM_ID = STUDIED.ITEM_ID
)
Or, alternatively:
SELECT ITEM.*
FROM ITEM LEFT JOIN STUDIED
    ON ITEM.ITEM_ID = STUDIED.ITEM_ID
    AND STUDIED.USER_ID = <given_user_id>
WHERE STUDIED.ITEM_ID IS NULL
The good thing about this design is that you don't need to care about the STUDIED table in advance. When adding a new user or item, just leave STUDIED alone - you'll gradually fill it later as users progress with their studies.
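The two ways of finding unstudied items can be checked against each other with a small sketch. It uses Python's sqlite3 module in place of MySQL, lower-case table names, and made-up sample rows. Note that in the LEFT JOIN form the user filter sits in the ON clause: filtering on STUDIED.USER_ID in the WHERE clause would discard exactly the unmatched rows you are looking for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (item_id INTEGER PRIMARY KEY, info_text TEXT);
CREATE TABLE studied (
    user_id INTEGER NOT NULL,
    item_id INTEGER NOT NULL REFERENCES item(item_id),
    PRIMARY KEY (user_id, item_id)  -- a user can tick an item only once
);
INSERT INTO item VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO studied VALUES (7, 1), (8, 1), (8, 2);
""")


def unstudied_not_exists(user_id):
    """Items with no STUDIED row for this user, via NOT EXISTS."""
    return [row[0] for row in conn.execute(
        "SELECT item_id FROM item WHERE NOT EXISTS ("
        "  SELECT 1 FROM studied"
        "  WHERE studied.user_id = ? AND studied.item_id = item.item_id"
        ") ORDER BY item_id",
        (user_id,),
    )]


def unstudied_left_join(user_id):
    """Same result via LEFT JOIN; the user filter belongs in ON, not WHERE."""
    return [row[0] for row in conn.execute(
        "SELECT item.item_id FROM item "
        "LEFT JOIN studied ON studied.item_id = item.item_id "
        "  AND studied.user_id = ? "
        "WHERE studied.item_id IS NULL ORDER BY item.item_id",
        (user_id,),
    )]


print(unstudied_not_exists(7), unstudied_left_join(7))
```

A user with no rows in STUDIED at all (user 9 in this data) simply gets every item back, which is the "don't need to care in advance" property described above.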
I would do something like this:
1) A users table with a uid primary key
2) An enrolled table (this table shows all courses that have enrolled students) with a primary key of (uid, cid)
3) An items (info) table holding all items to study, with a primary key of itemid
Then in the enrolled table just have one attribute (a binary flag): 1 means the item has been studied and 0 means it still needs to be studied.

Best way to store ordered lists in a database?

What's the best way to store "ordered lists" in a database, so that updating them (adding, removing and changing the order of entries) is easily done?
Consider a database where you have a table for users and movies. Each user has a list of favorite movies.
Since many users can like the same movie, I made users and movies separate tables and use a third table to connect them, usermovies.
usermovies contains an id of a user and a movie and an "order number". The order number is used to order the list of movies for users.
For example, user Josh might have the following list:
Prometheus
Men in Black 3
The Dictator
and user Jack might have a list like:
The Dictator
Prometheus
Battleship
Snow White and the Huntsman
So, they share some favorites, but not necessarily in the same order.
I can get the list of movie IDs for each user using a query:
SELECT movie_id FROM usermovies WHERE user_id =? ORDER BY order_number
Then, with the ordered movie_ids, I can get the list of movies using another query
SELECT name FROM movies WHERE id in (?,?,?) ORDER BY FIELD (id, ?,?,?)
So queries work, but updating the lists seems really complex now - are there better ways to store this information so that it would be easy to get the list of movies for user x, add movies, remove them and change the order of the list?
If you are looking for more than a "move up / move down" kind of solution that defaults to adding at the bottom of the list, here are a few more pointers:
Inserting a new row at a specific position can be done like this (inserting at position 3):
UPDATE usermovies SET order_number = order_number + 1
WHERE order_number >= 3 AND user_id = ?;
INSERT INTO usermovies (user_id, movie_id, order_number) VALUES (?, ?, 3);
And you can delete in a similar fashion (deleting position 6):
DELETE FROM usermovies WHERE order_number = 6 AND user_id = ?;
UPDATE usermovies SET order_number = order_number - 1
WHERE order_number > 6 AND user_id = ?;
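Here is a small runnable sketch of the shift-then-insert and delete-then-shift pattern, with each pair of statements wrapped in a transaction so the list is never left half renumbered. It uses Python's sqlite3 module in place of MySQL; the helper names insert_at, delete_at and movie_list and the sample movie ids are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usermovies (user_id INTEGER, movie_id INTEGER, order_number INTEGER)"
)
conn.executemany(
    "INSERT INTO usermovies VALUES (?, ?, ?)",
    [(1, 101, 1), (1, 102, 2), (1, 103, 3)],
)


def insert_at(user_id, movie_id, position):
    with conn:  # commits both statements together, rolls back on error
        # Make room: everything at or after the target position moves down.
        conn.execute(
            "UPDATE usermovies SET order_number = order_number + 1 "
            "WHERE user_id = ? AND order_number >= ?",
            (user_id, position),
        )
        conn.execute(
            "INSERT INTO usermovies (user_id, movie_id, order_number) VALUES (?, ?, ?)",
            (user_id, movie_id, position),
        )


def delete_at(user_id, position):
    with conn:
        conn.execute(
            "DELETE FROM usermovies WHERE user_id = ? AND order_number = ?",
            (user_id, position),
        )
        # Close the gap left by the deleted row.
        conn.execute(
            "UPDATE usermovies SET order_number = order_number - 1 "
            "WHERE user_id = ? AND order_number > ?",
            (user_id, position),
        )


def movie_list(user_id):
    return [row[0] for row in conn.execute(
        "SELECT movie_id FROM usermovies WHERE user_id = ? ORDER BY order_number",
        (user_id,),
    )]


insert_at(1, 104, 2)
after_insert = movie_list(1)
delete_at(1, 1)
after_delete = movie_list(1)
print(after_insert, after_delete)
```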
A junction/link table with additional columns for the attributes of the association between movies and users is the standard way of realizing a many-many association with an association class - so what you have done seems correct.
Regarding the ease of insert/update/delete, you'll have to manage the entire association (all rows for the user-movie FKs) every time you perform an insert/update/delete.
There probably isn't a magical/simpler way to do this.
Having said this, you'll also need to run these operations in a transaction and more importantly have a 'version' column on this junction table if your application is multi-user capable.
To retrieve a user's favourite movies you could use a single query:
SELECT um.order_number, m.name FROM movies m
INNER JOIN usermovies um ON m.id = um.movie_id
WHERE um.user_id = ?
ORDER BY um.order_number
To add/remove a favourite movie, simply add/remove the related record in the usermovies table.
To alter the movie order, simply change the order_number values of that user's rows in the usermovies table.
In addition to what others have said, reordering existing favorites can be done in a single UPDATE statement, as explained here.
The linked answer explains reordering of two items, but can be easily generalized to any number of items.
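One common way to reorder in a single UPDATE is a CASE expression over order_number. The sketch below shows a swap of two positions and is not necessarily what the linked answer does. It uses Python's sqlite3 module in place of MySQL, and the helper name swap_positions is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usermovies (user_id INTEGER, movie_id INTEGER, order_number INTEGER)"
)
conn.executemany(
    "INSERT INTO usermovies VALUES (?, ?, ?)",
    [(1, 101, 1), (1, 102, 2), (1, 103, 3)],
)


def swap_positions(user_id, pos_a, pos_b):
    """Exchange the movies at two positions with one UPDATE statement."""
    with conn:
        conn.execute(
            "UPDATE usermovies SET order_number = CASE order_number "
            "  WHEN ? THEN ? WHEN ? THEN ? END "
            "WHERE user_id = ? AND order_number IN (?, ?)",
            (pos_a, pos_b, pos_b, pos_a, user_id, pos_a, pos_b),
        )


def movie_list(user_id):
    return [row[0] for row in conn.execute(
        "SELECT movie_id FROM usermovies WHERE user_id = ? ORDER BY order_number",
        (user_id,),
    )]


swap_positions(1, 1, 3)
after_swap = movie_list(1)
print(after_swap)
```

Generalizing to more items means adding more WHEN branches and listing every affected position in the IN clause. If order_number carried a unique constraint per user, the database might reject the intermediate state, so this sketch assumes there is none.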

MySQL Database design and efficient query

I have the following tables:
users (id, first_name, last_name)
category (id, name)
rank(id, user_id, rank)
Each user can belong to several categories. All users are in the rank table and have a value between 0.0 and 1.0, where 0 is the lowest rank and 1 is the highest. I'd like to set up additional tables to create the following webpage:
A visitor to the page (identified by either one of the recorded ids in the user table, or a numeric representation of their ip address) chooses a category and is presented with two randomly chosen users from the users table such that:
1) the visiting user_id has not seen this pairing in a period of 24 hours
2) the two users belong to the chosen category
3) the two users are within 1 rank value of each other. Let me explain that last criterion: if the ranks were sorted, the two chosen users would have adjacent ranks.
This is a hard one and I can't for the life of me figure out how to do this efficiently.
I truly appreciate any help on this front.
Thanks
You just need two more tables; the rest goes in your website logic.
user_category(user_id, category_id)
user_pairing(first_user_id, second_user_id, last_seen)
The first table is to represent a ManyToMany relationship between the users and the category, and the second one is for the users pairing.
I agree with @Yasel. I want to add that you probably want another table:
candidate(first_user_id, second_user_id);
This table is used to pre-calculate the candidate pairs for each user. It is prepopulated every hour or day; when a (first_user_id, second_user_id) pair is assigned, it is removed from the candidate table and moved into the user_pairing table. That way each request only needs to query the candidate table, which should be efficient.
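To make the "rest goes in your website logic" part concrete, here is one possible sketch of the pair selection without the precomputed candidate table. It uses Python's sqlite3 module in place of MySQL. Several things are assumptions rather than part of the answers above: the visitor_id column on user_pairing (something has to record who saw the pair), the user_rank table and rank_value column (renamed because RANK is a reserved word in recent MySQL versions), and storing last_seen as a Unix timestamp.

```python
import random
import sqlite3

DAY = 24 * 60 * 60  # seconds

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_rank (user_id INTEGER PRIMARY KEY, rank_value REAL NOT NULL);
CREATE TABLE user_category (
    user_id INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, category_id)
);
-- visitor_id is an assumption: it records who saw the pairing.
CREATE TABLE user_pairing (
    visitor_id INTEGER NOT NULL,
    first_user_id INTEGER NOT NULL,
    second_user_id INTEGER NOT NULL,
    last_seen INTEGER NOT NULL,  -- Unix timestamp
    PRIMARY KEY (visitor_id, first_user_id, second_user_id)
);
""")


def adjacent_pairs(category_id):
    """Pairs of users in the category whose ranks are neighbours when sorted."""
    ids = [row[0] for row in conn.execute(
        "SELECT r.user_id FROM user_rank r "
        "JOIN user_category uc ON uc.user_id = r.user_id "
        "WHERE uc.category_id = ? ORDER BY r.rank_value, r.user_id",
        (category_id,),
    )]
    # Store each pair as (smaller id, larger id) so it has one canonical form.
    return [tuple(sorted(pair)) for pair in zip(ids, ids[1:])]


def pick_pair(visitor_id, category_id, now):
    """Return a random adjacent pair not seen by the visitor in 24 hours, or None."""
    seen = set(conn.execute(
        "SELECT first_user_id, second_user_id FROM user_pairing "
        "WHERE visitor_id = ? AND last_seen > ?",
        (visitor_id, now - DAY),
    ))
    candidates = [p for p in adjacent_pairs(category_id) if p not in seen]
    if not candidates:
        return None
    pair = random.choice(candidates)
    with conn:
        # SQLite syntax; MySQL would use INSERT ... ON DUPLICATE KEY UPDATE.
        conn.execute(
            "INSERT OR REPLACE INTO user_pairing VALUES (?, ?, ?, ?)",
            (visitor_id, pair[0], pair[1], now),
        )
    return pair
```

Calling pick_pair(visitor_id, category_id, int(time.time())) on each page view would then return a fresh pair or None when the visitor has exhausted the category for the day. For large categories the sorted scan is the expensive part, which is where a precomputed candidate table would help.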

SQL get polls that specified user is winning

Hello all and thanks in advance
I have the tables accounts, votes and contests
A vote consists of an author ID, a winner ID, and a contest ID, so as to stop people voting twice
I'd like to show, for any given account, how many times they've won a contest, how many times they've come second and how many times they've come third.
What's the fastest (execution time) way to do this? (I'm using MySQL)
After using MySQL for a long time I'm coming to the conclusion that virtually any use of GROUP BY is really bad for performance, so here's a solution with a couple of temporary tables.
CREATE TEMPORARY TABLE VoteCounts (
accountid INT,
contestid INT,
votecount INT DEFAULT 0
);
INSERT INTO VoteCounts (accountid, contestid)
SELECT DISTINCT v2.accountid, v2.contestid
FROM votes v1 JOIN votes v2 USING (contestid)
WHERE v1.accountid = ?; -- the given account
Make sure you have an index on votes(accountid, contestid).
Now you have a table of every contest that your given user was in, with all the other accounts who were in the same contests.
UPDATE votes AS v JOIN VoteCounts AS vc USING (accountid, contestid)
SET vc.votecount = vc.votecount+1;
Now you have the count of votes for each account in each contest.
CREATE TEMPORARY TABLE Placings (
accountid INT,
contestid INT,
placing INT
);
SET @prevcontest := 0;
SET @placing := 0;
INSERT INTO Placings (accountid, placing, contestid)
SELECT accountid,
IF(contestid=@prevcontest, @placing:=@placing+1, @placing:=1) AS placing,
@prevcontest:=contestid AS contestid
FROM VoteCounts
ORDER BY contestid, votecount DESC;
Now you have a table with each account paired with their respective placing in each contest. It's easy to get the count for a given placing:
SELECT accountid, COUNT(*) AS count_first_place
FROM Placings
WHERE accountid = ? AND placing = 1;
And you can use a MySQL trick to do all three in one query. A boolean expression always returns an integer value 0 or 1 in MySQL, so you can use SUM() to count up the 1's.
SELECT accountid,
SUM(placing=1) AS count_first_place,
SUM(placing=2) AS count_second_place,
SUM(placing=3) AS count_third_place
FROM Placings
WHERE accountid = ?; -- the given account
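The SUM-of-booleans counting trick can be tried out on a tiny Placings table. The sketch below uses Python's sqlite3 module, where comparisons also evaluate to 0 or 1, in place of MySQL. The sample rows are made up, and the second function shows the CASE form, which is the portable spelling on databases that do not treat booleans as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE placings (accountid INTEGER, contestid INTEGER, placing INTEGER)"
)
# Made-up data: account 1 placed 1st, 1st, 3rd and 4th; account 2 placed 2nd.
conn.executemany(
    "INSERT INTO placings VALUES (?, ?, ?)",
    [(1, 100, 1), (1, 101, 1), (1, 102, 3), (1, 103, 4), (2, 100, 2)],
)


def placing_counts(accountid):
    """Count first, second and third places by summing 0/1 comparisons."""
    return conn.execute(
        "SELECT SUM(placing = 1), SUM(placing = 2), SUM(placing = 3) "
        "FROM placings WHERE accountid = ?",
        (accountid,),
    ).fetchone()


def placing_counts_case(accountid):
    """Same counts written with CASE, for databases without 0/1 booleans."""
    return conn.execute(
        "SELECT SUM(CASE WHEN placing = 1 THEN 1 ELSE 0 END), "
        "       SUM(CASE WHEN placing = 2 THEN 1 ELSE 0 END), "
        "       SUM(CASE WHEN placing = 3 THEN 1 ELSE 0 END) "
        "FROM placings WHERE accountid = ?",
        (accountid,),
    ).fetchone()


print(placing_counts(1), placing_counts(2))
```

For an account with no rows at all, SUM yields NULL rather than 0, so wrapping each SUM in COALESCE(..., 0) may be worthwhile.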
Re your comment:
Yes, it's a complex task no matter what to go from the normalized data you have to the results you want. You want it aggregated (summed), ranked, and aggregated (counted) again. That's a heap of work! :-)
Also, a single query is not always the fastest way to do a given task. It's a common misconception among programmers that shorter code is implicitly faster code.
Note I have not tested this so your mileage may vary.
Re your question about the UPDATE:
It's a tricky way of getting the COUNT() of votes per account without using GROUP BY. I've added table aliases v and vc so it may be more clear now. In the votes table, there are N rows for a given account/contest. In the VoteCounts table, there's one row per account/contest. When I join, the UPDATE is evaluated against the N rows, so if I add 1 for each of those N rows, I get the count of N stored in VoteCounts in the row corresponding to each respective account/contest.
If I'm interpreting things correctly, to stop people voting twice I think you only need a unique index on the votes table by author (account?) ID and contest ID. It won't prevent people from having multiple accounts and voting twice, but it will prevent anyone from casting a vote in a contest twice from the same account. To prevent fraud (sock puppet accounts) you'd need to examine voting patterns and detect when an account votes for another account more often than is statistically likely. Unless you have a lot of contests, that might actually be hard.
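A minimal sketch of the unique-index idea, using Python's sqlite3 module in place of MySQL. The column names author_id, winner_id and contest_id, the index name and the cast_vote helper are assumptions based on the question's description of a vote.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE votes ("
    " author_id INTEGER NOT NULL,"
    " winner_id INTEGER NOT NULL,"
    " contest_id INTEGER NOT NULL)"
)
# At most one vote per author per contest.
conn.execute(
    "CREATE UNIQUE INDEX votes_author_contest ON votes (author_id, contest_id)"
)


def cast_vote(author_id, winner_id, contest_id):
    """Insert a vote; return False if this author already voted in the contest."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO votes (author_id, winner_id, contest_id) VALUES (?, ?, ?)",
                (author_id, winner_id, contest_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_vote = cast_vote(1, 2, 10)
repeat_vote = cast_vote(1, 3, 10)   # same author, same contest
other_contest = cast_vote(1, 3, 11)
print(first_vote, repeat_vote, other_contest)
```

Letting the database reject the duplicate avoids a separate "has this account voted yet?" query, which could race with a concurrent insert.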