I am trying to figure out the best method to design a database that allows users to rank a number of items.
The items are ranked by all users.
Every item will have a rank assigned to it automatically in the beginning. So when I create a user, the rankings table will be auto populated.
So something like this:
users (id, name) - 100,000+ entries
items (id, name) - never more than 1,000 entries
The only thing I can currently think of to house the rankings is this:
rankings (id, user_id, item_id, ranking)
But that feels wrong because I'll have 100 million entries (100,000 users × 1,000 items) in the rankings table. Is that OK? What other options do I have?
Can each user assign either zero or one ranking to each item? Or can she assign more than one ranking to a given item?
If it's zero-or-one, your ranking table should have these columns:
user_id INT
item_id INT
ranking INT
The primary key should be the composite key (user_id, item_id). This will disallow multiple rankings for the same item for the same user, and will be decently efficient. Putting a separate id on this table is not the right thing to do.
For the sake of query efficiency, I suggest you also create the covering indexes (user_id, item_id, ranking) and (item_id, user_id, ranking).
If you can get your hundreds of thousands of users to rank all 1000 items, that will be great. That's a problem any web app developer would love to have.
This table's rows are reasonably small, and with the indexes I mentioned it should perform decently well both at a smaller scale and as you add users.
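To make the composite-key suggestion concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for whatever DBMS you actually use, and the index names and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rankings (
        user_id INTEGER NOT NULL,
        item_id INTEGER NOT NULL,
        ranking INTEGER NOT NULL,
        PRIMARY KEY (user_id, item_id)  -- at most one ranking per user per item
    );
    -- The two covering indexes suggested above (index names are hypothetical).
    CREATE INDEX rankings_by_user ON rankings (user_id, item_id, ranking);
    CREATE INDEX rankings_by_item ON rankings (item_id, user_id, ranking);
""")

# Sample data, invented for the example.
conn.executemany(
    "INSERT INTO rankings (user_id, item_id, ranking) VALUES (?, ?, ?)",
    [(1, 10, 1), (1, 11, 2), (2, 10, 2)],
)

# A second ranking for the same (user, item) pair violates the primary key.
try:
    conn.execute("INSERT INTO rankings VALUES (1, 10, 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# All rankings given to item 10, one row per user.
item_10_rankings = conn.execute(
    "SELECT user_id, ranking FROM rankings WHERE item_id = ? ORDER BY user_id",
    (10,),
).fetchall()
```

The duplicate insert fails with an integrity error, which is the "zero or one ranking per item" rule enforced by the key itself rather than by application code.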
I'm setting up a system where for every user (1000+), I want to add a set of values every single day.
Hypothetically:
A system where I can log when Alice and Bob woke up and what they had for dinner on August 1st, 2019 or 2024.
Any suggestions on how to best structure the database tables?
A person table with a primary person ID?
rows: n
A date table with a primary date ID?
rows: m
And a personDate table with the person ID and date ID as foreign keys?
rows: n x m
I don't think you need a date table unless you want to use it to make specific queries easier, such as a left join against the dates to see which days are missing events. Nevertheless, I would stick to DATE or DATETIME as the field type and avoid making a separate surrogate foreign key. A surrogate won't save any space, will potentially perform worse, and will be more difficult for the developer to use.
This seems simple and fine to me. I wouldn't worry too much about the performance based upon the number of elements alone. You can insert a billion records with no problem and that implies a very large site.
Just don't insert records if the event didn't happen. In other words you want your database to grow in relation to the real usage. Avoid growth based upon phantom events and you should be okay.
person (person_id)
action (action_id)
personAction (person_id, action_id, action_datetime)
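Here is a rough sketch of that layout, again with Python's sqlite3 as a stand-in. The name columns, the primary key on personAction, and the sample times are my own assumptions. Note that a row exists only for something that actually happened.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE action (action_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE personAction (
        person_id INTEGER NOT NULL REFERENCES person (person_id),
        action_id INTEGER NOT NULL REFERENCES action (action_id),
        action_datetime TEXT NOT NULL,  -- a plain date/time value, no date table
        PRIMARY KEY (person_id, action_id, action_datetime)  -- assumed key
    );
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO action VALUES (1, 'woke up'), (2, 'had dinner');
    -- Bob logged no dinner, so there is simply no row for it.
    INSERT INTO personAction VALUES
        (1, 1, '2019-08-01 07:00:00'),
        (1, 2, '2019-08-01 19:30:00'),
        (2, 1, '2019-08-01 08:15:00');
""")

# Everything logged on August 1st, 2019, in chronological order.
events = conn.execute("""
    SELECT p.name, a.name, pa.action_datetime
    FROM personAction AS pa
    JOIN person AS p ON p.person_id = pa.person_id
    JOIN action AS a ON a.action_id = pa.action_id
    WHERE pa.action_datetime >= '2019-08-01'
      AND pa.action_datetime <  '2019-08-02'
    ORDER BY pa.action_datetime
""").fetchall()
```

The table grows with real usage: three events, three rows, regardless of how many person/date combinations exist.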
Consider we have one SQL table, customers.
Now consider a table with two columns, customer_name and orders_name. One customer may have multiple orders (a one-to-many relationship), so we have a table in which customer_name is the foreign key. But if one customer has 100 orders, we have to write the same customer_name 100 times, which is a waste of memory.
The customer_name, customer_orders table is
So I was thinking: can't we just make a table named after the customer's orders? For example, if we have the customer_name bill, we can create a table named "bill's orders" and write all his orders in it. Now we are not using any foreign key.
The "bill's orders" table is
We can create more tables like this for other users. How would it be possible to delete the table when we delete that customer_name from the main table? Any ideas?
You solve the issue of wasted space by using surrogate keys. Instead of copying a huge alphanumeric field (names) to child tables, you would create an ID of sorts using a more compact data type (byteint, smallint, int, etc.). In the approach you propose where you create a separate table for each customer, you will run into the following issues:
cannot run aggregates across customers, i.e., you cannot simply do a sum, avg, min, etc. for sets of customers slicing the data different ways
SQL will be far more complex with each extra customer added to the queries
your data dictionary is going to grow huge and at some point you will incur major performance issues that are not easy to fix
The point of using a relational database is to allow for users to dynamically slice and dice the data. The method that you are proposing would not be useful for querying.
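As a sketch of the surrogate-key approach (Python's sqlite3 as a stand-in DBMS; the column names and sample orders are invented), one orders table serves every customer, aggregates across customers are a single query, and ON DELETE CASCADE covers the "delete the customer's orders when the customer is deleted" part of the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- compact surrogate key
        customer_name TEXT NOT NULL       -- the name is stored once
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
            REFERENCES customers (customer_id) ON DELETE CASCADE,
        order_name TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'bill'), (2, 'ann');
    INSERT INTO orders (customer_id, order_name) VALUES
        (1, 'keyboard'), (1, 'mouse'), (2, 'monitor');
""")

# One query aggregates across all customers.
orders_per_customer = conn.execute("""
    SELECT c.customer_name, COUNT(*)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_name
""").fetchall()

# Deleting the customer removes his orders through the cascade.
conn.execute("DELETE FROM customers WHERE customer_name = 'bill'")
remaining_orders = conn.execute("SELECT order_name FROM orders").fetchall()
```

With a table per customer, the same count would need one query per table, and the delete would need a DROP TABLE issued from application code.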
I have a web application which allows users to join multiple groups.
I have a 'users' table which stores details about the users (id, email, password, etc.) and a 'groups' table which stores details about the available groups (id, name, owner of group).
I have been researching the best way to store group memberships (i.e. which users are in which group, bearing in mind they can be members of multiple) - however I am still not sure what the most efficient solution would be.
Would you recommend I:-
Create a second table called 'group_memberships' and store the user's ID along with the corresponding group ID?
Store an array alongside the group particulars in the 'groups' table with the user IDs of its members?
Approach this task a different way?
The DBMS I am using is phpMyAdmin.
I would advise you to go with option 1; where you have a Mapping Table for linking Users & Groups.
The Users Table will have PK on User_ID.
The Groups table will have PK on Group_ID.
The Mapping table will have User_ID(FK) and Group_ID(FK).
Now you should have PK on these two columns together.
This will ensure you don't have duplicate entries.
What you're describing is called a many-to-many relationship in database terms. A user can belong to multiple groups, and groups have more than one user (or else they wouldn't be "groups"!).
Your first idea, the group_memberships table, is the accepted best way to model this relationship, although you may want to name it users_groups or something similar to reflect the fact that it relates or associates those two tables. At its most basic, this association table needs three columns:
ID (primary key)
user_id (foreign key)
group_id (foreign key)
By JOINing to this table ON either user_id or group_id, you can find the related records from either side of the relationship. And you can do it right from a SQL query, without any additional code like you'd need if you stored an array.
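A small sketch of those JOINs, using Python's sqlite3 in place of MySQL. The sample users and groups, the helper names groups_of and members_of, and the UNIQUE constraint are my own additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE "groups" (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE users_groups (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users (id),
        group_id INTEGER NOT NULL REFERENCES "groups" (id),
        UNIQUE (user_id, group_id)  -- assumption: no duplicate memberships
    );
    INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO "groups" VALUES (1, 'chess'), (2, 'hiking');
    INSERT INTO users_groups (user_id, group_id) VALUES (1, 1), (1, 2), (2, 1);
''')

def groups_of(user_id):
    """Names of the groups a user belongs to (JOIN from the user side)."""
    rows = conn.execute('''
        SELECT g.name
        FROM users_groups AS ug
        JOIN "groups" AS g ON g.id = ug.group_id
        WHERE ug.user_id = ?
        ORDER BY g.name
    ''', (user_id,))
    return [name for (name,) in rows]

def members_of(group_id):
    """Emails of the users in a group (JOIN from the group side)."""
    rows = conn.execute('''
        SELECT u.email
        FROM users_groups AS ug
        JOIN users AS u ON u.id = ug.user_id
        WHERE ug.group_id = ?
        ORDER BY u.email
    ''', (group_id,))
    return [email for (email,) in rows]
```

Both directions of the relationship come from the same table; nothing has to be unpacked from an array in application code.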
I would definitely go with option 1 - creating the junction table 'group_memberships' - I have used this approach many times without problems. Don't forget to add an Index on the new table 'group_memberships' for columns: 'groupID' and 'userID'.
Option 2 is not scalable for a large amount of data, especially if groups have a lot of users.
UPDATE:
For info on Indexes, here is a good (and short) blog: https://blog.viaduct.io/mysql-indexes-primer/.
The first option is the right choice. In effect, it acts as a materialized view over both the user table and the group table.
Just think of a materialized view as an extra table, a redundant data structure that denormalizes the user properties and group properties together for quick search.
Without it, when we query a group id to list all its users, we would have to filter millions of users to check whether each one is in that group. It is a performance nightmare!
MySQL has tools to build this view very efficiently. You can build secondary indexes on the view's columns for quick search, say, group id, group name, user id, user name, or anything else you want to search by.
good luck :-)
I am modeling a database that will hold data relating to sales from a shop. I have two different kinds of users which have distinct properties.
In my normalization process I created a User1 Table and a User2 Table.
The problem I have is tracking sales. For example, if I want to track sales of a product for a particular kind of user and know the users' buying trends for that product, I would have to create two tables, Bread1 and Bread2, for the two types of users.
I came up with a solution, but I don't know the performance implications in the long run or if it's the best solution at all. The solution is having a unified table User which has the IDs of both kinds of users.
If there is any other better solution, I would appreciate it.
Thanks.
Better to have just one table for Users and one for Products, and so on. You can easily categorize your users or products by adding another table for grouping them.
UserTypes(UserTypeId PK, ...)
Users(UserId PK, UserTypeId FK, ...)
Also, for the properties you mentioned for users (or even for products), you can keep a list of properties in one table and assign them to users through a third table.
Properties(PropertyId PK, Name)
UsersProperties(UserPropertyId PK, UserId FK, PropertyId FK, Value)
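Roughly, those four tables could look like this in a runnable sketch (Python's sqlite3 as a stand-in). The Sales table, the type names, and all sample values are hypothetical; they only show that one sales table can serve both kinds of users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UserTypes (UserTypeId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Users (
        UserId INTEGER PRIMARY KEY,
        UserTypeId INTEGER NOT NULL REFERENCES UserTypes (UserTypeId),
        Name TEXT NOT NULL
    );
    CREATE TABLE Properties (PropertyId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE UsersProperties (
        UserPropertyId INTEGER PRIMARY KEY,
        UserId INTEGER NOT NULL REFERENCES Users (UserId),
        PropertyId INTEGER NOT NULL REFERENCES Properties (PropertyId),
        Value TEXT
    );
    -- Hypothetical sales table: one table for every user type.
    CREATE TABLE Sales (
        SaleId INTEGER PRIMARY KEY,
        UserId INTEGER NOT NULL REFERENCES Users (UserId),
        Product TEXT NOT NULL,
        Quantity INTEGER NOT NULL
    );
    INSERT INTO UserTypes VALUES (1, 'retail'), (2, 'wholesale');
    INSERT INTO Users VALUES (1, 1, 'u1'), (2, 2, 'u2'), (3, 2, 'u3');
    INSERT INTO Properties VALUES (1, 'discount_rate');
    INSERT INTO UsersProperties (UserId, PropertyId, Value) VALUES (2, 1, '0.15');
    INSERT INTO Sales (UserId, Product, Quantity) VALUES
        (1, 'bread', 2), (2, 'bread', 50), (3, 'bread', 30), (1, 'milk', 1);
""")

# Bread sales per user type, with no Bread1/Bread2 split.
bread_by_type = conn.execute("""
    SELECT t.Name, SUM(s.Quantity)
    FROM Sales AS s
    JOIN Users AS u ON u.UserId = s.UserId
    JOIN UserTypes AS t ON t.UserTypeId = u.UserTypeId
    WHERE s.Product = 'bread'
    GROUP BY t.UserTypeId
    ORDER BY t.Name
""").fetchall()

# Type-specific attributes live in UsersProperties instead of separate tables.
properties_of_user_2 = conn.execute("""
    SELECT p.Name, up.Value
    FROM UsersProperties AS up
    JOIN Properties AS p ON p.PropertyId = up.PropertyId
    WHERE up.UserId = 2
""").fetchall()
```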
1 database with 3 tables: user - photo - vote
- A user can have many photos.
- A photo can have many votes.
- A user can vote on many photos.
- A vote records:
. the result as an int (-1/disliked, 0/neutral, 1/liked)
. the id of the user who voted.
Here is what I have (all FKs are cascade on delete and update):
http://grab.by/iZYE (sid = surrogate id)
My question is: this doesn't seem right, and I have been looking at it for 2 days already and can't confidently move on. How can I optimize this, or am I completely wrong?
MySQL/InnoDB tables are always clustered.
Since the primary key also acts as the clustering key [1], using a surrogate primary key means you are physically sorting the table in an order that has no useful meaning for the client applications and cannot be utilized for querying.
Furthermore, secondary indexes in clustered tables can be "fatter" than in heap-based tables and may require double lookup.
For these reasons, you'd want to avoid surrogates and use more "natural" keys, similar to this:
({USER_ID, PICTURE_NO} in table VOTE references the same-named fields in PICTURE. The VOTE.VOTER_ID references USER.USER_ID. Use integers for *_ID and *_NO fields if you can.)
This physical model will enable extremely efficient querying for:
Pictures of the given user (a simple range scan on PICTURE primary/clustering index).
Votes on the given picture (a simple range scan on VOTE primary/clustering index). Depending on circumstances, this may actually be fast enough so you don't have to cache the sum in PICTURE.
If you need votes of the given user, change the VOTE PK to: {VOTER_ID, USER_ID, PICTURE_NO}. If you need both (votes of picture and votes of user), keep the existing PK, but create a covering index on {VOTER_ID, USER_ID, PICTURE_NO, VOTE_VALUE}.
[1] In InnoDB. There are DBMSes (such as MS SQL Server) where the clustering key can differ from the primary key.
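A rough sketch of that natural-key model, with Python's sqlite3 standing in for InnoDB. SQLite's WITHOUT ROWID tables are stored in primary-key order, which is only an approximation of InnoDB clustering. The VOTE primary key {USER_ID, PICTURE_NO, VOTER_ID} is inferred from the text above, and the index name and sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE USER (USER_ID INTEGER PRIMARY KEY);
    CREATE TABLE PICTURE (
        USER_ID INTEGER NOT NULL REFERENCES USER (USER_ID),
        PICTURE_NO INTEGER NOT NULL,
        PRIMARY KEY (USER_ID, PICTURE_NO)
    ) WITHOUT ROWID;
    CREATE TABLE VOTE (
        USER_ID INTEGER NOT NULL,
        PICTURE_NO INTEGER NOT NULL,
        VOTER_ID INTEGER NOT NULL REFERENCES USER (USER_ID),
        VOTE_VALUE INTEGER NOT NULL,
        PRIMARY KEY (USER_ID, PICTURE_NO, VOTER_ID),  -- inferred, see lead-in
        FOREIGN KEY (USER_ID, PICTURE_NO)
            REFERENCES PICTURE (USER_ID, PICTURE_NO)
    ) WITHOUT ROWID;
    -- Covering index for "votes of the given user" (name is hypothetical).
    CREATE INDEX VOTE_BY_VOTER
        ON VOTE (VOTER_ID, USER_ID, PICTURE_NO, VOTE_VALUE);
    INSERT INTO USER VALUES (1), (2), (3);
    INSERT INTO PICTURE VALUES (1, 1), (1, 2);
    INSERT INTO VOTE VALUES (1, 1, 2, 1), (1, 1, 3, -1), (1, 2, 2, 1);
""")

# Pictures of user 1: a range over the PICTURE primary key.
pictures_of_user_1 = conn.execute(
    "SELECT PICTURE_NO FROM PICTURE WHERE USER_ID = 1 ORDER BY PICTURE_NO"
).fetchall()

# Vote total for picture (1, 1): a range over the VOTE primary key.
(total_for_picture,) = conn.execute(
    "SELECT SUM(VOTE_VALUE) FROM VOTE WHERE USER_ID = 1 AND PICTURE_NO = 1"
).fetchone()

# Votes cast by voter 2: answered from the covering index.
votes_by_voter_2 = conn.execute("""
    SELECT USER_ID, PICTURE_NO, VOTE_VALUE
    FROM VOTE WHERE VOTER_ID = 2
    ORDER BY USER_ID, PICTURE_NO
""").fetchall()
```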
The first thing I see is that you have duplicate unique IDs on the tables. You don't need the sid columns; just use user_id, photo_id, and photo_user_id (maybe rename this one to vote_id). Those ID columns should also be INT type, definitely not VARCHARs. You probably don't need the vote total columns on photo; you can just run a query to get the total when you need it and not worry about keeping both tables in sync.
Assuming that you will only allow one vote per user on each photo, the structure of the vote table can be modified so the only columns are user_id, photo_id, and vote_result. You would then make the primary key a composite key on (user_id, photo_id). However, since you're using foreign keys, that makes this table a bit more complicated.
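Something along these lines, as a sketch only (Python's sqlite3 as a stand-in). The table name votes, the CHECK constraint, and the helper functions cast_vote and photo_total are my assumptions, and the foreign keys to the user and photo tables are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE votes (
        user_id INTEGER NOT NULL,
        photo_id INTEGER NOT NULL,
        -- -1 = disliked, 0 = neutral, 1 = liked
        vote_result INTEGER NOT NULL CHECK (vote_result IN (-1, 0, 1)),
        PRIMARY KEY (user_id, photo_id)  -- one vote per user per photo
    );
""")

def cast_vote(user_id, photo_id, vote_result):
    """Record a vote; voting again on the same photo replaces the old vote."""
    conn.execute(
        "INSERT OR REPLACE INTO votes (user_id, photo_id, vote_result) "
        "VALUES (?, ?, ?)",
        (user_id, photo_id, vote_result),
    )

def photo_total(photo_id):
    """Compute the total on demand instead of caching it on the photo row."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(vote_result), 0) FROM votes WHERE photo_id = ?",
        (photo_id,),
    ).fetchone()
    return total

cast_vote(1, 7, 1)
cast_vote(2, 7, 1)
cast_vote(1, 7, -1)  # user 1 changes their mind; still only one row for them
```

Because the total is computed by a query, there is no second copy of it to keep in sync with the votes.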