MySQL - Best performance between 2 solutions

I need some advice about MySQL.
I have a user table with id, nickname, numDVD and money, and a DVD table with idDVD, idUser, LinkPath and counter.
I expect to have at most 20 users, each with about 30 DVDs.
So when I insert a DVD, I need idDVD (auto-increment), idUser (the same idUser as in the user table), LinkPath (a generic string), and counter, a number from 1 to 30 that is unique per user (it depends on how many DVDs the user has).
The problem is handling the last column, "counter": I want to select, for example, 2 or 3 random DVDs from 1 to 30 that belong to the same idUser.
So I'm wondering whether this is the best solution in my case, or whether it's too hard to handle (I have never used MySQL), OR whether it would be better to create 20 tables (one per user) containing the ID, DVD name, etc.
Thanks

Don't create 20 tables! That'd be overkill, and what if you needed to add more users in the future? It'd be practically impossible to maintain and update reliably.
A better way would be like:
Table users
-> idUser
-> other user specific data
Table dvd
-> idDvd
-> DVDname
-> LinkPath
-> other dvd specific data (no user data here)
Table usersDvds
-> idUser
-> idDvd
This way, it's no problem if several users have the same DVD, as each one is just another entry in the usersDvds table: the idDvd value would be the same, but idUser would be different. And to count how many DVDs a user has, just do a SELECT count(*) FROM usersDvds WHERE idUser = 1
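To make the layout above concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption, chosen only so the example runs anywhere), and the sample rows are made up. It also suggests that the per-user "counter" column from the question is not needed to pick random DVDs: the database can order the user's rows randomly and return the first few.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption); table and column
# names follow the answer above, the sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (idUser INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE dvd (idDvd INTEGER PRIMARY KEY, DVDname TEXT, LinkPath TEXT);
    CREATE TABLE usersDvds (
        idUser INTEGER NOT NULL REFERENCES users(idUser),
        idDvd  INTEGER NOT NULL REFERENCES dvd(idDvd),
        PRIMARY KEY (idUser, idDvd)  -- one row per (user, DVD) pair
    );
""")

conn.execute("INSERT INTO users (idUser, nickname) VALUES (1, 'alice')")
for n in range(1, 31):
    conn.execute(
        "INSERT INTO dvd (idDvd, DVDname, LinkPath) VALUES (?, ?, ?)",
        (n, f"DVD {n}", f"/dvd/{n}"),
    )
    conn.execute("INSERT INTO usersDvds (idUser, idDvd) VALUES (1, ?)", (n,))

# How many DVDs does user 1 own?
dvd_count = conn.execute(
    "SELECT COUNT(*) FROM usersDvds WHERE idUser = ?", (1,)
).fetchone()[0]

# Three random DVDs of user 1. SQLite spells the function RANDOM();
# in MySQL the equivalent clause is ORDER BY RAND().
random_dvds = conn.execute(
    """
    SELECT d.idDvd, d.DVDname
    FROM usersDvds ud
    JOIN dvd d ON d.idDvd = ud.idDvd
    WHERE ud.idUser = ?
    ORDER BY RANDOM()
    LIMIT 3
    """,
    (1,),
).fetchall()
```

With about 30 rows per user, sorting randomly is cheap; for very large tables a different sampling strategy may be worth considering.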

You don't need a table per user, and creating one would make the subsequent SQL programming basically impossible. However, with these data volumes practically nothing you do is going to cause or relieve bottlenecks. Very probably the entire database will fit into memory, so access via any schema will be practically instantaneous.

If I understand your requirements correctly, you should be able to accomplish that by creating a compound index so that you can select efficiently.
If the table ends up holding too much data, it would help to clear out some historical data.

Related

(MYSQL) Performances in checking if an ID already exists (for ex. in a Poll voters table)

I would like to know which approach is best when I want to check whether somebody has already participated in a member-related event (like a poll):
Imagine that I have a table that stores all the voters' votes. Over time, it can grow very large (10000+ entries/votes for 500 different polls).
When I want to check whether a member has already voted in my new poll, which is best?
1/ Run a SELECT or a COUNT on the "10000+ entries" VOTERS table to see whether the given USERID has already voted in my new poll.
2/ Have a TEXT column in my POLL_main_infos table where I store/CONCAT the USERIDs like this:
"1,15,42,12,523,8521,7444, etc etc."
To check, I fetch that column into a variable in my PHP script and use a REGEX to check whether a USERID is already present in it (for example, looking for ",42,", meaning the user with ID "42" has already participated in that poll).
Also, if the second solution is the best, should I store the IDs in a TEXT column or a BLOB?
Anyways, thank you very much in advance!
In a relational database it's usually best to store data normalized, so never store a list of user IDs in a single column.
So, using the tables Holmes IV suggested, it's a simple:
SELECT 'User already voted'
WHERE EXISTS
(
SELECT * FROM Participation
WHERE PollId = 123
AND UserId = 456
)
With an appropriate index (a UNIQUE index if a user may vote only once per poll) this will be very fast regardless of the number of rows.
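As a rough sketch of that idea, the following Python snippet uses the standard-library sqlite3 module with an in-memory database in place of MySQL (an assumption made so the example is runnable). The Participation table matches the query above; the helper names has_voted and record_vote are hypothetical. The UNIQUE constraint both backs the lookup with an index and makes the database itself reject a second vote.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Participation (
        PollId INTEGER NOT NULL,
        UserId INTEGER NOT NULL,
        UNIQUE (PollId, UserId)  -- one vote per user per poll, also an index
    )
""")


def has_voted(poll_id, user_id):
    """Return True if the user already has a row for this poll."""
    row = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM Participation"
        " WHERE PollId = ? AND UserId = ?)",
        (poll_id, user_id),
    ).fetchone()
    return bool(row[0])


def record_vote(poll_id, user_id):
    """Insert a vote; return False if the UNIQUE constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Participation (PollId, UserId) VALUES (?, ?)",
            (poll_id, user_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Relying on the constraint in record_vote avoids a race between "check" and "insert" when two requests arrive at the same time.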
How much freedom do you have with this? I would say you need 3 tables, one each for Polls, Voters and Participation. In the Polls table you store Poll_ID and Poll_Description. In the Voters table you store Voter_ID and any other voter information you might have. Then in the Participation table you store Poll_ID, Voter_ID, and the results or simply the date on which they participated. When you want to see such information, you query the third table and join it to the other two for the specifics.
The best way to check whether a user has already voted is to query the Polls_Voters table. In your setup it would be:
Select *
from Polls_Voters
where Voter_ID = 'ID of voter you are looking for'
and poll_id = 'ID of poll you wish to see if they voted in'

SnowFlake Diagram and Many to Many relationship

I have a snowflake diagram with:
Fact:
id_movie
id_user
rating
Dim Users:
id_user
...
Dim Movies:
id_movie
...
In my ERD, I also have a table Category, that has a many to many relationship with the movies like this:
Dim_Category:
id_category
...
Map_Category_Movie:
id_movie
id_category
relevance
I am trying to find an efficient way to model this in a snowflake/star schema. My issues:
I could just add these two tables to the snowflake diagram, but this would feel wrong, as I usually only use tables that are aggregates of the subtables on the outer fringes of this diagram.
I could create another fact table for the relevance, but as I ultimately want to report on the correlation between relevance and users' rating behaviour for movies, I'd need to use both fact tables, which to me is an incorrect approach.
Any guidance here?
There is a good chance that you have already answered your own question, and welcome to hell.
First, a quotation from http://www.information-management.com/ that might interest you:
The snowflake structure will reduce batch updates to dimensions. Though always said to be slower than a star, some tests have revealed no difference in performance between flattened and snowflaked dimensions. In fact in some cases, the snowflake provides superior performance, such as when a wide dimension (i.e., customer) is segmented into a snowflake.
So using a bridge table is not going to cause a significant loss of performance. I prefer a snowflake in a good percentage of cases, because sometimes it really is easier to manage your data mart, and the hardware and the size of the data give you the opportunity to do it.
My friendly advice is to create a bridge table (movie_ID, category_ID, relevance) and move on.
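For illustration only, here is a small Python sketch of the bridge-table approach, using the standard-library sqlite3 module with an in-memory database instead of a real warehouse (an assumption). The table names come from the question; the fact table name fact_rating, the title and name columns, and the sample rows are hypothetical. The report shown, a relevance-weighted average rating per category, is just one possible query that touches the fact table and the bridge together.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_movies   (id_movie INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE dim_category (id_category INTEGER PRIMARY KEY, name TEXT);

    -- Bridge table: one row per (movie, category) pair with its relevance.
    CREATE TABLE map_category_movie (
        id_movie    INTEGER NOT NULL REFERENCES dim_movies(id_movie),
        id_category INTEGER NOT NULL REFERENCES dim_category(id_category),
        relevance   REAL NOT NULL,
        PRIMARY KEY (id_movie, id_category)
    );

    CREATE TABLE fact_rating (
        id_movie INTEGER NOT NULL REFERENCES dim_movies(id_movie),
        id_user  INTEGER NOT NULL,
        rating   INTEGER NOT NULL
    );

    INSERT INTO dim_movies VALUES (1, 'Movie A'), (2, 'Movie B');
    INSERT INTO dim_category VALUES (1, 'Comedy'), (2, 'Drama');
    INSERT INTO map_category_movie VALUES (1, 1, 1.0), (1, 2, 0.5), (2, 2, 1.0);
    INSERT INTO fact_rating VALUES (1, 1, 4), (1, 2, 2), (2, 1, 5);
""")

# Relevance-weighted average rating per category: each rating counts in
# proportion to how relevant the category is for the rated movie.
weighted = conn.execute("""
    SELECT c.name,
           SUM(f.rating * m.relevance) / SUM(m.relevance) AS weighted_rating
    FROM fact_rating f
    JOIN map_category_movie m ON m.id_movie = f.id_movie
    JOIN dim_category c ON c.id_category = m.id_category
    GROUP BY c.id_category, c.name
    ORDER BY c.name
""").fetchall()
```

Note that joining through the bridge repeats each rating once per category of its movie, so plain SUM or COUNT over ratings would double count; weighting by relevance is one way to keep the numbers meaningful.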
If you have a fixed, small list of categories, create a table with predefined category columns:
dim_movies
----------
movies_id
category1_relevance
category2_relevance
category3_relevance
Up to ten columns is perhaps OK, especially if you work for the company you are building the DWH for rather than just consulting for it (so you can administer it yourself).
We once tried to build a masterpiece of a data warehouse that had a case similar to yours. Payment for the deal was based on performance (the data was over 2 TB per fact table), so we decided to give a star schema a shot.
We created the dimension as I described above, and every time the number of distinct categories grew, the ETL added a new column to the table.
The ETL process also had to recreate the cube dynamically.
It took a lot of pain, but performance was, as I remember, 13% better than with the snowflake.
Also, during my most exhausting project, where I believe a 10-year-old kid would have designed the DB better, we had to connect exactly 5 categories per item. Each category pointed to one of 20+ possible tables. They could be joined ONLY through the vendor's software, based on some rules. It was a kind of 1...5:many relationship (which doesn't exist!?!).
pk  code_conto  cat1  cat2  cat3  cat4  cat5
--------------------------------------------
1   123         17    NULL  5467  12    NULL
2   124         67    1098  NULL  1423  AK12
3   123         NULL  NULL  NULL  13    23
Code was like this:
If (code_conto == 123)
{
Category1_join_set = 'SELECT cat_id, cat_name FROM cat_customers'; //NOTE THIS
Category2_join_set = 'SELECT cat_id, cat_name FROM cat_products';
Category3_join_set = 'SELECT cat_id, cat_name FROM cat_city';
...
...
}
If (code_conto == 124)
{
Category1_join_set = 'SELECT cat_id, cat_name FROM cat_products'; //AND THIS
Category2_join_set = 'SELECT cat_id, cat_name FROM cat_origin'; //ON SAME FIELD
Category3_join_set = 'SELECT cat_id, cat_name FROM cat_blabla'; //DIFFERENT JOIN TABLE
...
...
}
All hard-coded. So we hard-coded our queries, repeating WHEN in CASE statements over 100 times. Guess what? The ERP provider 'improved' its software and created a mapping table where there had been 'C' if statements based on the code_conto key.
It took us more than 3 weeks to deliver a good, secure ETL job (with SQL and external tools).
I didn't write all this for nothing. I wanted to convince you and others that using a bridge table for many-to-many relationships is probably the best practice in 97% of cases.
However, there are five possible design solutions for an M:M relationship:
Array or series (I don't even want to try it)
Bridge table
Groupings
Fixed levels
Dynamically created fixed levels
I hope I didn't confuse you.

The optimal way to store multiple-selection survey answers in a database

I'm currently working on a survey creation/administration web application with PHP/MySQL. I have gone through several revisions of the database tables, and I once again find that I may need to rethink the storage of a certain type of answer.
Right now, I have a table that looks like this:
survey_answers
id PK
eid
sesid
intvalue Nullable
charvalue Nullable
id = unique value assigned to each row
eid = Survey question that this answer is in reply to
sesid = The survey 'session' (information about the time and date of a survey take) id
intvalue = The value of the answer if it is a numerical value
charvalue = the value of the answer if it is a textual representation
This allowed me to continue using MySQL's mathematical functions to speed up processing.
I have however found a new challenge: storing questions that have multiple responses.
An example would be:
Which of the following do you enjoy eating? (choose all that apply)
Girl Scout Cookies
Bacon
Corn
Whale Fat
Now, when I want to store the result, I'm not sure of the best way to handle it.
Currently, I have a table just for multiple choice options that looks like this:
survey_element_options
id PK
eid
value
id = unique value associated with each row
eid = question/element that this option is associated with
value = textual value of that option
With this setup, I then store my returned multiple selection answers in 'survey_answers' as strings of comma separated id's of the element_options rows that were selected in the survey. (ie something like "4,6,7,9") I'm wondering if that is indeed the best solution, or if it would be more practical to create a new table that would hold each answer chosen, and then reference back to a given answer row which in turn references back to the element and ultimately the survey.
EDIT
for anyone interested, here is the approach I ended up taking (In PhpMyAdmin Relations View):
And a rudimentary query to gather the counts for a multiple select question would look like this:
SELECT e.question AS question, eo.value AS value, COUNT(eo.value) AS count
FROM survey_elements e, survey_element_options eo, survey_answer_options ao
WHERE e.id = 19
AND eo.eid = e.id
AND ao.oid = eo.id
GROUP BY eo.value
This really depends on a lot of things.
Generally, storing lists of comma-separated values in a database is bad, especially if you plan to do anything remotely intelligent with that data, such as any kind of advanced reporting on the answers.
The best relational way to store this is to also define the answers in a second table and then link them to the user's response to a question in a third table (with multiple entries per user-question, or possibly per user-survey-question if the user could take multiple surveys with the same question on them).
This can get slightly complex. A possible scenario, as a simple example:
Example tables:
Users (Username, UserID)
Questions (qID, QuestionsText)
Answers (AnswerText [in this case example could be reusable, but this does cause an extra layer of complexity as well], aID)
Question_Answers ([Available answers for this question, multiple entries per question] qaID, qID, aID),
UserQuestionAnswers (qaID, uID)
Note: Meant as an example, not a recommendation
Convert the primary key to a non-unique index and store the answers to the same question under the same id.
For example:
id | eid | sesid | intval | charval
3 45 30 2
3 45 30 4
You can still add another column for regular unique PK if needed.
Keep things simple. No need for relation here.
It's a horses for courses thing really.
You can store as a comma-separated string (but then what happens when you have a literal comma in one of your answers?).
You can store as a one-to-many table, such as:
survey_element_answers
id PK
survey_answers_id FK
intvalue Nullable
charvalue Nullable
And then loop over that table. If you picked one answer, it would create one row in this table. If you pick two answers, it will create two rows in this table, etc. Then you would remove the intvalue and charvalue from the survey_answers table.
Another choice, since you're already storing the element options in their own table, is to create a many-to-many table, such as:
survey_element_answers
id PK
survey_answers_id FK
survey_element_options_id FK
Again, one row per option selected.
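A minimal sketch of this many-to-many variant, written in Python with the standard-library sqlite3 module and an in-memory database in place of MySQL (an assumption). The table names follow the layout above; the sample question id 19, the session ids and the rows are invented. It shows why one row per selected option makes counting straightforward.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption); sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey_element_options (
        id INTEGER PRIMARY KEY, eid INTEGER NOT NULL, value TEXT NOT NULL
    );
    CREATE TABLE survey_answers (
        id INTEGER PRIMARY KEY, eid INTEGER NOT NULL, sesid INTEGER NOT NULL
    );
    CREATE TABLE survey_element_answers (
        id INTEGER PRIMARY KEY,
        survey_answers_id INTEGER NOT NULL REFERENCES survey_answers(id),
        survey_element_options_id INTEGER NOT NULL
            REFERENCES survey_element_options(id)
    );

    INSERT INTO survey_element_options (id, eid, value)
        VALUES (1, 19, 'Bacon'), (2, 19, 'Corn'), (3, 19, 'Whale Fat');
    -- Two survey sessions answered question 19.
    INSERT INTO survey_answers (id, eid, sesid) VALUES (1, 19, 30), (2, 19, 31);
    -- Session 30 picked Bacon and Corn, session 31 picked Bacon only.
    INSERT INTO survey_element_answers
        (survey_answers_id, survey_element_options_id)
        VALUES (1, 1), (1, 2), (2, 1);
""")

# Count how often each option was chosen; the LEFT JOIN keeps options
# that nobody selected, with a count of 0.
counts = conn.execute("""
    SELECT eo.value, COUNT(sea.id) AS times_chosen
    FROM survey_element_options eo
    LEFT JOIN survey_element_answers sea
           ON sea.survey_element_options_id = eo.id
    WHERE eo.eid = ?
    GROUP BY eo.id, eo.value
    ORDER BY eo.id
""", (19,)).fetchall()
```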
Another option yet again is to store a bitmask value. This will remove the need for a many-to-many table.
survey_element_options
id PK
eid FK
value Text
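optionnumber unique for each eid
optionbitmask 2 ^ (optionnumber - 1)
optionnumber should be unique for each eid and increment starting with one. Here ^ means "to the power of"; in MySQL itself ^ is bitwise XOR, so compute the mask as 1 << (optionnumber - 1). This imposes a limit of 63 options if you are using a signed bigint, or 31 options if you are using a signed int.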
And then in your survey_answers
id PK
eid
sesid
answerbitmask bigint
answerbitmask is calculated by adding together the optionbitmask values of every option the user selected. For example, if 7 (1 + 2 + 4) were stored in answerbitmask, the user selected the first three options.
Joins can be done by:
WHERE survey_answers.answerbitmask & survey_element_options.optionbitmask > 0
So yeah, there's a few options to consider.
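Here is a small runnable sketch of the bitmask variant in Python, using the standard-library sqlite3 module with an in-memory database rather than MySQL (an assumption). It numbers options from 1 and assigns option n the bit 1 << (n - 1), so that the first three options add up to 7 as in the example above; the helper bitmask_for and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey_element_options (
        id INTEGER PRIMARY KEY,
        eid INTEGER NOT NULL,
        value TEXT NOT NULL,
        optionnumber INTEGER NOT NULL,
        optionbitmask INTEGER NOT NULL,
        UNIQUE (eid, optionnumber)
    );
    CREATE TABLE survey_answers (
        id INTEGER PRIMARY KEY,
        eid INTEGER NOT NULL,
        sesid INTEGER NOT NULL,
        answerbitmask INTEGER NOT NULL
    );
""")

foods = ["Girl Scout Cookies", "Bacon", "Corn", "Whale Fat"]
for number, food in enumerate(foods, start=1):
    # Option n gets bit (n - 1): 1, 2, 4, 8, ...
    conn.execute(
        "INSERT INTO survey_element_options"
        " (eid, value, optionnumber, optionbitmask) VALUES (?, ?, ?, ?)",
        (19, food, number, 1 << (number - 1)),
    )


def bitmask_for(option_numbers):
    """Combine the bits of all selected option numbers into one integer."""
    mask = 0
    for n in option_numbers:
        mask |= 1 << (n - 1)
    return mask


# Session 30 selected the first three options.
conn.execute(
    "INSERT INTO survey_answers (eid, sesid, answerbitmask) VALUES (?, ?, ?)",
    (19, 30, bitmask_for([1, 2, 3])),
)

# Decode the answer again by joining on the bitwise AND.
selected = [row[0] for row in conn.execute("""
    SELECT o.value
    FROM survey_answers a
    JOIN survey_element_options o
      ON o.eid = a.eid AND (a.answerbitmask & o.optionbitmask) > 0
    WHERE a.sesid = ?
    ORDER BY o.optionnumber
""", (30,))]
```

One trade-off worth keeping in mind: a join on a bitwise expression generally cannot use an ordinary index, so this variant saves a table at the cost of less index-friendly reporting queries.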
If you don't use the id as a foreign key in another query, or if you can query results using the sesid, try a many to one relationship.
Otherwise I'd store multiple choice answers as a serialized array, such as JSON or through php's serialize() function.

What is a good way to design a table with relationship between users?

I'm currently working on a social networking site. (Yeah, I know, there's a whole bunch of them. I'm not trying to make Facebook all over)
I was wondering if anyone could tell me if my way of thinking is way off, or if it is the way it is actually done.
I want a user to be able to have friends. And, for that, I'm thinking that I should have one usertable like so:
USER
uId
userName
email
etc..
This should probably be a 1:N relationship, so I'm thinking that a table "contacts" should hold a list of users and their friends like so:
CONTACTS
uId (From USER)
FriendId (From USER table)
Friendship type ENUM[Active, Inactive, Pending]
Would it be an effective solution to sort this table on uId, so that a query result would look something similar to this:
uID | friendId
1 | 2
1 | 6
1 | 97
75 | 1
75 | 34
etc
Or are there any different solutions to this?
If you are simply looking to select a specific user's set of friends, the query will be straightforward and you won't have to worry about performance.
For example, if you are looking to return the IDs of UID 8's friends, you can just do something like:
SELECT FriendId FROM Friends WHERE UID = 8;
In your case, since the UID column is not unique, make sure to have an index on this column to allow quick lookups (optimized performance).
You might also want to think about what other data you will need about the user's friends. For example, it's probably not useful to just grab the FriendIds; you probably want names etc. So your query will likely look more like:
SELECT FriendId, Users.name FROM Friends JOIN Users ON Users.uid = Friends.FriendId WHERE Friends.UID = 8;
Again, having the proper columns indexed is key for optimized lookups, especially once your table size gets big.
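A runnable sketch of that lookup, in Python with the standard-library sqlite3 module and an in-memory database standing in for MySQL (an assumption). The table and column names follow the query above; the status column, the helper friends_of and the sample users are hypothetical. The composite primary key (UID, FriendId) doubles as the index on UID, since UID is its leading column, and also prevents duplicate friendships.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption); sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (uid INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Friends (
        UID      INTEGER NOT NULL REFERENCES Users(uid),
        FriendId INTEGER NOT NULL REFERENCES Users(uid),
        status   TEXT NOT NULL DEFAULT 'Pending',
        PRIMARY KEY (UID, FriendId)  -- leading column UID serves the lookup
    );

    INSERT INTO Users VALUES (2, 'bob'), (6, 'carol'), (8, 'alice'), (75, 'dave');
    INSERT INTO Friends (UID, FriendId, status)
        VALUES (8, 2, 'Active'), (8, 6, 'Active'), (75, 8, 'Pending');
""")


def friends_of(uid):
    """Return (FriendId, name) pairs for the given user, ordered by id."""
    return conn.execute(
        """
        SELECT Friends.FriendId, Users.name
        FROM Friends
        JOIN Users ON Users.uid = Friends.FriendId
        WHERE Friends.UID = ?
        ORDER BY Friends.FriendId
        """,
        (uid,),
    ).fetchall()
```

For example, friends_of(8) returns the rows for bob and carol, while a user with no entries gets an empty list.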
Also, since adding friends is likely very uncommon compared with the number of lookup queries you run, be sure to choose a storage engine that provides fast lookups. In this case MyISAM is probably your best bet: it uses table-level locking for writes (i.e. slower concurrent inserts), but lookups are quick.
Good luck!
I think the best way is, without doubt, to create a table like the one you proposed. It will let you manage the friends better and run your friend queries against this table.

building activity feed

I want to create some kind of 'activity feed'. For example, there are 1000 users in total in the database, 100 of whom are in the contact list of user X. User X is concerned with those 100 users only, and if any of them posts a note (or, in general, takes an action), X wants to get that update on his page. For this purpose, do I need to make a database table like:
id user_id note_id
This table will also contain users that user X is not concerned with, so I will run a query like:
select user_id from activity_table which exists in contact list of user X
Is my approach correct regarding this matter (for example database table design and query)?
Is there any better approach?
If I understand you correctly, I think you need a relation table in which you store the user_id of the user who is interested and the user_id of the user they are interested in.
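One way this could look, sketched in Python with the standard-library sqlite3 module and an in-memory database in place of MySQL (an assumption). The activity table follows the id / user_id / note_id layout from the question; the contacts table with its contact_id column, the helper feed_for and the sample rows are hypothetical. The feed is then a join that keeps only activity from users in the viewer's contact list.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption); sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Relation table: user_id is interested in contact_id.
    CREATE TABLE contacts (
        user_id    INTEGER NOT NULL,
        contact_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, contact_id)
    );
    CREATE TABLE activity (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        note_id INTEGER NOT NULL
    );
    CREATE INDEX idx_activity_user ON activity (user_id);

    -- User 1 follows users 2 and 3, but not user 4.
    INSERT INTO contacts VALUES (1, 2), (1, 3);
    INSERT INTO activity (id, user_id, note_id)
        VALUES (1, 2, 10), (2, 4, 11), (3, 3, 12);
""")


def feed_for(user_id, limit=20):
    """Newest-first activity rows from the user's contacts only."""
    return conn.execute(
        """
        SELECT a.id, a.user_id, a.note_id
        FROM activity a
        JOIN contacts c ON c.contact_id = a.user_id
        WHERE c.user_id = ?
        ORDER BY a.id DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
```

Here feed_for(1) skips the note posted by user 4, because user 4 is not in user 1's contact list.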