Mysql database empty column values vs additional identifying table - mysql

Sorry, not sure if question title is reflects the real question, but here goes:
I designing system which have standard orders table but with additional previous and next columns.
The question is which approach for foreign keys is better
Here I have basic table with following columns (previous, next) which are self referencing foreign keys. The problem with this table is that the first placed order doesn't have previous and next fields, so they left out empty, so if I have say 10 000 records 30% of them have those columns empty that's 3000 rows which is quite a lot I think, and also I expect numbers to grow. so in a let's say a year time period it can come to 30000 rows with empty columns, and I am not sure if it's ok.
The solution I've have came with is to main table with other 2 tables which have foreign keys to that table. In this case those 2 additional tables are identifying tables and nothing more, and there's no longer rows with empty columns.
So the question is which solution is better when considering query speed, table optimization, and common good practices, or maybe there's one even better that I don't know? (P.s. I am using mysql with InnoDB engine).

If your aim is to do order sets, you could simply add a new table for that, and just have a single column as a foreign key to that table in the order table.
The orders could also include a rank column to indicate in which order orders belonging to the same set come.
create table order_sets (
id not null auto_increment,
-- customer related data, etc...
primary key(id)
);
create table orders (
id int not null auto_increment,
name varchar,
quantity int,
set_id foreign key (order_set),
set_rank int,
primary key(id)
);
Then inserting a new order means updating the rank of all other orders which come after in the same set, if any.
Likewise, for grouping queries, things are way easier than having to follow prev and next links. I'm pretty sure you will need these queries, and the performances will be much better that way.

Related

Remove duplicate values without ID

I have a table like this:
uuid | username | first_seen | last_seen | score
Before, the table used the primary key of a "player_id" column that ascended. I removed this player_id as I no longer needed it. I want to make the 'uuid' the primary key, but there's a lot of duplicates. I want to remove all these duplicates from the table, but keep the first one (based off the row number, the first row stays).
How can I do this? I've searched up everywhere, but they all show how to do it if you have a row ID column...
I highly advocate having auto-incremented integer primary keys. So, I would encourage you to go back. These are useful for several reasons, such as:
They tell you the insert order of rows.
They are more efficient for primary keys.
Because primary keys are clustered in MySQL, they always go at the end.
But, you don't have to follow that advice. My recommendation would be to insert the data into a new table and reload into your desired table:
create temporary table tt as
select t.*
from tt
group by tt.uuid;
truncate table t;
alter table t add constraint pk_uuid primary key (uuid);
insert into t
select * from tt;
Note: I am using a (mis)feature of MySQL that allows you to group by one column while pulling columns not in the group by. I don't like this extension, but you do not specify how to choose the particular row you want. This will give values for the other columns from matching rows. There are other ways to get one row per uuid.

Nullable foreign keys - good or bad practice?

Let's assume I have 2 tables: foo and bar.
In third table I want to store different kind of data, however every row will have a reference to either foo OR bar.
Is it correct if I create 2 NULLable foreign keys - foo_id and bar_id - in the third table, or is it againts database design principles?
Basically, I thought all the time that foreign keys need to ALWAYS have a "parent", so if I try to e.g. INSERT a row with no primary key matched (or, in this case, with a foreign key set to NULL), I will get an error. Nullable FK-s are new to me, and they feel a bit off.
Also, what are the alternatives? Is it better to create separate tables storing single reference? Isn't this creating redundancy?
Linking tables?
Help.
A nullable FK is "okay". You will still get an error when you try to insert a non-existing parent key (it is just NULL that is allowed now).
The alternative is two link tables, one for foo and one for bar.
Things to consider:
Link tables allow for 1:N. If you don't want that, you can enforce it by primary key on the link table. That is not necessary for the id column solution (they are always 1:N).
You can avoid columns with mostly NULL values using link tables. In your case, though, it seems that you have NULL for exactly half the values. Probably does not qualify as "mostly". This becomes more interesting with more than two parent tables.
You may want to enforce the constraint that exactly one of your two columns is NULL. This can be done with the id column version using a check constraint. It cannot be done with link tables (unless you use triggers maybe).
it is depend on the business logic of the program. if the foreign key field must has a value , it is bad to set it null-able .
for example .
a book table has category_id field which the value is reference from bookCategory table.
each record in book table must has category . if for some reason you set it as nullable . this will cause some record in book table with category_id is null.
the problem will show up in report. the following 2 query will return different total_book
select count(*) as total_book from book;
select
count(*) as total_book
from
book
inner join bookCategory
on book.category_id = category.id
my advice is don't use null-able unless you expect value and no-value . alot of complex system that sometime have value different from one report and another , usually is cause by this.

Proper database design for this task

Ok, so I am going to have at least 2 tables, possibly three.
The data is going to be as follows:
First, a list of search terms. These search terms are unrelated to anything else in the program (only involved in getting the outputs, no manipulation of this data at all), so I plan to store them separately in their own table.
Then things get trickier. I've got a list of words, and each word can be in multiple categories. So for example, if you have "sad", it could be under "angst" and "tragedy", just as "happy" could be under "joy" and "fulfillment".
Would it be better to set up a table where I've got three columns: a UID, a word, and a category, or would it be better to set up two tables: both with UIDs, one with the word, one with the category, and set them up as a foreign key?
The ultimate role is generating number of words in a given category over a given period of time.
I'll be using MySQL and Python (MySQLdb) if that helps anyone.
Ignoring your 'search terms' table (since it doesnt seem to have any relevance to the question), I would probably do it similar to this
words (w_id int, w_word varchar(50))
categories (c_id int, c_category)
wordcategories (wc_wordid int, wc_catid int)
Add foreign key constraints from the ids in wordcategories, onto word and categories tables
Without having a whole lot of details, I would set it up the following way:
Word Table
id int PK
word varchar(20)
Category Table
id int PK
category varchar(20)
Word_Category Table
wordId int PK
categoryId int PK
The third would be the join table between the word and the category. This table would contain the foreign key constraints to the word and category tables.

Best practice MySQL UPDATE/INSERT

Disclaimer: I didn't design these tables, and am aware they're not the best way to store this data. I need advice on the best way to handle the situation.
I have two MySQL tables:
`students`
id int; primary, unique
prof1_id varchar
prof2_id varchar
prof3_id varchar
`professors`
id int; primary, unique, auto-increment
username varchar
fname varchar
fname varchar
students_id int
comment text
A professor's username may be in any of the last three columns in the students table. Professors will need to provide one comment for each student who has them in their row in the students table.
The application that is the front end to this database expects two more columns in the professors table: student_id and comment. Since each student may have any 3 professors in their row, new rows will need to be added to professors whenever they are listed for multiple students. This means that the professors id field will need to auto increment for each new student comment.
What is the best way to accomplish this data transfer with a MySQL query? I've tried looking at UPDATE in combination with JOIN, but I'm not sure there is even a valid way to do this. Also, do I need to create another primary key for professors since the multiple rows will have the same id?
I suspect that a VIEW might be the best way to accomplish this, but the application on the front end expects the information to be stored in the professors table.
One other thing I was considering was that I could create a new table by joining the two tables and have a two-column primary key, students.id, professor.id.
Thanks!
Also, do I need to create another primary key for professors since the
multiple rows will have the same id?
Yes. This is good idea.
I wrote simple query that merges data from first table into 2 columns. This is not complete answer, but it can help you a lot:
SELECT id, prof1id as profid
UNION
SELECT id, prof2id
UNION
SELECT id, prof3id
UNION;
You may use this for view, inserts, but im not familiar with specyfic MySQL syntax and i dont want to misslead you. Please give feedback if it work.
"UNION" removes duplicate rows, you may need to use "UNION ALL" to keep duplicates (like duplicated values in 2 or 3 professors columns).

Would an index on a table be beneficial in this case?

Ive got a table with a many rows, currently no fields are unique. I've got a userid field and an gameid field and some other rows storing information on the games the users have played. As a user plays the game the score is updated so there are quite alot of update queries happening on this table and it's starting to get pretty large.
Would it be beneficial adding another field thats an index and then storing a string such as userid_gameid which would then mean updates were faster in the table if I do my update queries so where index=10_10 (example)
Thanks
Don't add a noddy field for userid + gameid, instead, create an index that includes both columns. If the two columns taken together are intended to be unique then make this the primary key of the table.
CREATE INDEX myIndex ON myTable (userid, gameid)