Do relations between tables in a database speed up queries? - mysql

I am using joins in my queries, and I want to know whether relations between database tables improve query performance.
Thank You.

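To boost performance, use indexes and appropriate data types (storing a number as a string takes more space, and comparisons may be less efficient).
Relations between tables, i.e. foreign keys, are constraints: you cannot insert a value into the referencing table unless a matching record exists in the referenced table. They are a way to keep data integrity rather than a performance feature in themselves, although InnoDB requires an index on the foreign key columns (and creates one if it is missing), which can help joins on those columns. For example: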
Table1
id | table2_id
---|----------
1  | 1
2  | 1
3  | 3
Table2
id | some_column
---|------------
1  | 123
2  | 123
3  | null
Here, Table1.table2_id references Table2.id. You cannot insert the row (4, 4) into Table1, because there is no row with id = 4 in Table2.
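The constraint described above can be tried out in a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, whereas InnoDB enforces them by default); the helper `try_insert` is a hypothetical name used only for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the constraint semantics shown here are the same idea.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys

conn.execute("CREATE TABLE Table2 (id INTEGER PRIMARY KEY, some_column INTEGER)")
conn.execute(
    "CREATE TABLE Table1 ("
    " id INTEGER PRIMARY KEY,"
    " table2_id INTEGER NOT NULL REFERENCES Table2(id))"
)
conn.executemany("INSERT INTO Table2 VALUES (?, ?)", [(1, 123), (2, 123), (3, None)])
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [(1, 1), (2, 1), (3, 3)])


def try_insert(row):
    """Return True if the row was accepted, False if the foreign key rejected it."""
    try:
        conn.execute("INSERT INTO Table1 VALUES (?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


rejected = not try_insert((4, 4))  # no Table2 row with id = 4
accepted = try_insert((4, 2))      # Table2 row with id = 2 exists
row_count = conn.execute("SELECT COUNT(*) FROM Table1").fetchone()[0]
```

The rejected insert leaves Table1 unchanged, which is the integrity guarantee the answer describes; it says nothing about speed on its own.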

Related

Problems generating random data in SQL workbench

I am new to MySQL, so my question is probably trivial, but I didn't find a solution on the internet.
I have two tables, TABLE 1 and TABLE 2, each with a single primary key: tab1PK (INT) and tab2PK (VARCHAR), respectively.
Since TABLE 1 and TABLE 2 have an M:N relationship, I have a third table, TABLE 3, whose primary key consists of two columns: tab1PK and tab2PK.
I generated random data for TABLE 1 and TABLE 2. Is there a quick way to generate data for TABLE 3? Is there an easy way to combine tab1PK and tab2PK?
This will give you a cartesian join (every combination) of all table1 and table2 primary keys:
insert into table3 (tab1PK, tab2PK)
select table1.tab1PK, table2.tab2PK
from table1 cross join table2;
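A minimal sketch of the same statement, assuming Python's built-in sqlite3 as a stand-in for MySQL and made-up sample keys (1, 2, 3 and 'a', 'b'), shows how many rows the cartesian join produces:

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample key values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (tab1PK INTEGER PRIMARY KEY);
CREATE TABLE table2 (tab2PK VARCHAR(20) PRIMARY KEY);
CREATE TABLE table3 (
    tab1PK INTEGER,
    tab2PK VARCHAR(20),
    PRIMARY KEY (tab1PK, tab2PK)
);
INSERT INTO table1 VALUES (1), (2), (3);
INSERT INTO table2 VALUES ('a'), ('b');
""")

# Every table1 key is paired with every table2 key.
conn.execute(
    "INSERT INTO table3 (tab1PK, tab2PK) "
    "SELECT table1.tab1PK, table2.tab2PK FROM table1 CROSS JOIN table2"
)
pair_count = conn.execute("SELECT COUNT(*) FROM table3").fetchone()[0]
pairs = conn.execute("SELECT tab1PK, tab2PK FROM table3 ORDER BY tab1PK, tab2PK").fetchall()
```

The result has rows(table1) × rows(table2) entries, so on large tables it may be worth restricting the select before inserting.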

Disadvantage of "combined" lookup table in mySQL vs individual lookup tables

Are there big disadvantages (in query speed, for example) to using only ONE combined lookup table (MySQL database) to store "links" between tables, compared with having individual lookup tables? I am asking because in my project I would end up with over a hundred individual lookup tables, which I assume would be a lot of work to set up and maintain. To keep the example simple, here is a scenario with only 4 tables:
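Table: teacher
teacherID | name
----------|-------
1         | Mr. X
2         | Mrs. Y
Table: student
studentID | name
----------|------
4         | Tom
5         | Chris
Table: class
classID | name
--------|--------
7       | Class A
8       | Class B
Table: languageSpoken
languageSpokenID | name
-----------------|--------
10               | English
11               | German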
======================= INDIVIDUAL LOOKUP TABLES ==========================
Table: student_teacher
studentID | teacherID
----------|----------
4         | 1
5         | 1
Table: student_class
studentID | classID
----------|--------
4         | 7
5         | 8
Table: student_languageSpoken
studentID | languageSpokenID
----------|-----------------
4         | 10
4         | 11
====== VS ONE COMBINED LOOKUP TABLE (with one helper table) =====
helper table: allTables
tableID | name
--------|---------------
1       | teacher
2       | student
3       | class
4       | languageSpoken
table: lookupTable
table_A | ID_A | table_B | ID_B
--------|------|---------|-----
1       | 1    | 2       | 4
1       | 1    | 2       | 5
3       | 7    | 2       | 4
3       | 8    | 2       | 5
Your second lookup schema is not usable in practice.
You refer to a table by its name/index. But you cannot use this relation directly (a table name cannot be parameterized); you need to build a conditional join expression or use dynamic SQL. This is slower.
Your lookup table is reversible, i.e. the same reference can be written in two ways. Of course, you could add a CHECK constraint such as CHECK (table_A < table_B) (which also prevents self-references), but this again degrades performance.
Your lookup table does not prevent non-existent relations (for example, class and language are not related, but nothing prevents creating a row for such a relation). Again, this requires an additional constraint and costs performance.
There are more disadvantages, but I'm too lazy to list them all.
Another very important point: foreign key constraints, which ensure referential integrity, cannot be used in the "combined lookup" approach. They would have to be simulated by complex and error-prone triggers. Overall, the "combined lookup" approach is just a horrible idea. – sticky bit
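The referential integrity point can be illustrated with a small sketch. It assumes Python's built-in sqlite3 as a stand-in for MySQL, reuses the student and class tables from the question, and uses a made-up classID of 999 that does not exist; the helper `accepts` is a hypothetical name.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; classID 999 is an invented, non-existent id.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript("""
CREATE TABLE student (studentID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE class (classID INTEGER PRIMARY KEY, name TEXT);

-- Individual lookup table: both columns can carry real foreign keys.
CREATE TABLE student_class (
    studentID INTEGER NOT NULL REFERENCES student(studentID),
    classID   INTEGER NOT NULL REFERENCES class(classID),
    PRIMARY KEY (studentID, classID)
);

-- Combined lookup table: ID_A / ID_B may point into any table, so no foreign key fits.
CREATE TABLE lookupTable (table_A INTEGER, ID_A INTEGER, table_B INTEGER, ID_B INTEGER);

INSERT INTO student VALUES (4, 'Tom'), (5, 'Chris');
INSERT INTO class VALUES (7, 'Class A'), (8, 'Class B');
""")


def accepts(sql, params):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Link student 4 to class 999, which does not exist.
combined_accepts_bogus = accepts("INSERT INTO lookupTable VALUES (?, ?, ?, ?)", (3, 999, 2, 4))
individual_accepts_bogus = accepts("INSERT INTO student_class VALUES (?, ?)", (4, 999))
```

The combined table silently stores the dangling reference, while the individual table rejects it without any trigger code.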
There is a rule: relations that are not related to each other must be kept separate.
In the first scheme, can a student be in more than one class at the same time? If not, you do not need the student_class lookup table; class_id is simply an attribute in the student table.
Lookup tables are usually static, so there shouldn't be much maintenance overhead. If you do update the lookup data, however, you now have to manage the life cycle of a subset of rows in your single lookup table, which may get tricky compared with just truncating a table when new data becomes available. I would also be careful if your lookup tables have different schemas, because the combined table then needs columns that must be null wherever they do not apply to a given "type" of row. You may not be able to implement the right foreign keys, which help you keep your data consistent (in production systems); if you happen to use the wrong id, you would get a nonsensical value. If this is a school project, especially for a database class, you will be dinged for not using textbook normalization.

Many-to-many indexing issue, mysql

I have a many-to-many table that is going to have millions of rows. Let me describe my confusion with an example.
Example:
table: car_dealer_rel
Option 1:
columns: car_id: int unsigned, dealer_id: int unsigned
index on: car_id, dealer_id
car_id|dealer_id
-------|---------
1 | 1
1 | 2
....
Sub-option 1: a separate index on each of the two columns.
Sub-option 2: one combined index on the 2 columns.
Option 2:
a one-column table
column: car_id_dealer_id: varchar(21)
index on: primary key index on this single column.
The idea here is to store values as car_id.dealer_id and search with patterns such as %.xxx and/or xxx.%
car_id_dealer_id
----------------
1.1
1.2
1.15
2.10
...
...
After millions of records, which option will be faster for:
reading
add/update/delete?
I am a novice with MySQL; all help is appreciated.
With the first one
car_id|dealer_id
-------|---------
1 | 1
1 | 2
you can easily create a composite index for each direction:
create index ind1 on car_dealer_rel (car_id,dealer_id );
create index ind2 on car_dealer_rel (dealer_id, car_id );
These work very fast,
and you can easily filter in either direction:
where car_id = your_value
or
where dealer_id = another_value
or using both.
With the second one you can't do this easily: you frequently need string manipulation, which prevents use of the index (a pattern with a leading wildcard such as '%.xxx' cannot use it), and in some cases you can't do it in SQL at all.
For update, insert and delete, the performance remains practically the same.
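The two composite indexes from this answer can be checked with a query plan. This is a sketch under the assumption that Python's built-in sqlite3 stands in for MySQL: SQLite's `EXPLAIN QUERY PLAN` plays the role of MySQL's `EXPLAIN`, and its planner is not MySQL's, so only the general idea carries over. The helper `plan` is a hypothetical name.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; planners differ, the index idea is the same.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car_dealer_rel (car_id INTEGER NOT NULL, dealer_id INTEGER NOT NULL);
CREATE INDEX ind1 ON car_dealer_rel (car_id, dealer_id);
CREATE INDEX ind2 ON car_dealer_rel (dealer_id, car_id);
""")


def plan(where_clause):
    """Return the textual query plan for a filter on car_dealer_rel."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT car_id, dealer_id FROM car_dealer_rel WHERE " + where_clause
    ).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the plan detail


plan_by_car = plan("car_id = 1")        # leading column of ind1
plan_by_dealer = plan("dealer_id = 2")  # leading column of ind2
```

Each filter is served by the index whose leading column matches it, which is why the answer creates one index per direction.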
It depends on the actual queries you run. I suggest running EXPLAIN first, against a reasonably large set of dummy data, to understand how MySQL is going to execute your query.
If you are going to find records by car_id alone, or by car_id and dealer_id together, you can use a composite index (car_id, dealer_id).
If you also want to find by dealer_id alone, you can add an additional index on the dealer_id column.
Your one-column table option is not very good because:
You cannot find rows by dealer_id fast.
The table schema is not normalized.
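A rough sketch of the one-column option, assuming Python's built-in sqlite3 as a stand-in for MySQL and a hypothetical table name `car_dealer_str`, shows what the two search patterns look like. SQLite's planner and LIKE rules are not MySQL's, so the plan below only illustrates the general point that a pattern starting with a wildcard cannot seek into an index.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; car_dealer_str is an invented table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car_dealer_str (car_id_dealer_id VARCHAR(21) PRIMARY KEY)")
conn.executemany(
    "INSERT INTO car_dealer_str VALUES (?)",
    [("1.1",), ("1.2",), ("1.15",), ("2.10",)],
)


def keys_like(pattern):
    """Return all stored keys matching a LIKE pattern, in text order."""
    rows = conn.execute(
        "SELECT car_id_dealer_id FROM car_dealer_str "
        "WHERE car_id_dealer_id LIKE ? ORDER BY car_id_dealer_id",
        (pattern,),
    )
    return [row[0] for row in rows]


by_car_1 = keys_like("1.%")     # all dealers of car 1
by_dealer_1 = keys_like("%.1")  # all cars of dealer 1 (leading wildcard)

# Plan for the dealer-side search: every key has to be inspected.
suffix_plan = " ".join(
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT car_id_dealer_id FROM car_dealer_str "
        "WHERE car_id_dealer_id LIKE '%.1'"
    )
)
```

With two integer columns the dealer-side lookup is a plain equality on an indexed column instead of a pattern match over every stored string.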

Is it good practice to keep 2 related tables (using auto_increment PK) at the same max auto_increment ID when table1 is modified?

Consider this example with 2 interrelated tables:
Table1
textID - text
1 - love..
2 - men...
...
Table2
rID - textID
1 - 1
2 - 2
...
Note:
In Table1:
textID is the auto_increment primary key
In Table2:
rID is the auto_increment primary key and textID is a foreign key
The relationship is that each rID has one and only one textID, but one textID can have several rIDs.
So, when table1 is modified, table2 should be updated accordingly.
OK, here is a fictitious example. You build a very complicated system. When you modify a record in table1, you need to keep track of the related record in table2. To do so, you can proceed like this:
Option 1: When you modify a record in table1, you modify the related record in table2. This can be quite hard to program, especially for a very complicated system.
Option 2: Instead of modifying the related record in table2, you delete the old record in table2 and insert a new one. This is easier to program.
For example, suppose you use option 2. Then, after you modify records 1, 2, 3, ..., 100 in table1, table2 will look like this:
Table2
rID - textID
101 - 1
102 - 2
...
200 - 100
This means the max auto_increment ID in table1 is still the same (100), but the max auto_increment ID in table2 has already reached 200.
What if users modify records many times? Could table2 then run out of IDs? We could use BIGINT, but would that make the app run slower?
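The option 2 scenario can be reproduced in a short sketch. It assumes Python's built-in sqlite3 as a stand-in for MySQL (SQLite's `AUTOINCREMENT` also never reuses an id) and assumes rID would be a signed INT or BIGINT column in MySQL; the constant names are invented for illustration.

```python
import sqlite3

# Assumption: SQLite's AUTOINCREMENT stands in for MySQL's auto_increment.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table2 (rID INTEGER PRIMARY KEY AUTOINCREMENT, textID INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO Table2 (textID) VALUES (?)", [(t,) for t in range(1, 101)])

# Option 2: for every modified textID, delete the old row and insert a new one.
for text_id in range(1, 101):
    conn.execute("DELETE FROM Table2 WHERE textID = ?", (text_id,))
    conn.execute("INSERT INTO Table2 (textID) VALUES (?)", (text_id,))

min_rid, max_rid = conn.execute("SELECT MIN(rID), MAX(rID) FROM Table2").fetchone()

# Headroom, assuming signed MySQL column types.
INT_SIGNED_MAX = 2**31 - 1     # 2147483647
BIGINT_SIGNED_MAX = 2**63 - 1  # 9223372036854775807
full_rewrites_int = INT_SIGNED_MAX // 100        # times all 100 rows can be replaced with INT
full_rewrites_bigint = BIGINT_SIGNED_MAX // 100  # same with BIGINT
```

The row count stays at 100 while the ids drift upward, so the id range is consumed by the number of modifications rather than by the number of rows.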
Note: If you write code to modify records in table2 whenever table1 is modified, it will be very hard and therefore error-prone. If you just delete the old records and insert new records into table2, it is much easier to program, so your program is simpler and less error-prone.
So, is it good practice to keep 2 related tables (using auto_increment PK) at the same max auto_increment ID when table1 is modified?

Mysql Relational Database duplicate with different keys

I have been trying to clean up a relational DB for a month, but I can't find an efficient solution.
Here is my problem:
I have a relational DB with about 534 M rows and lots of foreign keys (30).
I can handle normal duplicates on insert with union...group by...having count(*)=1, but there are also duplicates with different keys.
Example:
table 1
id | key1 | value
1 | 11 | a1
2 | 22 | a1
table 2
key1 | value
11 | a2
22 | a2
Foreign key table1(key1) references table2(key1)
I'm trying to find and remove the duplicates and correct the parents.
I have tried 3 different ways:
1: PHP script, arrays
export tables (dump) --> array_unique, find duplicates, correct the parents array --> import tables
It's pretty fast, but it needs 80 GB of memory, which could be a problem in the future.
2: PHP script, SQL queries
export tables (dump) --> find duplicates --> send queries to parent table
No memory needed, but the tables are really big and 5 queries take 1 second, so 50 M duplicates would take days, months, years.
3: ON DUPLICATE KEY UPDATE: I added a column 'duplicate' to store the duplicate keys, and I defined all columns except the key as a unique key:
insert.... on duplicate key update duplicate = concat(duplicate,';',VALUES(key)).
But some tables have more than 1 key, and sometimes I would have to define 24 columns as a unique index, which brings back the memory problem.
I hope I could explain my problem. Do you have any ideas?
Why don't you simply create a unique key on the column? With the IGNORE keyword, it will remove the duplicate records. Your query will be something like:
ALTER IGNORE TABLE testdb.table1
ADD UNIQUE INDEX column1 (column1 ASC);
Note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this only works on older versions.
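For the "duplicates with different keys" case from the question, one set-based alternative (not the method of the answer above) is to map every duplicate key to a single surviving key, repoint the referencing rows, and only then delete the duplicates. The sketch below assumes Python's built-in sqlite3 as a stand-in for MySQL, uses the two-row example from the question, and introduces a hypothetical helper table `key_map`; in MySQL the repointing step would more likely be written as a multi-table UPDATE with a join.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; key_map is an invented helper table.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript("""
CREATE TABLE table2 (key1 INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE table1 (
    id    INTEGER PRIMARY KEY,
    key1  INTEGER REFERENCES table2(key1),
    value TEXT
);
INSERT INTO table2 VALUES (11, 'a2'), (22, 'a2');
INSERT INTO table1 VALUES (1, 11, 'a1'), (2, 22, 'a1');

-- Map every key in table2 to the smallest key that carries the same value.
CREATE TABLE key_map AS
SELECT t.key1 AS old_key, m.keep_key AS new_key
FROM table2 AS t
JOIN (SELECT value, MIN(key1) AS keep_key FROM table2 GROUP BY value) AS m
  ON m.value = t.value;

-- Repoint the referencing rows first, then drop the now-unreferenced duplicates.
UPDATE table1
SET key1 = (SELECT new_key FROM key_map WHERE old_key = table1.key1);

DELETE FROM table2
WHERE key1 IN (SELECT old_key FROM key_map WHERE old_key <> new_key);
""")

referenced_keys = [row[0] for row in conn.execute("SELECT key1 FROM table2 ORDER BY key1")]
referencing_keys = [row[0] for row in conn.execute("SELECT key1 FROM table1 ORDER BY id")]
```

After this pass the rows in table1 may themselves have become plain duplicates (same key1 and value), which the usual group-by approach can then handle. Whether this scales to hundreds of millions of rows depends on indexing the value columns and would need testing on real data.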