Question
Is it valid to have a NULL foreign key? Are there any disadvantages?
Example
+----+     +----+-------+
| id |     | id | id_fk |
+----+     +----+-------+
| 1  |     | 1  | 2     |
| 2  |     | 2  | 5     |
| 3  |     | 3  | NULL  |
| 4  |     | 4  | 1     |
| 5  |     | 5  | NULL  |
+----+     +----+-------+
Yes, it is valid, and no, it is not bad practice. You just need to take it into account when querying the database: an INNER JOIN on that column automatically excludes every row whose foreign key is NULL. In cases where you want those rows to appear in the result anyway, use an outer join (e.g. a LEFT JOIN) instead.
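As a concrete sketch of the difference (using SQLite via Python as a stand-in; the table and column names mirror the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (id INTEGER PRIMARY KEY,
                        id_fk INTEGER REFERENCES parent(id));
    INSERT INTO parent VALUES (1),(2),(3),(4),(5);
    INSERT INTO child VALUES (1,2),(2,5),(3,NULL),(4,1),(5,NULL);
""")

# INNER JOIN silently drops the rows whose foreign key is NULL.
inner = conn.execute(
    "SELECT c.id FROM child c JOIN parent p ON c.id_fk = p.id").fetchall()

# LEFT (outer) JOIN keeps them, with NULLs on the parent side.
outer = conn.execute(
    "SELECT c.id FROM child c LEFT JOIN parent p ON c.id_fk = p.id").fetchall()

print(len(inner))  # 3 rows: children 1, 2, 4
print(len(outer))  # 5 rows: all children
```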
Related
I've been trying to implement the solution here with the added flavour of updating existing records. As an MRE I'm looking to populate the sum_date_diff column in a table with the sum of all the differences between the current row date and the date of every previous row where the current row p1_id matches the previous row p1_id or p2_id. I have already filled out the expected result below:
+-----+------------+-------+-------+---------------+
| id_ | date_time | p1_id | p2_id | sum_date_diff |
+-----+------------+-------+-------+---------------+
| 1 | 2000-01-01 | 1 | 2 | Null |
| 2 | 2000-01-02 | 2 | 4 | 1 |
| 3 | 2000-01-04 | 1 | 3 | 3 |
| 4 | 2000-01-07 | 2 | 5 | 11 |
| 5 | 2000-01-15 | 2 | 3 | 35 |
| 6 | 2000-01-20 | 1 | 3 | 35 |
| 7 | 2000-01-31 | 1 | 3 | 68 |
+-----+------------+-------+-------+---------------+
My query so far looks like:
UPDATE test.sum_date_diff AS sdd0
JOIN (
    SELECT id_,
           SUM(DATEDIFF(sdd1.date_time, sq.date_time)) AS sum_date_diff
    FROM test.sum_date_diff AS sdd1
    LEFT OUTER JOIN (
        SELECT sdd2.date_time AS date_time, sdd2.p1_id AS player_id
        FROM test.sum_date_diff AS sdd2
        UNION ALL
        SELECT sdd3.date_time AS date_time, sdd3.p2_id AS player_id
        FROM test.sum_date_diff AS sdd3
    ) AS sq ON sq.date_time < sdd1.date_time
           AND sq.player_id = sdd1.p1_id
    GROUP BY sdd1.id_
) AS master_sq ON master_sq.id_ = sdd0.id_
SET sdd0.sum_date_diff = master_sq.sum_date_diff;
This works correctly on a small test table.
However, on a table of 1.5M records the query has been hanging for the last hour. Even when I add a WHERE clause at the end to restrict the update to a single record, it still hangs for 5+ minutes.
Here is the EXPLAIN statement for the query on the full table:
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
| 1 | UPDATE | sum_date_diff | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100 | NULL |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 4 | const | 10 | 100 | NULL |
| 2 | DERIVED | sum_date_diff | NULL | index | PRIMARY,ix__match_oc_history__date_time,ix__match_oc_history__p1_id,ix__match_oc_history__p2_id,ix__match_oc_history__date_time_players | ix__match_oc_history__date_time_players | 14 | NULL | 1484288 | 100 | Using index; Using temporary |
| 2 | DERIVED | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 2968576 | 100 | Using where; Using join buffer (hash join) |
| 3 | DERIVED | sum_date_diff | NULL | index | NULL | ix__match_oc_history__date_time_players | 14 | NULL | 1484288 | 100 | Using index |
| 4 | UNION | sum_date_diff | NULL | index | NULL | ix__match_oc_history__date_time_players | 14 | NULL | 1484288 | 100 | Using index |
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
Here is the CREATE TABLE statement:
CREATE TABLE `sum_date_diff` (
`id_` int NOT NULL AUTO_INCREMENT,
`date_time` datetime DEFAULT NULL,
`p1_id` int NOT NULL,
`p2_id` int NOT NULL,
`sum_date_diff` int DEFAULT NULL,
PRIMARY KEY (`id_`),
KEY `ix__sum_date_diff__date_time` (`date_time`),
KEY `ix__sum_date_diff__p1_id` (`p1_id`),
KEY `ix__sum_date_diff__p2_id` (`p2_id`),
KEY `ix__sum_date_diff__date_time_players` (`date_time`,`p1_id`,`p2_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1822120 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
MySQL version is 8.0.26, running under macOS Monterey on a 2016 MacBook Pro with 16 GB of RAM.
After reading around about boosting the RAM available to MySQL I've added the following to the standard my.cnf file:
innodb_buffer_pool_size = 8G
tmp_table_size          = 2G
max_heap_table_size     = 2G
I'm wondering if:
1. I've done something wrong,
2. this is just a very slow task no matter what I do, or
3. there is a faster method.
I'm hoping someone could enlighten me!
While it is possible to do calculations like this in SQL, it is messy. If the number of rows is not in the millions, I would fetch the necessary columns into my application and do the arithmetic there. (Loops are easier and faster in PHP/Java/etc. than in SQL.)
LEAD() and LAG() are possible, but they are not optimized well (at least in my experience). In an app language, it is easy and efficient to look things up in arrays.
The SELECT can (easily and efficiently) do any filtering and sorting so that the app only receives the necessary data.
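For instance, here is a minimal sketch of that app-side pass in Python. The rows are hard-coded from the example table; per-player running totals (count, sum of dates) make each row O(1), since the sum of (current - previous) dates equals count * current - sum(previous):

```python
from datetime import date

rows = [  # (id_, date_time, p1_id, p2_id), ordered by date_time
    (1, date(2000, 1, 1), 1, 2),
    (2, date(2000, 1, 2), 2, 4),
    (3, date(2000, 1, 4), 1, 3),
    (4, date(2000, 1, 7), 2, 5),
    (5, date(2000, 1, 15), 2, 3),
    (6, date(2000, 1, 20), 1, 3),
    (7, date(2000, 1, 31), 1, 3),
]

seen = {}    # player_id -> (appearance_count, sum_of_date_ordinals)
result = {}  # id_ -> sum_date_diff
for id_, dt, p1, p2 in rows:
    cnt, total = seen.get(p1, (0, 0))
    # sum of (dt - prev_dt) over previous rows == cnt*dt - sum(prev_dt)
    result[id_] = cnt * dt.toordinal() - total if cnt else None
    for p in (p1, p2):  # current row becomes "previous" for both players
        c, t = seen.get(p, (0, 0))
        seen[p] = (c + 1, t + dt.toordinal())

print(result)  # {1: None, 2: 1, 3: 3, 4: 11, 5: 35, 6: 35, 7: 68}
```

The computed values match the expected sum_date_diff column in the question.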
Good day to all - I am a novice in SQL and have reviewed potential solutions here to this problem. The first approach, using INSERT INTO commands, does not work, period: it creates duplicate rows and NULLs.
I have read and tried the UPDATE-JOIN solutions posted here. I can't seem to get it right; maybe I don't need a WHERE condition, maybe I do, etc.
Here are the two tables. I want to take data from IPTEST table, IP column and insert it into the POSTALTEST table, IP column:
IPTEST TABLE
+---------------+
| IP |
+---------------+
| 65.4.166.241 |
| 65.49.188.24 |
| 65.12.231.173 |
| 65.30.224.18 |
| 65.83.140.96 |
+---------------+
POSTALTEST TABLE
+-------+------+-----------+------+
| zip | zip4 | ailmentID | IP |
+-------+------+-----------+------+
| 15227 | 3709 | 26 | NULL |
| 15227 | 3724 | 29 | NULL |
| 15227 | 3736 | 22 | NULL |
| 15227 | 3737 | 22 | NULL |
| 15227 | 3737 | 26 | NULL |
+-------+------+-----------+------+
I would be so grateful if someone can help, went to links of "learn about join" already so please someone spare me the humiliation of posting that and help me with the syntax :D
Kind Regards,
Jason
If you are using SQL Server:
I suggest you create one more column to act as the primary key in the first table, and add a FOREIGN KEY constraint to create a relationship between the first and second tables.
IPTEST TABLE
+-------+---------------+
| IP_ID | IP            |
+-------+---------------+
| 1     | 65.4.166.241  |
| 2     | 65.49.188.24  |
| 3     | 65.12.231.173 |
| 4     | 65.30.224.18  |
| 5     | 65.83.140.96  |
+-------+---------------+
-- add primary key
ALTER TABLE IPTEST ADD PRIMARY KEY (IP_ID);
POSTALTEST TABLE
+-------+------+-----------+------+
| zip | zip4 | ailmentID | IP |
+-------+------+-----------+------+
| 15227 | 3709 | 26 | 1 |
| 15227 | 3724 | 29 | 2 |
| 15227 | 3736 | 22 | 3 |
| 15227 | 3737 | 22 | 4 |
| 15227 | 3737 | 26 | 5 |
+-------+------+-----------+------+
-- add FOREIGN CONSTRAINT to create relationship
ALTER TABLE POSTALTEST
ADD CONSTRAINT FK_X FOREIGN KEY (IP)
REFERENCES IPTEST (IP_ID);
-- inner join
SELECT ip_1.*, pt.*
FROM IPTEST AS ip_1
INNER JOIN POSTALTEST AS pt ON ip_1.IP_ID = pt.IP;
If you just want to know how to use INSERT ... SELECT, you can use:
INSERT INTO POSTALTEST (IP)
SELECT IP
FROM IPTEST
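To illustrate the suggested design end to end, here is a runnable sketch (SQLite via Python as a stand-in for SQL Server; table and column names follow the answer, and the sample data is shortened to two rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE IPTEST (IP_ID INTEGER PRIMARY KEY, IP TEXT);
    CREATE TABLE POSTALTEST (zip TEXT, zip4 TEXT, ailmentID INTEGER,
                             IP INTEGER REFERENCES IPTEST(IP_ID));
    INSERT INTO IPTEST VALUES (1, '65.4.166.241'), (2, '65.49.188.24');
    INSERT INTO POSTALTEST VALUES ('15227', '3709', 26, 1),
                                  ('15227', '3724', 29, 2);
""")

# The join now resolves each postal row to its IP via the key column.
rows = conn.execute("""
    SELECT pt.zip, ip_1.IP
    FROM IPTEST AS ip_1
    INNER JOIN POSTALTEST AS pt ON ip_1.IP_ID = pt.IP
    ORDER BY ip_1.IP_ID
""").fetchall()
print(rows)  # [('15227', '65.4.166.241'), ('15227', '65.49.188.24')]
```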
I have a table of parents and a table of children.
My parents table is as follows:
+----------------+-------------------------------+
| parent_id | parent_name |
+----------------+-------------------------------+
| 10792 | Katy |
| 7562 | Alex |
| 13330 | Drew |
| 9153 | Brian |
+----------------+-------------------------------+
My children's table is:
+----------+-------------------------------+-----------+-----+
| child_id | child_name | parent_id | age |
+----------+-------------------------------+-----------+-----+
| 1 | Matthew | 10792 | 4 |
| 2 | Donald | 9153 | 5 |
| 3 | Steven | 10792 | 9 |
| 4 | John | 7562 | 6 |
+----------+-------------------------------+-----------+-----+
When I use a sub-select such as:
SELECT parent_name, (SELECT SUM(age) FROM children WHERE parent_id = parents.parent_id) AS combined_age FROM parents;
My issue is that when I execute this query (parents are 13,000 records, children are 21,000 records) an index of parent_id in children doesn't get used, as shown in the explain plan. I get:
+----+--------------------+--------------------------+--------+---------------+------+---------+-------+-------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------------------+--------+---------------+------+---------+-------+-------+-------------------------------------------------+
| 1 | PRIMARY | parents | ALL | NULL | NULL | NULL | NULL | 13548 | NULL
| 2 | DEPENDENT SUBQUERY | children | ALL | PARENTS,AGE | NULL | NULL | NULL | 21654 | Range checked for each record (index map: 0x22) |
+----+--------------------+--------------------------+--------+---------------+------+---------+-------+-------+-------------------------------------------------+
This query is taking over 3 minutes to run, and I can't seem to get the subquery to use an index to query where the children belong to the parent. I tried USE INDEX and FORCE INDEX, as well as USE KEY AND FORCE KEY. Any ideas?
It turns out this happens when the two ID fields do not have the same type (INT(11) vs. VARCHAR(45)). In the application, one of the tables' ID fields was created strangely; updating the field type let the optimizer use the index.
Thanks!
The index should be used. The optimal index would be children(parent_id, age).
You can try re-writing the query as a join:
select p.parent_name, sum(c.age)
from parents p left join
children c
on p.parent_id = c.parent_id
group by p.parent_name;
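A quick demonstration of the join rewrite (SQLite via Python; data taken from the question, with an index on `(parent_id, age)` as suggested):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parents (parent_id INTEGER PRIMARY KEY, parent_name TEXT);
    CREATE TABLE children (child_id INTEGER PRIMARY KEY, child_name TEXT,
                           parent_id INTEGER, age INTEGER);
    CREATE INDEX ix_children ON children (parent_id, age);
    INSERT INTO parents VALUES (10792,'Katy'),(7562,'Alex'),
                               (13330,'Drew'),(9153,'Brian');
    INSERT INTO children VALUES (1,'Matthew',10792,4),(2,'Donald',9153,5),
                                (3,'Steven',10792,9),(4,'John',7562,6);
""")

rows = conn.execute("""
    SELECT p.parent_name, SUM(c.age)
    FROM parents p
    LEFT JOIN children c ON p.parent_id = c.parent_id
    GROUP BY p.parent_name
    ORDER BY p.parent_name
""").fetchall()
print(rows)
# [('Alex', 6), ('Brian', 5), ('Drew', None), ('Katy', 13)]
```

Note the LEFT JOIN keeps Drew, who has no children, with a NULL combined age; use an inner JOIN if you want such parents dropped.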
I'm facing a SELECT performance issue with MySQL.
I have two tables, "domain" and "email", which contain duplicates; these tables are frequently updated (INSERT/DELETE) by different sources, approximately every ten minutes.
My primary objective was to make two views from those tables without any duplicates. I know a view is a stored query, but it is my only way to keep things dynamic; creating a new deduplicated table every ten minutes would be mad (maybe not?).
Both views are used by another process (Postfix) to check whether the recipient is allowed. When I try a simple query
SELECT email FROM emailview WHERE email = 'john@google.com';
the query takes 3-4 seconds. By contrast, the same SELECT directly on the email table (duplicates included) takes 0.01 sec.
How can I improve SELECT performance so that querying the view is almost as fast as querying the table directly?
Here is the detail of the architecture (InnoDB engine; the "value 1" column is random and doesn't really matter):
Domain Table :
| field | type | null | key |
|--------------|--------------|------|------|
| domain | varchar(255) | NO | NULL |
| creationdate | datetime | NO | NULL |
| value 1 | varchar(255) | NO | NULL |
| source_fkey | varchar(255) | MUL | NULL |
| domain     | creationdate        | value 1 | source_fkey |
|------------|---------------------|---------|-------------|
| google.com | 2013-05-28 15:35:01 | john    | Y           |
| google.com | 2013-04-30 12:10:10 | patrick | X           |
| yahoo.com  | 2011-04-02 13:10:10 | britney | Z           |
| ebay.com   | 2012-02-12 10:48:10 | harry   | Y           |
| ebay.com   | 2013-04-15 07:15:23 | bill    | X           |
Domain View (duplicate domain are removed using the oldest creation date) :
CREATE VIEW domainview AS
SELECT domain.domain, creationdate, value1, source_fkey
FROM domain
WHERE (domain, creationdate) IN (SELECT domain, MIN(creationdate)
FROM domain GROUP BY domain);
| domain     | creationdate        | value 1 | source_fkey |
|------------|---------------------|---------|-------------|
| google.com | 2013-04-30 12:10:10 | patrick | X           |
| yahoo.com  | 2011-04-02 13:10:10 | britney | Z           |
| ebay.com   | 2012-02-12 10:48:10 | harry   | Y           |
Email table :
| field | type | null | key |
|--------------|--------------|------|------|
| email | varchar(255) | NO | NULL |
| source_fkey | varchar(255) | MUL | NULL |
| email              | foreign_key |
|--------------------|-------------|
| john@google.com    | X           |
| john@google.com    | Y           | <-- duplicate from wrong foreign/domain
| harry@google.com   | X           |
| mickael@google.com | X           |
| david@ebay.com     | Y           |
| alice@yahoo.com    | Z           |
Email View (legit emails and emails from domain/foreign_key of the domain view) :
CREATE VIEW emailview AS
SELECT email.email, email.foreign_key
FROM email, domainview
WHERE email.foreign_key = domainview.foreign_key
AND SUBSTRING_INDEX(email.email, '@', -1) = domainview.domain;
| email              | foreign_key |
|--------------------|-------------|
| john@google.com    | X           |
| harry@google.com   | X           |
| mickael@google.com | X           |
| david@ebay.com     | Y           |
| alice@yahoo.com    | Z           |
There are no UNIQUE constraints and no indexes; the only primary key is in the table where the foreign_key lives.
Thanks for help.
Previous discussion : Select without duplicate from a specific string/key
Both views are slow: the first because of the subselect in the IN clause, which MySQL does not optimize well before 5.6; the second because it uses a function in the WHERE clause.
In the first query you can replace the subselect with a join.
In the second, it's best to store the domain in a separate column and use it for the comparison.
Make sure you have composite indexes on the fields used in joins, WHERE and GROUP BY clauses.
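Here is a sketch of the first suggestion, deduplicating via a join against the per-domain MIN(creationdate) instead of an IN (subselect), runnable with SQLite via Python (sample data shortened from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE domain (domain TEXT, creationdate TEXT, source_fkey TEXT);
    INSERT INTO domain VALUES
        ('google.com', '2013-05-28 15:35:01', 'Y'),
        ('google.com', '2013-04-30 12:10:10', 'X'),
        ('yahoo.com',  '2011-04-02 13:10:10', 'Z');
""")

# Join each row to its domain's oldest creationdate; only the oldest
# row per domain survives, as in the original view.
rows = conn.execute("""
    SELECT d.domain, d.creationdate, d.source_fkey
    FROM domain d
    JOIN (SELECT domain, MIN(creationdate) AS creationdate
          FROM domain GROUP BY domain) oldest
      ON oldest.domain = d.domain
     AND oldest.creationdate = d.creationdate
    ORDER BY d.domain
""").fetchall()
print(rows)
# [('google.com', '2013-04-30 12:10:10', 'X'),
#  ('yahoo.com', '2011-04-02 13:10:10', 'Z')]
```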
Here is my goods table.
+----------------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+---------------+------+-----+---------+-------+
| ID | decimal(18,0) | NO | PRI | | |
| gkey | varchar(255) | YES | MUL | NULL | |
| GOODS | decimal(18,0) | YES | MUL | NULL | |
Column ID is auto-increment.
GOODS is the id of a goods category.
Here is the goods category table.
+-------+---------------------+
| ID | NAME |
+-------+---------------------+
| 1 | book |
| 2 | phone |
+-------+---------------------+
My question: in the goods table I need gkey to be a unique key made of a category prefix plus a per-category counter (starting from 1), like BOOK-1, BOOK-2, ..., PHONE-1, PHONE-2, ..., assigned when a new goods record is inserted into the goods table.
Like this:
+--------------------+---------------+------+-----+---------+-------+
| ID | GKEY |GOODS | PRI | COUNTRY | Extra |
+--------------------+---------------+------+-----+---------+-------+
| 1 | BOOK-1 | 1 | 10 | | |
| 2 | PHONE-1 | 2 | 12 | | |
| 3 | BOOK-2 | 1 | 13 | | |
| 4 | BOOK-3 | 1 | 10 | | |
| 5 | PHONE-2 | 2 | 10 | | |
| 6 | PHONE-3 | 2 | 20 | | |
+--------------------+---------------+------+-----+---------+-------+
How can I generate that GKEY in PHP + MySQL?
Thanks.
You can use the UNIQUE constraint to enforce that gkey values never repeat; for more information, see the MySQL documentation on unique indexes.
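For example, a UNIQUE constraint on gkey rejects a duplicate key at insert time (sketched here with SQLite via Python; MySQL behaves the same with `ALTER TABLE ... ADD UNIQUE KEY`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE goods (id INTEGER PRIMARY KEY, "
             "gkey TEXT UNIQUE, goods INTEGER)")
conn.execute("INSERT INTO goods (gkey, goods) VALUES ('BOOK-1', 1)")

try:
    # Second insert with the same gkey violates the UNIQUE constraint.
    conn.execute("INSERT INTO goods (gkey, goods) VALUES ('BOOK-1', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

Note that UNIQUE only guarantees no duplicates; it does not generate the BOOK-n / PHONE-n values for you.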
I don't think you want to be doing this at all. Instead, good.gkey should be a foreign key to goods_category.id. If the insertion order is important, you can add an insert_date date field to the goods table.
From there you can run all sorts of queries, with the added bonus of having referential integrity on your tables.
First, you should do everything on the database side, so no PHP; that would be my advice.
Since MySQL has no sequences, here is what I would suggest:
SELECT COUNT(id)
FROM GOODS
WHERE GKEY LIKE('BOOK-%')
Then you insert
INSERT INTO goods (gkey, ...)
SELECT CONCAT('BOOK-', MAX(CAST(SUBSTRING(gkey, 6) AS UNSIGNED)) + 1), ...
FROM goods
WHERE gkey LIKE 'BOOK-%';
This way you will always have the next number available.
You can do it, but you need to be quite careful to make sure that two simultaneous inserts don't get given the same gkey. I'd design it with these two design points:
In the GoodsCategory table, have a counter which helps to generate the next gkey.
Use locks so that only one process can generate a new key at any one time.
Here are the details:
1. Add a couple of columns to the GoodsCategory table:
+----------------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+---------------+------+-----+---------+-------+
| ID | TINYINT | NO | PRI | | |
| NAME | VARCHAR(80) | NO | | | |
| KeyCode | CHAR(5) | NO | | | |
| NextID | INT | NO | | 0 | |
+----------------------+---------------+------+-----+---------+-------+
The KeyCode has 'BOOK' or 'PHONE', and the NextID field stores an int which is used to generate the next key of that type i.e. if the table looks like this:
+----------------+------------+---------+--------+
| ID | NAME | KeyCode | NextID |
+----------------+------------+---------+--------+
| 1 | book | BOOK | 3 |
| 2 | phone | PHONE | 7 |
+----------------+------------+---------+--------+
Then the next time you add a book, it should be given gkey 'BOOK-3', and NextID is incremented to 4.
2: The locking will need to be done in a stored routine, because it involves multiple statements in a transaction and uses local variables too. The core of it should look something like this:
START TRANSACTION;
SELECT KeyCode, NextID INTO v_kc, v_int FROM GoodsCategory WHERE ID = 2 FOR UPDATE;
SET v_newgkey = CONCAT(v_kc, '-', v_int);
INSERT INTO goods (gkey, goods, ...) VALUES (v_newgkey, 2, etc);
UPDATE GoodsCategory SET NextID = NextID + 1 WHERE ID = 2;
COMMIT;
The FOR UPDATE bit is crucial; have a look at the Usage Examples in the mysql manual where it discusses how to use locks to generate an ID without interference from another process.
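The same counter pattern can be sketched with SQLite via Python. SQLite has no FOR UPDATE, so BEGIN IMMEDIATE takes the write lock up front, which serves the same "one writer at a time" purpose (table and column names follow the answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.executescript("""
    CREATE TABLE GoodsCategory (ID INTEGER PRIMARY KEY, NAME TEXT,
                                KeyCode TEXT, NextID INTEGER);
    CREATE TABLE goods (id INTEGER PRIMARY KEY, gkey TEXT UNIQUE,
                        goods INTEGER);
    INSERT INTO GoodsCategory VALUES (1,'book','BOOK',3),
                                     (2,'phone','PHONE',7);
""")

def next_gkey(category_id):
    conn.execute("BEGIN IMMEDIATE")  # lock out concurrent writers
    kc, nid = conn.execute(
        "SELECT KeyCode, NextID FROM GoodsCategory WHERE ID = ?",
        (category_id,)).fetchone()
    gkey = f"{kc}-{nid}"
    conn.execute("INSERT INTO goods (gkey, goods) VALUES (?, ?)",
                 (gkey, category_id))
    conn.execute("UPDATE GoodsCategory SET NextID = NextID + 1 WHERE ID = ?",
                 (category_id,))
    conn.execute("COMMIT")
    return gkey

print(next_gkey(1), next_gkey(1), next_gkey(2))  # BOOK-3 BOOK-4 PHONE-7
```

Because the counter read, the insert, and the counter update all happen inside one locked transaction, two simultaneous inserts can never be handed the same gkey.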