The Situation
I have a full-stack web application with two MySQL tables: channel_strips and mic_lookup.
DESCRIBE `channel_strips`;
Field           Type           Null   Key
preset_id       varchar(127)   NO     PRI
mic_function    varchar(225)   YES
phantom_power   tinyint(1)     YES
...
DESCRIBE `mic_lookup`;
Field           Type           Null   Key
Microphone      varchar(40)    NO     PRI
mic_function    varchar(225)   NO
phantom_power   tinyint(1)     NO
...
(many other columns)
I want the channel_strips table to always hold only combinations of mic_function and phantom_power values that can be currently found in the mic_lookup table (or null values).
What is working
On the HTML end, I've limited the input to these columns in channel_strips with a <select> element that gets values from this mysqli query: SELECT DISTINCT `mic_function`, `phantom_power` FROM `mic_lookup`; This successfully restricts the user input.
The Problem
The one situation I've identified where this fails is when entries are deleted or changed in mic_lookup such that one pre-existing combination of mic_function and phantom_power is eliminated. As a result, channel_strips could still have a combination of the two columns that is actually no longer an option. In this situation, I'd like those two columns to be nullified on rows where they hold the old combination, essentially emulating an ON DELETE SET NULL statement as if it were a foreign key.
What I've tried
For a while, I had an intermediate table, mic_functions with a single column, mic_function, which served as a foreign key to both tables' mic_function columns. However, this was before I realized that phantom_power needed to be included. Furthermore, it was very confusing from a user's perspective, since intuitively you would want to set these values in the mic_lookup table.
My next idea was to create a view instead so I'd have a 'table' that automatically updates, and reference that as a foreign key - maybe something like...
CREATE VIEW mic_functions AS(
SELECT DISTINCT
`mic_function`,
`phantom_power`
FROM
`mic_lookup`
);
ALTER VIEW mic_functions ADD CONSTRAINT PK_mic_function PRIMARY KEY(`mic_function`, `phantom_power`);
Of course, this doesn't work. You can't add a primary key to a VIEW.
Finally, I suppose I could write a bunch of PHP to query and perform a series of checks on channel_strips every time the mic_lookup table is updated, and execute an appropriate UPDATE query if the checks are violated, but it seems to me like there ought to be a simpler way to handle this on the SQL side. Maybe checks or triggers, or a combination of the two, would work, but I have no experience with either.
Note
phantom_power is boolean, and I'm running MariaDB (version string 10.4.21-MariaDB)
I found a solution using triggers, by joining the tables and filtering down to where the joined primary key is null:
CREATE TRIGGER `mic_lookup_update` AFTER UPDATE
ON
`mic_lookup` FOR EACH ROW
UPDATE
`channel_strips` c
LEFT JOIN `mic_lookup` m ON
c.mic_function = m.mic_function AND c.phantom_power = m.phantom_power
SET
c.mic_function = NULL,
c.phantom_power = NULL
WHERE
m.Microphone IS NULL;
CREATE TRIGGER `mic_lookup_delete` AFTER DELETE
ON
`mic_lookup` FOR EACH ROW
UPDATE
`channel_strips` c
LEFT JOIN `mic_lookup` m ON
c.mic_function = m.mic_function AND c.phantom_power = m.phantom_power
SET
c.mic_function = NULL,
c.phantom_power = NULL
WHERE
m.Microphone IS NULL;
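A quick way to sanity-check the triggers (the microphone name here is hypothetical):

DELETE FROM `mic_lookup` WHERE `Microphone` = 'SM57';
-- If that was the last mic providing its (mic_function, phantom_power)
-- combination, the AFTER DELETE trigger nulls out matching rows:
SELECT `preset_id`, `mic_function`, `phantom_power` FROM `channel_strips`;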
Table names: Members (member_id PK), Member_Articles (articles_id PK)
I want to design it like below:
A Member can write Articles, and there should be a strong relationship: without a member_id, the Member_Articles table can't accept any inserts.
Sometime in the future, a member_id (e.g. member_id = 7) must be deleted.
However, articles written by that member_id (e.g. member_id = 7) must remain in the table.
I tried a PK/FK relationship. However, as you know, I then had to delete the Articles before deleting the Member.
How can I implement this situation?
When defining the relationship, use ON DELETE SET NULL on the FK, and keep an extra column, say deleted_member_id, defaulting to NULL. When you delete the user, either programmatically or with the help of a trigger, set deleted_member_id to the member_id; the delete will then set member_id to NULL. A sketch follows.
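A minimal sketch of that suggestion, using the table names from the question (the trigger name is an assumption):

CREATE TABLE Members (
    member_id INT PRIMARY KEY
);
CREATE TABLE Member_Articles (
    articles_id INT PRIMARY KEY,
    member_id INT NULL,
    deleted_member_id INT DEFAULT NULL,
    FOREIGN KEY (member_id) REFERENCES Members (member_id)
        ON DELETE SET NULL
);
-- Record the author id before the FK action nulls it out
DELIMITER $$
CREATE TRIGGER members_before_delete BEFORE DELETE ON Members
FOR EACH ROW BEGIN
    UPDATE Member_Articles
    SET deleted_member_id = OLD.member_id
    WHERE member_id = OLD.member_id;
END$$
DELIMITER ;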
Better Approach (in my view)
In my view, the better approach could be setting flags in each table.
For example: create a column status with default 1. When you delete any data, rather than deleting it, just set the status to 0; similarly, add an extra condition WHERE status = 1 to all your SELECTs, as in the sketch below.
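A minimal sketch of the flag approach, assuming the Members table from the question:

ALTER TABLE Members ADD COLUMN status TINYINT NOT NULL DEFAULT 1;
-- "Delete" a member by flagging the row instead of removing it
UPDATE Members SET status = 0 WHERE member_id = 7;
-- Every read then filters on the flag
SELECT * FROM Members WHERE status = 1;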
Make member_id NOT NULL in your Member_Articles table, and use ON DELETE NO ACTION to retain the articles (note that this retains them by blocking the member's deletion while articles still reference it).
My database contains a table of users. Every active user has a unique username. I'd like to be able to deactivate a user and free up the username they're using, but keep them in the same table.
Is there a way to only conditionally enforce the uniqueness constraint?
Add another column called something like isactive. Then create a unique constraint on (username, isactive).
Then you can have both an active and inactive user name at the same time. You will not be able to have two active user names.
If you want multiple inactive names, use NULL for the value of isactive. NULL values can be repeated in a unique index.
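A minimal sketch of this approach (table and column names are assumptions):

CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(50) NOT NULL,
    isactive TINYINT NULL DEFAULT 1,  -- NULL = inactive; NULLs may repeat
    UNIQUE KEY uq_username_isactive (username, isactive)
);
-- Deactivating frees the username, since NULL never collides in a unique index
UPDATE users SET isactive = NULL WHERE user_id = 42;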
No, a UNIQUE constraint can't be "conditional".
One option is to set the username column to NULL. The UNIQUE constraint will allow multiple rows with NULL value.
You could translate that to any string you want for display, either in the application or in SQL:
SELECT IFNULL(t.username,'USER DELETED') AS username
FROM mytable t
If you are retaining these rows for historical/archive purposes, you probably do NOT want to update the username column. (If you change the value of the username column, then a subsequent statement will be allowed to insert a row with the same value as the previous username.)
You could instead add an additional column to your table, to represent the "user deleted" condition. For example:
user_deleted TINYINT(1) UNSIGNED DEFAULT 0 COMMENT 'boolean'
You could check this column and return the 'USER DELETED' constant in place of the username column whenever the user_deleted boolean is set:
SELECT IF(u.user_deleted,'USER DELETED',u.username) AS username
(Use a value of 1 to indicate a logical "user deleted" condition.)
The big advantage of this approach is that the username column does NOT have to be modified: the username value is preserved, and the UNIQUE constraint will prevent a new row with a duplicate username from being inserted.
A different way to achieve the same result. It may not really be required for the question asked, but just for information:
1. Create a trigger on INSERT / UPDATE.
2. Check whether duplicate records are found with the current (NEW) record's values.
   a. This can be checked by counting duplicates, or by checking whether OTHER records exist with the same values but a different primary key.
3. If found, raise a SIGNAL to throw an error.
This is best suited if your condition for deciding uniqueness is complex. Also consider the performance cost.
Sample
DELIMITER $$
CREATE TRIGGER `my_trigger` BEFORE INSERT
ON `usertable`
FOR EACH ROW BEGIN
    IF EXISTS (SELECT 1 FROM usertable WHERE userid <> NEW.userid AND username = NEW.username AND isactive = 1) THEN
        SELECT CONCAT(NEW.username, ' exists!') INTO @error_text;
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = @error_text;
    END IF;
END$$
DELIMITER ;
(MySQL doesn't support a single trigger for both INSERT and UPDATE, so create a second, identical BEFORE UPDATE trigger.)
I would just create another (non-unique) field called FORMER_NAME and move the original name into that field when a user goes inactive. No need for a conditional uniqueness constraint, which isn't possible anyway.
Nope. If there is a unique index (hence the name), you cannot have duplicates. Either add an extra column to make each record unique, or change the value so it's unique.
Not recommended, but for example you could append a timestamp: "USER DELETED 2013/08/17:233805"
This is my solution when I met a similar problem:
Add a column inactive, so the unique key becomes (username, inactive).
For inactive: inactive = 0 means a user is active; inactive > 0 means a user is inactive.
When deactivating a user, just set inactive = user_id, not 1 as we usually do!
Now it allows duplicated usernames for inactive users, but only unique usernames for active users.
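A minimal sketch of this trick (table and column names are assumptions):

CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(50) NOT NULL,
    inactive INT NOT NULL DEFAULT 0,
    UNIQUE KEY uq_username_inactive (username, inactive)
);
-- Each deactivated row gets a distinct (username, inactive) pair,
-- so the same username can be deactivated any number of times
UPDATE users SET inactive = user_id WHERE user_id = 42;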
I expanded on @gordon-linoff's answer by adding a generated column, which provides the nullable functionality. I would rather have a true NOT NULL active column with definitive true and false values that I can read and write without confusion, and that won't get messed up by accidentally forgetting about this NULL behavior later on when writing code. So I compute a column with a specialized name and use that value in the constraint; I get the nullable unique-active behavior but can use the active column as I wish.
isactive BOOL NOT NULL,
_isactive_constraint_key_ BOOL AS (CASE WHEN isactive IS TRUE THEN TRUE END),
CONSTRAINT active_user UNIQUE (username, _isactive_constraint_key_)
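For context, a complete table using this fragment might look like the following (the users table and other columns are assumptions; requires generated-column support, MySQL 5.7+ or MariaDB 10.2+):

CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(50) NOT NULL,
    isactive BOOL NOT NULL,
    _isactive_constraint_key_ BOOL AS (CASE WHEN isactive IS TRUE THEN TRUE END),
    CONSTRAINT active_user UNIQUE (username, _isactive_constraint_key_)
);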
The previous table this data was stored in approached 3-4 GB, but the data wasn't compressed before/after storage. I'm not a DBA, so I'm a little out of my depth when it comes to a good strategy.
The table is to log changes to a particular model in my application (user profiles), but with one tricky requirement: we should be able to fetch the state of a profile at any given date.
Data (single table):
id, username, email, first_name, last_name, website, avatar_url, address, city, zip, phone
The only two requirements:
be able to fetch a list of changes for a given model
be able to fetch state of model on a given date
Previously, all of the profile data was stored for a single change, even if only one column was changed. But to get a 'snapshot' for a particular date was easy enough.
My first couple of solutions in optimising the data structure:
(1) only store changed columns. This would drastically reduce the data stored, but would make it quite complicated to get a snapshot. I'd have to merge all changes up to a given date (could be thousands), then apply them to a model. But that model couldn't be a fresh model, since only changed data is stored; I'd have to first copy all data over from the current profiles table, then apply the changes to those base models to get a snapshot.
(2) store the whole of the data, but convert it to a compressed format like gzip or binary or whatnot. This would remove the ability to query the data other than to obtain changes. I couldn't, for example, fetch all changes where email = ''. I would essentially have a single column of converted data storing the whole of the profile.
Then, I would want to use relevant MySQL table options, like the ARCHIVE storage engine, to further reduce space.
So my question is, are there any other options which you feel are a better approach than 1/2 above, and, if not, which would be better?
First of all, I wouldn't worry at all about a 3GB table (unless it grew to this size in a very short period of time). MySQL can take it. Space shouldn't be a concern, keep in mind that a 500 GB hard disk costs about 4 man-hours (in my country).
That being said, in order to lower your storage requirements, create one table for each field of the table you want to monitor. Assuming a profile table like this:
CREATE TABLE profile (
profile_id INT PRIMARY KEY,
username VARCHAR(50),
email VARCHAR(50) -- and so on
);
... create two history tables:
CREATE TABLE profile_history_username (
profile_id INT NOT NULL,
username VARCHAR(50) NOT NULL, -- same type as profile.username
changedAt DATETIME NOT NULL,
PRIMARY KEY (profile_id, changedAt),
CONSTRAINT profile_id_username_fk
FOREIGN KEY profile_id_fkx (profile_id)
REFERENCES profile(profile_id)
);
CREATE TABLE profile_history_email (
profile_id INT NOT NULL,
email VARCHAR(50) NOT NULL, -- same type as profile.email
changedAt DATETIME NOT NULL,
PRIMARY KEY (profile_id, changedAt),
CONSTRAINT profile_id_fk
FOREIGN KEY profile_id_email_fkx (profile_id)
REFERENCES profile(profile_id)
);
Every time you change one or more fields in profile, log the change in each relevant history table:
START TRANSACTION;
-- lock all tables
SELECT @now := NOW()
FROM profile
JOIN profile_history_email USING (profile_id)
WHERE profile_id = [a profile_id]
FOR UPDATE;
-- update main table, log change
UPDATE profile SET email = [new email] WHERE profile_id = [a profile_id];
INSERT INTO profile_history_email VALUES ([a profile_id], [new email], @now);
COMMIT;
You may also want to set appropriate AFTER triggers on profile so as to populate the history tables automatically.
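A hedged sketch of such a trigger for the email field, under the schema above (the trigger name is an assumption):

DELIMITER $$
CREATE TRIGGER profile_email_history AFTER UPDATE ON profile
FOR EACH ROW BEGIN
    -- <=> is MySQL's NULL-safe equality; log only actual changes
    IF NOT (NEW.email <=> OLD.email) THEN
        INSERT INTO profile_history_email (profile_id, email, changedAt)
        VALUES (NEW.profile_id, NEW.email, NOW());
    END IF;
END$$
DELIMITER ;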
Retrieving history information should be straightforward. In order to get the state of a profile at a given point in time, use this query:
SELECT
(
SELECT username FROM profile_history_username
WHERE profile_id = [a profile_id] AND changedAt = (
SELECT MAX(changedAt) FROM profile_history_username
WHERE profile_id = [a profile_id] AND changedAt <= [snapshot date]
)
) AS username,
(
SELECT email FROM profile_history_email
WHERE profile_id = [a profile_id] AND changedAt = (
SELECT MAX(changedAt) FROM profile_history_email
WHERE profile_id = [a profile_id] AND changedAt <= [snapshot date]
)
) AS email;
You can't compress the data without having to uncompress it in order to search it, which is going to severely damage the performance. If the data really is changing that often (i.e. more than an average of 20 times per record), then it would be more efficient for storage and retrieval to structure it as a series of changes:
Consider:
CREATE TABLE profile (
    id INT NOT NULL AUTO_INCREMENT,
    PRIMARY KEY (id)
);
CREATE TABLE profile_data (
    profile_id INT NOT NULL,
    attr ENUM('username', 'email', 'first_name'
        , 'last_name', 'website', 'avatar_url'
        , 'address', 'city', 'zip', 'phone') NOT NULL,
    value VARCHAR(255),
    starttime DATETIME DEFAULT CURRENT_TIMESTAMP,
    endtime DATETIME,
    PRIMARY KEY (profile_id, attr, starttime),
    INDEX (profile_id),
    FOREIGN KEY (profile_id) REFERENCES profile(id)
);
When you add a new value for an existing record, set an endtime in the masked (superseded) record, as sketched below.
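A minimal sketch of that two-step change, using the schema above (the profile id and values are hypothetical):

SET @now = NOW();
-- Close out the current email value
UPDATE profile_data
SET endtime = @now
WHERE profile_id = 1 AND attr = 'email' AND endtime IS NULL;
-- Record the new value starting at the same instant
INSERT INTO profile_data (profile_id, attr, value, starttime)
VALUES (1, 'email', 'new@example.com', @now);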
Then to get the value at a date $T:
SELECT p.id, attr, value
FROM profile p
INNER JOIN profile_data d
ON p.id=d.profile_id
WHERE $T>=starttime
AND $T<=IF(endtime IS NULL,$T, endtime);
Alternatively, just have a start time, and:
SELECT p.id, attr, value
FROM profile p
INNER JOIN profile_data d
ON p.id=d.profile_id
WHERE $T>=starttime
AND NOT EXISTS (SELECT 1
FROM profile_data d2
WHERE d2.profile_id=d.profile_id
AND d2.attr=d.attr
AND d2.starttime>d.starttime
AND d2.starttime>$T);
(which will be even faster with the MAX-CONCAT trick; one sketch of it follows).
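One reading of that trick, as a sketch: pack starttime and value into one string, take the MAX per group, then peel the value back off. It assumes DATETIME always renders at a fixed 19-character width:

SELECT d.profile_id, d.attr,
       SUBSTRING(MAX(CONCAT(d.starttime, d.value)), 20) AS value
FROM profile_data d
WHERE d.starttime <= '2013-07-01'
GROUP BY d.profile_id, d.attr;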
But if the data is not changing with that frequency then keep it in the current structure.
You need a slowly changing dimension:
I will do this only for email and telephone so you get the idea. Pay attention to the fact that I use two keys: one unique within the table, and another that is unique to the user it concerns. That is, the table key identifies the record, and the user key identifies the user:
table_id, user_id, email,                telephone, created_at, inactive_at, is_current
1,        1,       mario@yahoo.it,       123456,    2012-01-02, 2013-04-01,  no
2,        2,       erik@telecom.de,      123457,    2012-01-03, 2013-02-28,  no
3,        3,       vanessa@o2.de,        1234568,   2012-01-03, null,        yes
4,        2,       erik@telecom.de,      123459,    2012-02-28, null,        yes
5,        1,       super.mario@yahoo.it, 654321,    2013-04-01, 2013-04-02,  no
6,        1,       super.mario@yahoo.it, 123456,    2013-04-02, null,        yes
most recent state of the database
select * from FooTable where inactive_at is null
or
select * from FooTable where is_current = 'yes'
All changes to mario (mario is user_id 1)
select * from FooTable where user_id = 1;
All changes between 1 jan 2013 and 1 of may 2013
select * from FooTable where created_at between '2013-01-01' and '2013-05-01';
and if you need to compare with the old versions (with the help of a stored procedure, Java, or PHP code... you choose):
select * from FooTable where inactive_at between '2013-01-01' and '2013-05-01';
if you want, you can do a fancy SQL statement:
select f1.table_id, f1.user_id,
    case when f1.email = f2.email then 'NO_CHANGE' else concat(f1.email, ' -> ', f2.email) end,
    case when f1.telephone = f2.telephone then 'NO_CHANGE' else concat(f1.telephone, ' -> ', f2.telephone) end
from FooTable f1 inner join FooTable f2
    on (f1.user_id = f2.user_id)
where f2.created_at in
    (select max(f3.created_at) from FooTable f3
     where f3.user_id = f1.user_id and f3.created_at < f1.created_at)
and f1.created_at between '2013-01-01' and '2013-05-01';
As you can see, a juicy query, comparing each row with the previous row for the same user...
the state of the database on 2013-03-01
select * from FooTable where table_id in
(select max(table_id) from FooTable where inactive_at <= '2013-03-01' group by user_id
union
select table_id from FooTable where inactive_at is null group by user_id having count(table_id) = 1);
I think this is the easiest way to implement what you want... you could implement a multi-million-table relational model, but then it would be a pain in the arse to query it.
Your database is not big enough; I work every day with an even bigger one. Now tell me: is the money you save on a new server worth the time you spend on a super-complex relational model?
BTW, if the data changes too fast, this approach cannot be used...
BONUS: optimization:
create indexes on created_at, inactive_at, user_id, and the pair (sketched after this list)
use partitioning (both horizontal and vertical)
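A sketch of the suggested indexes, assuming "the pair" means (user_id, created_at); the index names are made up:

CREATE INDEX idx_foo_created ON FooTable (created_at);
CREATE INDEX idx_foo_inactive ON FooTable (inactive_at);
CREATE INDEX idx_foo_user ON FooTable (user_id);
CREATE INDEX idx_foo_user_created ON FooTable (user_id, created_at);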
If you put all occurring changes in different tables, then when you later need an instance at some date, you join them and select by comparing dates. For example, if you want an instance as of the 1st of July, run a query with the condition that the date is equal to or less than the 1st of July, order it descending, and limit the count to 1; that way the joins produce exactly the instance as it was on the 1st of July. In this manner you can even figure out the most frequently updated module.
Also, if you want to keep all the data flat, try range partitioning on the basis of month; that way MySQL will handle it pretty easily (see the sketch below).
Note: by date I mean storing the unix timestamp of the date; it's much easier to compare.
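A hedged sketch of monthly RANGE partitioning over a unix-timestamp column, as suggested above (table and column names are made up; the partition column must be part of every unique key, hence the composite primary key):

CREATE TABLE profile_changes (
    id INT NOT NULL AUTO_INCREMENT,
    changed_at INT NOT NULL,  -- unix timestamp
    attr VARCHAR(32),
    value VARCHAR(255),
    PRIMARY KEY (id, changed_at)
)
PARTITION BY RANGE (changed_at) (
    PARTITION p2013_06 VALUES LESS THAN (1372636800),  -- 2013-07-01 UTC
    PARTITION p2013_07 VALUES LESS THAN (1375315200),  -- 2013-08-01 UTC
    PARTITION pmax VALUES LESS THAN MAXVALUE
);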
I'll offer one more solution just for variety.
Schema
PROFILE
id INT PRIMARY KEY,
username VARCHAR(50) NOT NULL UNIQUE
PROFILE_ATTRIBUTE
id INT PRIMARY KEY,
profile_id INT NOT NULL FOREIGN KEY REFERENCES PROFILE (id),
attribute_name VARCHAR(50) NOT NULL,
attribute_value VARCHAR(255) NULL,
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
replaced_at DATETIME NULL
For all attributes you are tracking, simply add PROFILE_ATTRIBUTE records when they are updated, and mark the previous attribute record with the DATETIME it was replaced at.
Select Current Profile
SELECT *
FROM PROFILE p
LEFT JOIN PROFILE_ATTRIBUTE pa
ON p.id = pa.profile_id
WHERE p.username = 'username'
AND pa.replaced_at IS NULL
Select Profile At Date
SELECT *
FROM PROFILE p
LEFT JOIN PROFILE_ATTRIBUTE pa
ON p.id = pa.profile_id
WHERE p.username = 'username'
AND pa.created_at < '2013-07-01'
AND '2013-07-01' <= IFNULL(pa.replaced_at, NOW())
When Updating Attributes
Insert the new attribute
Update the previous attribute's replaced_at value
It would probably be important that the created_at for a new attribute match the replaced_at of the corresponding old attribute, so that there is an unbroken timeline of attribute values for a given attribute name.
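A minimal sketch of that two-step update, reusing one timestamp so the timeline stays unbroken (ids and values are hypothetical):

SET @now = NOW();
-- Close out the old attribute record
UPDATE PROFILE_ATTRIBUTE
SET replaced_at = @now
WHERE profile_id = 1 AND attribute_name = 'email' AND replaced_at IS NULL;
-- Insert the replacement with a matching created_at
INSERT INTO PROFILE_ATTRIBUTE (profile_id, attribute_name, attribute_value, created_at)
VALUES (1, 'email', 'new@example.com', @now);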
Advantages
Simple two-table architecture (I personally don't like a table-per-field approach)
Can add additional attributes with no schema changes
Easily mapped into ORM systems, assuming an application lives on top of this database
Could easily see the history for a certain attribute_name over time.
Disadvantages
Integrity is not enforced. For example, the schema doesn't restrict on multiple NULL replaced_at records with the same attribute_name... perhaps this could be enforced with a two-column UNIQUE constraint
Let's say you add a new field in the future. Existing profiles would not select a value for the new field until they save a value to it. This is opposed to the value coming back as NULL if it were a column. This may or may not be an issue.
If you use this approach, be sure you have indexes on the created_at and replaced_at columns.
There may be other advantages or disadvantages. If commenters have input, I'll update this answer with more information.
I've created an insert-only table for the purpose of speed and maintaining a history. Its structure is very generic, and is as follows:
CREATE TABLE `kvtable` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `user_id` bigint(20) unsigned NOT NULL,
  `property` varchar(32) NOT NULL,
  `value` longblob NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
It's simply a key/value table with a user_id assigned to it. This approach has its advantages as not all users have the same properties, so fields aren't wasted in a table. Also, it allows for a rolling log of changes, since I can see every change to a particular property ever made by a user.
Now, since no deletes or updates ever occur in this table, I can assume that the greatest id will always be the newest entry.
However, I want to select multiple properties at once, for example 'address1', 'address2', 'city', 'state', and I want each to be the entry of its type with the highest id.
So, if they have changed their 'state' property 8 times, and 'city' property 4 times, then I'd only want a SELECT to return the latest of each (1 state and 1 city).
I'm not sure this can even be done efficiently with this type of a table, so I'm open to different table approaches.
Please let me know if I need to provide any more information or clarify my question.
===
I tried the following, but there could be 3 rows of 'address1' changes after the last 'address2' change. Perhaps using a GROUP BY would work?
SELECT property, value FROM kvtable WHERE user_id = 1 AND (property = 'address1' OR property = 'address2') ORDER BY id
Assuming your ids are incremental integers and you have not manually specified them out of order, you can do this with a few MAX() aggregates in a subquery. The point of the subquery is to return the latest entry per property name, per user. That is joined against the whole table to pull in the associated property values. Essentially, the subquery discards all rows which don't have a max(id) per group.
SELECT kvtable.*
FROM
kvtable
JOIN (
SELECT
MAX(id) AS id,
user_id,
property
FROM kvtable
/* optionally limit by user_id */
WHERE user_id = <someuser>
GROUP BY user_id, property
) maxids ON kvtable.id = maxids.id