Data change history with audit tables: Grouping changes - mysql

Let's say I want to store users and groups in a MySQL database. They have an n:m relation. To keep track of all changes, each table has an audit table: user_journal, group_journal and user_group_journal. MySQL triggers copy the current record to the journal table on each INSERT or UPDATE. (DELETEs are not supported, because I would need to know which application user deleted the record; instead there is a flag active that is set to 0 in place of a deletion.)
My question/problem is: assume I am adding 10 users to a group at once. When I later click through the history of that group in the application's user interface, I want to see the addition of those 10 users as one step, not as 10 independent steps. Is there a good way to group such changes together? Maybe it is possible to have a counter that is incremented each time the trigger is ... triggered? I have never worked with triggers.
The best solution would be to group together all changes made within a transaction. So when the user updates the name of the group and adds 10 users in one step (one form controller call), this would be one step in the history. Maybe it is possible to define a random hash or increment a global counter each time a transaction is started, and access this value in the trigger?
I don't want to make the table design more complex than one journal table per "real" table. I don't want to add a transaction hash to each "real" table (in the audit tables it would be fine, of course). I would also prefer a solution in the database, not in the application.

I played around a bit and found a solution that works well:
The Database setup
# First of all I create the database and the basic table:
DROP DATABASE IF EXISTS `mytest`;
CREATE DATABASE `mytest`;
USE `mytest`;
CREATE TABLE `test` (
`id` INT PRIMARY KEY AUTO_INCREMENT,
`something` VARCHAR(255) NOT NULL
);
# Then I add an audit table to the database:
CREATE TABLE `audit_trail_test` (
`_id` INT PRIMARY KEY AUTO_INCREMENT,
`_revision_id` VARCHAR(255) NOT NULL,
`id` INT NOT NULL,
`something` VARCHAR(255) NOT NULL
);
# I added a field _revision_id to it. This is
# the ID that groups together all changes a
# user made within a request of that web
# application (written in PHP). So we need a
# third table to store the time and the user
# that made the changes of that revision:
CREATE TABLE `audit_trail_revisions` (
`id` INT PRIMARY KEY AUTO_INCREMENT,
`user_id` INT NOT NULL,
`time` DATETIME NOT NULL
);
# Now we need a procedure that creates a
# record in the revisions table each time an
# insert or update trigger is called.
DELIMITER $$
CREATE PROCEDURE create_revision_record()
BEGIN
IF @revision_id IS NULL THEN
INSERT INTO `audit_trail_revisions`
(user_id, `time`)
VALUES
(@user_id, @time);
SET @revision_id = LAST_INSERT_ID();
END IF;
END$$
# It checks whether the user-defined variable
# @revision_id is set and, if not, creates
# the row and stores the generated ID (auto
# increment) into that variable.
#
# Next I wrote the two triggers:
CREATE TRIGGER `test_insert` AFTER INSERT ON `test`
FOR EACH ROW BEGIN
CALL create_revision_record();
INSERT INTO `audit_trail_test`
(
id,
something,
_revision_id
)
VALUES
(
NEW.id,
NEW.something,
@revision_id
);
END$$
CREATE TRIGGER `test_update` AFTER UPDATE ON `test`
FOR EACH ROW BEGIN
CALL create_revision_record();
INSERT INTO `audit_trail_test`
(
id,
something,
_revision_id
)
VALUES
(
NEW.id,
NEW.something,
@revision_id
);
END$$
DELIMITER ;
The application code (PHP)
$iUserId = 42;
$Database = new \mysqli('localhost', 'root', 'root', 'mytest');
if (!$Database->query('SET @user_id = ' . $iUserId . ', @time = NOW()'))
die($Database->error);
if (!$Database->query('INSERT INTO `test` VALUES (NULL, "foo")'))
die($Database->error);
if (!$Database->query('UPDATE `test` SET `something` = "bar"'))
die($Database->error);
// To simulate a second request we close the connection,
// sleep 2 seconds and create a second connection.
$Database->close();
sleep(2);
$Database = new \mysqli('localhost', 'root', 'root', 'mytest');
if (!$Database->query('SET @user_id = ' . $iUserId . ', @time = NOW()'))
die($Database->error);
if (!$Database->query('UPDATE `test` SET `something` = "baz"'))
die($Database->error);
And … the result
mysql> select * from test;
+----+-----------+
| id | something |
+----+-----------+
| 1 | baz |
+----+-----------+
1 row in set (0.00 sec)
mysql> select * from audit_trail_test;
+-----+--------------+----+-----------+
| _id | _revision_id | id | something |
+-----+--------------+----+-----------+
| 1 | 1 | 1 | foo |
| 2 | 1 | 1 | bar |
| 3 | 2 | 1 | baz |
+-----+--------------+----+-----------+
3 rows in set (0.00 sec)
mysql> select * from audit_trail_revisions;
+----+---------+---------------------+
| id | user_id | time |
+----+---------+---------------------+
| 1 | 42 | 2013-02-03 17:13:20 |
| 2 | 42 | 2013-02-03 17:13:22 |
+----+---------+---------------------+
2 rows in set (0.00 sec)
Please let me know if there is a point I missed. I will have to add an action column to the audit tables to be able to record deletions.
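To make the lifetime of the user variables explicit, here is a minimal Python sketch of the same grouping behaviour (the dicts and list are hypothetical stand-ins for connections and the revisions table, not part of the MySQL setup):

```python
# Plain dicts stand in for per-connection MySQL user variables
# (@user_id, @revision_id); the list stands in for audit_trail_revisions.
revisions = []  # rows: (id, user_id)

def create_revision_record(session):
    # Mirrors the stored procedure: create the revision row only once per
    # connection, on the first audited write; later writes reuse the id.
    if session.get("revision_id") is None:
        revisions.append((len(revisions) + 1, session["user_id"]))
        session["revision_id"] = len(revisions)
    return session["revision_id"]

conn1 = {"user_id": 42}
first = [create_revision_record(conn1) for _ in range(2)]  # INSERT + UPDATE

conn2 = {"user_id": 42}  # second request: fresh connection, fresh variables
second = create_revision_record(conn2)

print(first, second)  # [1, 1] 2
```

Both writes on the first connection share revision 1; the second connection starts with unset variables and therefore gets revision 2, exactly as in the result tables above.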

Assuming your rate of adding a batch of users to a group is less than once a second....
I would suggest simply adding a column of type timestamp named something like added_timestamp to user_group and user_group_journal. DO NOT make this an auto-update timestamp or default it to CURRENT_TIMESTAMP; instead, when your code inserts a batch into user_group, calculate the current date and time once and manually set it on all the new user_group records.
You may need to tweak your setup so that the new field is copied, along with the rest of the new user_group record, into the user_group_journal table.
Then you could create a query/view that groups on group_id and the new added_timestamp column.
If more fidelity than 1 second is needed, you could use a string column and populate it with a string representation of a more granular time (generated however the libraries of your language allow).
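The batching idea can be sketched in Python (the table name user_group_journal follows the answer; the list is a hypothetical in-memory stand-in for it):

```python
from datetime import datetime
from itertools import groupby

# Hypothetical in-memory stand-in for the user_group_journal table.
journal = []

def add_users_to_group(group_id, user_ids):
    # Compute the batch timestamp once, so every row of the batch shares it
    # (in real code this would be datetime.now(), set by the application).
    batch_ts = datetime(2013, 2, 3, 17, 13, 20)
    for uid in user_ids:
        journal.append({"group_id": group_id, "user_id": uid,
                        "added_timestamp": batch_ts})

add_users_to_group(7, [1, 2, 3])

# Group journal rows into history "steps" by (group_id, added_timestamp).
key = lambda r: (r["group_id"], r["added_timestamp"])
steps = [list(rows) for _, rows in groupby(sorted(journal, key=key), key=key)]
print(len(steps), len(steps[0]))  # 1 3
```

Because all three rows carry the same timestamp, the grouping collapses the batch into a single history step.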


MySQL 8 - Trigger on INSERT - duplicate AUTO_INCREMENT id for VCS

Trying to
create a trigger that fires on INSERT and sets originId = id (the AUTO_INCREMENT value),
I've used the SQL suggested here in the 1st block:
CREATE TRIGGER insert_example
BEFORE INSERT ON notes
FOR EACH ROW
SET NEW.originId = (
SELECT AUTO_INCREMENT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME = 'notes'
);
Due to information_schema caching I have also set
information_schema_stats_expiry = 0
in my.cnf. Now the information gets updated almost instantly on every INSERT, as far as I can tell.
But when performing "direct" INSERTs via the console at ~2 min intervals, I keep getting stale AUTO_INCREMENT values in originId.
(They should be equal to the id fields.)
Explicit queries fetching AUTO_INCREMENT, on the other hand, return the updated, correct values.
Thus I suspect that the result of the SELECT AUTO_INCREMENT... subquery gets somehow.. what? cached?
How can one get around this?
Thank you.
Edit 1
I intended to implement sort of VCS this way:
A user creates a new Note; the app marks it as 'new' and performs an INSERT into the MySQL table. It is the "origin" note.
Then the user might edit this Note (completely) in the UI; the app will mark it as 'update' and INSERT it into the MySQL table as a new row, again. But this time originId should be filled with the id of the "origin" Note (by app logic). And so on.
This allows PARTITIONing by originId on SELECT, fetching only latest versions to UI.
initial Problem:
If originId of the "origin" Note is NULL, MySQL 8 window function(s) in the default (and only?) RESPECT NULLS mode perform(s) framing not as expected ("well, duh, it's all about your NULLs in the grouping-by column").
supposed Solution:
Set originId of "origin" Notes to id on their initial (and only) INSERT, expecting 2 benefits:
Easily fetch "origin" Notes via originId = id,
perform correct PARTITION by originId.
resulting Problem:
id is AUTO_INCREMENT, so there's no way (known to me) of getting its new value (for the new row) on INSERT via the backend (namely, PHP).
supposed Solution:
So, I was hoping to find some MySQL mechanism to solve this (avoiding manipulations with id field) and TRIGGERs seemed a right way...
Edit 2
I believed that automatically duplicating the id AUTO_INCREMENT field (or any field) within MySQL would be extra fast and super easy, but it totally doesn't appear so now.
So, possibly, a better way is to have a vcsGroupId UNSIGNED INT field, responsible for "relating" a Note's versions:
On create and "origin" INSERT - fill it with MAX(vcsGroupId) + 1,
On edit and "version" INSERT - fill it with the "sibling"/"origin" vcsGroupId value (fetched with a CTE),
On view and "normal" SELECT - perform framing with a window function: PARTITION BY vcsGroupId, ORDER BY id or timestamp DESC, then just use the 1st row (or order ascending and use the last),
On view and "origin" SELECT - almost the same, but reversed.
It seems easier, doesn't it?
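The framing described in Edit 2 can be sketched in Python (hypothetical data; the dict plays the role of the window-function partition):

```python
# Hypothetical notes table: (id, vcs_group_id, text); all versions of one
# Note share vcs_group_id, as proposed in Edit 2.
notes = [
    (1, 1, "origin A"),
    (2, 2, "origin B"),
    (3, 1, "edit of A"),
    (4, 1, "second edit of A"),
]

# Latest version per group: the highest id wins within each vcs_group_id,
# mirroring PARTITION BY vcsGroupId ORDER BY id DESC and taking row 1.
latest = {}
for note_id, group_id, text in notes:
    if group_id not in latest or note_id > latest[group_id][0]:
        latest[group_id] = (note_id, text)

print(sorted(latest.values()))  # [(2, 'origin B'), (4, 'second edit of A')]
```

No NULLs are involved, so the grouping column is always populated, which is exactly what makes the partitioning well behaved.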
What you are doing is playing with fire. I don't know exactly what can go wrong with your trigger (besides the fact that it already doesn't work for you), but I have a strong feeling that many things can and will go wrong. For example: what if you insert multiple rows in a single statement? I don't think the engine will update the information_schema for each row. And it's going to be even worse if you run an INSERT ... SELECT statement. So using the information_schema for this task is a very bad idea.
However - the first question is: why do you need it at all? If you need to save the "origin ID", then you probably plan to update the id column. That is already a bad idea. And assuming you find a way to solve your problem - what guarantees that the originId will not be changed outside the trigger?
However - the alternative is to keep the originId column blank on insert, and update it in an UPDATE trigger instead.
Assuming this is your table:
create table vcs_test(
id int auto_increment,
origin_id int null default null,
primary key (id)
);
Use the UPDATE trigger to save the origin ID, when it is changed for the first time:
delimiter //
create trigger vcs_test_before_update before update on vcs_test for each row begin
if new.id <> old.id then
set new.origin_id = coalesce(old.origin_id, old.id);
end if;
end//
delimiter ;
Your SELECT query would then be something like this:
select *, coalesce(origin_id, id) as origin_id from vcs_test;
See demo on db-fiddle
You can even save the full id history with the following schema:
create table vcs_test(
id int auto_increment,
id_history text null default null,
primary key (id)
);
delimiter //
create trigger vcs_test_before_update before update on vcs_test for each row begin
if new.id <> old.id then
set new.id_history = concat_ws(',', old.id_history, old.id);
end if;
end//
delimiter ;
The following test
insert into vcs_test (id) values (null), (null), (null);
update vcs_test set id = 5 where id = 2;
update vcs_test set id = 4 where id = 5;
select *, concat_ws(',', id_history, id) as full_id_history
from vcs_test;
will return
| id | id_history | full_id_history |
| --- | ---------- | --------------- |
| 1 | | 1 |
| 3 | | 3 |
| 4 | 2,5 | 2,5,4 |
View on DB Fiddle
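How the id_history column accumulates can be traced with a small Python sketch (concat_ws is reimplemented here to mirror MySQL's NULL-skipping behaviour):

```python
def concat_ws(sep, *vals):
    # Reimplements MySQL's CONCAT_WS for the sketch: NULL (None) is skipped.
    return sep.join(str(v) for v in vals if v is not None)

# Simulate the BEFORE UPDATE trigger firing on two id changes: 2 -> 5 -> 4.
id_history = None
for old_id in (2, 5):
    id_history = concat_ws(",", id_history, old_id)

current_id = 4
print(id_history)                              # 2,5
print(concat_ws(",", id_history, current_id))  # 2,5,4
```

The first change turns NULL into "2" (CONCAT_WS drops the NULL rather than emitting a leading separator), and each later change appends the old id, so the SELECT only has to append the current id to get the full history.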

How to insert a record into table that has a user id as a foreign key

My database structure is like so:
Table 1: customers
| userid | username | password | email |
| 1 | bob | mypassword123 | bob@gmail.com |
Please note that 'userid' is a primary key in this table
Table 2: accountbalance
| userid | balance |
| 1 | 100 |
Please note that 'userid' in accountbalance table is a foreign key to the 'userid' field in customers table.
When a new account is created, I not only want a new row created in customers, I also want a corresponding row created in accountbalance with a starting value of 100 ($100). The problem is: how do I know what the userid is?
I thought about running a query to look up the id using the username and then doing an INSERT INTO statement on accountbalance. Would that work? Can I get a general outline?
It depends which DB you use. MySQL has the LAST_INSERT_ID() function, which you can call after your insert (just run SELECT LAST_INSERT_ID()) to get the id of the last inserted row (provided your id is defined as AUTO_INCREMENT). Postgres allows you to perform an insert returning the id, something like INSERT INTO customers (...) VALUES (...) RETURNING userid.
But as you mentioned, if your username is unique, I would select by that attribute after the insert, because it is DB independent.
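The last-insert-id pattern is easy to try with Python's built-in sqlite3, where the equivalent of LAST_INSERT_ID() is cursor.lastrowid (SQLite is used here only for illustration; the table layout follows the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    userid INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT, password TEXT, email TEXT
);
CREATE TABLE accountbalance (
    userid INTEGER REFERENCES customers(userid),
    balance INTEGER
);
""")

cur = conn.execute(
    "INSERT INTO customers (username, password, email) VALUES (?, ?, ?)",
    ("bob", "mypassword123", "bob@gmail.com"))
new_id = cur.lastrowid  # id generated for the row we just inserted

conn.execute("INSERT INTO accountbalance (userid, balance) VALUES (?, 100)",
             (new_id,))
print(conn.execute("SELECT userid, balance FROM accountbalance").fetchone())
```

The second insert never has to look the id up by username; it reuses the generated key directly.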
You can use a MySQL trigger:
delimiter |
CREATE TRIGGER temp AFTER INSERT ON customers
FOR EACH ROW
BEGIN
INSERT INTO accountbalance SET userid = NEW.userid;
END;
|
delimiter ;
ref: https://dev.mysql.com/doc/refman/8.0/en/trigger-syntax.html
Set the default value of the balance field in the accountbalance table to 100 when creating the table, or edit it later.

selecting a column from multiple tables in mysql

TABLE 1
+-----+-------+-------+-------+
| uid | color | brand | model |
+-----+-------+-------+-------+
|  10 |     1 |     2 |     1 |
+-----+-------+-------+-------+
TABLE 2
+-----+----------+-------+-------+
| uid | quantity | model | color |
+-----+----------+-------+-------+
|  25 |        2 |     2 |     1 |
+-----+----------+-------+-------+
I have many tables like this where the uid column is present in every table. I have a value in a variable, say var1 = 25. I want to check whether var1 matches any uid value in any table. If it matches, I want to print the table name. Can anyone help me with this?
I tried doing this and I found
SELECT `COLUMN_NAME`
FROM `INFORMATION_SCHEMA`.`COLUMNS`
WHERE `TABLE_SCHEMA`='yourdatabasename'
AND `TABLE_NAME`='yourtablename';
But this is not giving what I want, since I want to select all the tables in a database irrespective of the table name. If in future any table is added, it should also get selected.
First of all, the information_schema doesn't contain the actual row data of your tables.
I suggest you consider a different design.
A. Make a meta table and use triggers (attached to the base tables) to maintain the meta table.
CREATE TABLE meta_table (
id INT AUTO_INCREMENT PRIMARY KEY,
uid INT,
table_name VARCHAR(50)
);
# When you need to add new table (table 3)
CREATE TABLE table_3 (
uid INT,
field1 INT,
field2 INT,
field3 INT
);
DELIMITER $$
CREATE TRIGGER table_3_insert
AFTER INSERT ON table_3
FOR EACH ROW
BEGIN
INSERT INTO meta_table (uid, table_name)
VALUES (NEW.uid, 'table_3');
END$$
DELIMITER ;
# If data in `table_3` might be changed or deleted,
# then create trigger for `delete` and `update`
B. Use only one table with unstructured field and parse data field in your application
CREATE TABLE `table` (
uid INT,
table_type INT,
data VARCHAR(255)
);
INSERT INTO `table` VALUES (10, 1, '{"color":1,"brand":2,"model":1}');
INSERT INTO `table` VALUES (25, 2, '{"quantity":2,"model":2,"color":1}');
As you mentioned that "any table can be added" often, I strongly recommend solution B. Frequently changing the schema (creating tables) is not good design.
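Solution B's application-side parsing can be sketched in Python (the tuples mirror the two inserts above; json.loads does the work that the schema no longer does):

```python
import json

# Rows as stored in the single-table design (option B); the data column
# holds JSON text that the application parses.
rows = [
    (10, 1, '{"color":1,"brand":2,"model":1}'),
    (25, 2, '{"quantity":2,"model":2,"color":1}'),
]

var1 = 25
record = None
for uid, table_type, data in rows:
    if uid == var1:
        record = json.loads(data)  # parse the unstructured field in the app
        print("found uid in table_type", table_type)

print(record["model"])  # 2
```

The table_type column answers the "which table was it in" question, while the JSON field carries the per-type attributes.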

Which update is faster using join or sequential?

This question is a follow-up to my previous question, which required updating the same table on deletion of a row.
I could write two solutions using a stored procedure instead of a trigger or nested query.
Both use a helper function my_signal(msg).
A stored procedure to delete an employee from the Employee table.
First solution: UPDATE rows in the table, without a join operation:
CREATE PROCEDURE delete_employee(IN dssn varchar(64))
BEGIN
DECLARE empDesignation varchar(128);
DECLARE empSsn varchar(64);
DECLARE empMssn varchar(64);
SELECT SSN, designation, MSSN INTO empSsn, empDesignation, empMssn
FROM Employee
WHERE SSN = dssn;
IF (empSsn IS NOT NULL) THEN
CASE
WHEN empDesignation = 'OWNER' THEN
CALL my_signal('Error: OWNER can not deleted!');
WHEN empDesignation = 'WORKER' THEN
DELETE FROM Employee WHERE SSN = empSsn;
WHEN empDesignation = 'BOSS' THEN
BEGIN
UPDATE Employee
SET MSSN = empMssn
WHERE MSSN = empSsn;
DELETE FROM Employee WHERE SSN = empSsn;
END;
END CASE;
ELSE
CALL my_signal('Error: Not a valid row!');
END IF;
END//
Second solution: as suggested in my previous question, using INNER JOIN:
CREATE PROCEDURE delete_employee(IN dssn varchar(64))
BEGIN
DECLARE empDesignation varchar(128);
DECLARE empSsn varchar(64);
DECLARE empMssn varchar(64);
SELECT SSN, designation, MSSN INTO empSsn, empDesignation, empMssn
FROM Employee
WHERE SSN = dssn;
IF (empSsn IS NOT NULL) THEN
IF (empDesignation = 'OWNER') THEN
CALL my_signal('Error: OWNER can not deleted!');
END IF;
UPDATE `Employee` A INNER JOIN `Employee` B ON A.SSN= B.MSSN
SET B.MSSN = A.MSSN WHERE A.SSN = empSsn;
DELETE FROM `Employee` WHERE SSN = empSsn;
ELSE
CALL my_signal('Error: Not a valid row!');
END IF;
END//
I read here that joins are efficient for SELECTs. But my problem involves only one table, and I feel my first solution is more efficient than the second, because the join will consume comparatively more memory.
Please suggest which is better and more efficient if the Employee table is sufficiently large. Which is better for me, and why?
EDIT: I checked on a small table consisting of only 7 rows, and both solutions take the same time.
mysql> CALL delete_employee(4);
Query OK, 1 row affected (0.09 sec)
I know SQL functions can behave non-deterministically because of table heuristics. Which choice is better? Or do you have an idea how the query can be further optimised?
After a while of thinking I am almost sure it doesn't make a difference; the first solution may even be slightly slower, but not in a measurable dimension.
The first intuition would be that the first solution is faster, because you first fetch data by id and update only if necessary.
But MySQL does nothing different internally in the UPDATE .. JOIN statement, and as a result probably handles it more efficiently as well.
Your first solution doesn't catch a default case - what happens if the designation is neither WORKER nor BOSS?
Also, your execution time (0.09 s) is extremely high, which can't be explained by what I know about your database so far.
Did you set any index?
EDIT:
After looking at the table structure you've posted here
I have some improvement offers for the structure itself.
1. Use type int when you are storing integer values. The database can handle integers much more efficiently.
2. Why generate SSN yourself? Using auto_increment on the PRIMARY KEY is much simpler to handle and saves you a lot of work when you add new employees:
ALTER TABLE `Employee`
CHANGE `SSN` `SSN` int(11) NOT NULL AUTO_INCREMENT ,
CHANGE `MSSN` `MSSN` int(11) DEFAULT NULL,
ADD KEY `KEY_Employee_MSSN` ( `MSSN` );
3. Do you use the name for lookups? If so, it needs to be unique as well:
ALTER TABLE `Employee`
ADD UNIQUE KEY `UNI_KEY_Employee` ( `name` );
4. Do you have a fixed range of designations? enum forces the input to be one of the defined values:
ALTER TABLE `Employee`
CHANGE `designation` `designation` ENUM( 'BOSS', 'WORKER' ) NOT NULL DEFAULT 'WORKER',
ADD KEY `KEY_Employee_designation` ( `designation` );
Final structure
mysql> EXPLAIN `Employee`;
+-------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-----------------------+------+-----+---------+----------------+
| SSN | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(64) | YES | UNI | NULL | |
| designation | enum('BOSS','WORKER') | NO | MUL | WORKER | |
| MSSN | int(11) | YES | MUL | NULL | |
+-------------+-----------------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

mysql unique number generation

I want to generate a unique random integer identity (from 10000 to 99999) in pure MySQL; any ideas?
I don't want to generate this number in PHP by cycling (generate number -> check it in the database), because I want an intelligent solution within a MySQL query.
While it seems somewhat awkward, this is what can be done to achieve the goal:
SELECT FLOOR(10000 + RAND() * 90000) AS random_number
FROM `table`
HAVING random_number NOT IN (SELECT unique_id FROM `table`)
LIMIT 1
Simply put, it generates N random numbers, where N is the count of table rows, filters out those already present in the table, and limits the remaining set to one.
It could be somewhat slow on large tables. To speed things up, you could create a view from these unique ids, and use it instead of nested select statement.
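For comparison, here is the same generate-and-filter semantics in application code (this is exactly the app-side cycling the question hoped to avoid, shown only as a Python sketch of what the query does; the set stands in for the unique_id column):

```python
import random

used_ids = {10123, 54321, 99999}  # ids already present in the table

def fresh_random_id(used, lo=10000, hi=99999):
    # Draw candidates and reject any already in use: the same
    # generate-and-filter idea the SQL query expresses declaratively.
    while True:
        candidate = random.randint(lo, hi)
        if candidate not in used:
            used.add(candidate)
            return candidate

new_id = fresh_random_id(used_ids)
print(10000 <= new_id <= 99999)  # True
```

Like the SQL version, the expected number of retries stays tiny while the id space is sparsely used, but grows sharply as the pool fills up.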
Build a look-up table from sequential numbers to randomised id values in range 1 to 1M:
create table seed ( i int not null auto_increment primary key );
insert into seed values (NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL);
insert into seed select NULL from seed s1, seed s2, seed s3, seed s4, seed s5, seed s6;
delete from seed where i < 100000;
create table idmap ( n int not null auto_increment primary key, id int not null );
insert into idmap select NULL, i from seed order by rand();
drop table seed;
select * from idmap limit 10;
+----+--------+
| n | id |
+----+--------+
| 1 | 678744 |
| 2 | 338234 |
| 3 | 469412 |
| 4 | 825481 |
| 5 | 769641 |
| 6 | 680909 |
| 7 | 470672 |
| 8 | 574313 |
| 9 | 483113 |
| 10 | 824655 |
+----+--------+
10 rows in set (0.00 sec)
(This all takes about 30 seconds to run on my laptop. You would only need to do this once for each sequence.)
Now you have the mapping, just keep track of how many have been used (a counter or auto_increment key field in another table).
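The same look-up-table scheme can be prototyped in Python (the dict stands in for the idmap table; random.seed is only there to make the example deterministic):

```python
import random

random.seed(42)  # deterministic for the example

# Build the id map once: position n -> a shuffled id from the 6-digit pool,
# like the idmap table filled with ORDER BY rand().
pool = list(range(100000, 1000000))
random.shuffle(pool)
idmap = {n: v for n, v in enumerate(pool, start=1)}

# Hand out ids by advancing a counter, as the auto_increment key n does.
counter = 0
def next_id():
    global counter
    counter += 1
    return idmap[counter]

first, second = next_id(), next_id()
print(first != second)  # True: every position maps to a distinct id
```

Because the pool is a shuffled permutation, uniqueness is guaranteed by construction; there is no retry loop at allocation time.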
I struggled with the solution here for a while and then realised it fails if the column has NULL entries. I reworked it with the following code:
SELECT FLOOR(10000 + RAND() * 90000) AS my_tracker
FROM Table1
HAVING my_tracker NOT IN (SELECT tracker FROM Table1 WHERE tracker IS NOT NULL)
LIMIT 1
Fiddle here;
http://sqlfiddle.com/#!2/620de1/1
Hope it's helpful :)
The only halfway reasonable idea I can come up with is to create a table with a finite pool of IDs and remove them from that table as they are used. Those keys can be unique, and a script could be created to generate the table. Then you could pull one of those keys with a random select from the available keys. I said 'halfway' reasonable, and honestly that was being way too generous, but I suppose it beats randomly generating keys until you create a unique one.
My solution, implemented in CakePHP 2.4.7, is to create a table with one auto_increment field:
CREATE TABLE `unique_counters` (
`counter` int(11) NOT NULL AUTO_INCREMENT,
`field` int(11) NOT NULL,
PRIMARY KEY (`counter`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I then created a PHP function so that every time I insert a new record, it reads the generated id and deletes the row immediately.
MySQL keeps the counter status in memory. All the numbers generated are unique until you reset the MySQL counter or run a TRUNCATE TABLE operation.
Find below the Model created in CakePHP to implement all of this:
App::uses('AppModel', 'Model');
/**
* UniqueCounter Model
*
*/
class UniqueCounter extends AppModel {
/**
* Primary key field
*
* @var string
*/
public $primaryKey = 'counter';
/**
* Validation rules
*
* @var array
*/
public $validate = array(
'counter' => array(
'numeric' => array(
'rule' => array('numeric'),
//'message' => 'Your custom message here',
//'allowEmpty' => false,
//'required' => false,
//'last' => false, // Stop validation after this rule
//'on' => 'create', // Limit validation to 'create' or 'update' operations
),
),
);
public function get_unique_counter(){
$data=array();
$data['UniqueCounter']['counter']=0;
$data['UniqueCounter']['field']=1;
if($this->save($data)){
$new_id=$this->getLastInsertID();
$this->delete($new_id);
return($new_id);
}
return(-1);
}
}
Any checks on the range the result must fall in can be implemented in the same function by manipulating the obtained result.
The RAND() function will generate a random number, but will not guarantee uniqueness. The proper way to handle unique identifiers in MySQL is to declare them using AUTO_INCREMENT.
For example, the id field in the following table will not need to be supplied on inserts, and it will always increment by 1:
CREATE TABLE animal (
id INT NOT NULL AUTO_INCREMENT,
name CHAR(30) NOT NULL,
PRIMARY KEY (id)
);
I tried to use this answer, but it didn't work for me, so I had to change the original query a little:
SELECT FLOOR(10000 + RAND() * 90000) AS random_number
FROM `Table`
HAVING NOT EXISTS (SELECT ID FROM `Table` WHERE `Table`.ID = random_number) LIMIT 1
Create a unique index on the field you want the unique random number in.
Then run:
UPDATE IGNORE `Table`
SET RandomUniqueIntegerField = FLOOR(RAND() * 1000000);
I worry about why you want to do this, but you could simply use:
SELECT FLOOR(RAND() * 1000000);
See the full MySQL RAND documentation for more information.
However, I hope you're not using this as a unique identifier (use an auto_increment attribute on a suitably large unsigned integer field if that is what you're after), and I have to wonder why you'd use MySQL for this and not a scripting language. What are you trying to achieve?