Getting/creating relational data on the fly in MySQL/MariaDB

I'm developing a distributable application that logs event data to a MySQL database. Some of the data it logs is redundant, like who caused the event, etc. A dumb example might be user: bob, action: created, target: file123
The schema is normalized so instead of storing bob every time, I'd store user_id. The problem I have is that my app is merely a layer that other applications would send data to so I won't always have a user record before I need to log an event.
To accommodate this, I wrote a "get or create" procedure that checks if that user exists; if so it returns the user_id, otherwise it creates a new entry and returns the generated key. (I had tried ON DUPLICATE KEY UPDATE, but it doesn't play well with auto-increment primary keys in this scenario; it kept generating a new key.)
For example, I might use:
CREATE PROCEDURE getOrCreateUser(IN p_username VARCHAR(25), OUT p_userId INT)
BEGIN
-- parameters are prefixed so they can't shadow the column names: if a
-- parameter is spelled the same as a column, MySQL resolves the reference
-- to the parameter, and the WHERE clause would match every row
SELECT user_id INTO p_userId FROM users WHERE username = p_username;
IF p_userId IS NULL THEN
INSERT INTO users (username) VALUES (p_username);
SET p_userId = LAST_INSERT_ID();
END IF;
END
Now, when INSERTing an event I can CALL getOrCreateUser(...) to ensure that user record exists.
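For example (the events table and its columns here are illustrative, not part of my schema above):
CALL getOrCreateUser('bob', @userId);
INSERT INTO events (user_id, action, target) VALUES (@userId, 'created', 'file123');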
This works but I'm wondering if this is a wise approach. Say the application batch inserts 1000 event records, this would be called 1000 times.
The only way to reduce that is to call this once per user and cache the username/userId key/value pairs in memory (a set-based alternative is sketched at the end of this question).
I just feel like there are two issues with that approach:
That could become inefficient if I have 100k users.
With proper indexes maybe an in-memory Map isn't much better?
Some other problem I'm not thinking of...
I've never taken this approach and am looking for insight from more experienced MySQL devs.
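For reference, the set-based alternative I've been weighing for batches looks something like this (a sketch; it assumes a UNIQUE index on users.username):
-- 1. insert any usernames from the batch that don't exist yet;
--    the unique index silently skips the ones that do
INSERT IGNORE INTO users (username) VALUES ('bob'), ('alice'), ('carol');
-- 2. resolve the whole batch to ids in a single query
SELECT user_id, username FROM users WHERE username IN ('bob', 'alice', 'carol');
The unique index also closes the race where two sessions both see a missing user and both insert it.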

Related

Getting id generated in a trigger for further requests

I have a table with two columns:
caseId, referring to a foreign table column
caseEventId, int, unique for a given caseId, which I want to auto-increment for the same caseId.
I know that the auto-increment option based on another column is not available in MySQL with InnoDB:
MySQL Auto Increment Based on Foreign Key
MySQL second auto increment field based on foreign key
So I generate caseEventId into a trigger. My table:
CREATE TABLE IF NOT EXISTS mydb.caseEvent (
`caseId` CHAR(20) NOT NULL,
`caseEventId` INT NOT NULL DEFAULT 0,
PRIMARY KEY (`caseId`, `caseEventId`),
# Foreign key definition, not important here.
) ENGINE = InnoDB;
And my trigger:
CREATE DEFINER=`root`@`%` TRIGGER `mydb`.`caseEvent_BEFORE_INSERT` BEFORE INSERT ON `caseEvent` FOR EACH ROW
BEGIN
SELECT COALESCE((SELECT MAX(caseEventId) + 1 FROM caseEvent WHERE caseId = NEW.caseId), 0)
INTO @newCaseEventId;
SET NEW.`caseEventId` = @newCaseEventId;
END
With this, I get my caseEventId which auto-increments.
However I need to re-use this new caseEventId in further calls within my INSERT transaction, so I place this id into @newCaseEventId within the trigger, and use it in the following statements:
START TRANSACTION;
INSERT INTO mydb.caseEvent (caseId) VALUES ('fziNw6muQ20VGYwYPW1b');
SELECT @newCaseEventId;
# Do stuff based on @newCaseEventId
COMMIT;
This seems to work just fine but... what about concurrency, using connection pools etc...?
Is this @newCaseEventId variable going to be shared with all clients using the same connection? Can I run into problems when my client server launches two concurrent transactions? This is using mysql under nodejs.
Is this safe, or is there a safer way to go about this? Thanks.
Edit 2020/09/24
FYI I have dropped this approach altogether. I was trying to use the db in a way it isn't meant to be used.
Basically I have dropped caseEventId, and any index which is supposed to increment nicely based on a given column value.
I rely instead on properly written queries on the read side, when I retrieve data, to recreate my caseEventId field...
That is no problem: user-defined variables are per client.
That means every session has its own user-defined variables.
User-defined variables are session specific. A user variable defined by one client cannot be seen or used by other clients. (Exception: A user with access to the Performance Schema user_variables_by_thread table can see all user variables for all sessions.) All variables for a given client session are automatically freed when that client exits.
See the manual.
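A quick illustration of that scoping, using the variable from the question (two separate connections):
-- connection A
SET @newCaseEventId = 42;
SELECT @newCaseEventId; -- 42
-- connection B, at the same time
SELECT @newCaseEventId; -- NULL: B never set it, and A's value is invisible here
One caveat for pooled connections: the variable belongs to the physical connection, so it lingers until the pool resets the session or the trigger overwrites it; two logical requests that share one pooled connection will see the same variable.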

How to make only one field true in a MySQL table at one time?

I have a table tblsessions. At any one time, only one session can be current, e.g. session 2014-2015.
However, if I make 2015-2016 current, 2014-2015 should not be current anymore.
How could I implement this logic in table at design time?
Here is the table creation code waiting for your modification:
create table tblsessions(
sessionid int not null auto_increment,
sessionname varchar(9) not null,
current ????,
primary key (sessionid)
);
You could perhaps use a trigger (depending on the version of MySQL you're running). I've assumed that current is a tinyint but you can adjust to whatever type you use:
CREATE TRIGGER curr_check BEFORE UPDATE ON tblsessions
FOR EACH ROW
BEGIN
IF NEW.current = 1 THEN
UPDATE tblsessions SET current = 0;
END IF;
END;
EDIT:
A.5.3: Does MySQL 5.6 have statement-level or row-level triggers?
In MySQL 5.6, all triggers are FOR EACH ROW—that is, the trigger is activated for each row that is inserted, updated, or deleted. MySQL 5.6 does not support triggers using FOR EACH STATEMENT.
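One caveat worth noting: MySQL does not allow a trigger to modify the table it is defined on, so the UPDATE inside the trigger above fails at runtime with error 1442. In practice this kind of logic is usually moved into a stored procedure that is called instead of a plain UPDATE; a minimal sketch (the procedure name is illustrative):
DELIMITER |
CREATE PROCEDURE setCurrentSession(IN p_sessionid INT)
BEGIN
-- clear the old current session, then mark the new one;
-- wrap the CALL in a transaction if concurrent writers are a concern
UPDATE tblsessions SET `current` = 0 WHERE `current` = 1;
UPDATE tblsessions SET `current` = 1 WHERE sessionid = p_sessionid;
END |
DELIMITER ;
CALL setCurrentSession(2);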
ALTERNATIVE SOLUTION:
I have come up with another solution, however I wonder if it really is a good one.
I have created two tables:
TBLSESSIONS (session)
// session is primary key and stops duplicates
TBLCURRENTSESSION (csessionid, csession)
// csessionid is auto-int
// csession is foreign key to TBLSESSIONS.session
Each time the user presses the [Make This Session Default] button, I can insert that session into csession.
In code I can search for the largest csessionid and find the csession against it as the CURRENT SESSION.
This also allows the user to switch sessions at any time.
As a MySQL DBA, do you think this is a good approach to solving my basic problem? Do you see any dark sides to this solution?
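For reference, a sketch of that layout derived from the description above (the column types are my assumptions):
CREATE TABLE TBLSESSIONS (
session VARCHAR(9) NOT NULL PRIMARY KEY
);
CREATE TABLE TBLCURRENTSESSION (
csessionid INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
csession VARCHAR(9) NOT NULL,
FOREIGN KEY (csession) REFERENCES TBLSESSIONS (session)
);
-- the [Make This Session Default] button runs:
INSERT INTO TBLCURRENTSESSION (csession) VALUES ('2015-2016');
-- the current session is the row with the largest csessionid
SELECT csession FROM TBLCURRENTSESSION ORDER BY csessionid DESC LIMIT 1;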

MySQL performance: letting a UNIQUE field generate an error or manually checking it

Theoretical question about the impact on performance.
One of the fields in my table is unique. For instance, email_address in the Users table.
What has less of an impact on performance? Attempting to add an already existing email address and getting the error, or doing a search on the email field?
The UNIQUE field will probably be faster.
If you tell MySQL that a certain field is unique, it may perform some optimizations.
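For instance, declaring the constraint on the Users table from the question is a one-liner (the index name is arbitrary):
ALTER TABLE Users ADD UNIQUE INDEX uq_email (email_address);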
Additionally, if you want to insert the record only if it isn't in the table already, you might run into concurrency issues. Assume there are two people trying to register with the same email address. Now, if you perform the uniqueness check yourself, something like so:
bool exists = userAlreadyExists(email);
if (exists)
showWarning();
else
insertUser(email);
something like the following might happen:
User 1 executes userAlreadyExists("foo@example.com") // returns false
User 2 executes userAlreadyExists("foo@example.com") // returns false
User 1 executes insertUser("foo@example.com")
User 2 executes insertUser("foo@example.com") // which is now a duplicate
If you let MySQL perform the uniqueness check, the above won't happen.
If you check and then update, you have to query the database twice, and in turn it will check the table index twice. You have both network overhead and database processing overhead.
My point of view is that you have to be optimistic: update and handle the potential failure gracefully if there are duplicate values.
The two-step approach has one other drawback: don't forget there will be concurrent access to your database. Depending on your database setup (isolation level, database engine), there is a chance that the DB was modified by another connection between the SELECT and your UPDATE.
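A minimal sketch of that optimistic route, assuming a UNIQUE index on email_address:
-- no SELECT first: the unique index enforces the constraint
INSERT INTO Users (email_address) VALUES ('foo@example.com');
-- a concurrent duplicate makes this statement fail with ER_DUP_ENTRY
-- (error 1062), which the application catches and reports to the user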

Is there a way to cache a View so that queries against it are quick?

I'm extremely new to Views so please forgive me if this is a silly question. I have a View that is really helpful in optimizing a pretty unwieldy query, and it allows me to select against a small subset of columns in the View. However, I was hoping that the View's results would actually be stored somewhere so that selecting against them wouldn't take very long.
I may be mistaken, but I get the sense (from the speed with which create view executes and from the duration of my queries against my View) that the View is actually run as a query prior to the external query, every time I select against it.
I'm really hoping that I'm overlooking some mechanism whereby, when I run CREATE VIEW, it does the hard work of running the view's query then and there, so that my subsequent selects against this static View would be really swift.
BTW, I totally understand that obviously this VIEW would be a snapshot of the data that existed at the time the VIEW was created and wouldn't reflect any new info that was inserted/updated subsequent to the VIEW's creation. That's actually EXACTLY what I need.
TIA
What you want to do is materialize your view. Have a look at http://www.fromdual.com/mysql-materialized-views.
What you're talking about are materialised views, a feature of (at least) DB2 but not MySQL as far as I know.
There are ways to emulate them by creating/populating a table periodically, or on demand, but a true materialised view knows when the underlying data has changed, and only recalculates if required.
If the data will never change once the view is created (as you seem to indicate in a comment), just create a brand new table to hold the subset of data and query that. People always complain about slow speed but rarely about data storage requirements :-)
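A one-off snapshot like that can be a single statement (the table and column names here are illustrative):
-- build the static table once from the expensive query
CREATE TABLE report_snapshot AS
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id;
-- CREATE TABLE ... AS SELECT copies no indexes, so add the ones you need
ALTER TABLE report_snapshot ADD INDEX (customer_id);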
You can do this with:
A MySQL Event
A separate table (for caching)
The REPLACE INTO ... SELECT statement.
Here's a working example.
-- create dummy data for testing
CREATE TABLE MyTable (
id INT NOT NULL,
groupvar INT NOT NULL,
myvar INT
);
INSERT INTO MyTable VALUES
(1,1,1),
(2,1,1),
(3,2,1);
-- create the view, making sure rows have a unique identifier (groupvar)
CREATE VIEW MyView AS
SELECT groupvar, SUM(myvar) as myvar_sum
FROM MyTable
GROUP BY groupvar;
-- create cache table, setting primary key to unique identifier (groupvar)
CREATE TABLE MyView_Cache (PRIMARY KEY (groupvar))
SELECT *
FROM MyView;
-- create a table to keep track of when the cache has been updated (optional)
CREATE TABLE MyView_Cache_updated (update_id INT NOT NULL AUTO_INCREMENT, last_updated DATETIME NOT NULL, PRIMARY KEY (update_id));
-- create event to update the cache table (e.g., daily); the event scheduler must be enabled for this to run
DELIMITER |
CREATE EVENT MyView_Cache_Event
ON SCHEDULE EVERY 1 DAY STARTS CURRENT_TIMESTAMP + INTERVAL 1 HOUR
DO
BEGIN
REPLACE INTO MyView_Cache
SELECT *
FROM MyView;
INSERT INTO MyView_Cache_updated
SELECT NULL, NOW() AS last_updated;
END |
DELIMITER ;
You can now query MyView_Cache for faster response times, and query MyView_Cache_updated to inform users of the last time the cache was updated (in this example, daily).
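For example, reads might then look like this (using the names from the example, including the last_updated column of the tracking table):
SELECT myvar_sum FROM MyView_Cache WHERE groupvar = 1;
SELECT MAX(last_updated) FROM MyView_Cache_updated;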
Since a view is basically a SELECT statement you can use query cache to improve performance.
But first you should check if:
you can add indexes in the tables involved to speed up the query (use EXPLAIN)
the data isn't changing very often, in which case you can materialize the view (make snapshots)
Use a materialised view. It can store data like counts and sums, but after updating the table you need to refresh the view to get correct results, as they are not auto-updated. Moreover, after querying the view the results are stored in cache, so the memory cycles reduce to two, versus four when querying the table itself. So it gets efficient from the second time onward: when you query the view for the first time, the data is fetched from main memory and stored in cache afterwards.

Adding a time dimension to MySQL cells

Is there a way to keep a timestamped record of every change to every column of every row in a MySQL table? This way I would never lose any data and keep a history of the transitions. Row deletion could be just setting a "deleted" column to true, but would be recoverable.
I was looking at HyperTable, an open source implementation of Google's BigTable, and this feature really whetted my appetite. It would be great if I could have it in MySQL, because my apps don't handle the huge amount of data that would justify deploying HyperTable. More details about how this works can be seen here.
Is there any configuration, plugin, fork or whatever that would add just this one functionality to MySQL?
I've implemented this in the past in a php model similar to what chaos described.
If you're using MySQL 5, you could also accomplish this with triggers that hook into the UPDATE and DELETE events on your table.
http://dev.mysql.com/doc/refman/5.0/en/stored-routines.html
I do this in a custom framework. Each table definition also generates a Log table related many-to-one with the main table, and when the framework does any update to a row in the main table, it inserts the current state of the row into the Log table. So I have a full audit trail on the state of the table. (I have time records because all my tables have LoggedAt columns.)
No plugin, I'm afraid, more a method of doing things that needs to be baked into your whole database interaction methodology.
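A minimal sketch of that pattern (the table and column names are hypothetical), using a trigger to copy each row's previous state into its log table:
CREATE TABLE items (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(50)
);
-- companion log table: same columns plus audit metadata
CREATE TABLE items_log (
log_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
id INT NOT NULL,
name VARCHAR(50),
LoggedAt DATETIME NOT NULL
);
DELIMITER |
CREATE TRIGGER items_audit BEFORE UPDATE ON items
FOR EACH ROW
BEGIN
-- record the row's previous state before it is overwritten
INSERT INTO items_log (id, name, LoggedAt)
VALUES (OLD.id, OLD.name, NOW());
END |
DELIMITER ;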
Create a table that stores the following info...
CREATE TABLE MyData (
ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY, -- AUTO_INCREMENT is MySQL's equivalent of IDENTITY
DataID INT );
CREATE TABLE Data (
ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
MyID INT,
Name VARCHAR(50),
Timestamp DATETIME DEFAULT CURRENT_TIMESTAMP );
Now create a sproc that does this...
INSERT INTO Data (MyID, Name)
VALUES (@MyID, @Name);
UPDATE MyData SET DataID = @@IDENTITY -- @@IDENTITY is a MySQL synonym for LAST_INSERT_ID()
WHERE ID = @MyID;
In general, the MyData table is just a key table. You then point it to the record in the Data table that is the most current. Whenever you need to change data, you simply call the sproc, which inserts the new data into the Data table, then updates MyData to point to the most recent record. All of the other tables in the system would key themselves off of MyData.ID for foreign key purposes.
This arrangement sidesteps the need for a second log table (and keeping them in sync when the schema changes), but at the cost of an extra join and some overhead when creating new records.
Do you need it to remain queryable, or will this just be for recovering from bad edits? If the latter, you could just set up a cron job to back up the actual files where MySQL stores the data and send it to a version control server.