I have MySQL tables that are all InnoDB.
We have so many copies of various databases spread across multiple servers (trust me, we're talking hundreds here), and many of them are not being queried at all.
How can I get a list of the MAX(LastAccessDate), for example, for all tables within a specific database? Especially considering that they are InnoDB tables.
I would prefer knowing even when the last "select" query was run, but would settle for "insert/update" as well, since, if a database hasn't changed in a long time, it's probably dead.
If you have a table that always gets values inserted, you can add an insert/update trigger to it. Inside this trigger you can set the current timestamp in a dedicated database, including the name of the database the insert came from.
This way the only requirement on your database is that it supports triggers.
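A minimal sketch of that idea (all database, table and trigger names here are assumptions for illustration):

-- tracking table in a dedicated stats database
CREATE TABLE stats_db.table_last_updated (
    database_name VARCHAR(64),
    table_name    VARCHAR(64),
    last_modified TIMESTAMP,
    PRIMARY KEY (database_name, table_name)
);

-- trigger on a monitored table that records the time of each insert
CREATE TRIGGER track_insert AFTER INSERT ON my_db.my_table
FOR EACH ROW
REPLACE INTO stats_db.table_last_updated (database_name, table_name, last_modified)
VALUES ('my_db', 'my_table', NOW());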
Alternatively you could take a look at this link:
Modify date and create date for a table can be retrieved from the sys.tables catalog view. When any structural change is made, the modify date is updated. It can be queried as follows:
USE [SqlAndMe]
GO

SELECT [TableName] = name,
       create_date,
       modify_date
FROM   sys.tables
WHERE  name = 'TransactionHistoryArchive'
GO
sys.tables only shows the modify date for structural changes. If we need to check when a table was last updated or accessed, we can use the dynamic management view sys.dm_db_index_usage_stats. This DMV returns counts of different types of index operations and the last time each operation was performed.
It can be used as follows:
USE [SqlAndMe]
GO

SELECT [TableName] = OBJECT_NAME(object_id),
       last_user_update,
       last_user_seek,
       last_user_scan,
       last_user_lookup
FROM   sys.dm_db_index_usage_stats
WHERE  database_id = DB_ID('SqlAndMe')
  AND  OBJECT_NAME(object_id) = 'TransactionHistoryArchive'
GO
last_user_update – provides time of last user update
last_user_* – provides time of last scan/seek/lookup
It is important to note that sys.dm_db_index_usage_stats counters are reset when SQL Server service is restarted.
Hope This Helps!
We're using MariaDB in production and we've added a MariaDB slave so that our data team can perform some ETL tasks from this slave to our data warehouse. However, MariaDB lacks a proper Change Data Capture feature (i.e. they want to know which rows of a production table changed since yesterday, in order to query only the rows that actually changed).
I saw that MariaDB 10.3 has an interesting feature that allows performing a SELECT on an older version of a table. However, I haven't found resources supporting the idea that it could be used for CDC. Any feedback on this feature?
If not, we'll probably resort to streaming the slave's binlogs to our data warehouse, but that looks challenging...
Thanks for your help!
(As a supplement to Stefan's answer)
Yes, system versioning can be used for CDC, because the validity period given by ROW_START (when the row version became valid) and ROW_END (when it became invalid) can be interpreted to tell when an INSERT, UPDATE or DELETE query happened. But it's more cumbersome than with alternative CDC variants.
INSERT:
Object was found for the first time
ROW_START is the insertion time
UPDATE:
Object was seen before (an earlier version exists)
ROW_START is the update time
DELETE:
ROW_END lies in the past
there is no newer entry for this object
I'll add a picture to clarify this.
You can see that this versioning is space-saving, because the information about the INSERT and DELETE of an object can be combined in one row, but checking for DELETEs is costly.
In the example above I used a table with a clear primary key, so checking for the same object is easy: just look at the id. If you want to capture changes in tables with a key combination, the whole process gets more annoying.
Edit: another point is that the history data is kept in the same table as the "real" data. Maybe this is faster for an INSERT than known alternative solutions like tracking per TRIGGER (like here), but if changes are made quite frequently on the table and you want to process/analyse the CDC data, this can cause performance problems.
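To make this concrete, a rough sketch of such queries against the accounts example from the answer below (the one-day window and the exact predicates are assumptions; distinguishing INSERT from UPDATE would additionally require checking whether an earlier version of the same id exists):

-- rows inserted or updated in the last day: a new version started
SELECT id, name, amount, row_start, row_end
FROM   accounts FOR SYSTEM_TIME ALL
WHERE  row_start >= NOW() - INTERVAL 1 DAY;

-- deleted objects: a version expired recently and no current row exists
SELECT h.id
FROM   accounts FOR SYSTEM_TIME ALL AS h
WHERE  h.row_end BETWEEN NOW() - INTERVAL 1 DAY AND NOW()
  AND  NOT EXISTS (SELECT 1 FROM accounts c WHERE c.id = h.id);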
MariaDB supports system-versioned tables since version 10.3.4. System-versioned tables are specified in the SQL:2011 standard. They can be used to automatically capture previous versions of rows. Those versions can then be queried to retrieve their values as they were at a specific point in time.
The following text and code example is from the official MariaDB documentation:
With system-versioned tables, MariaDB Server tracks the points in time
when rows change. When you update a row on these tables, it creates a
new row to display as current without removing the old data. This
tracking remains transparent to the application. When querying a
system-versioned table, you can retrieve either the most current
values for every row or the historic values available at a given point
in time.
You may find this feature useful in efficiently tracking the time of
changes to continuously-monitored values that do not change
frequently, such as changes in temperature over the course of a year.
System versioning is often useful for auditing.
By adding SYSTEM VERSIONING to a newly created table, or to an existing table using ALTER TABLE, the table is expanded with row_start and row_end timestamp columns, which allow retrieving the record that was valid between the start and end timestamps.
CREATE TABLE accounts (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255),
amount INT
) WITH SYSTEM VERSIONING;
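For a table that already exists, versioning can be added with ALTER TABLE:

ALTER TABLE accounts ADD SYSTEM VERSIONING;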
It is then possible to retrieve data as it was at a specific time (with SELECT * FROM accounts FOR SYSTEM_TIME AS OF '2019-06-18 11:00';), all versions within a specific time range
SELECT * FROM accounts
FOR SYSTEM_TIME
BETWEEN (NOW() - INTERVAL 1 YEAR)
AND NOW();
or all versions at once:
SELECT * FROM accounts
FOR SYSTEM_TIME ALL;
Is there any way to detect when an ALTER TABLE statement is executed in MySQL? For example, if the following statement were executed on some_table, is there any way to detect that the column name changed from column_name_a to column_name_b and log it in another table in the DB?
ALTER TABLE `some_table`
CHANGE COLUMN `column_name_a` `column_name_b` VARCHAR(255) NULL DEFAULT NULL;
Thanks.
To my knowledge it is unfortunately not possible to put triggers on the INFORMATION_SCHEMA tables, since they are, strictly speaking, views, and triggers can't be made to work on views. If triggers were possible on the INFORMATION_SCHEMA, you could have a trigger on updates of the INFORMATION_SCHEMA.COLUMNS table to identify name changes.
However, what you can do is one of the following things:
Option 1) Maintain a real table with all column names. Then create a function that checks for a discrepancy between the INFORMATION_SCHEMA.COLUMNS table and your table. If there is one, you know a name has changed. You then copy the new name over to your column-name table and do whatever else you wanted to do upon a name change. (A rough sketch follows after option 2.)
The function that checks for discrepancies must then be run periodically via the MySQL event scheduler in order to detect name changes as quickly as possible. Note that this is not a real-time solution: there will be a lag between the ALTER TABLE command and its detection. If this is unacceptable in your scenario you need to go with
Option 2) Do not call ALTER TABLE directly, but wrap it in a function. Within this function you can also call other functions to achieve what you need to achieve. It may be worthwhile to formulate the needed steps in a higher-level programming language that you use to drive your application. If this is not possible, you will be limited to the possibilities offered by functions/procedures in the MySQL environment.
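A rough sketch of option 1 (all object names are assumptions; the event would also have to refresh the snapshot and handle dropped columns, which is omitted here):

-- snapshot table of known columns
CREATE TABLE column_snapshot (
    table_schema VARCHAR(64),
    table_name   VARCHAR(64),
    column_name  VARCHAR(64),
    PRIMARY KEY (table_schema, table_name, column_name)
);

-- periodic check via the event scheduler: log columns that are missing
-- from the snapshot (column_change_log is another assumed table)
CREATE EVENT detect_column_changes
ON SCHEDULE EVERY 1 MINUTE
DO
    INSERT INTO column_change_log (table_schema, table_name, column_name, detected_at)
    SELECT c.TABLE_SCHEMA, c.TABLE_NAME, c.COLUMN_NAME, NOW()
    FROM   information_schema.COLUMNS c
    LEFT JOIN column_snapshot s
           ON  s.table_schema = c.TABLE_SCHEMA
           AND s.table_name   = c.TABLE_NAME
           AND s.column_name  = c.COLUMN_NAME
    WHERE  c.TABLE_SCHEMA = 'your_db'
      AND  s.column_name IS NULL;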
Sorry to not have a simpler way of doing this for you.
I have created a system using PHP/MySQL that downloads a large XML dataset, parses it and then inserts the parsed data into a MySQL database every week.
This system is made up of two databases with the same structure. One is a production database and one is a temporary database where the data is parsed and inserted into first.
When the data has been inserted into the temporary database I perform a merge by inserting/replacing the data in the production database. I have done all of the above so far. I then realised that data which has been removed from a new dataset will be left to linger in the production database.
I need to check whether each row in the production database still exists in the new data: if it does, leave it; if it doesn't, delete the row from the production database so that rows aren't left to linger.
For arguments sake, let's say the two databases are called database_temporary and database_production.
How can I go about doing this?
If you are using SQL to merge, a simple SQL statement can do the delete as well:
delete from database_production.table
where pk not in (select pk from database_temporary.table)
Notes:
This assumes that a row can be uniquely identified. This may be based on a single column, multiple columns or another mechanism.
If your dataset is large, a not exists may perform better than not in. See What's the difference between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL? and NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
An example not exists:
delete p from database_production.table p
where not exists (select 1 from database_temporary.table t where t.pk = p.pk)
Performance Notes:
As pointed out by @mgonzalez in the comments on the question, you may want to use a timestamp column (something like last_modified) for comparing/merging in general, so that you compare only changed rows. This does not apply to the delete specifically: you cannot use the timestamp for the delete because the deleted row would no longer exist in the new dataset.
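For example, assuming such a last_modified column and a remembered last sync time (both names here are placeholders), the merge step could be restricted to changed rows:

replace into database_production.table
select * from database_temporary.table
where last_modified > @last_sync_time;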
How can I use a trigger to simply update a table with the last time another table was edited? I know that triggers run "for each row", but if someone inserts more than one row, it would pointlessly update the tracking table over and over again. Is there any way to do this without repeating the work?
I'd like it to run just once for all of the inserts instead of time and time again. If not, I guess I can force it via a wrapper.
edit 1:
Well to explain some more of the design I guess then.
I'm going to have a table in another database that holds the last_updated data for things like chat or the players' "mailbox", and another one for development things like tables for quests, skills, items etc. I want to be able to know when a table was last updated so that I can easily check before I go and scan the table for new things.
Basically this is what I'd like to do (or something similar). I'm also using PHP, so it's likely to be a PHP-based approach in the code, but the SQL should be fairly standard. I'm not going to write full code but rather something semi-runnable.
$last_modified = mysql_query("SELECT last_modified FROM various_stats.table_last_updated WHERE database_name = 'database_name' AND table_name = 'table_name'");
if ($last_modified > $last_checked_time) {
    $data_to_get_updated = mysql_query("SELECT something FROM various_<something>.table_name WHERE last_modified > '$last_checked_time'");
} else {
    // do nothing
}
edit 2: I'm using InnoDB, and thus I cannot use the information schema's update_time since it never changes.
Will this help you, if I'm on the right track that is:
SELECT UPDATE_TIME
FROM information_schema.tables
WHERE TABLE_SCHEMA = 'dbname'
AND TABLE_NAME = 'tabname'
The above solution is for MyISAM. For InnoDB the norm is to set up a scheduled script; this can be a cron job or a Windows scheduled task. If you don't have that kind of control over your web host, you could possibly set up a small server at your office and run the cron from there. If you run this every, say, 20 seconds, you could simply record the current top auto-incremented ID and use it as a guide: if the current ID is higher than the last recorded ID, you update your records to show the last changed time as now.
As this is only one call to the server every XX seconds, it won't really hammer the server too much and should just run silently in the background.
If you do go down the scheduled-task route, it would be wise to add error capture to your script so that you can be alerted via email if something stops working.
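A rough sketch of that check, reusing the stats table from the question (column names are assumptions):

-- read the current AUTO_INCREMENT high-water mark
SELECT AUTO_INCREMENT
FROM   information_schema.TABLES
WHERE  TABLE_SCHEMA = 'dbname'
  AND  TABLE_NAME   = 'tabname';

-- if it is higher than the last recorded value, record the change time
REPLACE INTO various_stats.table_last_updated (database_name, table_name, last_modified)
VALUES ('dbname', 'tabname', NOW());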
Here is a chunk of the SQL I'm using for a Perl-based web application. I have a number of requests and each has a number of accessions, and each has a status. This chunk of code is there to update the table for every accession_analysis that shares all these fields for each accession in a request.
UPDATE accession_analysis
SET analysis_id = ? ,
reference_id = ? ,
status = ? ,
extra_parameters = ?
WHERE analysis_id = ?
AND reference_id = ?
AND status = ?
AND extra_parameters = ?
AND accession_id IN (
    SELECT accession_id
    FROM accessions
    WHERE request_id = ?
)
I have changed the tables so that there's a status table for accession_analysis, so when I update, I update both accession_analysis and accession_analysis_status, which has status, status_text and the id of the accession_analysis, which is a not null auto_increment variable.
I have no strong idea about how to modify this code to allow this. My first pass grabbed all the accessions and looped through them, then filtered for all the fields, then updated. I didn't like that because I had many connections with short SQL commands, which I understood to be bad, but I can't help but think the only way to really do this is to go back to the loop in Perl holding two simpler SQL statements.
Is there a way to do this in SQL that, with my relative SQL inexperience, I'm just not seeing?
The answer depends on which DBMS you're using. The easiest way is to create a trigger on one table that provides the logic of updating the other table. (For any DB newbies -- a trigger is procedural code attached to a table at the DBMS (not application) layer that runs in response to an insert, update or delete on the table.). A similar, slightly less desirable method is to put the logic in a stored procedure and execute that instead of the update statement you're now using.
If the DBMS you're using doesn't support either of these mechanisms, then there isn't a good way to do what you're after while guaranteeing transactional integrity. However, if the problem you're solving can tolerate a timing difference between the two tables' updates (i.e. the data in one of the tables is only used at predetermined times, like reporting or some type of batched operation), you could write to one table (live) and create a separate process that runs when needed (later) to update the second table using data from the first. The correctness of allowing data to be updated at different times becomes a large and immovable design assumption, however.
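In MySQL, a minimal sketch of such a trigger (table and column names are guesses based on the question):

DELIMITER //
CREATE TRIGGER accession_analysis_after_update
AFTER UPDATE ON accession_analysis
FOR EACH ROW
BEGIN
    -- record the new status in the companion status table
    INSERT INTO accession_analysis_status (accession_analysis_id, status, status_text)
    VALUES (NEW.id, NEW.status, CONCAT('status changed to ', NEW.status));
END//
DELIMITER ;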
If this is mostly about connection speed, then one option you have is to write a stored procedure that handles the "double update or insert" transparently. See the manual for stored procedures:
http://dev.mysql.com/doc/refman/5.5/en/create-procedure.html
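A sketch of what such a procedure might look like (all names and parameters are assumptions), wrapping both writes in a single transaction:

DELIMITER //
CREATE PROCEDURE update_analysis_with_status(
    IN p_id     INT,
    IN p_status VARCHAR(64),
    IN p_text   TEXT
)
BEGIN
    START TRANSACTION;
    UPDATE accession_analysis
    SET    status = p_status
    WHERE  id = p_id;
    INSERT INTO accession_analysis_status (accession_analysis_id, status, status_text)
    VALUES (p_id, p_status, p_text);
    COMMIT;
END//
DELIMITER ;

The application would then CALL update_analysis_with_status(...) instead of issuing the two statements separately.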
Otherwise, you probably cannot do it in one statement; see the MySQL INSERT syntax:
http://dev.mysql.com/doc/refman/5.5/en/insert.html
The UPDATE syntax allows for multi-table updates (not in combination with INSERT, though):
http://dev.mysql.com/doc/refman/5.5/en/update.html
Each table needs its own INSERT / UPDATE in the query.
In fact, even if you create a view by JOINing multiple tables, when you INSERT into the view, you can only INSERT with fields belonging to one of the tables at a time.
The modifications made by the INSERT statement cannot affect more than one of the base tables referenced in the FROM clause of the view. For example, an INSERT into a multitable view must use a column_list that references only columns from one base table. For more information about updatable views, see CREATE VIEW.
Inserting data into multiple tables through an sql view (MySQL)
INSERT (SQL Server)
The same is true of UPDATE:
The modifications made by the UPDATE statement cannot affect more than one of the base tables referenced in the FROM clause of the view. For more information on updatable views, see CREATE VIEW.
However, you can have multiple INSERTs or UPDATEs per query or stored procedure.