Postgres: BEFORE UPDATE trigger - partitioning

Description
In our environment (Postgres 9.3) we use extensive partitioning on dates. Additionally, we use triggers to redirect INSERTs on the 'main' table to the corresponding child table (so do note that there is actually no data in the main table; all the data lives in the child tables).
Problem
One of the processes executes an UPDATE on the main table; how can I redirect such an UPDATE to the correct child table?
So, for instance, if I issue something as simple as:
UPDATE transactions SET text = 'new text' WHERE id = 1 AND date = 201601;
how can I redirect this UPDATE to the transactions_201601 partition?

Apparently it is not common practice to redirect your UPDATE queries to the correct table.
By ensuring that the indexes are set up correctly, you make sure that the UPDATE is executed on the right partition table. So, all in all, setting up correct indexes is the solution.
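For reference, a minimal sketch of such a setup under 9.3-style inheritance partitioning (table and column names taken from the example; note that the planner prunes partitions via the children's CHECK constraints, with the index then speeding up the scan inside the matching child):
-- Child table with a CHECK constraint the planner can use to exclude it:
CREATE TABLE transactions_201601 (
    CHECK (date = 201601)
) INHERITS (transactions);

-- Index supporting the UPDATE's WHERE clause within the partition:
CREATE INDEX ON transactions_201601 (id, date);

SET constraint_exclusion = partition;  -- the default in 9.3

-- Planned and executed only against transactions_201601:
UPDATE transactions SET text = 'new text' WHERE id = 1 AND date = 201601;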

Related

Update a specific column if it exists, without failing if it does not

I am working with an application which needs to function with any of 300+ different MySQL databases on the same server. The databases all have nearly identical table structures, with slight variations. For example, a particular column might be present in a table for only some of the databases.
I'm wondering if there is a way that, when performing an update on a table, I can update a specific column if it exists, but still successfully execute if the column does not exist.
For example, say I have a basic update statement like this:
UPDATE some_table
SET col1 = "some value",
col2 = "another value",
col3 = "a third value"
WHERE id = 567
What can I do to make it so that, if col3 doesn't actually exist when that query is run, the statement still executes and col1 and col2 are still updated with the new values?
I have tried using IF and CASE, but those only seem to allow changing the value based on some condition, not controlling whether a column gets updated at all.
I know I can query the database for the existence of the column, then use a simple if condition in the application code to choose a different query. However, that requires querying the database twice: once to see if the column exists, and again to actually update it. I'd prefer to do it with one SQL query if possible. I feel like the application code might get unwieldy, with lots of extra code to check the existence of this-or-that column and conditionally build queries, instead of just having one query that works regardless of which database the application happens to be running against at the time.
To clarify, any given instance of the application is only ever running against one database; there is a different application instance for each database, but the instances all run the same code. These are legacy databases that legacy code also relies on, so I don't want to modify the actual structures in the database to make them more consistent, for fear of breaking the legacy code.
No, the syntax of your SQL query, including all column identifiers you reference, must be fixed at the time it is parsed, before it validates that the columns exist.
A given UPDATE will either succeed fully or fail fully. There is no way to update some of the columns if the query fails to update all of them.
You have two choices:
Query INFORMATION_SCHEMA.COLUMNS first to check which columns exist in the table for a given schema. Then format your UPDATE query, including a clause to set each column only if the column exists in that instance of the table (a sketch follows below).
Or...
Run several UPDATE statements, one for each column you want to update. Each statement will succeed or fail independently, but you can catch the error and continue on to the remaining statements. You can put all these statements in a transaction, so the set of changes is committed atomically, regardless of how many succeed (a single failed statement does not roll back a transaction).
Either way, it requires you to write more code. That's the unavoidable cost of supporting such variable table structure.
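For the first option, a minimal sketch using the table and column from the question (the application code would branch on the result):
SELECT COUNT(*) AS has_col3
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'some_table'
  AND COLUMN_NAME = 'col3';
-- If has_col3 is 0, leave "col3 = ..." out of the SET clause you build.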

MySQL - Enforcing update of row to ONLY be possible when a certain key is provided

This is something I can't seem to find information on.
Let's say I have a table users, and for security purposes, I want any SQL query to be executable only if it references the id column.
E.g. this should NOT work:
UPDATE users SET source="google" WHERE created_time < 20210303;
The above UPDATE statement is syntactically valid, but because it doesn't reference the id column, it should not be executable.
Only the below would be executable:
UPDATE users SET source="google" WHERE id in (45,89,318);
Is there any way to enforce this from the MySQL server's end?
I think the only way you can really do what you want is to use a stored procedure, where you pass in the ids and do the update there (a sketch follows the steps below). You would set up the security as follows:
Turn off updates to the underlying table for all-but-one user.
Run the stored procedure as the user with permissions to modify the table (using DEFINER).
This will be cumbersome, because you will need to pass in values for all the columns.
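A hedged sketch of that procedure, updating a single column (names are illustrative; a general-purpose version would need a parameter per updatable column, which is the cumbersome part just mentioned):
DELIMITER //
CREATE DEFINER = 'table_owner'@'localhost' PROCEDURE set_user_source(
    IN p_id INT,
    IN p_source VARCHAR(64)
)
SQL SECURITY DEFINER
BEGIN
    -- The WHERE clause is fixed inside the procedure, so callers can
    -- only ever update by id.
    UPDATE users SET source = p_source WHERE id = p_id;
END //
DELIMITER ;

-- Callers get EXECUTE on the procedure but no UPDATE on users, e.g.:
-- CALL set_user_source(45, 'google');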
You can come close with safe update mode. However, that also allows LIMIT as well as key comparisons, so that is not sufficient for your purposes.
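For reference, safe update mode is a per-session setting:
SET SESSION sql_safe_updates = 1;
-- With this set, UPDATE/DELETE statements that use neither a key
-- comparison in the WHERE clause nor a LIMIT are rejected.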
Note: This sort of issue is usually handled in another way. Most users would not have permissions to modify such a table. Then "special" users who do would be assumed to be more knowledgeable and careful about changes. If the data is sensitive, then the changes would be logged, so it would be (relatively) easy to undo changes that have been made.

re-inserting a table record and updating an auto increment primary index

I'm running MariaDB 5.5.56.
I'm looking to copy an entire row in a database, change one column, then insert the entire row back into the original database (I don't want to have to specify the individual fields because there's a lot of them). The problem I'm running into is how to deal with an auto-increment/primary key column.
example:
create temporary table t_ownership like ownership;
insert into t_ownership (select * from ownership where name='x' LIMIT 1);
update t_ownership set id='something else';
insert into ownership (select * from t_ownership);
I have a column "recno" that is an auto-increment that will create a collision in the database when I try to re-insert the slightly changed record back into the original table.
Something like this seems to work but doesn't result in an insert:
insert into ownership (select * from t_ownership) ON DUPLICATE KEY UPDATE recno=LAST_INSERT_ID(ownership.recno);
The above statement executes without error but does not add a row to table ownership.
So I think I'm close but not quite there...
What would be the best way to do this? I'd like to avoid doing an insert where I manually specify field/values. I just need to regenerate a new A.I. recno column on the insert.
NULL values inserted into auto-increment fields simply get the next auto-increment value, behaving equivalently to an INSERT that omits the field; so you should be able to update the source (the temp copy) to have NULL for that field.
However, one potential issue in scenarios like yours is that CREATE TEMPORARY TABLE ... LIKE can produce a table that will not let you set such fields to NULL; this would require you to either ALTER the temporary table or create it more explicitly. Either way, it makes code/queries that do not specify columns even more reliant on knowing the columns.
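A sketch of that NULL-out approach with the tables from the question (assuming recno is an INT; the ALTER is the possible extra step just mentioned):
CREATE TEMPORARY TABLE t_ownership LIKE ownership;
-- LIKE copies NOT NULL/AUTO_INCREMENT, so relax the column in the copy:
ALTER TABLE t_ownership MODIFY recno INT NULL;

INSERT INTO t_ownership SELECT * FROM ownership WHERE name = 'x' LIMIT 1;
UPDATE t_ownership SET id = 'something else', recno = NULL;

-- The NULL recno receives the next auto-increment value on re-insert:
INSERT INTO ownership SELECT * FROM t_ownership;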
Personally, though, I would take the following route in the first place.
INSERT INTO theTable([list all but the auto-inc column])
SELECT [list all but the auto-inc column, with any replacements or modifications desired]
FROM ...[original query]...
It accomplishes the task in one query and makes the queries more self-documenting, at the cost of only a little typing (most of which a decent database browser, or query builder, will do for you).
The only real argument in favor of your current approach is that the table can be changed without necessarily breaking your queries; but that raises the question of whether it would be better for such table changes to break the queries, forcing them to be re-examined. If it is not an issue, it is a minor revision; but the alternative is queries that remain valid yet can cause unexpected behavior by copying information they were never intended to.

MySQL: Best way to update a large table

I have a table with a huge amount of data. The source of the data is an external API. Every few hours, I need to sync the database so that it is up to date with the external API. I am doing a full sync (the API doesn't allow delta sync).
While the sync happens, I want to make sure that the data in the database remains available for reads. So I am following the steps below:
I have a column in the table which acts as a flag for whether or not the data is readable. Only rows with the flag set are marked for read.
I am inserting all the data from the API into the table.
Once all the data is written, I am deleting all the data in the table with the flag set.
After the deletion, I am updating the table and setting the flag for all the rows.
The table has around ~50 million rows and is expected to grow. There is a customerId field in the table. Sync usually happens per customerId, by passing it to the API.
My problem is that steps 3 and 4 above are taking a lot of time. The queries are something like:
Step 3 --> delete from foo where customer_id=12345678 and flag=1
Step 4 --> update foo set flag=1 where customer_id=12345678
I have tried partitioning the table on customer_id, and it works great where a customer_id has few rows; but for some customer_ids, the number of rows in a single partition reaches ~5 million.
Around 90% of the data doesn't change between two syncs. How can I make this fast?
I was thinking of using update queries instead of insert queries, checking whether each update matched a row and issuing an insert for the row if not. That way, updates would be taken care of along with the inserts. But I am not sure whether that operation would block read queries while the update is in progress.
For your setup (read-only data, full sync), the fastest way to update the table is not to update it at all, but to import the data into a different table and rename it afterwards to make it the new table.
Create a table like your original table, e.g. use
create table foo_import like foo;
If you have e.g. triggers, add them too.
From now on, let the import API write its (full) sync to this new table.
After a sync is done, swap the two tables:
RENAME TABLE foo TO foo_tmp,
foo_import TO foo,
foo_tmp to foo_import;
It will (literally) take just a second.
This command is atomic: it waits for transactions that access these tables to finish, it never presents a situation where there is no table foo, and it fails completely (doing nothing) if one of the tables doesn't exist or foo_tmp already exists.
As a final step, empty your import table (that now contains your old data) to be ready for your next import:
truncate foo_import;
This will again take just a second.
The rest of your queries probably assume that flag=1. Until (if at all) you update the code to stop using the flag, you can set its default value to 1 to keep things compatible, e.g. use
alter table foo modify column flag tinyint default 1;
Since you don't have foreign keys, this doesn't have to bother you; but for others with a similar problem it might be useful to know that foreign keys get adjusted, so foreign keys that referenced foo will reference foo_import after the rename. To make them point to the new foo again, they have to be dropped and recreated. Everything else (e.g. views, queries, procedures) resolves by the current name, so it will always access the current foo.
CREATE TABLE new LIKE real;
Load `new` by whatever means you have; take as long as needed.
RENAME TABLE real TO old, new TO real;
DROP TABLE old;
The RENAME is atomic and "instantaneous"; real is "always" available.
(I don't see the need for flag.)
OR...
Since you are actually updating a chunk of a table, consider these...
If the chunk is small (a sketch follows these steps)...
Load the new data into a tmp table
DELETE the old rows
INSERT ... SELECT ... to move the new rows in. (Having the new data already in a table is probably the fastest way to achieve this.)
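A sketch of that small-chunk path (the tmp table name is illustrative):
CREATE TEMPORARY TABLE foo_new LIKE foo;
-- ... load foo_new from the API ...

-- Doing both steps in one transaction means InnoDB's consistent reads
-- never observe the customer's rows missing:
START TRANSACTION;
DELETE FROM foo WHERE customer_id = 12345678;
INSERT INTO foo SELECT * FROM foo_new;
COMMIT;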
If the chunk is big, and you don't want to lock the table for "too long", there are some other tricks. But first, is there some form of unique row number for each row for the customer? (I'm thinking about batch-moving a bunch of rows at a time, but need more specifics before spelling it out.)

Can I INSERT/UPDATE into two tables with one query?

Here is a chunk of the SQL I'm using for a Perl-based web application. I have a number of requests, each of which has a number of accessions, and each of those has a status. This chunk of code updates every accession_analysis row that shares all these fields, for each accession in a request.
UPDATE accession_analysis
SET analysis_id = ? ,
reference_id = ? ,
status = ? ,
extra_parameters = ?
WHERE analysis_id = ?
AND reference_id = ?
AND status = ?
AND extra_parameters = ?
AND accession_id IN (
SELECT accession_id
FROM accessions
WHERE request_id = ?
)
I have changed the tables so that there's a status table for accession_analysis; so when I update, I update both accession_analysis and accession_analysis_status, which has status, status_text, and the id of the accession_analysis (a NOT NULL AUTO_INCREMENT column).
I have no strong idea of how to modify this code to allow this. My first pass grabbed all the accessions and looped through them, then filtered for all the fields, then updated. I didn't like that because it meant many round trips with short SQL commands, which I understood to be bad; but I can't help thinking the only way to really do this is to go back to the loop in Perl holding two simpler SQL statements.
Is there a way to do this in SQL that, with my relative SQL inexperience, I'm just not seeing?
The answer depends on which DBMS you're using. The easiest way is to create a trigger on one table that provides the logic of updating the other table. (For any DB newbies -- a trigger is procedural code attached to a table at the DBMS (not application) layer that runs in response to an insert, update or delete on the table.). A similar, slightly less desirable method is to put the logic in a stored procedure and execute that instead of the update statement you're now using.
If the DBMS you're using doesn't support either of these mechanisms, then there isn't a good way to do what you're after while guaranteeing transactional integrity. However if the problem you're solving can tolerate a timing difference in the two tables' updates (i.e. The data in one of the tables is only used at predetermined times, like reporting or some type of batched operation) you could write to one table (live) and create a separate process that runs when needed (later) to update the second table using data from the first table. The correctness of allowing data to be updated at different times becomes a large and immovable design assumption, however.
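A hedged sketch of the trigger approach in MySQL (the status table's column names are guesses from the question's description, so treat them as placeholders):
DELIMITER //
CREATE TRIGGER accession_analysis_status_sync
AFTER UPDATE ON accession_analysis
FOR EACH ROW
BEGIN
    -- Keep the status table in step with the row just updated:
    UPDATE accession_analysis_status
    SET status = NEW.status
    WHERE accession_analysis_id = NEW.id;
END //
DELIMITER ;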
If this is mostly about connection speed, then one option you have is to write a stored procedure that handles the "double update or insert" transparently. See the manual for stored procedures:
http://dev.mysql.com/doc/refman/5.5/en/create-procedure.html
Otherwise, you probably cannot do it in one statement; see the MySQL INSERT syntax:
http://dev.mysql.com/doc/refman/5.5/en/insert.html
The UPDATE syntax allows for multi-table updates (not in combination with INSERT, though):
http://dev.mysql.com/doc/refman/5.5/en/update.html
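For illustration, a multi-table UPDATE of the kind that page describes (column names are assumptions based on the question):
UPDATE accession_analysis aa
JOIN accession_analysis_status s ON s.accession_analysis_id = aa.id
SET aa.status = ?,
    s.status = ?
WHERE aa.analysis_id = ?;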
Each table needs its own INSERT / UPDATE in the query.
In fact, even if you create a view by JOINing multiple tables, when you INSERT into the view, you can only INSERT with fields belonging to one of the tables at a time.
The modifications made by the INSERT statement cannot affect more than one of the base tables referenced in the FROM clause of the view. For example, an INSERT into a multitable view must use a column_list that references only columns from one base table. For more information about updatable views, see CREATE VIEW.
Inserting data into multiple tables through an sql view (MySQL)
INSERT (SQL Server)
The same is true of UPDATE:
The modifications made by the UPDATE statement cannot affect more than one of the base tables referenced in the FROM clause of the view. For more information on updatable views, see CREATE VIEW.
However, you can run multiple INSERT or UPDATE statements per batch or stored procedure.