Replace a row with a new version - sqlalchemy

I'm receiving JSON data from a remote server. Each entry contains a modified date. If the entry exists in my local database, I'd like to replace the current local entry with the new one.
I'm using SQLAlchemy (latest version) and I can add a new instance just fine. I am assuming I can query to detect whether I need to replace the local version or just run an insert. My question is whether it is better to do a delete call followed by an add call, or whether there is a more efficient approach. To complicate matters, the change may have been triggered not by the primary object but by an object it points to. In other words, this object holds objects that live in other tables (a one-to-many relationship), and one of those child objects may have changed or been deleted.
Finally, the data I am receiving via JSON only contains non-null fields (not all fields). All non-specified fields should be null/default after updating the row.
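One way to satisfy both the "reset unspecified fields" requirement and the child-object changes is the delete-then-add approach: a delete-orphan cascade removes the old children, and column defaults/NULL fill in whatever the JSON left out when the replacement row is inserted. Below is a minimal sketch under those assumptions; the Entry/Item models, column names and payload shape are all illustrative, not from the original post.

    from datetime import datetime

    from sqlalchemy import Column, DateTime, ForeignKey, Integer, String, create_engine
    from sqlalchemy.orm import declarative_base, relationship, sessionmaker

    Base = declarative_base()

    class Entry(Base):
        __tablename__ = "entries"
        id = Column(Integer, primary_key=True)
        title = Column(String(200))            # placeholder for your real columns
        modified = Column(DateTime)
        # delete-orphan means the old children vanish when the old row is deleted
        items = relationship("Item", cascade="all, delete-orphan")

    class Item(Base):
        __tablename__ = "items"
        id = Column(Integer, primary_key=True)
        entry_id = Column(Integer, ForeignKey("entries.id"))
        value = Column(String(200))

    def replace_entry(session, payload):
        """Replace the local row with the remote version if the remote is newer."""
        incoming = Entry(
            id=payload["id"],
            modified=datetime.fromisoformat(payload["modified"]),
            title=payload.get("title"),        # fields missing from the JSON stay None
            items=[Item(**item) for item in payload.get("items", [])],
        )
        existing = session.get(Entry, payload["id"])
        if existing is not None:
            if existing.modified is not None and existing.modified >= incoming.modified:
                return existing                # local copy is already current
            session.delete(existing)           # children removed via the cascade
            session.flush()                    # ensure the DELETE hits the DB first
        session.add(incoming)                  # unset columns fall back to NULL/defaults
        return incoming

    engine = create_engine("sqlite://")        # in-memory DB just for the demo
    Base.metadata.create_all(engine)
    with sessionmaker(engine)() as session:
        replace_entry(session, {"id": 1, "title": "hello",
                                "modified": "2024-01-01T00:00:00",
                                "items": [{"id": 1, "value": "child"}]})
        session.commit()

SQLAlchemy also offers session.merge(), which reconciles an object by primary key, but whether it clears attributes you never set on the incoming object is easy to get wrong, so the explicit delete-and-add is simpler to reason about for the "everything else becomes null/default" requirement.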

Related

Update and modify data in MySQL, adding new items and removing items that are not in an update patch

From time to time, my system generates JSONs with the current state of the data to be stored in a MySQL DB. As long as these JSONs are small, there is no problem applying updates by DELETEing the entire current contents of the table and INSERTing the data from the JSON.
However, this approach is not suitable once the JSONs become large.
Obviously, for the «data to be added» case I can use INSERT; the problems begin when I have to identify removed data. The key idea: if an item is in the JSON, insert it, but if an item that exists in the DB is no longer in the JSON, it should be removed.
Can the DELETE/INSERT approach be replaced with MySQL's built-in merge-like functionality to apply the data update, or is the only way to implement the merge logic manually?
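MySQL has no MERGE statement, but INSERT ... ON DUPLICATE KEY UPDATE covers the "insert or update" half; the "remove what disappeared" half still needs an explicit DELETE keyed on the ids present in the JSON. A rough sketch of that split, where the table name, columns and connection URL are all placeholders:

    import json

    from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, delete
    from sqlalchemy.dialects.mysql import insert

    metadata = MetaData()
    items = Table(
        "items", metadata,                      # placeholder table/columns
        Column("id", Integer, primary_key=True),
        Column("value", String(200)),
    )

    def apply_snapshot(engine, payload_json):
        rows = json.loads(payload_json)         # e.g. '[{"id": 1, "value": "a"}, ...]'
        with engine.begin() as conn:
            # upsert everything that appears in the JSON
            stmt = insert(items).values(rows)
            stmt = stmt.on_duplicate_key_update(value=stmt.inserted.value)
            conn.execute(stmt)
            # delete everything that is no longer in the JSON
            keep = [row["id"] for row in rows]
            if keep:
                conn.execute(delete(items).where(~items.c.id.in_(keep)))
            else:
                conn.execute(delete(items))

    engine = create_engine("mysql+pymysql://user:pass@localhost/mydb")  # assumed URL
    apply_snapshot(engine, '[{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]')

REPLACE INTO is the other built-in option, but it performs a delete plus insert under the hood, so ON DUPLICATE KEY UPDATE is usually the gentler choice; either way the removal of vanished rows stays a separate DELETE.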

Simple audit trail with MySQL storing changes in a single audit table as JSON

The title nearly says it all.
I would like to write a trigger that:
Uses a table called "audit_trail" with fields table_name, by, timestamp, operation, contents where contents is in JSON format
The trigger listens for update or insert on each table
If the table has a column called last_modified_by, then:
Make up a JSON version of the record updated/inserted
Add a record to the audit_trail table, with all relevant fields including contents which would have the JSON representation of the record updated/inserted
Is this technically possible with MySql? I really don't want to code this into the application itself, as it would be messy.
Please note that I am fully aware of the limitations of recording this info as JSON (hard to query, etc.). The only requirement my app has is that an admin must be able to see the "history" of a record: when and by whom it was modified.
While this seems like it should be straightforward, there are things I just cannot work out:
How do you write a trigger that will get triggered on insert or update on ANY table
How to get the JSON version of a record
How to get the trigger to store the JSON onto the contents column
Ideas?
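For what it's worth, MySQL has no way to attach a single trigger to "ANY table": triggers are defined per table and per event. One workaround is to generate the per-table triggers from a script, and on MySQL 5.7+ JSON_OBJECT() can build the contents value. A hedged sketch of that generation step; the connection URL, trigger names and required privileges are assumptions, and the generated DDL should be reviewed before running it:

    from sqlalchemy import create_engine, text

    engine = create_engine("mysql+pymysql://user:pass@localhost/mydb")  # assumed URL

    # `by` is a MySQL reserved word, so it is backtick-quoted in the insert
    TRIGGER_TEMPLATE = """
    CREATE TRIGGER {trigger_name}
    AFTER {operation} ON `{table}` FOR EACH ROW
    INSERT INTO audit_trail (table_name, `by`, `timestamp`, operation, contents)
    VALUES ('{table}', NEW.last_modified_by, NOW(), '{operation}',
            JSON_OBJECT({pairs}))
    """

    def create_audit_triggers(engine):
        with engine.begin() as conn:
            # every table that has a last_modified_by column gets triggers
            tables = [r[0] for r in conn.execute(text(
                "SELECT table_name FROM information_schema.columns "
                "WHERE table_schema = DATABASE() AND column_name = 'last_modified_by'"))]
            for table in tables:
                # build 'col', NEW.col pairs for JSON_OBJECT from the table's columns
                cols = [r[0] for r in conn.execute(text(
                    "SELECT column_name FROM information_schema.columns "
                    "WHERE table_schema = DATABASE() AND table_name = :t"), {"t": table})]
                pairs = ", ".join(f"'{c}', NEW.`{c}`" for c in cols)
                for operation in ("INSERT", "UPDATE"):
                    conn.execute(text(TRIGGER_TEMPLATE.format(
                        trigger_name=f"audit_{table}_{operation.lower()}",
                        table=table, operation=operation, pairs=pairs)))

    create_audit_triggers(engine)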

Add a new field to the Solr schema and import only that particular column

I want to know if there is any way to update only one field in Solr using the data import handler. Here are the steps I performed:
1) I defined a schema which contains some dynamic fields.
2) I added some records to Solr using the data import handler from a SQL table.
3) A new column was introduced in the SQL table, and the entries for that column have been populated based on some existing columns (no new rows have been added).
Is there any way to index only this newly added column without importing the whole data again?
You can use atomic updates. I don't know how to use the DIH to do atomic updates, but if you can form a document that adheres to the atomic update format, you can probably update the document. However, there are some guidelines for applying atomic updates that depend on how you have set up Solr. The most important one, in my opinion, is that all the fields of the documents should be stored; otherwise you will lose the data in the unstored fields.
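To make the atomic-update format concrete, here is a rough sketch of pushing just the new field for documents that are already indexed. It assumes Solr 4+ with the updateLog enabled, and the core URL, field name and ids are placeholders:

    import requests

    SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"  # assumed core

    def set_new_field(doc_id, value):
        # {"set": ...} tells Solr to overwrite just this field; the other stored
        # fields are carried over from the existing document (unstored ones are lost)
        payload = [{"id": doc_id, "new_column": {"set": value}}]
        resp = requests.post(SOLR_UPDATE_URL, json=payload,
                             params={"commit": "true"})
        resp.raise_for_status()

    # e.g. push the values of the new SQL column for rows you already indexed
    set_new_field("42", "value computed from existing columns")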

Web2py: how to call a function on table creation

I'm migrating from SQLite to MySQL, and need to create indexes on columns in my database. Annoyingly, MySQL doesn't have a CREATE INDEX IF NOT EXISTS facility. So I was wondering if I could just create a new index when the table itself is created by web2py, and not at any other time. But where in the code do I place a routine that is only called when web2py runs 'create table'?
The web2py API does not include a way to determine whether a given table has just been created as part of the current request (if migrations are turned on, the table is created on the first request, and otherwise, it is assumed the table already exists). When a table is created in the database via the web2py migrations mechanism, a *.table file is created in the application's /databases folder. So, to determine whether a table was just created, you would have to determine that no *.table file exists right before db.define_table is called, and that the *.table file does exist right after. You probably don't want to do this on every request, so maybe better to simply handle index creation outside of the application.
A better approach would probably be to manually generate the SQL to check whether the index exists, but again, you would want to avoid that on every request in production.
Note, there has been discussion about adding index creation functionality to the DAL, but it is not there yet.
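A rough sketch of the *.table check described above, written as web2py model code. It assumes the default .table file naming (the connection's URI hash plus the table name, via db._uri_hash), which is worth confirming against your own /databases folder; the table, field and index names are placeholders:

    import os

    # does the .table file exist before define_table runs?
    table_file = os.path.join(request.folder, "databases",
                              "%s_mytable.table" % db._uri_hash)
    existed_before = os.path.exists(table_file)

    db.define_table("mytable",
                    Field("name"),
                    Field("created_on", "datetime"))

    # if the file exists now but did not before, the table was created on this request
    if not existed_before and os.path.exists(table_file):
        db.executesql("CREATE INDEX idx_mytable_name ON mytable (name);")

The existence checks themselves are cheap, but as noted above you may still prefer to run index creation once, outside the application, rather than on every request.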

SQL Server: unique key for batch loads

I am working on a data warehousing project where several systems are loading data into a staging area for subsequent processing. Each table has a "loadId" column which is a foreign key against the "loads" table, which contains information such as the time of the load, the user account, etc.
Currently, the source system calls a stored procedure to get a new loadId, adds the loadId to each row that will be inserted, and then calls another stored procedure to indicate that the load is finished.
My question is: is there any way to avoid having to pass the loadId back to the source system? For example, I was imagining that I could get some sort of connection ID from SQL Server that I could use to look up the relevant loadId in the loads table. But I am not sure whether SQL Server has a variable that is unique to a connection?
Does anyone know?
Thanks,
I assume the source systems are writing/committing the inserts into your source tables, and multiple loads are NOT running at the same time...
If so, have the source load call a stored proc, newLoadStarting(), prior to starting the load. This stored proc will update the load table (create a new row, record the start time).
Put a trigger on your loadID column that will get max(loadID) from this table and insert it as the current load id.
For completeness you could add an endLoading() proc which sets an end date and de-activates that particular load.
If you are running multiple loads at the same time in the same tables...stop doing that...it's not very productive.
A local temp table (with one pound sign, #temp) is unique to the session; dump the ID in there, then select from it.
BTW, this will only work if you use the same connection.
In the end, I went for the following solution "pattern", pretty similar to what Markus was suggesting:
I created a table with a loadId column, default null (plus some other audit info like createdDate and createdByUser);
I created a view on the table that hides the loadId and audit columns, and only shows rows where loadId is null;
The source systems load data into (and view it through) the view, not the table;
When they are done, the source system calls a "sp__loadFinished" procedure, which puts the right value in the loadId column and does some other logging (number of rows received, date called, etc). I generate this from a template as it is repetitive.
Because loadId now has a value for all those rows, they are no longer visible to the source system, and it can start another load if required.
I also arrange for each source system to have its own schema, which is the only thing it can see and is its default on logon. The view and the sproc are in this schema, but the underlying table is in a "staging" schema containing data across all the sources. I ensure there are no collisions through a naming convention.
Works like a charm, including the one case where a load can only be complete if two tables have been updated.
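For anyone who wants a concrete picture of that pattern, here is a hedged sketch of the DDL, executed through SQLAlchemy/pyodbc purely for convenience. Every object name (the stg and src1 schemas, stg.Customer, dbo.loads, sp__loadFinished) and the connection URL are illustrative, and it assumes dbo.loads already exists with an IDENTITY loadId column:

    from sqlalchemy import create_engine, text

    engine = create_engine(
        "mssql+pyodbc://user:pass@myserver/dw?driver=ODBC+Driver+17+for+SQL+Server")

    DDL = [
        # staging table: loadId stays NULL until the load is "finished"
        """CREATE TABLE stg.Customer (
               customerId    INT           NOT NULL,
               name          NVARCHAR(200) NULL,
               loadId        INT           NULL,
               createdDate   DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME(),
               createdByUser SYSNAME       NOT NULL DEFAULT SUSER_SNAME())""",
        # the view the source system sees: only rows not yet stamped with a loadId
        """CREATE VIEW src1.Customer AS
               SELECT customerId, name
               FROM stg.Customer
               WHERE loadId IS NULL""",
        # "load finished": create a load row and stamp every unstamped staging row
        """CREATE PROCEDURE src1.sp__loadFinished AS
           BEGIN
               DECLARE @loadId INT;
               INSERT INTO dbo.loads (loadDate, loadedBy)
                   VALUES (SYSUTCDATETIME(), SUSER_SNAME());
               SET @loadId = SCOPE_IDENTITY();
               UPDATE stg.Customer SET loadId = @loadId WHERE loadId IS NULL;
           END""",
    ]

    with engine.begin() as conn:
        for stmt in DDL:
            conn.execute(text(stmt))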