How to do multiple queries in Spring Batch (specifically use LAST_INSERT_ID()) - mysql

I am trying to write a Spring Batch Starter job that reads a CSV file and inserts the records into a MySQL DB. When it begins I want to save the start time in a tracking table, and when it ends, the end time in that same table. The table structure is like:
TRACKING : id, start_time, end_time
DATA: id, product, version, server, fk_trk_id
I am unable to find an example project that does such a thing. I believe this needs to be a Spring Batch Starter project that can handle multiple queries, i.e.:
-- insert start time
1. INSERT INTO tracking (start_time) VALUES (NOW(6));
-- get last inserted id for foreign key
2. SET @last_id_in_tracking = LAST_INSERT_ID();
-- read from CSV and insert data into 'data' DB table
3. INSERT INTO data (product, version, server, fk_trk_id) VALUES ('mysql', '5.1.42', 'Server1', @last_id_in_tracking);
4. INSERT INTO data (product, version, server, fk_trk_id) VALUES ('linux', '7.0', 'Server2', @last_id_in_tracking);
5. INSERT INTO data (product, version, server, fk_trk_id) VALUES ('java', '8.0', 'Server3', @last_id_in_tracking);
-- set end time
6. UPDATE tracking SET end_time = NOW(6) WHERE id = @last_id_in_tracking;
I'd like sample code and an explanation of how to run those queries against multiple tables in the same Spring Batch Starter job.
start of edit section - additional question
I do have an additional question. In my entities I have them set up to represent the relationships with annotations (i.e. @ManyToOne, @JoinColumn)...
In your code, how would I get the trackingId from a referenced object? Let me explain:
My Code (Data.java):
@JsonManagedReference
@ManyToOne
@JoinColumn(name = "id")
private Tracking tracking;
Your code (Data.java):
@Column(name = "fk_trk_id")
private Long fkTrkId;
Your code (JobConfig.java):
final Data data = new Data();
data.setFkTrkId(trackingId);
How do I set the id with "setFkTrkId" when the relationship in my Entity is an object?
end of edit section - additional question

Here is an example app that does what you're asking. Please see the README for details.
https://github.com/joechev/examples/tree/master/csv-reader-db-writer

I have created a project for you as an example. Please refer to https://bigzidane.wordpress.com/2018/02/25/spring-batch-mysql-reader-writer-processor-listener/
This example simply has a Reader/Processor/Writer. The reader reads a CSV file, the processor does some processing, and the writer writes to the database.
And we have a listener to capture the start and end of the job. At the start of the job, we insert an entry into the DB and keep the generated ID. We pass that same ID to the writer when storing entries.
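Here is a minimal sketch of what such a listener could look like, assuming a JdbcTemplate and the tracking table from the question (the class and field names here are just illustrative):
import java.sql.PreparedStatement;
import java.sql.Statement;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.support.GeneratedKeyHolder;
import org.springframework.jdbc.support.KeyHolder;

public class TrackingJobListener implements JobExecutionListener {

    private final JdbcTemplate jdbcTemplate;
    private Long trackingId;

    public TrackingJobListener(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // Insert the start time and capture the generated id in one call,
        // instead of running SET @last_id = LAST_INSERT_ID() as a separate statement.
        KeyHolder keyHolder = new GeneratedKeyHolder();
        jdbcTemplate.update(connection -> {
            PreparedStatement ps = connection.prepareStatement(
                    "INSERT INTO tracking (start_time) VALUES (NOW(6))",
                    Statement.RETURN_GENERATED_KEYS);
            return ps;
        }, keyHolder);
        trackingId = keyHolder.getKey().longValue();
        // Stash the id in the job's execution context so other components can look it up.
        jobExecution.getExecutionContext().putLong("trackingId", trackingId);
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        // Close out the tracking row when the job finishes.
        jdbcTemplate.update("UPDATE tracking SET end_time = NOW(6) WHERE id = ?", trackingId);
    }
}
The writer (or processor) can then read trackingId from the execution context, or get it from the listener bean, and set it on each Data row before it is written.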
Note: I'm sorry, I reused an example I already had, so it may not match your question 100%, but technically it should be the same.
Thanks,
Nghia

Related

.NET insert data from SQL Server to MySQL without looping through data

I'm working on a .NET app where the user selects some filters like date, id, etc.
What I need to do is query a SQL Server database table with those filters, and dump them into a MySQL table. I don't need all fields in the table, only a few.
So far, I need to loop through all records in the SQL Server DataSet and insert them one by one into my MySQL table.
Is there any way of achieving better performance? I've been playing with Dapper but can't figure out a way to do something like:
Insert into MySQLTable (a,b,c)
Select a,b,c from SQLServerTable
where a=X and b=C
Any ideas?
The linked server option is not possible because we have no access to the SQL Server configuration, so I'm looking for the most efficient way of bulk inserting data.
If I were to do this inside .NET with Dapper, I'd use C# and do the following.
Assumptions: a table with the same schema in both databases;
CREATE TABLE Events (
  EventId int,
  EventName varchar(10));
A .NET class:
public class Event
{
public int EventId { get; set; }
public string EventName { get; set; }
}
The snippet below should give you something you can use as a base.
// Requires Dapper, System.Data.SqlClient and MySql.Data.MySqlClient.
List<Event> Events = new List<Event>();
var sqlInsert = "Insert into events( EventId, EventName ) values (@EventId, @EventName)";
using (IDbConnection sqlconn = new SqlConnection(Sqlconstr))
{
    sqlconn.Open();
    Events = sqlconn.Query<Event>("Select * from events").ToList();
    // Use a MySQL connection (and a MySQL connection string, MySqlConstr) for the target table.
    using (IDbConnection mySqlconn = new MySqlConnection(MySqlConstr))
    {
        mySqlconn.Open();
        mySqlconn.Execute(sqlInsert, Events);
    }
}
The snippet above selects the rows from the events table in SQL Server and populates the Events list. Normally Dapper returns an IEnumerable<>, but ToList() materializes it into a list. Then, with the Events list, you connect to MySQL and execute the insert statement against it.
This snippet is just a barebones example. Without a transaction on the Execute, each row will be autocommitted. If you add a transaction, it will commit when all the items in the Events list are inserted.
Of course there are disadvantages to doing it this way. One important thing to realize is that if you are trying to insert 1 million rows from SQL Server into MySQL, that list will contain 1 million entries, which will increase the memory footprint. In those cases I'd use Dapper's buffered: false option, which returns the 1 million rows one at a time. Your C# code can then enumerate over the results, add each row to a list, and keep a counter. After 1000 rows have been added to the list you can do the insert into MySQL, then clear the list and continue enumerating over the rows.
This will keep the memory footprint of your application small while processing a large number of rows.
With all that said, nothing beats bulk insert at the server level.
-HTH

Django admin - model visible to superuser, not staff user

I am aware of syncdb and makemigrations, but we are restricted from doing that in the production environment.
We recently had a couple of tables created on production. As expected, the tables were not visible in the admin for any user.
After that, we had the below two queries executed manually on the production SQL server (I ran the migration on my local machine and used a SHOW CREATE TABLE query to fetch the raw SQL):
django_content_type
INSERT INTO django_content_type(name, app_label, model)
values ('linked_urls',"urls", 'linked_urls');
auth_permission
INSERT INTO auth_permission (name, content_type_id, codename)
values
('Can add linked_urls Table', (SELECT id FROM django_content_type where model='linked_urls' limit 1) ,'add_linked_urls'),
('Can change linked_urls Table', (SELECT id FROM django_content_type where model='linked_urls' limit 1) ,'change_linked_urls'),
('Can delete linked_urls Table', (SELECT id FROM django_content_type where model='linked_urls' limit 1) ,'delete_linked_urls');
Now this model is visible to the superuser, who is able to grant access to staff users as well, but staff users can't see it.
Is there any other table entry that needs to be made?
Or is there any other way to solve this problem without syncdb/migrations?
We recently had a couple of tables created on production.
I can read what you wrote there in two ways.
First way: you created tables with SQL statements, for which there are no corresponding models in Django. If this is the case, no amount of fiddling with content types and permissions will make Django suddenly use the tables. You need to create models for the tables. Maybe they'll be unmanaged, but they need to exist.
Second way: the corresponding models in Django do exist, you just manually created tables for them, so that's not a problem. What I'd do in this case is run the following code, explanations follow after the code:
from django.contrib.contenttypes.management import update_contenttypes
from django.apps import apps as configured_apps
from django.contrib.auth.management import create_permissions

for app in configured_apps.get_app_configs():
    update_contenttypes(app, interactive=True, verbosity=0)

for app in configured_apps.get_app_configs():
    create_permissions(app, verbosity=0)
What the code above does is essentially perform the work that Django performs after it runs migrations. When the migration occurs, Django just creates tables as needed, then when it is done, it calls update_contenttypes, which scans the table associated with the models defined in the project and adds to the django_content_type table whatever needs to be added. Then it calls create_permissions to update auth_permissions with the add/change/delete permissions that need adding. I've used the code above to force permissions to be created early during a migration. It is useful if I have a data migration, for instance, that creates groups that need to refer to the new permissions.
So, finally I had a solution. I did a lot of debugging on Django, and apparently the function below (in django.contrib.auth.backends) does the job of providing permissions.
def _get_permissions(self, user_obj, obj, from_name):
    """
    Returns the permissions of `user_obj` from `from_name`. `from_name` can
    be either "group" or "user" to return permissions from
    `_get_group_permissions` or `_get_user_permissions` respectively.
    """
    if not user_obj.is_active or user_obj.is_anonymous() or obj is not None:
        return set()

    perm_cache_name = '_%s_perm_cache' % from_name
    if not hasattr(user_obj, perm_cache_name):
        if user_obj.is_superuser:
            perms = Permission.objects.all()
        else:
            perms = getattr(self, '_get_%s_permissions' % from_name)(user_obj)
        perms = perms.values_list('content_type__app_label', 'codename').order_by()
        setattr(user_obj, perm_cache_name, set("%s.%s" % (ct, name) for ct, name in perms))
    return getattr(user_obj, perm_cache_name)
So what was the issue?
The issue lay in this query:
INSERT INTO django_content_type(name, app_label, model)
values ('linked_urls',"urls", 'linked_urls');
It looks fine initially, but the query actually executed was:
-- notice the capitalization here - it looked so trivial, I didn't even bother to look into it until I realised what was happening internally
INSERT INTO django_content_type(name, app_label, model)
values ('Linked_Urls',"urls", 'Linked_Urls');
So Django, internally, when doing a migrate, ensures everything is migrated in lower case - and this was the problem!
I had a separate query executed to lowercase all the previous inserts, and voila!

Does Statement.RETURN_GENERATED_KEYS generate any extra round trip to fetch the newly created identifier?

JDBC allows us to fetch the value of a primary key that is automatically generated by the database (e.g. IDENTITY, AUTO_INCREMENT) using the following syntax:
PreparedStatement ps = connection.prepareStatement(
    "INSERT INTO post (title) VALUES (?)",
    Statement.RETURN_GENERATED_KEYS
);
ps.setString(1, "Some title");
ps.executeUpdate();

ResultSet resultSet = ps.getGeneratedKeys();
while (resultSet.next()) {
    LOGGER.info("Generated identifier: {}", resultSet.getLong(1));
}
I'm interested in whether the Oracle, SQL Server, PostgreSQL, or MySQL driver uses a separate round trip to fetch the identifier, or whether there is a single round trip that executes the insert and fetches the ResultSet automatically.
It depends on the database and driver.
Although you didn't ask for it, I will answer for Firebird ;). In Firebird/Jaybird the retrieval itself doesn't require extra roundtrips, but using Statement.RETURN_GENERATED_KEYS or the integer array version will require three extra roundtrips (prepare, execute, fetch) to determine the columns to request (I still need to build a form of caching for it). Using the version with a String array will not require extra roundtrips (I would love to have RETURNING * like in PostgreSQL...).
In PostgreSQL with PgJDBC there is no extra round-trip to fetch generated keys.
It sends a Parse/Describe/Bind/Execute message series followed by a Sync, then reads the results including the returned result-set. There's only one client/server round-trip required because the protocol pipelines requests.
However, sometimes batches that could otherwise be streamed to the server may be broken up into smaller chunks, or run one by one, if generated keys are requested. To avoid this, use the String[] array form where you name the columns you want returned, and name only columns of fixed-width data types like integer. This only matters for batches, and it's due to a design problem in PgJDBC.
(I posted a patch to add batch pipelining support in libpq that doesn't have that limitation, it'll do one client/server round trip for arbitrary sized batches with arbitrary-sized results, including returning keys.)
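For illustration, a rough sketch of the String[] form, assuming the post table from the question and a generated id column:
// Ask only for the generated "id" column; PgJDBC appends RETURNING id,
// so the key comes back in the same round trip as the INSERT.
PreparedStatement ps = connection.prepareStatement(
        "INSERT INTO post (title) VALUES (?)",
        new String[] { "id" });
ps.setString(1, "Some title");
ps.executeUpdate();
try (ResultSet keys = ps.getGeneratedKeys()) {
    if (keys.next()) {
        long id = keys.getLong(1);
    }
}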
MySQL receives the generated key(s) automatically in the OK packet of the protocol in response to executing a statement. There is no communication overhead when requesting generated keys.
In my opinion, even for such a trivial thing, a single approach working in all database systems will fail.
The only pragmatic solution is (in analogy to Hibernate) to find the best working solution for each target RDBMS, and call it a dialect of your one-for-all solution :)
Here is the information for Oracle.
I'm using a sequence to generate the key; the same behavior is observed for an IDENTITY column.
create table auto_pk (
  id  number,
  pad varchar2(100)
);
This works and uses only one roundtrip:
def stmt = con.prepareStatement("insert into auto_pk values(auto_pk_seq.nextval, 'XXX')",
Statement.RETURN_GENERATED_KEYS)
def rowCount = stmt.executeUpdate()
def generatedKeys = stmt.getGeneratedKeys()
if (null != generatedKeys && generatedKeys.next()) {
def id = generatedKeys.getString(1);
But unfortunately you get ROWID as a result - not the generated key
How is it implemented internally? You can see it if you activate a 10046 trace (BTW this is also the best way to see how many roundtrips were performed):
PARSING IN CURSOR
insert into auto_pk values(auto_pk_seq.nextval, 'XXX')
RETURNING ROWID INTO :1
END OF STMT
So you see the JDBC 3.0 standard is implemented, but you don't get the requested result. Under the covers, the RETURNING clause is used.
The right approach to get the generated key in Oracle is therefore:
def stmt = con.prepareStatement("insert into auto_pk values(auto_pk_seq.nextval, 'XXX') returning id into ?")
stmt.registerReturnParameter(1, Types.INTEGER);
def rowCount = stmt.executeUpdate()
def generatedKeys = stmt.getReturnResultSet()
if (null != generatedKeys && generatedKeys.next()) {
def id = generatedKeys.getLong(1);
}
Note:
Oracle Release 12.1.0.2.0
To activate the 10046 trace use
con.createStatement().execute "alter session set events '10046 trace name context forever, level 12'"
con.createStatement().execute "ALTER SESSION SET tracefile_identifier = my_identifier"
Depending on frameworks or libraries to do things that are perfectly possible in plain SQL is bad design IMHO, especially when working against a defined DBMS. (The Statement.RETURN_GENERATED_KEYS is relatively innocuous, although it apparently does raise a question for you, but where frameworks are built on separate entities and doing all sorts of joins and filters in code or have custom-built transaction isolation logic things get inefficient and messy very quickly.)
Why not simply:
PreparedStatement ps= connection.prepareStatement(
"INSERT INTO post (title) VALUES (?) RETURNING id");
Single trip, defined result.
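A slightly fuller sketch of that approach, assuming PostgreSQL (where the driver returns the RETURNING result set from executeQuery()) and the post table from the question:
PreparedStatement ps = connection.prepareStatement(
        "INSERT INTO post (title) VALUES (?) RETURNING id");
ps.setString(1, "Some title");
// The INSERT itself produces a result set containing the generated id,
// so no second statement or extra round trip is needed.
try (ResultSet rs = ps.executeQuery()) {
    if (rs.next()) {
        long id = rs.getLong("id");
    }
}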

How to get database sql values from an active record object?

My original problem is that I need to insert a lot of records into the DB, so to speed things up I want to use mysqlimport, which takes a file of row values and loads them into a specified table. So suppose I have a model Book; I can't simply use book.attributes.values, as one of the fields is a hash that is serialized to the DB (using serialize), so I need to know the format in which this hash will be stored in the DB. The same goes for time and date fields. Any help?
How about using SQL insert statements instead of serialization?
book = Book.new(title: 'Much Ado About Nothing', author: 'William Shakespeare')
sql = book.class.arel_table.create_insert
          .tap { |im| im.insert(book.send(
              :arel_attributes_with_values_for_create,
              book.attribute_names)) }
          .to_sql

Entity Framework related objects insertion with stored procedure and auto_increment field

I have a problem inserting a related row through Entity Framework 5. I'm using it with RIA Services and .NET Framework version 4.5. The database system is MySQL 5.6. The connector version is 6.6.5.
It raises a Foreign Key constraint exception.
I've chosen to simplify the model to expose my issue.
LDM
Provider(id, name, address)
Article(id, name, price)
LinkToProvider(provider_id, article_id, provider_price)
// Id's are auto_increment columns.
First I create a new instance of Article. I add an instance of LinkToProvider to the LinkToProvider collection of the article. In this LinkToProvider object the product itself is referenced, and an existing provider is also referenced.
Then I submit the changes.
Sample code from the DataViewModel
this.CurrentArticle = new Article();
...
this.CurrentArticle.LinkToProvider.Add(
    new LinkToProvider { Article = this.CurrentArticle,
                         Provider = this.ProviderCollection.CurrentItem }
);
...
this.DomainContext.articles.Add(this.CurrentArticle);
this.DomainContext.SubmitChanges();
NOTE :
At the beginning, Entity Framework inserts the product fine. Then it fails because it tries to insert a row into the LinkToProvider table with an unknown product id, like the following.
INSERT
INTO LinkToProvider
VALUES(5, 0, 1.2)
It puts 0 instead of the generated id.
But if I insert a product alone without any relations the product id is generated in the database correctly.
Any help will be much appreciated !
Thank you.
I found the answer.
You need to bind the result from the stored procedure to the id column in the EDMX model.
So I had to modify my stored procedure to add an instruction that returns the last inserted id for the article table as a result set:
SELECT LAST_INSERT_ID() AS NewArticleId;
Then I added the binding using the name of the column returned by the stored procedure; here it's NewArticleId.
It's explained here: http://learnentityframework.com/LearnEntityFramework/tutorials/using-stored-procedures-for-insert-update-amp-delete-in-an-entity-data-model/.