Spring Batch: ItemProcessor query Database? - mysql

I have a scenario where I need to parse flat files and process those records into mysql database inserts (schema already exists).
I'm using a FlatFileItemReader to parse the files and a JdbcBatchItemWriter to insert into the database.
I'm also using an ItemProcessor to convert any column values or skip records that I don't want.
My problem is that some of those inserts need a foreign key to another table that already has data in it.
So I was thinking of doing a select to retrieve the ID and update the POJO inside the ItemProcessor logic.
Is this the best way to do it? I can consider alternatives as I'm just beginning to write all this.
Thanks!

The ItemProcessor in a Spring Batch step is commonly used for enriching data, and querying a db for something like that is common.
For the record, another option would be to use a sub-select in your insert statement to get the foreign key value as the record is being inserted. This may be a bit more performant given that it removes the additional db hit.
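For example, here is a minimal sketch of such an enriching ItemProcessor, assuming a hypothetical Order item with a customerName parsed from the file and a pre-populated customers table to look the ID up in (all names are made up for illustration):

import java.util.List;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.jdbc.core.JdbcTemplate;

// Hypothetical item type: customerName comes from the flat file,
// customerId is the foreign key we need to resolve before the insert.
public class CustomerIdEnrichingProcessor implements ItemProcessor<Order, Order> {

    private final JdbcTemplate jdbcTemplate;

    public CustomerIdEnrichingProcessor(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public Order process(Order order) throws Exception {
        // Look up the ID in the table that already has data in it
        List<Long> ids = jdbcTemplate.queryForList(
                "select id from customers where name = ?",
                Long.class, order.getCustomerName());
        if (ids.isEmpty()) {
            return null; // returning null tells Spring Batch to skip the record
        }
        order.setCustomerId(ids.get(0)); // enrich the pojo with the foreign key
        return order;
    }
}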

For the batch process, if you need a database lookup or insert you can call the method below anywhere in the batch, for example from your batch listeners.
The piece of code below worked for me.
In your Main class, load your application context into a static variable, APP_CONTEXT.
If you are not using the XML-based approach, get the dataSource by autowiring it instead, and then you can use the code below:
Connection conn = null;
PreparedStatement pstmt = null;
try {
    DataSource dataSource = (DataSource) Main.APP_CONTEXT.getBean("dataSource");
    conn = dataSource.getConnection();
    pstmt = conn.prepareStatement(" your SQL query to insert ");
    pstmt.executeUpdate();
} catch (Exception e) {
    e.printStackTrace();
} finally {
    if (pstmt != null) {
        pstmt.close();
    }
    if (conn != null) {
        conn.close();
    }
}
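If you go the annotation-based route, a rough sketch of the autowired variant could look like this (the component name, table, and column are placeholders):

import javax.sql.DataSource;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

@Component
public class InsertHelper {

    private final JdbcTemplate jdbcTemplate;

    @Autowired
    public InsertHelper(DataSource dataSource) {
        // JdbcTemplate takes care of opening and closing connections for you
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    public void insert(String value) {
        // replace with your SQL query to insert
        jdbcTemplate.update("insert into your_table (your_column) values (?)", value);
    }
}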

Related

.NET insert data from SQL Server to MySQL without looping through data

I'm working on a .NET app where the user selects some filters like date, id, etc.
What I need to do is query a SQL Server database table with those filters, and dump the results into a MySQL table. I don't need all fields in the table, only a few.
So far, I need to loop through all records in the SQL Server DataSet and insert them one by one into my MySQL table.
Is there any way of achieving better performance? I've been playing with Dapper but can't figure out a way to do something like:
Insert into MySQLTable (a,b,c)
Select a,b,c from SQLServerTable
where a=X and b=C
Any ideas?
The linked server option is not possible because we have no access to the SQL Server configuration, so I'm looking for the most efficient way of bulk inserting the data.
If I were to do this inside .NET with Dapper, I'd use C# and do the following.
Assumptions:
a table in both databases with the same schema;
CREATE TABLE Events (
    EventId int,
    EventName varchar(10));
A .NET class
public class Event
{
    public int EventId { get; set; }
    public string EventName { get; set; }
}
The snippet below should give you something you can use as a base.
List<Event> Events = new List<Event>();
var sqlInsert = "Insert into events( EventId, EventName ) values (@EventId, @EventName)";
using (IDbConnection sqlconn = new SqlConnection(Sqlconstr))
{
    sqlconn.Open();
    Events = sqlconn.Query<Event>("Select * from events").ToList();
    // MySqlConnection comes from the MySql.Data (Connector/NET) package;
    // MySqlConnStr is your MySQL connection string
    using (IDbConnection mySqlconn = new MySqlConnection(MySqlConnStr))
    {
        mySqlconn.Open();
        mySqlconn.Execute(sqlInsert, Events);
    }
}
The snippet above selects the rows from the events table in SQL Server and populates the Events list. Normally Dapper returns an IEnumerable<>, but you are casting that to a list with ToList(). Now, with the Events list, you connect to MySQL and execute the insert statement against the Events list.
This snippet is just a barebones example. Without a transaction on the Execute, each row will be autocommitted. If you add a transaction, it will commit when all the items in the Events list are inserted.
Of course there are disadvantages to doing it this way. One important thing to realize is that if you are trying to insert 1 million rows from SQL Server to MySQL, that list will contain 1 million entries, which will increase the memory footprint. In those cases I'd use Dapper's Buffered = false option. This will return the 1 million rows one row at a time. Your C# code can then enumerate over the results, add each row to a list, and keep a counter. After 1000 rows have been added to the list, you can do the insert part into MySQL, then clear the list and continue enumerating over the rows.
This will keep the memory footprint of your application small while processing a large number of rows.
With all that said, nothing beats bulk insert at the server level.
-HTH

How to insert data into MySQL database using Spark Java framework

I am new to the Spark Java framework. How do I insert values into a MySQL database using it?
Assuming you have already read the data you want to insert into an RDD, you can use the following code to insert the records into the database.
rdd.foreach(new VoidFunction<String>() {
    @Override
    public void call(String s) throws Exception {
        // Your code to parse the String and insert the values into MySQL.
    }
});
To build on Shivanand's answer, you really want to use foreachPartition as opposed to foreach. With foreach, you will be opening a db connection for every element, as opposed to once per partition. Opening it once per partition is beneficial for a couple of reasons, but most importantly it takes some time to open a connection, and that overhead would otherwise be paid for every element. You will also be trying to open a lot of connections, and will probably make one of the admins pretty mad when they see potentially millions of db connection requests.
If you have already read the file, then use the foreachPartition function. It will help you insert the records into MySQL.
For example:
rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
    @Override
    public void call(Iterator<String> partition) throws Exception {
        // Make one connection per partition
        Connection c = DriverManager.getConnection("jdbc:mysql://localhost/dbname", "user", "password");
        PreparedStatement ps = c.prepareStatement("insert into table_name (coln_name) values (?)");
        while (partition.hasNext()) {
            String it = partition.next();
            ps.setString(1, it);
            ps.execute();
        }
        ps.close();
        c.close();
    }
});
It will insert it into MySQL.

Hibernate vs SQL: Best way to update a column of all rows in a table

I have a database table alert and am using Hibernate. I am using MySQL.
I want to update a column of all rows in the table.
The table name is alert, whereas the mapping class of the table is Alert.
Using SQLQUERY:
session.beginTransaction();
session.createSQLQuery("update alert set retryCount=3");
session.getTransaction().commit();
session.close();
is not working.
Using HQL with the dynamic-update attribute:
Query query = session.createQuery("from Alert");
for (int i = 0; i < query.list().size(); i++) {
    Alert alert = (Alert) query.list().get(i);
    alert.setretryCount(3);
    session.update(alert);
}
session.getTransaction().commit();
is working
Though the second one is working, I think it will take much more time than a plain SQL query. Is that so? What's the best way to update a set of columns of all rows while using Hibernate?
Hi! Have you tried this?
session.beginTransaction();
String queryString = "update Alert a set a.retryCount = 3";
Query query = session.createQuery(queryString);
query.executeUpdate();
session.getTransaction().commit();
session.close();
This is in the official Hibernate documentation. It is a bulk HQL update that works against entities as objects rather than SQL tables. I think it's better to do it this way, since you don't need to care about your database model.
Could you give us the stack trace displayed for the non-working method?

Is this database dump design ok?

I have written a Java program to do the following and would like opinions on my design:
Read data from a CSV file. The file is a database dump with 6 columns.
Write data into a MySQL database table.
The database table is as follows:
CREATE TABLE MYTABLE
(
ID int PRIMARY KEY not null auto_increment,
ARTICLEID int,
ATTRIBUTE varchar(20),
VALUE text,
LANGUAGE smallint,
TYPE smallint
);
1. I created an object to store each row.
2. I used OpenCSV to read each row into a list of objects created in 1.
3. I iterate this list of objects and, using PreparedStatements, write each row to the database.
The solution should be highly amenable to changes in requirements and demonstrate a good approach, robustness, and code quality.
Does that design look ok?
Another method I tried was to use the 'LOAD DATA LOCAL INFILE' SQL statement. Would that be a better choice?
EDIT: I'm now using OpenCSV, and it's handling the issue of having commas inside actual fields. The issue now is that nothing is being written to the DB. Can anyone tell me why?
public static void exportDataToDb(List<Object> data) {
    Connection conn = connect("jdbc:mysql://localhost:3306/datadb", "myuser", "password");
    try {
        PreparedStatement preparedStatement = null;
        String query = "INSERT into mytable (ID, X, Y, Z) VALUES(?,?,?,?);";
        preparedStatement = conn.prepareStatement(query);
        for (Object o : data) {
            preparedStatement.setString(1, o.getId());
            preparedStatement.setString(2, o.getX());
            preparedStatement.setString(3, o.getY());
            preparedStatement.setString(4, o.getZ());
        }
        preparedStatement.executeBatch();
    } catch (SQLException s) {
        System.out.println("SQL statement is not executed!");
    }
}
From a purely algorithmic perspective, and unless your source CSV file is small, it would be better to:
1. prepare your insert statement
2. start a transaction
3. load one (or a few) line(s) from it
4. insert the small batch into your database
5. return to 3 while there are some lines remaining
6. commit
This way, you avoid loading the entire dump in memory.
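A minimal sketch of that chunked approach with plain JDBC, assuming the dump's column order matches the table from the question and using a naive comma split (OpenCSV would replace that part); the file name, connection details, and batch size are placeholders:

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ChunkedCsvLoader {

    private static final int BATCH_SIZE = 1000; // tune to taste

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/datadb", "myuser", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO MYTABLE (ARTICLEID, ATTRIBUTE, VALUE, LANGUAGE, TYPE) VALUES (?,?,?,?,?)");
             BufferedReader reader = new BufferedReader(new FileReader("dump.csv"))) {

            conn.setAutoCommit(false); // one transaction around the whole load
            String line;
            int count = 0;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split(","); // naive split; OpenCSV handles quoted commas
                ps.setInt(1, Integer.parseInt(cols[0]));
                ps.setString(2, cols[1]);
                ps.setString(3, cols[2]);
                ps.setShort(4, Short.parseShort(cols[3]));
                ps.setShort(5, Short.parseShort(cols[4]));
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch(); // flush the current chunk
                }
            }
            ps.executeBatch(); // flush the remainder
            conn.commit();
        }
    }
}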
But basically, you would probably be better off using LOAD DATA.
If the number of rows is huge, then the code will fail at step 2 with an out-of-memory error. You need to figure out a way to get the rows in chunks and perform a batch insert with a prepared statement for each chunk, continuing until all the rows are processed. This will work for any number of rows, and the batching will also improve performance. Other than this I don't see any issue with the design.

Generic SQL for update / insert

I'm writing a DB layer which talks to MS SQL Server, MySQL & Oracle. I need an operation which can update an existing row if it contains certain data, otherwise insert a new row; All in one SQL operation.
Essentially I need to save over existing data if it exists, or add it if it doesn't.
Conceptually this is the same as upsert except it only needs to work on a single table. I'm trying to make sure I don't need to delete then insert as this has a performance impact.
Is there generic SQL to do this or do I need vendor specific solutions?
Thanks.
You need vendor-specific SQL, as MySQL (unlike MS SQL Server and Oracle) doesn't support MERGE:
http://en.wikipedia.org/wiki/Merge_(SQL)
I suspect that sooner rather than later, you're going to need a vendor specific implementation of your DB layer - SQL portability is pretty much a myth as soon as you do anything even slightly advanced.
I am pretty sure this is going to be vendor specific. For SQL Server, you can accomplish this using the MERGE statement.
If you are using SQL Server 2008, use the MERGE statement. But keep in mind that if your insert part has some condition involved, then it cannot be used, in which case you need to write your own way of accomplishing this. And in your case it has to be your own way, since you are involving MySQL, which does not have a MERGE statement.
Why are you not using an ORM layer (like Entity Framework) for this purpose?
Just some pseudo code (in C#):
public int SaveTask(tblTaskActivity task, bool isInsert)
{
    int result = 0;
    using (var tmsEntities = new TMSEntities())
    {
        if (isInsert) // for insert
        {
            tmsEntities.AddTotblTaskActivities(task);
            result = tmsEntities.SaveChanges();
        }
        else // for update
        {
            var taskActivity = tmsEntities.tblTaskActivities.Where(i => i.TaskID == task.TaskID).FirstOrDefault();
            taskActivity.Priority = task.Priority;
            taskActivity.ActualTime = task.ActualTime;
            result = tmsEntities.SaveChanges();
        }
    }
    return result;
}
In MySQL you have something similar to MERGE:
insert ... on duplicate key update ...
MySQL Reference - Insert on duplicate key update
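For example, a small JDBC sketch of that MySQL upsert, assuming a my_table with a unique key on id (all names here are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class MySqlUpsertExample {
    public static void main(String[] args) throws Exception {
        // Requires a primary or unique key on "id" for ON DUPLICATE KEY UPDATE to kick in
        String sql = "INSERT INTO my_table (id, name) VALUES (?, ?) "
                   + "ON DUPLICATE KEY UPDATE name = VALUES(name)";
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/mydb", "user", "password");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, 42);
            ps.setString(2, "updated or inserted");
            ps.executeUpdate(); // inserts the row, or updates it if id 42 already exists
        }
    }
}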