I'm working on a .NET app where the user selects some filters like date, ID, etc.
What I need to do is query a SQL Server database table with those filters and dump the results into a MySQL table. I don't need all the fields in the table, only a few.
So far, my only option is to loop through all the records in the SQL Server DataSet and insert them one by one into my MySQL table.
Is there any way of achieving better performance? I've been playing with Dapper but can't figure out a way to do something like:
Insert into MySQLTable (a,b,c)
Select a,b,c from SQLServerTable
where a=X and b=C
Any ideas?
The linked server option is not possible because we have no access to the SQL Server configuration, so I'm looking for the most efficient way of bulk inserting the data.
If I were to do this inside .NET with Dapper, I'd use C# and do the following.
assumptions:
a table in both databases with the same schema:
CREATE TABLE Events (
    EventId int,
    EventName varchar(10));
A .NET class:
public class Event
{
    public int EventId { get; set; }
    public string EventName { get; set; }
}
The snippet below should give you something you can use as a base.
List<Event> Events = new List<Event>();
var sqlInsert = "Insert into events( EventId, EventName ) values (@EventId, @EventName)";
using (IDbConnection sqlconn = new SqlConnection(Sqlconstr))
{
    sqlconn.Open();
    Events = sqlconn.Query<Event>("Select * from events").ToList();
    // note: the second connection must be a MySqlConnection with its own
    // MySQL connection string, not another SqlConnection
    using (IDbConnection mySqlconn = new MySqlConnection(MySqlConstr))
    {
        mySqlconn.Open();
        mySqlconn.Execute(sqlInsert, Events);
    }
}
The snippet above selects the rows from the events table in SQL Server and populates the Events list. Normally Dapper returns an IEnumerable<>, but here it is materialized with ToList(). With the Events list in hand, you connect to MySQL and execute the insert statement once for each item in the list.
This snippet is just a barebones example. Without a transaction on the Execute, each row will be autocommitted. If you add a Transaction, it will commit when all the items in the Events list are inserted.
Of course there are disadvantages to doing it this way. One important thing to realize is that if you are trying to move 1 million rows from SQL Server to MySQL, that list will contain 1 million entries, which increases the memory footprint. In those cases I'd use Dapper's buffered: false option, which streams the rows back one at a time. Your C# code can then enumerate over the results, add each row to a list, and keep a counter; after, say, 1,000 rows have been added to the list, you do the insert into MySQL, clear the list, and continue enumerating over the rows.
This will keep the memory footprint of your application small while processing a large number of rows.
With all that said, nothing beats bulk insert at the server level.
-HTH
I'm using Lucee 5.x and MariaDB (MySQL).
I have a user supplied comma delimited list. I need to query the database and if the item isn't in the database, I need to add it.
user supplied list
green
blue
purple
white
database items
black
white
red
blue
pink
orange
lime
It is not expected that the database list would grow to more than 30 items but end-users always find 'creative' ways to use the tools we provide them.
So using the user supplied list above, only green and purple should be added to the database.
Do I compare the user-supplied list against the database items or vice versa? Would the process change if the user-supplied list count exceeds what is in the database (meaning the user submits 10 items and the database only contains 5)? I'm not sure which loop is the better way to determine which items are new. It needs to be in cfscript, and I'm looking at the looping options outlined here (https://www.petefreitag.com/cheatsheets/coldfusion/cfscript/):
FOR Loop
FOR IN Loop (Array)
FOR IN Loop (Query)
I tried MySQL's NOT IN, but that left me with the existing database values in addition to the new ones. I know this should be simple; I'm overcomplicating it somewhere and/or am too close to the problem to see the solution.
You could do this:
get a list with existing items from database
append user supplied list
remove duplicates
update db if items were added
<cfscript>
var userItems = '"green","blue","purple","white"';
var dbItems = '"black","white","red","blue","pink","orange","lime"';
var result = ListRemoveDuplicates( ListAppend(dbItems, userItems));
if (ListLen(result) neq ListLen(dbItems)) {
// update db
}
</cfscript>
Update (only new items)
<cfscript>
var userItems = '"green","blue","purple","white"';
var dbItems = '"black","white","red","blue","pink","orange","lime"';
var newItems = '';
ListEach(userItems, function (item) {
if (not ListFind(dbItems, item)) {
newItems = ListAppend(newItems, item);
}
})
</cfscript>
trycf.com gist:
(https://trycf.com/gist/f6a44821165338b3c10b7808606979e6/lucee5?theme=monokai)
Again, since this is an operation that the database can do, I'd feed the input data to the database and then let it decide how to deal with multiple keys. I don't recommend using CF to loop through your values to check them and then doing the INSERT. This will require multiple trips to the database and then processing on the application server that isn't really needed.
My suggestion is to use MariaDB's INSERT ... ON DUPLICATE KEY UPDATE ... syntax. This also requires that whatever column you are trying to insert on actually has a UNIQUE constraint on it. Without that constraint, your database itself doesn't care if you have duplicate data, which can cause its own set of issues.
For the database, we have
CREATE TABLE t1 (mycolor varchar(50)
, CONSTRAINT constraint_mycolor UNIQUE (mycolor)
) ;
INSERT INTO t1(mycolor)
VALUES ('black'),('white'),('red'),('blue'),('pink'),('orange'),('lime')
;
The ColdFusion is:
<cfscript>
myInputValues = "green,blue,purple,white" ;
myQueryValues = "" ;
function sanitizeValue ( required string inVal ) {
// do sanitization stuff here
var sanitizedInVal = arguments.inVal ;
return sanitizedInVal ;
}
myQueryValues = myInputValues.listMap(
function(i) {
return "('" & sanitizeValue(i) & "')" ;
}
) ;
// This takes parameterization out of the cfquery tag and
// performs sanitization and validation before building the
// query string.
myQuery = new query();
myQuery.name = "myQuery";
myQuery.setDataSource("dsn");
sqlString = "INSERT INTO t1(mycolor) VALUES "
& myQueryValues
& " ON DUPLICATE KEY UPDATE mycolor=mycolor;"
;
myQuery.setSQL(sqlString);
myQueryResult = myQuery.execute().getResult();
</cfscript>
First, build up your input values (myInputValues). You'll want to do validation and sanitization on them to prevent nastiness from entering your database. I created a sanitizeValue function to be the placeholder for the sanitization and validation operations.
myQueryValues will become a string list of the values in the proper format that we will use to insert into the database.
Then we just build up a new query(), using myQueryValues in the sqlString to get our query. Again, since we are building a string for multiple VALUES to INSERT, I don't think there's a way to use queryparam for those VALUES. But since we cleaned up our string earlier, it should do much of what cfqueryparam does anyway.
We use MariaDB's INSERT INTO .... ON DUPLICATE KEY UPDATE ... syntax to only insert unique values. Again, this requires that the database itself has a constraint to prevent duplicates in whatever column we're inserting.
For a demo: https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=4308da3addb9135e49eeee451c6e9e58
This should do what you're looking to do without beating up on your database too much. I don't have a Lucee or MariaDB server set up to test, so you'll have to give it a shot and see how it performs. I don't know how big your database is or will become, but this should still query pretty quickly.
I am trying to write a Spring Batch Starter job that reads a CSV file and inserts the records into a MySQL DB. When it begins I want to save the start time in a tracking table, and when it ends, the end time in that same table. The table structure is like:
TRACKING : id, start_time, end_time
DATA: id, product, version, server, fk_trk_id
I am unable to find an example project that does such a thing. I believe this needs to be a Spring Batch Starter project that can handle multiple queries. i.e.
// insert start time
1. INSERT INTO tracking (start_time) VALUES (NOW(6));
// get last inserted id for foreign key
2. SET @last_id_in_tracking = LAST_INSERT_ID();
// read from CSV and insert data into 'data' DB table
3. INSERT INTO data (product, version, server, fk_trk_id) VALUES ('mysql', '5.1.42', 'Server1', @last_id_in_tracking);
4. INSERT INTO data (product, version, server, fk_trk_id) VALUES ('linux', '7.0', 'Server2', @last_id_in_tracking);
5. INSERT INTO data (product, version, server, fk_trk_id) VALUES ('java', '8.0', 'Server3', @last_id_in_tracking);
// insert end time
6. UPDATE tracking SET end_time = NOW(6) WHERE id = @last_id_in_tracking;
I'd like sample code and an explanation of how to run queries like these against multiple tables in the same Spring Batch Starter job.
start of edit section - additional question
I do have an additional question. In my entities I have them set up to represent the relationships with annotations (i.e. @ManyToOne, @JoinColumn)...
In your code, how would I get the trackingId from a referenced object? Let me explain:
My Code (Data.java):
@JsonManagedReference
@ManyToOne
@JoinColumn(name = "id")
private Tracking tracking;
Your code (Data.java):
@Column(name = "fk_trk_id")
private Long fkTrkId;
Your code (JobConfig.java):
final Data data = new Data();
data.setFkTrkId(trackingId);
How do I set the id with "setFkTrkId" when the relationship in my Entity is an object?
end of edit section - additional question
Here is an example app that does what you're asking. Please see the README for details.
https://github.com/joechev/examples/tree/master/csv-reader-db-writer
I have created a project for you as an example. Please refer to https://bigzidane.wordpress.com/2018/02/25/spring-batch-mysql-reader-writer-processor-listener/
This example simply has a Reader, Processor, and Writer: the reader reads a CSV file, the processor does some processing, and the writer writes to the database.
We also have a listener to capture the job start and end. At job start, we insert an entry into the DB and get back the generated ID. We then pass that same ID to the writer when storing entries.
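For illustration, here is a minimal sketch (not taken from the linked project) of what such a listener could look like, assuming a tracking table with id, start_time and end_time columns, a JdbcTemplate wired in, and a made-up key name trackingId stored in the job's ExecutionContext:

// Hypothetical sketch of a job listener that records start/end times in the
// tracking table and shares the generated id through the ExecutionContext.
import java.sql.PreparedStatement;
import java.sql.Statement;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.support.GeneratedKeyHolder;
import org.springframework.jdbc.support.KeyHolder;

public class TrackingJobListener implements JobExecutionListener {

    private final JdbcTemplate jdbcTemplate;

    public TrackingJobListener(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // insert the start time and capture the generated tracking id
        KeyHolder keyHolder = new GeneratedKeyHolder();
        jdbcTemplate.update(con -> {
            PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO tracking (start_time) VALUES (NOW(6))",
                    Statement.RETURN_GENERATED_KEYS);
            return ps;
        }, keyHolder);
        // make the id available to step-scoped beans (e.g. the writer)
        jobExecution.getExecutionContext().putLong("trackingId", keyHolder.getKey().longValue());
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        long trackingId = jobExecution.getExecutionContext().getLong("trackingId");
        jdbcTemplate.update("UPDATE tracking SET end_time = NOW(6) WHERE id = ?", trackingId);
    }
}

A step-scoped writer could then read the id back with #{jobExecutionContext['trackingId']} (or look it up from the JobExecution directly) when it builds each data row.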
Note: I'm sorry, I reused an example I already had, so it may not match your question 100%, but technically it should be the same.
Thanks,
Nghia
I have a scenario where I need to parse flat files and process those records into mysql database inserts (schema already exists).
I'm using a FlatFileItemReader to parse the files and a JdbcBatchItemWriter to insert the rows into the database.
I'm also using an ItemProcessor to convert any column values or skip records that I don't want.
My problem is that some of those inserts need a foreign key to another table that already has data in it.
So I was thinking of doing a select to retrieve the ID and updating the POJO inside the ItemProcessor logic.
Is this the best way to do it? I can consider alternatives as I'm just beginning to write all this.
Thanks!
The ItemProcessor in a Spring Batch step is commonly used for enriching data, and querying a DB for something like that is a normal pattern.
For the record, another option would be to use a sub-select in your insert statement to get the foreign key value as the record is being inserted. This may be a bit more performant, given that it removes the additional DB hit.
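As a rough sketch of the enrichment approach (the names MyRecord, parent_table, getParentCode and setParentId are placeholders for your own POJO and schema, not anything from the question):

// Hypothetical sketch: an ItemProcessor that enriches each item with a foreign
// key looked up from a table that already contains data.
import org.springframework.batch.item.ItemProcessor;
import org.springframework.dao.EmptyResultDataAccessException;
import org.springframework.jdbc.core.JdbcTemplate;

// Minimal placeholder for the POJO produced by the FlatFileItemReader.
class MyRecord {
    private String parentCode;
    private Long parentId;

    public String getParentCode() { return parentCode; }
    public void setParentCode(String parentCode) { this.parentCode = parentCode; }
    public Long getParentId() { return parentId; }
    public void setParentId(Long parentId) { this.parentId = parentId; }
}

public class ForeignKeyEnrichingProcessor implements ItemProcessor<MyRecord, MyRecord> {

    private final JdbcTemplate jdbcTemplate;

    public ForeignKeyEnrichingProcessor(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public MyRecord process(MyRecord item) {
        try {
            // resolve the foreign key from a business key carried by the flat file record
            Long parentId = jdbcTemplate.queryForObject(
                    "SELECT id FROM parent_table WHERE code = ?",
                    Long.class,
                    item.getParentCode());
            item.setParentId(parentId);
            return item;
        } catch (EmptyResultDataAccessException e) {
            // no matching parent row: returning null filters the item out of the step
            return null;
        }
    }
}

The sub-select alternative would be to put the lookup in the writer's SQL instead, along the lines of INSERT INTO child (some_col, parent_id) SELECT ?, id FROM parent_table WHERE code = ?.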
For the batch process, if you need this anywhere, you can call the method below from your batch listeners.
The piece of code below worked for me.
In your Main class, load your application context into a static variable, APP_CONTEXT.
If you are not using the XML-based approach, get the DataSource by autowiring it and then use the code below.
Connection conn = null;
PreparedStatement pstmt = null;
try {
    // look up the configured DataSource from the application context
    DataSource dataSource = (DataSource) Main.APP_CONTEXT.getBean("dataSource");
    conn = dataSource.getConnection();
    pstmt = conn.prepareStatement(" your SQL query to insert ");
    pstmt.executeUpdate();
} catch (Exception e) {
    e.printStackTrace(); // log the failure instead of swallowing it
} finally {
    try {
        if (pstmt != null) {
            pstmt.close();
        }
        if (conn != null) {
            conn.close();
        }
    } catch (SQLException e) {
        e.printStackTrace();
    }
}
My original problem is that I need to insert a lot of records into the DB, so to speed things up I want to use mysqlimport, which takes a file of row values and loads them into the specified table. So suppose I have a model Book: I can't simply use book.attributes.values, as one of the fields is a hash that is serialized to the DB (using serialize), so I need to know the format this hash will be stored in in the DB. The same goes for time and date fields. Any help?
How about using SQL insert statements instead of serialization?
book = Book.new(title: 'Much Ado About Nothing', author: 'William Shakespeare')

sql = book.class.arel_table.create_insert
          .tap { |im| im.insert(book.send(
            :arel_attributes_with_values_for_create,
            book.attribute_names)) }
          .to_sql
I have written a Java program to do the following and would like opinions on my design:
Read data from a CSV file. The file is a database dump with 6 columns.
Write data into a MySQL database table.
The database table is as follows:
CREATE TABLE MYTABLE
(
ID int PRIMARY KEY not null auto_increment,
ARTICLEID int,
ATTRIBUTE varchar(20),
VALUE text,
LANGUAGE smallint,
TYPE smallint
);
1. I created an object to store each row.
2. I used OpenCSV to read each row into a list of the objects created in 1.
3. I iterate over this list of objects and, using PreparedStatements, write each row to the database.
The solution should be highly amenable to changes in requirements and demonstrate a good approach, robustness, and code quality.
Does that design look ok?
Another method I tried was to use the 'LOAD DATA LOCAL INFILE' SQL statement. Would that be a better choice?
EDIT: I'm now using OpenCSV and it's handling the issue of having commas inside actual fields. The issue now is nothing is writing to the DB. Can anyone tell me why?
public static void exportDataToDb(List<Object> data) {
    Connection conn = connect("jdbc:mysql://localhost:3306/datadb", "myuser", "password");
    try {
        PreparedStatement preparedStatement = null;
        String query = "INSERT into mytable (ID, X, Y, Z) VALUES(?,?,?,?);";
        preparedStatement = conn.prepareStatement(query);
        for (Object o : data) {
            preparedStatement.setString(1, o.getId());
            preparedStatement.setString(2, o.getX());
            preparedStatement.setString(3, o.getY());
            preparedStatement.setString(4, o.getZ());
        }
        preparedStatement.executeBatch();
    } catch (SQLException s) {
        System.out.println("SQL statement is not executed!");
    }
}
From a purely algorithmic perspective, and unless your source CSV file is small, it would be better to
1. prepare your insert statement
2. start a transaction
3. load one (or a few) line(s) from the file
4. insert the small batch into your database
5. return to 3 while there are lines remaining
6. commit
This way, you avoid loading the entire dump in memory.
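A rough JDBC sketch of that loop (a sketch only: the column mapping, batch size, and CSV layout are assumptions, and it reuses the connection details and OpenCSV reader from the question):

// Hypothetical sketch of the chunked approach: stream the CSV, batch the
// inserts, and commit once at the end. Column order in the dump is assumed to
// match the table (ID first, which we skip because it is auto_increment).
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import com.opencsv.CSVReader;

public class ChunkedCsvLoader {

    private static final int BATCH_SIZE = 1000;

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/datadb", "myuser", "password");
             CSVReader reader = new CSVReader(new FileReader("dump.csv"));
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO mytable (ARTICLEID, ATTRIBUTE, VALUE, LANGUAGE, TYPE) "
                             + "VALUES (?, ?, ?, ?, ?)")) {

            conn.setAutoCommit(false);                    // 2. start a transaction
            String[] row;
            int pending = 0;
            while ((row = reader.readNext()) != null) {   // 3. load one line
                ps.setInt(1, Integer.parseInt(row[1]));
                ps.setString(2, row[2]);
                ps.setString(3, row[3]);
                ps.setShort(4, Short.parseShort(row[4]));
                ps.setShort(5, Short.parseShort(row[5]));
                ps.addBatch();                            // queue the row
                if (++pending % BATCH_SIZE == 0) {
                    ps.executeBatch();                    // 4. insert the small batch
                }
            }
            ps.executeBatch();                            // flush the last partial batch
            conn.commit();                                // 6. commit
        }
    }
}

The addBatch() call inside the loop is the piece that is easy to miss when relying on executeBatch().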
But basically, you'd probably be better off using LOAD DATA.
If the number of rows is huge, the code will fail at step 2 with an out-of-memory error. You need to figure out a way to read the rows in chunks and perform a batch insert with the prepared statement for each chunk, continuing until all the rows are processed. This will work for any number of rows, and the batching will also improve performance. Other than this I don't see any issue with the design.