Spark JDBC batch processing not inserting all records - MySQL

In my Spark job, I'm using JDBC batch processing to insert records into MySQL, but I noticed that not all of the records were making it into MySQL. For example:
// count records before insert
println(s"dataframe: ${dataframe.count()}")
dataframe.foreachPartition(partition => {
  Class.forName(jdbcDriver)
  val dbConnection: Connection = DriverManager.getConnection(jdbcUrl, username, password)
  var preparedStatement: PreparedStatement = null
  dbConnection.setAutoCommit(false)
  val batchSize = 100
  partition.grouped(batchSize).foreach(batch => {
    batch.foreach(row => {
      val productName = row.getString(row.fieldIndex("productName"))
      val quantity = row.getLong(row.fieldIndex("quantity"))
      val sqlString =
        s"""
           |INSERT INTO myDb.product (productName, quantity)
           |VALUES (?, ?)
        """.stripMargin
      preparedStatement = dbConnection.prepareStatement(sqlString)
      preparedStatement.setString(1, productName)
      preparedStatement.setLong(2, quantity)
      preparedStatement.addBatch()
    })
    preparedStatement.executeBatch()
    dbConnection.commit()
    preparedStatement.close()
  })
  dbConnection.close()
})
dataframe.count reports 650 records, but when I check MySQL I see only 195. This is deterministic: I tried different batch sizes and still get the same number. When I move preparedStatement.executeBatch() inside batch.foreach(), i.e. onto the line right after preparedStatement.addBatch(), I see all 650 records in MySQL. But that no longer batches the inserts, since each statement is executed immediately after being added within a single iteration. What could be preventing the queries from being batched?

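It seems you're creating a new preparedStatement in each iteration. Each call to prepareStatement returns a fresh statement with its own empty batch, so preparedStatement.executeBatch() only runs the statement created last, which holds just the last row of each group. That is consistent with seeing 195 rows instead of 650. Instead, you should create one preparedStatement before the loop and only set its parameters inside the iteration, like this: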
import java.sql.{Connection, DriverManager, PreparedStatement}

dataframe.foreachPartition(partition => {
  Class.forName(jdbcDriver)
  val dbConnection: Connection = DriverManager.getConnection(jdbcUrl, username, password)
  dbConnection.setAutoCommit(false)
  val sqlString =
    s"""
       |INSERT INTO myDb.product (productName, quantity)
       |VALUES (?, ?)
    """.stripMargin
  // Prepare the statement once per partition and reuse it for every row.
  val preparedStatement: PreparedStatement = dbConnection.prepareStatement(sqlString)
  val batchSize = 100
  partition.grouped(batchSize).foreach(batch => {
    batch.foreach(row => {
      val productName = row.getString(row.fieldIndex("productName"))
      val quantity = row.getLong(row.fieldIndex("quantity"))
      preparedStatement.setString(1, productName)
      preparedStatement.setLong(2, quantity)
      preparedStatement.addBatch() // queue this row on the shared statement
    })
    preparedStatement.executeBatch() // send the whole group in one call
    dbConnection.commit()
  })
  // Close only after all groups are done; a closed statement cannot be reused.
  preparedStatement.close()
  dbConnection.close()
})

Related

Flutter + MySQL: Multiple queries in a row not working (1 insert + 2 updates)

I have a form with some data that is pushed to a database when a button is pressed. The data is sports score data, and in the end I want to count only the top 10 scores for each player. Besides the values that I insert (id, course, date, score), the table also holds a column NotActive (IkkeAktiv in the code) that is 0 if the score is among the top 10 for the player and should be counted, and 1 if it isn't.
To do this I use mysql1 and have created three queries to run when the submit button is clicked (provided that the form passes validation). I have one query for inserting data, a second query for setting all NotActive to 1 for that player, and a third query where I want to set NotActive to 0 for the 10 highest scores from that player (see code below).
void insertScore(id, date, course, score) async {
  dynamic conn = await connect();
  String query1 =
      "insert into Score (Bandit_ID, Bane, dato, Scores) values (?, ?, ?, ?)";
  await conn.query(query1, [id, course, date, score]);
  conn.close();
}
void updateDbActive1(id) async {
  dynamic conn = await connect();
  String query = "UPDATE Score SET IkkeAktiv = 1 WHERE Bandit_ID = ?";
  await conn.query(query, [id]);
  conn.close();
}
void updateDbActive2(id) async {
  dynamic conn = await connect();
  String query =
      "UPDATE Score SET IkkeAktiv = 0 WHERE Bandit_ID = ? ORDER BY Scores DESC LIMIT 10";
  await conn.query(query, [id]);
  conn.close();
}
When I run the queries inside phpMyAdmin, they all produce the expected result. But when I execute all three queries on the button press, this happens:
insertScore() runs as expected and inputs the values with NotActive defaulting to 1.
updateDbActive1() runs and is able to set every NotActive = 1 for the given player. (Though, if I change it to set every NotActive = 0 instead, it will update all except the most recently added one.)
updateDbActive2() doesn't update anything at all.
The queries are called consecutively after the button's onPressed and a form validation:
ElevatedButton(
  onPressed: () {
    if (_formKey.currentState!.validate()) {
      _formKey.currentState!.save();
      id = nameController.text;
      date = dateController.text;
      course = courseController.text;
      score = scoreController.text;
      setState(() {
        nameController.text = "";
        dateController.text = "";
        courseController.text = "";
        scoreController.text = "";
      });
      database.insertScore(id, date, course, score);
      database.updateDbActive1(id);
      database.updateDbActive2(id);
      shouldDisplay = !shouldDisplay;
    }
    shouldDisplay ? showAlertDialog(...)
    ...
Have I missed something about making multiple queries that should invalidate my setup? The queries should be good - but if you have a better way to make the two updates I will appreciate any input!
I found one way to fix it: I added a delay to both update functions, and now it works as intended. The updated code is below. If anyone can explain why this works and my first attempt doesn't, I would appreciate it. If you have a better solution, please don't hesitate to bring it forward.
void insertScore(id, date, course, score) async {
  dynamic conn = await connect();
  String query1 =
      "insert into Score (Bandit_ID, Bane, dato, Scores) values (?, ?, ?,?)";
  await conn.query(query1, [id, course, date, score]);
  conn.close();
}
Future<void> updateDbActive1(id) async {
  Future.delayed(const Duration(milliseconds: 100), () async {
    dynamic conn = await connect();
    String query = "UPDATE Score SET IkkeAktiv = 1 WHERE Bandit_ID = ?";
    await conn.query(query, [id]);
    conn.close();
  });
}
Future<void> updateDbActive2(id) async {
  Future.delayed(const Duration(milliseconds: 100), () async {
    dynamic conn = await connect();
    String query =
        "UPDATE Score SET IkkeAktiv = 0 WHERE Bandit_ID = ? ORDER BY Scores DESC LIMIT 10";
    await conn.query(query, [id]);
    conn.close();
  });
}
}

Retrieve ids of ResultSet and return as java.sql.Array

I have the following:
def getIds(name: String): java.sql.Array = {
  val ids: Array[Integer] = Array()
  val ps: PreparedStatement = connection.prepareStatement("SELECT id FROM table WHERE name = ?")
  ps.setString(1, name)
  val resultSet = ps.executeQuery()
  while(resultSet.next()) {
    val currentId = resultSet.getInt(1)
    ids :+ currentId
  }
  return connection.createArrayOf("INTEGER", ids.toArray)
}
My intention is to pass this method's output to another PreparedStatement using .setArray(1, <array>).
But I'm getting the following error: java.sql.SQLFeatureNotSupportedException
I'm using MySQL. I already tried INTEGER, INT and BIGINT; none of them worked.
Researching more, I found this:
It seems that MySQL doesn't have array variables. Maybe you can try temporary tables instead of array variables.
So my solution was to create a temporary table with just the ids:
val idsStatement = connection.prepareStatement(
"CREATE TEMPORARY TABLE to_delete_ids SELECT id FROM table WHERE name = ?")
idsStatement.setString(1, name)
idsStatement.executeUpdate()
Then do an inner join with the other statements/queries to achieve the same result:
val statementDeleteUsingIds = connection.prepareStatement(
"DELETE to_delete_rows FROM table2 to_delete_rows INNER JOIN to_delete_ids tdi ON tdi.id = to_delete_rows.other_tables_id")
statementDeleteUsingIds.executeUpdate()
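If the list of ids is small, another option that avoids both createArrayOf and the temporary table is to generate one ? placeholder per id in an IN (...) clause. Below is a minimal Java sketch of that idea (it is not what the solution above does). The class name InClause and the helper names are made up for illustration, and table2 / other_tables_id are simply reused from the statement above:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of JDBC or of the original post.
final class InClause {
    // Builds "?, ?, ?" for n values. n must be at least 1 because "IN ()" is not valid SQL.
    static String placeholders(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("need at least one value");
        }
        return String.join(", ", Collections.nCopies(n, "?"));
    }

    // Deletes the rows of table2 whose other_tables_id is in ids and returns the affected row count.
    static int deleteByIds(Connection connection, List<Integer> ids) throws SQLException {
        String sql = "DELETE FROM table2 WHERE other_tables_id IN ("
                + placeholders(ids.size()) + ")";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            for (int i = 0; i < ids.size(); i++) {
                ps.setInt(i + 1, ids.get(i)); // JDBC parameter indexes are 1-based
            }
            return ps.executeUpdate();
        }
    }
}
```

For large id sets the temporary table (or a plain subquery) is probably the better fit, since the statement text grows with every id.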

Rollback transaction not working properly

Database manipulation commands such as insert, update or delete can sometimes throw an exception due to invalid data. To protect the integrity of the application data, we must make sure that a failed transaction is rolled back:
PreparedStatement ps = null;
Connection conn = null;
try {
    conn = DriverManager.getConnection( URL, USERNAME, PASSWORD );
    String query = "INSERT INTO tbl1(id, username) " +
                   "VALUES (?, ?)";
    ps = conn.prepareStatement( query );
    ps.setString( 1, "javaduke" );
    ps.execute();
    query = "INSERT INTO tbl2 (id, tbl1_id, " +
            "quantity, price) VALUES (?, ?, ?, ?)";
    ps = conn.prepareStatement( query );
    ps.setInt( 1, id );
    ps.setInt( 2, tbl_id );
    ps.setInt( 3, 10 );
    ps.setDouble( 4, 29.99 );
    ps.execute();
}
catch ( SQLException e )
{
    conn.rollback();
    e.printStackTrace();
}
I guess this is Java.
Right after you get your connection object, turn off autocommit, like so.
conn = DriverManager.getConnection( URL, USERNAME, PASSWORD );
conn.setAutoCommit(false);
Right after your last execute() do this.
conn.commit();
Then the rollback() in your exception handler should do what you expect.
This should extend the scope of your transaction to beyond a single SQL query.
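Putting those pieces together, the whole thing could look roughly like the sketch below. It is only an illustration under a few assumptions: the class and method names (TransactionSketch, insertBoth) are invented, the connection is passed in rather than opened here, and I assumed tbl1.id is an integer and filled in both placeholders of the first insert, which the question's code does not do.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical wrapper around the two inserts from the question.
final class TransactionSketch {
    // Runs both inserts as one transaction: commit if both succeed, roll back otherwise.
    static void insertBoth(Connection conn, int id, int tbl1Id) throws SQLException {
        conn.setAutoCommit(false); // without this, each execute commits on its own
        try {
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO tbl1 (id, username) VALUES (?, ?)")) {
                ps.setInt(1, tbl1Id); // assumption: tbl1.id is an INT column
                ps.setString(2, "javaduke");
                ps.executeUpdate();
            }
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO tbl2 (id, tbl1_id, quantity, price) VALUES (?, ?, ?, ?)")) {
                ps.setInt(1, id);
                ps.setInt(2, tbl1Id);
                ps.setInt(3, 10);
                ps.setDouble(4, 29.99);
                ps.executeUpdate();
            }
            conn.commit(); // both inserts become visible together
        } catch (SQLException e) {
            conn.rollback(); // undo the first insert if the second one failed
            throw e;
        }
    }
}
```

Note that on MySQL a rollback only has an effect for tables on a transactional storage engine such as InnoDB; MyISAM tables do not support transactions.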

Is it possible to get the result generated by EclipseLink?

I have this entity with GenerationType.TABLE working:
@Id
@Basic(optional = false)
@Column(name = "app_users_pk")
@TableGenerator( name = "appseqstore", table = "app_seq_store", pkColumnName = "app_seq_name", pkColumnValue = "app_users_pk", valueColumnName = "app_seq_value", initialValue = 1, allocationSize = 1 )
@GeneratedValue( strategy = GenerationType.TABLE, generator = "appseqstore")
private Long appUsersPk;
When I execute a create command, this is what EclipseLink does (from my GlassFish log):
UPDATE app_seq_store SET app_seq_value = app_seq_value + ? WHERE app_seq_name = ?
bind => [1, app_users_pk]
SELECT app_seq_value FROM app_seq_store WHERE app_seq_name = ?
bind => [app_users_pk]
INSERT INTO app_users (app_users_pk, username) VALUES (?, ?)
bind => [33, try lang]
SELECT app_users_pk, username FROM app_users
When I check the MySQL log, I find that EclipseLink makes 3 queries:
1st updates the table, incrementing the value by 1
2nd gets the incremented value
3rd inserts using the incremented value
(from the MySQL query log)
SET autocommit=0
UPDATE app_seq_store SET app_seq_value = app_seq_value + 1 WHERE app_seq_name = 'app_users_pk'
SELECT app_seq_value FROM app_seq_store WHERE app_seq_name = 'app_users_pk'
INSERT INTO app_users (app_users_pk, username) VALUES (33, 'try lang')
commit
SET autocommit=1
Now I see that EclipseLink issued a "select" query.
Is it possible to get the result of that query? I also want to use that key in another related table. Or is there any other way to achieve this?

Fetch only those values from a case class that satisfy a condition

def getAll(userid:BigInteger) = {
  DB.withConnection { implicit Connection =>
    val dat = SQL("select * from id_info_user where user_id=" + userid)
    var data = dat().map(row =>
      RecordAll(row[Int]("country"),row[Int]("age"),row[Int]("gender"),row[Int]("school"),row[Int]("college"),row[Int]("specialization"),row[Int]("company"))).toList
    data
  }
}
The database table contains six columns that hold only zero or one.
This gives me the list of row values, but I want only those values that are one.