Saving a large object takes too long with Hibernate and MySQL

I have an entity with a BLOB column requestData and a TEXT column requestDataText.
These two fields may hold large data. In my example, the blob is around 1.2 MB and the text column holds the text equivalent of that data.
When I try to commit this single entity, it takes around 20 seconds.
DBUtil.beginTransaction();
session.saveOrUpdate(entity);
DBUtil.commitTransaction();
Is there something wrong, or is there a way to shorten this period?
package a.db.entity;
// Generated Feb 22, 2016 11:57:10 AM by Hibernate Tools 3.2.1.GA
import java.util.Date;
import javax.persistence.*;
import static javax.persistence.GenerationType.IDENTITY;
/**
 * Foo generated by hbm2java
 */
@Entity
@Table(name = "foo", catalog = "bar")
public class Foo implements java.io.Serializable {
private Long id;
private Date reqDate;
private byte[] requestData;
private String requestDataText;
private String functionName;
private boolean confirmed;
private boolean processed;
private boolean errorOnProcess;
private Date processStartedAt;
private Date processFinishedAt;
private String responseText;
private String processResult;
private String miscData;
public Foo() {
}
@Id @GeneratedValue(strategy = IDENTITY)
@Column(name = "Id", unique = true, nullable = false)
public Long getId() {
return this.id;
}
public void setId(Long id) {
this.id = id;
}
...
}

I just noticed you're starting a transaction and then doing a saveOrUpdate(), which might explain the slowdown, as Hibernate will try to retrieve the row from the DB first (as explained in this other SO answer).
If you know the entity is new, call save(); if you know it has to be updated, call update().
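A minimal sketch of that split, assuming the DBUtil and session from the snippet above and that an unsaved entity has a null id:
DBUtil.beginTransaction();
if (entity.getId() == null) {
    session.save(entity);   // known to be new: plain INSERT, no lookup SELECT
} else {
    session.update(entity); // known to exist: plain UPDATE, no lookup SELECT
}
DBUtil.commitTransaction();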
Another suggestion, though I'm not sure it still applies to MySQL: store the blobs/clobs in a different table from the one holding the other data if you intend to update them. In the past this mix made MySQL slow, as it had to resize the 'block' allocated to a row. So have one table with all the scalar attributes and a separate table just for the blob/clob. This is not an issue if the table is read-only.
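A hedged sketch of that split (the FooPayload entity and the payload field are hypothetical names, not from the original code):
@Entity
@Table(name = "foo_payload", catalog = "bar")
public class FooPayload implements java.io.Serializable {
    @Id @GeneratedValue(strategy = IDENTITY)
    @Column(name = "Id", unique = true, nullable = false)
    private Long id;
    @Lob
    private byte[] requestData;
    @Lob
    private String requestDataText;
    // getters/setters omitted
}
// ...and in Foo, reference it lazily so reads of Foo stay cheap:
@OneToOne(fetch = FetchType.LAZY, cascade = CascadeType.ALL)
private FooPayload payload;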

Mongojack: invalid hexadecimal representation of an ObjectId

Goal
I am trying to push some data to MongoDB using Mongojack.
I expect the result to be something like this in the db:
{
"_id": "840617013772681266",
"messageCount": 69420,
"seedCount": 18,
"prefix": "f!",
"language": "en"
}
Problem
Instead, I get this error in my console.
Caused by: java.lang.IllegalArgumentException: invalid hexadecimal representation of an ObjectId: [840617013772681266]
at org.bson.types.ObjectId.parseHexString(ObjectId.java:390)
at org.bson.types.ObjectId.<init>(ObjectId.java:193)
at org.mongojack.internal.ObjectIdSerializer.serialiseObject(ObjectIdSerializer.java:66)
at org.mongojack.internal.ObjectIdSerializer.serialize(ObjectIdSerializer.java:49)
at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:728)
at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:770)
... 59 more
Code
This is the code that gets called when I try to create a new Guild in the db:
public static Guild getGuild(String id) throws ExecutionException {
return cache.get(id);
}
cache is the following (its load method gets executed):
private static LoadingCache<String, Guild> cache = CacheBuilder.newBuilder()
.expireAfterAccess(10, TimeUnit.MINUTES)
.build(
new CacheLoader<>() {
@Override
public Guild load(@NotNull String id) {
return findGuild(id).orElseGet(() -> new Guild(id, "f!"));
}
});
The findGuild method that gets called first:
public static Optional<Guild> findGuild(String id) {
return Optional.ofNullable(guildCollection.find()
.filter(Filters.eq("_id", id)).first());
}
And finally the Guild document.
@Getter
@Setter
public class Guild implements Model {
public Guild(String id, String prefix) {
this.id = id;
this.prefix = prefix;
}
public Guild() {
}
private String id;
/*
If a Discord guild sent 1,000,000,000 messages per second,
it would take roughly 292471 years to reach the long primitive limit.
*/
private long messageCount;
private long seedCount;
// The default language is specified in BotValues.java's bot.yaml.
private String language;
private String prefix;
@ObjectId
@JsonProperty("_id")
public String getId() {
return id;
}
@ObjectId
@JsonProperty("_id")
public void setId(String id) {
this.id = id;
}
}
What I've tried
I've tried multiple things, such as Long.toHexString(Long.parseLong(id)). The truth is I don't completely understand the error, and after reading the documentation I'm left with more questions than answers.
ObjectId is a 12-byte value that is commonly represented as a sequence of 24 hex digits. It is not an integer.
You can either create ObjectId values using the appropriate ObjectId constructor or parse a 24-hex-digit string. You appear to be trying to perform an integer conversion to ObjectId, which generally isn't a supported operation.
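For illustration, the supported constructions in the MongoDB Java driver look like this (a sketch; the hex literal is the well-known example value from the MongoDB docs):
import org.bson.types.ObjectId;

ObjectId generated = new ObjectId();                        // fresh 12-byte id
ObjectId parsed = new ObjectId("507f1f77bcf86cd799439011"); // 24 hex digits
String hex = generated.toHexString();                       // back to a 24-char string
// new ObjectId("840617013772681266") throws IllegalArgumentException:
// 18 decimal digits are not a 24-hex-digit representation.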
You can technically convert the integer 840617013772681266 to an ObjectId by zero-padding it to 12 bytes, but standard MongoDB driver tooling doesn't do that for you and considers this invalid input (either as an integer or as a string) for conversion to ObjectId.
Example in Ruby:
irb(main):011:0> (v = '%x' % 840617013772681266) + '0' * (24 - v.length)
=> "baa78b862120032000000000"
Note that while the resulting value would be parseable as an ObjectId, it isn't constructed following the ObjectId rules and thus the value cannot be sensibly decomposed into the ObjectId components (machine id, counter and a random value).

OutOfMemoryException loading data via JPA: Need help analyzing

I wrote an application (Spring Boot + Data JPA + Data Rest) that keeps throwing an OutOfMemoryError at me when the application loads. I can skip the code that runs on application start, but then the exception may happen later down the road. It's probably best to show you what happens on application start, because it's actually super simple and should not cause any problems IMHO:
@SpringBootApplication
@EnableAsync
@EnableJpaAuditing
public class ScraperApplication {
public static void main(String[] args) {
SpringApplication.run(ScraperApplication.class, args);
}
}
@Component
@RequiredArgsConstructor(onConstructor = @__(@Autowired))
public class DefaultDataLoader {
private final #NonNull LuceneService luceneService;
@Transactional
@EventListener(ApplicationReadyEvent.class)
public void load() {
luceneService.reindexData();
}
}
@Service
@RequiredArgsConstructor(onConstructor = @__(@Autowired))
public class LuceneService {
private static final Log LOG = LogFactory.getLog(LuceneService.class);
private final #NonNull TrainingRepo trainingRepo;
private final #NonNull EntityManager entityManager;
public void reindexData() {
LOG.info("Reindexing triggered");
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
fullTextEntityManager.purgeAll(Training.class);
LOG.info("Index purged");
int page = 0;
int size = 100;
boolean morePages = true;
Page<Training> pageData;
while (morePages) {
pageData = trainingRepo.findAll(PageRequest.of(page, size));
LOG.info("Loading page " + (page + 1) + "/" + pageData.getTotalPages());
pageData.getContent().stream().forEach(t -> fullTextEntityManager.index(t));
fullTextEntityManager.flushToIndexes(); // flush regularly to keep memory footprint low
morePages = pageData.getTotalPages() > ++page;
}
fullTextEntityManager.flushToIndexes();
LOG.info("Index flushed");
}
}
You can see what I am doing: clearing out the index, reading all Trainings from the TrainingRepo in a paged way (100 at a time), and writing them into the index. Not much going on, actually. A few minutes after the "Index purged" message I get this, and only this:
java.lang.OutOfMemoryError: Java heap space
In the logs I get to see "Index purged" but never see any "Loading page ..." message, so it must be stuck on the findAll() call.
I had the JVM write a heap dump and loaded it into Eclipse Memory Analyzer and got a full stack trace: https://gist.github.com/mathias-ewald/2fddb9762427374bb04d332bd0b6b499
I also looked around the report a bit, but I need help interpreting this information which is why I attached some screenshots from Eclipse Memory Analyzer.
EDIT:
I just enabled "show-sql" and saw this before everything hung:
Hibernate: select training0_.id as id1_9_, training0_.created_date as created_2_9_, training0_.description as descript3_9_, training0_.duration_days as duration4_9_, training0_.execution_id as executi14_9_, training0_.level as level5_9_, training0_.modified_date as modified6_9_, training0_.name as name7_9_, training0_.price as price8_9_, training0_.product as product9_9_, training0_.quality as quality10_9_, training0_.raw as raw11_9_, training0_.url as url12_9_, training0_.vendor as vendor13_9_ from training training0_ where not (exists (select 1 from training training1_ where training0_.url=training1_.url and training0_.created_date<training1_.created_date)) limit ?
Hibernate: select execution0_.id as id1_1_0_, execution0_.created_date as created_2_1_0_, execution0_.duration_millis as duration3_1_0_, execution0_.message as message4_1_0_, execution0_.modified_date as modified5_1_0_, execution0_.scraper as scraper6_1_0_, execution0_.stats_id as stats_id8_1_0_, execution0_.status as status7_1_0_, properties1_.execution_id as executio1_2_1_, properties1_.properties as properti2_2_1_, properties1_.properties_key as properti3_1_, stats2_.id as id1_5_2_, stats2_.avg_quality as avg_qual2_5_2_, stats2_.max_quality as max_qual3_5_2_, stats2_.min_quality as min_qual4_5_2_, stats2_.null_products as null_pro5_5_2_, stats2_.null_vendors as null_ven6_5_2_, stats2_.products as products7_5_2_, stats2_.tags as tags8_5_2_, stats2_.trainings as training9_5_2_, stats2_.vendors as vendors10_5_2_, producthis3_.stats_id as stats_id1_6_3_, producthis3_.product_histogram as product_2_6_3_, producthis3_.product_histogram_key as product_3_3_, taghistogr4_.stats_id as stats_id1_7_4_, taghistogr4_.tag_histogram as tag_hist2_7_4_, taghistogr4_.tag_histogram_key as tag_hist3_4_, vendorhist5_.stats_id as stats_id1_8_5_, vendorhist5_.vendor_histogram as vendor_h2_8_5_, vendorhist5_.vendor_histogram_key as vendor_h3_5_ from execution execution0_ left outer join execution_properties properties1_ on execution0_.id=properties1_.execution_id left outer join stats stats2_ on execution0_.stats_id=stats2_.id left outer join stats_product_histogram producthis3_ on stats2_.id=producthis3_.stats_id left outer join stats_tag_histogram taghistogr4_ on stats2_.id=taghistogr4_.stats_id left outer join stats_vendor_histogram vendorhist5_ on stats2_.id=vendorhist5_.stats_id where execution0_.id=?
Apparently, it creates the statement to fetch all the Training entities but the Execution statement is the last it manages to execute.
I changed the relation from Training to Execution from @ManyToOne to @ManyToOne(fetch = FetchType.LAZY), and suddenly the code was able to load data into the index again. So I am thinking something might be wrong with my Execution entity mapping. Let me share the code with you:
@Entity
@Data
@EntityListeners(AuditingEntityListener.class)
public class Execution {
public enum Status { SCHEDULED, RUNNING, SUCCESS, FAILURE };
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@ToString.Include
private Long id;
@Column(updatable = false)
private String scraper;
@CreatedDate
private LocalDateTime createdDate;
@LastModifiedDate
private LocalDateTime modifiedDate;
@Min(0)
@JsonProperty(access = Access.READ_ONLY)
private Long durationMillis;
@ElementCollection(fetch = FetchType.EAGER)
private Map<String, String> properties;
@NotNull
@Enumerated(EnumType.STRING)
private Status status;
@Column(length = 9999999)
private String message;
@EqualsAndHashCode.Exclude
@OneToOne(cascade = CascadeType.ALL)
private Stats stats;
}
And since it is a relation of Execution, here's the Stats entity, too:
@Entity
@Data
public class Stats {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@ToString.Include
private Long id;
private Long trainings;
private Long vendors;
private Long products;
private Long tags;
private Long nullVendors;
private Long nullProducts;
private Double minQuality;
private Double avgQuality;
private Double maxQuality;
@ElementCollection(fetch = FetchType.EAGER)
private Map<String, Long> vendorHistogram;
@ElementCollection(fetch = FetchType.EAGER)
private Map<String, Long> productHistogram;
@ElementCollection(fetch = FetchType.EAGER)
private Map<String, Long> tagHistogram;
}
All this is running in a single transaction, and I can't see a clear() call here, so the EntityManager loading all this data still references it.
To fix this, inject the EntityManager and invoke clear() after each page. Alternatively, make the scope of the transaction the processing of one page.
I recommend the TransactionTemplate for this.
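A minimal sketch of the page-scoped variant, assuming a PlatformTransactionManager can be injected alongside the beans already shown; everything else is taken from the question:
TransactionTemplate txTemplate = new TransactionTemplate(transactionManager);
int page = 0;
int size = 100;
boolean morePages = true;
while (morePages) {
    final int current = page;
    // one transaction per page, so the persistence context stays small
    morePages = txTemplate.execute(status -> {
        Page<Training> pageData = trainingRepo.findAll(PageRequest.of(current, size));
        pageData.getContent().forEach(t -> fullTextEntityManager.index(t));
        fullTextEntityManager.flushToIndexes();
        entityManager.clear(); // detach this page's entities so they can be GC'd
        return pageData.getTotalPages() > current + 1;
    });
    page++;
}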
I'm not familiar with the FullTextEntityManager but it might have similar problems.
For more background you might want to read up on the JPA entity lifecycle.
I believe it has to do with your FullTextEntityManager not finding enough memory. You may have to configure Hibernate's query plan cache; go through this thread on Stack Overflow, and this one too.
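For reference, the plan cache can be capped via Hibernate properties (a sketch with illustrative sizes, not values from the original answer; in Spring Boot these go in application.properties):
spring.jpa.properties.hibernate.query.plan_cache_max_size=512
spring.jpa.properties.hibernate.query.plan_parameter_metadata_max_size=32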

How to handle concurrent updates in a Spring Boot + Hibernate app? Also need to make the app scalable

Project type: Spring Boot JPA project
Hi,
I have the below REST service, which increments a number in the database.
@RestController
public class IncrementController {
@Autowired
MyNumberRepository mynumberRepository;
@GetMapping(path="/incrementnumber")
public String incrementNumber(){
Optional<MyNumber> mynumber = mynumberRepository.findById(1);
int i = mynumber.get().getNumber();
System.out.println("value of no is "+i);
i = i+1;
System.out.println("value of no post increment is "+i);
mynumber.get().setNumber(i);
MyNumber entity = new MyNumber();
entity.setId(1);
entity.setNumber(i);
mynumberRepository.save(entity);
return "done";
}
}
Entity is as below :-
@Entity
@Table(name = "my_number")
public class MyNumber {
@Id
private Integer id;
private Integer number;
public Integer getId() {
return id;
}
public void setId(Integer id) {
this.id = id;
}
public Integer getNumber() {
return number;
}
public void setNumber(Integer number) {
this.number = number;
}
}
Below is the Repository :-
public interface MyNumberRepository extends JpaRepository<MyNumber, Integer>{
}
The service works well when I call it sequentially, but when concurrent threads call the increment service I get inconsistent results. How can I handle this situation?
Also, I have to deploy the app in multiple places connecting to the same DB, i.e. there is a scalability concern.
Thanks,
Rahul
You must use a pessimistic lock. This will issue a SELECT ... FOR UPDATE and lock the row for the duration of the transaction, so it's not possible for another transaction to overwrite the row.
public interface MyNumberRepository extends JpaRepository<MyNumber, Integer> {
@Lock(LockModeType.PESSIMISTIC_WRITE)
Optional<MyNumber> findById(Integer id);
}
And then you have to make your REST method transactional by adding @Transactional:
@RestController
public class IncrementController {
@Autowired
MyNumberRepository mynumberRepository;
@Transactional
@GetMapping(path="/incrementnumber")
public String incrementNumber(){
Optional<MyNumber> mynumber = mynumberRepository.findById(1);
int i = mynumber.get().getNumber();
System.out.println("value of no is "+i);
i = i+1;
System.out.println("value of no post increment is "+i);
mynumber.get().setNumber(i);
MyNumber entity = new MyNumber();
entity.setId(1);
entity.setNumber(i);
mynumberRepository.save(entity);
return "done";
}
}
The above solution will work, but I feel you are over-engineering a very simple problem.
My recommendation would be to use a database sequence. Your requirement is quite straightforward: in your service you can simply call getNextValue on the sequence and then set the value in the id field. This way you don't have to manage locks, as the database will do that for you.
In Oracle in particular, sequences are managed in separate transactions, so if your calling code fails with an exception, the value of the sequence is still incremented. This ensures that multiple threads will not see the same sequence value even when exceptions occur.
Instead of locking the row in a transaction, you could also use an Oracle sequence or MySQL's AUTO_INCREMENT feature, which will prevent any ID from being returned twice.
https://community.oracle.com/thread/4156674
Thread safety of MySql's Select Last_Insert_ID
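As a sketch of the sequence-style approach on MySQL (assuming a Spring JdbcTemplate and the my_number table from the question; LAST_INSERT_ID(expr) is the per-connection trick the links above discuss):
@Autowired
private JdbcTemplate jdbcTemplate;

public long nextNumber() {
    // LAST_INSERT_ID(expr) stores the new value per connection,
    // so the follow-up SELECT is race-free without explicit locks.
    jdbcTemplate.update(
        "UPDATE my_number SET number = LAST_INSERT_ID(number + 1) WHERE id = 1");
    return jdbcTemplate.queryForObject("SELECT LAST_INSERT_ID()", Long.class);
}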

Hibernate: Storing a fixed-length array in one database table row

I have been trying to find a solution for storing a fixed-length array as a property of an object with Hibernate, in the same DB table as the object, without using a BLOB for the array.
I currently have a class ProductionQCSession which looks like
@Entity
public class ProductionQCSession extends IdEntity {
private Long id;
private Float velocity;
private Float velocityTarget;
private Float[] velocityProfile;
public ProductionQCSession() {
}
@Id @GeneratedValue(strategy = GenerationType.AUTO)
@Override
public Long getId() {
return id;
}
@SuppressWarnings("unused")
public void setId(Long id) {
this.id = id;
}
@Basic
public Float getVelocity() {
return velocity;
}
public void setVelocity(Float velocity) {
this.velocity = velocity;
}
@Basic
public Float[] getVelocityProfile() {
return velocityProfile;
}
public void setVelocityProfile(Float[] velocityProfile) {
this.velocityProfile = velocityProfile;
}
}
Ideally I would like the DB structure to be
id|velocity|VPValue0|VPValue1|VPValue2|VPValue3|...
21| 2.1| 0.1| 0.2| -0.1| 0.3|...
I know with high certainty that we will always have 15 items in the velocityProfile array, and those values are just as much properties of the object as any other property, so I think it makes sense to add them to the database table schema, if possible. I would prefer it this way because it would be easy to get an overview of the data just by doing a raw table print.
The current code just stores the array data as a BLOB.
I have looked http://ndpsoftware.com/HibernateMappingCheatSheet.html mapping cheat sheet, but could not seem to find any good solution.
Am I just trying to do something nobody else would do?
Essentially, you're trying to have a multi-value field, which is not a relational database concept. A normalized solution would put those into a child table, which Hibernate would let you access directly from the parent row (and return it as a collection).
If you are adamant that it should be in a single table, then you'll need to create 15 individual columns... and hope that you don't suddenly need 16 in the future.
The solution ended up being the standard approach of using a child table, even though it makes the data analysis slightly more complicated. The following code was used.
@ElementCollection
@CollectionTable(name = "QCVelocityProfile")
public List<Float> getVelocityProfile() {
return velocityProfile;
}
public void setVelocityProfile(List<Float> velocityProfile) {
this.velocityProfile = velocityProfile;
}

The most efficient way to store photo references in a database

I'm currently looking to store approximately 3.5 million photos from approximately 100-200k users. I'm only using a MySQL database on AWS. My question is about the most efficient way to store the photo references. I'm only aware of two ways, and I'm looking for an expert opinion.
Choice A
A user table with a photo_url column; in that column I would build a comma-separated list of photos that maintains both the name and the sort order. The business logic would handle extracting the path from the photo name and appending the photo size. The downside is the processing expense.
Database example
"0ea102, e435b9, etc"
Business logic would build the following urls from photo name
/0e/a1/02.jpg
/0e/a1/02_thumb.jpg
/e4/35/b9.jpg
/e4/35/b9_thumb.jpg
Choice B - a relational table joined to the user table with the following fields. I'm just concerned I may run into database performance issues.
pk
user_id
photo_url_800
photo_url_150
photo_url_45
order
Does anybody have any suggestions on the better solution?
The best and most common answer would be choice B: a relational table joined to the user table, with the following fields:
id
order
user_id
desc
photo_url_800
photo_url_150
photo_url_45
date_uploaded
Or a hybrid, wherein you store the file names individually and build the photo directory in your business logic layer.
My analysis: your first option is bad practice. Comma-separated fields are not advisable in a database; they would be difficult to update, and you couldn't easily attach a description to each photo.
Regarding the table optimization, you might want to see these articles:
Optimizing MyISAM Queries
Optimizing InnoDB Queries
Here is an example of my final solution using the Hibernate ORM, combining Christian Mark's suggestion and my hybrid approach.
@Entity
public class Photo extends StatefulEntity {
private static final String FILE_EXTENSION_JPEG = ".jpg";
private static final String ROOT_PHOTO_URL = "/photo/";
private static final String PHOTO_SIZE_800 = "_800";
private static final String PHOTO_SIZE_150 = "_150";
private static final String PHOTO_SIZE_100 = "_100";
private static final String PHOTO_SIZE_50 = "_50";
@ManyToOne
@JoinColumn(name = "profile_id", nullable = false)
private Profile profile;
//Example "a1d2b0" which will later get parsed into "/photo/a1/d2/b0_size.jpg"
//using the generatePhotoUrl business logic below.
@Column(nullable = false, length = 6)
private String fileName;
private boolean temp;
@Column(nullable = false)
private int orderBy;
@Temporal(TemporalType.TIMESTAMP)
private Date dateUploaded;
public Profile getProfile() {
return profile;
}
public void setProfile(Profile profile) {
this.profile = profile;
}
public String getFileName() {
return fileName;
}
public void setFileName(String fileName) {
this.fileName = fileName;
}
public Date getDateUploaded() {
return dateUploaded;
}
public void setDateUploaded(Date dateUploaded) {
this.dateUploaded = dateUploaded;
}
public boolean isTemp() {
return temp;
}
public void setTemp(boolean temp) {
this.temp = temp;
}
public int getOrderBy() {
return orderBy;
}
public void setOrderBy(int orderBy) {
this.orderBy = orderBy;
}
public String getPhotoSize800() {
return generatePhotoURL(PHOTO_SIZE_800);
}
public String getPhotoSize150() {
return generatePhotoURL(PHOTO_SIZE_150);
}
public String getPhotoSize100() {
return generatePhotoURL(PHOTO_SIZE_100);
}
public String getPhotoSize50() {
return generatePhotoURL(PHOTO_SIZE_50);
}
private String generatePhotoURL(String photoSize) {
String firstDir = getFileName().substring(0, 2);
String secondDir = getFileName().substring(2, 4);
String photoName = getFileName().substring(4, 6);
StringBuilder sb = new StringBuilder();
sb.append(ROOT_PHOTO_URL); // ROOT_PHOTO_URL already ends with '/', so no extra separator
sb.append(firstDir);
sb.append("/");
sb.append(secondDir);
sb.append("/");
sb.append(photoName);
sb.append(photoSize);
sb.append(FILE_EXTENSION_JPEG);
return sb.toString();
}
}