How to optimize Hibernate Search (Lucene) record fetching time - MySQL

I am using Hibernate Search (Lucene) with a boolean query to fetch records from the database. At present my table has 250 records, which should be no problem for Hibernate Search to fetch, but it is taking 1:30 to 2:30 minutes.
My flow is: the controller receives some search keywords and passes them to the service and then to the DAO. The transaction is started in the service layer. Finally the DAO returns a List of records, and after getting the records I map that list into a List of XYZTableVo objects.
I don't know where exactly the time is going, whether in the Lucene query or in preparing the VO objects.
Following is my snippet:
Session session = getSession();
FullTextSession fullTextSession = Search.getFullTextSession(session);
SearchFactory searchFactory = fullTextSession.getSearchFactory();
fullTextSession
.createIndexer(XYZTable.class)
.typesToIndexInParallel(20)
.batchSizeToLoadObjects(25)
.cacheMode(CacheMode.NORMAL)
.threadsToLoadObjects(5)
.startAndWait();
searchFactory.optimize(XYZTable.class);
MultiFieldQueryParser parser = new MultiFieldQueryParser(new String[] { "Skills.skill" },
new StandardAnalyzer());
parser.setDefaultOperator(Operator.OR);
org.apache.lucene.search.Query luceneQuery = null;
QueryBuilder qb = fullTextSession.getSearchFactory().buildQueryBuilder().forEntity(XYZTable.class)
.get();
BooleanQuery boolQuery = new BooleanQuery();
if (locationList != null) {
if (locationList.get(2) != null) {
boolQuery.add(qb.keyword().onField("XYZTablePersonalInfo.XYZTableAddress.postalCode")
.matching(locationList.get(2)).createQuery(), BooleanClause.Occur.MUST);
}
else if (locationList.get(1) != null) {
boolQuery.add(qb.keyword().onField("XYZTablePersonalInfo.XYZTableAddress.city")
.matching(locationList.get(1)).createQuery(), BooleanClause.Occur.MUST);
}
}
if (!StringUtils.isBlank(query)) {
try {
luceneQuery = parser.parse(query.toUpperCase());
} catch (ParseException e) {
luceneQuery = parser.parse(parser.escape(query.toUpperCase()));
}
boolQuery.add(luceneQuery, BooleanClause.Occur.MUST);
}
boolQuery.add(qb.keyword().onField("isValid").matching(false).createQuery(), BooleanClause.Occur.MUST);
FullTextQuery createFullTextQuery = fullTextSession.createFullTextQuery(boolQuery, XYZTable.class);
createFullTextQuery.list();
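To narrow down whether the time goes into indexing, the Lucene query, or the VO mapping, here is a minimal timing sketch (not part of the original code). It reuses fullTextSession, boolQuery, and XYZTable from the snippet above; toVoList() is a hypothetical placeholder for whatever the service layer does to build the List of XYZTableVo. Note that the snippet runs the MassIndexer (createIndexer(...).startAndWait()) as part of the search path, so the reindex is timed as its own phase here.
// Timing sketch (illustrative only): measure each phase of the search flow separately.
long t0 = System.nanoTime();
fullTextSession.createIndexer(XYZTable.class).startAndWait(); // full reindex of the entity
long t1 = System.nanoTime();
List<?> results = fullTextSession.createFullTextQuery(boolQuery, XYZTable.class).list(); // Lucene query + entity load
long t2 = System.nanoTime();
List<XYZTableVo> vos = toVoList(results); // hypothetical VO-mapping step done in the service layer
long t3 = System.nanoTime();
System.out.printf("index=%dms query=%dms voMapping=%dms%n",
(t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, (t3 - t2) / 1_000_000);
If the first number dominates, the reindex rather than the query itself is where the minutes go; the MassIndexer is normally run once at startup or on a schedule rather than per search, but that is a separate change from this measurement.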

Related

Concurrency issue in Flink stream job

I have a Flink streaming job which does user fingerprinting based on click-stream event data. The code snippet is attached below:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// setting event time characteristic for processing
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
DataStream<EventData> input = ConfluentKafkaSource.
createKafkaSourceFromApplicationProperties(env);
final OutputTag<EventData> emailPresentTag = new OutputTag<>("email-present") {
};
final OutputTag<EventData> dispatchIdPresentTag = new OutputTag<>("dispatch-id-present") {
};
final OutputTag<EventData> residueTag = new OutputTag<>("residue") {
};
SingleOutputStreamOperator<EventData> splitStream = input
.process(new ProcessFunction<EventData, EventData>() {
@Override
public void processElement(
EventData data,
Context ctx,
Collector<EventData> out) {
if (data.email != null && !data.email.isEmpty()) {
// emit data to side output for emailPresentTag
ctx.output(emailPresentTag, data);
} else if (data.url != null && data.url.contains("utm_source=starling")) {
// emit data to side output for dispatchIdPresentTag
ctx.output(dispatchIdPresentTag, data);
} else {
// emit data to side output for ip/campaign attributing
ctx.output(residueTag, data);
}
}
});
DataStream<EventData> emailPresentStream = splitStream.getSideOutput(emailPresentTag);
DataStream<EventData> dispatchIdPresentStream = splitStream.getSideOutput(dispatchIdPresentTag);
DataStream<EventData> residueStream = splitStream.getSideOutput(residueTag);
// process the 3 split streams separately based on their corresponding logic
DataStream<EventData> enrichedEmailPresentStream = emailPresentStream.
keyBy(e -> e.lbUserId == null ? e.eventId : e.lbUserId).
window(TumblingProcessingTimeWindows.of(Time.seconds(30))).
process(new AttributeWithEmailPresent());
DataStream<EventData> enrichedDispatchIdPresentStream = dispatchIdPresentStream.
keyBy(e -> e.lbUserId == null ? e.eventId : e.lbUserId).
window(TumblingProcessingTimeWindows.of(Time.seconds(30))).
process(new AttributeWithDispatchPresent());
DataStream<EventData> enrichedResidueStream = residueStream.
keyBy(e -> e.lbUserId == null ? e.eventId : e.lbUserId).
window(TumblingProcessingTimeWindows.of(Time.seconds(30))).
process(new AttributeWithIP());
DataStream<EventData> dataStream = enrichedEmailPresentStream.union(enrichedDispatchIdPresentStream, enrichedResidueStream);
final OutputTag<EventData> attributedTag = new OutputTag<>("attributed") {
};
final OutputTag<EventData> unattributedTag = new OutputTag<>("unattributedTag") {
};
SingleOutputStreamOperator<EventData> splitEnrichedStream = dataStream
.process(new ProcessFunction<EventData, EventData>() {
@Override
public void processElement(
EventData data,
Context ctx,
Collector<EventData> out) {
if (data.attributedEmail != null && !data.attributedEmail.isEmpty()) {
// emit data to side output for emailPresentTag
ctx.output(attributedTag, data);
} else {
// emit data to side output for ip/campaign attributing
ctx.output(unattributedTag, data);
}
}
});
//splitting attributed and unattributed stream
DataStream<EventData> attributedStream = splitEnrichedStream.getSideOutput(attributedTag);
DataStream<EventData> unattributedStream = splitEnrichedStream.getSideOutput(unattributedTag);
// attributing backlog unattributed events using attributed stream and flushing resultant attributed
// stream to kafka enriched_clickstream_event topic.
attributedStream = attributedStream.windowAll(TumblingProcessingTimeWindows.of(Time.seconds(30))).
process(new AttributeBackLogEvents()).forceNonParallel();
attributedStream.
addSink(ConfluentKafkaSink.createKafkaSinkFromApplicationProperties()).
name("Enriched Event kafka topic sink");
//handling unattributed events. Flushing them to mysql
Properties dbProperties = ConfigReader.getConfig().get(REPORTINGDB_PREFIX);
ObjectMapper objectMapper = new ObjectMapper();
unattributedStream.addSink(JdbcSink.sink(
"INSERT IGNORE INTO events_store.unattributed_event (event_id, lb_user_id, ip, event) values (?,?,?,?)",
(ps, t) -> {
ps.setString(1, t.eventId);
ps.setString(2, t.lbUserId);
ps.setString(3, t.ip);
try {
ps.setString(4, objectMapper.writeValueAsString(t));
} catch (JsonProcessingException e) {
logger.error("[UserFingerPrintJob] "+ e.getMessage());
}
},
JdbcExecutionOptions.builder()
.withBatchIntervalMs(Long.parseLong(dbProperties.getProperty(REPORTINGDB_FLUSH_INTERVAL)))
.withMaxRetries(Integer.parseInt(dbProperties.getProperty(REPORTINGDB_FLUSH_MAX_RETRIES)))
.build(),
new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
.withUrl(dbProperties.getProperty(REPORTINGDB_URL_PROPERTY_NAME))
.withDriverName(dbProperties.getProperty(REPORTINGDB_DRIVER_PROPERTY_NAME))
.withUsername(dbProperties.getProperty(REPORTINGDB_USER_PROPERTY_NAME))
.withPassword(dbProperties.getProperty(REPORTINGDB_PASSWORD_PROPERTY_NAME))
.build())).name("Unattributed event ReportingDB sink");
env.execute("UserFingerPrintJob");
Steps involved:
1. Split the stream into 3 streams based on 3 criteria, attribute them with email, and then collect the union of these 3 streams.
2. Events which remain unattributed in the above step are sunk to MySQL as backlog unattributed events.
3. Events which are attributed are passed on to the AttributeBackLogEvents ProcessFunction. I'm assuming the issue is here.
In the AttributeBackLogEvents function, I'm fetching all events from MySQL which have a cookie-id (lb_user_id) or IP present in the input attributed events. Those events are then attributed and percolated down to the Kafka sink along with the input attributed events. For some of these unattributed events, I'm seeing duplicate attributed events with a timestamp difference of 30 seconds (which is the processing-time window). What I think is happening is that while one task of the AttributeBackLogEvents function is still processing, a separate task is fetching the same events from MySQL, and both tasks process them simultaneously. Basically I want to enforce a record-level lock in MySQL/code so that the same event doesn't get picked up twice. One way may be to use SELECT ... FOR UPDATE, but given the size of the data this could lead to deadlocks (or would this approach still be useful?). I tried the forceNonParallel() method too, but it isn't helpful.
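For reference, a minimal sketch of the SELECT ... FOR UPDATE idea mentioned above (an assumption-laden illustration, not the job's actual code): it claims matching rows from the unattributed_event table defined in the sink SQL inside one transaction, and deletes them in the same transaction so a second parallel task cannot pick up the same event_id. The connection handling, the claimByLbUserId method name, and the delete-after-claim policy are all assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: row-level locking via SELECT ... FOR UPDATE on MySQL/InnoDB.
public class BacklogClaimer {
    public static List<String> claimByLbUserId(String jdbcUrl, String user, String pass,
                                               String lbUserId) throws Exception {
        List<String> claimedEventJson = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, pass)) {
            conn.setAutoCommit(false); // row locks are held until commit/rollback
            try (PreparedStatement select = conn.prepareStatement(
                     "SELECT event_id, event FROM events_store.unattributed_event " +
                     "WHERE lb_user_id = ? FOR UPDATE");
                 PreparedStatement delete = conn.prepareStatement(
                     "DELETE FROM events_store.unattributed_event WHERE event_id = ?")) {
                select.setString(1, lbUserId);
                try (ResultSet rs = select.executeQuery()) {
                    while (rs.next()) {
                        claimedEventJson.add(rs.getString("event"));
                        delete.setString(1, rs.getString("event_id"));
                        delete.addBatch();
                    }
                }
                delete.executeBatch(); // remove claimed rows so another task cannot re-read them
                conn.commit();
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
        return claimedEventJson;
    }
}
Whether this is preferable to keying the backlog lookup so that each lb_user_id is only ever handled by one task is a design question the sketch does not answer.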

Web API returning null JSON objects C#

I have a web API returning 117k JSON objects.
Edit: The API calls MySQL to fetch 117k rows of data, puts them into an IEnumerable, and sends them back as JSON.
All I see is
[{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},... the entire page...
I wanted to ask what is happening and how you would handle a large JSON transfer. I prefer to get it all in one go to avoid querying back and forth (latency).
The function call is this:
public IEnumerable<Score> Get(int id)
{
string mConnectionString = System.Configuration.ConfigurationManager.AppSettings["mysqlConnectionString"];
MySqlConnection mConn;
MySqlDataReader mReader;
List<Score> returnedRows = new List<Score>();
if (String.IsNullOrEmpty(mConnectionString))
{
return returnedRows;
}
try
{
// prepare the dump query
MySqlCommand dumpCmd;
string query = "SELECT * FROM score where id = "+id+";";
using (mConn = new MySqlConnection(mConnectionString))
{
using (dumpCmd = new MySqlCommand())
{
dumpCmd.Connection = mConn;
dumpCmd.CommandText = query;
mConn.Open();
mReader = dumpCmd.ExecuteReader();
if (mReader.HasRows)
{
while (mReader.Read())
{
string[] rowCols = new string[mReader.FieldCount]; // there are 20+ columns, at least the primary keys are not null
for (int i = 0; i < rowCols.Length; ++i)
{
rowCols[i] = mReader.GetString(i);
}
returnedRows.Add(new Score(rowCols));
}
mConn.Close();
return returnedRows;
}
else
{
// should return a 404 cause nothing found
mConn.Close();
}
}
}
}
catch (Exception e)
{
return returnedRows;
}
return returnedRows;
}
Either mReader.GetString(i) is returning null or you have no data in the columns.

ADO.NET Error: inserts duplicate records on SQL Server

This is weird. I have a POS WinForms app that connects to a SQL Server 2008 R2 database through ADO.NET. It works fine, saving thousands of records every day, but I noticed a strange behavior: when the client machine hangs for any reason while saving a record and then reboots, the next time my app runs it resends the previous dataset to the DB, even twice, resulting in a duplicate sales entry. I guess the ADO.NET layer keeps the transaction marked as not saved despite the fact that it was saved.
public int UpdateVentas(dsetVentas ds)
{
Int32 _newId = 0;
SqlDataAdapter da = null;
SqlDataAdapter daDetalle = null;
SqlDataAdapter daPagos = null;
using (TransactionScope scope = Utils.GetTransactionScope())
{
try
{
conn.Open();
//tran = conn.BeginTransaction();
// prepare adapters
da = getDA2updateVentas(conn);
daDetalle = getDA2updateVentasDetalle(conn);
daPagos = getDA2updatePagos(conn);
// prepare table of deleted, added and modified details
DataTable DeletedDetalle = ds.VentasDetalle.GetChanges(DataRowState.Deleted);
DataTable AddedDetalle = ds.VentasDetalle.GetChanges(DataRowState.Added);
DataTable ModifiedDetalle = ds.VentasDetalle.GetChanges(DataRowState.Modified);
// payments
DataTable DeletedPagos = ds.Pagos.GetChanges(DataRowState.Deleted);
DataTable AddedPagos = ds.Pagos.GetChanges(DataRowState.Added);
DataTable ModifiedPagos = ds.Pagos.GetChanges(DataRowState.Modified);
// execute in proper order
// deleted rows
if ((DeletedDetalle != null))
{
daDetalle.Update(DeletedDetalle);
}
if (DeletedPagos != null)
{
daPagos.Update(DeletedPagos);
}
// save main record
// ///////////////////////////////////
//
da.Update(ds, "Ventas");
//
// ///////////////////////////////////
// gets new ID
//dsetVentas.VentasRow r = (dsetVentas.VentasRow)ds.Ventas.Rows[0]; //_newId = r.IdVenta;
_newId = ds.Ventas[0].IdVenta;
// updates details & payments
if ((ModifiedDetalle != null))
{
daDetalle.Update(ModifiedDetalle);
}
if (ModifiedPagos != null)
{
daPagos.Update(ModifiedPagos);
}
// inserts details & payments
if ((AddedDetalle != null))
{
foreach (dsetVentas.VentasDetalleRow detalle in AddedDetalle.Rows)
{
detalle.IdVenta = _newId;
}
daDetalle.Update(AddedDetalle);
}
if (AddedPagos != null)
{
foreach (dsetVentas.PagosRow pago in AddedPagos.Rows)
{
pago.IdDocumento = _newId;
}
daPagos.Update(AddedPagos);
}
scope.Complete();
ds.AcceptChanges();
}
catch (Exception ex1)
{
_newId = 0;
_ErrorMessage = ex1.ToString();
}
finally
{
if (conn.State != ConnectionState.Closed)
{
conn.Close();
}
}
}
da = null;
return _newId;
}

APEX, Unit Test, Callout No Response with Static Resource

Bit stuck on another one, I'm afraid: I am trying to write a unit test for a bulk Apex class.
The class has a callout to the Google API, so I have created a static resource which I am feeding in via a mock so I can test the processing of the JSON that is returned. However, for some reason the response is always empty.
The very odd thing is that if I use exactly the same callout/JSON code, and the same mock code, on a previous @future call, then it does return fine.
Here is the class:
global class mileage_bulk implements Database.Batchable<sObject>,
Database.AllowsCallouts
{
global Database.QueryLocator start(Database.BatchableContext BC)
{
String query = 'SELECT Id,Name,Amount,R2_Job_Ref__c,R2_Shipping_Post_Code__c,Shipping_Postcode_2__c FROM Opportunity WHERE R2_Shipping_Post_Code__c != null';
return Database.getQueryLocator(query);
//system.debug('Executing'+query);
}
global void execute(Database.BatchableContext BC, List<Opportunity> scope)
{
system.debug(scope);
for(Opportunity a : scope)
{
String startPostcode = null;
startPostcode = EncodingUtil.urlEncode('HP27DU', 'UTF-8');
String endPostcode = null;
String endPostcodeEncoded = null;
if (a.R2_Shipping_Post_Code__c != null){
endPostcode = a.R2_Shipping_Post_Code__c;
Pattern nonWordChar = Pattern.compile('[^\\w]');
endPostcode = nonWordChar.matcher(endPostcode).replaceAll('');
endPostcodeEncoded = EncodingUtil.urlEncode(endPostcode, 'UTF-8');
}
Double totalDistanceMeter = null;
Integer totalDistanceMile = null;
String responseBody = null;
Boolean firstRecord = false;
String ukPrefix = 'UKH';
if (a.R2_Job_Ref__c != null){
if ((a.R2_Job_Ref__c).toLowerCase().contains(ukPrefix.toLowerCase())){
system.debug('Is Hemel Job');
startPostcode = EncodingUtil.urlEncode('HP27DU', 'UTF-8');
} else {
system.debug('Is Bromsgrove Job');
startPostcode = EncodingUtil.urlEncode('B604AD', 'UTF-8');
}
}
// build callout
Http h = new Http();
HttpRequest req = new HttpRequest();
req.setEndpoint('http://maps.googleapis.com/maps/api/directions/json?origin='+startPostcode+'&destination='+endPostcodeEncoded+'&units=imperial&sensor=false');
req.setMethod('GET');
req.setTimeout(60000);
system.debug('request follows');
system.debug(req);
try{
// callout
HttpResponse res = h.send(req);
// parse coordinates from response
JSONParser parser = JSON.createParser(res.getBody());
responseBody = res.getBody();
system.debug(responseBody);
while (parser.nextToken() != null) {
if ((parser.getCurrentToken() == JSONToken.FIELD_NAME) &&
(parser.getText() == 'distance')){
parser.nextToken(); // object start
while (parser.nextToken() != JSONToken.END_OBJECT){
String txt = parser.getText();
parser.nextToken();
//system.debug(parser.nextToken());
//system.debug(txt);
if (firstRecord == false){
//if (txt == 'text'){
//totalDistanceMile = parser.getText();
system.debug(parser.getText());
//}
if (txt == 'value'){
totalDistanceMeter = parser.getDoubleValue();
double inches = totalDistanceMeter*39.3701;
totalDistanceMile = (integer)inches/63360;
system.debug(parser.getText());
firstRecord = true;
}
}
}
}
}
} catch (Exception e) {
}
//system.debug(accountId);
system.debug(a);
system.debug(endPostcodeEncoded);
system.debug(totalDistanceMeter);
system.debug(totalDistanceMile);
// update coordinates if we get back
if (totalDistanceMile != null){
system.debug('Entering Function to Update Object');
a.DistanceM__c = totalDistanceMile;
a.Shipping_Postcode_2__c = a.R2_Shipping_Post_Code__c;
//update a;
}
}
update scope;
}
global void finish(Database.BatchableContext BC)
{
}
}
and here is the test class;
@isTest
private class mileage_bulk_tests{
static testMethod void myUnitTest() {
Opportunity opp1 = new Opportunity(name = 'Google Test Opportunity',R2_Job_Ref__c = 'UKH12345',R2_Shipping_Post_Code__c = 'AL35QW',StageName = 'qualified',CloseDate = Date.today());
insert opp1;
Opportunity opp2 = new Opportunity(name = 'Google Test Opportunity 2',StageName = 'qualified',CloseDate = Date.today());
insert opp2;
Opportunity opp3 = new Opportunity(name = 'Google Test Opportunity 3',R2_Job_Ref__c = 'UKB56789',R2_Shipping_Post_Code__c = 'AL35QW',StageName = 'qualified',CloseDate = Date.today());
insert opp3;
StaticResourceCalloutMock mock = new StaticResourceCalloutMock();
mock.setStaticResource('googleMapsJSON');
mock.setStatusCode(200); // Or other appropriate HTTP status code
mock.setHeader('Content-Type', 'application/json'); // Or other appropriate MIME type like application/xml
//Set the mock callout mode
Test.setMock(HttpCalloutMock.class, mock);
system.debug(opp1);
system.debug(opp1.id);
//Call the method that performs the callout
Test.startTest();
mileage_bulk b = new mileage_bulk();
database.executeBatch((b), 10);
Test.stopTest();
}
}
Help greatly appreciated!
Thanks
Gareth
1. Not certain what 'googleMapsJSON' looks like; perhaps you could post it for us.
2. Assuming your mock resource is well formatted, make sure the file extension is ".json" and it was saved with UTF-8 encoding.
3. If #2 does not work, you should try saving your resource as .txt - I've run into this before where it needed a plain-text resource but expected an application/json content type.
4. Be certain that the resource name string you are providing has the same casing as the name of the resource. It is case sensitive.
5. Are you developing in a namespaced package environment? If so, try adding the namespace to the resource name.
Otherwise, your code looks pretty good at first glance.

No results in Australia using Bing Maps SOAP

I'm creating an app for WP8 and I've been using the Bing Maps tutorial. However, I don't get any results in Australia. Do I need to use a completely different API? geolocale contains a string such as "20.002, -150.2222"; even if I change it to just "California" it gets results. What am I doing wrong?
I've tried to find answers in a lot of places but can't seem to find anything relevant.
try
{
searchService.SearchCompleted += new EventHandler<SearchService.SearchCompletedEventArgs>(MySearchCompleted);
SearchService.SearchRequest mySearchRequest = new SearchService.SearchRequest();
mySearchRequest.Credentials = new SearchService.Credentials();
mySearchRequest.Credentials.ApplicationId = "key";
SearchService.StructuredSearchQuery ssQuery = new SearchService.StructuredSearchQuery();
ssQuery.Keyword = "coffee";
ssQuery.Location = geolocale;
mySearchRequest.StructuredQuery = ssQuery;
searchService.SearchAsync(mySearchRequest);
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
The coordinate 20.002, -150.2222 is in the middle of the Pacific Ocean. Also, the Bing Maps SOAP services are an old legacy API; the Bing Spatial Data Services should be used instead.
http://msdn.microsoft.com/en-us/library/ff701734.aspx
http://rbrundritt.wordpress.com/2012/01/17/dynamically-updating-data-in-bing-maps-v7/
To use the Bing Spatial Data Services in WP8 first copy the Response, ResultSet, and Result classes from this project: http://code.msdn.microsoft.com/Augmented-Reality-with-bcb17045/sourcecode?fileId=85735&pathId=1819751232
You can then use the following code to generate your search query.
string baseURL;
//Switch between the NAVTEQ POI data sets for NA and EU based on where the user is.
if (Longitude < -30)
{
//Use the NAVTEQ NA data source: http://msdn.microsoft.com/en-us/library/hh478192.aspx
baseURL = "http://spatial.virtualearth.net/REST/v1/data/f22876ec257b474b82fe2ffcb8393150/NavteqNA/NavteqPOIs";
}
else
{
//Use the NAVTEQ EU data source: http://msdn.microsoft.com/en-us/library/hh478193.aspx
baseURL = "http://spatial.virtualearth.net/REST/v1/data/c2ae584bbccc4916a0acf75d1e6947b4/NavteqEU/NavteqPOIs";
}
//Search radius should be converted from meters to KM.
string poiRequest = string.Format("{0}?spatialFilter=nearby({1:N5},{2:N5},{3:N2})&$format=json&$top={4}&key={5}",
baseURL, Latitude, Longitude, SearchRadius / 1000, MaxResultsPerQuery, BingMapsKey);
You will need a method to pass this query to and serialize the results. Use the following:
private void GetResponse(Uri uri, Action<Response> callback)
{
System.Net.WebClient client = new System.Net.WebClient();
client.OpenReadCompleted += (s, a) =>
{
try
{
using (var stream = a.Result)
{
DataContractJsonSerializer ser = new DataContractJsonSerializer(typeof(Response));
if (callback != null)
{
callback(ser.ReadObject(stream) as Response);
}
}
}
catch (Exception e)
{
if (callback != null)
{
callback(null);
}
}
};
client.OpenReadAsync(uri);
}
Finally you will need to call the GetResponse method to make your query like this:
GetResponse(new Uri(poiRequest), (response) =>
{
if (response != null &&
response.ResultSet != null &&
response.ResultSet.Results != null &&
response.ResultSet.Results.Length > 0)
{
//Do something with the results
}
});