Entity Framework is quite slow - entity-framework-4.1

I have a simple EF query:
return context.GlobalEntities
    .Where(ge => ge.IsActive == true)
    .ToList();
which gets translated into the following SQL query:
SELECT
[Extent1].[GlobalEntityId] AS [GlobalEntityId],
[Extent1].[GlobalEntityName] AS [GlobalEntityName],
[Extent1].[GlobalEntityAlias] AS [GlobalEntityAlias],
[Extent1].[GlobalEntityTypeId] AS [GlobalEntityTypeId],
[Extent1].[IsActive] AS [IsActive],
[Extent1].[CreatedBy] AS [CreatedBy],
[Extent1].[CreatedAt] AS [CreatedAt],
[Extent1].[ChangedBy] AS [ChangedBy],
[Extent1].[ChangedAt] AS [ChangedAt],
[Extent1].[EmailAddress] AS [EmailAddress],
[Extent1].[AverageActionTimeInHrs] AS [AverageActionTimeInHrs],
[Extent1].[AverageCreationTimeInHrs] AS [AverageCreationTimeInHrs]
FROM [dbo].[GlobalEntity] AS [Extent1]
WHERE 1 = [Extent1].[IsActive]
When I execute the above code it takes 45 seconds, whereas when I execute the same query from SSMS it takes only 1 second.
Any idea why it is taking so long for Entity Framework to execute such a simple query? Here is what SQL Profiler says:
SQL:BatchCompleted SELECT
[Extent1].[GlobalEntityId] AS [GlobalEntityId],
[Extent1].[GlobalEntityName] AS [GlobalEntityName],
[Extent1].[GlobalEntityAlias] AS [GlobalEntityAlias],
[Extent1].[GlobalEntityTypeId] AS [GlobalEntityTypeId],
[Extent1].[IsActive] AS [IsActive],
[Extent1].[CreatedBy] AS [CreatedBy],
[Extent1].[CreatedAt] AS [CreatedAt],
[Extent1].[ChangedBy] AS [ChangedBy],
[Extent1].[ChangedAt] AS [ChangedAt],
[Extent1].[EmailAddress] AS [EmailAddress],
[Extent1].[AverageActionTimeInHrs] AS [AverageActionTimeInHrs],
[Extent1].[AverageCreationTimeInHrs] AS [AverageCreationTimeInHrs]
FROM [dbo].[GlobalEntity] AS [Extent1]
WHERE 1 = [Extent1].[IsActive]
Application Name: EntityFramework
NTUserName: user
CPU: 187
Reads: 1357
Writes: 0
Duration: 45157
ProcessID: 9520
StartTime: 2012-07-06 10:38:40.797
End Time: 2012-07-06 10:39:25.953
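One way to narrow this down (a diagnostic sketch only, not a fix; the "MyDb" connection string name is hypothetical) is to time the same SQL through plain ADO.NET from the same application. If this is also slow, the time is going into the connection, the network, or the volume of data transferred rather than into EF's materialization and change tracking:
// Diagnostic sketch: time the raw SQL outside of EF from the same app/machine.
var sql = "SELECT * FROM [dbo].[GlobalEntity] WHERE [IsActive] = 1";
var sw = System.Diagnostics.Stopwatch.StartNew();
using (var conn = new System.Data.SqlClient.SqlConnection(
           System.Configuration.ConfigurationManager.ConnectionStrings["MyDb"].ConnectionString))
using (var cmd = new System.Data.SqlClient.SqlCommand(sql, conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        int rows = 0;
        while (reader.Read()) rows++;   // consume every row, as ToList() would
        Console.WriteLine(rows + " rows in " + sw.ElapsedMilliseconds + " ms");
    }
}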

Related

pymysql: 2 applications, SELECT and INSERT queries

Application 1 keeps a connection open and selects data from the database; I need to keep this connection open for my application. Application 2 inserts data into the database. Application 1 is only able to see the data that was in the DB at the time its connection was opened; even with a new SELECT query the data is not up to date. You can see the output below: when I started Application 1, rows up to 5 had already been inserted into the DB. Application 2 went on and added new rows up to 9, but Application 1 cannot see the new rows unless I close the connection and re-open it. I am working at very high speeds and the connection setup time kills my application, so I need to be able to leave the connection open the entire time.
# Application 1
import pymysql
import time as t

queryconn = pymysql.connect(host='x', user='x', passwd='x', db='TestDB')
while True:
    queryconncurser = queryconn.cursor()
    # query the database
    queryconncurser.execute("SELECT * FROM `testQuerys`")
    items = queryconncurser.fetchall()
    print(items)
    t.sleep(4)
Output:
(2,), (3,), (4,), (5,)
(2,), (3,), (4,), (5,)
(2,), (3,), (4,), (5,)
(2,), (3,), (4,), (5,)
(2,), (3,), (4,), (5,)
# Application 2
import pymysql
import time as t

n = 1
while True:
    queryconn = pymysql.connect(host='x', user='x', passwd='x', db='TestDB')
    queryconncurser = queryconn.cursor()
    n = n + 1
    # insert into the database
    queryconncurser.execute(f"INSERT INTO testQuerys (testData) VALUES ({n})")
    queryconn.commit()
    print(n)
    t.sleep(4)
    queryconn.close()
Output:
2
3
4
5
6
7
8
9
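This looks like InnoDB's default REPEATABLE READ isolation at work: with autocommit off (pymysql's default), the first SELECT on the open connection starts a transaction, and every later SELECT reuses that transaction's snapshot. A minimal sketch of one way around it, assuming that is indeed the cause, is to keep the connection open but end the read transaction after each fetch (or connect with autocommit=True):
# Application 1, keeping one connection open but refreshing the snapshot each cycle
import pymysql
import time as t

# autocommit=True: each SELECT runs in its own transaction and sees a fresh snapshot
queryconn = pymysql.connect(host='x', user='x', passwd='x', db='TestDB', autocommit=True)
while True:
    queryconncurser = queryconn.cursor()
    queryconncurser.execute("SELECT * FROM `testQuerys`")
    print(queryconncurser.fetchall())
    queryconncurser.close()
    # with autocommit left off, calling queryconn.commit() here has the same effect:
    # it ends the current snapshot so the next SELECT can see newly inserted rows
    t.sleep(4)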

How to fetch data in parallel from MySQL with Sequel Pro in R

I want to fetch data from MySQL with Sequel Pro in R, but when I run the query it takes ages.
Here is my code:
old_value <- data.frame()
new_value <- data.frame()
counter <- 0
for (i in 1:length(short_list$id)) {
  mydb <- OpenConn(dbname = '**', user = '**', password = '**', host = '**')
  query <- paste0("select * from table where id IN (", short_list$id[i], ") and country IN ('", short_list$country[i], "') and date >= '2019-04-31' and `date` <= '2020-09-1';")
  temp_old <- RMySQL::dbFetch(RMySQL::dbSendQuery(mydb, query), n = -1)
  query <- paste0("select * from table2 where id IN (", short_list$id[i], ") and country IN ('", short_list$country[i], "') and date >= '2019-04-31' and `date` <= '2020-09-1';")
  temp_new <- RMySQL::dbFetch(RMySQL::dbSendQuery(mydb, query), n = -1)
  RMySQL::dbDisconnect(mydb)
  new_value <- rbind(temp_new, new_value)
  old_value <- rbind(temp_old, old_value)
  counter <- counter + 1
  base::print(paste0("completed for ", counter))
}
Is there any way I can write this more efficiently and run the queries faster? I have around 5000 rows that have to go through the loop. The query works, but it takes a long time.
I have tried this, but it still gives me an error:
# parallel computing
clust <- makeCluster(length(6))
clusterEvalQ(cl = clust, expr = lapply(c('data.table', "RMySQL", "dplyr", "plyr"), library, character.only = TRUE))
clusterExport(cl = clust, c('config', 'short_list'), envir = environment())
new_de <- parLapply(clust, short_list, function(id, country) {
  for (i in 1:length(short_list$id)) {
    mydb <- OpenConn(dbname = '*', user = '*', password = '*', host = '**')
    query <- paste0("select * from table1 where id IN (", short_list$id[i], ") and country IN ('", short_list$country[i], "') and source_event_date >= date >= '2019-04-31' and `date` <= '2020-09-1';")
    temp_data <- RMySQL::dbFetch(RMySQL::dbSendQuery(mydb, query), n = -1) %>% data.table::data.table()
    RMySQL::dbDisconnect(mydb)
    return(temp_data)
  }
})
stopCluster(clust)
gc(reset = T)
new_de <- data.table::rbindlist(new_de, use.names = TRUE)
I have also defined short_list as a list, as follows:
short_list<- as.list(short_list)
and inside short_list is:
id    country
2     US
3     UK
...   ...
However it gives me this error:
Error in checkForRemoteErrors(val) :
one node produced an error: object 'i' not found
However, when I remove i from id[i] and country[i], it only gives me the result for the first row rather than the results for all ids and countries.
I think an alternative is to upload the ids you need into a temporary table, and query for everything at once.
tmptable <- "mytemptable"
dbWriteTable(conn, tmptable, short_list, create = TRUE)
alldat <- dbGetQuery(conn, paste("
select t1.*
from ", tmptable, " tmp
left join table1 t1 on tmp.id=t1.id and tmp.country=t1.country
where t1.`date` >= '2019-04-31' and t1.`date` <= '2020-09-1'"))
dbExecute(conn, paste("drop table", tmptable))
(Many DBMSes use a leading # to indicate a temporary table that is only visible to the local user, is much less likely to clash in the schema namespace, and is automatically cleaned up when the connection is closed. I generally encourage the use of temp tables here; check with your DB docs, schema, and/or DBA for more info.)
The order of tables is important: by pulling everything from mytemptable and then left-joining table1 onto it, we effectively filter out any data from table1 that does not have a matching id and country.
This doesn't solve the speed of data download, but some thoughts on that:
Each time you iterate through the queries, you have not-insignificant overhead; if there's a lot of data then this overhead should not be huge, but it's still there. Using a single query will reduce this overhead significantly.
Query time can also be affected by any index(ices) on the tables. Outside the scope of this discussion, but might be relevant if you have a large-ish table. If the table is not indexed efficiently (or the query is not structured well to use those indices), then each query will take a finite amount of time to "compile" and return data. Again, overhead that will be reduced with a single more-efficient query.
Large queries might benefit from using the command-line tool mysql; it is about as fast as you're going to get, and might iron over any issues in RMySQL and/or DBI. (I'm not saying they are inefficient, but ... it is unlikely that a free open-source driver will be faster than MySQL's own command-line utility.)
As for doing this in parallel ...
You're using parLapply incorrectly. It accepts a single vector/list and iterates over each object in that list. You might use it iterating over the indices of a frame, but you cannot use it to iterate over multiple columns within that frame. This is exactly like base R's lapply.
Let's show what is going on when you do your call. I'll replace it with lapply (because debugging in multiple processes is difficult).
# parLapply(clust, mtcars, function(id, country) { ... })
lapply(mtcars, function(id, country) { browser(); 1; })
# Called from: FUN(X[[i]], ...)
# debug at #1: [1] 1
id
# [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2
# [24] 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
country
# Error: argument "country" is missing, with no default
Because the argument (mtcars here, short_list in yours) is a data.frame, and a data.frame is a list-like object, lapply (and parLapply) operates on one column at a time. You were hoping that it would "unzip" the data, applying the first column's value to id and the second column's value to country. In fact, there is a function that does this: Map (and parallel's clusterMap, as I suggested in my comment). More on that later.
The intent of parallelizing things is to not use the for loop inside the parallel function. If short_list has 10 rows, and if your use of parLapply were correct, then you would be querying all rows 10 times, making your problem significantly worse. In pseudo-code, you'd be doing:
parallelize for each row in short_list:
    # this portion is run simultaneously in 10 different processes/threads
    for each row in short_list:
        query for data related to this row
Two alternatives:
Provide a single argument to parLapply representing the rows of the frame.
new_de <- parLapply(clust, seq_len(NROW(short_list)), function(rownum) {
  mydb <- OpenConn(dbname = '*', user = '*', password = '*', host = '**')
  on.exit({ DBI::dbDisconnect(mydb) })
  tryCatch(
    DBI::dbGetQuery(mydb, "
      select * from table1
      where id = ? and country = ?
        and source_event_date >= date >= '2019-04-31' and `date` <= '2020-09-1'",
      params = list(short_list$id[rownum], short_list$country[rownum])),
    error = function(e) e)
})
Use clusterMap for the same effect.
new_de <- clusterMap(clust, function(id, country) {
  mydb <- OpenConn(dbname = '*', user = '*', password = '*', host = '**')
  on.exit({ DBI::dbDisconnect(mydb) })
  tryCatch(
    DBI::dbGetQuery(mydb, "
      select * from table1
      where id = ? and country = ?
        and source_event_date >= date >= '2019-04-31' and `date` <= '2020-09-1'",
      params = list(id, country)),
    error = function(e) e)
}, short_list$id, short_list$country)
If you are not familiar with Map, it is like "zipping" together multiple vectors/lists. For example:
myfun1 <- function(i) paste(i, "alone")
lapply(1:3, myfun1)
### "unrolls" to look like
list(
myfun1(1),
myfun1(2),
myfun1(3)
)
myfun3 <- function(i,j,k) paste(i, j, k, sep = '-')
Map(f = myfun3, 1:3, 11:13, 21:23)
### "unrolls" to look like
list(
myfun3(1, 11, 21),
myfun3(2, 12, 22),
myfun3(3, 13, 23)
)
Some liberties I took in that adapted code:
I shifted from the dbSendQuery/dbFetch double-tap to a single call to dbGetQuery.
I'm using DBI functions, since DBI functions provide a superset of what each driver's package provides. (You're likely using some of it anyway, perhaps without realizing it.) You can switch back with no issue.
I added tryCatch, since sometimes errors can be difficult to deal with in parallel processes. This means you'll need to check the return value from each of your processes to see if either inherits(ret, "error") (problem) or is.data.frame (normal).
I used on.exit so that even if there's a problem, the connection closure should still occur.

Executing new database queries while looping through a query result

I have a relatively heavy MySQL SELECT query that returns ca. 150 MB of data in 25,000 rows. Looping through the whole dataset and performing the required operations takes about 45 minutes.
For a few of the records (less than 10) I need to perform another lookup in the same database.
If I write my Python 3 code like this:
import mysql.connector

con = mysql.connector.connect(host='...', user='...', password='...', database='...')
cursor = con.cursor(dictionary=True, buffered=True)
cursor.execute('SELECT ...')
for row in cursor:
    # processing data ...
    if row['...'] == ...:
        cursor2 = con.cursor()
        cursor2.execute('SELECT ...')
        some_var = cursor2.fetchone()[0]
    # more processing...
Then, at the second execution of cursor2, I get:
mysql.connector.errors.OperationalError: MySQL Connection not available.
If I initialize cursor2 before the loop:
import mysql.connector

con = mysql.connector.connect(host='...', user='...', password='...', database='...')
cursor = con.cursor(dictionary=True, buffered=True)
cursor2 = con.cursor()
cursor.execute('SELECT ...')
for row in cursor:
    # processing data ...
    if row['...'] == ...:
        cursor2.execute('SELECT ...')
        some_var = cursor2.fetchone()[0]
    # more processing...
Then, at the second execution I get:
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
I'm not really sure whether these errors are caused by timeouts or by my incorrect handling of multiple cursors.
If it helps, these are the timeout settings of my database:
connect_timeout 10
deadlock_timeout_long 50000000
deadlock_timeout_short 10000
delayed_insert_timeout 300
idle_readonly_transaction_timeout 0
idle_transaction_timeout 0
idle_write_transaction_timeout 0
innodb_flush_log_at_timeout 1
innodb_lock_wait_timeout 50
innodb_rollback_on_timeout OFF
interactive_timeout 28800
lock_wait_timeout 86400
net_read_timeout 30
net_write_timeout 60
rpl_semi_sync_master_timeout 10000
rpl_semi_sync_slave_kill_conn_timeout 5
slave_net_timeout 60
thread_pool_idle_timeout 60
wait_timeout 420
How should I write this code instead? This is only my second Python script, so please formulate any help at a beginner's level. 😉
It turns out connections behave a bit differently from what I was expecting. The comment from @Mike67 pointed in the right direction; I was just missing some of the details. The solution is to create and destroy a new connection every time a new query is needed.
import mysql.connector

con = mysql.connector.connect(host='...', user='...', password='...', database='...')
cursor = con.cursor(dictionary=True, buffered=True)
cursor.execute('SELECT ...')
for row in cursor:
    # processing data ...
    if row['...'] == ...:
        con2 = mysql.connector.connect(host='...', user='...', password='...', database='...')
        cursor2 = con2.cursor()
        cursor2.execute('SELECT ...')
        some_var = cursor2.fetchone()[0]
        cursor2.close()
        con2.close()
    # more processing...
I have a feeling there must be a more efficient way to do this, but this works.
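One possible refinement (a sketch only, assuming the disconnects come from the server's wait_timeout of 420 s closing the idle session): keep a single secondary connection for the lookups and let the driver re-establish it only when the server has actually dropped it, instead of opening a fresh connection for every match.
import mysql.connector

con = mysql.connector.connect(host='...', user='...', password='...', database='...')
con2 = mysql.connector.connect(host='...', user='...', password='...', database='...')

cursor = con.cursor(dictionary=True, buffered=True)
cursor.execute('SELECT ...')
for row in cursor:
    # processing data ...
    if row['...'] == ...:
        # transparently reconnect con2 if the server closed it in the meantime
        con2.ping(reconnect=True, attempts=3, delay=1)
        cursor2 = con2.cursor()
        cursor2.execute('SELECT ...')
        some_var = cursor2.fetchone()[0]
        cursor2.close()
    # more processing...
con2.close()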

Spring JPA: Multiple Aurora RDS MySQL operations are taking too much time

I have a POST API which inserts data into multiple tables (MySQL).
The implementation uses JPA.
Below is the sequence of operations that happen; any suggestions on how to optimize this?
SQL queries:
1) Select * from University where UID = 'UNI1';
2) If (University not exist) then Insert into University ...
3) Select * from College where UID = 'UNI1';
4) If (College not exist) then Insert into College ...
5) Delete from CollegeStudent;
In loop (for each Student):
6) Select * from Student where StudentId = 'ST22';
7) If (Student not exist) then Insert into Student ...
8) Insert into CollegeStudent (Student, College);
Loop ends;
Code Snippet:
@Transactional
public void persistStudentResults(String universityId, String collegeId, List<Student> studentList) {
    University university = universityRepository.findByUniversityId(universityId);
    if (university == null) {
        university = createUniversityObject(universityId);
        universityRepository.save(university);
    }
    College college = collegeRepository.getCollegeByCollegeId(university.getUniversityId(), collegeId);
    if (college == null) {
        college = createCollegeObject(university, collegeId);
        collegeRepository.save(college);
    }
    collegeStudentRepository.deleteByCollegeId(university.getUniversityId(), college.getCollegeId());
    for (Student student : studentList) {
        Student dbStudent = studentRepository.findByStudentId(student.getStudentId());
        if (dbStudent == null) {
            dbStudent = createStudentObject(student);
            studentRepository.save(dbStudent);
        }
        CollegeStudent collegeStudent = createCollegeStudentObject(dbStudent, college);
        collegeStudentRepository.save(collegeStudent);
    }
}
Hibernate Logs :
className=org.hibernate.engine.internal.StatisticalLoggingSessionEventListener, methodName=end> StatisticalLoggingSessionEventListener - Session Metrics {
308714170 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
524069 nanoseconds spent preparing 1 JDBC statements;
309001256 nanoseconds spent executing 1 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
197852 nanoseconds spent executing 1 flushes (flushing a total of 1 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
It seems that for each save() it is creating a new connection.
Time Taken :
Number of Students : 5
MySQL DB : 243 ms
Aurora DB : 32 sec
(If directly inserted into DB using DBeaver : 1.5 Sec)
Number of Students : 30
MySQL DB : 1 sec
Aurora DB : 173 sec
(If directly inserted into DB using DBeaver : 9 Sec)
Add index keys to the most-used columns in your tables.
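For example (a sketch only; the table and column names are taken from the pseudo-SQL above and may not match the real schema), indexing the columns used in the WHERE clauses turns each findBy... lookup into a key lookup instead of a full table scan:
-- hypothetical index names; columns taken from the lookups in persistStudentResults
CREATE INDEX idx_university_uid ON University (UID);
CREATE INDEX idx_college_uid ON College (UID);
CREATE INDEX idx_student_studentid ON Student (StudentId);
-- also helps deleteByCollegeId and the per-student inserts
CREATE INDEX idx_collegestudent_college ON CollegeStudent (College);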

Help with LINQ to SQL query

I have data like this:
productid   cost
prant       6.70
prant       0
prant       7.0
gant        8.7
gant        0.1
gant        4.5
How can I flatten them into one result such as "prant 13.70 gant 13.3" in LINQ to SQL?
My LINQ query gives me two rows.
Results:
prant 13.7
gant 13.3
Query:
from c in test
group new { c } by new { c.productid } into g
select new
{
    ProductIdandAmount = g.Key.productid + g.Sum(p => p.c.cost)
};
Can someone help me out?
Thanks
You've implemented map; now you need to implement reduce:
var query = from c in test ...
var summary = string.Join(" ", query.ToArray());
Error: The best overloaded method match for 'string.Join(string, string[])' has some invalid arguments,
Argument '2': cannot convert from 'AnonymousType#1[]' to 'string[]'
I tried the code you gave me and it gives me two errors...
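Those errors come from query.ToArray() producing an array of the anonymous type, while that string.Join overload expects a string[]. A small sketch of one way to fix it (assuming the query shape above): project the anonymous results down to their string property before joining.
// select the string out of each anonymous result before joining;
// older framework versions of string.Join only accept string[], not IEnumerable<string>
var parts = query.Select(x => x.ProductIdandAmount).ToArray();
var summary = string.Join(" ", parts);
// summary is now something like "prant13.7 gant13.3"; add a space or
// formatting inside ProductIdandAmount if you need "prant 13.70 gant 13.3"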