Python hangs on fetchall using MySQL connector

I am fairly new to Python and MySQL. I am writing code that queries 60 different tables, each containing records for each second in a five-minute period. The code executes every five minutes. A few of the queries can reach 1/2 MB of data, but most are in the 50 KB range. I am running on a workstation with Windows 7, 64-bit, using MySQL Connector/Python. I am testing my code in PowerShell windows, but the code will eventually run as a scheduled task. The workstation has plenty of RAM (8 GB). Other processes are running, but according to the Task Manager only half of memory is being used. Mostly everything performs as expected, but sometimes processing hangs. I have inserted print statements in the code (I've also used debugger tracing) to determine where the hang occurs. It is occurring on a call to fetchall. Below are the germane parts of the code. ALL CAPS names are (pseudo)constants.
mncdb = mysql.connector.connect(
    option_files=ENV_MCG_MYSQL_OPTION_FILE,
    option_groups=ENV_MCG_MYSQL_OPTION_GROUP,
    host=ut_get_workstation_hostname(),
    database=ENV_MNC_DATABASE_NAME
)
for generic_table_id in DBR_TABLE_INDEX:
    site_table_id = DBR_SITE_TABLE_NAMES[site_id][generic_table_id]
    db_cursor = mncdb.cursor()
    db_command = (
        "SELECT *"
        + " FROM " + site_table_id
        + " WHERE " + DBR_DATETIME_FIELD
        + " >= '" + query_start_time + "'"
        + " AND " + DBR_DATETIME_FIELD
        + " < '" + query_end_time + "'"
    )
    try:
        db_cursor.execute(db_command)
        print "selected data for table " + site_table_id
        try:
            table_info = db_cursor.fetchall()
            print "extracted data for table " + site_table_id
        except:
            print "DB exception " + formatExceptionInfo()
            print "FETCH failed to return any rows..."
            table_info = []
            raise
    except:
        print "uncaught DB error " + formatExceptionInfo()
        raise
    .
    .
    .
    other processing that uses the data
    .
    .
    .
    db_cursor.close()
mncdb.close()
.
.
.
No exceptions are being raised. In a separate PowerShell window I can access the data being processed by the code. For my testing all data in the database is loaded before the code is executed. No processes are updating the database while the code is being tested. The hanging can occur on the first execution of the code or after several hours of execution.
My question is what could be causing the code to hang on the fetchall statement?

You can alleviate this by not buffering the entire result set at once:
mncdb = mysql.connector.connect(
    option_files=ENV_MCG_MYSQL_OPTION_FILE,
    option_groups=ENV_MCG_MYSQL_OPTION_GROUP,
    host=ut_get_workstation_hostname(),
    database=ENV_MNC_DATABASE_NAME,
    cursorclass=MySQLdb.cursors.SSCursor
)
Note that cursorclass=MySQLdb.cursors.SSCursor is a MySQLdb server-side cursor, not a Connector/Python argument; the Connector/Python equivalent is its default unbuffered cursor, drained in chunks with fetchmany() instead of a single fetchall().
But before you do this, you should also use MySQL's flavor of prepared statements (parameterized queries) instead of string concatenation when building your statement.
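For example, a minimal sketch of both suggestions together, using Connector/Python's default unbuffered cursor and fetchmany() (the chunk size of 1000 is an arbitrary choice, and the table name still has to be validated separately, since identifiers cannot be bound as parameters):
db_cursor = mncdb.cursor()  # the default cursor streams rows from the server
db_command = (
    "SELECT * FROM " + site_table_id        # table names cannot be parameters
    + " WHERE " + DBR_DATETIME_FIELD + " >= %s"
    + " AND " + DBR_DATETIME_FIELD + " < %s"
)
# The connector escapes and substitutes the bound values itself.
db_cursor.execute(db_command, (query_start_time, query_end_time))
table_info = []
while True:
    rows = db_cursor.fetchmany(size=1000)   # drain the result in bounded chunks
    if not rows:
        break
    table_info.extend(rows)
db_cursor.close()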

Hanging could involve the MySQL tables themselves and not specifically the Python code. Do they contain many records? Are they very wide tables? Are they indexed on the datetime field?
Consider various strategies:
Select only the needed columns instead of the asterisk, which pulls in every column.
Add an index on the DBR_DATETIME_FIELD used in the range filter of the WHERE clause.
Diagnose further with printed timers, print(datetime.datetime.now()), to see which tables are the bottleneck; see the sketch below. In doing so, be sure to import the datetime module.
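For example, a hypothetical timing wrapper around the loop body from the question:
import datetime

for generic_table_id in DBR_TABLE_INDEX:
    site_table_id = DBR_SITE_TABLE_NAMES[site_id][generic_table_id]
    started = datetime.datetime.now()
    db_cursor.execute(db_command)
    table_info = db_cursor.fetchall()
    # One line per table, so the bottleneck tables stand out in the output.
    print("%s took %s" % (site_table_id, datetime.datetime.now() - started))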

Workaround for load local data infile? [closed]

My web host has upgraded its servers. The newer 5.7.27 version of MySQL that they installed has LOAD DATA LOCAL INFILE disabled by default, resulting in Error 1148 when I try to execute the command. Unfortunately I can't start or stop the MySQL instance as that is under the control of the web host. What are some workarounds or alternate methods that will allow me to import data with the least effort? All the data I want to import are currently in TSV (tab separated value) format, but I could switch to CSV or something else if required. I have Workbench installed as well if it helps.
The problem is basically the same as this one, except I cannot access and reconfigure the server (the selected answer to that question).
I'd write a Python script to take your TSV input, and use it to generate INSERT statements in a loop. Each statement would handle perhaps 100-200* new rows. Then it would execute those statements.
Run it on the same server. Do it in a transaction so you don't make a mess on your first few tries if there are errors.
There you have it: TSV import.
* Or, well, whatever you want. Doing them one at a time will be slow (because there is a small overhead associated with the execution of each SQL statement), but you probably can't just dump them all into a single INSERT unless the amount of information is small. Check your server settings/limits, and come up with a reasonable batch size for your use case. For <2000 rows, and reasonably "short" row data, 100-200 rows per statement would usually be appropriate.
Pseudo-code:
batchSize = 100
buffer = []

handleInput():
    for each line in tsvFile:
        data = parse(line)
        add data to buffer
        if size(buffer) >= batchSize:
            flushBuffer()
    if size(buffer) > 0:
        flushBuffer()    # flush the final partial batch

flushBuffer():
    str = "INSERT INTO tbl (col1, col2, col3) VALUES "
    first = true
    for each row in buffer:
        if not first:
            str += ","
        str += "(" + row[col1] + ", " + row[col2] + ", " + row[col3] + ")"
        first = false
    executeSqlStatement(str)
    buffer = []
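In actual Python with MySQL Connector/Python, the same idea might look like the sketch below; the table, columns, file name, and connection details are placeholders, and executemany() on a plain INSERT is rewritten by the connector into a single multi-row statement, which gives the batching effect described above.
import csv
import mysql.connector

BATCH_SIZE = 100  # tune to your server's limits, per the footnote above

conn = mysql.connector.connect(host="localhost", user="user",
                               password="secret", database="mydb")
cursor = conn.cursor()
insert_sql = "INSERT INTO tbl (col1, col2, col3) VALUES (%s, %s, %s)"

with open("data.tsv") as tsv_file:
    reader = csv.reader(tsv_file, delimiter="\t")
    batch = []
    for row in reader:
        batch.append(tuple(row[:3]))
        if len(batch) >= BATCH_SIZE:
            cursor.executemany(insert_sql, batch)
            batch = []
    if batch:                      # flush the final partial batch
        cursor.executemany(insert_sql, batch)

conn.commit()                      # one transaction, as suggested above
cursor.close()
conn.close()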

Hive on Cosmos global instance returns error code 12 in Java client

I recently developed a Java client which allows me to query my Hive tables from a simple URL.
Unfortunately, since last Thursday the queries seem to have some issues. From time to time, a query that worked before doesn't return anything.
So I decided to take a look at my logs, and every time I run a query this occurs:
java.sql.SQLException: Query returned non-zero code: 12, cause: FAILED: Hive Internal Error: java.lang.RuntimeException(org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive-root/hive_2015-06-29_09-19-53_268_7855618362212093455. Name node is in safe mode.
Resources are low on NN. Safe mode must be turned off manually.
I think the issue comes from the node itself, because I didn't make any changes to my code or my Hive tables. Where do you think the problem comes from? And what can I do to resolve it?
Thank you for reading my question.
This was because the cluster automatically entered safe mode. We fixed it by freeing up/adding some disk space. (As the log message notes, safe mode must then be turned off manually; the standard command for that is hdfs dfsadmin -safemode leave.)

"foreach" loop : Using all cores in R (especially if we are sending sql queries inside foreach loop)

I intend to use "foreach" to utilize all the cores in my CPU. The catch is that I need to send a SQL query inside the loop. The script works fine with a normal 'for' loop, but gives the following error when I change it to 'foreach'.
The error is:
select: Interrupted system call
select: Interrupted system call
select: Interrupted system call
Error in { : task 1 failed - "expired MySQLConnection"
The code I used is:
library(foreach)
library(doMC)
library(RMySQL)
library(multicore)
registerDoMC(cores=6)
m <- dbDriver("MySQL", max.con = 100)
con <- dbConnect(m, user="*****", password = "******", host ="**.**.***",dbname="dbname")
list <- dbListTables(con)
foreach(i = 1:length(list)) %dopar% {
    query <- paste("SELECT * FROM ", list[i], " WHERE `CLOSE` BETWEEN 1 AND 100", sep = "")
    t <- dbGetQuery(con, query)
}
Though 'foreach' works fine on my system for all other purposes, it gives this error only for SQL queries. Is there a way to send SQL queries inside a 'foreach' loop?
My suggestion is this:
Move the database queries outside the loop, and lock access so you don't do parallel database queries. I think that will speed things up too, as you won't have contended disk access, while still being able to do parallel processing; see the sketch after the pseudo-code.
Meaning (pseudo-code):
db = connect to database
threadlock = lock()
parfor {
    threadlock.lock()
    result = db query (pull all data here, as you can't process while you load without keeping the database locked)
    threadlock.unlock()
    process resulting data (which is now just data, and not a SQL object)
}
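The question is in R, but the pattern is language-agnostic. Here is a minimal sketch of it in Python (connection details, table names, and the per-table processing are placeholders):
from concurrent.futures import ThreadPoolExecutor
import threading
import mysql.connector

db_lock = threading.Lock()

def fetch_and_process(table_name):
    # Hold the lock for the whole query so workers never share a connection
    # or interleave result sets.
    with db_lock:
        conn = mysql.connector.connect(host="localhost", user="user",
                                       password="secret", database="mydb")
        cursor = conn.cursor()
        # Table names cannot be bound as parameters, hence the formatting.
        cursor.execute("SELECT * FROM `%s` WHERE `CLOSE` BETWEEN 1 AND 100"
                       % table_name)
        rows = cursor.fetchall()   # pull everything before releasing the lock
        cursor.close()
        conn.close()
    return len(rows)               # stand-in for the real processing step

with ThreadPoolExecutor(max_workers=6) as pool:
    results = list(pool.map(fetch_and_process, ["table_a", "table_b"]))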

Updating the db 6000 times will take a few minutes?

I am writing a test program with Ruby and ActiveRecord, and it reads a document which is about 6000 words long. Then I tally up the words with:
recordWord = Word.find_by_s(word)
if recordWord.nil?
  recordWord = Word.new
  recordWord.s = word
end
if recordWord.count.nil?
  recordWord.count = 1
else
  recordWord.count += 1
end
recordWord.save
and so this part loops 6000 times... and it takes at least a few minutes to run using sqlite3. Is that normal? I was expecting it to run within a couple of seconds... can MySQL speed it up a lot?
With 6000 calls to write to the database, you're going to see speed issues. I would save the various tallies in memory and save to the database once at the end, not 6000 times along the way.
Take a look at AR:Extensions as well to handle the bulk insertions.
http://rubypond.com/articles/2008/06/18/bulk-insertion-of-data-with-activerecord/
I wrote up some quick code in Perl that simply does:
Create the database
Insert a record that only contains a single integer
Retrieve the most recent record and verify that it returns what it inserted
And it does steps #2 and #3 6000 times. This is obviously a considerably lighter workload than having an entire object/relational bridge. For this trivial case with SQLite it still took 17 seconds to execute, so your desire to have it take "a couple of seconds" is not realistic on "traditional hardware."
Using the monitor I verified that it was primarily disk activity that was slowing it down. Based on that, if for some reason you really do need the database to behave that quickly, I suggest one of two options:
Do what people have suggested and find a way around the requirement
Try buying some solid state disks.
I think #1 is a good way to start :)
Code:
#!/usr/bin/perl
use warnings;
use strict;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=/tmp/dbfile', '', '');
create_database($dbh);
insert_data($dbh);

sub insert_data {
    my ($dbh) = @_;
    my $insert_sql = "INSERT INTO test_table (test_data) values (?)";
    my $retrieve_sql = "SELECT test_data FROM test_table WHERE test_data = ?";
    my $insert_sth = $dbh->prepare($insert_sql);
    my $retrieve_sth = $dbh->prepare($retrieve_sql);
    my $i = 0;
    while (++$i <= 6000) {
        $insert_sth->execute(($i));
        $retrieve_sth->execute(($i));
        my $hash_ref = $retrieve_sth->fetchrow_hashref;
        die "bad data!" unless $hash_ref->{'test_data'} == $i;
    }
}

sub create_database {
    my ($dbh) = @_;
    my $status = $dbh->do("DROP TABLE test_table");
    # warn if the DROP resulted in an error (e.g., the table did not exist yet)
    if (!defined $status) {
        print "DROP TABLE failed";
    }
    my $create_statement = "CREATE TABLE test_table (id INTEGER PRIMARY KEY AUTOINCREMENT, \n";
    $create_statement .= "test_data varchar(255)\n";
    $create_statement .= ");";
    $status = $dbh->do($create_statement);
    # die if the CREATE resulted in an error
    if (!defined $status) {
        die "CREATE failed";
    }
}
What kind of database connection are you using? Some databases allow you to connect 'directly' rather than using a TCP network connection that goes through the network stack. In other words, if you're making a network connection and sending data through that way, it can slow things down.
Another way to boost performance of a database connection is to group SQL statements together in a single command.
For example, making a single 6,000 line SQL statement that looks like this
"update words set count = count + 1 where word = 'the'
update words set count = count + 1 where word = 'in'
...
update words set count = count + 1 where word = 'copacetic'"
and run that as a single command, performance will be a lot better. By default, MySQL has a 'packet size' limit (max_allowed_packet) of 1 megabyte, but you can change that in the my.ini file to be larger if you want.
Since you're abstracting away your database calls through ActiveRecord, you don't have much control over how the commands are issued, so it can be difficult to optimize your code.
Another thing you could do would be to keep a count of words in memory, and then only insert the final total into the database, rather than doing an update every time you come across a word. That will probably cut down a lot on the number of writes, because if you do an update every time you come across the word 'the', that's a huge, huge waste. Words have a 'long tail' distribution and the most common words are hugely more common than the more obscure ones. Then the underlying SQL would look more like this:
"update words set count = 300 where word = 'the'
update words set count = 250 where word = 'in'
...
update words set count = 1 where word = 'copacetic'"
If you're worried about taking up too much memory, you could count words and periodically 'flush' them: read a couple of megabytes of text, then spend a few seconds updating the totals, rather than updating each word every time you encounter it. If you want to improve performance even more, you should consider issuing the SQL commands in batches directly.
Without knowing much about Ruby and SQLite, some general hints:
create a unique index on Word.s (you did not state whether you have one)
define a default for Word.count in the database (DEFAULT 1)
optimize the assignment of count:
recordWord = Word.find_by_s(word)
if recordWord.nil?
  recordWord = Word.new
  recordWord.s = word
  recordWord.count = 1
else
  recordWord.count += 1
end
recordWord.save
Use BEGIN TRANSACTION before your updates then COMMIT at the end.
OK, I found some general rules:
1) use a hash to keep the counts first, not the db
2) at the end, wrap all inserts or updates in one transaction, so that it won't hit the db 6000 times
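The thread is Ruby, but the two rules carry over directly; here is a sketch of them in Python with MySQL Connector/Python (file, table, and connection details are placeholders, and the words table is assumed to start empty):
import collections
import mysql.connector

text = open("document.txt").read()
counts = collections.Counter(text.split())   # rule 1: tally in memory

conn = mysql.connector.connect(host="localhost", user="user",
                               password="secret", database="mydb")
cursor = conn.cursor()
conn.start_transaction()                     # rule 2: one transaction
cursor.executemany("INSERT INTO words (s, count) VALUES (%s, %s)",
                   list(counts.items()))     # one round of batched inserts
conn.commit()
cursor.close()
conn.close()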

Why do our queries get stuck on the state "Writing to net" in MySQL?

We have a lot of queries
select * from tbl_message
that get stuck on the state "Writing to net". The table has 98k rows.
The thing is... we aren't even executing any query like that from our application, so I guess the question is:
What might be generating the query?
...and why does it get stuck on the state "writing to net"
I feel stupid asking this question, but I'm 99.99% sure that our application is not executing a query like that against our database... we are, however, executing a couple of queries against that table using a WHERE statement:
SELECT Count(*) as StrCount FROM tbl_message WHERE m_to=1960412 AND m_restid=948
SELECT Count(m_id) AS NrUnreadMail FROM tbl_message WHERE m_to=2019422 AND m_restid=440 AND m_read=1
SELECT * FROM tbl_message WHERE m_to=2036390 AND m_restid=994 ORDER BY m_id DESC
I have searched our application several times for select * from tbl_message but haven't found anything... but still the query log on our MySQL server is full of SELECT * FROM tbl_message queries.
Since applications don't magically generate queries on their own, it's rather likely that there's a mistake somewhere in your application that's causing this. Here are a few suggestions that you can use to track it down. I'm guessing that you're using PHP, since you're using MySQL, so I'll use that for my examples.
Try adding comments in front of all your queries in the application, like this:
$sqlSelect = "/* file.php, class::method() */";
$sqlSelect .= "SELECT * FROM foo ";
$sqlSelect .= "WHERE criteria";
The comment will show up in your query log. If you're using some kind of database API wrapper, you could potentially add these messages automatically:
function query($sql)
{
    $backtrace = debug_backtrace();
    // The function that executed the query
    $prev = $backtrace[1];
    $newSql = sprintf("/* %s */ ", $prev["function"]);
    $newSql .= $sql;
    mysql_query($newSql) or handle_error();
}
In case you're not using a wrapper, but rather executing the queries directly, you could use the runkit extension and the function runkit_function_rename to rename mysql_query (or whatever you're using) and intercept the queries.
There are (at least) two data retrieval modes for MySQL. With the C API you either call mysql_store_result() or mysql_use_result().
mysql_store_result() returns once all result data has been transferred from the MySQL server to your process's memory, i.e. no data has to be transferred for further calls to mysql_fetch_row().
However, with mysql_use_result() each record has to be fetched individually if and when mysql_fetch_row() is called. If your application does some computing that takes longer than the period specified in net_write_timeout between two calls to mysql_fetch_row(), the MySQL server considers your connection to be timed out.
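The same two modes exist in MySQL Connector/Python, where (roughly speaking) a buffered cursor corresponds to mysql_store_result() and the default cursor to mysql_use_result(); a small illustration with placeholder connection details:
import mysql.connector

def slow_processing(row):
    pass  # stand-in for per-row work that may take a while

conn = mysql.connector.connect(host="localhost", user="user",
                               password="secret", database="mydb")

# Buffered: all rows are pulled into client memory up front, so the server
# leaves the "Writing to net" state as soon as the transfer completes.
cur = conn.cursor(buffered=True)
cur.execute("SELECT * FROM tbl_message")
for row in cur:
    slow_processing(row)           # safe: the server is already done
cur.close()

# Unbuffered (the default): rows stream as you fetch them; if the per-row
# work exceeds net_write_timeout between fetches, the server may drop you.
cur = conn.cursor()
cur.execute("SELECT * FROM tbl_message")
for row in cur:
    slow_processing(row)           # risky when the processing is slow
cur.close()
conn.close()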
Temporarily enable the query log by putting
log=
into your my.cnf file, restart MySQL, and watch the query log for those mystery queries (you don't have to give the log a name; it'll assume one based on the host value).