I have a multithreaded application that periodically fetches the whole content of a MySQL table (with a SELECT * FROM query).
The application is written in Python, uses the threading module for multithreading, and uses mysql-python (MySQLdb) as the MySQL driver (using SQLAlchemy as a wrapper produces similar results).
I use InnoDB engine for my MySQL database.
I wrote a simple test to check the performance of the SELECT * query in parallel and discovered that all of those queries are executed sequentially.
I explicitly set the ISOLATION LEVEL to READ UNCOMMITTED, although it does not seem to help with performance.
The code snippet making the DB call is below:
import Queue  # Python 2 module; named "queue" in Python 3

#performance.profile()
def test_select_all_raw_sql(conn_pool, queue):
    '''
    conn_pool - connection pool to get mysql connection from
    queue - task queue
    '''
    query = '''SELECT * FROM table'''
    try:
        conn = conn_pool.connect()
        cursor = conn.cursor()
        cursor.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED")
        # execute until the queue is empty (Queue.Empty is thrown)
        while True:
            id = queue.get_nowait()
            cursor.execute(query)
            result = cursor.fetchall()
    except Queue.Empty:
        pass
    finally:
        cursor.execute("SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ")
        conn.close()
Am I right to expect these queries to be executed in parallel?
If yes, how can I implement that in Python?
MySQL allows many simultaneous connections, from a single user or from many users. But within any one connection, it uses at most one CPU core and executes one SQL statement at a time.
A "transaction" can be composed of multiple SQL statements while the transaction is treated as atomically. Consider the classic banking application:
BEGIN;
UPDATE ... -- decrement from one user's bank balance.
UPDATE ... -- increment another user's balance.
COMMIT;
Those statements are performed serially (in a single connection); either all of them succeed or all of them fail as a unit ("atomically").
If you need to do things in "parallel", have a client (or clients) that can run multiple threads (or processes) and have each one make its own connection to MySQL.
A minor exception: There are some extra threads 'under the covers' for doing background tasks such as read-ahead or delayed-write or flushing stuff. But this does not give the user a way to "do two things at once" in a single connection.
What I have said here applies to all versions of MySQL/MariaDB and all client packages accessing them.
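To illustrate the one-connection-per-thread approach, here is a minimal sketch using MySQLdb (the connection parameters and table name are placeholders, not values from the question):

import threading
import MySQLdb

def worker():
    # Each thread opens its OWN connection; MySQL can then execute
    # the SELECTs concurrently, one statement per connection.
    conn = MySQLdb.connect(host="localhost", user="user",
                           passwd="secret", db="mydb")
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT * FROM mytable")
        rows = cursor.fetchall()
        print("%s fetched %d rows" % (threading.current_thread().name, len(rows)))
    finally:
        conn.close()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

MySQLdb releases the GIL while waiting on the network, so these queries genuinely overlap even within a single Python process; sharing one connection between the threads would serialize them again.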
Use case and background:
I want to use SELECT GET_LOCK against a fixed replica, so that all of my servers see the same locks (which are taken against subsets of my data), but distribute the query load across multiple MySQL replicas, so that my servers can work on different subsets of the data. Therefore I have mysql_connector_locks and mysql_connector_data.
Each of these connector objects is a wrapper around a sqlalchemy engine and sessionmaker. They each have
@contextlib.contextmanager  # presumably decorated like this, since it is used in `with` blocks below
def get_mysql_session(self, isolation_level=None):
    if not self.session_maker:
        self.session_maker = sessionmaker()
        self.session_maker.configure(bind=self.engine)
    session = self.session_maker()
    if isolation_level is not None:
        session.connection(execution_options={'isolation_level': isolation_level.value})
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()
Now, I have my code
for data_subset_id in partitioned_data:
    with mysql_connector_locks.get_mysql_session() as session_locks:
        try:
            with get_lock(session_locks, data_subset_id):
                with mysql_connector_data.get_mysql_session(
                    isolation_level=IsolationLevel.READ_UNCOMMITTED
                ) as session_data:
                    data = get_data(session_data, data_subset_id)
                    process_data(data)
        except LockNotAcquired:
            continue
where get_lock follows a standard recipe for acquiring a lock (sketched below).
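For context, a typical version of that recipe looks roughly like the following (a hedged sketch: the lock-name format, the timeout, and the LockNotAcquired signature are assumptions, not code from the question):

import contextlib
from sqlalchemy import text

@contextlib.contextmanager
def get_lock(session, data_subset_id, timeout=1):
    # GET_LOCK returns 1 on success, 0 on timeout, NULL on error.
    lock_name = 'data_subset_%s' % data_subset_id
    acquired = session.execute(
        text("SELECT GET_LOCK(:name, :timeout)"),
        {'name': lock_name, 'timeout': timeout},
    ).scalar()
    if acquired != 1:
        raise LockNotAcquired(lock_name)
    try:
        yield
    finally:
        # Named locks live on the connection that took them, so release
        # explicitly before the session is reused for the next subset.
        session.execute(text("SELECT RELEASE_LOCK(:name)"), {'name': lock_name})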
What is going wrong:
Each server goes through one iteration of the loop, acquires a lock for the second iteration, and errors with MySQLdb._exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query') on the set transaction isolation level query during the call to mysql_connector_data.get_mysql_session.
Again: it succeeds on acquiring a lock, getting a subset of data, releasing the lock, acquiring the next one, and then fails to get the data the second time.
Other background
Previously, I was using the same connector for both the lock and the data. This has worked for years with no issues. I'm trying to distribute load for get_data, and need to keep the locks common to all the servers. If I can't get this to work I'll switch to holding the locks in a common redis server but I prefer not to go this route if possible.
Thanks for any insight!
The MySQL docs say that LAST_INSERT_ID() works on a "per-connection" basis, that is, the last insert ID value will not be overwritten by INSERT statements executed through other connections.
AFAIU, in Go (unlike PHP for example) we don't create separate DB connections on each client request. Instead, we are told to create just one instance of sql.DB object, which manages a pool of SQL connections under the hood. Consequently, they say, there is no guarantee that two consecutive SQL statements in a Go program (even in the same thread) will be executed through the same DB connection. Therefore, the opposite could be the case – two different threads could execute two different SQL statements on the same (reused) DB connection.
The question is: could this automatic connection management inside sql.DB affect the thread safety of sql.Result.LastInsertId()?
Consider the following case: Right after the INSERT statement in one thread, the sql.DB object reuses the connection in another thread and the other thread executes another INSERT statement on that same (reused) connection. Afterwards, the first thread queries the sql.Result.LastInsertId().
Will this return row ID of the second INSERT or the first INSERT? Is the last insert ID cached at the moment of the statement execution, or is it causing a separate statement to be sent to the DB connection?
The MySQL client-server protocol returns the value of LAST_INSERT_ID() in the response packet of each query that performs an INSERT operation. Client APIs generally hand that value back to client code through methods like sql.Result.LastInsertId() in Go's database/sql API. No round-trip query is required.
So the answer to your question is "the first INSERT."
To be clear, MySQL connections aren't thread-safe in the broad sense. Instead, they are serially reusable resources. Multi-threaded client environments make them appear thread-safe by managing the serial reuse. You have described how that works for Go in your question.
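The same client-side capture can be seen from Python's MySQLdb, as a point of comparison: cursor.lastrowid is populated from the response packet of the INSERT itself, so INSERTs on other connections cannot affect it (the connection parameters, table, and column below are placeholders):

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="mydb")
cursor = conn.cursor()
cursor.execute("INSERT INTO items (name) VALUES (%s)", ("widget",))
# lastrowid was cached from this statement's OK packet; reading it
# costs no extra round trip and cannot see other connections' INSERTs.
print(cursor.lastrowid)
conn.commit()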
I'm using the DBI package to send queries to a MySQL server. I'd like to ensure that these queries are sent as a single transaction in order to avoid table locks.
I use the dbSendQuery function to send queries:
df <- fetch(dbSendQuery(connection,
                        statement = "SELECT * FROM table"),
            n = -1)
The DBI package says little about handling transactions, but what it does have is listed under the functions dbCommit, dbRollback, and dbCallProc, under the header:
Note: The following methods deal with transactions and stored procedures.
in the vignette. None seem to relate to sending queries as a single transaction.
How can I make sure I'm sending these queries as a single transaction?
Warning: not tested.
You would need some help from MySQL. By default, MySQL runs with autocommit mode enabled; to suspend it for a group of statements, you need to issue a START TRANSACTION statement. I suspect dbCommit and dbRollback simply execute COMMIT and ROLLBACK, respectively.
Details: http://dev.mysql.com/doc/refman/5.0/en/commit.html
So you would need to do something like
dbSendQuery(connection, "START TRANSACTION")
# add your dbSendQuery code here
dbCommit(connection)
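(If your version of DBI provides it, dbBegin(connection) should be the more idiomatic way to open the transaction; the raw START TRANSACTION above works either way.)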
I am working with MySQL 5.0 from python using the MySQLdb module.
Consider a simple function to load and return the contents of an entire database table:
def load_items(connection):
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM MyTable")
    return cursor.fetchall()
This query is intended to be a simple data load and not have any transactional behaviour beyond that single SELECT statement.
After this query is run, it may be some time before the same connection is used again to perform other tasks, though other connections can still be operating on the database in the meantime.
Should I be calling connection.commit() soon after the cursor.execute(...) call to ensure that the operation hasn't left an unfinished transaction on the connection?
There are two things you need to take into account:
the isolation level in effect
what kind of state you want to "see" in your transaction
The default isolation level in MySQL is REPEATABLE READ which means that if you run a SELECT twice inside a transaction you will see exactly the same data even if other transactions have committed changes.
Most of the time people expect to see committed changes when running the second select statement - which is the behaviour of the READ COMMITTED isolation level.
If you did not change the default level in MySQL, but you do expect to see committed changes when you run a SELECT twice, then you can't do it in the "same" transaction: you need to commit after your first SELECT statement.
If, on the other hand, you actually want your transaction to see one consistent state of the data, then obviously you should not commit in between.
then after several minutes, the first process carries out an operation which is transactional and attempts to commit. Would this commit fail?
That totally depends on your definition of "is transactional". Anything you do in a relational database "is transactional" (that's not entirely true for MySQL, but for the sake of argument you can assume it is as long as you only use InnoDB as your storage engine).
If that "first process" only selects data (i.e. a "read-only transaction"), then of course the commit will work. If it tried to modify data that another transaction has already committed, and you are running with REPEATABLE READ, you would probably get an error (after waiting until any locks have been released). I'm not 100% sure about MySQL's behaviour in that case.
You should really try this manually with two different sessions using your favorite SQL client to understand the behaviour. Do change your isolation level as well to see the effects of the different levels too.
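To make that two-session experiment concrete, here is a minimal sketch with MySQLdb (connection parameters are placeholders, and the INSERT assumes MyTable has a column you can adapt):

import MySQLdb

def connect():
    return MySQLdb.connect(host="localhost", user="user", passwd="secret", db="mydb")

a, b = connect(), connect()
cur_a, cur_b = a.cursor(), b.cursor()

# MySQLdb turns autocommit off, so session A's first SELECT opens a
# REPEATABLE READ transaction and pins its consistent snapshot.
cur_a.execute("SELECT COUNT(*) FROM MyTable")
print("A sees %d" % cur_a.fetchone()[0])

# Session B commits a new row in the meantime.
cur_b.execute("INSERT INTO MyTable (item) VALUES ('x')")  # adapt column names
b.commit()

# Still inside A's transaction: same snapshot, same count.
cur_a.execute("SELECT COUNT(*) FROM MyTable")
print("A still sees %d" % cur_a.fetchone()[0])

# Ending A's transaction (commit or rollback) releases the snapshot;
# the next SELECT starts a fresh transaction and sees B's row.
a.commit()
cur_a.execute("SELECT COUNT(*) FROM MyTable")
print("A now sees %d" % cur_a.fetchone()[0])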
I need to execute some raw SQL in my Rails app. The query will cause an implicit commit if it is performed within a transaction. We are using MySQL with InnoDB, and the query will include e.g. CREATE TABLE.
Executing the query with ActiveRecord::Base.connection.execute triggers the implicit commit, which is a problem.
It feels like I just need a separate connection for performing my queries. Can ActiveRecord provide this? I've seen discussions of connecting to multiple databases but not multiple connections to the same database.
A solution doesn't have to involve ActiveRecord if there's a better way.
Our Rails and ActiveRecord version is 3.2.3.
Database connections are handled on a per-thread basis (this is basically required for thread safety), which you can use to your advantage: just execute your code in a separate thread, for example
ActiveRecord::Base.transaction do
  # ...
  Thread.new do
    ActiveRecord::Base.connection.execute "..." # in a new connection
  end.join
end
As of Rails 4, ActiveRecord no longer reaps connections created in this way automatically. To avoid leaking connections you need to return them to the pool. As Matt Connelly suggests, the easiest way to do this is the with_connection method, which checks the connection back in at the end of the block, for example:
Thread.new do
  ActiveRecord::Base.connection_pool.with_connection do
    ...
  end
end
It is important that if you use a connection in a thread that you return the connection to the connection pool when done. The easiest way to do it is like this:
Thread.new do
  ActiveRecord::Base.connection_pool.with_connection do |connection|
    connection.execute "..."
    # ensures the connection is returned to the pool when the thread is done.
  end
end.join
DDL and some other statements fire an implicit commit, so they cannot be rolled back, as per the MySQL docs:
http://dev.mysql.com/doc/refman/5.1/en/implicit-commit.html
These implicitly end any transaction active in the current session, as if you had done a COMMIT before executing the statement.
If there are no such queries in between, then you can use the SAVEPOINT feature.
(This does not work with DDL statements.)
There is an option in ActiveRecord which creates sub-transactions using savepoints:
ActiveRecord::Base.transaction do
  # ...
  ActiveRecord::Base.transaction(:requires_new => true) do # creates savepoint
    # perform task
    # if an error occurs, rolls back only to the savepoint
  end
end
Check the Rails docs for more details.