I'm using the latest version of XAMPP on 64-bit Windows 7.
The problem is that when I use mysql_connect with the bool $new_link parameter set to true, like so:
mysql_connect('localhost', 'root', 'my_password', TRUE);
the script execution time increases dramatically (about 0.5 seconds per connection; when I have 4 different objects using different connections, it takes ~2 seconds).
Is setting $new_link to true generally a bad idea, or could it just be some problem with my software configuration?
Thank you.
//Edit:
I'm using a new link because I have multiple objects that use MySQL connections (new objects can be created inside already existing objects, and so on). When it comes to unsetting objects (I have mysql_close inside my __destruct() functions), I figured that the only way to correctly clean up loose ends would be for each object to have its own connection variable.
I just formatted my PC, so the configuration should be the default.
Don't open a new connection unless you have a need for one (for instance, accessing multiple databases simultaneously).
Also you don't have to explicitly call mysql_close. I usually just include a function to quickly retrieve an existing db link (or a new one if none exists yet).
function &getDBConn() {
    global $DBConn;
    // open the link only once; later calls reuse the existing one
    if (!$DBConn) $DBConn = mysql_connect(...);
    return $DBConn;
}
// now you can just call $dbconn = getDBConn(); whenever you need it
Use "127.0.0.1" instead of "localhost". It improved my performance with mysql_connect from ~1 sek to a couple of milliseconds.
Article about php/mysql_connect and IPv6 on windows: http://www.bluetopazgames.com/uncategorized/php-mysql_connect-is-slow-1-second-for-localhost-windows-7/
I have a Lambda that uses RDS. I wanted to improve it and use Lambda connection caching. I have found several articles and implemented it on my side, to the best of my knowledge. But now I am not sure it is the right way to go.
I have a Lambda (running Node 8) which has several files used with require. I will start from the main function and follow the exact path until I reach the MySQL initializer. Everything is kept super simple, showing only the flow of the code that runs MySQL:
Main Lambda:
const jobLoader = require('./Helpers/JobLoader');

exports.handler = async (event, context) => {
    const emarsysPayload = event.Records[0];
    let validationSchema;
    const body = jobLoader.loadJob('JobName');
    ...
    return;
    ...//
Job Code:
const MySQLQueryBuilder = require('../Helpers/MySqlQueryBuilder');

exports.runJob = async (params) => {
    const data = await MySQLQueryBuilder.getBasicUserData(userId);
MySQLBuilder:
const mySqlConnector = require('../Storage/MySqlConnector');

class MySqlQueryBuilder {
    async getBasicUserData (id) {
        let query = `
            SELECT * from sometable WHERE id= ${id}
        `;
        return mySqlConnector.runQuery(query);
    }
}
And finally, the connector itself:
const mySqlConnector = require('promise-mysql');

const pool = mySqlConnector.createPool({
    host: process.env.MY_SQL_HOST,
    user: process.env.MY_SQL_USER,
    password: process.env.MY_SQL_PASSWORD,
    database: process.env.MY_SQL_DATABASE,
    port: 3306
});

exports.runQuery = async query => {
    const con = await pool.getConnection();
    try {
        // await the query so the connection is not released before it resolves
        return await con.query(query);
    } finally {
        con.release();
    }
};
I know that measuring performance will show the actual results, but today is Friday and I will not be able to run this on Lambda until late next week... And really, it would be an awesome start to the weekend knowing I am heading in the right direction... or not.
Thanks for the input.
The first thing would be to understand how require works in Node.js. I recommend you go through this article if you're interested in knowing more about it.
Now, once you have required your connection, you have it for good and it won't be required again. This matches what you're looking for as you don't want to overwhelm your database by creating a new connection every time.
But, there is a problem...
Lambda Cold Starts
Whenever you invoke a Lambda function for the first time, it will spin up a container with your function inside it and keep it alive for approximately 5 mins. It's very likely (although not guaranteed) that you will hit the same container every time as long as you are making 1 request at a time. But what happens if you have 2 requests at the same time? Then another container will be spun up in parallel with the previous, already warmed up container. You have just created another connection on your database and now you have 2 containers. Now, guess what happens if you have 3 concurrent requests? Yes! One more container, which equals one more DB connection.
As long as there are new requests to your Lambda functions, by default, they will scale out to meet demand (you can configure it in the console to limit the execution to as many concurrent executions as you want - respecting your Account limits)
You cannot safely make sure you have a fixed amount of connections to your Database by simply requiring your code upon a Function's invocation. The good thing is that this is not your fault. This is just how Lambda functions behave.
...one other approach is
to cache the data you want in a real caching system, like ElastiCache, for example. You could then have one Lambda function be triggered by a CloudWatch Event that runs at a certain frequency. This function would then query your DB and store the results in your external cache. This way you make sure your DB connection is only opened by one Lambda at a time, because it will respect the CloudWatch Event, which runs only once per trigger.
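For illustration, here is a minimal sketch of that pattern in Python (the question's code is Node, but the shape is identical). The cache key, table name, and environment variable names are hypothetical, and it assumes a Redis-compatible ElastiCache endpoint reachable from both functions:

import json
import os

import pymysql
import redis

# Both clients live outside the handlers so warm containers reuse them.
cache = redis.Redis(host=os.environ["CACHE_HOST"], port=6379)

def refresh_handler(event, context):
    # Scheduled (CloudWatch Events) Lambda: the only code path that opens a MySQL connection.
    conn = pymysql.connect(
        host=os.environ["MY_SQL_HOST"],
        user=os.environ["MY_SQL_USER"],
        password=os.environ["MY_SQL_PASSWORD"],
        database=os.environ["MY_SQL_DATABASE"],
    )
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT id, name FROM users")  # hypothetical table
            rows = cur.fetchall()
        cache.set("users_snapshot", json.dumps(rows), ex=300)  # 5-minute TTL
    finally:
        conn.close()

def read_handler(event, context):
    # The high-traffic Lambda reads only from the cache and never touches MySQL.
    raw = cache.get("users_snapshot")
    return json.loads(raw) if raw else []

Whether a 5-minute TTL is acceptable depends entirely on how stale the cached data is allowed to be.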
EDIT: after the OP sent a link in the comment section, I have decided to add a bit more information to clarify what the mentioned article is saying.
From the article:
"Simple. You ARE able to store variables outside the scope of our
handler function. This means that you are able to create your DB
connection pool outside of the handler function, which can then be
shared with each future invocation of that function. This allows for
pooling to occur."
And this is exactly what you're doing. And this works! But the problem is if you have N connections (Lambda Requests) at the same time. If you don't set any limits, by default, up to 1000 Lambda functions can be spun up concurrently. Now, if you then make another 1000 requests simultaneously in the next 5 minutes, it's very likely you won't be opening any new connections, because they have already been opened on previous invocations and the containers are still alive.
Adding to the answer above by Thales Minussi, but for a Python Lambda: I am using PyMySQL, and to create a connection pool I added the connection code above the handler in a Lambda that fetches data. Once I did this, I was not getting any new data that was added to the DB after an instance of the Lambda was executed. I found bugs reported here and here that are related to this issue.
The solution that worked for me was to add a conn.commit() after the SELECT query execution in the Lambda.
According to the PyMySQL documentation, conn.commit() is supposed to commit any changes, but a SELECT does not make changes to the DB. So I am not sure exactly why this works.
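For reference, a minimal sketch of that Python setup, assuming PyMySQL and reusing the hypothetical sometable from the question. The connection is created outside the handler so warm containers reuse it, and the commit after the SELECT ends the connection's implicit transaction; under the default REPEATABLE READ isolation an open transaction keeps returning the same snapshot, which is the most likely reason the conn.commit() fix works:

import os

import pymysql

# Created once per container, outside the handler, so warm invocations reuse it.
conn = pymysql.connect(
    host=os.environ["MY_SQL_HOST"],
    user=os.environ["MY_SQL_USER"],
    password=os.environ["MY_SQL_PASSWORD"],
    database=os.environ["MY_SQL_DATABASE"],
)

def handler(event, context):
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM sometable WHERE id = %s", (event["id"],))
        rows = cur.fetchall()
    # Ends the implicit transaction; without this, a long-lived connection
    # keeps reading the same stale snapshot on later invocations.
    conn.commit()
    return rows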
There are a couple of earlier related questions, but none of them solves the issue for me:
https://dba.stackexchange.com/questions/160444/parallel-postgresql-queries-with-r
Parallel Database calls with RODBC
"foreach" loop : Using all cores in R (especially if we are sending sql queries inside foreach loop)
My use case is the following: I have a large database of data that needs to be plotted. Each plot takes a few seconds to create due to some necessary pre-processing of the data and the plotting itself (ggplot2), and I need to produce a large number of plots. My thinking is that I will connect to the database via dplyr without downloading all the data to memory. Then I have a function that fetches a subset of the data to be plotted. This approach works fine when using single-threading, but when I try to use parallel processing I run into SQL errors related to the connection: "MySQL server has gone away".
Now, I recently solved the same issue working in Python, in which case the solution was simply to kill the current connection inside the function, which forced the establishment of a new connection. I did this using connection.close() where connection is from Django's django.db.
My problem is that I cannot find an R equivalent of this approach. I thought I had found the solution when I found the pool package for R:
This package enables the creation of object pools for various types of objects in R, to make it less computationally expensive to fetch one. Currently the only supported pooled objects are DBI connections (see the DBI package for more info), which can be used to query a database either directly through DBI or through dplyr. However, the Pool class is general enough to allow for pooling of any R objects, provided that someone implements the backend appropriately (creating the object factory class and all the required methods) -- a vignette with instructions on how to do so will be coming soon.
My code is too large to post here, but essentially, it looks like this:
# libraries loaded as necessary

# connect to the db in some kind of way
# with dplyr
db = src_mysql(db_database, username = db_username, password = db_password)

# with RMySQL directly
db = dbConnect(RMySQL::MySQL(), dbname = db_database, username = db_username, password = db_password)

# with pool
db = pool::dbPool(RMySQL::MySQL(),
                  dbname = db_database,
                  username = db_username,
                  password = db_password,
                  minSize = 4)
# I tried all 3

# connect to a table
some_large_table = tbl(db, 'table')

# define the function
some_function = function(some_id) {
    # fetch data from table
    subtable = some_large_table %>% filter(id == some_id) %>% collect()
    # do something with the data
    something(subtable)
}

# parallel process
mclapply(vector_of_ids,
         FUN = some_function,
         mc.cores = num_of_threads)
The code you have above is not the equivalent of your Python code, and that is the key difference. What you did in Python is totally possible in R (see MWE below). However, the code you have above is not:
kill[ing] the current connection inside the function, which forced the establishment of a new connection.
What it is trying (and failing) to do is to make a database connection travel from the parent process to each child process opened by the call to mclapply. This is not possible. Database connections can never travel across process boundaries no matter what.
This is an example of the more general "rule" that the child process cannot affect the state of the parent process, period. For example, the child process cannot write to the parent's memory, and you can't plot (to the parent process's graphics device) from those child processes either.
In order to do the same thing you did in Python, you need to open a new connection inside of the function FUN (the second argument to mclapply) if you want it to be truly parallel. I.e. you have to make sure that the dbConnect call happens inside the child process.
This eliminates the point of pool (though it’s perfectly safe to use), since pool is useful when you reuse connections and generally want them to be easily accessible. For your parallel use case, since you can't cross process boundaries, this is useless: you will always need to open and close the connection for each new process, so you might as well skip pool entirely.
Here's the correct "translation" of your Python solution to R:
library(dplyr)

getById <- function(id) {
    # create a connection and close it on exit
    conn <- DBI::dbConnect(
        drv = RMySQL::MySQL(),
        dbname = "shinydemo",
        host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
        username = "guest",
        password = "guest"
    )
    on.exit(DBI::dbDisconnect(conn))

    # get a specific row based on ID
    conn %>% tbl("City") %>% filter(ID == id) %>% collect()
}

parallel::mclapply(1:10, getById, mc.cores = 12)
Here's some simplified code in my web application:
sub insert {
    my $pid = fork();
    if ($pid > 0) {
        return;
    }
    else {
        &insert_to_mysql();
        my $last_id = &get_last_inserted(); # call mysql last_insert_id
        exit(0);
    }
}

for my $i (1..10) {
    &insert();
}
Since insert is called in a multiprocessing environment, the order of get_last_inserted might be uncertain. Will it always return the correct last id corresponding to the insert_to_mysql subroutine? I read some documents saying that as long as the processes don't share the same MySQL connection, the returned id will always be the right one. However, these processes are spawned from the same session, so I'm not sure whether they share the MySQL connection or not. Thanks in advance.
these processes are spawned from the same session
Are you saying you're forking and using the same connection in more than one process? That doesn't work at all, never mind LAST_INSERT_ID(). You can't have two processes reading and writing from the same connection! The response for one could end up in the other, assuming the two clients didn't clobber each other's request.
Does last_insert_id return the correct auto_increment id in a multiprocessing environment?
According to MySQL's documentation for LAST_INSERT_ID(),
The ID that was generated is maintained in the server on a per-connection basis.
It would be useless otherwise. Since connections can't be shared across processes, yes, it's safe.
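As a small Python illustration of the same principle (credentials and the items table are placeholders): each worker process opens its own connection, so the insert id it reads back (cursor.lastrowid, which carries the same per-connection value as LAST_INSERT_ID()) can only belong to its own INSERT:

import multiprocessing

import pymysql

def insert_row(n):
    # Each process opens its own connection, so the id below is scoped to it alone.
    conn = pymysql.connect(host="localhost", user="user",
                           password="secret", database="mydb")
    try:
        with conn.cursor() as cur:
            cur.execute("INSERT INTO items (name) VALUES (%s)", ("item-%d" % n,))
            conn.commit()
            return cur.lastrowid
    finally:
        conn.close()

if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        print(pool.map(insert_row, range(10)))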
I don't know about MySQL and Perl, but in PHP it's much the same issue, since it depends on the environment and not on the language. In PHP, last_insert_id expects one parameter: the current connection! As long as multiple instances do not share the same connection resource, passing the connection resource of the current MySQL session should do the trick.
That's what I've found googling around: http://www.xinotes.org/notes/note/179/
I am writing a .NET 4.0 console app that:
Opens up a connection
Uses a Data Reader to cursor through a list of keys
For each key read, calls a web service
Stores the result of the web service in the database
I then spawn multiple threads of this process in order to improve the maximum number of records that I can process per second.
When I up the process beyond about 30 or so threads, I get the following error:
System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
Is there a server- or client-side option to tweak to allow me to obtain more connections from the connection pool?
I am calling a SQL Server 2008 R2 database.
Thanks
This sounds like a design issue. What's your total record count from the database? Iterating through the reader will be really fast. Even if you have hundreds of thousands of rows, going through that reader will be quick. Here's a different approach you could take:
Iterate through the reader and store the data in a list of objects. Then iterate through your list of objects at a number of your choice (e.g. two at a time, three at a time, etc) and spawn that number of threads to make calls to your web service in parallel.
This way you won't be opening multiple connections to the database, and you're dealing with what is likely the true bottleneck (the HTTP call to the web service) in parallel.
Here's an example:
List<SomeObject> yourObjects = new List<SomeObject>();

if (yourReader.HasRows) {
    while (yourReader.Read()) {
        SomeObject foo = new SomeObject();
        foo.SomeProperty = yourReader.GetInt32(0);
        yourObjects.Add(foo);
    }
}

for (int i = 0; i < yourObjects.Count; i = i + 2) {
    // Kick off your web service calls in parallel. You will likely want to do something with the result.
    var tasks = new List<Task> {
        Task.Factory.StartNew(() => yourService.MethodName(yourObjects[i].SomeProperty))
    };
    if (i + 1 < yourObjects.Count) {
        // Guard against running past the end when the list has an odd count.
        tasks.Add(Task.Factory.StartNew(() => yourService.MethodName(yourObjects[i + 1].SomeProperty)));
    }
    Task.WaitAll(tasks.ToArray());
}

// Now do your database INSERT.
Opening up a new connection for all your requests is incredibly inefficient. If you simply want to use the same connection to keep requesting things, that is more than possible. You can open a connection, and then run as many SqlCommand commands through that one connection. Simply keep the ONE connection around, and dispose of it after all your threading is done.
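A minimal sketch of that single-connection reuse, written in Python with pyodbc for brevity (the connection string, table, and key list are placeholders); the same shape applies to SqlConnection/SqlCommand in .NET: one connection opened once, many commands run through it, disposed of at the end:

import pyodbc

# One connection, opened once and reused for every command.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;UID=user;PWD=secret"  # placeholder
)
try:
    cur = conn.cursor()
    for key in (1, 2, 3):  # stands in for the list of keys from the reader
        cur.execute("SELECT name FROM items WHERE id = ?", key)
        row = cur.fetchone()
        # ... call the web service with row, then INSERT the result ...
finally:
    conn.close()

Note that a single connection should only be used from one thread at a time; the parallelism belongs in the web service calls, as the answer above suggests.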
Please restart IIS and you will be able to connect.
I am trying to ensure that only one instance of a Perl script can run at one time. The script performs some kind of db operation depending on the parameters passed in. The script does not necessarily live in one place or on one machine, and it may run on multiple OSs, though the file system is automounted across the various machines.
My first approach was to just create a .lock file and do the following:
use warnings;
use strict;
use Fcntl qw(:DEFAULT :flock);
...
open(FILE,">>",$lockFilePath);
flock(FILE,LOCK_EX) or die("Could not lock ");
do_something();
flock(FILE,LOCK_UN) or die("Could not unlock ");
close(FILE);
but I keep getting the following errors:
Bareword "LOCK_EX" not allowed while "strict subs" in use
Bareword "LOCK_UN" not allowed while "strict subs" in use
So I am looking for another way to approach the problem. Locking the DB itself is also not practical, since the db could be used by other scripts (which is acceptable); I am just trying to prevent multiple instances of this script from running. And locking a table for writes is not practical either, since my script is not aware of which table the operation takes place on; it just launches another Perl script supplied as a parameter.
I am thinking of adding a table to the db with just one value and using that as a mutex, but I don't know how practical/reliable that is (a lot of red flags go up in my head). I have a DBI connection to a db that this script uses.
Thanks
The Bareword error you are getting sounds like you've done something in that "..." to confuse Perl with regard to the imported Fcntl constants. There's nothing wrong with using those constants like that. You might try something like LOCK_UN() to see what error that gets you.
If you are using MySQL, you can use the GET_LOCK() and RELEASE_LOCK() mechanism (GET_LOCK takes a lock name and a timeout in seconds). It works reasonably well for cases like this:
SELECT GET_LOCK("script_lock", 10);
and then when you are finished:
SELECT RELEASE_LOCK("script_lock");
See http://dev.mysql.com/doc/refman/4.1/en/miscellaneous-functions.html for details.
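A sketch of the same mechanism used as a script-level mutex, written here in Python with placeholder credentials (the question itself uses Perl/DBI, but the SQL is identical). GET_LOCK blocks for up to the given timeout and returns 1 on success and 0 on timeout:

import pymysql

def do_work():
    pass  # placeholder for the real db operation

conn = pymysql.connect(host="localhost", user="user",
                       password="secret", database="mydb")
try:
    with conn.cursor() as cur:
        # Wait up to 10 seconds for the named, server-wide lock.
        cur.execute("SELECT GET_LOCK('script_lock', 10)")
        (got_lock,) = cur.fetchone()
        if not got_lock:
            raise SystemExit("another instance is already running")
        try:
            do_work()
        finally:
            cur.execute("SELECT RELEASE_LOCK('script_lock')")
finally:
    conn.close()

Because the lock lives in the MySQL server, it works across machines that share the database, which sidesteps the unreliability of locking files on a network filesystem.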
You may want to avoid the file locking; from what I remember it's notoriously unreliable on non-local filesystems. Your better bet is to just use the existence of the file itself as the indicator that the script is already running (similar to a UNIX PID file). Granted, this won't be 100% reliable, but it should work reasonably well with very low overhead, provided the script isn't getting invoked incessantly.
If you need better reliability than that, using the database for the mutex is a good solution.