I'm using a multi-tenant architecture following the article Dynamic DataSource Routing, but creating new tenants (datasources) dynamically, on user registration.
Everything is running fine, but I'm worried about scalability. The app is read-heavy; today we have 10 tenants, but we will open the app to the public and this number will grow considerably.
Each user datasource is created using the following code:
// Requires Commons DBCP and MySQL Connector/J on the classpath.
import org.apache.commons.dbcp.BasicDataSource;
import com.mysql.jdbc.Driver;

BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName(Driver.class.getName());
ds.setUrl(dsUrl); // tenant's JDBC URL, e.g. jdbc:mysql://host/tenant_db (omitted in the original snippet)
ds.setUsername(dsUser);
ds.setPassword(dsPassword);
ds.setPoolPreparedStatements(true);
ds.setMaxActive(5);           // hard cap: at most 5 open connections per tenant
ds.setMaxIdle(2);             // at most 2 idle connections are kept around
ds.setValidationQuery("SELECT 1+1");
ds.setTestOnBorrow(true);     // validate each connection before handing it out
With these settings, each tenant's pool holds at most 5 open connections (maxActive) and keeps at most 2 of them idle (maxIdle); note that maxIdle is a ceiling on idle connections, not a minimum.
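To put numbers on the worry: with maxActive = 5, the worst case is 5 connections per tenant, so N tenants can hold up to 5 × N connections. At 1,000 tenants that is up to 5,000 open connections, far above MySQL's default max_connections of 151 (illustrative figures, not from the article).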
How many connections and schemas can a MySQL server (4 CPUs @ 2.3 GHz / 8 GB RAM / 80 GB SSD) support with this architecture, and how can I improve that by changing datasource parameters or the MySQL configuration?
I know the answer depends on a lot of additional information; just ask in the comments.
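For context, the routing side of my setup looks roughly like this. It is a minimal sketch assuming Spring's AbstractRoutingDataSource, as in the article; TenantContext is a hypothetical per-request ThreadLocal holder, not something the article defines:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.sql.DataSource;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public class TenantRoutingDataSource extends AbstractRoutingDataSource {
    // one pooled BasicDataSource per tenant, registered at user-registration time
    private final Map<Object, Object> tenants = new ConcurrentHashMap<Object, Object>();

    public void addTenant(String tenantId, DataSource ds) {
        tenants.put(tenantId, ds);
        setTargetDataSources(tenants);
        afterPropertiesSet(); // rebuilds the resolved datasource map
    }

    @Override
    protected Object determineCurrentLookupKey() {
        return TenantContext.getCurrentTenant(); // hypothetical ThreadLocal set per request
    }
}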
In most cases you will not have more than 300 connections/second, provided you add a good caching layer such as memcached. If you are seeing more than 1,000 connections/second, you should consider persistent connections and connection pools.
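To make the caching suggestion concrete, here is a minimal cache-aside sketch, assuming the spymemcached client; the key format, the 5-minute TTL, and loadUserFromMySql are illustrative stand-ins:

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

MemcachedClient cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));

String key = "tenant42:user:7";   // illustrative key naming scheme
Object user = cache.get(key);     // check the cache first
if (user == null) {
    user = loadUserFromMySql(7);  // hypothetical DAO call; only cache misses hit MySQL
    cache.set(key, 300, user);    // keep for 5 minutes; the value must be Serializable
}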
Related
My question is exactly the same as this one:
Nodejs Cluster with MySQL connections
I want to know the recommended approach for creating database connections in a Node.js cluster: either one database connection shared across all workers, or one connection per worker.
The above thread doesn't answer this; it only says what is 'working without issues'.
What I would like to know is whether there are any scaling or concurrent-request issues I would run into with either of the two approaches when using a MySQL database.
What is the approach people are using at scale? Am I better off creating one connection per worker?
I'm posed with a simple scaling problem. As college students, we are new to setting up AWS services, droplets, scaling, etc. We are stuck deciding the architecture of our app: I'm unable to decide whether to use one big Amazon EC2 instance or multiple smaller instances when benchmarking performance.
Even after code optimization, our MySQL queries are not as fast as we want, and better hardware will clearly help. We are looking for high-performance servers, mostly for running a lot of MySQL FULLTEXT search queries over 1.3M records (clearly a CPU- and memory-intensive task). We intend to switch over to Solr at a later point. Both of these tasks are very CPU-demanding and RAM-dependent. As of now, we run our entire web app stack on a single machine with one 2-core CPU and 4 GB RAM. However, we now wish to split the load across multiple instances/droplets, say 5 of them, each with 2 cores and 4 GB RAM.
Our primary concern is that if we create multiple EC2 instances/droplets, wouldn't there be considerable overhead in communicating between the instances/droplets for a standard MySQL search? As far as I know, a MySQL connection uses sockets to reach the local/remote host. With remote communication between four servers, I would expect significant overhead for EACH query.
For instance, let's say I've setup 4 instances and I've been allocated these IP's for each of them.
Server 1 - x.x.x.1
Server 2 - x.x.x.2
Server 3 - x.x.x.3
Server 4 - x.x.x.4
I set up a MySQL server and dump my databases into each of these instances (which sounds like a very bad idea). Now I make the MySQL connections using Python:
import MySQLdb

conn1 = MySQLdb.connect(host=server1, user=user, passwd=password, db=db)
conn2 = MySQLdb.connect(host=server2, user=user, passwd=password, db=db)
conn3 = MySQLdb.connect(host=server3, user=user, passwd=password, db=db)
conn4 = MySQLdb.connect(host=server4, user=user, passwd=password, db=db)
Since none of these databases is on localhost, I would guess there is a large overhead involved in contacting the server and fetching the data for each query.
My Thoughts:
I'm guessing there must be a solution for integrating different droplets/instances together; unfortunately, I haven't found any resources to support that claim.
I've looked into Amazon RDS, which seems like a good fit. But again, I wouldn't be able to benchmark a four-instance MySQL search against a single large AWS RDS server (given that it is quite expensive for new apps).
We are also unsure whether replacing Python with a language popular for scaling, such as Scala, would help tackle this problem of dealing with multiple servers.
Any suggestions will be greatly appreciated by our 3 member team :)
I'm using a MySQL instance on Azure under the free trial period. One thing I've noticed is that max_user_connections is set to 4 under this plan and 40 under the highest-priced tier.
Both of these seem really low, unless I'm misunderstanding something. Say I have 41 users making database requests simultaneously: wouldn't that fail for exceeding the maximum allowed connections? That doesn't seem like much headroom.
How can I use Azure to allow a realistic number of simultaneous connections? Or am I thinking about this incorrectly? Should I just dump MySQL for SQL Azure?
Thanks.
If you are using the .NET Framework, connection pooling is managed by the data provider. Instead of opening a connection and leaving it open for an entire session, in .NET each database operation/transaction typically opens a connection, performs a single task, and then closes the connection as soon as the operation completes. The .NET MySQL data provider also supports advanced connection pooling; see http://www.devart.com/dotconnect/mysql/docs/ComparingProviders.html
I would assume the Azure limitation refers to applications that use the first (session-duration) alternative.
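In Java terms (to match the DBCP snippet in the first question), the open-late/close-early pattern looks roughly like this; with a pooled DataSource such as BasicDataSource, close() returns the connection to the pool instead of dropping it. The query is illustrative:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// open, use, and release per operation instead of holding a session-long connection
try (Connection conn = ds.getConnection();
     PreparedStatement ps = conn.prepareStatement("SELECT name FROM users WHERE id = ?")) {
    ps.setInt(1, 42);
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            System.out.println(rs.getString("name"));
        }
    }
} // the connection goes back to the pool here, not at the end of the session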
I have several Rails apps running on a single MySQL server. All of them run the same app, and all of the databases have the same schema, but each database belongs to a different customer.
Conceptually, here's what I want to do:
Customer.all.each do |customer|
  connection.execute("use #{customer.database}")
  customer.do_some_complex_stuff_with_multiple_models
end
This approach does not work because, when run in a web request, the underlying model classes cache different database connections from the A/R connection pool. So the connection on which I execute the "use" statement may not be the connection a model uses, in which case it queries the wrong database.
I read through the Rails A/R code (version 3.0.3), and came up with this code to execute in the loop, instead of the "use" statement:
ActiveRecord::Base.clear_active_connections!
ActiveRecord::Base.establish_connection(each_customer_database_config)
I believe that the connection pool is per-thread, so it seems like this would clobber the connection pool and re-establish it only for the one thread the web request is on. But if the connections are shared in some way I'm not seeing, I would not want that code to wreak havoc with other active web requests in the same app.
Is this safe to do in a running web app? Is there any other way to do this?
IMO, switching to a new database connection for each request is a very expensive operation, and AR maintains only a limited pool of connections.
I guess you should move to PostgreSQL, where you have the concept of schemas.
In an ideal SQL world, this is the structure of a database:
database --> schemas --> tables
In MySQL, a database and a schema are the same thing. Postgres has separate schemas, which can hold the tables of different customers. You can switch schemas on the fly, without changing the AR connection, by setting:
ActiveRecord::Base.connection.schema_search_path = "customer_schema"
Developing it requires a bit of hacking, though.
Switching databases by connecting/disconnecting is really slow and is not going to work, due to AR connection pools and internal caches. Try using ActiveRecord::Base.table_name_prefix = "customer_" and keeping the database constant.
Right now, connections in ActiveRecord can be held at the class level. They look per-thread only because, before Ruby 1.9, threads performed poorly and implementations used processes instead of threads; that may not stay true for long.
But since AR scopes connections per model class, you can create a different mock model for each database you have, using the answer given in this question.
The code will look something like this (I have not tested it):
Customer.all.each do |customer|
  c_class = Class.new(ActiveRecord::Base)
  c_class.establish_connection(each_customer_database_config)
  c_class.table_name = customer.table_name
  c_class.do_something_on_diff_models_using_customer_from_diff_conn(customer.id)
  c_class.clear_active_connections!
end
Why not keep the same db and tables and just have each of your models belong_to a customer? Then you can find all the models for that customer with:
Customer.all.each do |customer|
  customer.widgets
  customer.wodgets
  # etc
end
How does Drupal 6 interact with MySQL for connections and transactions? Does it use connection pooling? How are transactions handled? At what level are these things managed by Drupal versus handed off to MySQL?
I did a good amount of searching on the web and within Stack Overflow, but I mainly found articles about tweaking Drupal performance and scaling.
From the Acquia support team:
The number of connections would vary based on activity, but you can boil it down as you mention here: one connection per user request. There is no concept of connection pooling or persistent connections in Drupal.
Sometimes it helps to get a handle on the Database Abstraction Layer (how Drupal talks to the database) and the bootstrap process (see http://api.drupal.org/api/drupal/includes--bootstrap.inc/6) for a more detailed walk-through of how it works.