Does Google Drive SDK/API have a throttling/limiting policy (bandwidth limit)? - google-drive-api

I created a program that downloads an entire user's drive. To improve the performance, it's a .NET multi-threaded application and I increased the value of System.Net.ServicePointManager.DefaultConnectionLimit to increase the limit of simultaneous connections. I can confirm that if the application asks for 50 concurrent connections, they are correctly opened and used.
What I have observed so far is that increasing the number of threads improves the number of files processed per second. However, beyond a certain number of threads, there is no further gain (throttling?).
I have profiled the bandwidth and it appears to hit a ceiling of around 1.5 MB/s.
The application can download as many files as the bandwidth allows, and past that threshold the individual download threads slow down.
Does Google limit the number of concurrent connections or the amount of bandwidth? In the documentation, I only saw that they impose a limit of API calls per day.
Thanks for your help.
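To illustrate the observation, here is a minimal Python sketch (the original application is .NET, so this is only an analogy) that measures aggregate throughput as the thread count grows. `download_file` is a hypothetical stand-in that simulates a fixed-latency transfer instead of calling the real Drive API, so the script is self-contained:

```python
import concurrent.futures
import time

def download_file(file_id):
    # Stand-in for the real Drive API download (hypothetical); a fixed
    # sleep simulates network latency so the script runs anywhere.
    time.sleep(0.01)
    return file_id

def download_all(file_ids, workers):
    """Download all files with a fixed-size thread pool, returning
    the results and the elapsed wall-clock time."""
    start = time.monotonic()
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        done = list(pool.map(download_file, file_ids))
    return done, time.monotonic() - start

if __name__ == "__main__":
    files = [f"file-{i}" for i in range(100)]
    for n in (5, 25, 50):
        done, elapsed = download_all(files, n)
        print(f"{n:>2} threads: {len(done)} files in {elapsed:.2f}s")
```

With a simulated per-file latency, throughput scales with the thread count; against the real API, the curve flattens once the server-side or bandwidth limit is reached, which matches the behavior described above.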

Related

Bulk loading Google Drive Performance Optimization

We're building a system that migrates documents from a different data store into Drive. We'll be doing this for different clients on a regular basis. We're therefore interested in performance, because it impacts our customers' experience as well as our time to market: we need to do testing, and waiting for files to load prolongs each testing cycle.
We have 3 areas of drive interaction
1. Create folders (there are many, potentially 30,000+)
2. Upload files (similar in magnitude to the number of folders)
3. Recursively delete a file structure
In cases 1 and 2, we run into "User rate limit exceeded" errors with just 2 and 3 threads, respectively. We have an exponential backoff policy, as suggested, that starts at 1 second and retries 8 times. We're setting quotaUser on all requests to a random UUID in an attempt to indicate to the server that we don't require user-specific rate limiting, but this seems to have had no impact compared to when we didn't set quotaUser.
Number 3 currently uses batch queries. 1 and 2 currently use "normal" requests.
I'm looking for guidance on how best to improve the performance of this system.
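The backoff loop described above can be sketched in Python as follows. `RateLimitError` is a placeholder for whatever exception your client library raises on a 403 "User rate limit exceeded" response, and the jitter term spreads retries out so parallel threads don't all retry at the same instant:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the client library's rate-limit exception."""

def with_backoff(request, max_retries=8, base_delay=1.0):
    """Call `request`, retrying on rate-limit errors with exponential
    backoff plus random jitter: ~1s, 2s, 4s, ... by default."""
    for attempt in range(max_retries):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the last retry
            # exponential delay plus jitter to avoid synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

With many worker threads, the jitter matters as much as the exponential growth; without it, every thread backs off and retries in lockstep, reproducing the burst that triggered the error.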

Loadrunner vuser limit

So I've read elsewhere that LoadRunner is well known to support 2-4k users easily enough, but what that didn't tell me was what sort of environment LoadRunner needed to do that. Is there any sort of guidance available on what the environment needs to be for various loads?
For example, would a single dual-core 2.4 GHz CPU with 4 GB RAM support 1,000 concurrent vusers easily? What about testing at a larger scale (say 10,000 users), where I assume we'd need a small server farm to generate the load? What would be the effect of fewer machines but with more network cards?
There have been tests run with LoadRunner well into the several-hundred-thousand-user range. You can imagine the logistical effort and infrastructure required to run such tests.
Your question of how many users a server can support is actually quite complex. Just like any other piece of engineered software, each virtual user takes a slice of resources to operate from the finite pool of CPU, disk, network and RAM. So simply adding more network cards doesn't buy you anything if your limiting factor is CPU for your virtual users.

Each virtual user type has a base weight, and your own development and deployment models alter that weight. I have observed a single load generator that could take 1,000 winsock users easily with less than 50% of its resources used, then drop to 25 users for a web application which had significantly higher network data flows, lots of state-management variables, and the need for some disk activity related to loading files as part of the business process.

You also don't want to max-load your virtual user hosts, in order to limit the possibility of the test bed influencing your test results.
If you have inexperienced LoadRunner users, then you can virtually guarantee you will be running less-than-optimal virtual user code in terms of resource utilization, which could result in as few as 10% of the users you should expect a given host to produce, because of choices made in virtual user type, development, and deployment run-time settings.
I know this is not likely the answer you wanted to hear, i.e., "for your hosts you can get 5,732 of virtual user type xfoo," but there is no definitive answer without holding both the application and the skill of the tool's user constant. Then you can move from protocol to protocol and from host to host and find out how many users you can get per box.
As a rule of thumb, each virtual user needs around 4 MB of RAM, so you can estimate how many your existing machine can support.
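Taking that 4 MB rule of thumb at face value, a back-of-the-envelope estimate for a 4 GB machine looks like this; the 50% reserve figure is an assumption, echoing the earlier advice not to max out the load generator:

```python
# Rough capacity estimate: how many vusers fit in RAM alone,
# using the ~4 MB-per-vuser rule of thumb from the answer above.
ram_gb = 4
per_vuser_mb = 4          # rule-of-thumb figure, not a measurement
reserve_fraction = 0.5    # assumed headroom for the OS and the tool itself

usable_mb = ram_gb * 1024 * reserve_fraction
max_vusers = int(usable_mb / per_vuser_mb)
print(max_vusers)  # 512
```

RAM is only one of the four resource pools; CPU, disk, or network can cap the real number far below this, as the answer above explains.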

Maximum concurrent connections to MySQL

I want to set up a MySQL database for a social networking website for my college.
My app can have at most 10,000 users. What is the maximum number of concurrent MySQL connections possible?
As per the MySQL docs: http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_max_user_connections
maximum value: 4,294,967,295 (i.e., 2^32 - 1)
You'd probably run out of memory, file handles, and network sockets, on your server long before you got anywhere close to that limit.
You might have 10,000 users in total, but that's not the same as 10,000 concurrent users. What matters here is the number of scripts running concurrently.
For example, if a visitor opens index.php and it makes a database query to fetch some user details, that connection might live for only 250 ms. You can shorten the life of those MySQL connections even further by opening and closing them only while you are actually querying, instead of leaving them open for the duration of the script.
While it is hard to make any type of formula to predict how many connections would be open at a time, I'd venture the following:
You probably won't have more than 500 active users at any given time with a user base of 10,000 users.
Of those 500 concurrent users, there will probably be at most 10-20 concurrent requests at any given moment.
That means you are really only holding about 10-20 concurrent database connections.
As others mentioned, you have nothing to worry about in that department.
I can assure you that raw speed on large tables ultimately comes down to well-designed indexes.
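The open-late/close-early pattern suggested above can be sketched like this in Python; `sqlite3` stands in here for a real MySQL driver, but the shape of the code is the same:

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def short_lived_connection(dsn):
    """Hold a connection only for the duration of a query, so it does not
    count against the concurrent-connection pool for the whole script.
    sqlite3 is a stand-in for a MySQL driver such as mysqlclient."""
    conn = sqlite3.connect(dsn)
    try:
        yield conn
    finally:
        conn.close()  # released as soon as the query is done

if __name__ == "__main__":
    with short_lived_connection(":memory:") as conn:
        (answer,) = conn.execute("SELECT 1 + 1").fetchone()
    print(answer)  # 2
```

In practice most applications use a connection pool rather than literally reconnecting per query, but the principle is the same: the connection is occupied for milliseconds, not for the lifetime of the page request.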

Database querying performance when using shared and dedicated hosting servers

I would like to know how many database requests per page view (every page a user browses triggers multiple requests to retrieve data from the database) can be made while still achieving "optimum" performance on shared or dedicated hosting servers with the most commonly provided hardware (for example, what providers such as HostMonster or Bluehost offer). For both cases, I would like to know what happens when:
I use MySQL or another database system
The database size is 1, 10, 100, or 1,000 MB
Caching is or is not used
Page views number 10, 100, 1,000, or 10,000 per second
In a few words: under which of the above conditions will the server begin to slow down and negatively affect the user experience? I would appreciate some statistics...
P.S.: At this time I am using Ruby on Rails 3, so it is "easy" to increase requests!
I've had Facebook apps hosted on a shared host that did about a million pages per month without too many issues. I generally did 5-8 queries per page request. The number of queries isn't usually the issue; it's how long each query takes. You can have a small data set that is poorly indexed and you'll start having issues. The hosting provider usually kills your query after a certain length of time.
If you are causing the CPU on the server to spike, for whatever reason, then they may start killing processes on you. That is usually the issue.

Database concurrent connections in regard to web (http) requests and scalability

One database connection corresponds to one web request (assuming, of course, your client reads the database on each request). With a connection pool these connections are pre-created, but they are still used one per request.
Now to some numbers: if you google for "Tomcat concurrent connections" or "Apache concurrent connections", you'll see that they support 16,000-20,000 concurrent connections without any problem.
On the other hand, the MySQL administrator best practices say that the maximum number of concurrent database connections is 4096.
On a quick search, I could not find any information about PostgreSQL.
Q1: Is there a software limit on concurrent connections in PostgreSQL, and is MySQL's limit indeed 4096?
Q2: Am I missing something, or will MySQL (or any DB imposing a maximum concurrent connections limit) become a bottleneck, provided the hardware and the OS allow a large number of concurrent connections?
Update: Q3: How exactly does a higher connection count hurt performance?
Q2: You can have far more users on your web site than connections to your database because each user doesn't hold a connection open. Users only require a connection every so often and then only for a short time. Your web app connection pool will generally have far fewer than the 4096 limit.
Think of a restaurant analogy. A restaurant may have 100 customers (users) but only 5 waiters (connections). It works because customers only require a waiter for a short time every so often.
The time when it goes wrong is when all 100 customers put their hand up and say 'check please', or when all 16,000 users hit the 'submit order' button at the same time.
Q1: You set a configuration parameter called max_connections. It can be set well above 4096, but you are definitely advised to keep it much lower than that for performance reasons.
Q2: you usually don't need that many connections, and things will be much faster if you limit the number of concurrent queries on your database. You can use something like pgbouncer in transaction mode to interleave many transactions over fewer connections.
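The interleaving idea behind pgbouncer can be sketched with a minimal pool in Python: many request threads share a handful of connections, each holding one only while its query runs. `sqlite3` stands in for a real PostgreSQL driver:

```python
import queue
import sqlite3
import threading

class ConnectionPool:
    """Minimal pool sketch: many request threads share a few connections,
    the way pgbouncer interleaves transactions over fewer server
    connections. sqlite3 is a stand-in for a PostgreSQL driver."""
    def __init__(self, size):
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(sqlite3.connect(":memory:", check_same_thread=False))

    def run(self, sql):
        conn = self._free.get()        # blocks until a connection is free
        try:
            return conn.execute(sql).fetchone()[0]
        finally:
            self._free.put(conn)       # hand it to the next waiting request

if __name__ == "__main__":
    pool = ConnectionPool(size=5)
    results = []
    threads = [threading.Thread(target=lambda: results.append(pool.run("SELECT 1")))
               for _ in range(100)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(results), sum(results))  # 100 100
```

One hundred "users" are served by five connections; the queue is exactly the restaurant analogy above, with waiters returned to the floor as soon as a table is done.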
The Wikipedia Case Study
30,000 HTTP requests/s during peak time
3 Gbit/s of data traffic
3 data centers: Tampa, Amsterdam, Seoul
350 servers, ranging from 1x P4 to 2x Xeon Quad-Core, 0.5-16 GB of memory
...managed by ~6 people
This is a little off-topic from your questions, but I think you could find it useful: you don't always have to hit the DB for each request. A correct caching strategy is almost always the best performance improvement you can apply to a web app. A lot of static content can remain in cache until it explicitly changes. This is how Wikipedia does it.
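The cache-until-explicitly-changed idea can be sketched with a tiny TTL cache in Python; a real deployment would use a dedicated cache such as memcached rather than an in-process dict, but the access pattern is the same:

```python
import time

class TTLCache:
    """Tiny illustration of caching rendered content until it expires
    or is explicitly invalidated, instead of querying the database on
    every request."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key, compute):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]            # cache hit: no database work
        value = compute()              # cache miss: run the expensive query
        self._store[key] = (value, now)
        return value

    def invalidate(self, key):
        self._store.pop(key, None)     # call this when the content changes

if __name__ == "__main__":
    calls = []
    cache = TTLCache(ttl_seconds=60)
    render = lambda: calls.append(1) or "<html>page</html>"
    cache.get("index", render)
    cache.get("index", render)
    print(len(calls))  # 1
```

Two page views, one database hit: that ratio is what makes a good caching strategy the cheapest scaling lever available.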
From the link you provided to "MySQL administrator best practices"
"Note: connections take memory and your OS might not be able to handle a lot of connections. MySQL binaries for Linux/x86 allow you to have up to 4096 concurrent connections, but self compiled binaries often have less of a limit."
So 4096 seems like the current maximum. Bear in mind that the limit is per server and you can have multiple slave servers that can be used to serve queries.
http://dev.mysql.com/doc/refman/5.0/en/replication-solutions-scaleout.html