I have a large query and I want to process the result row-by-row using the Go MySQL driver. The query is very simple but it returns a huge number of rows.
I have seen mysql_use_result() vs mysql_store_result() at the C API level. Is there an equivalent way to do an unbuffered query over a TCP connection, such as is used by the Go MySQL driver?
This concept of buffered/unbuffered queries in database client libraries is a bit misleading, because actually, buffering may occur on multiple levels. In general (i.e. not Go specific, and not MySQL specific), you have different kinds of buffers.
TCP socket buffers. The kernel associates a communication buffer with each socket. By default, the size of this buffer is dynamic and controlled by kernel parameters. Some clients change those defaults to get more control and optimize. The purpose of this buffer is to regulate the traffic in the device queues and, eventually, to decrease the number of packets on the network.
Communication buffers. Database oriented protocols are generally based on a framing protocol, meaning that frames are defined to separate the logical packets in the TCP stream. Socket buffers do not guarantee that a complete logical packet (a frame) is available for reading. Extra communication buffers are therefore required to make sure the frames are complete when they are processed. They can also help to reduce the number of system calls. These buffers are managed by the low-level communication mechanism of the database client library.
Row buffers. Some database clients choose to keep all the rows read from the server in memory, and let the application code browse the corresponding data structures. For instance, the PostgreSQL C client (libpq) does it. The MySQL C client leaves the choice to the developer (by calling mysql_use_result or mysql_store_result).
Anyway, the Go driver you mention is not based on the MySQL C client (it is a pure Go driver). It uses only the first two kinds of buffers (socket buffers and communication buffers). Row-level buffering is not provided.
There is one communication buffer per MySQL connection. Its size is a multiple of 4 KB. It will grow dynamically if the frames are large. In the MySQL protocol, each row is sent as a separate packet (in a frame), so the size of the communication buffer is directly linked to the largest rows received/sent by the client.
The consequence is that you can run a query returning a huge number of rows without saturating memory, while still getting good socket performance. With this driver, buffering is never a problem for the developer, whatever the query.
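To make this concrete, here is a minimal sketch using database/sql with the go-sql-driver/mysql driver (the DSN, table name, and column names are placeholders): rows are decoded one at a time as you iterate, so memory stays roughly constant even for a huge result set.

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/go-sql-driver/mysql" // registers the "mysql" driver
    )

    func main() {
        // Placeholder DSN: replace with your own credentials, host and schema.
        db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/mydb")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // The query may return millions of rows; they are not buffered client-side.
        rows, err := db.Query("SELECT id, name FROM big_table")
        if err != nil {
            log.Fatal(err)
        }
        defer rows.Close()

        for rows.Next() {
            var id int64
            var name string
            // Each iteration reads the next row packet from the connection.
            if err := rows.Scan(&id, &name); err != nil {
                log.Fatal(err)
            }
            // Process id and name here, row by row.
        }
        if err := rows.Err(); err != nil {
            log.Fatal(err)
        }
    }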
Related
I understand that we need to use cursors when the number of records in the database is massive and could fill up memory on the application and database servers.
However, if someone does not use cursors and selects all the records in a massive table:
How does the database server protect itself from exhausting memory? E.g., is per-query memory capped?
How does the application server protect itself from exhausting memory and prevent OOM exceptions?
The only interface between the server and the driver for retrieving result sets is cursor-based. The driver always uses cursors.
If an application "does not use cursors", what that typically means is the application iterates cursors to completion and loads the entire result set into memory. This doesn't affect memory usage on the server - it only increases memory usage in the application processes.
How does the database server protect itself from exhausting memory? E.g., is per-query memory capped?
There are various memory limits on various operations, such as sorting, to ensure memory usage is not unbounded.
How does the application server protect itself from exhausting memory and prevent OOM exceptions?
Typically by not loading entire result sets into memory, though given the wider variety of tasks applications perform, an average application is more likely to use a lot of memory than an average database server is.
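As a sketch of the difference (in Go with database/sql, assuming db is an already-open *sql.DB; the table and column names are made up): iterating the cursor without retaining rows keeps application memory bounded, whereas accumulating every row in a slice is what "not using cursors" usually amounts to.

    // loadAll is the "no cursor" pattern: the entire result set is
    // accumulated in application memory (hypothetical table/column names).
    func loadAll(db *sql.DB) ([]int64, error) {
        rows, err := db.Query("SELECT id FROM events")
        if err != nil {
            return nil, err
        }
        defer rows.Close()
        var all []int64 // grows with the result set: this is where app memory goes
        for rows.Next() {
            var id int64
            if err := rows.Scan(&id); err != nil {
                return nil, err
            }
            all = append(all, id)
        }
        return all, rows.Err()
    }

    // streamAll iterates the same cursor to completion without retaining rows,
    // so application memory stays roughly constant.
    func streamAll(db *sql.DB, handle func(int64)) error {
        rows, err := db.Query("SELECT id FROM events")
        if err != nil {
            return err
        }
        defer rows.Close()
        for rows.Next() {
            var id int64
            if err := rows.Scan(&id); err != nil {
                return err
            }
            handle(id)
        }
        return rows.Err()
    }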
I am developing a web application using WebSockets which needs real-time data.
The number of clients using the web application will be over 100,000.
The server-side WebSocket code is written in Java. Can a single WebSocket server handle this number of connections?
If not, how can I achieve this? I have to use WebSockets only.
WebSocket servers, like any other TCP-based server, can open huge numbers of connections. Each connection consumes a file descriptor, so the limits are file-descriptor-based. You can find out the max (system-wide) FDs easily enough on Linux:
% cat /proc/sys/fs/file-max
165038
There are system-wide limits and there are kernel parameters for per-user limits (and shell-level settings like "ulimit"). By the way, you'll need to edit /etc/sysctl.conf so your FD changes survive a reboot.
And of course you can increase this number to whatever you want (with the proportional impact on kernel memory).
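Within those system-wide limits, a server process can also check and raise its own per-process descriptor limit before accepting connections. Here is a minimal, Linux-specific sketch (in Go purely for illustration; a Java server process simply inherits the ulimit of the shell that launches it, and raising the hard limit or fs.file-max still requires root/sysctl):

    package main

    import (
        "fmt"
        "log"
        "syscall"
    )

    func main() {
        var rl syscall.Rlimit
        // Read the current soft/hard limits on open file descriptors.
        if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
            log.Fatal(err)
        }
        fmt.Printf("soft=%d hard=%d\n", rl.Cur, rl.Max)

        // Raise the soft limit up to the hard limit for this process only.
        rl.Cur = rl.Max
        if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
            log.Fatal(err)
        }
    }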
Or servers can do tricks to multiplex a single connection.
But the real question is, what is the profile of the data that will flow over the connection? Will you have 100K users getting one 64-byte message per day? Or are those 100K users getting fifty 1K messages a second? Can the WebSocket server shard its connections over multiple NICs (i.e., spread the I/O load)? Are the messages all encrypted and therefore need a lot of CPU? How easily can you cluster your WebSocket server so failover is easy for you and painless for your users? Is your server mission/business critical? That is, can you afford to have 100K users disappear if a disaster occurs? There are many questions to consider when you are thinking about the scalability of a WebSocket server.
In our labs, we can create millions of connections on a server (and many more in a cluster). In the real-world, there are other 'scale' factors to consider in a production deployment besides file descriptors. Hope this helps.
Full disclosure: I work for Kaazing, a WS vendor.
As FrankG explained above, the number of WebSocket connections depends on the use case.
Here are two benchmarks using MigratoryData WebSocket Server for two very different use cases; they also detail the system configuration (note, however, that the system configuration is only a detail: the high scalability comes from the architecture of MigratoryData, which has been designed for real-time websites with millions of users).
In one use case MigratoryData scaled up to 10 million concurrent connections (while delivering ~1 Gbps messaging):
https://mrotaru.wordpress.com/2016/01/20/migratorydata-makes-its-c10m-scalability-record-more-robust-with-zing-jvm-achieve-near-1-gbps-messaging-to-10-million-concurrent-users-with-only-15-milliseconds-consistent-latency/
In another use case MigratoryData scaled up to 192,000 concurrent connections (while delivering ~9 Gbps):
https://mrotaru.wordpress.com/2013/03/27/migratorydata-demonstrates-record-breaking-8x-higher-websocket-scalability-than-competition/
These numbers were achieved on a single instance of MigratoryData WebSocket Server. MigratoryData can be clustered, so you can also scale horizontally to any number of subscribers in an effective way.
Full disclosure: I work for MigratoryData.
I made a program that receives user input and stores it in a MySQL database. I want to run this program on several computers so users can upload information to the same database simultaneously. The database is very simple: it has just seven columns, and the user will only enter four of them.
There would be around two to three hundred computers uploading information (not always at the same time, but it can happen). How reliable is this? Is it even possible?
It's my first script ever, so I'd appreciate it if you could point me in the right direction. Thanks in advance.
Having simultaneous connections from the same script depends on how you're processing the requests. The typical choices are forking a new Python process for each request (usually handled by a web server), or handling all the requests with a single process.
If you're forking processes (a new process for each request):
A single MySQL connection per process should be perfectly fine (the total number of active connections will be equal to the number of requests you're handling).
You typically shouldn't worry about multiple connections, since a single MySQL connection (and the server) can handle loads much higher than that (completely dependent upon the hardware, of course). In that case, as @GeorgeDaniel said, it's more important to focus on controlling how many active processes you have and making sure they don't strain your computer.
If you're running a single process:
Yet again, a single MySQL connection should be fast enough for all of those requests. If you want, you can look into grouping the inserts together, as well as using multiple connections.
MySQL is fast and should easily handle 200+ simultaneous connections that are writing/reading. And yet again, the performance you get from MySQL depends entirely on your hardware.
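If you do group inserts together, here is a hedged sketch of what that can look like (the original program appears to be Python; this Go example only illustrates the idea, and the table and column names are placeholders, with database/sql assumed to be imported): a single multi-row INSERT replaces many round trips.

    // insertBatch groups several four-field entries into one multi-row INSERT,
    // reducing round trips compared to one INSERT per entry.
    // Table and column names are placeholders.
    func insertBatch(db *sql.DB, entries [][4]string) error {
        if len(entries) == 0 {
            return nil
        }
        query := "INSERT INTO user_input (col1, col2, col3, col4) VALUES "
        args := make([]interface{}, 0, len(entries)*4)
        for i, e := range entries {
            if i > 0 {
                query += ", "
            }
            query += "(?, ?, ?, ?)"
            args = append(args, e[0], e[1], e[2], e[3])
        }
        _, err := db.Exec(query, args...)
        return err
    }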
Yes, it is possible to have that many MySQL connections. It depends on a few variables. The maximum number of connections MySQL can support depends on the quality of the thread library on a given platform, the amount of RAM available, how much RAM is used for each connection, the workload from each connection, and the desired response time.
The number of connections permitted is controlled by the max_connections system variable. The default value is 151 to improve performance when MySQL is used with the Apache Web server.
The important part is to handle the connections properly and close them appropriately. You do not want redundant connections hanging around, as they can cause slowdowns in the long run. Make sure that your code closes connections properly.
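To see where a server actually stands against that limit, you can query the configured maximum and the current usage. A small sketch (in Go, as in the earlier examples; the variable and status names are standard MySQL, the helper itself is just an illustration and assumes db is an open *sql.DB with "database/sql" and "fmt" imported):

    // showConnectionLimits prints max_connections and the current number of
    // connected threads.
    func showConnectionLimits(db *sql.DB) error {
        var name, value string

        // SHOW VARIABLES returns (Variable_name, Value) rows.
        if err := db.QueryRow("SHOW VARIABLES LIKE 'max_connections'").Scan(&name, &value); err != nil {
            return err
        }
        fmt.Println(name, "=", value)

        // SHOW STATUS returns (Variable_name, Value) rows as well.
        if err := db.QueryRow("SHOW STATUS LIKE 'Threads_connected'").Scan(&name, &value); err != nil {
            return err
        }
        fmt.Println(name, "=", value)
        return nil
    }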
Overview of the application:
I have a Delphi application that allows a user to define a number of queries, and run them concurrently over multiple MySQL databases. There is a limit on the number of threads that can be run at once (which the user can set). The user selects the queries to run, and the systems to run the queries on. Each thread runs the specified query on the specified system using a TADOQuery component.
Description of the problem:
When the queries retrieve a low number of records, the application works fine, even when lots of threads (up to about 100) are submitted. The application can also handle larger numbers of records (150,000+) as long as only a few threads (up to about 8) are running at once. However, when the user is running more than around 10 queries at once (i.e. 10+ threads), and each thread is retrieving around 150,000+ records, we start getting errors. Here are the specific error messages that we have encountered so far:
a: Not enough storage is available to complete this operation
b: OLE error 80040E05
c: Unspecified error
d: Thread creation error: Not enough storage is available to process this command
e: Object was open
f: ODBC Driver does not support the requested properties
Evidently, the errors are due to a combination of factors: number of threads, amount of data retrieved per thread, and possibly the MySQL server configuration.
The main question really is why the errors are occurring. I appreciate that it appears to be in some way related to resources, but given the different errors that are being returned, I'd like to get my head around exactly why they are cropping up. Is it down to resources on the PC, or something to do with the configuration of the server, for example?
The follow-up question is what we can do to avoid the problems. We're currently throttling the application by lowering the number of threads that can run concurrently. We can't force the user to retrieve fewer records, as the queries are totally user-defined; if they want to retrieve 200,000 records, that's up to them, so there's not much we can do about that side of things. Realistically, we don't want to throttle the speed of the application, because most users will be retrieving small amounts of data and we don't want to make the application too slow for them to use. Although the number of threads can be changed by the user, we'd rather get to the root of the problem and try to fix it without having to rely on tweaking the configuration all the time.
It looks like you're loading a lot of data client-side. The rows may need to be cached in client memory (especially if you use bidirectional cursors), and in a 32-bit application that may not be enough, depending on the average row size and how efficiently the library stores rows.
Usually the best way to accomplish database work is to perform it on the server directly, without retrieving the data to the client. Databases usually have an efficient cache system and can write data out to disk when it doesn't fit in memory.
Why do you retrieve 150,000 rows at once? You could use a mechanism to transfer data only when the user actually accesses it (a sort of paging through the data), to avoid large chunks of "wasted" memory.
This makes perfect sense (the fact you're having problems, not the specific errors). Think it through - you have the equivalent of 10 database connections (1 per thread) each receiving 150,000 rows of data (1,500,000 rows total) across a single network connection. Even if you're not using client-side cursors and the rows are small (just a few small columns), this is a HUGE flow of data across a single network interface, and a big hit on memory on the client computer.
I'd suspect the error messages are incorrect, in the same way that sometimes you have an access violation caused by a memory overwrite in another code location.
Depending on your DBMS, to help with the problem you could use the LIMIT/TOP SQL clauses to limit the amount of data returned.
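The application here is Delphi, but to make the paging idea concrete, here is a hedged sketch (in Go, with made-up table and column names, assuming database/sql is imported) using a keyset-style WHERE ... LIMIT loop, so that no more than one page of rows is held in client memory at a time:

    // fetchPage returns up to pageSize ids greater than lastID.
    // Keyset paging (WHERE id > ? ORDER BY id LIMIT ?) keeps each round trip
    // small and avoids the growing cost of large OFFSETs.
    func fetchPage(db *sql.DB, lastID int64, pageSize int) ([]int64, error) {
        rows, err := db.Query(
            "SELECT id FROM results WHERE id > ? ORDER BY id LIMIT ?",
            lastID, pageSize)
        if err != nil {
            return nil, err
        }
        defer rows.Close()

        ids := make([]int64, 0, pageSize)
        for rows.Next() {
            var id int64
            if err := rows.Scan(&id); err != nil {
                return nil, err
            }
            ids = append(ids, id)
        }
        return ids, rows.Err()
    }

The caller repeats the query with the last id of the previous page until fewer than pageSize rows come back.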
Things I would do:
write a very simple test application which only uses the necessary parts of the connection / query creation (with threads), this would eliminate all side effects caused by other parts of your software
use a different database access layer instead of ODBC, to find out if the ODBC driver is the root cause of the problem
it looks like memory usage is not a problem when the number of threads is low. To verify this, I would also measure/calculate the memory requirement of the records and compare it with the memory usage of the application as reported by the operating system. For example, if tests show that four threads can safely query 1.5 GB of total data without problems, but ten threads fail with under 0.5 GB of total data, I would say it is a threading problem
I have written a program which uses a MySQL database, and the traffic between the database server (a very powerful one) and the client goes over an ADSL connection (1 Mbit/s).
But I have a very, very slow connection between each client and the server: only approximately 3-4 KB/s of data is sent to and from the server. Neither the server nor the clients use the Internet for other purposes; only my program uses it.
I can't figure out why. Is the reason the MySQL server packet size?
Any suggestions?
Try using mytop to identify the cause of the server's low performance.
Another possibility: you may be using SELECT COUNT(*) FROM .. on large InnoDB tables, which causes a table scan.
And can you test with some other service whether the data exchange rate between the machines is OK? Even if the upstream bandwidth is lower for ADSL users, 3-4 KB/s might not be the reason for the low performance.
The effective transfer rate is often heavily limited by the number of roundtrips between client and server. Without seeing your code it is sort of difficult to tell, but you should check the number of requests happening.
If you have a single request that results in many records being returned, you should see a better usage of bandwidth than with a higher number of requests which only deliver a few rows each.
In the latter case the actual result transfer is probably quite fast, but the latencies involved in the "control communications" (i.e. the statements themselves, login requests, etc.) add up, effectively lowering overall throughput.
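A rough worked example (the numbers are assumptions, not measurements): with a 50 ms round trip on the ADSL link, 1,000 single-row statements cost at least 1,000 x 50 ms = 50 seconds of pure latency, no matter how fast the link is, whereas one statement returning 1,000 rows pays that 50 ms only once and is then limited mostly by bandwidth.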
As for the packet size: when it is very small, there is more overhead in the communications, increasing the aforementioned effect. The server's default max_allowed_packet size is 1 MB, if memory serves, but that should be fine with your connection.
You first have to debug both connections.
What is your upload speed if you upload a file with WinSCP or an equivalent tool to the MySQL server? It should be near 90 KB/s with 1 Mbit/s ADSL.