We're using SQL Server 2014 Enterprise and for some tables we're using a clustered columnstore index.
Sometimes when running queries we see unknown wait types like HTREBUILD, HTREPARTITION, and HTMEMO. Searching Google didn't turn up any meaningful results.
Does anyone know what these are, and if so, can you please give us some background?
HTREBUILD, HTREPARTITION, HTMEMO, HTDELETE
These wait types occur when a thread is waiting for access to the shared hash table used during batch-mode processing.
SQL Server 2012 used one hash table per thread; SQL Server 2014 now uses a single hash table shared by all threads.
This change was made to reduce the amount of memory required for the hash table, but it comes at the cost of these waits while threads synchronize access to it. Typically these waits occur when queries involve columnstore indexes, but they can also occur without columnstore indexes being involved if a hash operator runs in batch mode.
Using one shared hash table instead of a per-thread copy provides the benefit of significantly lowering the amount of memory required to persist the hash table but, as you can imagine, the multiple threads depending on that single copy of the hash table must synchronize with each other before, for example, deallocating the hash table. To do so, those threads wait on the HTDELETE (Hash Table DELETE) wait type.
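If you want to see how much time the instance has actually accumulated on these waits, a query against sys.dm_os_wait_stats is enough. Below is a minimal JDBC sketch of that check; the connection string is a placeholder, the DMV columns are standard:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HtWaitCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string -- adjust server, database and authentication.
        String url = "jdbc:sqlserver://myserver;databaseName=master;integratedSecurity=true";
        String sql = "SELECT wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms "
                   + "FROM sys.dm_os_wait_stats "
                   + "WHERE wait_type LIKE 'HT%' "
                   + "ORDER BY wait_time_ms DESC";
        try (Connection con = DriverManager.getConnection(url);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                System.out.printf("%-15s tasks=%d wait_ms=%d max_ms=%d%n",
                        rs.getString("wait_type"),
                        rs.getLong("waiting_tasks_count"),
                        rs.getLong("wait_time_ms"),
                        rs.getLong("max_wait_time_ms"));
            }
        }
    }
}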
Related
There are a few large tables in one of the databases of a customer (each table is ~50M rows and not too wide). The intent is to read these tables (completely) infrequently. As there are no reasonable CDC indices present, the plan is to read the tables by querying them:
SELECT * from large_table;
The reads will be performed using a jdbc driver. With the following fetch configuration present, the intent is to read the data approximately one record at a time (it may require a significant amount of time) so that the client code is never overwhelmed.
PreparedStatement stmt = connection.prepareStatement(queryString, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
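For completeness, the rest of the read would look roughly like this (process() is a placeholder for the slow per-row client code):

try (ResultSet rs = stmt.executeQuery()) {
    while (rs.next()) {
        process(rs);   // placeholder for the long-running per-row work
    }
}
// Note: with Connector/J streaming (fetch size = Integer.MIN_VALUE), this connection
// cannot execute other statements until the result set is fully read or closed.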
I was going through the execution path of a query in High Performance MySQL, however some questions seemed unanswered:
Given that no temp tables are explicitly created and the query cache is not used, how are the streamed reads tracked on the server?
Is any temporary data created (in main memory or files on disk) whatsoever? If so, where is it created and how much?
If temporary data is not created, how are the rows to be returned tracked? Does the query engine keep track of all the page files to be read for this query on this connection? In case there are several such queries running on the server, are the earliest "Tracked" files purged in favor of queries submitted recently?
PS: I want to understand the effect of this approach on the MySQL server (not saying that there aren't better ways of reading the tables).
That simple query will not use a temp table. It will simply fetch the rows and transfer them to the client until it finishes. Nor would any possible index be useful. (If the real query is more complex, let's see it.)
The client may wait for all the rows (faster, but memory intensive) before it hands any to the user code, or it may hand them off one at a time (much slower).
I don't know the details in JDBC on specifying it.
You may want to page through the table. If so, don't use OFFSET, but use the PRIMARY KEY and "remember where you left off". More discussion: http://mysql.rjweb.org/doc.php/pagination
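A minimal sketch of that keyset pattern with JDBC, assuming an auto-increment PRIMARY KEY column named id (table and column names are placeholders):

long lastId = 0;                           // "where you left off"
final int pageSize = 1000;
PreparedStatement ps = connection.prepareStatement(
        "SELECT * FROM large_table WHERE id > ? ORDER BY id LIMIT ?");
while (true) {
    ps.setLong(1, lastId);
    ps.setInt(2, pageSize);
    int rows = 0;
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            rows++;
            lastId = rs.getLong("id");     // remember where you left off
            // ... process the row ...
        }
    }
    if (rows < pageSize) break;            // past the end of the table
}
ps.close();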
Your Question #3 leads to a complex answer...
Every query brings all the relevant data (and index entries) into RAM. The data/index is read in chunks ("blocks") of 16KB from the BTree structure that is persisted on disk. For a simple select like that, it will read the blocks 'sequentially' until finished.
But, be aware of "caching":
If a block is already in RAM, no I/O is needed.
If a block is not in the cache ("buffer_pool"), it will, if necessary, bump some block out and read the desired block in. This is very normal, and very common. Do not fear it.
Because of the simplicity of the query, only a few blocks ever need to be in RAM at any moment. Hence, if your buffer pool were only a few megabytes, it could still handle, say, a 1TB table. There would be a lot of I/O, and that would impact other operations.
As for "tracking", let me use the analogy of reading a long book in a single sitting. There is nothing to track, you are simply turning pages ('blocks'). You don't even need a 'bookmark' for tracking, it is next-next-next...
Another note: InnoDB uses "B+Tree", which includes a link from one block to the "next", thereby making the page turning efficient.
Another interpretation of tracking... "Transactions" and "ACID". When any query (read or write) touches a table, there is some form of lock applied to each row touched. For SELECT the lock is rather light-weight. For writes it can cause delays or even a "deadlock". The locks are unavoidable, but sometimes actions can be taken to minimize their impact.
Logically (but not actually), a "snapshot" of all rows in all tables is taken at the instant you start a transaction. This allows you to see a consistent view of everything, even if other connections are changing rows. The underlying mechanism is very lightweight on reading, but heavier for writes. Writes will make a copy of the row so that each connection sees the snapshot that it 'should' see. Also, the copy allows for ROLLBACK and recovery from a crash (eg power failure).
(Transaction "isolation" mode allows some control over the snapshot.) To get the optimal performance for your case, do nothing special.
Here's a way to conceptualize the handling of transactions: Each row has a timestamp associated with it. Each query saves the start time of the query. The query can "see" only rows that are older than that start time. A subsequent write in another connection will be creating copies of rows with a later timestamp, hence not visible to the SELECT. Hence, the onus is on writes to do extra work; reads are cheap.
I'm currently working on a Java application which performs the following in a background thread:
Opens a database connection
Selects some rows (100,000+ rows)
Performs a long-running task for each row, iterating with ResultSet.next() and a fetch size set via ResultSet.setFetchSize()
Finally, after everything is done, closes the connection
If the query does some sorting or joining, it will create a temp table and use some additional memory. My question is: if my database connection is kept open for a long time (say a few hours) and fetches batch by batch slowly, will it cause performance trouble in the database due to memory usage (given that the database is concurrently used by other threads as well)? Or are databases designed to handle these things effectively?
(In the context of both MySQL and Oracle)
From an Oracle perspective, opening a cursor and fetching from it periodically doesn't have much of an impact if it's left open... unless the underlying data that the cursor is querying changes after the query was first started.
If it does, the Oracle database now has to do additional work to find the data as it was at the start of the query (to preserve read consistency), so it needs to visit the data blocks (either on disk or in the buffer cache) and, where the data has changed, the undo tablespace.
If the undo tablespace is not sized appropriately and enough data has changed, you may find that your cursor fetches fail with an "ORA-01555: snapshot too old" exception.
In terms of memory usage, a cursor doesn't open a result set and store it somewhere for you; it's simply a set of instructions to the database on how to get the next row that gets executed when you do a fetch. What gets stored in memory is that set of instructions, which is relatively small when compared to the amount of data it can return!
This mechanism does not seem like a good approach.
Although both MySQL (InnoDB engine) and Oracle provide consistent reads for SELECT, holding such a long-running SELECT open can degrade performance because of the work needed to build consistent-read (CR) blocks, and in Oracle it can even end in ORA-01555.
I think you should query/export all the data first, and then process the actual business logic row by row. Querying all the data first will not reduce memory usage, but it does shorten the time during which memory and temporary sort segments/files are held.
Alternatively, you should consider splitting the whole job into small pieces; that is better.
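A rough sketch of the "export first, process later" idea (uses java.sql and java.nio.file; the file path, column names, process() and url are placeholders):

// Phase 1: pull the rows out quickly and release the database resources.
Path dump = Paths.get("/tmp/large_table.tsv");                  // placeholder path
try (Connection con = DriverManager.getConnection(url);
     Statement st = con.createStatement();
     ResultSet rs = st.executeQuery("SELECT id, payload FROM large_table");
     BufferedWriter out = Files.newBufferedWriter(dump)) {
    while (rs.next()) {
        out.write(rs.getLong("id") + "\t" + rs.getString("payload"));
        out.newLine();
    }
}

// Phase 2: the long-running business logic reads the local file,
// so no cursor or consistent-read snapshot has to stay open on the database.
try (BufferedReader in = Files.newBufferedReader(dump)) {
    String line;
    while ((line = in.readLine()) != null) {
        process(line);                                          // placeholder for the slow per-row work
    }
}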
I have used the SSIS Balanced Data Distributor just to fill 50,000 records from an OLE DB source to OLE DB destinations.
When I don't use BDD it takes 2 minutes 40 seconds; when I use BDD it takes 1 minute 55 seconds, which is not that much of a difference.
I also find that the data does not load into the destinations in parallel: it loads into the first destination and only later fills the next one (one at a time). Can anyone please help me understand how to fill them in parallel?
The Balanced Data Distributor is not a silver bullet for performance and runtime. It is good when:
You have CPU-intensive transformations, and can benefit from parallel execution
Your destination supports concurrent insert
The first case is clear and depends on your data flow. As for concurrent inserts into an OLE DB destination: the best results are on heap tables, i.e. tables without a primary key/clustered index (and ideally without other indexes as well), or where the clustered key is defined on an autoincrementing surrogate key. On the OLE DB Destination you might need to disable the table lock, since otherwise it can prevent the inserts from running in parallel. But check for yourself, as written in Mark's answer - sometimes parallel inserts do work with a table lock, but only on a heap table or a columnstore.
Other table types (with indexes, clustered or not) might escalate locks to the table level or require index rebuilding, effectively disabling parallel inserts. Drop or disable such indexes if you can.
So, you have to evaluate for yourself whether the parallel execution justifies the additional effort in development and support.
When you are using BDD to insert into the same table, you will only get parallel inserts if the table is a heap (no clustered index) and does not have a unique constraint.
If TABLOCK is turned ON for all the destinations, SQL Server will take a special bulk update (BU) lock, which allows parallel inserts into the same heap.
If there are no other indexes on the table and the database is NOT in the full recovery model, you will also gain the benefit of minimal logging.
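If you're not sure whether the destination qualifies, a quick look at sys.indexes tells you whether it is a heap and which other indexes exist. A small JDBC sketch (table name and connection string are placeholders):

String sql = "SELECT name, type_desc FROM sys.indexes "
           + "WHERE object_id = OBJECT_ID('dbo.MyDestinationTable')";   // placeholder table name
try (Connection con = DriverManager.getConnection(url);                 // url = your SQL Server connection string
     Statement st = con.createStatement();
     ResultSet rs = st.executeQuery(sql)) {
    while (rs.next()) {
        // A single row with type_desc = 'HEAP' (and nothing else) means there is no
        // clustered index and no nonclustered indexes -- the case where parallel
        // inserts and minimal logging are easiest to get.
        System.out.println(rs.getString("type_desc") + "  " + rs.getString("name"));
    }
}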
As you noted in your question, using BDD saved you about 45s - it’s actually working. You’ll probably see different performance when running on a server that has more cores and memory, so be sure to test there as well. The measure of success will be total duration, rather than what visual studio displays in its debugger.
Planning for server execution, it would also be helpful to increase these two properties on the data flow:
-DefaultBufferMaxRows (try adding a 0 to bring it to 100,000)
-DefaultBufferSize - add a zero to max out memory
If you're on SQL Server 2016, you can instead set AutoAdjustBufferSize to true, which will ignore the above properties and size the buffer automatically for the best performance.
These adjustments will increase the commit size of the inserts which will result in faster writes to some degree.
The bottom line: keep TABLOCK on and test on the server. BDD is working.
I am attempting to make a query run on a large database in acceptable time. I'm looking at optimizing the query itself (e.g. Clarification of join order for creation of temporary tables), which took me from not being able to complete the query at all (with a 20 hr cap) to completing it but with time that's still not acceptable.
In experimenting, I found the following strange behavior that I'd like to understand: I want to do the query over a time range of 2 years. If I try to run it like that directly, then it still will not complete within the 10 min I'm allowing for the test. If I reduce it to the first 6 months of the range, it will complete pretty quickly. If I then incrementally re-run the query by adding a couple of months to the range (i.e. run it for 8 months, then 10 months, up to the full 2 yrs), each successive attempt will complete and I can bootstrap my way up to being able to get the full two years that I want.
I suspected that this might be possible due to caching of results by the MySQL server, but that does not seem to match the documentation:
If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again.
http://dev.mysql.com/doc/refman/5.7/en/query-cache.html
The key word there seems to be "identical," and the apparent requirement that the queries be identical was reinforced by other reading that I did. (The docs even indicate that the comparison on the query is literal to the point that logically equivalent queries written with "SELECT" vs. "select" would not match.) In my case, each subsequent query contains the full range of the previous query, but no two of them are identical.
Additionally, the tables are updated overnight. So at the end of the day yesterday we had the full, 2-yr query running in 19 sec when, presumably, it was cached since we had by that point obtained the full result at least once. Today we cannot make the query run anymore, which would seem to be consistent with the cache having been invalidated when the table was updated last night.
So the questions: Is there some special case that allows the server to cache in this case? If yes, where is that documented? If not, any suggestion on what else would lead to this behavior?
Yes, there is a cache that optimizes (general) access to the hard drive. It is actually a very important part of every storage-based database system, because reading data from (or writing, e.g., temporary data to) the hard drive is usually the most relevant bottleneck for most queries.
For InnoDB, this is called the InnoDB Buffer Pool:
InnoDB maintains a storage area called the buffer pool for caching data and indexes in memory. Knowing how the InnoDB buffer pool works, and taking advantage of it to keep frequently accessed data in memory, is an important aspect of MySQL tuning. For information about how the InnoDB buffer pool works, see InnoDB Buffer Pool LRU Algorithm.
You can configure the various aspects of the InnoDB buffer pool to improve performance.
Ideally, you set the size of the buffer pool to as large a value as practical, leaving enough memory for other processes on the server to run without excessive paging. The larger the buffer pool, the more InnoDB acts like an in-memory database, reading data from disk once and then accessing the data from memory during subsequent reads. See Section 15.6.3.2, “Configuring InnoDB Buffer Pool Size”.
There can be (and have been) written books about the buffer pool, how it works and how to optimize it, so I will stop there and just leave you with this keyword and refer you to the documentation.
Basically, your subsequent reads add data to the cache, where it can be reused until it is replaced by other data (which in your case happened the next day). Since (for MySQL) any read of the involved tables populates this cache, not just your possibly complicated query, this may make the "prefetching" easier for you.
The following comes with a disclaimer, because changing your configuration can obviously have a negative impact on your server: the default MySQL configuration is very (very) conservative, and the innodb_buffer_pool_size system setting, for example, is far too low for most servers less than 15 years old, so have a look at your configuration (or let your system administrator check it).
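If you want a quick look at how well the buffer pool is covering your workload, you can compare the logical read requests with the reads that actually had to go to disk. Another small JDBC sketch (url is a placeholder; the two counters are standard InnoDB status variables):

String sql = "SHOW GLOBAL STATUS WHERE Variable_name IN "
           + "('Innodb_buffer_pool_read_requests', 'Innodb_buffer_pool_reads')";
try (Connection con = DriverManager.getConnection(url);
     Statement st = con.createStatement();
     ResultSet rs = st.executeQuery(sql)) {
    while (rs.next()) {
        // read_requests = logical reads; reads = requests that missed the buffer
        // pool and went to disk. A high miss ratio suggests innodb_buffer_pool_size
        // is too small for the working set.
        System.out.println(rs.getString("Variable_name") + " = " + rs.getString("Value"));
    }
}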
We did some experimentation, including checking the effect of the buffer pool noted in the answer by #Solarflare. In our case, we concluded that the apparent caching was real, but it had nothing to do with MySQL at all: it was caused by the Linux disk cache instead. We were able to verify this in our case by manually flushing that cache between runs and comparing times.
I have a write intensive application running on EC2. Any thoughts on how to optimize it to be able to make several thousands concurrent writes on the MySQL DB?
Write scaling is a hard problem. Perhaps the secret to write scaling is in read scaling: cache reads as much as possible, so that the writes get all the throughput.
Having said that, there are a bunch of things one can do:
1) Start with the data model. Design the data model so that you never delete or update a table; the only operation is an insert. Use an Effective Date, Effective Sequence and Effective Status to implement insert, update and delete operations using just the INSERT command. This concept is called the Append-Only model (a rough sketch is at the end of this answer). Check out RethinkDB.
2) Set the concurrent_insert flag to 1. This makes sure that inserts into the tables keep going while reads are in progress.
3) When you have only Inserts at the tail, you may not need row-level locks. So, use MyISAM (this is not to take anything away from InnoDB, which I will come to later).
4) If all this does not do much, create a replica table using the MEMORY engine. If you have a table called MY_DATA, create a table called MY_DATA_MEM on the MEMORY engine.
5) Redirect all Inserts to the MEM table. Create a View that UNIONS both tables and use that view as your Read Source.
6) Write a daemon that periodically moves MEM contents to the Main table and deletes from the Mem table. It may be ideal to implement the MOVE operation as a Delete trigger on the Mem table (I am hoping triggers are possible on Memory Engine, not entirely sure).
7) Do not do any deletes or updates on the MEM table (they are slow); also pay attention to the cardinality of the keys in your table (HASH vs. B-Tree: low cardinality -> HASH, high cardinality -> B-Tree).
8) Even if all the above does not work, ditch JDBC/ODBC. Move to InnoDB and use the HandlerSocket interface to do direct inserts (Google for Yoshinori-san's MySQL HandlerSocket posts).
I have not used HandlerSocket myself, but the benchmarks are impressive. There is even a Java HandlerSocket project on Google Code.
Hope that helps..
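P.S. To make point 1 concrete, here is a rough sketch of the append-only idea. The table, columns, and values are made up for illustration, and url is a placeholder MySQL connection string:

try (Connection con = DriverManager.getConnection(url);
     Statement st = con.createStatement()) {

    // Every change is an INSERT; rows are never UPDATEd or DELETEd in place.
    st.execute("CREATE TABLE IF NOT EXISTS account_history ("
             + " account_id BIGINT NOT NULL,"
             + " eff_date   DATETIME NOT NULL,"
             + " eff_seq    INT NOT NULL,"        // orders changes made at the same instant
             + " eff_status CHAR(1) NOT NULL,"    // 'A' = active, 'D' = logically deleted
             + " balance    DECIMAL(12,2) NOT NULL,"
             + " PRIMARY KEY (account_id, eff_date, eff_seq))");

    st.execute("INSERT INTO account_history VALUES (42, NOW(), 1, 'A', 100.00)");  // insert
    st.execute("INSERT INTO account_history VALUES (42, NOW(), 2, 'A', 150.00)");  // logical update
    st.execute("INSERT INTO account_history VALUES (42, NOW(), 3, 'D', 150.00)");  // logical delete

    // The current state of an account is its newest row; eff_status = 'D' there
    // means the account has been logically deleted.
    try (ResultSet rs = st.executeQuery(
            "SELECT eff_status, balance FROM account_history "
          + "WHERE account_id = 42 ORDER BY eff_date DESC, eff_seq DESC LIMIT 1")) {
        if (rs.next()) {
            System.out.println(rs.getString("eff_status") + " " + rs.getBigDecimal("balance"));
        }
    }
}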