I have a sproc that generates an 80,000-row temp table which is passed as a table-valued parameter to 32 other sprocs (each sproc takes the TVP as an input parameter).
Should I be concerned that I am going to get a balloon of memory I can't manage?
What is a good way to monitor (PerfMon?) how the memory is being used/tracked?
Thanks.
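(For reference, a minimal sketch of the pattern described above; the type, column, and procedure names are invented.)

    -- User-defined table type (hypothetical columns)
    CREATE TYPE dbo.WorkRows AS TABLE
    (
        RowId   int           NOT NULL,
        Payload nvarchar(200) NULL
    );
    GO

    -- Each of the 32 sprocs takes the TVP as a READONLY input parameter
    CREATE PROCEDURE dbo.ProcessRows_01
        @Rows dbo.WorkRows READONLY
    AS
    BEGIN
        SELECT COUNT(*) FROM @Rows;   -- placeholder for the real work
    END;
    GO

    -- The driver sproc fills one variable of the table type from the temp table
    -- and passes the same variable to each downstream sproc
    DECLARE @Rows dbo.WorkRows;
    INSERT INTO @Rows (RowId, Payload)
    SELECT RowId, Payload FROM #SourceTemp;   -- the 80,000-row temp table

    EXEC dbo.ProcessRows_01 @Rows = @Rows;
    -- EXEC dbo.ProcessRows_02 @Rows = @Rows;  ...and so on for the rest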
1)
According to this question:
Performance of bcp/BULK INSERT vs. Table-Valued Parameters
TVPs will underperform bcp/BULK INSERT on data sets that large.
On the other hand... figure out the maximum data size of your 80,000 rows and decide whether you're OK with an object of that size floating around in RAM. (Personally I wouldn't have a problem with it... we could store our entire DB in RAM three times over.)
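One way to put a rough number on that "max data size", assuming the rows are still in the temp table (column names here are placeholders for your own):

    -- Rough footprint of the 80,000-row set, based on actual data lengths
    SELECT COUNT(*) AS row_count,
           AVG(ISNULL(DATALENGTH(RowId), 0) + ISNULL(DATALENGTH(Payload), 0)) AS avg_row_bytes,
           SUM(ISNULL(DATALENGTH(RowId), 0) + ISNULL(DATALENGTH(Payload), 0)) AS total_bytes
    FROM #SourceTemp;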
2)
Here is a good thread on ServerFault for monitoring SQL Server's memory usage:
https://serverfault.com/questions/115957/viewing-sqls-cache-ram-usage
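If you would rather watch the numbers from inside SQL Server than from PerfMon, the memory DMVs expose the same counters; a couple of examples (column names as of SQL Server 2012 and later):

    -- Overall memory use (the same counters PerfMon shows under SQLServer:Memory Manager)
    SELECT counter_name, cntr_value AS kb
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Memory Manager%'
      AND counter_name IN ('Total Server Memory (KB)', 'Target Server Memory (KB)');

    -- Which memory clerks are holding the most memory right now
    SELECT TOP (10) type, name, SUM(pages_kb) AS pages_kb
    FROM sys.dm_os_memory_clerks
    GROUP BY type, name
    ORDER BY pages_kb DESC;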
I'm currently working on a Java application which performs the following in a background thread:
opens a database connection
selects some rows (100,000+ rows)
performs a long-running task for each row, calling ResultSet.next() with a buffer size defined by ResultSet.setFetchSize()
finally, after everything is done, closes the connection
If the query does some sorting or joining, it will create a temp table and incur some additional memory usage. My question is: if my database connection is kept open for a long time (let's say a few hours) while I fetch batch by batch slowly, will that cause performance trouble in the database due to memory usage (given that the database is concurrently used by other threads as well)? Or are databases designed to handle these things effectively?
(In the context of both MySQL and Oracle.)
From an Oracle perspective, opening a cursor and fetching from it periodically doesn't have much of an impact if it's left open... unless the underlying data that the cursor is querying has changed since the query was first started.
If it has, the Oracle database must do additional work to reconstruct the data as it was at the start of the query (to preserve read consistency), so it now needs to read the data blocks (either from disk or from the buffer cache) and, where the data has changed, the undo tablespace.
If the undo tablespace is not sized appropriately and enough data has changed, you may find that your cursor fetches fail with an "ORA-01555: snapshot too old" exception.
In terms of memory usage, a cursor doesn't open a result set and store it somewhere for you; it's simply a set of instructions to the database on how to get the next row that gets executed when you do a fetch. What gets stored in memory is that set of instructions, which is relatively small when compared to the amount of data it can return!
This mechanism doesn't seem great. Although both MySQL (the InnoDB engine) and Oracle provide consistent reads for SELECT, such a long-running SELECT may lead to performance degradation because of the work of building consistent-read blocks (and other overhead), and can even raise ORA-01555 in Oracle.
I think you should query/export all the data first, then process the actual business logic row by row. Querying all the data first will not reduce memory usage, but it does shorten the time that memory and temp sort segments/files are held.
Alternatively, you could consider splitting the whole job into small pieces; that is better.
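For example, the "small pieces" approach can be a simple keyset loop that fetches one bounded batch per query instead of holding a single cursor open for hours (table and column names are invented; LIMIT is MySQL syntax, Oracle would use FETCH FIRST or ROWNUM):

    -- Fetch one bounded batch at a time; remember the last id processed
    -- and use it as the starting point of the next batch.
    SELECT id, payload
    FROM users
    WHERE id > ?          -- last id seen by the previous batch (0 to start)
    ORDER BY id
    LIMIT 1000;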
If I use MySQLdb or JDBC to issue the SQL select * from users against MySQL, and the table has 1 billion records, how many rows will MySQL return in one chunk/packet? I mean, MySQL won't transfer the rows one by one, nor transfer all of the data in a single shot, right? So what is the default chunk/packet size for one network transfer to the client?
If I use a server-side cursor, should I set the fetch size larger than the default chunk size for better performance?
The implementation notes for MySQL's JDBC driver point out that, by default, the whole result set is retrieved and stored in memory. So if there are 1 billion records, they will all be retrieved; the limiting factor would probably be the memory of your machine.
To sum up, the size of the ResultSet that gets retrieved depends on the JDBC implementation. For example, Oracle's JDBC driver retrieves only 10 rows at a time by default and stores them in memory.
We're using SQL Server 2014 Enterprise and for some tables we're using a clustered columnstore index.
Sometimes when running queries, we're seeing unknown wait types like HTREBUILD, HTREPARTITION, HTMEMO. Searching with Google didn't give any meaningful results.
Does anyone know what these are, and if so, can you please give us some background?
HTREBUILD, HTREPARTITION, HTMEMO, HTDELETE
These wait types (the HT* waits) occur when a thread is waiting for access to the shared hash table used during batch-mode processing.
SQL Server 2012 used a separate hash table per thread; SQL Server 2014 now uses one shared hash table instead of a per-thread copy.
This change was made to reduce the amount of memory required for the hash table, but comes at the expense of these waits when synchronizing access to the hash table. Typically these waits occur when queries involve columnstore indexes, but they can also occur without columnstore indexes being involved if a hash operator runs in batch mode.
Using one shared hash table instead of a per-thread copy provides the benefit of significantly lowering the amount of memory required to persist the hash table but, as you can imagine, the multiple threads depending on that single copy of the hash table must synchronize with each other before, for example, deallocating the hash table. To do so, those threads wait on the HTDELETE (Hash Table DELETE) wait type.
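If you want to see how much time these waits are actually costing on your instance, the cumulative numbers are in sys.dm_os_wait_stats, for example:

    -- HT* wait statistics accumulated since the last restart (or stats clear)
    SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type IN ('HTREBUILD', 'HTREPARTITION', 'HTMEMO', 'HTDELETE')
    ORDER BY wait_time_ms DESC;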
Hi everyone, I'm trying to extract a lot of records from many joined tables and views using SSIS (OLE DB Source), but it takes a huge amount of time! The problem is the query itself: when I ran it in SQL Server it took more than an hour. Here's my SSIS package design.
I thought of parallelizing the extraction using two OLE DB Sources and a Merge Join, but that approach isn't recommended, and besides, it takes even more time! Is there any way you can help me, please?
Writing the T-SQL query with all the joins in the OLE DB Source will always be faster than using separate sources and then a Merge Join, IMHO. The reason is that SSIS has a memory-oriented architecture: it has to bring all the data from the N different tables into its buffers and then combine it with the Merge Join. Moreover, Merge Join is an asynchronous (semi-blocking) component, so it cannot use the same input buffer for its output; a new buffer is created, and you may run out of memory if a large number of rows are extracted from the tables.
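For illustration only (table and column names are invented), a single-source query would push the joins and the column pruning down to the database engine:

    -- One OLE DB Source query doing the joins on the database side,
    -- returning only the columns the package actually needs (no SELECT *)
    SELECT o.OrderID, o.OrderDate, c.CustomerName, d.Quantity, d.UnitPrice
    FROM dbo.Orders AS o
    JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
    JOIN dbo.OrderDetails AS d ON d.OrderID = o.OrderID;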
Having said that, there are a few ways you can improve extraction performance with the OLE DB Source:
1. Tune your SQL query. Avoid using SELECT *.
2. Check network bandwidth. You simply cannot have faster throughput than your bandwidth supports.
3. All source adapters are asynchronous. The speed of an SSIS source is not about how fast your query runs; it's about how fast the data is retrieved.
As others have suggested above, you should show us the query and the time it takes to retrieve the data; otherwise, these are just a few general optimization techniques that may make the extraction faster.
Thank you for posting a screen shot of your data flow. I doubt whether the slowness you encounter is truly the fault of the OLE DB Source component.
Instead, you have 3 asynchronous components that result in 2 full blocks of your data flow and one partial block (AGG, SRT, MRJ). That first aggregate will have to wait for all 500k rows to arrive before it can finish the aggregation and then pass the results along to the sort.
These transformations also result in fragmented memory. Normally, a memory buffer is filled with data and visits each component in a data flow. Any changes happen directly to that address space and the engine can parallelize operations if it can determine step 2 is modifying field X and step 3 is modifying Y. The async components are going to cause data to be copied from one space to another. This is a double slow down. The first is the physical act of copying data from address space 0x01 to 0xFA or something. The second is that it reduces the available amount of memory for the dtexec process. No longer can SSIS play with all N gigs of memory. Instead, you'll have quartered your memory and after each async is done, that memory partition is just left there until the data flow completes.
If you want this to run better, you'll need to fix your query. It may result in your aggregated data being materialized into a staging table, or all in one big honkin' query.
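For instance, the staging-table route might look something like this (object names are invented); the aggregation happens in the database engine rather than inside the data flow:

    -- Materialize the pre-aggregated data once, then point the OLE DB Source
    -- at the staging table instead of aggregating and sorting in the data flow.
    SELECT o.CustomerID,
           SUM(d.Quantity * d.UnitPrice) AS TotalAmount
    INTO   dbo.stg_CustomerTotals
    FROM   dbo.Orders AS o
    JOIN   dbo.OrderDetails AS d ON d.OrderID = o.OrderID
    GROUP BY o.CustomerID;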
Open a new question and provide insight into the data structures, indexes, data volumes, the query itself and preferably the query plan - estimated or actual. If you need help identifying these things, there are plenty of helpful folks here that can help you through the process.
I am facing a terrible issue with my SSIS packages. In my packages I have one Lookup, and I have a condition like this:
The OLE DB Source has around 400,000 records and the lookup table has around 1,200,000 records. Both tables can grow, but the one coming from the OLE DB Source will have at most around 900,000. Both tables have around 40-50 columns to look up.
There are 51528912896 bytes of physical memory with 32689860608 bytes free. There are 4294836224 bytes of virtual memory with 249348096 bytes free. The paging file has 120246493184 bytes with 109932904448 bytes free.
Is there any effective solution to this?
At that scale I would consider using a Merge Join transformation instead of a Lookup. ORDER BY your keys in your OLE DB Source SQL code and define the sort manually on the source's output (ref http://www.ssistalk.com/2009/09/17/ssis-avoiding-the-sort-components/ ). While slower than a cached lookup, this design tends to scale better in terms of memory use.
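In practice that means each OLE DB Source query carries an ORDER BY on the join keys (the queries below use invented names), and you then mark the source output as sorted in the Advanced Editor (IsSorted = True, with SortKeyPosition set on the key columns) so the Merge Join will accept it:

    -- Source 1: the ~400,000-row input, ordered by the join key
    SELECT BusinessKey, Col1, Col2          -- only the columns you need
    FROM dbo.SourceTable
    ORDER BY BusinessKey;

    -- Source 2: the ~1,200,000-row lookup table, ordered the same way
    SELECT BusinessKey, LookupCol1, LookupCol2
    FROM dbo.LookupTable
    ORDER BY BusinessKey;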