I am trying to understand what happens to a torch tensor after it is processed. Let's say I retrieve a batch of data from my DataLoader (in the corresponding Dataset object, a big tabular dataset is initialized and kept in local memory, I assume). Then I send this batch to the device, e.g. the GPU, and it gets run through the model.
But what happens after the computations are done (the forward pass, and during training the backward pass as well)? Is the memory that the batch tensors occupied freed again?
I want to cache data that I got from my MySQL DB and for this I am currently storing the data in an object.
Before querying the database, I check whether the needed data exists in the mentioned object or not. If not, I query the database and insert it.
This works quite well, and my webserver now fetches the data only once and reuses it.
My concern is: do I have to worry about concurrent writes/reads for such data structures that live in the object when using Node.js's clustering feature?
Every single line of JavaScript that you write in your Node.js program is thread-safe, so to speak: at any given time, only a single statement is ever executed. The fact that you can do async operations is implemented at a low level in a way that is completely transparent to the programmer. To be precise, code only runs in a "truly parallel" way when you do some input/output operation, e.g. reading a file or doing TCP/UDP communication, or when you spawn a child process. And even then, the only code that executes in parallel with your application is Node's native C/C++ code.
Since you use a JavaScript object as a cache store, you are guaranteed that nothing will ever read from or write to it at the same time.
As for cluster, every worker is created as its own process and thus has its own copy of every JavaScript variable or object that exists in your code.
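To make that concrete, here is a minimal sketch in TypeScript of the pattern being described; the queryDb helper and the cache key are hypothetical stand-ins, not anything from your code. A plain object serves as a per-process cache, and under cluster each worker holds its own independent copy of it.

```typescript
// In-memory cache shared by all requests handled by THIS worker process.
// All property reads/writes below run synchronously on the single JS thread,
// so they can never interleave mid-statement and need no locking.
const cache: Record<string, unknown> = {};

// Hypothetical stand-in for your actual MySQL query code.
async function queryDb(key: string): Promise<unknown> {
  return { key, loadedAt: Date.now() };
}

async function getData(key: string): Promise<unknown> {
  if (cache[key] !== undefined) {
    return cache[key];                 // served from this worker's cache
  }
  const result = await queryDb(key);   // only the await yields to other requests
  cache[key] = result;                 // plain synchronous assignment
  return result;
}
```

The only effect to be aware of is that two overlapping requests can both miss the cache and both query the database (harmless duplicate work); the object itself can never be corrupted, and each cluster worker simply fills its own copy.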
I am trying to decide between holding static data (it gets updated nightly, not in real time) in a database or in flat JSON files to supply a Node.js application. In preliminary tests the flat-file method is twice as fast. My question is about the memory implications of the file method.
If my Node.js app reads the data from the file and then does JSON.parse and passes the object to the template to render... does the in-memory size of that data get duplicated with each user connection?
i.e. if the data file is 1MB and there are 1000 concurrent users, does it consume 1000MB of server memory during that period?
Each connection runs separately, so if you have 1000 concurrent users they aren't really running their requests all at the same time, because Node.js is single-threaded. It runs a single connection until it either finishes or hits a non-blocking operation such as async I/O. Assuming you are using async file I/O, you could have a few connections in progress at the same time, but as soon as one finishes, its memory can be reclaimed by the garbage collector.
Your operation sounds ideal for an in-memory cache. You can decide what lifetime works best for the cache, but you could load the JSON, store it in memory, and set an expiration time of, say, 10 minutes from now; as long as the current time has not passed the expiration time, you just return the result from the cache with no disk access. You'd only ever retrieve the data from disk once every 10 minutes at most, the data would be returned even faster, and the average memory used per request would be significantly lower.
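A rough sketch of that time-based cache in TypeScript (the ten-minute lifetime and the file name below are just placeholders):

```typescript
import { promises as fs } from "fs";

const CACHE_TTL_MS = 10 * 60 * 1000;   // cache lifetime: 10 minutes, pick what suits you
const DATA_FILE = "./data.json";       // placeholder path to the nightly data file

let cachedData: unknown = null;
let expiresAt = 0;                     // epoch ms at which the cached copy goes stale

async function getData(): Promise<unknown> {
  const now = Date.now();
  if (cachedData !== null && now < expiresAt) {
    return cachedData;                 // no disk access for this request
  }
  const raw = await fs.readFile(DATA_FILE, "utf8");
  cachedData = JSON.parse(raw);        // one parsed copy shared by every request
  expiresAt = now + CACHE_TTL_MS;
  return cachedData;
}
```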
The only downside to this cache approach is that when the data is actually updated, it can take up to 10 minutes (on average half the cache time, i.e. 5 minutes) for the cached data to expire and the new data to be returned. Since the update only happens once nightly, it may not be a big deal to you, but there are ways to deal with that issue if you want to. For example, you can check the date/time of the data file on every request and, if it hasn't changed since the last time, keep using your cached version of the data. When it does change, you read it from disk and replace the cached version. This adds an extra disk I/O operation to each request, but guarantees that the user always gets the latest version while still keeping the benefits of a cached copy that only has to be re-read into memory when the data has actually changed.
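A sketch of that file-date variant, which trades one cheap fs.stat per request for always serving the latest data (same placeholder file name as above):

```typescript
import { promises as fs } from "fs";

const DATA_FILE = "./data.json";       // placeholder path to the nightly data file

let cachedData: unknown = null;
let cachedMtimeMs = 0;                 // modification time of the copy we last parsed

async function getData(): Promise<unknown> {
  const { mtimeMs } = await fs.stat(DATA_FILE);        // small extra I/O on every request
  if (cachedData === null || mtimeMs !== cachedMtimeMs) {
    const raw = await fs.readFile(DATA_FILE, "utf8");  // file changed: re-read and re-parse
    cachedData = JSON.parse(raw);
    cachedMtimeMs = mtimeMs;
  }
  return cachedData;                   // otherwise keep serving the cached copy
}
```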
One other thing to consider: if the data is 1MB and you're generating a giant HTML page from it, the page rendering may be where the largest memory consumption is, since expanding a large data structure into HTML can often make it 10-20x larger, and how much memory that step takes depends entirely on the rendering engine.
If there is no per-user customization in the HTML or anything else in the rendered HTML that varies from one rendering to the next (as long as the JSON hasn't changed), you might want to actually cache the rendered HTML so all you have to do is stream it to each request.
I have changed the memory setting in the SQL Server properties to a low value. I have also changed the buffer temp path to a particular location on my system. So why is the package failing with an "insufficient memory" message? If we set BufferTempStoragePath and BlobTempStoragePath, the data should swap out to that temp location, right? If it still fails, what is the use of the buffer temp setting?
Somewhat related: What is the default file path location for BufferTempStoragePath in SSIS 2005? In particular, read the linked article from bimonkey concerning the accessibility of these locations on disk from the SQL Agent service account.
Generally speaking, when your package reports low memory, it is due to the use of fully blocking transformations and Lookup tasks pulling back too much data. If your package makes heavy use of blocking transformations, try to offload the work to the source systems. If lookups are to blame, try being more selective in your query. Do not pull back the entire table; only pull the columns you need. If that isn't selective enough, try filtering the dataset with a WHERE clause (e.g. only the current year's data). Failing that, switch the lookup from full cache mode to partial cache or no cache. No cache will result in one-off queries to the source system for every row that comes through; it has no memory of having run the exact same query two rows ago. Partial cache solves that dilemma by keeping X MB of data in memory. If you want more details about how to reduce memory usage, post some screenshots of what your package looks like. Also note that settings like BufferTempStoragePath are per data flow, so if you have multiple data flows in a package, each one will need to be configured.
The architecture of the data flow is such that data is read into memory buffers and the addresses of those buffers are passed to the various tasks. Instead of each task needing however much memory allocated to it to hold the data passing through it, they all work off the same shared set of memory. Copying that memory from task to task would be slow and very expensive in terms of memory consumption.
With that preamble said, what are BufferTempStoragePath and BlobTempStoragePath? Any time you pull large object types (n/varchar(max), xml, image, etc.) into the data flow, that data is not kept in memory buffers like native types. Instead it is written to disk, and a pointer to that location is what goes into the memory buffer. BufferTempStoragePath is used when your data flow task still has work to do but one of the following is true:
you've fragmented your memory so much (through fully/partially blocking transformations) that the engine can't get any more
you're trying to do too damn many things in a single task. My rule of thumb is that I should be able to trace a line from any transformation in the package to all the sources and destinations. If you've created a package from the Import/Export Wizard, those data flows are prime candidates for being split out into separate flows, as the wizard loves to group unrelated things into a single data flow, which makes them memory hungry.
the box simply doesn't have sufficient resources to process the data. I generally prefer to avoid throwing more hardware at a job, but if you've addressed the first two bullets, this would be the last bullet in my pistol.
Overview of the application:
I have a Delphi application that allows a user to define a number of queries, and run them concurrently over multiple MySQL databases. There is a limit on the number of threads that can be run at once (which the user can set). The user selects the queries to run, and the systems to run the queries on. Each thread runs the specified query on the specified system using a TADOQuery component.
Description of the problem:
When the queries retrieve a low number of records, the application works fine, even when lots of threads (up to about 100) are submitted. The application can also handle larger numbers of records (150,000+) as long as only a few threads (up to about 8) are running at once. However, when the user is running more than around 10 queries at once (i.e. 10+ threads), and each thread is retrieving around 150,000+ records, we start getting errors. Here are the specific error messages that we have encountered so far:
a: Not enough storage is available to complete this operation
b: OLE error 80040E05
c: Unspecified error
d: Thread creation error: Not enough storage is available to process this command
e: Object was open
f: ODBC Driver does not support the requested properties
Evidently, the errors are due to a combination of factors: number of threads, amount of data retrieved per thread, and possibly the MySQL server configuration.
The main question really is: why are the errors occurring? I appreciate that it appears to be in some way related to resources, but given the different errors that are being returned, I'd like to get my head around exactly why they are cropping up. Is it down to resources on the PC, or something to do with the configuration of the server, for example?
The follow-up question is: what can we do to avoid these problems? We're currently throttling the application by lowering the number of threads that can run concurrently. We can't force the user to retrieve fewer records, as the queries are entirely user defined, and if they want to retrieve 200,000 records then that's up to them, so there's not much we can do about that side of things. Realistically, we don't want to throttle the speed of the application, because most users will be retrieving small amounts of data and we don't want to make the application too slow for them to use. And although the number of threads can be changed by the user, we'd rather get to the root of the problem and try to fix it without having to rely on tweaking the configuration all the time.
It looks like you're loading a lot of data client-side. The rows may need to be cached in client memory (especially if you use bidirectional cursors), and in a 32-bit application that memory may not be enough, depending on the average row size and how efficiently the library stores rows.
Usually the best way to accomplish database work is to perform it on the server directly, without retrieving the data to the client. Databases have efficient cache systems and can write data out to disk when it doesn't fit in memory.
Why do you retrieve 150,000 rows at once? You could use a mechanism to transfer data only when the user actually accesses it (a sort of paging through the data), to avoid large chunks of "wasted" memory.
This makes perfect sense (the fact you're having problems, not the specific errors). Think it through - you have the equivalent of 10 database connections (1 per thread) each receiving 150,000 rows of data (1,500,000 rows total) across a single network connection. Even if you're not using client-side cursors and the rows are small (just a few small columns), this is a HUGE flow of data across a single network interface, and a big hit on memory on the client computer.
I'd suspect the error messages themselves are misleading, in the same way that an access violation is sometimes caused by a memory overwrite in a completely different code location.
Depending on your DBMS, you could use the LIMIT/TOP SQL clauses to limit the amount of data returned, which would help with the problem.
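The application in question is Delphi, so the following is only a language-agnostic illustration of the paging idea, sketched in TypeScript with a caller-supplied runQuery function standing in for the real data access layer; the important part is the LIMIT/OFFSET shape of the SQL (TOP or OFFSET ... FETCH on SQL Server).

```typescript
const PAGE_SIZE = 1000;   // rows held in client memory at any one time; tune to taste

// Fetch and process the result set one page at a time instead of 150,000 rows in one go.
// `runQuery` stands in for whatever actually sends SQL to MySQL
// (in the real application, the TADOQuery running in its thread).
async function processInPages(
  runQuery: (sql: string) => Promise<Array<Record<string, unknown>>>,
  baseQuery: string,
): Promise<void> {
  for (let offset = 0; ; offset += PAGE_SIZE) {
    const rows = await runQuery(`${baseQuery} LIMIT ${PAGE_SIZE} OFFSET ${offset}`);
    if (rows.length === 0) break;      // no more data
    // ... hand this page to the UI or processing code, then let it be garbage collected ...
  }
}
```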
Things I would do:
write a very simple test application which only uses the necessary parts of the connection/query creation (with threads); this would eliminate all side effects caused by other parts of your software
use a different database access layer instead of ODBC, to find out if the ODBC driver is the root cause of the problem
it looks like memory usage is not a problem when the number of threads is low. To verify this, I would also measure/calculate the memory requirement of the records and compare it with the memory usage of the application as reported by the operating system. For example, if tests show that four threads can safely query 1.5 GB of total data without problems, but ten threads fail with less than 0.5 GB of total data, I would say it is a threading problem
I have a million rows in a database table. For each row I have to run a custom exe, parse the output and update another database table.
How can I process multiple rows in parallel?
I currently have a simple data flow task: GetData -> Run Script (Run Process, Parse Output) -> Store Data.
For 6,000 rows it took 3 hours. That is way too much.
There is a single bottleneck here: running the process for each row. Increasing "EngineThreads" would not help at all, as there will be only one thread running this particular script transform anyway. The time spent in the other transforms probably does not matter at all. Processes are heavyweight objects, and running thousands of them will never be cheap.
I can think of the following ideas to make it better:
1) The best way to fix it is to convert your custom EXE into an assembly and call it from the script transform - to avoid the overhead of creating processes, parsing the output etc.
2) If you have to use separate processes, you can try to run those processes in parallel. This will help if each process mostly waits for some input/output (i.e. it is I/O-bound). If the processes are memory-bound or CPU-bound, you will not gain much by running them in parallel.
2A) Complex script, simple package.
To run them in parallel, modify the ProcessInput method in your script to start the process asynchronously and not wait for it to complete; move on to the next row and create the next process. Subscribe to the process output and the process's Exited event so you know when it has finished. Limit the number of processes running in parallel, otherwise you'll run out of memory. Wait until all the processes are done before returning from the ProcessInput call.
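An SSIS script component would be written in C#/VB.NET, so treat the following purely as a TypeScript sketch of the pattern just described: start each process without waiting for it, cap the number in flight, collect the output on exit, and drain everything at the end. MAX_PARALLEL and the exe path are assumptions, not values from the question.

```typescript
import { spawn } from "child_process";

const MAX_PARALLEL = 8;                        // cap on concurrently running processes
const inFlight = new Set<Promise<void>>();     // promises for processes still running

// Start the custom exe for one row and resolve once it has exited and its output is parsed.
function runOne(exePath: string, rowArg: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn(exePath, [rowArg]);
    let output = "";
    child.stdout.on("data", chunk => { output += chunk; });   // collect output as it arrives
    child.on("error", reject);
    child.on("close", code => {
      // ... parse `output` here and queue the result for the destination table ...
      if (code === 0) resolve(); else reject(new Error(`exit code ${code}`));
    });
  });
}

// Rough equivalent of ProcessInput: called once per row.
async function processRow(exePath: string, rowArg: string): Promise<void> {
  if (inFlight.size >= MAX_PARALLEL) {
    await Promise.race(inFlight);              // wait for any one running process to finish
  }
  let p: Promise<void>;
  p = runOne(exePath, rowArg)
    .catch(err => { console.error("row failed:", err); })  // keep the pipeline going on failure
    .finally(() => inFlight.delete(p));
  inFlight.add(p);
}

// Rough equivalent of the end of ProcessInput: wait for everything still running.
async function drainAll(): Promise<void> {
  await Promise.all(inFlight);                 // failures were already handled per row above
}
```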
2B) Simple script, complex package.
Keep the current sequential script, but partition the data using SSIS. Add a Conditional Split transform and split the input stream into multiple streams based on some hash expression, something that will make each output receive approximately the same amount of data. The number of streams equals the number of process instances you want to run in parallel. Add your script transform to each output of the Conditional Split. Now you should also increase the "EngineThreads" property :) and these transforms will run in parallel. (Note: based on the tag, I assume you use SSIS 2008. You'll need to insert additional Union All transforms to make this work in SSIS 2005.)
This should make it perform better, but millions of processes is a lot. You'll hardly get really good performance here.
If you are executing this process using the "data flow" container, then there is a property on it called "EngineThreads" which defaults to a value of 5. You can set it to a higher number like 20, which will devote more threads to processing those rows.
That is just a performance tweak or optimisation; if your SSIS package is still running really slowly, then I would perhaps address the architecture and design of your package.