Inherent issue with custom memory manager and RTLD_DEEPBIND? - dlopen

I need help regarding an issue for which I created bug 25533 a while ago.
The issue arises from a complex interaction between RTLD_DEEPBIND, libc, and a custom memory manager.
Let me explain the issue first.
In my application, I use a third-party component which makes use of dlopen with the RTLD_DEEPBIND flag.
In my application, this third-party component is linked statically along with Google's memory manager (gperftools, https://github.com/gperftools/gperftools).
When the third-party component calls dlopen with the RTLD_DEEPBIND flag, strdup (along with malloc) gets resolved from glibc.
The definition of free still comes from the custom memory manager, and that's a problem: the memory was allocated by libc's malloc, while the free call is directed to the custom memory manager.
I asked the third-party vendor to fix it at their end (dlopen with RTLD_DEEPBIND), but they said they were unable to do so because removing the flag has side effects.
I filed a ticket against tcmalloc as well, but it also hit a dead end. As a workaround, I made use of malloc hooks.
But I consider malloc hooks a temporary workaround, and I'm looking for a proper solution to this problem. In my opinion, the proper fix should be in glibc, which should be agnostic to whatever memory manager an application chooses. I believe jemalloc (Facebook) has the same issue and has also used malloc hooks to work around this problem.
Through this email, I'm looking for suggestions on how to fix this issue properly.
I have created a small test case which is not exactly what is in my application but closely resembles what I see. It depends on a TCMALLOC_STATIC_LIB variable, which should be the path to the static library (.a) of the custom memory manager.
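For readers who want the shape of the problem without the full test case, here is a minimal sketch in the same spirit (this is not the reporter's test case; file and function names are illustrative, and main must be linked statically against the custom allocator, e.g. gcc main.c $TCMALLOC_STATIC_LIB -ldl):

/* plugin.c - stands in for the third-party code; build with:
 *   gcc -shared -fPIC plugin.c -o plugin.so */
#include <string.h>

char *dup_string(const char *s)
{
    /* Because the host opens us with RTLD_DEEPBIND, strdup (and the
     * malloc behind it) binds to glibc, not to the host's allocator. */
    return strdup(s);
}

/* main.c - stands in for the application, with the custom allocator's
 * static library linked into this image. */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *h = dlopen("./plugin.so", RTLD_NOW | RTLD_DEEPBIND);
    if (!h) {
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }

    char *(*dup_string)(const char *) =
        (char *(*)(const char *))dlsym(h, "dup_string");
    char *p = dup_string("hello");

    /* free() resolves to the custom allocator in this image, but p was
     * allocated by glibc's malloc inside the plugin: mismatched
     * allocator, hence heap corruption or a crash. */
    free(p);
    dlclose(h);
    return 0;
}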


How to use an in-process IMFTransform with WinRT MediaPlayer::AddVideoEffect via activatableClassId

WinRT's Windows::Media::Playback::MediaPlayer has support for adding video and audio effects (much like IMFMediaEngine); however, I can't find a way to use the existing IMFTransforms that I already use with IMFMediaEngineEx::InsertVideoEffect() in MediaPlayer::AddVideoEffect().
MediaPlayer::AddVideoEffect() only takes a string for the "activatableClassId", whereas IMFMediaEngineEx::InsertVideoEffect() allows me to pass in a pointer to my local IMFTransform directly. I don't want to register a DLL with the system for the class to be activatable; I just want the IMFTransform to be registered locally in-process so that it can be discovered by the classId.
I've searched online but there is very little information. All I found was this Microsoft thread, an old article showing a CGreyScale MFT using WRL, and this useful repository which uses an appxmanifest to register the classes (not what I want to do).
These examples seem useful, and I implemented the decoration around my existing MFT, but they rely on registering the activatableClassId externally, so I still can't tell how to do it in-process. The only thing I could find was RoRegisterActivationFactories(), but there's very little information about it, so I'm not sure.
Does anyone know how to do this?
Thanks,
Since the MediaPlayer API is WinRT, it will expect to use WinRT activated objects for effects. Alternatively, the lower level win32 MF Media Engine allows you to pass in an IMFActivate for any custom activation.
There are two ways to activate the MFT with WinRT:
Register the MFT in the registry and reference the CLSID; you can refer to this document.
Registration-Free WinRT (which requires the use of an application manifest); you can refer to this blog.
Unfortunately, this means there is an app manifest requirement if you wish to register the MFT in-process.

Castle Windsor: when is a transient with a disposable released? Burden

We're using Castle Windsor 2.1.0.6655.
I want to use the transient lifestyle for my resolved objects, but I want to check how this version of Castle deals with transients that have dependencies. Using the immediate window (Visual Studio), I can see the effects of resolving, disposing, and finally releasing, all the while checking whether the resolved object is still tracked.
e.g.
resolved = container.Resolve(Id);
container.Kernel.ReleasePolicy.HasTrack(resolved)
= true
resolved.Dispose()
container.Kernel.ReleasePolicy.HasTrack(resolved)
= true
container.Release(resolved)
container.Kernel.ReleasePolicy.HasTrack(resolved)
= false
My concern is that these objects are continuing to be tracked between requests, as they are never released, meaning memory usage continues to rise.
I've read that Component Burden is related to this issue, but I haven't been able to find out exactly what this is in Castle 2.0 and greater.
The difficulty in 'releasing' is that the resolved objects are in fact part of services, their usage being to provide ORM functions and mappings. I'm not sure that referencing the container to release is correct in these instances.
I'm wondering whether there is a way for me to see how many objects the container is referencing at a given point, without having to use memory profilers, as we don't have this available.
I thought I could maybe use the following:
container.Kernel.GetHandlers()
with the type I'm looking for, to see if tracked occurrences are increasing?
Version 2.1 will celebrate its 4th birthday very soon. I strongly recommend you upgrade to version 3.1.
Not only because v2.1 is no longer supported and v3.1 is much newer, with many bugfixes, but also because it has some major improvements in the way it does tracking.
Also, in v3.1 you will be able to enable a performance counter that will report to you, in real time, the number of instances being tracked by the release policy.
Addressing the particular concern you're referring to, that sounds like an old threading bug that was fixed somewhere along the way. One more reason to upgrade.
Windsor has to be used with the R(egister) R(esolve) R(elease) pattern.
By default (and you definitely should stick with that) all components are tracked/owned by the container... that's the Windsor beauty!
Until you (or the container itself) call Release, the instance will be held in memory, no matter whether you call Dispose directly (as per your sample).
That said, components registered as Transient should be resolved at the composition root only, in other words as the first object of the dependency graph, or through a factory (late dependency).
Of course, keep in mind that when using a factory within the dependency graph you may need to implement the RRR pattern explicitly.

How to find code generating known data?

From debugging the program I only know that before clicking a button a set of known data isn't in memory (confirmed by a memory search), and after clicking it the data is in memory (each time at a different location).
How can I find the code that generates this data?
One of the major problems (which might be important to know) is that it is a .NET program (which I can't analyze with Reflector because it is obfuscated). So I'm analyzing the assembly generated by .NET (in Olly / Immunity / IDA).
If it is .NET, you could debug the IL code. It is not easy, but it should be possible to find the IL instruction that writes the sequence into memory.
Try Debugging Tools for Windows with the so-called SOS extension.
You could also try whether it is possible to generate, say, C# code from the obfuscated assemblies for debugging. But that will almost certainly not be more readable than the IL.
Add Cheat Engine to your toolkit.
If you can get the address it writes to, you can right-click it and choose "Find out what writes to this address".
P.S. For the reverse effect, you can select an instruction in the memory view, right-click, and choose "Find out what addresses this instruction accesses".

What is instrumentation?

I've heard this term used a lot in the same context as logging, but I can't seem to find a clear definition of what it actually is.
Is it simply a more general class of logging/monitoring tools and activities?
Please provide sample code/scenarios when/how instrumentation should be used.
I write tools that perform instrumentation. So here is what I think it is.
DLL rewriting. This is what tools like Purify and Quantify do. A previous reply to this question said that they instrument post-compile/link. That is not correct. Purify and Quantify instrument the DLL the first time it is executed after a compile/link cycle, then cache the result so that it can be used more quickly next time around. For large applications, profiling the DLLs can be very time consuming. It is also problematic - at a company I worked at between 1998 and 2000, we had a large 2-million-line app that would take 4 hours to instrument, and 2 of the DLLs would randomly crash during instrumentation; if either failed, you would have to delete both of them and start over.
In place instrumentation. This is similar to DLL rewriting, except that the DLL is not modified and the image on the disk remains untouched. The DLL functions are hooked appropriately to the task required when the DLL is first loaded (either during startup or after a call to LoadLibrary(Ex)). You can see techniques similar to this in the Microsoft Detours library.
On-the-fly instrumentation. Similar to in-place but only actually instruments a method the first time the method is executed. This is more complex than in-place and delays the instrumentation penalty until the first time the method is encountered. Depending on what you are doing, that could be a good thing or a bad thing.
Intermediate language instrumentation. This is what is often done with Java and .Net languages (C#, VB.Net, F#, etc.). The language is compiled to an intermediate language which is then executed by a virtual machine. The virtual machine provides an interface (JVMTI for Java, ICorProfiler(2) for .Net) which allows you to monitor what the virtual machine is doing. Some of these options allow you to modify the intermediate language just before it gets compiled to executable instructions.
Intermediate language instrumentation via reflection. Java and .Net both provide reflection APIs that allow the discovery of metadata about methods. Using this data you can create new methods on the fly and instrument existing methods just as with the previously mentioned Intermediate language instrumentation.
Compile time instrumentation. This technique inserts appropriate instructions into the application during compilation. Not often used; a profiling mode of Visual Studio provides this. It requires a full rebuild and link.
Source code instrumentation. This technique modifies the source code itself to insert appropriate code (usually conditionally compiled so you can turn it off); a small sketch follows this list.
Link time instrumentation. This technique is only really useful for replacing the default memory allocators with tracing allocators; a sketch of this also follows the list. An early example of this was the Sentinel memory leak detector on Solaris/HP in the early 1990s.
The various in-place and on-the-fly instrumentation methods are fraught with danger as it is very hard to stop all threads safely and modify the code without running the risk of requiring an API call that may want to access a lock which is held by a thread you've just paused - you don't want to do that, you'll get a deadlock. You also have to check if any of the other threads are executing that method, because if they are you can't modify it.
The virtual machine based instrumentation methods are much easier to use as the virtual machine guarantees that you can safely modify the code at that point.
(EDIT - this item added later) IAT hooking instrumentation. This involves modifying the import address table for functions linked against other DLLs/shared libraries. This type of instrumentation is probably the simplest to get working: you do not need to know how to disassemble and modify existing binaries, or do the same with virtual machine opcodes. You just patch the import table with your own function address and call the real function from your hook. Used in many commercial and open source tools.
I think I've covered them all, hope that helps.
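To make the last two techniques in the list concrete, here is a minimal sketch of source code instrumentation, assuming a hypothetical TRACE macro that is compiled in only when TRACE_CALLS is defined:

/* Source code instrumentation: tracing code is inserted at the call
 * sites but can be compiled out entirely without touching them. */
#include <stdio.h>

#ifdef TRACE_CALLS
#define TRACE(fmt, ...) \
    fprintf(stderr, "[%s:%d] " fmt "\n", __func__, __LINE__, __VA_ARGS__)
#else
#define TRACE(fmt, ...) ((void)0)
#endif

static int add(int a, int b)
{
    TRACE("add(%d, %d)", a, b);   /* the inserted instrumentation point */
    return a + b;
}

int main(void)
{
    int r = add(2, 3);
    TRACE("result = %d", r);
    return r == 5 ? 0 : 1;
}

And a sketch of link time instrumentation using GNU ld's --wrap option (link with gcc ... -Wl,--wrap=malloc; __wrap_malloc and __real_malloc follow the linker's naming convention):

/* Link time instrumentation: the linker redirects every call to
 * malloc to __wrap_malloc, which traces and then delegates to the
 * real allocator via __real_malloc. */
#include <stdio.h>
#include <stdlib.h>

void *__real_malloc(size_t size);   /* resolved by the linker */

void *__wrap_malloc(size_t size)
{
    void *p = __real_malloc(size);
    fprintf(stderr, "malloc(%zu) = %p\n", size, p);  /* tracing hook */
    return p;
}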
Instrumentation is usually used in dynamic code analysis.
It differs from logging in that instrumentation is usually done automatically by software, while logging needs human intelligence to insert the logging code.
It's a general term for doing something to your code necessary for some further analysis.
Especially for languages like C or C++, there are tools like Purify or Quantify that profile memory usage, performance statistics, and the like. To make those profiling programs work correctly, an "instrumenting" step is necessary to insert the counters, array-boundary checks, etc., that are used by the profiling programs. Note that in the Purify/Quantify scenario, the instrumentation is done automatically as a post-compilation step (actually, it's an added step to the linking process) and you don't touch your source code.
Some of that is less necessary with dynamic or VM code (e.g. profiling tools like OptimizeIt are available for Java and do a lot of what Quantify does, but no special linking is required), but that doesn't negate the concept.
An excerpt from the Wikipedia article:
In the context of computer programming, instrumentation refers to the ability to monitor or measure the level of a product's performance, to diagnose errors and to write trace information. Programmers implement instrumentation in the form of code instructions that monitor specific components in a system (for example, instructions may output logging information to appear on screen). When an application contains instrumentation code, it can be managed using a management tool. Instrumentation is necessary to review the performance of the application. Instrumentation approaches can be of two types: source instrumentation and binary instrumentation.
Whatever Wikipedia says, there is no standard, widely agreed definition of code instrumentation in the IT industry.
Consider that instrumentation is a noun derived from instrument, which has a very broad meaning.
"Code" is also everything in IT - data, services, everything.
Hence, code instrumentation covers a set of activities so wide that it is not worth giving it a separate name ;-).
That's probably why the Wikipedia article is only a stub.

Is it bad practice to use the system() function when library functions could be used instead? Why?

Say there is some functionality needed for an application under development which could be achieved by making a system call to either a command line program or utilizing a library. Assuming efficiency is not an issue, is it bad practice to simply make a system call to a program instead of utilizing a library? What are the disadvantages of doing this?
To make things more concrete, an example of this scenario would be an application which needs to download a file from a web server, either the cURL program or the libcURL library could be used for this.
Unless you are writing code for only one OS, there is no way of knowing if your system call will even work. What happens when there is a system update or an OS upgrade?
Never use a system call if there is a library to do the same function.
I prefer libraries because of the dependency issue, namely the executable might not be there when you call it, but the library will be (assuming external library references get taken care of when the process starts on your platform). In other words, using libraries would seem to guarantee a more stable, predictable outcome in more environments than system calls would.
There are several factors to take into account. One key one is the reliability of whether the external program will be present on all systems where your software is installed. If there is a possibility that it will be missing, then maybe it is better to do it inside your program.
Weighing against that, you might consider that the extra code loaded into your program is prohibitive - you don't need the code bloat for such a seldom-used part of your application.
The system() function is convenient, but dangerous, not least because it invokes a shell, usually. You may be better off calling the program more directly - on Unix, via the fork() and exec() system calls. [Note that a system call is very different from calling the system() function, incidentally!] OTOH, you may need to worry about ensuring all open file descriptors in your program are closed - especially if your program is some sort of daemon running on behalf of other users; that is less of a problem if you are not using special privileges, but it is still a good idea not to give the invoked program access to anything you did not intend. You may need to look at the fcntl() system call and the FD_CLOEXEC flag.
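A minimal sketch of that more direct route, assuming POSIX and a hypothetical run_curl() helper (error handling abbreviated):

/* Invoking an external program without system(): no shell is involved,
 * so no quoting or metacharacter issues arise with the argument. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int run_curl(const char *url)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                        /* fork failed */

    if (pid == 0) {
        /* Child: arguments are passed as a vector, not parsed by a shell. */
        execlp("curl", "curl", "-sO", url, (char *)NULL);
        _exit(127);                       /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)     /* parent: reap the child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

int main(void)
{
    return run_curl("https://example.com/file.txt") == 0 ? 0 : 1;
}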
Generally, it is easier to keep control of things if you build the functionality into your program, but it is not a trivial decision.
Security is one concern. A malicious cURL could cause havoc in your program. It depends if this is a personal program where coding speed is your main focus, or a commercial application where things like security play a factor.
System calls are much harder to make safely.
All sorts of funny characters need to be correctly encoded to pass arguments in, and the types of encoding may vary by platform or even version of the command. So making a system call that contains any user data at all requires a lot of sanity-checking and it's easy to make a mistake.
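A sketch of how easy that mistake is to make, with illustrative hostile input:

/* Unsanitized data interpolated into a system() command line is
 * interpreted by the shell, not treated as a plain argument. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *filename = "report.txt; echo pwned";  /* from a user */
    char cmd[256];

    snprintf(cmd, sizeof cmd, "cat %s", filename);
    /* The shell sees two commands: "cat report.txt" then "echo pwned".
     * Escaping this correctly on every platform is the hard part. */
    system(cmd);
    return 0;
}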
Yeah, as mentioned above, keep in mind the difference between system calls (like fcntl() and open()) and system() calls. :)
In the early stages of prototyping a C program, I often make external calls to programs like grep and sed for manipulation of files using popen(). It's not safe, it's not secure, and it's certainly not portable. But it can allow you to get going quickly. That's valuable to me. It lets me focus on the really important core of the program, usually the reason I used C in the first place.
In high level languages, you'd better have a pretty good reason. :)
Instead of doing either, I'd Unix it up and build a script framework around your app, using the command line arguments and stdin.
Others have mentioned good points (reliability, security, safety, portability, etc.) - but I'll throw out another: performance. Generally it is many times faster to call a library function, or even spawn a new thread, than it is to start an entire new process (and then you still have to correctly check/verify its execution and parse its output!).
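A rough sketch of that gap, timing a no-op external process against a no-op in-process call (numbers vary by system; this is illustrative, not a rigorous benchmark):

/* The volatile sink keeps the in-process loop from being optimized
 * away. POSIX assumed for /bin/true and CLOCK_MONOTONIC. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static volatile int sink;
static void noop(void) { sink++; }

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 100; i++)
        system("/bin/true");          /* fork + exec + shell every time */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("100 x system():  %.1f ms\n", elapsed_ms(t0, t1));

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 100; i++)
        noop();                       /* in-process call */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("100 x function:  %.4f ms\n", elapsed_ms(t0, t1));
    return 0;
}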