Different __getstate__/__setstate__ for multiprocessing and pickling? - pickle

I have a problem. The classes I defined are passed to multiprocessing.Pool() during the computations. Then, I want to save those objects with pickle.dump() once all computations are done.
To speed up the computations, I pre-compute some things before sending the objects to the processes. While saving to disk, these precomputations are no longer necessary, so I wanted to remove them using the __setstate__/__getstate__ mechanism, as recommended by the pickle documentation.
Q: Does the pickling done by multiprocessing use the user-defined __setstate__/__getstate__? If yes, how can I use a different __setstate__/__getstate__ for pickle and for multiprocessing?
Indeed, my pre-computations are thrown away and re-done by each spawned process.
EDIT: I found this: multiprocessing ignores "__setstate__", which treats a similar problem. The first answer shows differences between Windows and Linux behaviour (which is problematic for me...). But it is 7 years old. What is the current state of the implementation on this matter?
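One point worth noting: both the final pickle.dump() and multiprocessing's argument pickling run in the parent process, so a flag flipped on the class just before saving to disk can distinguish the two cases. The sketch below is only an illustration of that idea, not a library feature; the Worker class, its _strip_cache flag, and _precompute() are all invented names.

```python
import pickle

class Worker:
    """Toy object with an expensive precomputation: kept when pickling
    for multiprocessing, dropped when saving to disk."""

    _strip_cache = False  # hypothetical flag: set True just before pickle.dump()

    def __init__(self, data):
        self.data = data
        self.cache = self._precompute()

    def _precompute(self):
        # Stand-in for the real expensive pre-computation.
        return [x * x for x in self.data]

    def __getstate__(self):
        state = self.__dict__.copy()
        if Worker._strip_cache:      # saving to disk: drop the cache
            state["cache"] = None
        return state                 # sending to a worker: keep it

    def __setstate__(self, state):
        self.__dict__.update(state)
        if self.cache is None:       # recompute only if it was stripped
            self.cache = self._precompute()

w = Worker([1, 2, 3])

# multiprocessing-style round trip: the cache travels along, no recompute.
w2 = pickle.loads(pickle.dumps(w))

# disk save: strip the cache first, restore the flag afterwards.
Worker._strip_cache = True
blob = pickle.dumps(w)
Worker._strip_cache = False
w3 = pickle.loads(blob)   # cache is recomputed on load
```

The flag never travels to the child processes, because only the object's __dict__ is pickled; the class attribute stays at its default in each child.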

Related

The use of packages to parse command arguments employing options/switches?

I have a couple of questions about adding options/switches (with and without parameters) to procedures/commands. I see that tcllib has cmdline, and Ashok Nadkarni's book on Tcl recommends the parse_args package, stating that handling the arguments in Tcl itself is much slower than this package's C implementation. The Nov. 2016 paper on parse_args states that Tcl script methods are, or can be, 50 times slower.
Are Tcl methods really significantly slower? Is there some minimum threshold number of options to be reached before using a package?
Is there any reason to use parse_args (not in tcllib) over cmdline (in tcllib)?
Can both be easily included in a starkit?
Is this included in 8.7a now? (I'd like to use 8.7a but I'm using Manjaro Linux and am afraid that adding it outside the package manager will cause issues that I won't know how to resolve or even just "undo").
Thank you for considering my questions.
Are Tcl methods really significantly slower? Is there some minimum threshold number of options to be reached before using a package?
Potentially. Procedures have overhead to do with managing the stack frame and so on, and code implemented in C can avoid a number of overheads due to the way values are managed in current Tcl implementations. The difference is much more profound for numeric code than for string-based code, as the cost of boxing and unboxing numeric values is quite significant (strings are always boxed in all languages).
As for which one to use, it really depends on the details, as you are trading off flexibility for speed. I've never known it to be a problem for command-line parsing.
(If you ask me, fifty options isn't really that many, except that it's quite a lot to pass on an actual command line. It might be easier to design a configuration file format — perhaps a simple Tcl script! — and then to just pass the name of that in as the actual argument.)
Is there any reason to use parse_args (not in tcllib) over cmdline (in tcllib)?
Performance? Details of how you describe things to the parser?
Can both be easily included in a starkit?
As long as any C code is built with Tcl stubs enabled (typically not much more than define USE_TCL_STUBS and link against the stub library) then it can go in a starkit as a loadable library. Using the stubbed build means that the compiled code doesn't assume exactly which version of the Tcl library is present or what its path is; those are assumptions that are usually wrong with a starkit.
Tcl-implemented packages can always go in a starkit. Hybrid packages need a little care for their C parts, but are otherwise pretty easy.
Many packages either always build in stubbed mode or have a build configuration option to do so.
Is this included in 8.7a now? (I'd like to use 8.7a but I'm using Manjaro Linux and am afraid that adding it outside the package manager will cause issues that I won't know how to resolve or even just "undo").
We think we're about a month from the feature freeze for 8.7, and builds seem stable in automated testing so the beta phase will probably be fairly short. The list of what's in can be found here (filter for 8.7 and Final). However, bear in mind that we tend to feel that if code can be done in an extension then there's usually no desperate need for it to be in Tcl itself.

How could I go about loading functions from NTDLL without linking against it or any other DLLs?

I've been experimenting with loading functions from the Windows system DLLs using only the loader functions exported by NTDLL. This works as expected. For the sake of curiosity and getting an even better understanding of the process structure in NT-based systems, I've started trying to load functions from NTDLL by doing the following steps:
1. Load the PEB of the process from gs:[60h]
2. Iterate over the modules loaded into the process according to the loader to find NTDLL's base address
3. Parse the PE headers of NTDLL
4. Try to parse the export table to find LdrLoadDll, LdrGetDllHandle, and LdrGetProcedureAddress
This fails at step 4. After stepping through it in a debugger (both VS2019 and WinDbg Preview), it seems as though the offsets I've tried yield an invalid structure that leads to an access violation when my code compares the current function name to one of the ones I'm searching for. My code is being compiled and run on a 64-bit copy of Windows 10 Pro build 21364. Note that I'm using my own header that contains definitions for the structures used for this (these definitions are from winnt.h and here) because the Windows headers don't really play nice with the rest of my code. The function trying to do this is here. For the record, this is part of an attempt to implement my own libc (again, for the sake of curiosity). The code that calls the functions is here. Any help with this is tremendously appreciated.
Nevermind, turns out I had outdated verbose definitions of the structures I was using. I found better (more up-to-date) definitions at https://vergiliusproject.com.

How can I check for downstream components in an SSIS custom transform?

I am working on a custom SSIS component that has 4 asynchronous outputs. It works just fine, but now I have a user request for an enhancement and I am not sure how to handle it. They want to use the component in another context where only 2 of the 4 outputs will be well defined. I foolishly said that this would be trivial for me to support; I planned to just check whether the two "undefined" streams were even connected, and if not, skip that part of the processing.
My problem is that I cannot figure out whether an output is connected at run time. I had hoped that the output pipeline or output buffer would be missing, but it doesn't look like that is the case; even when they are not hooked up, the output and buffer are present.
Does anyone know where I should be looking to see if an output has a downstream consumer or not?
Thanks!
Edit: I was never able to figure out how to do this reliably, so I ended up making this behaviour configurable by the user. It is not automatic like I would have hoped but the difference I found between the BIDS environment and the DTExec environment pushed me to the conclusion that a component probably should not be making assumptions about the component graph it is embedded in.

Managing different project configurations in Dynamics AX

We have trouble keeping our different configurations apart. Allow me to explain with a little scenario:
Let's say you have two AX projects, e.g. P and M, that both alter the ProdRouteJob table, calling methods in one of their own project specific classes.
You have all these classes on your developer machine, of course, and ProdRouteJob compiles fine, but for an installation on a new server, you don't want to add stub classes for every non-installed project, do you? So you wrap these calls to project classes in something like
if( Global::isConfigurationkeyEnabled( <projectPConfKey> ) )
// call project P stuff
to encapsulate them cleanly. You declare configuration keys for all your projects, and you're ready to go, right?
However, this throws an error if you haven't installed project P on this machine, because its projectPConfKey is unknown. Now you could at every installation install configuration keys for all your projects, just to tell the server that there is such a thing as a projectPConfKey, but then all these ifs would evaluate to true...
In short: how do you include configuration keys in your project XPOs so that your code compiles, but at the same time so that some configuration keys are disabled from the start?
Or is there something totally basic that I'm missing here?
EDIT:
Consensus among the answers (thank you, demas; thank you, Mr. Kjeldsen) is that it's impractical to attempt a more or less automatic client-side configuration via macros or configuration keys. Thus, when we are installing at the client, we will have to import the standard tables with our changes and manually take out all changes not pertinent to the current installation.
I am a bit at a loss which answer to accept, because demas answered the question I asked, while Mr. Kjeldsen answered the questions that arose in the comment conversation under demas' answer. I shall, with apologies to Mr. Kjeldsen, mark demas' answer as accepted.
if( Global::isConfigurationkeyEnabled( <projectPConfKey> ) )
This check only works at execution time; when you compile the code, you still get the error.
So you need to create stub classes, or compare objects when you import them on the production server and import only the changes from project M.
There is another approach, but I don't have Axapta at hand to check it. Try:
#if.never
someMissingCall();
#endif
But I'm not sure that it still works.
Big solution centers (VAR) typically have separate applications for each module.
If you have 2 modules, you have 2 applications.
Good modules have very small collision areas to the standard application, these are kept in a separate project.
Collisions with the standard application and other modules are then handled manually at install time. As the collision area is small and rarely changes (it should not contain business logic), the problem is minimal.

Best practices for avoiding hardcoded values IRL

In theory, source code should not contain hardcoded values beyond 0, 1 and the empty string. In practice, I find it very hard to avoid all hardcoded values while being on very tight delivery times, so I end up using a few of them and feeling a little guilty.
How do you reconcile avoiding hardcoded values with tight delivery times?
To avoid hard-coding you should
use configuration files (put your values in XML or ini-like text files).
use database table(s) to store your values.
Of course not all values qualify to be moved to a config file. For those you should use constructs provided by the programming language such as (constants, enums, etc).
I just saw an answer suggesting a "Constant Interface". With all due respect to the poster and the voters, this is not recommended. You can read more about that at:
http://en.wikipedia.org/wiki/Constant_interface
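The "constructs provided by the programming language" point can be illustrated with a short Python sketch (the names and values here are made up for the example):

```python
from enum import Enum

# A named module-level constant instead of a scattered literal.
MAX_LOGIN_ATTEMPTS = 3

class Status(Enum):
    """Enumerated values instead of magic strings."""
    ACTIVE = "active"
    SUSPENDED = "suspended"

def can_log_in(status, attempts):
    # The rule reads as policy, not as a pile of literals.
    return status is Status.ACTIVE and attempts < MAX_LOGIN_ATTEMPTS
```

If MAX_LOGIN_ATTEMPTS ever needs to change at deployment time rather than compile time, it is also a natural candidate to move into a config file.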
The assumption behind the question seems to me invalid.
For most software, configuration files are massively more difficult to change than source code. For widely installed software, this could easily be a factor of a million times more difficult: there could easily be that many files hanging around on user installations over which you have little knowledge and no control.
Having numeric literals in the software is no different from having functional or algorithmic literals: it's just source code. It is the responsibility of any software that intends to be useful to get those values right.
Failing that make them at least maintainable: well named and organised.
Making them configurable is the kind of last-ditch compromise you might be forced into if you are on a tight schedule.
This comes with a little bit of planning; in most cases it is as simple as having a configuration file, or possibly a database table that stores critical configuration items. I don't find that there is any reason that you "have" to have hard-coded values, and offloading them to a configuration mechanism shouldn't take so much additional time that tight timelines would be a valid excuse.
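A minimal sketch of the configuration-file approach, using Python's standard configparser (the file contents and key names are invented for illustration; in practice the text would live in a settings.ini on disk):

```python
import configparser

# Stand-in for the contents of a settings.ini file.
INI_TEXT = """
[limits]
max_retries = 5
timeout_seconds = 30
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# Typed accessors keep the rest of the code free of raw strings.
MAX_RETRIES = config.getint("limits", "max_retries")
TIMEOUT_SECONDS = config.getint("limits", "timeout_seconds")
```

With a real file you would call config.read("settings.ini") instead; changing a limit then means editing the file, not redeploying code.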
The problem with hardcoded values is that sometimes it's not obvious that particular code relies on them. For example, in Java it is possible to move all constants into a separate interface and group related constants into inner sub-interfaces. It's quite convenient and obvious. It's also easy to find a constant's usages just by using IDE facilities ("find usages") and to change or refactor them.
Here's an example:
public interface IConstants {
    public interface URL {
        String ALL = "/**";
    }
    public interface REST_URL {
        String DEBUG = "/debug";
        String SYSTEM = "/system";
        String GENERATE = "/generate";
    }
}
Referencing is quite human readable: IConstants.REST_URL.SYSTEM
Most non-trivial enterprise-y projects will have some central concept of properties or configuration options, which already takes care of loading options from a file/database. In these cases, it's usually simple (as in, less than 5 minutes' work) to extend this to support the new propert(ies) you need.
If your project doesn't have one, then either:
It could benefit from one - in which case write it, taking values from a flat .properties file to start with. This shouldn't take more than an hour, tops, and is reusable for any config stuff in the future.
Even that hour would be a waste - in which case you can still have a default value but allow it to be overridden by a system property. This requires no infrastructure work and minimal time to implement in your class.
There's really no excuse for hardcoding values - it only saves you a few minutes at most, and if your project deadline is measured in minutes then you've got bigger problems than how to code for configurability.
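The "default value overridden by a system property" idea translates to Python roughly as a hard default plus an environment-variable override (the variable name here is hypothetical):

```python
import os

# Hard default in code, overridable at deployment time via an
# environment variable, with no configuration infrastructure needed.
TIMEOUT_SECONDS = int(os.environ.get("MYAPP_TIMEOUT_SECONDS", "30"))
```

Running with MYAPP_TIMEOUT_SECONDS=60 in the environment changes the value without touching the source.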
Admittedly, I hardcode a lot of stuff in my current hobby project. Configuration files are ridiculously easy to use instead (at least with Python, which comes with a great and simple .cfg parser); I just don't bother to use them because I am 99% confident that I will never have to change those values, and even if that assumption proved false, the project is small enough to refactor with reasonable effort. For anything larger/more important, however, I would never type if foo == "hardcoded bar", but rather if foo == cfg.bar (likely with a more meaningful name than cfg). cfg is a global singleton (yeah, I know...) which is fed the .cfg file at startup, and the next time some sentinel value changes, you change the configuration file and not the source.
With a dynamic/reflective language, you don't even need to change the part loading the .cfg when you add another value to it - make it populate the cfg object dynamically with all entries in the file (or use a hashmap, for that matter) and be done.
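A sketch of that dynamically populated cfg object, where every key in the file becomes an attribute so new entries need no loader changes (the class, section, and key names are illustrative):

```python
import configparser

class Cfg:
    """Turns every key in a config file into an attribute of this object."""

    def __init__(self, text):
        parser = configparser.ConfigParser()
        parser.read_string(text)
        for section in parser.sections():
            for key, value in parser[section].items():
                setattr(self, key, value)

# Stand-in for reading the real .cfg file at startup.
cfg = Cfg("[main]\nbar = hardcoded bar\n")

foo = "hardcoded bar"
assert foo == cfg.bar   # instead of: foo == "hardcoded bar"
```

Adding a new key to the file immediately makes it available as cfg.<key>, with no change to the loading code.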
Two suggestions here:
First, if you are working on an embedded system using a language like C, simply work out a coding convention to use a #define for any string or constant. All the #defines should be categorized in a .h file. That should be enough: not too complex, but enough for maintainability. You don't need to mangle all the constants in between the code lines.
Second, if you are working on an application with access to a database, it is simple to just keep all the constant values in the database. You just need a very simple interface API to do the retrieval.
With simple tricks, both methods can be extended to support multi-language feature.
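The second suggestion (constants in a database behind a tiny retrieval API) might look like the following sketch, using an in-memory SQLite table with invented names; keying values per language also shows how the multi-language extension falls out naturally:

```python
import sqlite3

# In-memory stand-in for the application's settings table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO settings VALUES (?, ?)",
                 [("greeting.en", "Hello"), ("greeting.fr", "Bonjour")])

def get_setting(name):
    """The 'very simple interface API to do the retrieval'."""
    row = conn.execute(
        "SELECT value FROM settings WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None
```

get_setting("greeting.fr") fetches the French variant; switching languages is just a matter of which keys the application asks for.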