How to pickle function from imported module with dill - pickle

I'm trying to pickle functions with dill. I want to include the whole function and not just a reference to it. Here are my two files:
fun.py:
import dill
from foo import ppp
def qqq(me):
    return me + 1
print(dill.dumps(ppp, protocol=4, recurse=True, byref=True))
print(dill.dumps(qqq, protocol=4, recurse=True, byref=True))
And foo.py
def ppp(me):
    return me + 1
When I run fun.py I get the following output:
b'\x80\x04\x95\x0f\x00\x00\x00\x00\x00\x00\x00\x8c\x03foo\x94\x8c\x03ppp\x94\x93\x94.'
b'\x80\x04\x95\x90\x00\x00\x00\x00\x00\x00\x00\x8c\ndill._dill\x94\x8c\x10_create_function\x94\x93\x94(h\x00\x8c\n_load_type\x94\x93\x94\x8c\x08CodeType\x94\x85\x94R\x94(K\x01K\x00K\x01K\x02KCC\x08|\x00d\x01\x17\x00S\x00\x94NK\x01\x86\x94)\x8c\x02me\x94\x85\x94\x8c\x06fun.py\x94\x8c\x03qqq\x94K\x04C\x02\x00\x01\x94))t\x94R\x94}\x94h\rNN}\x94Nt\x94R\x94.'
I want the first line of output to look more like the second: it should encapsulate the function itself, so that it can be reloaded later without needing any context (such as foo being importable). Is there a way to do this?
Thanks so much!
James

If the module (foo) is installed on both computers, then there should be no need to do anything but import the function. So, I'll assume the question is in regard to a module that is only installed on the first machine.
It also depends on whether the module foo is "installed" on sys.path or is just available in the current directory. See:
https://github.com/uqfoundation/dill/issues/123
If it's only available in the current directory, either use dill to pickle the file itself or use something like dill.source.getsource to extract the source of the module as a string, and then transfer the string as a "pickle" (this is what ppft does).
Generally, however, dill pickles imported functions by reference, and thus assumes they are available on both sides of the load/dump.
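The "ship the source as a string" approach mentioned above can be sketched roughly as follows. This is only an illustration: the module name foo_demo, the temporary directory, and the two helper functions are made up for the demo, and the fallback to inspect.getsource is there only so the sketch still runs where dill is not installed.

```python
import importlib
import sys
import tempfile
from pathlib import Path

try:
    # What the answer suggests: dill's source-extraction helper.
    from dill.source import getsource
except ImportError:
    # Assumption: the stdlib equivalent is good enough for this demo.
    from inspect import getsource


def dump_function_source(func):
    """Return the function's source text, to be transferred as a plain string."""
    return getsource(func)


def load_function_source(source, name):
    """Rebuild the function on the receiving side by executing its source."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


# Demo: create a throwaway module that stands in for foo.py (hypothetical name).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "foo_demo.py").write_text("def ppp(me):\n    return me + 1\n")
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        foo_demo = importlib.import_module("foo_demo")
        payload = dump_function_source(foo_demo.ppp)
    finally:
        sys.path.remove(tmp)

# The receiving side needs only the string, not the module.
rebuilt = load_function_source(payload, "ppp")
print(rebuilt(1))
```

Executing received source is only appropriate when the sender is trusted, which is the same caveat that applies to loading any pickle.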

Related

How to add w:altChunk and its relationship with python-docx

I have a use case that relies on the <w:altChunk/> element in a Word document: I inject (fragments of) an HTML file as alternate chunks and let Word do its work when the file is opened. The current implementation uses XML/XSL to compose the WordML XML, modify the relationships, and do all the packaging manually, which is a real pain.
I want to move to python-docx, but the API doesn't support this directly. So far I have found a way to add the <w:altChunk/> element to the document XML, but I'm still struggling to find a way to add the relationship and the related file to the package.
I think I should make a compatible part and pass it to the document.part.relate_to function to do its job, but I still can't figure out how:
from docx import Document
from docx.oxml import OxmlElement, qn
from docx.opc.constants import RELATIONSHIP_TYPE as RT
def add_alt_chunk(doc: Document, chunk_part):
    ''' TODO: figuring how to add files and relationships'''
    r_id = doc.part.relate_to(chunk_part, RT.A_F_CHUNK)
    alt = OxmlElement('w:altChunk')
    alt.set(qn('r:id'), r_id)
    doc.element.body.sectPr.addprevious(alt)
Update:
As per scanny's advice, below is my working code. Thank you very much Steve!
from docx import Document
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
from docx.opc.part import Part
from docx.opc.constants import RELATIONSHIP_TYPE as RT
def add_alt_chunk(doc: Document, html: str):
    package = doc.part.package
    partname = package.next_partname('/word/altChunk%d.html')
    alt_part = Part(partname, 'text/html', html.encode(), package)
    r_id = doc.part.relate_to(alt_part, RT.A_F_CHUNK)
    alt_chunk = OxmlElement('w:altChunk')
    alt_chunk.set(qn('r:id'), r_id)
    doc.element.body.sectPr.addprevious(alt_chunk)
doc = Document()
doc.add_paragraph('Hello')
add_alt_chunk(doc, "<body><strong>I'm an altChunk</strong></body>")
doc.add_paragraph('Have a nice day!')
doc.save('test.docx')
Note: the altChunk parts only work/appear when the document is opened in MS Word.
Well, some hints here anyway. Maybe you can post your working code at the end as a full "answer":
The alt-chunk part needs to start its life as a docx.opc.part.Part object.
The blob argument should be the bytes of the file, which is often but not always plain text. It must be bytes though, not unicode (characters), so any encoding has to happen before calling Part().
I expect you can work out the other arguments:
package is the overall OPC package, available on document.part.package.
You can use docx.opc.package.OpcPackage.next_partname() to get an available partname based on a root template like: "altChunk%s" for a name like "altChunk3". Check what partname prefix Word uses for these, possibly with unzip -l has-an-alt-chunk.docx; should be easy to spot.
The content-type is one in docx.opc.constants.CONTENT_TYPE. Check the [Content_Types].xml part in a .docx file that has an altChunk to see what they use.
Once formed, the document_part.relate_to() method will create the proper relationship. If there is more than one relationship (not common) then you need to create each one separately. There would only be one relationship from a particular part, just some parts are related to more than one other part. Check the relationships in an existing .docx to see, but pretty good guess it's only the one in this case.
So your code would look something like:
package = document.part.package
partname = package.next_partname("altChunkySomethingPrefix")
content_type = docx.opc.constants.CONTENT_TYPE.THE_RIGHT_MIME_TYPE
blob = make_the_altChunk_file_bytes()
alt_chunk_part = Part(partname, content_type, blob, package)
rId = document.part.relate_to(alt_chunk_part, RT.A_F_CHUNK)
etc.
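The inspection steps suggested above (listing the package with unzip -l and reading [Content_Types].xml) can also be done from Python with the standard zipfile module. The sketch below is only an illustration: list_package_parts is a made-up helper, and the in-memory package, its part name, and its content type are placeholders so the example runs without a real Word file. They are not what Word actually emits.

```python
import io
import zipfile


def list_package_parts(docx_file):
    """Return (part names, [Content_Types].xml text) for a .docx package.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as package:
        names = package.namelist()
        content_types = package.read("[Content_Types].xml").decode("utf-8")
    return names, content_types


# Stand-in package built in memory (placeholder part name and content type).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr(
        "[Content_Types].xml",
        '<Types><Override PartName="/word/altChunk1.html" '
        'ContentType="text/html"/></Types>',
    )
    demo.writestr("word/altChunk1.html", "<body>hi</body>")
buffer.seek(0)

names, content_types = list_package_parts(buffer)
print(names)
print(content_types)
```

For a real document, pass the path instead, e.g. list_package_parts("has-an-alt-chunk.docx"), and look for the alt-chunk part name and its content-type entry in the output.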

How can I make scala's ammonite use scala.util instead of ammonite.util as default for util?

In the "official" scala REPL I can do
scala> import util.Random
scala> util.Random.nextInt
res0: Int = -306696783
but in Ammonite-REPL I get
@ import util.Random
cmd3.sc:1: object Random is not a member of package ammonite.util
import util.Random
^
Compilation Failed
So right now I have to use the scala. prefix to make it work in Ammonite:
@ import scala.util.Random
@ scala.util.Random.nextInt
res1: Int = 503117434
I'm kind of new to Scala, so I don't get why Ammonite would use a different util than the (for me) "official" util. I would appreciate it if anyone could provide a rationale for this.
And more specifically, is there any way to make util refer to scala.util instead of ammonite.util?
It's not that Ammonite is substituting a different util library for the normal Scala one; it's that the Ammonite namespace has its own util package, which has a whole bunch of methods specific to Ammonite. Perhaps it would be nicer if the developer had chosen a different name for his package, but this is not an issue specific to Ammonite. It is something you will run into all the time. When there is a clash of namespaces, your only option is to fully qualify the package name for the one you want. So what you actually did is a fine solution. You can find more about this here.
And btw, since there is no util.Random in the Ammonite package you can do this after your import--I tested and this is a cut and paste from my terminal:
@ Random.nextInt
res1: Int = 1045964363
When you do actually have a collision of method names, you can find a solution here

Why does a function name have to be specified in a use statement?

In perl, sometimes it is necessary to specify the function name in the use statement.
For example:
use Data::DPath ('dpath');
will work but
use Data::DPath;
won't.
Other modules don't need the function names specified, for example:
use WWW::Mechanize;
Why?
Each module chooses what functions it exports by default. Some choose to export no functions by default at all; you have to ask for them. There are a few good reasons to do this, and one bad one.
If you're a class like WWW::Mechanize, then you don't need to export any functions. Everything is a class or object method. my $mech = WWW::Mechanize->new.
If you're a pragma like strict then there are no functions nor methods, it does its work simply by being loaded.
Some modules export waaay too many functions by default. An example is Test::Deep which exports...
all any array array_each arrayelementsonly arraylength arraylengthonly bag blessed bool cmp_bag cmp_deeply cmp_methods cmp_set code eq_deeply hash
hash_each hashkeys hashkeysonly ignore Isa isa listmethods methods noclass
none noneof num obj_isa re reftype regexpmatches regexponly regexpref
regexprefonly scalarrefonly scalref set shallow str subbagof subhashof
subsetof superbagof superhashof supersetof useclass
The problem comes when another module tries to export the same functions, or if you write a function with the same name. Then they clash and you get mysterious warnings.
$ cat ~/tmp/test.plx
use Test::Deep;
use List::Util qw(all);
$ perl -w ~/tmp/test.plx
Subroutine main::all redefined at /Users/schwern/perl5/perlbrew/perls/perl-5.20.2/lib/5.20.2/Exporter.pm line 66.
at /Users/schwern/tmp/test.plx line 2.
Prototype mismatch: sub main::all: none vs (&@) at /Users/schwern/perl5/perlbrew/perls/perl-5.20.2/lib/5.20.2/Exporter.pm line 66.
at /Users/schwern/tmp/test.plx line 2.
For this reason, exporting lots of functions is discouraged. For example, the Exporter documentation advises...
Do not export method names!
Do not export anything else by default without a good reason!
Exports pollute the namespace of the module user. If you must export try to use @EXPORT_OK in preference to @EXPORT and avoid short or common symbol names to reduce the risk of name clashes.
Unfortunately, some modules take this too far. Data::DPath is a good example. It has a really clear main function, dpath(), which it should export by default. Otherwise it's basically useless.
You can always turn off exporting with use Some::Module ();.
The reason is that some modules simply contain functions in them and they may or may not have chosen to export them by default, and that means they may need to be explicitly imported by the script to access directly or use a fully qualified name to access them. For example:
# in some script
use SomeModule;
# ...
SomeModule::some_function(...);
or
use SomeModule ('some_function');
# ...
some_function(...);
This can be the case if the module was not intended to be used in an object-oriented way, i.e. where no classes have been defined and lines such as my $obj = SomeModule->new() wouldn't work.
If the module has defined content in the EXPORT_OK array, it means that the client code will only get access to it if it "asks for it", rather than "automatically" when it's actually present in the EXPORT array.
Some modules automatically export their content by means of the @EXPORT array. This question and the Exporter docs have more detail on this.
Without you actually posting an MCVE, it's difficult to know what you've done in your Funcs.pm module that may be allowing you to import everything without using EXPORT and EXPORT_OK arrays. Perhaps you did not include the package Funcs; line in your module, as @JonathanLeffler suggested in the comments. Perhaps you did something else. Perl is one of those languages where people pride themselves in the TMTOWTDI mantra, often to a detrimental/counter-productive level, IMHO.
The 2nd example you presented is very different and fairly straightforward. When you have something like:
use WWW::Mechanize;
my $mech = new WWW::Mechanize;
$mech->get("http://www.google.com");
you're simply instantiating an object of type WWW::Mechanize and calling an instance method, called get, on it. There's no need to import an object's methods because the methods are part of the object itself. Modules looking to have an OOP approach are not meant to export anything. They're different situations.

how to resolve "conflicts with" errors in d?

I'm trying to compile some D. The code that I've written uses the std.string library as well as std.algorithm. One of my functions calls indexOf on a string. Unfortunately, there's apparently also an indexOf function in std.algorithm, and the compiler doesn't like it:
assembler.d(81): Error: std.algorithm.indexOf!("a == b", string, immutable(char)).indexOf at /usr/share/dmd/src/phobos/std/algorithm.d(4431) conflicts with std.string.indexOf!(char).indexOf at /usr/share/dmd/src/phobos/std/string.d(334)
assembler.d(81): Deprecation: function std.algorithm.indexOf!("a == b", string, immutable(char)).indexOf is deprecated
How do I get around this? In C++ I could use the :: to explicitly say what namespace I'm in... what about D?
If you want to call std.string.indexOf explicitly, then do std.string.indexOf(str, c) instead of indexOf(str, c) or str.indexOf(c).
Or you can use an alias:
alias std.string.indexOf indexOf;
If you put that inside the function where you're calling indexOf, then it should then consider indexOf to be std.string.indexOf for the rest of the function. Or if you put it at the module level, then it'll affect the whole module.
However, due to a bug, UFCS (Universal Function Call Syntax) doesn't currently work with local aliases, so if you put the alias within the function, you'll have to do indexOf(str, c) instead of str.indexOf(c).
A third option is to use a selective import:
import std.string : indexOf;
With that import, only indexOf is imported from std.string, and when you use indexOf, it'll use the string version (even if you've also imported std.algorithm). You can even import std.string regularly in addition to the selective import to get the rest of std.string, and the selective import will still fix the conflict (in which case, it's really not that different from importing std.string and then aliasing indexOf). However, due to a bug, selective imports are always treated as public, so doing a selective import of indexOf in a module will affect every module that imports it (potentially causing new conflicts), so you may want to avoid it at this point.

An exported aliased symbol doesn't exist in the PDB file (RegisterClipboardFormat has the internal name RegisterWindowMessage)

I'm trying to set a breakpoint in user32!RegisterClipboardFormat
Evidently, this function is exported (link /dump /exports - it is right there). Before downloading the PDB file from the Microsoft symbol server, I'm able to find this function:
0:001> lm m user32
start end
76eb0000 76fcf000 USER32 (export symbols) c:\Windows\system32\USER32.dll
0:001> x user32!RegisterClipboardFormat*
76ec4eae USER32!RegisterClipboardFormatA (<no parameter info>)
76ec6ffa USER32!RegisterClipboardFormatW (<no parameter info>)
No problems. I'm able to 'bu' any of these functions. But when I download the PDB symbols from the Microsoft PDB server:
0:001>
start end module name
76d50000 76e6f000 USER32 (pdb symbols) c:\symbols\user32.pdb\561A146545614951BDB6282F2E3522F72\user32.pdb
0:000> x user32!RegisterClipboardFormat
WinDBG cannot find the symbols. However, it can find RegisterWindowMessage:
0:000> x user32!RegisterWindowMessage*
76d64eae USER32!RegisterWindowMessageA = <no type information>
76d66ffa USER32!RegisterWindowMessageW = <no type information>
Note that the functions have the same addresses (This is on Windows 8. Not sure about previous versions). This is probably achieved by the optimizer or in the DEF file (func1=func2 in the EXPORT section). 'link /dump /exports' shows RegisterWindowMessage and RegisterClipboardFormat have the same RVA.
Problem is that I spent way too much time on this. So my questions are:
Is there an easy way, from within WinDBG, to find missing aliased export symbols?
Say I want to break only on RegisterClipboardFormatW. If I recall correctly, there should be a JMP instruction somewhere (in the calling module import table). How do I find that symbol? Is there a way to find this entry in all calling modules?
Since RegisterWindowMessage and RegisterClipboardFormat have the same RVA, they share the same implementation. Apparently Windows does not make any distinction between the two and both clipboard format and window messages share the same domain of identifiers.
For your first question: how to find out which implementation function corresponds to an exported function (assuming you have symbols fixed up). First figure out the RVA of the export:
C:\>link /dump /exports C:\Windows\Syswow64\user32.dll |findstr RegisterClipboardFormat
2104 24F 00020AFA RegisterClipboardFormatA
2105 250 00019EBD RegisterClipboardFormatW
Then, in WinDbg, find the starting address where the DLL is loaded. The commands lm or lml list all modules; you just need to find the module you are after:
0:001> lml
start end module name
75460000 75560000 USER32
Using the RVA as an offset from the starting address, get the symbol that corresponds to it:
0:002> ln 75460000+00020AFA
(75480afa) USER32!RegisterWindowMessageA | (75480b4a) USER32!MsgWaitForMultipleObjects
Exact matches:
0:002> ln 75460000+00019EBD
(75479ebd) USER32!RegisterWindowMessageW | (75479eea) USER32!NtUserGetProcessWindowStation
Exact matches:
So here we found out that RegisterClipboardFormat actually calls into RegisterWindowMessage.
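The address arithmetic in the steps above (module start address plus export RVA) can be sketched in a few lines. This is just an illustration of the calculation: the helper names are made up, and the input rows and the base address are the ones shown in the link /dump /exports and lml output above.

```python
def parse_export_line(line):
    """Split one `link /dump /exports` row into (name, rva)."""
    ordinal, hint, rva, name = line.split()[:4]
    return name, int(rva, 16)


def loaded_address(module_base, rva):
    """Address of an export once the module is loaded at module_base."""
    return module_base + rva


module_base = 0x75460000  # USER32 start address reported by lml
export_lines = [
    "2104 24F 00020AFA RegisterClipboardFormatA",
    "2105 250 00019EBD RegisterClipboardFormatW",
]

addresses = {}
for line in export_lines:
    name, rva = parse_export_line(line)
    addresses[name] = loaded_address(module_base, rva)
    # Prints the address in the form that can be passed to `ln`.
    print(f"{name}: ln {addresses[name]:08x}")
```

The results, 75480afa and 75479ebd, are the same addresses that ln resolves to RegisterWindowMessageA and RegisterWindowMessageW in the session above.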
Your second question: how to put a breakpoint only on RegisterClipboardFormat, and not on RegisterWindowMessage. In general it is impossible, because they share the same implementation. For example, your app might call GetProcAddress("RegisterClipboardFormat"), and you will have a hard time figuring out which of the two functions it meant to call. However, if you know that the call was made through an imported function, then you can do this. All imported functions are declared in the import address table of your application. If you put an access breakpoint on the entry in the import address table, you can break before the call is made. This might be compiler specific, but I know that Visual C++ assigns symbolic names to entries in the import address table. In this case putting the breakpoint is easy:
ba r4 MyModule!_imp_RegisterClipboardFormatA