How to get voidptr out of capsule using python cffi? - python-c-extension

Is there any way to use cffi to extract the contents of a capsule and convert it into a voidptr which I can send into C code?
Background info -- numpy arrays can give you a capsule containing a very handy struct, namely the PyArrayInterface. I don't think capsules exist for PyPy yet, so the answer is probably no, but I believe that the future contains capsules for all python versions, so I'm hoping the answer is yes :).

I don't think so. Capsules are a way for some CPython C extension modules to pass around pointers, typically between two different C extension modules. If you replace one of these modules with a CFFI version, you lose: there is no official way to get the "void *" value from Python, with or without CFFI. It looks like it would be a valid enhancement. Feel free to open a feature request here:
https://bitbucket.org/cffi/cffi/issues?status=new&status=open
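As an unofficial workaround, and on CPython only, the pointer can be read by calling the C API through ctypes. The sketch below is an illustration under assumptions, not something CFFI itself provides: the helper name `capsule_address` is made up here, and the demo capsule wraps a ctypes buffer rather than a real NumPy array.

```python
import ctypes

# CPython C API functions, reached through ctypes (CPython-specific).
_get_pointer = ctypes.pythonapi.PyCapsule_GetPointer
_get_pointer.restype = ctypes.c_void_p
_get_pointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

_get_name = ctypes.pythonapi.PyCapsule_GetName
_get_name.restype = ctypes.c_char_p
_get_name.argtypes = [ctypes.py_object]


def capsule_address(capsule):
    """Return the raw address stored in a PyCapsule as a Python int.

    Hypothetical helper: the name must match the capsule's own name,
    so it is looked up first (it is None for unnamed capsules).
    """
    name = _get_name(capsule)
    return _get_pointer(capsule, name)


# Demo only: build an unnamed capsule around a ctypes buffer.
_new_capsule = ctypes.pythonapi.PyCapsule_New
_new_capsule.restype = ctypes.py_object
_new_capsule.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

buf = ctypes.create_string_buffer(b"payload")   # must outlive the capsule
demo_capsule = _new_capsule(ctypes.addressof(buf), None, None)
address = capsule_address(demo_capsule)
```

With CFFI, the resulting integer could then be turned into a pointer with `ffi.cast("void *", address)`; for a NumPy array on CPython the capsule would come from `arr.__array_struct__`. Because this relies on `ctypes.pythonapi`, treat it as CPython-specific.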

Related

What's the difference using extension type (cdef) and set up pure python code?

Note that I'm new to the C language. According to Basic Tutorial of Cython, I believe there are two ways of using Cython: Building the extension of pure Python code, and using Ctype variable (cdef).
What I don't understand is the difference between them. Which one is the more efficient or proper way to use Cython?
It's mostly historical.
Originally Cython only supported cdef declarations.
Pure Python mode was added as a way of adding declarations to a file to help speed it up while not requiring Cython.
Python added type annotations. Cython can increasingly use these (with the annotation_typing directive, which defaults to true). If you like the syntax of these better than cdef then use them. Or not.
The cdef version is slightly better tested and there are still gaps in what you can do in "pure Python" mode, especially with respect to interfacing with native C/C++. But mostly they are different ways to achieve the same thing and they should generate largely the same code, so you should use whichever you prefer. You can also use a mixture.
Most Python code can be "cythonized" directly, with no changes. Nevertheless, to get the best out of Cython, you need to adapt your Python code by adding cdef declarations and types for your variables. This is not mandatory, but it is essential to get the speed-up you expect from Cython.

Parsing HTML with OCaml

I'm looking for a library to parse HTML files in OCaml.
Basically the equivalent of Jsoup/Beautiful Soup.
The main requirement is being able to query the DOM with CSS selectors.
Something in the form of
page.fetch("http://www.url.com")
page.find("#tag")
I had a need for something like this recently, so after seeing this question and reading the recommendations in the comments, I wrote a library "Lambda Soup" over the weekend for fun.
You will want to use a library like ocurl or Cohttp to retrieve the actual HTML. After you have it, you can do
html |> parse $ "#tag"
to do what is asked in the question. For other possibilities and the full signature, see the documentation. You may want to look at the documentation postprocessor or tests for a fairly thorough demonstration of usage and capabilities, including CSS support and extensions.
Lambda Soup uses Markup.ml for HTML parsing (it originally used Ocamlnet's HTML parser, as suggested in the comments). Otherwise, it has no dependencies, except OUnit if you wish to run the tests. I'm happy for any feedback, including about modifying the interface (it is at an early stage) or discussions of adding an HTTP downloader to the library (which seems iffy because it greatly alters the scope of the library as it now is, but I am happy to hear arguments).
The license is BSD.

Converting JSON string to an array in ColdFusion MX7

I have a cookie value like:
"[{"index":"1","name":"TimePeriod","hidden":false},{"index":"2","name":"Enquiries","hidden":false},{"index":"3","name":"Online","hidden":false}]"
I would like to use this cookie value as an array in ColdFusion. What would be the best possible way to do this?
The normal answer would be use the built-in deserializeJson function, but since that function wasn't available in CFMX7 (it arrived in CF8), you will need to use a UDF to achieve the same thing.
There are two sites which contain resources of this type, cflib.org and riaforge.org, each of which has a different potential solution for MX7.
Searching CFlib provides JsonDecode. (CFLib has a specific filter for "Maximum Required CF Version", so you can ensure any results that appear will work for your version.)
Searching riaforge provides JSONUtil, which runs on MX7 (but also claims better type mapping than the newer built-in functions).
Since MX7 runs on Java, you can likely also make use of any of the numerous Java libraries listed on json.org, using createObject/java.
JSON serialization was added natively in CF8.
If you are on MX7 look on riaforge.org for a library that will deSerialize JSON for you.

Create object hierarchy from Make output?

make -d and make -p provide useful information, but I need this in JSON format, so I can enumerate what libraries came from which source files, recursively. Is there a way to do this already (approximately close, anyhow)? Or is there a custom tool available? I've scoured the Intarwebs, and my search has come up dry. Thank you for any help!
Note: I'm looking for something that's similar to sysconfig.parse_makefile. In fact, what that does is pretty close to what I'm looking for, except that it's only useful for the implicit Makefile that is used to build Python. Any pointers?
It's not JSON, but the Perl CPAN module Makefile::GraphViz creates visualizations of the dependency graph from a makefile. If JSON is really what you want, you could probably capture the 'dot' dependency file that is generated and convert it to JSON fairly easily.
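The dot-to-JSON step could look roughly like the sketch below. It assumes the dot file lists one edge per line in the form `"a" -> "b";` with node names that contain no spaces, which may not match what Makefile::GraphViz actually emits. The function names `dot_to_edges` and `subtree` are made up for illustration, and whether the tail of an edge is the target or the prerequisite depends on how the graph was generated.

```python
import json
import re
from collections import defaultdict

# Matches one edge line such as:  "libfoo.a" -> "foo.o";   (quotes optional)
EDGE = re.compile(r'^\s*"?([^"\s]+)"?\s*->\s*"?([^"\s;\[]+)"?')


def dot_to_edges(dot_text):
    """Map each edge's tail node to the list of head nodes it points at."""
    edges = defaultdict(list)
    for line in dot_text.splitlines():
        match = EDGE.match(line)
        if match:
            tail, head = match.groups()
            edges[tail].append(head)
    return dict(edges)


def subtree(node, edges):
    """Recursively nest everything reachable from node (assumes no cycles)."""
    return {child: subtree(child, edges) for child in edges.get(node, [])}


# Hand-written sample input, not real tool output.
sample = '''
digraph deps {
    "libfoo.a" -> "foo.o";
    "libfoo.a" -> "bar.o";
    "foo.o" -> "foo.c";
}
'''

edges = dot_to_edges(sample)
print(json.dumps(subtree("libfoo.a", edges), indent=2))
```

The nested dictionary gives the recursive "what came from what" view asked for in the question; the flat `edges` mapping can be dumped the same way if only direct dependencies are needed.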

Why don't I see pipe operators in most high-level languages?

In Unix shell programming the pipe operator is an extremely powerful tool. With a small set of core utilities, a systems language (like C) and a scripting language (like Python) you can construct extremely compact and powerful shell scripts, that are automatically parallelized by the operating system.
Obviously this is a very powerful programming paradigm, but I haven't seen pipes as first class abstractions in any language other than a shell script. The code needed to replicate the functionality of scripts using pipes seems to always be quite complex.
So my question is why don't I see something similar to Unix pipes in modern high-level languages like C#, Java, etc.? Are there languages (other than shell scripts) which do support first class pipes? Isn't it a convenient and safe way to express concurrent algorithms?
Just in case someone brings it up, I looked at the F# pipe-forward operator (forward pipe operator), and it looks more like a function application operator. It applies a function to data, rather than connecting two streams together, as far as I can tell, but I am open to corrections.
Postscript: While doing some research on implementing coroutines, I realize that there are certain parallels. In a blog post Martin Wolf describes a similar problem to mine but in terms of coroutines instead of pipes.
Haha! Thanks to my Google-fu, I have found an SO answer that may interest you. Basically, the answer is going against the "don't overload operators unless you really have to" argument by overloading the bitwise-OR operator to provide shell-like piping, resulting in Python code like this:
for i in xrange(2,100) | sieve(2) | sieve(3) | sieve(5) | sieve(7):
    print i
What it does, conceptually, is pipe the list of numbers from 2 to 99 (xrange(2, 100)) through a sieve function that removes multiples of a given number (first 2, then 3, then 5, then 7). This is the start of a prime-number generator, though generating prime numbers this way is a rather bad idea. But we can do more:
for i in xrange(2,100) | strify() | startswith(5):
    print i
This generates the range, then converts all of them from numbers to strings, and then filters out anything that doesn't start with 5.
The post shows a basic parent class that allows you to override two methods, map and filter, to describe the behavior of your pipe. So strify() uses the map method to convert everything to a string, while sieve() uses the filter method to weed out things that are multiples of the number.
It's quite clever, though perhaps that means it's not very Pythonic, but it demonstrates what you are after and a technique to get it that can probably be applied easily to other languages.
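For reference, here is a minimal Python 3 sketch of how such a parent class might look. It is a reconstruction from the description above, not the code from the linked post: the class name `Pipe` and the decision to let sieve(n) keep n itself are assumptions, and `range` replaces Python 2's `xrange`.

```python
class Pipe:
    """Base class for a pipeline stage; subclasses override map and/or filter."""

    def map(self, value):
        return value      # default: pass the value through unchanged

    def filter(self, value):
        return True       # default: keep every value

    def __ror__(self, source):
        # Python calls this for `source | stage` when `source` itself does
        # not implement `|`. Returning a generator keeps the stages lazy.
        return (self.map(v) for v in source if self.filter(v))


class sieve(Pipe):
    """Drop multiples of n (assumption: n itself is kept)."""

    def __init__(self, n):
        self.n = n

    def filter(self, value):
        return value == self.n or value % self.n != 0


class strify(Pipe):
    """Convert every value to a string."""

    def map(self, value):
        return str(value)


class startswith(Pipe):
    """Keep only strings that start with the given prefix."""

    def __init__(self, prefix):
        self.prefix = str(prefix)

    def filter(self, value):
        return value.startswith(self.prefix)


primes = list(range(2, 100) | sieve(2) | sieve(3) | sieve(5) | sieve(7))
fives = list(range(2, 100) | strify() | startswith(5))
```

Each stage only sees the iterable to its left, so this is lazy function chaining within one thread rather than the concurrent processes a shell pipe gives you.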
You can do pipelining type parallelism quite easily in Erlang. Below is a shameless copy/paste from my blogpost of Jan 2008.
Also, Glasgow Parallel Haskell allows for parallel function composition, which amounts to the same thing, giving you implicit parallelisation.
You already think in terms of pipelines - how about "gzcat foo.tar.gz | tar xf -"? You may not have known it, but the shell is running the unzip and untar in parallel - the stdin read in tar just blocks until data is sent to stdout by gzcat.
Well, a lot of tasks can be expressed in terms of pipelines, and if you can do that then getting some level of parallelisation is simple with David King's helper code (even across Erlang nodes, i.e. machines):
pipeline:run([pipeline:generator(BigList),
              {filter,fun some_filter/1},
              {map,fun some_map/1},
              {generic,fun some_complex_function/2},
              fun some_more_complicated_function/1,
              fun pipeline:collect/1]).
So basically what he's doing here is making a list of the steps - each step being implemented in a fun that accepts as input whatever the previous step outputs (the funs can even be defined inline of course). Go check out David's blog entry for the code and more detailed explanation.
The magrittr package provides something similar to F#'s pipe-forward operator in R:
rnorm(100) %>% abs %>% mean
Combined with the dplyr package, it makes a neat data manipulation tool:
iris %>%
  filter(Species == "virginica") %>%
  select(-Species) %>%
  colMeans
You can find something like pipes in C# and Java, for example, where you take a connection stream and put it inside the constructor of another connection stream.
So, you have in Java:
new BufferedReader(new InputStreamReader(System.in));
You may want to look up chaining input streams or output streams.
Thanks to all of the great answers and comments, here is a summary of what I learned:
It turns out that there is an entire paradigm related to what I am interested in, called flow-based programming. A good example of a language designed specifically for flow-based programming is Hartmann pipelines. Hartmann pipelines generalize the idea of streams and pipes used in Unix and other OSes to allow for multiple input and output streams (rather than just a single input stream and two output streams). Erlang contains powerful abstractions that make it easy to express concurrent processes in a manner which resembles pipes. Java provides PipedInputStream and PipedOutputStream, which can be used with threads to achieve the same kind of abstractions in a more verbose manner.
I think the most fundamental reason is that C# and Java tend to be used to build more monolithic systems. Culturally, it's just not common to even want to do pipe-like things -- you just make your application implement the necessary functionality. The notion of building a multitude of simple tools and then gluing them together in arbitrary ways just isn't common in those contexts.
If you look at some of the scripting languages, like Python and Ruby, there are some pretty good tools for doing pipe-like things from within those scripts. Check out the Python subprocess module, for example, which allows you to do things like:
proc = subprocess.Popen('cat -',
                        shell=True,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,)
stdout_value = proc.communicate('through stdin to stdout')[0]
print '\tpass through:', stdout_value
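The snippet above sends data through a single child process. To get closer to a shell-style `producer | consumer`, two processes can be connected directly. The following is a Python 3 sketch; using `sys.executable` with one-liners as the two commands is an assumption made only so the example does not depend on particular shell utilities.

```python
import subprocess
import sys

# Both "commands" are the current Python interpreter running a one-liner.
producer_cmd = [sys.executable, "-c", "print('through a pipe')"]
consumer_cmd = [sys.executable, "-c",
                "import sys; sys.stdout.write(sys.stdin.read().upper())"]

# The producer's stdout becomes the consumer's stdin, like `producer | consumer`.
producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout,
                            stdout=subprocess.PIPE)

# Close our copy of the pipe so the producer is notified if the consumer
# exits early, then collect the consumer's output.
producer.stdout.close()
output = consumer.communicate()[0].decode()
producer.wait()
print(output)
```

Both children run concurrently, and the operating system handles the buffering between them, which is the property the question is asking about.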
Are you looking at the F# |> operator? I think you actually want the >> operator.
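The difference between the two operators can be sketched in Python. The helpers `pipe` and `compose` below are hypothetical stand-ins for F#'s |> and >>, not part of any library: the first applies functions to a value immediately, the second builds a new function and needs no value yet.

```python
from functools import reduce


def pipe(value, *functions):
    """Roughly `value |> f |> g`: apply the functions left to right, now."""
    return reduce(lambda acc, f: f(acc), functions, value)


def compose(*functions):
    """Roughly `f >> g`: return a new function; nothing runs until it is called."""
    return lambda value: pipe(value, *functions)


def double(x):
    return x * 2


def increment(x):
    return x + 1


applied = pipe(5, double, increment)            # evaluated immediately
double_then_increment = compose(double, increment)
composed = double_then_increment(5)             # evaluated when called
```

Neither form connects two concurrently running streams, which is the point made in the question about the forward pipe operator.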
Usually you just don't need it and programs run faster without it.
Basically piping is consumer/producer pattern. And it's not that hard to write those consumers and producers because they don't share much data.
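As a rough illustration of that producer/consumer view, here is a sketch using threads and bounded queues from the Python standard library. The names `run_pipeline`, `stage` and `DONE` are invented for this example, and the queue size of 8 is arbitrary.

```python
import queue
import threading

DONE = object()   # sentinel marking the end of the stream


def producer(out_q, items):
    """Feed the items into the first queue, then signal the end."""
    for item in items:
        out_q.put(item)
    out_q.put(DONE)


def stage(in_q, out_q, func):
    """Consume from in_q, apply func, and produce to out_q."""
    while True:
        item = in_q.get()
        if item is DONE:
            out_q.put(DONE)   # pass the end marker downstream
            break
        out_q.put(func(item))


def run_pipeline(items, *funcs):
    """Run each function in its own thread, connected by bounded queues."""
    queues = [queue.Queue(maxsize=8) for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=producer, args=(queues[0], items))]
    for i, func in enumerate(funcs):
        threads.append(threading.Thread(
            target=stage, args=(queues[i], queues[i + 1], func)))
    for t in threads:
        t.start()

    # The calling thread acts as the final consumer.
    results = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


squares_plus_one = run_pipeline(range(5), lambda x: x * x, lambda x: x + 1)
```

The bounded queues play the role of the pipe buffer: a fast producer blocks on `put` until the next stage catches up, and the stages share nothing except the queues between them.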
Piping for Python : pypes
Mozart-OZ can do pipes using ports and threads.
Objective-C has the NSPipe class. I use it quite frequently.
I've had a lot of fun building pipeline functions in Python. I wrote a library for this; its contents and a sample run are here. The best fit for me has been XML processing, described in this Wikipedia article.
You can do pipe like operations in Java by chaining/filtering/transforming iterators.
You can use Google's Guava Iterators.
I will say that even with the very helpful Guava library and static imports, it still ends up being a lot of Java code.
In Scala it's quite easy to make your own pipe operator.
Streaming libraries based on coroutines have existed in Haskell for quite some time now. Two popular examples are conduit and pipes.
Both libraries are well-written and well-documented, and are relatively mature. The Yesod web framework is based on conduit, and it's pretty damn fast. Yesod is competitive with Node on performance, even beating it in a few places.
Interestingly, all of these libraries are single-threaded by default. This is because the single motivating use case for pipelines is servers, which are I/O bound.
Since R has now added a native pipe operator, it's worth mentioning that Julia has had one all along:
help?> |>
search: |>
|>(x, f)
Applies a function to the preceding argument. This allows for easy function chaining.
Examples
≡≡≡≡≡≡≡≡≡≡
julia> [1:5;] |> x->x.^2 |> sum |> inv
0.01818181818181818
if you're still interested in an answer...
you can look at factor, or the older joy and forth for the concatenative paradigm.
in arguments and out arguments are implicit, dumped to a stack. then the next word (function) takes that data and does something with it.
the syntax is postfix.
"123" print
where print takes one argument, whatever is in the stack.
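The implicit-stack idea can be sketched with a toy evaluator in Python. This is a made-up mini-language for illustration only, not Factor, Joy or Forth syntax; the words `dup`, `+` and `*` and the function name `run` are assumptions.

```python
def run(program, words):
    """Evaluate a postfix program: numbers are pushed, words act on the stack."""
    stack = []
    for token in program.split():
        if token in words:
            words[token](stack)        # a word takes its inputs from the stack
        else:
            stack.append(int(token))   # anything else is treated as an integer
    return stack


# Each word pops its arguments and pushes its result; nothing is named.
words = {
    "dup": lambda s: s.append(s[-1]),
    "+": lambda s: s.append(s.pop() + s.pop()),
    "*": lambda s: s.append(s.pop() * s.pop()),
}

result = run("3 dup * 4 +", words)   # 3 3 * leaves 9, then 9 4 + leaves 13
```

Reading the program left to right, each word consumes what the previous ones left behind, which is why concatenating programs behaves like chaining stages in a pipeline.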
You can use my library in python: github.com/sspipe/sspipe
In Mathematica, you can use //
for example
f[g[h[x,parm1],parm2]]
quite a mess.
could be written as
x // h[#, parm1]& // g[#, parm2]& // f
Here # and & together form an anonymous function (a lambda) in Mathematica: # is the argument and & marks the end of the function body.
In JavaScript, there is a proposal to add a pipe operator |> to the language:
https://github.com/tc39/proposal-pipeline-operator