How to use JRuby's org.jruby.lexer.yacc.RubyYaccLexer - jruby

I'm using Ripper to do Ruby-code lexing in MRI 1.9.*, and I would like to do the same thing in JRuby. I noticed there is org.jruby.lexer.yacc.RubyYaccLexer, used in org.jruby.parser.DefaultRubyParser, and I'm thinking I can use it to do what Ripper does in MRI 1.9.*, though definitely at a lower level compared to Ripper. Being a Java noob, I couldn't figure out how to use it from within JRuby. I'm not sure if it is doable at all; I hope to get some advice on this.

Take a look at this post from JRuby committer Ola Bini. In it he shows some brief usage of JRuby's AST. You can use the code from JRuby to create an AST and navigate it in memory, manipulate it, and turn it back into executable code.
require 'jruby'
JRuby.ast_for "puts 'hello'"
# => RootNode
#      NewlineNode
#        FCallOneArgNode |puts|
#          ArrayNode
#            StrNode =="hello"
It doesn't give you the same event-based approach that Ripper does, but by traversing the AST you can get similar information.
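For instance, a minimal sketch of such a traversal (child_nodes is the Java childNodes() method exposed to Ruby; the exact node class names vary between JRuby versions):
require 'jruby'

# Depth-first walk over the AST, printing each node's class name, indented by depth.
def walk(node, depth = 0)
  puts "#{'  ' * depth}#{node.class.name.split('::').last}"
  node.child_nodes.each { |child| walk(child, depth + 1) }
end

walk JRuby.ast_for("puts 'hello'")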

Related

The use of packages to parse command arguments employing options/switches?

I have a couple of questions about adding options/switches (with and without parameters) to procedures/commands. I see that tcllib has cmdline, and Ashok Nadkarni's book on Tcl recommends the parse_args package, stating that using Tcl to handle the arguments is much slower than this package's C implementation. The Nov. 2016 paper on parse_args states that Tcl script methods are or can be 50 times slower.
Are Tcl methods really significantly slower? Is there some minimum threshold number of options to be reached before using a package?
Is there any reason to use parse_args (not in tcllib) over cmdline (in tcllib)?
Can both be easily included in a starkit?
Is this included in 8.7a now? (I'd like to use 8.7a but I'm using Manjaro Linux and am afraid that adding it outside the package manager will cause issues that I won't know how to resolve or even just "undo").
Thank you for considering my questions.
Are Tcl methods really significantly slower? Is there some minimum threshold number of options to be reached before using a package?
Potentially. Procedures have overhead to do with managing the stack frame and so on, and code implemented in C can avoid a number of overheads due to the way values are managed in current Tcl implementations. The difference is much more profound for numeric code than for string-based code, as the cost of boxing and unboxing numeric values is quite significant (strings are always boxed in all languages).
As for which is the one to use, it really depends on the details, as you are trading off flexibility for speed. I've never known it to be a problem for command-line parsing.
(If you ask me, fifty options isn't really that many, except that it's quite a lot to pass on an actual command line. It might be easier to design a configuration file format — perhaps a simple Tcl script! — and then to just pass the name of that in as the actual argument.)
Is there any reason to use parse_args (not in tcllib) over cmdline (in tcllib)?
Performance? Details of how you describe things to the parser?
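For a sense of the cmdline style, here is a minimal sketch (the option names and defaults are invented for illustration):
package require cmdline

# One entry per option: {name "help text"} for a flag,
# {name.arg default "help text"} for an option that takes a value.
set options {
    {verbose           "enable verbose output"}
    {out.arg "out.txt" "name of the output file"}
}
array set params [::cmdline::getoptions argv $options "usage: myscript ?options?"]
puts "verbose=$params(verbose) out=$params(out)"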
Can both be easily included in a starkit?
As long as any C code is built with Tcl stubs enabled (typically not much more than defining USE_TCL_STUBS and linking against the stub library), it can go in a starkit as a loadable library. Using the stubbed build means that the compiled code doesn't assume exactly which version of the Tcl library is present or what its path is; those are assumptions that are usually wrong with a starkit.
Tcl-implemented packages can always go in a starkit. Hybrid packages need a little care for their C parts, but are otherwise pretty easy.
Many packages either always build in stubbed mode or have a build configuration option to do so.
Is this included in 8.7a now? (I'd like to use 8.7a but I'm using Manjaro Linux and am afraid that adding it outside the package manager will cause issues that I won't know how to resolve or even just "undo").
We think we're about a month from the feature freeze for 8.7, and builds seem stable in automated testing so the beta phase will probably be fairly short. The list of what's in can be found here (filter for 8.7 and Final). However, bear in mind that we tend to feel that if code can be done in an extension then there's usually no desperate need for it to be in Tcl itself.

Alternatives to CGI.pm for header() and param()?

I've been an avid user of CGI.pm since the previous millennium so I was a bit surprised when it disappeared from my old Ubuntu server when I upgraded it recently. My short-term fix was sudo cpan install CGI, but a quick web search to find out why it was missing in the first place revealed CGI::Alternatives which explains why it has gone and offers some suggestions for alternatives. For my purposes, HTML::Tiny looks best for replacing my programmatic HTML generation, but Alternatives is strangely silent on the subject of HTTP headers and CGI parameters.
I broadened my search and found lighter alternatives to CGI.pm on PerlMonks, where one response suggests CGI::Simple, but the recommendation is less than whole-hearted - "its not quite as up to date as CGI.pm".
So is CGI::Simple the way to go, or is there a better alternative?
Please don't spend time suggesting "rewrite everything using framework XXX". I really don't have the time or energy for that. I'm happy to replace all my HTML generation with HTML::Tiny, so I'm looking for something with a similar (or lower!) amount of rework to replace header() and param().
You're missing the point if you're looking for an alternative that provides header and param.
The argument for the removal of CGI.pm from core (but not from CPAN) is that you shouldn't have to deal with CGI yourself; you should be using a framework that handles this for you.
If you don't agree with this — if you're looking for an equivalent to header and param — go ahead and keep using CGI.pm.
If you do agree, CGI::Simple is no better than CGI.pm.
As others have said, there's no reason not to use CGI together with HTML::Tiny. So that's the answer to your question. For the last five years that I was using CGI, my programs all started something like:
use CGI qw[param header];
which is the approach you're talking about here.
If you wait a year or two, the plan is for the HTML generation functions to be removed from the main module, so your problems will all go away at that point.
But that's not what I'd do in your situation. I'd switch to using PSGI and Plack. You said that you don't want anyone to suggest a new framework, so I'm not going to do that. Plack isn't a framework, it's a toolbox for writing PSGI applications. Certainly, I'd use a framework like Dancer, but you don't have to. You can happily use Plack without any of the frameworks built on top of it.
You'll still get most of the advantages of PSGI. You'll be able to deploy your applications in any way you like. You'll have access to all the awesome Plack middleware. Testing your program will be far easier.
When you're using "raw" Plack, the equivalent of CGI::param is Plack::Request::parameters and the equivalent of CGI::header is Plack::Response::headers.
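As a minimal sketch of those equivalents (the parameter name is invented):
use Plack::Request;
use Plack::Response;

my $app = sub {
    my $env = shift;
    my $req = Plack::Request->new($env);

    # roughly what CGI::param('name') gave you
    my $name = $req->parameters->{name};

    # roughly what CGI::header gave you
    my $res = Plack::Response->new(200);
    $res->header('Content-Type' => 'text/plain');
    $res->body("Hello, $name\n");
    return $res->finalize;
};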
So there are three answers to your question.
Carry on using CGI.pm. Just stop using the HTML generation functions and replace them with HTML::Tiny
Use raw PSGI/Plack and bring your web development into the 21st century
Use one of Perl's many great web frameworks.
Unfortunately, you don't seem to like any of those answers.
The issue with CGI.pm is not that it's going away, merely that it will no longer be distributed as part of the core Perl distribution. However that doesn't mean you have to install from CPAN. On your Ubuntu system you can just do:
sudo apt-get install libcgi-pm-perl
and you'll be off and running with the same old CGI you know and love :-)
The correct answer to my question is that use CGI::Simple is better than use CGI qw(header param) because it loads faster.
Answers along the lines of "Use Plack, it's the future of Perl for websites" weren't helpful to me because I didn't have time to learn a new programming paradigm or to discover how to reconfigure my web server to make it work, no matter how insistent the Plack Evangelists were that I was wrong in what I was trying to do.
I've now had a bit of time to wade through the links to documentation and presentation slides I was offered and I can see what they were getting at, but one failing in what I've read so far is the lack of a concise end-to-end working example to help get my head around things ... so here's what I knocked together to get me started (and, no, I haven't finished yet!). I hope that others beginning the journey from CGI to PSGI will find this useful to help get them underway...
First you need to install Plack. I'm running an Ubuntu 14.04 installation so it was simply a matter of running sudo apt-get install libplack-perl. The generic way is to install Task::Plack from CPAN.
Next you need to know where your cgi-bin directory is located. You ought to know already if you're a CGI die-hard! Since I'm running Apache mine is defined in /etc/apache2/conf-available/serve-cgi-bin.conf by ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/.
Now for the magic. We're going to create a CGI script that runs a PSGI app, handing it data from the CGI environment. This is good for experimentation and testing but NOT for deployment, as you don't get any of the speed benefit that PSGI can give you (for that you need something like Plack::Handler::Apache2, Plack::Handler::FCGI or mod_psgi in Apache, or a dedicated PSGI server such as Starman or Starlet, or one of the other handlers mentioned on PlackPerl.org). Create /usr/lib/cgi-bin/psgi-cgi.pl with the following contents and make it executable:-
#!/usr/bin/perl
use strict;
use warnings;
use Plack::Util;
use Plack::Handler::CGI;

# Apache's Action/AddHandler pair (below) sets PATH_TRANSLATED to the requested
# .psgi file; load that app and run it under the CGI handler.
my $app = Plack::Util::load_psgi($ENV{PATH_TRANSLATED});
Plack::Handler::CGI->new->run($app);
Next we need to tell Apache to pass PSGI app files to this handler. I did this by creating /etc/apache2/conf-available/psgi-cgi.conf containing:-
Action psgi-cgi /cgi-bin/psgi-cgi.pl
AddHandler psgi-cgi .psgi
then loaded it into my Apache server by running sudo a2enconf psgi-cgi and sudo service apache2 reload. Basically you need to get these lines into your httpd.conf file and restart the server.
Finally, my first PSGI script, which I created in my server's DocumentRoot as /var/www/html/hello.psgi:-
use Plack::Request;

my $app = sub {
    my $env = shift;
    my $req = Plack::Request->new($env);
    my $par = $req->parameters;
    return [
        200,
        [ 'Content-Type', 'text/plain' ],
        [ "Hello world!\n",
          map("$_ = ".join(", ", $par->get_all($_))."\n", sort keys %$par),
        ]
    ];
};
The application is a coderef which returns a 3-element arrayref: the first is the HTTP status code, the second is the name/value pairs for the HTTP header, and the third is the body of the response (which could be generated using HTML::Tiny for a web page). The first two elements answer the question of what you need instead of the CGI::header function - nothing! (though for more complex handling you'll need Plack::Response::headers). The example also shows how to replace CGI::param - use Plack::Request::parameters, which returns a Hash::MultiValue object containing the values of URL (GET) and BODY (POST) parameters, including the ones with multiple values.
Finally, a test:-
$ wget -q -O- 'http://localhost/hello.psgi?a=1&a=2&a=3&b=1&b=4'
Hello world!
a = 1, 2, 3
b = 1, 4
I hope this is useful to other CGI die-hards in taking their first steps towards PSGI proficiency, and I hope the Plack Evangelists will acknowledge that it takes a lot of reading and comprehension to get even this far.
CGI::Minimal would be a good option; it is much lighter than CGI and CGI::Simple, but it lacks their more advanced methods.

Something like Typesafe Config for NodeJS

I have a nodejs application that will take a JSON configuration file.
The JSON file will have some ${} and #{} tags that will be used to build up a dynamic context by loading a template configuration and populating the tags. HOCON may also end up being used eventually but that's not in there yet.
I came across Typesafe Config in the past and it looks amazing for this kind of thing. I did a bit of searching around npm and didn't spot anything similar in the node world but perhaps I am too unfamiliar with what terms to search for.
Does anyone know of a similar library in nodejs or a sensible strategy I may employ to do this in nodejs?
I know it wouldn't be much effort to implement something myself with string replace on the JSON or some such, although I can't help but think that this has been done before in node applications, and probably in a much better way than I would do it for this single use case. On that basis it seems to make sense to ask here before I continue.
A bit late, but it seems there is still no dedicated npm module to convert HOCON to JS. However, there is a library which could easily be converted to an npm module: https://github.com/scottburch/webpack-hocon-loader
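In the meantime, the string-replace strategy mentioned in the question is only a few lines. A minimal sketch (the file name, the ${} tag handling, and the context keys are all invented for illustration):
const fs = require('fs');

// Replace ${key} tags with values from a context object, leaving unknown tags alone.
function render(template, context) {
  return template.replace(/\$\{(\w+)\}/g, (match, key) =>
    key in context ? String(context[key]) : match);
}

const raw = fs.readFileSync('config.json', 'utf8');
const config = JSON.parse(render(raw, { host: 'localhost', port: 8080 }));
console.log(config);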

Understanding run time code interpretation and execution

I'm creating a game in XNA and was thinking of creating my own scripting language (extremely simple mind you). I know there's better ways to go about this (and that I'm reinventing the wheel), but I want the learning experience more than to be productive and fast.
When confronted with code at run time, from what I understand, the usual approach is to parse it into machine code or byte code or something else that is actually executable, and then execute that, right? But, for instance, when Chrome first came out they said their JavaScript engine was fast because it compiles the JavaScript into machine code. This implies other engines weren't compiling into machine code.
I'd prefer not compiling to a lower language, so are there any known modern techniques for parsing and executing code without compiling to low level? Perhaps something like parsing the code into some sort of tree, branching through the tree, and comparing each symbol and calling some function that handles that symbol? (Wild guessing and stabbing in the dark)
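What that guess describes is usually called a tree-walking interpreter. A minimal sketch in Python, with node shapes invented purely for illustration:
# Evaluate an arithmetic expression tree by recursing over tuple-shaped nodes.
def evaluate(node):
    kind = node[0]
    if kind == 'num':
        return node[1]
    if kind == '+':
        return evaluate(node[1]) + evaluate(node[2])
    if kind == '*':
        return evaluate(node[1]) * evaluate(node[2])
    raise ValueError('unknown node kind: %s' % kind)

# (2 + 3) * 4
tree = ('*', ('+', ('num', 2), ('num', 3)), ('num', 4))
print(evaluate(tree))  # => 20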
I personally wouldn't roll your own lexer (turning the input into tokens) or parser (checking the token stream against your language's grammar). Take a look at ANTLR for lexing/parsing - it's a great framework and has full source code if you want to dig into the guts of it.
For executing code that you've parsed, I'd look at running a simple virtual machine, or even better look at LLVM, which is an open-source(ish) attempt to standardise the virtual machine byte code format and provide nice features like JITing (turning your script's compiled byte code into native machine code at run time).
I wouldn't discourage you from the more advanced options you mention, such as native machine code execution, but bear in mind that this is a very specialist area and gets real complex, real fast!
Earlz pointed out that my reply might seem to imply 'don't bother doing this yourself'. Re-reading my post, it does sound a bit that way. The reason I mentioned ANTLR and LLVM is that they both have heaps of source code and tutorials, so I feel they make a good reference source. Take them as a base and play.
You can try this framework for building languages (it works well with XNA):
http://www.meta-alternative.net/mbase.html
There are some tutorials:
http://www.meta-alternative.net/calc.pdf
http://www.meta-alternative.net/pfront.pdf
Python is great as a scripting language. I would recommend you make a C# binding for its C API and use that. Embedding Python is easy. Your application can define functions, types/classes and variables inside modules which the Python interpreter can access. The application can also call functions in Python scripts and get a result back. These two features combined give you a two-way communication scheme.
Basically, you get the Python syntax and semantics for free. What you would need to implement is the API your application exposes to Python. An example could be access to game logic functions and render functions. Python scripts would then define functions which calls these, and the host application would invoke the Python functions (with parameters) to get work done.
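For a feel of what the embedding side looks like before any binding is written, here is a minimal sketch of the raw C API:
#include <Python.h>

int main(void) {
    Py_Initialize();                /* start the embedded interpreter */
    PyRun_SimpleString("print('hello from embedded Python')");
    Py_Finalize();                  /* shut it down */
    return 0;
}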
EDIT: It seems like IronPython can save you even more work. It's a C# implementation of the Python language with its own embedding API: http://www.ironpython.net/

Why don't I see pipe operators in most high-level languages?

In Unix shell programming the pipe operator is an extremely powerful tool. With a small set of core utilities, a systems language (like C) and a scripting language (like Python) you can construct extremely compact and powerful shell scripts, that are automatically parallelized by the operating system.
Obviously this is a very powerful programming paradigm, but I haven't seen pipes as first class abstractions in any language other than a shell script. The code needed to replicate the functionality of scripts using pipes seems to always be quite complex.
So my question is why don't I see something similar to Unix pipes in modern high-level languages like C#, Java, etc.? Are there languages (other than shell scripts) which do support first class pipes? Isn't it a convenient and safe way to express concurrent algorithms?
Just in case someone brings it up, I looked at the F# pipe-forward operator, and it looks more like a function-application operator. It applies a function to data, rather than connecting two streams together, as far as I can tell, but I am open to corrections.
Postscript: While doing some research on implementing coroutines, I realize that there are certain parallels. In a blog post Martin Wolf describes a similar problem to mine but in terms of coroutines instead of pipes.
Haha! Thanks to my Google-fu, I have found an SO answer that may interest you. Basically, the answer is going against the "don't overload operators unless you really have to" argument by overloading the bitwise-OR operator to provide shell-like piping, resulting in Python code like this:
for i in xrange(2,100) | sieve(2) | sieve(3) | sieve(5) | sieve(7):
    print i
What it does, conceptually, is pipe the list of numbers from 2 to 99 (xrange(2, 100)) through a sieve function that removes multiples of a given number (first 2, then 3, then 5, then 7). This is the start of a prime-number generator, though generating prime numbers this way is a rather bad idea. But we can do more:
for i in xrange(2,100) | strify() | startswith(5):
    print i
This generates the range, then converts all of them from numbers to strings, and then filters out anything that doesn't start with 5.
The post shows a basic parent class that allows you to overload two methods, map and filter, to describe the behavior of your pipe. So strify() uses the map method to convert everything to a string, while sieve() uses the filter method to weed out things that aren't multiples of the number.
It's quite clever, though perhaps that means it's not very Pythonic, but it demonstrates what you are after and a technique to get it that can probably be applied easily to other languages.
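The core of the trick is small. A minimal sketch (class and helper names invented), using __ror__ so the left-hand side can be any iterable:
# A pipeable stage wraps a function from iterable to iterable; `xs | stage`
# works because Python falls back to stage.__ror__(xs).
class Pipe(object):
    def __init__(self, func):
        self.func = func
    def __ror__(self, iterable):
        return self.func(iterable)

def sieve(n):
    # keep n itself, drop its other multiples
    return Pipe(lambda xs: (x for x in xs if x == n or x % n != 0))

for i in range(2, 100) | sieve(2) | sieve(3) | sieve(5) | sieve(7):
    print(i)  # the primes below 100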
You can do pipelining type parallelism quite easily in Erlang. Below is a shameless copy/paste from my blogpost of Jan 2008.
Also, Glasgow Parallel Haskell allows for parallel function composition, which amounts to the same thing, giving you implicit parallelisation.
You already think in terms of pipelines - how about "gzcat foo.tar.gz | tar xf -"? You may not have known it, but the shell is running the unzip and untar in parallel - the stdin read in tar just blocks until data is sent to stdout by gzcat.
Well, a lot of tasks can be expressed in terms of pipelines, and if you can do that then getting some level of parallelisation is simple with David King's helper code (even across Erlang nodes, i.e. machines):
pipeline:run([pipeline:generator(BigList),
              {filter, fun some_filter/1},
              {map, fun some_map/1},
              {generic, fun some_complex_function/2},
              fun some_more_complicated_function/1,
              fun pipeline:collect/1]).
So basically what he's doing here is making a list of the steps - each step being implemented in a fun that accepts as input whatever the previous step outputs (the funs can even be defined inline, of course). Go check out David's blog entry for the code and a more detailed explanation.
The magrittr package provides something similar to F#'s pipe-forward operator in R:
rnorm(100) %>% abs %>% mean
Combined with the dplyr package, it makes for a neat data-manipulation tool:
iris %>%
  filter(Species == "virginica") %>%
  select(-Species) %>%
  colMeans
You can find something like pipes in C# and Java, for example, where you take a connection stream and put it inside the constructor of another connection stream.
So, you have in Java:
new BufferedReader(new InputStreamReader(System.in));
You may want to look up chaining input streams or output streams.
Thanks to all of the great answers and comments, here is a summary of what I learned:
It turns out that there is an entire paradigm related to what I am interested in, called Flow-based programming. A good example of a language designed specifically for flow-based programming is Hartmann pipelines. Hartmann pipelines generalize the idea of streams and pipes used in Unix and other OSes to allow for multiple input and output streams (rather than just a single input stream and two output streams). Erlang contains powerful abstractions that make it easy to express concurrent processes in a manner which resembles pipes. Java provides PipedInputStream and PipedOutputStream, which can be used with threads to achieve the same kind of abstraction in a more verbose manner.
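To illustrate the Java variant, a minimal sketch (the class name is invented): one thread writes into a PipedOutputStream while the main thread reads from the connected PipedInputStream:
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipedStreamDemo {
    public static void main(String[] args) throws IOException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out); // connect the two ends

        new Thread(() -> {
            try {
                out.write("hello through a pipe\n".getBytes());
                out.close(); // end-of-stream for the reader
            } catch (IOException ignored) {
            }
        }).start();

        int b;
        while ((b = in.read()) != -1) {
            System.out.print((char) b);
        }
    }
}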
I think the most fundamental reason is because C# and Java tend to be used to build more monolithic systems. Culturally, it's just not common to even want to do pipe-like things -- you just make your application implement the necessary functionality. The notion of building a multitude of simple tools and then gluing them together in arbitrary ways just isn't common in those contexts.
If you look at some of the scripting languages, like Python and Ruby, there are some pretty good tools for doing pipe-like things from within those scripts. Check out the Python subprocess module, for example, which allows you to do things like:
import subprocess

proc = subprocess.Popen('cat -',
                        shell=True,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,)
stdout_value = proc.communicate('through stdin to stdout')[0]
print '\tpass through:', stdout_value
Are you looking at the F# |> operator? I think you actually want the >> operator.
Usually you just don't need it and programs run faster without it.
Basically, piping is the producer/consumer pattern, and it's not that hard to write those producers and consumers because they don't share much data.
Piping for Python: pypes
Mozart-OZ can do pipes using ports and threads.
Objective-C has the NSPipe class. I use it quite frequently.
I've had a lot of fun building pipeline functions in Python. I have a library I wrote; I put the contents and a sample run here. The best fit for me has been XML processing, described in this Wikipedia article.
You can do pipe-like operations in Java by chaining/filtering/transforming iterators.
You can use Google's Guava Iterators.
I will say that even with the very helpful Guava library and static imports, it still ends up being a lot of Java code, as the sketch below shows.
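A minimal sketch of that style, using Guava's Iterators.filter and Iterators.transform with pre-Java-8 anonymous classes (the numbers are arbitrary):
import com.google.common.base.Function;
import com.google.common.base.Predicate;
import com.google.common.collect.Iterators;

import java.util.Arrays;
import java.util.Iterator;

// Roughly "source | filter(even) | map(square)" spelled out in Java.
public class GuavaPipeDemo {
    public static void main(String[] args) {
        Iterator<Integer> source = Arrays.asList(1, 2, 3, 4, 5, 6).iterator();
        Iterator<Integer> evens = Iterators.filter(source, new Predicate<Integer>() {
            public boolean apply(Integer n) { return n % 2 == 0; }
        });
        Iterator<Integer> squares = Iterators.transform(evens, new Function<Integer, Integer>() {
            public Integer apply(Integer n) { return n * n; }
        });
        while (squares.hasNext()) {
            System.out.println(squares.next()); // 4, 16, 36
        }
    }
}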
In Scala it's quite easy to make your own pipe operator.
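A minimal sketch (the names are invented; paste into the REPL or wrap in an object):
// An implicit class adds |> to any value: x |> f is just f(x).
implicit class Piped[A](a: A) {
  def |>[B](f: A => B): B = f(a)
}

val result = 3 |> ((x: Int) => x + 1) |> ((x: Int) => x * 2)
println(result) // 8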
Streaming libraries based on coroutines have existed in Haskell for quite some time now. Two popular examples are conduit and pipes.
Both libraries are well-written and well-documented, and are relatively mature. The Yesod web framework is based on conduit, and it's pretty damn fast. Yesod is competitive with Node on performance, even beating it in a few places.
Interestingly, all of these libraries are single-threaded by default. This is because the single motivating use case for pipelines is servers, which are I/O bound.
Since R added a pipe operator today, it's worth mentioning that Julia has had a pipe all along:
help?> |>
search: |>
|>(x, f)
Applies a function to the preceding argument. This allows for easy function chaining.
Examples
≡≡≡≡≡≡≡≡≡≡
julia> [1:5;] |> x->x.^2 |> sum |> inv
0.01818181818181818
If you're still interested in an answer, you can look at Factor, or the older Joy and Forth, for the concatenative paradigm.
In-arguments and out-arguments are implicit, dumped to a stack. The next word (function) then takes that data and does something with it.
The syntax is postfix.
"123" print
where print takes one argument, whatever is on the stack.
You can use my library in Python: github.com/sspipe/sspipe
In Mathematica, you can use //. For example,
f[g[h[x, parm1], parm2]]
is quite a mess. It could be written as
x // h[#, parm1]& // g[#, parm2]& // f
where # and & together form a lambda (a pure function) in Mathematica.
In JavaScript, there may soon be a pipe operator |>:
https://github.com/tc39/proposal-pipeline-operator