Using Tidy in JRuby - jruby

After spending some hours with the Ruby Debugger I finally learned that I need to clean up some malformed HTML pages before I can feed those to Hpricot. The best solution I found so far is the Tidy Ruby interface.
Tidy works great from the command line and also the Ruby interface works. However, it requires dl/import, which fails to load in JRuby:
$ jirb
irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'tidy'
LoadError: no such file to load -- dl/import
Is this library available for JRuby? A web search revealed that it wasn't available last year.
Alternatively, can someone suggest other ways to clean up malformed HTML in JRuby?
Update
Following Markus' suggestion I now use Tidy via popen instead of libtidy. I posted the code which pipes the document data through tidy for future reference. Hopefully, this is robust and portable.
def clean(data)
cleaned = nil
tidy = IO.popen('tidy -f "log/tidy.log" --force-output yes -wrap 0 -utf8', 'w+')
begin
tidy.write(data)
tidy.close_write
cleaned = tidy.read
tidy.close_read
rescue Errno::EPIPE
$stderr.print "Running 'tidy' failed: " + $!
tidy.close
end
return cleaned if cleaned and cleaned != ""
return data
end

You could use it from the command line from within JRuby with %x{...} or backticks. You may also want to consider popen (and pipe things through it).
Not elegant perhaps, but more likely to get you going with minimal hassle than trying to mess with unsupported libraries.

Related

Remove debugger keyword during compilation in google closure

UPDATE:
The JS version of closure-compiler is no longer supported or maintained.
https://github.com/google/closure-compiler-npm/blob/master/packages/google-closure-compiler-js/readme.md
Im trying to find if there is a way to remove the "debugger" keyword during compilation process, im using the javascript version google-closure-compiler with gulp.
Looking through the documentation it is clear we can set the flag to stop/show error messages during compilation by doing the following.
https://github.com/google/closure-compiler/wiki/Flags-and-Options
--jscomp_off
translating this to gulp, it is:
const googleClosureOptions = {
...
jscomp_error:"checkDebuggerStatement"
}
however this works on stopping the compilation by throwing error or to show a warning.
zyxcdafg.js:1444: ERROR - [JSC_DEBUGGER_STATEMENT_PRESENT] Using the debugger statement can halt your application if the user has a JavaScript debugger running.
debugger;
^^^^^^^^^
but what I am trying to achieve is to remove the debugger keyword. Is this possible to achieve using googleclosure. I can not find any flags or options relating to this.
UPDATE:
The JS version of closure-compiler is no longer supported or maintained.
https://github.com/google/closure-compiler-npm/blob/master/packages/google-closure-compiler-js/readme.md
No I don't think so. I'd suggest you use something else to do it. Like sed:
find dist -name "*.js" -exec sed -i 's/\sdebugger;//' {} +
Something like that will find files in your dist folder that end with .js and then exec-ute sed to replace all instances of debugger; with nothing.
You could add that to a script that calls your Closure Compiler build.
The compiler doesn't have a command-line api for defining custom code removal passes, but the compiler's architecture does allow for registering custom passes and a pass to remove a debugger statement should be trivial:
if (n.isDebugger()) {
compiler.reportChangeToEnclosingScope(n);
n.detach();
}
The general structure would follow:
https://github.com/google/closure-compiler/blob/master/src/com/google/javascript/jscomp/CheckDebuggerStatement.java

How to pass options to UglifyJS through html-minifier on Windows command line?

HTMLMinifier (html-minifier) (3.5.14) for Node.js (v8.11.1), installed with npm install html-minifier -g, can be run via command line (Windows CMD), e.g. html-minifier --help produces the usage info (excerpts):
Usage: html-minifier [options] [files...]
Options:
-V, --version output the version number
...
--minify-js [value] Minify Javascript in script elements and on* attributes (uses uglify-js)
...
-c --config-file <file> Use config file
--input-dir <dir> Specify an input directory
--output-dir <dir> Specify an output directory
--file-ext <text> Specify an extension to be read, ex: html
-h, --help output usage information
The option --minify-js [value] relies on UglifyJS to "compress" the JavaScript embedded inside the HTML file(s) passed to html-minifier. UglifyJS can remove console.log() function calls (Can uglify-js remove the console.log statements?) from the JavaScript, by enabling the drop_console option (also see pure_funcs).
But --minify-js drop_console=true does not have an effect, nor does something like "uglify:{options:{compress:{drop_console:true}}}" or "compress:{pure_funcs:['console.log']}".
How can such an option be set, ideally via the html-minifier command line (alternatively by config-file, though it just sets "minifyJS": true)?
I was very close.
I started digging through the code (installed in %appdata%\npm\node_modules\html-minifier) to see what happens with the options provided, i.e. adding debug output with console.log(xyz); (using an actual debugger probably would be a better idea).
So, here's my "trace":
option: https://github.com/kangax/html-minifier/blob/gh-pages/cli.js#L118
option handling: https://github.com/kangax/html-minifier/blob/gh-pages/cli.js#L144
argument parsing using [commander][2]
createOptions() https://github.com/kangax/html-minifier/blob/gh-pages/cli.js#L197
options then contains e.g. minifyJS: 'compress:{pure_funcs:[\'console.log\']}',
passed on to minify() https://github.com/kangax/html-minifier/blob/gh-pages/src/htmlminifier.js#L806 which immediately runs
processOptions() https://github.com/kangax/html-minifier/blob/gh-pages/src/htmlminifier.js#L616
where finally in line https://github.com/kangax/html-minifier/blob/gh-pages/src/htmlminifier.js#L667 options.minifyJS is handled, before it's run as var result = UglifyJS.minify(code, minifyJS); in https://github.com/kangax/html-minifier/blob/gh-pages/src/htmlminifier.js#L680.
But there our option string compress:{pure_funcs:['console.log']} gets cleaned because it's not yet an object, resulting in {}.
Or, in a different trial with a different string you may encounter the error Could not parse JSON value '{compress:{pure_funcs:'console.log']}}'
At least it gets that far! But why doesn't it work?
First, it's a good time to revisit the JSON spec: https://www.json.org/index.html
Second, see if the string could be parsed as valid JSON, e.g. with the JSON.parse() demo at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse
Third, figure out how to get that string through the CMD as argument (escaping the double quotes).
Finally, make sure the data structure to configure UgliFyJS is correct. That's quite easy, since it's documented: https://github.com/mishoo/UglifyJS2#minify-options-structure
And behold, simply escaping the double quotes with a backslash works for me:
html-minfier ... --minify-js {\"compress\":{\"pure_funcs\":[\"console.log\"]}} ...
and it properly shows up in the options as
...
{ compress:
{ pure_funcs: [ 'console.log' ],
...
For ex. curl can read config from a file, like proxies, etc...
Many programs do so. git, maven, gradle.... No matter how and where you call them, they look for the config you or the system provides: first from the current directory, then from the user home and then the system /etc/...
If no batteries included with these node packages, they can only be used on separate html and js files.

problems while generating java code from jruby

I have problems while generating .java files with jruby 1.7.3. Here is an example:
class Duck
def quack()
puts "quack!";
end
end
def quack_it(duck)
duck.quack
end
a = Duck.new
quack_it(a)
when I execute
jrubyc --java Test.rb I get the following compilation error:
Failure during compilation of file DuckExample_simple.rb:
undefined method `new_method' for nil:NilClass.
Therefore, I have 2 questions:
What is wrong here?
I want to generate .java files in order to see how the JRuby code is translated into the bytecode and instead of reading the bytecode itself I thought to read the java code. Does the generated java code correspond 1 to 1 to the bytecode generated by AOT jruby compiler, or it's better to read the bytecode itself? I actually want to see how jruby handles dynamic method dispatch at the bytecode level. Any hints would be appreciated.
i don't use jruby so i am not really the best guy to talk to, but here are my 2 cents anyways.
if you just put a simple class into the file, it will work. so try
class Duck
def quack()
puts "quack!"
end
end
it will create a Duck.java file as you would expect, which answeres the second question you had. there is also a nice writeup about the generated file here: http://rhnh.net/2012/10/20/guice-in-your-jruby
i guess that the command is somewhat broken. it would be best to open an issue at the jruby issue tracker: http://jira.codehaus.org/browse/JRUBY

Syslog-ng format-json not working

I'm desperately trying to send a message as JSON to a PHP script.
destination d_php {
program("/usr/bin/php -f /data/htdocs/log.php" template("$(format-json)\n") ) ;
};
The php script is fine. Using simple macros works well, but the "format-json" function does always return this:
error in template: $(format-json)
I tried everything I could find in the documentation, but all response I get is "error in template". The official docs (link) even use 2 different spellings, not very promising.
Any ideas?
the format-json and format_json syntax should both work, hyphens and underscores are equivalent in syslog-ng.
As for the actual problem, have you tried setting the scope parameter of format-json, like "$(format_json --scope selected_macros)"? By default, it is empty, which means there is nothing to format.
HTH,
Regards,
Robert Fekete
Found the reason. Apparently, syslog-ng is split into separate packages on Ubuntu (12). I had to install syslog-ng-mod-json.
It's really a shame that syslog-ng doesn't give the slightest hint that the function is missing or unknown, instead of some general error.
If you are compiling from the source, you shall install json-c library first (yum install json-c-devel).

HTML tidy/cleaning in Ruby 1.9

I'm currently using the RubyTidy Ruby bindings for HTML tidy to make sure HTML I receive is well-formed. Currently this library is the only thing holding me back from getting a Rails application on Ruby 1.9. Are there any alternative libraries out there that will tidy up chunks of HTML on Ruby 1.9?
http://github.com/libc/tidy_ffi/blob/master/README.rdoc works with ruby 1.9 (latest version)
If you are working on windows, you need to set the library_path eg
require 'tidy_ffi'
TidyFFI.library_path = 'lib\\tidy\\bin\\tidy.dll'
tidy = TidyFFI::Tidy.new('test')
puts tidy.clean
(It uses the same dll as tidy) The above links gives you more example of the usage.
I am using Nokogiri to fix invalid html:
Nokogiri::HTML::DocumentFragment.parse(html).to_html
Here is a nice example of how to make your html look better using tidy:
require 'tidy'
Tidy.path = '/opt/local/lib/libtidy.dylib' # or where ever your tidylib resides
nice_html = ""
Tidy.open(:show_warnings=>true) do |tidy|
tidy.options.output_xhtml = true
tidy.options.wrap = 0
tidy.options.indent = 'auto'
tidy.options.indent_attributes = false
tidy.options.indent_spaces = 4
tidy.options.vertical_space = false
tidy.options.char_encoding = 'utf8'
nice_html = tidy.clean(my_nasty_html_string)
end
# remove excess newlines
nice_html = nice_html.strip.gsub(/\n+/, "\n")
puts nice_html
For more tidy options, check out the man page.
Currently this library is the only
thing holding me back from getting a
Rails application on Ruby 1.9.
Watch out, the Ruby Tidy bindings have some nasty memory leaks. It's currently unusable in long running processes. (for the record, I'm using http://github.com/ak47/tidy)
I just had to remove it from a production Rails 2.3 application because it was leaking about 1MB/min.