gulp-htmlmin fails on valid document: workaround or abandon plugin?

I'm trying to minify my HTML. I've just discovered the gulp-htmlmin plugin and started using it.
My gulp task...
gulp.task('build-html', function () {
  return gulp.src(appDev + 'test.html')
    .pipe(htmlmin({ collapseWhitespace: true }))
    .pipe(gulp.dest(appProd));
});
fails when applied to this valid HTML, or to any document containing a bare < character:
<div> < back </div>
The error is:
Error: Parse Error: < back </div>
at new HTMLParser (html-minifier\src\htmlparser.js:236:13)
I can think of two solutions:
Replace < with &lt; in all my templates. Doing so manually won't be fun, but that's life. Any idea how to do it in a gulp task?
Ditch this plugin in search for one that can parse and minify my templates. Any suggestions?
Guess I'm wondering how someone more experienced at building for deployment (I'm new) would handle this.

I'm afraid you're going to have to change them to &lt;. The html-minifier team has specifically stated they won't support bare <s.
You want to do this anyway, both to avoid tripping up parsers and to protect against certain XSS attacks. See
When Should One Use HTML Entities,
the W3C's recommendations,
and OWASP's XSS prevention cheat sheet
for more info.
The good news is any text editor worth its coding salt supports project-wide, or at least multi-file, search and replace. Assuming none of your HTML <tags> have whitespace after the <, you should be able to just replace "< " with "&lt; ".
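If you'd rather automate the replacement in your build (the OP asked about a gulp task), here is a minimal sketch using the gulp-replace plugin; the plugin choice, glob, and regex are assumptions, and note it rewrites your source files in place:

var gulp = require('gulp');
var replace = require('gulp-replace');

gulp.task('escape-lt', function () {
  return gulp.src(appDev + '**/*.html')
    // Rewrite a bare "<" followed by whitespace as "&lt;".
    // Assumes real tags never have a space after "<".
    .pipe(replace(/<(\s)/g, '&lt;$1'))
    .pipe(gulp.dest(appDev));
});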

I decided to replace all < with &lt;, since I figured this will probably save me some grief down the road. Besides, #henry made great points in his answer.
I was too big of a chicken, though, to trust my IDE to do a find-and-replace without breaking my code in all kinds of ways. Instead I followed these steps:
Run the gulp task from the OP
Notice the file that threw the parse error and go fix it
Run the gulp task again
A new file throws the parse error. Go fix it
...
Eventually I had fixed all the files.

Related

POEdit: Can't update translations from Source Code

I am using POEdit for translations in a web application.
However, when I start POEdit I can't find any sources when I run 'Catalog > Update from Sources'. I only have .cshtml files containing the texts that need to be translated.
What I've already tried:
Set the source path in Catalog > Properties and the charset to 'UTF-8'.
Added an additional keyword ("[[[\w+]]]") for matching words in my files (the words to translate always have the following form: [[[wordToTranslate]]]).
Added a cshtml extractor (in File > Settings > Extractor). When I did this, the following error messages appeared: "warning: unterminated string constant" and "warning: ')' found where '}' was expected".
Browsed the web without finding any clue about how to include cshtml files.
Any hints or solutions are MUCH appreciated. :-)
Added additional keyword ("[[[\w+]]]") for matching words in my files
I don’t know why you assume the keyword values are regexes, of all things; they are not. The GNU gettext manual makes it clear what a “keyword” is in the gettext context: the name of a function that is called with translatable string literals as its arguments.
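To illustrate what a keyword actually does, here is a hedged sketch of an equivalent xgettext invocation (the file names are just examples):

# "_" is a keyword: xgettext extracts the string-literal arguments
# of _() calls, e.g. _("Save changes"). A pattern like [[[\w+]]] is
# not a function name, so as a keyword it would match nothing.
xgettext --keyword=_ --language=C --from-code=UTF-8 -o messages.pot example.c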
Added a cshtml-extractor
You get errors coming from this step, so it would be reasonable to assume that’s the problem. Because you gloss over this crucial step and don’t reveal the details of how you configured it, it’s impossible to give you a concrete answer (not without a crystal ball, anyway).
So I can only make an educated guess: if you didn’t actually add a proper extractor that understands the syntax of your template language, and instead used some gross hack like reusing the Python parser, then that’s the cause of your errors, together with the use of a keyword value that can’t possibly be valid.

Is there a way to verify that a file is valid JSON (not missing a comma or something) in a console?

That's it. I'm a careless guy; I always miss something when I'm writing JSON.
I think maybe we can utilize irb.
Before looking at Node, look at your editor. Does it do plugins (say, Sublime Text)? If so, install a JSON linter/validator that won't let you save until you've fixed the errors. Problem solved.
No such luck? Look into using grunt or gulp with a simple JSON validator task (of the kind "look for **/*.json, check that"). e.g. https://www.npmjs.org/package/grunt-jsonlint or https://www.npmjs.org/package/gulp-jsonlint ...
Or, even just use plain old https://www.npmjs.org/package/jsonlint on its own to check individual files.
This is a solved problem, pick your favourite solution.
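For instance, a minimal sketch of a gulp task using the gulp-jsonlint plugin linked above (the glob and task name are assumptions; install it with npm install --save-dev gulp-jsonlint):

var gulp = require('gulp');
var jsonlint = require('gulp-jsonlint');

// Lint every .json file in the project and report any parse errors.
gulp.task('lint-json', function () {
  return gulp.src('**/*.json')
    .pipe(jsonlint())
    .pipe(jsonlint.reporter());
});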
You could always create an alias:
alias jsonlint="echo \"try{JSON.parse(require('fs').readFileSync(process.argv[2])); process.exit(0);}catch(e){process.exit(1);}\" | node"
and use it like:
jsonlint some_file.json
or if you don't want the error output
jsonlint some_file.json 2>/dev/null
In spite of what Snowmanzzz said, require('some_json.json') isn't guaranteed to detect the file as JSON (especially if it doesn't have the .json extension).
I found a super simple one: just use node, then require the JSON file.
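A minimal illustration of that approach (the file name is just an example); node throws, and exits non-zero, if the file fails to parse:

node -e "require('./some_file.json'); console.log('valid JSON')"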

igraph for python

I'm thoroughly confused about how to read/write into igraph's Python module. What I'm trying right now is:
g = igraph.read("football.gml")
g.write_svg("football.svg", g.layout_circle() )
I have a football.gml file, and this code runs and writes a file called football.svg. But when I try to open it using InkScape, I get an error message saying the file cannot be loaded. Is this the correct way to write the code? What could be going wrong?
The write_svg function is sort of deprecated; it was meant only as a quick hack to allow SVG exports from igraph even if you don't have the Cairo module for Python. It has not been maintained for a while so it could be the case that you hit a bug.
If you have the Cairo module for Python (on most Linux systems, you can simply install it from an appropriate package), you can simply do this:
igraph.plot(g, "football.svg", layout="circle")
This would use Cairo's SVG renderer, which is likely to generate the correct result. If you cannot install the Cairo module for Python for some reason, please file a bug report on https://bugs.launchpad.net/igraph so we can look into this.
(Even better, please file a bug report even if you managed to make it work using igraph.plot).
Couple years late, but maybe this will be helpful to somebody.
The write_svg function seems not to escape ampersands correctly. Texas A&M has an ampersand in its label -- InkScape is probably confused because it sees & rather than &amp;. Just open football.svg in a text editor to fix that, and you should be golden!
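If you'd rather script the fix than hand-edit the file, here is a minimal sketch in Python (the regex, which skips already-escaped entities, is an assumption):

import re

with open("football.svg") as f:
    svg = f.read()

# Escape bare ampersands that are not already part of an entity
# such as &amp; or &#38;.
svg = re.sub(r"&(?!(?:[a-zA-Z]+|#\d+);)", "&amp;", svg)

with open("football.svg", "w") as f:
    f.write(svg)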

Escape Html in erlang

Does anyone have a good way to escape html tags in erlang (as for CGI.escapeHtml in Ruby)?
Thanks
Well, I would tell you to roll your own method using string and list processing. But I would also say that if you have the Yaws web server source, there is a method I have used and copied into my own libraries: yaws_api:url_encode(HtmlString). See it here in action:
1> Html = "5 > 4 = true".
"5 > 4 = true"
2> yaws_api:url_encode(Html).
"5%20%3E%204%20%3D%20true"
3>
I hope this is somehow what you needed. If it is, you could just browse the Yaws web server source code, copy out this function, and use it in your own projects. Note that within the module yaws_api.erl you will have to make sure you copy out all the dependencies of this function, as klacke did a lot of pattern matching, function clauses, recursion, etc. Just copy the whole function and its small support functions from that source file and paste them somewhere in your project. The other way would be to do it on your own by manipulating strings and lists. Those are my suggestions :)
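If you do roll your own with list processing, a minimal sketch (the function name and entity set are assumptions, covering the same characters as Ruby's CGI.escapeHtml):

%% Hand-rolled HTML escaping, one character at a time.
escape_html([]) -> [];
escape_html([$& | T]) -> "&amp;" ++ escape_html(T);
escape_html([$< | T]) -> "&lt;" ++ escape_html(T);
escape_html([$> | T]) -> "&gt;" ++ escape_html(T);
escape_html([$" | T]) -> "&quot;" ++ escape_html(T);
escape_html([H | T]) -> [H | escape_html(T)].

For example, escape_html("5 > 4 = true") returns "5 &gt; 4 = true".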

What is the best way to parse a web page in Ruby?

I have been looking at XML and HTML libraries on rubyforge for a simple way to pull data out of a web page. For example if I want to parse a user page on stackoverflow how can I get the data into a usable format?
Say I want to parse my own user page for my current reputation score and badge listing. I tried to convert the source retrieved from my user page into XML, but the conversion failed due to a missing div. I know I could do a string compare and find the text I'm looking for, but there has to be a much better way of doing that.
I want to incorporate this into a simple script that spits out my user data at the command line, and possibly expand it into a GUI application.
Unfortunately, stackoverflow claims to be XML but actually isn't. Hpricot, however, can parse this tag soup into a tree of elements for you.
require 'hpricot'
require 'open-uri'
doc = Hpricot(open("http://stackoverflow.com/users/19990/armin-ronacher"))
reputation = (doc / "td.summaryinfo div.summarycount").text.gsub(/[^\d]+/, "").to_i
And so forth.
Hpricot is over!
Use Nokogiri now.
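The Hpricot snippet above translates roughly to this with Nokogiri (a sketch; the CSS selector assumes the same page markup as the accepted answer):

require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(open("http://stackoverflow.com/users/19990/armin-ronacher"))
# Same selector as in the Hpricot example above.
reputation = doc.css("td.summaryinfo div.summarycount").text.gsub(/[^\d]+/, "").to_i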
Try Hpricot; it's, well... awesome.
I've used it several times for screen scraping.
I always really like what Ilya Grigorik writes, and he wrote up a nice post about using hpricot.
I also read this post a while back and it looks like it would be useful for you.
I haven't done either myself, so YMMV, but these seem pretty useful.
Something I ran into when trying to do this before is that few web pages are well-formed XML documents. Hpricot may be able to deal with that (I haven't used it), but when I was doing a similar project in the past (using Python and its library's built-in parsing functions) it helped to have a pre-processor to clean up the HTML. I used the Python bindings for HTML Tidy for this, and it made life a lot easier. Ruby bindings are here, but I haven't tried them.
Good luck!
It seems to be an old topic, but here is a newer example that gets the reputation:
#!/usr/bin/env ruby
require 'rubygems'
require 'hpricot'
require 'open-uri'

user = "619673/100kg"
html = "http://stackoverflow.com/users/%s?tab=reputation"
page = html % user
puts page

doc = Hpricot(open(page))
pars = Array.new
# Hpricot attribute selectors use @; each_line is needed on modern
# Ruby, where String no longer has #each.
doc.search("div[@class='subheader user-full-tab-header']/h1/span[@class='count']").text.each_line do |p|
  pars << p
end
puts "reputation " + pars[0]