Indenting generated markup in Jekyll/Ruby

Well this is probably kind of a silly question but I'm wondering if there's any way to have the generated markup in Jekyll to preserve the indentation of the Liquid-tag. World doesn't end if it isn't solvable. I'm just curious since I like my code to look tidy, even if compiled. :)
For example I have these two:
base.html:
<body>
    <div id="page">
        {{content}}
    </div>
</body>
index.md:
---
layout: base
---
<div id="recent_articles">
{% for post in site.posts %}
    <div class="article_puff">
        <img src="/resources/images/fancyi.jpg" alt="" />
        <h2>{{post.title}}</h2>
        <p>{{post.description}}</p>
        Read more
    </div>
{% endfor %}
</div>
The problem is that the markup imported through the {{content}} tag is rendered without the indentation used above.
So instead of
<body>
    <div id="page">
        <div id="recent_articles">
            <div class="article_puff">
                <img src="/resources/images/fancyimage.jpg" alt="" />
                <h2>Gettin' down with responsive web design</h2>
                <p>Everyone's talking about it. Your client wants it. You need to code it.</p>
                Read more
            </div>
        </div>
    </div>
</body>
I get
<body>
    <div id="page">
        <div id="recent_articles">
    <div class="article_puff">
        <img src="/resources/images/fancyimage.jpg" alt="" />
        <h2>Gettin' down with responsive web design</h2>
        <p>Everyone's talking about it. Your client wants it. You need to code it.</p>
        Read more
    </div>
</div>
    </div>
</body>
Seems like only the first line is indented correctly. The rest starts at the beginning of the line... So, multiline liquid-templating import? :)

Using a Liquid Filter
I managed to make this work using a liquid filter. There are a few caveats:
Your input must be clean. I had some curly quotes and non-printable chars that looked like whitespace in a few files (copypasta from Word or some such) and was seeing "Invalid byte sequence in UTF-8" as a Jekyll error.
It could break some things. I was using <i class="icon-file"></i> icons from twitter bootstrap. It replaced the empty tag with <i class="icon-file"/> and bootstrap did not like that. Additionally, it screws up the octopress {% codeblock %}s in my content. I didn't really look into why.
While this will clean the output of a liquid variable such as {{ content }} it does not actually solve the problem in the original post, which is to indent the html in context of the surrounding html. This will provide well formatted html, but as a fragment that will not be indented relative to tags above the fragment. If you want to format everything in context, use the Rake task instead of the filter.
require 'rubygems'
require 'json'
require 'nokogiri'
require 'nokogiri-pretty'

module Jekyll
  module PrettyPrintFilter
    def pretty_print(input)
      # seeing some ASCII-8 come in
      input = input.encode("UTF-8")
      # Parsing with nokogiri first cleans up some things the XSLT can't handle
      content = Nokogiri::HTML::DocumentFragment.parse input
      parsed_content = content.to_html
      # Unfortunately nokogiri-pretty can't use DocumentFragments...
      html = Nokogiri::HTML parsed_content
      pretty = html.human
      # ...so now we need to remove the stuff it added to make valid HTML
      output = PrettyPrintFilter.strip_extra_html(pretty)
      output
    end

    def PrettyPrintFilter.strip_extra_html(html)
      # type declaration
      html = html.sub('<?xml version="1.0" encoding="ISO-8859-1"?>','')
      # second <html> tag
      first = true
      html = html.gsub('<html>') do |match|
        if first == true
          first = false
          next
        else
          ''
        end
      end
      # first </html> tag
      html = html.sub('</html>','')
      # second <head> tag
      first = true
      html = html.gsub('<head>') do |match|
        if first == true
          first = false
          next
        else
          ''
        end
      end
      # first </head> tag
      html = html.sub('</head>','')
      # second <body> tag
      first = true
      html = html.gsub('<body>') do |match|
        if first == true
          first = false
          next
        else
          ''
        end
      end
      # first </body> tag
      html = html.sub('</body>','')
      html
    end
  end
end

Liquid::Template.register_filter(Jekyll::PrettyPrintFilter)
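One detail in strip_extra_html may be worth double-checking: inside a gsub block, a bare next returns nil, and gsub substitutes nil as an empty string, so the first match is removed along with the later ones. If the intent is to keep the first tag and drop only the repeats, a sketch along these lines may be closer. remove_after_first is a hypothetical helper name, not part of the filter above.

```ruby
# Hypothetical helper: delete every occurrence of `tag` except the first one.
def remove_after_first(html, tag)
  seen = false
  html.gsub(tag) do |match|
    if seen
      ''        # drop repeated occurrences
    else
      seen = true
      match     # return the match itself so the first occurrence survives
    end
  end
end

# A bare `next` yields nil, which gsub substitutes as an empty string.
puts "aXbXc".gsub("X") { next }
puts remove_after_first("<html><html><p>hi</p>", "<html>")
```

Whether the first tag should survive depends on what nokogiri-pretty emits for your input, so compare both variants against real output before switching.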
Using a Rake task
I use a task in my rakefile to pretty print the output after the jekyll site has been generated.
require 'nokogiri'
require 'nokogiri-pretty'

desc "Pretty print HTML output from Jekyll"
task :pretty_print do
  # change public to _site or wherever your output goes
  html_files = File.join("**", "public", "**", "*.html")
  Dir.glob html_files do |html_file|
    puts "Cleaning #{html_file}"
    contents = File.read(html_file)
    begin
      # we're gonna parse it as XML so we can apply an XSLT
      html = Nokogiri::XML(contents)
      # the human() method is from nokogiri-pretty. Just an XSL transform on the XML.
      pretty_html = html.human
    rescue StandardError => msg
      puts "Failed to pretty print #{html_file}: #{msg}"
      # skip this file so a failed transform doesn't overwrite it with nothing
      next
    end
    # Yep, we're overwriting the file. Potentially destructive.
    File.write(html_file, pretty_html)
  end
end

We can accomplish this by writing a custom Liquid filter to tidy the html, and then doing {{content | tidy }} to include the html.
A quick search suggests that the ruby tidy gem may not be maintained but that nokogiri is the way to go. This will of course mean installing the nokogiri gem.
See advice on writing liquid filters, and Jekyll example filters.
An example might look something like this: in _plugins, add a script called tidy-html.rb containing:
require 'nokogiri'

module TextFilter
  def tidy(input)
    Nokogiri::HTML::DocumentFragment.parse(input).to_html
  end
end

Liquid::Template.register_filter(TextFilter)
(Untested)
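Tidying the fragment does not by itself indent it relative to the layout. A minimal sketch of a filter that does only that, using plain Ruby string handling, might look like the following. The filter name indent and its width argument are assumptions for illustration, and the Liquid registration is guarded so the file also loads outside Jekyll.

```ruby
module IndentFilter
  # Hypothetical filter: pad every line after the first with `width` spaces.
  # The first line is left alone because the whitespace that precedes
  # {{ content }} in the layout already indents it.
  def indent(input, width = 4)
    pad = ' ' * width.to_i
    first, *rest = input.to_s.lines
    return '' if first.nil?
    # Blank lines are passed through unpadded to avoid trailing whitespace.
    first + rest.map { |line| line.strip.empty? ? line : pad + line }.join
  end
end

# Register with Liquid only when it is loaded (for example from _plugins).
Liquid::Template.register_filter(IndentFilter) if defined?(Liquid::Template)
```

In the layout this would be used as {{ content | indent: 8 }}, with the number matching the column where the tag sits. Note that it pads lines inside <pre> blocks too, which changes how they render.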

Related

django tag { % block content % } isn't working

I just started learning Django. I understand the basic block tags, but they aren't working on my page. I have two pages, index.html and question.html.
I wrote this in index.html:
<body>
<div>
<div>
sum content
</div>
<div>
{ % block content % }
{ % endblock % }
</div>
</div>
</body>
and like this in question.html :
{ % extends 'index.html' % }
{ % block content % }
<<my content>>
{ % endblock % }
But the content in question.html didn't show up in index.html. I've checked my settings, and I don't have django-stub installed like in other reported cases.
In case the structure matters, it looks like this:
djangoProject1
>djangoProject1
>myweb
>static
>templates
-index.html
-question.html
this is my views.py
from django.shortcuts import render
from .models import DataResponden  # assumed location of the model

def index(request):
    return render(request, 'index.html')

def question(request):
    return render(request, 'question.html')

def formdata(request):
    nama = request.POST.get("namaa")
    umur = request.POST.get("umur")
    komorbid = request.POST.get("penyakit_bawaan")
    ruang = request.POST.get("ketersediaan_ruang")
    demam = request.POST.get("demam")
    lelah = request.POST.get("lelah")
    batuk = request.POST.get("batuk")
    nyeri = request.POST.get("nyeri")
    tersumbat = request.POST.get("tersumbat")
    pilek = request.POST.get("pilek")
    sakit_kepala = request.POST.get("sakit_kepala")
    tenggorokan = request.POST.get("tenggorokan")
    diare = request.POST.get("diare")
    hilang_cium = request.POST.get("hilang_penciuman")
    ruam = request.POST.get("ruam")
    sesak = request.POST.get("sesak")
    sulit_gerak = request.POST.get("sulit_gerak")
    nyeri_dada = request.POST.get("nyeri_dada")
    # request.POST is not callable; use .get() like the other fields
    hasil_rekomendasi = request.POST.get("hasil_rekomendasi")
    data_resp = DataResponden(nama=nama, umur=umur, penyakit_bawaan=komorbid, ketersediaan_ruang=ruang, demam=demam,
                              lelah=lelah, batuk=batuk, nyeri=nyeri, tersumbat=tersumbat, pilek=pilek,
                              sakit_kepala=sakit_kepala, tenggorokan=tenggorokan, diare=diare,
                              hilang_penciuman=hilang_cium, ruam=ruam, sesak=sesak, sulit_gerak=sulit_gerak,
                              nyeri_dada=nyeri_dada, hasil_rekomendasi=hasil_rekomendasi)
    data_resp.save()
    return render(request, 'question.html')
Thank you in advance!
It looks like you might have some confusion regarding how templates work. index.html is a parent/base template, because it is being extended (through the {% extends 'index.html' %} tag). question.html is a child template, which means if you make no changes, it will inherit everything from index.html.
A child template can override parts of the parent template by using {% block %} tags. So when the webpage is getting rendered, the code from the block in the parent is not used at all (if there was any). When you directly render the parent template, there will be no such replacement since it does not extend anything.
So the rendered HTML for your files should be as follows
index.html
<body>
<div>
<div>
sum content
</div>
<div>
</div>
</div>
</body>
question.html
<body>
<div>
<div>
sum content
</div>
<div>
<<my content>>
</div>
</div>
</body>
So yeah, content in question.html is not supposed to show up in index.html. It works the other way around, with the entire structure of index.html being used for question.html, except for the things you override.
If you want index to have some content by default, you can have code inside the content block. It will be replaced by any child templates if necessary, but when you load just index.html it will still be visible.
If you are actually trying to insert something into index.html, take a look at the include tag. This allows for re-using common sections of the website across webpages. But you would not extend the base template inside any template you are planning to include.
Just remove the spaces in tag blocks everywhere like this:
Change this:
{ % block content % }
To this:
{% block content %}
Similarly with other tag blocks.

Nokogiri HTML Nested Elements Extract Class and Text

I have a basic page structure with elements (span's) nested under other elements (div's and span's). Here's an example:
html = '<html>
<body>
<div class="item">
<div class="profile">
<span class="itemize">
<div class="r12321">Plains</div>
<div class="as124223">Trains</div>
<div class="qwss12311232">Automobiles</div>
</div>
<div class="profile">
<span class="itemize">
<div class="lknoijojkljl98799999">Love</div>
<div class="vssdfsd0809809">First</div>
<div class="awefsaf98098">Sight</div>
</div>
</div>
</body>
</html>'
Notice that the class names are random. Notice also that there is whitespace and tabs in the html.
I want to extract the children and end up with a hash like so:
page = Nokogiri::HTML(html)
itemhash = Hash.new
page.css('div.item div.profile span').map do |divs|
  children = divs.children
  children.each do |child|
    itemhash[child['class']] = child.text
  end
end
Result should be similar to:
{\"r12321\"=>\"Plains\", \"as124223\"=>\"Trains\", \"qwss12311232\"=>\"Automobiles\", \"lknoijojkljl98799999\"=>\"Love\", \"vssdfsd0809809\"=>\"First\", \"awefsaf98098\"=>\"Sight\"}
But I'm ending up with a mess like this:
{nil=>\"\\n\\t\\t\\t\\t\\t\\t\", \"r12321\"=>\"Plains\", nil=>\" \", \"as124223\"=>\"Trains\", \"qwss12311232\"=>\"Automobiles\", nil=>\"\\n\\t\\t\\t\\t\\t\\t\", \"lknoijojkljl98799999\"=>\"Love\", nil=>\" \", \"vssdfsd0809809\"=>\"First\", \"awefsaf98098\"=>\"Sight\"}
This is because of the tabs and whitespace in the HTML. I don't have any control over how the HTML is generated so I'm trying to work around the issue. I've tried noblanks but that's not working. I've also tried gsub but that only destroys my markup.
How can I extract the class and values of these nested elements while cleanly ignoring whitespace and tabs?
P.S. I'm not hung up on Nokogiri - so if another gem can do it better I'm game.
The children method returns all child nodes, including text nodes—even when they are empty.
To only get child elements you could do an explicit XPath query (or possibly the equivalent CSS), e.g.:
children = divs.xpath('./div')
You could also use the element_children method, which would be closer to what you are already doing, and which only returns children that are elements:
children = divs.element_children

Controlling the existence of an attribute

I have a problem with the Slim template engine in a Sinatra project. I have an edit form to be filled when the route is triggered. There is an issue with HTML select option. I need something like this when the edit form is loaded. Notice that Mrs. option is selected:
<select name="person[title]" id="person[title]">
<option value="Mr.">Mr.</option>
<option value="Mrs." selected>Mrs.</option>
</select>
I tried:
option[value="Mrs." "#{person.title == :mrs ? 'selected' : ''}"]
The exception was about an attribute error. Then I tried something like this:
option[value="Mrs." selected="#{person.title == :mrs ? true : false}"]
but then the output was something like this:
<option value="Mrs." selected="false">Mrs.</option>
I guess the string "false" is interpreted as true. That failed. I tried some combinations with round brackets but couldn't get it to work.
How could I set the selected attribute of an option in a select list in Slim?
For an attribute, you can write ruby code after the =, but if the ruby code has spaces in it, you have to put parentheses around the ruby code:
option[value="1" selected=("selected" if @title=="Mrs.")] "Mrs."
See "Ruby attributes" here: http://rdoc.info/gems/slim/frames.
The brackets are optional, so you can also write it like this:
option value="1" selected=("selected" if @title=="Mrs.") "Mrs."
Or, instead of brackets, you can use a different delimiter:
option {value="1" selected=("selected" if @title=="Mrs.")} "Mrs."
Here it is with some code:
slim.slim:
doctype html
html
  head
    title Slim Examples
    meta name="keywords" content="template language"
  body
    h1 Markup examples
    p This example shows you how a basic Slim file looks like.
    select
      option[value="1" selected=("selected" if @title=="Mr.")] "Mr."
      option[value="2" selected=("selected" if @title=="Mrs.")] "Mrs."
Using Slim in a standalone ruby program without rails:
require 'slim'

template = Slim::Template.new(
  "slim.slim",
  pretty: true # pretty print the html
)

class Person
  attr_accessor :title

  def initialize title
    @title = title
  end
end

person = Person.new("Mrs.")
puts template.render(person)
--output:--
<!DOCTYPE html>
<html>
<head>
<title>
Slim Examples
</title>
<meta content="template language" name="keywords" />
</head>
<body>
<h1>
Markup examples
</h1>
<p>
This example shows you how a basic Slim file looks like.
</p>
<select><option value="1">"Mr."</option><option selected="selected" value="2">"Mrs."</option></select>
</body>
</html>
I guess the string "false" is interpreted as true.
Yes. The only things that evaluate to false are false itself and nil. Any number(including 0), any string (including ""), and any array(including []), etc. are all true.
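A quick way to check this in plain Ruby is shown below. truthy? is a hypothetical helper name. It shows why interpolating the condition into a string cannot switch the attribute off: the value is the string "false", not the boolean.

```ruby
# Hypothetical helper: report how Ruby treats a value in a conditional.
def truthy?(value)
  value ? true : false
end

# Only false and nil are falsy; every other sample here is truthy.
samples = [false, nil, 0, "", "false", []]
samples.each { |value| puts "#{value.inspect} -> #{truthy?(value)}" }
```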
Not pertinent to your problem, but perhaps useful to some future searcher...I guess Slim looks up instance variables in whatever object you pass as an argument to render. So if you want to provide a whole bunch of values for the template, you can write:
require 'slim'

template = Slim::Template.new(
  "slim.slim",
  pretty: true # pretty print the html
)

class MyVals
  attr_accessor :count, :title, :animals

  def initialize count, title, animals
    @count = count
    @title = title
    @animals = animals
  end
end

# %w takes whitespace-separated words; commas would become part of each word
vals = MyVals.new(4, "Sir James III", %w[squirrel monkey cobra])
puts template.render(vals)
slim.slim:
doctype html
html
  head
    title Slim Examples
    meta name="keywords" content="template language"
  body
    p =@count
    p =@title
    p =@animals[-1]
Neither OpenStruct nor Struct work with render() even though they seem like natural candidates.

How to output just html image tag when I have text and html in django template output?

I am using this to output,
{{ movie.img }}
and the output I get is something like:
u'<img src="//upload.wikimedia.org/wikipedia/en/thumb/8/8a/Dark_Knight.jpg/220px-Dark_Knight.jpg" alt="" height="327" width="220" >\nTheatrical release poster'
How do I just output the html image part? I don't want the Theatrical release poster to appear in the output.
Since you are getting that as just text, your best solution would be to write a template filter that would strip content not in the <img> html tag.
If the object were an ImageField (or FileField), you could call on the url attribute only: {{ movie.img.url }}
update
Ok, here's a basic, probably too naive template filter for your use.
from django import template
from django.template.defaultfilters import stringfilter
import re

register = template.Library()

@register.filter(is_safe=True)
@stringfilter
def get_img_tag(value):
    # return the first tag found in the string, or the value unchanged if there is none
    result = re.search("<.*?>", value)
    if result:
        return result.group()
    return value
Use:
{{ movie.img|get_img_tag|safe }}

Changing href attributes with nokogiri and ruby on rails

I have an HTML document with links, for example:
<html>
<body>
<ul>
<li>teste1</li>
<li>teste2</li>
<li>teste3</li>
<ul>
</body>
</html>
I want with Ruby on Rails, with nokogiri or some other method, to have a final doc like this:
<html>
<body>
<ul>
<li>teste1</li>
<li>teste2</li>
<li>teste3</li>
<ul>
</body>
</html>
What's the best strategy to achieve this?
If you choose to use Nokogiri, I think this should work:
require 'cgi'
require 'rubygems' rescue nil
require 'nokogiri'
file_path = "your_page.html"
doc = Nokogiri::HTML(File.read(file_path))
# a[href] skips anchors that have no href attribute
doc.css("a[href]").each do |link|
  link["href"] = "http://myproxy.com/?url=#{CGI.escape link["href"]}"
end
# the block form closes the file once the document is written
File.open(file_path, 'w') { |file| doc.write_to(file) }
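To see what the rewritten href looks like before touching any files, the URL-building step can be tried on its own with just the standard library. proxied_href and PROXY_PREFIX are hypothetical names, and the example.com URL is only a sample input.

```ruby
require 'cgi'

# Proxy prefix taken from the answer above.
PROXY_PREFIX = "http://myproxy.com/?url="

# Hypothetical helper: build the proxied form of a single href value.
def proxied_href(href)
  "#{PROXY_PREFIX}#{CGI.escape(href)}"
end

puts proxied_href("http://example.com/a page?x=1")
```

CGI.escape percent-encodes characters such as :, /, ? and =, and turns spaces into +, so the original URL travels as a single query parameter.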
If I'm not mistaken, Rails loads REXML by default; depending on what you're trying to do, you could use that as well.
Here is what I did for replacing images src attributes:
doc = Nokogiri::HTML(html)
doc.xpath("//img").each do |img|
  img.attributes["src"].value = Absolute_asset_path(img.attributes["src"].value)
end
doc.to_html # simply use .to_html to re-convert to html