How do I scrape page content with a generator? - jekyll

I have a blog where each post has a bunch of images and videos. I would like to be able to tag each of them with some keywords, and then populate a set of pages based on the tags. For example going to /photos/car/ would list all images tagged with car.
I include images and videos using a simple plugin right now, that essentially only has a render function. I figure I could add the tags there.
But how can I make Jekyll 'scrape' my pages and populate pages with images?

Short answer is : this is tricky.
But why ?
A custom imgWithTags-tag liquid tag like {% img cookie-monster.png tag2,tag3 %} can do two things :
render the html tag
store tag -> image associations in a site.taggedImage variable
As this tag can be used on any post, page or collection document, the site.taggedImage variable will only be complete after the render process.
Once the render process is finished, we can grab the :site, :post_render hook to process our datas and create Tags pages.
But here, we can no longer rely on things like {% for p in tagPages %}...{% endfor %} to automatically generate links to our pages : rendering is finished.
The trick can be to maintain a link data file by hand, in order to be able to generate links with loops like this {% for p in site.data.tagPages %}...{% endfor %}
Let's give it a try
NOTE : this works only with Jekyll version 3.1.x (not 3.0.x)
_plugins/imgWithTags-tag.rb
module Jekyll
class ImgWithTags < Liquid::Tag
# Custom liquid tag for images with tags
# syntax : {% img path/to/img.jpg coma, separated, tags %}
# tags are optionnal
#
# This plugin does two things :
# it renders html tag <a href='url'><img src='url' /></a>
# it stores tags -> images associations in site.config['images-tags']
Syntax = /^(?<image_src>[^\s]+\.[a-zA-Z0-9]{3,4})\s*(?<tags>[\s\S]+)?$/
def initialize(tag_name, markup, tokens)
super
if markup =~ Syntax then
#url = $1
#tags = $2.split(",").collect!{|tag| tag.strip } if !$2.nil?
else
raise "Image Tag can't read this tag. Try {% img path/to/img.jpg [coma, separated, tags] %}."
end
end
def render(context)
storeImgTags(context) if defined?(#tags) # store datas if we have tags
site = context.registers[:site]
imgTag = "<a href='#{ site.baseurl }/assets/#{#url}'><img src='#{ site.baseurl }/assets/#{#url}' /></a>"
end
def storeImgTags(context)
# store tagged images datas in site config under the key site.config['images-tags']
imagesTags = context.registers[:site].config['images-tags'] || {}
#tags.each{|tag|
slug = Utils.slugify(tag) # My tag -> my-tag
# create a tag.slug entry if it doesn't exist
imagesTags[slug] = imagesTags[slug] || {'name' => tag, 'images' => [] }
# add image url in the tag.images array if the url doesn't already exist
# this avoid duplicates
imagesTags[slug]['images'] |= [#url.to_s]
}
context.registers[:site].config['images-tags'] = imagesTags
end
end
end
Liquid::Template.register_tag('img', Jekyll::ImgWithTags)
At this point : all tagged images links are rendered and all tags->images associations are stored.
_plugins/hook-site-post-render-imagesTagsPagesGenerator.rb
Jekyll::Hooks.register :site, :post_render do |site, payload|
puts "++++++ site post_render hook : generating Images Tags pages"
imagesTags = site.config['images-tags'] # datas stored by img tag
linksDatas = site.data['imagesTagsLinks'] # tags pages link in data file
pagesFolder = 'tag'
imagesTags.each do |slug, datas|
tagName = datas['name']
tagImages = datas['images']
pageDir = File.join(pagesFolder, slug)
tagPage = Jekyll::ImageTagPage.new(site, site.source, pageDir, tagName, tagImages)
# as page rendering has already fired we do it again for our new pages
tagPage.output = Jekyll::Renderer.new(site, tagPage, payload).run
tagPage.trigger_hooks(:post_render)
site.pages << tagPage
# verbose check to see if we reference every tag url in out data file
if !linksDatas.key?(tagName) # check if tag is in imagesTagsLinks data file
puts "Warning ---------> #{tagName} not in data file"
puts "Add : #{tagName}: #{tagPage.url}"
puts "in data/imagesTagsLinks.yml"
puts
else
if tagPage.url != linksDatas[tagName] then # check if url is correct in imagesTagsLinks data file
puts "Warning ---------> incorrect url for '#{tagName}'"
puts "Replace : #{tagName}: #{linksDatas[tagName]}"
puts "by : #{tagName}: #{tagPage.url}"
puts "in data/imagesTagsLinks.yml"
puts
end
end
end
puts "++++++ END site post_render hook"
end
module Jekyll
class ImageTagPage < Page
def initialize(site, base, dir, tag, images)
#site = site
#base = base
#dir = dir
#name = 'index.html'
self.process(#name)
self.read_yaml(File.join(base, '_layouts'), 'tag_index.html')
self.data['tag'] = tag
self.data['images'] = images
self.data['title'] = "Images for tag : #{tag}"
end
end
end
And the Tag layout page
_layouts/tag_index.html
---
layout: default
---
<h2>{{ page.title }}</h2>
{% for i in page.images %}
<img src="{{ site.baseurl }}/assets/{{ i }}" alt="Image tagged {{ page.tag }}">
{% endfor %}
Here, everything is in place to generate tags page.
We can now do a jekyll build and see the verbose output just tell us what to add in _data/imagesTagsLinks.yml
_data/imagesTagsLinks.yml
tag1: /tag/tag1/
tag2: /tag/tag2/
...
We can now link to our tag page from everywhere with a simple
{% for t in site.data.imagesTagsLinks %}
<li>{{ t[0] }}</li>
{% endfor %}
I've told you : it's tricky. But It does the job.
Note: the img tag can be improved, and why not a figure tag ?

Related

How to save HTML tags + data into sqlalchemy?

I am creating a personal blog website with Flask and sqlalchemy.
While posting my blogs, I want the blog to be published with well formatted html.
Here is my model for Blogs:
class Blog(db.Model):
id = db.Column(db.Integer, primary_key=True)
title = db.Column(db.String(), index=True)
description = db.Column(db.Text(), index=True)
content = db.Column(db.Text(), index=True)
timestamp = db.Column(db.DateTime, index=True, default=datetime.utcnow)
likes = db.Column(db.Integer, default=0)
dislikes = db.Column(db.Integer, default=0)
comments = db.relationship('Comment', backref='commented_by', lazy='dynamic')
def __repr__(self):
return 'Title <>'.format(self.title)
And here is my form for adding blog:
{% extends 'base.html' %}
{% block content %}
<div class="container">
<div class="row">
<div class="col-xs-12 col-sm-12 col-md-12 col-lg-12 col-xl-12">
<h1 class="code-line text-center" data-line-start="14" data-line-end="15">Add Blog</h1>
<br>
</div>
</div>
<form action="" method="POST" novalidate>
{{ form.hidden_tag() }}
<p>
{{ form.title.label }}<br>
{{ form.title(size=30) }}<br>
</p>
<p>
{{ form.description.label }}<br>
{{ form.description(size=30) }}<br>
</p>
<p>
{{ form.content.label }}<br>
{{ form.content() }}<br>
</p>
<p>
{{ form.submit() }}
</p>
</form>
{{ ckeditor.load() }}
{{ ckeditor.config(name='content') }}
{% endblock %}
This is how I am rendering my blog:
{% extends 'base.html' %}
{% block content %}
<div class="container">
<div class="row">
<div class="col-xs-12 col-sm-12 col-md-12 col-lg-12 col-xl-12">
<h1 class="code-line text-center" data-line-start="14" data-line-end="15">{{ blog.title }}</h1>
<br>
{{ blog.content }}
</div>
</div>
</div>
{% endblock %}
While adding blog, I am using a text editor
But once it has been posted and I render it on view blog page, no html content is being rendered not even linebreaks
How can I save html content and tags in my sql database and then render it using jinja template?
first, what is wrong:
the text you get from the text field in the form is not the same thing as HTML that renders it, what you are getting is the text.
in case you want to get the HTML generated inthat form, you should integrate a rich text editor, like quilljs.com, or tiny.cloud in your template, that will have a field that you can use, to grab the HTML it generated, and it will also allow you to create nice looking blog articles.
if you do not want this either, to get html from that form, writing HTML directly in that form will give you what you want.
In the context of markdown, it is actually possible to apply the same format to your database-saved content. You can use a few packages to help you work with HTML in your database.
To begin, let me suggest Stackoverflow QA forms. Notice how it has enabled markdown editing and a nice little preview of the text being written. To enable the preview, you can install the flask-pagedown package in your virtual environment.
(venv)$ pip3 install flask-pagedown
Initialize a pagedown object in your application's instance, say in __init__.py file, or whatever other file you are using.
# __init__.py
from flask import Flask
from flask_pagedown import PageDown
app = Flask(__name__)
pagedown = PageDown(app)
Within your head tags in HTML, add this CDN call whose files you do not need to have in your application.
<!-- base.html -->
{% block head %}
{{ pagedown.html_head() }}
{% endblock %}
Or this:
<head>
{{ pagedown.html_head() }}
</head>
If you prefer to use your own JavaScript source files, you can simply include your Converter and Sanitizer files directly in the HTML page instead of calling pagedown.html_head():
<head>
<script type="text/javascript" src="https://mycdn/path/to/converter.min.js"></script>
<script type="text/javascript" src="https://mycdn/path/to/sanitizer.min.js"></script>
</head>
Now, simply update your forms to use PageDownField:
# forms.py
from flask_pagedown.fields import PageDownField
class Post(FlaskForm):
post = PageDownField('Comment', validators=[DataRequired()])
Or this:
<form method="POST">
{{ form.pagedown(rows=10) }}
</form>
That's it! You should be able to have a client-side post preview right below the form.
Handling Rich Text in the Server
When the post request is made, then only raw markdown will be sent to the database and the preview will be discarded. It is a security risk to send HTML content to your database. An attacker can easily construct HTML sequences which don't match the markdown source and submit them, hence the reason why only markdown text is submitted. Once in the server, that text can be converted back to HTML using a Python markdown-to-html convertor. There are two packages that you can make use of. Install then in your virtual environment as seen below:
(venv)$ pip3 install markdown bleach
bleach is used to sanitize the HTML you want converted to allow for a set of tags.
At this point, the next logical step would be to cache your content field content while in the database. This is done by adding a new field, let us say content_html, in your database specifically for this cached content. It is best to leave your content field as it is in case you would want to use it.
# models.py
class Blog(db.Model):
content = db.Column(db.String(140))
content_html = db.Column(db.String(140))
#staticmethod
def on_changed_body(target, value, oldvalue, initiator):
allowed_tags = ['a', 'abbr', 'acronym', 'b', 'blockquote', 'code',
'em', 'i', 'li', 'ol', 'pre', 'strong', 'ul',
'h1', 'h2', 'h3', 'p']
target.content_html = bleach.linkify(bleach.clean(
markdown(value, output_format='html'),
tags=allowed_tags, strip=True))
def __repr__(self):
return f'Title {self.title}'
db.event.listen(Blog.content, 'set', Blog.on_changed_body)
The on_changed_body() function is registered as a listener of SQLAlchemy’s “set” event for body , which means that it will be automatically invoked whenever the body field is set to a new value. The handler function renders the HTML version of the content and stores it in content_html , effectively making the conversion of the Markdown text to HTML fully automatic.
The actual conversion is done in 3 steps:
markdown() function does an initial conversion to HTML. The result is passed to clean() function with a list of approved HTML tags
clean() function removes any tags that are not in the whitelist.
linkify() function from bleach converts any URLs written in plain text into proper <a> links. Automatic link generation is not officially in the Markdown specification, but is a very convenient feature. On the client side, PageDown supports this feature as an optional extension, so linkify() matches that functionality on the server.
In your template, where you want to post your content you can add a condition such as:
{% for blog in blogs %}
{% if blog.content_html %}
{{ blog.content_html | safe }}
{% else %}
{{ blog.content }}
{% endif %}
{% endfor %}
The | safe suffix when rendering the HTML body is there to tell Jinja2 not to escape the HTML elements. Jinja2 escapes all template variables by default as a security measure, but the Markdown-generated HTML was generated by the server, so it is safe to render directly as HTML.

DJANGO template tags in plain text not displaying

I am making an app that displays questions. The question model has a text field and an image field. Each question has a template that is stored in my database in the text field. My problem is when I want to access images from the model, template tags are displayed as text and not rendering. My code:
# question model
class Question(models.Model):
question_text = models.TextField()
question_image = models.FileField(upload_to='static/images/questions', blank=true)
# question view
def question(request, question_id):
question = get_object_or_404(Question, pk=question_id)
return render(request, 'questiontemplate.html', {'question': question})
# template
{% extends 'base.html %}
{% load static %}
{% autoscape off %}
{{ question.question_text }}
{% endautoscape %}
# in my database:
question.question_text = '<p> some html
{{ question.question_image.url }}
some html </p>'
question.question_image = 'image.png'
This works fine and renders the html perfectly except the template tag is not rendered and does not not give the image url
I want this to be the output:
Some html
static/images/questions/image.png
some html
But instead this is the output:
some html
{{ question.question_image.url }}
some html
Any suggestions to how the template tags could be render from the database text would be much appreciated.
Thanks for reading
Django doesn't know that the content in your model field is itself a model. The template can't know that. The only way to make this work is to treat that field itself as a template, and render it manually.
You could do that with a method on the model:
from django.template import Template, Context
class Question(models.Model):
...
def render_question(self):
template = Template(self.question_text)
context = Context({'question': self})
rendered = template.render(context)
return mark_safe(rendered)
Now you can call it in your template:
{{ question.render_question }}

build an includable template as string [django]

I want to build a list of clickable links for my nav, and because these are links to my site, I want to use the url-tag. I get a list of dictonaries which knows all names for the links and create a template string with them with this function:
def get_includable_template(links):
output = '<ul>'
for link in links:
output = output + '<li><a href="{% url' + get_as_link(link) + '%}>" + link['shown'] + '</a></li>'
output = output + '</ul>
links looks like this:
links = [
{'app': 'app1', 'view': 'index', 'shown': 'click here to move to the index'},
{'app': 'app2', 'view': 'someview', 'shown': 'Click!'}
]
get_as_link(link) looks like this:
def get_as_link(link):
return "'" + link['app'] + ':' + link['view'] + "'"
The first method will return a template, which looks like this (but it's all in the same code line):
<ul>
<li>click here to move to the index</li>'
<li>Click!</li>
</ul>
I want this to be interpreted as template and included to another template.
But how to include this?
Let's say, my other template looks like this:
{% extends 'base.html' %}
{% block title %}App 1 | Home{% endblock %}
{% block nav %}INCLUDE THE TEMPLATE HERE{% endblock %}
{% block content %}...{% endblock %}
What I have already tried:
make the template string a variable - doesn't work, because it doesn't interpret template language in variables (I couldn't find a template tag similar to safe which not only interprets HTML code but also template code.
Building HTMl code in my methods (Isn't best-practice at all, because I needed to use absolute paths)
Is there a good solution about this?
You are making this more complicated than it needs to be.
Firstly, it seems that the only reason you need this to be interpreted as a template is so that it parses the url tag. But there is already a way of creating links in Python code, and that is the reverse() function. You should use that instead.
Secondly, the way to dynamically generate content for use inside a template is to use a custom template tag.

Unable to link blog post to its content page in Wagtail

I'm having a problem creating a link of a Blog Post to its own content page in wagtail. In my models I have two page classes, BlogPage and IndexPage. My BlogPage class is used to create the blog post, and IndexPage class is used to display a list of blog posts.
Please see models below:
from django.db import models
from modelcluster.fields import ParentalKey
from wagtail.wagtailcore.models import Page, Orderable
from wagtail.wagtailcore.fields import RichTextField
from wagtail.wagtailadmin.edit_handlers import FieldPanel, MultiFieldPanel, InlinePanel
from wagtail.wagtailimages.edit_handlers import ImageChooserPanel
from wagtail.wagtailsearch import index
class IndexPage(Page):
intro = RichTextField(blank=True)
def child_pages(self):
return BlogPage.objects.live()
content_panels = Page.content_panels + [
FieldPanel('intro', classname='full'),
]
subpage_types = ['blog.BlogPage']
class BlogPage(Page):
date = models.DateField("Post date")
intro = models.CharField(max_length=250)
body = RichTextField(blank=True)
search_fields = Page.search_fields + (
index.SearchField('intro'),
index.SearchField('body'),
)
content_panels = Page.content_panels + [
FieldPanel('date'),
FieldPanel('intro'),
FieldPanel('body', classname="full")
]
My challenge is that I can't figure out how to link the blog post on the Index Page to its own page. Do I need to create a separate page model and html template to achieve this? or what could be the best approach to solve this problem?
You can create an include template (it doesn't need a model) - let's name it truncated_blog_post.html - which you can then invoke in your index_page.html template. This would be the recommended approach because using a include template for a post gives the possibility to use it anywhere you need to display a list of (truncated usually) posts: when you want the posts under a certain tag, for example.
truncated_blog_post.html
{% load wagtailcore_tags %}
<article>
<h2>{{ blog.title }}</h2>
<p>{{ blog.date }}</p>
<p>{{ blog.body|truncatewords:40 }}</p>
</article>
Using the pageurl tag from wagtailcore_tags you get the relative URL of that blog post. Obviously, if you don't want to create a include template for a truncated post, you can put the article code from blog_post.html directly in the for loop in the index_page.html template.
And your index_page.html template:
....
{% for blog in blogs %}
{% include "path/to/includes/truncated_blog_post.html" %}
{% empty %}
No posts found
{% endfor %}
....
For this to work you have to modify the IndexPage model:
class IndexPage(Page):
intro = RichTextField(blank=True)
#property
def blogs(self):
blogs = BlogPage.objects.live()
return blogs
def get_context(self, request):
# Get blogs
blogs = self.blogs
# Update template context
context = super(IndexPage, self).get_context(request)
context['blogs'] = blogs
return context
content_panels = Page.content_panels + [
FieldPanel('intro', classname='full'),
]
subpage_types = ['blog.BlogPage']

In Jekyll, how to set default tab size in highlighted code?

The tab size of highlighted code can be set by {% highlight c tabsize=4 %}. But this should be set every code blocks. Is there a method to set the default tab size?
# _plugins/pygments_options.rb
class Jekyll::Tags::HighlightBlock
old_sanitized_opts = instance_method(:sanitized_opts)
define_method(:sanitized_opts) do |*args|
old_sanitized_opts.bind(self).(*args).
merge(Jekyll.configuration.fetch("pygments_options", {}))
end
end
and
# _config.yml
# …
pygments_options:
tabsize: 4
{% highlight c tabsize=4 %} is the only way to pass arguments to highlight tag.