By default, the Django admin strips away all HTML tags from user input. I'd like to allow a small subset of tags, say <a>. What's the easiest way to do this? I know about allow_tags, but it's deprecated. I also want to be careful about manually marking strings as safe that aren't.
If an external library isn't a burden for you, try django-bleach; it should meet your requirement. It returns valid HTML that contains only your specified allowed tags.
Configuration:
In settings.py:
BLEACH_ALLOWED_TAGS = ['p', 'b', 'i', 'u', 'em', 'strong', 'a']
BLEACH_ALLOWED_ATTRIBUTES = ['href', 'title', 'style']
BLEACH_STRIP_TAGS = True
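To illustrate what these settings do, here is a rough sketch using the bleach API directly (django-bleach essentially applies the same cleaning under the hood):

import bleach

# With strip enabled (what BLEACH_STRIP_TAGS controls), disallowed tags
# are removed and their inner text is kept; without it, they are escaped.
cleaned = bleach.clean('<p>Hi <b>there</b></p>', tags={'b'}, strip=True)
# cleaned == 'Hi <b>there</b>'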
Use cases:
1. In your models:
from django.db import models

from django_bleach.models import BleachField

class Post(models.Model):
    title = models.CharField(max_length=255)  # CharField requires max_length
    content = BleachField()
2. In your forms:
from django import forms

from django_bleach.forms import BleachField

from .models import Post

class PostForm(forms.ModelForm):
    content = BleachField()

    class Meta:
        model = Post
        fields = ['title', 'content']
3. In your templates:
{% load bleach_tags %}
{{ unsafe_html|bleach }}
For more usage examples, I suggest reading the django-bleach documentation. It's quite easy and straightforward.
You can use format_html() or mark_safe() in place of allow_tags. Although, as you said, mark_safe() probably isn't a good idea for user input.
format_html(): https://docs.djangoproject.com/en/1.9/ref/utils/#django.utils.html.format_html
mark_safe(): https://docs.djangoproject.com/en/1.9/ref/utils/#django.utils.safestring.mark_safe
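For example, where you previously used allow_tags on an admin method, a minimal sketch with format_html() (the Post fields here are hypothetical):

from django.contrib import admin
from django.utils.html import format_html

class PostAdmin(admin.ModelAdmin):
    list_display = ('title', 'link')

    def link(self, obj):
        # format_html() escapes its arguments, so the user-controlled
        # obj.url and obj.title cannot inject markup; only the literal
        # <a> template string is treated as safe.
        return format_html('<a href="{}">{}</a>', obj.url, obj.title)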
Related
I have the following use case: a user should be able to enter HTML input, and it should be displayed as such. However, it may only contain <br>, <italic>, <strong>, <ul> or <li> tags.
I know about the safe filter, but that would allow any HTML input and be prone to XSS.
Any idea how I can solve this?
Thanks!
As mentioned in the answer to this question, one can use bleach.
Start by defining a list of the tags you want to allow, overriding the default ALLOWED_TAGS:

import bleach

ALLOWED_TAGS = ['br', 'italic', 'strong', 'ul', 'li']

Then use bleach.clean() to remove any other HTML tags that are not allowed. Note that by default bleach escapes disallowed tags rather than removing them, so pass strip=True:

user_input = '<p>an <strong>example</strong> for SO</p>'
cleaned_user_input = bleach.clean(user_input, tags=ALLOWED_TAGS, strip=True)
# cleaned_user_input == 'an <strong>example</strong> for SO'

This will remove the p tag from the user_input.
We can make a validator that only allows certain tags, for example with BeautifulSoup:
from bs4 import BeautifulSoup
from bs4.element import Tag
from django.core.exceptions import ValidationError
from django.utils.deconstruct import deconstructible


@deconstructible
class HtmlValidator:
    def __init__(self, tags=()):
        self.tags = tags

    def validate(self, node):
        if isinstance(node, Tag):
            if node.name not in self.tags:
                raise ValidationError(f'Tag {node.name} is not a valid tag')
            for child in node:
                self.validate(child)

    def __call__(self, value):
        soup = BeautifulSoup(value, 'html.parser')
        for child in soup:
            self.validate(child)
Then we can add such a validator to the model:
class MyModel(models.Model):
    content = models.CharField(
        max_length=1024,
        validators=[HtmlValidator(tags={'br', 'italic', 'strong', 'ul', 'li'})],
    )
    # …
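As a quick illustration of the validator on its own (hypothetical inputs):

validator = HtmlValidator(tags={'br', 'strong'})
validator('hello <strong>world</strong>')  # passes silently
validator('<p>hello</p>')  # raises ValidationError: Tag p is not a valid tag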
I am new to Django and I know there are many Stack Overflow questions and answers related to this topic, but none of the solutions seem to work for me. I am trying to override Django's default error messages.
Here are a few of the solutions I tried:
class MyForm(forms.ModelForm):
    class Meta:
        error_messages = {
            'first_name': {
                'required': _("First name is required."),
            },
        }
I also tried this:
class MyRequest(models.Model):
    first_name = models.CharField(
        max_length=254,
        blank=False,
        error_messages={
            'blank': 'my required msg..',
        },
    )
Is there anything I need to do on the template side?
Override the field like this:

first_name = forms.CharField(error_messages={'required': _("First name is required.")})

NB: if the error is for the required condition, adding a message for blank or null won't cut it. Also, in your form's Meta, don't forget to add your model.
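Putting that together, a minimal sketch of the corrected form (the model and field names are taken from the question):

from django import forms
from django.utils.translation import gettext_lazy as _

from .models import MyRequest

class MyForm(forms.ModelForm):
    first_name = forms.CharField(
        error_messages={'required': _("First name is required.")},
    )

    class Meta:
        model = MyRequest
        fields = ['first_name']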
I suggest the following (in general, the required attribute is assigned to a form field by default):

class MyForm(forms.ModelForm):
    # we want to add or overwrite this field
    first_name = forms.CharField(label="First Name", required=True,
                                 help_text="Required: Enter your first name")

    class Meta:
        # If the form is related to some model, add the model name here
        model = ModelNameGoesHere
I don't know how to implement this with django crispy forms.
I have an interface with a URL like this:
myurl.com/movements/new
And I have a select in the form with the type of movement.
When there is no type of movement explicitly assigned, it just shows the select without any option selected.
When the user accesses the form with a URL like myurl.com/movements/income/,
I want this select to default to the income option.
And so on with every possible option.
I know that I can use JavaScript for this, but I think that it would be better to have it on the back-end.
How can I achieve this on the back-end part?
models.py:
class MyModel(models.Model):
    CHOICES = (
        ('Income', 'Income'),
        ('Option2', 'Option2'),
        ('Option3', 'Option3'),
    )

    choice = models.CharField(max_length=25, choices=CHOICES)
urls.py:
urlpatterns = [
    url(
        regex=r'^new/(?P<option>[\w.@+-]+)/$',  # feel free to adjust the regex
        view=views.NewCreateView.as_view(),
        name='new'
    ),
    url(
        regex=r'^new/$',
        view=views.NewCreateView.as_view(),
        name='new'
    ),
]
views.py:
class NewCreateView(CreateView):
    model = MyModel
    fields = ['choice']

    def get_form_kwargs(self):
        form_kwargs = super().get_form_kwargs()
        if 'option' in self.kwargs:
            if any(self.kwargs['option'] in choice for choice in MyModel.CHOICES):
                form_kwargs['initial']['choice'] = self.kwargs['option']
        return form_kwargs
The drop-down gets an initial selection only if you visit the URL with a valid option, e.g. new/Income rather than new/. Of course, you can adjust the URLs according to your needs.
You could also override get_initial instead of get_form_kwargs, as sketched below.
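A rough equivalent with get_initial (same model and URL assumptions as above):

class NewCreateView(CreateView):
    model = MyModel
    fields = ['choice']

    def get_initial(self):
        initial = super().get_initial()
        option = self.kwargs.get('option')
        # only pre-select values that actually appear in the choices
        if option and any(option in choice for choice in MyModel.CHOICES):
            initial['choice'] = option
        return initial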
I currently have this class for scraping products from a single retailer website using Nokogiri. The XPath and CSS path details are stored in MySQL.
require 'active_record'
require 'nokogiri'
require 'open-uri'

ActiveRecord::Base.establish_connection(
  :adapter => "mysql2",
  ...
)

class Site < ActiveRecord::Base
  has_many :site_details

  def create_product_links
    # http://www.example.com
    p = Nokogiri::HTML(open(url))
    p.xpath(total_products_path).each { |lnk| SiteDetail.find_or_create_by(url: url + "/" + lnk['href'], site_id: self.id) }
  end
end

class SiteDetail < ActiveRecord::Base
  belongs_to :site

  def get_product_data
    # http://www.example.com
    p = Nokogiri::HTML(open(url))
    title = p.css(site.title_path).text
    price = p.css(site.price_path).text
    description = p.css(site.description_path).text
    update_attributes!(title: title, price: price, description: description)
  end
end
# Execution
# s = Site.first
# s.site_details.each(&:get_product_data)
I will be adding more sites (around 700) in the future. Each site has a different page structure, so the get_product_data method cannot be used as-is. I may have to use a case or if statement to jump to and execute the relevant code. Soon this class becomes quite chunky and ugly (700 retailers).
What is the best design approach for this scenario?
Like @James Woodward said, you're going to want to create a class for each retailer. The pattern I'm going to post has three parts:
A couple of ActiveRecord classes that implement a common interface for storing the data you want to record from each site
700 different classes, one for each site you want to scrape. These classes implement the algorithms for scraping the sites, but don't know how to store the information in the database. To do that, they rely on the common interface from step 1.
One final class that ties it all together running each of the scraping algorithms you wrote in step 2.
Step 1: ActiveRecord Interface
This step is pretty easy. You already have a Site and SiteDetail class. You can keep them for storing the data you scrape from websites in your database.
You told the Site and SiteDetail classes how to scrape data from websites. I would argue this is inappropriate. Now you've given the classes two responsibilities:
Persist data in the database
Scrape data from the websites
We'll create new classes to handle the scraping responsibility in the second step. For now, you can strip down the Site and SiteDetail classes so that they only act as database records:
class Site < ActiveRecord::Base
  has_many :site_details
end

class SiteDetail < ActiveRecord::Base
  belongs_to :site
end
Step 2: Implement Scrapers
Now, we'll create new classes that handle the scraping responsibility. If this were a language that supported abstract classes or interfaces like Java or C#, we would proceed like so:
Create an IScraper or AbstractScraper interface that handles the tasks common to scraping a website.
Implement a different FooScraper class for each of the sites you want to scrape, each one inheriting from AbstractScraper or implementing IScraper.
Ruby doesn't have abstract classes, though. What it does have is duck typing and module mix-ins. This means we'll use this very similar pattern:
Create a SiteScraper module that handles the tasks common to scraping a website. This module will assume that the classes that extend it have certain methods it can call.
Implement a different FooScraper class for each of the sites you want to scrape, each one mixing in the SiteScraper module and implementing the methods the module expects.
It looks like this:
module SiteScraper
  # Assumes that classes including the module have
  # get_product_urls and get_product_details methods.
  #
  # The get_product_urls method should return a list
  # of the URLs to visit to get scraped data.
  #
  # The get_product_details method takes the URL of the
  # product to scrape as a string and returns a SiteDetail
  # with data scraped from the given URL.
  def get_data
    site = Site.new
    product_urls = get_product_urls

    product_urls.each do |product_url|
      site_detail = get_product_details product_url
      site_detail.site = site
      site_detail.save
    end
  end
end
class ExampleScraper
  include SiteScraper

  def get_product_urls
    urls = []
    p = Nokogiri::HTML(open('http://www.example.com/products'))
    p.xpath('//products').each { |lnk| urls.push lnk }
    urls
  end

  def get_product_details(product_url)
    p = Nokogiri::HTML(open(product_url))
    title = p.css('//title').text
    price = p.css('//price').text
    description = p.css('//description').text

    site_detail = SiteDetail.new
    site_detail.title = title
    site_detail.price = price
    site_detail.description = description
    site_detail
  end
end
class FooBarScraper
  include SiteScraper

  def get_product_urls
    urls = []
    p = Nokogiri::HTML(open('http://www.foobar.com/foobars'))
    p.xpath('//foo/bar').each { |lnk| urls.push lnk }
    urls
  end

  def get_product_details(product_url)
    p = Nokogiri::HTML(open(product_url))
    title = p.css('//foo').text
    price = p.css('//bar').text
    description = p.css('//foo/bar/iption').text

    site_detail = SiteDetail.new
    site_detail.title = title
    site_detail.price = price
    site_detail.description = description
    site_detail
  end
end
... and so on, creating a class that mixes in SiteScraper and implements get_product_urls and get_product_details for each one of the 700 websites you need to scrape. Unfortunately, this is the tedious part of the pattern: there's no real way to get around writing a different scraping algorithm for all 700 sites.
Step 3: Run Each Scraper
The final step is to create the cron job that scrapes the sites (this example uses the whenever gem's schedule syntax):
every :day, at: '12:00am' do
  ExampleScraper.new.get_data
  FooBarScraper.new.get_data
  # + 698 more lines
end
I'm using both django-taggit and django-filter in my web application, which stores legal decisions. My main view (below) inherits from the stock django-filter FilterView and allows people to filter the decisions by both statutes and parts of statutes.
class DecisionListView(FilterView):
    context_object_name = "decision_list"
    filterset_class = DecisionFilter
    queryset = Decision.objects.select_related().all()

    def get_context_data(self, **kwargs):
        # Call the base implementation to get a context
        context = super(DecisionListView, self).get_context_data(**kwargs)
        # Add in querysets for all the statutes
        context['statutes'] = Statute.objects.select_related().all()
        context['tags'] = Decision.tags.most_common().distinct()
        return context
I also tag decisions by topic when they're added and I'd like people to be able to filter on that too. I currently have the following in models.py:
class Decision(models.Model):
    citation = models.CharField(max_length=100)
    decision_making_body = models.ForeignKey(DecisionMakingBody)
    statute = models.ForeignKey(Statute)
    paragraph = models.ForeignKey(Paragraph)
    ...
    tags = TaggableManager()

class DecisionFilter(django_filters.FilterSet):
    class Meta:
        model = Decision
        fields = ['statute', 'paragraph']
I tried adding 'tags' to the fields list in DecisionFilter but that had no effect, presumably because a TaggableManager is a Manager rather than a field in the database. I haven't found anything in the docs for either app that covers this. Is it possible to filter on taggit tags?
You should be able to use 'tags__name' as the search/filter field. Check out the Filtering section on http://django-taggit.readthedocs.org/en/latest/api.html#filtering
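A sketch of how that could look in the FilterSet (declared filters are included automatically, so it doesn't need to appear in Meta.fields; field_name= is the django-filter 2.x spelling, older releases use name=, and the iexact lookup is just one choice):

import django_filters

class DecisionFilter(django_filters.FilterSet):
    # filters decisions whose tag names match the given value
    tags = django_filters.CharFilter(field_name='tags__name', lookup_expr='iexact')

    class Meta:
        model = Decision
        fields = ['statute', 'paragraph']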