Strip html from string Ruby on Rails

I'm working with Ruby on Rails. Is there a way to strip HTML from a string using sanitize or an equivalent method, keeping only the text inside the value attribute of an input tag?

If you want to do this in a model, call the full sanitizer directly:
ActionView::Base.full_sanitizer.sanitize(html_string)
This is the code behind the strip_tags helper.

There's a strip_tags method in ActionView::Helpers::SanitizeHelper:
http://api.rubyonrails.org/classes/ActionView/Helpers/SanitizeHelper.html#method-i-strip_tags
Edit: to get the text inside the value attribute, you could use something like Nokogiri with an XPath expression to pull it out of the string.
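For simple, well-formed markup, the value attributes can also be collected with a regular expression from the standard library. The sketch below swaps in that technique instead of Nokogiri; the helper name input_values and the sample markup are assumptions made for illustration, and a real HTML parser is more robust against unusual quoting.

```ruby
# Hypothetical helper: collect the value="..." attribute of every <input> tag.
# Regex-based and deliberately simple; it only handles double-quoted values.
def input_values(html)
  html.scan(/<input\b[^>]*?\svalue="([^"]*)"/i).flatten
end

sample = '<form><label>Name</label><input type="text" name="q" value="hello world">' \
         '<input type="hidden" value="42"></form>'

# Prints the two values found in the sample markup.
p input_values(sample)
```

Everything outside the value attributes (tags, labels, other attributes) is simply ignored, which matches the "keep only the value text" requirement in the question.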

Yes, call this: sanitize(html_string, tags:[])

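ActionView::Base.full_sanitizer.sanitize(html_string)
The full sanitizer strips every tag. To keep an allow list of tags and attributes, use the safe list sanitizer instead, as below:
ActionView::Base.safe_list_sanitizer.sanitize(html_string, :tags => %w(img br p), :attributes => %w(src style))
The statement above allows the tags img, br and p and the attributes src and style. On older Rails versions the same sanitizer is exposed as white_list_sanitizer.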

I've used the Loofah library, as it is suitable for both HTML and XML (both documents and string fragments). It is the engine behind the rails-html-sanitizer gem. I'm simply pasting the code example to show how simple it is to use.
Loofah Gem
unsafe_html = "ohai! <div>div is safe</div> <script>but script is not</script>"
doc = Loofah.fragment(unsafe_html).scrub!(:prune) # :prune removes unsafe tags together with their contents
doc.to_s # => "ohai! <div>div is safe</div> "
doc.text # => "ohai! div is safe "

How about this?
white_list_sanitizer = Rails::Html::WhiteListSanitizer.new
WHITELIST = ['p','b','h1','h2','h3','h4','h5','h6','li','ul','ol','small','i','u']
[Your, Models, Here].each do |klass|
  klass.all.each do |ob|
    klass.attribute_names.each do |attrs|
      if ob.send(attrs).is_a? String
        ob.send("#{attrs}=", white_list_sanitizer.sanitize(ob.send(attrs), tags: WHITELIST, attributes: %w(id style)).gsub(/<p>\s*<\/p>\r\n/im, ''))
        ob.save
      end
    end
  end
end

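If you want to remove all HTML tags with a plain regex, you can use
html.gsub(/<[^>]*>/, '')
Note that this only removes the tag markup itself. It does not decode entities, and it is not a safe sanitizer for untrusted input.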
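A minimal sketch of the regex approach, using only the standard library. The helper name naive_strip_tags is an assumption for illustration; the entity decoding step via CGI.unescapeHTML is an addition that the answer above does not include.

```ruby
require "cgi"

# Hypothetical helper: a naive regex-based tag stripper, not a sanitizer.
def naive_strip_tags(html)
  text = html.gsub(/<[^>]*>/, "") # drop anything that looks like a tag
  CGI.unescapeHTML(text)          # turn &amp;, &lt; and friends back into characters
end

puts naive_strip_tags("<p>Fish &amp; <b>chips</b></p>")

# Limitation: only the tags are removed, so script contents survive as text.
puts naive_strip_tags("<script>alert(1)</script>")
```

The second call shows why this is fine for tidying trusted markup but should not replace strip_tags or Loofah when the input is untrusted.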

This works for me in Rails 6.1.3 (Haml):
.errors-description
  = sanitize(message, tags: %w[div span strong], attributes: %w[class])

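You can call .to_plain_text (an Action Text method, so it applies to rich text content rather than a plain String):
# @my_string holds the rich text <p>My HTML String</p>
@my_string.to_plain_text
# => "My HTML String"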

Related

Ruby - unable to write using nokogiri

I am searching for a <div> element by its class name and want to add it next to another <div> element. The following code does not write out the data I get from doc1.search.
require 'nokogiri'
doc1 = Nokogiri::HTML(File.open("overview.html"))
affixButtons = doc1.search('div.margin-0-top-lg.margin-10-bottom-lg.text-center')
doc1.at('div.leftnav-btn').add_next_sibling(affixButtons)
Can someone suggest what I'm missing?

Your code works just fine. If you just want to write the edited document to a file, use File.open as follows:
require 'nokogiri'
doc1 = Nokogiri::HTML(File.open("overview.html"))
affixButtons = doc1.search('div.margin-0-top-lg.margin-10-bottom-lg.text-center')
doc1.at('div.leftnav-btn').add_next_sibling(affixButtons)
File.open('output.html', 'w') {|f| f.write(doc1.to_html)}

You're not saving the resulting HTML to a file. You can do it like so:
File.open("result.html", "w"){|f| f.write(doc1.to_html)}

Ruby HTML attributes to Hash or Array

type="checkbox" name="prdCdList" value="102001174" class="bnone" newfl="Y" cpnfl="N" catcpnfl="N" eventfl="N" catcd1="102000" catcd2="102001" prdimgl="/upload/product/320_1405497216907.jpg" prdnm="Dear my volume" prdvol="3.4g" prdlndesc="Limited Pink" selprc="10000" spsalprc="0" cpnprc="0" cashptrat="0" cashpt="0" discpt="0" salstatcdnm="Available" salstatcd="PS01" prdwidth="0" prdheight="0" prddepth="0" pricestr="" price="10000" prepromote="" endpromote=""
I am currently using a bunch of regexes to parse the data above into a structured array or hash.
The actual tag includes many more values. I thought there must be a better way in Ruby, like using split or something? There are spaces between attributes but also within certain values, so...
Can anyone suggest a good way to handle this type of string?
I would like the result to be:
hash = {
  "type" => "checkbox",
  "name" => "prdCdList",
  ... and so on.
}
or
arr = [
  "checkbox",
  "prdCdList",
  ... and so on.
]
Would appreciate any advice =]
Thanks,

node.attributes.each_with_object({}) {|(k,v), acc| acc[k] = v.value }
where node is your tag.
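When the input is only the bare attribute string rather than a full tag, a single String#scan call can build the same hash without Nokogiri. This is a sketch under the assumption that every value is double-quoted and contains no embedded double quotes; the helper name attributes_to_hash and the shortened sample string are made up for illustration.

```ruby
# Hypothetical helper: parse a flat run of key="value" pairs into a Hash.
# Assumes double-quoted values with no embedded double quotes.
def attributes_to_hash(attribute_string)
  attribute_string.scan(/([\w-]+)="([^"]*)"/).to_h
end

attrs = attributes_to_hash('type="checkbox" name="prdCdList" prdnm="Dear my volume" pricestr=""')

p attrs["prdnm"] # a value containing spaces is kept whole
p attrs.values   # the array form asked for in the question
```

Because the pattern matches on the quotes rather than on whitespace, spaces inside values do not split an attribute.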

Using Nokogiri, the attributes are already parsed for you - simply access them using []:
doc = Nokogiri::HTML.parse('<html><body><div type="checkbox" name="prdCdList" value="102001174" class="bnone" newfl="Y" cpnfl="N" catcpnfl="N" eventfl="N" catcd1="102000" catcd2="102001" prdimgl="/upload/product/320_1405497216907.jpg" prdnm="Dear my volume" prdvol="3.4g" prdlndesc="Limited Pink" selprc="10000" spsalprc="0" cpnprc="0" cashptrat="0" cashpt="0" discpt="0" salstatcdnm="Available" salstatcd="PS01" prdwidth="0" prdheight="0" prddepth="0" pricestr="" price="10000" prepromote="" endpromote=""></body></html>')
div = doc.css('div').first
div['prdnm']
# => "Dear my volume"
From the documentation:
Nokogiri::XML::Node is your window to the fun filled world of dealing
with XML and HTML tags. A Nokogiri::XML::Node may be treated similarly
to a hash with regard to attributes. For example (from irb):
irb(main):004:0> node
=> <a href="#foo" id="link">link</a>
irb(main):005:0> node['href']
=> "#foo"
irb(main):006:0> node.keys
=> ["href", "id"]
irb(main):007:0> node.values
=> ["#foo", "link"]
irb(main):008:0> node['class'] = 'green'
=> "green"
irb(main):009:0> node
=> <a href="#foo" id="link" class="green">link</a>
irb(main):010:0>
See Nokogiri::XML::Node#[] and Nokogiri::XML::Node#[]= for more information.

show a smarty variable with html content

I have a Smarty variable with HTML content in it, like:
$html="<strong>Content</strong><br/>etc etc"
I am trying to show it HTML-formatted. When I output it like
{$html}
only plain text appears, without formatting. I tried:
{$html|unescape}
but then the tags are shown rather than applied. Do you have any suggestions?

Interestingly, none of the answers here work with Smarty 3.1.21 on CS-Cart 4.3.4. So, just to add another thought in that circumstance, use the nofilter on the $html string like so:
{$html nofilter}

You should try this:
{$html|unescape:'html'}
Also check manual:
http://www.smarty.net/docs/en/language.modifier.unescape.tpl

You can try this:
{$html|unescape: "html" nofilter}

Use {$html|unescape: "html" nofilter}
This is based on the answers from Sim1-81 and ρяσѕρєя K. I want to explain why that code works.
The unescape:"html" modifier decodes entities in the string back into characters, which keeps special characters such as "€" in place (Docs).
The "nofilter" flag disables $escape_html for this variable, which means the variable is no longer wrapped with htmlspecialchars() (Docs).
Their solution helped because my case was to display a templated block of HTML passed in from a variable.

In some versions of Smarty, unescape is not available. If that is the case, try using escape:'htmlentitydecode'.
{$html|escape:'htmlentitydecode'}

For those using Smarty 2.x, where the unescape modifier is not available, try this instead:
{$html|html_entity_decode}

You can try a PHP function:
function html($str) {
    // Map entities back to the characters they stand for.
    $arr = array(
        "&lt;" => "<",
        "&gt;" => ">",
        "&quot;" => '"',
        "&amp;" => "&",
        "&#92;" => chr(92),
        "&#39" => chr(39),
        "&#039;" => chr(39)
    );
    return nl2br(strtr($str, $arr));
}
In the Smarty template, call:
{html({$html})}
Or, without a PHP function, using only Smarty:
{$html|unescape:'htmlall'}
Note: if the template uses a CSS reset, try removing it and test again.

HAML - parameter with dash

How can I convert this line
<body data-spy="abcd">
to HAML syntax?
This one returns me an error
%body{:data-spy => "abcd"}

HAML Syntax for the HTML5 Data Field:
%div{ :data => {:id => '555'} }
Now, I started messing around, and it looks like this nested form only works with "data"; other hyphenated attributes need to be written as quoted strings:
%div{ "star-datas" => "hello!" }
Your example:
%body{:data => { :spy => 'abcd'}}

I don't know why I didn't post this in the first place. The "correct" way to write your tag, <body data-spy="abcd">, in HAML, is to skip the {} entirely and use ():
%body(data-spy="abcd")
If you're not evaluating the values of the attributes as Ruby, you shouldn't be using {:key => value} syntax at all. Stick to (key="value") for static HTML attributes.
Original answer:
HAML has a specific syntax for working with data attributes which CrazyVipa's answer summarizes nicely.
For the sake of completeness, I'll point out that you can also use quoted symbol syntax, both here and anywhere else in Ruby that you want to use a hyphen in a symbol:
%body{ :"data-spy" => "abcd" }
In general, :"text" is equivalent to "text".to_sym, allowing your symbol to contain characters it normally couldn't due to parser limitations. The following are all valid symbols:
:"symbol with spaces"
:"symbol-with-hyphens"
:"symbol
with
newlines"
:"def my_func(); puts 'ok'; end"
Note that quoted symbols do not work with the new hash syntax in Ruby 1.9 through 2.1; the "key": form is only accepted from Ruby 2.2 onward:
{ :"key-1" => "value" } # works in every version
{ "key-1": "value" } # syntax error before Ruby 2.2
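A short plain-Ruby check of the quoted-symbol behavior described above. It assumes Ruby 2.2 or newer, because the last hash literal uses the "key": shorthand; the variable names are illustrative only.

```ruby
# A quoted symbol is the same symbol that String#to_sym produces.
spy = :"data-spy"
puts spy == "data-spy".to_sym # prints true

# Hash-rocket syntax accepts the quoted symbol on every Ruby version.
rocket = { :"data-spy" => "abcd" }

# The "key": shorthand needs Ruby 2.2 or newer and also yields a Symbol key.
shorthand = { "data-spy": "abcd" }

puts rocket == shorthand         # prints true
puts shorthand.keys.first.class  # prints Symbol
```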

With the Ruby Haml compiler:
%div{data: {some_hyphenated_id: 'value'}}
Haml automatically converts underscores to hyphens, so I get:
<div data-some-hyphenated-id="value"></div>
FYI: if you need an empty attribute, just use true instead of 'value'.
Example:
Haml:
%div{data: {topbar: true}}
%div{data: {image_carousel: true}}
HTML:
<div data-topbar></div>
<div data-image-carousel></div>
To be more specific, this syntax is valid for the Ruby haml gem as well as for the grunt task grunt-haml with its language set to ruby (which requires the haml gem to be installed).
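The underscore-to-hyphen mapping can be pictured with a few lines of plain Ruby. This is only an illustration of the naming rule described above, not Haml's actual implementation; the helper name data_attributes is hypothetical.

```ruby
# Hypothetical helper, not Haml's real code: flatten a data hash into
# "data-*" attribute names, turning underscores into hyphens.
def data_attributes(data)
  data.each_with_object({}) do |(key, value), attrs|
    attrs["data-#{key.to_s.tr('_', '-')}"] = value
  end
end

p data_attributes(some_hyphenated_id: "value", topbar: true)
```

A value of true is carried through unchanged here; in Haml that is what ends up rendered as a bare attribute such as data-topbar.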

How to HTML encode/escape a string? Is there a built-in?

I have an untrusted string that I want to show as text in an HTML page. I need to escape the chars '<' and '&' as HTML entities. The less fuss the better.
I'm using UTF8 and don't need other entities for accented letters.
Is there a built-in function in Ruby or Rails, or should I roll my own?

Check out the Ruby CGI class. It has methods to encode and decode HTML as well as URLs.
CGI::escapeHTML('Usage: foo "bar" <baz>')
# => "Usage: foo &quot;bar&quot; &lt;baz&gt;"

The h helper method:
<%=h "<p> will be preserved" %>

In Ruby on Rails 3 and later, HTML is escaped by default.
For non-escaped strings use:
<%= raw "<p>hello world!</p>" %>

ERB::Util.html_escape can be used anywhere. It is available without using require in Rails.
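Outside Rails, both escapers are available from the standard library once required. The sketch below compares them on a sample string; the variable names and the sample text are assumptions for illustration.

```ruby
require "cgi"
require "erb"

untrusted = 'Usage: foo "bar" <baz> & more'

escaped_cgi = CGI.escapeHTML(untrusted)
escaped_erb = ERB::Util.html_escape(untrusted)

puts escaped_cgi
# Both escapers agree on this input.
puts escaped_cgi == escaped_erb
# Unescaping restores the original string.
puts CGI.unescapeHTML(escaped_cgi) == untrusted
```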

An addition to Christopher Bradford's answer to use the HTML escaping anywhere,
since most people don't use CGI nowadays, you can also use Rack:
require 'rack/utils'
Rack::Utils.escape_html('Usage: foo "bar" <baz>')

You can use either h() or html_escape(), but most people use h() by convention. h() is short for html_escape() in Rails.
In your controller:
@stuff = "<b>Hello World!</b>"
In your view:
<%=h @stuff %>
If you view the HTML source, you will see that the output is encoded as &lt;b&gt;Hello World!&lt;/b&gt;, so the data is not actually bolded.
In the browser it is displayed as the literal text <b>Hello World!</b>

Comparison of the different methods:
> CGI::escapeHTML("quote ' double quotes \"")
=> "quote &#39; double quotes &quot;"
> Rack::Utils.escape_html("quote ' double quotes \"")
=> "quote &#x27; double quotes &quot;"
> ERB::Util.html_escape("quote ' double quotes \"")
=> "quote &#39; double quotes &quot;"
I wrote my own to be compatible with Rails ActionMailer escaping:
def escape_html(str)
  CGI.escapeHTML(str).gsub("&#39;", "'")
end

h() is also useful for escaping quotes.
For example, I have a view that generates a link using a text field, result[r].thtitle. The text could include single quotes. If I didn't escape result[r].thtitle in the :confirm option, the JavaScript would break:
<%= link_to_remote "#{result[r].thtitle}", :url=>{ :controller=>:resource,
:action =>:delete_resourced,
:id => result[r].id,
:th => thread,
:html =>{:title=> "<= Remove"},
:confirm => h("#{result[r].thtitle} will be removed"),
:method => :delete %>
<a href="#" onclick="if (confirm('docs: add column &apos;dummy&apos; will be removed')) { new Ajax.Request('/resource/delete_resourced/837?owner=386&th=511', {asynchronous:true, evalScripts:true, method:'delete', parameters:'authenticity_token=' + encodeURIComponent('ou812')}); }; return false;" title="<= Remove">docs: add column 'dummy'</a>
Note: the :html title declaration is magically escaped by Rails.
