Ruby: Change class based on array value - html

I'm looking to create an HTML structure with classes based on the values of arrays from Ruby.
I have 6 classes that will be applied to different elements on an 8x8 grid.
Each row will be a div with 8 span elements inside. In Ruby, each nested array will be a div row, and each element will be a span whose class is based on the value of the array element.
a = [[1,4,3,2,2,3,1,4],
     [4,5,6,6,3,2,3,5]]
So the example above would create two rows, each with 8 spans carrying the appropriate classes.
Is it possible to convert data structures to HTML like this in Ruby?

Maybe this is what you want:
a = [[1,4,3,2,2,3,1,4],
     [4,5,6,6,3,2,3,5]]
html = ''
a.each do |row|
  html << "<div>%s</div>" % row.map { |c| %{<span class="#{c}"></span>} }.join
end
# puts html
Update: in other words, the same thing written with map:
html = a.map do |row|
  "<div>%s</div>" % row.map { |c| %{<span class="#{c}"></span>} }.join
end.join
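
Since the question mentions 6 classes, the values could also be mapped to named classes instead of being used directly (a class name that starts with a digit is awkward to select in CSS). The following is only a sketch: the CLASS_NAMES table, its class names, and the grid_to_html helper are hypothetical and not part of the original posts.

```ruby
# Hypothetical value-to-class mapping; the names are placeholders, not from the question.
CLASS_NAMES = {
  1 => 'red', 2 => 'green', 3 => 'blue',
  4 => 'yellow', 5 => 'purple', 6 => 'orange'
}.freeze

# Builds one <div> per row and one <span> per cell.
def grid_to_html(grid)
  grid.map do |row|
    spans = row.map do |value|
      # Fall back to a prefixed name so the class never starts with a digit.
      css_class = CLASS_NAMES.fetch(value, "cell-#{value}")
      %(<span class="#{css_class}"></span>)
    end
    "<div>#{spans.join}</div>"
  end.join("\n")
end

a = [[1, 4, 3, 2, 2, 3, 1, 4],
     [4, 5, 6, 6, 3, 2, 3, 5]]
puts grid_to_html(a)
```

Each row is emitted on its own line, which makes the generated markup easier to inspect.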

Something along the lines of:
a.each do |sub_array|
  puts "<div>"
  sub_array.each do |element|
    # Double quotes are required for #{...} interpolation to happen.
    puts "<span class=\"#{element}\">Some text</span>"
  end
  puts "</div>"
end
If this doesn't fit your needs, please post a more specific question.

Related

Watir/Ruby selecting next value

I am working with a table that has links in the first column:
html = Nokogiri::HTML(browser.html)
html.css('tr td a').each do |link|
  browser.link(:text => link.text).click
  puts link.text
end
How do I display the NEXT value for the link?
If the link name is abcd but the next one is efgh, how do I get it to write efgh?
You should be able to achieve this using the index in the array you are working with.
thing = ['a', 'b', 'c', 'd']
# Stop one short of the last index so that thing[index + 1] always exists.
(0...thing.length - 1).each do |index|
  puts thing[index + 1]
end
I don't understand the use case here (not at all), but this contrived example might point you in the direction that you're looking to go.
Use the links method to create an array of link objects. Then, you can print the text for the element at the second position but click the element at the first position.
require 'watir-webdriver'
b = Watir::Browser.new
b.goto('http://www.iana.org/domains/reserved')
nav_links = b.div(:class => "navigation").links
puts nav_links[1].text #=> NUMBERS
nav_links[0].click
puts b.url #=> http://www.iana.org/domains
The Enumerable#each_with_index method might also be useful, since it iterates over each element of a collection and also yields the element's position. For example:
b.div(:class => "navigation").links.each_with_index { |el, i| puts el.text, i }
#=> DOMAINS
#=> 0
#=> NUMBERS
#=> 1
#=> PROTOCOLS
#=> 2
...
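
The "current item plus the next one" idea from the answers above can also be sketched without index arithmetic by using Enumerable#each_cons. This is only an illustration on plain strings: the link_texts array stands in for the text of each link, and the with_next helper is a hypothetical name.

```ruby
# Hypothetical link texts standing in for the text of each Watir link.
link_texts = %w[abcd efgh ijkl]

# Pairs each item with the one that follows it; the last item has no successor,
# so it only ever appears as the second half of a pair.
def with_next(items)
  items.each_cons(2).to_a
end

with_next(link_texts).each do |current, following|
  puts "#{current} -> next is #{following}"
end
```

Because each_cons(2) only yields complete pairs, there is no out-of-range lookup for the final element.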

How to parse HTML tags as raw text using ElementTree

I have a file that has HTML within XML tags and I want that HTML as raw text, rather than have it be parsed as children of the XML tag. Here's an example:
import xml.etree.ElementTree as ET
root = ET.fromstring("<root><text><p>This is some text that I want to read</p></text></root>")
If I try:
root.find('text').text
it returns None, but root.find('text/p').text returns the paragraph text without the tags. I want everything within the text tag as raw text, but I can't figure out how to get it.
Your solution is reasonable. An element object behaves like a list of its children. The .text attribute of an element holds only the text that comes before its first child; it does not include anything inside nested elements.
There are things to be improved in your code. In Python, repeatedly concatenating strings in a loop can be expensive, because each += may copy the whole string built so far. It is better to collect the substrings in a list and join them at the end, like this:
output_lst = []
for child in root.find('text'):
    output_lst.append(ET.tostring(child, encoding="unicode"))
output_text = ''.join(output_lst)
The list can also be built using a list comprehension, so the code would change to:
output_lst = [ET.tostring(child, encoding="unicode") for child in root.find('text')]
output_text = ''.join(output_lst)
The .join method can consume any iterable that produces strings, so the list need not be constructed in advance. Instead, a generator expression (the same expression that appears inside the [] of the list comprehension) can be used:
output_text = ''.join(ET.tostring(child, encoding="unicode") for child in root.find('text'))
The one-liner can be formatted to more lines to make it more readable:
output_text = ''.join(ET.tostring(child, encoding="unicode")
                      for child in root.find('text'))
I was able to get what I wanted by appending all child elements of my text tag to a string using ET.tostring:
output_text = ""
for child in root.find('text'):
    output_text += ET.tostring(child, encoding="unicode")
>>> output_text
'<p>This is some text that I want to read</p>'
The solutions above will miss the initial part of your HTML if the content begins with text, e.g.:
<root><text>This is <i>some text</i> that I want to read</text></root>
You can handle that case like this:
node = root.find('text')
output_list = [node.text] if node.text else []
output_list += [ET.tostring(child, encoding="unicode") for child in node]
output_text = ''.join(output_list)

insert html into string at several positions at the same time

So I have a peptide, which is a string of letters corresponding to amino acids.
Say the peptide is
peptide_sequence = "VEILANDQGNR"
and it has a modification on L at position 4 and on R at position 11.
I would like to insert <span class="modified_aa"> and </span> before and after those positions at the same time.
Here is what I tried:
My modifications are stored in an array pep_mods of modification objects, each with a location attribute holding the position, in this case 4 and 11.
pep_mods.each do |m|
  peptide_sequence.gsub(peptide_sequence[m.location.to_i-1], "<span class=\"mod\">#{peptide_sequence[m.location.to_i-1]}</span>")
end
But since there are two modifications, once the first span tag has been inserted the positions in the rest of the string all shift.
How could I achieve what I intend to do? I hope that was clear.
You should work backwards: make the modifications starting with the last one. That way the indices of the earlier modifications are unchanged.
You might need to sort the array of indices in reverse order; then you can apply the insertions one position at a time, as in your current loop.
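
A minimal sketch of the backwards approach, assuming 1-based locations as in the question. The Modification struct and the mark_modifications helper are hypothetical stand-ins, since the real modification class is not shown.

```ruby
# Hypothetical stand-in for the question's modification objects.
Modification = Struct.new(:location)

# Wraps the residue at each 1-based location in a span, working from the
# end of the string so that earlier indices are not shifted by insertions.
def mark_modifications(sequence, mods)
  result = sequence.dup
  mods.map { |m| m.location.to_i }.uniq.sort.reverse.each do |position|
    index = position - 1
    result[index] = %(<span class="mod">#{result[index]}</span>)
  end
  result
end

pep_mods = [Modification.new(4), Modification.new(11)]
puts mark_modifications("VEILANDQGNR", pep_mods)
```

Replacing by index rather than with gsub also avoids touching other occurrences of the same letter elsewhere in the sequence.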
Floris's answer is correct, but if you want to do it the hard way (O(n^2) instead of O(n log n)), here is the basic idea.
Instead of relying on gsub you can iterate over the characters checking if each has an index corresponding to one of the modifications. If the index matches, perform the modification. Otherwise, keep the original character.
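# Locations are 1-based in the question, so convert them to 0-based indices.
mod_indices = pep_mods.map { |m| m.location.to_i - 1 }
modified = peptide_sequence.each_char.with_index.map do |c, i|
  if mod_indices.include?(i)
    %Q{<span class="mod">#{c}</span>}
  else
    c
  end
end.join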
Ok, just in case this is helpful for anyone else, this is how I finally did it:
I first converted the peptide sequence to an array :
pep_seq_arr = peptide_sequence.split("")
then used each_with_index as Casey mentioned
pep_seq_arr.each_with_index do |aa, i|
  pep_mods.each do |m|
    pep_seq_arr[i] = "<span class='mod'>#{aa}</span>" if i == m.location.to_i-1
  end
end
and finally joined the array:
pep_seq_arr.join
It was easier than I first thought

Download HTML Text with Ruby

I am trying to create a histogram of the letters (a, b, c, etc.) on a specified web page. I plan to build the histogram itself using a hash. However, I am having a bit of a problem actually getting the HTML.
My current code:
#!/usr/local/bin/ruby
require 'net/http'
require 'open-uri'
# This will be the hash used to store the
# histogram.
histogram = Hash.new(0)
def open(url)
  Net::HTTP.get(URI.parse(url))
end
page_content = open('_insert_webpage_here')
# String#each was removed in Ruby 1.9; each_line iterates line by line.
page_content.each_line do |i|
  puts i
end
This does a good job of getting the HTML. However, it gets it all. For www.stackoverflow.com it gives me:
<body><h1>Object Moved</h1>This document may be found here</body>
Pretending that it was the right page, I don't want the html tags. I'm just trying to get Object Moved and This document may be found here.
Is there any reasonably easy way to do this?
When you require 'open-uri', you don't need to redefine open with Net::HTTP. (On Ruby 2.5 and later, call URI.open; plain open no longer accepts URLs as of Ruby 3.0.)
require 'open-uri'
page_content = URI.open('http://www.stackoverflow.com').read
histogram = {}
page_content.each_char do |c|
  histogram[c] ||= 0
  histogram[c] += 1
end
Note: this does not strip out <tags> within the HTML document, so <html><body>x!</body></html> will have { '<' => 4, 'h' => 2, 't' => 2, ... } instead of { 'x' => 1, '!' => 1 }. To remove the tags, you can use something like Nokogiri (which you said was not available), or some sort of regular expression (such as the one in Dru's answer).
See the "Following Redirection" section of the Net::HTTP documentation.
Stripping HTML tags without Nokogiri:
puts page_content.gsub(/<\/?[^>]*>/, "")
http://codesnippets.joyent.com/posts/show/615
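
Putting the two answers together, a letter histogram over the tag-stripped text might look like the sketch below. It works on a literal string rather than a downloaded page, the letter_histogram name is hypothetical, and the regex is the same crude one shown above, so script and style contents, comments, and entities are not handled.

```ruby
# Strips tags with the same crude regex as above, then counts the letters a-z
# (case-insensitively) in what remains.
def letter_histogram(html)
  text = html.gsub(/<\/?[^>]*>/, '')
  text.downcase.scan(/[a-z]/).each_with_object(Hash.new(0)) do |letter, counts|
    counts[letter] += 1
  end
end

sample = '<body><h1>Object Moved</h1>This document may be found here</body>'
letter_histogram(sample).sort.each { |letter, count| puts "#{letter}: #{count}" }
```

Hash.new(0) gives every missing key a default count of zero, which removes the need for the ||= 0 initialisation step.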

How to write a transformer for Ruby Sanitize Gem to transform <br> into newlines?

I'm using a wrapper for the Sanitize Gem's clean method to solve some of our issues:
def remove_markup(html_str)
  # Use the \1 backreference; "#{$1}" would be interpolated before gsub runs.
  html_str = html_str.gsub(/(<\/p>)/, "\\1\n")
  marked_up = Sanitize.clean html_str
  ESCAPE_SEQUENCES.each do |esc_seq, ascii_seq|
    marked_up = marked_up.gsub('&' + esc_seq + ';', ascii_seq.chr)
  end
  marked_up
end
I recently added the gsub line as a quick way to do what I wanted:
insert a newline wherever a paragraph ends.
However, I'm sure this can be accomplished more elegantly with a Sanitize transformer.
Unfortunately, I think I must be misunderstanding a few things. Here is an example of a transformer I wrote for the <br> tag that worked.
s2 = "<p>here is para 1<br> It's a nice paragraph</p><p>Don't forget para 2</p>"
br_to_nl = lambda do |env|
  node = env[:node]
  node_name = env[:node_name]
  return if env[:is_whitelisted] || !node.element?
  return unless node_name == 'br'
  node.replace "\n"
end
Sanitize.clean s2, :transformers => [br_to_nl]
=> " here is para 1\n It's a nice paragraph Don't forget para 2 "
But I couldn't come up with a solution that would work well for <p> tags.
Should I add a text element to the node as a child? How do I make it show up immediately after the element?
Related question (answered): How to use RubyGem Sanitize transformers to sanitize an unordered list into a comma seperated list?
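
For comparison, the gsub workaround described in the question can be sketched in plain Ruby as a pre-processing step that runs before the markup is cleaned. This is not a Sanitize transformer and does not use the Sanitize API; the add_line_breaks helper is a hypothetical name.

```ruby
# Replaces <br> with a newline and adds a newline after each </p>,
# leaving all other markup untouched for a later cleaning step.
def add_line_breaks(html)
  html.gsub(%r{<br\s*/?>}i, "\n")
      .gsub(%r{</p>}i) { |closing_tag| "#{closing_tag}\n" }
end

s2 = "<p>here is para 1<br> It's a nice paragraph</p><p>Don't forget para 2</p>"
puts add_line_breaks(s2)
```

Using the block form of gsub sidesteps the backreference escaping that makes the "#{$1}\n" version easy to get wrong.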