Just wondering if these two functions are to be done using Nokogiri or via more basic Ruby commands.
require 'open-uri'
require 'nokogiri'
require "net/http"
require "uri"
doc = Nokogiri.parse(open("example.html"))
doc.xpath("//meta[@name='author' or @name='Author']/@content").each do |metaauth|
puts "Author: #{metaauth}"
end
doc.xpath("//meta[@name='keywords' or @name='Keywords']/@content").each do |metakey|
puts "Keywords: #{metakey}"
end
etc...
Question 1: I'm trying to parse a directory of .html documents, read the information in their meta tags, and write the results to a text file if possible. I tried simply replacing the filename with a *.html wildcard, but that didn't work (at least not with Nokogiri.parse(open()); maybe it works with ::HTML or ::XML).
Question 2: More importantly, is it possible to write all of that meta content to a text file instead of printing it with puts?
Also forgive me if the code is overly complicated for the simple task being performed, but I'm a little new to Nokogiri / xpath / Ruby.
Thanks.
I have similar code.
Please refer to:
require 'nokogiri'
require 'csv'
module MyParser
HTML_FILE_DIR = 'your html file dir' # placeholder: directory that holds the .html files
def self.run(options = {})
file_list = Dir.entries(HTML_FILE_DIR).reject { |f| f =~ /^\./ }
result = file_list.map do |file|
html = File.read("#{HTML_FILE_DIR}/#{file}")
doc = Nokogiri::HTML(html)
parse_to_hash(doc)
end
write_csv(result)
end
def self.parse_to_hash(doc)
array = []
array << doc.css('your select conditions').first.content # placeholder selector
# ... add your selector code (CSS or XPath)
array
end
def self.write_csv(result)
::CSV.open('your output file name', 'w') do |csv| # placeholder file name
result.each { |row| csv << row }
end
end
end
MyParser.run
You can output to a file like so:
File.open('results.txt','w') do |file|
file.puts "output" # See http://ruby-doc.org/core-2.1.2/IO.html#method-i-puts
end
Alternatively, you could do something like:
authors = doc.xpath("//meta[@name='author' or @name='Author']/@content")
keywrds = doc.xpath("//meta[@name='keywords' or @name='Keywords']/@content")
# Combine all lines before joining, so the last author and the first keyword do not run together
results = (authors.map{ |x| "Author: #{x}" } +
keywrds.map{ |x| "Keywords: #{x}" }).join("\n")
File.open('results.txt','w'){ |f| f << results }
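To cover both questions in one pass, the directory loop and the file output can be combined. What follows is only a sketch under assumptions: write_meta_report is a made-up helper name, and the extraction is handed in as a block so the snippet itself needs no gems. The block is where the XPath queries from the question would go.

```ruby
# Sketch: collect metadata from every .html file in a directory and write
# it to one text file. write_meta_report is a hypothetical helper name.
# The extraction step is supplied by the caller as a block that receives
# the file's HTML and returns an array of output lines.
def write_meta_report(html_dir, report_path)
  File.open(report_path, 'w') do |report|
    # Dir.glob expands the wildcard; open() does not do that by itself.
    Dir.glob(File.join(html_dir, '*.html')).sort.each do |path|
      lines = yield(File.read(path))
      report.puts "File: #{File.basename(path)}"
      lines.each { |line| report.puts line }
    end
  end
end
```

With Nokogiri loaded, the call could look like write_meta_report('pages', 'results.txt') { |html| Nokogiri::HTML(html).xpath("//meta[@name='author']/@content").map { |a| "Author: #{a}" } }, where 'pages' is an assumed directory name.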
I've written a simple plugin that generates a small JSON file:
module Jekyll
require 'pathname'
require 'json'
class SearchFileGenerator < Generator
safe true
def generate(site)
output = [{"title" => "Test"}]
path = Pathname.new(site.dest) + "search.json"
FileUtils.mkdir_p(File.dirname(path))
File.open(path, 'w') do |f|
f.write("---\nlayout: null\n---\n")
f.write(output.to_json)
end
# 1/0
end
end
end
But the generated JSON file gets deleted every time Jekyll runs to completion. If I uncomment the division by zero line and cause it to error out, I can see that the search.json file is being generated, but it's getting subsequently deleted. How do I prevent this?
I found the following issue, which suggested adding the file to keep_files, and that worked: https://github.com/jekyll/jekyll/issues/5162
The new code keeps search.json from being deleted:
module Jekyll
require 'pathname'
require 'json'
class SearchFileGenerator < Generator
safe true
def generate(site)
output = [{"title" => "Test"}]
path = Pathname.new(site.dest) + "search.json"
FileUtils.mkdir_p(File.dirname(path))
File.open(path, 'w') do |f|
f.write("---\nlayout: null\n---\n")
f.write(output.to_json)
end
site.keep_files << "search.json"
end
end
end
Add your new page to site.pages :
module Jekyll
class SearchFileGenerator < Generator
def generate(site)
@site = site
search = PageWithoutAFile.new(@site, site.source, "/", "search.json")
search.data["layout"] = nil
search.content = [{"title" => "Test 32"}].to_json
@site.pages << search
end
end
end
Inspired by jekyll-feed code.
I am very new to SketchUp and Ruby. I have worked with Java and C#, but this is my first time with Ruby.
Now I have a problem: I need to serialize the whole scene into one JSON document (the scene hierarchy, plus the name, material, and position of each object). How can I do this?
I have already done this for Unity3D (C#) without a problem.
I tried this:
def main
avr_entities = Sketchup.active_model.entities # all objects
ambiens_dictionary = {}
ambiens_list = []
avr_entities.each do |root|
if root.is_a?(Sketchup::Group) || root.is_a?(Sketchup::ComponentInstance)
if root.name == ""
UI.messagebox("this is a group #{root.definition.name}")
if root.entities.count > 0
root.entities.each do |leaf|
if leaf.is_a?(Sketchup::Group) || leaf.is_a?(Sketchup::ComponentInstance)
UI.messagebox("this is a leaf #{leaf.definition.name}")
end
end
end
else
# UI.messagebox("this is a leaf #{root.name}")
end
end
end
end
Have you tried the JSON library?
require 'json'
source = { a: [ { b: "hello" }, 1, "world" ], c: 'hi' }
source.to_json # => "{\"a\":[{\"b\":\"hello\"},1,\"world\"],\"c\":\"hi\"}"
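For a nested scene, the usual approach is to convert the hierarchy into plain hashes and arrays recursively and only then call the JSON library. Here is a rough sketch that runs outside SketchUp. SceneNode and node_to_hash are made-up names standing in for the groups and component instances the question's loop visits, and how name, material, and position are read from real SketchUp entities is not shown here.

```ruby
require 'json'

# Stand-in for a SketchUp group/component instance. SceneNode is a
# hypothetical struct, used only so this sketch runs in a plain terminal.
SceneNode = Struct.new(:name, :material, :position, :children)

# Recursively turn a node and all of its descendants into plain hashes
# and arrays, which is the shape the JSON library can serialize.
def node_to_hash(node)
  {
    'name'     => node.name,
    'material' => node.material,
    'position' => node.position,
    'children' => node.children.map { |child| node_to_hash(child) }
  }
end

# Example data (invented): one root object containing one leaf.
leaf = SceneNode.new('Chair', 'Wood', [1.0, 2.0, 0.0], [])
root = SceneNode.new('Room', nil, [0.0, 0.0, 0.0], [leaf])

scene_json = JSON.pretty_generate(node_to_hash(root))
puts scene_json
```

Because the recursion handles any depth, the two nested each loops in the question would collapse into one recursive call per child.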
I used the code below to answer another question, but it might also work here.
The code can run outside of SketchUp for testing in the terminal. Just make sure to follow these steps:
1. Copy the code below and paste it into a Ruby file (example: file.rb).
2. Run the script in the terminal: ruby file.rb
3. The script will write data to a JSON file and also read the content of that JSON file.
The path to the JSON file is relative to the Ruby file created in step one. If the JSON file does not exist yet, the script will create it for you.
module DeveloperName
module PluginName
require 'json'
require 'fileutils'
class Main
def initialize
path = File.dirname(__FILE__)
@json = File.join(path, 'file.json')
@content = { 'hello' => 'hello world' }.to_json
json_create(@content)
json_read(@json)
end
def json_create(content)
File.open(@json, 'w') { |f| f.write(content) }
end
def json_read(json)
if File.exist?(json)
file = File.read(json)
data_hash = JSON.parse(file)
puts "Json content: #{data_hash}"
else
msg = 'JSON file not found'
UI.messagebox(msg, MB_OK)
end
end
# # #
end
DeveloperName::PluginName::Main.new
end
end
I have a whole lot of HTML files that live in one folder. I need to convert them to Markdown. I found a couple of gems that do this well, one file at a time.
My question is:
How can I loop through each file in the folder and run the command that converts it to .md in a separate folder?
UPDATE
#!/usr/bin/ruby
root = 'C:/Doc'
inDir = File.join(root, '/input')
outDir = File.join(root, '/output')
extension = nil
fileName = nil
Dir.foreach(inDir) do |file|
# Dir.foreach will always show current and parent directories
if file == '.' or file == '..' then
next
end
# makes sure the current iteration is not a sub directory
if not File.directory?(File.join(inDir, file)) then
extension = File.extname(file)
fileName = File.basename(file, extension)
end
# strips off the last character if it is a period
if fileName[fileName.length - 1] == "." then
fileName = fileName[0..-2]
end
# this is where I got stuck
reverse_markdown File.join(inDir, fileName, '.html') > File.join(outDir, fileName, '.md')
Dir.glob(pattern) {|f| ... } will loop through all files matching a pattern such as 'input/*.html'. For example, using the Redcarpet library (which converts in the other direction, from Markdown to HTML) you could do something like this:
require 'redcarpet'
markdown = Redcarpet::Markdown.new(Redcarpet::Render::HTML, :autolink => true)
Dir.glob('*.md') do |in_filename|
out_filename = File.join(File.dirname(in_filename), "#{File.basename(in_filename,'.*')}.html")
File.open(in_filename, 'r') do |in_file|
File.open(out_filename, 'w') do |out_file|
out_file.write markdown.render(in_file.read)
end
end
end
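For the direction the question asks about (HTML in one folder, Markdown in another), the same loop can be turned around. This is a sketch under assumptions: convert_directory is a made-up helper name, and the actual conversion is passed in as a block so that any gem can be plugged in.

```ruby
require 'fileutils'

# Sketch: convert every file with in_ext in in_dir and write the result,
# with out_ext, into out_dir. convert_directory is a hypothetical helper;
# the block receives the input text and returns the converted text.
def convert_directory(in_dir, out_dir, in_ext: '.html', out_ext: '.md')
  FileUtils.mkdir_p(out_dir) # create the output folder if it is missing
  Dir.glob(File.join(in_dir, "*#{in_ext}")).sort.map do |in_path|
    out_name = File.basename(in_path, in_ext) + out_ext
    out_path = File.join(out_dir, out_name)
    File.write(out_path, yield(File.read(in_path)))
    out_path # the method returns the list of files it wrote
  end
end
```

Assuming the reverse_markdown gem is installed and required, the call would be along the lines of convert_directory('C:/Doc/input', 'C:/Doc/output') { |html| ReverseMarkdown.convert(html) }.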
I have an HTML file that contains the general design (some divs), and I need to fill these divs with some HTML code using a Ruby script.
Any suggestions?
Example:
I have page.html:
<html>
<title>html Page</title>
<body>
<div id="main">
</div>
<div id="side">
</div>
</body>
</html>
and a Ruby script in which I collect some data and do some processing on it, and I want to present the result in a nice format**
So I want to fill the div whose id is main with some HTML code, like this:
<html>
<title>html Page</title>
<body>
<div id="main">
<h1>you have 30 files in games folder</h1>
</div>
<div id="side">
</div>
</body>
</html>
** Why don't I use RoR? Because I don't want to build a website; I just need a desktop tool whose presentation layer is HTML rendered by a browser, to avoid working with graphics libraries.
My problem isn't "how can I write to this HTML file"; I can handle that.
My problem is that if I want to create a table inside the main div of the HTML file,
I have to write all of the HTML code inside the Ruby script in order to print it to the file. Is there any library or gem that I can tell "I want a table with 3 rows and 2 columns" and have it generate the HTML code?
I historically have used ERB and REXML for things like this, since they both ship with Ruby (removing gem dependencies). You can combine one XML file (content) with one .erb file (for layout) and get simple merging. Here's a script I wrote for this (most of which is argument handling and extending REXML with some convenience methods):
USAGE = <<ENDUSAGE
Usage:
rubygen source_xml [-t template_file] [-o output_file]
-t,--template The ERB template file to merge (default: xml_name.erb)
-o,--output The output file name to write (default: template.txt)
If the template_file is named "somefile_XXX.yyy",
the output_file will default instead to "somefile.XXX"
ENDUSAGE
ARGS = {}
UNFLAGGED_ARGS = [ :source_xml ]
next_arg = UNFLAGGED_ARGS.first
ARGV.each{ |arg|
case arg
when '-t','--template'
next_arg = :template_file
when '-o','--output'
next_arg = :output_file
else
if next_arg
ARGS[next_arg] = arg
UNFLAGGED_ARGS.delete( next_arg )
end
next_arg = UNFLAGGED_ARGS.first
end
}
if !ARGS[:source_xml]
puts USAGE
exit
end
extension_match = /\.[^.]+$/
template_match = /_([^._]+)\.[^.]+$/
xml_file = ARGS[ :source_xml ]
template_file = ARGS[ :template_file] || xml_file.sub( extension_match, '.erb' )
output_file = ARGS[ :output_file ] || ( ( template_file =~ template_match ) ? template_file.sub( template_match, '.\\1' ) : template_file.sub( extension_match, '.txt' ) )
require 'rexml/document'
include REXML
class REXML::Element
# Find all descendant nodes with a specified tag name and/or attributes
def find_all( tag_name='*', attributes_to_match={} )
self.each_element( ".//#{REXML::Element.xpathfor(tag_name,attributes_to_match)}" ){}
end
# Find all child nodes with a specified tag name and/or attributes
def kids( tag_name='*', attributes_to_match={} )
self.each_element( "./#{REXML::Element.xpathfor(tag_name,attributes_to_match)}" ){}
end
def self.xpathfor( tag_name='*', attributes_to_match={} )
out = "#{tag_name}"
unless attributes_to_match.empty?
out << "["
out << attributes_to_match.map{ |key,val|
if val == :not_empty
"@#{key}"
else
"@#{key}='#{val}'"
end
}.join( ' and ' )
out << "]"
end
out
end
# A hash to tag extra data onto a node during processing
def _mydata
@_mydata ||= {}
end
end
start_time = Time.new
@xmldoc = Document.new( IO.read( xml_file ), :ignore_whitespace_nodes => :all )
@root = @xmldoc.root
@root = @root.first if @root.is_a?( Array )
end_time = Time.new
puts "%.2fs to parse XML file (#{xml_file})" % ( end_time - start_time )
require 'erb'
File.open( output_file, 'w' ){ |o|
start_time = Time.new
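# Keyword arguments (Ruby 2.6+); the positional trim_mode/eoutvar arguments are deprecated
output_code = ERB.new( IO.read( template_file ), trim_mode: '>', eoutvar: 'output' ).result( binding )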
end_time = Time.new
puts "%.2fs to run template (#{template_file})" % ( end_time - start_time )
start_time = Time.new
o << output_code
}
end_time = Time.new
puts "%.2fs to write output (#{output_file})" % ( end_time - start_time )
puts " "
This can be used for HTML or automated source code generation alike.
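Stripped of the argument handling and the REXML helpers, the core of the ERB approach is small. The following is a minimal sketch, not the script above: PAGE_TEMPLATE and render_main are names invented for illustration, and the template is inlined as a string instead of being read from a file.

```ruby
require 'erb'

# Hypothetical inline template; in the script above this text would come
# from the .erb file on disk.
PAGE_TEMPLATE = <<~ERB_SOURCE
  <div id="main">
  <h1>you have <%= count %> files in <%= folder %> folder</h1>
  </div>
ERB_SOURCE

# result(binding) lets the template see this method's local variables,
# so count and folder are filled in from the arguments.
def render_main(count, folder)
  ERB.new(PAGE_TEMPLATE).result(binding)
end

puts render_main(30, 'games')
```

The template only sees what the binding exposes, which is why the larger script stores its parsed document in instance variables before rendering.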
However, these days I would advocate using Haml and Nokogiri (if you want structured XML markup) or YAML (if you want simple-to-edit content), as these will make your markup cleaner and your template logic simpler.
Edit: Here's a simpler file that merges YAML with Haml. The last four lines do all the work:
#!/usr/bin/env ruby
require 'yaml'; require 'haml'; require 'trollop'
EXTENSION = /\.[^.]+$/
opts = Trollop.options do
banner "Usage:\nyamlhaml [opts] <sourcefile.yaml>"
opt :haml, "The Haml file to use (default: sourcefile.haml)", type:String
opt :output, "The file to create (default: sourcefile.html)", type:String
end
opts[:source] = ARGV.shift
Trollop.die "Please specify an input Yaml file" unless opts[:source]
Trollop.die "Could not find #{opts[:source]}" unless File.exist?(opts[:source])
opts[:haml] ||= opts[:source].sub( EXTENSION, '.haml' )
opts[:output] ||= opts[:source].sub( EXTENSION, '.html' )
Trollop.die "Could not find #{opts[:haml]}" unless File.exist?(opts[:haml])
@data = YAML.load(IO.read(opts[:source]))
File.open( opts[:output], 'w' ) do |output|
output << Haml::Engine.new(IO.read(opts[:haml])).render(self)
end
Here's a sample YAML file:
title: Hello World
main: "<h1>you have 30 files in games folder</h1>"
side: "I dunno, something goes here."
...and a sample Haml file:
!!! 5
%html
  %head
    %title= @data['title']
  %body
    #main= @data['main']
    #side= @data['side']
...and finally the HTML they produce:
<!DOCTYPE html>
<html>
<head>
<title>Hello World</title>
</head>
<body>
<div id='main'><h1>you have 30 files in games folder</h1></div>
<div id='side'>I dunno, something goes here.</div>
</body>
</html>
Are you trying to create a dynamic website? For that use Rails.
Are you trying to create a static website? Something like Jekyll is probably best.
Are you trying to just create some simple .html files you can FTP up somewhere? Jekyll might be a good option, or hand-coding a quick little HTML generator might be even better.
UPDATE:
Is this what you are looking for?
hash = {
:games => "you have 30 files in games folder",
:puppies => "you have 12 puppies in your pocket",
:pictures => "You have 9 files in pictures folder",
}
array = [
['run','x','y'],
[1,10,3],
[2,12,9],
[3,14,7],
]
hash.each do |key, value|
myfile = File.new("#{key}.html", "w+")
myfile.puts "<html>"
myfile.puts "<title>html Page</title>"
myfile.puts "<body>"
myfile.puts "<div id=\"main\">"
myfile.puts "<h1>#{value}</h1>"
myfile.puts "<table border=\"1\">"
array.each do |row|
myfile.puts "<tr>"
row.each do |cell|
myfile.puts "<td> #{cell} </td>"
end
myfile.puts "</tr>"
end
myfile.puts "</table>"
myfile.puts "</div>"
myfile.puts "<div id=\"side\">"
myfile.puts "</div>"
myfile.puts "</body>"
myfile.puts "</html>"
myfile.close
end
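If the table markup is needed in more than one place, the nested loop can be pulled into a small helper that takes an array of rows and returns the HTML as a string. This is a sketch only: html_table is a made-up name, not a library function, and escaping the cell values via ERB::Util is my addition.

```ruby
require 'erb'

# Sketch: build an HTML table from an array of rows (each row is an array
# of cells). html_table is a hypothetical helper name. Cell values are
# HTML-escaped so that text such as "<b>" shows up literally.
def html_table(rows)
  body = rows.map do |row|
    cells = row.map { |cell| "<td>#{ERB::Util.html_escape(cell)}</td>" }.join
    "<tr>#{cells}</tr>"
  end
  "<table border=\"1\">\n#{body.join("\n")}\n</table>"
end

puts html_table([['run', 'x', 'y'], [1, 10, 3]])
```

An empty table with 3 rows and 2 columns would then be html_table(Array.new(3) { Array.new(2, '') }).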
Continuing from @Phrogz's work: ERB is a great idea. I was able to use it to build a simple Rake script that does the work for me. I find this approach a little easier.
rakefile.rb
task :default => :generate
task :generate do
require 'erb'
template_file = "page.erb"
output_file = "page.html"
File.open(output_file, 'w') do |o|
puts "Processing file: #{template_file}"
o << ERB.new( IO.read( template_file ), trim_mode: '>', eoutvar: 'output' ).result( binding )
end
end
def render(file)
puts "Rendering file: #{file}"
IO.read(file)
end
$game_count = 30
def game_count
puts "Rendering game count: #{$game_count}"
$game_count
end
page.erb
<html>
<title>html Page</title>
<body>
<div id="main">
<h1>you have <%= game_count %> files in games folder</h1>
</div>
<div id="side">
<%= render "side.html" %>
</div>
</body>
</html>
side.html
<ul class="side">
<li>Side item 1</li>
<li>Side item 2</li>
</ul>
Running it
$ rake
Processing file: page.erb
Rendering game count: 30
Rendering file: side.html
Newly created file page.html
<html>
<title>html Page</title>
<body>
<div id="main">
<h1>you have 30 files in games folder</h1>
</div>
<div id="side">
<ul class="side">
<li>Side item 1</li>
<li>Side item 2</li>
</ul>
</div>
</body>
</html>
doc.xpath('//img') # this will get some results
doc.xpath('//img[@class="productImage"]') # and this gets nothing at all
doc.xpath('//div[@id="someID"]') # and this won't work either
I don't know what went wrong here. I double-checked the HTML source, and there are plenty of img tags that contain the attribute class="productImage".
It's as if the attribute selector just won't work.
Here is the URL the HTML source comes from:
http://www.amazon.cn/s/ref=nb_sb_ss_i_0_1?__mk_zh_CN=%E4%BA%9A%E9%A9%AC%E9%80%8A%E7%BD%91%E7%AB%99&url=search-alias%3Daps&field-keywords=%E4%B8%93%E5%85%AB&x=0&y=0&sprefix=%E4%B8%93
If you have some spare time, please parse the HTML content the way I do and see whether you can solve this one.
The weird thing is that fetching that page with open-uri gives a different result than fetching it with something like curl or wget.
However, when you change the User-Agent, you get what is probably the page you are looking for:
Analysis:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'pp'
URL = 'http://www.amazon.cn/...'
def analyze_html(file)
doc = Nokogiri.HTML(file)
pp doc.xpath('//img').map { |i| i[:class] }.compact.reject(&:empty?)
puts doc.xpath('//div').map { |i| i[:class] }.grep(/productImage/).count
puts doc.xpath('//div[@class="productImage"]//img').count
pp doc.xpath('//div[@class="productImage"]//img').map { |i| i[:src] }
end
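# URI.open is the open-uri entry point (Ruby 2.5+); plain open(url) no longer fetches URLs on Ruby 3
puts "Attempt 1:"
analyze_html(URI.open(URL))
puts "Attempt 2:"
analyze_html(URI.open(URL, "User-Agent" => "Wget/1.10.2"))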
Output:
Attempt 1:
["default navSprite"]
0
0
[]
Attempt 2:
["default navSprite", "srSprite spr_kindOfSortBtn"]
16
16
["http://ec4.images-amazon.com/images/I/51fOb3ujSjL._AA115_.jpg",
"http://ec4.images-amazon.com/images/I/513UQ1xiaSL._AA115_.jpg",
"http://ec4.images-amazon.com/images/I/41zKxWXb8HL._AA115_.jpg",
"http://ec4.images-amazon.com/images/I/51bj6XXAouL._AA115_.jpg",
"http://ec4.images-amazon.com/images/I/516GBhDTGCL._AA115_.jpg",
"http://ec4.images-amazon.com/images/I/51ADd3HSE6L._AA115_.jpg",
"http://ec4.images-amazon.com/images/I/51CbB-7kotL._AA115_.jpg",
"http://ec4.images-amazon.com/images/I/51%2Bw40Mk51L._AA115_.jpg",
"http://ec4.images-amazon.com/images/I/519Gny1LckL._AA115_.jpg",
"http://ec4.images-amazon.com/images/I/51Dv6DUF-WL._AA115_.jpg",
"http://ec4.images-amazon.com/images/I/51uuy8yHeoL._AA115_.jpg",
"http://ec4.images-amazon.com/images/I/51T0KEjznqL._AA115_.jpg",
"http://ec4.images-amazon.com/images/I/419WTi%2BdjzL._AA115_.jpg",
"http://ec4.images-amazon.com/images/I/51QTg4ZmMmL._AA115_.jpg",
"http://ec4.images-amazon.com/images/I/51l--Pxw9TL._AA115_.jpg",
"http://ec4.images-amazon.com/images/I/51gehW2qUZL._AA115_.jpg"]
Solution:
Use User-Agent: Wget/1.10.2
Use xpath('//div[@class="productImage"]//img')
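If you would rather use Net::HTTP than open-uri, the same header can be set on the request object. This is a sketch under assumptions: build_request and fetch are made-up helper names, and the URL in the usage note is a placeholder, not the page from the question.

```ruby
require 'net/http'
require 'uri'

# Sketch: build a GET request that carries a custom User-Agent header.
# build_request and fetch are hypothetical helper names.
def build_request(url, user_agent)
  request = Net::HTTP::Get.new(URI.parse(url))
  request['User-Agent'] = user_agent
  request
end

# Send the request and return the response body as a string.
def fetch(url, user_agent)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(build_request(url, user_agent)).body
  end
end
```

The returned string can go straight into the parser, for example Nokogiri::HTML(fetch('http://example.com/search', 'Wget/1.10.2')).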