Nokogiri Exclude HTML Class - html

I'm trying to scrape the names of all the people who commented on a post in our Facebook group. I downloaded the file locally and am able to scrape the names of the people who commented plus the people who replied to those comments. I only want the original comments, not the replies... it seems like I have to exclude the UFIReplyList class but my code is still pulling all the names. Any help would be greatly appreciated. Thanks!
require 'nokogiri'
require 'pry'
class Scraper
  @@all = []
  def get_page
    file = File.read('/Users/mark/Desktop/raffle.html')
    doc = Nokogiri::HTML(file)
    # binding.pry
    doc.css(".UFICommentContent").each do |post|
      # binding.pry
      author = post.css(".UFICommentActorName").css(":not(.UFIReplyList)").text
      @@all << author
    end
    puts @@all
  end
end
Scraper.new.get_page

Traverse ancestors for every .UFICommentActorName element, to reject those contained within a .UFIReplyList element.
@authors_nodes = doc.css(".UFICommentActorName").reject do |node|
  # extract all ancestor class names;
  # beware of random whitespace and multiple classes per node
  class_names = node.ancestors.map{ |a| a.attributes['class'].value rescue nil }
  class_names = class_names.compact.map{ |names| names.split(' ') }
  class_names = class_names.flatten.map(&:strip)
  # reject if .UFIReplyList found
  class_names.include?('UFIReplyList')
end
@authors_nodes.map(&:text)

Related

How do I cause ActiveModelSerializers to serialize with :attributes and respect my key_transform?

I have a very simple model that I wish to serialize in a Rails (5) API. I want to produce the resulting JSON keys as CamelCase (because that's what my client expects). Because I expect the model to increase in complexity in future, I figured I should use ActiveModelSerializers. Because the consumer of the API expects a trivial JSON object, I want to use the :attributes adapter. But, I cannot seem to get AMS to respect my setting of :key_transform, regardless of whether I set ActiveModelSerializers.config.key_transform = :camel in my configuration file or create the resource via s = ActiveModelSerializers::SerializableResource.new(t, {key_transform: :camel}) (where t represents the ActiveModel object to be serialized) in the controller. In either case, I call render json: s.as_json.
Is this a configuration problem? Am I incorrectly expecting the default :attributes adapter to respect the setting of :key_transform (this seems unlikely, based on my reading of the code in the class, but I'm often wrong)? Cruft in my code? Something else?
If additional information would be helpful, please ask, and I'll edit my question.
Controller(s):
class ApplicationController < ActionController::API
before_action :force_json
private
def force_json
request.format = :json
end
end
require 'active_support'
require 'active_support/core_ext/hash/keys'
class AvailableTrucksController < ApplicationController
def show
t = AvailableTruck.find_by(truck_reference_id: params[:id])
s = ActiveModelSerializers::SerializableResource.new(t, {key_transform: :camel})
render json: s.as_json
end
end
config/application.rb
require_relative 'boot'
require 'rails/all'
Bundler.require(*Rails.groups)
module AvailableTrucks
class Application < Rails::Application
config.api_only = true
ActiveModelSerializers.config.key_transform = :camel
# ActiveModelSerializers.config.adapter = :json_api
# ActiveModelSerializers.config.jsonapi_include_toplevel_object = false
end
end
class AvailableTruckSerializer < ActiveModel::Serializer
attributes :truck_reference_id, :dot_number, :trailer_type, :trailer_length, :destination_states,
:empty_date_time, :notes, :destination_anywhere, :destination_zones
end
FWIW, I ended up taking an end-around to an answer. From previous attempts to resolve this problem, I knew that I could get the correct answer if I had a single instance of my model to return. What the work with ActiveModel::Serialization was intended to resolve was how to achieve that result with both the #index and #get methods of the controller.
Since I had this previous result, I instead extended it to solve my problem. Previously, I knew that the correct response would be generated if I did:
def show
t = AvailableTruck.find_by(truck_reference_id: params[:id])
render json: t.as_json.deep_transform_keys(&:camelize) unless t.nil?
end
What had frustrated me was that the naive extension of that to the array returned by AvailableTruck.all was failing in that the keys were left with snake_case.
It turned out that the "correct" (if unsatisfying) answer was:
def index
trucks = []
AvailableTruck.all.inject(trucks) do |a,t|
a << t.as_json.deep_transform_keys(&:camelize)
end
render json: trucks
end
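For comparison, here is a small plain-Ruby sketch of a recursive key transform that handles a single hash and an array of hashes the same way, so #show and #index could share one code path. The helpers camelize_key and deep_camelize are hypothetical and hand-rolled for illustration; they are not part of ActiveSupport or ActiveModelSerializers, and they assume snake_case input keys.

```ruby
# Hypothetical helper: "truck_reference_id" -> "TruckReferenceId".
def camelize_key(key)
  key.to_s.split('_').map(&:capitalize).join
end

# Walks hashes and arrays recursively, camelizing every hash key on the way.
def deep_camelize(value)
  case value
  when Hash
    value.each_with_object({}) do |(k, v), out|
      out[camelize_key(k)] = deep_camelize(v)
    end
  when Array
    value.map { |item| deep_camelize(item) }
  else
    value # scalars are returned unchanged
  end
end

trucks = [{ 'truck_reference_id' => 1, 'dot_number' => 'X9' }]
puts deep_camelize(trucks).inspect
# => [{"TruckReferenceId"=>1, "DotNumber"=>"X9"}]
```

In a controller this would be applied to the result of as_json, for a single record or a collection alike.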

Use ERB to access images NOT asset pipeline

I want to display a dynamically chosen image, so within the HTML I reference the variable @background_img, which contains the URL of a specific picture. However, doing
<body style='background-image: url(<%= @background_img %>);'>
simply refuses to display the image for the background. Am I misinterpreting how ERB works, because wouldn't Rails simply precompile the CSS and end up with a working HTML image fetch? Using the Chrome Developer Tools when previewing my app reveals url(), and obviously an empty parameter can't fetch the image.
EDIT:
Just wanted to add that I would rather not have to download the images, but keep the urls I already have prepared.
This is the WeatherMan class:
require 'rest-client'
class WeatherMan
#images within accessible data structures, designed to be expandable
def initialize
@hot = ['https://farm2.staticflickr.com/1515/23959664094_9c59962bb0_b.jpg']
@rain = ['https://farm8.staticflickr.com/7062/6845995798_37c20b1b55_h.jpg']
end
def getWeather(cityID)
response = JSON.parse RestClient.get "http://api.openweathermap.org/data/2.5/weather?id=#{cityID}&APPID=bd43836512d5650838d83c93c4412774&units=Imperial"
return {
temp: response['main']['temp'].to_f.round,
cloudiness: response['clouds']['all'].to_f.round,
humidity: response['main']['humidity'].to_f.round,
windiness: response['wind']['speed'],
condition_id: response['weather'][0]['id'].to_f,
condition_name: response['weather'][0]['main'],
condition_description: response['weather'][0]['description'],
condition_img: response['weather'][0]['icon']
}
end
def getImg(temp)
if temp <= 100 #CHANGE!!!
return @rain[rand(@rain.length)]
elsif temp <= 32
return nil
elsif temp <= 50
return nil
elsif temp <= 75
return nil
elsif temp <= 105
return nil
end
end
end
So sorry about the formatting, on mobile right now.
Now, the controller class:
load File.expand_path("../../data_reader.rb", __FILE__)
load File.expand_path("../../weatherstation.rb", __FILE__)
class PagesController < ApplicationController
  def home
    # `sudo python /home/pi/Documents/coding/raspberryPI/weatherStation/app/led_blink.py`
    server = WeatherMan.new
    @outside_data = server.getWeather(4219934)
    @sensor_temp = DRead.read_data(File.expand_path('../../data.txt', __FILE__), 'temperature')
    @sensor_temp = (@sensor_temp.to_f * (9.0/5) + 32).round(2)
    @background_img = server.getImg(@outside_data[:temp])
  end
end
The problem seems to be that @background_img is not populated.
The reason for this seems to be your WeatherMan class. I will attempt to rectify the issue...
Controller
If you're referencing @background_img on your body tag, it has to be available in every controller action, because the body tag lives in the layout. Thus, instead of setting it only in the home action, you need to set it each time you load your views:
#app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  before_action :set_background
  private
  def set_background
    server = WeatherMan.new
    @outside_data = server.getWeather(4219934)
    @sensor_temp = DRead.read_data(File.expand_path('../../data.txt', __FILE__), 'temperature')
    @sensor_temp = (@sensor_temp.to_f * (9.0/5) + 32).round(2)
    @background_img = server.getImg(@outside_data[:temp])
  end
end
--
Class
The main issue I see is that your class is not giving you a value. I'll attempt to refactor your class, although I can't promise anything:
require 'rest-client'
class WeatherMan
@@static = {
hot: 'https://farm2.staticflickr.com/1515/23959664094_9c59962bb0_b.jpg',
rain: 'https://farm8.staticflickr.com/7062/6845995798_37c20b1b55_h.jpg'
}
def getWeather(cityID)
response = JSON.parse RestClient.get weather_url(cityID)
return {
temp: response['main']['temp'].to_f.round,
cloudiness: response['clouds']['all'].to_f.round,
humidity: response['main']['humidity'].to_f.round,
windiness: response['wind']['speed'],
condition_id: response['weather'][0]['id'].to_f,
condition_name: response['weather'][0]['main'],
condition_description: response['weather'][0]['description'],
condition_img: response['weather'][0]['icon']
}
end
def getImg(temp)
#### This should return the image ####
#### Below is a test ####
@@static[:hot]
end
private
def weather_url city
"http://api.openweathermap.org/data/2.5/weather?id=#{city}&APPID=bd43836512d5650838d83c93c4412774&units=Imperial"
end
end
--
View
You need to make sure you're getting returned data from your controller in order to populate it in your view.
Because your getImg method is returning nil, you're getting a nil response. I have amended this for now with one of the flickr links you have included in the class.
If you always have a returned image, the following should work:
#app/views/layouts/application.html.erb
<body style='background-image: url(<%= @background_img %>);'>
Because your @background_img is an external URL, the above should work. If you were using a file from your asset pipeline, you'd want to use image_url etc.
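Since the view only works when getImg returns a URL, one way to avoid the nil branches is to look the image up in an ordered table whose last entry catches everything. This is a minimal sketch in plain Ruby: the constant IMAGES_BY_MAX_TEMP, the method image_for, and the 50 degree threshold are assumptions for illustration; only the two URLs come from the question.

```ruby
# Hypothetical table: [upper temperature bound in Fahrenheit, image URL].
# Entries must be listed from the lowest bound to the highest.
IMAGES_BY_MAX_TEMP = [
  [50, 'https://farm8.staticflickr.com/7062/6845995798_37c20b1b55_h.jpg'],
  [Float::INFINITY, 'https://farm2.staticflickr.com/1515/23959664094_9c59962bb0_b.jpg']
].freeze

# Returns the URL of the first entry whose bound is at or above temp.
# The INFINITY entry guarantees a match, so the result is never nil.
def image_for(temp)
  _bound, url = IMAGES_BY_MAX_TEMP.find { |max, _url| temp <= max }
  url
end

puts image_for(40)
puts image_for(90)
```

Checking the bounds in ascending order also avoids the situation in the original getImg, where the first `temp <= 100` test makes the later, lower thresholds unreachable.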

Qweb, Blocking the report

I need to block the report in the draft state. If the user clicks the print button to generate the PDF while the record is in the draft state, it should raise a warning message.
Thanks in advance
In general, a QWeb report can be printed in two ways:
HTML
PDF
Each time you call the report, a different method of the report module is called, depending on the report type.
If you call the report as PDF, the get_pdf() method is called; if you call it as HTML, the get_html() method is called.
So in this case you have to override both of these methods in your module and add a check like the following.
Override the get_pdf() method of the report module:
class Report(osv.Model):
    _inherit = "report"
    _description = "Report"
    @api.v7
    def get_pdf(self, cr, uid, ids, report_name, html=None, data=None, context=None):
        """This method generates and returns pdf version of a report.
        """
        order_pool=self.pool.get('sale.order')
        for order in order_pool.browse(cr, uid, ids, context=None):
            if order.state:
                if order.state == 'draft':
                    raise osv.except_osv(_("Warning!"), _("Your Printed Report is in Draft State ...!! "))
        if context is None:
            context = {}
        if html is None:
            html = self.get_html(cr, uid, ids, report_name, data=data, context=context)
        html = html.decode('utf-8') # Ensure the current document is utf-8 encoded.
        # Get the ir.actions.report.xml record we are working on.
        report = self._get_report_from_name(cr, uid, report_name)
        # Check if we have to save the report or if we have to get one from the db.
        save_in_attachment = self._check_attachment_use(cr, uid, ids, report)
        # Get the paperformat associated to the report, otherwise fallback on the company one.
        if not report.paperformat_id:
            user = self.pool['res.users'].browse(cr, uid, uid)
            paperformat = user.company_id.paperformat_id
        else:
            paperformat = report.paperformat_id
        # Preparing the minimal html pages
        css = '' # Will contain local css
        headerhtml = []
        contenthtml = []
        footerhtml = []
        irconfig_obj = self.pool['ir.config_parameter']
        base_url = irconfig_obj.get_param(cr, SUPERUSER_ID, 'report.url') or irconfig_obj.get_param(cr, SUPERUSER_ID, 'web.base.url')
        # Minimal page renderer
        view_obj = self.pool['ir.ui.view']
        render_minimal = partial(view_obj.render, cr, uid, 'report.minimal_layout', context=context)
        # The received html report must be simplified. We convert it in a xml tree
        # in order to extract headers, bodies and footers.
        try:
            root = lxml.html.fromstring(html)
            match_klass = "//div[contains(concat(' ', normalize-space(@class), ' '), ' {} ')]"
            for node in root.xpath("//html/head/style"):
                css += node.text
            for node in root.xpath(match_klass.format('header')):
                body = lxml.html.tostring(node)
                header = render_minimal(dict(css=css, subst=True, body=body, base_url=base_url))
                headerhtml.append(header)
            for node in root.xpath(match_klass.format('footer')):
                body = lxml.html.tostring(node)
                footer = render_minimal(dict(css=css, subst=True, body=body, base_url=base_url))
                footerhtml.append(footer)
            for node in root.xpath(match_klass.format('page')):
                # Previously, we marked some reports to be saved in attachment via their ids, so we
                # must set a relation between report ids and report's content. We use the QWeb
                # branding in order to do so: searching after a node having a data-oe-model
                # attribute with the value of the current report model and read its oe-id attribute
                if ids and len(ids) == 1:
                    reportid = ids[0]
                else:
                    oemodelnode = node.find(".//*[@data-oe-model='%s']" % report.model)
                    if oemodelnode is not None:
                        reportid = oemodelnode.get('data-oe-id')
                        if reportid:
                            reportid = int(reportid)
                    else:
                        reportid = False
                # Extract the body
                body = lxml.html.tostring(node)
                reportcontent = render_minimal(dict(css=css, subst=False, body=body, base_url=base_url))
                contenthtml.append(tuple([reportid, reportcontent]))
        except lxml.etree.XMLSyntaxError:
            contenthtml = []
            contenthtml.append(html)
            save_in_attachment = {} # Don't save this potentially malformed document
        # Get paperformat arguments set in the root html tag. They are prioritized over
        # paperformat-record arguments.
        specific_paperformat_args = {}
        for attribute in root.items():
            if attribute[0].startswith('data-report-'):
                specific_paperformat_args[attribute[0]] = attribute[1]
        # Run wkhtmltopdf process
        return self._run_wkhtmltopdf(
            cr, uid, headerhtml, footerhtml, contenthtml, context.get('landscape'),
            paperformat, specific_paperformat_args, save_in_attachment
        )
In the same way, you can override get_html() in your module and add the same check.
Here the code checks the sale order report action.
I have tested the above code successfully on my side.
I hope this is helpful for you. :)

How can I store a hash for the lifetime of a 'jekyll build'?

I am coding a custom Liquid tag as Jekyll plugin for which I need to preserve some values until the next invocation of the tag within the current run of the jekyll build command.
Is there some global location/namespace that I could use to store and retrieve values (preferably key-value pairs / a hash)?
You could add a module with class variables for storing the persistent values, then include the module in your tag class. You would need the proper accessors depending on the type of the variables and the assignments you might want to make. Here's a trivial example implementing a simple counter that keeps track of the number of times the tag was called in DataToKeep::my_val:
module DataToKeep
  @@my_val = 0
  def my_val
    @@my_val
  end
  def my_val=(val)
    @@my_val = val
  end
end
module Jekyll
  class TagWithKeptData < Liquid::Tag
    include DataToKeep
    def render(context)
      self.my_val = self.my_val + 1
      return "<p>Times called: #{self.my_val}</p>"
    end
  end
end
Liquid::Template.register_tag('counter', Jekyll::TagWithKeptData)
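Since the question asks for key-value pairs, the counter above can be generalised to a shared hash. The following is a minimal sketch in plain Ruby so that it runs without Jekyll: the module KeptHash and the class FakeTag are hypothetical names, and FakeTag only stands in for a Liquid::Tag subclass, on the assumption that each tag occurrence gets its own object.

```ruby
module KeptHash
  @@store = {} # one hash shared by every object that includes this module

  def store
    @@store
  end
end

# Stand-in for a Liquid::Tag subclass (hypothetical, no Liquid dependency).
class FakeTag
  include KeptHash

  # Counts how often each key has been rendered during the current process.
  def render(key)
    store[key] = store.fetch(key, 0) + 1
    "#{key}: #{store[key]}"
  end
end

puts FakeTag.new.render('intro') # => intro: 1
puts FakeTag.new.render('intro') # => intro: 2
```

The hash lives only as long as the Ruby process does, so it is discarded when the build command exits.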

opening json string to easily read and write to in ruby

I have a json file. I am using it to store information, and as such it is constantly going to be both read and written.
I am completely new to ruby and oop in general, so I am sure I am going about this in a crazy way.
class Load
  def initialize(save_name)
    puts "loading " + save_name
    @data = JSON.parse(IO.read( $user_library + save_name ))
    @subject = @data["subject"]
    @id = @data["id"]
    @save_name = @data["save_name"]
    @listA = @data["listA"] # is an array containing dictionaries
    @listB = @data["listB"] # is an array containing dictionaries
  end
  attr_reader :data, :subject, :id, :save_name, :listA, :listB
end
example = Load.new("test.json")
puts example.id
=> 937489327389749
So I can now easily read the json file, but how could I write back to the file, referring to example? Say I wanted to change the id with example.id.change(7129371289)... or add dictionaries to lists A and B... Is this possible?
The simplest way to go to/from JSON is to just use the JSON library to transform your data as appropriate:
json = my_object.to_json — method on the specific object to create a JSON string.
json = JSON.generate(my_object) — create JSON string from object.
JSON.dump(my_object, someIO) — create a JSON string and write to a file.
my_object = JSON.parse(json) — create a Ruby object from a JSON string.
my_object = JSON.load(someIO) — create a Ruby object from a file.
Taken from this answer to another of your questions.
However, you could wrap this in a class if you wanted:
class JSONHash
  require 'json'
  def self.from(file)
    # reload reads the file into the hash and returns the new object
    new.reload(file)
  end
  def initialize(h={})
    @h = h
  end
  # Save this to disk, optionally specifying a new location
  def save(file=nil)
    @file = file if file
    File.open(@file,'w'){ |f| JSON.dump(@h, f) }
    self
  end
  # Discard all changes to the hash and replace with the information on disk
  def reload(file=nil)
    @file = file if file
    @h = JSON.parse(IO.read(@file))
    self
  end
  # Let our internal hash handle most methods, returning what it likes
  def method_missing(*a,&b)
    @h.send(*a,&b)
  end
  # But these methods normally return a Hash, so we re-wrap them in our class
  %w[ invert merge select ].each do |m|
    class_eval <<-ENDMETHOD
      def #{m}(*a,&b)
        self.class.new @h.send(#{m.inspect},*a,&b)
      end
    ENDMETHOD
  end
  # Accept the optional generator arguments that JSON passes to to_json
  def to_json(*a)
    @h.to_json(*a)
  end
end
The above behaves just like a hash, but you can use foo = JSONHash.from("foo.json") to load from disk, modify that hash as you would normally, and then just foo.save when you want to save out to disk.
Or, if you don't have a file on disk to begin with:
foo = JSONHash.new a:42, b:17, c:"whatever initial values you want"
foo.save 'foo.json'
# keep modifying foo
foo[:bar] = 52
foo.save # saves to the last saved location
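If a wrapper class is more than you need, the read, modify, write cycle from the question can be sketched with the JSON library alone. The update_json helper, the temporary file, and the sample keys below are assumptions for illustration; the sketch rewrites the whole file on every update, which is fine for small files.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helper: parse the file, let the caller edit the hash, write it back.
def update_json(path)
  data = JSON.parse(File.read(path))
  yield data
  File.write(path, JSON.pretty_generate(data))
  data
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'test.json')
  File.write(path, JSON.generate('id' => 937489327389749, 'listA' => []))

  update_json(path) do |data|
    data['id'] = 7129371289                  # change a value
    data['listA'] << { 'name' => 'example' } # add a dictionary to a list
  end

  puts File.read(path)
end
```

Note that JSON.parse returns string keys, so the data is addressed as data['id'] rather than data[:id].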