Series of Uncaught Type Errors: "xxxx" is not a function in JavaScript compiled from ClojureScript in Brave Browser - clojurescript

I'm following the code examples from the book Web Development with Clojure, 3rd Edition, and am currently working through the examples that use ClojureScript to create a single-page application with Reagent and Ajax. The compiled JavaScript runs fine until I try to send data from a form to the server. This is supposed to break, because I haven't yet added any of the Ajax code that attaches a CSRF token to the POST request, and the author wants to walk through how the JS console can be used for logging in development. But it's not breaking as expected. I should receive something like this:
error:{:status 403,
:status-text "Forbidden",
:failure :error,
:response "<h1>Invalid anti-forgery token</h1>"}
I've made sure that all of my dependencies and plugins are up to date. I recently switched to Brave to test it out and I like it a lot, but I've run into a few issues like this. When I test this in Chrome, it runs correctly (breaks as expected).
This is the code I have in my core.cljs file:
(ns guestbook.core
(:require [reagent.core :as r]
[ajax.core :refer [GET POST]]))
(defn send-message!
[fields]
(POST "/message"
{:params @fields
:handler #(.log js/console (str "response:" %))
:error-handler #(.log js/console (str "error:" %))}))
(defn message-form
[]
(let [fields (r/atom {})]
(fn []
[:div
[:div.field
[:label.label {:for :name} "Name"]
[:input.input
{:type :text
:name :name
:on-change #(swap! fields assoc :name (-> % .-target .-value))
:value (:name @fields)}]]
[:div.field
[:label.label {:for :message} "Message"]
[:textarea.textarea
{:name :message
:value (:message @fields)
:on-change #(swap! fields assoc :message (-> % .-target .-value))}]]
[:input.button.is-primary
{:type :submit
:on-click #(send-message! fields)
:value "Comment"}]
[:p "Name: " (:name @fields)]
[:p "Message: " (:message @fields)]])))
(defn home
[]
[:div.content
[:div.column.is-centered
[:div.column.is-two-thirds
[:div.columns
[:div.column
[message-form]]]]]])
(r/render [home] (.getElementById js/document "content"))
When I open the page in Brave, I get these two messages upon loading:
undefined
at log.js:38
Uncaught TypeError: goog.log.getLogger is not a function
at xhrio.js:249
(anonymous) @ xhrio.js:249
The Javascript still runs fine, but when I hit the "submit" button, I get these two errors, which I suspect might be in Google's Closure code? Not sure:
at goog.net.XhrIo.ajax$protocols$AjaxImpl$_js_ajax_request$arity$3 (xhrio.cljs:30)
at ajax$protocols$_js_ajax_request (protocols.cljc:6)
at ajax$simple$raw_ajax_request (simple.cljc:64)
at ajax$simple$ajax_request (simple.cljc:67)
at ajax$easy$easy_ajax_request (easy.cljc:116)
at Function.cljs$core$IFn$_invoke$arity$variadic (core.cljc:75)
at ajax$core$POST (core.cljc:75)
at guestbook$core$send_message_BANG_ (core.cljs:7)
at core.cljs:30
at HTMLUnknownElement.callCallback (react-dom.inc.js:341)
(anonymous) @ xhrio.cljs:30
ajax$protocols$_js_ajax_request @ protocols.cljc:6
ajax$simple$raw_ajax_request @ simple.cljc:64
ajax$simple$ajax_request @ simple.cljc:67
ajax$easy$easy_ajax_request @ easy.cljc:116
(anonymous) @ core.cljc:75
ajax$core$POST @ core.cljc:75
guestbook$core$send_message_BANG_ @ core.cljs:7
(anonymous) @ core.cljs:30
callCallback @ react-dom.inc.js:341
invokeGuardedCallbackDev @ react-dom.inc.js:391
invokeGuardedCallback @ react-dom.inc.js:448
invokeGuardedCallbackAndCatchFirstError @ react-dom.inc.js:462
executeDispatch @ react-dom.inc.js:594
executeDispatchesInOrder @ react-dom.inc.js:616
executeDispatchesAndRelease @ react-dom.inc.js:719
executeDispatchesAndReleaseTopLevel @ react-dom.inc.js:727
forEachAccumulated @ react-dom.inc.js:701
runEventsInBatch @ react-dom.inc.js:744
runExtractedPluginEventsInBatch @ react-dom.inc.js:875
handleTopLevel @ react-dom.inc.js:6026
batchedEventUpdates @ react-dom.inc.js:2342
dispatchEventForPluginEventSystem @ react-dom.inc.js:6121
dispatchEvent @ react-dom.inc.js:6150
unstable_runWithPriority @ react.inc.js:2820
runWithPriority$2 @ react-dom.inc.js:11443
discreteUpdates$1 @ react-dom.inc.js:21810
discreteUpdates @ react-dom.inc.js:2357
dispatchDiscreteEvent @ react-dom.inc.js:6104
react-dom.inc.js:481
Uncaught TypeError: G__20367.setTimeoutInterval is not a function
at goog.net.XhrIo.ajax$protocols$AjaxImpl$_js_ajax_request$arity$3 (xhrio.cljs:30)
at ajax$protocols$_js_ajax_request (protocols.cljc:6)
at ajax$simple$raw_ajax_request (simple.cljc:64)
at ajax$simple$ajax_request (simple.cljc:67)
at ajax$easy$easy_ajax_request (easy.cljc:116)
at Function.cljs$core$IFn$_invoke$arity$variadic (core.cljc:75)
at ajax$core$POST (core.cljc:75)
at guestbook$core$send_message_BANG_ (core.cljs:7)
at core.cljs:30
at HTMLUnknownElement.callCallback (react-dom.inc.js:341)
(anonymous) @ xhrio.cljs:30
ajax$protocols$_js_ajax_request @ protocols.cljc:6
ajax$simple$raw_ajax_request @ simple.cljc:64
ajax$simple$ajax_request @ simple.cljc:67
ajax$easy$easy_ajax_request @ easy.cljc:116
(anonymous) @ core.cljc:75
ajax$core$POST @ core.cljc:75
guestbook$core$send_message_BANG_ @ core.cljs:7
(anonymous) @ core.cljs:30
callCallback @ react-dom.inc.js:341
invokeGuardedCallbackDev @ react-dom.inc.js:391
invokeGuardedCallback @ react-dom.inc.js:448
invokeGuardedCallbackAndCatchFirstError @ react-dom.inc.js:462
executeDispatch @ react-dom.inc.js:594
executeDispatchesInOrder @ react-dom.inc.js:616
executeDispatchesAndRelease @ react-dom.inc.js:719
executeDispatchesAndReleaseTopLevel @ react-dom.inc.js:727
forEachAccumulated @ react-dom.inc.js:701
runEventsInBatch @ react-dom.inc.js:744
runExtractedPluginEventsInBatch @ react-dom.inc.js:875
handleTopLevel @ react-dom.inc.js:6026
batchedEventUpdates @ react-dom.inc.js:2342
dispatchEventForPluginEventSystem @ react-dom.inc.js:6121
dispatchEvent @ react-dom.inc.js:6150
unstable_runWithPriority @ react.inc.js:2820
runWithPriority$2 @ react-dom.inc.js:11443
discreteUpdates$1 @ react-dom.inc.js:21810
discreteUpdates @ react-dom.inc.js:2357
dispatchDiscreteEvent @ react-dom.inc.js:6104
Any ideas as to why this runs as expected in Chrome but not in Brave?

Related

Google ReCAPTCHA v2 placeholder element must be an element or id

I've had Google reCAPTCHA v2 running for the last few months, but now it has suddenly stopped working on my live site (https://blisspot.com/signup). It's working fine on the dev site (https://dev.blisspot.com/signup), though. I'm using git to deploy changes, which makes me certain that there is no difference between the dev and live site code.
I've already tried different solutions suggested in similar posts, but none of them works. I've also tried different API keys, but the error remains the same. Is it possible that Google may have banned my domain? Any suggestions would be much appreciated.
$$('.g-recaptcha').each(function ($el) {
if ($el.retrieve('recaptcha-loaded', false)) {
return;
}
$el.empty();
grecaptcha.render($el, {
sitekey: $el.get('data-sitekey'),
theme: $el.get('data-theme'),
type: $el.get('data-type'),
tabindex: $el.get('data-tabindex'),
size: $el.get('data-size'),
});
$el.store('recaptcha-loaded', true);
});
Here is the stack trace.
recaptcha__en.js:212 Uncaught Error: reCAPTCHA placeholder element must be an element or id
at VM214 recaptcha__en.js:212
at Object.render (mootools-core-1.4.5-…at-nc.js?c=1160:959)
at core.js?c=1160:961
at Elements.Elements.forEach (<anonymous>)
at Function.forEach (mootools-core-1.4.5-…at-nc.js?c=1160:220)
at Elements.Elements.each (mootools-core-1.4.5-…at-nc.js?c=1160:337)
at Object.render (core.js?c=1160:952)
at signup:562
at condition (mootools-core-1.4.5-…t-nc.js?c=1160:4352)
at defn (mootools-core-1.4.5-…t-nc.js?c=1160:4366)
(anonymous) @ recaptcha__en.js:212
(anonymous) @ mootools-core-1.4.5-…at-nc.js?c=1160:959
(anonymous) @ core.js?c=1160:961
(anonymous) @ mootools-core-1.4.5-…at-nc.js?c=1160:220
each @ mootools-core-1.4.5-…at-nc.js?c=1160:337
render @ core.js?c=1160:952
(anonymous) @ signup:562
condition @ mootools-core-1.4.5-…t-nc.js?c=1160:4352
defn @ mootools-core-1.4.5-…t-nc.js?c=1160:4366
load (async)
addListener @ mootools-core-1.4.5-…t-nc.js?c=1160:3904
addEvent @ mootools-core-1.4.5-…t-nc.js?c=1160:4369
(anonymous) @ mootools-core-1.4.5-…t-nc.js?c=1160:4620
addEvent @ mootools-more-1.4.0.…at-nc.js?c=1160:106
(anonymous) @ signup:558

How do I edit Reitit routes in Reagent?

The routes created with the default reagent template look like this:
;; -------------------------
;; Routes
(def router
(reitit/router
[["/" :index]
["/items"
["" :items]
["/:item-id" :item]]
["/about" :about]]))
If I change the path of one ("/about" to "/test" below), why does it no longer work? There must be something else driving the routing, but I can't seem to figure out what.
;; -------------------------
;; Routes
(def router
(reitit/router
[["/" :index]
["/items"
["" :items]
["/:item-id" :item]]
["/test" :about]]))
This is the default reagent template (lein new reagent...) and I haven't changed anything else in the code. Any help would be greatly appreciated.
Edit - Some additional detail
I poked around in the repl a little bit in this function (from the default template):
(defn init! []
(clerk/initialize!)
(accountant/configure-navigation!
{:nav-handler
(fn [path]
(let [match (reitit/match-by-path router path)
current-page (:name (:data match))
route-params (:path-params match)]
(reagent/after-render clerk/after-render!)
(session/put! :route {:current-page (page-for current-page)
:route-params route-params})
(clerk/navigate-page! path)
))
:path-exists?
(fn [path]
(boolean (reitit/match-by-path router path)))})
(accountant/dispatch-current!)
(mount-root))
Everything looks ok to me. In fact, executing the below steps in the repl successfully navigated the browser to the correct page. I still can't enter the URL directly though.
app:problem.core=> (require '[reitit.frontend :as reitit])
nil
app:problem.core=> (reitit/match-by-path router "/test")
{:template "/test",
:data {:name :about},
:result nil,
:path-params {},
:path "/test",
:query-params {},
:parameters {:path {}, :query {}}}
app:problem.core=> (def match (reitit/match-by-path router "/test"))
#'problem.core/match
app:problem.core=> (:name (:data match))
:about
app:problem.core=> (:path-params match)
{}
app:problem.core=> (def current-page (:name (:data match)))
#'problem.core/current-page
app:problem.core=> (page-for current-page)
#'problem.core/about-page
app:problem.core=> (session/put! :route {:current-page (page-for current-page) :route-params {}})
{:route {:current-page #'problem.core/about-page, :route-params {}}}
app:problem.core=>
It looks like you changed the routes on client-side, in src/cljs/<project_name>/core.cljs, but did not change them server side in src/clj/<project_name>/handler.clj (look under the def app near the bottom of the file).
If you're new to developing web applications with Clojure, I'd recommend looking at Luminus rather than using the Reagent template. It takes a much more batteries-included approach, with a lot more documentation. The book "Web Development With Clojure" is written by the same author (who is also a contributor to Reagent), and is also recommended reading.

Failed calling executeUserFunction with error {"instanceTree":null,"maxTreeDepth":0}

After upgrading Autodesk Forge Viewer from v6.5 to v7.11, a new console error appears every time a DWG is loaded:
Failed calling executeUserFunction with error {"instanceTree":null,"maxTreeDepth":0}
LMV../src/logger/Logger.js.Logger._reportError @ viewer3D.js:75372
(anonymous) @ Hyperlink.js:857
Promise.catch (async)
HyperlinkTool.loadHyperlinksF2d @ Hyperlink.js:854
HyperlinkTool.loadHyperlinks @ Hyperlink.js:805
HyperlinkTool.activate @ Hyperlink.js:622
ToolController.activateTool @ viewer3D.js:83795
Autodesk.Extensions.Hyperlink../extensions/Hyperlink/Hyperlink.js.HyperlinkExtension.load @ Hyperlink.js:192
loadExtensionLocal @ viewer3D.js:26330
(anonymous) @ viewer3D.js:26245
Promise.then (async)
loadExtension @ viewer3D.js:26228
(anonymous) @ viewer3D.js:62886
setTimeout (async)
LMV../src/gui/GuiViewer3D.js.GuiViewer3D.createUI @ viewer3D.js:62874
createUI @ viewer3D.js:62737
(anonymous) @ viewer3D.js:62749
setTimeout (async)
onSuccessChained @ viewer3D.js:62744
_ref2 @ viewer3D.js:33850
onParse @ viewer3D.js:49394
According to the stack trace, it fails to execute a function specified in a string variable:
function userFunction(pdb) {
var hyperlinkExists = false;
pdb.enumAttributes(function(i, attrDef, attrRaw) {
var name = attrRaw[0];
if (name === 'hyperlink') {
hyperlinkExists = true;
return true;
}
});
return hyperlinkExists;
}
With the Autodesk.Hyperlink extension disabled, it works without any errors. Is this a bug that has not been fixed yet? It would also be good to have a sample DWG with hyperlinks, because it's not clear how to test hyperlinks.
With disabled Autodesk.Hyperlink extension it works well without any errors. Is it a bug that is not fixed yet?
Yes, it's a known issue. Until that's fixed, be sure to switch the Hyperlink extension off with:
new Autodesk.Viewing.GuiViewer3D(container, {disabledExtensions:{hyperlink:true}})
Stay tuned to our official blog for release notes of upcoming versions - this should get fixed soon...

troubleshooting clojure web-app: connecting html and css for heroku deployment

I have two files, one html and one css. I have tried to turn them into a heroku app and even used the lein command to create a heroku friendly skeleton and plug these two files in, but cannot get it to work for the life of me. There is something very basic that I don't yet understand about how to coordinate a view with the back-end control. And the hello world tutorials aren't helping me because they do not show me how to do different things or explain what needs to change in my defroutes function, for example, for that to be accomplished. In short, my question is this: How can I coordinate these two files into a Clojure project to make the html render as the front page of a webapp and then deploy it on heroku?
html:
<html>
<head>
<link rel="stylesheet" href="style.css" />
</head>
<body>
<img id="sun" src="http://goo.gl/dEEssP">
<div id='earth-orbit'>
<img id="earth" src="http://goo.gl/o3YWu9">
</div>
</body>
</html>
web.clj file in "lein new heroku ..." project:
(ns solar_system.web
(:require [compojure.core :refer [defroutes GET PUT POST DELETE ANY]]
[compojure.handler :refer [site]]
[compojure.route :as route]
[clojure.java.io :as io]
[ring.middleware.stacktrace :as trace]
[ring.middleware.session :as session]
[ring.middleware.session.cookie :as cookie]
[ring.adapter.jetty :as jetty]
[ring.middleware.basic-authentication :as basic]
[cemerick.drawbridge :as drawbridge]
[environ.core :refer [env]]))
(defn- authenticated? [user pass]
;; TODO: heroku config:add REPL_USER=[...] REPL_PASSWORD=[...]
(= [user pass] [(env :repl-user false) (env :repl-password false)]))
(def ^:private drawbridge
(-> (drawbridge/ring-handler)
(session/wrap-session)
(basic/wrap-basic-authentication authenticated?)))
(defroutes app
(ANY "/repl" {:as req}
(drawbridge req))
(GET "/" []
{:status 200
:headers {"Content-Type" "text/plain"}
:body (pr-str ["Hello" :from 'Heroku])}) ; <= Should I change this part here?
(ANY "*" []
(route/not-found (slurp (io/resource "404.html")))))
(defn wrap-error-page [handler]
(fn [req]
(try (handler req)
(catch Exception e
{:status 500
:headers {"Content-Type" "text/html"}
:body (slurp (io/resource "500.html"))}))))
(defn -main [& [port]]
(let [port (Integer. (or port (env :port) 5000))
;; TODO: heroku config:add SESSION_SECRET=$RANDOM_16_CHARS
store (cookie/cookie-store {:key (env :session-secret)})]
(jetty/run-jetty (-> #'app
((if (env :production)
wrap-error-page
trace/wrap-stacktrace))
(site {:session {:store store}}))
{:port port :join? false})))
;; For interactive development:
;; (.stop server)
;; (def server (-main))
project.clj file
(defproject solar_system "1.0.0-SNAPSHOT"
:description "FIXME: write description"
:url "http://solar_system.herokuapp.com"
:license {:name "FIXME: choose"
:url "http://example.com/FIXME"}
:dependencies [[org.clojure/clojure "1.4.0"]
[compojure "1.1.1"]
[ring/ring-jetty-adapter "1.1.0"]
[ring/ring-devel "1.1.0"]
[ring-basic-authentication "1.0.1"]
[environ "0.2.1"]
[com.cemerick/drawbridge "0.0.6"]]
:min-lein-version "2.0.0"
:plugins [[environ/environ.lein "0.2.1"]]
:hooks [environ.leiningen.hooks]
:profiles {:production {:env {:production true}}})
example of typical handler code that renders text:
(ns hello-world.core
(:use ring.adapter.jetty))
(defn app [req]
{:status 200
:headers {"Content-Type" "text/plain"}
:body "Hello, world"}) ; <= Could I just change this part to slurp in
; the html file and stick it in a file in my
; root directory to get a successful 'git push heroku master'?
Modifying your code:
(defroutes app
(ANY "/repl" {:as req}
(drawbridge req))
(GET "/" []
{:status 200
:headers {"Content-Type" "text/html"} ; change content type
:body (slurp "resources/public/my-file.html")}) ; wherever your file is
(ANY "*" []
(route/not-found (slurp (io/resource "404.html")))))
How I'd write it:
(defroutes app
(ANY "/repl" {:as req} (drawbridge req))
(GET "/" [] (slurp "resources/public/my-file.html")) ; wherever your file is
(route/resources "/") ; special route for serving static files like css
; default root directory is resources/public/
(route/not-found (slurp (io/resource "404.html")))) ; IDK what io/resource does
; you might not need it

Parsing HTML - searching links - is it possible to search paragraph that a link is contained in?

I am parsing the links on wikipedia pages of actors, and trying to find links to films they appeared in.
I have a basic method that searches the links and checks for the word film in the link. However, many of the links to films do not actually contain this word.
Within the paragraphs that contain the links, though, the word film does appear, for example:
<p>Dreyfuss's first film part was a small, uncredited role in
<i><a href="/wiki/The_Graduate" title="The Graduate">The Graduate
// Paragraph goes on for a long time.
Here is the block from the method that checks all the links:
all_links = doca.search('//a[@href]')
all_links.each do |link|
link_info = link['href']
if link_info.include?("(film)") && !(link_info.include?("Category:") || link_info.include?("php"))
then out << link_info end
end
out.uniq.collect {|link| strip_out_name(link)}
Would there be a way of checking the text that comes before the link but after the <p> tag for the word film, while being careful not to check other links (and perhaps also limiting the search to the 50 characters before the link)?
Thanks for any help or suggestions.
Click here, this is the main page that I am testing on
It is possible to search for text inside a tag. See https://stackoverflow.com/a/19816840/128421 for an example.
But I'd do it something like this:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open('http://en.wikipedia.org/wiki/Richard_Dreyfuss')) # URI.open: plain open no longer fetches URLs as of Ruby 3.0
table = doc.at('#Filmography').parent.next_element
films = table.search('tr')[1..-1].map{ |tr|
tds = tr.search('td')
year = tds.shift.text
movie = tds.shift
movie_url = movie.at('a')['href']
movie_title = movie.at('a').text
role = tds.shift.text
{
year: year,
movie_url: movie_url,
movie_title: movie_title,
role: role
}
}
films
# => [{:year=>"1966",
# :movie_url=>"/wiki/Bewitched",
# :movie_title=>"Bewitched",
# :role=>"Rodney"},
# {:year=>"1966",
# :movie_url=>"/wiki/Gidget_(TV_series)",
# :movie_title=>"Gidget",
# :role=>"Durf the Drag"},
# {:year=>"1967",
# :movie_url=>"/wiki/Valley_of_the_Dolls_(film)",
# :movie_title=>"Valley of the Dolls",
# :role=>"Assistant stage manager"},
# {:year=>"1967",
# :movie_url=>"/wiki/The_Graduate",
# :movie_title=>"The Graduate",
# :role=>"Boarding House Resident"},
# {:year=>"1967",
# :movie_url=>"/wiki/The_Big_Valley",
# :movie_title=>"The Big Valley",
# :role=>"Lud Akley"},
# {:year=>"1968",
# :movie_url=>"/wiki/The_Young_Runaways",
# :movie_title=>"The Young Runaways",
# :role=>"Terry"},
# {:year=>"1969",
# :movie_url=>"/wiki/Hello_Down_There",
# :movie_title=>"Hello Down There",
# :role=>"Harold Webster"},
# {:year=>"1970",
# :movie_url=>"/wiki/The_Mod_Squad",
# :movie_title=>"The Mod Squad",
# :role=>"Curtis Bell"},
# {:year=>"1973",
# :movie_url=>"/wiki/American_Graffiti",
# :movie_title=>"American Graffiti",
# :role=>"Curt Henderson"},
# {:year=>"1973",
# :movie_url=>"/wiki/Dillinger_(1973_film)",
# :movie_title=>"Dillinger",
# :role=>"Baby Face Nelson"},
# {:year=>"1974",
# :movie_url=>"/wiki/The_Apprenticeship_of_Duddy_Kravitz_(film)",
# :movie_title=>"The Apprenticeship of Duddy Kravitz",
# :role=>"Duddy"},
# {:year=>"1974",
# :movie_url=>"/wiki/The_Second_Coming_of_Suzanne",
# :movie_title=>"The Second Coming of Suzanne",
# :role=>"Clavius"},
# {:year=>"1975",
# :movie_url=>"/wiki/Inserts_(film)",
# :movie_title=>"Inserts",
# :role=>"The Boy Wonder"},
# {:year=>"1975",
# :movie_url=>"/wiki/Jaws_(film)",
# :movie_title=>"Jaws",
# :role=>"Matt Hooper"},
# {:year=>"1976",
# :movie_url=>"/wiki/Victory_at_Entebbe",
# :movie_title=>"Victory at Entebbe",
# :role=>"Colonel Yonatan 'Yonni' Netanyahu"},
# {:year=>"1977",
# :movie_url=>"/wiki/Close_Encounters_of_the_Third_Kind",
# :movie_title=>"Close Encounters of the Third Kind",
# :role=>"Roy Neary"},
# {:year=>"1977",
# :movie_url=>"/wiki/The_Goodbye_Girl",
# :movie_title=>"The Goodbye Girl",
# :role=>"Elliott Garfield"},
# {:year=>"1978",
# :movie_url=>"/wiki/The_Big_Fix",
# :movie_title=>"The Big Fix",
# :role=>"Moses Wine"},
# {:year=>"1980",
# :movie_url=>"/wiki/The_Competition_(film)",
# :movie_title=>"The Competition",
# :role=>"Paul Dietrich"},
# {:year=>"1981",
# :movie_url=>"/wiki/Whose_Life_Is_It_Anyway%3F_(1981_film)",
# :movie_title=>"Whose Life Is It Anyway?",
# :role=>"Ken Harrison"},
# {:year=>"1984",
# :movie_url=>"/wiki/The_Buddy_System_(film)",
# :movie_title=>"The Buddy System",
# :role=>"Joe"},
# {:year=>"1986",
# :movie_url=>"/wiki/Down_and_Out_in_Beverly_Hills",
# :movie_title=>"Down and Out in Beverly Hills",
# :role=>"David 'Dave' Whiteman"},
# {:year=>"1986",
# :movie_url=>"/wiki/Stand_by_Me_(film)",
# :movie_title=>"Stand by Me",
# :role=>"Narrator/Gordie LaChance (adult)"},
# {:year=>"1987",
# :movie_url=>"/wiki/Tin_Men",
# :movie_title=>"Tin Men",
# :role=>"Bill 'BB' Babowsky"},
# {:year=>"1987",
# :movie_url=>"/wiki/Stakeout_(1987_film)",
# :movie_title=>"Stakeout",
# :role=>"Det. Chris Lecce"},
# {:year=>"1987",
# :movie_url=>"/wiki/Nuts_(film)",
# :movie_title=>"Nuts",
# :role=>"Aaron Levinsky"},
# {:year=>"1988",
# :movie_url=>"/wiki/Moon_Over_Parador",
# :movie_title=>"Moon Over Parador",
# :role=>"Jack Noah/President Alphonse Simms"},
# {:year=>"1989",
# :movie_url=>"/wiki/Let_It_Ride_(film)",
# :movie_title=>"Let It Ride",
# :role=>"Jay Trotter"},
# {:year=>"1989",
# :movie_url=>"/wiki/Always_(1989_film)",
# :movie_title=>"Always",
# :role=>"Pete Sandich"},
# {:year=>"1990",
# :movie_url=>"/wiki/Rosencrantz_%26_Guildenstern_Are_Dead_(film)",
# :movie_title=>"Rosencrantz & Guildenstern Are Dead",
# :role=>"The Player"},
# {:year=>"1990",
# :movie_url=>"/wiki/Postcards_from_the_Edge_(film)",
# :movie_title=>"Postcards from the Edge",
# :role=>"Doctor Frankenthal"},
# {:year=>"1991",
# :movie_url=>"/wiki/Once_Around",
# :movie_title=>"Once Around",
# :role=>"Sam Sharpe"},
# {:year=>"1991",
# :movie_url=>"/wiki/Prisoner_of_Honor",
# :movie_title=>"Prisoner of Honor",
# :role=>"Col. Picquart"},
# {:year=>"1991",
# :movie_url=>"/wiki/What_About_Bob%3F",
# :movie_title=>"What About Bob?",
# :role=>"Dr. Leo Marvin"},
# {:year=>"1993",
# :movie_url=>"/wiki/Lost_in_Yonkers_(film)",
# :movie_title=>"Lost in Yonkers",
# :role=>"Louie Kurnitz"},
# {:year=>"1993",
# :movie_url=>"/wiki/Another_Stakeout",
# :movie_title=>"Another Stakeout",
# :role=>"Detective Chris Lecce"},
# {:year=>"1994",
# :movie_url=>"/wiki/Silent_Fall",
# :movie_title=>"Silent Fall",
# :role=>"Dr. Jake Rainer"},
# {:year=>"1995",
# :movie_url=>
# "/w/index.php?title=The_Last_Word_(1995_film)&action=edit&redlink=1",
# :movie_title=>"The Last Word",
# :role=>"Larry"},
# {:year=>"1995",
# :movie_url=>"/wiki/The_American_President_(film)",
# :movie_title=>"The American President",
# :role=>"Senator Bob Rumson"},
# {:year=>"1995",
# :movie_url=>"/wiki/Mr._Holland%27s_Opus",
# :movie_title=>"Mr. Holland's Opus",
# :role=>"Glenn Holland"},
# {:year=>"1996",
# :movie_url=>"/wiki/James_and_the_Giant_Peach_(film)",
# :movie_title=>"James and the Giant Peach",
# :role=>"Centipede (voice)"},
# {:year=>"1996",
# :movie_url=>"/wiki/Mad_Dog_Time",
# :movie_title=>"Mad Dog Time",
# :role=>"Vic"},
# {:year=>"1997",
# :movie_url=>"/wiki/Night_Falls_on_Manhattan",
# :movie_title=>"Night Falls on Manhattan",
# :role=>"Sam Vigoda"},
# {:year=>"1997",
# :movie_url=>"/wiki/Oliver_Twist_(1997_film)",
# :movie_title=>"Oliver Twist",
# :role=>"Fagin"},
# {:year=>"1998",
# :movie_url=>"/wiki/Krippendorf%27s_Tribe",
# :movie_title=>"Krippendorf's Tribe",
# :role=>"Prof. James Krippendorf"},
# {:year=>"1999",
# :movie_url=>"/wiki/Lansky_(film)",
# :movie_title=>"Lansky",
# :role=>"Meyer Lansky"},
# {:year=>"2000",
# :movie_url=>"/wiki/The_Crew_(2000_film)",
# :movie_title=>"The Crew",
# :role=>"Bobby Bartellemeo/Narrator"},
# {:year=>"2000",
# :movie_url=>"/wiki/Fail_Safe_(2000_TV)",
# :movie_title=>"Fail Safe",
# :role=>"President of the United States"},
# {:year=>"2001",
# :movie_url=>"/wiki/The_Old_Man_Who_Read_Love_Stories",
# :movie_title=>"The Old Man Who Read Love Stories",
# :role=>"Antonio Bolivar"},
# {:year=>"2001",
# :movie_url=>"/wiki/Who_Is_Cletis_Tout%3F",
# :movie_title=>"Who Is Cletis Tout?",
# :role=>"Micah Donnelly"},
# {:year=>"2001",
# :movie_url=>"/wiki/The_Education_of_Max_Bickford",
# :movie_title=>"The Education of Max Bickford",
# :role=>"Max Bickford"},
# {:year=>"2001",
# :movie_url=>"/wiki/The_Day_Reagan_Was_Shot",
# :movie_title=>"The Day Reagan Was Shot",
# :role=>"Alexander Haig"},
# {:year=>"2003",
# :movie_url=>"/wiki/Coast_to_Coast_(TV_film)",
# :movie_title=>"Coast to Coast",
# :role=>"Barnaby Pierce"},
# {:year=>"2004",
# :movie_url=>"/wiki/Silver_City_(2004_film)",
# :movie_title=>"Silver City",
# :role=>"Chuck Raven"},
# {:year=>"2006",
# :movie_url=>"/wiki/Poseidon_(film)",
# :movie_title=>"Poseidon",
# :role=>"Richard Nelson"},
# {:year=>"2007",
# :movie_url=>"/wiki/Tin_Man_(TV_miniseries)",
# :movie_title=>"Tin Man",
# :role=>"Mystic Man"},
# {:year=>"2007",
# :movie_url=>"/wiki/Ocean_of_Fear",
# :movie_title=>"Ocean of Fear",
# :role=>"Narrator"},
# {:year=>"2008",
# :movie_url=>"/wiki/Signs_of_the_Time_(film)",
# :movie_title=>"Signs of the Time",
# :role=>"Narrator"},
# {:year=>"2008",
# :movie_url=>"/wiki/W._(film)",
# :movie_title=>"W.",
# :role=>"Dick Cheney"},
# {:year=>"2008",
# :movie_url=>"/w/index.php?title=America_Betrayed&action=edit&redlink=1",
# :movie_title=>"America Betrayed",
# :role=>"Narrator"},
# {:year=>"2009",
# :movie_url=>"/wiki/My_Life_in_Ruins",
# :movie_title=>"My Life in Ruins",
# :role=>"Irv"},
# {:year=>"2009",
# :movie_url=>"/wiki/Leaves_of_Grass_(film)",
# :movie_title=>"Leaves of Grass",
# :role=>"Pug Rothbaum"},
# {:year=>"2009",
# :movie_url=>"/wiki/The_Lightkeepers",
# :movie_title=>"The Lightkeepers",
# :role=>"Seth"},
# {:year=>"2010",
# :movie_url=>"/wiki/Piranha_3D",
# :movie_title=>"Piranha 3D",
# :role=>"Matthew Boyd"},
# {:year=>"2010",
# :movie_url=>"/wiki/Weeds_(TV_series)",
# :movie_title=>"Weeds",
# :role=>"Warren Schiff"},
# {:year=>"2010",
# :movie_url=>"/wiki/RED_(film)",
# :movie_title=>"RED",
# :role=>"Alexander Dunning"},
# {:year=>"2012",
# :movie_url=>"/wiki/Coma_(U.S._miniseries)",
# :movie_title=>"Coma",
# :role=>"Professor Hillside"},
# {:year=>"2013",
# :movie_url=>"/wiki/Very_Good_Girls",
# :movie_title=>"Very Good Girls",
# :role=>"Danny, Gerry's father"},
# {:year=>"2013",
# :movie_url=>"/wiki/Paranoia_(2013_film)",
# :movie_title=>"Paranoia",
# :role=>"Francis Cassidy"}]
To explain what it's doing:
The "Filmography" table is a good source for the information; it's organized logically, so writing code to walk through it is easy.
doc.at('#Filmography').parent.next_element
finds that table using the <h2> heading just above it, then backs up and looks in the next tag, which is the table itself.
table.search('tr')[1..-1] finds the <tr> rows inside the table, skips the first, then iterates (using map) over the remaining ones.
tds = tr.search('td') finds the cells for the table. From that point on it's a matter of peeling that NodeSet apart like an array, by looking at the elements I want. The rest of the code should be pretty obvious. Once the individual parts are retrieved that are of interest they're bundled into a hash, which is returned as part of an array of hashes by map.
Why not try parsing out the filmography section of the wikipedia article? It seems pretty standard across the few actors that I looked at, and it mentions whether or not it was a TV series so you could filter those out easily.
<tr>
<td>1966</td>
<td><i>Gidget</i></td>
<td>Durf the Drag</td>
<td>TV series 1 episode</td>
</tr>
<tr>
<td>1967</td>
<td><i>Valley of the Dolls</i></td>
<td>Assistant stage manager</td>
<td>Uncredited</td>
</tr>
Looks like you could pull nodes similar to this from the code and save all the info to do what you want with it. The first node could be disregarded since "TV" appears multiple times in the different subnodes.
Hope this helps!
-Larry
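The filtering idea above could be sketched roughly as follows. This is only an illustration in plain Ruby: it assumes the table rows have already been extracted into hashes, and the key names (:year, :title, :role, :notes) and the films_only helper are hypothetical, not part of Nokogiri or of the original code. The sample rows are the two shown above.

```ruby
# Rows as they might look after extraction; the key names are hypothetical.
rows = [
  { year: "1966", title: "Gidget", role: "Durf the Drag", notes: "TV series 1 episode" },
  { year: "1967", title: "Valley of the Dolls", role: "Assistant stage manager", notes: "Uncredited" }
]

# Drop any row whose notes column mentions "TV" as a whole word.
# to_s guards against rows where the notes cell is missing (nil).
def films_only(rows)
  rows.reject { |row| row[:notes].to_s.match?(/\bTV\b/) }
end

puts films_only(rows).map { |row| row[:title] }.inspect
# => ["Valley of the Dolls"]
```

Matching on the notes column rather than the link text avoids depending on whether the article URL happens to contain "(film)".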
Okay, so I have tested code based on your actual request and come up with the following:
require 'nokogiri'
require 'open-uri'
url = "http://en.wikipedia.org/wiki/Richard_Dreyfuss"
doc = Nokogiri::HTML(URI.open(url))
all_links = doc.search("//a[@href]")
all_links.each do |link|
p_text = link.ancestors("p").text
link_index = p_text.index(link.text)
unless link_index.nil?
search_back = link_index > 50 ? link_index - 50 : 0
p_text[search_back..link_index].downcase.include?("film") ? puts(link['href']) : nil
end
end
Output
#=>/wiki/American_Graffiti
/wiki/Jaws_(film)
/wiki/Close_Encounters_of_the_Third_Kind
/wiki/The_Graduate
/wiki/The_Apprenticeship_of_Duddy_Kravitz_(film)
/wiki/Down_And_Out_In_Beverly_Hills
/wiki/Stakeout_(1987_film)
/wiki/Stephen_King
/wiki/The_Body_(novella)
/wiki/Poseidon_(film)
#cite_note-27
/wiki/Jonathan_Tasini
This seems to satisfy the question you were asking but obviously needs to be modified to fit your needs.
Edit
I added your request to look back only 50 characters within the paragraph. The output is much shorter now, but I am not sure the results will be as useful as you'd like. This answers the question but does not capture exactly what you are hoping for; e.g. the last 2 links are not to films, but they are within 50 characters of the word film.
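For experimenting with the look-back window without fetching the page, the check can be isolated into a small plain-Ruby helper. This is a minimal sketch, not the code from the answer: keyword_before_link? and its keyword and window parameters are hypothetical names, and unlike the snippet above it uses an exclusive range, so the link text itself is never searched.

```ruby
# Returns true when `keyword` occurs within the `window` characters that
# precede the first occurrence of `link_text` in `paragraph`.
def keyword_before_link?(paragraph, link_text, keyword: "film", window: 50)
  link_index = paragraph.index(link_text)
  return false if link_index.nil? # link text not found in this paragraph

  start = [link_index - window, 0].max
  # Exclusive range: stop just before the link text begins.
  paragraph[start...link_index].downcase.include?(keyword)
end

paragraph = "Dreyfuss's first film part was a small, uncredited role in The Graduate."
puts keyword_before_link?(paragraph, "The Graduate")             # => true
puts keyword_before_link?(paragraph, "The Graduate", window: 10) # => false
```

Like the original, this only looks at the first occurrence of the link text, so a title that is repeated later in the same paragraph is checked only once.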