I am looking to scrape player prices on https://www.fanteam.com/participate/138905/new/e30= using Python and Selenium libraries. I have used the following code:
url = 'https://www.fanteam.com/participate/138905/new/e30='
options = webdriver.ChromeOptions()
options.add_argument('--lang=en')
driver = webdriver.Chrome(chrome_options=options)
driver.get(url)
But I can't get all the players with prices, because I can't find any element on the page(see the picture below
players with prices).
There is HTML of this site:
<!DOCTYPE html>
<html lang="en">
<head>
<script type='text/javascript'>
</script>
<meta charset="UTF-8">
<link rel="shortcut icon" type="image/x-icon" href="/assets/favicon.ico">
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no, minimal-ui">
<meta name="mobile-web-app-capable" content="yes">
<meta property="og:title" content="FanTeam: The home of Fantasy Sports">
<meta property="og:description" content="Create Your Daily Fantasy Team, Play & Win Cash!">
<meta property="og:site_name" content="FanTeam">
<meta property="og:image:width" content="300">
<meta property="og:image:height" content="300">
<meta property="og:url" content="https://www.fanteam.com/participate/138905/new/e30=">
<meta property="og:image" content="https://www.fanteam.com/assets/og-banner.png">
<link href="https://fonts.googleapis.com/css?family=Open+Sans:400,300,600,700,800&subset=latin,cyrillic,cyrillic-ext,latin-ext" rel="stylesheet" type="text/css">
<link rel="manifest" href="/manifest.json">
<script>
(function(getDescriptor) {
Object.getOwnPropertyDescriptor = function(obj, key) {
var descriptor = getDescriptor.apply(this, arguments)
if (!descriptor && obj === window && key == "showModalDialog") {
return {}
}
return descriptor
}
}(Object.getOwnPropertyDescriptor));
</script>
<style>
</style>
<title>FanTeam - Daily Fantasy & Betting</title>
</head>
<body>
<ft-cookie-warning></ft-cookie-warning>
<main>
<ft-header logo="fanteam-logo.svg" logosmall="logosmall.svg"></ft-header>
<section class="ft-view-port-wrapper">
<view-port></view-port>
</section>
<ft-footer tabindex="-1" logo="fanteam-logo.svg"></ft-footer>
<ft-push-receiver></ft-push-receiver>
<ft-olark></ft-olark>
</main>
<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.0.6/webcomponents-lite.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/babel-polyfill/6.26.0/polyfill.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/fetch/2.0.3/fetch.min.js"></script>
<script src="/build/application-b8ab977b2a.js" data-root="https://fanteam-game.api.scoutgg.net" data-ws="https://fanteam-game.ws.scoutgg.net" data-auth-url="" data-white-label="fanteam" data-olark="8903-397-10-7512" data-google-analytics="UA-55860585-1"
data-asset-host="https://d34h6ikdffho99.cloudfront.net" data-vapid-public-key="BH8zySo8DKTd9EY0koPSAmA7fo58QTVuFjcB4hTp95WDu21l4dwjckigl0hpYBgeS-6h2kbMtfbXw4u4097wK3w" data-scoutcc="https://scoutcc.scoutgg.net" data-payment-url="https://globpay.fantasy.solutions/v1"
data-projection-url="https://betflex-projection.api.scoutgg.net//api/v1" data-sportsbook-path="https://stage.fenixplayground.es/apuestas/mobilegoto.aspx" data-service-worker="sw.js"></script>
</body>
</html>
Any code like
el = driver.find_element_by_xpath("//div[#class='player-list']")
return me the error:
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[#class='player-list']"}
But when I inspect an element I can see it in the browser.
How to click any element on the page?
The website you are trying to scrape has a shadow-DOM in its html and any html present inside it cannot be accessed and that is the reason you are getting NoSuchElementException.
Currently, selenium does not support the shadow DOM automation, so you need to use javascript in this case to scrape the data.
To get the data using javascript, you can use:
JavascriptExecutor js = (JavascriptExecutor) driver;
String return_value = (String) js.execute_script("return document.getElementByXpath('xpath').innerHTML");
References for the shadow DOM:
https://medium.com/rate-engineering/a-guide-to-working-with-shadow-dom-using-selenium-b124992559f
https://www.seleniumeasy.com/selenium-tutorials/accessing-shadow-dom-elements-with-webdriver
Related
I'm working on a .NET and Angular webapp, with ngx-sharebuttons ShareButtonModule. When I want to get a link of the post or share it via WhatsApp, the meta tags are not being updated properly. I am setting up default meta tags in index.html:
index.html
<html lang="pl">
<head>
<meta charset="utf-8">
<title>DDM - Daily Dose Of Memes</title>
<base href="/">
<meta name="description" content="Memy na miarę Twoich możliwości. Ty też możesz dać z siebie 30%! Zbiór obrazków, gifów i filmików dla sympatyków czarnego humoru.">
<!-- <meta name="keywords" content="Memy, hard, obrazki, filmiki, czarny humor"> -->
<meta name="keywords" content>
<meta name="url" content="https://ddmemes.com.pl">
<meta name="title" content="DDM - Daily Dose Of Memes">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta http-equiv="Content-type" content="text/html; charset=utf-8">
<link rel="icon" type="image/x-icon" href="favicon.ico">
<link rel="shortcut icon" href="./assets/favicon.ico">
<!-- og meta tags for link share -->
<meta property="og:title" content="DDM - Daily Dose Of Memes"/>
<meta property="og:url" content="https://ddmemes.com.pl"/>
<meta property="og:image" content="https://res.cloudinary.com/duj1ftjtp/image/upload/v1659954431/LogoImage.png"/>
<meta property="og:description" content="Memy na miarę Twoich możliwości. Ty też możesz dać z siebie 30%! Zbiór obrazków, gifów i filmików dla sympatyków czarnego humoru.">
<link rel="manifest" href="manifest.webmanifest">
<meta name="theme-color" content="#000000">
</head>
<body>
<app-root></app-root>
<noscript>Please enable JavaScript to continue using this application.</noscript>
</body>
</html>
Later, when the user wants to share a post's component, it appears that the meta attributes are not being updated properly. First, I am setting the data via share-buttons modal:
meme-card.component.html
<share-buttons theme="modern-dark"
[include]="['copy', 'facebook', 'messenger', 'reddit', 'telegram', 'twitter', 'whatsapp']"
[showIcon]="true"
[showText]="true"
[autoSetMeta]="false"
image="{{meme?.url}}"
url="https://ddmemes.com.pl/meme/{{meme.id}}/{{convertText(meme.title)}}"
description="{{meme?.title}}"
title="{{meme?.title}}"
class="pt-3">
</share-buttons>
Also, I am updating the meta tags themselves in ngOnInit via below code:
meme-card.component.ts
changeMetaTags() {
this.meta?.updateTag(
{ property: 'og:title', content: this.meme?.title },
);
this.meta?.addTag(
{ name: 'title', content: this.meme?.title}
);
this.meta?.updateTag(
{ name: 'description', content: this.meme?.title },
);
this.meta?.updateTag(
{ property: 'og:image', content: this.meme?.url },
);
}
Although everything appears to be updated properly upon inspecting a page, when I send the link to somebody or upon checking in https://www.heymeta.com/, it appears that the data still remains default, equal to the one given in index.html. The only thing that seems to be updated is url.
Is there something I'm missing here? I'd appreciate any help.
Cheers
I tried Embedding a Live website but It shows this message: Channel is domain protected. This channel can't be embedded on this domain name.
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document</title>
</head>
<body>
<script type='text/javascript'>
width = 1280, height = 720, channel = 'e21e8c58-0137-466f-beb3-6a17c34dea39', g = '1';
</script>
<script type='text/javascript' src='https://web.newucp.com/static/scripts/hucaster.js'></script>
</body>
</html>```
I am working on a project in which i have to scrape images related to a keyword from a image site. When i search for any keyword on imgur(My choice for the image site), the results are shown as small thumbnails which when clicked open the main article with various images on it. My program for now makes a list of various links in the thumbnails and opens them one by one to download all images in it.
My problem is that when i inspect image element on the article it shows that the image is in class ".image-placeholder" but when i download the html by request method it does not show any such class available.
One of the examples is that when i request response object of article https://imgur.com/gallery/rfEAUDW.
I get below html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<meta content="width=device-width,initial-scale=1" name="viewport"/>
<meta content="funny, image, gif, gifs, memes, jokes, image upload, upload image, lol, humor, vote, comment, share, imgur, imgur.com, wallpaper" name="keywords">
<meta content="Discover the magic of the internet at Imgur, a community powered entertainment destination. Lift your spirits with funny jokes, trending memes, entertaining gifs, inspiring stories, viral videos, and so much more." name="description">
<meta content="Copyright 2020 Imgur, Inc." name="copyright"/>
<link href="https://s.imgur.com/images/favicon-32x32.png" rel="icon" sizes="32x32" type="image/png"/>
<link href="https://s.imgur.com/images/favicon-96x96.png" rel="icon" sizes="96x96" type="image/png"/>
<link href="https://s.imgur.com/images/favicon-16x16.png" rel="icon" sizes="16x16" type="image/png"/>
<link href="https://s.imgur.com/images/favicon-152.png" rel="apple-touch-icon-precomposed"/>
<meta content="#2cd63c" name="msapplication-TileColor"/>
<meta content="https://s.imgur.com/images/favicon-144.png" name="msapplication-TileImage"/>
<link href="https://m.imgur.com/" media="only screen and (max-width: 640px)" rel="alternate"/>
<meta content="834554521765408b9effdc758b69c5ee" name="p:domain_verify">
<meta content="Imgur" property="og:site_name">
<meta content="12331492" property="fb:admins"/>
<meta content="12301369" property="fb:admins"/>
<meta content="127621437303857" property="fb:app_id"/>
<meta content="imgur://imgur.com/?from=fbreferral" property="al:android:url"/>
<meta content="Imgur" property="al:android:app_name"/>
<meta content="com.imgur.mobile" property="al:android:package"/>
<meta content="imgur://imgur.com/?from=fbreferral" property="al:ios:url"/>
<meta content="639881495" property="al:ios:app_store_id"/>
<meta content="Imgur" property="al:ios:app_name"/>
<meta content="https://imgur.com/" property="al:web:url"/>
<meta content="#imgur" name="twitter:site"/>
<meta content="imgur.com" name="twitter:domain"/>
<meta content="com.imgur.mobile" name="twitter:app:id:googleplay"/>
<meta content="Imgur" property="author"/>
<meta content="Imgur" property="article:author"/>
<meta content="https://www.facebook.com/imgur" property="article:publisher"/>
<title>
Drawing Model (for you all naruto fans) - Imgur
</title>
<meta content="https://imgur.com/gallery/rfEAUDW" data-react-helmet="true" property="og:url"/>
<meta content="https://i.imgur.com/rfEAUDWh.jpg" data-react-helmet="true" name="twitter:image"/>
<link href="https://api.imgur.com/oembed.json?url=https://imgur.com/gallery/rfEAUDW" rel="alternate" title="Drawing Model (for you all naruto fans) - Imgur" type="application/json+oembed"/>
<link href="https://api.imgur.com/oembed.xml?url=https://imgur.com/gallery/rfEAUDW" rel="alternate" title="Drawing Model (for you all naruto fans) - Imgur" type="application/xml+oembed"/>
<meta content="funny" data-react-helmet="true" property="article:tag"/>
<meta content="" data-react-helmet="true" property="article:tag"/>
<meta content="" data-react-helmet="true" property="article:tag"/>
<meta content="" data-react-helmet="true" property="article:tag"/>
<meta content="" data-react-helmet="true" property="article:tag"/>
<meta href="https://imgur.com/gallery/rfEAUDW" rel="canonical"/>
<meta content="none" name="robots"/>
<meta content="600" data-react-helmet="true" property="og:image:width"/>
<meta content="315" data-react-helmet="true" property="og:image:height"/>
<meta content="https://i.imgur.com/rfEAUDW.jpg?fb" data-react-helmet="true" property="og:image"/>
<meta content="article" data-react-helmet="true" property="og:type"/>
<meta content="summary_large_image" data-react-helmet="true" name="twitter:card"/>
<script>
dataLayer=[];var pbjs=pbjs||{};pbjs.que=pbjs.que||[]
</script>
<script>
!function(e,t,a,n,g){e[n]=e[n]||[],e[n].push({"gtm.start":(new Date).getTime(),event:"gtm.js"});var m=t.getElementsByTagName(a)[0],r=t.createElement(a);r.async=!0,r.src="//www.googletagmanager.com/gtm.js?id=GTM-M6N38SF",m.parentNode.insertBefore(r,m)}(window,document,"script","dataLayer")
</script>
<link href="https://s.imgur.com/desktop-assets/css/styles.ebc99cf807f6b7c8c39c.css" rel="stylesheet"/>
</meta>
</meta>
</meta>
</meta>
</head>
<body>
<noscript>
<iframe height="0" src="https://www.googletagmanager.com/ns.html?id=GTM-M6N38SF" style="display:none;visibility:hidden" width="0">
</iframe>
</noscript>
<noscript>
If you're seeing this message, that means
<strong>
JavaScript has been disabled on your browser
</strong>
, please
<strong>
enable JS
</strong>
to make Imgur work.
</noscript>
<div id="root">
</div>
<script async="" src="https://www.googletagmanager.com/gtag/js?id=UA-6671908-15">
</script>
<script>
function gtag(){dataLayer.push(arguments)}window.dataLayer=window.dataLayer||[],gtag("js",new Date),gtag("config","UA-6671908-15",{send_page_view:!1})
</script>
<script class="abp" src="https://s.imgur.com/min/px.js?ch=1">
</script>
<script class="abp" src="https://s.imgur.com/min/px.js?ch=2">
</script>
<script src="https://s.imgur.com/desktop-assets/js/main.2e34f379cd8d1a3ca8b1.js">
</script>
</body>
</html>
Thanks in advance for all your help.
My code is:
import requests
from bs4 import BeautifulSoup
from pathlib import Path
import os
#UserInput for phrase to search.
phrase = input('Enter text to search: ')
finalphrase = phrase.replace(" ", "+")
print('Searching for ' + finalphrase + '.....')
#makes soup of the search page and collect all results as galleryElem
site = 'https://imgur.com'
res = requests.get(site + '/search?q=' + phrase)
soup = BeautifulSoup(res.text, 'html.parser')
galleryElem = soup.select('.image-list-link')
#For each thumbnail will open it and download its content
for n in range(len(galleryElem)):
imageLink = galleryElem[n].get('href')
res1 = requests.get(site+imageLink)
soup1 = BeautifulSoup(res1.text, 'html.parser')
imageElem = soup.select('.image-placeholder')
for m in range(len(imageElem)):
image = imageElem[m].get('src')
img = open(os.path.join(finalphrase, number, os.path.basename(image), 'wb'))
number += 1
for a in image.iter_content(100000):
img.write(a)
img.close()
You didn't use selenium. You used requests and bs4. If you have used selenium, you wouldn't have met this problem, because selenium executes javascript for you, while requests doesn't. And the website seems to load images with javascript.
An example with selenium can be found here.
You can check documentations of selenium for more information.
I would like to get the parameters in a URL and use them to generate an og:image meta tag in my SPA. The specific purpose is to have a dynamic thumbnail for a given url. The idea is for crawlers to be able to find the appropriate thumbnail.
example URL:
https://my.app/#/post?uid=abc&pid=123
These two parameters will not necessarily always be included. I hope it won't cause an issue.
My understanding is that crawlers generally only check the html for metadata. How could I include a bit of code in my html before the metadata? (I am relatively new to HTML)
Would I be able to put a script in the head tag? Are variable in the script available outside of the script? Can I use variable in the URL address of my og:image tag?
<!DOCTYPE html>
<html>
<head>
<meta property="og:image" content="https://my.app/{uid}/{pid}/thumb.png" />
<meta charset="UTF-8">
<meta content="IE=Edge" http-equiv="X-UA-Compatible">
<meta name="description" content="Get Dressed. Better than you ever had">
<!-- iOS meta tags & icons -->
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<meta name="apple-mobile-web-app-title" content="vestiqweb">
<link rel="apple-touch-icon" href="icons/Icon-192.png">
<!-- Favicon -->
<link rel="shortcut icon" type="image/png" href="favicon.png"/>
<title>VESTIQ</title>
<link rel="manifest" href="manifest.json">
</head>
<body id="app-container">
<script>
if ('serviceWorker' in navigator) {
window.addEventListener('load', function () {
navigator.serviceWorker.register('flutter_service_worker.js');
});
}
</script>
<script src="https://www.gstatic.com/firebasejs/7.17.1/firebase-app.js"></script>
<script src="https://www.gstatic.com/firebasejs/7.14.4/firebase-firestore.js"></script>
<script src="https://www.gstatic.com/firebasejs/7.17.1/firebase-auth.js"></script>
<script src="https://www.gstatic.com/firebasejs/7.17.1/firebase-analytics.js"></script>
<script>
var firebaseConfig = {
//config info
};
firebase.initializeApp(firebaseConfig);
firebase.analytics();
</script>
<script src="main.dart.js?version=14" type="application/javascript"></script>
</body>
</html>
I was able to put a script before the meta tags and used window.location.href to get the URL. I used URLSearchParams to extract the parameters. However, I can only get the second parameter and not the first. If I add an '&' before the uid it works, but makes the url look weird... "?&uid"
<script>
const queryString = window.location.href;
const urlParams = new URLSearchParams(queryString);
const uid = urlParams.get('uid')
const pid = urlParams.get('pid')
if (uid != null && pid != null)
document.getElementById('urlThumb').content = `https://my.app/posts%2F${uid}%2F${pid}%2Furl_thumb.jpg?alt=media`;
</script>
I am using google fonts and it generates following error for below link
<link href="http://fonts.googleapis.com/css?family=Lato:100,300,400,700,900,100italic,300italic,400italic,700italic,900italic|Montserrat:700|Merriweather:400italic|Roboto+Condensed|Source+Sans+Pro|Droid+Serif|Open+Sans+Condensed|Oswald|Molengo|PT Sans|Droid Sans')" rel="stylesheet" />
ERROR MESSAGE
Line 35, Column 289: Bad value for attribute href on element link: Illegal character in query: not a URL code point.
…if|Open+Sans+Condensed|Oswald|Molengo|PT Sans|Droid Sans')" rel="stylesheet" />
Syntax of URL:
Any URL. For example: /hello, #canvas, or http://example.org/. Characters should be represented in NFC and spaces should be escaped as %20.
SAMPLE HTML
<html>
<head>
<title>Title</title>
<meta charset="utf-8" />
<meta name="viewport" content="user-scalable=no, width=device-width, initial-scale=1, maximum-scale=1" />
<link href="http://fonts.googleapis.com/css?family=Lato:100,300,400,700,900,100italic,300italic,400italic,700italic,900italic|Montserrat:700|Merriweather:400italic|Roboto+Condensed|Source+Sans+Pro|Droid+Serif|Open+Sans+Condensed|Oswald|Molengo|PT+Sans|Droid+Sans')" rel="stylesheet" />
</head>
<body>
</body>
</html>
UPDATE
This generates error
<link rel="stylesheet" href="http://fonts.googleapis.com/css?family=Roboto%20Condensed|Source%20Sans%20Pro" />
This Works
<link rel="stylesheet" href="http://fonts.googleapis.com/css?family=Lato" />
<link rel="stylesheet" href="http://fonts.googleapis.com/css?family=Roboto%20Condensed" />
When i add | to add multiple fonts it generates error so should i use multiple <link> tag to add fonts or ?
Confused about this is as below links is generate by on Google fonts font use on website
https://www.google.com/fonts#UsePlace:use/Collection:Open+Sans|Roboto:400,700,400italic|Roboto+Condensed:400,300|Lato
Your example code working with JAVASCRIPT NOTATION
LINK and IMPORT may not help to eliminate the VALIDATION error - so please try with JAVASCRIPT notation it works well without any error.
<!DOCTYPE html>
<html>
<head>
<title>Title</title>
<meta charset="utf-8" />
<script type="text/javascript">
WebFontConfig = {google: { families: [ Lato:100,300,400,700,900,100italic,300italic,400italic,700italic,900italic|Montserrat:700|Merriweather:400italic|Roboto+Condensed|Source+Sans+Pro|Droid+Serif|Open+Sans+Condensed|Oswald|Molengo|PT+Sans|Droid+Sans ] }};
(function() {
var wf = document.createElement('script');
wf.src = ('https:' == document.location.protocol ? 'https' : 'http') +
'://ajax.googleapis.com/ajax/libs/webfont/1/webfont.js';
wf.type = 'text/javascript';
wf.async = 'true';
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(wf, s);
})();
</script>
</head>
<body>
</body>
</html>
You will need to substiture & sign with &
<!DOCTYPE html>
<html>
<head>
<title>Title</title>
<meta charset="utf-8" />
<meta name="viewport" content="user-scalable=no, width=device-width, initial-scale=1, maximum-scale=1" />
<link href='http://fonts.googleapis.com/css?family=Open+Sans&subset=latin,cyrillic-ext,greek-ext,greek,vietnamese,latin-ext,cyrillic' rel='stylesheet' type='text/css'>
</head>
<body>
</body>
</html>
You may please use JAVASCRIPT notation for including the fonts from google
<!DOCTYPE html>
<html>
<head>
<title>Title</title>
<meta charset="utf-8" />
<meta name="viewport" content="user-scalable=no, width=device-width, initial-scale=1, maximum-scale=1" />
<script type="text/javascript">
WebFontConfig = {
google: { families: [ 'Open+Sans::cyrillic-ext,latin,greek-ext,greek,vietnamese,latin-ext,cyrillic' ] }
};
(function() {
var wf = document.createElement('script');
wf.src = ('https:' == document.location.protocol ? 'https' : 'http') +
'://ajax.googleapis.com/ajax/libs/webfont/1/webfont.js';
wf.type = 'text/javascript';
wf.async = 'true';
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(wf, s);
})(); </script>
</head>
<body>
</body>
</html>
Few more suggestions
Always include doctype at the top of HTML page
Try the IMPORT and JAVASCRIPT alternatives to include the fonts.
Please use your own google font - to avoid typos I tried with new fonts from google.
The character | is not allowed in the query component (nor anywhere else in a URI). It would have to be percent-encoded with %7C.
So
http://fonts.googleapis.com/css?family=Lato:100,300,400,700,900,100italic,300italic,400italic,700italic,900italic|Montserrat:700|Merriweather:400italic|Roboto+Condensed|Source+Sans+Pro|Droid+Serif|Open+Sans+Condensed|Oswald|Molengo|PT+Sans|Droid+Sans')
should be this URI instead
http://fonts.googleapis.com/css?family=Lato:100,300,400,700,900,100italic,300italic,400italic,700italic,900italic%7CMontserrat:700%7CMerriweather:400italic%7CRoboto+Condensed%7CSource+Sans+Pro%7CDroid+Serif%7COpen+Sans+Condensed%7COswald%7CMolengo%7CPT+Sans%7CDroid+Sans')
There is a space in the string near the end
PT Sans|Droid Sans')"
should be escaped as:
PT%20Sans|Droid%20Sans')"