I would like to read the parameters from a URL and use them to generate an og:image meta tag in my SPA. The goal is a dynamic thumbnail for each URL that crawlers can find.
Example URL:
https://my.app/#/post?uid=abc&pid=123
These two parameters will not always be present; I hope that won't cause an issue.
My understanding is that crawlers generally only check the HTML for metadata. How can I include a bit of code in my HTML that runs before the metadata? (I am relatively new to HTML.)
Can I put a script in the head tag? Are variables declared in a script available outside of it? Can I use a variable in the URL of my og:image tag?
<!DOCTYPE html>
<html>
<head>
<meta property="og:image" content="https://my.app/{uid}/{pid}/thumb.png" />
<meta charset="UTF-8">
<meta content="IE=Edge" http-equiv="X-UA-Compatible">
<meta name="description" content="Get Dressed. Better than you ever had">
<!-- iOS meta tags & icons -->
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<meta name="apple-mobile-web-app-title" content="vestiqweb">
<link rel="apple-touch-icon" href="icons/Icon-192.png">
<!-- Favicon -->
<link rel="shortcut icon" type="image/png" href="favicon.png"/>
<title>VESTIQ</title>
<link rel="manifest" href="manifest.json">
</head>
<body id="app-container">
<script>
if ('serviceWorker' in navigator) {
window.addEventListener('load', function () {
navigator.serviceWorker.register('flutter_service_worker.js');
});
}
</script>
<script src="https://www.gstatic.com/firebasejs/7.17.1/firebase-app.js"></script>
<script src="https://www.gstatic.com/firebasejs/7.14.4/firebase-firestore.js"></script>
<script src="https://www.gstatic.com/firebasejs/7.17.1/firebase-auth.js"></script>
<script src="https://www.gstatic.com/firebasejs/7.17.1/firebase-analytics.js"></script>
<script>
var firebaseConfig = {
//config info
};
firebase.initializeApp(firebaseConfig);
firebase.analytics();
</script>
<script src="main.dart.js?version=14" type="application/javascript"></script>
</body>
</html>
I was able to put a script before the meta tags, using window.location.href to get the URL and URLSearchParams to extract the parameters. However, I only get the second parameter, not the first. If I add an '&' before uid it works, but it makes the URL look odd ("?&uid"):
<script>
const queryString = window.location.href;
const urlParams = new URLSearchParams(queryString);
const uid = urlParams.get('uid')
const pid = urlParams.get('pid')
if (uid != null && pid != null)
document.getElementById('urlThumb').content = `https://my.app/posts%2F${uid}%2F${pid}%2Furl_thumb.jpg?alt=media`;
</script>
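The behaviour described above follows from how URLSearchParams parses its input: given the whole href, it splits on "&", so the first pair's name becomes "https://my.app/#/post?uid" instead of "uid", and only pid is found. With hash routing, window.location.search is also empty, because the "?" sits inside the fragment. A minimal sketch, assuming the query always follows the first "?" in the hash; getHashParams and thumbUrl are hypothetical helper names, and the thumbnail URL shape is copied from the snippet above. Keep in mind that the fragment is never sent to the server and many link-preview crawlers do not execute JavaScript, so this only changes the tag for clients that run the script.

```javascript
// Hypothetical helpers for a hash-routed URL such as
// https://my.app/#/post?uid=abc&pid=123
function getHashParams(href) {
  // With hash routing the "?" is inside the fragment, so URL.search is "".
  const hash = new URL(href).hash; // "#/post?uid=abc&pid=123"
  const q = hash.indexOf('?');
  // URLSearchParams drops one leading "?", so pass the query including it.
  return new URLSearchParams(q === -1 ? '' : hash.slice(q));
}

function thumbUrl(href) {
  const params = getHashParams(href);
  const uid = params.get('uid');
  const pid = params.get('pid');
  // Both parameters are optional; return null so the default og:image stays.
  if (uid === null || pid === null) return null;
  // Same URL shape as the original snippet, with the values percent-encoded.
  return `https://my.app/posts%2F${encodeURIComponent(uid)}%2F${encodeURIComponent(pid)}%2Furl_thumb.jpg?alt=media`;
}

// In the page (assumes <meta property="og:image" id="urlThumb" ...> exists):
// const thumb = thumbUrl(window.location.href);
// if (thumb) document.getElementById('urlThumb').content = thumb;
```

Parsing only the part after the "?" means no extra "&" is needed in the visible URL, and a missing parameter simply leaves the existing tag untouched.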
I am looking to scrape player prices on https://www.fanteam.com/participate/138905/new/e30= using Python and Selenium. I have used the following code:
from selenium import webdriver
url = 'https://www.fanteam.com/participate/138905/new/e30='
options = webdriver.ChromeOptions()
options.add_argument('--lang=en')
driver = webdriver.Chrome(options=options)
driver.get(url)
But I can't get the players with their prices, because Selenium can't find any of the player elements on the page.
Here is the HTML of the site:
<!DOCTYPE html>
<html lang="en">
<head>
<script type='text/javascript'>
</script>
<meta charset="UTF-8">
<link rel="shortcut icon" type="image/x-icon" href="/assets/favicon.ico">
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no, minimal-ui">
<meta name="mobile-web-app-capable" content="yes">
<meta property="og:title" content="FanTeam: The home of Fantasy Sports">
<meta property="og:description" content="Create Your Daily Fantasy Team, Play & Win Cash!">
<meta property="og:site_name" content="FanTeam">
<meta property="og:image:width" content="300">
<meta property="og:image:height" content="300">
<meta property="og:url" content="https://www.fanteam.com/participate/138905/new/e30=">
<meta property="og:image" content="https://www.fanteam.com/assets/og-banner.png">
<link href="https://fonts.googleapis.com/css?family=Open+Sans:400,300,600,700,800&subset=latin,cyrillic,cyrillic-ext,latin-ext" rel="stylesheet" type="text/css">
<link rel="manifest" href="/manifest.json">
<script>
(function(getDescriptor) {
Object.getOwnPropertyDescriptor = function(obj, key) {
var descriptor = getDescriptor.apply(this, arguments)
if (!descriptor && obj === window && key == "showModalDialog") {
return {}
}
return descriptor
}
}(Object.getOwnPropertyDescriptor));
</script>
<style>
</style>
<title>FanTeam - Daily Fantasy & Betting</title>
</head>
<body>
<ft-cookie-warning></ft-cookie-warning>
<main>
<ft-header logo="fanteam-logo.svg" logosmall="logosmall.svg"></ft-header>
<section class="ft-view-port-wrapper">
<view-port></view-port>
</section>
<ft-footer tabindex="-1" logo="fanteam-logo.svg"></ft-footer>
<ft-push-receiver></ft-push-receiver>
<ft-olark></ft-olark>
</main>
<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.0.6/webcomponents-lite.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/babel-polyfill/6.26.0/polyfill.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/fetch/2.0.3/fetch.min.js"></script>
<script src="/build/application-b8ab977b2a.js" data-root="https://fanteam-game.api.scoutgg.net" data-ws="https://fanteam-game.ws.scoutgg.net" data-auth-url="" data-white-label="fanteam" data-olark="8903-397-10-7512" data-google-analytics="UA-55860585-1"
data-asset-host="https://d34h6ikdffho99.cloudfront.net" data-vapid-public-key="BH8zySo8DKTd9EY0koPSAmA7fo58QTVuFjcB4hTp95WDu21l4dwjckigl0hpYBgeS-6h2kbMtfbXw4u4097wK3w" data-scoutcc="https://scoutcc.scoutgg.net" data-payment-url="https://globpay.fantasy.solutions/v1"
data-projection-url="https://betflex-projection.api.scoutgg.net//api/v1" data-sportsbook-path="https://stage.fenixplayground.es/apuestas/mobilegoto.aspx" data-service-worker="sw.js"></script>
</body>
</html>
Any code like
el = driver.find_element_by_xpath("//div[@class='player-list']")
returns the error:
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@class='player-list']"}
But when I inspect the element, I can see it in the browser.
How can I click any element on the page?
The website you are trying to scrape renders its content inside shadow DOM. Elements inside a shadow root are not part of the main document tree, so ordinary locators such as find_element_by_xpath cannot reach them, which is why you get a NoSuchElementException.
At the time of writing, Selenium did not support shadow DOM automation directly, so you need to use JavaScript to reach into the shadow root and scrape the data. (Newer Selenium 4 releases add a shadow_root property on WebElement for this.)
To get the data with JavaScript from Python, you can use:
# 'view-port' is an assumed shadow host; XPath does not cross shadow boundaries, so use a CSS selector
html = driver.execute_script("return document.querySelector('view-port').shadowRoot.innerHTML")
References for the shadow DOM:
https://medium.com/rate-engineering/a-guide-to-working-with-shadow-dom-using-selenium-b124992559f
https://www.seleniumeasy.com/selenium-tutorials/accessing-shadow-dom-elements-with-webdriver
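When the target sits inside nested shadow roots, a single .shadowRoot hop is not enough. A minimal sketch of a recursive lookup, written as plain JavaScript that could be passed as a string to driver.execute_script; deepQuerySelector is a hypothetical helper name, the selector div.player-list is taken from the question, and only open shadow roots can be traversed (a closed root's shadowRoot property is null).

```javascript
// Hypothetical helper: search the root, then recurse into every open shadow
// root found beneath it. Works on a Document, Element or ShadowRoot.
function deepQuerySelector(root, selector) {
  const direct = root.querySelector(selector);
  if (direct) return direct;
  for (const el of root.querySelectorAll('*')) {
    // Closed shadow roots report shadowRoot === null and are skipped.
    if (el.shadowRoot) {
      const found = deepQuerySelector(el.shadowRoot, selector);
      if (found) return found;
    }
  }
  return null;
}
```

From Python, one option is to prepend the function source to a return statement, for example driver.execute_script(js_source + "return deepQuerySelector(document, 'div.player-list');"), where js_source is a hypothetical string holding the function above; the result should come back as a WebElement that can then be clicked.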
I am working on a simple webpage. I have the following sample JSON file and HTML template:
data.json
{
"NAME":"SAMPLE_NAME",
"ADDRESS":"New Brunswick Avenue"
}
index.html
<div class="name"></div>
<div class="address"></div>
I need to display the name and address in the template, reading them from the JSON file. Is there a library I can use for this, or another way to accomplish it?
I think you are looking for a templating engine, possibly one with pre-compiled templates.
You could build your own with HTML, CSS and JavaScript (or jQuery) that changes the text of certain elements, but that will take a long time if you have big pages.
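The do-it-yourself approach can stay quite small for a page like the one in the question. A minimal sketch, assuming each JSON key maps to the element whose class is the lower-cased key (NAME to class="name"), which matches the question's files; fillFromJson is a hypothetical helper name. Loading data.json with fetch() assumes the page is served over HTTP, since browsers generally block fetch() for pages opened from file://.

```javascript
// Hypothetical helper: copy each JSON value into the element whose class
// name is the lower-cased key, e.g. "NAME" -> <div class="name">.
function fillFromJson(root, data) {
  for (const [key, value] of Object.entries(data)) {
    const el = root.querySelector('.' + key.toLowerCase());
    // textContent (not innerHTML) so JSON values are never parsed as HTML.
    if (el) el.textContent = String(value);
  }
}

// In index.html (assumes data.json is served next to the page):
// fetch('data.json')
//   .then(response => response.json())
//   .then(data => fillFromJson(document, data));
```

Keys without a matching element are ignored, so extra fields in the JSON do no harm.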
However, there is a library that does something like this, called Handlebars.
Here's a link: http://berzniz.com/post/24743062344/handling-handlebarsjs-like-a-pro
This might give you an idea of what it does: What is the difference between handlebar.js and handlebar.runtime.js?
Here is an example using your HTML:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>Document</title>
<script src="https://cdnjs.cloudflare.com/ajax/libs/handlebars.js/4.0.12/handlebars.min.js"></script>
</head>
<body>
<script>
// Load your html / template into this variable
var template = '<div class="name">{{name}}</div><div class="address">{{address}}</div>';
var jsonData = {
"name":"John",
"address": "City Street"
}
var compiledTemplate = Handlebars.compile(template);
// The output html is generated using
var html = compiledTemplate(jsonData);
document.getElementsByTagName('body')[0].innerHTML = html;
</script>
</body>
</html>
If you would rather write html outside of the javascript variables you could also do it like this:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>Document</title>
<script src="https://cdnjs.cloudflare.com/ajax/libs/handlebars.js/4.0.12/handlebars.min.js"></script>
</head>
<body>
<div id="template">
<div class="name">{{name}}</div>
<div class="address">{{address}}</div>
</div>
<script>
// Load your html / template into this variable
var template = document.getElementById('template').innerHTML;
var jsonData = {
"name":"John",
"address": "City Street"
}
var compiledTemplate = Handlebars.compile(template);
// The output html is generated using
var html = compiledTemplate(jsonData);
document.getElementById('template').innerHTML = html;
</script>
</body>
</html>
I don't quite understand how this webpage displays its data, as the source code only seems to show its Google Tag Manager code.
Where does the code for the page layout come from?
http://www.trademe.co.nz/property/insights/address/Auckland/Orakei/Tautari-Street/42/60e6b7b1-d472-4fe2-bab1-7b8a9c18e641
<!DOCTYPE html>
<html>
<head ng-controller="headController as ctrl">
<base href="/">
<meta charset="utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0,height=device-height"/>
<meta name="description" content="Search over 1.5 million New Zealand properties and discover free estimated market values, sold prices, rateable valuations and more." />
<meta name="apple-itunes-app" content="app-id=550943614, affiliate-data=1010lc5k"/>
<title>House sold prices & property information | Trade Me Property Insights</title>
<link rel="stylesheet" href="/property/insights/css/app.css?v=Xq_vVMQyeqWVriJqrsx95fOd3kQR4nYrrMxb8sLFsl4" />
<!-- Google Tag Manager TODO add key in iFrame for production-->
<noscript>
<iframe src="//www.googletagmanager.com/ns.html?id=GTM-WP683K>" height="0" width="0" style="display: none; visibility: hidden"></iframe>
</noscript>
<script>
var testRegEx = /test[1-9][a-z]/i;
var containerId = 'GTM-KMC2M2';
if (/test[1-9][a-z]/i.test(window.location.origin)) containerId = 'GTM-WP683K';
if (/.dev.trademe.co.nz/i.test(window.location.origin)) containerId = 'GTM-53CZK6';
(function(w, d, s, l, i) {
w[l] = w[l] || [];
w[l].push(
{ 'gtm.start': new Date().getTime(), event: 'gtm.js' }
);
var f = d.getElementsByTagName(s)[0],
j = d.createElement(s),
dl = l != 'dataLayer' ? '&l=' + l : '';
j.async = true;
j.src =
'//www.googletagmanager.com/gtm.js?id=' + i + dl;
f.parentNode.insertBefore(j, f);
})(window, document, 'script', 'dataLayer', containerId);
</script>
<link href="/property/insights/styles.0f79228bbfea7a89f25f.bundle.css?v=qL0UK5rKQQTni727hQPz_P9WxK7qhDCkpkC3fzJ_Kuk" rel="stylesheet" /></head>
<body>
<input id="HostingEnvironment" name="HostingEnvironment" type="hidden" value="Production" />
<app>
<div class="loading-container">
<img src="/property/insights/images/loader.svg" />
<div class="loading-text">Just a moment...</div>
</div>
</app>
<script type="text/javascript" src="/property/insights/inline.9119e57f92f5676e8860.bundle.js?v=Ef2IUZ--xAfHSZwkaFyNspZ7PkjVfBAh8gQYGNLYbKc"></script><script type="text/javascript" src="/property/insights/polyfills.6292e3ee5e1ea889726b.bundle.js?v=ryy3leF6QDbqOPfS7dyHC0QD_QKX5vjwiNit2pfnyAU"></script><script type="text/javascript" src="/property/insights/vendor.d418eda2f891d23557f3.bundle.js?v=t7foCzMb9MYsSguXz23JEwECvn_9RDoKaJIfSpijuAs"></script><script type="text/javascript" src="/property/insights/main.54e60317a2b1ac8487cb.bundle.js?v=wqYIDaveFz1-mbQA_o01prc5-eoXnyOH9x83fK2gSFw"></script></body>
</html>
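There are a bunch of script tags at the end of the page (the inline, polyfills, vendor and main bundles). These are most likely the scripts that build the application, layout and all: the HTML you see in View Source is only a shell, and once the bundles load they render the page into the <app> element, replacing the "Just a moment..." loader. To see the generated markup, inspect the live DOM with the browser's developer tools instead of viewing the source.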
I am using Google Fonts, and the HTML validator reports the following error for the link below:
<link href="http://fonts.googleapis.com/css?family=Lato:100,300,400,700,900,100italic,300italic,400italic,700italic,900italic|Montserrat:700|Merriweather:400italic|Roboto+Condensed|Source+Sans+Pro|Droid+Serif|Open+Sans+Condensed|Oswald|Molengo|PT Sans|Droid Sans')" rel="stylesheet" />
ERROR MESSAGE
Line 35, Column 289: Bad value for attribute href on element link: Illegal character in query: not a URL code point.
…if|Open+Sans+Condensed|Oswald|Molengo|PT Sans|Droid Sans')" rel="stylesheet" />
Syntax of URL:
Any URL. For example: /hello, #canvas, or http://example.org/. Characters should be represented in NFC and spaces should be escaped as %20.
SAMPLE HTML
<html>
<head>
<title>Title</title>
<meta charset="utf-8" />
<meta name="viewport" content="user-scalable=no, width=device-width, initial-scale=1, maximum-scale=1" />
<link href="http://fonts.googleapis.com/css?family=Lato:100,300,400,700,900,100italic,300italic,400italic,700italic,900italic|Montserrat:700|Merriweather:400italic|Roboto+Condensed|Source+Sans+Pro|Droid+Serif|Open+Sans+Condensed|Oswald|Molengo|PT+Sans|Droid+Sans')" rel="stylesheet" />
</head>
<body>
</body>
</html>
UPDATE
This generates an error:
<link rel="stylesheet" href="http://fonts.googleapis.com/css?family=Roboto%20Condensed|Source%20Sans%20Pro" />
This works:
<link rel="stylesheet" href="http://fonts.googleapis.com/css?family=Lato" />
<link rel="stylesheet" href="http://fonts.googleapis.com/css?family=Roboto%20Condensed" />
When I use | to add multiple fonts it generates an error. Should I use multiple <link> tags to add the fonts, or is there another way?
I'm confused by this, because the link below is generated by Google Fonts itself for use on a website:
https://www.google.com/fonts#UsePlace:use/Collection:Open+Sans|Roboto:400,700,400italic|Roboto+Condensed:400,300|Lato
Using <link> or @import may not eliminate the validation error, so please try the JavaScript notation (the Web Font Loader); it works without any error. Here is your example code using it:
<!DOCTYPE html>
<html>
<head>
<title>Title</title>
<meta charset="utf-8" />
<script type="text/javascript">
WebFontConfig = {google: { families: [ 'Lato:100,300,400,700,900,100italic,300italic,400italic,700italic,900italic', 'Montserrat:700', 'Merriweather:400italic', 'Roboto+Condensed', 'Source+Sans+Pro', 'Droid+Serif', 'Open+Sans+Condensed', 'Oswald', 'Molengo', 'PT+Sans', 'Droid+Sans' ] }};
(function() {
var wf = document.createElement('script');
wf.src = ('https:' == document.location.protocol ? 'https' : 'http') +
'://ajax.googleapis.com/ajax/libs/webfont/1/webfont.js';
wf.type = 'text/javascript';
wf.async = true;
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(wf, s);
})();
</script>
</head>
<body>
</body>
</html>
You will need to substitute the & sign with &amp;:
<!DOCTYPE html>
<html>
<head>
<title>Title</title>
<meta charset="utf-8" />
<meta name="viewport" content="user-scalable=no, width=device-width, initial-scale=1, maximum-scale=1" />
<link href='http://fonts.googleapis.com/css?family=Open+Sans&amp;subset=latin,cyrillic-ext,greek-ext,greek,vietnamese,latin-ext,cyrillic' rel='stylesheet' type='text/css'>
</head>
<body>
</body>
</html>
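If the <link> tag is generated by a script or template instead of being typed by hand, the same & to &amp; substitution can be done in code. A minimal sketch; escapeHtmlAttr and fontLinkTag are hypothetical helper names, and escaping & first keeps the entities inserted afterwards from being escaped a second time.

```javascript
// Hypothetical helper: make a raw string safe to place inside a
// double-quoted HTML attribute value.
function escapeHtmlAttr(value) {
  return value
    .replace(/&/g, '&amp;') // must run first, or the entities below get re-escaped
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: build the stylesheet <link> markup for a font URL.
function fontLinkTag(href) {
  return `<link href="${escapeHtmlAttr(href)}" rel="stylesheet" type="text/css">`;
}
```

The URL itself is unchanged; the browser decodes &amp; back to & when it reads the attribute, so the request that reaches Google is the same.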
You can also use the JavaScript notation to include the fonts from Google:
<!DOCTYPE html>
<html>
<head>
<title>Title</title>
<meta charset="utf-8" />
<meta name="viewport" content="user-scalable=no, width=device-width, initial-scale=1, maximum-scale=1" />
<script type="text/javascript">
WebFontConfig = {
google: { families: [ 'Open+Sans::cyrillic-ext,latin,greek-ext,greek,vietnamese,latin-ext,cyrillic' ] }
};
(function() {
var wf = document.createElement('script');
wf.src = ('https:' == document.location.protocol ? 'https' : 'http') +
'://ajax.googleapis.com/ajax/libs/webfont/1/webfont.js';
wf.type = 'text/javascript';
wf.async = true;
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(wf, s);
})(); </script>
</head>
<body>
</body>
</html>
A few more suggestions:
Always include a doctype at the top of an HTML page.
Try the @import and JavaScript alternatives for including the fonts.
Substitute your own Google fonts; to avoid typos, I tested with a different font from Google.
The character | is not allowed in the query component (nor anywhere else in a URI); it has to be percent-encoded as %7C.
So
http://fonts.googleapis.com/css?family=Lato:100,300,400,700,900,100italic,300italic,400italic,700italic,900italic|Montserrat:700|Merriweather:400italic|Roboto+Condensed|Source+Sans+Pro|Droid+Serif|Open+Sans+Condensed|Oswald|Molengo|PT+Sans|Droid+Sans')
should be this URI instead (the trailing ') is not part of the URL; left in, the last family would be requested as "Droid Sans')"):
http://fonts.googleapis.com/css?family=Lato:100,300,400,700,900,100italic,300italic,400italic,700italic,900italic%7CMontserrat:700%7CMerriweather:400italic%7CRoboto+Condensed%7CSource+Sans+Pro%7CDroid+Serif%7COpen+Sans+Condensed%7COswald%7CMolengo%7CPT+Sans%7CDroid+Sans
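When the URL is assembled in code, the encoding can be done once instead of by hand. A minimal sketch; googleFontsUrl is a hypothetical helper name, it uses + for spaces as the rest of the URL already does and %7C for |, and switching to https is an assumption (the original links use http).

```javascript
// Hypothetical helper: build a Google Fonts CSS URL whose query contains
// only valid URL code points (" " -> "+", "|" -> "%7C").
function googleFontsUrl(families) {
  const family = families
    .map(f => f.trim().replace(/ /g, '+'))
    .join('%7C');
  return 'https://fonts.googleapis.com/css?family=' + family;
}

const url = googleFontsUrl(['Lato:400,700', 'Roboto Condensed', 'PT Sans', 'Droid Sans']);
// Decoding the query gives back the original family list:
console.log(new URL(url).searchParams.get('family'));
// prints Lato:400,700|Roboto Condensed|PT Sans|Droid Sans
```

Because the server decodes %7C back to | and + back to a space, Google receives exactly the same family list as the unencoded link, while the markup now passes validation.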
There is a space in the string near the end:
PT Sans|Droid Sans')"
It should be escaped as:
PT%20Sans|Droid%20Sans')"