I'm trying to scrape some data from LinkedIn, but I noticed that the element ids change each time I load the page with Selenium. So I tried using the class name to find all the elements, but the class names have newlines inside them, which prevents me from scraping the website.
example of class with newlines here
Website link example
I tried doing the below:
job_test = "ember-view jobs-search-results__list-item occludable-update p0 relative scaffold-layout__list-item\n \n \n "
job_list = driver.find_elements(By.CLASS_NAME, job_test)
I even tried this:
job_test = '''ember-view jobs-search-results__list-item occludable-update p0 relative scaffold-layout__list-item
'''
job_list = driver.find_elements(By.CLASS_NAME, job_test)
But it does not show me any elements when I print job_list. What do I do here?
By.CLASS_NAME accepts only one classname, so you can't pass multiple. See: Invalid selector: Compound class names not permitted error using Selenium
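If you really do want to match on every class in the copied attribute value, one option is to turn it into a compound CSS selector and use By.CSS_SELECTOR instead. Below is a minimal sketch of that idea. The helper name classes_to_css is made up for illustration, and it assumes the class names contain no characters that need escaping in CSS.

```python
def classes_to_css(class_attr: str, tag: str = "") -> str:
    """Turn a raw class attribute value into a compound CSS selector.

    str.split() with no arguments splits on any run of whitespace, so
    embedded newlines and trailing spaces are simply dropped.
    """
    names = class_attr.split()
    if not names:
        raise ValueError("no class names found")
    return tag + "".join("." + name for name in names)


raw = ("ember-view jobs-search-results__list-item occludable-update "
       "p0 relative scaffold-layout__list-item\n \n \n ")
selector = classes_to_css(raw, tag="li")
print(selector)
# The result would then be used as, for example:
# driver.find_elements(By.CSS_SELECTOR, selector)
```

In practice a single stable class (as in the answers that follow) is usually less brittle than requiring all of them.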
Solution
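To create the job list, wait for the elements with WebDriverWait and visibility_of_all_elements_located(). The snippets assume that By, WebDriverWait, and expected_conditions (as EC) have been imported from selenium. You can use any of the following locator strategies: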
Using CLASS_NAME:
driver.get('https://www.linkedin.com/jobs/search/?currentJobId=3425809260&keywords=python')
job_list = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "jobs-search-results__list-item")))
Using CSS_SELECTOR:
driver.get('https://www.linkedin.com/jobs/search/?currentJobId=3425809260&keywords=python')
job_list = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "li.jobs-search-results__list-item")))
Using XPATH:
driver.get('https://www.linkedin.com/jobs/search/?currentJobId=3425809260&keywords=python')
job_list = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//li[contains(@class, 'jobs-search-results__list-item')]")))
It looks like the W3C validator returns a validation error on prettyPhoto's rel attribute for HTML5 pages. How do I solve this error?
Bad value prettyPhoto[gallery1] for attribute rel on element a: Keyword prettyphoto[gallery1] is not registered.
Many thanks!
Using the rel attribute with non-proposed (and thus not allowed) values is not valid HTML5 markup. The value prettyPhoto is not in the list of proposed values, so you may have difficulty getting a web page with an image gallery to pass validation.
A Possible Solution:
Open jquery.prettyPhoto.js (presumably non-minified one) and perform find & replace function of your text-editor: replace all occurrences of attr('rel') with attr('data-gal').
In your gallery code use: data-gal="prettyPhoto[galname]" instead of:
rel="prettyPhoto[galname]"
Initialize your prettyPhoto with:
jQuery("a[data-gal^='prettyPhoto']").prettyPhoto();
And you are on the right track to getting your code valid!
You can also read this article with this possible solution.
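If you have many pages to convert, the markup change described above (rel to data-gal) can be scripted. Here is a rough Python sketch of that step only. The function name rel_to_data_gal is hypothetical, and it is a regex-based rewrite that assumes quoted attribute values, not a full HTML parser.

```python
import re

# Matches rel="prettyPhoto" or rel="prettyPhoto[galname]" in either quote
# style, but not attributes that merely end in "rel" (e.g. data-rel).
REL_PRETTYPHOTO = re.compile(
    r"""(?<![\w-])rel=(["'])(prettyPhoto(?:\[[^\]"']*\])?)\1"""
)


def rel_to_data_gal(html: str) -> str:
    """Replace rel="prettyPhoto[...]" with data-gal="prettyPhoto[...]"."""
    return REL_PRETTYPHOTO.sub(r"data-gal=\1\2\1", html)


before = '<a href="a.jpg" rel="prettyPhoto[gallery1]">First</a>'
print(rel_to_data_gal(before))
```

Other rel values such as rel="nofollow" are left alone, since the pattern only matches values starting with prettyPhoto.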
You can use the (undocumented) hook setting as mentioned in the comments here.
Specify your links with a data-gal attribute instead of rel, and use $("a[data-gal^='prettyPhoto']").prettyPhoto({hook: 'data-gal'}); to initialize prettyPhoto.
You can also fix it by updating the settings to use the class hook:
s = jQuery.extend({
...
hook: "rel",
animation_speed: "fast",
ajaxcallback: function() {},
slideshow: 5e3,
autoplay_slideshow: false,
opacity: .8,
...
to:
s = jQuery.extend({
...
hook: "class",
...
I'm new to PHPUnit, and I'm having some trouble with unit testing HTML output.
My test follows:
/**
* @covers Scrap::removeTags
*
*/
public function testRemoveTags() {
// Variables
$simple_parameter = 'script';
$array_parameter = array('script', 'div');
$html = '<div class="pubanunciomrec" style="background:#FFFFFF;"><script type="text/javascript"><!-- google_ad_slot = "9853257829"; google_ad_width = 300; google_ad_height = 250; //--></script><script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"></script></div><table></table>';
// Expected HTML
$expected_html_whitout_script = new DOMDocument;
$expected_html_whitout_script->loadHTML('<div class="pubanunciomrec" style="background:#FFFFFF;"></div><table></table>');
$expected_html_without_script_div = new DOMDocument;
$expected_html_without_script_div->loadHTML('<table></table>');
// Actual HTML
$actual_whitout_script = new DOMDocument;
$actual_whitout_script->loadHTML($this->scrap->removeTags($html, $simple_parameter));
$actual_without_script_div = new DOMDocument;
$actual_without_script_div->loadHTML($this->scrap->removeTags($html, $array_parameter));
// Test
$this->assertEquals($expected_html_whitout_script, $actual_whitout_script);
$this->assertEquals($expected_html_without_script_div, $actual_without_script_div);
}
My problem is that the DOMDocument object generates some HTML code and I can't compare it. How can I print the DOMDocument object to see the output? Any clues on how to compare the HTML?
Sorry for my bad english.
Best Regards,
Since 2013, there has been another way to test HTML output using PHPUnit.
It uses the assertTag() method, which can be found in PHPUnit 3.7 and 3.8.
For example :
// Matcher that asserts that there is an element with an id="my_id".
$matcher = array('id' => 'my_id');
// Matcher that asserts that there is a "span" tag.
$matcher = array('tag' => 'span');
// Matcher that asserts that there is a "div", with an "ul" ancestor and a "li"
// parent (with class="enum"), and containing a "span" descendant that contains
// an element with id="my_test" and the text "Hello World".
$matcher = array(
'tag' => 'div',
'ancestor' => array('tag' => 'ul'),
'parent' => array(
'tag' => 'li',
'attributes' => array('class' => 'enum')
),
'descendant' => array(
'tag' => 'span',
'child' => array(
'id' => 'my_test',
'content' => 'Hello World'
)
)
);
// Use assertTag() to apply a $matcher to a piece of $html.
$this->assertTag($matcher, $html);
Read more on the official PHPUnit website.
You may want to consider looking at Selenium. It is a browser-based testing tool for doing functional tests for a web site.
You write scripts which involve loading a web browser and simulating clicks and other actions, and then doing asserts to check that, for example, specific page elements are present, in the correct place or contain the expected values.
The tests can be written using an IDE that runs as a plug-in for Firefox, but they can be run against all the major browsers.
We have a suite of Selenium tests that run as part of our CI process, allowing us to see very quickly if something has gone wrong with our HTML output.
All in all, it's a very powerful testing tool.
Also, it integrates with PHPUnit (and other language-specific tools), so it does answer your question, although probably not in the way you were thinking of.
You should be a bit careful in comparing outputted HTML to a correct template. Your HTML will change a lot, and you can end up spending too much time on maintaining your tests.
See this post for an alternative approach.
You can use the saveHTML() method of DOMDocument and compare the output.
You can compare two HTML strings with PHPUnit assertXmlStringEqualsXmlString method:
$this->assertXmlStringEqualsXmlString($emailMarkup, $html);
where
$emailMarkup - expected HTML string
$html - current HTML string
Important! HTML strings must be XML-valid. For example, use
<br/>
instead of
<br>
Also, tag attributes must have values, e.g. use
<hr noshade="true"/>
instead of
<hr noshade>
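To illustrate why the strings have to be XML-valid, here is a small Python sketch of the same kind of comparison (this is not PHPUnit's implementation, only an analogous check). It assumes Python 3.8+ for xml.etree.ElementTree.canonicalize, and xml_strings_equal is a name made up for this example.

```python
import xml.etree.ElementTree as ET


def xml_strings_equal(expected: str, actual: str) -> bool:
    """Compare two XML strings after canonicalization (C14N 2.0).

    Canonicalization normalizes attribute order and empty-element syntax,
    so only real differences in the markup make the comparison fail.
    """
    return ET.canonicalize(expected) == ET.canonicalize(actual)


expected = '<p class="a" id="b">Hi<br/></p>'
actual = '<p id="b" class="a">Hi<br></br></p>'
print(xml_strings_equal(expected, actual))

# An unclosed <br> is fine in HTML but is not well-formed XML,
# so the parser rejects it before any comparison can happen.
try:
    ET.canonicalize("<p>line<br>break</p>")
except ET.ParseError as error:
    print("not XML-valid:", error)
```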
It is best not to validate against a template (unless you want to make sure nothing changes, but that is a different condition / test that you may want). You will probably want to test that your HTML includes what the user should actually see, and not that the actual HTML that formats the output is exactly what is in a template. I would recommend sending your HTML through a converter that changes it into pure text, then testing to see if you get the right results. This accommodates future functionality and data related changes that are inevitable in software development. You don't want your tests failing because someone added a class somewhere. This is probably a custom type test you will want to code yourself to meet your needs.
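As a rough illustration of the "convert to pure text, then test" idea, here is a sketch in Python using only the standard library (the same approach could be ported to PHP). The names TextExtractor and html_to_text are invented for this example, and joining chunks with spaces is a simplification that can split a word broken up by inline tags.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse whitespace runs so layout-only changes do not matter.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


old = '<div class="box"><script>var a = 1;</script><p>Hello <b>World</b></p></div>'
new = '<section class="card wide"><p>Hello   <strong>World</strong></p></section>'
print(html_to_text(old) == html_to_text(new))
```

A test written this way keeps passing when someone renames a class or swaps a tag, and fails only when the text the user sees changes.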
It is also best to ensure your HTML (and CSS) is correctly formatted, whatever it may be. Sometimes invalid HTML is parsed and displayed somewhat reasonably by the browser, but it is best not to rely on browsers knowing what to do with invalid HTML and CSS. I have seen many issues fixed just by correcting the HTML.
I developed PHPFUI, a library that outputs HTML, and I could not find any recent or even supported HTML unit tests for PHPUnit. So I created https://packagist.org/packages/phpfui/html-unit-tester which is a modern HTML and CSS unit tester. It validates against w3.org standards, so it will always be up to date with the latest.
Basically you can pass in HTML fragments, or entire pages, and it will check the validity of your HTML. You can test strings, files or even live URLs. It is really handy for making sure all the HTML and CSS you are generating is valid. I found so many issues in my code with this library that it was definitely worth the time invested. Hope everyone can benefit from it as well.
Is there a way to create a link in Markdown that opens in a new window? If not, what syntax do you recommend to do this? I'll add it to the markdown compiler I use. I think it should be an option.
As far as the Markdown syntax is concerned, if you want to get that detailed, you'll just have to use HTML.
<a href="http://example.com/" target="_blank">Hello, world!</a>
Most Markdown engines I've seen allow plain old HTML, just for situations like this where a generic text markup system just won't cut it. (The StackOverflow engine, for example.) They then run the entire output through an HTML whitelist filter, regardless, since even a Markdown-only document can easily contain XSS attacks. As such, if you or your users want to create _blank links, then they probably still can.
If that's a feature you're going to be using often, it might make sense to create your own syntax, but it's generally not a vital feature. If I want to launch that link in a new window, I'll ctrl-click it myself, thanks.
Kramdown supports it. It's compatible with standard Markdown syntax, but has many extensions, too. You would use it like this:
[link](url){:target="_blank"}
I don't think there is a markdown feature, although there may be other options available if you want to open links which point outside your own site automatically with JavaScript.
Array.from(document.links)
.filter(link => link.hostname != window.location.hostname)
.forEach(link => link.target = '_blank');
jsFiddle.
If you're using jQuery:
$(document.links).filter(function() {
return this.hostname != window.location.hostname;
}).attr('target', '_blank');
jsFiddle.
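If you would rather do this once on the server after the Markdown has been rendered, the same hostname check can be applied to the generated HTML. This is only a sketch in Python under a few assumptions: the function name open_external_in_new_tab and the site_host parameter are made up, the regex handles ordinary quoted href attributes rather than every valid HTML form, and the rel value is added as a common precaution rather than something the snippets above do.

```python
import re
from urllib.parse import urlparse

# Opening <a ...> tags; the attribute text is captured for inspection.
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
HREF = re.compile(r"""\bhref=(["'])(.*?)\1""", re.IGNORECASE)
TARGET = re.compile(r"\btarget=", re.IGNORECASE)


def open_external_in_new_tab(html: str, site_host: str) -> str:
    """Add target="_blank" to links whose host differs from site_host."""

    def fix(match):
        attrs = match.group(1)
        href = HREF.search(attrs)
        # Leave tags without an href, or with an explicit target, alone.
        if href is None or TARGET.search(attrs):
            return match.group(0)
        host = urlparse(href.group(2)).hostname
        # Relative URLs have no hostname, so they count as internal.
        if host is None or host == site_host:
            return match.group(0)
        return f'<a{attrs} target="_blank" rel="noopener noreferrer">'

    return A_TAG.sub(fix, html)


page = ('<a href="https://other.org/x">out</a> '
        '<a href="/about">about</a> '
        '<a href="https://example.com/p">post</a>')
print(open_external_in_new_tab(page, "example.com"))
```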
With Markdown v2.5.2, you can use this:
[link](URL){:target="_blank"}
So, it isn't quite true that you cannot add link attributes to a Markdown URL. To add attributes, check with the underlying markdown parser being used and what their extensions are.
In particular, pandoc has an extension to enable link_attributes, which allow markup in the link. e.g.
[Hello, world!](http://example.com/){target="_blank"}
For those coming from R (e.g. using rmarkdown, bookdown, blogdown and so on), this is the syntax you want.
For those not using R, you may need to enable the extension in the call to pandoc with +link_attributes
Note: This is different from the kramdown parser's support, which is covered in one of the answers above. In particular, kramdown differs from pandoc in that it requires a colon (:) at the start of the curly brackets ({}), e.g.
[link](http://example.com){:hreflang="de"}
In particular:
# Pandoc
{ attribute1="value1" attribute2="value2"}
# Kramdown
{: attribute1="value1" attribute2="value2"}
 ^
 ^ Colon
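If you generate Markdown programmatically for both parsers, the difference boils down to that one colon. A tiny Python sketch to make this concrete (link_with_attributes and its flavor argument are hypothetical names, not part of pandoc or kramdown):

```python
def link_with_attributes(text: str, url: str, attrs: dict, flavor: str) -> str:
    """Format a Markdown link with an attribute block for pandoc or kramdown."""
    pairs = " ".join(f'{key}="{value}"' for key, value in attrs.items())
    if flavor == "pandoc":
        block = "{" + pairs + "}"
    elif flavor == "kramdown":
        # kramdown attribute lists start with a colon.
        block = "{:" + pairs + "}"
    else:
        raise ValueError(f"unknown flavor: {flavor}")
    return f"[{text}]({url}){block}"


print(link_with_attributes("Hello, world!", "http://example.com/",
                           {"target": "_blank"}, "pandoc"))
print(link_with_attributes("link", "http://example.com",
                           {"hreflang": "de"}, "kramdown"))
```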
One global solution is to put <base target="_blank"> into your page's <head> element. That effectively adds a default target to every anchor element. I use markdown to create content on my WordPress-based web site, and my theme customizer will let me inject that code into the top of every page. If your theme doesn't do that, there's a plug-in.
Not a direct answer, but may help some people ending up here.
If you are using GatsbyJS there is a plugin that automatically adds target="_blank" to external links in your markdown.
It's called gatsby-remark-external-links and is used like so:
yarn add gatsby-remark-external-links
plugins: [
{
resolve: `gatsby-transformer-remark`,
options: {
plugins: [{
resolve: "gatsby-remark-external-links",
options: {
target: "_blank",
rel: "noopener noreferrer"
}
}]
}
},
It also takes care of the rel="noopener noreferrer".
Reference the docs if you need more options.
For ghost markdown use:
[Google](https://google.com" target="_blank)
Found it here:
https://cmatskas.com/open-external-links-in-a-new-window-ghost/
I'm using Grav CMS and this works perfectly:
Body/Content:
Some text[1]
Body/Reference:
[1]: http://somelink.com/?target=_blank
Just make sure that the target attribute is passed first, if there are additional attributes in the link, copy/paste them to the end of the reference URL.
This also works as a direct link:
[Go to this page](http://somelink.com/?target=_blank)
You can do this via native javascript code like so:
var pattern = /a href=/g;
var sanitizedMarkDownText = rawMarkDownText.replace(pattern,"a target='_blank' href=");
JSFiddle Code
In my project I'm doing this and it works fine:
[Link](https://example.org/ "title" target="_blank")
Link
But not all parsers let you do that.
There's no easy way to do it, and as @alex has noted, you'll need to use JavaScript. His answer is the best solution, but to optimize it you might want to restrict the filter to the post-content links.
<script>
var links = document.querySelectorAll( '.post-content a' );
for (var i = 0, length = links.length; i < length; i++) {
if (links[i].hostname != window.location.hostname) {
links[i].target = '_blank';
}
}
</script>
The code is compatible with IE8+ and you can add it to the bottom of your page. Note that you'll need to change the ".post-content a" to the class that you're using for your posts.
As seen here: http://blog.hubii.com/target-_blank-for-links-on-ghost/
If someone is looking for a global rmarkdown (pandoc) solution.
Using Pandoc Lua Filter
You could write your own Pandoc Lua Filter which adds target="_blank" to all links:
Write a Pandoc Lua Filter, name it for example links.lua
function Link(element)
if
string.sub(element.target, 1, 1) ~= "#"
then
element.attributes.target = "_blank"
end
return element
end
Then update your _output.yml
bookdown::gitbook:
pandoc_args:
- --lua-filter=links.lua
Inject <base target="_blank"> in Header
An alternative solution would be to inject <base target="_blank"> in the HTML head section using the includes option:
Create a new HTML file, name it for example links.html
<base target="_blank">
Then update your _output.yml
bookdown::gitbook:
includes:
in_header: links.html
Note: This solution may also open new tabs for hash (#) pointers/URLs. I have not tested this solution with such URLs.
In Laravel I solved it this way:
$post->text = Str::replace('<a ', '<a target="_blank" ', $post->text);
This does not work for a single specific link; it edits all links in the Markdown text. (In my case that's fine.)
I ran into this problem when trying to implement markdown using PHP.
Since the user generated links created with markdown need to open in a new tab but site links need to stay in tab I changed markdown to only generate links that open in a new tab. So not all links on the page link out, just the ones that use markdown.
In markdown I changed all the link output to be <a target='_blank' href="..."> which was easy enough using find/replace.
I do not agree that it's a better user experience to stay within one browser tab. If you want people to stay on your site, or come back to finish reading that article, send them off in a new tab.
Building on @davidmorrow's answer, throw this JavaScript into your site to turn just the external links into links with target=_blank:
<script type="text/javascript" charset="utf-8">
// Creating custom :external selector
$.expr[':'].external = function(obj){
return !obj.href.match(/^mailto\:/)
&& (obj.hostname != location.hostname);
};
$(function(){
// Add 'external' CSS class to all external links
$('a:external').addClass('external');
// turn target into target=_blank for elements w external class
$(".external").attr('target','_blank');
})
</script>
You can add any attributes using {[attr]="[prop]"}
For example: [Google](http://www.google.com){target="_blank"}
To complete alex's answer (Dec 13 '10):
A smarter way to inject the target could be done with this code:
/*
* For all links in the current page...
*/
$(document.links).filter(function() {
/*
* ...keep those that do not already have a `target` set...
*/
return !this.target;
}).filter(function() {
/*
* ...and keep those that are not on the current domain...
*/
return this.hostname !== window.location.hostname ||
/*
* ...or that are not a web page (.pdf, .jpg, .png, .js, etc.; note that
* the pattern only allows letters, so .mp4 is not matched by it).
*/
/\.(?!html?|php3?|aspx?)([a-z]{0,3}|[a-zt]{0,4})$/.test(this.pathname);
/*
* For all links kept, add the `target="_blank"` attribute.
*/
}).attr('target', '_blank');
You can change the regexp exceptions by adding more extensions to the (?!html?|php3?|aspx?) group (see an explanation of this regexp here: https://regex101.com/r/sE6gT9/3).
For a version without jQuery, see the code below:
var links = document.links;
for (var i = 0; i < links.length; i++) {
if (!links[i].target) {
if (
links[i].hostname !== window.location.hostname ||
/\.(?!html?)([a-z]{0,3}|[a-zt]{0,4})$/.test(links[i].pathname)
) {
links[i].target = '_blank';
}
}
}
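Before relying on the extension pattern, it may be worth checking which pathnames it actually accepts. The sketch below runs the same pattern (the jQuery variant with the php and aspx exceptions) through Python's re module, which supports the same lookahead syntax. The helper name looks_like_file is made up for this check.

```python
import re

# Same pattern as in the jQuery version above.
NOT_A_WEB_PAGE = re.compile(r"\.(?!html?|php3?|aspx?)([a-z]{0,3}|[a-zt]{0,4})$")


def looks_like_file(pathname: str) -> bool:
    """Return True when the pattern treats the path as a non-page file."""
    return NOT_A_WEB_PAGE.search(pathname) is not None


for path in ["/doc/report.pdf", "/photo.jpeg", "/index.html",
             "/page.php", "/about", "/video.mp4"]:
    print(path, looks_like_file(path))
```

One thing this makes visible: the character classes contain only lowercase letters, so an extension with a digit in it, such as .mp4, is not matched. If you need those, you would have to widen the classes yourself, for example to [a-z0-9].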
Automated for external links only, using GNU sed & make
If one would like to do this systematically for all external links, CSS is not an option. However, one could run the following sed command once the (X)HTML has been created from Markdown:
sed -i 's|href="http|target="_blank" href="http|g' index.html
This can be further automated by adding the above sed command to a makefile. For details, see GNU make or see how I have done that on my website.
If you just want to do this in a specific link, just use the inline attribute list syntax as others have answered, or just use HTML.
If you want to do this in all generated <a> tags, it depends on your Markdown compiler; you may need an extension for it.
I am doing this for my blog these days, which is generated by Pelican, which uses Python-Markdown. I found an extension for Python-Markdown, Phuker/markdown_link_attr_modifier, and it works well. Note that an old extension called newtab does not seem to work in Python-Markdown 3.x.
For React + Markdown environment:
I created a reusable component:
export type TargetBlankLinkProps = {
label?: string;
href?: string;
};
export const TargetBlankLink = ({
label = "",
href = "",
}: TargetBlankLinkProps) => (
<a href={href} target="_blank">
{label}
</a>
);
And I use it wherever I need a link that opens in a new window.
For "markdown-to-jsx" with MUI v5
This seems to work for me:
import Markdown from 'markdown-to-jsx';
import Link from '@mui/material/Link';
...
const MarkdownLink = ({ children, ...props }) => (
<Link {...props}>{children}</Link>
);
...
<Markdown
options={{
forceBlock: true,
overrides: {
a: {
component: MarkdownLink,
props: {
target: '_blank',
},
},
},
}}
>
{description}
</Markdown>
This works for me: [Page Link](your url here "(target|_blank)")
I want to use a QGraphicsWebView inside a delegate to render a QTableView cell, but I just don't know what to do with the QStyleOptionGraphicsItem parameter the paint() method requires. How do I build it, or where should I retrieve it from?
I'm using this code as reference, so the paint() method should be something like this:
def paint(self, painter, option, index):
    web = QGraphicsWebView()
    web.setHtml(some_html_text)
    web.page().viewportSize().setWidth(option.rect.width())
    painter.save()
    painter.translate(option.rect.topLeft())
    painter.setClipRect(option.rect.translated(-option.rect.topLeft()))
    web.paint(painter, ??????) # what here?
    painter.restore()
Any advice?
I'll assume that you don't really need QGraphicsWebView and that QWebView is sufficient.
It's important to keep in mind that you're not expected to call QWidget::paintEvent() yourself. Given that constraint, you'll want to use a helper class that can render on a paint device or render using a given painter. QWebFrame has one such method in the form of its render function. Based off of your linked-to example, the following should work:
class HTMLDelegate(QStyledItemDelegate):
    def paint(self, painter, option, index):
        model = index.model()
        record = model.listdata[index.row()]
        # don't instantiate every time, so move this out
        # to the class level
        web = QWebView()
        web.setHtml(record)
        # viewportSize() returns a copy, so set the size on the page itself
        web.page().setViewportSize(option.rect.size())
        painter.save()
        painter.translate(option.rect.topLeft())
        painter.setClipRect(option.rect.translated(-option.rect.topLeft()))
        web.page().mainFrame().render(painter)
        painter.restore()