Check if a new tab was opened with Puppeteer - puppeteer

I am using Puppeteer to gather a list of companies that hire remotely. The page I am trying to collect the data from has a call to action that is not an <a href="" target="_blank"> link but a <button></button>. The button has no data attribute either that would hint at the URL of the page that opens in a new tab when I click it. I am now trying to find a way to detect the new tab that opens when I click the button, so that I can fetch its URL. Is this possible with Puppeteer?

The solution is to get all pages of the browser via the browser.pages() method.
In a Puppeteer project you always start by launching the browser, and the resulting browser object has a pages() method (see the usage of pages()).
What you have to do: first get the URL of the page you are currently on. Then iterate over the array returned by browser.pages() and find the index of your current tab. The tab you just opened with the button should be the page at that index + 1 (the next element in the array)...
How to handle multiple tabs
Puppeteer Documentation about pages()
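The approach above can be sketched roughly as follows. Instead of relying on the new tab sitting at index + 1, this version compares the page list before and after the click. The helper names findNewPages and clickAndGetNewTabUrl and the buttonSelector parameter are made up for illustration; browser.pages(), browser.once('targetcreated', ...) and page.click() are Puppeteer calls. Treat it as a sketch under those assumptions: the new tab may still report about:blank if its URL is read before the navigation has finished.

```javascript
// Return the pages that are present in `after` but were not in `before`.
function findNewPages(before, after) {
  const known = new Set(before);
  return after.filter((page) => !known.has(page));
}

// Hypothetical helper: click the button and return the URL of the tab it opened.
// `browser` and `page` are the usual Puppeteer objects; `buttonSelector` is an
// assumed CSS selector for the call-to-action button.
async function clickAndGetNewTabUrl(browser, page, buttonSelector) {
  const before = await browser.pages();
  // Register the listener before clicking so the event cannot be missed.
  const targetCreated = new Promise((resolve) =>
    browser.once('targetcreated', resolve)
  );
  await page.click(buttonSelector);
  await targetCreated;
  const after = await browser.pages();
  const [newPage] = findNewPages(before, after);
  return newPage ? newPage.url() : null;
}
```

Comparing the two lists avoids depending on the order in which browser.pages() returns the tabs.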

Related

Script error for "Multiple URLs copy in Sources/Network tab" [duplicate]

Is it possible to extract multiple resources' URLs in Sources or Network tab of Chrome Dev Tools?
When I want to get the URL of a single resource, I can do it with the context menu function Copy link address.
I can switch to this resource from the Network tab to the Sources tab and vice versa, but what if I need to get the URLs of multiple resources at once? It is very cumbersome to copy them manually if the result set consists of 200-300 resources.
What I've tried so far:
To copy the whole folder from the Sources tab, but from this answer I found out it is not possible for now.
To use $(selector) as specified in the Console reference, in the form of
$('img')
in case we need to fetch image URLs.
The difficulty with this approach is that it's often hard to distinguish the target images on a page that has hundreds of them and, furthermore, multiple versions of the same image (views, previews, small-sized icons, etc.). That is, matching the element with the needed resource is not as easy as it seems. Also, not all file types have dedicated tags (as is the case with img).
Maybe I should use the src attribute with some modifiers? Any other suggestions?
1. Make sure the Network panel is active.
2. Switch the devtools Dock side in the menu to a detached (floating) window. Next time you can press Ctrl+Shift+D to toggle docking.
3. In the now detached devtools press Ctrl+Shift+I (or ⌘⌥I on macOS), which will open devtools-on-devtools in a new window.
4. Run the following code in this new window:
copy(UI.panels.network.networkLogView.dataGrid.rootNode().flatNodes.map(n => n.request().url()).join('\n'))
It'll copy the URLs of all requests that match the current filter to the clipboard.
Hint: save the code as a Snippet and run it in the devtools-on-devtools window via the command palette: press Ctrl+P or ⌘P, then type the snippet's name.
In older Chrome versions the code was different:
copy(UI.panels.network._networkLogView._dataGrid._rootNode._flatNodes.map(n => n._request._url).join('\n'))
copy(UI.panels.network.networkLogView.dataGrid.rootNode().flatNodes.map(n => n.request().urlInternal).join('\n'))
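As an alternative that does not touch devtools internals, the list of loaded resources can also be read from the page itself through the Resource Timing API. This is a different technique from the one above, not part of the original answer. The helper filterUrlsByExtension is a hypothetical name; performance.getEntriesByType('resource') is a standard browser API and copy() is the console utility already used above. Note the assumption that the browser has buffered all entries you care about: it may cap how many it keeps, so the result can differ from what the Network panel shows.

```javascript
// Keep only the URLs whose path ends with one of the given extensions.
function filterUrlsByExtension(urls, extensions) {
  return urls.filter((url) => {
    let path;
    try {
      path = new URL(url).pathname.toLowerCase();
    } catch {
      return false; // skip anything that is not an absolute URL
    }
    return extensions.some((ext) => path.endsWith('.' + ext));
  });
}

// In the page's own console (not devtools-on-devtools):
//   const urls = performance.getEntriesByType('resource').map((entry) => entry.name);
//   copy(filterUrlsByExtension(urls, ['jpg', 'png']).join('\n'));
```

Filtering on the path rather than the full URL keeps query strings such as ?v=2 from hiding the extension.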
I found the above method too clunky; it's way easier to use Fiddler:
1. Open Fiddler (install it first if needed).
2. Filter on the domain name that you are interested in.
3. Clear the screen.
4. Refresh the web page.
5. Select the Fiddler output.
6. Right-click and choose to copy just the URLs.
Based on @wOxxOm's answer, this worked for me:
var nodes = UI.panels.network._networkLogView._dataGrid._rootNode._flatNodes,
    urls = [];
// Collect the URL of every row that has an associated request.
nodes.forEach(function (node) {
  var req = node._request;
  if (req !== undefined) {
    urls.push(req.url());
  }
});
Selecting the rows and a simple copy (Ctrl+C) works for me.
I select the URLs in the Url column with the mouse.
Then I use the context menu to copy the list to the clipboard.
I can then paste the clipboard contents into Excel and get the URL list. It adds some empty lines, though.

Is there a function that allows a user to go back in history even if they open a new tab?

I have a back button that is supposed to lead users back to the previous page in the history. The issue is that when the user right-clicks a link and opens it in a new tab, the back button does not work.
To clarify, I have a page with products, and there are buttons that go to the checkout page. On that checkout page there is a back button using history.back(). But after testing, opening the buy button's target in a new tab makes the back button unusable.
I need a way to prevent this. Thank you.
I don't think there is a function for that, but I see two possibilities:
document.referrer (as APAD1 suggested in the comment section):
The referrer property returns the URL of the document that loaded the current document. Hence, if you read document.referrer, you will only get the URL of the page where you clicked the button to load the current page.
If you want to be able to not only go to the previous page but also remember the pages loaded before it, see the next option.
window.localStorage and document.referrer
Since document.referrer only remembers the previous document's URL, you can use window.localStorage to store the history. You can keep an array as a localStorage item, add new URLs as you go forward, and remove URLs as you go back to a previous page inside the new tab.
More info:
- document.referrer
- Window localStorage Property
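The second option could look roughly like the sketch below. The storage key navHistory and the function names are made up for illustration, and the storage object is passed in as a parameter so that window.localStorage (or any object with getItem and setItem) can be used. It assumes every page of the site calls recordVisit when it loads.

```javascript
const HISTORY_KEY = 'navHistory'; // hypothetical storage key

// Read the stored list of visited URLs, falling back to an empty list.
function readHistory(storage) {
  try {
    const parsed = JSON.parse(storage.getItem(HISTORY_KEY) || '[]');
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return [];
  }
}

// Call on every page load: append the current URL unless it is already last.
function recordVisit(storage, url) {
  const history = readHistory(storage);
  if (history[history.length - 1] !== url) history.push(url);
  storage.setItem(HISTORY_KEY, JSON.stringify(history));
}

// Drop the current page from the list and return the page before it,
// falling back to the referrer when nothing is stored.
function previousUrl(storage, referrer) {
  const history = readHistory(storage);
  history.pop();
  const previous = history[history.length - 1] || referrer || null;
  storage.setItem(HISTORY_KEY, JSON.stringify(history));
  return previous;
}

// Browser wiring (backButton is an assumed element reference):
//   recordVisit(window.localStorage, location.href);
//   backButton.addEventListener('click', () => {
//     const url = previousUrl(window.localStorage, document.referrer);
//     if (url) location.href = url;
//   });
```

Keep in mind that localStorage is shared by all tabs of the same origin, so several open tabs would write to the same list in this sketch.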

Chrome send data from background page to browser action

So I have a background page that listens in on tab changes
var tabHandler = {
  onTabUpdate: function (tabId, changeInfo, tab) {
  },
  // Called when the active tab changes; looks up the data for its website.
  tabChanged: function (activeInfo) {
    function handleTab(tab) {
      // To extract the hostname, we create an anchor element and let it parse the URL.
      var parser = document.createElement('a');
      parser.href = tab.url;
      var regex = /^(www\.)?([^\.]+)/;
      var matches = regex.exec(parser.hostname);
      if (!matches) return; // no usable hostname for this tab
      var website = matches[2]; // the website name, without a leading "www."
      var data = getDataForWebsite(website); // data is a JSON array
      // TRANSFER 'data' to the browser action popup so that it can be displayed.
    }
    chrome.tabs.get(activeInfo.tabId, handleTab);
  },
  init: function () {
    chrome.tabs.onActivated.addListener(this.tabChanged);
  }
};
tabHandler.init();
This piece of code gets the name of the website and fetches a list of parameters based on the website. Now that I have the data, I am wondering how to show it in the browser action popup. I want to pass this data to the browser action and then parse it there to replace the existing content. How do I do that?
One thing you need to remember is that popup pages don't live while the popup is closed (unlike background pages). That means you can't just transfer the data to the popup page, since it may not exist at that moment. Instead, whenever the popup is opened, the first thing you need to do is request the info from storage and display it however you want.
In your background page, when you receive the data for the current domain, you should save it somewhere: that could be localStorage, sessionStorage, or chrome.storage (check the documentation to see which one makes the most sense for your use case). You would most likely want to save it indexed by domain, so that you can keep the info for all the open tabs if needed.
Then, whenever the popup is opened, get the data for the current tab from the storage you used and display it in whichever way you want.
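A minimal sketch of that flow, assuming chrome.storage.local is the chosen storage and the extension declares the "storage" permission. saveSiteData, loadSiteData and showData are hypothetical names; the storage area is passed in as a parameter, and only its set(items, callback) and get(key, callback) methods are used.

```javascript
// Background page: remember the data for a website under its name.
function saveSiteData(storageArea, website, data, done) {
  storageArea.set({ [website]: data }, done);
}

// Popup: read the data back and hand it to a render callback.
// `render` receives undefined when nothing was stored for the website.
function loadSiteData(storageArea, website, render) {
  storageArea.get(website, (items) => render(items[website]));
}

// Background usage, inside the tab callback from the question:
//   saveSiteData(chrome.storage.local, website, data);
// Popup usage, once the popup has worked out the current website:
//   loadSiteData(chrome.storage.local, website, showData);
```

Keying the stored object by website name matches the suggestion to index the data by domain.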
You can directly access the background page's window from the popup with chrome.extension.getBackgroundPage().
You can also directly access the popup's window from the background page, while the popup is open, with chrome.extension.getViews({type:'popup'})[0].
Using these methods you can implement messaging between the popup and the background page.
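As a rough illustration of those two calls, assuming a persistent background page as in the question: the global currentSiteData on the background window and the renderData function on the popup window are hypothetical names that your own scripts would have to define. The chrome object is passed in as a parameter here.

```javascript
// Popup side: read a global that the background page keeps up to date.
function readFromBackground(chromeApi) {
  const backgroundWindow = chromeApi.extension.getBackgroundPage();
  return backgroundWindow ? backgroundWindow.currentSiteData : undefined;
}

// Background side: push data into the popup, but only if it is open.
// Returns true when the popup was found and updated.
function pushToPopup(chromeApi, data) {
  const [popupWindow] = chromeApi.extension.getViews({ type: 'popup' });
  if (!popupWindow) return false;
  popupWindow.renderData(data);
  return true;
}

// Popup usage:       const data = readFromBackground(chrome);
// Background usage:  pushToPopup(chrome, data);
```

The check for a missing popup window matters because getViews returns an empty array while the popup is closed.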

Duplicate a tab in Chrome without Reloading the Page?

Is there any way to completely duplicate the state of a current tab in Google Chrome? I want an exact copy of the current state of the page without having to reload the page in another tab.
An example use case:
While browsing a "slideshow" on a news website, I want to preserve the current slide that I'm on, but create a duplicate so that I can continue viewing the next slide. If I simply Right-Click and "Duplicate" the tab, the new page will completely Reload, reprocessing all of the Javascript and running the pre-slideshow advertisement again.
In short: no, you can't.
I am not an expert on this, but similar behavior can be achieved in a few ways I know of:
Dump the whole DOM
I have never tried this, though. You can convert the DOM to a string, pass it to the new window, and then parse it as a document. You will lose your DOM event listeners and the JavaScript state (but that is fine for your case).
var dtab = window.open('about:blank', 'duplicate_a_tab');
dtab.document.open();
dtab.document.write("... your HTML string ...");
dtab.document.close();
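To make the placeholder string concrete, one way to obtain it is documentElement.outerHTML. The sketch below is untested against real slideshow pages; serializeDocument and duplicateInto are hypothetical helper names. Be aware that script tags contained in the serialized markup may run again when written into the new window, and that a popup blocker can make window.open return null.

```javascript
// Serialize the current state of a document to an HTML string.
function serializeDocument(doc) {
  return '<!DOCTYPE html>\n' + doc.documentElement.outerHTML;
}

// Write an HTML string into an already opened window.
function duplicateInto(targetWindow, html) {
  targetWindow.document.open();
  targetWindow.document.write(html);
  targetWindow.document.close();
}

// Browser usage:
//   const dtab = window.open('about:blank', 'duplicate_a_tab');
//   if (dtab) duplicateInto(dtab, serializeDocument(document));
```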
Develop an extension
Let the user continue on the current tab with the current state; your extension should be able to capture a screenshot of that area and open it in a new tab. There are plenty of screenshot extensions available.
If that website is your own
You can build your services so that state is kept locally, as in progressive web apps. Provide a separate 'duplicate' link that opens the same URL in a different tab with the same local state and a do-not-sync flag.
This will not work when the user uses the browser's built-in duplicate feature.