Puppeteer: delete an element for a screenshot

I'm trying to take a screenshot of a page, but a cookie-consent popup injected by some scripts ends up in front of the content. (My software should work on any website, so I can't provide the HTML.)
The problem is that deleting the element in Puppeteer doesn't work, and I need some help with that.
await this.sleep(5 * 1000);
let body = await browserPage.evaluate(() => {
document.querySelectorAll('#didomi-popup')[0].outerHTML = "";
return document.documentElement.outerHTML;
});
await browserPage.screenshot({path: "test.jpg", fullPage: true});
Yes, #didomi-popup exists. I even tried document.querySelectorAll('#didomi-popup')[0].outerHTML = ""; in the Chrome console, and it deleted the element.
The wait also seems long enough, because the cookie popup is present in the body that page.evaluate returns.
Does anyone have an idea?
Thanks a lot.

Try this:
await page.goto('<url_here>');
let div_selector_to_remove= ".xj7.Kwh5n";
await page.evaluate((sel) => {
var elements = document.querySelectorAll(sel);
for(var i=0; i< elements.length; i++){
elements[i].parentNode.removeChild(elements[i]);
}
}, div_selector_to_remove)

While trying other things, I noticed that the popup kept coming back in the screenshot, so I also deleted every script element to prevent that, and now it works.
This code deletes the popup and the scripts, then loads the resulting HTML back with setContent, so the page is static for the screenshot.
let body = await browserPage.evaluate(() => {
const deleteElement = function(selector) {for (let i = 0; i < selector.length; i++) {selector[i].outerHTML = "";}}
const elemToDelete = ["#didomi-popup", "script"];
for (let i = 0; i < elemToDelete.length; i++) {
deleteElement(document.querySelectorAll(elemToDelete[i]))
}
return document.documentElement.outerHTML;
});
await browserPage.setContent(body)
await browserPage.screenshot({path: path, fullPage: true});
I know that deleting scripts can be a problem for other people, but I wait 7 seconds before the evaluate call and I only want a picture of the page without interacting with it, so it works pretty well for me!
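A less destructive variant of the same idea, as a minimal sketch: instead of stripping every script, inject a stylesheet that hides the popup, so it stays hidden even if a script inserts it again. `buildHideCss` and `hideElements` are hypothetical helper names; `page.addStyleTag({ content })` is the Puppeteer call doing the work. It assumes the popup keeps the same selector and that the site does not also lock scrolling or add an overlay under another selector.

```javascript
// Build one CSS rule that hides every selector in the list.
function buildHideCss(selectors) {
  return selectors.join(', ') + ' { display: none !important; }';
}

// Inject the rule into the page. `page` is a Puppeteer Page.
async function hideElements(page, selectors) {
  await page.addStyleTag({ content: buildHideCss(selectors) });
}

// Usage (inside an async function, after the page has loaded):
//   await hideElements(browserPage, ['#didomi-popup']);
//   await browserPage.screenshot({ path: 'test.jpg', fullPage: true });
```

Because the scripts stay in place, the rest of the page keeps rendering normally, which may matter on sites that build their content with JavaScript.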

An alternative approach might be to use the element.screenshot({ ... }) API.
The idea here is to isolate the element you want to screenshot from the entire page, not just delete annoying elements blocking your screenshot.
browserPage.$('.my-css-class')
  .then(element => {
    if (element) {
      return element.screenshot({});
    }
    // fall back to a screenshot of the whole page
    return browserPage.screenshot({});
  });
For additional reading, check out Bannerbear's blog post.

Related

Can drawings be selected via apps script in Google Docs?

I have a document with Google drawings that for whatever reason are not selectable within the UI. I am not sure how they were ever placed.
I was hoping to write a script to delete them, but I'm not finding a function that applies to drawings specifically.
I'm wondering if anyone knows a trick to accomplish this.
The closest thing I found was their sample for deleting images:
function myFunction() {
var body = DocumentApp.getActiveDocument().getBody();
// Remove all images in the document body.
var imgs = body.getImages();
for (var i = 0; i < imgs.length; i++) {
// Retrieve the image's attributes.
var atts = imgs[i].getAttributes();
// Log the image's attributes.
for (var att in atts) {
Logger.log(att + ":" + atts[att]);
}
imgs[i].removeFromParent();
}
}
Never too late (I hope). The trick here is that inline drawings (InlineDrawing) are children of a Paragraph or ListItem (source).
If you want to remove some inline drawings, the code below works for me. It only checks the first child of each paragraph; to find all drawings, see the TODO comment. It is simple code meant as a reference, so please improve it if you intend to use it.
Unfortunately, I have not yet found out how to remove drawings that are not inline (drawings positioned above or below text). Please forgive that limitation.
function eraseSomeDrawingsFromDoc() {
var body = DocumentApp.getActiveDocument().getBody();
const paragraphs = body.getParagraphs()
paragraphs.forEach(paragraph => {
const childIfAny = paragraph.getNumChildren() > 0 && paragraph.getChild(0) //TODO: analyze all children
const childType = childIfAny && childIfAny.getType()
const iAmADrawing = childType === DocumentApp.ElementType.INLINE_DRAWING
if(iAmADrawing) childIfAny.removeFromParent()
})
}
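The TODO above (checking every child, not only the first) could be handled along these lines. This is a sketch under assumptions: `eraseAllInlineDrawings` is a hypothetical name, and the element type is passed in as a parameter only so the function can also be exercised outside Apps Script. It covers whatever `body.getParagraphs()` returns and, like the code above, does nothing for drawings that are not inline.

```javascript
// Remove every inline drawing found among the children of each paragraph.
// `body` is a DocumentApp Body; `drawingType` is expected to be
// DocumentApp.ElementType.INLINE_DRAWING.
function eraseAllInlineDrawings(body, drawingType) {
  var removed = 0;
  body.getParagraphs().forEach(function (paragraph) {
    // Walk backwards so a removal does not shift the indexes still to visit.
    for (var i = paragraph.getNumChildren() - 1; i >= 0; i--) {
      var child = paragraph.getChild(i);
      if (child.getType() === drawingType) {
        child.removeFromParent();
        removed++;
      }
    }
  });
  return removed;
}

// In Apps Script:
//   eraseAllInlineDrawings(
//     DocumentApp.getActiveDocument().getBody(),
//     DocumentApp.ElementType.INLINE_DRAWING);
```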

Puppeteer - How to fill form that is inside an iframe?

I have to fill out a form that is inside an iframe; here is the sample page. I cannot access it by simply using page.focus() and page.type(). I tried to get the form iframe with const formFrame = page.mainFrame().childFrames()[0], which works, but I cannot really interact with the form iframe.
I figured it out myself. Here's the code.
console.log('waiting for iframe with form to be ready.');
await page.waitForSelector('iframe');
console.log('iframe is ready. Loading iframe content');
const elementHandle = await page.$(
'iframe[src="https://example.com"]',
);
const frame = await elementHandle.contentFrame();
console.log('filling form in iframe');
await frame.type('#Name', 'Bob', { delay: 100 });
Instead of figuring out how to get inside the iframe and type, I would simplify the problem by navigating to the iframe URL directly:
https://warranty.goodmanmfg.com/registration/NewRegistration/NewRegistration.aspx?Sender=Goodman
Make your script go directly to the above URL and automate that page; it should work.
Edit-1: Using frames
Since the simple approach didn't work for you, we can do it with the frames themselves.
Below is a simple script that should help you get started:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto('http://www.goodmanmfg.com/product-registration', { timeout: 80000 });
var frames = await page.frames();
var myframe = frames.find(
f =>
f.url().indexOf("NewRegistration") > -1);
const serialNumber = await myframe.$("#MainContent_SerNumText");
await serialNumber.type("12345");
await page.screenshot({ path: 'example.png' });
await browser.close();
})();
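One thing the script above takes for granted is that the iframe has already loaded when `page.frames()` is called; if it has not, `find` returns `undefined` and the next line throws. A small polling helper is one way around that. This is a sketch only: `waitForFrameByUrl` and its default timings are made up for illustration, and it relies solely on `page.frames()` and `frame.url()`.

```javascript
// Poll page.frames() until a frame's URL contains `fragment`,
// or reject once `timeoutMs` has elapsed. `page` is a Puppeteer Page.
async function waitForFrameByUrl(page, fragment, timeoutMs = 10000, intervalMs = 100) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const frame = page.frames().find((f) => f.url().includes(fragment));
    if (frame) {
      return frame;
    }
    if (Date.now() >= deadline) {
      throw new Error('No frame with URL containing "' + fragment + '"');
    }
    // Wait a little before looking at the frame list again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage (inside the async function above):
//   const myframe = await waitForFrameByUrl(page, 'NewRegistration');
```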
If you can't select or find the iframe, read this:
I had an issue with finding Stripe elements.
The reason is the following:
You can't access an <iframe> with a different origin using JavaScript; it would be a huge security flaw if you could. Under the same-origin policy, browsers block scripts that try to access a frame with a different origin. See a more detailed answer here.
Therefore, when I tried Puppeteer's methods Page.frames(), Page.mainFrame(), and ElementHandle.contentFrame(), they did not return any iframe. The problem was that this happened silently, and I couldn't figure out why nothing was found.
Adding these arguments to the launch options solved the issue:
'--disable-web-security',
'--disable-features=IsolateOrigins,site-per-process'
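For reference, the two flags go in the `args` array of the options passed to `puppeteer.launch`. The sketch below only builds that options object; the variable name `launchOptions` is arbitrary. Both flags weaken the browser's isolation between sites, so this assumes a browser instance used purely for automation against pages you trust.

```javascript
// Launch options carrying the two flags from the answer above.
const launchOptions = {
  args: [
    '--disable-web-security',
    '--disable-features=IsolateOrigins,site-per-process',
  ],
};

// Usage (requires the puppeteer package, inside an async function):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(launchOptions);
```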
Although you have already figured it out, I think I have a better solution. Hope it helps.
async doFillForm() {
return await this.page.evaluate(() => {
let iframe = document.getElementById('frame_id_where_form_is_present');
let doc = iframe.contentDocument;
doc.querySelector('#username').value='Bob';
doc.querySelector('#password').value='pass123';
});
}
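One caveat with assigning `.value` directly, as the snippet above does: no `input` or `change` event fires, so a page that validates or stores the field on those events may not notice the new value. A hedged sketch of a helper that also dispatches them follows; `setFieldValue` is a hypothetical name. It also assumes the iframe is same-origin, since `contentDocument` is `null` for a cross-origin frame.

```javascript
// Set a field's value and fire the events a user edit would trigger.
// `doc` is a Document, for example iframe.contentDocument.
function setFieldValue(doc, selector, value) {
  const field = doc.querySelector(selector);
  if (!field) {
    throw new Error('No element matches ' + selector);
  }
  field.value = value;
  field.dispatchEvent(new Event('input', { bubbles: true }));
  field.dispatchEvent(new Event('change', { bubbles: true }));
  return field;
}
```

To use it with `page.evaluate`, the function has to be declared inside the callback passed to `evaluate`, because functions defined on the Node side are not visible in the page.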

How to first read and then write to a file

I am trying to read tags, change them, and then write the data back to the same file. Reading the tags is no problem, but writing does not work: the data does not seem to be saved to disk.
My test case:
var file = await KnownFolders.MusicLibrary.GetFileAsync("sampleUnderTest.wav");
var content = await ReadFileAsynch(file);
content.Add("newTag","newValue");
await SubmittFileAsynch(file, content);
var newContent = await ReadFileAsynch(file);
Assert.NotEqual(content,newContent);
The assert throws an exception because content and newContent are identical.
This is part of my code:
public async Task SubmittFileAsynch(StorageFile file, Content content)
{
var accessStream = await file.OpenAsync(FileAccessMode.ReadWrite);
using (accessStream)
await CommitAsynch(accessStream, content);
}
internal async Task CommitAsynch(IRandomAccessStream accessStream, Content content)
{
List<byte[]> tbw = new List<byte[]>();
using (var InStream = accessStream.AsStreamForRead())
{
using (var br = new BinaryReader(InStream))
{
//Reads file until it reaches the part of the containing tags and
//that are going to be modified by content.
//tbw is created from data in content and from the read process.
await BuildAsync(br, content, tbw);
using (var outstream = (accessStream.GetOutputStreamAt(0)))
using (var bw = new DataWriter(outstream))
{
await CommitAsync(tbw, bw);
await bw.FlushAsync();
}
}
}
}
private async Task CommitAsync(List<byte[]> tbw, DataWriter bw)
{
await Task.Run(() =>
{
foreach (byte[] buf in tbw)
bw.WriteBytes(buf);
});
}
I can't see what I am doing wrong, and I am hoping for some help.
I am back.
I finally solved my problem. There were some threading issues in my code. I changed the write part to:
using (var bw = new DataWriter(outstream))
{
//await CommitAsync(tbw, bw);
foreach (byte[] buf in tbw)
bw.WriteBytes(buf);
await bw.StoreAsync();
bw.DetachStream();
}
await outstream.FlushAsync();
Now it works nicely. I am not sure whether some parts of the code are redundant with respect to the "using" statements.
If anyone can point me to good reading on file I/O, streams, and so on, I would be grateful. Everything I found on the internet was very basic.
I hope this helps someone else.

Repeatedly Grab DOM in Chrome Extension

I'm trying to teach myself how to write Chrome extensions and ran into a snag when I realized that my jQuery was breaking because it was getting information from the extension page itself and not the tab's current page like I had expected.
Quick summary, my sample extension will refresh the page every x seconds, look at the contents/DOM, and then do some stuff with it. The first and last parts are fine, but getting the DOM from the page that I'm on has proven very difficult, and the documentation hasn't been terribly helpful for me.
You can see the code that I have so far at these links:
Current manifest
Current js script
Current popup.html
If I want to have the ability to grab the DOM on each cycle of my setInterval call, what more needs to be done? I know that, for example, I'll need to have a content script. But do I also need to specify a background page in my manifest? Where do I need to call the content script within my extension? What's the easiest/best way to have it communicate with my current js file on each reload? Will my content script also be expecting me to use jQuery?
I know that these questions are basic and will seem trivial to me in retrospect, but they've really been a headache trying to explore completely on my own. Thanks in advance.
In order to access the web page's DOM, you'll need to programmatically inject some code into it (using chrome.tabs.executeScript()).
That said, although it is possible to grab the DOM as a string, pass it back to your popup, load it into a new element, and look for whatever you want, this is a really bad approach (for various reasons).
The best option (in terms of efficiency and accuracy) is to do the processing in the web page itself and then pass just the results back to the popup. Note that in order to inject code into a web page, you have to include the corresponding host match pattern in the permissions property of your manifest.
What I describe above can be achieved like this:
editorMarket.js
var refresherID = 0;
var currentID = 0;
$(document).ready(function(){
$('.start-button').click(function(){
oldGroupedHTML = null;
oldIndividualHTML = null;
chrome.tabs.query({ active: true }, function(tabs) {
if (tabs.length === 0) {
return;
}
currentID = tabs[0].id;
refresherID = setInterval(function() {
chrome.tabs.reload(currentID, { bypassCache: true }, function() {
chrome.tabs.executeScript(currentID, {
file: 'content.js',
runAt: 'document_idle',
allFrames: false
}, function(results) {
if (chrome.runtime.lastError) {
alert('ERROR:\n' + chrome.runtime.lastError.message);
return;
} else if (results.length === 0) {
alert('ERROR: No results !');
return;
}
var nIndyJobs = results[0].nIndyJobs;
var nGroupJobs = results[0].nGroupJobs;
$('.lt').text('Indy: ' + nIndyJobs + '; '
+ 'Grouped: ' + nGroupJobs);
});
});
}, 5000);
});
});
$('.stop-button').click(function(){
clearInterval(refresherID);
});
});
content.js:
(function() {
function getNumberOfIndividualJobs() {...}
function getNumberOfGroupedJobs() {...}
function comparator(grouped, individual) {
var IndyJobs = getNumberOfIndividualJobs();
var GroupJobs = getNumberOfGroupedJobs();
var nIndyJobs = IndyJobs[1];
var nGroupJobs = GroupJobs[1];
console.log(GroupJobs);
return {
nIndyJobs: nIndyJobs,
nGroupJobs: nGroupJobs
};
}
var currentGroupedHTML = $(".grouped_jobs").html();
var currentIndividualHTML = $(".individual_jobs").html();
var result = comparator(currentGroupedHTML, currentIndividualHTML);
return result;
})();

Reload Chrome extension tabs

Is there some way to get the tab IDs of only the tabs that are part of my extension?
myTabs[i].location.reload()
throws this error:
Uncaught TypeError: Cannot call method 'reload' of undefined
but this code:
chrome.tabs.executeScript(myTabs[i].id, {code:"document.location.reload(true);"});
works. However, it is better to use the dedicated method:
chrome.tabs.reload(myTabs[i].id)
This really depends on what you mean by "part of my extension".
If you mean tabs that are displaying a page contained within your extension, you can do the following:
chrome.tabs.query({}, function (tabs) {
var myTabs = [];
for (var i = 0; i < tabs.length; i++) {
if (tabs[i].url.indexOf(chrome.extension.getURL('')) === 0) {
myTabs.push(tabs[i].id);
}
}
console.log(myTabs);
});
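The filtering step in the callback above can be pulled out into a small pure function, which makes it easier to check on its own. This is a sketch with a hypothetical name, `extensionTabIds`; it additionally skips tabs whose `url` is not a string, on the assumption that the property can be missing when the extension is not allowed to read a tab's URL.

```javascript
// Return the ids of tabs whose URL starts with the extension's base URL.
// Tabs without a readable `url` are skipped instead of throwing.
function extensionTabIds(tabs, baseUrl) {
  return tabs
    .filter((tab) => typeof tab.url === 'string' && tab.url.startsWith(baseUrl))
    .map((tab) => tab.id);
}

// Usage inside the extension:
//   chrome.tabs.query({}, function (tabs) {
//     var ids = extensionTabIds(tabs, chrome.extension.getURL(''));
//     ids.forEach(function (id) { chrome.tabs.reload(id); });
//   });
```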
If you want to access the DOM of your tabs instead, it gets even easier:
var myTabs = chrome.extension.getViews({type: 'tab'});
With access to the DOM, you can simply iterate over each view (DOMWindow) and refresh each page:
for (var i = 0; i < myTabs.length; i++) {
myTabs[i].location.reload()
}
I've tried to do the same thing and using chrome.tabs.query and chrome.tabs.get didn't work for me. This worked perfectly:
var data = chrome.extension.getViews({'type':'tab'});
$.each(data, function(k,v) {
v.window.location.reload();
});