I'm building a PuppeteerCrawler and I have to log in to a certain website, but the website doesn't allow multiple browsers to use the same account at the same time. From my understanding, the session is persisted to a single IP, but how can I make that session also exclusive to a browser instance?
I'm also using 10 users from the input, rotated by the following function.
exports.authenticate = async (page) => {
    const { users } = await Apify.getInput();
    // Pick a random user index within the bounds of the users array
    const user = Math.floor(Math.random() * users.length);
    let isLogged = await loggedCheck(page);
    if (!isLogged) {
        log.debug(`Cookies from cache didn't work, trying to login..`);
        await page.type('input[name="email"]', users[user].username);
        await page.type('input[name="password"]', users[user].password);
        await page.click('input[name="submit"]');
        isLogged = await loggedCheck(page);
    }
    if (!isLogged) {
        throw new Error('Didn\'t work!');
    }
};
By default, session IPs are exclusive to a browser instance, and in PuppeteerCrawler sessions can be managed using the SessionPool.
Check this out, should be helpful: https://sdk.apify.com/docs/guides/session-management
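If it helps, here is a rough sketch of wiring that up (Apify SDK v1-style API from the linked docs; the start URL, the pool size, and the way your authenticate(page) function is called are placeholders you would adapt):
const Apify = require('apify');

Apify.main(async () => {
    // Placeholder request source; replace with your own URLs
    const requestList = await Apify.openRequestList('start-urls', [
        { url: 'https://example.com/' },
    ]);

    const crawler = new Apify.PuppeteerCrawler({
        requestList,
        // Each browser instance gets its own session (cookies + IP) from the pool
        useSessionPool: true,
        persistCookiesPerSession: true,
        sessionPoolOptions: {
            // With 10 accounts, cap the pool so sessions and accounts stay 1:1
            maxPoolSize: 10,
        },
        handlePageFunction: async ({ page, session }) => {
            // Your authenticate(page) from above; it only logs in when the
            // session's cached cookies no longer work
            await authenticate(page);
            // If the site rejects the login, retire the session so a fresh one is used:
            // session.retire();
        },
    });

    await crawler.run();
});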
I'm scraping data from Google using Puppeteer. But before Puppeteer gets to the google page, an annoying popup appears (screenshot).
I want to prevent this from happening so that I don't have to click the "reject" / "allow" button in Puppeteer every time. What are the necessary cookies to achieve it?
const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto("https://google.com/")

const cookies = [
    {
        domain: '.google.com',
        expirationDate: 9999999999.298648,
        name: 'CONSENT',
        ...
    },
]
await page.setCookie(...cookies)
Instead of trying to hard-code the cookies, just save them.
Come up with a file path where the cookies file will be saved, e.g. cookiesPath.
After launching Puppeteer, check whether a file exists at that path. If it exists, load it.
Keep your code for navigating the consent form, as you will use it at least once. After the accepting/rejecting step, save page.cookies() to that path.
const fs = require('fs')

const cookiesPath = './cookies.json'

const browser = await puppeteer.launch()
const page = await browser.newPage()

// Reuse previously saved cookies if they exist
if (fs.existsSync(cookiesPath)) {
    const cookies = JSON.parse(fs.readFileSync(cookiesPath, 'utf8'))
    for (const cookie of cookies) {
        await page.setCookie(cookie)
    }
}

await page.goto("https://google.com/")

const hasConsentForm = async () => {
    // create a query that will match the consent form
    const consentFormQuerySelector = ''
    // page.$eval throws if no element matches the selector,
    // so treat an error as "no consent form"
    try {
        return await page.$eval(consentFormQuerySelector, el => Boolean(el))
    } catch (err) {
        console.log(err)
        return false
    }
}

const navigateConsentForm = async () => {
    // add your existing code that navigates the consent form
    // ...
    // after the consent form is cleared, save the cookies
    const cookies = await page.cookies()
    fs.writeFileSync(cookiesPath, JSON.stringify(cookies, null, 2))
}

if (await hasConsentForm()) {
    await navigateConsentForm()
}
Assuming that the lack of cookies (or of a certain cookie property) is the reason the consent form pops up, this will make it so that after running once, the form doesn't show again.
Looking into a way of sharing data via Google Apps Script's CacheService from one web app to another.
Users load up the first web page and fill out their information. Once it is submitted, a function is run on this data and it is stored via the cache.
CacheService.getUserCache().put('FirstName','David')
CacheService.getUserCache().put('Surname','Armstrong')
The console log reports that these two elements have been saved to the cache.
However, in the second web app, when the cache is read the console log returns null:
var cache = CacheService.getUserCache().get('Firstname');
var cache2 = CacheService.getUserCache().get('Surname');
console.log(cache)
console.log(cache2)
Any ideas?
A possible solution would be to implement a service that synchronizes the cache between web apps.
This can be achieved by creating a web app that, via POST, lets the individual web apps store their UserCache contents in the ScriptCache of the "Cache Synchronizer".
The operation would be very simple:
From the web app that we want to synchronize, we check whether we have the user's cache.
If it exists, we send it to the server so that it stores it.
If it does not exist, we check whether the server has stored the user's cache.
Here is a sketch of how it could work.
CacheSync.gs
const cacheService = CacheService.getScriptCache()

const CACHE_SAVED_RES = ContentService
    .createTextOutput(JSON.stringify({ "msg": "Cache saved" }))
    .setMimeType(ContentService.MimeType.JSON)

const doPost = (e) => {
    const { user, cache } = JSON.parse(e.postData.contents)
    const localCache = cacheService.get(user)
    if (!localCache) {
        /* If there is no data for this user yet, we save it */
        cacheService.put(user, JSON.stringify(cache))
        return CACHE_SAVED_RES
    } else {
        /* If there is data, we send it back (localCache is already a JSON string,
           hence the double parse on the client side) */
        return ContentService
            .createTextOutput(JSON.stringify(localCache))
            .setMimeType(ContentService.MimeType.JSON)
    }
}
ExampleWebApp.gs
const SYNC_SERVICE = "<SYNC_SERVICE_URL>"
const CACHE_TO_SYNC = ["firstName", "lastName"]
const cacheService = CacheService.getUserCache()

const syncCache = () => {
    const cache = cacheService.getAll(CACHE_TO_SYNC)
    const options = {
        method: "post",
        payload: JSON.stringify({
            // Session.getActiveUser() is the documented way to get the current user
            user: Session.getActiveUser().getEmail(),
            cache
        })
    }
    if (Object.keys(cache).length === 0) {
        /* If there is no local cache, try to fetch it from the sync service */
        const res = UrlFetchApp.fetch(SYNC_SERVICE, options)
        /* The service double-encodes the cache, hence the double parse */
        const parsedResponse = JSON.parse(JSON.parse(res.getContentText()))
        Object.keys(parsedResponse).forEach((k) => {
            console.log(k, parsedResponse[k])
            cacheService.put(k, parsedResponse[k])
        })
    } else {
        /* If there is a local cache, send it to the sync service */
        const res = UrlFetchApp.fetch(SYNC_SERVICE, options)
        console.log(res.getContentText())
    }
}

const createCache = () => {
    cacheService.put('firstName', "Super")
    cacheService.put('lastName', "Seagull")
}

const clearCache = () => {
    cacheService.removeAll(CACHE_TO_SYNC)
}
Additional information
The synchronization service must be deployed with ANYONE access. You can control access via an API_KEY (a minimal sketch of such a check follows after this list).
This is just an example and is not fully functional; you should adapt it to your needs.
The syncCache function of the web app is reusable and is the function you would use in all the web apps.
There is a disadvantage when retrieving the cache: you must provide the necessary keys, which forces you to write them out manually (e.g. CACHE_TO_SYNC).
It could also be worth considering replacing ScriptCache with ScriptProperties.
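As a rough illustration of the API_KEY point above (this is not part of the original sketch; the API_KEY constant, the apiKey field in the payload, and the error response shape are all made up for the example), the synchronizer's doPost could reject callers that don't present the shared secret:
const API_KEY = "<SHARED_SECRET>" // hypothetical secret, e.g. kept in Script Properties

const doPost = (e) => {
    const body = JSON.parse(e.postData.contents)
    // Reject callers that don't present the expected key
    if (body.apiKey !== API_KEY) {
        return ContentService
            .createTextOutput(JSON.stringify({ "error": "Unauthorized" }))
            .setMimeType(ContentService.MimeType.JSON)
    }
    // ...otherwise continue with the cache logic from CacheSync.gs above
}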
Documentation
Cache
Properties
Session
The doc says:
Gets the cache instance scoped to the current user and script.
As it is scoped to the script, accessing from another script is not possible. This is also the case with PropertiesService:
Properties cannot be shared between scripts.
To share data between scripts, you can use a common file they both can access, such as a Drive text file or a spreadsheet.
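For example, here is a minimal sketch using a shared spreadsheet (SHARED_SPREADSHEET_ID and the A1/B1 cell layout are made up for the example; both scripts need access to the file):
const SHARED_SPREADSHEET_ID = "<SPREADSHEET_ID>" // hypothetical spreadsheet both web apps can open

// In the first web app: write the values
const saveShared = () => {
    const sheet = SpreadsheetApp.openById(SHARED_SPREADSHEET_ID).getSheets()[0]
    sheet.getRange("A1").setValue("David")     // FirstName
    sheet.getRange("B1").setValue("Armstrong") // Surname
}

// In the second web app: read them back
const readShared = () => {
    const sheet = SpreadsheetApp.openById(SHARED_SPREADSHEET_ID).getSheets()[0]
    console.log(sheet.getRange("A1").getValue(), sheet.getRange("B1").getValue())
}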
I'm building a minting site that requires me to check the number of NFTs minted and display that number in real time to the user.
At first I was just making a request every few seconds to retrieve the number, but then I figured I could use an event listener to cut down on the requests, as people would only be minting in short bursts.
However, after using the event listener, the volume of requests has gone way up. It looks like it is constantly calling blockNumber, chainId, and getLogs. Is this just how an event listener works under the hood, or am I doing something wrong here?
This is a Next.js API route, and here is the code:
// Next.js API route support: https://nextjs.org/docs/api-routes/introduction
import { ethers } from 'ethers'
import { contractAddress } from '../../helpers'
import type { NextApiRequest, NextApiResponse } from 'next'
import abi from '../../data/abi.json'

const NEXT_PUBLIC_ALCHEMY_KEY_GOERLI = process.env.NEXT_PUBLIC_ALCHEMY_KEY_GOERLI

let count = 0
let lastUpdate = 0

const provider = new ethers.providers.JsonRpcProvider(
  NEXT_PUBLIC_ALCHEMY_KEY_GOERLI,
  'goerli'
)

const getNumberMinted = async () => {
  console.log('RUNNING NUMBER MINTED - MAKING REQUEST', Date.now())
  const provider = new ethers.providers.JsonRpcProvider(
    NEXT_PUBLIC_ALCHEMY_KEY_GOERLI,
    'goerli'
  )
  const contract = new ethers.Contract(contractAddress, abi.abi, provider)
  const numberMinted = await contract.functions.totalSupply()
  count = Number(numberMinted)
  lastUpdate = Date.now()
}

const contract = new ethers.Contract(contractAddress, abi.abi, provider)

contract.on('Transfer', (to, amount, from) => {
  console.log('running event listener')
  if (lastUpdate < Date.now() - 5000) {
    getNumberMinted()
  }
})

export default function handler(req: NextApiRequest, res: NextApiResponse) {
  try {
    res.setHeader('Content-Type', 'application/json')
    res.status(200).json({ count })
  } catch (err) {
    res
      .status(500)
      .json({ error: 'There was an error from the server, please try again' })
  }
}
If you use the AlchemyProvider, or directly the StaticJsonRpcProvider (which AlchemyProvider inherits from), you will eliminate the chainId calls; those are used to ensure the network hasn't changed, but if you are using a third-party service like Alchemy or INFURA this isn't a concern, which is why the StaticJsonRpcProvider exists. :)
Then every pollingInterval, a getBlockNumber call is made (because this is a relatively cheap call) to detect when a new block occurs; when a new block occurs, it uses the getLogs method to find any logs that occurred during that block. This minimizes the number of expensive getLogs calls.
You can increase or decrease the pollingInterval to trade off latency against server resource cost.
And that’s how events work. :)
Does that make sense?
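If it helps, here is a minimal sketch of the two changes above in ethers v5 syntax (the env variable name is taken from the question, and the 15 second interval is just an example):
import { ethers } from 'ethers'

// StaticJsonRpcProvider skips the repeated chainId checks because it assumes the
// network behind the URL never changes (fine for Alchemy/INFURA endpoints)
const provider = new ethers.providers.StaticJsonRpcProvider(
  process.env.NEXT_PUBLIC_ALCHEMY_KEY_GOERLI, // your Alchemy RPC URL
  'goerli'
)

// Or, equivalently, the dedicated Alchemy provider:
// const provider = new ethers.providers.AlchemyProvider('goerli', ALCHEMY_API_KEY)

// Poll for new blocks less often to trade a little latency for fewer
// blockNumber/getLogs requests (the default is 4000 ms)
provider.pollingInterval = 15000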
I want to get an image from IPFS into my Vue/Nuxt project. I already installed ipfs with 'npm i ipfs', but when I run "const node = Ipfs.create()" it shows an error.
This error doesn't always happen, though; many times it works and I can get the image normally. Has anyone ever encountered this situation and found a solution?
async downloadImg () {
  const node = await Ipfs.create()
  const { agentVersion, id } = await node.id()
  this.agentVersion = agentVersion
  this.id = id
  const cid = '/ipfs/QmY2dod6X7GFmqnQ6qCBiaeNxJWa3CYQaxEjGUfL5CqMAj'
  // load the raw data from js-ipfs (>=0.40.0) by concatenating the chunks
  const bufs = []
  for await (const buf of node.cat(cid)) {
    bufs.push(buf)
  }
  const data = Buffer.concat(bufs)
  const blob = new Blob([data], { type: 'image/jpeg' })
  this.imageSrc = window.URL.createObjectURL(blob)
},
If I'm right, it can depend on the web protocol you use: with https it works, and with other protocols it does not.
The Web Crypto API is only available on pages accessed via https. If you're seeing that message, you are probably accessing the page via plain http.
Follow the GitHub link at the top of the stack trace for more explanation and solutions.
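If you want to confirm that this is the cause, a small guard before Ipfs.create() can check for a secure context (the guard and the error message are just an illustration):
async downloadImg () {
  // window.isSecureContext is true on https:// pages (and localhost), where the
  // Web Crypto API that js-ipfs relies on is available
  if (!window.isSecureContext || !window.crypto || !window.crypto.subtle) {
    console.error('Not a secure context: js-ipfs needs the Web Crypto API, serve the app over https')
    return
  }
  const node = await Ipfs.create()
  // ...rest of the method from the question
},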
I want to use BrowserSync for some testing and development on a COTS (commercial, off-the-shelf) system (think something like SharePoint, although it's not SharePoint).
As this is a COTS system, one of the security features that we cannot disable is that it allows only one active session per user ID. Having multiple synced browsers trying to log in will fail, as the COTS system detects more than one login for the same user.
Is there any way to have BrowserSync treat one window/browser as the 'master' session and simply redraw the 'slaves' using the response from the master window, as opposed to copying all actions across and causing multiple requests to be sent from different browsers?
I had the same issue; here is my solution (I save the cookies in BrowserSync and reuse them for every proxied request, and it works well for me):
const gulp = require('gulp');
const browserSync = require('browser-sync').create();

gulp.task('browser-sync', function () {
    // Cookies captured from proxied responses, shared by every synced browser
    var cookies = {};
    browserSync.init({
        proxy: {
            target: "localhost",
            proxyReq: [
                function (proxyReq) {
                    // Re-send the captured cookies with every outgoing request
                    var cookieText = Object.keys(cookies).map(function (name) {
                        return name + '=' + cookies[name];
                    }).join('; ');
                    if (proxyReq._headers.cookie) {
                        proxyReq.setHeader('cookie', cookieText);
                    }
                }
            ],
            proxyRes: [
                function (proxyRes, req, res) {
                    // Capture any cookies the backend sets
                    if (proxyRes.headers && proxyRes.headers['set-cookie']) {
                        proxyRes.headers['set-cookie'].forEach(function (cookie) {
                            var name, value;
                            var t = cookie.split(';')[0].split('=');
                            name = t[0];
                            value = t[1];
                            cookies[name] = value;
                        });
                    }
                }
            ]
        }
    });
});