How to use cheerio to get the URL of an image on a given page for ALL cases - html

Right now I have a function that looks like this:
static getPageImg(url) {
return new Promise((resolve, reject) => {
//get our html
axios.get(url)
.then(resp => {
//html
const html = resp.data;
//load into a $
const $ = cheerio.load(html);
//find ourself a img
const src = url + "/" + $("body").find("img")[0].attribs.src;
//make sure there are no extra slashes
resolve(src.replace(/([^:]\/)\/+/g, "$1"));
})
.catch(err => {
reject(err);
});
});
}
This handles the simple case where the page links to its image with a relative path and the host name is the same as the URL provided.
However, most of the time the URL scheme is more complex. For example, the page URL might be stackoverflow.com/something/asdasd while what I need is stackoverflow.com/someimage. The more interesting case is when a CDN is used and the images come from a separate server. For example, to link to something from imgur I would give a link like http://imgur.com/gallery/epqDj, but the actual location of the image is http://i.imgur.com/pK0thAm.jpg, on a subdomain of the website. More interesting still, the src attribute I get back is protocol-relative: "//i.imgur.com/pK0thAm.jpg".
I imagine there must be a simple way to get this image, since the browser can quickly and easily do an "open in new tab", so I am wondering if anyone knows an easy way to do this other than writing a big function that handles all these cases.
Thank you!

This is the function that ended up working for all my test cases, using Node's built-in url module (imported here as nodeURL). I only had to use its resolve function.
static getPageImg(url) {
return new Promise((resolve, reject) => {
//get our html
axios.get(url)
.then(resp => {
//html
const html = resp.data;
//load into a $
const $ = cheerio.load(html);
//find ourself a img
const retURL = nodeURL.resolve(url,$("body").find("img")[0].attribs.src);
resolve(retURL);
})
.catch(err => {
reject(err);
});
});
}
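For comparison, the same resolution can be done with the WHATWG URL class, which is a global in current Node versions and in browsers. This is only a sketch: resolveImageUrl is a made-up helper name, and the third URL is an invented example. The first two calls use the cases from the question.

```javascript
// Sketch: resolve an <img> src against the page URL with the WHATWG URL class.
// resolveImageUrl is a hypothetical helper name, not part of cheerio or axios.
function resolveImageUrl(pageUrl, src) {
  // new URL(relative, base) applies the same resolution rules a browser uses.
  return new URL(src, pageUrl).href;
}

// Protocol-relative src on a CDN subdomain (the imgur case from the question)
console.log(resolveImageUrl("http://imgur.com/gallery/epqDj", "//i.imgur.com/pK0thAm.jpg"));
// Root-relative src
console.log(resolveImageUrl("http://stackoverflow.com/something/asdasd", "/someimage.png"));
// An already absolute src comes back unchanged
console.log(resolveImageUrl("http://example.com/page", "https://cdn.example.com/a.png"));
```

Inside getPageImg this would replace the nodeURL.resolve call, with the page URL as the base and the src attribute as the first argument.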

Related

How to use a Javascript file to refresh/reload a div from an HTML file?

I am using Node.js and have a JS file that opens a connection to an API, processes the received data and then saves the changed data to a JSON file. I also have an HTML file that takes the data from the JSON file and puts it into a table. Finally, I open the HTML file in my browser to look at the table and its data.
What I would like is for the table (or, more specifically, a DIV with an ID inside the table) to refresh itself whenever the JS file updates the JSON data. Kind of like a "live table/website" that I can watch change over time without needing to press F5.
Instead of just opening the HTML file locally, I tried serving it from the JS file like this:
const http = require('http');
const path = require('path');
const fs = require('fs'); // needed for fs.readFile below
const browser = http.createServer(function (request, response) {
var filePath = '.' + request.url;
if (filePath == './') {
filePath = './Table.html';
}
var extname = String(path.extname(filePath)).toLowerCase();
var mimeTypes = {
'.html': 'text/html',
'.css': 'text/css',
'.png': 'image/png',
'.js': 'text/javascript',
'.json': 'application/json'
};
var contentType = mimeTypes[extname] || 'application/octet-stream';
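fs.readFile(filePath, function(error, content) {
if (error) {
// file missing or unreadable: answer with 404 instead of an empty 200
response.writeHead(404, { 'Content-Type': 'text/plain' });
response.end('Not found');
return;
}
response.writeHead(200, { 'Content-Type': contentType });
response.end(content, 'utf-8');
});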
}).listen(3000);
This creates a working connection and I can see the page in the browser, but sadly it doesn't update itself as I would like. I thought about some kind of function that gets called right after the JSON file is saved and tells the div to reload itself.
I also read about things like window.onload, location.reload() or getElementById(), but I can't figure out the right way.
What can I do?
Thank you.
Websockets!
Though they might sound scary, it's very easy to get started with websockets in NodeJS, especially if you use Socket.io.
You will need two dependencies in your node application:
"socket.io": "^4.1.3",
"socketio-wildcard": "^2.0.0"
Your HTML file:
<script type="module" src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/4.0.0/socket.io.js"></script>
Your CLIENT SIDE JavaScript file:
var socket = io();
socket.on("update", function (data) { //update can be any sort of string, treat it like an event name
console.log(data);
// the rest of the code to update the html
})
Your Node.js file:
import { Server } from "socket.io";
// other code...
let io = new Server(server);
let activeConnections = {};
io.sockets.on("connection", function (socket) {
// 'connection' is a "magic" key
// track the active connections
activeConnections[socket.id] = socket;
socket.on("disconnect", function () {
/* Not required, but you can add special handling here to prevent errors */
delete activeConnections[socket.id];
})
socket.on("update", (data) => {
// Update is any sort of key
console.log(data)
})
})
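// Example with Express
app.get('/some/api/call', function (req, res) {
const data = {}; // placeholder: replace with the result of your API processing
// activeConnections maps socket ids to sockets, so iterate over the values
Object.values(activeConnections).forEach((socket) => {
socket.emit('update', data);
});
res.send(data);
})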
Finally, shameful self promotion, here's one of my "dead" side projects using websockets, because I'm sure I forgot some small detail, and this might help. https://github.com/Nhawdge/robert-quest
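The client-side handler above leaves "the rest of the code to update the html" open. Here is a minimal sketch of that step, assuming the "update" payload is an array of flat objects (one per table row) and that the table has a tbody with the id "table-body". Both of those, and the helper names, are assumptions for illustration and not something from the question.

```javascript
// Sketch only: rowsToHtml, escapeHtml and the "table-body" id are hypothetical.
// Assumes the "update" payload is an array of flat objects, one per table row.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function rowsToHtml(rows) {
  return rows
    .map((row) => {
      const cells = Object.values(row)
        .map((value) => `<td>${escapeHtml(value)}</td>`)
        .join("");
      return `<tr>${cells}</tr>`;
    })
    .join("");
}

// In the browser this is wired to the socket; the guard skips it elsewhere.
if (typeof document !== "undefined" && typeof io !== "undefined") {
  const socket = io();
  socket.on("update", (data) => {
    document.getElementById("table-body").innerHTML = rowsToHtml(data);
  });
}
```

Only the table body is replaced on each event, so the rest of the page stays as it is and no reload is needed.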

Image src isn't displaying after a time but link works in browser [duplicate]

I created a script that extracts photos in the gallery of a certain profile…
Using instagram-web-api
Unfortunately it no longer works: Instagram does not return the image of the media.
This is the error:
ERR_BLOCKED_BY_RESPONSE
Has Instagram changed its CORS policy recently? How can I fix this?
For PHP: I changed my img src to this and it works like a charm. Assume that $image is the Instagram image CDN link that came from the Instagram page:
'data:image/jpg;base64,'.base64_encode(file_get_contents($image))
EDIT FOR BETTER SOLUTION
I have also noticed that this method causes a lot of latency, so I changed my approach and now use a PHP proxy file (also mentioned somewhere on Stack Overflow, but I don't remember where).
This is my common proxy file content:
<?php
function ends_with( $haystack, $needle ) {
return substr($haystack, -strlen($needle))===$needle;
}
if (!in_array(ini_get('allow_url_fopen'), [1, 'on', 'true'])) {
die('PHP configuration change is required for image proxy: allow_url_fopen setting must be enabled!');
}
$url = isset($_GET['url']) ? $_GET['url'] : null;
if (!$url || substr($url, 0, 4) != 'http') {
die('Please, provide correct URL');
}
$parsed = parse_url($url);
if ((!ends_with($parsed['host'], 'cdninstagram.com') && !ends_with($parsed['host'], 'fbcdn.net')) || !ends_with($parsed['path'], 'jpg')) {
die('Please, provide correct URL');
}
// instagram only has jpeg images for now..
header("Content-type: image/jpeg");
readfile( $url );
?>
Then I just converted all my Instagram image links to this (don't forget to apply urlencode to the image links):
./proxyFile.php?url=https://www.....
It worked like a charm and there is no latency anymore.
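If the proxy runs on Node instead of PHP, the URL check from the proxy file could look roughly like the sketch below. isAllowedImageUrl is a made-up helper name. Note that it is slightly stricter than the PHP version: it only accepts the listed domains and their real subdomains, and it requires the path to end in ".jpg" and not just "jpg".

```javascript
// Sketch only: isAllowedImageUrl is a hypothetical helper that mirrors the
// checks in the PHP proxy above (http(s) scheme, Instagram CDN host, .jpg path).
const ALLOWED_HOST_SUFFIXES = ["cdninstagram.com", "fbcdn.net"];

function isAllowedImageUrl(rawUrl) {
  let parsed;
  try {
    parsed = new URL(rawUrl);
  } catch (err) {
    return false; // not a parseable absolute URL
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return false;
  }
  // Match the bare domain or a real subdomain, so "evilfbcdn.net" is rejected.
  const hostOk = ALLOWED_HOST_SUFFIXES.some(
    (suffix) => parsed.hostname === suffix || parsed.hostname.endsWith("." + suffix)
  );
  return hostOk && parsed.pathname.toLowerCase().endsWith(".jpg");
}

console.log(isAllowedImageUrl("https://scontent-cdt1-1.cdninstagram.com/v/a_n.jpg?tp=1"));
console.log(isAllowedImageUrl("https://example.com/a.jpg"));
```

Restricting the host list matters because an open proxy would otherwise fetch any URL a visitor passes in.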
now 100% working.
You can try this.
corsDown
Using the Google translation vulnerability, it can display any image URL, with or without permission. All these processes are done by the visitor's IP and computer.
I have the same problem. When I try to load an Instagram picture URL (I tried with 3 IP addresses), I see this in the console:
Failed to load resource: net::ERR_BLOCKED_BY_RESPONSE
You can see it here: the Instagram image doesn't load. (When I paste this URL into Google it works, but Instagram puts a timestamp on their picture URLs, so it may not work for you.)
This is very recent; 3 days ago it worked with no issues.
<img src="https://scontent-cdt1-1.cdninstagram.com/v/t51.2885-19/s320x320/176283370_363930668352575_6367243109377325650_n.jpg?tp=1&_nc_ht=scontent-cdt1-1.cdninstagram.com&_nc_ohc=nC7FG1NNChYAX8wSL7_&edm=ABfd0MgBAAAA&ccb=7-4&oh=696d56547f87894c64f26613c9e44369&oe=60AF5A34&_nc_sid=7bff83">
The answer is as follows. You can use the imgproxy.php file. You can do it like this:
echo '<a href="' . $item->link . '" class="image" target="_blank">
<span style="background-image:url(imgproxy.php?url=' . urlencode($thumbnail) . ');"> </span>
</a>';
Using PHP
You can fetch the content of the image and output it from a PHP file as an image by setting the header:
<?php
$img_ctn = file_get_contents("https://scontent-ber1-1.cdninstagram.com/v/......");
header('Content-type: image/png');
echo $img_ctn;
You can display the image as a Base64-encoded data URL.
The Base64 function is based on @abubakar-ahmad's answer.
JavaScript:
export const checkUserNameAndImage = (userName) => {
/* CALL THE API */
return new Promise((resolve, reject) => {
fetch(`/instagram`, {
method: "POST",
headers: {
Accept: "application/json",
"Content-Type": "application/json",
},
body: JSON.stringify({ userName }),
})
.then(function (response) {
return response.text();
})
/* GET RES */
.then(function (data) {
const dataObject = JSON.parse(data);
/* CALL BASE64 FUNCTION */
toDataUrl(dataObject.pic, function (myBase64) {
/* ADD THE BASE64 PROPERTY TO THE OBJECT */
dataObject.picBase64 = myBase64;
/* RETURN THE OBJECT */
resolve(dataObject);
});
})
.catch(function (err) {
reject(err);
});
});
};
Base64 func:
function toDataUrl(url, callback) {
var xhr = new XMLHttpRequest();
xhr.onload = function () {
var reader = new FileReader();
reader.onloadend = function () {
callback(reader.result);
};
reader.readAsDataURL(xhr.response);
};
xhr.open("GET", url);
xhr.responseType = "blob";
xhr.send();
}
Now, instead of using the original URL, use the picBase64 property:
<img src={data.picBase64} />
I have built a simple PHP based media proxy to minimize copy&paste.
https://github.com/skmachine/instagram-php-scraper#media-proxy-solving-cors-issue-neterr_blocked_by_response
Create mediaproxy.php file in web server public folder and pass instagram image urls to it.
<?php
use InstagramScraper\MediaProxy;
// use allowedReferersRegex to restrict other websites hotlinking images from your website
$proxy = new MediaProxy(['allowedReferersRegex' => "/(yourwebsite\.com|anotherallowedwebsite\.com)$/"]);
$proxy->handle($_GET, $_SERVER);
I was too lazy to implement the suggested solutions, and since I already had a Node.js server sending me URLs, I just wrote new functions that fetch the images, convert them to base64 and send them to my frontend. Yes, it's slower and heavier, but it gets the job done for me since I don't have a huge need for performance.
Snippet to fetch a URL and return it as base64:
const https = require("https");

const getBase64Image = (url) => {
  return new Promise((resolve, reject) => {
    // Safety net so the entire app doesn't crash on a missing URL
    if (!url) {
      resolve(null);
      return;
    }
    https
      .get(url, (resp) => {
        // emit each chunk as base64 text instead of a Buffer
        resp.setEncoding("base64");
        let body = "data:" + resp.headers["content-type"] + ";base64,";
        resp.on("data", (data) => {
          body += data;
        });
        resp.on("end", () => {
          resolve(body);
        });
      })
      .on("error", (e) => {
        reject(e.message);
      });
  });
};
You don't need any external modules for this.
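A variation on the same idea, as a sketch: collect the raw chunks as Buffers and encode once at the end. bufferToDataUrl is a hypothetical helper name, and the fallback content type is my own assumption for responses that send no Content-Type header.

```javascript
// Sketch only: bufferToDataUrl is a hypothetical helper, not from the answer above.
// It builds the data URL from a complete Buffer instead of per-chunk strings.
function bufferToDataUrl(buffer, contentType) {
  // Fall back to a generic type when the server sent no Content-Type header.
  const type = contentType || "application/octet-stream";
  return `data:${type};base64,${buffer.toString("base64")}`;
}

// Usage with chunks collected from a response stream:
const chunks = [Buffer.from("h"), Buffer.from("i")];
console.log(bufferToDataUrl(Buffer.concat(chunks), "text/plain"));
```

With a real response, the chunks array would be filled in the "data" handler (without calling setEncoding) and the helper called in the "end" handler.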

I'm getting a 404 error when trying to render my results for my API Hack assignment

I'm working on an API Hack assignment for my Thinkful class. I'm trying to call spoonacular's food API and render the results to the DOM, but all I get back is a 404 error. Did I do something wrong, or is this some unforeseen problem beyond my control?
I've already tried typing the composed URL manually and using Postman as well.
function queryParams(params) {
const queryItems = Object.keys(params).map(key => `${encodeURIComponent(key)}= ${encodeURIComponent(params[key])}`)
return queryItems.join('&');
}
function displayResults(responseJson){
console.log(responseJson);
$('#results-list').empty();
for(let i = 0; i < responseJson.results.length; i++){
$('#results-list').append(
`<li><h3>${responseJson.results[i].id},${responseJson.results[i].protein}</h3>
<p>By ${responseJson.results[i].calories}</p>
<img src='${responseJson.results[i].image}'>
</li>`
)};
$('#results').removeClass('hidden');
};
function getRecipe(query,maxResults,){
const params ={
q:query,
number: maxResults,
};
const queryString = queryParams(params)
const url = searchUrl+'?'+ queryString +'?apiKey='+ apikey;
console.log(url);
fetch(url,option)
.then(response =>{
if(response.ok){
return response.json();
}
throw new Error(response.statusText);
})
.then(response => console.log(responseJson))
.catch(err =>{
$('#js-error-message').text(`Something went wrong: ${err.message}`);
});
}
function watchForm() {
$('form').submit(event => {
event.preventDefault();
const searchRecipe = $('.js-search-recipe').val();
const maxResults = $('.js-max-results').val();
getRecipe(searchRecipe, maxResults);
});
}
$(watchForm);
It looks like you have a couple of issues.
First, you're constructing an invalid URL:
const url = searchUrl+'?'+ queryString +'?apiKey='+ apikey;
Notice the two ?s: a URL has a single ? before the query string, and every further parameter is joined with &.
Also, when constructing the query params, you're adding a space between the = and the value of each param:
${encodeURIComponent(key)}= ${encodeURIComponent(params[key])}
If you're using the correct path and a valid API key, fixing those things may be enough to make it work.
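One way to avoid both problems is to let URL and URLSearchParams assemble the query string. This is only a sketch: buildSearchUrl is a made-up helper, and the example.com endpoint and the parameter names are placeholders, not the real spoonacular path.

```javascript
// Sketch only: buildSearchUrl and the example.com endpoint are placeholders,
// not the real spoonacular URL or parameter names.
function buildSearchUrl(baseUrl, params) {
  const url = new URL(baseUrl);
  // URLSearchParams encodes keys and values and joins them with "&";
  // assigning to url.search adds the single leading "?".
  url.search = new URLSearchParams(params).toString();
  return url.href;
}

console.log(
  buildSearchUrl("https://api.example.com/recipes/search", {
    q: "chicken soup",
    number: 5,
    apiKey: "YOUR_KEY",
  })
);
```

Because the API key goes in as just another parameter, there is no second ? to get wrong, and no stray space after the = sign.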

Is it possible to ignore &#65279 in innerhtml

I have a line of code that looks like this:
await page.$$eval("a", as => as.find(a => a.innerText.includes("shop")).click());
It clicks on "shop" and everything works. But if "shop" is written as "S&#65279h&#65279op", Puppeteer can't find it. Is it possible to ignore &#65279, so that Puppeteer only sees "shop"?
You can decode the innerText using DOMParser. Example copied from this answer.
window.getDecodedHTML = function getDecodedHTML(encodedStr) {
const parser = new DOMParser();
const dom = parser.parseFromString(
`<!doctype html><body>${encodedStr}`,
"text/html"
);
return dom.body.textContent;
}
Save the above snippet to some file like script.js and inject it for easier usage.
await page.evaluate(fs.readFileSync('script.js', 'utf8'));
Now you can use it to decode the innerText.
await page.$$eval("a", as => as.find(a => getDecodedHTML(a.innerText).includes("shop")).click());
The solution might not be optimal. But it should work out.
Here is another snippet for you which doesn't require DOMparser.
window.getDecodedHTML = function(str) {
return str.replace(/&#(\d+);/g, function(match, dec) {
return String.fromCharCode(dec);
});
};
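Another angle, as a sketch: 65279 is the code point U+FEFF (zero-width no-break space). Assuming innerText already contains that character itself and not the literal text "&#65279" (which is what a browser normally gives you after parsing), you can simply strip it before comparing. normalizeLabel is a hypothetical name, and the lowercasing is my own addition so that "Shop" matches "shop".

```javascript
// Sketch only: normalizeLabel is a hypothetical helper name.
// Assumes the text already contains the real U+FEFF character (code point
// 65279), which is what innerText returns once the browser has parsed the page.
function normalizeLabel(text) {
  return text
    .replace(/\uFEFF/g, "") // drop every zero-width no-break space
    .toLowerCase();         // so "Shop" and "shop" compare equal
}

console.log(normalizeLabel("S\uFEFFh\uFEFFop")); // prints "shop"

// Inside Puppeteer the callback runs in the page, so the same logic is inlined:
// await page.$$eval("a", as => as.find(a =>
//   a.innerText.replace(/\uFEFF/g, "").toLowerCase().includes("shop")).click());
```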

How to make a pdf Generator with ionic?

I want to make a pdf of the current page so the user can print it out but every page is dynamic so I will need a sort of a text to pdf generator to make it work.
It is an ionic2 app and is for like a recipe page so you can click on a button and it just makes a pdf out of the text.
Do you guys know how I can achieve that?
Your best bet is probably this plugin:
https://github.com/cesarvr/pdf-generator
Give your dynamic page section an id like 'pdf-area' and then select it in your .ts file like this:
let content = document.getElementById('pdf-area').innerHTML
You can then turn that into a file or print it like this:
cordova.plugins.pdf.htmlToPDF({
    data: content,
    type: "base64"
  },
  (success) => {
    // you might have to turn the base64 into a binary blob to save it
    // to a file at this point
  },
  (error) => console.log('error:', error)
);
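The comment in the success callback mentions turning the base64 into a binary blob. A minimal sketch of that conversion, assuming the callback receives a raw Base64 string without a "data:" prefix (I have not checked what the plugin actually passes). base64ToBlob is a hypothetical helper, not part of the plugin.

```javascript
// Sketch only: base64ToBlob is a hypothetical helper, not part of the plugin.
// Assumes `base64` is the raw Base64 string without a "data:" prefix.
function base64ToBlob(base64, contentType = "application/pdf") {
  const binary = atob(base64); // decode to a binary string
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Blob([bytes], { type: contentType });
}

const blob = base64ToBlob("aGk="); // "hi" encoded as Base64
console.log(blob.size, blob.type);
```

The resulting Blob can then be handed to whatever file API the app uses for saving.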
Put your HTML in the assets folder and give the HTML path like this.
In my case this path works:
var file = 'file:///android_asset/www/assets/lolc.html';
generatePdf() {
  const before = Date.now();
  document.addEventListener('deviceready', () => {
    console.log('DEVICE READY FIRED AFTER', (Date.now() - before), 'ms');
    var file = 'file:///android_asset/www/assets/lolc.html';
    // fromURL returns a promise, so the callbacks are attached with then/catch
    cordova.plugins.pdf.fromURL(file, {
        documentSize: "A4",
        landscape: "portrait",
        type: "share"
      })
      .then((success) => console.log('success: ', success))
      .catch((error) => console.log('error:', error));
  });
}