Get the text of the current node only - cheerio

In Cheerio, how do you get the text of the current node only, without the text of its child elements?
const cheerio = require('cheerio')
const htmlString = '<div>hello<span>world</span></div>'
const $ = cheerio.load(htmlString, { ignoreWhitespace: true })
console.log($('div').text()) // helloworld
console.log($('span').text()) // world
How do you get just hello?

You can do this, which works when the text is the first child node of the element:
console.log($('div').contents().first().text()) // hello

Related

How to split a URL and use the second element as a new URL

I am trying to split a URL on '?' and use the second element in my HTML.
Example:
https://url/page?google.com
The output I want is: google.com
I then want to redirect the page to that output. I'm using Webflow, so a full script would be very helpful.
I tried:
window.location.replace(id="new_url");
let url = window.location;
const array = url.split("?");
document.getElementById("new_url").innerHTML = array[1];
but it doesn't work :(
window.location.replace(id="new_url"); does not do what you intend: id="new_url" is an assignment expression, so the call receives the string "new_url" rather than the content of the element with that id.
window.location.replace(new_url);, where new_url contains a valid URL, would immediately change the page and ignore any script after it.
I assume you can use the URL API.
Note:
Your parameter is non-standard.
You need to add a protocol (https://) to go to the URL.
Here is a more complicated version that uses a standard tool:
const urlString = "https://url/page?google.com"
const url = new URL(urlString)
console.log(url.toString())
const firstSearchKey = [...url.searchParams.entries()][0][0]; // normally parameter=value
console.log(firstSearchKey)
location.replace(`https://${firstSearchKey}`)
Here is a simpler version
const urlString = "https://url/page?google.com"
const [origin,passedUrl] = urlString.split("?");
location.replace(`https://${passedUrl}`)
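The two notes above (non-standard parameter, missing protocol) can be folded into one small helper. This is only a sketch: the name targetFromQuery is hypothetical, and it assumes that everything after the first "?" is the destination host or URL.

```javascript
// Hypothetical helper: turn "https://url/page?google.com" into "https://google.com".
// Assumes the whole query string is the destination (a non-standard use of "?").
function targetFromQuery(urlString) {
  const query = new URL(urlString).search.slice(1); // drop the leading "?"
  if (!query) return null; // nothing after "?", so nothing to redirect to
  const target = decodeURIComponent(query);
  // Prepend a protocol only when the value does not already carry one.
  return /^https?:\/\//i.test(target) ? target : `https://${target}`;
}

console.log(targetFromQuery("https://url/page?google.com")); // https://google.com
console.log(targetFromQuery("https://url/page")); // null
```

In the browser, this could be used as: const target = targetFromQuery(window.location.href); if (target) location.replace(target);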
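Try this:
const url = window.location.search.split("?")[1] // e.g. "google.com"
window.location.href = `https://${url}` // add the protocol, otherwise the value is treated as a relative path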
let url = "https://url/page?google.com"
const regex = /\?(.*)/;
let res = regex.exec(url)
console.log(res[1])
Is this what you want?
const inputUrl = window.location.href // e.g. https://url/page?google.com
const splitUrl = inputUrl.split("?") // = ["https://url/page", "google.com"]
const targetUrl = splitUrl[1] // = "google.com"
window.location.href = "https://" + targetUrl // navigates to https://google.com; without the protocol the value is treated as a relative URL

Cheerio scraper won't find any links in a sitemap

I'm trying to fetch URLs from a sitemap (XML) that I want to scrape.
I tried using the standard Cheerio template for this but it keeps returning that no URLs are found.
Any idea why this happens?
const Apify = require("apify");
const cheerio = require("cheerio");
Apify.main(async () => {
    const input = await Apify.getInput();
    // Download sitemap
    const xml = await Apify.utils.requestAsBrowser({
        url: input?.url || "https://www.example.com/product-sitemap2.xml",
        headers: {
            "User-Agent": "curl/7.54.0",
        },
    });
    // Parse sitemap and create RequestList from it
    // const $ = cheerio.load(xml.toString());
    const $ = cheerio.load(xml);
    const sources = [];
    $("loc").each(function (val) {
        const url = $(this).text().trim();
        sources.push({
            url,
            headers: {
                // NOTE: Otherwise the target doesn't allow to download the page!
                "User-Agent": "curl/7.54.0",
            },
        });
    });
});
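It seems you need to pass xml.body instead of xml, since requestAsBrowser resolves to a response object rather than to the body text.
See the docs for the Apify.utils.requestAsBrowser function.
const $ = cheerio.load(xml.body);
You are using an outdated version; the latest approach is shown at https://crawlee.dev/docs/examples/crawl-sitemap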
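To check quickly whether the downloaded body contains any URLs at all, independent of Cheerio, the loc entries can be pulled out with a regular expression. This is a rough sketch rather than a real XML parser (it ignores CDATA sections, for instance), and both extractLocs and the sample sitemap are made up for illustration.

```javascript
// Hypothetical helper: collect the text of every <loc> element in a sitemap string.
function extractLocs(xmlText) {
  const matches = xmlText.matchAll(/<loc>\s*([^<]*?)\s*<\/loc>/g);
  return Array.from(matches, (match) => match[1]);
}

// Made-up sitemap used only to exercise the helper.
const sampleSitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/a</loc></url>
  <url><loc> https://www.example.com/b </loc></url>
</urlset>`;

console.log(extractLocs(sampleSitemap));
```

If calling this with xml.body returns an empty array, that would suggest the problem lies with the download rather than with the selector.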

CheerioJS to parse data on script tag

I've been trying to parse the data inside a script tag using Cheerio, but it has been difficult for the following reason:
I can't parse the extracted string into JSON because of HTML entities.
More info:
It also seems strange to me that you have to load the content into Cheerio a second time to get the text.
You're welcome to fork this Replit or copy and paste the code to try it yourself:
https://replit.com/#Graciasc/Cheerio-Script-Parse
const cheerio = require('cheerio')
const {decode} = require('html-entities')
const html = `
<body>
<script type="text/javascript"src="/data/common.0e95a19724a68c79df7b.js"></script>
<script>require("dynamic-module-registry").set("from-server-context", JSON.parse("\x7B\x22data\x22\x3A\x7B\x22available\x22\x3Atrue,\x22name\x22\x3A"Gracias"\x7D\x7D"));</script>
</body>
`;
const $ = cheerio.load(html, {
decodeEntities: false,
});
const text = $('body').find('script:not([type="text/javascript"])');
const cheerioText = text.eq(0).html();
//implement a better way to grab the string
const scriptInfo = cheerio.load(text.eq(0).html()).text();
const regex = new RegExp(/^.*?JSON.parse\(((?:(?!\)\);).)*)/);
const testing = regex.exec(scriptInfo)[1];
// real output:
//\x7B\x22data\x22\x3A\x7B\x22available\x22\x3Atrue,\x22name\x22\x3A"Gracias"\x7D\x7D when logged
console.log(testing)
// Not Working
const json = JSON.parse(testing)
const decoding = decode(testing)
// same output as testing
console.log(decoding)
// Not working
console.log('decode', JSON.parse(decoding))
// Expected JSON:
// { data: { available: true, name: 'Gracias' } }
A clean solution is to use JSDOM (repl.it link: https://replit.com/#Graciasc/Cheerio-Script-Parse#index.js):
const { JSDOM } = require('jsdom')
const dom = new JSDOM(`<body>
<script type="text/javascript"src="/data/common.0e95a19724a68c79df7b.js"></script>
<script>require("dynamic-module-registry").set("from-server-context", JSON.parse("\x7B\x22data\x22\x3A\x7B\x22available\x22\x3Atrue,\x22name\x22\x3A"Gracias"\x7D\x7D"));</script>
</body>`)
const serializedDom = dom.serialize()
const regex = new RegExp(/^.*?JSON.parse\("((?:(?!"\)\);).)*)/gm);
const jsonString = regex.exec(serializedDom)[1];
console.log(JSON.parse(jsonString))
// output: { data: { available: true, name: 'Gracias' } }
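If the page source really contains literal \xNN sequences, as the logged output in the question suggests, they have to be turned back into characters before JSON.parse can read the string. Here is a minimal sketch, assuming only two-digit \xNN escapes occur. decodeHexEscapes is a hypothetical helper name, and String.raw is used only to keep the backslashes literal in this example.

```javascript
// Hypothetical helper: replace each literal \xNN sequence with the character it encodes.
function decodeHexEscapes(text) {
  return text.replace(/\\x([0-9A-Fa-f]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// String.raw keeps the backslashes, mimicking text scraped from the page.
const escaped = String.raw`\x7B\x22data\x22\x3A\x7B\x22available\x22\x3Atrue,\x22name\x22\x3A"Gracias"\x7D\x7D`;

const parsed = JSON.parse(decodeHexEscapes(escaped));
console.log(parsed.data.name); // Gracias
```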

How to populate an HTML page with Firebase data?

I am trying to populate a page with Firebase data.
This is my Firebase data structure...
I want to create one div per post in Firebase, with the title in an h2 tag and the subtitle in a p tag.
I also want to limit the number of divs to 4, starting from the latest post.
I am new to Firebase, so any help would be appreciated.
This is my JavaScript:
firebase.initializeApp(firebaseConfig);
var postsRef = firebase.database().ref("posts").orderByKey();
postsRef.once("value").then(function (snapshot) {
  snapshot.forEach(function (childSnapshot) {
    var key = childSnapshot.key;
    var childData = childSnapshot.val();
    var name_val = childSnapshot.val().title;
    var id_val = childSnapshot.val().subtitle;
    console.log(name_val);
    var post = document.getElementById('#tst-post');
    var divh2 = document.createElement('h2');
    divh2.innerText - childData.val().title + "---" + JSON.stringify(childData.val());
    $(post).append(divh2);
  });
});
I don't know what I am doing in this code; I just followed some tutorials. Please help me.
You are not far from a working result.
Searching the internet (https://www.google.com/search?client=firefox-b-d&q=how+to+dynamically+create+div+in+javascript) turns up many examples of how to create divs dynamically. For example: https://stackoverflow.com/a/50950179/3371862
Then, in the Firebase Realtime Database documentation, you will find how to filter data and, in particular, how limitToLast() "sets the maximum number of items to return from the end of the ordered list of results".
Putting all of that together as follows should do the trick:
<script>
  var postsRef = firebase
    .database()
    .ref('posts')
    .orderByKey()
    .limitToLast(4);
  postsRef.once('value').then(function(snapshot) {
    snapshot.forEach(function(childSnapshot) {
      var key = childSnapshot.key;
      var childData = childSnapshot.val();
      var name_val = childSnapshot.val().title;
      var id_val = childSnapshot.val().subtitle;
      createDiv(name_val, id_val);
    });
  });
  function createDiv(title, subtitle) {
    var myDiv = document.createElement('DIV'); // Create a <div> node
    var myTitle = document.createTextNode(title); // Create a text node
    myDiv.appendChild(myTitle); // Append the text
    var mySubtitle = document.createTextNode(subtitle); // Create a text node
    myDiv.appendChild(mySubtitle); // Append the text
    myDiv.style.backgroundColor = 'grey';
    myDiv.style.border = 'solid';
    myDiv.style.margin = '10px';
    document.body.appendChild(myDiv);
  }
</script>
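Two details are worth noting about the code above. limitToLast(4) still delivers the posts in ascending key order, so the newest post comes last, and the question asked for the title in an h2 and the subtitle in a p. The sketch below shows one way to handle both by building an HTML string instead of DOM nodes. renderLatestPosts and escapeHtml are hypothetical helpers, and the samplePosts array stands in for the values collected in snapshot.forEach.

```javascript
// Escape text before placing it inside HTML markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// posts: array of { title, subtitle } in ascending key order (oldest first).
function renderLatestPosts(posts, limit = 4) {
  return posts
    .slice(-limit) // keep only the last `limit` posts
    .reverse() // newest first
    .map((post) => `<div><h2>${escapeHtml(post.title)}</h2><p>${escapeHtml(post.subtitle)}</p></div>`)
    .join("\n");
}

// Made-up data used only to exercise the helper.
const samplePosts = [
  { title: "First", subtitle: "one" },
  { title: "Second", subtitle: "two" },
];
console.log(renderLatestPosts(samplePosts));
```

On the page, the returned string could then be assigned to the innerHTML of whichever container element should hold the posts.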

How to pass parameter from a .properties file to an HTML page

Greetings Fellow Stackers,
I have a properties file, "demo.properties", which contains a key-value pair:
Build=47
I also have a static HTML page, 'demo.html':
<html>
<body>
The current build is: <!--here I want the value of build from the demo.properties -->
</body>
</html>
Is there a way to access the 'Build' value here? Any suggestions would be very much appreciated. Thanks!
You can use JavaScript to read the file and then split the text from demo.properties on "=" to get the build version.
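// Assumes the page has <input type="file" onchange="readFile(event)"> and an element with id "output".
var readFile = function(event) {
  var input = event.target;
  var reader = new FileReader();
  // Runs once the selected file has been read as text
  reader.onload = function() {
    var result = reader.result;
    var outputDiv = document.getElementById('output');
    // "Build=47" split on "=" gives ["Build", "47"]
    outputDiv.innerText = "The current build is: " + result.split("=")[1];
  };
  reader.readAsText(input.files[0]);
};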
Working plnkr is: Plnkr
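Splitting on "=" is enough for a one-line file. For a file with several keys, a small parser is more robust. The sketch below is a simplification of the .properties format: it handles key=value lines, blank lines, and lines starting with # or ! as comments, but not line continuations or escape sequences. parseProperties is a hypothetical name.

```javascript
// Hypothetical helper: parse simple key=value lines into a plain object.
function parseProperties(text) {
  const result = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comment lines.
    if (line === "" || line.startsWith("#") || line.startsWith("!")) continue;
    const separator = line.indexOf("=");
    if (separator === -1) continue; // no "=" on this line, so ignore it
    const key = line.slice(0, separator).trim();
    result[key] = line.slice(separator + 1).trim();
  }
  return result;
}

const props = parseProperties("# demo.properties\nBuild=47\n");
console.log(props.Build); // 47
```

Inside reader.onload, reader.result could be passed to parseProperties in place of the split call.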