More efficient way to create an array of objects using puppeteer - puppeteer

I'm trying to scrape a page that contains a bunch of text messages. The messages are arranged in a similar manner to the example below. I would like to use Puppeteer to create an array of objects. Each object would contain the inner text of each message, excluding one of the elements.
The array I would like to build should look like:
const messages = [{name: 'Greg', textMessage: 'Blah Blah Blah'}, {name: 'James', textMessage: 'Blah Blah Blah'},{name: 'Sam', textMessage: 'Blah Blah Blah'}]
Example: HTML markup
<div class="messages">
<div class="message">
<a class="name">Greg</a>
<p class="element-you-dont-want">Don't scrape this</p>
<p class="textMessage">Blah Blah Blah</p>
</div>
<div class="message">
<a class="name">James</a>
<p class="element-you-dont-want">Don't scrape this</p>
<p class="textMessage">Blah Blah Blah</p>
</div>
<div class="message">
<a class="name">Sam</a>
<p class="element-you-dont-want">Don't scrape this</p>
<p class="textMessage">Blah Blah Blah</p>
</div>
</div>
My current code creates two arrays, one for the names and the other for the text messages, and then I have to combine them. Is there a more efficient way to do this?
const names = await page.evaluate(
() => Array.from(document.querySelectorAll("div.messages a.name")).map(name => name.innerText)
);
const textMessages = await page.evaluate(
() => Array.from(document.querySelectorAll("div.messages p.textMessage")).map(textMessage => textMessage.innerText)
);
... From here I combine the two into an array of objects.
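For reference, the combining step can be sketched roughly as below. This assumes both arrays come back in the same order and with the same length; `combineMessages` is a hypothetical helper name, and the sample arrays stand in for the values returned by `page.evaluate()`.

```javascript
// Sample data standing in for the two arrays returned by page.evaluate().
const names = ['Greg', 'James', 'Sam'];
const textMessages = ['Blah Blah Blah', 'Blah Blah Blah', 'Blah Blah Blah'];

// Pair each name with the text message at the same index.
// This assumes both arrays have the same length and order.
function combineMessages(names, textMessages) {
  return names.map((name, i) => ({ name, textMessage: textMessages[i] }));
}

const messages = combineMessages(names, textMessages);
console.log(messages);
```

The weak point of this approach is that if one message lacks a name or a text element, the indexes drift and every later pair is wrong, which is one reason to look for a per-message approach.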

Page has a $$eval method, which runs Array.from(document.querySelectorAll(selector)) within the page context and passes the resulting array as the first argument to pageFunction.
Usage:
const result = await page.$$eval('div.message', (msgs) => msgs.map((msg) => {
return {
name: msg.querySelector('a.name').innerText,
textMessage: msg.querySelector('p.textMessage').innerText // the text is in a <p>, not an <a>
}})
);

You can scrape them together:
const messages = await page.evaluate(() => {
const elements = [...document.querySelectorAll("div.message")]; // notice this is not .messages
return elements.map(message => {
return {
name: message.querySelector('a.name').innerText,
textMessage: message.querySelector('p.textMessage').innerText // the text is in a <p>, not an <a>
};
});
});
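If some messages might be missing a name or a text element, the per-message mapping can be made more forgiving. The sketch below is only an illustration: `toMessage` and `fakeElement` are hypothetical names, and `fakeElement` is a minimal stand-in for a DOM element so the mapping can be tried outside a browser.

```javascript
// Hypothetical helper: turns one message element into a plain object.
// Optional chaining keeps a missing child from throwing; it yields null instead.
function toMessage(el) {
  return {
    name: el.querySelector('a.name')?.innerText ?? null,
    textMessage: el.querySelector('p.textMessage')?.innerText ?? null,
  };
}

// Minimal stand-in for a DOM element, only for trying the helper outside a browser.
function fakeElement(children) {
  return { querySelector: (selector) => children[selector] ?? null };
}

const complete = fakeElement({
  'a.name': { innerText: 'Greg' },
  'p.textMessage': { innerText: 'Blah Blah Blah' },
});
const missingName = fakeElement({ 'p.textMessage': { innerText: 'Blah Blah Blah' } });

console.log(toMessage(complete));
console.log(toMessage(missingName));
```

In Puppeteer the body of `toMessage` would have to be written inline inside the callback passed to `page.$$eval` or `page.evaluate`, because that callback runs in the page context and cannot see functions defined in the Node script.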

Related

Not being able to test a HTML view including an async variable in Angular

I am writing a simple test for my game component, just checking that all child components are loaded correctly. They all seem to work except WordFormComponent. I am guessing this is because I only render it when an async variable has been set to true, which happens only once all variables have been set.
My game.component.html looks like this:
<div class="u-w-full lg:u-w-[70%] u-mx-auto">
<a routerLink="/gameList" class="lg:u-bg-white hover:u-bg-gray-100 u-text-[13px] u-font-medium u-py-1 u-px-3 u-border u-border-gray-300 u-rounded u-flex u-items-center" style="max-width: fit-content">
<mat-icon aria-label="Submit word" class="u-h-[50%] u-text-base">keyboard_backspace</mat-icon>
Games overview
</a>
<div class="lg:u-grid u-grid-cols-3 u-gap-y-[2rem]">
<div class="u-col-span-full u-p-6 u-w-full u-bg-white u-rounded u-mt-1 u-border u-border-gray-200">
<app-final-word-form (onGuessFinalWord)="submitFinalWord($event)"></app-final-word-form>
</div>
<div class="u-col-span-1">
<app-dashboard (onNextRound)="nextRound($event)"></app-dashboard>
</div>
<div class="u-col-span-2 u-row-span-2 lg:u-ml-[2rem]">
<div *ngIf="dataLoaded | async; then thenBlock else elseBlock"></div>
<ng-template #thenBlock>
<!-- Does not show up in test -->
<app-word-form [game]="game" [word]="word" [gameWord]="gameWord" (onGuessWord)="submitWord($event)"></app-word-form>
</ng-template>
</div>
</div>
</div>
And my test looks like this:
beforeEach(async () => {
await TestBed.configureTestingModule({
declarations: [ GameComponent, FinalWordFormComponent, DashboardComponent, WordFormComponent ],
imports: [ ToastrModule.forRoot() ]
})
.compileComponents();
gameFixture = TestBed.createComponent(GameComponent);
gameComponent = gameFixture.componentInstance;
gameService = TestBed.inject(GameService);
spyOn(gameService, 'createGame').and.returnValue(of({ 'Game': game, 'Gameword': gameWord, 'Word': word, 'Finalword': finalWord }));
gameFixture.detectChanges();
});
fit('should display titles of all child components', waitForAsync(() => {
gameFixture.detectChanges();
expect(gameFixture.nativeElement.querySelector('a').textContent).toContain('Games overview'); // Works
expect(gameFixture.nativeElement.querySelector('p').textContent).toContain('How to win: guess the finalword correctly.'); // Works
expect(gameFixture.nativeElement.querySelector('#wordheader').textContent).toContain('Game word.'); // Failed: Cannot read properties of null (reading 'textContent')
}));
Whenever I log this.dataLoaded while running my test, it does return true, so that should not be the problem. It seems like the view does not pick up on it. Does anyone know how to make this work?

Fetch content from MYSQL database not showing line breaks (MYSQL, SEQUELIZE, NODE, HANDLEBARS)

Using a database management tool (HeidiSQL) I can see that the content of an entry is storing returns (good):
[Screenshot: MySQL storing line breaks]
However when I read the data on my front-end:
router.get('/story/:id', async (req, res) => {
try {
const getStory = await Story.findByPk(req.params.id, {
include: [
{
model: User,
attributes: ['username'],
},
],
});
const story = getStory.get({ plain: true });
res.render('story', {
story,
logged_in: req.session.logged_in,
});
} catch (err) {
res.status(500).json(err);
}
});
Rendered in Handlebars:
<div class="card">
<div class="card-content">
<p class="title">
{{story.title}}
</p>
<p class="content">
{{story.content}}
</p>
</div>
</div>
It eliminates the line-breaks:
[Screenshot: no line breaks]
I'm wondering what I need to do to maintain the linebreaks.
I haven't tried modifying anything yet. I will try encapsulating the handlebars {{story.content}} in a string-literal to see if that does it.
So I found the answer: I needed to add a custom Handlebars helper in server.js.
hbs.handlebars.registerHelper('breaklines', function(text) {
text = hbs.handlebars.Utils.escapeExpression(text);
text = text.replace(/(\r\n|\n|\r)/gm, '<br>');
return new hbs.handlebars.SafeString(text);
});
Then pass the content through the helper
<div class="card">
<div class="card-content">
<p class="title">
{{story.title}}
</p>
<p class="content">
{{breaklines story.content}}
</p>
</div>
</div>
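The transformation the helper performs can be tried in isolation. This is a minimal sketch, not the Handlebars implementation: `escapeHtml` is a simplified stand-in for `Handlebars.Utils.escapeExpression`, and `breakLines` is a hypothetical plain function that mirrors the helper body without the `SafeString` wrapper.

```javascript
// Simplified stand-in for Handlebars' escapeExpression (assumption: covers only
// the characters below; the real utility handles a few more).
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#x27;');
}

// Escape first, then turn each line break into <br>, so user-supplied markup
// stays inert while the line breaks survive.
function breakLines(text) {
  return escapeHtml(text).replace(/(\r\n|\n|\r)/g, '<br>');
}

console.log(breakLines('Line one\r\nLine <b>two</b>\nLine three'));
// prints: Line one<br>Line &lt;b&gt;two&lt;/b&gt;<br>Line three
```

The order matters: escaping after inserting the `<br>` tags would escape the tags themselves, and skipping the escape step would let stored markup run in the page.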

Alpine JS fetch data - limit x-for iteration results and store data for later use

Alpine JS fetch data
How can I limit the x-for iteration (for example, the JSON has 10 results but I want to show only five) and store the data for later use by another script outside the component, such as a slider that adds data after each slide?
In short, I want to retrieve the JSON response data and load the next slider image only when the slider arrow is clicked or the slider is swiped.
The data should be stored so it can be used in JavaScript.
HTML:
<div class="main" x-data="init()">
<h4 class="font-xxlarge">Movie search in Alpine.js</h4>
<div class="searchArea">
<input
class="inputText"
type="text"
placeholder="Type to search a fact"
x-model="q"
@keyup.enter="search()"
/>
<button class="bg-default" @click="search()">Search</button>
<br><br>
</div>
<div>
<template x-for="result in results">
<div class="movieCard">
<div>
<img x-bind:src="result.Poster" />
</div>
<div>
<div class="movieDetailItem">
<span style="padding-right: 5px">Title:</span
><span><b x-text="result.Title">Man of Steel</b></span>
</div>
<div class="movieDetailItem">
<span style="padding-right: 5px">Year:</span
><span><b x-text="result.Year">2008</b></span>
</div>
</div>
</div>
</template>
JS:
function init() {
return {
results: [],
q: "",
search: function () {
fetch(
"https://www.omdbapi.com/?&apikey=e1a73560&s=" + this.q + "&type=movie"
)
.then((response) => response.json())
.then((response) => (this.results = response.Search))
.then(response => console.log(response))
.catch((err) => console.log(err));
// console.log(response);
},
};
}
Codepen example: https://codepen.io/onigetoc/pen/yLKXwQa
Alpine.js calls this feature getters: they return data derived from other state. Let's say we have startIndex and endIndex variables; then we can do simple filtering with filter() in a getter that returns the items between these two indexes (inclusive).
<script src="https://unpkg.com/alpinejs@3.x.x/dist/cdn.min.js"></script>
<script>
function init() {
return {
results: ['#1', '#2', '#3', '#4', '#5'],
startIndex: 2,
endIndex: 4,
get filteredResults() {
return this.results.filter((val, index) => {
return index >= this.startIndex && index <= this.endIndex
})
}
}
}
</script>
<div class="main" x-data="init()">
Items:<br>
<template x-for="result in results">
<span x-text="`${result} `"></span>
</template>
<br><br>
Filtered items between index: <span x-text="`${startIndex} and ${endIndex}`"></span>:<br>
<template x-for="result in filteredResults">
<span x-text="`${result} `"></span>
</template>
</div>
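For the "show only five, keep the rest for later" case, the same getter idea can be sketched with slice(). This is a plain-object version of the component so it runs without Alpine; `limit`, `visibleResults` and `showMore` are hypothetical names introduced for the sketch.

```javascript
// Plain-object version of the x-data component, so it can be run without Alpine.
// `limit`, `visibleResults` and `showMore` are hypothetical names for this sketch.
function init() {
  return {
    results: ['#1', '#2', '#3', '#4', '#5', '#6', '#7', '#8', '#9', '#10'],
    limit: 5,
    // Only the first `limit` items; the full list stays in `results` for later use.
    get visibleResults() {
      return this.results.slice(0, this.limit);
    },
    // Could be called from a slider's "next" handler to reveal more items.
    showMore(count = 1) {
      this.limit = Math.min(this.limit + count, this.results.length);
    },
  };
}

const component = init();
console.log(component.visibleResults.length);
component.showMore(2);
console.log(component.visibleResults);
```

In the template this would be used as x-for="result in visibleResults", with the slider's arrow or swipe handler calling showMore().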

how can you render an HTML array?

I have an array of <p> and <div> items that I'm trying to render as HTML, but whenever I render it, the values appear as plain code instead of normal paragraphs. Let's say the array contains this:
<p>The first paragraph of this pinned topic will be visible as a welcome message to all new visitors on your homepage. It’s important!</p>
<p><strong>Edit this</strong> into a brief description of your community:</p>
When it is rendered on the page, it shows up as plain code instead of the paragraphs it should produce:
[Screenshot: this is how it renders]
This is the code I've used to render it:
useEffect(() => {
axios.get(`/api/get-post?id=${pid}`)
.then(res => setPostData(res.data))
.catch(err => console.log(err.response))
}, [])
console.log(postData?.post_stream?.posts[0]?.cooked)
return (
<div>
<div className={styles.containerPadding}>
<div className={styles.mainContainer}>
<div className={styles.blackLine}>
</div>
<div className={styles.titleContainer}>
<div>
<h1>{postData.title}</h1>
</div>
<div>
<h1></h1>
</div>
</div>
<div className={styles.postInformationContainer}>
<div>
</div>
<div>
<p>{postData?.post_stream?.posts[0]?.cooked}</p>
</div>
</div>
</div>
</div>
</div>
You can use dangerouslySetInnerHTML to render your string data as HTML, but for safety (to avoid XSS attacks) you should sanitize the HTML string with DOMPurify before passing it to dangerouslySetInnerHTML.
import DOMPurify from 'dompurify'
const sanitizedHtml = DOMPurify.sanitize(postData?.post_stream?.posts[0]?.cooked)
And then use it like this:
dangerouslySetInnerHTML={{__html: sanitizedHtml}}
useEffect(() => {
axios.get(`/api/get-post?id=${pid}`)
.then(res => setPostData(res.data))
.catch(err => console.log(err.response))
}, [])
const sanitizedHtml = DOMPurify.sanitize(postData?.post_stream?.posts[0]?.cooked)
return (
<div>
<div className={styles.containerPadding}>
<div className={styles.mainContainer}>
<div className={styles.blackLine}>
</div>
<div className={styles.titleContainer}>
<div>
<h1>{postData.title}</h1>
</div>
<div>
<h1></h1>
</div>
</div>
<div className={styles.postInformationContainer}>
<div>
</div>
<div>
<p dangerouslySetInnerHTML={{__html: sanitizedHtml}}></p>
</div>
</div>
</div>
</div>
</div>
One side note: without sanitizing the HTML string, your HTML data is open to script injection, which could harm your website or your system.
You can use the html-react-parser package.
This is what your code will look like:
import parse from 'html-react-parser'
useEffect(() => {
axios.get(`/api/get-post?id=${pid}`)
.then(res => setPostData(res.data))
.catch(err => console.log(err.response))
}, [])
return (
<div>
<div className={styles.containerPadding}>
<div className={styles.mainContainer}>
<div className={styles.blackLine}>
</div>
<div className={styles.titleContainer}>
<div>
<h1>{postData.title}</h1>
</div>
<div>
<h1></h1>
</div>
</div>
<div className={styles.postInformationContainer}>
<div>
</div>
<div>
<p>{parse(postData?.post_stream?.posts[0]?.cooked)}</p>
</div>
</div>
</div>
</div>
</div>
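One thing to watch with this approach: on the first render postData is not loaded yet, so the expression passed to parse() is undefined, and as far as I know html-react-parser expects a string. A small guard can be sketched as follows; `getCookedHtml` is a hypothetical helper name, and the sample objects only mimic the response shape used in the question.

```javascript
// Hypothetical helper: returns the cooked HTML of the first post, or an empty
// string while the data has not loaded yet.
function getCookedHtml(postData) {
  return postData?.post_stream?.posts?.[0]?.cooked ?? '';
}

// Sample object that only mimics the shape of the response in the question.
const loaded = { post_stream: { posts: [{ cooked: '<p>Hello</p>' }] } };

console.log(JSON.stringify(getCookedHtml(undefined)));
console.log(JSON.stringify(getCookedHtml({ post_stream: {} })));
console.log(getCookedHtml(loaded));
```

The result can then be passed to parse() (or to DOMPurify.sanitize() first) without special-casing the loading state.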
You can use dangerouslySetInnerHTML, as the other answers suggest.
You also need to register the tags you want to allow.
import DOMPurify from 'dompurify';
<div
dangerouslySetInnerHTML={{
__html: DOMPurify.sanitize(
view_inner ?? '<div>Unable to get data.</div>',
{
FORCE_BODY: true,
ADD_TAGS: ['style'],
ADD_ATTR: ['target'],
ALLOWED_TAGS: [
'span',
'div',
'link',
'table',
'thead',
'tr',
'th',
'tbody',
'td',
],
},
),
}}
/>

Angular Protractor test getting h1 text from html

I am very new to end-to-end testing. In my app I have a login page, which I want to show the user when they log out of the app. There is an h1 inside the div, but I am not getting its text, which is why the expected result differs from the actual result.
Here is my login page html.
<div *ngIf="!isLoggedIn" class="login-controller">
<div layout="column" class="login-dialog">
<h1>Here is a heading</h1>
<h2>Second Heading</h2>
<div class="border">
...
</div>
</div>
</div>
Here is my app.po.ts
async getLogInPage(){
return await element(by.css('h1')).getText();
}
async logoutOfApplication() {
var userMenu = element(by.css(".prof-dropbtn > span"));
browser.wait(ExpectedConditions.presenceOf(userMenu), 10000);
await userMenu.click();
var logoutButton = element(by.id("logout"));
await logoutButton.click();
}
Now app.e2e-spec.ts
it('Test for logout', () => {
page.logoutOfApplication();
expect(page.getLogInPage()).toEqual('Here is a heading');
page.loginToApplication("email.com", "demo");
});
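If you want to get the h1 value, you have to write it like this:
it('Test for logout', async () => {
await page.logoutOfApplication(); // wait for the logout to finish before reading the page
// element() targets the single h1 in the login dialog; element.all() would resolve getText() to an array
expect(await element(by.css('.login-dialog h1')).getText()).toEqual('Here is a heading');
await page.loginToApplication("email.com", "demo");
});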