Right way to parse this HTML page? - html

I'm trying to parse some parts of an HTML page but I have problems with my regular expression.
My code looks like this:
... Download page using wget and some other stuff ...
$PAGE_REGEXP = "\<div class="col bg_dark clear">";
# Array HTMLLines
@HTMLLines = split(/\n/, $Page);
foreach $ThisOne (@HTMLLines) {
if ( ($Team) = ($ThisOne =~ /$PAGE_REGEXP/) ) {
$T{TranslateTeams($Team)}++;
$LastTeam=TranslateTeams($Team);
};
};
This is the HTML page:
<div class="col bg_dark clear">
<div class="col_1 left">15:30</div>
<div class="col_3_archive left">Team A - Team B</div>
<div class="col_2_archive left">
1:4 (0:2)
</div>
<div class="col_5 left ">2.4 </div>
<div class="col_5 left ">3.6 </div>
<div class="col_5 left bold">2.9 </div>
<div class="col_8 left">
</div>
<div class="col clear">
<div class="col_1 left">15:30</div>
<div class="col_3_archive left">Team C - Team D</div>
<div class="col_2_archive left">
2:3 (1:1)
</div>
<div class="col_5 left ">2.7 </div>
<div class="col_5 left ">3.7 </div>
<div class="col_5 left bold">2.5 </div>
<div class="col_8 left">
</div>
The information I need to extract is the team names, the final and half-time results, and the numbers in the col_5 divs (2.4, 3.6 and 2.9 for the game Team A - Team B).
When I run my script, Perl gives me the following error:
Bareword found where operator expected at parser.pl line 11, near ""\
I'm not familiar with all the existing Perl modules, so maybe I'm trying to do something that is easy to achieve with the right module. Can anybody give me some hints on how to parse this HTML page?
Thx

The line with the regexp should probably look something like this, with single quotes so that the inner double quotes no longer end the string early:
$PAGE_REGEXP = '<div class="col bg_dark clear">';
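Note that the corrected pattern only recognizes the opening div of a row and captures nothing, so it does not yet extract the team names, results or numbers. As a rough sketch of that extraction, here is one way to pull the fields out of the markup shown in the question. It is written in Ruby (a language used elsewhere on this page) with only the standard library, and `div_texts` is a hypothetical helper, not part of any module. A regex like this is brittle and assumes the markup looks exactly like the sample; a real HTML parser would be more robust.

```ruby
# Sample markup copied from the question (first game only).
html = <<~HTML
  <div class="col bg_dark clear">
  <div class="col_1 left">15:30</div>
  <div class="col_3_archive left">Team A - Team B</div>
  <div class="col_2_archive left">
  1:4 (0:2)
  </div>
  <div class="col_5 left ">2.4 </div>
  <div class="col_5 left ">3.6 </div>
  <div class="col_5 left bold">2.9 </div>
  <div class="col_8 left">
  </div>
HTML

# Hypothetical helper: stripped text of every div whose class attribute
# starts with the given prefix. Assumes these divs contain no nested divs.
def div_texts(html, class_prefix)
  pattern = /<div class="#{Regexp.escape(class_prefix)}[^"]*">(.*?)<\/div>/m
  html.scan(pattern).flatten.map(&:strip)
end

teams  = div_texts(html, "col_3_archive")  # ["Team A - Team B"]
scores = div_texts(html, "col_2_archive")  # ["1:4 (0:2)"]
odds   = div_texts(html, "col_5")          # ["2.4", "3.6", "2.9"]

# Split the combined fields into their parts.
home, away = teams.first.split(" - ")
full_time, half_time = scores.first.match(/(\d+:\d+) \((\d+:\d+)\)/).captures

puts "#{home} vs #{away}: #{full_time} (half time #{half_time}), #{odds.join(', ')}"
```

The same patterns would carry over to Perl almost unchanged, since the capture groups do the real work.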

Related

CK editor data displaying with html tags in php

My CKEditor content is displayed with its HTML tags showing when I output it in PHP. Please help me. This is my template:
<div class="row">
<div class="col-lg-12">
<div class="card">
<div class="card-header text-uppercase">Product Description</div>
<?php echo $result['description'];?>
</div>
</div>
</div>
The description is displayed with the HTML tags visible as plain text, but it should be displayed as rendered HTML.
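Use html_entity_decode on the stored value before echoing it, i.e. echo html_entity_decode($result['description']);
Example:
$str = '&lt;a href=&quot;https://www.w3schools.com&quot;&gt;w3schools.com&lt;/a&gt;';
echo html_entity_decode($str);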
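To make clear what the decoding step does, here is the same idea as a small sketch in Ruby, using `CGI.unescapeHTML` from the standard library. The `stored` string is a made-up value, assuming the editor content was saved with its markup entity-encoded.

```ruby
require "cgi"

# Hypothetical stored value: HTML saved with its markup entity-encoded.
stored = "&lt;p&gt;Product &lt;b&gt;details&lt;/b&gt;&lt;/p&gt;"

# Decoding turns the entities back into real tags, so a browser renders
# them instead of showing them as text.
decoded = CGI.unescapeHTML(stored)

puts decoded  # <p>Product <b>details</b></p>
```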

ng-if and col col-50 don't seem to go together?

I am trying to generate a table using Angular code in HTML. I have the header row looking the way I want with this code:
<div class="row">
<div class="col col-25">Date</div>
<div class="col col-50">Name of Customer</div>
<div class="col col-25">Hours Used</div>
</div>
I generate a list in Angular (this part works correctly), ordered so that Date is the first entry and Hours Used is the last. To generate the table in HTML, I use this code:
<div class="row">
<div ng-repeat="B in ConvertToTableRow(T) track by $index">
<div class="col col-25" ng-if="$index == 0">
{{ B }} &nbsp
</div>
<div class="col col-50" ng-if="$index == 1">
{{ B }} &nbsp
</div>
<div class="col col-25" ng-if="$index == 2">
{{ B }} &nbsp
</div>
</div>
</div>
The first row works well and is styled the way I need, but the second entry (when the index is 1) is never styled as col-50; it sticks with col-25 instead. I have checked that the index values really are 1 and 2, and the information also comes out correctly; it's just not styled correctly.
Is it possible to style a table this way using Angular and HTML?
Would it not be better to use ng-class? I think there's an even better way, but this is just off the top of my head. I hope this helps! :) See the AngularJS documentation for ng-class.
<div class="row">
<div ng-repeat="B in ConvertToTableRow(T) track by $index">
<div class="col" ng-class="{'col-25': $index % 3 !== 1, 'col-50': $index % 3 === 1}">
{{ B }} &nbsp
</div>
</div>
</div>

Better way in bootstrap to display columns instead of using php to create rows after every 3

Sorry for the question title. I really didn't know how best to describe it in a short sentence.
Basically I want to loop through my images and display each one. When building the front end I did this:
<div class="row">
<div class="col-sm-4">
<img .../>
</div>
<div class="col-sm-4">
<img .../>
</div>
<div class="col-sm-4">
<img .../>
</div>
</div>
<div class="row">
<div class="col-sm-4">
<img .../>
</div>
<div class="col-sm-4">
<img .../>
</div>
<div class="col-sm-4">
<img .../>
</div>
</div>
Because my design only allows 3 columns per row, this worked fine. However, to do this in PHP I have to check the $count of the loop. If it's a multiple of 3, output
<div class="row">
and if it's a multiple of 3 + 1, output
</div>
to close the row.
Is there any way to get the same layout as the example above using the HTML below:
<div class="row">
<div class="col-sm-4">
<img .../>
</div>
<div class="col-sm-4">
<img .../>
</div>
<div class="col-sm-4">
<img .../>
</div>
<div class="col-sm-4">
<img .../>
</div>
<div class="col-sm-4">
<img .../>
</div>
<div class="col-sm-4">
<img .../>
</div>
</div>
It's just making my markup look very messy. Is there a better way in Bootstrap to do this without using PHP at all?
Thanks
You'll want to include conditions in your PHP loop so that PHP knows when to add in the row.
Below, I've laid out an untested set of PHP code that will hopefully explain where you're going wrong.
<?php
$images = array(...);
$images_per_row = 3;
// Here is where we determine how the images
// should be divided into the rows.
$total_images = count($images);
$total_rows = ceil($total_images / $images_per_row);
// Loop through each `row`
for ($row=0; $row<$total_rows; $row++)
{
// Open the `row`
echo '<div class="row">';
// We have an `offset` to avoid displaying the same image
// multiple times in the loop.
$offset = $row * $images_per_row;
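// Loop through each `col-sm-4` and `img`
for ($i=0; $i<$images_per_row; $i++)
{
// Stop early when the last row has fewer images than a full row,
// otherwise we would read past the end of the array.
if (!isset($images[$offset + $i])) break;
echo '<div class="col-sm-4">';
echo "<img src='{$images[$offset + $i]}' />";
echo '</div>';
}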
// Closing the `row`
echo '</div>';
}
There are countless ways to do this. You can even do it all within a single loop if you use proper conditions within the loop.
The point is, you need a way of telling PHP "Hey, we need to divide all of this into thirds or we're never going to get anywhere with this."
Side note, PHP + HTML is a very ugly marriage. Please avoid it at all costs (unless you only have 10 minutes and you're trying to help someone on Stack Overflow).
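The "divide into thirds" step can also be expressed by chunking the list first and then rendering each chunk as one row. Below is a minimal sketch of that idea in Ruby using `each_slice`; the image paths are made up for illustration. PHP has a comparable built-in, `array_chunk`, so the same shape should translate back.

```ruby
# Hypothetical image paths standing in for the elided array above.
images = %w[a.jpg b.jpg c.jpg d.jpg e.jpg]
images_per_row = 3

# each_slice yields groups of up to three images; the last group may be
# shorter, so no bounds check is needed.
rows = images.each_slice(images_per_row).map do |row_images|
  cols = row_images.map { |src| %(<div class="col-sm-4"><img src="#{src}" /></div>) }
  %(<div class="row">#{cols.join}</div>)
end

puts rows.join("\n")
```

With five images this produces two rows, the second holding only two columns.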

How to get Nokogiri to scrape text from span in Ruby

I'm trying to scrape information from a website using Nokogiri and Curb, but I can't seem to find the right selector for the element I want. I'm trying to scrape the API key, which is at the bottom of the HTML code as "xxxxxxx".
The HTML code is:
<body class="html not-front logged-in no-sidebars page-app page-app- page-app-8383900 page-app-keys i18n-en" data-twttr-rendered="true">
<div id="skip-link"></div>
<div id="page-wrapper">
<!--
Code for the global nav
-->
<nav id="globalnav" class="without-subnav"></nav>
<nav id="subnav"></nav>
<section id="hero" class="hero-short"></section>
<section id="gaz-content">
<div class="container">
::before
<div id="messages"></div>
<div id="gaz-content-wrap-outer" class="row">
::before
<div id="gaz-content-wrap-inner" class="span12">
<div class="row">
::before
<div class="article-wrap span12">
<article id="gaz-content-body" class="content">
<header></header>
<div class="header-action"></div>
<div class="tabs"></div>
<div class="d-block d-block-system g-main">
<div class="app-details">
<h2>
Application Settings
</h2>
<div class="description"></div>
<div class="app-settings">
<div class="row">
::before
<span class="heading">
Consumer Key (API Key)
</span>
<span>
xxxxxxxxx
</span>
All I can seem to get is the "content" text.
My code looks like:
consumer = html.at("#gaz-content-body")['class']
puts consumer
I'm not sure what to write to select the span by its class and then get its text. All I can get Nokogiri to output is "content".
In this case we need the second span, the one after span class="heading", inside div class="app-settings". That is a fairly general selector, but specific enough here. I'm using search instead of at to retrieve both spans and then take the second one:
# Gets the 2 span elements under <div class='app-settings'>.
res = html.search('#gaz-content-body .app-settings span')
# Use .text to get the contents of the 2nd element.
res[1].text.strip
# => "xxxxxxxx"
But you can also use at to target the same:
res = html.at("#gaz-content-body .app-settings span:nth-child(2)")
res.text.strip
# => "xxxxxxxx"
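Another option is to anchor on the heading span and step to its next sibling with XPath, which does not depend on how many spans the block contains. The sketch below shows the XPath expression on a trimmed, well-formed copy of the markup. It uses REXML (bundled with Ruby) as a stand-in so it runs without extra gems; that swap is an assumption made for illustration. With Nokogiri, the same expression should work via `html.at_xpath(...)`.

```ruby
require "rexml/document"

# Trimmed, well-formed copy of the relevant part of the page.
xml = <<~XML
  <div class="app-settings">
    <div class="row">
      <span class="heading">
      Consumer Key (API Key)
      </span>
      <span>
      xxxxxxxxx
      </span>
    </div>
  </div>
XML

doc = REXML::Document.new(xml)

# Find the heading span, then take the first span that follows it.
key_span = REXML::XPath.first(
  doc, '//span[@class="heading"]/following-sibling::span[1]'
)
api_key = key_span.text.strip

puts api_key
```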

Parse html page with mechanize to receive the appropriate array

I have the following html code on the page received by mechanize (agent.get):
<div class="b-resumehistorylist-views">
<!-- first date start-->
<div class="b-resumehistory-date">date1</div>
<div class="b-resumehistory-company">
<div class="b-resumehistory-time">time1</div>
company1</div>
<!-- second date start -->
<div class="b-resumehistory-date">date2</div>
<div class="b-resumehistory-company">
<div class="b-resumehistory-time">time2</div>
company2
</div>
<div class="b-resumehistory-company">
<div class="b-resumehistory-time">time3</div>
company3</div>
<div class="b-resumehistory-company">
<div class="b-resumehistory-time">time4</div>
company4</div>
<div class="b-resumehistory-company">
<div class="b-resumehistory-time">time5</div>
company5</div>
<div class="b-resumehistory-company">
<div class="b-resumehistory-time">time6</div>
company6</div>
<div class="b-resumehistory-company">
<div class="b-resumehistory-time">time7</div>
company7</div>
...
</div>
I need to find each date inside the div with class="b-resumehistorylist-views".
Then I need to find all the divs between two date divs and link each item to that particular date.
The problem is that the items (div class="b-resumehistory-company") are not nested inside their date div (div class="b-resumehistory-date"); they are siblings of it.
In the end I need the following array:
array = [ [date1, time1, company1, companylink1], [date2, time2, company2, companylink2], [date2, time3, company3, companylink3],[date2, time4, company4, companylink4] ]
I know that I must use the search method with the text() option, but I cannot find the solution.
Right now my code can parse all the company information in the div class="b-resumehistory-company" elements, but I still need to find the right date for each one.
It would be the same thing as before, just some of the class attributes have been changed:
doc = agent.get(someurl).parser
doc.css('.b-resumehistory-company').map do |x|
  [
    x.at('./preceding-sibling::div[@class="b-resumehistory-date"][1]').text,
    x.at('.b-resumehistory-time').text,
    x.at('a').text,
    x.at('a')[:href]
  ]
end
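The `preceding-sibling::...[1]` step picks, for each company div, the nearest date div that comes before it. The same pairing can be pictured as a single pass that remembers the last date seen. The sketch below shows that logic in plain Ruby on a hand-written list standing in for the sibling divs; the `siblings` data and its layout are made up for illustration and are not produced by mechanize.

```ruby
# Stand-in for the sibling divs in document order: [kind, values...].
siblings = [
  [:date, "date1"],
  [:company, "time1", "company1"],
  [:date, "date2"],
  [:company, "time2", "company2"],
  [:company, "time3", "company3"],
]

# Remember the most recent date and attach it to every company after it.
current_date = nil
rows = siblings.each_with_object([]) do |(kind, *values), acc|
  if kind == :date
    current_date = values.first
  else
    acc << [current_date, *values]
  end
end

rows.each { |row| puts row.inspect }
```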