Ads filtering server side [closed] - html

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I'm working on a web application where I display HTML from other websites. Before displaying the final version I'd like to get rid of the ads.
Any ideas or suggestions on how to accomplish this? It doesn't need to be a super efficient filtering tool. I was thinking of porting some of the filters defined by Adblock Plus to Ruby and returning the parsed doc with some help from Nokogiri.
Let's say I use the super wildcard filter ad. That's not an official Adblock Plus filter, but for simplicity I'll use it here. The idea would then be to remove all the elements for which any attribute matches the filter, e.g. src="http://ad.foo.com?my-ad.gif", href="http://ad.foo.com", class="annoying-ad", etc.
The Nokogiri command for this filter would be:
doc.xpath("//*[@*[contains(., 'ad')]]").each { |element| element.remove }
I applied the filter for this page:
And the result was:
Not that bad. Note that the global wildcard filter also got rid of valid elements like headers, because they have attributes like id="masthead", which contains "ad".
So I think this approach is OK for my case. Now the question is which filters to use. They have a huge list of filters and I don't feel like iterating over all of them. I'm thinking of grabbing the top 10-20 and parsing the docs based on those. Is there a list out there of the most popular ones? If so, I haven't been able to find it.
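One way to reduce false positives such as id="masthead" is to match "ad" only as a whole token inside an attribute value rather than as a substring. The following is a minimal sketch of that check in Python (the logic is plain string handling and should port to Ruby); the token set and the function name looks_like_ad are illustrative assumptions, not part of any Adblock Plus list.

```python
import re

# Illustrative tokens only; this is NOT an Adblock Plus filter list.
AD_TOKENS = {"ad", "ads", "advert", "advertisement"}


def looks_like_ad(attributes):
    """Return True if any attribute value contains an ad token as a whole word."""
    for value in attributes.values():
        # Split on anything that is not a letter or digit, so "annoying-ad"
        # yields ["annoying", "ad"] while "masthead" stays a single token.
        tokens = re.split(r"[^a-z0-9]+", value.lower())
        if AD_TOKENS.intersection(tokens):
            return True
    return False


print(looks_like_ad({"class": "annoying-ad"}))      # True
print(looks_like_ad({"href": "http://ad.foo.com"}))  # True
print(looks_like_ad({"id": "masthead"}))             # False
```

The same predicate could be applied to each element's attribute values before removing it. It would still miss camelCase names such as adBanner, so it is a starting point rather than a complete filter.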

Related

Word count regex in HTML [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed last month.
This is the same question as this. But since I'm not using JavaScript, innerText is not a solution for me, and I was wondering if it was possible for regex to combine /(<.*?>)/g and /\S+/g to get the actual word count without having to do a bunch of string operations.
The language I'm using here is Dart; if a solution I haven't found already exists within it, that would work as an answer too. Thanks!
Edit: Someone edited the tags? This question is not Dart-specific and is about regex, so I'm putting them back as they were.
Edit 2: The question was closed because it is not "focused", but I do not know how I can make "if it was possible for regex to combine /(<.*?>)/g and /\S+/g" any more focused.
Assuming all text is enclosed in HTML elements, you can use (?<=>|\s)[^<\s>='"]+?(?=<|\s).
With the string <p>One</p><p>Two Three, Four. Five</p><p>Six</p> there are six matches.
Note:
It uses a lookbehind assertion, which might not be supported in all browsers.
Punctuation at the end of a word is grouped with it, e.g. "Three,", so keep that in mind if you're planning to use the actual words and not just count them.
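The pattern above can be checked against the sample string. This is a minimal sketch in Python (an assumption; the question itself mentions Dart), where the function names are hypothetical. Python's re module accepts this lookbehind because both alternatives are exactly one character wide. A two-step version that strips tags first is included only for comparison.

```python
import re

# The pattern from the answer above. Both lookbehind branches are one
# character wide, which Python's re module requires.
WORD = re.compile(r'''(?<=>|\s)[^<\s>='"]+?(?=<|\s)''')
TAG = re.compile(r"<.*?>")


def count_words(markup):
    """Count words with the single lookaround pattern."""
    return len(WORD.findall(markup))


def count_words_two_step(markup):
    """Comparison: replace tags with spaces, then count runs of non-whitespace."""
    return len(re.findall(r"\S+", TAG.sub(" ", markup)))


sample = "<p>One</p><p>Two Three, Four. Five</p><p>Six</p>"
print(WORD.findall(sample))          # ['One', 'Two', 'Three,', 'Four.', 'Five', 'Six']
print(count_words(sample))           # 6
print(count_words_two_step(sample))  # 6
```

Because the lookahead needs a following "<" or whitespace, a trailing word that is not enclosed in an element would be missed by the single pattern, which matches the stated assumption that all text sits inside HTML elements.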

How to write for auto update [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
(I'm new to all this... instead of closing my question, it would help if I could have an idea of what needs to be done) (please excuse if I'm not asking the proper question for what I'm trying to achieve, new to code)
New to coding and web development: learning HTML, CSS and then JS.
I see websites where data is automatically updated. How is this achieved?
I would like to create a website that will display economic data but not have to manually input the data. How would I incorporate code to automatically do this for me?
Would I use a website's API?
Example of the type of information I would like to display on my own website: https://www.marketwatch.com/economy-politics/calendar
Automatically updated data on a website can be achieved by using an API call. You make a request to an API that has the data and then render the data on your HTML page.
The process requires a good understanding of modern JS concepts.
Search for an API that offers the service you need and read the documentation to understand how to use it.
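The request-then-render flow can be sketched as follows. This is a server-side illustration in Python rather than browser JS, and everything specific in it is an assumption: the endpoint URL is a placeholder, and the field names "name" and "value" are hypothetical and would come from the chosen provider's documentation.

```python
import html
import json
from urllib.request import urlopen

# Hypothetical endpoint: substitute the URL from your data provider's documentation.
API_URL = "https://api.example.com/economic-calendar"


def fetch_events(url=API_URL):
    """Request the latest data and decode the JSON body (assumed to be a list of objects)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def render_events(events):
    """Turn the decoded records into an HTML table, escaping every value."""
    rows = []
    for event in events:
        name = html.escape(str(event["name"]))
        value = html.escape(str(event["value"]))
        rows.append(f"<tr><td>{name}</td><td>{value}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Placeholder records standing in for a real response (field names are assumptions).
sample = [{"name": "Indicator A", "value": "1.5%"}, {"name": "Indicator B", "value": 42}]
print(render_events(sample))
```

Calling fetch_events() on a schedule, or on each page load, and passing the result to render_events() is what keeps the page current without manual input.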
Let me know if there's anything else.

Are HTML tables still the way to go? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
AFAIK people create their own table-like-components and the use of the good old HTML tables is kind of "outdated". Now I'm at the point where I would need a table for my vuejs application. I'm using bulma and the doc states this:
Table
The inevitable HTML table, with special case cells
The "support" badges below that title state that there isn't much support for it (only variables) and the text itself sounds like "well... since so many not-up-to-date-people want it, here, take it".
Should I rather go the "unordered list" way or something else? Like creating a component to represent a row and a component holding the table together? I'm not asking for vue specifically, but for a rather "modern approach" and how to do something like that properly.
The table element is still the correct way to present tabular data semantically in HTML. So if you use it for genuinely tabular data, it is fine and not outdated per se.
That said, it might be a valid decision to go with newer approaches like CSS Grid if that helps you build new elements on your website faster and in a more user-friendly way, since the end user should always benefit from your decisions.

Create nicely designed resumé from info in database [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I designed a nice resumé template in Sketch and now I want to make it available to the users of a site.
The data will be stored in a MySQL database and the design should be modular depending on the information.
What is the best way of doing it? I thought of replicating the design in CSS3 and then converting it with one of these scripts, FPDF or mPDF, but I don't think that's the easiest way of doing it.
What do you think?
Thanks!
An example of the resumé is the following:
If it's a set template/pattern, I'd treat each segment as an object with a varying number of attributes, based on the data returned from the MySQL query.
I.e., when you pull the data from your table and start looping through a person's skills, you can add each one to the SKILLS object. Same for Experience, and so on.
Since this is essentially a set of parent-child nodes, you could also do it with XML, but the approach is really up to you.
You could then easily output the constructed resumé as HTML (so your users on the site can see it live and make changes), and then use a converter to convert it to PDF (lots of languages have libraries to do just that). Most modern browsers can also convert HTML pages to PDF nowadays, so you could also give users instructions on how to do that.
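The segment-as-object idea can be sketched like this. It is a minimal illustration in Python, assuming the database query returns (section title, item text) rows; the Section class, the function names, and the sample rows are all hypothetical.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Section:
    """One block of the resume, e.g. Skills or Experience."""
    title: str
    items: list = field(default_factory=list)


def build_sections(rows):
    """Group (section_title, item_text) rows, as a database query might return them."""
    sections = {}
    for title, text in rows:
        sections.setdefault(title, Section(title)).items.append(text)
    return list(sections.values())


def render_resume(name, sections):
    """Emit only the sections that have data, so the layout stays modular."""
    parts = [f"<h1>{html.escape(name)}</h1>"]
    for section in sections:
        if not section.items:
            continue
        parts.append(f"<section><h2>{html.escape(section.title)}</h2><ul>")
        parts.extend(f"<li>{html.escape(item)}</li>" for item in section.items)
        parts.append("</ul></section>")
    return "\n".join(parts)


# Placeholder rows standing in for a real query result.
rows = [("Skills", "HTML"), ("Skills", "CSS"), ("Experience", "Web developer")]
print(render_resume("Jane Doe", build_sections(rows)))
```

The resulting HTML can be styled with the CSS version of the template and then passed to whichever HTML-to-PDF converter is chosen.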
Just my two cents,
Hope it helps!

Extracting an article from the BBC website [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I want to extract an article, say this one:
http://www.bbc.com/news/magazine-32156264
and only display the article content, so no BBC header or footer. How would I do this? I'm thinking of putting it in an iframe.
As you ask specifically about the BBC:
You are allowed to display the RSS feed of BBC headlines - you could use the WordPress RSS Links widget to do this.
You certainly aren't allowed to just copy someone else's story (or start removing branding etc.) – which is quite reasonable.
Note: The BBC doesn't have an API for news, but some do - e.g. The Guardian's Open Platform - again there will usually be strict restrictions on how you can display things, required branding, what you are/aren't allowed to change.
Correct approach: choose one or two relevant quotes you find interesting, highlight those, and make sure you have a prominent link back to the original article.
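For the RSS route, pulling headlines and links out of a feed is a small parsing job. Below is a minimal sketch in Python, assuming a standard RSS 2.0 document; the inline sample feed is a placeholder rather than real BBC content, and the function name headlines is hypothetical.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for a downloaded RSS 2.0 document (contents are placeholders).
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>First headline</title><link>https://example.com/1</link></item>
<item><title>Second headline</title><link>https://example.com/2</link></item>
</channel></rss>"""


def headlines(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link")) for item in root.iter("item")]


for title, link in headlines(SAMPLE_RSS):
    print(f"{title} -> {link}")
```

Each pair can be rendered as a headline that links back to the original article, which keeps the source's branding and content on its own site.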
First of all, there will be legal issues. Second, your page rank will be destroyed because of duplicate content.
If you have already considered the above, you could make a PHP cURL request, parse the response using a regular expression to get the target data, and finally display the retrieved data.
Or, you can use the APIs of other news providers, as williamt mentioned.