I've started keeping a blog on my site, without the assistance of WordPress, Blogger, or any other external services.
If I want to keep no more than, say, five posts on a page, what's the simplest way to index them? After fifteen posts I'd have three pages, and if I wanted each page to link to the next-oldest collection of posts, I'd have to constantly update the links as page 2 becomes page 3, then page 4, and so on.
For instance, the popular blog Brain Pickings has its pages nicely indexed:
https://www.brainpickings.org/page/2/ , https://www.brainpickings.org/page/3/
.. and so on for 1,465 pages.
How might I painlessly index my own pages in the same manner?
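One way to make this painless with plain static HTML is to let a small build script do the renumbering for you: regenerate every page from the full post list each time you publish, so no hand-edited links ever go stale. Below is a minimal sketch under assumed conventions (posts are pre-rendered HTML snippets supplied newest-first; page 1 is the front page and older pages live at /page/N/, mirroring the Brain Pickings URL scheme); the function and directory names are invented for illustration.

```python
# generate_pages.py - a minimal sketch of static pagination.
# Assumes posts are pre-rendered HTML snippets, ordered newest-first.
from pathlib import Path

POSTS_PER_PAGE = 5

def paginate(posts, per_page=POSTS_PER_PAGE):
    """Split a newest-first list of post snippets into pages."""
    return [posts[i:i + per_page] for i in range(0, len(posts), per_page)]

def build(posts, out_dir="site"):
    """Write index.html for the front page and each /page/N/ directory."""
    pages = paginate(posts)
    for num, page_posts in enumerate(pages, start=1):
        # Page 1 is the site root; older pages live at page/N/.
        target = Path(out_dir) if num == 1 else Path(out_dir, "page", str(num))
        target.mkdir(parents=True, exist_ok=True)
        older = f'<a href="/page/{num + 1}/">Older posts</a>' if num < len(pages) else ""
        newer_href = "/" if num == 2 else f"/page/{num - 1}/"
        newer = f'<a href="{newer_href}">Newer posts</a>' if num > 1 else ""
        html = "\n".join(page_posts) + f"\n<nav>{newer} {older}</nav>\n"
        (target / "index.html").write_text(html, encoding="utf-8")
```

Rerunning the script after each new post rebuilds every page, so what was page 2 silently becomes page 3 without any manual link edits.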
Related
I'm asking for help here. I have a WordPress site where I use Elementor for pages.
Suppose I want to create an area of the site where reviews are divided into pages, with menus for navigating between them.
The problem is that there are thousands of reviews. How can I avoid entering them one by one? Is there an automatic system that inserts them all into pages and also lets me add more in the future?
And if it exists, is it possible to give it the style I want?
I've already done something similar by creating individual pages and filling them in the way I want, but as the reviews grow it becomes difficult, and you end up creating hundreds of pages by hand.
I have the reviews in a csv with columns "review" and "name".
Thanks!
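Since the reviews already live in a CSV with "review" and "name" columns, one approach (outside Elementor entirely) is a short script that reads the CSV and emits one HTML page per chunk, which you rerun whenever the CSV grows. A sketch, with the page size, file names, and markup all being assumptions you would restyle to match your theme:

```python
# reviews_to_pages.py - a sketch that turns a reviews CSV into paginated HTML.
# The column names "review" and "name" match the question; everything else
# (file names, markup, page size) is an assumption.
import csv
import html
from pathlib import Path

REVIEWS_PER_PAGE = 20

def load_reviews(csv_path):
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def render_page(rows, page_num, total_pages):
    items = "\n".join(
        f'<blockquote class="review">{html.escape(r["review"])}'
        f'<cite>{html.escape(r["name"])}</cite></blockquote>'
        for r in rows
    )
    nav = " ".join(
        f'<a href="reviews-{n}.html">{n}</a>' for n in range(1, total_pages + 1)
    )
    return f"<h2>Reviews, page {page_num}</h2>\n<section>{items}</section>\n<nav>{nav}</nav>\n"

def build(csv_path, out_dir="reviews", per_page=REVIEWS_PER_PAGE):
    rows = load_reviews(csv_path)
    chunks = [rows[i:i + per_page] for i in range(0, len(rows), per_page)]
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for n, chunk in enumerate(chunks, start=1):
        page = render_page(chunk, n, len(chunks))
        Path(out_dir, f"reviews-{n}.html").write_text(page, encoding="utf-8")
    return len(chunks)
```

Within WordPress itself the more idiomatic route would be importing the CSV into posts or a custom post type and letting a template paginate them, which also keeps the content queryable; the script above is the quick static alternative.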
I'm working on a site to help students with ACT prep, and I want to have a page where I can post explanations to questions that people submit. I want to be able to put a few tags on each post so that site visitors can click on or search for whatever's relevant to them in the archives ("semicolons", "geometry", etc.) and all the relevant posts will come up, blog style. I'm very new to this, though, and I don't know how to do it or even what to search for - when I search for tags I keep getting SEO recommendations, and that doesn't seem like the right thing.
Here's a solution (but it's not great)
It might be the only way to make what you want happen with a static HTML site.
You could, by hand, create pages filled with links to all of the posts that fit a certain category or "tag". For example, you could make a page with links to all of your posts concerning geometry. Let's call this your archive page for geometry.
Then, when you include tags in a post, you would make each tag link to its corresponding archive page.
Why do I say it's not the best solution?
Virtually every blog you see has a "back end" with a database that stores posts. When someone visits your website and looks at a post, that post's data is inserted into a template and displayed to the user. You don't have to rewrite the entire web page every time: things like the header, sidebar, footer, and main page background are all part of the template.
Having a database also lets you search it and return relevant results. And a blog with a back end will typically let you write rules (or have them already written) so that when you add a "tag" to a post, a link to that post is automatically added to the corresponding archive page.
As far as I can tell you don't have a database, so you'll just be linking static HTML pages. That means every time you make a new post, you'll have to add a link to it on all of its relevant archive pages by hand. Maybe you don't mind that now, but eventually it will be a nightmare to maintain.
I would strongly encourage you to look into a blogging platform like WordPress for your site. It will be more complicated to learn at first, but technology meant to do what you want will ultimately be easier to use and maintain than technology that's simply meant to mark up a page.
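A middle ground between full hand-maintenance and a platform like WordPress is automating the archive pages with a small build script: the "rules" a back end would apply become a loop over your post metadata. A sketch, where the post-metadata shape, file names, and markup are all assumptions (in practice you might read the titles, URLs, and tags from a front-matter block in each post file):

```python
# build_tag_archives.py - automating the hand-made archive pages described
# above. Post metadata here is a plain dict per post; where it comes from
# (front matter, a JSON index, etc.) is up to you.
from collections import defaultdict
from pathlib import Path

def collect_tags(posts):
    """Map each tag to the list of (title, url) pairs of posts carrying it."""
    archives = defaultdict(list)
    for post in posts:
        for tag in post["tags"]:
            archives[tag].append((post["title"], post["url"]))
    return archives

def write_archives(posts, out_dir="tags"):
    """Emit one static archive page per tag, e.g. tags/geometry.html."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for tag, entries in collect_tags(posts).items():
        links = "\n".join(f'<li><a href="{url}">{title}</a></li>'
                          for title, url in entries)
        page = f"<h1>Posts tagged: {tag}</h1>\n<ul>\n{links}\n</ul>\n"
        Path(out_dir, f"{tag}.html").write_text(page, encoding="utf-8")
```

Each tag shown on a post then simply links to /tags/&lt;tag&gt;.html, and rebuilding after every new post keeps all archive pages current without hand-editing.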
I'm trying to build a crawler and scraper in Apache Nutch to find all the pages containing a section talking about a particular word-topic (e.g. "election","elections", "vote", etc).
Once I have crawled, Nutch strips the HTML of stop words and tags, but it doesn't remove menu items (which appear on every page of the website).
So when you search for all the pages talking about elections, you could retrieve an entire website, because the word "elections" appears in its menu and therefore on every page.
I was wondering whether there are techniques that analyze multiple pages of a website to work out the main template of a page. Pointers to useful papers and/or implementations/libraries would be appreciated.
I was thinking about writing some kind of Hadoop job that analyzes similarities between multiple pages to extract a template. But the same website can have multiple templates, so it's hard to think of an effective way to do that.
E.g.:
Web page 1:
MENU HOME VOTE ELECTION NEWS
meaningful text... elections ....
Web page 2:
MENU HOME VOTE ELECTION NEWS
meaningful text... talking about swimming pools ....
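A naive version of that cross-page comparison idea can be sketched outside Nutch entirely (this is not any Nutch or Tika API, just an illustration of the principle): text blocks that recur verbatim on a large fraction of a site's pages are treated as template and dropped before topic matching.

```python
# template_filter.py - a naive sketch of the "compare pages to find the
# template" idea: lines that appear on most pages of one site are treated
# as boilerplate (menus, footers) and removed before topic matching.
from collections import Counter

def split_blocks(page_text):
    """Split extracted page text into non-empty, whitespace-trimmed lines."""
    return [line.strip() for line in page_text.splitlines() if line.strip()]

def find_boilerplate(pages, threshold=0.8):
    """Blocks occurring on >= threshold of the pages count as boilerplate."""
    counts = Counter()
    for page in pages:
        counts.update(set(split_blocks(page)))  # set(): count once per page
    cutoff = threshold * len(pages)
    return {block for block, n in counts.items() if n >= cutoff}

def meaningful_text(page, boilerplate):
    """Return the page text with boilerplate blocks stripped."""
    return "\n".join(b for b in split_blocks(page) if b not in boilerplate)
```

On the example above, the shared MENU line would be classified as boilerplate, so only web page 1 still matches "elections". A real implementation would compare DOM subtrees rather than text lines, and group pages by URL pattern first to cope with a site having multiple templates.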
You didn't mention which branch of Nutch (1.x/2.x) you're using, but at the moment I can think of a couple of approaches:
Take a look at NUTCH-585 which will be helpful if you are not crawling many different sites and if you can specify which nodes of your HTML content you want to exclude from the indexed content.
If you're working with different sites and the previous approach isn't feasible, take a look at NUTCH-961, which uses the boilerplate feature inside Apache Tika to guess which text matters in your HTML content. This library uses several algorithms and provides several extractors; you can try them and see what works for you. In my experience I've had some issues with news sites that had a lot of comments, where some of the comments ended up being indexed along with the main article content, but it was a minor issue after all. In any case this approach can work very well for a lot of cases.
Also, you can take a peek at NUTCH-1870, which lets you specify XPath expressions to extract certain specific parts of the webpage as separate fields; using this with the right boost parameters in Solr could improve your precision.
We are doing really well in Google search results and have a high PageRank with our HTML website (about 30 pages).
Now we are switching to a WordPress website on the same domain, keeping most of the HTML pages. But we are also building another WordPress site on a NEW domain, where we will showcase the hardware products (currently shown on our existing domain in HTML).
How can we safely switch one half of the HTML pages to WordPress (on the same domain) and keep the PageRank, and move the other half to a WordPress site on a new domain while also keeping the PageRank?
Thanks in advance!
Try this tutorial. It's not quite the same, but it's going to talk you through the important parts of a transfer to minimize loss of SEO.
Basically make sure you keep all the current links to your pages working after the transfer.
Import all posts, comments & pages.
Maintaining permalinks for posts & pages (1-on-1 mapping between Blogger.com and WordPress pages).
Redirecting permalinks for labels & search archives.
Retaining all feed subscribers.
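In practice, "keeping the current links working" usually means permanent (301) redirects: each old HTML URL points to its new WordPress permalink on the same domain, and pages that moved to the new domain get cross-domain 301s. An illustrative fragment in Apache .htaccess form (the paths and domain below are invented, not from the question):

```apache
# Illustrative only - substitute your real old paths and new permalinks.
# Old static page moved to a WordPress permalink on the same domain:
Redirect 301 /products.html /products/

# Page moved to the new domain gets an absolute target:
Redirect 301 /hardware.html https://new-domain.example/hardware/
```

With 301s in place, search engines are told the moves are permanent, which is the standard mechanism for transferring ranking signals to the new URLs.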
This is a rephrasing of my original question https://stackoverflow.com/questions/14516983/google-sites-trying-to-script-announcements-page-on-steroids:
I've been looking into ways to make subpages of a parent page appear in a grid of "articles" on the home page of my Google Site, like on a Joomla home page and almost like a standard "Announcements" template, except:
The articles should appear in a configurable order, not chronologically (or alphabetically).
The first two articles should be displayed full-width and the ones beneath in two columns.
All articles will contain one or more images, and at least the first one should be displayed.
The timestamp and author of each subpage/article shouldn't be displayed.
At the moment I don't care if everything except the ordering is hardcoded, but ideally there should be a place to input prefs like the number of articles displayed, image size, snippet length, css styling etc.
My progress so far:
I tried using an iframe with an externally hosted JavaScript (using google.feeds.Feed) that pulls the RSS feed from the "Announcements" template, but I can't configure the order of the articles. One possibility would be to put a number at the beginning of every subpage title and parse it, but that will get messy over time, and the number would also be visible on the standalone article page. Or could the number be hidden with JavaScript?
I tried making a spreadsheet with a row for each article, with columns "OrderId", "Title", "Content", and "Image", and processing and formatting the data with a Google Apps Script (using createHTML and createImage), but a) there doesn't seem to be a way to get a spreadsheet image to show up inside the web app, and b) these articles are not "real" pages that can be linked to easily from the menus.
This feature would be super-useful for lots of sites, and to me it just seems odd that it isn't a standard gadget (edit: or template). Ideas, anyone?
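The "number at the beginning of every subpage title" idea mentioned above can at least be prototyped: sort feed items by a leading numeric prefix, then strip the prefix so it never shows on the rendered page. The sketch below is plain Python operating on raw feed XML, purely to illustrate the parse-sort-strip step (google.feeds.Feed is a deprecated browser API, and the "NN - Title" prefix convention is an assumption):

```python
# order_articles.py - sort feed items by a leading "NN - " title prefix,
# then strip the prefix before display. Fetching the feed is out of scope;
# pass in the raw RSS/Atom XML as a string.
import re
import xml.etree.ElementTree as ET

PREFIX = re.compile(r"^(\d+)\s*-\s*(.*)$")

def ordered_articles(feed_xml):
    """Return prefixed titles in numeric order, with the prefix removed."""
    root = ET.fromstring(feed_xml)
    items = []
    # Match <title> elements loosely, with or without an XML namespace;
    # titles without a numeric prefix (e.g. the feed title) are ignored.
    for el in root.iter():
        if el.tag.endswith("title") and el.text:
            m = PREFIX.match(el.text)
            if m:
                items.append((int(m.group(1)), m.group(2)))
    items.sort()
    return [title for _, title in items]
```

The numeric sort is what makes "10" come after "2" (a plain string sort on the titles would get that wrong), and because the stripped titles are what you render, the ordering number never appears on the page.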
I don't know if this is helpful, but I wanted something similar and used the RSS/XML announcements feed within a Google Gadget embedded in my Sites page.
Example gadget / site:
http://hosting.gmodules.com/ig/gadgets/file/105840169337292240573/CBC_news_v3_1.xml
http://www.cambridgebridgeclub.org
It is badly written and messy, and I'm sure someone could do better than me, but it seems to work fairly reliably. The XML seems to have all the necessary data for chopping up articles, and I seem to remember it has image URLs as well, so you can play with them (although that's not implemented in my gadget).
Apologies if I am missing the point. I agree with your feature request - it would be great not to have to get so low-level to implement stuff like this in Sites.