Search for tool: Connection between two pages - mediawiki

I'm looking for a tool that will tell me whether two pages are linked to each other in any way.
For example: is Page A connected to Page C, or are they in different "systems"?
Page A contains [[Page B]]
Page B contains [[Page C]]
The tool should tell me that Page A is connected to Page C in the following way:
Page A -> Page B -> Page C

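MediaWiki's standard API provides the building block for this.
API: Links (prop=links):
Returns all links from the given pages. Following those links from page to page reveals whether, and by which chain, two pages are connected.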
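To get from "all links of a page" to "is Page A connected to Page C", one option is a breadth-first search that repeatedly asks the API for outgoing links. The following is only a minimal sketch under assumptions: the endpoint URL, the `fetch_links` and `find_path` names, the User-Agent string, and the `max_depth` cut-off are all illustrative and not part of MediaWiki itself.

```python
import json
from collections import deque
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: replace with the api.php URL of your own wiki.
API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_links(title, api_url=API_URL):
    """Return the titles linked from `title` (prop=links), following continuation."""
    params = {"action": "query", "format": "json", "prop": "links",
              "titles": title, "pllimit": "max"}
    links = []
    while True:
        req = Request(api_url + "?" + urlencode(params),
                      headers={"User-Agent": "link-path-sketch/0.1"})
        with urlopen(req) as resp:
            data = json.load(resp)
        for page in data.get("query", {}).get("pages", {}).values():
            links.extend(link["title"] for link in page.get("links", []))
        if "continue" not in data:
            return links
        # The API tells us which parameters to add for the next batch.
        params.update(data["continue"])


def find_path(start, goal, get_links=fetch_links, max_depth=3):
    """Breadth-first search over links; returns the shortest chain of titles or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 >= max_depth:  # do not expand beyond max_depth hops
            continue
        for nxt in get_links(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

With the example from the question, `" -> ".join(find_path("Page A", "Page C"))` would produce the requested chain if the pages are connected within `max_depth` hops. On a large wiki the number of requests grows quickly with depth, so the cut-off should stay small.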

Related

MediaWiki API - How to determine which portal(s) a page belongs to?

I wish to determine whether a given Wikipedia page belongs to a certain Wikipedia Portal using the MediaWiki API. So far, I have been experimenting with the page properties of the API but I cannot seem to find a way to derive what Portal a given page belongs to.
As an example, on the Wikipedia page for Cake in the very bottom of the page, I can press Show on the section Cakes, and a bunch of links to different cake pages show up. There I can also see that all of these belong to the Food portal. It is that information that I would wish to extract from a given page using the MediaWiki API.
As far as I know, there is actually no formal definition of "belonging to a portal" in Wikipedia. Unlike categories, which are part of the MediaWiki software, portals are custom Wikipedia pages that aim to make it easier to explore a topic.
Instead of a formal definition, though, you can use a heuristic and determine the connection between the page and some portal based on one of them linking to the other. There are API endpoints for both:
(Note: 100 is the id of the "Portal" namespace)
Which portal pages are linked from the page "Cake" or "Pizza"
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=links&titles=Cake%7CPizza&plnamespace=100
Which portal pages link to the page "Cake" or "Pizza"
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=linkshere&titles=Cake%7CPizza&lhnamespace=100
(though as you can see, many unrelated portals link to "Cake" and none link to "Pizza")
A combined query for both directions
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=links%7Clinkshere&titles=Cake%7CPizza&plnamespace=100&lhnamespace=100
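As a rough sketch of how the combined query could be built and read from code: the helper names `build_portal_query` and `portals_by_page` are made up for illustration, and the parsing assumes the default JSON layout in which pages sit under `query.pages`, keyed by page id.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"
PORTAL_NS = 100  # id of the "Portal" namespace, as noted above


def build_portal_query(titles):
    """Build the combined links/linkshere query URL for the given page titles."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "links|linkshere",
        "titles": "|".join(titles),
        "plnamespace": PORTAL_NS,
        "lhnamespace": PORTAL_NS,
    }
    return API_URL + "?" + urlencode(params)


def portals_by_page(response):
    """Map each page title to the portals it links to and is linked from."""
    result = {}
    for page in response.get("query", {}).get("pages", {}).values():
        result[page["title"]] = {
            "links_to": [link["title"] for link in page.get("links", [])],
            "linked_from": [link["title"] for link in page.get("linkshere", [])],
        }
    return result
```

`build_portal_query(["Cake", "Pizza"])` yields the same URL as the combined query above. The decoded JSON body of that request can then be passed to `portals_by_page`.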
So through some more investigation I found the answer:
I ended up using the Revisions property in the API. This allows me to give a series of page titles that I want to investigate and have the content (wikitext) of each page returned to me in JSON format. Then I can just search for lines containing Portal and figure out what portal (if any) the page belongs to.
If anyone is in a similar situation, here is an example query to the API:
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Bread|Bubble_tea|Pizza&format=json&redirects&rvprop=content&rvslots=main
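The "search for lines containing Portal" step might look roughly like the sketch below. It rests on assumptions: that portals are referenced through templates such as `{{Portal|Food}}` or `{{Portal bar|Food|Drink}}` written directly in the page's wikitext, and that the main slot's text sits under the `"*"` key (the default JSON format) or under `"content"`. The function names are illustrative only.

```python
import re

# Assumption: portal references look like {{Portal|Food}} or {{Portal bar|Food|Drink}}.
PORTAL_TEMPLATE = re.compile(
    r"\{\{\s*portal(?:[ _-]?(?:bar|inline))?\s*\|([^{}]*)\}\}",
    re.IGNORECASE,
)


def portals_in_wikitext(wikitext):
    """Return the portal names mentioned in portal templates, in order of appearance."""
    portals = []
    for match in PORTAL_TEMPLATE.finditer(wikitext):
        for arg in match.group(1).split("|"):
            arg = arg.strip()
            if arg and "=" not in arg:  # skip named parameters such as border=no
                portals.append(arg)
    return portals


def revision_text(page):
    """Pull the main-slot text out of one page object of the API response."""
    slot = page["revisions"][0]["slots"]["main"]
    return slot.get("*") or slot.get("content", "")
```

For each page object under `query.pages` in the response, `portals_in_wikitext(revision_text(page))` gives the candidate portals. Portals that only appear through a transcluded navigation box will not show up in the page's own wikitext, so this remains a heuristic.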

View HTML source while using CSR with React

I'm studying ways to develop an SEO-friendly React website with CSR.
I have read many articles pointing out that to provide an SEO-friendly website, one should go with the SSR approach.
To my knowledge, when using the browser's view source feature with CSR, the HTML content is just a set of references to JavaScript bundle files, and the actual HTML is not present, since view source only shows what was rendered on the server side. With SSR, the HTML is rendered on the server and passed to the browser, so the displayed HTML is present in the source view of the page.
However, https://divar.ir (a well-known retailer site) seems to be using CSR (upon clicking any link, the data is fetched from an API endpoint in JSON format via an AJAX call, and then it looks like the page is rendered on the client side).
The thing is, when I view the source of the page, even after clicking any link, I can see the actual HTML that is being displayed.
So to sum it up: how can I use CSR in React and still see the HTML that is being displayed to the user when I view the source of a page?
Server-side rendered React applications usually only pre-render the initial page load. Subsequent navigation may still be entirely handled and rendered by the client.
The view source tool opens the code in a new tab (at least in Chrome), which triggers a fresh load of the current route from the server. If the application is server-side rendered, you will receive a pre-rendered version of that route and therefore see the HTML for that route.
If you provide a sitemap of your website, a bot can discover all SEO-relevant routes by visiting the URLs listed in the sitemap. Each of those requests is an independent request to the server and will be pre-rendered, in contrast to how a real user navigates the site by clicking links.
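One way to check this behaviour for a given route is to request it fresh, as view source or a crawler would, and look for text that is visible on the rendered page. This is a minimal sketch; the function names and the User-Agent string are assumptions chosen for illustration.

```python
from urllib.request import Request, urlopen


def fetch_raw_html(url):
    """Fetch a route the way a fresh page load does, without running any JavaScript."""
    req = Request(url, headers={"User-Agent": "ssr-check-sketch/0.1"})
    with urlopen(req) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def appears_in_source(html, visible_text):
    """True if the text is already in the server's HTML, ignoring case and spacing."""
    normalized_html = " ".join(html.split()).casefold()
    normalized_text = " ".join(visible_text.split()).casefold()
    return normalized_text in normalized_html
```

If `appears_in_source(fetch_raw_html(route_url), "some text shown on the page")` is true for a route, the server is delivering pre-rendered HTML for it. If only an empty root element and script tags come back, the route is rendered on the client.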

How can I use Google to find all unknown webpages that point to one specific known web page?

It's easy to find all child pages of a webpage, but it is not trivial to get all parent pages of a webpage. How can I do that using Google?
You can't, and Google cannot help you, as it doesn't index all of the web.
At best, it follows links from other pages, or it indexes a page because someone explicitly submitted it.
Create a server. Put an HTML page on it that no other page on your server has a link to. Name the page with some non-guessable UUID in the name.
Google will not find this unless they start to randomly change parts of URLs to test for existing pages (a lengthy process).
Within that page you can have links pointing to other pages. It is a parent page for those pages, is a web page, and will not be found via Google.

How to buffer results from a webpage to another?

I am using Flask for my website.
I want to submit data from a form on my website to another website, then format the results page (an HTML page) before displaying it in the browser on my website.
After the form is submitted, the browser is directed to the results page of that other website. Can I redirect it back to my site?
How can I process the results page (that is, get its HTML content as a string in my code)?
It sounds like the process is: the user enters the data -> your website receives the data and posts it to the other website -> the other website returns the processed data -> your website receives that data, parses it, and displays it. I don't know whether you are familiar with APIs, but if you are, this is just like calling an API. You just need to put the whole process in a view and register it with add_url_rule.
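A minimal sketch of that flow follows, with several assumptions: `REMOTE_URL` is a placeholder for the other website's form target, the "formatting" step simply pulls the visible text out of the returned HTML, and the `/results` route name is made up. Flask is imported inside `create_app` so the helper functions can be used without it.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen

REMOTE_URL = "https://other-site.example/search"  # hypothetical form target


class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def submit_to_remote(form_data, url=REMOTE_URL):
    """POST the form fields to the other site and return its HTML as a string."""
    body = urlencode(form_data).encode("utf-8")
    with urlopen(url, data=body) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def render_results(chunks):
    """Reformat the extracted text as a simple HTML list."""
    items = "".join(f"<li>{escape(chunk)}</li>" for chunk in chunks)
    return f"<ul>{items}</ul>"


def create_app():
    from flask import Flask, request  # imported lazily; only the view needs Flask

    app = Flask(__name__)

    def results():
        html = submit_to_remote(request.form.to_dict())
        return render_results(extract_text(html))

    app.add_url_rule("/results", view_func=results, methods=["POST"])
    return app
```

The form on your own site would then post to `/results` instead of to the other website, so the browser never leaves your site and no redirect back is needed.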

HTML Page Text search and navigation without pre-embedded tags

I'm looking for ideas/solutions for the following scenario:
I'm a website developer who has been given 150-ish HTML pages by a 3rd party, who update and re-issue the HTML pages from time to time.
I'm looking for a way to implement search functionality for these pages and then navigate to that location within the page.
I don't want to add navigation tags to the html pages as these would be lost when the 3rd party re-issue the html pages.
Ideally, I would like to have a search string, search the html files, then return a list of results (kinda like Google results) then when the user clicks on the link for a particular result, the page opens and navigates to the result location within the page.
I'm familiar with C#/JavaScript/jQuery.
Any ideas/suggestions to achieve this would be welcome...or confirmation that this can't be done :)
Don't Google, Bing, and other search engines provide APIs that let you use them to index the site and then use their search capabilities to show results from your site only?