Short version: How do I figure out how to phrase additional data (like specific options on the page that display different HTML but belong to the same URL) when requesting a URL with urllib?
Long version:
I am having trouble figuring out how to handle properties of a URL request that are determined not by the link URL itself but by other information the browser usually sends along.
To be more precise:
This page contains a table that I want to read with Python, but the length of the table depends on the number of items per page chosen in the bottom left (i.e. the HTML I get from urllib.request.urlopen contains only the default 50 or so items, not the complete table). Clicking the button for e.g. 400 items per page doesn't change the URL, so I expect that information is sent some other way. I understand that urllib can send additional data besides the URL, but it is unclear to me how to figure out how to phrase "give me the whole table" (or "give me 400 items per page") in that data.
Studying the .html file I get from saving the webpage in my browser didn't give me any hints, and I lack the vocabulary to search for answers on the web (googling "urllib request parameter" is too vague).
Hence I'd be completely satisfied if someone would point me to a duplicate of this question.
Thanks in advance :)
For everyone else finding this question, I'll elaborate on the answer @deceze gave in the comments:
Open the webpage you want to read in your browser
Open your browser's network panel (in Chromium this is Ctrl+Shift+I, or right-click > Inspect)
Go to the "Network" Tab (at least in chromium)
Do whatever you want your program to do and the empty network panel list will fill with a lot of data
Find your request in the list of events (one of the very first entries, most likely), click it and select "Headers"
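From there you can read the exact URL, method (GET or POST), and parameters the browser sent, and replay them with urllib. A minimal sketch, assuming the network panel revealed a POST with a form field named items_per_page (the real field name will differ from site to site):

import urllib.parse
import urllib.request

# Hypothetical field name taken from the network panel; substitute
# whatever the "Headers"/"Payload" view actually shows for your page.
url = "https://example.com/table-page"
data = urllib.parse.urlencode({"items_per_page": "400"}).encode("ascii")

req = urllib.request.Request(url, data=data)  # passing data= makes this a POST
with urllib.request.urlopen(req) as response:
    html = response.read().decode("utf-8")    # should now contain all 400 rows

If the panel shows a GET request instead, the same parameters simply go into the URL itself, e.g. urllib.request.urlopen(url + "?items_per_page=400").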
UPDATE: At the time of asking this question, this was related to the SignalR library and not plain WebSockets. I see correctly formatted messages now.
Is there any way to word-wrap messages in the WS tab in Chrome Developer Tools, or to display JSON with formatting? It's really annoying to scroll to the right to see the whole message.
Example with a message selected; its preview doesn't have any formatting or word wrapping applied:
Thank you in advance.
It's working fine here on Chrome/78.0.3904.97:
What I did:
Go to http://crawl.develz.org/play.htm
Open one of the listed servers
Start DevTools
Go to the Application tab and add a cookie called "no-compression" with value "yeah no" to the relevant server. (Any truthy string should work, I just chose the least confusing one I could think of in about a minute.)
Otherwise, crawl's webtiles server can end up compressing messages even when the browser supports RFC 7692's "permessage-deflate" extension, which ruins the demonstration.
Open the Network tab
Reload the page
Select the "socket" request, switch to the "Messages" tab, and pick a frame.
Start drilling down in the tree view in the bottom pane!
I'm creating a VCL application with Delphi 10.3 and want to support some web functionality by having the user enter the ISBN of a book into a TEdit component and passing this value to the search field on this website: https://isbnsearch.org, after which the website looks up the ISBN and displays the author of the book. I want to access the information (i.e. the author) presented in the search result and use it in my application.
This is my GUI, for a better idea of what I want to accomplish:
What code can I use for this? Any other feasible suggestions or approaches are acceptable.
When performing a search on that website, it simply loads a page with a specific URL query string...
https://isbnsearch.org/search?s=suess
The above example is when I search for "suess", so you can easily concatenate a search URL.
You can use any HTTP component, such as TIdHTTP, to load this search page, then use an HTML parser to scrape the page and read what you need. Much, much easier than trying to read through TWebBrowser.
In the end, you won't actually display the HTML (I mean you can if you want to), but the idea is to read the data and display it in your own format.
On that specific page, start by locating the ul element with id searchresults. Each li element inside it contains an individual result. Unfortunately, this website uses pagination and shows only 10 results per page. To fetch further pages, call the page again with an additional parameter: &p=2 for the 2nd page, &p=3 for the 3rd page, and so on (see the sketch below).
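The steps are the same in any language; purely as an illustration (not Delphi), here is a rough sketch in Python using the third-party requests and beautifulsoup4 packages. Treat the element names as assumptions that may break if the site's markup changes:

import requests
from bs4 import BeautifulSoup

def search_isbn_site(term, max_pages=3):
    results = []
    for page in range(1, max_pages + 1):
        # requests encodes the query string: ?s=<term>&p=<page>
        resp = requests.get("https://isbnsearch.org/search",
                            params={"s": term, "p": page})
        soup = BeautifulSoup(resp.text, "html.parser")
        container = soup.find("ul", id="searchresults")
        if container is None:  # past the last page of results
            break
        for li in container.find_all("li"):  # one li per book
            results.append(li.get_text(" ", strip=True))
    return results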
On the other hand, that is the worst way to acquire such information. What you should be doing is using a proper API that gives you machine-friendly data. The service you are referencing doesn't appear to offer one, but here's an example of one that does:
https://openlibrary.org/dev/docs/api/books - this also appears to provide MUCH more information than the site you're currently using.
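For example, a single GET request against that Books API returns JSON you can use directly, no HTML parsing needed (a sketch, shown in Python for brevity; the same single request works from TIdHTTP; see the linked documentation for the full response format):

import json
import urllib.request

isbn = "9780140328721"  # example ISBN
url = ("https://openlibrary.org/api/books"
       f"?bibkeys=ISBN:{isbn}&format=json&jscmd=data")

with urllib.request.urlopen(url) as response:
    data = json.load(response)

book = data.get(f"ISBN:{isbn}", {})
print(book.get("title"))
print([author["name"] for author in book.get("authors", [])])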
I am writing a program for managing an inventory. It serves up HTML based on records from a PostgreSQL database, or writes to the database using HTML forms.
Different functions (adding records, searching, etc.) are accessible using <a></a> tags or form submits, which in turn call handlers registered with http.HandleFunc(); those handlers generate queries, parse the results, and render them to HTML templates.
The search function renders query results to an HTML table. To keep the search results page usable and uncluttered, I intend to show only the most relevant information there. However, since many more details are stored in the database, I need a way to access that information too. To do that, I want each table row to be clickable, displaying the details of the selected record in a status area at the bottom or side of the page, for instance.
I could try to follow the pattern that works for the other functions, that is, use <a></a> tags and http.HandleFunc() to render new content, but this isn't exactly what I want, for a couple of reasons.
First: there should be no need to navigate away from the search results page to view the additional details; a single record's full data is small enough to render on the same page as the search results.
Second: I want the whole row clickable, not merely the text within a table cell, which is what the <a></a> tags get me.
Using the id returned from the database in an attribute, as in <div id="search-result-row-id-{{.ID}}"></div>, I am able to address individual records, but I have yet to find a way to capture a click on them in Go.
Before I run off and write this in JavaScript: does anyone know of a way to do this strictly in Go? I am not particularly averse to using the tried-and-true JS methods, but I am curious to see if it could be done without them.
does anyone know of a way to do this strictly in Go?
As others have indicated in the comments, no, Go cannot capture the event in the browser.
For that you will need to use some JavaScript to send to the server (where Go runs) the web request for more information.
You could also push all the required information to the browser when you first serve the page and hide/show it with CSS/JavaScript events, but again, that's just regular web development and has nothing to do with Go.
We have a web application that creates a web page. In one section of the page, a graph is displayed. The graph is created by calling a graphing program via an "img src=..." tag in the HTML body. The graphing program takes a number of arguments about the height, width, legends, etc., and the data to be graphed. The only way we have found so far to pass the arguments to the graphing program is to use the GET method. This works, but in some cases the size of the query string passed to the grapher is approaching the 2,083-character limit for URLs in Internet Explorer. I've included an example of the tag below. If the URL is too long, the query string is truncated and either the program bombs or, even worse, displays a graph that is not correct (depending on where the truncation occurs).
The POST method with an auto submit does not work for our purposes, because we want the image inserted on the page where the grapher is invoked. We don't want the graph displayed on a separate web page, which is what the POST method does with the URL in the "action=" attribute.
Does anyone know a way around this problem, or do we just have to stick with the GET method and inform users to stay away from Internet Explorer when they're using our application?
Thanks!
One solution is to have the page put the data into the session, then have the image-generation script pull from that session information. For example, the page stores the data in $_SESSION['tempdata12345'] and emits img src="myimage.php?data=tempdata12345"; myimage.php then pulls the data from the session (a sketch of the same idea follows below).
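The answer above describes this in PHP; here is a minimal sketch of the same idea in Python/Flask, with generate_graph() standing in for the real graphing program (both the helper and the parameter names are placeholders):

import uuid
from flask import Flask, session, request, render_template_string

app = Flask(__name__)
app.secret_key = "change-me"  # required for Flask sessions

@app.route("/page")
def page():
    # Stash the bulky graph parameters server-side under a short key.
    key = f"graphdata-{uuid.uuid4().hex}"
    session[key] = {"width": 800, "height": 400, "points": [1, 5, 3, 8]}
    # The img URL carries only the short key, never the full data.
    return render_template_string('<img src="/graph?data={{ key }}">', key=key)

@app.route("/graph")
def graph():
    params = session.pop(request.args["data"], None)
    if params is None:
        return "graph data expired", 404
    png = generate_graph(**params)  # hypothetical call to the graphing program
    return png, 200, {"Content-Type": "image/png"}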
One solution is to have the web application that generates the entire page pre-emptively call the actual graphing program with all the necessary parameters, and perhaps store the generated image in a /tmp folder. Then have the web application create the web page and send it to the browser with an "img src=..." tag that, instead of referring to the graphing program, refers to the pre-generated image (sketched below).
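A sketch of that pre-generation idea in Python (again with a hypothetical generate_graph() helper); hashing the parameters gives a stable filename, so identical requests reuse the cached image:

import hashlib
import json
import os

def graph_tag(graph_params):
    # Stable filename derived from the parameters themselves.
    key = hashlib.sha1(
        json.dumps(graph_params, sort_keys=True).encode("utf-8")
    ).hexdigest()
    path = f"/tmp/graphs/{key}.png"
    if not os.path.exists(path):
        os.makedirs("/tmp/graphs", exist_ok=True)
        with open(path, "wb") as f:
            f.write(generate_graph(**graph_params))  # hypothetical helper
    # /graphs/ is assumed to be served statically by the web server.
    return f'<img src="/graphs/{key}.png">'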
So I'm working on a web app, and I want to filter search results.
A nice restful implementation might look like this:
1. mysite.com/clothes/men/hats+scarfs
But let's say we want to Ajax up the filtering, like the cool kids, while retaining deep linking; we might use the URL fragment (the part after #) and parse that with JavaScript to show the correct listings:
2. mysite.com/clothes#/men/hats+scarfs
However, if someone clicks the first link with JS enabled, and then changes filters, we might get:
3. mysite.com/clothes/men/hats+scarfs#/women/shoes
Urk.
Similarly, if someone does not have JS enabled and clicks link 2, JS will not parse the options and the correct listings will not be shown.
Are Ajax deep links and non-Ajax links incompatible? It would seem so, as servers cannot parse the # part of a URL, since it is not sent to the server.
There's a monkeywrench being thrown into this issue by Google: A proposal for making Ajax crawlable. Google includes recommendations for URL structure there that may give you ideas for your own application.
Here's the wrapup:
In summary, starting with a stateful URL such as http://example.com/dictionary.html#AJAX, it could be available to both crawlers and users as http://example.com/dictionary.html#!AJAX, which could be crawled as http://example.com/dictionary.html?_escaped_fragment_=AJAX, which in turn would be shown to users and accessed as http://example.com/dictionary.html#!AJAX.
View Google's Presentation here (note: google docs presentation)
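The server side of that scheme is small: detect the _escaped_fragment_ parameter and return a pre-rendered snapshot. A sketch in Python/Flask, with render_snapshot() as a hypothetical helper:

from flask import Flask, request

app = Flask(__name__)

@app.route("/dictionary.html")
def dictionary():
    fragment = request.args.get("_escaped_fragment_")
    if fragment is not None:
        # Crawler request: Google rewrote #!AJAX to ?_escaped_fragment_=AJAX,
        # so serve a static, pre-rendered snapshot of that state.
        return render_snapshot(fragment)  # hypothetical helper
    # Normal users get the Ajax page; JavaScript reads location.hash itself.
    return app.send_static_file("dictionary.html")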
In general I think it's useful to simply turn off JavaScript and CSS entirely and browse your website and web application and see what ends up getting exposed. Once you get a sense of what's visible, you will understand what most search engines see and that in turn will show you what is and is not getting spidered.
If you go to mysite.com/clothes/men/hats+scarfs with JavaScript enabled, then your JavaScript should automatically rewrite that to mysite.com/clothes#men/hats+scarfs. When you click on a filter, the filters should be controlled by JavaScript, meaning you'll only change the hash rather than the entire URL (as you're going to return false anyway).
The problem you have is non-JS users going to your JS-enabled deep links, as the server can't determine that stuff. Unfortunately, the only thing you can do is take them to mysite.com/clothes and make them start their journey again (as far as I'm aware). You'll need to try to ensure that when people link to the site, they use the hardcoded deep link rather than the hashed deep link.
I don't recommend ever using the query string, as you are sending data back to the server without direct relevance to the previously specified destination. That is a corruptible security hole, as malicious code can be manually added to the query string to attempt an XSS or buffer-overflow attack against your web server.
I believe REST was intended to work with absolute URIs without a query string, because then you're specifying only the location of a resource, and it is that location that is descriptive and semantically relevant, in addition to the resource itself possibly being equally relevant. Even if there is no resource at the specified path, you have still instantiated a potentially unique and descriptive location that can be processed accordingly.
Users entering the site via deep links
Nonsensical links (like /clothes/men/hats#women/shoes) can be avoided if you construct your Ajax initialisation code in such a way that users who enter the site on filtered pages (e.g. /clothes/women/shoes) are taken to the /clothes page before any Ajax filtering happens. For example, you might do something like this (using jQuery):
$("a.filter")
.each(function() {
var href = $(this).attr("href").replace("/clothes/", "/clothes#");
$(this).attr("href", href);
})
.click(function() {
update_filter($(this).attr("href").split("#")[1]);
});
Users without JavaScript
As you said in the question, there's no way for the server to know about the URL fragment so filtering would not be applied for users without JavaScript enabled if they were given a link to /clothes#filter.
However, even without filtering, these links could be made more meaningful for non-JS users by using the filter strings as IDs in your /clothes page. To prevent this from messing with the Ajax experience, the IDs would need to be changed (or the elements removed) with JavaScript before the Ajax links were initialised.
How practical this is depends on how many categories you have and what your /clothes page contains.