Where is the Data stored on Website - html

I am at this website -
http://www.zoominfo.com/s/#!search/company/1.64.eyJjb21wYW55TmFtZSI6xIB2YWx1xIw6ImEiLCJpc1VzZWTEjXRyxJN9fQ%3D%3D
If you see the company name - Agilent Technologies Inc.
Its neither there in page source, nor in any json format.
But it does show in the Dom of Chrome Developer tool.
I have looked and analysed almost every requests that it sent, but still couldn't find where this data is saved.
By where the data is saved - I am looking to find where I can scrape that data from?
If by using python-requests and BeautifulSoup
I do see an XMLHTTPREQUEST made, not sure what that means, or if that is the clue to my answer.
I am still learning python, and it would be a very useful information if someone helps me with this.
Thanks in advance.

After the HTML is loaded, js requests for the data through an XMLHTTPREQUEST which is loaded right after the request is received on your client. That's why you see the DOM element right there using element inspector.
You didn't mention what goal you want to achieve or what tool you are using. Please be specific on your question. If you do not have any idea about this kind of pattern, google out angularjs, see some example.

do see an XMLHTTPREQUEST made, not sure what that means, or if that is the clue to my answer.
It means that javascript embedded in the page is sending an extra HHTP request to the web server. It is likely that the "Agilent Technologies Inc." text is being returned in the server's response to that request, and the javascript in the page is then injecting the text into the DOM in the appropriate place.
Where is the Data stored on Website
That is a completely different question ...
(You have already noted that the data (e.g. the company name) gets injected into the page displayed by your browser.)
On the server side, the data could be stored in the web server (or its back-end systems) in a variety of ways. Or it might not be stored at all. There is no way of knowing ... without looking at the server-side code and configurations.

Related

Is there a way to scrape data from a website that is not available in the page's source?

What are the few things that I'll have to include in my code that will point me in the right direction?
For Example this website
Open your browser's debugger on Network tab and observe what are the requests when site is loading dynamic content (when you click). You'll see it's getting all the data using some API, for example: https://www.bestfightodds.com/api?f=ggd&b=3&m=16001&p=2
You can download all the data by changing parameters in this URL.
Usually that's enough but here it's more tricky as the data returned by the server is somehow encoded and not easily readable. You'd have to debug its javascript to find function which is used to decode this data before you can parse it.

"Reverse" JSON Status API

I've been wondering how to fetch the PlayStation server status. They display it on this page:
https://status.playstation.com/en-us/
But PlayStation is known to use APIs instead of PHP database fetches. After looking around in the source code of the site, I found that they have a separate file called /data.json.
https://status.playstation.com/en-us/data.json
The content of this file is the same as the index file (for some reason). They use stuff like {{endDateTitle}} and {{message}}, but I can't find where it's defined, if it's pulled using a separate file or just pulled from a database using PHP.
How can I "reverse" this site and see if there's a API I can use to display the status on my site?
Maybe I did not get the question right, but it seems pretty straightforward.
If using firefox, open Developer tools, Network. Reload the page.
You can clearly see the requested URL
https://status.playstation.com/data/statuses/region/SCEA.json
It seems that an empty list as a status means "No problems" (since there are no problems I cannot verify this assumption. That's all
The parenthesis {{}} are used by various HTML templating languages, like angular, so you'd have to go through the js code to understand where they get updated.

Drupal 7 (VERY) Custom Preview

I have a drupal site that is being used strictly as a CMS that produces JSON feeds using services and services_views, which are consumed by a separate site. What I would like to do (and I have a working proof of concept of this) is allow for a "live preview" on the real site, by intercepting the node form preview / submit, encoding the node as JSON, and loading a special page on the live site that consumes that JSON and displays the page accordingly.
The problem with this JSONized node is, it's different from the JSON being produced by my view (using services_views). My end goal is to produce JSON that is identical for both previewed and non-previewed objects, without having to maintain separate output methods (I could easily hand-customize the json but then when my view for the public api changes I have to make the same changes to the preview json. Trying to avoid this).
I'm looking for feedback on this approach. Is what I'm attempting even possible? The ideas I've been able to come up with so far are:
being able to (conditionally) drive my view with data from a non-databse source
sneakily inserting data into the view object during one of the stages of execution? Kludgy but I'm not above that :)
saving a "clone" node (or revision?) of the node being previewed and let the view use that to display the preview JSON?
Maybe this is the wrong approach altogether and there's something better? (Trying to intercept and format the services output in my module... maybe avoid services_views altogether?)
If anyone can offer some advice, insight or opinions on how to best proceed here, I'd be really grateful.
in a custom module, you could set up a page that grabs the json output from the view page.
$JSON = file_get_contents($url);
that way the preview stays bound to the view, even if the view changes.
First I think it's not an easy task what you are trying to achieve. So before all, good luck.
I think you could intercept the node submission data, then create a node programatically, then render that node, and then export the rendered node to JSON. Inmediately after you get the JSON, delete this node, because the programmatically created node is only for preview.
This task could be more CPU demanding but think that previewing content exactly as the content will look is difficult.
Your rss feeds that your site reads could be filtered with some parameter to avoid programmatically created nodes (prewiew nodes), despite these nodes will be available for a very short time.

Could we pass GET data to css?

I just came across a website pagesource and saw this in the header:
<link href="../css/style.css?V1" rel="stylesheet" type="text/css" />
Could we actually pass GET data to css? I tried searching but found no results apart from using PHP. Could anyone help make meaning of the ?V1 after the .css
I know this forum is for asking programming problems, however I decided to ask this since I have found no results in my searches
First of all, no you can't pass GET parameters to CSS. Sorry. That would have been great though.
As for the example url. It can either be a CSS page generated by any web server (doesn't have to be PHP). In this case the server can serve different pages or versions of the same page which might explain the meaning of V1, Version 1. The server can also dynamically generate the page with a server-side template. This is an example from the Jade documentaion:
http://cssdeck.com/labs/learning-the-jade-templating-engine-syntax
It can also just be used as cache buster, for versioning purposes. Whenever you enter a url the browser will try to fetch it only if it doesn't already have a cached copy which is specific to that URL. If you have made a change in your content (in this instance the css file) and you want the browser to use it and not the cached version you can change the url and trick the browser to think it's a new resource that is not cached, so it'll fetch the new content from the server. V1 can then have a symantic meaning to the developer serving as a note (ie I've changed this file once...twice..etc) but not actually do anything but break the cache. This question addresses cache busting.
There are different concepts.
At first, it only is a link - it has a name, it might have an extension, but this is just a convention for humans, and nothing more than a resource identifier for the server. Once the browser requests it, it becomes a server request for a resource. The server then decides how to handle this request. It might be a simple file it just has to return, it might be a server side script, which has to be executed by a server side scripting interpreter, or basically anything else you can imagine.
Again, do not trick yourself in thinking "this is a CSS file", just because it has a css extension, or is called style.
Whatever runs at the server, and actually answers the request, will return something. And this something then is given a meaning. It might be CSS, it might be HTML, it might be JavaScript, or an image or just a binary download. To help the browser to understand what it is, the server returns a Content-Type header.
If no content type is given, the browser has to guess what it is. Or the nice web author gave a hint on what to expect as response - in this case he gave the hint of text/css. Again, this is how the returned content should be interpreted by the client/browser, not how that content is supposed to created on the server side.
And about the ?V1? This could mean different things. Maybe the user can configure a style (theme) for the website and this method is used to dispatch different styles. Or it can be used for something called "cache busting" (look it up).
You can pass whatever you want; the server decides what to do with the data.
After all, PHP isn't your only option for creating a server. If i wrote a server in Node.js, set up a route for /css/style.css and made it return different things depending on what query was given, neither the server nor browser will bat an eyelid.

How does the add comment option work on this site?

I want an option like add comment in my forum to let the users post a quick reply.
Вован Путин -
I think you mean the 'Add Comment' option.
It's some AJAX stuff. That's how it works basically :
Some lines of
JavaScript calls a server-side
page, giving it some arguments, like
content of the comment, author, etc.
Server do some work, here it
certainly adds some rows in the
database, on the comments table..
Server give an answer to the client :
in PHP everything that is echoed is
returne to the client.
Client receive the answer, and
manipulate the page, usually via
the DOM, add the HTML that show
the comment, refresh some stuff,
etc.
See here for tutorial and overview of this technology.
Add comment is done using AJAX.
From wikipedia
Ajax, sometimes written as AJAX
(shorthand for asynchronous JavaScript
and XML), is a group of interrelated
web development techniques used on the
client-side to create interactive web
applications or rich Internet
applications. With Ajax, web
applications can retrieve data from
the server asynchronously in the
background without interfering with
the display and behavior of the
existing page. The use of Ajax has led
to an increase in interactive
animation on web pages1[2] and
better quality of Web services due to
the asynchronous mode. Data is usually
retrieved using the XMLHttpRequest
object. Despite the name, the use of
JavaScript and XML is not actually
required, nor do the requests need to
be asynchronous.[3]
It works via HTTP.