Im wondering if there is anyway to get scores from a gamecast that uses javascript or flash to update the content dynamically. Here's an example: http://www.cstv.com/gametracker/launch/gt_wlacros.html?sport=wlacros&camefrom=&startschool=md&event=952412&school=cs&
How could I pull the scores from the teams out of this page?
Really what you need to do is sniff the requests being made under the hood. You can get any sort of HTTP sniffer. I use Live HTTP Headers extension for firefox. You start capturing, then click the link above. You'll see all sorts of requests. The underlying data seems to be coming from http://origin.livestats.www.cstv.com. I got this request that has a lot of useful player stats from the game:
http://origin.livestats.www.cstv.com/livestats/data/w-lacros/952412/player_stats.xml?344026907808
http://origin.livestats.www.cstv.com/livestats/data/w-lacros/952412/summary.xml?644493847800
(note the second url throws an XML parse error, but you could still try to parse it manually)
Related
I'm trying to get data from this site: [1] https://www.eurobet.it/it/scommesse/#!/calcio/?temporalFilter=TEMPORAL_FILTER_OGGI_DOMANI
I found this link where I can get the data in JSON format: [2] https://www.eurobet.it/detail-service/sport-schedule/services/discipline/calcio?prematch=1&live=0&temporalFilter=TEMPORAL_FILTER_OGGI_DOMANI
But there is a problem:
The JSON link Doesn't work every time in fact sometimes I get a 404 error.
I noticed that if I open the first link [1] before opening the second [2] it works perfectly.
This error is also more frequent when I try to scrape other data on the same site: [3] https://www.eurobet.it/detail-service/sport-schedule/services/discipline/calcio/piu-giocate/u-o-goal?prematch=1&live=0&temporalFilter=TEMPORAL_FILTER_OGGI_DOMANI
In this link [3] I try to get all "u-o-goal" odds but this link works only if (before starting my program to scrape data) in the main link [1] I press the "U/O GOAL" button -> https://i.stack.imgur.com/Nei5u.png
In my code, I'm using Java and htmlunit to scrape the data.
My question is: how this webpage works, why couldn't I open directly the links [2]/[3], I know that there is a sort of request and approval system behind but I can't see where.
You cannot directly open these URLs since the website (and many like it) will use cookies and bot-prevention techniques/session tracking so they can gather data about usage of their website. eg. they set a "Referer".
I'm not going to code a solution for you but I can at least help you understand what you need to do to get to where you want...
I've attempted to summarise how I'd typically unpick a request like this to recreate it, but in its essence, you need to understand the sequence of HTTP requests being made (this is how the web works - HTTP requests).
First you typically start with no session cookies and you access the site directly (no referer).
Once you access a website, typically the server responds with a session cookie for you to communicate back to the server a unique session ID so it has some sort of record of your browser having already been in contact.
Your browser may make more requests (asynchronously) and in doing so typically sends the cookies and the referring URL (usually the base Url will work... just don't use something that starts with something other than "https://www.eurobet.it"
anything else you're going to need to figure it out. Lots of headers are optional. Lots of query params have defaults.
https://stackoverflow.com/a/64671815/7619034 - here's an answer I've given before that answers this type of question which comes up often enough.
so to explain a bit further, for your specific scenario...
When you access https://www.eurobet.it/it/scommesse/#!/calcio/?temporalFilter=TEMPORAL_FILTER_OGGI_DOMANI, the server responds with HTTP headers:
...
set-cookie: __cfduid=dd38d***********41125; ...
...
The rest doesn't look that relevant:
Going straight to the other request: https://www.eurobet.it/detail-service/sport-schedule/services/discipline/calcio?prematch=1&live=0&temporalFilter=TEMPORAL_FILTER_OGGI_DOMANI
This HTTP request takes (as input):
cookie: __cfduid=dd38d***********41125; mbox=session#6661556c.....b6e8cc1fa6f03#1608242987; at_check=true; s_ecid=MCMID%***********2021453010; AMCVS_45F10C3A53DAEC9F0A490D4D%40AdobeOrg=1; AMCV_45F10C3A53DAEC9F0A490D4D%40AdobeOrg=1075005958%7CMCIDTS%7C18614%7CMCMID%7C91883906030825914429183258312021453010%7CMCAID%7CNONE%7CMCOPTOUT-1608248327s%7CNONE%7CvVersion%7C4.4.1; s_cc=true
...
referer: https://www.eurobet.it/it/scommesse/
...
x-eb-accept-language: it_IT
x-eb-marketid: 5
x-eb-platformid: 1
Cookies are set in an initial request (typically) using Set-Cookie header and then are passed back to the server in subsequent requests using the cookie header.
I'm not certain how many of these values are relevant but you'd need to figure out where each came from in the chain of HTTP requests between the initial one and this one and you'd need to replicate them (see url above of my previous answer - warning this can be time consuming).
The other headers can be set statically most likely since they probably aren't due to change.
If you have access to curl on the command line, you can attempt to reconstruct some of these requests by hand. Some will be time sensitive since cookies do expire after an amount of time (see set-cookie header details for exactly when). Once you've reconstructed a working request, you can then start coding it in your application.
If you can work all this out you should be able to re-construct the chain of HTTP GET requests to get the JSON data you want. Good luck!
I'm trying to replicate a request I make on a website (ie zoominfo.com) using the same http POST parameters using chrome rest console, but it fails for some reason. I'm not sure if there is a missing field or it's not working because the origin of the request isn't valid.. can someone point me out in the right direction? Below is a detailed explanation of the experiment:
ORIGINAL CASE
basically if I go to zoominfo.com (registered and all) I see a form page that I need to fill:
if I hit enter.. the site makes an ajax call. If I open the chrome web dev tools, and open the network tab, I see the details of the ajax call:
notice the body of the POST has the name John Becker in it:
{"boardMember":{"value":"Include","isUsed":true},"workHistory":{"value":"CurrentAndPast","isUsed":true},"includePartialProfiles":{"value":true,"isUsed":true},"personName":{"value":"john%20becker","isUsed":true},"lastUpdated":{"value":0,"isUsed":true}}
the response is shown under the respones tag:
WHAT I'M TRYING TO DO
basically replicate what i've done above using a REST console (note: so there is nothing illegal here.. i'm just replacing a chrome browser action with a rest client action.. i'm not hacking anyone and i'm not getting information I can't get the normal way, but if someone feels otherwise.. please let me know)..
so I plug in the same parameters as above into the rest console:
now i'm not sure about authentication.. but just to be safe, i entered the same user name and pwd i have for the site into the REST console:
but then I keep on getting an error as a response to my rest console's request:
UPDATE: CORRECT ANSWER:
so according to JMTyler's answer.. I had to simply include criteria in the RAW body, and convert it to url encoding.. in addition to that, I had to explicitly set the encoding in the rest console body..
looking at the chrome inspector more closely, it turns out that I simply had to click on view source:
to get the url-encoded value that I needed to put in the RAW body in the rest console:
I also had to set encoding to gzip,deflate,sdch and things worked fine!
The form is posting all that JSON under the field criteria. You can see this in the screencap of the chrome dev console you posted.
Just start your raw body in rest console with criteria= and make sure the json has been url-encoded. That should do it.
No authentication is needed because none is passed through the headers in your screencap. Any cookies you have when you load the page normally will also be loaded through rest console, so you don't need to worry about explicitly setting them.
Reading your problems I'll make an educated guess:
zoominfo does not provide an RESTful API.
Rest-Console understands and uses HTTP Authentication, which is different from the authentication handler zoominfo implemented.
A possible way to work around may be:
Make a call to the login-page via rest console. you'll get back cookies and a lot more.
In subsequent requests to zoominfo be sure to include those cookies (likely holding some session information) in your request, therefore acting like a browser.
I want to make a turn based game (Something like Checkers) with the help of Servlets and jsp pages.I created a page that has a newGame button that redircet to the gamePage(It redirect the first into a Black.jsp and the other request will be redirected to Red.jsp).
My problem is ,how could I refresh the other jsp automaticaly if one of them changed.
Note:After the change in one of the jsp it redirect the request to servlet and servlet update the changed jsp graphics.but the other jsp stay inactive.I want to make it active.
Thank You
It sounds like what you need is Comet. Here's an overview of how it works.
http://www.ibm.com/developerworks/web/library/wa-cometjava/
Basically, the "other" user's browser will send a request to a servlet to get an update, but that request won't receive receive its response until the current player makes a move. This gets around the problem posed by the fact that, with traditional HTTP, the browser always has to be the one sending the request to the server, it can't be the other way around.
There are some variations on the technique. Now that you know the name, I'm sure you'll be able to find lots of useful information about it.
There's another technology called WebSocket which can also serve this purpose, but it requires additional capability built into the browser and, as of now, probably not all of your users will be using compatible browsers.
I am trying to understand how POST and/or GET methods work in terms of the actual browser.
I am attempting to contact an API which requires API key, method you wish to use on their side, and an IP address at the minimum.
My original idea was to do something like this:
I feel like I'm on the right track, it does something and gets an error as opposed to telling me the page does not exist. I'm expecting either JSON or XML in response as the API supports both but instead I get this error:
This page contains the following errors:
error on line 1 at column 1: Document is empty
Below is a rendering of the page up to the first error.
Upon studying the documentation of the API more, I found something saying that methods are called using HTML form application/x-www-form-urlencoded and the resuource models are given as form elements.
I tried researching what that means to see what the problem was and found this site http://www.w3.org/TR/xforms11/ but I'm still unclear.
Ideas?
It seems to mean that the application is expecting a POST method but you're doing a request with a GET method (when you use the querystrings).
Since you can't just do browser requests using POST using the address bar, you may need to:
Construct a simple JS function that does a xmlhttprequest request using that method instead, and running it from the console;
Create a simple HTML page that automates the above process, allowing you do make POST calls;
Using CURL instead, which is a great tool for testing those kinds of requests.
I have an embedable widget. For each impression, I would like to track the referrer (the page where the widget is embedded onto). Right now I am using ExternalInterface to use javascript to check window.location.href when its available, however, I am finding that most of the time I am unable to set the referrer.
Is there a better way to do this? Or perhaps am I not using javascript correctly to get the referrer?
Thanks!
I don't think you can directly get it in this way. There are a couple options I can think of:
Get the referrer from your web server HTTP logs. Apache for example logs referrer info by default.
Have people include some referral code in their widget request, that you can use to identify where it came from.
Make a request from your widget back to your server...I think this request will contain the HTTP Referrer field pointing at where it is embedded
Use something like [swfmill][1] to embed the referrer into the actual SWF itself when it is requested from your server...but this might have too much performance overhead.