Selenium html.text returning incomplete results - html

My problem is simple: the URL below is for a LinkedIn page that contains 25 jobs. The code seems to be working, but I'm surprised that it returns some of the jobs rather than all of the jobs displayed on the page. Why is it returning only half of the results? Can anyone provide some suggestions, as I'm new to this? I've heard about requests, but I want to do it using Selenium only. I'm grateful for any help! Thanks.
here's the url
url = "https://www.linkedin.com/jobs/search/?f_TP=1&keywords=forecast&location=Toronto%2C%20Canada%20Area&locationId=ca%3A4876&sortBy=DD&start=0"
Here's my code:
body_html = driver.find_element_by_xpath("/html/body")
print(body_html.text)

Related

How do I find an element from a web page and import it into a Google Sheets cell?

I'm attempting to scrape from this web page here: https://explorer.helium.com/accounts/14Jydka1ufeZBXAHNmjK9SWedvWtufdaJRbEMgtF8Bifc6Dv7Gm
I'm trying to find out how to pull the number below "Rewards(24H)" and import it into a cell.
I've tried using ImportXML function but it gave me the "Imported content is empty" error.
After doing some research, I think the element is not server-side because I can't find it in the source code. So I opened up the Developer Tools for the page, clicked the Network Tab and refreshed the page.
I filtered the results to see only the XHR parts. Clicking the Headers tab will display a number of APIs in the Request URL section.
This is as far as I have gotten. I cannot find any reference to the Rewards(24H) number in any of the JSON code.
It'd be much appreciated if anyone can explain how I can find that number and import it into a Google Sheets cell, preferably self updating every hour.
Thanks!
After looking at the network requests for the above URL, the data is coming from api.helium.io, so you can follow the code below to get your desired data. FYI, that website already provides a public API, with docs here: API Documentation. If you need other data, you can use that as well. As for why you couldn't find the number: my guess is that the value on the page is rounded to 2 decimal places, whereas the API returns 5 to 6, so the displayed number never appears verbatim in the XHR responses.
Code:
import datetime

import requests

# Request day-sized reward buckets covering the last 60 days.
dt = datetime.datetime.now(datetime.timezone.utc)
querystring = {"min_time": "-60 day",
               "max_time": dt.strftime("%Y-%m-%dT%H:%M:%SZ"),
               "bucket": "day"}

response = requests.get("https://api.helium.io/v1/accounts/"
                        "14Jydka1ufeZBXAHNmjK9SWedvWtufdaJRbEMgtF8Bifc6Dv7Gm"
                        "/hotspots").json()
for data in response["data"]:
    print(f"name: {data['name']}")
    internal_data = requests.get(
        f"https://api.helium.io/v1/hotspots/{data['address']}/rewards/sum",
        params=querystring).json()
    # Buckets come back newest first, so index 0 is the most recent 24 hours.
    last24hour = internal_data["data"][0]["total"]
    last7days = sum(internal_data["data"][i]["total"] for i in range(7))
    last30days = sum(internal_data["data"][i]["total"] for i in range(30))
    print(f"24H : {round(last24hour, 2)}")
    print(f"7D  : {round(last7days, 2)}")
    print(f"30D : {round(last30days, 2)}")
Let me know if you have any questions :)

Newsletter2Go endpoint /forms/submit{{form_id}}

I created a wrapper for using the newsletter2go endpoint
https://api.newsletter2go.com/forms/submit/{{form_id}}
When I POST a request to that endpoint with a form_id, I get the following response:
http status: 400
code: 10020
error message: Bad Request (invalid code xxxx)
I'm using the id from the n2go backend.
Can anyone tell me what error code 10020 is about? The API docs contain nothing about it.
This is probably because you used uppercase letters in your form_id. I had the same problem; support told me that only lowercase letters are allowed.
I finally found out what the form_id has to be. It's the value of the n2g argument you see in the opt-in email, for example:
https://subscribe.newsletter2go.com?n2g=dummy-code-here
Otherwise, check the source code of the embedded form; there is a script in it which contains the form id.
Maybe this helps someone one day.
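Putting the answers together, a minimal sketch of calling the endpoint in Python: the key point is lowercasing the n2g code before building the URL. The payload shape below is an assumption; check your embedded form's source for the exact field names your form expects.

```python
import requests

def build_submit_url(form_id):
    """Build the Newsletter2Go submit URL. The API rejects uppercase form ids
    with error 10020, so normalize to lowercase first."""
    return "https://api.newsletter2go.com/forms/submit/" + form_id.strip().lower()

def submit(form_id, email):
    # Hypothetical payload; inspect your embedded form for the real field names.
    resp = requests.post(build_submit_url(form_id),
                         json={"recipient": {"email": email}},
                         timeout=10)
    return resp.status_code, resp.text
```

For example, `build_submit_url("Dummy-Code-Here")` yields `https://api.newsletter2go.com/forms/submit/dummy-code-here`, which is the form the API accepts.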

SocketTimeoutException for Docs API

I am using the Google Document AclFeed API to get the list of users with whom the document is shared. This is the code I am using:
AclFeed aclFeed = docsService.getFeed(new URL("https://docs.google.com/feeds/default/private/full/file%3A"+fileId+"/acl"), AclFeed.class);//No I18N
Until last week everything was working fine, but for the past two days I have been getting a SocketTimeoutException for many requests. Is anyone else facing this issue? Any help would be appreciated.
TIA,
VijayRaj
It happens a lot. What is your timeout set to? For example, the default on App Engine is 5 s, which is far too low.

Does SWFJunkie Tweetr's PHP Proxy still work?

I've recently been asked to help support a system which uses the PHP Proxy provided by SWFJunkie. In following the proxy's install steps I get to Step 4, at which point I get no results. I've tried hosting on Apache and IIS, have confirmed that URL rewrites work correctly on both platforms and that cURL is correctly installed, and have ensured there are no firewalls blocking my requests.
In researching this issue I've seen that Twitter have been changing their API rules (with more changes set to take place in March next year). The last activity I can see on the Proxy project was back in 2010, after which it seems to have gone dead. Before I put in more effort trying to get this to work I thought I'd ask - is anyone else currently using this / do you have it working?
If you have it working I'd welcome any tips / advice you have also, but mainly I just want to know in advance whether this utility still works in order to justify spending time on it.
Thanks in advance.
The proxy does still work, but I had to tweak the code a little.
Thanks to @MikeHayes for the solution: Twitter OAUTH - returns response code of "0"
For anyone else having this issue, the quick fix is to update Tweetr.php. Find the code matching what's below and insert the additional (commented) line at the end.
$opt[CURLOPT_URL] = $twitterURL;
$opt[CURLOPT_USERAGENT] = $this->userAgent;
$opt[CURLOPT_RETURNTRANSFER] = true;
$opt[CURLOPT_TIMEOUT] = 60;
//BEGIN UPDATED CODE
// Disabling peer verification works around the certificate error, but it
// leaves the connection open to man-in-the-middle attacks; pointing cURL at
// an up-to-date CA bundle (CURLOPT_CAINFO) is the safer long-term fix.
$opt[CURLOPT_SSL_VERIFYPEER] = false;
//END UPDATED CODE
Thanks again to Mike & good luck to anyone else currently having this issue.

bungie.net stats api

Has anyone tried accessing the bungie.net Reach stats API (statistics from Halo matchmaking)?
As described here: http://www.bungie.net/fanclub/statsapi/Group/Resources/Article.aspx?cid=545064
I can't seem to get any data returned. For example, if I use this (with correct API key and gamertag values, of course), ignoring the first 2 asterisks ...
**http://www.bungie.net/api/reach/reachapijson.svc/player/details/byplaylist/MyIdentifierAPIkey/Gamertag
I don't receive a response, but no errors either. Am I doing something wrong?
I'm looking to use this for a Titanium (Appcelerator) app eventually.
Any help or advice welcome, thanks in advance.
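For what it's worth, the request described in the question can be sketched in Python as below. This is hypothetical (the endpoint shape is taken verbatim from the question, not from working experience with the API); the main practical point is percent-encoding the gamertag and surfacing HTTP errors instead of failing silently.

```python
import requests

BASE = "http://www.bungie.net/api/reach/reachapijson.svc"

def player_details_url(api_key, gamertag):
    """Build the 'player details by playlist' URL from the question.
    Spaces in a gamertag must be percent-encoded, or the server may 403."""
    return (f"{BASE}/player/details/byplaylist/"
            f"{api_key}/{requests.utils.quote(gamertag)}")

def fetch_player_details(api_key, gamertag, timeout=10):
    resp = requests.get(player_details_url(api_key, gamertag), timeout=timeout)
    resp.raise_for_status()  # raises on 403/500 instead of returning nothing
    return resp.json()
```

Raising on error status at least tells you whether the server rejected the request or simply returned an empty body.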
Unfortunately the API is not yet live to the public. I asked in a Private Message. They didn't say when it would be live.
In the docs that Achronos posted, he put spaces in the URLs. I'm not sure whether those are supposed to be there, so I tried it with the spaces and got a 403 Forbidden error page. When I remove all the spaces, I get an error page that says:
Request Error
The server encountered an error processing the request. See server logs for more details.
I kinda can't check the server logs, though... Bungie did say they were having some issues with the site, so this might be a byproduct of that. I want them to get it working soon though, I wanna see just what it can do!