Python 3: Download an image from a webpage (HTML)

How can I download the image from this URL: https://thispersondoesnotexist.com/ ?
I know the page doesn't expose a direct image URL, but I hope to get some images from there.
My idea is to write code that fetches several images from this URL.

Duplicate: Loop through webpages and download all images
Python 2
Using urllib.urlretrieve
import urllib
urllib.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")
Python 3
Using urllib.request.urlretrieve (part of Python 3's legacy interface, works exactly the same)
import urllib.request
urllib.request.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")
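For this particular site, a sketch under two assumptions: thispersondoesnotexist.com serves a freshly generated face on every request, and (in my experience) some such servers reject the default Python-urllib User-Agent, so a browser-like header is set explicitly:

```python
import urllib.request

# The User-Agent header is an assumption -- some servers return 403
# to the default "Python-urllib" agent string.
def download_face(filename, url="https://thispersondoesnotexist.com/"):
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp, open(filename, "wb") as out:
        out.write(resp.read())

if __name__ == "__main__":
    # Each request should save a different generated image.
    for i in range(3):
        download_face("face_{0:02d}.jpg".format(i))
```

Since each GET returns a new image, looping is all it takes to collect several of them.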

Related

Getting Image through HTML and using Python Code

I am trying to make a website that takes the user's image and runs
face-landmark code (Python) to tell the user about their face shape, and so on.
How can I get the image through HTML, use the image file in Python code, and show the result back to the user? Is using Django the only way? I have tried to study Django in many ways, and most of the material I found did not directly help with the website I am planning. Thank you for reading.
You can use this code to embed the image directly in your HTML (Python 3):
import base64
data_uri = base64.b64encode(open('Graph.png', 'rb').read()).decode('utf-8')
img_tag = '<img src="data:image/png;base64,{0}">'.format(data_uri)
print(img_tag)
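A self-contained variant of the same idea, with the file read replaced by in-memory bytes (the payload below is a stand-in, not a real PNG):

```python
import base64

def to_img_tag(image_bytes, mime="image/png"):
    # Build a data URI so the image can be embedded straight into
    # the HTML response without serving a separate file.
    data_uri = base64.b64encode(image_bytes).decode("utf-8")
    return '<img src="data:{0};base64,{1}">'.format(mime, data_uri)

# Stand-in bytes for illustration; in practice pass
# open('Graph.png', 'rb').read() as in the snippet above.
fake_png = b"\x89PNG\r\n\x1a\nnot-a-real-image"
print(to_img_tag(fake_png))
```

This works for any MIME type the browser can render (image/png, image/jpeg, ...), so the same helper covers the face-landmark result image as well.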

Trying to get the html on an open page

I am trying to make a bot that can play Cookie Clicker. I have successfully opened the website using the webbrowser module. When I use the developer tools to see the HTML, I can see the information I want to obtain, such as how much money I have, how expensive items are, etc. But when I try to get that information using requests and BeautifulSoup, it instead gets the HTML of a new window. How can I make it so that I get the HTML of the already opened tab?
import webbrowser
webbrowser.open('https://orteil.dashnet.org/cookieclicker/')
from bs4 import BeautifulSoup
import requests
def scrape():
    html = requests.get('https://orteil.dashnet.org/cookieclicker/')
    print(html)
scrape()
requests.get downloads a fresh copy of the page; it cannot see the tab that webbrowser opened. To read the HTML of a live, rendered page you need to drive the browser itself, e.g. with Selenium:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('https://orteil.dashnet.org/cookieclicker/')
body_element = browser.find_element_by_xpath("//body")
body_content = body_element.get_attribute("innerHTML")
print(body_content)
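Once you have the rendered HTML in body_content, the question's BeautifulSoup import can still be put to work on it; a small helper (the id "cookies" below is a guess at how the game labels its counter element, so verify it in the developer tools first):

```python
from bs4 import BeautifulSoup

# Parse HTML captured from the live tab (e.g. Selenium's innerHTML)
# rather than re-fetching the page with requests.
def element_text(html_fragment, element_id):
    soup = BeautifulSoup(html_fragment, "html.parser")
    node = soup.find(id=element_id)
    return node.get_text(strip=True) if node else None

sample = '<div id="cookies">42 cookies</div>'  # stand-in for real page HTML
print(element_text(sample, "cookies"))
```

The same helper then works for prices and any other stat visible in the inspector.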

missing HTML information when using requests.get

I am trying to scrape surfline.com using Python 3 with BeautifulSoup and requests, using the bit of code below. I am using Spyder 3.7, and I am fairly new to web scraping.
import requests
from bs4 import BeautifulSoup
url = 'https://www.surfline.com/surf-report/salt-creek/5842041f4e65fad6a770882e'
r = requests.get(url)
html_soup = BeautifulSoup(r.text,'html.parser')
print(html_soup.prettify())
The goal is to scrape the surf height for each day. Using inspect, I found the HTML section that contains the wave height (screenshot of the Surfline website & HTML). When I run the code, it prints out the HTML of the website, but when I Ctrl+F for the section I want to scrape, it is not there. My question is: why is it not being printed out, and how do I fix it? I am aware that some websites use JavaScript to load data onto the page; is that the case here?
Thank you for any help you can provide.
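A quick way to test that suspicion is to check whether the text you saw in the inspector appears in the raw response at all ("surf-height" below is a hypothetical marker string; substitute whatever class or text you actually saw):

```python
# If the marker text from the browser inspector is absent from the raw
# response, the data is injected by JavaScript after page load, and
# requests alone will never see it.
def appears_in_static_html(html_text, marker):
    return marker in html_text

if __name__ == "__main__":
    import requests  # imported here so the helper stays dependency-free
    url = "https://www.surfline.com/surf-report/salt-creek/5842041f4e65fad6a770882e"
    page = requests.get(url).text
    if not appears_in_static_html(page, "surf-height"):  # hypothetical marker
        print("Marker not in static HTML - likely rendered by JavaScript")
```

If the marker is missing, a JavaScript-rendering approach (Selenium, or Splash as described in the last answer of this page) is needed instead of plain requests.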

Selenium login to page with python 3.6 can't find element by name

Today I tried to write a bot for the ytpals.com webpage.
I am using the Python Selenium library.
What I am trying to do first is log in to the page with my YouTube channel ID.
But whatever I do, I cannot find the 'channelid' element.
On top of this, the page sometimes doesn't load fully...
By the way, it worked for me with other pages when finding an input form, but this page... I can't understand it.
Maybe someone has a better understanding than me and knows how to log in to this page?
My simple code:
import time
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('https://www.ytpals.com/')
search = browser.find_element_by_name('channelid')
search.send_keys("testchannel")
time.sleep(5) # sleep for 5 seconds so you can see the results
browser.quit()
So I found a solution to my problem.
I downloaded Selenium IDE, and I can use it as a debugger, such a great tool!
If someone needs it, here is the link:
https://www.seleniumhq.org/docs/02_selenium_ide.jsp

Website hiding page footer from parser

I am trying to find the donation button on the website of
The University of British Columbia.
The donation button is located in the page footer, within the div classed "span7".
However, when scraped, the HTML yielded the div with nothing inside it.
My program works perfectly with direct div as source:
from bs4 import BeautifulSoup as bs
import re
site = '''<div class="span7" id="ubc7-footer-menu"><div class="row-fluid"><div class="span6"><h3>About UBC</h3><div>Contact UBC</div><div>About the University</div><div>News</div><div>Events</div><div>Careers</div><div>Make a Gift</div><div>Search UBC.ca</div></div><div class="span6"><h3>UBC Campuses</h3><div>Vancouver Campus</div><div>Okanagan Campus</div><h4>UBC Sites</h4><div>Robson Square</div><div>Centre for Digital Media</div><div>Faculty of Medicine Across BC</div><div>Asia Pacific Regional Office</div></div></div></'''
html = bs(site, 'html.parser')
link = html.find('a', string=re.compile('(?i)(donate|donation|gift)'))
#returns proper donation URL
However, using the live site does not work:
from bs4 import BeautifulSoup as bs
import requests
import re
site = requests.get('https://www.ubc.ca/')
html = bs(site.content, 'html.parser')
link = html.find('a', string=re.compile('(?i)(donate|donation|gift)'))
#returns none
Is there something wrong with my parser? Is it some-sort of anti-scrape maneuver? Am I doomed?
I cannot seem to find the 'Donate' button on the URL that you provided, but there is nothing inherently wrong with your parser; it's just that the GET request you send only gives you the HTML initially returned in the response, rather than waiting for the page to fully render.
It appears that parts of the page are filled in by JavaScript. You can use Splash, which is designed to render JavaScript-based pages. You can run Splash in Docker quite easily, and just make HTTP requests to the Splash container, which will return HTML that looks just like the webpage as rendered in a web browser.
Although this sounds overly complicated, it is actually quite simple to set up since you don't need to modify the Docker image at all, and you need no previous knowledge of Docker to get it to work. It requires just a single line from the command line to start a local Splash server:
docker run -p 8050:8050 -p 5023:5023 scrapinghub/splash
You then just modify any existing requests in your Python code to route through Splash instead:
i.e. http://example.com/ becomes
http://localhost:8050/render.html?url=http://example.com/
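A sketch of that routing in Python, assuming the Splash container started by the docker command above is listening on localhost:8050:

```python
import re

def splash_url(target, endpoint="http://localhost:8050/render.html"):
    # http://example.com/ -> http://localhost:8050/render.html?url=http://example.com/
    return "{0}?url={1}".format(endpoint, target)

if __name__ == "__main__":
    import requests
    from bs4 import BeautifulSoup as bs
    # Same scrape as before, but Splash executes the page's JavaScript
    # first and returns the fully rendered HTML.
    site = requests.get(splash_url("https://www.ubc.ca/"))
    html = bs(site.content, "html.parser")
    link = html.find("a", string=re.compile("(?i)(donate|donation|gift)"))
    print(link)
```

With the rendered HTML, the original find call should locate the footer link just as it did against the hard-coded snippet.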