How to load images from URL using PyTorch - deep-learning

I want to load the images using PyTorch.
I have a dataset of image URLs with corresponding labels (offer_id is the label).
Is there an efficient way of doing this in PyTorch?

This should work if the image URL is public, using Pillow, requests and torchvision together:
from PIL import Image
import requests
import torchvision.transforms as transforms

url = "https://example.jpg"
# Stream the response and open it directly as a PIL image
image = Image.open(requests.get(url, stream=True).raw)
transform = transforms.Compose([transforms.PILToTensor()])
torch_image = transform(image)
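Note that PILToTensor keeps the raw uint8 pixel values; if you want a float tensor scaled to [0, 1] instead, substitute transforms.ToTensor().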

You can use the requests package:
import requests
from PIL import Image
import io

# df1 is your dataframe of image URLs; fetch the raw bytes of the first one
response = requests.get(df1.URL[0]).content
im = Image.open(io.BytesIO(response))
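From here you can apply the same PILToTensor (or ToTensor) transform as in the first snippet to get a PyTorch tensor.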

You may convert your image URLs to files first by downloading them into one folder per label; you will certainly find a way to do so. Then you can check what you have:
%%time
import glob

# One sub-folder per label, e.g. /content/imgs/<label>/<image>.png
f = glob.glob('/content/imgs/**/*.png')
print(len(f), f)
You then need an image loader that reads an image from disk, here pil_loader:
import torchvision
from PIL import Image

def pil_loader(path):
    with open(path, 'rb') as f:
        img = Image.open(f)
        return img.convert('RGB')

ds = torchvision.datasets.DatasetFolder('/content/imgs',
                                        loader=pil_loader,
                                        extensions=('.png',),  # note the tuple
                                        transform=t)  # t is your transform pipeline
print(ds)
You may check how I did that for CIFAR-10, in the section "From PNGs to dataset".
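If you would rather stream the images straight from their URLs instead of downloading them first, a minimal sketch of a custom Dataset could look like the following (the urls and labels lists and the transform are assumptions, and there is no caching or error handling):

import io
import requests
from PIL import Image
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as transforms

class URLImageDataset(Dataset):
    """Fetches each image over HTTP when it is indexed."""
    def __init__(self, urls, labels, transform=None):
        self.urls = urls          # list of image URLs
        self.labels = labels      # e.g. the offer_id values
        self.transform = transform

    def __len__(self):
        return len(self.urls)

    def __getitem__(self, idx):
        resp = requests.get(self.urls[idx], timeout=10)
        img = Image.open(io.BytesIO(resp.content)).convert('RGB')
        if self.transform:
            img = self.transform(img)
        return img, self.labels[idx]

# Usage sketch: num_workers lets several images download in parallel
# ds = URLImageDataset(urls, labels, transform=transforms.ToTensor())
# dl = DataLoader(ds, batch_size=32, num_workers=4)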

Related

Web-scraping a link from web-page

New to web-scraping here. I basically want to extract a link from a web page into my Jupyter notebook, as shown in the image below:
Following is the code that I tried out:
from bs4 import BeautifulSoup as bs
from urllib.request import urlopen as uReq

flipkart_url = "https://www.flipkart.com/search?q=" + 'acer-aspire-7-core-i5'
uClient = uReq(flipkart_url)
flipkartPage = uClient.read()
flipkart_html = bs(flipkartPage, "html.parser")
# Since I am only interested in the class "_1AtVbE col-12-12"
bigboxes = flipkart_html.findAll("div", {"class": "_1AtVbE col-12-12"})
Now here's the thing, I don't exactly understand what bigboxes is storing. The type of bigboxes is bs4.element.ResultSet, the length is 16.
Now if I run:
box = bigboxes[0]
productlink = "https://www.flipkart.com" + box.div.div.div.a['href']
I am getting an error. However when I run:
box = bigboxes[2]
productlink = "https://www.flipkart.com" + box.div.div.div.a['href']
I am successfully able to extract the link. Can someone please explain to me why the third element was able to read the link? I have a basic knowledge of HTML (at least I thought so) and I don't understand the layers to it. What exactly is bigboxes storing? Clearly, the HTML script shows no layers as such.
Your class filter is not very specific.
The first and second elements point to HTML nodes which do not contain the link, hence the error.
A more specific class to check could be: _13oc-S
bigboxes = flipkart_html.findAll("div", {"class": "_13oc-S"})
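If you want to keep the broader filter instead, a defensive sketch like the following skips the nodes that carry no link (the class names come from the question and may change whenever Flipkart regenerates its CSS):

for box in bigboxes:
    a_tag = box.find("a", href=True)    # any anchor that actually has an href
    if a_tag is None:
        continue                        # e.g. banner rows without product links
    productlink = "https://www.flipkart.com" + a_tag["href"]
    print(productlink)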

Display SVG image from qrcode in Django

As per the qrcode docs, I generate an SVG QR code:
import qrcode
import qrcode.image.svg
def get_qrcode_svg(uri):
    img = qrcode.make(uri, image_factory=qrcode.image.svg.SvgImage)
    return img
And that's all fine. I get a <qrcode.image.svg.SvgImage object at 0x7f94d84aada0> object.
I'd like to display this now in Django HTML. I pass this object as context (as qrcode_svg) and try to display it with <img src="{{qrcode_svg}}"/>, but don't really get anywhere with this. The error shows it's trying to fetch the image URL, but isn't there a way I can do this without saving the image etc.? Terminal output:
>>> UNKNOWN ?????? 2020-06-16 07:38:28.295038 10.0.2.2 GET
/user/<qrcode.image.svg.SvgImage object at 0x7f94d84aada0>
Not Found: /user/<qrcode.image.svg.SvgImage object at 0x7f94d84aada0>
"GET /user/%3Cqrcode.image.svg.SvgImage%20object%20at%200x7f94d84aada0%3E HTTP/1.1" 404 32447
You can write it to an in-memory byte stream and return the SVG source:
import qrcode
from qrcode.image.svg import SvgImage
from io import BytesIO

def get_qrcode_svg(uri):
    stream = BytesIO()
    img = qrcode.make(uri, image_factory=SvgImage)
    img.save(stream)
    return stream.getvalue().decode()
This returns the SVG source itself, not a URI pointing to it. In the template, you therefore render it with:
{{ qrcode_svg|safe }}
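For completeness, a minimal sketch of the view side, assuming a template name and URI of your own:

from django.shortcuts import render

def qr_view(request):
    # 'my_template.html' and the URI are placeholders
    context = {'qrcode_svg': get_qrcode_svg('https://example.com')}
    return render(request, 'my_template.html', context)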
To solve this I transformed the <qrcode.image.svg.SvgImage> into a base64 string, which can then be used as an <img src="{{variable}}"> in HTML.
import io
import base64
from qrcode import make as qr_code_make
from qrcode.image.svg import SvgPathFillImage

def get_qr_image_for_user(qr_url: str) -> str:
    svg_image_obj = qr_code_make(qr_url, image_factory=SvgPathFillImage)
    image = io.BytesIO()
    svg_image_obj.save(stream=image)
    base64_image = base64.b64encode(image.getvalue()).decode()
    return 'data:image/svg+xml;base64,' + base64_image
If this is not a good solution I would love some feedback. Cheers

How to preprocess the new dataset during model deployment using Flask

import pandas as pd
from sklearn.preprocessing import LabelEncoder, Normalizer

data = dataset.iloc[:, :-1].values
label = dataset.iloc[:, -1].values

# Encode the categorical columns (reusing one LabelEncoder instance)
labelencoder = LabelEncoder()
for i in range(0, 5):
    data[:, i] = labelencoder.fit_transform(data[:, i])
data1 = pd.DataFrame(data[:, :5])
for i in range(7, 12):
    data[:, i] = labelencoder.fit_transform(data[:, i])
data2 = pd.DataFrame(data[:, 7:12])

# ---- Normalizing non-categorical data ----
data3 = dataset.iloc[:, [5, 6, 12]]
normalized_data = Normalizer().fit_transform(data3)
data3 = pd.DataFrame(normalized_data)

data_full = pd.concat([data1, data2, data3], axis=1)
label = labelencoder.fit_transform(label)
label = pd.DataFrame(label)
Above are my preprocessing steps. I want to apply the same steps to new input data after deploying the model through a web app.
How do I write a function for this?
I am using Flask for developing the APIs.
What should go inside the predict function in app.py?
@app.route('/predict', methods=['POST'])
def predict():
You will have to pickle all the transformers that you use while pre-processing your data. Then you will have to load the same transformers and use them during predictions.
Creating a new transformer and fitting it on different values will give you weird predictions.
I created a demo flask project for a meetup. It has all the code that you need.
Deployment: https://github.com/Ankur-singh/flask_demo/blob/master/final_ml_flask.py
Training: https://github.com/Ankur-singh/flask_demo/blob/master/iris.py
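As a minimal sketch of the idea (the column indices follow the training code above; the file name and new_data are assumptions, and one fitted encoder is kept per column instead of reusing a single instance):

import pickle
from sklearn.preprocessing import LabelEncoder

# --- at training time: fit one encoder per categorical column and save them ---
encoders = {}
for i in list(range(0, 5)) + list(range(7, 12)):
    enc = LabelEncoder()
    data[:, i] = enc.fit_transform(data[:, i])
    encoders[i] = enc
with open('encoders.pkl', 'wb') as f:
    pickle.dump(encoders, f)

# --- at prediction time (e.g. inside the Flask predict view): load and reuse ---
with open('encoders.pkl', 'rb') as f:
    encoders = pickle.load(f)
for i, enc in encoders.items():
    new_data[:, i] = enc.transform(new_data[:, i])  # transform, not fit_transform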

How to isolate a part of HTML page in Python 3

I made a simple script to retrieve the source code of a page, but I'd like to "isolate" the part with the IPs so that I can save them to a proxy.txt file. Any suggestions?
import urllib.request
sourcecode = urllib.request.urlopen("https://www.inforge.net/xi/threads/dichvusocks-us-15h10-pm-update-24-24-good-socks.455588/")
sourcecode = str(sourcecode.read())
out_file = open("proxy.txt","w")
out_file.write(sourcecode)
out_file.close()
I've added a couple of lines to your code; the only problem is that the UI version number (check the page source) also gets matched as an IP address.
import urllib.request
import re

sourcecode = urllib.request.urlopen("https://www.inforge.net/xi/threads/dichvusocks-us-15h10-pm-update-24-24-good-socks.455588/")
sourcecode = str(sourcecode.read())
out_file = open("proxy.txt", "w")
out_file.write(sourcecode)
out_file.close()

with open('proxy.txt') as fp:
    for line in fp:
        ip = re.findall(r'(?:\d{1,3})\.(?:\d{1,3})\.(?:\d{1,3})\.(?:\d{1,3})', line)
        for addr in ip:
            print(addr)
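To filter out version strings like that, one heuristic is to reject candidates embedded in a longer dotted number and require every octet to be in the 0-255 range. A sketch, not a complete fix:

import re

def extract_ips(text):
    # dotted quads that are not part of a longer dotted number
    candidates = re.findall(r'(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])', text)
    # keep only those whose four octets are all in the 0-255 range
    return [c for c in candidates if all(int(p) <= 255 for p in c.split('.'))]

print(extract_ips("version 1.5.10.1082 ... 103.234.27.37"))  # -> ['103.234.27.37']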
UPDATE:
This is what you are looking for. BeautifulSoup can extract only the data we need from the page using CSS classes (it needs to be installed with pip first), and you don't need to save the page to a file.
from bs4 import BeautifulSoup
import urllib.request
import re

url = urllib.request.urlopen('https://www.inforge.net/xi/threads/dichvusocks-us-15h10-pm-update-24-24-good-socks.455588/').read()
soup = BeautifulSoup(url, "html.parser")
# Searching by the CSS class name
msg_content = soup.find_all("div", class_="messageContent")
ips = re.findall(r'(?:\d{1,3})\.(?:\d{1,3})\.(?:\d{1,3})\.(?:\d{1,3})', str(msg_content))
for addr in ips:
    print(addr)
Why don't you use re? I'd need to see the page source to say exactly how.

Entering Value into Search Bar and Downloading Output from Webpage

I'm trying to search a webpage (http://www.phillyhistory.org/historicstreets/). I think the relevant source HTML is this:
<input name="txtStreetName" type="text" id="txtStreetName">
You can see the rest of the source HTML at the website. I want to go into that text box, enter a street name, and download the output (i.e. enter 'Jefferson' in the search box of the page and see historic street names with Jefferson). I have tried using requests.post, and tried typing ?get=Jefferson in the URL to test if that works, with no luck. Anyone have any ideas how to get this page? Thanks,
Cameron
Code that I currently tried (some imports are unused, as I plan to parse etc.):
import requests
from bs4 import BeautifulSoup
import csv
from string import ascii_lowercase
import codecs
import os.path
import time

arrayofstreets = ['Jefferson']
for each in arrayofstreets:
    url = 'http://www.phillyhistory.org/historicstreets/default.aspx'
    payload = {'txtStreetName': each}
    r = requests.post(url, data=payload).content
    outfile = "raw/" + each + ".html"
    with open(outfile, "wb") as code:  # .content is bytes, so write in binary mode
        code.write(r)
    time.sleep(2)
This did not work and only gave me the default webpage (i.e. Jefferson was not entered in the search bar, and no results were retrieved).
I'm guessing your reference to 'requests.post' relates to the requests module for Python.
As you have not specified what you want to scrape from the search results, I will simply give you a snippet to get the HTML for a given search query:
import requests

query = 'Jefferson'
url = 'http://www.phillyhistory.org/historicstreets/default.aspx'
post_data = {'txtStreetName': query}
html_result = requests.post(url, data=post_data).content
print(html_result)
If you need to further process the html file to extract some data, I suggest you use the Beautiful Soup module to do so.
UPDATED VERSION:
#!/usr/bin/python
import requests
from bs4 import BeautifulSoup
import csv
from string import ascii_lowercase
import codecs
import os.path
import time

def get_post_data(html_soup, query):
    # ASP.NET pages require the hidden form fields to be posted back as well
    view_state = html_soup.find('input', {'name': '__VIEWSTATE'})['value']
    event_validation = html_soup.find('input', {'name': '__EVENTVALIDATION'})['value']
    textbox1 = ''
    btn_search = 'Find'
    return {'__VIEWSTATE': view_state,
            '__EVENTVALIDATION': event_validation,
            'Textbox1': textbox1,
            'txtStreetName': query,
            'btnSearch': btn_search
            }

arrayofstreets = ['Jefferson']
url = 'http://www.phillyhistory.org/historicstreets/default.aspx'
html = requests.get(url).content

for each in arrayofstreets:
    payload = get_post_data(BeautifulSoup(html, 'lxml'), each)
    r = requests.post(url, data=payload).content
    outfile = "raw/" + each + ".html"
    with open(outfile, "wb") as code:  # .content is bytes, so write in binary mode
        code.write(r)
    time.sleep(2)
The problem in my/your first version was that we weren't posting all the required parameters. To find out what you need to send, open the network monitor in your browser (Ctrl+Shift+Q in Firefox) and make the search as you normally would. If you select the POST request in the network log, on the right you should see a 'Parameters' tab listing the POST parameters your browser sent.