No results from web scraping autoscout24 - html

I have code to retrieve data from autoscout24.com for my thesis, which is on used car analytics. However, I cannot retrieve any data and the loops do not end. I do not understand the reason behind it. Can anyone help me? Here is the code.
import requests
from bs4 import BeautifulSoup

brand = []
model = []
price = []
total = []
a = 101
k = 100
l = 20000  # calculates between € 0 - € 2.000.000
j = 1
while j <= l:
    i = 1
    website = 'https://www.autoscout24.com/lst?sort=price&desc=0&cy=NL&atype=C&ustate=N%2CU&damaged_listing=exclude&powertype=kw&pricefrom=' + str(a) + '&priceto=' + str(a+k-1) + '&search_id=2gfy6suaasl&page='
    a = a + k
    j = j + 1
    while i <= 20:
        website = website + str(i)
        response = requests.get(website)
        soup = BeautifulSoup(response.content, 'html.parser')
        results = soup.find_all('div', {'class': 'ListItem_wrapper__J_a_C'})
        i = i + 1
        for result in results:
            brand.append(result.find('h2').get_text())
            model.append(result.find('span', {'class': 'ListItem_version__jNjur'}).get_text())
            price.append(result.find('p', {'class': 'Price_price__WZayw'}).get_text().strip())
            total.append(result.find('div', {'class': 'VehicleDetailTable_container__mUUbY'}).get_text())
I tried it on my own computer and in Google Colab, but I could not retrieve anything at all. If you open the URLs manually you can reach the data, but there are no results when the requests run in a loop.
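One thing to note: inside the inner loop, website = website + str(i) keeps appending page numbers to the same string, so the requested URLs become ...page=1, ...page=12, ...page=123, and so on. Below is a minimal sketch of a rebuilt loop; the User-Agent header is an assumption (many sites return empty pages to the default requests client), and the class names are copied from the question and may have changed since:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}  # assumption: the site may ignore or block the default requests user agent
base = ('https://www.autoscout24.com/lst?sort=price&desc=0&cy=NL&atype=C'
        '&ustate=N%2CU&damaged_listing=exclude&powertype=kw')
brand, model, price, total = [], [], [], []
a, k = 101, 100
for _ in range(3):  # a few price bands for testing; increase to cover the full range
    for page in range(1, 21):
        # rebuild the URL for every page instead of appending to the previous one
        url = base + '&pricefrom=' + str(a) + '&priceto=' + str(a + k - 1) + '&page=' + str(page)
        soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
        for result in soup.find_all('div', {'class': 'ListItem_wrapper__J_a_C'}):
            brand.append(result.find('h2').get_text())
            model.append(result.find('span', {'class': 'ListItem_version__jNjur'}).get_text())
            price.append(result.find('p', {'class': 'Price_price__WZayw'}).get_text().strip())
            total.append(result.find('div', {'class': 'VehicleDetailTable_container__mUUbY'}).get_text())
    a += k
print(str(len(brand)) + ' listings collected')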

Related

Why doesn't converted script calculate the same as original when converting from V2 to V4 in Pinescript

When converting a script from V2 to V4 in Pinescript, it doesn't appear to calculate the same way.
V2:
study(title = "POC bands 2.0", shorttitle="POCB", overlay=true)
resCustom = input(title="Timeframe", type=resolution, defval="240")
Length = input(6, minval=1)
xPrice = security(tickerid, resCustom, hlc3)
xvnoise = abs(xPrice - xPrice[1])
nfastend = 0.666
nslowend = 0.0645
nsignal = abs(xPrice - xPrice[Length])
nnoise = sum(xvnoise, Length)
nefratio = iff(nnoise != 0, nsignal / nnoise, 0)
nsmooth = pow(nefratio * (nfastend - nslowend) + nslowend, 2)
nAMA = nz(nAMA[1]) + nsmooth * (xPrice - nz(nAMA[1]))
basis = nAMA
atr = ema(tr,11)
upper = basis[3] + (atr*3)
lower = basis[3] - (atr*3)
plot(basis, color=blue)
plot(upper, color=blue)
plot(lower, color=blue)
V4:
study(title = "POC bands 2.0", shorttitle="POCB", overlay=true)
resCustom = input(title="Timeframe", type=input.resolution, defval="240")
Length = input(6, minval=1)
xPrice = security(syminfo.tickerid, resCustom, hlc3)
xvnoise = abs(xPrice - xPrice[1])
nAMA = 0.0
nfastend = 0.666
nslowend = 0.0645
nsignal = abs(xPrice - xPrice[Length])
nnoise = sum(xvnoise, Length)
nefratio = iff(nnoise != 0, nsignal / nnoise, 0)
nsmooth = pow(nefratio * (nfastend - nslowend) + nslowend, 2)
nAMA := nz(nAMA[1]) + nsmooth * (xPrice - nz(nAMA[1]))
basis = nAMA
atr = ema(tr,11)
upper = basis[3] + (atr*3)
lower = basis[3] - (atr*3)
plot(basis, color=color.new(color.blue,0))
plot(upper, color=color.new(color.white,0))
plot(lower, color=color.new(color.white,0))
V2 are the blue bands with a red center and V4 are the white bands with a blue center.
I think that's because the security function has different default values for the bargaps and lookahead parameters between v2 and v4. With v4, both are set to false by default; with v2, probably one or both of them are set to true.

Finding prime numbers up till a number

I am trying to list all the prime numbers up to a specific number, e.g. 1000. The code gets slower as the number increases. I am pretty sure it is because of the for loop where (number - 1) is checked against all the primes found so far. I need some advice on how to decrease the processing time of the code for larger numbers. Thanks.
import time
t0 = time.time()
prime_list = [2]
number = 0
is_not_prime = 0
count = 0
while number < 1000:
    print(number)
    for i in range(2, number):
        count = 0
        if (number % i) == 0:
            is_not_prime = 1
        if is_not_prime == 1:
            for j in range(0, len(prime_list)):
                if (number - 1) % prime_list[j] != 0:
                    count += 1
            if count == len(prime_list):
                prime_list.append(number - 1)
                is_not_prime = 0
                count = 0
                break
    number += 1
print(prime_list)
t1 = time.time()
total = t1-t0
print(total)
Your solution, on top of being confusing, is very inefficient: O(n^3). Please use the Sieve of Eratosthenes. Also, learn how to use booleans.
Something like this (not optimal, just a mock-up). Essentially, you start with a list of all the numbers from 1 to 1000. Then you remove the ones that are a multiple of something else.
amount = 1000
numbers = list(range(1, amount))  # list() so pop() also works under Python 3
i = 1
while i < len(numbers):
    n = i + 1
    while n < len(numbers):
        if numbers[n] % numbers[i] == 0:
            numbers.pop(n)
        else:
            n += 1
    i += 1
print(numbers)
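For comparison, a straightforward Sieve of Eratosthenes looks something like this (a minimal sketch, not optimized; the helper name primes_up_to is just for illustration):

def primes_up_to(limit):
    # index = number, value = "still considered prime"
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # cross out every multiple of p, starting at p*p
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]

print(primes_up_to(1000))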
Finally, I was able to answer because your question isn't language-specific, but please tag the question with the language you're using in the example.

Load a page completely with requests in Python (or other ways)

Hi,
I was wondering if I can load a page completely with Python, for example a hashtag page from Instagram.
There is code I tried, but it wouldn't load completely.
Here's my code:
import json
import re
import requests
x = input("Enter your hashtag: ")
response = requests.get('https://www.instagram.com/explore/tags/' + x + '/?__a=1')
if response.status_code == 404:
    print('page not found')
    input()
    exit()
data = response.text
x = re.findall("\"shortcode\":\"[^\"][^\"][^\"][^\"][^\"][^\"][^\"][^\"][^\"][^\"][^\"][^\,]", data)
y = [i.split('"')[3] for i in x]
x = 0
z = len(y)
print(str(z)+' Posts found')
while x < z:
    print('\r' + str(x) + ' posts done', end="")
    data = requests.get('https://www.instagram.com/p/' + y[x] + '/?__a=1')
    y[x] = data.text
    x = x + 1
print()
print('post link finished')
Usernames = []
Posts = []
Followers = []
Following = []
x = 0
while x < z:
    print('\r' + str(x) + ' Usernames done', end="")
    data = json.loads(y[x])
    Usernames.append(data['graphql']['shortcode_media']['owner']['username'])
    x = x + 1
print()
print('Usernames finished')
print(len(Usernames))
I want to get more usernames, like 100k or more. If you can help me with other libraries, that's fine too; it isn't important which ones.
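The anonymous ?__a=1 endpoint only returns the first batch of posts for a hashtag. Older versions of that endpoint exposed a page_info.end_cursor value that could be passed back as max_id to request the next batch; the sketch below follows that pattern. Both the JSON layout and the max_id parameter are assumptions based on those older responses, and Instagram changes (and increasingly restricts) this endpoint, so it may require a logged-in session today.

import requests

tag = 'nature'  # hypothetical example hashtag
url = 'https://www.instagram.com/explore/tags/' + tag + '/?__a=1'
shortcodes = []
end_cursor = None
for _ in range(5):  # fetch a few batches; raise the limit to collect more posts
    page_url = url if end_cursor is None else url + '&max_id=' + end_cursor  # assumed pagination parameter
    data = requests.get(page_url).json()
    media = data['graphql']['hashtag']['edge_hashtag_to_media']  # assumed layout of the hashtag JSON
    shortcodes += [edge['node']['shortcode'] for edge in media['edges']]
    if not media['page_info']['has_next_page']:
        break
    end_cursor = media['page_info']['end_cursor']
print(str(len(shortcodes)) + ' shortcodes collected')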

Scrapy returns no output - just a [

I'm trying to run the spider found in this crawler, and for simplicity's sake I'm using this start_url because it is just a list of 320 movies. (So the crawler won't run for 5 hours, as stated on the GitHub page.)
I crawl using scrapy crawl imdb -o output.json but the output.json file contains nothing. It has just a [ in it.
import scrapy
from texteval.items import ImdbMovie, ImdbReview
import urlparse
import math
import re


class ImdbSpider(scrapy.Spider):
    name = "imdb"
    allowed_domains = ["imdb.com"]
    start_urls = [
        # "http://www.imdb.com/chart/top",
        # "http://www.imdb.com/chart/bottom"
        "http://www.imdb.com/search/title?countries=csxx&sort=moviemeter,asc"
    ]
    DOWNLOADER_MIDDLEWARES = {
        'scrapy.contrib.downloadermiddleware.robotstxt.ROBOTSTXT_OBEY': True,
    }
    base_url = "http://www.imdb.com"

    def parse(self, response):
        movies = response.xpath("//*[@id='main']/table/tr/td[3]/a/@href")
        for i in xrange(len(movies)):
            l = self.base_url + movies[i].extract()
            print l
            request = scrapy.Request(l, callback=self.parse_movie)
            yield request
        next = response.xpath("//*[@id='right']/span/a")[-1]
        next_url = self.base_url + next.xpath(".//@href")[0].extract()
        next_text = next.xpath(".//text()").extract()[0][:4]
        if next_text == "Next":
            request = scrapy.Request(next_url, callback=self.parse)
            yield request
        '''
        for sel in response.xpath("//table[@class='chart']/tbody/tr"):
            url = urlparse.urljoin(response.url, sel.xpath("td[2]/a/@href").extract()[0].strip())
            request = scrapy.Request(url, callback=self.parse_movie)
            yield request
        '''

    def parse_movie(self, response):
        movie = ImdbMovie()
        i1 = response.url.find('/tt') + 1
        i2 = response.url.find('?')
        i2 = i2 - 1 if i2 > -1 else i2
        movie['id'] = response.url[i1:i2]
        movie['url'] = "http://www.imdb.com/title/" + movie['id']
        r_tmp = response.xpath("//div[@class='titlePageSprite star-box-giga-star']/text()")
        if r_tmp is None or r_tmp == "" or len(r_tmp) < 1:
            return
        movie['rating'] = int(float(r_tmp.extract()[0].strip()) * 10)
        movie['title'] = response.xpath("//span[@itemprop='name']/text()").extract()[0]
        movie['reviews_url'] = movie['url'] + "/reviews"
        # Number of reviews associated with this movie
        n = response.xpath("//*[@id='titleUserReviewsTeaser']/div/div[3]/a[2]/text()")
        if n is None or n == "" or len(n) < 1:
            return
        n = n[0].extract().replace("See all ", "").replace(" user reviews", "")\
            .replace(" user review", "").replace(",", "").replace(".", "").replace("See ", "")
        if n == "one":
            n = 1
        else:
            n = int(n)
        movie['number_of_reviews'] = n
        r = int(math.ceil(n / 10))
        for x in xrange(1, r):
            start = x * 10 - 10
            url = movie['reviews_url'] + "?start=" + str(start)
            request = scrapy.Request(url, callback=self.parse_review)
            request.meta['movieObj'] = movie
            yield request

    def parse_review(self, response):
        ranks = response.xpath("//*[@id='tn15content']/div")[0::2]
        texts = response.xpath("//*[@id='tn15content']/p")
        del texts[-1]
        if len(ranks) != len(texts):
            return
        for i in xrange(0, len(ranks) - 1):
            review = ImdbReview()
            review['movieObj'] = response.meta['movieObj']
            review['text'] = texts[i].xpath("text()").extract()
            rating = ranks[i].xpath(".//img[2]/@src").re("-?\\d+")
            if rating is None or rating == "" or len(rating) < 1:
                return
            review['rating'] = int(rating[0])
            yield review
Can someone tell me where I am going wrong?
In my opinion, this web site probably loads the list of movies with JavaScript. First, I suggest you check the output of: movies = response.xpath("//*[@id='main']/table/tr/td[3]/a/@href"). If you need JavaScript-rendered content, you can use a webkit-based renderer in Scrapy as a downloader middleware.
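A quick way to check that selector, assuming the start URL from the question, is Scrapy's interactive shell; if the expression returns an empty list, either the table is rendered by JavaScript or the XPath doesn't match the markup Scrapy receives:

scrapy shell "http://www.imdb.com/search/title?countries=csxx&sort=moviemeter,asc"
>>> response.xpath("//*[@id='main']/table/tr/td[3]/a/@href").extract()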

Improving a function in Python

I created this function to figure out whether betting either side of a proposition bet on how many 3-pointers there will be in a particular basketball game is profitable. Earlier in the code I project how many total three-pointers will be made (pjTotal3s) and the standard deviation (pjGame3STD). threes_over is the number given to me by the betting site; I try to find out whether the total number of threes will be over or under that number. In this case it is 14.5.
import numpy as np  # np is used below; pjTotal3s and pjGame3STD are projected earlier in the code

threes_over = 14.5

def overunder(n):
    over_count = 0
    under_count = 0
    push_count = 0
    overodds = 0
    underodds = 0
    for i in range(n):
        if round(np.random.normal(pjTotal3s, pjGame3STD)) > threes_over:
            over_count = over_count + 1
        if round(np.random.normal(pjTotal3s, pjGame3STD)) < threes_over:
            under_count = under_count + 1
        if round(np.random.normal(pjTotal3s, pjGame3STD)) == threes_over:
            push_count = push_count + 1
    return over_count, under_count, push_count
Then I run the simulation 100,000 times with overunder(100000), and it gives me how many times the number of three-pointers was over, under, or equal to the given number. This works fine, but I still have more work to do to find out whether either bet is profitable.
Assuming the output is (57550, 42646, 0), I have to manually input it like so and do some more calculations to find out whether either side of the bet is worthwhile.
over_count = 57550
under_count = 42646
over = 1/(over_count / (over_count + under_count))
under = 1/ (under_count / (over_count + under_count))
over_odds_given = 1.77
under_odds_given = 2.05
overedge = 1/over * over_odds_given - 1
underedge = 1/under * under_odds_given - 1
print overedge, underedge
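For reference, these formulas reduce to edge = win_probability * odds_given - 1; with the counts above, the over works out to roughly 0.574 * 1.77 - 1 ≈ +0.017 and the under to 0.426 * 2.05 - 1 ≈ -0.127.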
How do I combine the second set of calculations into the same function as the first? I would like to avoid manually entering the results of the first function, to save time and to avoid entering a wrong number.
If you really want the second bit of code in the same function as the first, you could just paste all but the part where you set the over_count and under_count into the function...
def overunder(n):
    over_count = 0
    under_count = 0
    push_count = 0
    overodds = 0
    underodds = 0
    for i in range(n):
        if round(np.random.normal(pjTotal3s, pjGame3STD)) > threes_over:
            over_count = over_count + 1
        if round(np.random.normal(pjTotal3s, pjGame3STD)) < threes_over:
            under_count = under_count + 1
        if round(np.random.normal(pjTotal3s, pjGame3STD)) == threes_over:
            push_count = push_count + 1
    over = 1/(over_count / float(over_count + under_count))
    under = 1/(under_count / float(over_count + under_count))
    over_odds_given = 1.77
    under_odds_given = 2.05
    overedge = 1/over * over_odds_given - 1
    underedge = 1/under * under_odds_given - 1
    print overedge, underedge
    return over_count, under_count, push_count
Or, probably better, you could put the second bit of code (overedge, underedge) into a separate function and pass it the results from overunder:
def edges(over_count, under_count, push_count):
    over = 1/(over_count / float(over_count + under_count))
    under = 1/(under_count / float(over_count + under_count))
    over_odds_given = 1.77
    under_odds_given = 2.05
    overedge = 1/over * over_odds_given - 1
    underedge = 1/under * under_odds_given - 1
    print overedge, underedge
And then call it with the results from overunder:
c = overunder(100000)
edges(c[0],c[1],c[2])
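Since overunder returns a tuple, the call can also be written with unpacking, which avoids indexing (or mistyping) the individual values:
over_count, under_count, push_count = overunder(100000)
edges(over_count, under_count, push_count)
# or, more compactly, reusing the tuple c from above:
edges(*c)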