Django endless pagination and anchors - html

I'm creating a Django site that hosts a large set of long, transcribed debates.
I have two main views: a Haystack search view which indexes each individual speech, and a full view which indexes each transcript (containing hundreds of individual speeches). Both views use django endless pagination to display results.
I'm trying to link between these two views so that any search result (speech) can be viewed in context within its parent transcript, and I want the page to jump to the anchor of that speech ID on page load.
I calculate the page where the individual result appears and redirect to that URL while storing the pk of the result in messages so I can highlight the result:
def full_view_redirect(request, year, month, day, pk):
    y = str(year)
    m = str(month)
    d = str(day)
    qs = transcripts.objects.filter(speechdate__year=year).filter(
        speechdate__month=month).filter(speechdate__day=day).order_by('basepk').all()
    firstpk = int(qs[0].basepk)
    pageNo = ((int(pk) - firstpk) // 15) + 1
    messages.add_message(request, messages.INFO, pk)
    if pageNo == 1:
        return redirect("/full/" + y + "/" + m + "/" + d + "/" + "#" + str(pk))
    else:  ## this doesn't work
        return redirect("/full/" + y + "/" + m + "/" + d + "/" + "?page=" + str(pageNo) + "#" + str(pk))
My question is similar to http://htmlasks.com/how_to_make_this_link_work_page2reviews_reload_the_page_and_jump_to_the_anchor but the suggestion here to switch around the anchor and the ?page doesn't work.
I can't get the anchor to work so the page jumps to the desired result on page load. Am I missing something obvious?
Edit:
I verified the div I want to jump to has a proper id, eg.
<div class="panel panelhighlight" id="54969">
A url like /full/1903/04/29/?page=7#54969 loads the proper page but does not jump to the div.
A url like /full/1903/04/29/#54969?page=7 loads the first page and not page 7.
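The difference between the two URLs above comes down to how a URL is split: everything after the first # is the fragment, which the browser keeps to itself and never sends to the server. A small sketch with Python's urllib.parse (not part of the original view code) shows what each form looks like after parsing:

```python
from urllib.parse import urlsplit

# Query string first, fragment last: the server receives ?page=7.
query_then_fragment = urlsplit("/full/1903/04/29/?page=7#54969")

# Fragment first: "?page=7" is swallowed into the fragment and is never sent.
fragment_then_query = urlsplit("/full/1903/04/29/#54969?page=7")

print(query_then_fragment.query, query_then_fragment.fragment)
print(repr(fragment_then_query.query), fragment_then_query.fragment)
```

This accounts for the second URL loading page 1. It does not explain why the first URL fails to scroll, which is a separate issue.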
Edit 2:
I have switched from django-endless-pagination to django-digg-paginator so that pagination is handled within my view, not at the template level.
Then, I had to ensure the redirect reloads by omitting the slash between the page number and the anchor. /full/1903/04/29/7#54969 repositions the page successfully on load.

As detailed in my edit, I had two problems that I needed to fix:
First, I switched from django-endless-pagination to django-digg-paginator so that pagination is handled within my view/URL pattern, not at the template level.
Second, I had to ensure the redirect reloads by omitting the slash between the page number and the anchor, i.e. /full/1903/04/29/7#54969 repositions the page successfully on load.
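A minimal sketch of building the redirect target described above, with the page number in the path and the anchor directly after it. The helper names and the first_pk value in the usage lines are assumptions for illustration; the page size of 15 is taken from the view in the question, and a matching URL pattern has to exist in urls.py.

```python
def page_for_pk(pk, first_pk, per_page=15):
    """Return the 1-based page on which the speech with this pk appears.

    Assumes pks within one transcript are consecutive, as the view in the
    question does.
    """
    return (int(pk) - int(first_pk)) // per_page + 1


def full_view_url(year, month, day, pk, first_pk, per_page=15):
    """Build /full/<year>/<month>/<day>/<page>#<pk> with no slash before '#'."""
    page = page_for_pk(pk, first_pk, per_page)
    return "/full/{}/{}/{}/{}#{}".format(year, month, day, page, pk)


# first_pk=54874 is a made-up value chosen so that pk 54969 lands on page 7.
url = full_view_url("1903", "04", "29", 54969, first_pk=54874)
print(url)
```

In the view, the resulting string would be passed to redirect() in place of the hand-concatenated URL.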

Related

How can I differentiate between two url requests from different HTML pages but with the same namespace in views.py?

I am creating a simple eBay-like e-commerce website to get introduced to Django. For removing an item from the watchlist, I placed the same link in two different HTML files; that is, I can remove the item either from the watchlist.html page or from the item's page, which is saved as listing.html. The URL for both pages looks like this:
Remove from watchlist
Now, in my views.py, I want to render different pages on the basis of the request. For example, if someone clicked Remove from watchlist from listing.html then the link should redirect again to listing.html and same goes for the watchlist.html.
I tried using request.resolver_match.view_name, but this gave me 'removeFromWatchlist' in both cases, because the URL namespace is the same for both requests.
Is there any way I can render two different HTML pages based on the origin of the url request?
Also, this is my second question here so apologies for incorrect or bad formatting.
You could check HTTP_REFERER in the request.META attribute inside the view to get the URL that referred the request, like so:
from django.shortcuts import redirect

def myview(request):
    ...
    return redirect(request.META.get("HTTP_REFERER"))  # Or however you prefer redirecting
https://docs.djangoproject.com/en/3.1/ref/request-response/#django.http.HttpRequest.META
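One caveat: the Referer header is optional, so request.META.get("HTTP_REFERER") can return None. A hedged sketch of picking a redirect target with a fallback follows; pick_redirect_target and the "/watchlist/" default are hypothetical names, and a plain dict stands in for request.META so the snippet runs without Django:

```python
def pick_redirect_target(meta, fallback="/watchlist/"):
    """Return the referring URL if the client sent one, else the fallback."""
    referer = meta.get("HTTP_REFERER")
    return referer if referer else fallback


# A plain dict stands in for request.META here.
from_listing = pick_redirect_target({"HTTP_REFERER": "http://example.com/listing/12/"})
no_header = pick_redirect_target({})
print(from_listing)
print(no_header)
```

Inside the view this would be used as redirect(pick_redirect_target(request.META)).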

Scraping prices with BeautifulSoup4 in Python3

I am new to scraping with Python and BeautifulSoup4, and I have no knowledge of HTML. To practice, I am trying to use it on the Carrefour website to extract the price and the price per kilogram of a product that I search for by EAN code.
My code:
import requests
from bs4 import BeautifulSoup

barcodes = ['5449000000996']
for barcode in barcodes:
    url = 'https://www.carrefour.es/?q=' + barcode
    html = requests.get(url).content
    bs = BeautifulSoup(html, 'lxml')
    searchingprice = bs.find_all('strong', {'class':'ebx-result-price__value'})
    print(searchingprice)
    searchingpricerperkg = bs.find_all('span', {'class':'ebx-result__quantity ebx-result-quantity'})
    print(searchingpricerperkg)
But I do not get any result at all
Here is a screenshot of the HTML code:
What am I doing wrong? I tried with other website and it seems to work
The problem here is that you're scraping a page with Javascript-generated content. Basically, the page that you're grabbing with requests actually doesn't have the thing you're grabbing from it - it has a bunch of javascript. When your browser goes to the page, it runs the javascript, which generates the content - so the page you see in the rendered version in your browser is not the same thing returned from the actual page itself. The page contains instructions for your browser to write the page that you see.
If you're just practicing, you might want to simply try a different source to scrape from, but to scrape from this page, you'll need to look into other solutions that can handle javascript generated content:
Web-scraping JavaScript page with Python
Alternatively, the JavaScript generates content by requesting data from other sources. I don't speak Spanish, so I'm not much help in figuring this part out, but you might be able to.
As an exercise, go ahead and have BS4 prettify and print out the page that it receives. You'll see that within that page there are requests to other locations to get the info you're asking for. You might be able to change your request so that it goes not to the page where you view the info, but to the location that page gets its data from.
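The same check can be sketched without any network access. Below, RAW_HTML is a hypothetical stand-in for what the server might return for a JavaScript-rendered page (it is not Carrefour's actual markup), and the standard-library HTMLParser is used in place of BS4 so the snippet runs on its own. It reports whether the price class is present in the raw HTML and which scripts the page loads:

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect script sources and every CSS class seen in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.script_srcs = []
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.script_srcs.append(attrs["src"])
        # The class attribute may be missing or empty.
        for name in (attrs.get("class") or "").split():
            self.classes.add(name)


# Hypothetical raw response: an empty container plus a script that fills it in.
RAW_HTML = """
<html><body>
<div id="app"></div>
<script src="/static/search-widget.js"></script>
</body></html>
"""

collector = TagCollector()
collector.feed(RAW_HTML)
price_present = "ebx-result-price__value" in collector.classes
print(price_present, collector.script_srcs)
```

If the class is missing from the raw HTML but visible in the browser's inspector, the content is being added by a script after the page loads.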

Is there any way to show all the components data from /jcr:content/par/ location

I have a question about rendering the data of one page inside another. Every page is built from many components, and all of the components' data is stored under the page's jcr location, i.e. /jcr:content/par/{components list}. The data renders properly on that page.
Now I have a situation where I need to create a component that searches for a page (i.e. a unique product). If this product page is available in the repository, I need to render its data just under the search box. For this I am creating JSON, which I will use to render the content once the search finds a match.
Is there any other way to include the components from the /par location of the page and display the data as it is, rather than building JSON (of all the components' data) and then reading it at display time?
I am wondering if there is any method to display all the components' data by just including /par/{components} on a page. This way I could speed up development, and it looks like a faster way to display the content as well.
thanks in advance....
If you have a static page, you can iterate over the list of search results (product resources) and, for each of them, include the component that renders a product. Like:
<c:forEach items="${productsList}" var="productPath">
    <cq:include path="${productPath}" resourceType="/apps/you-project/components/product-component-name"/>
</c:forEach>
If you show results dynamically - then you can do ajax requests for product resources. Something like this:
var productHtml = CQ.shared.HTTP.get(productPath + ".html");
Or the same using JQuery. Then you can add html to your page.
However, with the second approach you have to add the clientlibs from the component /apps/you-project/components/product-component-name to the search results page yourself, because they will not be loaded by the ajax request.

How to go from a MVC view to a section of another view without the hash in URL

I have an issue in MVC: I want to navigate from one view to a section in another view using an anchor, without the hash being displayed in the URL. I navigate between my controller's views with normal URLs like /news or /test, but to go to a section of the Index view I use /#contact. I have tried to remove the hash from the visible URL with JavaScript, using the following:
window.location.href.substr(0, window.location.href.indexOf('#'))
and this:
history.pushState(obj, title, url);
But this is not helping, because the URL is not updated at the time of the first click. I tried to delay the event by a few milliseconds, but that is not a professional solution at all. Can you give me some advice or hints on how to remove #contact from the URL and still land on the "contact" section of the view?

retrieving URLs from functions within HTML (python)

I need to scrape some URLs from some retailer product pages, but the specific URLs I need to get aren't in the html part of the page. The html looks like this for each of the items on which one would click to get to the page with the URL I need to grab:
<div id="name" class="hand bold" onclick="AVON.productcontrol.Go(45714);">ADVANCE TECHNIQUES Color Protection Conditioner Bonus Size</div>
I wrote the following to get URLs from the page, but since the actual URLs I need don’t seem to be stored in the page, it doesn’t get what I need:
import urllib
import lxml.html
from lxml.cssselect import CSSSelector

def getUrls(URL):
    """input: product page url
    output: list of urls to products
    """
    connection = urllib.urlopen(URL)  # Python 2; on Python 3 this is urllib.request.urlopen
    dom = lxml.html.fromstring(connection.read())
    selAnchor = CSSSelector('a')
    foundElements = selAnchor(dom)
    urlList = [e.get('href') for e in foundElements]
    return urlList
Is there a way to get the link that the function after ‘onclick’ (I guess AVON.productcontrol.Go(#);) takes you to? I don’t fully understand html, and while I’ve read a bit about onclick, I can’t figure out how the function after 'onclick' works.
In order to find the URL that you are taken to on click, you need to find the JavaScript source code of the 'Go' function and read and understand it. It's buried somewhere within a tag or some JavaScript .js file that is referenced directly or indirectly by the HTML page. Happy digging!
Or: you automate the interaction with the web page with a tool like Selenium (http://docs.seleniumhq.org/) and just check where it takes you if you click.
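If reading the Go function shows that it builds the product URL from its numeric argument, the ids can be pulled out of the onclick attributes directly. A minimal sketch using only the standard library follows; the regular expression is tailored to the markup in the question, and how an id maps to a URL is not known from the page shown, so that step is left out:

```python
import re

# Markup copied from the question.
html = ('<div id="name" class="hand bold" '
        'onclick="AVON.productcontrol.Go(45714);">'
        'ADVANCE TECHNIQUES Color Protection Conditioner Bonus Size</div>')

# Capture the numeric argument passed to AVON.productcontrol.Go(...).
GO_CALL = re.compile(r'onclick="AVON\.productcontrol\.Go\((\d+)\);?"')

product_ids = GO_CALL.findall(html)
print(product_ids)
```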