Is there any way to show all the components' data from the /jcr:content/par/ location as JSON?

I have a question about rendering the data of different pages in one place. Every page is built from many components, and all the components' data is stored under the page's JCR location, i.e. /jcr:content/par/{components list}. The data renders properly on the page itself.
Now I have a situation where I need to create a component to search for a page (i.e. a unique product); if that product page exists in the repository, I need to render its data just under the search box. For this I am creating JSON, which I will use to render the content once a result is found.
But is there any other way I can include the component content from the page's /par location and display the data as-is, rather than building JSON of all the components' data and then reading it at display time?
I am wondering if there is any method to display all the components' data by just including /par/{components} on a page. That would speed up development, and it looks like a faster way to display the content as well.
Thanks in advance.

If you have a static page, you can iterate over the list of search results (product resources) and include the component that renders a product for each of them, like:
<c:forEach items="${productsList}" var="productPath">
    <cq:include path="${productPath}" resourceType="/apps/your-project/components/product-component-name"/>
</c:forEach>
If you show results dynamically, you can make AJAX requests for the product resources, something like this:
var productHtml = CQ.shared.HTTP.get(productPath + ".html");
or the same using jQuery. Then you can add the HTML to your page.
However, with the second approach you have to add the clientlibs from the component /apps/your-project/components/product-component-name to the search results page yourself, because they will not be loaded by the AJAX request.

Related

How can I differentiate between two url requests from different HTML pages but with the same namespace in views.py?

I am creating a simple eBay-like e-commerce website to get introduced to Django. To remove an item from the watchlist, I placed the same link in two different HTML files; that is, I can remove the item either from the watchlist.html page or from the item's page, saved as listing.html. The URL for both pages looks like this:
Remove from watchlist
Now, in my views.py, I want to render different pages based on the request. For example, if someone clicked Remove from watchlist on listing.html, the link should redirect back to listing.html, and the same goes for watchlist.html.
I tried using request.resolver_match.view_name, but this gave me 'removeFromWatchlist' for both, since the URL name for both of these requests is the same.
Is there any way I can render two different HTML pages based on the origin of the URL request?
Also, this is my second question here, so apologies for incorrect or bad formatting.
You could check HTTP_REFERER in the view's request.META attribute to get the URL that referred the request, like so:
from django.shortcuts import redirect

def myview(request):
    ...
    # or however you prefer redirecting
    return redirect(request.META.get("HTTP_REFERER"))
https://docs.djangoproject.com/en/3.1/ref/request-response/#django.http.HttpRequest.META
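Note that the Referer header is supplied by the browser and can be missing or stripped. An alternative that does not depend on it is to pass the origin explicitly as a query parameter in each template link; a minimal sketch of that idea (the URL names 'listing' and 'watchlist' here are assumptions for illustration):
# In listing.html:   <a href="{% url 'removeFromWatchlist' item.id %}?next=listing">Remove from watchlist</a>
# In watchlist.html: <a href="{% url 'removeFromWatchlist' item.id %}?next=watchlist">Remove from watchlist</a>
from django.shortcuts import redirect

def remove_from_watchlist(request, item_id):
    # ... remove the item from the user's watchlist ...
    if request.GET.get("next") == "listing":
        return redirect("listing", item_id)  # assumed URL name for the item's page
    return redirect("watchlist")  # assumed URL name for the watchlist page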

Scraping prices with BeautifulSoup4 in Python3

I am new to scraping with Python and BeautifulSoup4, and I do not have much knowledge of HTML. To practice, I am trying to use them on the Carrefour website to extract the price and price per kilogram of a product that I search for by EAN code.
My code:
import requests
from bs4 import BeautifulSoup

barcodes = ['5449000000996']
for barcode in barcodes:
    url = 'https://www.carrefour.es/?q=' + barcode
    html = requests.get(url).content
    bs = BeautifulSoup(html, 'lxml')
    searchingprice = bs.find_all('strong', {'class': 'ebx-result-price__value'})
    print(searchingprice)
    searchingpricerperkg = bs.find_all('span', {'class': 'ebx-result__quantity ebx-result-quantity'})
    print(searchingpricerperkg)
But I do not get any results at all.
Here is a screenshot of the HTML code:
What am I doing wrong? I tried it with another website and it seemed to work.
The problem here is that you're scraping a page with JavaScript-generated content. Basically, the page that you're grabbing with requests doesn't actually contain the thing you're extracting from it; it contains a bunch of JavaScript. When your browser goes to the page, it runs the JavaScript, which generates the content, so the rendered version you see in your browser is not the same thing returned by the page itself. The page contains instructions for your browser to build the page that you see.
If you're just practicing, you might want to simply try a different source to scrape from; but to scrape this page, you'll need to look into other solutions that can handle JavaScript-generated content:
Web-scraping JavaScript page with Python
Alternatively, the JavaScript generates content by requesting data from other sources. I don't speak Spanish, so I'm not much help in figuring this part out, but you might be able to.
As an exercise, have BS4 prettify and print out the page that it receives. You'll see that within that page there are requests to other locations to get the info you're asking for. You might be able to change your request to go not to the page where you view the info, but to the location that page gets its data from.
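For instance, a quick way to confirm this is to prettify and print the raw response; the price markup will be absent, but the script tags hint at where the data really comes from (a small sketch using the same libraries as above):
import requests
from bs4 import BeautifulSoup

html = requests.get('https://www.carrefour.es/?q=5449000000996').content
bs = BeautifulSoup(html, 'lxml')
# The price elements will not appear in this output because they are
# generated client-side; look for <script> tags and the endpoints they call.
print(bs.prettify())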

What would be the correct approach for a light AngularJS web-app

I am building a very light web app using AngularJS, and I can't seem to find the correct approach as to how to organize it.
Briefly: the app loads a list of objects after the user logs in, and when the user chooses an object it loads all the detail for that object.
The app (as I am currently building it) only has to load short JSON text data, so I thought I could have a single-page app in a single HTML file, directed by a single controller that handles all the data received from the server, with the different views handled by HTML snippets and the AngularJS directives ng-show and ng-include, like so:
<div ng-show="correctView" ng-include="login_snippet.html >
</div>
<div ng-show="correctView" ng-include="table-view_snippet.html >
</div>
<div ng-show="correctView" ng-include="detail_view_snippet.html >
</div>
The correctView string is changed by the controller to decide which view is to be shown.
Is this a reasonable approach? I can't seem to find which one would suit my app best; this one doesn't seem right, because the browser's back button doesn't work with this method, which won't do.
So:
Is there a way to make the back button work?
If not, what would be the correct thing to do? Is it possible to have several HTML files sharing the same controller? Or can one controller send data to another?
I have only found examples of single-page applications where only parts of the page change when the user interacts with it, and that won't do for mine.

retrieving URLs from functions within HTML (python)

I need to scrape some URLs from some retailer product pages, but the specific URLs I need aren't in the HTML part of the page. For each of the items one would click to reach the page whose URL I need, the HTML looks like this:
<div id="name" class="hand bold" onclick="AVON.productcontrol.Go(45714);">ADVANCE TECHNIQUES Color Protection Conditioner Bonus Size</div>
I wrote the following to get URLs from the page, but since the actual URLs I need don’t seem to be stored in the page, it doesn’t get what I need:
import urllib
import lxml.html
from lxml.cssselect import CSSSelector

def getUrls(URL):
    """input: product page url
    output: list of urls to products
    """
    connection = urllib.urlopen(URL)
    dom = lxml.html.fromstring(connection.read())
    selAnchor = CSSSelector('a')  # match every anchor element
    foundElements = selAnchor(dom)
    urlList = [e.get('href') for e in foundElements]
    return urlList
Is there a way to get the link that the function after 'onclick' (I guess AVON.productcontrol.Go(#);) takes you to? I don't fully understand HTML, and while I've read a bit about onclick, I can't figure out how the function after 'onclick' works.
In order to find the URL that you are taken to on click, you need to find the JavaScript source code of the 'Go' function and read and understand it. It's buried somewhere within a <script> tag or some JavaScript .js file that is referenced directly or indirectly by the HTML page. Happy digging!
Or: you automate the interaction with the web page with a tool like Selenium (http://docs.seleniumhq.org/) and just check where it takes you when you click.
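If reading the Go function reveals a simple URL pattern keyed by that numeric ID, you can pull the IDs out of the onclick attributes with a regular expression. A sketch of that idea (the URL template below is a made-up placeholder; read the actual Go source to find the real pattern):
import re
import urllib
import lxml.html

def getProductUrls(URL):
    """Collect the numeric IDs passed to AVON.productcontrol.Go(...)."""
    dom = lxml.html.fromstring(urllib.urlopen(URL).read())
    urls = []
    for div in dom.cssselect('div[onclick]'):
        # Pull the numeric argument out of e.g. AVON.productcontrol.Go(45714);
        match = re.search(r'AVON\.productcontrol\.Go\((\d+)\)', div.get('onclick', ''))
        if match:
            # Placeholder URL pattern -- replace with whatever Go() actually builds:
            urls.append('http://www.example.com/product/' + match.group(1))
    return urls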

Get innerHTML via Jsoup

I'm trying to scrape data from this website: http://www.bundesliga.de/de/liga/tabelle/
In the source code I can see the tables, but there's no content, just things like:
<td>[no content]</td>
<td>[no content]</td>
<td>[no content]</td>
<td>[no content]</td>
....
With Firebug (F12 in Firefox) I don't see any content either, but I can select the table and then copy the innerHTML via the Firebug options. In that case I get all the information about the teams, but I don't know how to get the table with its content in Jsoup.
To get the value of an attribute, use the Node.attr(String key) method
For the text on an element (and its combined children), use Element.text()
For HTML, use Element.html(), or Node.outerHtml() as appropriate
For example:
String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();
String text = doc.body().text(); // "An example link"
String linkHref = link.attr("href"); // "http://example.com/"
String linkText = link.text(); // "example""
String linkOuterH = link.outerHtml();
// "<b>example</b>"
String linkInnerH = link.html(); // "<b>example</b>"
reference:
http://jsoup.org/cookbook/extracting-data/attributes-text-html
The table is not rendered on the server directly, but built by the page's client-side JavaScript and filled with data that reaches the client via AJAX. So what you get with the naive Jsoup approach is expected.
I see two possible solutions:
1. You analyze the network traffic and identify the AJAX calls the site is making. Then you reconstruct the format and fire the same requests the JavaScript would, and rebuild the table from the responses.
2. You don't use Jsoup but a real browser that loads the page and runs the JavaScript, including all AJAX calls. You could use Selenium WebDriver for that. There is a headless browser called phantomjs with a relatively small footprint that you can use in combination with Selenium WebDriver.
Both options have their (dis)advantages:
1. This takes more time, since you need to understand the network traffic pretty well. The reward is a very fast and memory-efficient scraper.
2. Programming Selenium is very easy and you should have no difficulty achieving your goal; you don't need to understand the inner workings of the site you want to scrape. However, the price is a further dependency in your project: memory consumption is high, another process runs, and the scraping will be slow.
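For option 2, a minimal sketch of the idea, shown with Selenium's Python bindings for brevity (the selenium-java API is analogous, and you could swap in phantomjs as the driver):
from selenium import webdriver

driver = webdriver.Firefox()  # or webdriver.PhantomJS() for headless scraping
driver.implicitly_wait(10)  # give the page's AJAX calls time to finish
try:
    driver.get("http://www.bundesliga.de/de/liga/tabelle/")
    # By now the JavaScript has run, so the table cells hold real content
    # instead of [no content]; grab the rendered markup for further parsing.
    table = driver.find_element_by_css_selector("table")
    print(table.get_attribute("innerHTML"))
finally:
    driver.quit()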
Maybe you can find another source for the soccer table that holds the info you want? That might be the easiest option. For example: http://www.fussballdaten.de/bundesliga/