The tt_news extension is very useful to me, but there is one small thing I don't understand: "register:newsMoreLink". This register contains the singlePid of the content element (the page defined for the single view) and the uid of the news article from the news extension.
This is the TypoScript section from the "new ts" of the tt_news extension.
As you can see, it contains "append.data = register:newsMoreLink":
plugin.tt_news {
displayLatest {
subheader_stdWrap {
# the "more" link is directly appended to the subheader
append = TEXT
append.data = register:newsMoreLink
append.wrap = <span class="news-list-morelink">|</span>
# display the "more" link only if the field bodytext contains something
append.if.isTrue.field = bodytext
outerWrap = <p>|</p>
}
}
}
What is "register:newsMoreLink"? Is it some kind of function? I do not know. But when I use "register:newsMoreLink" in "append.data", it produces a strange link: a "More >" link. The "More >" link after a news article teaser looks like this:
http://192.168.1.29/website/index.php?id=474&tx_ttnews%5Btt_news%5D=24&cHash=95d80a09fb9cbade7e934cda5e14e00a
474 is the "singlePid" (that is what it is called in the database).
24 is the "uid" of the news article (the ones you create with the tt_news plugin in the backend).
My question is: where is "register:newsMoreLink" defined? Is it defined globally, or am I missing something about TYPO3? And how can I add an anchor at the end of this "More >" href? Like:
http://192.168.1.29/website/index.php?id=474&tx_ttnews%5Btt_news%5D=24&cHash=95d80a09fb9cbade7e934cda5e14e00a#myAnchor1
register:newsMoreLink is not a function. It is one of the data types, in other words a kind of data that you can access with stdWrap.data. A register is normally set with LOAD_REGISTER. In the case of tt_news, though, it is set in the PHP code with $this->local_cObj->LOAD_REGISTER().
I'm afraid you cannot easily add an anchor to that link. However, you can set the append to create your own custom link to the news record using typolink:
append = TEXT
append {
value = text of the link
typolink {
# ...typolink configuration...
}
}
You will probably be interested in typolink's parameter, additionalParams and section properties.
This is the code I use to link to a page with an anchor target:
plugin.tt_news.displayList.subheader_stdWrap {
  append = TEXT
  append.data >
  append {
    value = mehr
    typolink {
      # pid of the target page
      parameter = 47
      # anchor name
      section = entry_{field:uid}
      section.insertData = 1
    }
  }
}
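To see why an anchor can be appended without touching the rest of the link, it can help to take the generated URL apart. The following is only an illustrative Python sketch, not part of TYPO3 or tt_news: it uses the URL and the anchor name "myAnchor1" from the question, and the variable names are made up for the example.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

# URL copied from the question above.
more_link = (
    "http://192.168.1.29/website/index.php"
    "?id=474&tx_ttnews%5Btt_news%5D=24"
    "&cHash=95d80a09fb9cbade7e934cda5e14e00a"
)

parts = urlsplit(more_link)
params = parse_qs(parts.query)  # percent-decodes %5B and %5D to [ and ]

single_pid = params["id"][0]                # page that holds the single view
news_uid = params["tx_ttnews[tt_news]"][0]  # uid of the news record

# The fragment is a separate URL component, so the query (including cHash)
# stays exactly as it was generated.
with_anchor = urlunsplit(parts._replace(fragment="myAnchor1"))

print(single_pid, news_uid)
print(with_anchor)
```

Because the anchor is not part of the query string, the cHash parameter stays as generated; this is the part of the URL that typolink's section property fills in.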
I tried to get the routeName from the URL because I need to set a different class on the body in the layout when I'm on the /Category page.
@{string classContent = Request.QueryString["routeName"] != "/Category" ? "container" : "";}
<div id="Content" class="body-wrapper @classContent">
My problem is that Request.QueryString["routeName"] is always empty, and I couldn't find out why.
Does anyone know why it is always empty, or have a better approach for setting a different class when you're on a certain page?
In the end I solved it with this code:
var segments = Request.Url.AbsolutePath.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
string classContent = "container";
if (segments.Count() > 1) { classContent = segments[1] != "category" ? "" : "container";}
Request.Url.AbsolutePath returns the path part of the URL (without the host or the query string).
After that I split the path on '/' and store the segments in an array.
Then I check whether there are enough segments for the request to be on a page other than home.
Finally, I check whether the second segment of the URL is "category" and set the CSS class accordingly.
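For readers who want to try the segment logic outside of ASP.NET, here is a rough Python sketch of the same steps. It mirrors the comparison exactly as posted (including its case sensitivity); the function name and the example.com URLs are hypothetical and only serve the illustration.

```python
from urllib.parse import urlsplit


def content_class(url: str) -> str:
    """Return the CSS class for the content wrapper, following the posted logic."""
    # Keep only the path, then drop empty pieces (like RemoveEmptyEntries).
    segments = [s for s in urlsplit(url).path.split("/") if s]
    css_class = "container"
    if len(segments) > 1:
        # Same test as the C# snippet: look at the second path segment.
        css_class = "container" if segments[1] == "category" else ""
    return css_class


print(repr(content_class("http://example.com/")))
print(repr(content_class("http://example.com/shop/category")))
print(repr(content_class("http://example.com/shop/products")))
```

Note that the check looks at the second segment, so it only distinguishes pages whose path has at least two parts.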
I have been tasked with coding a web crawler that goes through several URLs (around 400, but the list could grow), each with a completely different HTML structure, and extracts the links containing certain information. The only thing the program knows beforehand is which keywords it should search for; the HTML structure and any semantic cues as to where to look for those keywords are unknown.
So far, I have used the request-promise module for Node.js to send a request to the URL where the search for keywords will take place:
const htmlResult = await request.get(url);
htmlResult stores the response as a string, and I can save it as either a .txt or an .html file if needed.
The problem I have is that I don't know how to instruct the program to extract a URL based on words that aren't necessarily present in the URL string. An example might help clarify:
<a href="site/with/no/keywords-just-a-random-string" title="Keywords might be here, but title attribute might be absent"><span class="img"><img data-cfsrc="/thumbpdf/618a8nb4.jpg" alt="" style="display:none;visibility:hidden;"><noscript><img src="/thumbpdf/8bfa84.jpg" alt=""></noscript></span>
<h2>KEYWORDS ARE IN THIS TAG, WHICH IN TURN IS INSIDE THE <a> TAG</h2>
<span class="date--type">2 Nov 2021 </span>
<span class="tag">
oher stuff with no keywords in it</span>
</a>
As you can see, this tag has a complex structure. The keywords I need to parse are inside an h2 tag which, in turn, is inside the a tag. But the a tag could also be like this:
KEYWORDS TO PARSE
Here the keywords are simply within the a tag.
My question, thus, is: how do I parse htmlResult (either as a string or saved as a .txt/.html file) and, once I get a match, instruct the program to extract the URL of the a tag in which the keywords were matched?
As I am using Node.js, I am open to using any tool available.
Could someone offer some advice on how to tackle this challenge?
Thanks so much in advance.
This is very quick and dirty, and I'm sure it can be further streamlined, but it should get you at least closer to where you need to be.
This assumes a bunch of <div> elements, each containing one of your <a> elements, all in one document (see link below). It uses XPath to locate the data:
function xpathEval(xpath, context) {
  return document.evaluate(xpath, context, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
}
const desiredHrefs = [];
const targets = xpathEval("//div[@class='container']", document);
for (let i = 0; i < targets.snapshotLength; i++) {
  const attribs = xpathEval('.//*/@*', targets.snapshotItem(i)),
    texts = xpathEval('.//*/text()', targets.snapshotItem(i));
  // check the attribute values of every element inside the div
  for (let k = 0; k < attribs.snapshotLength; k++) {
    const attribData = attribs.snapshotItem(k).textContent;
    if (attribData.includes("trainer") && attribData.includes("dog")) {
      //either
      //console.log(targets.snapshotItem(i).querySelector('a').getAttribute('href'))
      //or
      const href = xpathEval('.//a/@href', targets.snapshotItem(i));
      desiredHrefs.push(href.snapshotItem(0).textContent);
    }
  }
  // check the text nodes of every element inside the div
  for (let j = 0; j < texts.snapshotLength; j++) {
    const data = texts.snapshotItem(j).nodeValue.trim().toLowerCase();
    if (data.includes("trainer") && data.includes("dog")) {
      //either
      //console.log(targets.snapshotItem(i).querySelector('a').getAttribute('href'))
      //or
      const href = xpathEval('.//a/@href', targets.snapshotItem(i));
      desiredHrefs.push(href.snapshotItem(0).textContent);
    }
  }
}
// print each matching href once
for (const href of [...new Set(desiredHrefs)])
  console.log(href);
You can see it in action here.
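The same idea (collect everything inside each a tag, test it for the keywords, then report that tag's href) does not depend on XPath or on a browser. As a language-neutral illustration, here is a minimal sketch in Python using only the standard library's html.parser; it is not a Node.js solution. The class name, the sample HTML and its hrefs are made up for the example, and the keywords "dog" and "trainer" are taken from the snippet above.

```python
from html.parser import HTMLParser


class KeywordLinkParser(HTMLParser):
    """Collect the href of every <a> whose text or attributes mention all keywords."""

    def __init__(self, keywords):
        super().__init__()
        self.keywords = [k.lower() for k in keywords]
        self.hrefs = []
        self._href = None   # href of the <a> currently open, if any
        self._chunks = []   # text and attribute values seen inside it

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._chunks = []
        if self._href is not None:
            # Attribute values (title, alt, ...) may carry the keywords too.
            self._chunks.extend(value for _, value in attrs if value)

    def handle_data(self, data):
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            haystack = " ".join(self._chunks).lower()
            if all(k in haystack for k in self.keywords):
                self.hrefs.append(self._href)
            self._href = None


# Hypothetical sample markup covering nested text, a title attribute and a miss.
html_result = """
<div><a href="/site/with/no/keywords"><h2>Dog Trainer wanted</h2><span>2 Nov 2021</span></a></div>
<div><a href="/other" title="dog trainer course">details</a></div>
<div><a href="/unrelated"><h2>Cat sitter</h2></a></div>
"""

parser = KeywordLinkParser(["dog", "trainer"])
parser.feed(html_result)
print(parser.hrefs)
```

The sketch treats the href value itself as searchable text as well, and it does not handle a tags nested inside other a tags, which is invalid HTML anyway.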
Hello, I am currently using Python 3, BeautifulSoup 4 and requests to scrape some information from the UK version of supremenewyork.com. I have added proxy handling (which I know works) to the script. The only problem is that this website does not like programs scraping this information automatically, so they have decided to scramble the page content, which I think makes it unusable as text.
My question: is there a way to get the text without using .text, and/or is there a way to get the script to read the text so that when it sees a special character like # it skips over it, or when it sees & it skips until it sees ;?
That is basically how this website scrambles the text. Here is an example; the text shown when you inspect the element is:
supremetshirt
This is supposed to say "supreme t-shirt", and so on (you get the idea: they don't use letters to scramble, only numbers and special characters).
This is automatically highlighted in a box when you inspect the element while using a VPN on the UK Supreme website, and it differs from the regular text (which isn't highlighted at all). Whenever I run my script without the proxy code against my local supremenewyork.com, it works fine (but only because the text is not scrambled on my local site, and I want to pull this info from the UK website). Any ideas? Here is my code:
import requests
from bs4 import BeautifulSoup
categorys = ['jackets', 'shirts', 'tops_sweaters', 'sweatshirts', 'pants', 'shorts', 't-shirts', 'hats', 'bags', 'accessories', 'shoes', 'skate']
catNumb = 0
# use a new proxy every so often for testing (will add something that pulls proxies and uses them for you)
UK_Proxy1 = '51.143.153.167:80'
proxies = {
    'http': 'http://' + UK_Proxy1 + '',
    'https': 'https://' + UK_Proxy1 + '',
}
for cat in categorys:
    catStr = str(categorys[catNumb])
    cUrl = 'http://www.supremenewyork.com/shop/all/' + catStr
    proxy_script = requests.get(cUrl, proxies=proxies).text
    bSoup = BeautifulSoup(proxy_script, 'lxml')
    print('\n*******************"' + catStr.upper() + '"*******************\n')
    catNumb += 1
    for item in bSoup.find_all('div', class_='inner-article'):
        url = item.a['href']
        alt = item.find('img')['alt']
        req = requests.get('http://www.supremenewyork.com' + url)
        item_soup = BeautifulSoup(req.text, 'lxml')
        name = item_soup.find('h1', itemprop='name').text
        #name = item_soup.find('h1', itemprop='name')
        style = item_soup.find('p', itemprop='model').text
        #style = item_soup.find('p', itemprop='model')
        print(alt + ' --- ' + name + ' --- ' + style)
        #print(alt)
        #print(str(name))
        #print(str(style))
When I run this script I get this error:
name = item_soup.find('h1', itemprop='name').text
AttributeError: 'NoneType' object has no attribute 'text'
So I uncommented the lines that are commented out above and commented out their counterparts, which gave me some kind of str error, so I tried print(str(name)). I can print the alt fine (the alt is not scrambled in any run), but when it comes to the name and style, all that gets printed is None under every alt.
I have been working on fixing this for days and have come up with no solution. Can anyone help me solve this?
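If the scrambling consists of HTML character references (the sequences that start with & and end with ;, as described in the question), they do not need to be skipped by hand, because Python's standard library can decode them. This is a minimal sketch under that assumption; the sample string is made up for illustration and was not copied from the site.

```python
import html

# Hypothetical sample: "supreme" written as numeric character references.
scrambled = "&#115;&#117;&#112;&#114;&#101;&#109;&#101; t-shirt"

decoded = html.unescape(scrambled)
print(decoded)
```

The AttributeError itself says that find() returned None, meaning no matching h1 element was found in the fetched page, so it is also worth checking that the element exists before reading .text.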
I have solved my own question using this solution:
thetable = soup5.find('div', class_='turbolink_scroller')
items = thetable.find_all('div', class_='inner-article')
for item in items:
    alt = item.find('img')['alt']
    name = item.h1.a.text
    color = item.p.a.text
    print(alt, ' --- ', name, ' --- ', color)
I have a field called icon, which is a droplist sourced from a folder in the content tree. I would like the list not just to show the text value (shown in the screenshot) but also to use an icon font and display what the actual icon would look like. Basically, I want to customize the content editor's droplist for this field from:
<option value="gears">gears</option>
to
<option value="gears">gears <span class="my-icon-font-gears"></span></option>
Is there any documentation on how to modify the HTML output for a droplist, and on how to make the content editor page load another resource, in this case a font file?
I created a module on the marketplace that does something similar. You can have a look here. There is some documentation on there explaining how to use it.
The code is also on Git if you want to have a look.
I suggest you use the Droplink field type instead of the Droplist, since the value is stored by GUID and this will lead to fewer long-term problems if the linked item is renamed or moved. In any case you need a custom field: inherit from Sitecore.Shell.Applications.ContentEditor.LookupEx (which is the Droplink field type) and override the DoRender() method with the custom markup you require.
It's not possible to embed a span tag since the option tag cannot contain other tags as it is invalid HTML. Adding it will cause the browser to strip it out. You can however set the class on the option itself and style that.
`<option value="gears" class="my-icon-font-gears">gears</option>`
Here is some sample code for such a field.
using System;
using System.Web.UI;
using Sitecore;
using Sitecore.Data.Items;
using Sitecore.Diagnostics;
using Sitecore.Globalization;
namespace MyProject.CMS.Custom.Controls
{
public class StyledLookupEx : Sitecore.Shell.Applications.ContentEditor.LookupEx
{
private string _styleClassField;
private string StyleClassField
{
get
{
if (String.IsNullOrEmpty(_styleClassField))
_styleClassField = StringUtil.ExtractParameter("StyleClassField", this.Source).Trim();
return _styleClassField;
}
}
// This method is copied from the base class, apart from the single line marked below
protected override void DoRender(HtmlTextWriter output)
{
Assert.ArgumentNotNull(output, "output");
Item[] items = this.GetItems(Sitecore.Context.ContentDatabase.GetItem(this.ItemID, Language.Parse(this.ItemLanguage)));
output.Write("<select" + this.GetControlAttributes() + ">");
output.Write("<option value=\"\"></option>");
bool flag1 = false;
foreach (Item obj in items)
{
string itemHeader = this.GetItemHeader(obj);
bool flag2 = this.IsSelected(obj);
if (flag2)
flag1 = true;
/* Option markup modified with class added */
output.Write("<option value=\"" + this.GetItemValue(obj) + "\"" + (flag2 ? " selected=\"selected\"" : string.Empty) + " class=\"" + obj[StyleClassField] + "\" >" + itemHeader + "</option>");
}
bool flag3 = !string.IsNullOrEmpty(this.Value) && !flag1;
if (flag3)
{
output.Write("<optgroup label=\"" + Translate.Text("Value not in the selection list.") + "\">");
output.Write("<option value=\"" + this.Value + "\" selected=\"selected\">" + this.Value + "</option>");
output.Write("</optgroup>");
}
output.Write("</select>");
if (!flag3)
return;
output.Write("<div style=\"color:#999999;padding:2px 0px 0px 0px\">{0}</div>", Translate.Text("The field contains a value that is not in the selection list."));
}
}
}
This field adds a custom property that lets you specify which field on the linked item to use for the style class. The assumption is that the linked item has another single-line text field that holds the CSS class.
Usage: Set the source property of the field in the following format:
Datasource={path-or-guid-to-options}&StyleClassField={fieldname}
e.g. Datasource=/sitecore/content/lookup/iconfonts&StyleClassField=IconClassName
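The source value is laid out like a URL query string: key=value pairs joined by &. In Sitecore the lookup is done by StringUtil.ExtractParameter, as in the class above; the Python sketch below is only an illustration of that layout using the example value, not a reproduction of Sitecore's parsing.

```python
from urllib.parse import parse_qs

# Example source value from the usage note above.
source = "Datasource=/sitecore/content/lookup/iconfonts&StyleClassField=IconClassName"

# parse_qs returns a list per key; each key appears once here, so take the first.
params = {key: values[0] for key, values in parse_qs(source).items()}

print(params["Datasource"])
print(params["StyleClassField"])
```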
To use this new field, compile the above code into your project, switch over to the core database and create a new field type; you can duplicate the existing Droplink field located in /sitecore/system/Field types/Link Types/Droplink. Clear the existing Control field and instead set the Assembly and Class fields to point to your implementation.
You also need to load a custom CSS stylesheet with the style definitions into the Content Editor, which you can achieve by following this blog post.
I have a template in TYPO3 that I want to use on several pages, and it contains a DIV.
Is it possible to change the DIV's ID depending on the page UID? It is the only div whose content/image changes, and I would like to put this DIV inside my main.html template.
So if
UID = 2 <div id="topbanner_about"></div>
UID = 3 <div id="topbanner_drills"></div>
and so on...
Can I do this in TS (TypoScript), or how else can I do it, so that I don't need to make 5 templates?
You can do this by inserting a marker in your template. It would look something like this:
In the template:
[...]
<div id="topbanner_###ID_SUFFIX###"></div>
[...]
In the TypoScript, where the template is inserted:
10 = TEMPLATE
10 {
  template = FILE
  template.file = fileadmin/main.html
  marks {
    ID_SUFFIX = TEXT
    ID_SUFFIX {
      insertData = 1
      # This makes sure that the output is valid and prevents XSS attacks
      htmlSpecialChars = 1
      # Use this to insert the page ID:
      value = {page:uid}
      # Or use this instead to insert the subtitle of the page:
      # value = {page:subtitle}
      # The same works for other fields of the page record.
    }
  }
}
If the fields provided by default pages are not enough, you could add another field to the page records. The best way to do that would be to build an extension that does it.
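Conceptually, the marker mechanism is a string substitution with escaping. The Python sketch below only illustrates that idea and is not how TYPO3 implements it; the function name is hypothetical, and the value "about" is assumed to come from a page field such as the subtitle (with {page:uid} the suffix would simply be the number).

```python
import html


def fill_markers(template: str, marks: dict) -> str:
    """Replace each ###NAME### marker with its HTML-escaped value."""
    for name, value in marks.items():
        template = template.replace(f"###{name}###", html.escape(str(value)))
    return template


template = '<div id="topbanner_###ID_SUFFIX###"></div>'
print(fill_markers(template, {"ID_SUFFIX": "about"}))
print(fill_markers(template, {"ID_SUFFIX": 2}))
```

Escaping the value before inserting it plays the same role as htmlSpecialChars = 1 in the TypoScript: a field value containing quotes or angle brackets cannot break out of the id attribute.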
I found a better solution: on each page I add an image under Resources (a different image on every page), and then I add this code to my main TS.
lib.imageElement = FILES
lib.imageElement {
references {
data = levelmedia:-1,slide
listNum = 0
}
renderObj = COA
renderObj {
10 = IMAGE
10 {
file.import.data = file:current:originalUid
altText.data = file:current:title
}
}
}
That does the trick: it shows a different image in the top/header on every page.
But thx...