How can I include a single section from one wikipage to another as a reference?
I am using a fresh MediaWiki (1.16.5) installation without any extensions. The syntax I have tried is {{:otherPage|sectionName}}. I have also tried {{:otherPage#sectionName}}, but that doesn't work either.
Thanks in advance!
Not quite what you want, but there is the Labeled Section Transclusion extension. AFAIK there is no way to include just any section.
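If you do go with that extension, the usual pattern (the page and section names here are just placeholders) is to mark the section on the source page and transclude it by name. On the source page:
<section begin=mySection />
The text of the section.
<section end=mySection />
and then, on the target page:
{{#lst:SourcePage|mySection}}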
Use <onlyinclude>...</onlyinclude>. Wrap the text you want transcluded on the other page in the tag. It will not affect the display.
If there's just one section that you want, then <onlyinclude></onlyinclude> tags will produce the desired effect. Wrap them around the section in question and then transclude the whole page normally ({{:otherpage}}). Only the section in <onlyinclude> tags will be transcluded.
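To illustrate (the page and heading names are made up), otherPage might look like this:
Intro text that is not transcluded.
<onlyinclude>
== The section to share ==
Text of the section that should appear elsewhere.
</onlyinclude>
More text that is not transcluded.
and the page that should show the section simply contains:
{{:otherPage}}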
A client is using Cloudflare to deliver his web content, but their filters are removing tags that are unknown to them. Unfortunately that breaks the service that we provide.
For example, before the DOCTYPE tag we use a tag of our own, like <!-- Example -->, which tells our server filter to encrypt the HTML that follows. But Cloudflare's filters are removing that tag, and thus breaking the service.
Do they have a whitelist or something that can be used to prevent the corruption?
Just don't minify the HTML, CSS, and JavaScript. Skip those options while you are adding the domain; it works for me.
So you want to preserve html tags similar to "comments" that don't normally appear on a page?
Page speed modifiers strip such tags because they are not important to a page and thus are not necessary. Removing all comments shaves a few bytes off the download of a page. On most pages that will make little difference, but some websites, especially those running a CMS with a multitude of plugins, can contain a lot of comments.
So it is page speed enhancement that you need to disable to preserve such tags.
Cloudflare provides a control panel to make adjustments. In the top menu, click on "Rules" to create a Page Rule for your website. Then enter the URL of the page that you want to exempt. Enclosing the URL pattern in asterisks (*) will cater for all similar URLs, like example.com/special. Then pick a setting by selecting "Disable Performance".
This will create a rule to disable pagespeed enhancement of all pages that include "example.com/special" in their URL.
I really dislike the non-semantic usage of <big> on our wiki, and would like to prevent it. Flat-out commands didn't work so far, so I'm switching to doing it by code...
AFAIK, there's no configuration switch to control the blacklist/whitelist of HTML tags. Looking at the source code, it seems like the data is coming from Sanitizer::getRecognizedTagData(), while the work itself is done in Sanitizer::removeHTMLtags(). However, I do not see a way to add to the list myself, except using one of the hooks before or after (InternalParseBeforeSanitize, InternalParseBeforeLinks) and either:
Call Sanitizer::removeHTMLtags() again myself, with the additional tag to blacklist as a parameter
Do a search myself on the text to remove all the <big> tags.
The first one is a duplication of work, the second one is a duplication of code. Is there a better way? What would you recommend?
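To make option 2 concrete, here is roughly what I have in mind, placed in LocalSettings.php (the function name and regex are only illustrative):
# Strip <big> tags from the wikitext before the parser renders it.
$wgHooks['InternalParseBeforeLinks'][] = 'efStripBigTags';
function efStripBigTags( &$parser, &$text ) {
    // Remove opening and closing <big> tags but keep the wrapped text.
    $text = preg_replace( '!</?big\s*>!i', '', $text );
    return true; // let other hooks run
}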
No coding is needed: just install AbuseFilter and create a rule that warns or disallows on save of pages containing these tags.
I have two pages that have related topics, and share a significant amount of data & text between the two pages.
Since these two pages are both linked to from the same location, side by side, I am wondering if I can use an argument with the link to change the CSS being applied and have ALL the data on one page.
The original setup:
domain.com/subdir/one.page.php
domain.com/subdir/two.page.php
Can I use this instead?
domain.com/subdir/full.page.php?one
domain.com/subdir/full.page.php?two
And with that, have the page selectively use CSS (e.g. the visibility property) to change what is actually displayed on the screen?
No, the CSS cannot be affected by the URL.
Instead, you should use a server-side programming language to only display the appropriate content.
You could use PHP include statements to prevent duplication.
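As a sketch of what that could look like (all file names here are invented), full.page.php can branch on the query string and pull in the shared and page-specific partials:
<?php
// full.page.php: show one of two content blocks based on ?one or ?two.
$section = isset($_SERVER['QUERY_STRING']) ? $_SERVER['QUERY_STRING'] : 'one';
include 'shared.header.php';   // markup common to both pages
if ($section === 'two') {
    include 'two.content.php';
} else {
    include 'one.content.php'; // default
}
include 'shared.footer.php';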
Given the HTML of a web page, what would be the easiest strategy to get the text that's visible on the corresponding page? I have thought of grabbing everything that's between <a>...</a> and <p>...</p>, but that is not working that well.
Keep in mind that this is for a school project, so I am not allowed to use any kind of external library (the idea is to have to do the parsing myself). Also, this will be implemented as the HTML of the page is downloaded, that is, I can't assume I already have the whole HTML page downloaded. It has to show the extracted visible words as the HTML is being downloaded.
Also, it doesn't have to work for ALL the cases, just to be satisfactory most of the time.
I am not allowed to use any kind of external library
This is a poor requirement for a ‘software architecture’ course. Parsing HTML is extremely difficult to do correctly, and certainly way outside the bounds of a course exercise. Any naïve approach you come up with involving regex hacks is going to fall over badly on common web pages.
The software-architecturally correct thing to do here is use an external library that has already solved the problem of parsing HTML (such as, for .NET, the HTML Agility Pack), and then iterate over the document objects it generates looking for text nodes that aren't in ‘invisible’ elements like <script>.
If the task of grabbing data from web pages is of your own choosing, to demonstrate some other principle, then I would advise picking a different challenge, one you can usefully solve. For example, just changing the input from HTML to XML might allow you to use the built-in XML parser.
Literally all the text that is visible sounds like a big ask for a school project, as it would depend not only on the HTML itself, but also any in-page or external styling. One solution would be to simply strip the HTML tags from the input, though that wouldn't strictly meet your requirements as you have stated them.
Assuming that near enough is good enough, you could make a first pass to strip out the content of entire elements which you know won't be visible (such as script, style), and a second pass to remove the remaining tags themselves.
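Purely as an illustration of that two-pass idea, here is a sketch in PHP; it assumes you run it over whatever chunk of HTML you have so far, and the regexes are deliberately simplistic:
<?php
function extractVisibleText($html) {
    // Pass 1: drop elements whose contents are never visible, contents included.
    $html = preg_replace('#<(script|style|head)\b[^>]*>.*?</\1\s*>#is', '', $html);
    $html = preg_replace('#<!--.*?-->#s', '', $html);   // HTML comments
    // Pass 2: drop the remaining tags but keep the text between them.
    $text = strip_tags($html);
    // Tidy up: decode entities and collapse runs of whitespace.
    $text = html_entity_decode($text, ENT_QUOTES);
    return trim(preg_replace('#\s+#', ' ', $text));
}
echo extractVisibleText('<p>Hello <script>var x = 1;</script><b>world</b>!</p>');
// prints: Hello world!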
I'd consider writing a regex to remove all HTML tags, and you should be left with your desired text. This can be done in JavaScript and doesn't require anything special.
I know this is not exactly what you asked for, but it can be done using Regular Expressions:
// JavaScript; should also work in C# (the quotes would need escaping).
// Matches a complete tag, including quoted attribute values that may contain '>'.
h = h.replace(/<(?:"[^"]*"|'[^']*'|[^'">])*>/g,'');
This RegExp will remove HTML tags; notice, however, that you first need to remove the script, link, style, ... elements together with their contents.
If you decide to go this way, I can help you with the regular expressions needed.
HTML5 includes a detailed description of how to build a parser. It is probably more complicated than you are looking for, but it is the recommended way.
You'll need to parse every DOM element for text, and then detect whether that DOM element is visible (el.style.display == 'block' or 'inline'), and then you'll need to detect whether that element is positioned in such a manner that it isn't outside of the viewable area of the page. Then you'll need to detect the z-index of each element and the background of each element in order to detect if any overlapping is hiding some text.
Basically, this is impossible to do within a month's time.
I am trying to add an HTML link to a website, but the website strips out the HTML:
When I view the source I get this:
<a href = "http://www.soandso.com">http://www.soandso.com/</a>
instead of what I actually want which is this:
www.soandso.com
Is there an HTML command to bypass the filter?
Almost certainly not.
Most sites quite rightly don't just let users inject arbitrary HTML. That's the source of XSS (cross site scripting) vulnerabilities.
If the site strips (or escapes) tags, just put in www.example.com and that will have to do.
No. The filters are there for a reason: to prevent you from adding your own HTML to the website. There is no standard for how the filters work, but they will generally either escape all HTML that isn't allowed, or strip out HTML that isn't allowed. There is no general-purpose way to get around the filter.
First check if the site uses any sort of special markup. For instance, Stack Overflow supports a variation of Markdown. Other sites support Textile, or BBCode. If that is the case, check the associated reference on how to include a link. If none of those are the case, then just use the URL without the <a> element wrapper.
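For example (the exact syntax depends on the site's flavour), a link typically looks like this in Markdown and in BBCode respectively:
[Example link](http://www.example.com)
[url=http://www.example.com]Example link[/url]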