Auto Generate TOC for Article with Less than Four Headings - mediawiki

In the documentation https://www.mediawiki.org/wiki/Help:Formatting it states that an article will have a TOC generated if there are 4 or more headings. We would like a TOC to be generated even if there are fewer than 4 headings. Is this value customisable?
Our workaround at present is to add __FORCETOC__ to every page, but we would prefer it if this wasn't needed.
Thanks in Advance

You need to use a parser hook and change the parser's TOC settings. It's not elegant but doable. E.g. the ForceTocOnEveryPage extension does something like this:
$wgHooks['InternalParseBeforeLinks'][] = function ( &$parser, &$text ) {
    // $text is passed by reference, so append the magic word in place.
    $text .= '__FORCETOC__';
    return true;
};

Changing the ToC heading count is not (currently) possible, as the 4 is hardcoded. It wouldn't be a huge change to make it configurable though.

Related

Can I replace placeholder text in a rendered HTML page dynamically?

I wish I could think of a better way to word my question, but basically here is what I want to do: in an HTML file, I would like to fill the body with a specific string multiple times. For example:
<div>
This is some content. XXX
</div>
<div>
This is some more content. XXX
</div>
<div>
This is even more content. XXX
</div>
Then, I would like some script to go through the page, and replace every instance of the string (in this case XXX but it could be anything) with an incrementing number, so, like:
<div>
This is some content. 001
</div>
<div>
This is some more content. 002
</div>
<div>
This is even more content. 003
</div>
This is a simple example of course, and you might be thinking well that's dumb, just type the numbers. But obviously this is simpler than what I'm intending to do, and right now what I'm building, the order of all the content has not been decided yet, so things could move up or down in their placement on the page, but I'd like all the numbers to be sequential in order of their appearance on the page.
So, final thoughts: I am super sure there's a way better way to do this than I'm even thinking of, methodology wise (i.e., make an XML table or something). I am definitely open to ANY suggestion on how to do this, but I am kind of an idiot so if your answer is "pff this would be super easy in Ruby just use Ruby", that's not gonna really get me where I need to be. Also if this has already been answered, it was hard to think of how to word the question to search for previous answers so I apologize in advance if I didn't find the pre-existing answer when I was searching.
You can easily do this with CSS counters, sample here:
ul {
    counter-reset: list;
}
li:after {
    counter-increment: list;
    content: " (" counter(list) ")";
}
For some more advanced examples visit the MDN documentation page.
You could use PHP to achieve this. If you've had no experience with it, it does integrate with HTML easily. Basically you write your HTML as usual, but you name the file .php instead of .html. Then you insert PHP scripts as follows, for example: <p>I can count to <?php nextNumber(); ?></p>.
At the top of the page you should insert more script with a counter function:
<?php
$i = 1;
$places = 4;
function nextNumber() {
    global $i, $places;
    print str_pad($i++, $places, '0', STR_PAD_LEFT);
}
?>
This may be better than CSS. The numbers are generated on the server, so it's not browser-dependent.
Change $places to the number of digits you'd like to have (for leading zeros).
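If the page is static, the same numbering can also be done as a one-off preprocessing step before the file is published. Here is a minimal sketch in Python, assuming the placeholder is the literal string XXX from the question; the function name number_placeholders and its parameters are made up for illustration.

```python
import itertools
import re


def number_placeholders(page, placeholder="XXX", places=3):
    """Replace each occurrence of `placeholder` with 001, 002, ... in page order."""
    counter = itertools.count(1)
    # re.sub calls the lambda once per match, left to right,
    # so the numbers follow the order of appearance in the text.
    return re.sub(
        re.escape(placeholder),
        lambda match: str(next(counter)).zfill(places),
        page,
    )


if __name__ == "__main__":
    sample = "<div>\nThis is some content. XXX\n</div>\n<div>\nThis is some more content. XXX\n</div>"
    print(number_placeholders(sample))
```

Because the numbers are assigned at processing time, moving the blocks around in the source and re-running the script keeps them sequential.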

How to generate hash from ~200k text/html that would match/compare to similar text?

I would like to make a sort of hash key out of a text (in my case HTML) that would match/compare to the hash of other similar text.
Example of matching texts:
"2012/10/01 This is my webpage #1"+ 100k_of_same_text + random_words_1 + ..
"2012/10/02 This is my webpage #2"+ 100k_of_same_text + random_words_2 + ..
...
"2012/10/02 This is my webpage #2"+ 100k_of_same_text + random_words_3 + ..
So far I've thought of removing numbers and tags, but that would still leave the random words.
Is there anything out there that does this?
I have root access to the server, so I can add any UDF that is necessary, and if needed I can do the processing in C or other languages.
The ideal would be a function like generateSimilarHash(text) and another function compareSimilarHashes(hash1,hash2) that would return the percentage of matching text.
A function like compare(text1,text2) would not work in my case, as I have many pages to compare (~20 million at the moment).
Any advice is welcomed!
UPDATE:
I'm referring to a hash function as it is described on Wikipedia:
A hash function is any algorithm or subroutine that maps large data
sets of variable length to smaller data sets of a fixed length.
The fixed length part is not necessary in my case.
It sounds like you need to use a program like diff.
If you are just trying to compare text, a hash is not the way to go, because slight differences in input cause completely different output. (That is why hashes are used to store passwords and secure text.) Character difference programs are pretty complicated; unless you are really interested in how they work and want to write your own, I would just use a solution like the one shown here, which uses sdiff to get a percentage.
Percentage value with GNU Diff
You could use some sort of Levenshtein distance algorithm. This works for small pieces of text, but I'm fairly sure that something similar can be applied to large chunks of text.
Ref: http://en.m.wikibooks.org/wiki/Algorithm_implementation/Strings/Levenshtein_distance
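To make the Levenshtein suggestion concrete, here is a minimal sketch in Python of the classic dynamic-programming version, plus a percentage helper. The names levenshtein and similarity_percent are mine, and dividing by the longer length is just one common normalisation, not something taken from the linked page.

```python
def levenshtein(a, b):
    """Edit distance between two strings, keeping only two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity_percent(a, b):
    """Rough similarity: 100 for identical strings, 0 for nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)
```

The running time grows with len(a) * len(b), so this is only practical on short strings. Comparing two raw ~200k pages this way would take on the order of tens of billions of steps.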
I've found out that the tag order in web pages can create a very distinctive pattern that remains the same even if portions of text / CSS / script change. So I've made a string generated from the tag order (ex: html head meta title body div table tr td span bold... => "hhmtbdtttsb...") and then I just do exact matches between these strings. I can even apply the Levenshtein distance algorithm and get accurate results.
If I didn't have html, I would have used the punctuation/end-lines for splitting, or something similar.
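The tag-order idea above could be sketched like this in Python. It is only an illustration under my own assumptions: the fingerprint is the first letter of every opening tag, the names TagOrderCollector, tag_fingerprint and fingerprint_similarity are hypothetical, and the fuzzy comparison uses difflib.SequenceMatcher from the standard library rather than Levenshtein distance.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagOrderCollector(HTMLParser):
    """Records the first letter of every opening tag, in document order."""

    def __init__(self):
        super().__init__()
        self.initials = []

    def handle_starttag(self, tag, attrs):
        self.initials.append(tag[0])


def tag_fingerprint(page):
    """Build a short string such as 'hhtbdp' from the order of the tags."""
    collector = TagOrderCollector()
    collector.feed(page)
    collector.close()
    return "".join(collector.initials)


def fingerprint_similarity(fp1, fp2):
    """Percentage similarity of two fingerprints (SequenceMatcher ratio)."""
    return 100.0 * SequenceMatcher(None, fp1, fp2).ratio()
```

Since the fingerprints are plain strings, they can be stored in an indexed column and looked up with exact matches first, leaving the fuzzy comparison for the candidates that remain.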

Good way to store formatted text in DB to output later

I write news for my website and format it like this:
[h1]News[/h1]
[red]Happy New Year[/red]
[white]Happy New Year[/white]
The news is stored as-is in the MySQL DB.
Then, when it's requested by my website, a function converts every code into HTML.
[h1][/h1] = <h1></h1>
[red][/red] = <font color=red></font>
I haven't been happy with this method for a long time, and now such tags are obsolete in HTML5.
Instead of using <font> I should move the styling to CSS.
I'm very much a beginner with PHP, MySQL, CSS, HTML... really, but I'm trying and learning.
So, what I need is the best solution for this matter.
I was thinking to create a CSS rule like:
span.news-red { color: red; }
span.news-white { color: white; }
And then put <span class="news-red"> into the converted output for red text, etc...
Is this an effective solution or just a palliative?
Thank you.
EDIT
I have these two functions to convert the format of my text so that it can be output for the visitor.
1st = Converts [white-text][/white-text] into <font color=white></font>
$string = preg_replace("/\[white-text\](\S+?)\[\/white-text\]/si","<font color=white>\\1</font>", $string);
2nd = Converts [url][/url] into a link
$string = preg_replace("/\[url\](\S+?)\[\/url\]/si","\\1", $string);
Problems:
WHITE-TEXT - It only changes the color of one word phrases.
URL - It works fine, but I would like to be able to write anything in the readable part of the URL.
In general, you want a small set of common text styles, each named for why it is used rather than for how it looks. If I were you, I would store names in the DB that describe what the text is. Then, say you decide that red is just a horrible choice of color: you can change it very easily, just by editing the CSS.
Not knowing why you chose to make something red, I can't give you much more of an answer than this: use a CSS class name that relates to why you chose red, rather than to the color itself.
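As a rough illustration of that advice, here is a minimal sketch of the conversion step written in Python (the question uses PHP, but the regular expressions carry over). The class names news-alert and news-muted and the function render_news are hypothetical. Note that the capture group is (.+?) rather than (\S+?): \S cannot match spaces, which is why the original pattern only worked on one-word phrases.

```python
import html
import re

# Hypothetical mapping from the stored codes to class names that say
# why the text is styled, not which color it currently has.
TAG_CLASSES = {
    "red": "news-alert",
    "white": "news-muted",
}


def render_news(text):
    """Convert the stored [tag]...[/tag] codes into HTML with CSS classes."""
    # Escape any raw HTML in the stored text before adding our own tags.
    out = html.escape(text, quote=False)
    flags = re.IGNORECASE | re.DOTALL
    out = re.sub(r"\[h1\](.+?)\[/h1\]", r"<h1>\1</h1>", out, flags=flags)
    for tag, css_class in TAG_CLASSES.items():
        pattern = r"\[%s\](.+?)\[/%s\]" % (re.escape(tag), re.escape(tag))
        replacement = r'<span class="%s">\1</span>' % css_class
        out = re.sub(pattern, replacement, out, flags=flags)
    return out
```

With this shape, changing the color later only means editing the rule for the class in the stylesheet; the stored news and the converter stay untouched.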

Regex to extract text from inside an HTML tag

I know this has been asked at least a thousand times but I can't find a proper regex that will match a name in this string here:
<td><div id="topbarUserName">Donald</div></td>
I want to get the name 'Donald' and the regex that's the closest is >[a-zA-Z0-9]+ but the result is >Donald.
I'm coding in PureBasic (its syntax is similar to that of Basic) and it uses the PCRE library for regular expressions.
Can anyone help?
Josh's pattern will work if you only make use of the numbered group, not the whole match. If you have to use the whole match, use something like (?<=>)(\w+?)(?=<)
Either way, regex is widely known to not be good for parsing HTML.
Explanation:
(?<=) is a lookbehind: it checks that something appears before the current item, without including it in the match.
\w+? will match any "word" character, one or more times, but stop as soon as the rest of the pattern matches. In this situation the ? could have been left out.
(?=) is a lookahead: it checks that something appears after the current item, without including it in the match.
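To see the difference between the whole match and the numbered group, here is a small sketch using Python's re module. It is not PCRE, but it handles these particular lookaround patterns the same way; the variable names are mine.

```python
import re

html_snippet = '<td><div id="topbarUserName">Donald</div></td>'

# Lookbehind/lookahead: the surrounding > and < are checked but not consumed,
# so the whole match is just the name.
whole = re.search(r"(?<=>)\w+(?=<)", html_snippet)

# Plain capturing group: the whole match includes > and <,
# and the name is available as group 1.
grouped = re.search(r">(\w+)<", html_snippet)

if __name__ == "__main__":
    print(whole.group(0))
    print(grouped.group(0), grouped.group(1))
```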
Try this
It should capture anything that is a letter / number
>([\w]+)<
Also I'm not exactly sure what your project limitations are, but it would be much easier to do something like this
$('#topbarUserName').text();
in jQuery instead of using a regex.
>([a-zA-Z]+) should do the trick. Remember to get the grouping right.
Why not do it with plain old basic string functions?
a.w = FindString(HTMLstring.s, "topbarUserName")
If a > 0
  a = a + 16 ; skip "topbarUserName" (14 characters) plus the closing "> (2 characters)
  b.w = FindString(HTMLstring, "<", a)
  If b > 0
    c.w = b - a
    Donald.s = Mid(HTMLstring, a, c)
  EndIf
EndIf
Debug Donald

How can I hide/remove/disable "forums views" in vbulletin?

Does anyone have an idea how to do this?
I need to get rid of forum views, either by hiding, deleting or disabling them, or in any other way.
I assume you mean THREAD views in the text below:
Do a template search for $thread[views], and there should be a template called threadbit. If you want to quickly and easily obscure the views, just delete $thread[views] and replace it with asterisks, or whatever you'd like.
If you want to remove the whole <td> it becomes more complicated. First you remove that <td>, and then in FORUMDISPLAY template you have to remove the <td> that contains $vbphrase[views] (do a search for it if you can't find it).
But I believe there may be some issue with removing that entire column, and any of the hardcoded colspan attributes among the templates. If so then you would have to reduce the colspan number by one. I'm not sure about the colspan part, it's been a long time since I edited the FORUMDISPLAY and threadbit templates.
Also, you will need to remove the Views from another location in the threadbit template:
title="<phrase 1="$thread[replycount]" 2="$thread[views]"
This shows up when you hover on top of the Last Post column. Just delete $thread[views] and it will show up blank.
I need 50 points to reply, so sorry for continuing to use answers.
I was thinking of going 1 step further and swapping the word hidden for a picture?
I used the word hidden just as a test to see if it would work, which it does.