Error Creating "Large" mediawiki Post - mediawiki

I recently installed the latest version of mediawiki, and it's more or less running fine. However, whenever I try to post what I might consider a "large" entry, I get an error that says I cannot write to index.php, and so the post fails. I have looked through a lot of the documentation, including the variables settings, and cannot seem to nail down the issue or solution. Is it possible that some of the characters in the post are preventing the post? Or, is there a limit to the amount of text content (characters or total size)? Any help would be greatly appreciated!
Mark

For starters, check that $wgMaxArticleSize is greater than the size of what you are trying to post. Even in this case, though, you should get an error message, not an outright failure. The content of the post is unlikely to cause problems, since MediaWiki is UTF-8 safe.
Run through the checklist here as well: http://www.mediawiki.org/wiki/Manual:Errors_and_symptoms
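To rule out the size limit before digging further, you can measure the text yourself. This is only a rough sketch: $wgMaxArticleSize is given in kilobytes, the helper name and the 2048 figure below are placeholders of mine, and you should substitute whatever value your own LocalSettings.php uses.

```python
def exceeds_max_article_size(text: str, max_article_size_kb: int) -> bool:
    """Return True if the UTF-8 encoded text is larger than the limit in kilobytes."""
    size_in_bytes = len(text.encode("utf-8"))
    return size_in_bytes > max_article_size_kb * 1024


# Placeholder limit; use the $wgMaxArticleSize value from your own configuration.
LIMIT_KB = 2048
sample = "x" * (3 * 1024 * 1024)  # 3 MiB of ASCII text
print(exceeds_max_article_size(sample, LIMIT_KB))
```

If the text is well under the limit, the cause is more likely one of the items on the checklist above.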

Have you tried writing the text in a text editor and then pasting it into mediawiki in smaller chunks, saving the page then pasting another piece? As long as you don't want to do this too often this could be significantly easier than trying to solve the problem.

Related

Simple Regex Question - I've sat here for close to 4 hours trying to figure this out, and for most of you, it's simple, but not for me

I know what this site is for, and I've read people griping at others for asking a question like this, but I truly need help, and I've been working on this for a very long time.
I have a huge sheet of passwords and usernames in a .csv file, which I'm trying to format so it can be read by my Google profile (through Google Password Manager). I've tried all different kinds of ways, but Google is picky, and I've had a hell of a time just getting it to register anything except for "failed." I've searched through so many damn forum posts on this site and others. I know I'm almost there, as I finally got an error message, but I just want this to be done, so I finally figured I'd ask all you smart people for help. I've put forth the effort, and I just can't figure it out, just as I can't figure out how to convert any of the examples I find on this site to what I am needing. I've tried, many, many times.
I've used a regex tester to crawl and scrape my way to this:
/https:\/\/w{3}\.(...+.*\.com|net),?/g
I'm working on this with Notepad++, so it may be the wrong format. I've been using regex for a while without really understanding it, though figuring out the pattern above has helped my understanding a lot.
Below is an edited version of how my document looks, which illustrates most of the issues I am dealing with. I imported it into an excel program, to get all the columns correct and to remove excess (which is how I finally got Google to register an error message).
In Notepad++, I fixed all of the "https://" prefixes and all of the variations I had. The hardest thing has been figuring out how to get the "www." onto the remaining lines. My code above seems close, but when the URL and the email address are on the same line, separated only by a comma, it combines them into one group, and for the life of me I cannot figure out how to fix it. So, please help.
The top line is how it's organized, except it's in the order of URL, username, and password.
username,password,url,
https://www.nvidia.com,myemail#mail.com,password#1,
https://www.google.com,myemail#mail.com,password#2,
https://www.firefox.com,username,password#3,
https://na.alienwarearena.com,myemail#mail.com,password#4,
https://www.pinterest.cl,myemail#mail.com,password#5,
https://www.cplusplus.com,Username,password#6,
https://myaccount.google.com,myemail#gmail.com,password#7,
https://twitter.com,username,password#8,
Please, just a simple example of what I'm doing incorrectly, and if you feel like it, a simple explanation, as this has given me a massive headache. But any help is very much appreciated.
I might be oversimplifying this, but it sounds like you just want to reorder the columns in the file. For that, Excel would be the better option (you can just cut and paste the columns around).
But if you want to do this via regex, here's the simplest regex I can think of:
^(http[^,]+),([^,]+),([^,]+),?
I'm making the assumption that a , will never appear in your data. If it does, then the regex will get much more complex.
^ - Anchors the match to the start of the line
(http[^,]+), - Matches the URL. Requiring it to start with http makes the pattern skip the header line. Ends with a comma
([^,]+), - Grabs whatever is between the previous , and the next one (which should be the username)
([^,]+),? - Grabs the last part, which is the password, plus the trailing comma if there is one
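The [^,]+ pieces are also what the pattern in the question was missing. A plain . matches commas too, so a greedy .* keeps going until the last .com on the line, and the alternation there reads as "...+.*\.com" or a bare "net". Here is a small Python sketch of the difference; I dropped the /.../g delimiters and the escaped slashes, and the second pattern is only my illustration, not something from the question:

```python
import re

line = "https://www.nvidia.com,myemail#mail.com,password#1,"

# Pattern from the question: .* is greedy and is allowed to cross commas.
greedy = re.search(r"https://w{3}\.(...+.*\.com|net),?", line)

# Same idea, but the captured part may only contain non-comma characters.
bounded = re.search(r"https://w{3}\.([^,]+),?", line)

print(greedy.group(1))   # nvidia.com,myemail#mail.com
print(bounded.group(1))  # nvidia.com
```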
Each one of the () is called a group. When you do a replacement, you can reference these groups. They're numbered left to right starting at 1.
This allows you to do a find/replace in notepad++. The replacement line will look something like this:
$2,$3,$1
I'm using Sublime and not Notepad++, so the replacement might need to be something like \2,\3,\1 instead. I can't quite remember how Notepad++ handles the replacement.
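If the editor keeps fighting you, the same find/replace can be scripted. Below is a minimal Python sketch of it. Two things are my own assumptions: the function name, and the \n I added inside the character classes so that a field can never run across a line break when a row has no trailing comma.

```python
import re

# The pattern from above; re.MULTILINE makes ^ match at the start of every line.
ROW = re.compile(r"^(http[^,\n]+),([^,\n]+),([^,\n]+),?", re.MULTILINE)


def reorder_columns(csv_text: str) -> str:
    """Move the URL (group 1) behind the username (group 2) and password (group 3)."""
    return ROW.sub(r"\2,\3,\1", csv_text)


sample = (
    "username,password,url,\n"
    "https://www.nvidia.com,myemail#mail.com,password#1,\n"
    "https://twitter.com,username,password#8,\n"
)
print(reorder_columns(sample))
```

The header line is left alone because it does not start with http, and the trailing comma on each data row is consumed by the optional ,? at the end.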
Best of luck.

What is the general consensus on user-error correction for web apps?

I'm building a RoR site, and today I got the pagination done. When I showed it to my coworker, his first question was "what happens if you set the querystring to ?page=-1". It died with a runtime exception (error 500). He suggested that this should definitely be fixed before the site goes anywhere near live.
I happen to disagree with him (hear me out). Now, I've been in the web dev business for all of four months, so I very well could be wrong. But I would think that this isn't a big deal. I would think that, so long as said errors do not constitute a security risk, things like this shouldn't be a priority. The only way to cause this error is if you manually edit the query string, and, well, garbage in garbage out. If you're smart enough to know that you even can edit the querystring, you should be smart enough to not give it a negative number.
What is the general consensus on things like this? Do you completely idiot proof the site, so that no matter what the query string is, you never generate an error? Do you let things slide so long as it works the way it's supposed to (and doesn't expose a security risk)? Somewhere in the middle?
EDIT: Somehow my question didn't really come out as I intended it. The crux of my question was where to draw the line between proactively correcting for things versus not doing so. If there's invalid input in the get string, for instance, would it be better practice to display a tasteful error as suggested in the posted replies, or to try to figure out what the user was doing, and do that? Or, as a more concrete example: if a user sets page=-1 in the get string, would it be better to silently assume they meant page=0, or to display some kind of tasteful error page saying something like "invalid page specified"?
You should be error checking anything that comes in from the query string. If you get an invalid page number, you should have an error message that's a little more graceful than the Error 500 page. Maybe "Sorry, bad request. Try this: <possible suggestions>". It's just plain sloppy and unprofessional to knowingly and deliberately leave an easily accessible error like that on a live site.
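The check itself is small. Here is a framework-neutral sketch in Python rather than Rails; the BadRequest class, the parse_page name, and the choice of 0 as the first page are all assumptions for illustration. The idea is that the request handler catches BadRequest and renders a friendly 400 page instead of letting a 500 escape.

```python
from typing import Optional


class BadRequest(Exception):
    """Raised when a query-string value cannot be used; meant to map to HTTP 400."""


def parse_page(raw: Optional[str], first_page: int = 0) -> int:
    """Turn the raw ?page= value into a usable page number or raise BadRequest."""
    if raw is None or raw == "":
        return first_page  # parameter absent: show the first page
    try:
        page = int(raw)
    except ValueError:
        raise BadRequest(f"Invalid page specified: {raw!r}") from None
    if page < first_page:
        raise BadRequest(f"Invalid page specified: {raw!r}")
    return page
```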
You say you're new to web apps, but if your previous dev experience was other GUI apps being used by the "general public" (non-developers, non-techies), would it have been OK to have stack traces thrown into the user's face as the app falls apart around them? In my experience, this is never really acceptable.
You make some good points, but an incorrect query string can have many reasons. For example, a link to a record that has since been deleted. Or a Google result pointing to a page that doesn't exist in the current result set any more.
In these cases, you should show the user something a bit more verbose than a 500 error.
If you have an error-page that looks nice, and gives a polite message, I'd say it's fine. Though I might consider responding with a 404 instead. Garbage in should preferably not produce an error.
I don't think a 500 error page is very meaningful to your average user. At least tell him something is wrong with your page and guide him back on the right track by providing a link to get back to your site.
Sometimes I redirect users to the page that is most likely what they wanted. So when the page number goes below zero and this is not permitted, redirect your user to ?page=0 and maybe display a message at the top of that page. I prefer this method because it gives a better user experience than interrupting the user with a modal window.
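A rough sketch of that redirect-with-a-notice idea, again in Python and with invented names (normalize_page, and 0 as the first page). It returns the page to show plus an optional message for the top of the page, and leaves the actual redirect to whatever framework you use.

```python
from typing import Optional, Tuple


def normalize_page(raw: Optional[str], first_page: int = 0) -> Tuple[int, Optional[str]]:
    """Return (page, notice). The notice is None when the input was usable as-is."""
    if raw is None or raw == "":
        return first_page, None
    try:
        page = int(raw)
    except ValueError:
        return first_page, f"Could not understand page {raw!r}; showing the first page instead."
    if page < first_page:
        return first_page, f"Page {page} does not exist; showing the first page instead."
    return page, None
```

Whether you prefer this over an error page is the judgment call the question asks about; the code cost is about the same either way.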
I agree with you that error messages are necessary and useful, but you should try to differentiate, e.g. give a 404 where the user requested a page that doesn't exist.
It varies from project to project. How many users do you expect? If it's below 10K visitors a day it might not be so bad. What percentage of users do you expect will hit the problem? I don't expect that very many will, but you would know best.
The goal should be to ship the product and roll out improvements regularly. Hopefully the product is sound overall.
Regarding a solution, if it's a page not found, a 4xx error should be returned instead of a 5xx. 5xx errors typically warrant a deeper look, and while it's hard to write an air-tight application directly on launch, you should try to have a generic handler for 4xx and 5xx errors.
In the PCI game (Credit Card Verification / Validation) the rule is validate everything and allow for no idiots. So the answer depends on your application.

html tags in mysql value fields, is that right?

I was looking at status.net source code and mysql tables, and they seem to have html tags in their mysql field values. I was just wondering: is that the right thing to do, or is it going to cause some problems in the future?
It depends on where it will be used. It isn't an issue if the intention is to have arbitrary html there. Especially not if the developers and admins are the only ones who can put it in there.
On the other hand, if for example a user of your system managed to put it there and also used the opportunity to put in a script-tag and a reference to their own scripts you might very well be in big trouble (if you don't escape the strings before you render them on your site).
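To make the escaping step concrete, here is a minimal Python sketch using the standard library's html.escape. The render_comment function and the example.invalid URL are made up for illustration; the point is only that a stored value is escaped at render time when it is supposed to be plain text.

```python
import html


def render_comment(stored_value: str) -> str:
    """Escape a stored value before embedding it in a page as plain text."""
    return f"<p>{html.escape(stored_value)}</p>"


malicious = '<script src="https://example.invalid/x.js"></script>'
print(render_comment(malicious))
```

The script tag comes out as inert text (&lt;script ...&gt;) instead of being executed by the browser. Content that is meant to be HTML cannot simply be escaped like this and needs sanitizing instead.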
I would like to take the opportunity to quote the favorite sentence of my old IT teacher:
Oh, it depends.
Without knowing where and why the tags are stored in a DB, it's hard to say if this is a good idea...
A database can be used for storing just like the filesystem. So in most cases it's not a problem if you store HTML.
Let's take the articles of a WordPress blog as an example. It's definitely OK to store them in the database.
Short answer: Depends
Long answer: This practice is quite common and often unavoidable.
Think about blog posts: the HTML that marks up the content cannot be separated from the content itself.
Possible issues:
Javascript injection. If I can inject malicious HTML code into your database, I could create links to malware or javascript commands that help install viruses or trojans.
There's always a trade-off.

Markdown or HTML

I have a requirement for users to create, modify and delete their own articles. I plan on using the WMD editor that SO uses to create the articles.
From what I can gather SO stores the markdown and the HTML. Why does it do this - what is the benefit?
I can't decide whether to store the markdown, HTML or both. If I store both which one do I retrieve and convert to display to the user.
UPDATE:
OK, I think from the answers so far, I should be storing both the markdown and HTML. That seems cool. I have also been reading a blog post from Jeff regarding XSS exploits. Because the WMD editor allows you to input any HTML, this could cause me some headaches.
The blog post in question is here. I am guessing that I will have to follow the same approach as SO - and sanitize the input on the server side.
Is the sanitize code that SO uses available as Open Source or will I have to start this from scratch?
Any help would be much appreciated.
Thanks
Storing both is extremely helpful in terms of performance and compatibility (and eventually also social control).
If you store only Markdown (or whatever non-HTML markup), then you pay a performance cost for parsing it into HTML every time it is displayed. This is not always cheap.
If you store only HTML, then you risk bugs silently creeping into the generated HTML, which leads to a lot of maintenance and bug-fixing headaches. You also lose social control because you no longer know what the user actually typed in. As an admin you would, for example, also like to know which users are trying to do XSS using <script> and so on. Also, the end user won't be able to edit the data in Markdown format; you'd need to convert it back from HTML.
To update the HTML whenever the Markdown version changes, you just add one extra field recording the Markdown version that was used to generate the HTML output. If the server-side version differs from it at the moment you retrieve the row, re-parse the data using the new version and update the row in the DB. This is only a one-time extra cost.
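The version-field idea could look roughly like this in Python. Everything here is a sketch under assumptions: the Article fields and the RENDERER_VERSION constant are invented names, and render_markdown is a stand-in that only escapes the text, where a real application would call its Markdown library and sanitizer.

```python
import html
from dataclasses import dataclass

RENDERER_VERSION = 2  # bump whenever the Markdown-to-HTML conversion changes


def render_markdown(source: str) -> str:
    """Stand-in for a real Markdown library: escapes the text and wraps it in <p>."""
    return f"<p>{html.escape(source)}</p>"


@dataclass
class Article:
    markdown: str           # what the user typed; shown again in the edit view
    html: str = ""          # cached output; served in the display view
    rendered_with: int = 0  # renderer version that produced `html`


def html_for_display(article: Article) -> str:
    """Return the cached HTML, re-rendering once if an older renderer produced it."""
    if article.rendered_with != RENDERER_VERSION:
        article.html = render_markdown(article.markdown)
        article.rendered_with = RENDERER_VERSION
        # A real application would also write the updated row back to the DB here.
    return article.html
```

After the first retrieval under a new version, later reads hit the cached HTML and skip the parsing step.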
By storing both you only have to process the markdown once (when it is posted). You would then retrieve the HTML so that you can load your pages faster.
If you only stored one, you'd forever have to recreate the other for either the display view or the edit view.

How much information in error messages to regular users?

I want to get an idea of how I should handle end-user visible error messages in my web application.
How much information do you give in error messages?
Do you redirect all errors, regardless of type, to a common error page, or do you have a small set of pages (404, 403, all others)?
Do you give error codes that the user could reference/give to you that only you understand?
Do you give any technical details?
As I stated, my users are non-technical regular Joe folks.
Display a nice error to the user; log a detailed error for yourself.
I try to do the following:
1. Make sure you never run the risk of passwords or connection strings appearing in error messages.
2. Make sure the errors get logged to a persistable medium. I prefer a database so that I can query by time range and other parameters. I don't log 404s.
3. If the application is an internal app that does not need to be pretty, it may be OK to have the error info on the page. Even if you are logging this stuff, it is nice to be able to have your users email you a screenshot or copy/paste.
4. If 3 seems distasteful, have some error info written as HTML comments. Then you can at least see the info by viewing source.
In general I try to give users as much information needed to help them solve their problems themselves. For example, in the case of a 404, you might want to let them know to double check that the URL they are looking for is correct.
They obviously won't need stack traces and the like, but it will make sense for you to log that level of detail somewhere for diagnostics and debugging.
For fatal errors, keep them short, so users can repeat them over the phone or by e-mail: can't connect to database, etc.
For non-fatal errors, describe the condition fully: Error, cannot save the invoice without an invoice date.
I also always log everything, the parameters to the function and any internal values that may be of use.
I try to show users enough information that they know it's an issue they need to tell someone about, but try to avoid showing them so much it scares them!
If possible the error message should tell them what just failed e.g did their save just fail, or has it saved fine, but the refresh of the screen afterwards had an issue. Extra error information (e.g. stack traces) should be logged somewhere where you can get at it without the user having to send it to you.
When it comes to displaying errors for the end user, I find it a good practice to display an error code (so the administrators and I know what error it is) and a typical "oops, something went wrong, please contact an administrator".
It can be good to give a bit more information for common errors that could be caused by the user's actions. But usually too much information can scare or confuse the user.
None, just give a reference number so the user can give it to you, and you can check the details in the application logs (obviously you need to keep a copy of the error logs).
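A small sketch of the reference-number approach, using Python's standard logging and uuid modules. The logger name, the function name, and the eight-character reference format are arbitrary choices of mine; where the log records end up (file, database) depends on how you configure logging.

```python
import logging
import uuid

logger = logging.getLogger("app.errors")


def handle_unexpected_error(exc: Exception) -> str:
    """Log full details under a short reference and return a user-safe message."""
    reference = uuid.uuid4().hex[:8].upper()
    # exc_info attaches the exception type, message, and traceback to the log record.
    logger.error("Unhandled error [ref %s]", reference, exc_info=exc)
    return (
        "Something went wrong. Please contact an administrator "
        f"and quote reference {reference}."
    )
```

The user only ever sees the reference, and you search the logs for it to find the stack trace.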
Your web application's error messages should always (at least) answer these 3 questions (in that order):
What happened?
Why did it happen?
What can be done about it?
I have used this approach for many years; it is originally from Apple's "Human Interface Guidelines: The Apple Desktop Interface". Newer version.
Microsoft has similar guidelines.
This structured approach also makes the messages faster to write, as one can just answer the questions.
The error messages should also be specific. Any information that the web application knows about and that the user may need to resolve the problem should be in the error message. The (infamous) error message "An error happened." is simply not acceptable.
Optional: more technical information that the user may not understand can be placed at the end. But it should be marked as such.
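One way to keep yourself honest about the three questions is to make them the fields of the message. This is just an illustrative Python sketch: the UserError class, its field names, and the invoice wording are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class UserError:
    what_happened: str
    why: str
    what_to_do: str
    technical_details: str = ""  # optional, shown last and labelled as technical

    def render(self) -> str:
        """Build the message in the fixed order: what, why, what can be done."""
        lines = [self.what_happened, self.why, self.what_to_do]
        if self.technical_details:
            lines.append(f"Technical details: {self.technical_details}")
        return "\n".join(lines)


error = UserError(
    what_happened="The invoice could not be saved.",
    why="It has no invoice date.",
    what_to_do="Enter an invoice date and save again.",
)
print(error.render())
```

Because all three answers are required fields, a bare "An error happened." cannot be constructed by accident.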