Filenames with forward slash - box-api

I was wondering how you handle filenames with forward slashes in them. Currently (and I have no control over this, or any way to change it) our users can save files with slashes in their names. When we test this, the file on your system gets renamed to whatever comes after the last slash.
For example, a file named Test10/09/2012.ppt would be renamed 2012.ppt.
What I would like to know is how you handle incoming filename strings, and how we can encode them so that you accept a filename with slashes.

The forced renaming is not intended behavior; it's a bug that we're currently working on fixing. Box has a set of characters that are forbidden (\, /, ", :, <, >, |, *, ?, .), and we should instead be returning an error when you send such a character in the name of a file through the API.
We should have this fixed soon.
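Until the API rejects such names with an error, one client-side workaround is to replace the forbidden characters before uploading. The following is a minimal Python sketch, not Box's own logic: the helper name sanitize_box_filename and the dash used as the replacement are assumptions, and the dot is deliberately left alone so that the file extension survives, even though it appears in the forbidden list above.

```python
import re

# Characters from the forbidden list above, minus the dot (assumption:
# the dot is kept so that extensions such as ".ppt" survive).
FORBIDDEN_CHARS = re.compile(r'[\\/":<>|*?]')

def sanitize_box_filename(name: str, replacement: str = "-") -> str:
    """Replace each forbidden character with `replacement` (hypothetical helper)."""
    return FORBIDDEN_CHARS.sub(replacement, name)

print(sanitize_box_filename("Test10/09/2012.ppt"))  # Test10-09-2012.ppt
```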

Related

TCL: file normalize gives unexpected output

I have the following line of code:
file normalize [string map {\\ /} $file]
The string map operation is there to make the line work for paths containing backslashes instead of forward slashes (as is the case on Windows).
For some values of $file (let's say it's "/path/to/my/file") I get output similar to:
/path/to/"/path/to/my/file/"
This doesn't happen for all paths but I'm unable to figure out what causes it. There are no symbolic links in the path.
Is there something I'm doing wrong, or is there an alternative to file normalize that I could try?
My Tcl version is 8.5.
UPDATE:
On further investigation I see that the string map is not making any difference. The output of file normalize itself is coming with that extra text before the desired text. Also, the extra text seems to be from a previous run of the code.
UPDATE 2: It was because of the quotation marks in the input to file normalize
Most likely the path has backslashes where it shouldn't have them.
% file normalize {"/path/to/some/file"}
/path/to/"/path/to/some/file"
% file normalize \"/path/to/some/file\"
/path/to/"/path/to/some/file"
Perhaps some pathname handling code escaped special characters for some reason and left the path in a mangled state.
I would try to keep the pathname pristine and when it needs to be changed for display or other processing, make a copy of it first.
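The same effect can be reproduced outside Tcl, which may make the cause easier to see: a path that begins with a quotation mark does not begin with a slash, so it is treated as relative and resolved against the current directory. A minimal Python sketch, assuming a current directory of /path/to (the helper name normalize and that directory are illustrative, not taken from the question):

```python
import posixpath

def normalize(path: str, cwd: str = "/path/to") -> str:
    """Resolve `path` against `cwd`, as a normalizer does for relative paths."""
    return posixpath.normpath(posixpath.join(cwd, path))

quoted = '"/path/to/my/file"'        # the quotes are part of the string
print(posixpath.isabs(quoted))        # False: the first character is a quote
print(normalize(quoted))              # /path/to/"/path/to/my/file"
print(normalize(quoted.strip('"')))   # /path/to/my/file
```

Stripping the stray quotation marks before normalizing gives the expected absolute path.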

Why are uri chars (or at least spaces) being dropped on an html file upload?

I have a file upload form and would like to use the filename on the server, however I notice that when I upload it the spaces are dropped. On the client/browser I can do something like this in an event called after the input type='file' element has changed:
function process_svg (e) {
    var files = e.target.files || e.originalEvent.dataTransfer.files;
    // File objects expose the file name as `name`, not `filename`
    console.log(files[0].name);
}
If I upload a file named 'some file - type.ext', then 'some file - type.ext' is printed in the console. On the server (running bottle), however, if I run:
from bottle import route, request

@route('/some_route')
def some_route():
    print(request.files['form_name_attr'].filename)
I get 'somefile-type.ext'. I am guessing this has to do with URI escaping (or lack thereof), but since you cannot change a file before upload, how do you get around this and preserve the name? Strangely, I cannot find any mention of this on Google. In part I have had trouble thinking of appropriate search terms, but I'm also aware that this may not be native behaviour at all, but a bug elsewhere in my code.
I do not think that is the case, as I've issued these console.log and print statements at the end (right before the upload) and at the beginning (right when the server starts processing the request), and I do not believe I have any code that touches the name in between. If that is the case, however, please let me know, as I could be looking in the wrong direction.
You want raw_filename, not filename.
(Note that it may contain unsafe characters.)
from bottle import route, request

@route('/some_route', method='POST')
def some_route():
    print(request.files['form_name_attr'].filename)      # "cleaned" file name
    print(request.files['form_name_attr'].raw_filename)  # unmodified file name
Found this in the source code for FileUpload.filename:
Only ASCII letters, digits, dashes, underscores and dots are allowed
in the final filename. Accents are removed, if possible. Whitespace is
replaced by a single dash. Leading or tailing dots or dashes are
removed. The filename is limited to 255 characters.
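The rules quoted above can be approximated in a few lines. This is a sketch written from that docstring, not bottle's actual implementation, and clean_filename is a hypothetical name; it may be useful if you read raw_filename and still want a safe name to store on disk.

```python
import re
import unicodedata

def clean_filename(raw: str) -> str:
    """Approximate the documented cleaning rules (hypothetical helper)."""
    # Remove accents where possible: decompose, then drop non-ASCII code points.
    name = unicodedata.normalize("NFKD", raw)
    name = name.encode("ascii", "ignore").decode("ascii")
    # Keep only ASCII letters, digits, dashes, underscores, dots and whitespace.
    name = re.sub(r"[^A-Za-z0-9_.\s-]", "", name).strip()
    # Collapse each run of whitespace and dashes into a single dash.
    name = re.sub(r"[-\s]+", "-", name)
    # Drop leading/trailing dots and dashes, then cap the length at 255.
    return name.strip(".-")[:255]
```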

Find all JPG pathnames in HTML files and convert them into all lowercase

I have a very basic understanding of regexp. I have searched and searched the internet for this.....
I have a Linux server which only likes lowercase file names, and I stupidly have image filenames in title case!
I want to batch find all jpg pathnames in my HTML files and convert them to all lowercase with a regex.
My-File-Name1.jpg needs to be my-file-name1.jpg
I think I need a regex expression to find them all, and another that replaces them converted into lowercase.
Any help?
EDIT
@Sniffer gave me the regex that gets the filename path.
In Notepad++, use find and replace in regex mode:
([\w/-]+)\.jpe?g to find image pathnames,
\L\1 as the replacement to change the captured path to lowercase, or
\U\1 as the replacement to change it to uppercase.
I found the lower/uppercase regex here http://sourceforge.net/p/notepad-plus/discussion/331754/thread/ecb11904/
Usually I would say use an HTML parser, which is the best tool for the job here, but since you only want jpg files you might be able to find them all with the following:
([\w/-]+)\.jpe?g
As you can see, I have added the forward slash / and the dash - to the character class (the [\w/-] part). WARNING: an unescaped dash - should be the last (or first) character in the class, otherwise it is read as a range; keep that in mind if you add more special characters.
You will have to match this globally in your file.
As for the conversion, it can't be done by the regex itself. You will have to call an API that converts a string to lower case, and use it on the captured group $1.
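As an illustration of "match globally, then call a lowercase API on the match", here is a minimal Python sketch. It assumes the HTML is already loaded into a string; the function name lowercase_jpg_paths is hypothetical, and the pattern is the one above with the capture group dropped and case-insensitive matching added so that .JPG is caught too.

```python
import re

# Same character class as above; IGNORECASE is an addition so ".JPG" also matches.
JPG_PATH = re.compile(r"[\w/-]+\.jpe?g", re.IGNORECASE)

def lowercase_jpg_paths(html: str) -> str:
    """Return `html` with every matched jpg pathname converted to lowercase."""
    return JPG_PATH.sub(lambda match: match.group(0).lower(), html)

print(lowercase_jpg_paths('<img src="Images/My-File-Name1.jpg">'))
# <img src="images/my-file-name1.jpg">
```

Remember that the files on disk have to be renamed to lowercase as well, or the rewritten links will not resolve.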

Encoded URL does not work

I am confused about encoded URLs.
For example, when I type this into my browser:
stackoverflow.com/questions
I can successfully view the page.
However, when I write:
stackoverflow.com%2Fquestions
I am unable to view the page.
Since %2F means "/", I want to understand why this does not work.
The reason I want to find out is that I am receiving an encoded URL, and I don't know how to decode it right after I receive it so that I don't end up on an error page.
The / is one of the percent-encoding reserved characters. URLs use these reserved characters to define their syntax. They need to be encoded only when they are not used in their special role inside a URL.
Percent-encoding reserved characters:
!   *   '   (   )   ;   :   @   &   =   +   $   ,   /   ?   #   [   ]
%21 %2A %27 %28 %29 %3B %3A %40 %26 %3D %2B %24 %2C %2F %3F %23 %5B %5D
%2F is a URL escaped /. It means, treat / as a character, not a directory separator.
In essence, it is looking for a domain stackoverflow.com/questions, not the domain stackoverflow.com with the path questions.
%2F is what you write when you want to include a / in a parameter but don't want the browser to navigate to a different directory/route.
So if you had the file path 'root/subdirectory' passed as a querystring parameter, you would want to encode that like:
http://www.testurl.com/page.php?path=root%2Fsubdirectory
rather than
http://www.testurl.com/page.php?path=root/subdirectory
URL encoding is used, for example, to encode a string URL parameter coming from an HTML form that contains special characters such as '/'. Writing "stackoverflow.com%2Fquestions" is wrong: in this case the '/' is part of the URL itself and must not be encoded.
%2F is an escaped character entity - meaning it would be included in a name, rather than the character /, which denotes directory hierarchy, as specified in RFC 1630, page 8.
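To decode a URL you received in encoded form, or to encode only a parameter value, most languages ship a helper. A minimal Python sketch using the standard urllib.parse module (the variable names are illustrative; the URLs are the ones from the answers above):

```python
from urllib.parse import quote, unquote

# Decode a fully encoded URL before using it.
received = "stackoverflow.com%2Fquestions"
decoded = unquote(received)
print(decoded)  # stackoverflow.com/questions

# Encode only the parameter value, never the URL structure itself.
# safe="" makes quote() escape "/" as well, which it leaves alone by default.
value = quote("root/subdirectory", safe="")
url = "http://www.testurl.com/page.php?path=" + value
print(url)  # http://www.testurl.com/page.php?path=root%2Fsubdirectory
```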

What are the legal / allowed characters for web server file names?

What characters are allowed in filenames for HTML files on ALL servers (*nix, Windows, etc.) ?
I'm looking for the "lowest common denominator" that will work on all servers.
USE: I'm naming a file to be served up publicly (Mysite.com/My-Page.htm)
E.g., space? _ - , etc.
E.g., can I have File-Name.htm, File_Name.htm File Name.htm?
Obviously, this needs to work with all servers and browsers. (IIRC, the name is limited by the server not the browser, but I could be wrong).
What characters are allowed in filenames for HTML files on servers?
That totally depends on the server. HTTP itself allows any character at all, including control characters and non-ASCII characters, as long as they are suitably %-encoded when requested in a URL.
On a Unix server you cannot use ‘/’ or the zero byte. (If you could use them, they'd appear in the URL as ‘%2F’ and ‘%00’ respectively.) You also can't have the specific filenames ‘.’ or ‘..’, or the empty string.
On a Windows server you have all the limitations of a Unix server, plus you also can't use any of \/:*?"<>| or control characters 1-31 and you can't have leading or trailing dot or spaces, and you'll have difficulty using any of the legacy device filenames (CON, PRN, COM1 and many more).
This is nothing to do with HTTP; just how filenames work on Windows, which is complicated.
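The Unix and Windows restrictions listed above can be combined into a single check. This is a minimal Python sketch that follows the rules as stated in this answer; the function name is_portable_filename is hypothetical, and the device-name set (CON, PRN, AUX, NUL, COM1-9, LPT1-9) is an assumption covering the commonly cited names, not an exhaustive list.

```python
import re

# \ / : * ? " < > | plus the zero byte and control characters 1-31.
FORBIDDEN = re.compile(r'[\\/:*?"<>|\x00-\x1f]')

# Assumption: commonly cited legacy device names; the real list is longer.
DEVICE_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)

def is_portable_filename(name: str) -> bool:
    """True if `name` passes both the Unix and the Windows rules above."""
    if name in ("", ".", ".."):
        return False
    if FORBIDDEN.search(name):
        return False
    # No leading or trailing dot or space.
    if name[0] in ". " or name[-1] in ". ":
        return False
    # Device names are reserved with or without an extension.
    stem = name.split(".", 1)[0].upper()
    return stem not in DEVICE_NAMES
```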
can I have File-Name.htm, File_Name.htm File Name.htm?
Certainly. But in the last case you should link to it by URL-encoding the space:
<a href="File%20Name.htm">thingy</a>
Browsers will usually let you get away with leaving the space in, but it's not really valid. If you want to avoid having to think about URL-escaping, HTML-escaping and case-sensitive issues, stick to a–z, 0–9 and underscore.
If you don't want your filenames to be encoded by the server, you should avoid reserved characters: $&+,/:;=?@ and unsafe characters: space, quotation marks, <>#%{}|\^~[]`
But as the previous answers stated, the web servers should cope with whatever you want to use by encoding the chars.
Be sure to eliminate
* . " / \ [ ] : ; | = ,
which are never allowed. Because file naming conventions are inconsistent, standard practice is to use a-z, 0-9 and the underscore character. Most users want spaces, but if you can avoid them you sidestep parsing issues and improve reliability; you can read the RFCs on MIME (Multipurpose Internet Mail Extensions) to get a taste of what is involved.
No matter what you do, something somewhere is likely to make life difficult - so much so that I now use cryptographic methods to generate random a-z lowercase strings and use those as filenames, embedding the useful info in the file source code.
Avoid the ampersand at any cost, ...
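The random-name approach mentioned above could look roughly like this in Python, using the standard secrets module as the cryptographic source. The function name random_filename, the default length of 16 and the .html extension are assumptions for illustration, not the answerer's actual method.

```python
import secrets
import string

def random_filename(length: int = 16, extension: str = ".html") -> str:
    """Build a name from `length` random lowercase a-z letters plus `extension`."""
    stem = "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))
    return stem + extension

print(random_filename())  # a different 16-letter name on each call
```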
I would say a good rule of thumb for filenames for HTML files on ALL servers is any combination of letters (lowercase preferred) and digits (0 through 9), plus the underscore (_), minus (-) or plus (+) characters, but no spaces. Also, end the filename with dot html (e.g. filename.html). I personally avoid using the underscore and plus characters.
There isn't such a thing as an html filename.
Certain characters have to be encoded in html (eg if used in links) but the allowed characters in the document names will depend on the web server (and possibly the file system on the server).
Any file name will be URL-encoded so you should be fine. And for the record all three of your file names would work just fine.