I'm trying to create an s3 bucket through cloudformation. I tried using regex ^([0-9a-z.-]){3,63}$, but it also accepts the patterns "..." and "---" which are invalid according to new s3 naming conventions. (Ref: https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html) Please help?
Answer
The simplest and safest regex is:
(?!(^xn--|.+-s3alias$))^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$
It ensures that names work for all cases - including when you are using S3 Transfer Acceleration. Also, as it doesn't include any backslashes, it's easier to use in string contexts.
Alternative
If you need S3 bucket names that include dots (and you don't use S3 Transfer Acceleration), you can use this instead:
(?!(^((2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})\.){3}(2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})$|^xn--|.+-s3alias$))^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$
Explanation
The Amazon S3 bucket naming rules as of 2022-05-14 are:
Bucket names must be between 3 (min) and 63 (max) characters long.
Bucket names can consist only of lowercase letters, numbers, dots (.), and hyphens (-).
Bucket names must begin and end with a letter or number.
Bucket names must not be formatted as an IP address (for example, 192.168.5.4).
Bucket names must not start with the prefix xn--.
Bucket names must not end with the suffix -s3alias.
Buckets used with Amazon S3 Transfer Acceleration can't have dots (.) in their names.
This regex matches all the rules (including rule 7):
(?!(^xn--|.+-s3alias$))^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$
The first group (?!(^xn--|.+-s3alias$)) is a negative lookahead that ensures that the name doesn't start with xn-- or end with -s3alias (satisfying rules 5 and 6).
The rest of the expression ^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$ ensures that:
the name starts with a lowercase letter or number (^[a-z0-9]) and ends with a lowercase letter or number ([a-z0-9]$) (rule 3).
the rest of the name consists of 1 to 61 lowercase letters, numbers or hyphens ([a-z0-9-]{1,61}) (rule 2).
the entire expression matches names from 3 to 63 characters in length (rule 1).
Lastly, we don't need to worry about rule 4 (which forbids names that look like IP addresses) because rule 7 implicitly covers this by forbidding dots in names.
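As a rough illustration, here is a minimal Python sketch that applies the regex above with the standard re module. The helper name is_valid_bucket_name and the sample names are made up for this example; they are not part of any AWS API.

```python
import re

# The dot-free regex from this answer, copied verbatim.
BUCKET_NAME_RE = re.compile(
    r"(?!(^xn--|.+-s3alias$))^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$"
)


def is_valid_bucket_name(name: str) -> bool:
    """Hypothetical helper: True if `name` passes the regex above."""
    return BUCKET_NAME_RE.search(name) is not None


if __name__ == "__main__":
    # Sample names invented for this sketch.
    for candidate in ["my-bucket", "ab", "---", "xn--bucket", "data-s3alias", "my.bucket"]:
        print(candidate, is_valid_bucket_name(candidate))
```

Note that in Python $ also matches just before a trailing newline, so it is probably worth stripping the input before validating it.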
If you do not use Amazon S3 Transfer Acceleration and want to permit more complex bucket names, then you can use this more complicated regular expression:
(?!(^((2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})\.){3}(2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})$|^xn--|.+-s3alias$))^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$
The main change is the addition of the expression to match IPv4 addresses (while the spec simply says that bucket names must not be formatted as IP addresses, as IPv6 addresses contain colons, they are already forbidden by rule 2.)
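A small Python sketch of the dotted variant, assuming the standard re module. The helper name allows_dotted_name and the sample names are hypothetical; the point is only to show that dotted names pass while names that look like IPv4 addresses do not.

```python
import re

# The dotted regex from this answer, split across lines for readability only.
DOTTED_BUCKET_NAME_RE = re.compile(
    r"(?!(^((2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})\.){3}"
    r"(2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})$|^xn--|.+-s3alias$))"
    r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$"
)


def allows_dotted_name(name: str) -> bool:
    """Hypothetical helper: True if `name` passes the dotted regex."""
    return DOTTED_BUCKET_NAME_RE.search(name) is not None


if __name__ == "__main__":
    # Sample names invented for this sketch.
    for candidate in ["my.bucket.name", "my-bucket", "192.168.5.4", "xn--my.bucket"]:
        print(candidate, allows_dotted_name(candidate))
```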
The following regex fulfils the AWS specifications, provided you don't want to allow . in the bucket name (which is recommended, because otherwise Transfer Acceleration can't be enabled):
^((?!xn--)(?!.*-s3alias$)[a-z0-9][a-z0-9-]{1,61}[a-z0-9])$
This one is good because it can be incorporated into more complex checks simply by replacing ^ and $ with other strings, thus allowing for ARN checks and so on.
EDIT:
added -s3alias exclusion as per the comment by #ryanjdillon
I've adapted Zak's answer a little bit. I found it was a little too complicated and threw out valid domain names. Here's the new regex (available with tests on regex101.com**):
(?!^(\d{1,3}\.){3}\d{1,3}$)(^[a-z0-9]([a-z0-9-]*(\.[a-z0-9])?)*$)
The first part is the negative lookahead (?!^(\d{1,3}\.){3}\d{1,3}$), which rejects names that are formatted like an IP address. Basically, we try to match 1-3 digits followed by a period, 3 times ((\d{1,3}\.){3}), followed by 1-3 digits (\d{1,3}).
The second part says that the name must start with a lowercase letter or a number (^[a-z0-9]) followed by lowercase letters, numbers, or hyphens repeated 0 to many times ([a-z0-9-]*). If there is a period, it must be followed by a lowercase letter or number ((\.[a-z0-9])?). These last 2 patterns are repeated 0 to many times (([a-z0-9-]*(\.[a-z0-9])?)*).
The regex does not attempt to enforce the size restrictions set forth by AWS (3-63 characters). That can either be handled by another regex (^.{3,63}$) or by checking the size of the string.
** At that link, one of the tests I added is failing, but if you switch to the test area and type in the same pattern, it passes. It also works if you copy/paste it into the terminal, so I assume that's a bug on the regex101.com side.
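A minimal Python sketch of the second option, checking the size of the string separately and then applying the pattern. The helper name is_valid_bucket_name is made up for illustration.

```python
import re

# Pattern from this answer, copied verbatim (it does not check the length).
NAME_SHAPE_RE = re.compile(
    r"(?!^(\d{1,3}\.){3}\d{1,3}$)(^[a-z0-9]([a-z0-9-]*(\.[a-z0-9])?)*$)"
)


def is_valid_bucket_name(name: str) -> bool:
    """Hypothetical helper: size check first, then the shape check."""
    if not 3 <= len(name) <= 63:
        return False
    return NAME_SHAPE_RE.search(name) is not None


if __name__ == "__main__":
    # Sample names invented for this sketch.
    for candidate in ["my.bucket", "ab", "192.168.5.4", "abc."]:
        print(candidate, is_valid_bucket_name(candidate))
```

One caveat, as far as I can tell: the nested quantifiers in this pattern can backtrack heavily on longer names that fail near the end, so it is worth timing it on realistic input before relying on it.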
Regular expression for S3 Bucket Name:
String S3_REPORT_NAME_PATTERN = "[0-9A-Za-z!\\-_.*\'()]+";
String S3_PREFIX_PATTERN = "[0-9A-Za-z!\\-_.*\\'()/]*";
String S3_BUCKET_PATTERN = "(?=^.{3,63}$)(?!^(\\d+\\.)+\\d+$)(^(([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])\\.)*([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])$)";
I used #Zak's regex, but it isn't 100% correct. I use the following checks to cover all the rules for AWS bucket names. I validate step by step, so it looks like this:
Bucket names must be at least 3 and no more than 63 characters long -> ^.{3,63}$
Bucket names must not contain uppercase characters or underscores -> [A-Z_]
Bucket names must start with a lowercase letter or number -> ^[a-z0-9]
Bucket names must not be formatted as an IP address (for example, 192.168.5.4) -> ^(\d+\.)+\d+$. That is more restrictive than AWS.
Bucket names must be a series of one or more labels. Adjacent labels are separated by a single period (.) -> In python if ".." in bucket_name:
.. Each label must end with a lowercase letter or a number ->^(.*[a-z0-9]\.)*.*[a-z0-9]$
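The step-by-step checks above could be sketched in Python roughly like this. The function name bucket_name_problems and the wording of the messages are my own, and the patterns are taken as-is from the list, so any gaps in them carry over.

```python
import re


def bucket_name_problems(bucket_name: str) -> list:
    """Hypothetical helper: return a message for each rule the name breaks."""
    problems = []
    if not re.search(r"^.{3,63}$", bucket_name):
        problems.append("length must be 3-63 characters")
    if re.search(r"[A-Z_]", bucket_name):
        problems.append("uppercase characters or underscores")
    if not re.search(r"^[a-z0-9]", bucket_name):
        problems.append("must start with a lowercase letter or number")
    if re.search(r"^(\d+\.)+\d+$", bucket_name):
        problems.append("formatted like an IP address")
    if ".." in bucket_name:
        problems.append("adjacent periods")
    if not re.search(r"^(.*[a-z0-9]\.)*.*[a-z0-9]$", bucket_name):
        problems.append("must end with a lowercase letter or number")
    return problems


if __name__ == "__main__":
    # Sample names invented for this sketch.
    for candidate in ["my-bucket", "My_Bucket", "1.2.3.4", "a..b", "ab"]:
        print(candidate, bucket_name_problems(candidate))
```

Collecting the failures in a list, instead of returning on the first one, makes it easier to tell the user everything that is wrong with a name in one go.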
// One regex per rule, so each result can be logged separately.
var bucketRGEX = /(?=^.{3,63}$)/; // 3 to 63 characters long
// Not formatted like an IP address. Anchored with ^ so the lookahead is only
// tested at the start; unanchored, it would succeed at a later position.
var bucketRGEX1 = /^(?!(\d+\.)+\d+$)/;
// A series of labels separated by single periods.
var bucketRGEX2 = /(^(([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])\.)*([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])$)/;
var result = bucketRGEX.test(bucketName);
var result1 = bucketRGEX1.test(bucketName);
var result2 = bucketRGEX2.test(bucketName);
console.log('bucketName ' + bucketName + ' result ' + result);
console.log('bucketName ' + bucketName + ' result1 ' + result1);
console.log('bucketName ' + bucketName + ' result2 ' + result2);
if (result && result1 && result2) {
    // condition pass
} else {
    // not valid bucket name
}
AWS issued new guidelines where '.' is considered not recommended and bucket names starting with 'xn--' are now prohibited (https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html). If you disallow '.' the regex becomes much more readable:
(?=^.{3,63}$)(?!xn--)([a-z0-9](?:[a-z0-9-]*)[a-z0-9])$
I tried passing a wrong bucket name to the S3 API itself to see how it validates. Looks like the following are valid regex patterns as returned in the API response.
Bucket name must match the regex
^[a-zA-Z0-9.\-_]{1,255}$
or be an ARN matching the regex
^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$
Edit: Modified the regexp to allow required size (3-63) and add some other options.
The names must be DNS-compliant, so you could try with:
^[A-Za-z0-9][A-Za-z0-9\-]{1,61}[A-Za-z0-9]$
See: https://regexr.com/3psne
Use this if you need to use periods:
^[A-Za-z0-9][A-Za-z0-9\-.]{1,61}[A-Za-z0-9]$
See: https://regexr.com/3psnb
Finally, if you want to disallow two consecutive 'non-word' characters, you can use:
^[A-Za-z0-9](?!.*[.-]{2})[A-Za-z0-9\-.]{1,61}[A-Za-z0-9]$
See: https://regexr.com/3psn8
Based on: Regexp for subdomain
If you use Transfer Acceleration, you can simply remove the period option. This code accounts for all the rules, including:
no double dots
no dash/dot combo in a row
no dot/dash combo
It also lets you allow a trailing slash after the bucket name or not, depending on whether you want your users to add one.
(?!^|xn--)[a-z0-9]{1}[a-z0-9-.]{1,61}[a-z0-9]{1}(?<!-s3alias|\.\..{1,63}|.{1,63}\.\.|\.\-.{1,63}|.{1,63}\.\-|\-\..{1,63}|.{1,63}\-\.)(?=$|\/)
So my company needs to send our i10n JSON file to a translator who can translate it into other languages.
Our system also uses this file. Because of this, we are able to write some "funky" statements that our system understands but that our translators cannot make sense of when they extract the file.
For instance we have a case like this:
"CHOOSE": "{VALUE, select, 1{Vælg bruger} other{Fejl}}",
In the above example our system either takes Vælg bruger or Fejl
We also have something like this:
"HAS_MATERIAL": "Indeholder {{COUNT}} {{COUNT > 1 ? 'filer' : 'fil'}}",
Basically, the result of this would be Indeholder, the count, and then filer if the count is bigger than 1, otherwise fil.
The last case we have is something like this:
"YOU_HAVE_NOTIFICATION": "You have { LENGTH } {LENGTH, select, 1{new notification} other{new notifications}}",
Again, LENGTH is a temporary value that decides which translation to take.
So now it's my job to write a regex for this file so we can get a list of all the words that need to be translated, and I am rather lost. The 3 cases above each need a different approach to reach the wanted value.
I attempted something like this:
{(.*?)}
With a global flag
However, this doesn't work in all the cases.
Since there is some kind of "command language" (or two) involved, this will probably fail at some point, but it handles your given examples:
{\w+,\s*select,\s*\w+\s*{([^}]*)}\s\w+\s*{([^}]*)}|{[^?{}]+\?\s*'([^']*)'\s*:\s*'([^']*)'\s*}]*}|^([^{}]+)|([^{}]+)$
It treats each case one by one:
The SELECT statement
Inside braces, expect some expression followed by a ,, a command (in this case select), a ,, a case value, and here we grab the text inside braces. Then expect some other case value and again grab the text inside braces. I expect there will be situations where there are more than two cases, and then it fails (it can be expanded to handle more, though).
Then the ternary operator
Inside braces, expect some expression followed by a ?, then grab the text inside single quotes. Then expect a : and again grab the text inside single quotes.
At the start of a line
grab all text up to {.
And end of line
grab anything after last }.
I guess this is far from complete. E.g. it won't handle text between "selects", and feels very fragile, but it might help you get started.
Check it out here at regex101.
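For what it's worth, here is a small Python sketch that runs the pattern over the three example values and collects whatever the capture groups grabbed. The helper name extract_texts is made up, the literal braces are escaped to make their role explicit, and the pattern is applied to one JSON value at a time (not to the whole file).

```python
import re

# Same alternatives as the regex above, with literal braces escaped.
TRANSLATABLE_RE = re.compile(
    r"\{\w+,\s*select,\s*\w+\s*\{([^}]*)\}\s\w+\s*\{([^}]*)\}"  # select statement
    r"|\{[^?{}]+\?\s*'([^']*)'\s*:\s*'([^']*)'\s*\}\]*\}"       # ternary operator
    r"|^([^{}]+)"                                                # text at the start
    r"|([^{}]+)$"                                                # text at the end
)


def extract_texts(value: str) -> list:
    """Hypothetical helper: list the translatable snippets found in one value."""
    texts = []
    for match in TRANSLATABLE_RE.finditer(value):
        for group in match.groups():
            # Unused alternatives yield None; skip those and blank captures.
            if group and group.strip():
                texts.append(group.strip())
    return texts


if __name__ == "__main__":
    print(extract_texts("{VALUE, select, 1{Vælg bruger} other{Fejl}}"))
    print(extract_texts("Indeholder {{COUNT}} {{COUNT > 1 ? 'filer' : 'fil'}}"))
    print(extract_texts(
        "You have { LENGTH } {LENGTH, select, 1{new notification} other{new notifications}}"
    ))
```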
Consider a model Post which has a title, description and a locale_restrictions field.
The locale restrictions field specifies in which locales the post should be displayed. It contains a CSV value: en,de,be,nl.
What I would like to do is use either a default_scope or a named scope to only return the model instances for a specific locale. Something like (with a localized scope): Post.localized.all. This scope then looks at the current locale I18n.locale and returns the posts that have that locale in their locale_restrictions CSV.
I cannot seem to get this working, having tried quite a few options. The closest I came was with a SQL LIKE expression:
default_scope -> { where("locale_restrictions LIKE (?)", "%#{I18n.locale.to_s}%") }
However, this fails when there's, for example, both a :en and :benl locale, since %en% will match :benl.
Apparently you can't get access to self.locale_restrictions within a scope. self returns the class instead of the instance. I can't figure out a way to split the locale_restrictions and check them.
What would be the best way to go about this using scopes, or are there any best practices regarding localizing database that I'm missing out on?
I'm basically looking for an easy way to scope my controller instance variables to a specific locale. Any help would be greatly appreciated.
Instead of using LIKE you can use REGEXP and include beginning-of-word and end-of-word boundaries in the regular expression (the [[:<:]] and [[:>:]] markers used here are MySQL syntax). This should do the trick:
default_scope -> { where("locale_restrictions REGEXP (?)", "[[:<:]]#{I18n.locale.to_s}[[:>:]]") }
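To see the difference between a substring match and a word-boundary match outside the database, here is a small Python illustration. It uses Python's \b instead of the SQL markers, and the helper name locale_allowed is hypothetical.

```python
import re


def locale_allowed(locale_restrictions: str, locale: str) -> bool:
    """Hypothetical helper: True if `locale` appears as a whole word in the CSV."""
    pattern = r"\b" + re.escape(locale) + r"\b"
    return re.search(pattern, locale_restrictions) is not None


if __name__ == "__main__":
    print("en" in "benl,de")                    # substring check, like LIKE '%en%'
    print(locale_allowed("benl,de", "en"))      # whole-word check
    print(locale_allowed("en,de,be,nl", "en"))  # whole-word check
```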
I am currently working on a stub server I can plug into a webpage so I do not need to hit sagepay every time I test my payment screen. I need the server to receive a request from the web page and use the dynamic parameters contained in the URL to build the server response. The stub uses regex targets to pick out the parameters I need and add them to the response.
I am using this stub server
I built the accepted URL piece by piece, using the regex tester contained here to test each bit of logic. The expressions work separately, but when I try to join two or more of them together they refuse to work. Each parameter is separated by an ampersand (&) and the name of the parameter.
Here is a sample of the parameters:
paymentType=A&amount=147.06&policyUid=07ef493b-0000-0000-6a05-9fa4d6a5b5ad&paymentMethod=A&script=Retail/accept.py&scriptParams=uid=07ef461a-0000-0000-6a059fa44a8870bf&invokePCL=true&paymentType=A&description=New Business Payment&firstName=Adam&surname=Har&addressLine1=20 Potters Road&city=London&postalCode=EC1 4JS&payerUid=07ef3ff7-0000-0000-6a05-9fa42e92d56b&cardType=valid&continuousAuthority=true&makeCurrent=true
and in a list for ease of reading (without &'s)
paymentType=A
amount=147.06
policyUid=07ef493b-0000-0000-6a05-9fa4d6a5b5ad
paymentMethod=A
script=Retail/accept.py
scriptParams=uid=07ef461a-0000-0000-6a059fa44a8870bf&invokePCL=true&paymentType=A
description=New Business Payment
firstName=Adam
surname=Har
addressLine1=20 Chase road
city=London
postalCode=EC1 3PF
payerUid=07ef3ff7-0000-0000-6a05-9fa42e92d56b
cardType=valid
continuousAuthority=true
makeCurrent=true
And here is my accepted URL parameters with the regex logic:
paymentType=A&amount=([0-9]+.[0-9]{2})&policyUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)$)&paymentMethod=([a-zA-Z]+)&script=([a-zA-Z]+/[a-zA-Z]+.py)&scriptParams=[a-zA-Z]{3}=(([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)))&description=([a-zA-Z0-9 ]+s)&firstName=[A-Za-z]&surname=[A-Za-z]&addressLine1=[a-zA-Z0-9 ]+&city=([a-zA-Z ]+)&postalCode=[a-zA-Z0-9 ]+&payerUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)$)&cardType=[a-zA-Z]+&continuousAuthority=[a-zA-Z]+&makeCurrent=[a-zA-Z]+
again in a list:
registerPayment?outputType=xml
country=GB
paymentType=A
amount=([0-9]+.[0-9]{2})
policyUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*$)
paymentMethod=([a-zA-Z]+)
script=([a-zA-Z]+/[a-zA-Z]+.py)
scriptParams=[a-zA-Z]{3}=(([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)))
description=([a-zA-Z0-9 ]+s)
firstName=[A-Za-z]
surname=[A-Za-z]
addressLine1=[a-zA-Z0-9 ]+
city=([a-zA-Z ]+)
postalCode=[a-zA-Z0-9 ]+
payerUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*$)
cardType=[a-zA-Z]+
continuousAuthority=[a-zA-Z]+
makeCurrent=[a-zA-Z]+
My question is: why do my regexes and samples match fine separately, but not when I put them all together?
Additional question:
I am using the logic (([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+))) for the whole scriptParams parameter (the &'s here are part of the parameter). If I just want to get the 'uid' part and leave the rest, what expression would I need to target it (it is made up of A-Z, a-z, 0-9 and dashes)?
thank you
UPDATE
I have tweaked your answer slightly, because the stub server I am using will not accept the (?:[\s-]) when it loads the file containing the URL templates. I have also incorporated a lot of % and 0-9 because the request is URL-encoded before it is matched (which I had not anticipated), and a few of the params have rogue spaces beyond my control. Other than that, your solution worked great :)
Here is my new version of the scriptParams regex:
&scriptParams=[a-zA-Z]{3}%3d[-A-Za-z0-9]+
This accepts the whole parameter, and works fine in the regex tester. Now when I link anything after this part, there is an unsuccessful match.
I do not understand why this is a problem as the regex seem to string together nicely otherwise. Any ideas are appreciated.
Here is the full regex:
paymentType=[-%a-zA-Z0-9 ]+&amount=[0-9]+.[0-9]{2}&policyUid=([-A-Za-z0-9]+)&paymentMethod=([%a-zA-Z0-9]+)&script=[%/.a-zA-Z0-9]+&scriptParams=[a-zA-Z]{3}%3d[-A-Za-z0-9]+&description=[%a-zA-Z0-9 ]+&firstName=[-%A-Za-z0-9]+&surname=[-%A-Za-z0-9]+&addressLine1=[-%a-zA-Z0-9 ]+&city=[-%a-zA-Z 0-9]+&postalCode=[-%a-zA-Z 0-9]+&payerUid=([-A-Za-z0-9]+)&cardType=[%A-Za-z0-9]+&continuousAuthority=[A-Za-z]+&makeCurrent=[A-Za-z]+
And here is the full set of URL params (with URL encoding present):
paymentType=A&amount=104.85&policyUid=16a9cc22-0000-0000-5a96-5654d9a31f92&paymentMethod=A%20&script=RetailQuotes%2FacceptQuote.py%20&scriptParams=uid%3d16a9c958-0000-0000-5a96-565435311d07%26invokePCL%3dtrue%26paymentType%3dA%20&description=New%2520Business%2520Payment&firstName=Adam&surname=Har%20&addressLine1=26%2520Close&city=Potters%2520Town&postalCode=EC1%25206LR%20&payerUid=16a9c24e-0000-0000-5a96-5654b3f956e0&cardType=valid%20&continuousAuthority=true&makeCurrent=true
Thank you
PS
(Solved the server problem. Was a slight mistake I was making in the usage of URL params.)
First, not all of your regexes work: some are missing quantifiers, others have a $ for some reason, and some parameters are even missing! Here's what they should have been:
paymentType=A
amount=([0-9]+.[0-9]{2})
policyUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*)
paymentMethod=([a-zA-Z]+)
script=([a-zA-Z]+/[a-zA-Z]+.py)
scriptParams=[a-zA-Z]{3}=(([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)+))
invokePCL=([a-z]+)
paymentType=A
description=([a-zA-Z0-9 ]+)
firstName=[A-Za-z]+
surname=[A-Za-z]+
addressLine1=[a-zA-Z0-9 ]+
city=([a-zA-Z ]+)
postalCode=[a-zA-Z0-9 ]+
payerUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*)
cardType=[a-zA-Z]+
continuousAuthority=[a-zA-Z]+
makeCurrent=[a-zA-Z]+
And combined, you get:
paymentType=A&amount=([0-9]+.[0-9]{2})&policyUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*)&paymentMethod=([a-zA-Z]+)&script=([a-zA-Z]+/[a-zA-Z]+.py)&scriptParams=[a-zA-Z]{3}=(([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)+))&invokePCL=([a-z]+)&paymentType=A&description=([a-zA-Z0-9 ]+)&firstName=[A-Za-z]+&surname=[A-Za-z]+&addressLine1=[a-zA-Z0-9 ]+&city=([a-zA-Z ]+)&postalCode=[a-zA-Z0-9 ]+&payerUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*)&cardType=[a-zA-Z]+&continuousAuthority=[a-zA-Z]+&makeCurrent=[a-zA-Z]+
regex101 demo
[Note: I took your regexes where they matched and made minimal edits to them.]
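To illustrate why a stray $ lets a fragment match on its own but not once it is joined to the next one, here is a short Python sketch using two of the fragments. The variable names and the shortened sample string are mine.

```python
import re

sample = "policyUid=07ef493b-0000-0000-6a05-9fa4d6a5b5ad&paymentMethod=A"

# Fragment as in the question: the $ demands end-of-input right after the uid.
anchored = r"policyUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*$)"
# The same fragment without the $.
unanchored = r"policyUid=([A-Za-z0-9]+(?:[\s-][A-Za-z0-9]+)*)"
method = r"&paymentMethod=([a-zA-Z]+)"

# On its own, the anchored fragment matches a string that ends after the uid.
alone = re.search(anchored, "policyUid=07ef493b-0000-0000-6a05-9fa4d6a5b5ad")
# Joined, nothing can follow the $, so the combined pattern never matches.
joined_anchored = re.search(anchored + method, sample)
# Without the $, the joined pattern matches and captures both values.
joined_unanchored = re.search(unanchored + method, sample)

if __name__ == "__main__":
    print(alone is not None)
    print(joined_anchored)
    print(joined_unanchored.groups())
```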
For your second question, I'm not sure what you mean by the Uid part and that & are part of the parameter. Given that there are 3 Uids in the url with similar format (policy, scriptparams, user), you will have to put them in the expression, unless you know a specific pattern to the scriptparams' Uid.
In the expression below, I made use of the fact that only scriptparams' uid was in lowercase:
uid=[0-9a-f]+(?:-[0-9a-f]+)+
regex101 demo
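A quick Python check of that expression against a shortened version of your sample (the variable names are made up). Because the match is case-sensitive, policyUid and payerUid are skipped and only the lowercase uid inside scriptParams is picked up.

```python
import re

# Shortened version of the sample parameters from the question.
params = (
    "policyUid=07ef493b-0000-0000-6a05-9fa4d6a5b5ad&paymentMethod=A"
    "&scriptParams=uid=07ef461a-0000-0000-6a059fa44a8870bf&invokePCL=true"
    "&payerUid=07ef3ff7-0000-0000-6a05-9fa42e92d56b"
)

UID_RE = re.compile(r"uid=[0-9a-f]+(?:-[0-9a-f]+)+")

found = UID_RE.findall(params)

if __name__ == "__main__":
    print(found)
```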
TLDR; I want to enable database-logging of xss_clean() when replacing evil data.
I want to enable database logging of the xss_clean() function in Security.php. Basically, I want to know whether the input I'm feeding to xss_clean() was identified as containing malicious data that was then filtered out.
So basically:
$str = '<script>alert();</script>';
$str = xss_clean($str);
What would happen ideally for me is:
Clean the string from XSS
Return the clean $str
Input information about the evil data (and eventually the logged in user) to the database
As far as I can see in the Security.php-file there is nothing that takes care of this for me, or something that COULD do so by hooks etc. I might be mistaken of course.
Since Security.php does not log how many replacements were made, am I forced to extend Security.php, copy-pasting the current code of the original function and altering it to support this? Or is there a solution that is cleaner and safer for future updates of CodeIgniter (especially for the files being tampered with or extended)?
You would need to extend the Security class, but there is absolutely no need to copy and paste any code if all you need is a log of the input/output. Something along the lines of the following would allow you to do so:
// Assumes CodeIgniter's default subclass_prefix ('MY_'), e.g. saved as application/core/MY_Security.php
class MY_Security extends CI_Security {
    public function xss_clean($str, $is_image = FALSE) {
        // Do whatever you need here with the input ... ($str, $is_image)
        $str = parent::xss_clean($str, $is_image);
        // Do whatever you need here with the output ... ($str)
        return $str;
    }
}
That way, you are just wrapping the existing function and working with its input and output. You could be more forward compatible by using the PHP function func_get_args to transparently pass along the argument list, if you are concerned about changes to the underlying method's signature.