Category Members API - MediaWiki

I had a script which used the sort keys from this API call. Previously it returned values in such a fashion:
ns = 876bcb5a0a63ac522ceb8c54647bf59b+\43168
title = Album:Bits And Blood (EP)
sortkey = CAT001
But now, the sortkey does not return the value as it used to. Any ideas?
ns= 876bcb5a0a63ac522ceb8c54647bf59b+\43168
title=Album:Bits And Blood (EP)
sortkey=383038312d3036310a4249545320414e4420424c4f4f442028455029

See this announcement to the mediawiki-api mailing list. In short, for internationalized sorting, the sortkey value stored in the database was changed to a binary representation, so the API now outputs it hex-encoded to avoid breaking clients that expect text rather than binary content.
The human-readable value is available as sortkeyprefix.
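If you still want to look inside the hex value, here is a minimal sketch in Python that turns the string from the question back into bytes. It assumes the bytes happen to be readable UTF-8 text, which holds for the sample quoted above but is not guaranteed for every collation a wiki might use.

```python
# Hex-encoded sortkey copied verbatim from the question above.
hex_sortkey = "383038312d3036310a4249545320414e4420424c4f4f442028455029"

# bytes.fromhex reverses the hex encoding applied by the API.
raw = bytes.fromhex(hex_sortkey)

# Assumption: the stored bytes are plain UTF-8 text. Other collations may
# produce binary data that cannot be decoded like this.
decoded = raw.decode("utf-8")

print(repr(decoded))  # '8081-061\nBITS AND BLOOD (EP)'
```

For anything you need to rely on, sortkeyprefix is still the safer field to read.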

Related

How to get a name or word from a string that is inside a JSON value in Android Studio?

I'm not sure if the title is correct, but I'll explain what I mean. I'm doing a project that involves an API, and I created the data classes needed to store that information as well. Where it gets weird is actually getting the information I need. Here's an example of the information I need for the project.
"Text":"Alma Gutierrez - Alma M. Gutierrez is a fictional character on the HBO drama The Wire, played by actress Michelle Paress. Gutierrez is a dedicated and idealistic young reporter on the city desk of The Baltimore Sun."
You see, the name of the character and the description are both in a single string value. I'm used to the name and the description being separated, like this for example:
Text:{
name: "Alma Gutierrez"
description:"Alma is a..."
}
So my question is, how can I manipulate the response so that I can get the name and the description separately? I am thinking maybe some sort of function that will take the string value from the JSON call and split it to a name and description values. But I'm not sure how to do that.
I'll leave my project's GitHub URL here for reference. Thanks for the help.
https://github.com/OEThe11/AnywhereCE
You can use split() to split a string into parts based on a delimiter.
For instance, if you have a string containing the description as mentioned in the question, you can do the following:
val text = "Alma Gutierrez - Alma M. Gutierrez is a fictional character on the HBO drama The Wire, played by actress Michelle Paress. Gutierrez is a dedicated and idealistic young reporter on the city desk of The Baltimore Sun."
val (name, description) = text.split(" - ", limit = 2)
(see the behaviour in this playground)
The limit = 2 parameter ensures that you won't lose part of the description if it contains " - ". It splits into at most 2 parts, so everything before the first occurrence of " - " is the name, and everything after it is the description, even if it includes more occurrences of " - ".
Note that using the destructuring declaration val (name, description) = ... like this will fail if split() returns fewer than 2 parts (in other words, if the initial text doesn't contain " - " at all). That may be fine, depending on the input you expect here.
To add on to what Joffery said, I actually created a variable to hold the split string.
val parts = Text.split(" - ", limit = 2)
Since there's only two values in the split, I can use the variable and call the index that I need for the corresponding text field.

What does {+} mean in api variable types?

This variable type was specified in an API. What does it mean? An object with properties?
webinar_id* {+}
It says string but when I input a string for webinar_id I get a 400 missing required parameter.
The linked document shows that webinar_id is a string obtained from the call in section 1. Section 1 also specifies an array called schedules, and since the API call in question asks for an integer called schedule, it is logical to assume this is an index into the schedules array returned in the section 1 JSON response.
EDIT: webinar_id is also a string and schedule is also an int in the response from the call of section 2. I assume those will be identical to those from section 1.
The fact that {+} indicates re-using values from other requests is outlined in the subscript in section 3.
{+} webinar_id and schedule must be obtained from a previous API call to retrieve the details from whatever specific webinar you want to register the person to.
Hope that helps. It mostly just comes down to careful reading of the documentation (which you should be glad you have, things aren't always this explicit!)

Output index of ELKI

I am using ELKI to cluster data from CSV file
I use
-resulthandler ResultWriter
-out folder/
to save the outputdata
But as an output I have some strange indexes
ID=2138 0.1799 0.2761
ID=2137 0.1797 0.2778
ID=2136 0.1796 0.2787
ID=2109 0.1161 0.2072
ID=2007 0.1139 0.2047
The IDs are above 2000 even though I have fewer than 100 training samples.
DBIDs are internal; the documentation clearly says that you shouldn't make too many assumptions about them because their implementation may change. The only reason they are written to the output at all is that some methods (such as OPTICS) may require cross-referencing objects by this unique ID.
Because they are meant to be unique identifiers, they are usually continuously incremented. The next time you click on "run" in the MiniGUI, you will get the next n IDs... so clearly, you clicked run more than once.
The "Tips & Tricks" in the ELKI DBID documentation probably answer your underlying question: how to map DBIDs to line numbers of your input file. If you want object identifiers, the best way is to assign them yourself by using an identifier column (and configuring it to be an external identifier).
For further information, see the documentation: https://elki-project.github.io/dev/dbids

Regex for s3 bucket name

I'm trying to create an s3 bucket through cloudformation. I tried using regex ^([0-9a-z.-]){3,63}$, but it also accepts the patterns "..." and "---" which are invalid according to new s3 naming conventions. (Ref: https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html) Please help?
Answer
The simplest and safest regex is:
(?!(^xn--|.+-s3alias$))^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$
It ensures that names work for all cases - including when you are using S3 Transfer Acceleration. Also, as it doesn't include any backslashes, it's easier to use in string contexts.
Alternative
If you need S3 bucket names that include dots (and you don't use S3 Transfer Acceleration), you can use this instead:
(?!(^((2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})\.){3}(2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})$|^xn--|.+-s3alias$))^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$
Explanation
The Amazon S3 bucket naming rules as of 2022-05-14 are:
1. Bucket names must be between 3 (min) and 63 (max) characters long.
2. Bucket names can consist only of lowercase letters, numbers, dots (.), and hyphens (-).
3. Bucket names must begin and end with a letter or number.
4. Bucket names must not be formatted as an IP address (for example, 192.168.5.4).
5. Bucket names must not start with the prefix xn--.
6. Bucket names must not end with the suffix -s3alias.
7. Buckets used with Amazon S3 Transfer Acceleration can't have dots (.) in their names.
This regex matches all the rules (including rule 7):
(?!(^xn--|.+-s3alias$))^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$
The first group (?!(^xn--|.+-s3alias$)) is a negative lookahead that ensures that the name doesn't start with xn-- or end with -s3alias (satisfying rules 5 and 6).
The rest of the expression ^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$ ensures that:
the name starts with a lowercase letter or number (^[a-z0-9]) and ends with a lowercase letter or number ([a-z0-9]$) (rule 3).
the rest of the name consists of 1 to 61 lowercase letters, numbers or hyphens ([a-z0-9-]{1,61}) (rule 2).
the entire expression matches names from 3 to 63 characters in length (rule 1).
Lastly, we don't need to worry about rule 4 (which forbids names that look like IP addresses) because rule 7 implicitly covers this by forbidding dots in names.
If you do not use Amazon S3 Transfer Acceleration and want to permit more complex bucket names, then you can use this more complicated regular expression:
(?!(^((2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})\.){3}(2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})$|^xn--|.+-s3alias$))^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$
The main change is the addition of the expression to match IPv4 addresses (while the spec simply says that bucket names must not be formatted as IP addresses, as IPv6 addresses contain colons, they are already forbidden by rule 2.)
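To try both expressions quickly, here is a small sketch in Python. The two patterns are copied unchanged from above; the function name is_valid_bucket_name and its allow_dots flag are made up for this example and are not part of any AWS library.

```python
import re

# Pattern for names without dots (safe with Transfer Acceleration).
NO_DOTS = re.compile(r"(?!(^xn--|.+-s3alias$))^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")

# Pattern that permits dots but rejects IPv4-formatted names.
WITH_DOTS = re.compile(
    r"(?!(^((2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})\.){3}"
    r"(2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})$|^xn--|.+-s3alias$))"
    r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$"
)


def is_valid_bucket_name(name: str, allow_dots: bool = False) -> bool:
    """Hypothetical helper: check a name against one of the two patterns."""
    pattern = WITH_DOTS if allow_dots else NO_DOTS
    # fullmatch also rejects a trailing newline, which "$" alone tolerates in Python.
    return pattern.fullmatch(name) is not None


for candidate in ["my-bucket-01", "...", "---", "xn--example", "data-s3alias"]:
    print(candidate, is_valid_bucket_name(candidate))

print("192.168.5.4", is_valid_bucket_name("192.168.5.4", allow_dots=True))
print("logs.example.com", is_valid_bucket_name("logs.example.com", allow_dots=True))
```

Only the first and the last candidate should come back as valid; the "..." and "---" inputs from the question are rejected by both patterns because the first and last characters must be a letter or number.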
The following regex fulfils the AWS specifications, provided you don't want to allow . in the bucket name (which is a recommendation, otherwise Transfer Acceleration can't be enabled):
^((?!xn--)(?!.*-s3alias$)[a-z0-9][a-z0-9-]{1,61}[a-z0-9])$
This one is good because it can be incorporated into more complex checks by simply replacing ^ and $ with other strings, thus allowing for ARN checks and so on.
EDIT:
added -s3alias exclusion as per the comment by #ryanjdillon
I've adapted Zak's answer a little bit. I found it was a little too complicated and threw out valid domain names. Here's the new regex (available with tests on regex101.com**):
(?!^(\d{1,3}\.){3}\d{1,3}$)(^[a-z0-9]([a-z0-9-]*(\.[a-z0-9])?)*$)
The first part is the negative lookahead (?!^(\d{1,3}\.){3}\d{1,3}$), which rejects any name shaped like an IPv4 address. Basically, it tries to match 1-3 digits followed by a period, 3 times ((\d{1,3}\.){3}), followed by 1-3 digits (\d{1,3}).
The second part says that the name must start with a lowercase letter or a number (^[a-z0-9]) followed by lowercase letters, numbers, or hyphens repeated 0 to many times ([a-z0-9-]*). If there is a period, it must be followed by a lowercase letter or number ((\.[a-z0-9])?). These last 2 patterns are repeated 0 to many times (([a-z0-9-]*(\.[a-z0-9])?)*).
The regex does not attempt to enforce the size restrictions set forth by AWS (3-63 characters). That can either be handled by another regex (^.{3,63}$) or by checking the size of the string.
** At that link, one of the tests I added is failing, but if you switch to the test area and type in the same pattern, it passes. It also works if you copy/paste it into the terminal, so I assume that's a bug on the regex101.com side.
Regular expression for S3 Bucket Name:
String S3_REPORT_NAME_PATTERN = "[0-9A-Za-z!\\-_.*\'()]+";
String S3_PREFIX_PATTERN = "[0-9A-Za-z!\\-_.*\\'()/]*";
String S3_BUCKET_PATTERN = "(?=^.{3,63}$)(?!^(\\d+\\.)+\\d+$)(^(([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])\\.)*([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])$)";
I used #Zak's regex, but it isn't 100% correct. I check every AWS bucket naming rule step by step, so it looks like this:
Bucket names must be at least 3 and no more than 63 characters long -> ^.{3,63}$
Bucket names must not contain uppercase characters or underscores -> [A-Z_] (a match means the name is invalid)
Bucket names must start with a lowercase letter or number -> ^[a-z0-9]
Bucket names must not be formatted as an IP address (for example, 192.168.5.4) -> ^(\d+\.)+\d+$ (a match means the name is invalid). That is more restrictive than AWS.
Bucket names must be a series of one or more labels. Adjacent labels are separated by a single period (.) -> In Python: if ".." in bucket_name:
Each label must end with a lowercase letter or a number -> ^(.*[a-z0-9]\.)*.*[a-z0-9]$
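Put together, the step-by-step checks could look roughly like the following Python sketch. The function name bucket_name_problems and the message strings are invented for this example. One deliberate difference from the list: the per-label check splits the name on "." instead of using the last regex, because .* can also span dots and would let a label such as "a-" slip through.

```python
import re


def bucket_name_problems(name: str) -> list:
    """Hypothetical helper: return the violated rules; an empty list means the name passed."""
    problems = []
    if not re.fullmatch(r".{3,63}", name):
        problems.append("length must be 3-63 characters")
    if re.search(r"[A-Z_]", name):
        problems.append("no uppercase letters or underscores")
    if not re.match(r"[a-z0-9]", name):
        problems.append("must start with a lowercase letter or number")
    if re.fullmatch(r"(\d+\.)+\d+", name):
        problems.append("must not look like an IP address")
    if ".." in name:
        problems.append("labels must be separated by a single period")
    # Check each label's last character directly; an empty label fails too.
    if any(not re.fullmatch(r"[a-z0-9]", label[-1:]) for label in name.split(".")):
        problems.append("each label must end with a lowercase letter or number")
    return problems


print(bucket_name_problems("my.bucket-01"))
print(bucket_name_problems("192.168.5.4"))
print(bucket_name_problems("..."))
```

Returning the list of failed rules rather than a single boolean makes it easier to tell a user why a name was rejected.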
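// Returns true when bucketName passes all three checks.
function isValidBucketName(bucketName) {
  // 3 to 63 characters long
  var lengthOk = /^.{3,63}$/.test(bucketName);
  // Must not consist only of digits and dots (IP-like). The check is negated here
  // because an unanchored negative lookahead on its own matches every input.
  var notIpLike = !/^(\d+\.)+\d+$/.test(bucketName);
  // Dot-separated labels that each start and end with a lowercase letter or number
  var labelsOk = /^(([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])\.)*([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])$/.test(bucketName);
  console.log('bucketName ' + bucketName + ' lengthOk ' + lengthOk);
  console.log('bucketName ' + bucketName + ' notIpLike ' + notIpLike);
  console.log('bucketName ' + bucketName + ' labelsOk ' + labelsOk);
  return lengthOk && notIpLike && labelsOk;
}
if (isValidBucketName('my.bucket-01')) {
  // condition pass
} else {
  // not valid bucket name
}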
AWS issued new guidelines where '.' is considered not recommended and bucket names starting with 'xn--' are now prohibited (https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html). If you disallow '.' the regex becomes much more readable:
(?=^.{3,63}$)(?!xn--)([a-z0-9](?:[a-z0-9-]*)[a-z0-9])$
I tried passing a wrong bucket name to the S3 API itself to see how it validates. Looks like the following are valid regex patterns as returned in the API response.
Bucket name must match the regex
^[a-zA-Z0-9.\-_]{1,255}$
or be an ARN matching the regex
^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$
Edit: Modified the regexp to allow required size (3-63) and add some other options.
The names must be DNS-compliant, so you could try with:
^[A-Za-z0-9][A-Za-z0-9\-]{1,61}[A-Za-z0-9]$
See: https://regexr.com/3psne
Use this if you need to use periods:
^[A-Za-z0-9][A-Za-z0-9\-.]{1,61}[A-Za-z0-9]$
See: https://regexr.com/3psnb
Finally, if you want to disallow two consecutive 'non-word' characters, you can use:
^[A-Za-z0-9](?!.*[.-]{2})[A-Za-z0-9\-.]{1,61}[A-Za-z0-9]$
See: https://regexr.com/3psn8
Based on: Regexp for subdomain
If you use Transfer Acceleration, you can simply remove the period option. This code accounts for all the rules, including
no double dots
no dash/dot combo in a row
no dot/dash combo
It also allows a trailing slash after the bucket name, if you want your users to be able to add one.
(?!^|xn--)[a-z0-9]{1}[a-z0-9-.]{1,61}[a-z0-9]{1}(?<!-s3alias|\.\..{1,63}|.{1,63}\.\.|\.\-.{1,63}|.{1,63}\.\-|\-\..{1,63}|.{1,63}\-\.)(?=$|\/)

Simperium Data Dictionary or Decoder Ring for Return Value on "all" call?

I've looked through all of the Simperium API docs for all of the different programming languages and can't seem to find this. Is there any documentation for the data returned from an ".all" call (e.g. api.todo.all(:cv=>nil, :data=>false, :username=>false, :most_recent=>false, :timeout=>nil) )?
For example, this is some data returned:
{"ccid"=>"10101010101010101010101010110101010",
"o"=>"M",
"cv"=>"232323232323232323232323232",
"clientid"=>"ab-123123123123123123123123",
"v"=>{
"date"=>{"o"=>"+", "v"=>"2015-08-20T00:00:00-07:00"},
"calendar"=>{"o"=>"+", "v"=>false},
"desc"=>{"o"=>"+", "v"=>"<p>test</p>\r\n"},
"location"=>{"o"=>"+", "v"=>"Los Angeles"},
"id"=>{"o"=>"+", "v"=>43}
},
"ev"=>1,
"id"=>"abababababababababababababab/10101010101010101010101010110101010"}
I can figure out some of it just from context or from the name of the key but a lot of it is guesswork and trial and error. The one that concerns me is the value returned for the "o" key. I assume that a value of "M" is modify and a value of "+" is add. I've also run into "-" for delete and just recently discovered that there is also a "! '-'" which is also a delete but don't know what else it signifies. What other values can be returned in the "o" key? Are there other keys/values that can be returned but are rare? Is there documentation that details what can be returned (that would be the most helpful)?
If it matters, I am using the Ruby API but I think this is a question that, if answered, can be helpful for all APIs.
The response you are seeing is a list of all of the changes which have occurred in the given bucket since some point in its history. In the case where cv is blank, it tries to get the full history.
You can find some of the details in the protocol documentation, though it's incomplete and focused on the WebSocket message syntax (the operations, however, are the same as with the HTTP API).
The information provided by the v parameter is the result of applying the JSON-diff algorithm to the data between changes. With this diff information you can reconstruct the data at any given version as the changes stream in.
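As a rough sketch of that reconstruction, the Python below replays one change record onto a plain dictionary. It only covers the codes mentioned in the question, read the way the question guesses them ("M" modifies an object, "+" sets a member, "-" deletes); the function name apply_change is made up here, and any other code raises an error instead of being guessed at.

```python
from typing import Optional


def apply_change(current: dict, change: dict) -> Optional[dict]:
    """Hypothetical helper: apply one change record, returning None if the object is deleted."""
    top_op = change["o"]
    if top_op == "-":
        return None  # assumed meaning: the whole object was deleted
    if top_op != "M":
        raise ValueError(f"unhandled change operation: {top_op!r}")

    updated = dict(current)  # copy so the previous version stays intact
    for key, member in change["v"].items():
        if member["o"] == "+":
            updated[key] = member["v"]  # assumed meaning: add the member
        elif member["o"] == "-":
            updated.pop(key, None)  # assumed meaning: remove the member
        else:
            raise ValueError(f"unhandled member operation: {member['o']!r}")
    return updated


# Trimmed-down version of the record shown in the question.
change = {
    "o": "M",
    "v": {
        "location": {"o": "+", "v": "Los Angeles"},
        "id": {"o": "+", "v": 43},
    },
}
todo = apply_change({}, change)
print(todo)  # {'location': 'Los Angeles', 'id': 43}
```

Replaying every record for one object id in order would give you its state at each version, as long as only the operations handled here show up.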