Does an uncompressable string exist? [closed] - language-agnostic

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I was wondering whether there are one or more strings that cannot be losslessly compressed. More formally:
Let String be a string, f(var) a compression function that returns a compressed version of var, g(var) a decompression function such that g(f(var)) = var, and strlen(var) a function that returns the length of var.
Is there a valid value for String such that strlen(String) < strlen(f(String)) or strlen(String) = strlen(f(String))?
Theoretical answers are welcome, as well as examples in different languages and with different compression algorithms.

The pigeonhole principle tells us that for any given compression function*, there must always be at least one input string that will be expanded.
* i.e. a function that genuinely compresses at least one input string.
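The counting behind the pigeonhole argument can be checked by brute force for small lengths. The sketch below is only an illustration (the helper names bit_strings and count_shorter are made up here, and bit strings stand in for arbitrary strings): it counts the bit strings of length n and all bit strings strictly shorter than n, the empty one included.

```python
from itertools import product


def bit_strings(length):
    """Return every bit string of exactly the given length."""
    return ["".join(bits) for bits in product("01", repeat=length)]


def count_shorter(length):
    """Count all bit strings strictly shorter than `length` (empty string included)."""
    return sum(len(bit_strings(k)) for k in range(length))


n = 4
inputs = len(bit_strings(n))        # 2**n candidate inputs
shorter_outputs = count_shorter(n)  # 2**n - 1 possible shorter outputs

# More inputs than shorter outputs: a lossless (injective) f cannot shrink all of them.
print(inputs, shorter_outputs)
```

Since a lossless function must map distinct inputs to distinct outputs, at least one of the length-n inputs has to map to an output that is no shorter than itself.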

I would expect that this string would fit the bill: ""

Yes, and for a simple reason. Suppose there were a function guaranteed to return a losslessly compressed string at least one bit shorter than its input, for any input string. If such a function existed, then by reapplying it to its own result over and over, we would be guaranteed to shorten any string by at least one bit on each pass, and therefore to losslessly compress any string of any length down to a single bit.
Obviously, this is false. Some initial strings could end up at such a result, and they are easy to find by applying the decompression algorithm to a compressed string one bit long. But a single bit has only two possible values, so this result cannot be extended to all uncompressed strings. Therefore such a function cannot exist, which means that for any compression algorithm there exists at least one uncompressable string.
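The "reapply the compressor" thought experiment is easy to try with a real compressor. The following is a rough sketch using Python's zlib module (my choice for illustration; the answer above does not name any particular algorithm). It compresses a very redundant input repeatedly and records the size after each pass.

```python
import zlib

data = b"a" * 1000  # highly redundant input, so the first pass shrinks it a lot
sizes = [len(data)]

for _ in range(5):
    data = zlib.compress(data)
    sizes.append(len(data))

# After the first pass there is little redundancy left, so later passes
# stop shrinking the data.
print(sizes)
```

The exact sizes depend on the zlib build, so none are quoted here; the point is only that the sequence cannot keep decreasing.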


How to create quasi-copy of a file [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 months ago.
I would like to create a quasi-copy of my directory with sensitive data.
Then I would like to share this quasi-copy with others to provide so-called 'real data'.
This 'real data' would allow others to run tests related to storage performance.
My question is how to create a copy of any file (text, jpeg, sqlite.db, ...) that does not contain any of its original data but, from the point of view of compression, de-duplication, and so on, is very similar.
I appreciate any pointers to tools or libraries that help with creating such a quasi-copy.
I appreciate any pointers on what to measure, and how, to assess the similarity of an original file and its quasi-copy.
I don't know whether a "quasi-copy" is an established notion and whether there are accepted rules and procedures. But here is a crude take on how to "mask" data for protection: replace words by equal-length sequences of (perhaps adjusted) random characters. One cannot then do a very accurate storage analysis of real data, but accuracy is bound to suffer after any data scrambling.
One way to build such a "quasi-word," wrapped in a program for convenience:
use warnings;
use strict;
use feature 'say';
use Scalar::Util qw(looks_like_number);
# The word to mask is taken from the command line
my $word = shift // die "Usage: $0 word\n";
my @alphabet = 'a'..'z';
my $quasi_word;
# Replace each digit by a random digit and anything else by a random letter
foreach my $c (split '', $word) {
    if (looks_like_number($c)) {
        $quasi_word .= int rand 10;
    }
    else {
        $quasi_word .= $alphabet[int rand 26];
    }
}
say $quasi_word;
This doesn't cut it at all for a de-duplication analysis. For that one can replace repeated words by the same random sequence, for example as follows.
First make a pass over the words in the file and build a frequency hash recording how many times each word appears. Then, as each word is processed, check whether it repeats; if it does, build a random replacement only the first time and reuse it for every later occurrence.
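A minimal sketch of that consistent replacement, written in Python rather than Perl to keep it short. It simplifies the scheme described above: instead of a separate frequency pass, it caches the replacement for every word on first sight, which has the same effect for repeated words. The function name mask_text and the seed parameter are assumptions made for this illustration.

```python
import random
import re
import string


def mask_text(text, seed=None):
    """Replace each word by a random one of equal length, reusing the
    replacement whenever the same word repeats."""
    rng = random.Random(seed)
    replacements = {}  # original word -> its masked form

    def mask_word(match):
        word = match.group(0)
        if word not in replacements:
            replacements[word] = "".join(
                rng.choice(string.digits) if ch.isdigit()
                else rng.choice(string.ascii_lowercase)
                for ch in word
            )
        return replacements[word]

    # \w+ leaves whitespace and punctuation in place, so the layout survives.
    return re.sub(r"\w+", mask_word, text)


masked = mask_text("the cat saw the other cat in 2023", seed=1)
print(masked)
```

Both occurrences of "the" and of "cat" come out identical, so repetition at the word level is preserved while the words themselves are gone.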
Further adjustments for specific needs should be easy to add.
Any full masking (scrambling/tokenization...) of data of course cannot allow a precise analysis of compression of real data using such a mangled set.
If you know specific sensitive parts then only those can be masked and that would improve the accuracy of the following analyses considerably.
This will be slow but if a set of files need be built once in a while it shouldn't matter.
A question was raised of the "criteria" for the "similarity" of masked data, and the question itself was closed for lack of detail. Here is a comment on that.
It seems that the only "measure" of "similarity" is simply whether the copy behaves the same in the storage performance analysis as the real data would. But one can't tell without using the real data for that analysis! (Which clearly would reveal that data.)
The one way I can think of is to build a copy using a candidate approach and then use it (yourself) for individual components of that analysis. Does it compress (roughly) the same? How about de-duplication? How about ...? Etc. Then make your choices.
If the used approach is flexible enough the masking can then be adjusted for whichever part of analysis "failed" -- the copy behaved substantially differently. (If compression was very different perhaps refine your algorithm to study words and produce more similar obfuscation, etc.)
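For the "does it compress roughly the same?" check, a crude comparison could look like the sketch below. It uses zlib as a stand-in for whatever compressor the storage system really uses, and the three byte strings are invented sample data, not real measurements.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data)) / len(data)


original = b"alpha beta alpha beta gamma " * 200   # stand-in for real data
candidate = b"qwert zxcv qwert zxcv plmok " * 200  # stand-in for a masked copy
noise = os.urandom(len(original))                  # unstructured random bytes

for name, blob in [("original", original), ("candidate", candidate), ("noise", noise)]:
    print(name, round(compression_ratio(blob), 3))
```

A masked copy that keeps word lengths and repetitions should land close to the original, while purely random bytes barely compress at all, which is why plain random filler makes a poor quasi-copy.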

Adding user defined functions to a simple calculator YACC [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I've been searching all over the internet for a comprehensible example of how you can define and call a function in a simple calculator interpreter. Maybe I've found the answer, but since I'm not familiar with YACC I couldn't see it.
So the question is, how do you set up a symbol table for user defined functions and how do you store/call these functions in a calculator interpreter?
I'm basically looking to achieve something like this:
def sum(a,b) { a + b }
sum(5,5)
result:
10
Any pointers or examples would be appreciated
That's definitely diving into the concepts required to interpret (or compile) a programming language, which makes it difficult to provide an answer in a format suitable for StackOverflow. Here's a quick outline:
You need a symbol table which can hold both functions and variables. Or two symbol tables. In the first case, the mapped value will be some kind of variant type, such as a discriminated union; you might need that anyway if you have more than one type of variable. In the second case, you can use a specific type for the mapped value of function names. I'd go for the first option, because it allows functions to be first-class objects.
You need some kind of type which represents the "value" of a function definition. The obvious type is the Abstract Syntax Tree (AST) of an expression (or a program), and doing that will generally simplify your code so I'd highly recommend it. That means that the calculator/parser will not actually evaluate 5+5 (even if that is the literal input) or a+b, but rather will return an AST to whoever called the parser. Consequently, you will need:
A function which can evaluate an AST. That's usually straightforward to write, since it's just a depth-first tree walk. But now you need to worry about variable scope, because when you do evaluate the body of your function sum, you probably want to set the values of the parameters only locally.
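The three pieces in this outline (one symbol table, function definitions stored as ASTs, and an evaluator with local parameter scope) can be sketched in a few lines. The version below is in Python rather than yacc/C purely for brevity; the node classes and the evaluate function are hypothetical names, and the parser that would build these nodes is left out.

```python
from dataclasses import dataclass
import operator


@dataclass
class Num:
    value: float


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: object
    right: object


@dataclass
class FuncDef:
    name: str
    params: list
    body: object


@dataclass
class Call:
    name: str
    args: list


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(node, env):
    """Depth-first walk of the AST; `env` is the symbol table."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, BinOp):
        return OPS[node.op](evaluate(node.left, env), evaluate(node.right, env))
    if isinstance(node, FuncDef):
        env[node.name] = node  # the definition itself is the stored value
        return None
    if isinstance(node, Call):
        func = env[node.name]
        args = [evaluate(arg, env) for arg in node.args]
        local_env = dict(env)  # copy, so parameters stay local to the call
        local_env.update(zip(func.params, args))
        return evaluate(func.body, local_env)
    raise TypeError(f"unknown node: {node!r}")


symbols = {}
# def sum(a,b) { a + b }
evaluate(FuncDef("sum", ["a", "b"], BinOp("+", Var("a"), Var("b"))), symbols)
# sum(5,5)
result = evaluate(Call("sum", [Num(5), Num(5)]), symbols)
print(result)
```

Copying the table on every call is the simplest way to get local parameters; a real interpreter would more likely chain scopes, and would also check that the number of arguments matches the number of parameters.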
If you manage all that, you will have gone several steps beyond the usual "let's build a calculator with flex and bison" project, and I definitely encourage you to do so. You may want to take a look at the classic text Structure and Interpretation of Computer Programs (Abelson & Sussman, 1996; often referred to simply as SICP).

Best practices: what should an "IsValid" boolean function return when passed a null argument? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Given a bool validation function that takes an argument and returns true or false based on argument's compliance to some internal rules:
if the argument is null, should the function:
return false
return true
do neither and simply raise an ArgumentNullException
I would tend to believe the best practice is to raise the exception.
I am, however, curious to hear others' experience on the subject.
Given the sole choice of a bool, I am personally tempted to return false, but could see benefits in returning true also, based on the context of the function's usage.
A null string for instance could be interpreted as empty and may be considered valid.
Are there best practice guidelines for this specific situation?
I am looking for a guideline, like ones found in books like Code Complete.
Does it always need to be a case by case?
I don't think there's a general best practice; it depends on the semantics.
Does it make sense to receive null? If so, return true or false based on what makes more sense. For example, a hypothetical isAlphaNumericString(String) returning true when passed null is most likely nonsensical, but returning false may make sense.
But if it makes no sense to receive null, then a null signals a problem in the call; raise an exception to force the caller to fix it.
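The two policies can be put side by side in a small sketch. It is written in Python, so None plays the role of null and ValueError stands in for ArgumentNullException; both function names are invented for the example.

```python
from typing import Optional


def is_alphanumeric_lenient(text: Optional[str]) -> bool:
    """None is an expected input here, so it is simply reported as invalid."""
    return text is not None and text.isalnum()


def is_alphanumeric_strict(text: Optional[str]) -> bool:
    """None is treated as a caller bug, so it raises instead of answering."""
    if text is None:
        raise ValueError("text must not be None")
    return text.isalnum()
```

Which of the two fits depends on whether callers can legitimately hold a missing value at the point where they validate.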
As I interpret your question, your input space consists of every value the variable can take, augmented by a null state. In SQL, for example, this corresponds to some type, say INTEGER, plus NULL. In C++, it corresponds to something like boost::optional<int>, in which null means "uninitialized".
Now, I think the cleanest solution is to augment the result space with null as well. This is also the choice that both examples above follow (or at least commonly follow). For example, if a scalar function such as LENGTH() or a comparison operator in SQL takes a NULL argument, it usually returns NULL as well.
More on the theoretical side, this implements something like an isomorphism from the null subspace of the definition space to the null subspace of the result space (and the same holds for the complement, i.e. the non-null space). The advantage is that you do not have to reinterpret your original function at all.
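A sketch of that null-in, null-out convention, again in Python with None as the null value. The validation rule itself (a range check) is a placeholder chosen only for illustration.

```python
from typing import Optional


def is_valid(value: Optional[int]) -> Optional[bool]:
    """Return None for a None input; otherwise apply the ordinary rule."""
    if value is None:
        return None           # null in, null out
    return 0 <= value <= 100  # hypothetical rule, only for illustration
```

The caller then has three outcomes to handle (True, False, None), much like three-valued logic in SQL, while the rule for non-null values stays exactly as it was.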

Explain function returns in c [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
Why do these message functions return the return value of printf?
I know this code should not be used; I am asking just out of curiosity.
#include <stdio.h>
message1()
{
    printf("%d",(printf("Good")+printf("Morning")));
}
message2()
{
    printf("%d",printf("Good"),printf("Morning"));
}
message3()
{
    printf("%d",(printf("Good"),printf("Morning")));
}
int main()
{
    printf("%d\n",message1());
    printf("%d\n",message2());
    printf("%d\n",message3());
    return 0;
}
When you write:
message1()
{
...
}
The function is assumed to return an int. It is usually better to make it explicit:
int message1()
But your functions do not have any return statement, so the return value is undefined.
In your case it happens, by chance, that the value that ends up being returned is the return value of the previous function call, that is, printf(). But you should never rely on this.
printf returns the number of characters printed. The part you seem to be asking about then boils down to comparing these:
printf("%d", (4 + 7));
printf("%d", 4, 7);
printf("%d", (4, 7));
In the first one, 4 + 7 gives 11. In the second one, 4 is printed and the excess argument is ignored (because no format specifier corresponds to it). In the third one, (4, 7) is an expression featuring the comma operator, so it evaluates to 7.
NB. As everyone has pointed out, your code causes undefined behaviour by not returning a value from the function which returns int. You seem to be asking two different questions: why do your functions appear to return a value anyway; and what is the explanation of the other output you see.
The explanation of the first part is that it's undefined behaviour; to fix this you should change your functions to:
int message1(void)
{
    return printf("%d", (printf("Good")+printf("Morning")));
}
and so on (which I assume is what your intention was when you wrote the function).
Your message functions lack a return type. Under C89 rules the compiler assumes it to be int (implicit int was removed in C99). However, using the return value of a non-void function that never returned one is Undefined Behaviour.
In your case, the return value from printf has probably been stored in a register, which was not overwritten by message before it itself returned. Thus, the return value propagated.
While this can seem to be all fine and dandy, don't forget that UB is unpredictable by definition, and that you ought to avoid it at all costs (lest your program mysteriously blow up when you change compiler / OS / Moon phase). Enabling (and heeding) compiler warnings would have saved you here.
The functions you defined are implicitly declared as returning int. The compiler should warn you about that, and also tell you that it is deprecated: don't do deprecated stuff. Since they lack a return statement, the return value is undefined. I guess you get the return value of printf because that's what happens to be in the stack at that moment; in practice, just luck. You probably wouldn't get the same result if you defined a variable inside those functions.

How do I know which is a function and which is an operator? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
MySQL has both functions and operators. However, it is not always clear whether an arbitrary keyword is a function or an operator.
For example, I believe ASCII() is a function (it appears in the string functions section of the manual). However, LIKE appears there as well, and it does not appear to be a function: the syntax does not require (...) after the LIKE keyword, and the docs mention that
By default, there must be no whitespace between a function name and the parenthesis following it.
In some cases it is not that clear. For example, the IN keyword appears in the Comparison Functions and Operators section of the manual (an uninformative section name), and it appears there under the name IN() (as if it were a function), but the examples show SELECT 2 IN (0,3,5,7);, which hints that this is an operator (note the space after the keyword).
In the same section there is INTERVAL(). Reading carefully shows the following line in the description of this keyword:
It is required that N1 < N2 < N3 < ... < Nn for this function to work correctly.
which hints that this is indeed a function, and not an operator. LEAST(), which also appears there, does not mention whether it is a function or an operator.
My questions are as follows:
Are there any internal differences between the concepts of function and of operator in MySQL?
Is there a way to figure out, given a keyword, whether it is a function or an operator?
Can a keyword be both, depending on context? I know that some keywords can be both a function and a type, for example.
I wish to know this both in order to understand the abstract structure of MySQL and in order to use it for syntax highlighting.
MySQL provides a list of "non-typed" operators here.
Basically, a function is followed by a list of arguments enclosed in parentheses. Even functions that don't take arguments, such as now(), require the parentheses.
An operator, on the other hand, is part of the syntax of the MySQL query language. These are known to the parser, which recognizes them. Operators often use "infix" notation, where the operator appears between the arguments. However, this is not required (just consider the unary minus operator).
A cursory look at the list of operators shows that something can be both an operator and a function. An example is mod().
The most important difference to me is that users can define functions, but users cannot (yet) define operators. Unlike object-oriented languages, SQL does not offer a way to provide additional definitions for operators.
And, for your purpose, you should peruse the manual pages to get the lists of things that you care about.
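For the syntax-highlighting use case, the lists taken from the manual could feed a simple lookup like the sketch below. The two sets are tiny hand-picked samples based on the keywords mentioned in this thread, not complete lists, and classify is a made-up helper name.

```python
# Tiny hand-picked samples; the real lists would have to come from the manual.
FUNCTIONS = {"ASCII", "LEAST", "NOW", "MOD"}
OPERATORS = {"LIKE", "IN", "AND", "OR", "MOD"}


def classify(keyword):
    """Return the set of roles a keyword can play."""
    name = keyword.upper()
    roles = set()
    if name in FUNCTIONS:
        roles.add("function")
    if name in OPERATORS:
        roles.add("operator")
    return roles
```

Returning a set rather than a single label leaves room for keywords such as MOD that can be both, in which case a highlighter would still need the surrounding syntax to decide.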