Generating unsatisfiable test problems - language-agnostic

I'm trying to generate some test problems for propositional satisfiability, in particular to generate some that are unsatisfiable, but according to a fixed pattern, so that for any N, an unsatisfiable problem of N variables can be generated.
An easy solution would be x1, x1=>x2, x2=>x3 ... !xN, except that unit propagation alone refutes this, and any SAT solver does that instantly, so it's not a tough enough test.
What would be a pattern for unsatisfiable problems of N variables that are not random and can be seen by inspection to be unsatisfiable, but are at least somewhat nontrivial for a SAT solver?

Pigeonhole problems are non-trivial for CDCL-based SAT solvers without preprocessing; see, e.g., the paper "Detecting Cardinality Constraints in CNF". The paper "Hard Examples for Resolution" may also be of interest.
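Pigeonhole instances can be generated mechanically for any size. The Python sketch below encodes "n + 1 pigeons into n holes" as CNF and writes it in the DIMACS format that most solvers accept. The function names are illustrative and do not come from any library. Note that the encoding uses n(n + 1) variables, so it grows in steps rather than hitting an arbitrary N exactly.

```python
from itertools import combinations


def pigeonhole_cnf(n: int) -> tuple[int, list[list[int]]]:
    """Return (num_vars, clauses) for n + 1 pigeons and n holes.

    Variable var(i, j) is true when pigeon i sits in hole j.
    """
    def var(i: int, j: int) -> int:
        # Pigeons i in 0..n, holes j in 0..n-1, numbered from 1 as DIMACS requires.
        return i * n + j + 1

    clauses: list[list[int]] = []
    # Every pigeon sits in at least one hole.
    for i in range(n + 1):
        clauses.append([var(i, j) for j in range(n)])
    # No hole holds two pigeons.
    for j in range(n):
        for i, k in combinations(range(n + 1), 2):
            clauses.append([-var(i, j), -var(k, j)])
    return n * (n + 1), clauses


def to_dimacs(num_vars: int, clauses: list[list[int]]) -> str:
    """Serialize clauses as DIMACS CNF text (each clause ends with 0)."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    num_vars, clauses = pigeonhole_cnf(3)
    print(to_dimacs(num_vars, clauses), end="")
```

The contradiction is obvious by counting, but resolution refutations of pigeonhole formulas are known to be exponentially long, which is what makes them hard for resolution-based solvers.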

First example that comes to mind is to take the conjunction of all possible disjunctions containing every variable once. For example, if your variables are p1, p2, and p3:
(¬p1 ∨ ¬p2 ∨ ¬p3) ∧ (¬p1 ∨ ¬p2 ∨ p3) ∧ (¬p1 ∨ p2 ∨ ¬p3) ∧ (¬p1 ∨ p2 ∨ p3) ∧ (p1 ∨ ¬p2 ∨ ¬p3) ∧ (p1 ∨ ¬p2 ∨ p3) ∧ (p1 ∨ p2 ∨ ¬p3) ∧ (p1 ∨ p2 ∨ p3)
Another way to describe this is: the conjunction of negations of every possible assignment. For example, ¬(p1 ∧ p2 ∧ p3) = (¬p1 ∨ ¬p2 ∨ ¬p3) is a clause of the formula. So, every possible assignment fails to satisfy exactly one clause. We only know this, however, because we verified that the clauses are exhaustive.
If we try to convert to canonical disjunctive normal form, we can’t do it any faster regardless of what order we try the variables in, and we eventually get:
(p1 ∧ ¬p1 ∧ p2 ∧ p3) ∨ (p1 ∧ p2 ∧ ¬p2 ∧ p3) ∨ (p1 ∧ p2 ∧ p3 ∧ ¬p3)
Every term we generate turns out to be unsatisfiable, but we only see this once we have expanded all of them. If we instead search for a satisfying assignment, then regardless of the order in which we try the variables, we end up testing every possible assignment exhaustively, in exponential time.
There might be a SAT solver out there that tests for this special case, although checking that every possible clause is in the input would itself take time exponential in N. Putting arbitrary input into a canonical form where you could efficiently say "there are only eight possible clauses of three variables, and we’ve already checked that there are no duplicates" would take a while too.
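The exhaustive construction described above is easy to generate and to check by brute force for small N. This Python sketch builds all 2^N clauses over variables 1..N and confirms that each assignment falsifies exactly one clause. The helper names are hypothetical. Keep in mind that the formula itself has 2^N clauses, so the input size grows exponentially with N.

```python
from itertools import product


def all_sign_clauses(n: int) -> list[list[int]]:
    """Every clause that mentions each of variables 1..n exactly once.

    Each clause is the negation of one full assignment, so the formula has
    2**n clauses and no satisfying assignment.
    """
    return [
        [v if positive else -v for v, positive in zip(range(1, n + 1), signs)]
        for signs in product([True, False], repeat=n)
    ]


def falsified_clauses(clauses: list[list[int]], assignment: dict[int, bool]) -> int:
    """Count clauses in which every literal is false under the assignment."""
    return sum(
        all(assignment[abs(lit)] != (lit > 0) for lit in clause)
        for clause in clauses
    )


if __name__ == "__main__":
    n = 3
    clauses = all_sign_clauses(n)
    for values in product([False, True], repeat=n):
        assignment = dict(zip(range(1, n + 1), values))
        # Each assignment falsifies exactly one clause: its own negation.
        assert falsified_clauses(clauses, assignment) == 1
    print(f"{len(clauses)} clauses, every assignment falsifies exactly one")
```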


GROUP BY Syntax Mysql - Leaving out a groupable column

I have Table A with columns X,Y,Z.
X is an FK, Y is a description. Each X has exactly one corresponding Y. So if X stays the same over multiple records, Y stays the same too.
So there may be any number of records where X and Y are the same.
Now I'm running the following query:
SELECT X, Y
FROM A
GROUP BY X;
Will this query work?
Y is supposed to be grouped alongside X, but I didn't explicitly specify it in the query.
Does MySQL still implicitly act this way, though? And is this behavior reliable/standardized?
Furthermore, will the results vary based on the data type of Y? For example, is there a difference if Y is either VARCHAR, CHAR or INT? In case of an INT, will the result be a SUM() of the grouped records?
Is the behavior MySQL will expose in such a case normed/standardized and where can I look it up?
Each X has exactly one corresponding Y
SELECT X, Y FROM A GROUP BY X;
Will this query work?
Technically, what happens when you run this query under MySQL depends on whether the SQL mode ONLY_FULL_GROUP_BY is enabled or not:
if it is enabled, the query errors: all non-aggregated columns must appear in the GROUP BY clause (you need to add Y to the GROUP BY clause). MySQL 5.7.5 and later also accept columns that it can prove are functionally dependent on the grouped columns, for example via a primary key, but nothing in this schema lets it prove that Y depends on X.
otherwise, the query executes and returns an arbitrary value of Y for each X; but since Y is functionally dependent on X, the value is actually predictable, so this is OK.
Generally, although the SQL standard does recognize the notion of functionally dependent columns, it is good practice to always include all non-aggregated columns in the GROUP BY clause. It is also a requirement in most databases other than MySQL (and, starting with MySQL 5.7, ONLY_FULL_GROUP_BY is enabled by default). This also protects you from various pitfalls and unpredictable behaviors.
Using ANY_VALUE() makes the query both valid and explicit about its purpose:
SELECT X, ANY_VALUE(Y) FROM A GROUP BY X;
Note that if you only want the distinct combinations of X, Y, it is simpler to use SELECT DISTINCT:
SELECT DISTINCT X, Y FROM A;
Your query will work if Y is functionally dependent on X (depending on the SQL mode in use), but if you are trying to get distinct X, Y pairs from the table, it is better to use DISTINCT. GROUP BY is meant to be used with aggregate functions.
So you should use:
SELECT DISTINCT X, Y
FROM A;
A sample case where you would use GROUP BY is with an aggregate function (DISTINCT is redundant here, because GROUP BY already returns one row per X, Y pair):
SELECT X, Y, COUNT(*)
FROM A
GROUP BY X, Y;
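The difference between DISTINCT and GROUP BY with an aggregate can be tried out quickly. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL, with made-up rows for table A. SQLite has no ONLY_FULL_GROUP_BY mode, so it only illustrates the two portable queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (X INTEGER, Y TEXT, Z INTEGER)")
# Hypothetical sample data: each X always pairs with the same Y.
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?)",
    [(1, "one", 10), (1, "one", 20), (2, "two", 30)],
)

# Distinct (X, Y) pairs.
distinct_pairs = conn.execute(
    "SELECT DISTINCT X, Y FROM A ORDER BY X"
).fetchall()

# GROUP BY on both columns yields the same pairs and also allows aggregates.
pair_counts = conn.execute(
    "SELECT X, Y, COUNT(*) FROM A GROUP BY X, Y ORDER BY X"
).fetchall()

print(distinct_pairs)  # [(1, 'one'), (2, 'two')]
print(pair_counts)     # [(1, 'one', 2), (2, 'two', 1)]
```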

SELECT with 9 columns in a WHERE or GROUP BY. Is there a better way to index multiple columns than a composite index for almost each permutation?

I have 9 columns in a table with 500 million rows in which I will do SELECT queries and all of the columns may or may not be in a WHERE or GROUP BY.
For example:
Columns -> (A, B, C, D, E, F, I, J, K)
query ->
SELECT *
FROM table WHERE A = 'x' AND J = 'y' GROUP BY B, E, K
What is the best way to index and optimize the database? Do I have to create a multiple-column index (composite index) for each permutation of columns?
For 3 columns I know I could do:
(a, b, c), (b, c), (c), (a, c)
but what about 9 columns?
You can't achieve the desired goal. Some possible alternatives:
Discover which columns are most commonly used; then make up to 10 indexes with up to 3 columns each. If you help the most common combinations, then it might be "good enough".
Look into MariaDB with "Columnstore".
Look into addon packages.
First, SELECT * is not appropriate with GROUP BY. Happily, MySQL no longer allows that syntax by default.
If you intend:
SELECT B, E, K
FROM table
WHERE A = 'x' AND J = 'y'
GROUP BY B, E, K;
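Then the best index is on (A, J, B, E, K): the equality-tested columns A and J first (in either order), then the GROUP BY columns in the same order as in the GROUP BY clause.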
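One way to check that a composite index fits the query is to look at the query plan. The sketch below runs SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a rough stand-in for MySQL's EXPLAIN; the two optimizers differ, so treat it as illustrative only. The table name t and the index name idx_ajbek are hypothetical (TABLE is a reserved word, so the question's name cannot be used as-is).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A, B, C, D, E, F, I, J, K)")
# Equality columns first, then the GROUP BY columns in GROUP BY order.
conn.execute("CREATE INDEX idx_ajbek ON t (A, J, B, E, K)")

query = """
    SELECT B, E, K
    FROM t
    WHERE A = 'x' AND J = 'y'
    GROUP BY B, E, K
"""
# Each plan row has four columns; the last one is a text description
# that names the index the planner chose.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for row in plan:
    print(row[3])
```

Because every selected column is in the index, the planner can answer the query from the index alone without reading table rows.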

Writing a SQL query that returns tuples

You are given a relation R with N columns of the same type. Write an SQL query that returns
the tuples (perhaps more than one) having the minimum number of different values. The solution
should have size polynomial in N and use aggregation in an essential way.
Examples:
R1 = {t1: (a, a, b), t2: (b, a, c)}: the result is t1.
R2 = {t1: (a, a, b), t2: (b, a, c), t3: (b, b, b)}: the result is t3.
A part of the solution is devising a way to uniquely identify tuples without introducing the
notion of a tuple identifier.
I can't grasp the tuple concept.
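Here is one possible approach, sketched in Python with the standard sqlite3 module. The whole tuple serves as its own identifier, because a relation has no duplicate rows. Each tuple is unpivoted into (tuple, value) pairs with one UNION branch per column, and the pairs are counted per tuple. The generated query has N branches of N columns each, so its size is O(N^2). The column names c1..cN and the helper functions are assumptions for illustration, and the WITH clause needs MySQL 8.0 or SQLite 3.8.3 or later.

```python
import sqlite3


def min_distinct_query(n: int) -> str:
    """Build a query over R(c1..cn) returning the tuples with the fewest distinct values."""
    cols = ", ".join(f"c{i}" for i in range(1, n + 1))
    # One branch per column pairs each tuple with one of its values; UNION
    # (not UNION ALL) drops repeated (tuple, value) pairs.
    pairs = " UNION ".join(
        f"SELECT {cols}, c{i} AS v FROM R" for i in range(1, n + 1)
    )
    # The number of surviving pairs per tuple is its number of distinct values.
    counts = f"SELECT {cols}, COUNT(*) AS k FROM ({pairs}) AS p GROUP BY {cols}"
    return (
        f"WITH counts AS ({counts}) "
        f"SELECT {cols} FROM counts "
        f"WHERE k = (SELECT MIN(k) FROM counts)"
    )


def run(rows: list[tuple]) -> list[tuple]:
    """Load rows into an in-memory table R and run the generated query."""
    n = len(rows[0])
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f"c{i}" for i in range(1, n + 1))
    conn.execute(f"CREATE TABLE R ({cols})")
    conn.executemany(f"INSERT INTO R VALUES ({', '.join('?' * n)})", rows)
    return conn.execute(min_distinct_query(n)).fetchall()


print(run([("a", "a", "b"), ("b", "a", "c")]))                    # [('a', 'a', 'b')]
print(run([("a", "a", "b"), ("b", "a", "c"), ("b", "b", "b")]))  # [('b', 'b', 'b')]
```

Ties are kept: if several tuples share the minimum count, all of them are returned.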

How can I apply arithmetic operations to aggregated columns in MySQL?

TL;DR
Is there a way to use aggregated results in arithmetic operations?
Details
I want to take two aggregated columns (SUM(..), COUNT(..)) and combine them arithmetically, e.g.:
-- doesn't work
SELECT
SUM(x) AS x,
COUNT(y) AS y,
(x / y) AS x_per_y -- Problem HERE
FROM
my_tab
GROUP BY groupable_col;
That doesn't work, but I've found this does:
SELECT
SUM(x) AS x,
COUNT(y) AS y,
SUM(x) / COUNT(y) AS x_per_y -- notice the repeated aggregate
FROM
my_tab
GROUP BY groupable_col;
But if I need many columns that operate on aggregates, it quickly becomes very repetitive, and I'm not sure how to tell whether or not MySQL can optimize so that I'm not calculating aggregates multiple times.
I've searched SO for a while now, as well as asked some pros, and the best alternative I can come up with is nested selects.
EDIT: my DB does support nested selects; I had been doing something wrong and ruled them out prematurely.
Also, MySQL documentation seems to support it, but I can't get something like this to work (example at very bottom of link)
https://dev.mysql.com/doc/refman/5.5/en/group-by-handling.html
One way is to use a subquery:
select x,
y,
x / y as x_per_y
from (
select SUM(x) as x,
COUNT(y) as y
from my_tab
group by groupable_col
) t
Also note that the value of count(y) can be zero (when all y are null).
MySQL handles this case automatically and produces NULL when the denominator is zero.
Some DBMSs raise a division-by-zero error instead; this is usually handled by returning NULL explicitly when the denominator is zero:
select x,
y,
case when y > 0 then x / y end as x_per_y
from (
select SUM(x) as x,
COUNT(y) as y
from my_tab
group by groupable_col
) t
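Here is a runnable sketch of the subquery approach, using Python's sqlite3 module as a stand-in for MySQL and made-up rows. It swaps the CASE expression for NULLIF(y, 0), a standard SQL function that returns NULL when y is 0 and therefore gives the same result. The x column is stored as REAL because SQLite performs integer division when both operands are integers, whereas MySQL's / operator does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_tab (groupable_col TEXT, x REAL, y INTEGER)")
# Hypothetical rows: group 'b' has only NULL y values, so COUNT(y) is 0.
conn.executemany(
    "INSERT INTO my_tab VALUES (?, ?, ?)",
    [("a", 1.0, 1), ("a", 3.0, 2), ("b", 5.0, None)],
)

rows = conn.execute(
    """
    SELECT groupable_col, x, y,
           x / NULLIF(y, 0) AS x_per_y  -- NULLIF turns a zero count into NULL
    FROM (
        SELECT groupable_col, SUM(x) AS x, COUNT(y) AS y
        FROM my_tab
        GROUP BY groupable_col
    ) AS t
    ORDER BY groupable_col
    """
).fetchall()

print(rows)  # [('a', 4.0, 2, 2.0), ('b', 5.0, 0, None)]
```

Each aggregate is written once in the inner query, and the outer query can combine the aliases as often as needed.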

mysql spatial data speed

I have a table with (among others) x and y fields of SMALLINT type and pt of POINT type, set to POINT(x,y);
x and y have normal indexes and pt has a spatial index.
Profiling typical query
select sql_no_cache
count(0) from `table_name`
where (x between -50 and 50)
and (y between -50 and 50);
-- vs
set @g = GeomFromText('Polygon((-50 -50, 50 -50, 50 50, -50 50, -50 -50))');
select sql_no_cache
count(0) from `table_name`
where MBRContains(@g, `pt`);
... shows that query via x and y is 1.5 times faster:
3.45±0.10ms vs 4.61±0.14ms over 10 queries.
x and y would always be INT and only rectangular (even square) areas would be queried. Yes, this is carved in stone ;-)
The main question is:
Have I missed something about indexes, or is spatial data overkill in such a case?
MySQL version is 5.1.37
DB Engine type is MyISAM (default)
Current table size is 5k rows, 10-30k planned in production.
I have had some experience with MySQL, but have never worked with spatial data types and spatial indexes.
Do you have a combined x & y INDEX on the table? If so, then yes, this is extremely fast.
I believe spatial indexes have broader uses: a polygon can have many vertices, and the rectangle is just a special case of that more general construct.
If a rectangular boundary area is enough for your needs then I would rather suggest you go with the x and y fields solution than adding the complexity of the geospatial extension features.
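A combined index on (x, y) for the rectangular query can be sketched with Python's sqlite3 module. This is only an analogy: SQLite's planner is not MySQL 5.1 with MyISAM, and the rows are made up. In this sketch, the range condition on x is used to search the index, and the y condition is checked against the index entries without reading the table rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (x INTEGER, y INTEGER)")
# Combined index: x is searched by range, y is filtered inside the index.
conn.execute("CREATE INDEX idx_xy ON table_name (x, y)")
# Hypothetical grid of points from -100 to 100 in steps of 10.
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(x, y) for x in range(-100, 101, 10) for y in range(-100, 101, 10)],
)

query = """
    SELECT COUNT(*) FROM table_name
    WHERE x BETWEEN -50 AND 50 AND y BETWEEN -50 AND 50
"""
(count,) = conn.execute(query).fetchone()
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(count)  # 121
for row in plan:
    print(row[3])  # plan description, naming the index used
```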