Can Postgres use a function in a partial index where clause?

I have a large Postgres table where I want a partial index on one of the two columns indexed. Can I use a Postgres function in the where clause of a partial index, and if so, how do I get the select query to utilize that partial index?
Example Scenario
The first column is "magazine", the second column is "volume", and the third column is "issue". All the magazines can have the same "volume" and "issue" numbers, but I want the index to only contain the two most recent volumes for each magazine. This is because a magazine could be older than others and have higher volume numbers than younger magazines.
Two immutable strict functions were created to determine the current and last year's volumes for a magazine: f_current_volume('gq') and f_previous_volume('gq'). Note: the current/past volume number only changes once per year.
I tried creating a partial index with the functions; however, when using explain on a query it only does a seq scan for a current-volume magazine.
CREATE INDEX ix_issue_magazine_volume ON issue USING BTREE ( magazine, volume )
WHERE volume IN (f_current_volume(magazine), f_previous_volume(magazine));
-- Both these do seq scans.
select * from issue where magazine = 'gq' and volume = 100;
select * from issue where magazine = 'gq' and volume = f_current_volume('gq');
What am I doing wrong to get this to work? And if it is possible, why does it need to be done that way for Postgres to use the index?
-- UPDATE: 2013-06-17, the following surprisingly used the index.
-- Why would using a field name rather than value allow the index to be used?
select * from issue where magazine = 'gq' and volume = f_current_volume(magazine);

Immutability and 'current'
If your f_current_volume function ever changes its behaviour, as is implied by its name and by the presence of an f_previous_volume function, then the database is free to return completely bogus results.
Had the functions not been marked IMMUTABLE, PostgreSQL would've refused to let you create the index, complaining that you can only use IMMUTABLE functions. The thing is, marking a function IMMUTABLE means that you are telling PostgreSQL something about the function's behaviour, as per the documentation. You're saying "I promise this function's results won't change, feel free to make assumptions on that basis."
One of the biggest assumptions made is when building an index. If the function returns different outputs for the same input on different invocations, things go splat. Or possibly boom if you're unlucky. In theory you can kind-of get away with changing an immutable function by REINDEXing everything, but the only really safe way is to DROP every index that uses it, DROP the function, re-create the function with its new definition and re-create the indexes.
That can actually be really useful to do if you have something that changes only infrequently; in effect you have two different immutable functions at different points in time that just happen to have the same name.
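In SQL that cycle looks roughly like the following sketch. This is illustrative only; the argument and return types are assumptions, and the placeholder function body obviously needs to be replaced with the real new definition:
BEGIN;
DROP INDEX ix_issue_magazine_volume;
DROP FUNCTION f_current_volume(text);
CREATE FUNCTION f_current_volume(magazine text) RETURNS integer
    AS 'SELECT 101'  -- placeholder body; put the new definition here
    LANGUAGE sql IMMUTABLE STRICT;
CREATE INDEX ix_issue_magazine_volume ON issue USING BTREE ( magazine, volume )
    WHERE volume IN (f_current_volume(magazine), f_previous_volume(magazine));
COMMIT;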
Partial index matching
PostgreSQL's partial index matching is pretty dumb - but, as I found when writing test cases for this, a lot smarter than it used to be. It ignores a dummy OR true. It uses an index on WHERE (a%100=0 OR a%1000=0) for a WHERE a = 100 query. It even got it with a non-inline-able identity function:
regress=> CREATE TABLE partial AS SELECT x AS a, x AS b FROM generate_series(1,10000) x;
regress=> CREATE OR REPLACE FUNCTION identity(integer)
RETURNS integer AS $$
SELECT $1;
$$ LANGUAGE sql IMMUTABLE STRICT;
regress=> CREATE INDEX partial_b_fn_idx
ON partial(b) WHERE (identity(b) % 1000 = 0);
regress=> EXPLAIN SELECT b FROM partial WHERE b % 1000 = 0;
QUERY PLAN
---------------------------------------------------------------------------------------
Index Only Scan using partial_b_fn_idx on partial (cost=0.00..13.05 rows=50 width=4)
(1 row)
However, it was unable to prove the IN clause match, e.g.:
regress=> DROP INDEX partial_b_fn_idx;
regress=> CREATE INDEX partial_b_fn_in_idx ON partial(b)
WHERE (b IN (identity(b), 1));
regress=> EXPLAIN SELECT b FROM partial WHERE b % 1000 = 0;
QUERY PLAN
----------------------------------------------------------------------------
Seq Scan on partial (cost=10000000000.00..10000000195.00 rows=50 width=4)
So my advice? Rewrite IN as an OR list:
CREATE INDEX ix_issue_magazine_volume ON issue USING BTREE ( magazine, volume )
WHERE (volume = f_current_volume(magazine) OR volume = f_previous_volume(magazine));
... and on a current version it might just work, so long as you keep the immutability rules outlined above in mind. Well, the second version:
select * from issue where magazine = 'gq' and volume = f_current_volume('gq');
might. Update: No, it won't; for it to be used, Pg would have to recognise that magazine='gq' and realise that f_current_volume('gq') was therefore equivalent to f_current_volume(magazine). It doesn't attempt to prove equivalences on that level with partial index matching, so as you've noted in your update you have to write f_current_volume(magazine) directly. I should've spotted that. In theory PostgreSQL could use the index with the second query if the planner was smart enough, but I'm not sure how you'd go about efficiently looking for places where a substitution like this would be worthwhile.
The first example, volume = 100, will never use the index, since at query planning time PostgreSQL has no idea that f_current_volume('gq') will evaluate to 100. You could add an OR clause, OR volume = 100, to your partial index WHERE clause and PostgreSQL would figure it out then, though.
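In other words, something along these lines (purely illustrative; hard-coding a volume number in the index predicate is rarely what you actually want):
CREATE INDEX ix_issue_magazine_volume ON issue USING BTREE ( magazine, volume )
WHERE (volume = f_current_volume(magazine)
    OR volume = f_previous_volume(magazine)
    OR volume = 100);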

First off, I'd like to volunteer a wild guess, because you're making it sound like your f_current_volume() function calculates something based on a separate table.
If so, be wary, because this means your function is volatile, in that it needs to be recalculated on every call (a concurrent transaction might be inserting, updating or deleting rows). Postgres won't allow you to index those, and I presume you worked around this by declaring the function immutable. Not only is this incorrect, but you also run into the issue of the index containing garbage, because the function gets evaluated as you edit the row, rather than at run time. What you'd probably want instead (again, if my guess is correct) is to store and maintain the totals in the table itself using triggers.
Regarding your specific question, partial indexes need to have their where condition matched by the query to prompt Postgres to use them. I'm quite sure that Postgres is smart enough to identify that e.g. 10 is between 5 and 15 and use a partial index with that clause. I very much doubt it would know that f_current_volume('gq') is 100 in your case, however, considering the above-mentioned caveat.
You could try this query and see if the index gets used:
select *
from issue
where magazine = 'gq'
and volume in (f_current_volume('gq'), f_previous_volume('gq'));
(Though again, if your function is in fact volatile, you'll get a seq scan as well.)

Related

MySQL- INDEX(): How to Create a Functional Key Part Using Last nth Characters?

How would I write the INDEX() statement to use the last N characters of a column as a functional key part? I'm brand new to SQL/MySQL, and believe that's the proper verbiage for my question. An explanation of what I'm looking for is below.
The MySQL 8.0 Ref Manual explains how to use the first N characters, showing a secondary index that uses col2's first 10 characters, via this example:
CREATE TABLE t1 (
col1 VARCHAR(40),
col2 VARCHAR(30),
INDEX (col1, col2(10))
);
However, I would like to know how one could form this using the ending characters? Perhaps something like:
...
INDEX ((RIGHT (col2,3)));
);
However, I think that says to index over a column called 'xyz' instead of "put an index on each column value using the last 3 of 30 potential characters"? That's what I'm really trying to figure out.
For some context, it'd be helpful to index something with smooshed/mixed data, and I'm playing around with how such a thing could be accomplished. The example data below is a simplified, adjusted version of exported data from an inventory/billing manager that hails from the 90's, which I had to endure some years back:
Col1                                       | Col2
GP6500012_SALES_FY2023_SBucks_503_Thurs    | R-DK_Sumat__SKU-503-20230174
GP6500012_SALES_FY2023_SBucks_607_Mon      | R-MD_Columb__SKU-607-2023035
GP6500012_SALES_FY2023_SBucks_627_Mon-pm   | R-BLD_House__SKU-503-20230024
GP6500012_SALES_FY2023_SBucks_929_Wed      | R-FR_Ethp__SKU-929-20230324
Undoubtedly, better options exist that bypass this question altogether, and I'll presumably learn those techniques with time in my data analytics coursework. For now, I'm just curious if it's possible to somehow index the rows by suffix instead of prefix, and what a code example to accomplish that would look like. TIA.
Proposed solution (INDEX ((RIGHT (col2,3)))):
Not available.
Case 1:
When you need to split apart a column to search it, you have probably designed the schema wrong. In particular, that part of the column needs to be in its own column. That being said, it is possible to use a 'virtual' (or 'generated') column that is a function of the original column, then INDEX that.
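A sketch of that generated-column approach, reusing the t1 definition from the question and assuming the trailing three characters are what you search on:
ALTER TABLE t1
    ADD COLUMN col2_suffix VARCHAR(3)
        GENERATED ALWAYS AS (RIGHT(col2, 3)) STORED,
    ADD INDEX idx_col2_suffix (col2_suffix);

-- queries that filter on the generated column can use the index
SELECT * FROM t1 WHERE col2_suffix = '174';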
Case 2:
If you are suggesting that the last 3 characters are the most selective and that might speed up any lookup, don't bother. Simply index the entire column.
That data:
I would consider splitting up the stuff that is concatenated together by _. Do it as you INSERT the rows. If it needs to be put back together, do so during subsequent SELECTs.
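For example, SUBSTRING_INDEX can peel off individual pieces on the way in, and CONCAT_WS can glue them back together on the way out (a sketch; the split-out column and table names are invented):
-- extract the 4th underscore-separated piece of a Col1-style value ('SBucks')
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('GP6500012_SALES_FY2023_SBucks_503_Thurs', '_', 4), '_', -1);

-- reassemble previously split columns in a later SELECT
SELECT CONCAT_WS('_', batch, category, fiscal_year, customer, store, day_label)
FROM t1_split;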
DATEs:
Do not, on the other hand, split up dates (into year, month, etc). Keep them together. (That's another discussion.) Always go to the effort to convert dates (and datetimes) to the MySQL format (year-first) when storing. That way, you can properly use indexes and use the many date functions.
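For example (the input format here is just an assumption about what a source system might export):
SELECT STR_TO_DATE('06/17/2023', '%m/%d/%Y');  -- returns 2023-06-17, ready to store in a DATE column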
MySQL's Prefix indexing:
In general it is a "bad idea" to use the INDEX(col(10)) construct. It rarely is of any benefit; it often fails to use the index as much as you would expect. This is especially deceptive: UNIQUE(col(10)) -- It declares that the first 10 chars are unique, not the entire col!
CAST:
If the data is the wrong datatype (string vs int; wrong collation; etc.), then I argue that it is a bad schema design. This is a common problem with EAV (Entity-Attribute-Value) schemas. When a number is stored as a string, CAST is needed to sort (ORDER BY) it.
Functional indexes:
Your proposed solution is not a "prefix"; it is something more complicated. I suspect any expression, even on non-string columns, will work. This is when it became available:
2018-10-22, 8.0.13 General Availability:
MySQL now supports creation of functional index key parts that index
expression values rather than column values. Functional key parts
enable indexing of values that cannot be indexed otherwise, such as
JSON values. For details, see CREATE INDEX Syntax.
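So on 8.0.13 or later a functional index along the lines of your attempt should be accepted; a sketch (note the optimizer will only consider it when the query repeats the same expression):
CREATE INDEX idx_col2_last3 ON t1 ((RIGHT(col2, 3)));

SELECT * FROM t1 WHERE RIGHT(col2, 3) = '174';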

MySQL index usage on join

I know there are several questions similar to this one, but those I've found do not relate directly to my problem.
Some initial context: I have a facts table, called ft_booking, with around 10MM records. I have a dimension called dm_date, with around 11k records, which are dates. These tables are related through foreign keys, as usual. There are 3 date foreign keys in the table ft_booking: one for boarding, one for booking, and another for cancellation. All three columns have the very same definition, and the number of distinct values in each is similar (ranging from 2.5k to 3k distinct values per column).
There I go:
EXPLAIN SELECT
*
FROM dw.ft_booking b
LEFT JOIN dw.dm_date db ON db.sk_date = b.fk_date_booking
WHERE date (db.date) = '2018-05-05'
As you can see, the index is being used on the booking table, and the query runs really fast, even though, in my filter, I'm using the date() function. For brevity, I'll just state that the same happens when using the column fk_date_boarding. But check this out:
EXPLAIN SELECT
*
FROM dw.ft_booking b
LEFT JOIN dw.dm_date db ON db.sk_date = b.fk_date_cancellation
WHERE date (db.date) = '2018-05-05';
For some mysterious reason, the planner chooses not to use the index. Now, I understand that using some function over a column kind of forces the database to perform a full table scan, in order to be able to apply that function over the column, thus bypassing the index. But, in this case, the function is not over the actual foreign key column, which is where the lookup in the booking table should be occurring.
If I remove the date() function, the index will be used on any of those columns, as expected. One might say, then, "well, why don't you just get rid of the date() function?" I use Metabase, an interface which allows users to build queries graphically without knowing MySQL, and one of the current limitations of that tool is that it always uses the date() function when building queries not written directly in MySQL; hence, I have no way to remove the function from the queries I'm running.
Actual question: why does MySQL use the index in the first two cases, but doesn't in the latter, considering the number of distinct values is pretty much the same for all columns and they have the exact same definition, apart from the name? Am I missing something here?
EDIT: Here is the CREATE statement for each table involved. There are some more, but here we only need the tables ft_booking and dm_date (the first two tables of the file).
You are "hiding date in a function call". If db.date is declared a DATE, then
date (db.date) = '2018-05-05'
can be simply
db.date = '2018-05-05'
If db.date is declared a DATETIME, then change to
db.date >= '2018-05-05'
AND db.date < '2018-05-05' + INTERVAL 1 DAY
In either case, be sure there is an index on db.date.
If by "I have a dimension called dm_date", you mean you built a dimension table to hold just dates, and then you are JOINing to the main table with some id, ... To put it bluntly, don't do that! Do not normalize "continuous" things such as DATE, DATETIME, FLOAT, or other numeric values.
If you need to discuss this further, please provide SHOW CREATE TABLE for the relevant table(s). (And please use text, not screen shots.)
Why??
The simple answer is that the Optimizer does not know how to unravel any function. Perhaps it could; perhaps it should. But it does not. Perhaps the answer involves not wanting to see how the function result will be used... comparing against a DATE? against a DATETIME? being used as a string? other?
Still, I suggest the real performance killer is the existence of dm_date rather than indexing and using the date in the main table.
Furthermore, the main table is bigger than it needs to be! fk_date_booking is a 4-byte INT SIGNED instead of a 3-byte DATE.
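Put concretely, the suggested direction looks something like this sketch (the column and index names are invented; you would backfill the new column from dm_date before dropping the foreign keys):
ALTER TABLE ft_booking
    ADD COLUMN booking_date DATE,
    ADD INDEX idx_booking_date (booking_date);

-- the filter is then sargable, with no join to dm_date at all
SELECT * FROM ft_booking WHERE booking_date = '2018-05-05';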

Django startswith vs endswith performance on MySQL

Let's say I have the following model
class Person(models.Model):
    name = models.CharField(max_length=20, primary_key=True)
So I would have objects in the database like
Person.objects.create(name='alex white')
Person.objects.create(name='alex chen')
Person.objects.create(name='tony white')
I could then subsequently query for all users whose first name is alex or last name is white by doing the following
all_alex = Person.objects.filter(name__startswith='alex')
all_white = Person.objects.filter(name__endswith='white')
I do not know how Django implements this under the hood, but I am going to guess it is with a SQL LIKE 'alex%' or LIKE '%white'
However, according to the MySQL index documentation, the primary key index can only be used (as opposed to a full table scan) if % appears at the end of the LIKE pattern.
Does that mean that, as the database grows, startswith will be viable - whereas endswith will not be since it will resort to full table scans?
Am I correct or did I go wrong somewhere? Keep in mind these are not facts but just my deductions that I made from general assumptions - hence why I am asking for confirmation.
Assuming you want AND, that is, only Alex White and not Alex Chen or Tony White...
Even better (assuming there is an index starting with name) is
SELECT ...
WHERE name LIKE 'Alex%White'
If Django can't generate that, then it is getting in the way of efficient use of MySQL.
This construct will scan all the names starting with alex, further filtering on the rest of the expression.
If you do want OR (and 3 names), then you are stuck with
SELECT ...
WHERE ( name LIKE 'Alex%'
OR name LIKE '%White' )
And there is no choice but to scan all the names.
In some situations, perhaps this one, FULLTEXT would be better:
FULLTEXT(name) -- This index is needed for the following:
SELECT ...
WHERE MATCH(name) AGAINST('Alex White' IN BOOLEAN MODE) -- for OR
SELECT ...
WHERE MATCH(name) AGAINST('+Alex +White' IN BOOLEAN MODE) -- for AND
(Again, I don't know the Django capabilities.)
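For reference, adding that FULLTEXT index might look like this (the table name is a guess; Django normally prefixes it with the app label):
ALTER TABLE person ADD FULLTEXT INDEX ft_name (name);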
Yes, your understanding is correct.
select *
from foo
where bar like 'text1%' and bar like '%text2'
is not necessarily optimal. This could be an improvement:
select *
from (select *
from foo
where foo.bar like 'text1%') t
where t.bar like '%text2'
You need to make measurements to check whether this is better. If it is, the cause is that in the inner query you use an index, while in the outer query you do not use an index, but the set is prefiltered by the inner query, so you have a much smaller set to scan.
I am not at all a Django expert, so my answer might be wrong, but I believe chaining your filter would be helpful if filter actually executes the query. If that is the case, then you can use the optimization described above. If filter just prepares a query and chaining filters will result in a single query different from the one above, then I recommend using hand-written MySQL. However, if you do not have performance issues yet, then it is premature to optimize it, since you cannot really test the amount of performance you gained.

How does a hash table work? Is it faster than "SELECT * from .."

Let's say, I have :
Key | Indexes | Key-values
----+---------+------------
001 | 100001 | Alex
002 | 100002 | Micheal
003 | 100003 | Daniel
Let's say we want to search for 001. How do we do the fast searching process using a hash table?
Isn't it the same as using "SELECT * from .." in MySQL? I've read a lot; they say "SELECT *" searches from beginning to end, but a hash table does not? Why and how?
By using a hash table, are we reducing the number of records we are searching? How?
Can anyone demonstrate how to insert into and retrieve from a hash table in MySQL query code? e.g.,
SELECT * from table1 where hash_value="bla" ...
Another scenario:
If the indexes are like S0001, S0002, T0001, T0002, etc., in MySQL I could use:
SELECT * from table WHERE value = S*
isn't it the same and faster?
A simple hash table works by keeping the items on several lists, instead of just one. It uses a very fast and repeatable (i.e. non-random) method to choose which list to keep each item on. So when it is time to find the item again, it repeats that method to discover which list to look in, and then does a normal (slow) linear search in that list.
By dividing the items up into 17 lists, the search becomes 17 times faster, which is a good improvement.
Although of course this is only true if the lists are roughly the same length, so it is important to choose a good method of distributing the items between the lists.
In your example table, the first column is the key, the thing we use to find the item. And let's suppose we will maintain 17 lists. To insert something, we perform an operation on the key called hashing. This just turns the key into a number. It doesn't return a random number, because it must always return the same number for the same key. But at the same time, the numbers must be "spread out" widely.
Then we take the resulting number and use modulus to shrink it down to the size of our list:
Hash(key) % 17
This all happens extremely fast. Our lists are in an array, so:
_lists[Hash(key) % 17].Add(record);
And then later, to find the item using that key:
Record found = _lists[Hash(key) % 17].Find(key);
Note that each list can just be any container type, or a linked list class that you write by hand. When we execute a Find in that list, it works the slow way (examine the key of each record).
Do not worry about what MySQL is doing internally to locate records quickly. The job of a database is to do that sort of thing for you. Just run a SELECT [columns] FROM table WHERE [condition]; query and let the database generate a query plan for you. Note that you don't want to use SELECT *, since if you ever add a column to the table that will break all your old queries that relied on there being a certain number of columns in a certain order.
If you really want to know what's going on under the hood (it's good to know, but do not implement it yourself: that is the purpose of a database!), you need to know what indexes are and how they work. If a table has no index on the columns involved in the WHERE clause, then, as you say, the database will have to search through every row in the table to find the ones matching your condition. But if there is an index, the database will search the index to find the exact location of the rows you want, and jump directly to them. Indexes are usually implemented as B+-trees, a type of search tree that uses very few comparisons to locate a specific element. Searching a B-tree for a specific key is very fast. MySQL is also capable of using hash indexes, but these tend to be slower for database uses. Hash indexes usually only perform well on long keys (character strings especially), since they reduce the size of the key to a fixed hash size. For data types like integers and real numbers, which have a well-defined ordering and fixed length, the easy searchability of a B-tree usually provides better performance.
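If you want to see an explicit hash index in MySQL, the MEMORY engine supports them; a minimal sketch (table and column names are made up):
CREATE TABLE lookup (
    id   INT NOT NULL,
    name VARCHAR(50),
    KEY idx_id (id) USING HASH
) ENGINE = MEMORY;

-- an equality lookup like this can use the hash index; a range query cannot
SELECT * FROM lookup WHERE id = 1;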
You might like to look at the chapters in the MySQL manual and PostgreSQL manual on indexing.
http://en.wikipedia.org/wiki/Hash_table
Hash tables may be used as in-memory data structures. Hash tables may also be adopted for use with persistent data structures; database indices sometimes use disk-based data structures based on hash tables, although balanced trees are more popular.
I guess you could use a hash function to get the ID you want to select from. Like
SELECT * FROM table WHERE value = hash_fn(whatever_input_you_build_your_hash_value_from)
Then you don't need to know the id of the row you want to select and can do an exact query, since you know that the row will always have the same id because of the input you build the hash value from, and you can always recreate this id through the hash function.
However, this isn't always true, depending on the size of the table and the maximum number of hash values (you often have "X mod hash-table-size" somewhere in your hash). To take care of this you should have a deterministic strategy you use each time you get two values with the same id. You should check Wikipedia for more info on this strategy; it's called collision handling and should be mentioned in the same article as hash tables.
MySQL probably uses hash tables somewhere because of the O(1) lookups norheim.se mentioned above.
Hash tables are great for locating entries at O(1) cost where the key (that is used for hashing) is already known. They are in widespread use both in collection libraries and in database engines. You should be able to find plenty of information about them on the internet. Why don't you start with Wikipedia or just do a Google search?
I don't know the details of mysql. If there is a structure in there called "hash table", that would probably be a kind of table that uses hashing for locating the keys. I'm sure someone else will tell you about that. =)
EDIT: (in response to comment)
Ok. I'll try to make a grossly simplified explanation: A hash table is a table where the entries are located based on a function of the key. For instance, say that you want to store info about a set of persons. If you store it in a plain unsorted array, you would need to iterate over the elements in sequence in order to find the entry you are looking for. On average, this will need N/2 comparisons.
If, instead, you put all entries at indexes based on the first character of the person's first name (A=0, B=1, C=2, etc.), you will immediately be able to find the correct entry as long as you know the first name. This is the basic idea. You probably realize that some special handling (rehashing, or allowing lists of entries) is required in order to support multiple entries having the same first letter. If you have a well-dimensioned hash table, you should be able to get straight to the item you are searching for. This means approximately one comparison, with the disclaimer of the special handling I just mentioned.

How do I optimize MySQL's queries with constants?

NOTE: the original question is moot but scan to the bottom for something relevant.
I have a query I want to optimize that looks something like this:
select cols from tbl where col = "some run time value" limit 1;
I want to know what keys are being used but whatever I pass to explain, it is able to optimize the where clause to nothing ("Impossible WHERE noticed...") because I fed it a constant.
Is there a way to tell mysql to not do constant optimizations in explain?
Am I missing something?
Is there a better way to get the info I need?
Edit: EXPLAIN seems to be giving me the query plan that will result from constant values. As the query is part of a stored procedure (and IIRC query plans in sprocs are generated before they are called) this does me no good, because the values are not constant. What I want is to find out what query plan the optimizer will generate when it doesn't know what the actual value will be.
Am I missing something?
Edit2: Asking around elsewhere, it seems that MySQL always regenerates query plans unless you go out of your way to make it re-use them. Even in stored procedures. From this it would seem that my question is moot.
However, that doesn't make what I really wanted to know moot: how do you optimize a query that contains values that are constant within any specific query but where I, the programmer, don't know in advance what value will be used? For example, say my client-side code is generating a query with a number in its where clause. Sometimes the number will result in an impossible where clause; other times it won't. How can I use explain to examine how well optimized the query is?
The best approach I'm seeing right off the bat would be to run EXPLAIN on it for the full matrix of exist/non-exist cases. Really that isn't a very good solution as it would be both hard and error prone to do by hand.
You are getting "Impossible WHERE noticed" because the value you specified is not in the column, not just because it is a constant. You could either 1) use a value that exists in the column or 2) just say col = col:
explain select cols from tbl where col = col;
For example, say my client-side code is generating a query with a number in its where clause.
Sometimes the number will result in an impossible where clause; other times it won't.
How can I use explain to examine how well optimized the query is?
MySQL builds different query plans for different values of bound parameters.
In this article you can read about when the MySQL optimizer does what:
Action                                        When
Query parse                                   PREPARE
Negation elimination                          PREPARE
Subquery re-writes                            PREPARE
Nested JOIN simplification                    First EXECUTE
OUTER->INNER JOIN conversions                 First EXECUTE
Partition pruning                             Every EXECUTE
COUNT/MIN/MAX elimination                     Every EXECUTE
Constant subexpression removal                Every EXECUTE
Equality propagation                          Every EXECUTE
Constant table detection                      Every EXECUTE
ref access analysis                           Every EXECUTE
range/index_merge analysis and optimization   Every EXECUTE
Join optimization                             Every EXECUTE
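To see the PREPARE/EXECUTE split from the table above in action, a small sketch using the tbl/col names from the question:
PREPARE stmt FROM 'SELECT cols FROM tbl WHERE col = ?';
SET @v = 42;
EXECUTE stmt USING @v;      -- ref/range analysis happens here, with the real value
DEALLOCATE PREPARE stmt;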
There is one more thing missing in this list.
MySQL can rebuild a query plan on every JOIN iteration: the so-called "range checking for each record".
If you have a composite index on a table:
CREATE INDEX ix_table2_col1_col2 ON table2 (col1, col2)
and a query like this:
SELECT *
FROM table1 t1
JOIN table2 t2
ON t2.col1 = t1.value1
AND t2.col2 BETWEEN t1.value2_lowerbound AND t1.value2_upperbound
, MySQL will NOT use an index RANGE access from (t1.value1, t1.value2_lowerbound) to (t1.value1, t1.value2_upperbound). Instead, it will use an index REF access on (t1.value1) and just filter out the wrong values.
But if you rewrite the query like this:
SELECT *
FROM table1 t1
JOIN table2 t2
ON t2.col1 <= t1.value1
AND t2.col1 >= t1.value1
AND t2.col2 BETWEEN t1.value2_lowerbound AND t1.value2_upperbound
, then MySQL will recheck index RANGE access for each record from table1, and decide whether to use RANGE access on the fly.
You can read about it in these articles in my blog:
Selecting timestamps for a time zone - how to use coarse filtering to filter out timestamps without a timezone
Emulating SKIP SCAN - how to emulate SKIP SCAN access method in MySQL
Analytic functions: optimizing LAG, LEAD, FIRST_VALUE, LAST_VALUE - how to emulate Oracle's analytic functions in MySQL
Advanced row sampling - how to select N records from each group in MySQL
All these things employ RANGE CHECKING FOR EACH RECORD
Returning to your question: there is no way to tell which plan MySQL will use for every given constant, since there is no plan before the constant is given.
Unfortunately, there is no way to force MySQL to use one query plan for every value of a bound parameter.
You can control the JOIN order and the indexes being chosen by using STRAIGHT_JOIN and FORCE INDEX clauses, but they will not force a certain access path on an index or forbid the IMPOSSIBLE WHERE.
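For illustration, those hints look roughly like this (reusing the table and index names from the example above):
SELECT STRAIGHT_JOIN *
FROM table1 t1
JOIN table2 t2 FORCE INDEX (ix_table2_col1_col2)
    ON t2.col1 = t1.value1;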
On the other hand, for all JOIN's, MySQL employs only NESTED LOOPS. That means that if you build right JOIN order or choose right indexes, MySQL will probably benefit from all IMPOSSIBLE WHERE's.
How do you optimize a query with values that are constant only to the query but where I, the programmer, don't know in advance what value will be used?
By using indexes on the specific columns (or even on combination of columns if you always query the given columns together). If you have indexes, the query planner will potentially use them.
Regarding "impossible" values: the query planner can conclude that a given value is not in the table from several sources:
if there is an index on the particular column, it can observe that the particular value is larger or smaller than any value in the index (min/max values take constant time to extract from indexes)
if you are passing in the wrong type (if you are asking for a numeric column to be equal to a text value)
PS. In general, creation of the query plan is not expensive, and it is better to re-create than to re-use plans, since the conditions might have changed since the query plan was generated and a better query plan might exist.