Caching in the new trait solver
Caching results of the trait solver is necessary for performance.
We have to make sure that it is sound. Caching is handled by the
SearchGraph
The global cache
At its core, the cache is fairly straightforward. When evaluating a goal, we check whether it's in the global cache. If so, we reuse that entry. If not, we compute the goal and then store its result in the cache.
To handle incremental compilation the computation of a goal happens inside of
DepGraph::with_anon_task
which creates a new DepNode
which depends on all queries
used inside of this computation. When accessing the global cache we then read this
DepNode
, manually adding a dependency edge to all the queries used: source.
Dealing with overflow
Hitting the recursion limit is not fatal in the new trait solver but instead simply causes it to return ambiguity: source. Whether we hit the recursion limit can therefore change the result without resulting in a compilation failure. This means we must consider the remaining available depth when accessing a cache result.
We do this by storing more information in the cache entry. For goals whose evaluation
did not reach the recursion limit, we simply store its reached depth: source.
These results can freely be used as long as the current available_depth
is higher than
its reached_depth
: source. We then update the reached depth of the
current goal to make sure that whether we've used the global cache entry is not
observable: source.
For goals which reach the recursion limit we currently only use the cached result if the available depth exactly matches the depth of the entry. The cache entry for each goal therefore contains a separate result for each remaining depth: source.1
Handling cycles
The trait solver has to support cycles. These cycles are either inductive or coinductive,
depending on the participating goals. See the chapter on coinduction for more details.
We distinguish between the cycle heads and the cycle root: a stack entry is a
cycle head if it recursively accessed. The root is the deepest goal on the stack which
is involved in any cycle. Given the following dependency tree, A
and B
are both cycle
heads, while only A
is a root.
graph TB A --> B B --> C C --> B C --> A
The result of cycle participants depends on the result of goals still on the stack.
However, we are currently computing that result, so its result is still unknown. This is
handled by evaluating cycle heads until we reach a fixpoint. In the first iteration, we
return either success or overflow with no constraints, depending on whether the cycle is
coinductive: source. After evaluating the head of a cycle, we
check whether its provisional_result
is equal to the result of this iteration. If so,
we've finished evaluating this cycle and return its result. If not, we update the provisional
result and reevaluate the goal: source. After the first iteration it does not
matter whether cycles are coinductive or inductive. We always use the provisional result.
Only caching cycle roots
We cannot move the result of any cycle participant to the global cache until we've finished evaluating the cycle root. However, even after we've completely evaluated the cycle, we are still forced to discard the result of all participants apart from the root itself.
We track the query dependencies of all global cache entries. This causes the caching of
cycle participants to be non-trivial. We cannot simply reuse the DepNode
of the cycle
root.2 If we have a cycle A -> B -> A
, then the DepNode
for A
contains a dependency
from A -> B
. Reusing this entry for B
may break if the source is changed. The B -> A
edge may not exist anymore and A
may have been completely removed. This can easily result
in an ICE.
However, it's even worse as the result of a cycle can change depending on which goal is the root: example. This forces us to weaken caching even further. We must not use a cache entry of a cycle root, if there exists a stack entry, which was a participant of its cycle involving that root. We do this by storing all cycle participants of a given root in its global cache entry and checking that it contains no element of the stack: source.
The provisional cache
TODO: write this :3
- stack dependence of provisional results
- edge case: provisional cache impacts behavior
This is overly restrictive: if all nested goals return the overflow response with some
available depth n
, then their result should be the same for any depths smaller than n
.
We can implement this optimization in the future.
summarizing the relevant zulip thread