Caching in the new trait solver

Caching results of the trait solver is necessary for performance. We have to make sure that it is sound. Caching is handled by the SearchGraph

The global cache

At its core, the cache is fairly straightforward. When evaluating a goal, we check whether it's in the global cache. If so, we reuse that entry. If not, we compute the goal and then store its result in the cache.

To handle incremental compilation the computation of a goal happens inside of [DepGraph::with_anon_task] which creates a new DepNode which depends on all queries used inside of this computation. When accessing the global cache we then read this DepNode, manually adding a dependency edge to all the queries used: source.

Dealing with overflow

Hitting the recursion limit is not fatal in the new trait solver but instead simply causes it to return ambiguity: source. Whether we hit the recursion limit can therefore change the result without resulting in a compilation failure. This means we must consider the remaining available depth when accessing a cache result.

We do this by storing more information in the cache entry. For goals whose evaluation did not reach the recursion limit, we simply store its reached depth: source. These results can freely be used as long as the current available_depth is higher than its reached_depth: source. We then update the reached depth of the current goal to make sure that whether we've used the global cache entry is not observable: source.

For goals which reach the recursion limit we currently only use the cached result if the available depth exactly matches the depth of the entry. The cache entry for each goal therefore contains a separate result for each remaining depth: source.1

Handling cycles

The trait solver has to support cycles. These cycles are either inductive or coinductive, depending on the participating goals. See the chapter on coinduction for more details. We distinguish between the cycle heads and the cycle root: a stack entry is a cycle head if it recursively accessed. The root is the deepest goal on the stack which is involved in any cycle. Given the following dependency tree, A and B are both cycle heads, while only A is a root.

graph TB
    A --> B
    B --> C
    C --> B
    C --> A

The result of cycle participants depends on the result of goals still on the stack. However, we are currently computing that result, so its result is still unknown. This is handled by evaluating cycle heads until we reach a fixpoint. In the first iteration, we return either success or overflow with no constraints, depending on whether the cycle is coinductive: source. After evaluating the head of a cycle, we check whether its provisional_result is equal to the result of this iteration. If so, we've finished evaluating this cycle and return its result. If not, we update the provisional result and reevaluate the goal: source. After the first iteration it does not matter whether cycles are coinductive or inductive. We always use the provisional result.

Only caching cycle roots

We cannot move the result of any cycle participant to the global cache until we've finished evaluating the cycle root. However, even after we've completely evaluated the cycle, we are still forced to discard the result of all participants apart from the root itself.

We track the query dependencies of all global cache entries. This causes the caching of cycle participants to be non-trivial. We cannot simply reuse the DepNode of the cycle root.2 If we have a cycle A -> B -> A, then the DepNode for A contains a dependency from A -> B. Reusing this entry for B may break if the source is changed. The B -> A edge may not exist anymore and A may have been completely removed. This can easily result in an ICE.

However, it's even worse as the result of a cycle can change depending on which goal is the root: example. This forces us to weaken caching even further. We must not use a cache entry of a cycle root, if there exists a stack entry, which was a participant of its cycle involving that root. We do this by storing all cycle participants of a given root in its global cache entry and checking that it contains no element of the stack: source.

The provisional cache

TODO: write this :3

  • stack dependence of provisional results
  • edge case: provisional cache impacts behavior

This is overly restrictive: if all nested goal return the overflow response with some availabledepth n, then their result should be the same for any depths smaller than n. We can implement this optimization in the future.


summarizing the relevant zulip thread