Compare commits

...

23 Commits

Author SHA1 Message Date
Joachim Breitner
7234d6477f perf: eliminate checkSystemInterval overhead when disabled
Skip the FFI call to lean_check_system_interval entirely when
LEAN_CHECK_SYSTEM_INTERVAL_MS is unset by checking a builtin_initialize'd
Bool flag inline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 18:14:34 +00:00
Joachim Breitner
f3348db41a Merge master into joachim/checkSystemTime 2026-04-06 14:19:13 +00:00
Joachim Breitner
ca28d0719b chore: remove non-instrumentation changes from branch
Keep only the checkSystemInterval FFI and its call in checkInterrupted.
The withIncRecDepth+checkInterrupted change belongs in a separate PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 10:14:43 +00:00
Joachim Breitner
9d90cb957a Merge branch 'master' of https://github.com/leanprover/lean4 into joachim/checkSystemTime 2026-04-06 10:10:56 +00:00
Joachim Breitner
e8836ac7a2 chore: remove perf_event_open, amortize LCNF simp checkSystem to % 128
Remove instruction-count mode (perf_event_open) from check_system
interval monitoring — keep only the simpler CPU-time mode
(LEAN_CHECK_SYSTEM_INTERVAL_MS). The perf_event_open code can be
re-added later if needed.

Also amortize LCNF simp checkSystem to every 128 visits (unconditional
was +0.44% on big_do.lean per CI benchmarks).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 09:00:16 +00:00
Joachim Breitner
0a0aca5db4 feat: add instruction-count mode to check_system interval monitoring
Adds `LEAN_CHECK_SYSTEM_INTERVAL_INSN=N` env var (N in millions of
instructions) using perf_event_open on Linux. This is deterministic and
load-independent, unlike the existing CPU-time mode. Instruction count
mode takes priority over `LEAN_CHECK_SYSTEM_INTERVAL_MS` when both are
set.

Also includes LCNF checkSystem calls (simp + per-declaration) for
testing gap coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 11:47:20 +00:00
Joachim Breitner
555e957bbd Merge branch 'master' of https://github.com/leanprover/lean4 into joachim/checkSystemTime 2026-04-04 21:28:48 +00:00
Joachim Breitner
1581f29df7 Merge remote-tracking branch 'origin/master' into joachim/checkSystemTime 2026-04-04 07:12:07 +00:00
Joachim Breitner
bf3743ef12 Merge branch 'master' of https://github.com/leanprover/lean4 into joachim/checkSystemTime 2026-04-03 21:15:00 +00:00
Joachim Breitner
6ba4b30109 fix: improve check_system interval monitoring
Move the timer reset in check_system_interval to after the backtrace,
so that backtrace overhead doesn't inflate the next gap measurement.

Also fix merge: restore withIncRecDepth calling checkInterrupted
(from #13258) which was lost during the merge with master.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-03 20:33:36 +00:00
Joachim Breitner
d1af3f6b07 Merge remote-tracking branch 'origin/joachim/inferType-checkSystem' into joachim/checkSystemTime 2026-04-03 06:39:10 +00:00
Joachim Breitner
1de717dc1a Merge branch 'master' of https://github.com/leanprover/lean4 into joachim/checkSystemTime 2026-04-03 06:38:42 +00:00
Joachim Breitner
623b027952 perf: add checkInterrupted to inferType cache miss path
This adds a `Core.checkInterrupted` call in `checkInferTypeCache` on
cache miss, allowing cancellation to be detected during large type
inference traversals. Previously, `inferTypeImp` could run for >100ms
without any interruption check when processing large expressions (e.g.
BVDecide proof terms), making IDE cancellation unresponsive.

The check is only on cache miss (actual computation), so cache hits
have zero overhead. `checkInterrupted` is used instead of the heavier
`checkSystem` to minimize performance impact.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 12:51:21 +00:00
Joachim Breitner
92108472e5 revert: remove ineffective check_heartbeat amortization
check_heartbeat is only ever called from check_system, so amortizing
check_system calls from within check_heartbeat has no effect on the
kernel type checker gaps. Remove the amortization to avoid unnecessary
CI regression.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 11:42:08 +00:00
Joachim Breitner
480b2bffc5 perf: increase check_heartbeat amortization interval to 1024
Reduce overhead of the amortized check_system in check_heartbeat
by checking every 1024 heartbeats instead of 256.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 10:55:35 +00:00
Joachim Breitner
908b647be9 perf: amortize check_system via check_heartbeat
Add an amortized check_system call every 256 heartbeats inside
check_heartbeat. This ensures periodic stack, memory, and monitoring
interval checks for any code path that uses heartbeats, including
the kernel type checker.

Also fix backtrace cascade in check_system_interval monitoring by
resetting the timer after printing the warning, and remove the
redundant per-call checkSystem in LCNF simp (keeping only the
amortized one).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 09:59:28 +00:00
Joachim Breitner
9beb4dad84 Merge remote-tracking branch 'origin/master' into joachim/checkSystemTime 2026-04-02 07:57:47 +00:00
Joachim Breitner
29bd37591c fix: add amortized checkSystem to LCNF simp
This PR adds a periodic `checkSystem` call to the LCNF simplifier's
recursive traversal, checking every 512 visited nodes. This breaks up
multi-second gaps (observed up to 2.3s on `big_do.lean`) without
measurable performance regression — the amortized cost is effectively
zero on benchmarks like `big_do` and `simp_local`.

An unconditional `checkSystem` on every node caused a 0.4% instruction
count regression, which is why the amortized approach is used instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-01 14:14:18 +00:00
Joachim Breitner
e2937ad233 fix: add checkSystem calls to long-running elaboration paths
This PR adds `checkSystem` calls to several code paths that can run for
extended periods without checking for cancellation, heartbeat limits, or
stack overflow. This improves responsiveness of the cancellation mechanism
in the language server.

Affected paths:
- `simp` rewrite candidate loops (`Rewrite.lean`)
- `simpAppUsingCongr` argument traversal (`Types.lean`)
- `synthesizeSyntheticMVarsStep` mvar loop (`SyntheticMVars.lean`)
- `abstractNestedProofs` visitor (`AbstractNestedProofs.lean`)
- `transform`/`transformWithCache` visitors (`Transform.lean`)
- LCNF compiler pass runner loop (`Main.lean`)
- LCNF checker recursive traversal (`Check.lean`)
- LCNF simplifier recursive traversal (`Simp/Main.lean`)
- `whnfCore` recursive reduction loop (`WHNF.lean`)

With these changes, at a 50ms monitoring threshold across the full test suite,
the only remaining Lean-side elaborator gaps are:
- bv_decide pure computation wrapped in `IO.lazyPure` (architectural)
- Single deep `whnf` reductions from the `constructorNameAsVariable` linter (~100-180ms)

Remaining C++ gaps (type checker, interpreter module init) are tracked separately.

Found using `LEAN_CHECK_SYSTEM_INTERVAL_MS` monitoring from #13218.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-01 12:56:03 +00:00
Joachim Breitner
66924cc4ac fix: add checkSystem calls to long-running elaboration paths
This PR adds `checkSystem` calls to several code paths that can run for
extended periods without checking for cancellation, heartbeat limits, or
stack overflow. This improves responsiveness of the cancellation mechanism
in the language server.

Affected paths:
- `simp` rewrite candidate loops (`Rewrite.lean`)
- `simpAppUsingCongr` argument traversal (`Types.lean`)
- `synthesizeSyntheticMVarsStep` mvar loop (`SyntheticMVars.lean`)
- `abstractNestedProofs` visitor (`AbstractNestedProofs.lean`)
- `transform`/`transformWithCache` visitors (`Transform.lean`)
- LCNF compiler pass runner loop (`Main.lean`)
- LCNF checker recursive traversal (`Check.lean`)
- LCNF simplifier recursive traversal (`Simp/Main.lean`)
- `whnfCore` recursive reduction loop (`WHNF.lean`)

With these changes, at a 50ms monitoring threshold across the full test suite,
the only remaining Lean-side elaborator gaps are:
- bv_decide pure computation wrapped in `IO.lazyPure` (architectural)
- Single deep `whnf` reductions from the `constructorNameAsVariable` linter (~100-180ms)

Remaining C++ gaps (type checker, interpreter module init) are tracked separately.

Found using `LEAN_CHECK_SYSTEM_INTERVAL_MS` monitoring from #13218.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-01 09:12:33 +00:00
Joachim Breitner
7488a1f712 feat: share check_system interval timer between C++ and Lean
Factor the interval monitoring into a standalone `check_system_interval`
function exposed to Lean via `@[extern "lean_check_system_interval"]`.
Call it from both the C++ `check_system` and the Lean-level
`Core.checkInterrupted`, so that CPU time spent in Lean elaboration
(e.g. simp) is properly tracked and attributed.

Previously the timer only fired from C++ kernel code, so gaps caused by
Lean-level elaboration were misleadingly attributed to the kernel
component that happened to call `check_system` next.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 16:28:36 +00:00
Joachim Breitner
ffd0f7de85 feat: add backtrace to check_system interval warnings
This adds a stack trace dump when the check_system interval warning
fires, making it easier to identify the code path responsible for
the gap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 16:17:27 +00:00
Joachim Breitner
4f7de4a1d9 feat: add optional check_system interval monitoring
This adds optional monitoring of how frequently `check_system` is called
per thread. When `LEAN_CHECK_SYSTEM_INTERVAL_MS` is set, a warning is
printed to stderr if the interval between consecutive calls exceeds the
threshold. This helps identify code paths where cancellation checks are
too infrequent for responsive cancellation.

Disabled by default; set e.g. `LEAN_CHECK_SYSTEM_INTERVAL_MS=1000` to
enable with a 1-second threshold.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 13:51:13 +00:00
2 changed files with 85 additions and 0 deletions

View File

@@ -446,6 +446,24 @@ Note that the value of `ctx.initHeartbeats` is ignored and replaced with `IO.get
@[inline] def CoreM.toIO' (x : CoreM α) (ctx : Context) (s : State) : IO α :=
(·.1) <$> x.toIO ctx s
/--
Record a check-in for the `LEAN_CHECK_SYSTEM_INTERVAL_MS` monitoring.
When that env var is set, warns on stderr if too much CPU time has elapsed since
the last check-in from either the C++ `check_system` or this function.
-/
@[extern "lean_check_system_interval"]
private opaque checkSystemIntervalImpl (componentName : @& String) : BaseIO Unit
@[extern "lean_check_system_interval_is_enabled"]
private opaque checkSystemIntervalIsEnabled : Unit Bool
private builtin_initialize checkSystemIntervalEnabled : Bool
pure (checkSystemIntervalIsEnabled ())
@[inline] def checkSystemInterval (componentName : @& String) : BaseIO Unit := do
if checkSystemIntervalEnabled then
checkSystemIntervalImpl componentName
/--
Throws an internal interrupt exception if cancellation has been requested. The exception is not
caught by `try catch` but is intended to be caught by `Command.withLoggingExceptions` at the top
@@ -456,6 +474,7 @@ Like `checkSystem` but without the global heartbeat check, for callers that have
heartbeat tracking (e.g. `SynthInstance`).
-/
@[inline] def checkInterrupted : CoreM Unit := do
checkSystemInterval "Lean elaborator"
if let some tk := ( read).cancelTk? then
if ( tk.isSet) then
throwInterruptException

View File

@@ -5,6 +5,9 @@ Released under Apache 2.0 license as described in the file LICENSE.
Author: Leonardo de Moura
*/
#include <limits>
#include <cstdlib>
#include <ctime>
#include <execinfo.h>
#include "runtime/thread.h"
#include "runtime/interrupt.h"
#include "runtime/exception.h"
@@ -16,6 +19,68 @@ namespace lean {
LEAN_THREAD_VALUE(size_t, g_max_heartbeat, 0);
LEAN_THREAD_VALUE(size_t, g_heartbeat, 0);
// --- check_system interval monitoring ---
//
// When LEAN_CHECK_SYSTEM_INTERVAL_MS=N is set, warn on stderr if check_system
// is not called within N milliseconds of CPU time on the current thread.
// Uses CLOCK_THREAD_CPUTIME_ID so that IO waits do not count towards the interval.
// Zero overhead when env var is unset.
// 0 = disabled
static unsigned g_check_system_interval_ms = 0;
static bool g_check_system_interval_initialized = false;
static unsigned get_check_system_interval_ms() {
if (!g_check_system_interval_initialized) {
g_check_system_interval_initialized = true;
if (const char * env = std::getenv("LEAN_CHECK_SYSTEM_INTERVAL_MS")) {
g_check_system_interval_ms = std::atoi(env);
}
}
return g_check_system_interval_ms;
}
static uint64_t thread_cpu_time_ns() {
struct timespec ts;
clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
return static_cast<uint64_t>(ts.tv_sec) * 1000000000ULL + static_cast<uint64_t>(ts.tv_nsec);
}
// Thread-local: last CPU time when check_system was called on this thread.
LEAN_THREAD_VALUE(uint64_t, g_last_check_system_ns, 0);
static void check_system_interval(char const * component_name) {
unsigned interval_ms = get_check_system_interval_ms();
if (interval_ms > 0) {
uint64_t now_ns = thread_cpu_time_ns();
uint64_t last_ns = g_last_check_system_ns;
g_last_check_system_ns = now_ns;
if (last_ns != 0) {
uint64_t elapsed_ms = (now_ns - last_ns) / 1000000;
if (elapsed_ms > interval_ms) {
fprintf(stderr,
"[check_system] WARNING: %llu ms CPU time since last check_system call "
"(component: %s)\n",
(unsigned long long)elapsed_ms, component_name);
void * bt_buf[64];
int nptrs = backtrace(bt_buf, 64);
backtrace_symbols_fd(bt_buf, nptrs, 2); // fd 2 = stderr
// Reset timer after printing to avoid backtrace overhead cascading
g_last_check_system_ns = thread_cpu_time_ns();
}
}
}
}
extern "C" LEAN_EXPORT obj_res lean_check_system_interval(b_lean_obj_arg component_name) {
check_system_interval(lean_string_cstr(component_name));
return lean_io_result_mk_ok(lean_box(0));
}
extern "C" LEAN_EXPORT uint8_t lean_check_system_interval_is_enabled(lean_obj_arg /* unit */) {
return get_check_system_interval_ms() > 0;
}
extern "C" LEAN_EXPORT obj_res lean_internal_get_default_max_heartbeat() {
#ifdef LEAN_DEFAULT_MAX_HEARTBEAT
return lean_box(LEAN_DEFAULT_MAX_HEARTBEAT);
@@ -71,6 +136,7 @@ void check_interrupted() {
}
void check_system(char const * component_name, bool do_check_interrupted) {
check_system_interval(component_name);
check_stack(component_name);
check_memory(component_name);
if (do_check_interrupted) {