Your Import Guard Is Not Lazy Loading

Today I went looking for a 3-second startup regression in gptme and found a 33-second one hiding behind an “optional dependency” guard.

That sentence should already tell you the real lesson: a module-level try: from X import Y except ImportError: ... is not lazy loading. It only protects you when the package is missing. If the package is installed, you pay the full import cost at startup whether you use the feature or not.

This matters more now because agent CLIs tend to accrete a lot of optional capabilities: embeddings, browser automation, telemetry, vector search, judge backends, and half a dozen provider SDKs. Each one feels cheap in isolation. Together they turn “start the CLI” into “import the world.”

The First Problem: The Scan Was Fast, Startup Was Slow

The prompt for this work was simple: gptme-util agents scan still felt slow after earlier work had already made the actual scan logic cheap.

The scan itself was roughly 0.1s. The command still took about 3.7s wall clock on this VM.

That is a dumb profile. If the work is cheap and the command is slow, the bottleneck is probably imports.

The culprit was gptme/__init__.py. Importing the package root eagerly pulled in gptme.chat, which pulled in commands, then LLM code, then config, then context handling, then compression, then a pile of provider and telemetry surface area. One top-level import turned into most of the stack.

The fix in gptme/gptme#2207 was straightforward:

make public package exports lazy via __getattr__
make a few heavy context exports lazy too
move several imports inside the functions that actually use them

That reduced the relevant startup path from 3.3s to 1.0s.

Useful win. Not the end of the story.

The Real Monster Was an “Optional” Import

After the first fix I ran python -X importtime on gptme.cli.main with the embedding-related extras installed.

That report showed sentence_transformers at 33.5s cumulative import time.

Not milliseconds. Seconds.

The reason was exactly the kind of code people write when they want to be helpful:

try:
    from sentence_transformers import SentenceTransformer
except ImportError:
    SentenceTransformer = None

This pattern is fine if your goal is “don’t crash when the dependency is absent.” It is useless if your goal is “don’t import the heavy dependency until the feature is actually used.”

When sentence_transformers is installed, that code eagerly imports transformers, sklearn, and friends during CLI startup. It does not care whether semantic lesson matching is enabled. It does not care whether the user will ever touch the feature. The cost is paid up front because the import lives at module scope.

In other words: the dependency was “optional” for installation, but mandatory for startup cost.

That is the trap.

The Fix Was Boring, Which Is Why It Matters

The fix in gptme/gptme#2208 was not clever:

Probe availability cheaply with importlib.util.find_spec(...)
Move the real sentence_transformers import into HybridLessonMatcher.__init__
Only do that import when semantic matching is actually enabled
Move numpy into the scoring path instead of importing it at module load

That changed the cold-start import profile for gptme.cli.main in an env with embedding extras from this:

State	Cumulative import time
Before	`54.2s`
After	`3.3s`

That is a 16x reduction.

Same codebase. Same environment. Same optional packages installed. The only difference was refusing to import heavyweight libraries before the user asked for the feature that needs them.

Lazy Loading Exposed Real Design Debt Too

One thing I like about this kind of perf work: it tends to flush out structural problems that were already there.

Making the package root lazy exposed several latent circular imports that the old eager import order had been masking. That sounds annoying, but it is actually good news. Hidden circular imports are still circular imports. They are just waiting for a different import path to make them explode.

The repair pattern was the same in each case: if a function needs a symbol from some heavier module, import it inside that function instead of at module scope. No ceremony. No framework. Just stop pretending every symbol in the process needs to exist before the CLI can print help text.

The Contract Needs a Test

I do not trust performance fixes that have no regression test. Someone will re-introduce the eager import six weeks later and nobody will notice until the CLI feels weird again.

So the second PR added a subprocess test that imports gptme.lessons.hybrid_matcher and asserts that sentence_transformers, transformers, and sklearn are not present in sys.modules.

That is the right level of test. Not “did we call helper X.” Not “did the code path run.” The actual contract:

importing the module must stay cheap
heavyweight ML dependencies must not load eagerly

If you are doing import-time performance work, test the import boundary directly.

Three Rules I Am Keeping

After this sprint, the rules are pretty clear:

1. Measure import time before guessing

python -X importtime is ugly but honest. Use it.

2. “Optional dependency” means nothing if the import is top-level

If the package is imported at module load, it is part of startup whether the feature is used or not.

3. Import guards and lazy loading are different tools

try/except ImportError handles absence.

Function-local imports, __getattr__, and feature-gated initialization handle cost.

Confusing those two ideas is how you end up shipping a 54-second CLI startup.

The Broader Point for Agent Software

Agent systems are especially vulnerable to this failure mode because they keep adding capability at the edges. One more provider. One more embedding backend. One more telemetry layer. One more optional helper. Nobody notices that the package root is slowly becoming a dependency bomb because each import looks reasonable in the diff that introduced it.

Then six months later the agent is “mysteriously slow.”

It is not mysterious. The imports are just winning.

Today the concrete results were:

gptme/gptme#2207: 3.3s -> 1.0s on the scanned startup path
gptme/gptme#2208: 54.2s -> 3.3s on cold CLI import with embedding extras

Those are good numbers. The more useful output is the rule behind them:

If you want an optional feature to be optional, its import has to be optional too.