
Stream mode: Phase 3 #125

Open
merv1n34k wants to merge 2 commits into let-def:main from merv1n34k:feat/request-file


@merv1n34k
Contributor
@merv1n34k merv1n34k commented Mar 30, 2026

Hey!

Closes #120 phase 3. The implementation I came up with is quite interesting, although I am not sure whether it introduces any noticeable performance issues.

Report

feat: add non-blocking request-file via Q_OPRL last-resort query

Idea: Phases 1 and 2 let the editor push files, but there's no way for the engine to pull -- if it needs a file that kpathsea can't resolve and the editor hasn't pushed yet, it just fails. New protocol query Q_OPRL ("Open Read Last-resort") is sent by the engine only after all local fallbacks have failed:

  1. Q_OPRD -> driver VFS/disk -> A_OPEN or A_PASS
  2. fopen() -> local disk
  3. kpathsea/tectonic -> system packages
  4. Q_OPRL -> A_PASS immediately + request-file notification <- new
  5. NULL if still unresolved
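The fallback order above can be modelled as a chain of resolvers tried in sequence. This is a toy sketch, not the engine's code: the resolver stubs, the `lookup` enum and the `request-file` string are illustrative assumptions, and each stub just matches a fixed name. What it shows is that the last-resort step only records a notification and still reports a miss, so the engine falls through to NULL.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { HIT, MISS } lookup;

/* Hypothetical stand-ins for steps 1-3; each only recognises one name. */
static lookup driver_vfs(const char *p)      { return strcmp(p, "root.tex") == 0 ? HIT : MISS; }
static lookup local_disk(const char *p)      { (void)p; return MISS; }
static lookup kpathsea_lookup(const char *p) { return strcmp(p, "article.cls") == 0 ? HIT : MISS; }

/* Records the notification sent by step 4, for inspection. */
static char last_request[256];

/* Step 4 (Q_OPRL): notify the editor, then answer "pass" right away. */
static lookup last_resort(const char *p) {
    snprintf(last_request, sizeof last_request, "request-file \"%s\"", p);
    return MISS;
}

static lookup (*const chain[])(const char *) = {
    driver_vfs, local_disk, kpathsea_lookup, last_resort,
};

/* Returns the index of the step that resolved `p`, or -1 (step 5: NULL). */
static int open_input(const char *p) {
    last_request[0] = '\0';
    for (size_t i = 0; i < sizeof chain / sizeof chain[0]; i++)
        if (chain[i](p) == HIT)
            return (int)i;
    return -1;
}
```

Files that kpathsea resolves (step index 2 here) never reach the last-resort step, so they produce no notification.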
Details
  • texpresso_protocol.c/h: New T_OPRL tag and txp_open_last_resort() function (identical to txp_open but sends T_OPRL, always READ mode).
  • main.c (engine): Calls txp_open_last_resort() at the end of ttstub_input_open after all fallbacks fail.
  • sprotocol.h/c: New Q_OPRL query enum value, parsing and logging.
  • engine_tex.c: Q_OPRL handler with two branches -- if file is already in VFS (edit_data), respond A_OPEN; otherwise respond A_PASS and emit request-file via stdout.
  • editor.c/h: editor_request_file() outputs request-file "path" message.
  • EDITOR-PROTOCOL.md: Documents the new message.
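A simplified sketch of the two-branch Q_OPRL handler described for engine_tex.c. The toy VFS table, `vfs_lookup`, `format_request_file` and the `answer` enum are hypothetical stand-ins for edit_data, editor_request_file() and the real protocol answers; the notification is written into a buffer instead of stdout so it can be inspected.

```c
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for the protocol answers. */
enum answer { A_PASS, A_OPEN };

struct vfs_entry { const char *path; const char *data; };

/* Toy VFS playing the role of edit_data. */
static const struct vfs_entry vfs[] = {
    { "root.tex", "\\documentclass{article}" },
};

static const char *vfs_lookup(const char *path) {
    for (size_t i = 0; i < sizeof vfs / sizeof vfs[0]; i++)
        if (strcmp(vfs[i].path, path) == 0)
            return vfs[i].data;
    return NULL;
}

/* Formats the request-file notification (illustrative syntax). */
static void format_request_file(char *out, size_t n, const char *path) {
    snprintf(out, n, "request-file \"%s\"\n", path);
}

/* Branch 1: file already in the VFS -> A_OPEN.
   Branch 2: notify the editor and answer A_PASS without waiting. */
static enum answer handle_q_oprl(const char *path, char *out, size_t n) {
    out[0] = '\0';
    if (vfs_lookup(path))
        return A_OPEN;
    format_request_file(out, n, path);
    return A_PASS;
}
```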

Design decisions:

Non-blocking Q_OPRL (always returns A_PASS immediately) and restart via existing rollback
  • Non-blocking: Q_OPRL always returns A_PASS immediately. Why not block? Q_OPRL fires for all unresolved files, including .aux on the first run. TeX handles a missing .aux gracefully (it writes a fresh one), but a blocking query would hang forever on files the editor can't provide. Reading the XeTeX source confirmed that only \input and format loading are fatal on NULL; everything else (.aux, \openin, fonts, pictures) handles it fine, and there is no engine-side flag to distinguish these cases.
  • Restart via existing rollback: When the editor provides a file via open, interpret_open stores it in edit_data -> notify_file_changes -> rollback_processes kills the dead engine -> prepare_process forks from snapshot -> new engine sends Q_OPRD -> driver finds file in VFS -> A_OPEN -> compilation succeeds. Multiple request-file messages can fire in one run; the editor batches all open responses and a single restart resolves all of them.
  • Performance: Restart cost is ~0.5-2s (fork from snapshot). Standard packages resolved by kpathsea never trigger Q_OPRL.
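On the editor side, the batching mentioned above could look roughly like this: request-file paths are queued (duplicates dropped) and answered together, so one rollback picks all of them up. `struct pending`, `pending_add`, `pending_flush` and `demo_fetch` are hypothetical names; the `(open "path" "content")` form follows the draft syntax used later in this thread, and content escaping is deliberately left out.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PENDING 32
#define MAX_PATH_LEN 256

/* Paths named by request-file messages during one run, deduplicated. */
struct pending {
    char paths[MAX_PENDING][MAX_PATH_LEN];
    int count;
};

/* Queue a path: 1 if added, 0 if already queued, -1 if it does not fit. */
static int pending_add(struct pending *p, const char *path) {
    for (int i = 0; i < p->count; i++)
        if (strcmp(p->paths[i], path) == 0)
            return 0;
    if (p->count == MAX_PENDING || strlen(path) >= MAX_PATH_LEN)
        return -1;
    strcpy(p->paths[p->count++], path);
    return 1;
}

/* Demo content source; a real editor might read a buffer, query a
   server, or generate text. Returns NULL when it cannot provide a file. */
static const char *demo_fetch(const char *path) {
    return strcmp(path, "chapter1.tex") == 0 ? "Hello" : NULL;
}

/* Write one open command per providable path into `out` (n > 0), then
   clear the queue. Returns the number of commands written. */
static int pending_flush(struct pending *p, const char *(*fetch)(const char *),
                         char *out, size_t n) {
    size_t used = 0;
    int written = 0;
    out[0] = '\0';
    for (int i = 0; i < p->count; i++) {
        const char *content = fetch(p->paths[i]);
        if (!content)
            continue;                 /* editor cannot help with this one */
        int k = snprintf(out + used, n - used, "(open \"%s\" \"%s\")\n",
                         p->paths[i], content);
        if (k < 0 || (size_t)k >= n - used) {
            out[used] = '\0';         /* drop the truncated command */
            break;
        }
        used += (size_t)k;
        written++;
    }
    p->count = 0;
    return written;
}
```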

fix: interpret_open relative path handling

Issue: relative_path() was called unconditionally on paths from open commands. Files provided via request-file responses use relative paths, which got rejected as "different root."

Fix: Guard relative_path() behind path[0] == '/' so relative paths pass through directly.
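A minimal sketch of the guard. The real relative_path() is not shown in this PR, so the version below is a hypothetical stand-in that strips a root prefix (assumed to have no trailing slash) and returns NULL for paths under a different root; `normalize_open_path` is likewise an illustrative name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical relative_path(): strip `root` from an absolute path, or
   return NULL when the path lies under a different root. */
static const char *relative_path(const char *root, const char *path) {
    size_t n = strlen(root);
    if (strncmp(path, root, n) != 0 || path[n] != '/')
        return NULL;
    return path + n + 1;
}

/* Only absolute paths go through relative_path(); relative ones (as sent
   back in response to request-file) are used as-is. */
static const char *normalize_open_path(const char *root, const char *path) {
    if (path[0] == '/')
        return relative_path(root, path);  /* NULL => "different root" */
    return path;
}
```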

fix: -test-initialize exit condition

Issue: With non-blocking Q_OPRL, the engine can die before producing pages (e.g. \input of a missing file). The old initialize_only check exited immediately without giving the editor a chance to provide files.

Fix: Exit on page_count > 0 (success) or DOC_TERMINATED && stdin_eof (engine died and no editor is connected -- no recovery possible).
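The new exit condition reduces to a small predicate. `struct test_state`, `DOC_RUNNING` and `test_initialize_should_exit` are hypothetical; page_count, DOC_TERMINATED and stdin_eof are the conditions named above.

```c
#include <stdbool.h>

/* Hypothetical driver state, reduced to what the check needs. */
enum doc_state { DOC_RUNNING, DOC_TERMINATED };

struct test_state {
    int page_count;
    enum doc_state state;
    bool stdin_eof;
};

/* Exit on success (pages produced), or when the engine has died and no
   editor is left on stdin to supply the missing files. */
static bool test_initialize_should_exit(const struct test_state *s) {
    if (s->page_count > 0)
        return true;
    return s->state == DOC_TERMINATED && s->stdin_eof;
}
```

The middle case (engine terminated, stdin still open) keeps running, which is exactly the window in which an editor can answer request-file.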

chore: add CI

  • test/request-file.tex: Document that \inputs a non-existent file.
  • test/test-request-file.sh: Integration test using FIFO -- waits for request-file on stdout, provides the file via stdin open command, verifies engine exits successfully.
  • Makefile: make test-request-file target.
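The integration test waits for a request-file line on stdout and extracts the path before answering with open. A C sketch of that extraction step, assuming the `request-file "path"` form written in this PR with no escaped quotes (the wire format actually documented in EDITOR-PROTOCOL.md may differ); `parse_request_file` is a hypothetical helper, not part of the test script.

```c
#include <stdbool.h>
#include <string.h>

/* Extract the quoted path from a line such as: request-file "missing.tex"
   Returns false for other messages or if the path does not fit in `out`. */
static bool parse_request_file(const char *line, char *out, size_t n) {
    static const char prefix[] = "request-file \"";
    size_t plen = sizeof prefix - 1;
    if (strncmp(line, prefix, plen) != 0)
        return false;
    const char *start = line + plen;
    const char *end = strchr(start, '"');
    if (!end || (size_t)(end - start) >= n)
        return false;
    memcpy(out, start, (size_t)(end - start));
    out[end - start] = '\0';
    return true;
}
```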

New capabilities

I had more use cases in mind, but I think these are the most powerful ones; correct me if I'm wrong about anything. We can also add a note to the README if you want. With phases 1 (-stream), 2 (open-base64), and now 3 (request-file):

  • Editor can start with just the root.tex file
  • TeXpresso discovers dependencies and requests them on-demand via request-file
  • System packages are resolved by kpathsea without editor involvement
  • The protocol is truly bidirectional for file content: editor pushes via open, TeXpresso pulls via request-file
  • Fetch from anywhere: on request-file, the editor could fetch from a remote server or a database, or generate content programmatically. TeXpresso doesn't care where the bytes come from.

@merv1n34k merv1n34k changed the title from "Steam mode: Phase 3" to "Stream mode: Phase 3" Mar 30, 2026
@let-def
Owner
let-def commented Mar 30, 2026

I guess you had to go this way to avoid the overhead of asking the editor, and waiting for its answer, on every lookup attempt?

@merv1n34k
Contributor Author

Most likely. I think this approach should reduce work for editors in general, as the engine now requests the files it needs directly and no guessing logic is needed. If editors batch their responses to these requests, the performance difference should be negligible.

I plan to test it with texpresso.vim as soon as all 3 phases are merged. For now the restart cost is only a very rough estimate, but I don't expect it to be critical.

@let-def
Owner
let-def commented Mar 30, 2026

Ok, I think we can proceed with this approach. Let me review a bit more thoroughly.

Independent but related idea: what about allowing the editor to push just the filenames, and have the engine pull a file when a lookup happens on a known but unpushed file?

@merv1n34k
Contributor Author

> Independent but related idea: what about allowing the editor to push just the filenames, and have the engine pull a file when a lookup happens on a known but unpushed file?

Oh, that could be interesting. I am thinking of something like (register "chapter1.tex"). The editor would walk the root directory (and probably follow all \input requests), then pack the filenames and send them to the driver. Each registration would act like a promise. How about this workflow (still a draft):

  1. Editor sends (register "chapter1.tex")
  2. Engine hits Q_OPRD for chapter1.tex
  3. Driver sees it's promised -> emits request-file, does NOT answer Q_OPRD yet
  4. Editor responds with (open "chapter1.tex" "content...")
  5. Driver answers Q_OPRD with A_OPEN
  6. Engine continues -- no restart

This eliminates the restart cycle entirely for known files. The engine just pauses, gets the content, and moves on. This is safe to block because the editor made an explicit promise (unlike Q_OPRL where we don't know if anyone can provide the file).
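The draft workflow is essentially a small promise table in the driver. A sketch under stated assumptions: `struct promise_table`, `on_register`, `on_q_oprd`, `on_open` and the `reply` enum are hypothetical names, REPLY_DEFERRED stands for "Q_OPRD left unanswered while request-file is pending", and content is stored by pointer rather than copied.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_PROMISES 16

/* One file promised by a (register "path") command. */
struct promise {
    char path[128];
    const char *content;  /* set by (open ...); caller keeps it alive */
    bool query_pending;   /* a Q_OPRD is waiting for this file */
};

struct promise_table {
    struct promise items[MAX_PROMISES];
    int count;
};

static struct promise *promise_find(struct promise_table *t, const char *p) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->items[i].path, p) == 0)
            return &t->items[i];
    return NULL;
}

/* Step 1: the editor promises a file without sending its content. */
static bool on_register(struct promise_table *t, const char *p) {
    if (promise_find(t, p))
        return true;
    if (t->count == MAX_PROMISES || strlen(p) >= sizeof t->items[0].path)
        return false;
    struct promise *e = &t->items[t->count++];
    snprintf(e->path, sizeof e->path, "%s", p);
    e->content = NULL;
    e->query_pending = false;
    return true;
}

enum reply { REPLY_OPEN, REPLY_PASS, REPLY_DEFERRED };

/* Steps 2-3: a Q_OPRD for a promised but unsent file is not answered yet;
   this is where the driver would emit request-file. */
static enum reply on_q_oprd(struct promise_table *t, const char *p) {
    struct promise *e = promise_find(t, p);
    if (!e)
        return REPLY_PASS;   /* not promised: normal fallbacks apply */
    if (e->content)
        return REPLY_OPEN;
    e->query_pending = true;
    return REPLY_DEFERRED;
}

/* Steps 4-5: content arrives. Returns true when a deferred Q_OPRD can now
   be answered with A_OPEN, so the engine continues without a restart. */
static bool on_open(struct promise_table *t, const char *p, const char *data) {
    struct promise *e = promise_find(t, p);
    if (!e)
        return false;
    e->content = data;
    bool unblocked = e->query_pending;
    e->query_pending = false;
    return unblocked;
}
```

In this sketch an open for an unregistered file simply returns false; a real driver would presumably still store it in the VFS as in phase 1.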

You think it can be added as phase 4? :)

@let-def
Owner
let-def commented Apr 2, 2026

> You think it can be added as phase 4? :)

I was just sharing my thoughts, feel free to add it if it matches the workflow you have in mind.

Thinking more about the problem, I am not super satisfied with Q_OPRL... It is expected to return A_PASS, but might not, so we have to duplicate some of the opening logic in case it returns something it is not expected to.

Let me suggest another strategy (tell me if it is inappropriate) which requires no change to the engine. Every time the driver is not able to satisfy a lookup, it also writes a message to the editor with the filename that could not be satisfied (e.g. by adding a function editor_lookup_failed(path) which sends a message ["lookup-failed","foo.tex"]).
And that's all.

The editor can then decide whether to push based on the filename. This way, the editor will also see the different attempts (e.g. when using \input{foo} it should receive both lookup-failed "foo" and lookup-failed "foo.tex").
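A sketch of the proposed editor_lookup_failed() message, written into a buffer instead of stdout. `format_lookup_failed` and `append` are hypothetical names; only `"` and `\` are escaped, which covers typical file names but is not full JSON string escaping (control characters are not handled).

```c
#include <stddef.h>
#include <string.h>

/* Append `s` at out[*used], failing if it (plus the NUL) does not fit. */
static int append(char *out, size_t n, size_t *used, const char *s) {
    size_t len = strlen(s);
    if (*used + len >= n)
        return -1;
    memcpy(out + *used, s, len + 1);
    *used += len;
    return 0;
}

/* Formats ["lookup-failed","<path>"]; returns its length or -1. */
static int format_lookup_failed(char *out, size_t n, const char *path) {
    size_t used = 0;
    if (n == 0)
        return -1;
    out[0] = '\0';
    if (append(out, n, &used, "[\"lookup-failed\",\"") < 0)
        return -1;
    for (const char *c = path; *c; c++) {
        char tmp[3] = { '\\', *c, '\0' };
        /* Escape quote and backslash; copy every other byte unchanged. */
        const char *piece = (*c == '"' || *c == '\\') ? tmp : tmp + 1;
        if (append(out, n, &used, piece) < 0)
            return -1;
    }
    if (append(out, n, &used, "\"]") < 0)
        return -1;
    return (int)used;
}
```

For \input{foo}, the driver would call this once per failed attempt, producing one message for "foo" and one for "foo.tex".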

@merv1n34k
Contributor Author
merv1n34k commented Apr 2, 2026

Well, the issue with this approach is that the engine will emit lookup-failed for ALL files it interacts with, including .aux, .cls, .sty, etc., which have to be resolved by kpathsea. Kpathsea runs after Q_OPRD, so the only way to confirm that the engine really needs to request a file is to have a query after kpathsea. The mitigation in the lookup-failed approach would be to manually filter out all the files we consider "internal", which IMO is a bad decision.

But you're right that duplicated logic is messy; I'll simplify it to always return A_PASS when we reach Q_OPRL. It essentially becomes a one-way notification dressed up as a query, because the protocol is query-based.

@let-def
Owner
let-def commented Apr 2, 2026

Apologies, but I will be a bit annoying on this one, since the other solution is so simple and seems to be able to cover your use-case.

I made a proof-of-concept implementation, here is the lookup output for compiling test/simple.tex:

["lookup-file", "read", "successful", "simple.tex"]
["lookup-file", "write", "successful", "simple.log"]
["lookup-file", "read", "failed", "article.cls"]
["lookup-file", "read", "failed", "article.cls"]
["lookup-file", "write", "successful", "simple.synctex"]
["lookup-file", "read", "failed", "size12.clo"]
["lookup-file", "read", "failed", "size12.clo"]
["lookup-file", "read", "failed", "size12.clo"]
["lookup-file", "read", "failed", "lmroman12-regular"]
["lookup-file", "read", "failed", "tex-text.tec"]
["lookup-file", "read", "failed", "l3backend-xetex.def"]
["lookup-file", "read", "failed", "l3backend-xetex.def"]
["lookup-file", "read", "failed", "simple.aux"]
["lookup-file", "read", "failed", "simple.aux"]
["lookup-file", "write", "successful", "simple.aux"]
["lookup-file", "read", "failed", "cmr12"]
["lookup-file", "read", "failed", "cmr8"]
["lookup-file", "read", "failed", "cmr6"]
["lookup-file", "read", "failed", "cmmi12"]
["lookup-file", "read", "failed", "cmmi8"]
["lookup-file", "read", "failed", "cmmi6"]
["lookup-file", "read", "failed", "cmsy10"]
["lookup-file", "read", "failed", "cmsy8"]
["lookup-file", "read", "failed", "cmsy6"]
["lookup-file", "read", "failed", "lmroman8-regular"]
["lookup-file", "read", "failed", "tex-text.tec"]
["lookup-file", "write", "successful", "simple.xdv"]

Wouldn't that be sufficient for what you have in mind?
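An editor consuming this output has to parse each line first. A minimal sketch with sscanf, assuming the exact field layout shown above and no escaped quotes inside fields; `struct lookup_msg` and `parse_lookup_file` are hypothetical, and a real editor would more likely reuse its JSON parser.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed lookup-file notification. */
struct lookup_msg {
    char mode[8];     /* "read" or "write" */
    char status[16];  /* "successful" or "failed" */
    char path[256];
};

/* Parses: ["lookup-file", "read", "failed", "article.cls"]
   Whitespace in the format matches any (or no) whitespace. The closing
   bracket is not reflected in the return count, so it is not verified. */
static bool parse_lookup_file(const char *line, struct lookup_msg *m) {
    return sscanf(line,
                  " [ \"lookup-file\" , \"%7[^\"]\" , \"%15[^\"]\" , \"%255[^\"]\" ]",
                  m->mode, m->status, m->path) == 3;
}
```

Repeated entries such as the two `article.cls` reads parse to identical messages, so deduplication is left to the editor.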

@merv1n34k
Contributor Author

I agree it would be much simpler, but I don't like that 1) it clutters the traffic and makes the editor respond to every lookup-file notification, most of which kpathsea would resolve anyway, and 2) the engine can't really pull files; instead it generates noise for every file it interacts with.

I tried this solution before, but it just generated too much noise. I would vote for a clean but more complex solution over a simple but noisy one.

@let-def
Owner
let-def commented Apr 2, 2026
> 1) it clutters the traffic and makes the editor respond to every lookup-file notification

These messages will be buffered and asynchronous (the driver is not waiting for an answer), this is essentially free.

> which would be solved in most cases by kpathsea

The driver could skip the lookup-file messages for files that kpathsea can resolve. However, that breaks the case where someone overrides a kpathsea-provided file with their own version (which is not supported by OPRL either).

Alternatively, the lookup-file message could also contain this information, and let the editor decide what to do (as I said, lookup-file messages are essentially free, they might clutter the log, but one can just grep them away if they need to read the log).

@merv1n34k
Contributor Author

> where someone overrides a kpathsea provided file with their own version

As I understand it, this would still be an issue in both solutions, but for now I don't think we need to worry about it.

Okay, I have looked through my implementation again, and I think you're right about request-file: we can drop it and do this driver-side only. This will be noisier than with Q_OPRL, but it's easy for the editor to filter or ignore.

However, I would strongly suggest adding the register PR #126, as it will solve the main performance issue I foresee on TeXpresso startup -- it's driver changes only.

Should I close this PR, or should I push the lookup-file changes here instead?

@let-def
Owner
let-def commented Apr 2, 2026

> As I understand it, this would still be an issue in both solutions, but for now I don't think we need to worry about it.

At least when it sees a lookup-file message, the editor knows that it should push the file if it is in its workspace.
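The editor-side decision can be sketched as a filter over parsed lookup-file fields: only failed reads of paths present in the workspace trigger a push, and repeated attempts for the same name (as in the log above) are pushed once. `struct workspace`, `contains` and `should_push` are hypothetical; `pushed` keeps the caller's pointers, so a real editor would copy the strings.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical editor-side view of what it can provide. */
struct workspace {
    const char **files;      /* paths the editor can push */
    size_t n_files;
    const char *pushed[64];  /* paths already pushed this session */
    size_t n_pushed;
};

static bool contains(const char *const *list, size_t n, const char *p) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], p) == 0)
            return true;
    return false;
}

/* Decide whether a lookup-file message should trigger an (open ...). */
static bool should_push(struct workspace *w, const char *mode,
                        const char *status, const char *path) {
    if (strcmp(mode, "read") != 0 || strcmp(status, "failed") != 0)
        return false;                /* writes and successful reads: nothing */
    if (!contains(w->files, w->n_files, path))
        return false;                /* not ours: left to kpathsea */
    if (contains(w->pushed, w->n_pushed, path))
        return false;                /* already sent once */
    if (w->n_pushed < sizeof w->pushed / sizeof w->pushed[0])
        w->pushed[w->n_pushed++] = path;
    return true;
}
```

With \input{chapter1}, the failed lookup of "chapter1" is ignored and the one for "chapter1.tex" triggers the push, so the editor does not need to guess extensions.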

> However, I would strongly suggest adding the register PR #126, as it will solve the main performance issue I foresee on TeXpresso startup -- it's driver changes only.

Yes, I think #126 will be good. Another issue is that on startup, TeXpresso immediately starts processing the document, which does not give the editor the opportunity to populate the initial file system.

> Okay, I have looked through my implementation again, and I think you're right about request-file: we can drop it and do this driver-side only. This will be noisier than with Q_OPRL, but it's easy for the editor to filter or ignore.
> ...
> Should I close this PR, or should I push the lookup-file changes here instead?

You can if you are in a hurry. Since we are still working out the design, I don't mind waiting a few more days to iron out other possible issues (and maybe go back to OPRL).

@merv1n34k
Contributor Author

> At least when it sees a lookup-file message, the editor knows that it should push the file if it is in its workspace.

Yes, now it is up to the editor to decide.

> TeXpresso immediately starts processing the document, which does not give the editor the opportunity to populate the initial file system.

Hmm, I think the editor can prepare by pre-fetching all the paths it needs, but that could cause a race condition with TeXpresso starting to process immediately.

> You can if you are in a hurry.

Not at all. I will be glad to find an approach we are both comfortable with. I am developing another project separately, and stream mode will be needed to link TeXpresso with it properly, but that integration can wait.

@let-def
Owner
let-def commented Apr 2, 2026

> Hmm, I think the editor can prepare by pre-fetching all the paths it needs, but that could cause a race condition with TeXpresso starting to process immediately.

Yes, that's what I was thinking about. Maybe it does not matter (it will cause one initial backtrack once the FS information arrives from the editor). If it does, we could add an option for TeXpresso to start in a "pause" mode, waiting for a start/resume command from the editor.

@merv1n34k
Contributor Author
merv1n34k commented Apr 3, 2026

I think having "pause"/"resume" commands would be super useful: users could stop the engine when they need to, without restarting the driver itself.

Preserving its state is nice, and it seems very easy to implement.
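A toy model of the suggested pause mode, assuming hypothetical "pause"/"resume" command names and a driver that otherwise restarts processing on every open. Files pushed while paused are applied without restarting, so the initial file system is seen in one pass instead of causing a backtrack. `struct driver` and `driver_command` are illustrative only, and the preserved engine state is not modelled.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical driver run state for a "start paused" option. */
struct driver {
    bool paused;         /* processing suspended */
    int files_received;  /* open commands applied so far */
    int engine_starts;   /* times processing (re)started */
};

/* Handle one editor command by name. File updates are always applied,
   but only trigger processing when the driver is not paused. */
static void driver_command(struct driver *d, const char *cmd) {
    if (strcmp(cmd, "pause") == 0) {
        d->paused = true;
    } else if (strcmp(cmd, "resume") == 0) {
        if (d->paused) {
            d->paused = false;
            d->engine_starts++;      /* single start with the full FS */
        }
    } else if (strcmp(cmd, "open") == 0) {
        d->files_received++;
        if (!d->paused)
            d->engine_starts++;      /* stands in for rollback + restart */
    }
}
```

Starting with `paused = true`, any number of initial opens followed by one resume yields a single start.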

@merv1n34k
Contributor Author

Hey @let-def!

Have you had a chance to look at the stream mode?

@let-def
Owner
let-def commented Apr 7, 2026

@merv1n34k I opened #127 which just adds the lookup-file message.
I will port your changes to interpret_open and import the test.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support stream-based editing without filesystem dependency

2 participants
