fix(studio): ensure first chat tool call starts in session sandbox#4810
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
c5f6ca0 to
bebcb4c
Compare
bebcb4c to
17c0e32
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bebcb4c42d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4577fcbcd1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| // Re-read store after potential auto-load / model ready wait | ||
| runtime = useChatRuntimeStore.getState(); | ||
| resolvedThreadId = unstable_threadId ?? runtime.activeThreadId ?? undefined; |
There was a problem hiding this comment.
Guard session fallback against stale active thread
resolvedThreadId now falls back to runtime.activeThreadId when unstable_threadId is missing, but during a new-chat transition activeThreadId can still point to the previous thread (it is driven by ActiveThreadSync from mainThreadId in runtime-provider.tsx and only later overwritten after ThreadNewChatSwitch finishes initialize()). In that window, the first run can send session_id for the old conversation, so tool execution and adapter selection happen in the wrong thread sandbox and can mix state/files across chats. This regression is triggered by quickly sending the first message after creating a new chat, before the new remote thread ID is finalized.
Useful? React with 👍 / 👎.
d1ca422 to
4577fcb
Compare
|
I temporarily reverted the last commit so I can take a closer look at it. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d1ca4226eb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return () => { | ||
| cancelled = true; | ||
| }; |
There was a problem hiding this comment.
Clear init flag when new-chat switch is cancelled
If ThreadNewChatSwitch unmounts before initialize() resolves (for example, the user clicks New Chat and immediately selects another thread), cleanup only flips cancelled and never resets newThreadInitializing. Because both success and error paths guard setNewThreadInitializing(false) behind !cancelled, the flag can stay true indefinitely; then ActiveThreadSync keeps bailing out and activeThreadId stops updating, which can route later tool calls with a missing/stale session_id.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Thank you for the PR! The goal of this PR is to fix the first tool call in a new Studio chat starting outside the session sandbox. As a summary, this PR resolves the thread ID eagerly at request time using unstable_threadId/activeThreadId as a fallback, eagerly initializes the thread when switching to a new chat, and adds a backend safety net to use a _default subdirectory instead of the sandbox root.
I pushed a follow-up commit (ce0becc) that fixes three issues found during review:
| Reviewers | Severity | Finding |
|---|---|---|
| 9/10 | High | ActiveThreadSync can repopulate stale activeThreadId from previous chat during new-thread initialization, routing first tool call to the wrong sandbox |
| 7/10 | High | resolvedThreadId was re-read after model load waits, allowing it to drift if the user switches chats during auto-load |
| 5/10 | Medium | initialize() in the fire-and-forget IIFE had no error handling, leaving unhandled rejections and inconsistent store state |
Changes in the follow-up commit:
-
Disable ActiveThreadSync during new-chat creation -- pass
!newThreadNonceto theenabledprop so the old mainThreadId does not get written back into activeThreadId while initialize() is resolving. -
Clear activeThreadId synchronously at the start of the ThreadNewChatSwitch effect, before the async IIFE. This ensures the adapter never picks up a stale thread ID from a previous chat during the initialization window.
-
Make resolvedThreadId stable -- captured once with
constat the start ofrun()instead of re-read after model load. This prevents the request from being retargeted if the user switches chats while waiting for auto-load. -
Add try/catch around the async initialize() call so failures are logged and the store is reset to null rather than left in a broken state.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request updates the backend to use a _default subdirectory for shared sandboxes and improves frontend thread ID stability during chat initialization. The frontend now captures the thread ID at the start of the stream adapter and manages active thread IDs more robustly when creating new chats. Feedback was provided regarding the fallback session ID in the frontend, which should be updated to match the backend's _default directory name to prevent 404 errors when retrieving images.
| if (imgIdx !== -1) { | ||
| const text = rawResult.slice(0, imgIdx); | ||
| const sessionId = unstable_threadId || ""; | ||
| const sessionId = resolvedThreadId || ""; |
There was a problem hiding this comment.
To ensure that images generated during tool calls without a session ID can be correctly retrieved from the backend, the fallback sessionId should match the backend's _default directory name. Currently, it defaults to an empty string, which causes a 404. Per repository rules, when duplicating default values between frontend and backend, define them as named constants and add comments to cross-reference them to ensure they stay in sync.
const DEFAULT_SESSION_ID = "_default"; // Matches DEFAULT_SESSION_ID in studio/backend/app/constants.py
const sessionId = resolvedThreadId || DEFAULT_SESSION_ID;References
- When default values must be duplicated between frontend and backend, define them as named constants in both places and add comments to cross-reference them, rather than creating a shared API endpoint.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ce0becc504
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Capture the thread ID once at the start so it stays stable even if | ||
| // the user switches chats while waiting for model load / auto-load. | ||
| const resolvedThreadId = | ||
| unstable_threadId ?? runtime.activeThreadId ?? undefined; |
There was a problem hiding this comment.
Gate active-thread fallback to single-chat requests
The new resolvedThreadId fallback can misroute compare runs into an unrelated single-chat sandbox. In compare mode, activeThreadId is not synchronized (ActiveThreadSync is disabled when pairId is set in runtime-provider.tsx), so it commonly retains the last single-chat thread ID. When a compare run starts before unstable_threadId is populated (the same missing-thread window this patch targets), this line falls back to that stale ID and then sends it as session_id / adapter lookup, which can mix tool state/files across conversations.
Useful? React with 👍 / 👎.
|
I revised the last commit I made (and reverted) and ran some testing against it. I'll reconcile it with your changes and rerun my testing, then push it. |
|
@danielhanchen Apologies, but I'm having trouble with us both working on the branch at the same time. Can you pause until I re-mark the PR Ready for Review? |
|
Pardon the churn. I'm pretty new at contributing. Your changes fix the issue with a simpler approach, and I see you are also working on sandbox hardening. I'll mark the PR Ready for Review and if I come across anything else I'll open a separate issue/PR. I appreciate the engagement! |
- Clear activeThreadId synchronously when switching to a new chat so the adapter never picks up the previous thread's ID during initialization - Disable ActiveThreadSync while a new-chat nonce is active to prevent it from re-populating a stale mainThreadId before initialize() resolves - Capture resolvedThreadId once at the start of run() and keep it stable through model load / auto-load waits - Add try/catch around the async initialize() call so failures are logged and do not leave the store in an inconsistent state
When resolvedThreadId is undefined (during new-chat initialization), the image sessionId now falls back to "_default" instead of "" to match the backend's _get_workdir fallback directory. This ensures tool-generated images can be retrieved even if the first tool call fires before thread initialization completes.
Prevents compare panes from inheriting a stale single-chat thread ID as a fallback for session_id routing. Without this, the first compare tool call could route to the previous single-chat's sandbox.
Both handleNewCompare and enterCompare transition into compare mode, but only handleNewCompare was clearing activeThreadId. This makes enterCompare consistent so compare panes never inherit a stale single-chat thread ID for session_id routing.
a7846cc to
badd9ef
Compare
restore context usage from saved view on exit - Clear activeThreadId in the training handoff compare path so it is consistent with handleNewCompare and enterCompare - In exitCompare, read the thread ID from viewBeforeCompare instead of the store, since activeThreadId may have been cleared on compare entry
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request improves session and thread ID management to ensure consistent sandbox routing and prevent state leakage between chat sessions. Key changes include defaulting backend workdirs to a _default subdirectory, capturing stable thread IDs at the start of inference in the frontend, and clearing active thread IDs when entering comparison modes or initializing new chats. One piece of feedback suggests refining the cleanup logic in the ThreadNewChatSwitch component's effect to more robustly handle potential race conditions during asynchronous initialization.
| void (async () => { | ||
| try { | ||
| aui.threads().switchToNewThread(); | ||
| const { remoteId } = await aui.threadListItem().initialize(); | ||
| if (!cancelled) { | ||
| useChatRuntimeStore.getState().setActiveThreadId(remoteId); | ||
| } | ||
| } catch (error) { | ||
| if (!cancelled) { | ||
| useChatRuntimeStore.getState().setActiveThreadId(null); | ||
| } | ||
| console.error("Failed to initialize new chat thread", error); | ||
| } | ||
| })(); |
There was a problem hiding this comment.
The async IIFE inside the useEffect hook is not properly awaited or handled in a way that ensures the cleanup logic (setting activeThreadId to null) is consistently applied if the component unmounts during the initialization process. While the cancelled flag is used, it is safer to ensure that the cleanup logic is explicitly handled in the return function of the effect to avoid potential race conditions or stale state.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request improves session and thread ID management across the frontend and backend to ensure consistent sandbox directory usage. Key changes include defaulting to a '_default' sandbox directory in the backend, stabilizing thread IDs in the chat adapter, and ensuring active thread IDs are cleared during compare mode or new thread initialization to prevent stale state. I have reviewed the suggested fix for the resolvedThreadId initialization in the chat adapter and agree that coercing falsy values to undefined is necessary to prevent inconsistent behavior between tool calls and image URL construction.
| const resolvedThreadId = | ||
| unstable_threadId ?? runtime.activeThreadId ?? undefined; |
There was a problem hiding this comment.
An empty string for threadId can cause inconsistent behavior. If resolvedThreadId is "":
session_idsent to the backend for tool calls will be"". The backend will then use an_invalidsandbox directory.sessionIdused for constructing image URLs will become"_default"because"" || "_default"evaluates to"_default".
This mismatch will lead to broken image URLs.
By coercing a falsy threadId (like an empty string) to undefined, we ensure that it's handled consistently as the default case (no session) throughout the adapter.
| const resolvedThreadId = | |
| unstable_threadId ?? runtime.activeThreadId < 992E span class="pl-c1">?? undefined; | |
| const resolvedThreadId = (unstable_threadId ?? runtime.activeThreadId) || undefined; |
An empty string threadId would cause a mismatch: the backend sanitizes it to _invalid while the frontend image URL falls back to _default. Using || instead of ?? ensures any falsy value (including "") is treated as the no-session case consistently.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request improves thread ID management and session routing consistency across the frontend and backend. Key changes include stabilizing the thread ID during model loading, ensuring compare panes do not inherit stale thread IDs, and updating the backend to use a '_default' directory for unassigned sessions. I have no feedback to provide on the current implementation.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d6054a344c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Capture the thread ID once at the start so it stays stable even if | ||
| // the user switches chats while waiting for model load / auto-load. | ||
| const resolvedThreadId = | ||
| (unstable_threadId ?? runtime.activeThreadId) || undefined; |
There was a problem hiding this comment.
Restrict active-thread fallback for sidebar compare switches
resolvedThreadId now falls back to runtime.activeThreadId for all runs, but there is still a compare-entry path where that store value is stale: selecting an existing compare conversation from the sidebar goes through handleThreadSelect (in chat-page.tsx), which only calls setView(nextView) and does not clear activeThreadId. If a compare run starts before its pane has unstable_threadId (the startup window this patch is addressing), this fallback sends the previous single-chat ID as session_id, routing tool execution/files into the wrong sandbox. Fresh evidence: only handleNewCompare, enterCompare, and the training handoff path clear activeThreadId; sidebar selection does not.
Useful? React with 👍 / 👎.
…nslothai#4810) Fixes unslothai#4809 On a new Studio chat, the first tool call could start before the frontend initializes the thread ID. That meant the first request could go out without a session_id, so the backend started the tool in the shared sandbox root instead of the chat's session sandbox. Frontend: - Eagerly initialize the thread when switching to a new chat - Resolve the thread ID once at request time and keep it stable through async model-load waits - Disable ActiveThreadSync during new-chat initialization to prevent stale thread IDs from being written back - Add error handling for thread initialization failures - Clear activeThreadId on all compare-mode entry paths to prevent cross-session leakage - Fix exitCompare to restore context usage from the saved view - Coerce falsy thread IDs to undefined for consistent backend/frontend fallback behavior - Use _default as the image sessionId fallback to match the backend Backend: - Use ~/studio_sandbox/_default when a request arrives without a session_id
Fixes #4809
Summary
On a new Studio chat, the first tool call can start before the frontend initializes the thread ID.
That means the first request can go out without a
session_id, so the backend starts the tool in the shared sandbox root instead of the chat's session sandbox. After the thread finishes initializing, later tool calls use the correct session sandbox.Changes
Frontend
unstable_threadIdfirst andactiveThreadIdas a fallbackthreadKeyandsession_idBackend
~/studio_sandbox/_defaultwhen a request arrives without asession_idThe backend change is only a safety net to prevent access to all session sandboxes when
session_idis not provided.Test plan
touch tmp.txt~/studio_sandbox/<session_id>/tmp.txt~/studio_sandbox/tmp.txt