refactor: document_indexing_sync_task split db session #32129

Merged
fatelei merged 2 commits into langgenius:main from fatelei:issue-32127
Feb 9, 2026

@fatelei fatelei commented Feb 9, 2026

Important

  1. Make sure you have read our contribution guidelines
  2. Ensure there is an associated issue and you have been assigned to it
  3. Use the correct syntax to link this PR: Fixes #<issue number>.

Summary

fix #32127

Screenshots

Before After
... ...

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran make lint and make type-check (backend) and cd web && npx lint-staged (frontend) to appease the lint gods

@fatelei fatelei requested a review from JohnJyong as a code owner February 9, 2026 07:16
Copilot AI review requested due to automatic review settings February 9, 2026 07:16
@gemini-code-assist

Summary of Changes

Hello @fatelei, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the document_indexing_sync_task to enhance its robustness and efficiency by introducing more granular control over database sessions. The changes aim to prevent long-running transactions, improve error handling for external dependencies like Notion credentials, and ensure that document updates and index cleaning operations are managed within their own distinct transactional contexts. This approach leads to better resource utilization and more reliable task execution.

Highlights

  • Granular Database Session Management: The document_indexing_sync_task now employs multiple, smaller database sessions, replacing a single long-running session. This refactoring improves transaction control and resource management by ensuring sessions are opened and closed for specific, atomic operations.
  • Explicit Dataset Validation: The task now includes an explicit database query to fetch and validate the existence of the Dataset object associated with the document, adding an early check for data integrity.
  • Improved Credential Error Handling: The logic for retrieving Notion credentials and handling cases where they are not found has been refactored. If credentials are missing, the document's error status is updated within a dedicated, separate database session, preventing the main transaction from being held open unnecessarily.
  • Refined Document Update and Index Cleaning Flow: When a Notion document is detected as updated, the process of marking it as 'parsing', deleting old segments, and cleaning the vector index now occurs within its own distinct database session. This ensures atomicity and better isolation for these critical update operations.
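
The session split described in these highlights can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it uses the standard-library `sqlite3` module in place of Dify's SQLAlchemy `session_factory`, and `short_session`, `fetch_remote_state`, `sync_document`, and the `documents` table layout are hypothetical names introduced for the example. It only aims to show that no database session is held open while the slow external call runs.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager

# Throwaway SQLite file standing in for the application database (assumption).
DB_PATH = os.path.join(tempfile.mkdtemp(), "session_demo.db")
open_sessions = 0            # sessions currently open (demonstration only)
observed_open_sessions = []  # snapshots taken during the external call


@contextmanager
def short_session():
    """Open a connection, commit on success, roll back on error, always close."""
    global open_sessions
    conn = sqlite3.connect(DB_PATH)
    open_sessions += 1
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
        open_sessions -= 1


def fetch_remote_state(page_id):
    """Hypothetical slow external call (for example, a request to Notion)."""
    # Record how many sessions are open while the "network call" runs.
    observed_open_sessions.append(open_sessions)
    return "2026-02-09T07:16:00"


def sync_document(document_id):
    # Session 1: read only what is needed, then release the connection.
    with short_session() as conn:
        row = conn.execute(
            "SELECT page_id, last_edited FROM documents WHERE id = ?",
            (document_id,),
        ).fetchone()
    if row is None:
        return "not_found"
    page_id, last_edited = row

    # No session is open here, so the slow call cannot pin a transaction.
    remote_edited = fetch_remote_state(page_id)
    if remote_edited == last_edited:
        return "unchanged"

    # Session 2: a separate, short transaction for the write.
    with short_session() as conn:
        conn.execute(
            "UPDATE documents SET indexing_status = 'parsing', last_edited = ? "
            "WHERE id = ?",
            (remote_edited, document_id),
        )
    return "updated"


# Seed one document, then run the task once.
with short_session() as conn:
    conn.execute(
        "CREATE TABLE documents (id TEXT PRIMARY KEY, page_id TEXT, "
        "last_edited TEXT, indexing_status TEXT)"
    )
    conn.execute(
        "INSERT INTO documents VALUES "
        "('doc-1', 'page-1', '2026-01-01T00:00:00', 'completed')"
    )

first_result = sync_document("doc-1")
```

The trade-off in this kind of split is that the read and the write are no longer covered by one transaction, so the second session should tolerate the row having changed in between.
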

Changelog
  • api/tasks/document_indexing_sync_task.py
    • Initialized tenant_id to None at the start of the document_indexing_sync_task function.
    • Added a database query to fetch and validate the Dataset object using the provided dataset_id.
    • Moved the DatasourceProviderService credential retrieval and subsequent error handling for missing credentials outside the initial session_factory block.
    • Encapsulated the update of document.indexing_status and document.error for missing credentials within a new, dedicated database session.
    • Relocated the logic for deleting DocumentSegment records and cleaning the vector index (when a document is updated) outside the main session block.
    • Introduced a new database session specifically for updating the document's status to 'parsing' and deleting associated DocumentSegment records before the index cleaning process.
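
Read together, the changelog entries suggest a control flow along the following lines. This is a hedged sketch based only on the wording above, not the contents of `api/tasks/document_indexing_sync_task.py`: it uses `sqlite3` instead of the project's SQLAlchemy sessions, and `new_session`, `get_credential`, `clean_index`, `sync_task`, the table layouts, and the error message are all hypothetical stand-ins.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

DB_PATH = os.path.join(tempfile.mkdtemp(), "changelog_demo.db")
cleaned_index_ids = []  # records what the hypothetical index cleaner received


def new_session():
    """Hypothetical stand-in for session_factory: one short-lived connection."""
    return closing(sqlite3.connect(DB_PATH))


def get_credential(tenant_id, credential_id):
    """Hypothetical credential lookup; returns None when nothing is stored."""
    return {"cred-ok": {"token": "placeholder"}}.get(credential_id)


def clean_index(dataset_id, index_node_ids):
    """Hypothetical vector-index cleanup; called with no session open."""
    cleaned_index_ids.extend(index_node_ids)


def sync_task(dataset_id, document_id, credential_id, page_changed):
    tenant_id = None  # initialized up front, set once the dataset is loaded

    # Session 1: load and validate the document and its dataset.
    with new_session() as conn:
        doc = conn.execute(
            "SELECT id FROM documents WHERE id = ? AND dataset_id = ?",
            (document_id, dataset_id),
        ).fetchone()
        dataset = conn.execute(
            "SELECT tenant_id FROM datasets WHERE id = ?", (dataset_id,)
        ).fetchone()
    if doc is None or dataset is None:
        return "missing"
    tenant_id = dataset[0]

    # The credential lookup happens outside any session.
    if get_credential(tenant_id, credential_id) is None:
        # Session 2: dedicated transaction that only records the error.
        with new_session() as conn, conn:
            conn.execute(
                "UPDATE documents SET indexing_status = 'error', error = ? "
                "WHERE id = ?",
                ("Datasource credential not found.", document_id),
            )
        return "error"

    if not page_changed:
        return "unchanged"

    # Session 3: mark as parsing and delete old segments in one transaction.
    with new_session() as conn, conn:
        node_ids = [
            r[0]
            for r in conn.execute(
                "SELECT index_node_id FROM document_segments "
                "WHERE document_id = ? ORDER BY id",
                (document_id,),
            )
        ]
        conn.execute(
            "UPDATE documents SET indexing_status = 'parsing' WHERE id = ?",
            (document_id,),
        )
        conn.execute(
            "DELETE FROM document_segments WHERE document_id = ?", (document_id,)
        )

    # Index cleaning runs after the session above has committed and closed.
    clean_index(dataset_id, node_ids)
    return "reindexing"


# Seed data: one dataset, two documents, two segments for doc-1.
with new_session() as conn, conn:
    conn.executescript(
        """
        CREATE TABLE datasets (id TEXT PRIMARY KEY, tenant_id TEXT);
        CREATE TABLE documents (id TEXT PRIMARY KEY, dataset_id TEXT,
                                indexing_status TEXT, error TEXT);
        CREATE TABLE document_segments (id INTEGER PRIMARY KEY,
                                        document_id TEXT, index_node_id TEXT);
        INSERT INTO datasets VALUES ('ds-1', 'tenant-1');
        INSERT INTO documents VALUES ('doc-1', 'ds-1', 'completed', NULL);
        INSERT INTO documents VALUES ('doc-2', 'ds-1', 'completed', NULL);
        INSERT INTO document_segments VALUES (1, 'doc-1', 'node-a');
        INSERT INTO document_segments VALUES (2, 'doc-1', 'node-b');
        """
    )

missing_credential_result = sync_task("ds-1", "doc-2", "cred-missing", True)
updated_result = sync_task("ds-1", "doc-1", "cred-ok", True)
```

In this sketch each `with new_session() as conn, conn:` block commits before the connection closes, so the error status and the segment deletion are each persisted on their own, and a failure in `clean_index` would not roll back the already committed 'parsing' state.
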

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@gemini-code-assist

Warning

Gemini encountered an error creating the summary. You can try again by commenting /gemini summary.

@dosubot bot added the size:L label ("This PR changes 100-499 lines, ignoring generated files.") on Feb 9, 2026
Copilot AI review requested due to automatic review settings February 9, 2026 07:26
Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 12 comments.


Copilot AI review requested due to automatic review settings February 9, 2026 07:58
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI review requested due to automatic review settings February 9, 2026 08:10
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@dosubot bot added the size:XL label ("This PR changes 500-999 lines, ignoring generated files.") and removed the size:L label ("This PR changes 100-499 lines, ignoring generated files.") on Feb 9, 2026
Copilot AI review requested due to automatic review settings February 9, 2026 08:27
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@dosubot bot added the size:L label ("This PR changes 100-499 lines, ignoring generated files.") and removed the size:XL label ("This PR changes 500-999 lines, ignoring generated files.") on Feb 9, 2026
@dosubot bot added the size:XL label ("This PR changes 500-999 lines, ignoring generated files.") and removed the size:L label ("This PR changes 100-499 lines, ignoring generated files.") on Feb 9, 2026
Copilot AI review requested due to automatic review settings February 9, 2026 08:41
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@dosubot bot added the lgtm label ("This PR has been approved by a maintainer") on Feb 9, 2026
@fatelei fatelei merged commit d546210 into langgenius:main Feb 9, 2026
12 checks passed
fatelei added a commit that referenced this pull request Feb 9, 2026
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
laipz8200 pushed a commit that referenced this pull request Feb 16, 2026
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Labels

lgtm: This PR has been approved by a maintainer
size:XL: This PR changes 500-999 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

document_indexing_sync_task split database session

4 participants
