Showing 1–19 of 19 results for author: Tensmeyer, C

Searching in archive cs.
  1. arXiv:2603.25118  [pdf, ps, other]

    cs.CV

    AnyDoc: Enhancing Document Generation via Large-Scale HTML/CSS Data Synthesis and Height-Aware Reinforcement Optimization

    Authors: Jiawei Lin, Wanrong Zhu, Vlad I Morariu, Christopher Tensmeyer

    Abstract: Document generation has gained growing attention in the field of AI-driven content creation. In this work, we push its boundaries by introducing AnyDoc, a framework capable of handling multiple generation tasks across a wide spectrum of document categories, all represented in a unified HTML/CSS format. To overcome the limited coverage and scale of existing human-crafted document datasets, AnyDoc f…

    Submitted 26 March, 2026; originally announced March 2026.

    Comments: CVPR 2026 Main Conference

  2. arXiv:2601.04589  [pdf, ps, other]

    cs.CV

    MiLDEdit: Reasoning-Based Multi-Layer Design Document Editing

    Authors: Zihao Lin, Wanrong Zhu, Jiuxiang Gu, Jihyung Kil, Christopher Tensmeyer, Lin Zhang, Shilong Liu, Ruiyi Zhang, Lifu Huang, Vlad I. Morariu, Tong Sun

    Abstract: Real-world design documents (e.g., posters) are inherently multi-layered, combining decoration, text, and images. Editing them from natural-language instructions requires fine-grained, layer-aware reasoning to identify relevant layers and coordinate modifications. Prior work largely overlooks multi-layer design document editing, focusing instead on single-layer image editing or multi-layer generat…

    Submitted 28 January, 2026; v1 submitted 7 January, 2026; originally announced January 2026.

  3. arXiv:2512.17151  [pdf, ps, other]

    cs.CV

    Text-Conditioned Background Generation for Editable Multi-Layer Documents

    Authors: Taewon Kang, Joseph K J, Chris Tensmeyer, Jihyung Kil, Wanrong Zhu, Ming C. Lin, Vlad I. Morariu

    Abstract: We present a framework for document-centric background generation with multi-page editing and thematic continuity. To ensure text regions remain readable, we employ a latent masking formulation that softly attenuates updates in the diffusion space, inspired by smooth barrier functions in physics and numerical optimization. In addition, we introduce Automated Readability Optimization (…

    Submitted 18 December, 2025; originally announced December 2025.

  4. arXiv:2410.15504  [pdf, other]

    cs.HC

    FlexDoc: Flexible Document Adaptation through Optimizing both Content and Layout

    Authors: Yue Jiang, Christof Lutteroth, Rajiv Jain, Christopher Tensmeyer, Varun Manjunatha, Wolfgang Stuerzlinger, Vlad Morariu

    Abstract: Designing adaptive documents that are visually appealing across various devices and for diverse viewers is a challenging task. This is due to the wide variety of devices and different viewer requirements and preferences. Alterations to a document's content, style, or layout often necessitate numerous adjustments, potentially leading to a complete layout redesign. We introduce FlexDoc, a framework…

    Submitted 20 October, 2024; originally announced October 2024.

  5. arXiv:2306.06306  [pdf, other]

    cs.CV cs.AI

    DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents

    Authors: Fuxiao Liu, Hao Tan, Chris Tensmeyer

    Abstract: Vision-language pretraining models have achieved great success in supporting multimedia applications by understanding the alignments between images and text. While existing vision-language pretraining models primarily focus on understanding a single image associated with a single piece of text, they often ignore the alignment at the intra-document level, consisting of multiple sentences with multipl…

    Submitted 25 April, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: Accepted to ICPRAI 2024

  6. arXiv:2305.10434  [pdf, other]

    cs.CL cs.AI cs.LG

    Learning the Visualness of Text Using Large Vision-Language Models

    Authors: Gaurav Verma, Ryan A. Rossi, Christopher Tensmeyer, Jiuxiang Gu, Ani Nenkova

    Abstract: Visual text evokes an image in a person's mind, while non-visual text fails to do so. A method to automatically detect visualness in text will enable text-to-image retrieval and generation models to augment text with relevant images. This is particularly challenging with long-form text as text-to-image generation and retrieval models are often triggered for text that is designed to be explicitly v…

    Submitted 22 October, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: Accepted at EMNLP 2023 (Main, long); 9 pages, 5 figures

  7. arXiv:2211.14958  [pdf, other]

    cs.CV

    MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding

    Authors: Zilong Wang, Jiuxiang Gu, Chris Tensmeyer, Nikolaos Barmpalios, Ani Nenkova, Tong Sun, Jingbo Shang, Vlad I. Morariu

    Abstract: Document images are a ubiquitous source of data where the text is organized in a complex hierarchical structure ranging from fine granularity (e.g., words), medium granularity (e.g., regions such as paragraphs or figures), to coarse granularity (e.g., the whole page). The spatial hierarchical relationships between content at different levels of granularity are crucial for document image understand…

    Submitted 27 November, 2022; originally announced November 2022.

    Comments: EMNLP 2022

  8. arXiv:2203.16618  [pdf, other]

    cs.CV

    End-to-end Document Recognition and Understanding with Dessurt

    Authors: Brian Davis, Bryan Morse, Bryan Price, Chris Tensmeyer, Curtis Wigington, Vlad Morariu

    Abstract: We introduce Dessurt, a relatively simple document understanding transformer capable of being fine-tuned on a greater variety of document tasks than prior methods. It receives a document image and task string as input and generates arbitrary text autoregressively as output. Because Dessurt is an end-to-end architecture that performs text recognition in addition to the document understanding, it do…

    Submitted 15 June, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

  9. arXiv:2111.13792  [pdf, other]

    cs.CV cs.LG

    LAFITE: Towards Language-Free Training for Text-to-Image Generation

    Authors: Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun

    Abstract: One of the major challenges in training text-to-image generation models is the need for a large number of high-quality image-text pairs. While image samples are often easily accessible, the associated text descriptions typically require careful human captioning, which is particularly time- and cost-consuming. In this paper, we propose the first work to train text-to-image generation models without…

    Submitted 24 March, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

    Comments: Accepted by CVPR 2022, https://github.com/drboog/Lafite

  10. arXiv:2104.08689  [pdf, other]

    cs.CV

    RPCL: A Framework for Improving Cross-Domain Detection with Auxiliary Tasks

    Authors: Kai Li, Curtis Wigington, Chris Tensmeyer, Vlad I. Morariu, Handong Zhao, Varun Manjunatha, Nikolaos Barmpalios, Yun Fu

    Abstract: Cross-Domain Detection (XDD) aims to train an object detector using labeled images from a source domain that still performs well in a target domain with only unlabeled images. Existing approaches achieve this either by aligning the feature maps or the region proposals from the two domains, or by transferring the style of source images to that of target images. Contrasted with prior work, this pap…

    Submitted 17 April, 2021; originally announced April 2021.

    Comments: 10 pages, 5 figures

  11. arXiv:2009.00678  [pdf, other]

    cs.CV

    Text and Style Conditioned GAN for Generation of Offline Handwriting Lines

    Authors: Brian Davis, Chris Tensmeyer, Brian Price, Curtis Wigington, Bryan Morse, Rajiv Jain

    Abstract: This paper presents a GAN for generating images of handwritten lines conditioned on arbitrary text and latent style vectors. Unlike prior work, which produces stroke points or single-word images, this model generates entire lines of offline handwriting. The model produces variable-sized images by using style vectors to determine character widths. A generator network is trained with GAN and autoenco…

    Submitted 1 September, 2020; originally announced September 2020.

    Comments: Includes Supplementary Material. Accepted at BMVC 2020. 32 pages, 30 figures

  12. arXiv:2004.12016  [pdf, other]

    cs.HC

    Using Behavioral Interactions from a Mobile Device to Classify the Reader's Prior Familiarity and Goal Conditions

    Authors: Sungjin Nam, Zoya Bylinskii, Christopher Tensmeyer, Curtis Wigington, Rajiv Jain, Tong Sun

    Abstract: A student reads a textbook to learn a new topic; an attorney leafs through familiar legal documents. Each reader may have a different goal for, and prior knowledge of, their reading. A mobile context, which captures interaction behavior, can provide insights about these reading conditions. In this paper, we focus on understanding the different reading conditions of mobile readers, as such an under…

    Submitted 24 April, 2020; originally announced April 2020.

  13. arXiv:2003.13197  [pdf, other]

    cs.CV

    Cross-Domain Document Object Detection: Benchmark Suite and Method

    Authors: Kai Li, Curtis Wigington, Chris Tensmeyer, Handong Zhao, Nikolaos Barmpalios, Vlad I. Morariu, Varun Manjunatha, Tong Sun, Yun Fu

    Abstract: Decomposing images of document pages into high-level semantic regions (e.g., figures, tables, paragraphs), document object detection (DOD) is fundamental for downstream tasks like intelligent document editing and understanding. DOD remains a challenging problem as document objects vary significantly in layout, size, aspect ratio, texture, etc. An additional challenge arises in practice because lar…

    Submitted 29 March, 2020; originally announced March 2020.

    Comments: To appear in CVPR 2020

  14. arXiv:1909.02576  [pdf, other]

    cs.CV

    Deep Visual Template-Free Form Parsing

    Authors: Brian Davis, Bryan Morse, Scott Cohen, Brian Price, Chris Tensmeyer

    Abstract: Automatic, template-free extraction of information from form images is challenging due to the variety of form layouts. This is even more challenging for historical forms due to noise and degradation. A crucial part of the extraction process is associating input text with pre-printed labels. We present a learned, template-free solution to detecting pre-printed text and input text/handwriting and pr…

    Submitted 18 September, 2019; v1 submitted 5 September, 2019; originally announced September 2019.

    Comments: Accepted at ICDAR 2019. Updated results with average of repeated experiments

  15. arXiv:1808.01423  [pdf, other]

    cs.CV

    Language Model Supervision for Handwriting Recognition Model Adaptation

    Authors: Chris Tensmeyer, Curtis Wigington, Brian Davis, Seth Stewart, Tony Martinez, William Barrett

    Abstract: Training state-of-the-art offline handwriting recognition (HWR) models requires large labeled datasets, but unfortunately such datasets are not available in all languages and domains due to the high cost of manual labeling. We address this problem by showing how high resource languages can be leveraged to help train models for low resource languages. We propose a transfer learning methodology where…

    Submitted 4 August, 2018; originally announced August 2018.

  16. arXiv:1709.01618  [pdf, other]

    cs.CV

    PageNet: Page Boundary Extraction in Historical Handwritten Documents

    Authors: Chris Tensmeyer, Brian Davis, Curtis Wigington, Iain Lee, Bill Barrett

    Abstract: When digitizing a document into an image, it is common to include a surrounding border region to visually indicate that the entire document is present in the image. However, this border should be removed prior to automated processing. In this work, we present a deep learning based system, PageNet, which identifies the main page region in an image in order to segment content from both textual and n…

    Submitted 5 September, 2017; originally announced September 2017.

    Comments: HIP 2017 (in submission)

  17. arXiv:1708.03669  [pdf, other]

    cs.CV

    Convolutional Neural Networks for Font Classification

    Authors: Chris Tensmeyer, Daniel Saunders, Tony Martinez

    Abstract: Classifying pages or text lines into font categories aids transcription because single font Optical Character Recognition (OCR) is generally more accurate than omni-font OCR. We present a simple framework based on Convolutional Neural Networks (CNNs), where a CNN is trained to classify small patches of text into predefined font classes. To classify page or line images, we average the CNN predictio…

    Submitted 11 August, 2017; originally announced August 2017.

    Comments: ICDAR 2017

  18. arXiv:1708.03276  [pdf, other]

    cs.CV

    Document Image Binarization with Fully Convolutional Neural Networks

    Authors: Chris Tensmeyer, Tony Martinez

    Abstract: Binarization of degraded historical manuscript images is an important pre-processing step for many document processing tasks. We formulate binarization as a pixel classification learning task and apply a novel Fully Convolutional Network (FCN) architecture that operates at multiple image scales, including full resolution. The FCN is trained to optimize a continuous version of the Pseudo F-measure…

    Submitted 10 August, 2017; originally announced August 2017.

    Comments: ICDAR 2017 (oral)

  19. arXiv:1708.03273  [pdf, other]

    cs.CV

    Analysis of Convolutional Neural Networks for Document Image Classification

    Authors: Chris Tensmeyer, Tony Martinez

    Abstract: Convolutional Neural Networks (CNNs) are state-of-the-art models for document image classification tasks. However, many of these approaches rely on parameters and architectures designed for classifying natural images, which differ from document images. We question whether this is appropriate and conduct a large empirical study to find what aspects of CNNs most affect performance on document images…

    Submitted 10 August, 2017; originally announced August 2017.

    Comments: Accepted ICDAR 2017