
Showing 1–31 of 31 results for author: Arora, K

Searching in archive cs.
  1. arXiv:2512.16881  [pdf, ps, other]

    cs.RO cs.LG

    PolaRiS: Scalable Real-to-Sim Evaluations for Generalist Robot Policies

    Authors: Arhan Jain, Mingtong Zhang, Kanav Arora, William Chen, Marcel Torne, Muhammad Zubair Irshad, Sergey Zakharov, Yue Wang, Sergey Levine, Chelsea Finn, Wei-Chiu Ma, Dhruv Shah, Abhishek Gupta, Karl Pertsch

    Abstract: A significant challenge for robot learning research is our ability to accurately measure and compare the performance of robot policies. Benchmarking in robotics is historically challenging due to the stochasticity, reproducibility, and time-consuming nature of real-world rollouts. This challenge is exacerbated for recent generalist policies, which have to be evaluated across a wide variety of scene…

    Submitted 18 December, 2025; originally announced December 2025.

    Comments: Website: https://polaris-evals.github.io/

  2. arXiv:2512.12870  [pdf, ps, other]

    cs.LG cs.AI math.OC

    Optimal Labeler Assignment and Sampling for Active Learning in the Presence of Imperfect Labels

    Authors: Pouya Ahadi, Blair Winograd, Camille Zaug, Karunesh Arora, Lijun Wang, Kamran Paynabar

    Abstract: Active Learning (AL) has garnered significant interest across various application domains where labeling training data is costly. AL provides a framework that helps practitioners query informative samples for annotation by oracles (labelers). However, these labels often contain noise due to varying levels of labeler accuracy. Additionally, uncertain samples are more prone to receiving incorrect la… (See the illustrative sketch after this entry.)

    Submitted 14 December, 2025; originally announced December 2025.

    Comments: 22 pages, 6 figures. Preprint under review
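
    As a point of reference for the querying step described above, the sketch below shows plain entropy-based uncertainty sampling, a standard active-learning query strategy. It is an illustration only, not the paper's labeler-assignment or sampling method, and the names (query_most_uncertain, batch_size) are invented for the example.

        # Illustrative sketch: entropy-based uncertainty sampling (generic AL baseline).
        import numpy as np

        def query_most_uncertain(probs: np.ndarray, batch_size: int = 10) -> np.ndarray:
            """Return indices of the unlabeled samples with the highest predictive entropy.
            probs: (n_samples, n_classes) class probabilities from the current model."""
            entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
            return np.argsort(entropy)[-batch_size:]

        # Example: pick 10 candidates to send to labelers from random predictions.
        rng = np.random.default_rng(0)
        probs = rng.dirichlet(alpha=np.ones(3), size=1000)
        print(query_most_uncertain(probs, batch_size=10))

    Under noisy labelers, the queried samples would additionally need to be routed to labelers according to their estimated accuracy, which is the assignment problem the paper addresses.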

  3. arXiv:2511.18829  [pdf, ps, other]

    cs.LG

    Towards Characterizing Knowledge Distillation of PPG Heart Rate Estimation Models

    Authors: Kanav Arora, Girish Narayanswamy, Shwetak Patel, Richard Li

    Abstract: Heart rate estimation from photoplethysmography (PPG) signals generated by wearable devices such as smartwatches and fitness trackers has significant implications for the health and well-being of individuals. Although prior work has demonstrated deep learning models with strong performance in the heart rate estimation task, in order to deploy these models on wearable devices, these models must als…

    Submitted 8 December, 2025; v1 submitted 24 November, 2025; originally announced November 2025.

    Comments: 5 pages, 3 figures, 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Learning from Time Series for Health

  4. arXiv:2509.19941  [pdf, ps, other]

    cs.CL cs.AI

    CorIL: Towards Enriching Indian Language to Indian Language Parallel Corpora and Machine Translation Systems

    Authors: Soham Bhattacharjee, Mukund K Roy, Yathish Poojary, Bhargav Dave, Mihir Raj, Vandan Mujadia, Baban Gain, Pruthwik Mishra, Arafat Ahsan, Parameswari Krishnamurthy, Ashwath Rao, Gurpreet Singh Josan, Preeti Dubey, Aadil Amin Kak, Anna Rao Kulkarni, Narendra VG, Sunita Arora, Rakesh Balbantray, Prasenjit Majumdar, Karunesh K Arora, Asif Ekbal, Dipti Mishra Sharma

    Abstract: India's linguistic landscape is one of the most diverse in the world, comprising over 120 major languages and approximately 1,600 additional languages, with 22 officially recognized as scheduled languages in the Indian Constitution. Despite recent progress in multilingual neural machine translation (NMT), high-quality parallel corpora for Indian languages remain scarce, especially across varied do…

    Submitted 24 September, 2025; originally announced September 2025.

  5. arXiv:2509.12917  [pdf, ps, other]

    cs.LG stat.ML

    Reversible Deep Equilibrium Models

    Authors: Sam McCallum, Kamran Arora, James Foster

    Abstract: Deep Equilibrium Models (DEQs) are an interesting class of implicit model where the model output is implicitly defined as the fixed point of a learned function. These models have been shown to outperform explicit (fixed-depth) models in large-scale tasks by trading many deep layers for a single layer that is iterated many times. However, gradient calculation through DEQs is approximate. This often… (See the illustrative sketch after this entry.)

    Submitted 3 December, 2025; v1 submitted 16 September, 2025; originally announced September 2025.
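
    The fixed-point view described in this abstract can be made concrete with a tiny example. The sketch below iterates z = f(z, x) until convergence for a one-layer map; it is a naive forward pass only and does not reproduce the paper's reversible formulation or its gradient computation. The PyTorch usage and module names are assumptions for the illustration.

        # Illustrative sketch: a naive DEQ-style forward pass that iterates z = f(z, x)
        # to an approximate fixed point.
        import torch
        import torch.nn as nn

        class TinyDEQ(nn.Module):
            def __init__(self, dim: int = 32):
                super().__init__()
                self.lin_z = nn.Linear(dim, dim)
                self.lin_x = nn.Linear(dim, dim)

            def f(self, z, x):
                return torch.tanh(self.lin_z(z) + self.lin_x(x))

            def forward(self, x, max_iter: int = 50, tol: float = 1e-4):
                z = torch.zeros_like(x)
                for _ in range(max_iter):
                    z_next = self.f(z, x)
                    if (z_next - z).norm() < tol:  # stop once the iteration has converged
                        return z_next
                    z = z_next
                return z

        x = torch.randn(8, 32)
        print(TinyDEQ()(x).shape)  # torch.Size([8, 32])

    In practice the delicate part is the backward pass, which is usually obtained from the implicit function theorem at the fixed point; that is where the approximation discussed in the abstract enters.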

  6. arXiv:2508.14151  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    A Systematic Study of Deep Learning Models and xAI Methods for Region-of-Interest Detection in MRI Scans

    Authors: Justin Yiu, Kushank Arora, Daniel Steinberg, Rohit Ghiya

    Abstract: Magnetic Resonance Imaging (MRI) is an essential diagnostic tool for assessing knee injuries. However, manual interpretation of MRI slices remains time-consuming and prone to inter-observer variability. This study presents a systematic evaluation of various deep learning architectures combined with explainable AI (xAI) techniques for automated region of interest (ROI) detection in knee MRI scans.…

    Submitted 21 August, 2025; v1 submitted 19 August, 2025; originally announced August 2025.

  7. arXiv:2508.10925  [pdf, ps, other]

    cs.CL cs.AI

    gpt-oss-120b & gpt-oss-20b Model Card

    Authors: OpenAI, Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K. Arora, Yu Bai, Bowen Baker, Haiming Bao, Boaz Barak, Ally Bennett, Tyler Bertao, Nivedita Brett, Eugene Brevdo, Greg Brockman, Sebastien Bubeck, Che Chang, Kai Chen, Mark Chen, Enoch Cheung, Aidan Clark, Dan Cook, et al. (102 additional authors not shown)

    Abstract: We present gpt-oss-120b and gpt-oss-20b, two open-weight reasoning models that push the frontier of accuracy and inference cost. The models use an efficient mixture-of-experts transformer architecture and are trained using large-scale distillation and reinforcement learning. We optimize the models to have strong agentic capabilities (deep research browsing, python tool use, and support for develope…

    Submitted 8 August, 2025; originally announced August 2025.

  8. arXiv:2507.16947  [pdf, ps, other]

    cs.CL

    AI-based Clinical Decision Support for Primary Care: A Real-World Study

    Authors: Robert Korom, Sarah Kiptinness, Najib Adan, Kassim Said, Catherine Ithuli, Oliver Rotich, Boniface Kimani, Irene King'ori, Stellah Kamau, Elizabeth Atemba, Muna Aden, Preston Bowman, Michael Sharman, Rebecca Soskin Hicks, Rebecca Distler, Johannes Heidecke, Rahul K. Arora, Karan Singhal

    Abstract: We evaluate the impact of large language model-based clinical decision support in live care. In partnership with Penda Health, a network of primary care clinics in Nairobi, Kenya, we studied AI Consult, a tool that serves as a safety net for clinicians by identifying potential documentation and clinical decision-making errors. AI Consult integrates into clinician workflows, activating only when ne…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: Blog: https://openai.com/index/ai-clinical-copilot-penda-health/

  9. arXiv:2507.05331  [pdf, ps, other]

    cs.RO

    A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation

    Authors: TRI LBM Team, Jose Barreiros, Andrew Beaulieu, Aditya Bhat, Rick Cory, Eric Cousineau, Hongkai Dai, Ching-Hsin Fang, Kunimatsu Hashimoto, Muhammad Zubair Irshad, Masha Itkina, Naveen Kuppuswamy, Kuan-Hui Lee, Katherine Liu, Dale McConachie, Ian McMahon, Haruki Nishimura, Calder Phillips-Grafflin, Charles Richter, Paarth Shah, Krishnan Srinivasan, Blake Wulfe, Chen Xu, Mengchao Zhang, Alex Alspach , et al. (57 additional authors not shown)

    Abstract: Robot manipulation has seen tremendous progress in recent years, with imitation learning policies enabling successful performance of dexterous and hard-to-model tasks. Concurrently, scaling data and model size has led to the development of capable language and vision foundation models, motivating large-scale efforts to create general-purpose robot foundation models. While these models have garnere…

    Submitted 7 July, 2025; originally announced July 2025.

  10. arXiv:2506.18123  [pdf, ps, other]

    cs.RO cs.LG

    RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies

    Authors: Pranav Atreya, Karl Pertsch, Tony Lee, Moo Jin Kim, Arhan Jain, Artur Kuramshin, Clemens Eppner, Cyrus Neary, Edward Hu, Fabio Ramos, Jonathan Tremblay, Kanav Arora, Kirsty Ellis, Luca Macesanu, Marcel Torne Villasevil, Matthew Leonard, Meedeum Cho, Ozgur Aslan, Shivin Dass, Jie Wang, William Reger, Xingfang Yuan, Xuning Yang, Abhishek Gupta, Dinesh Jayaraman , et al. (7 additional authors not shown)

    Abstract: Comprehensive, unbiased, and comparable evaluation of modern generalist policies is uniquely challenging: existing approaches for robot benchmarking typically rely on heavy standardization, either by specifying fixed evaluation tasks and environments, or by hosting centralized "robot challenges", and do not readily scale to evaluating generalist policies across a broad range of tasks and environ…

    Submitted 29 November, 2025; v1 submitted 22 June, 2025; originally announced June 2025.

    Comments: Website: https://robo-arena.github.io/

  11. arXiv:2506.04178  [pdf, ps, other]

    cs.LG

    OpenThoughts: Data Recipes for Reasoning Models

    Authors: Etash Guha, Ryan Marten, Sedrick Keh, Negin Raoof, Georgios Smyrnis, Hritik Bansal, Marianna Nezhurina, Jean Mercat, Trung Vu, Zayne Sprague, Ashima Suvarna, Benjamin Feuer, Liangyu Chen, Zaid Khan, Eric Frankel, Sachin Grover, Caroline Choi, Niklas Muennighoff, Shiye Su, Wanjia Zhao, John Yang, Shreyas Pimpalgaonkar, Kartik Sharma, Charlie Cheng-Jie Ji, Yichuan Deng , et al. (25 additional authors not shown)

    Abstract: Reasoning models have made rapid progress on many benchmarks involving math, code, and science. Yet, there are still many open questions about the best training recipes for reasoning since state-of-the-art models often rely on proprietary datasets with little to no public information available. To address this, the goal of the OpenThoughts project is to create open-source datasets for training rea…

    Submitted 4 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

    Comments: https://www.openthoughts.ai/blog/ot3. arXiv admin note: text overlap with arXiv:2505.23754 by other authors

  12. arXiv:2505.23197  [pdf, ps, other]

    cs.RO cs.AI

    Unified Path Planner with Adaptive Safety and Optimality

    Authors: Jatin Kumar Arora, Soutrik Bandyopadhyay, Shubhendu Bhasin

    Abstract: Path planning for autonomous robots presents a fundamental trade-off between optimality and safety. While conventional algorithms typically prioritize one of these objectives, we introduce the Unified Path Planner (UPP), a unified framework that simultaneously addresses both. UPP is a graph-search-based algorithm that employs a modified heuristic function incorporating a dynamic safety cost, enabl… (See the illustrative sketch after this entry.)

    Submitted 29 August, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

    Comments: 6 pages, 4 figures
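
    To illustrate the kind of safety/optimality trade-off the abstract refers to, the sketch below runs A* on a small grid and inflates the cost of cells adjacent to obstacles by a tunable safety weight. It is a generic graph-search illustration, not the UPP algorithm itself; the grid, weight, and function names are made up for the example.

        # Illustrative sketch: A* where each step near an obstacle pays an extra safety penalty.
        import heapq

        def astar_with_safety(grid, start, goal, safety_weight=2.0):
            rows, cols = len(grid), len(grid[0])

            def near_obstacle(r, c):
                # 1.0 if any 8-neighbour cell is an obstacle, else 0.0
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc]:
                            return 1.0
                return 0.0

            def h(n):  # Manhattan-distance heuristic to the goal
                return abs(n[0] - goal[0]) + abs(n[1] - goal[1])

            open_heap = [(h(start), 0.0, start, [start])]
            best = {start: 0.0}
            while open_heap:
                _, g, node, path = heapq.heappop(open_heap)
                if node == goal:
                    return path
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nxt = (node[0] + dr, node[1] + dc)
                    if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or grid[nxt[0]][nxt[1]]:
                        continue
                    ng = g + 1.0 + safety_weight * near_obstacle(*nxt)
                    if ng < best.get(nxt, float("inf")):
                        best[nxt] = ng
                        heapq.heappush(open_heap, (ng + h(nxt), ng, nxt, path + [nxt]))
            return None

        grid = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
        print(astar_with_safety(grid, (0, 0), (2, 3)))

    Raising safety_weight biases the search toward longer but safer paths; setting it to zero recovers ordinary shortest-path behaviour.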

  13. arXiv:2505.08775  [pdf, ps, other]

    cs.CL

    HealthBench: Evaluating Large Language Models Towards Improved Human Health

    Authors: Rahul K. Arora, Jason Wei, Rebecca Soskin Hicks, Preston Bowman, Joaquin Quiñonero-Candela, Foivos Tsimpourlas, Michael Sharman, Meghan Shah, Andrea Vallone, Alex Beutel, Johannes Heidecke, Karan Singhal

    Abstract: We present HealthBench, an open-source benchmark measuring the performance and safety of large language models in healthcare. HealthBench consists of 5,000 multi-turn conversations between a model and an individual user or healthcare professional. Responses are evaluated using conversation-specific rubrics created by 262 physicians. Unlike previous multiple-choice or short-answer benchmarks, Healt…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Blog: https://openai.com/index/healthbench/ Code: https://github.com/openai/simple-evals

  14. arXiv:2503.07603  [pdf, other]

    cs.CV

    Should VLMs be Pre-trained with Image Data?

    Authors: Sedrick Keh, Jean Mercat, Samir Yitzhak Gadre, Kushal Arora, Igor Vasiljevic, Benjamin Burchfiel, Shuran Song, Russ Tedrake, Thomas Kollar, Ludwig Schmidt, Achal Dave

    Abstract: Pre-trained LLMs that are further trained with image data perform well on vision-language tasks. While adding images during a second training phase effectively unlocks this capability, it is unclear how much of a gain or loss this two-step pipeline gives over VLMs which integrate images earlier into the training process. To investigate this, we train models spanning various datasets, scales, image…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: ICLR 2025

  15. arXiv:2501.08341  [pdf, ps, other]

    cond-mat.dis-nn cond-mat.stat-mech cs.LG physics.comp-ph

    Dissecting a Small Artificial Neural Network

    Authors: Xiguang Yang, Krish Arora, Michael Bachmann

    Abstract: We investigate the loss landscape and backpropagation dynamics of convergence for the simplest possible artificial neural network representing the logical exclusive-OR (XOR) gate. Cross-sections of the loss landscape in the nine-dimensional parameter space are found to exhibit distinct features, which help understand why backpropagation efficiently achieves convergence toward zero loss, whereas va… (See the illustrative sketch after this entry.)

    Submitted 3 January, 2025; originally announced January 2025.

    Comments: 12 pages, 8 figures, and 2 tables

    Journal ref: J. Phys. A: Math. Theor. 58 025001(1-18) (2025)
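
    For readers counting the nine parameters mentioned above: the smallest standard XOR network has two inputs, two sigmoid hidden units, and one sigmoid output, giving 4 + 2 hidden weights and biases plus 2 + 1 output weights and biases. The sketch below trains such a network with plain NumPy backpropagation; the learning rate and iteration count are arbitrary choices, and depending on the random initialization training may stall away from zero loss, which is part of the behaviour the paper dissects.

        # Illustrative sketch: the 2-2-1 sigmoid XOR network with 9 trainable parameters.
        import numpy as np

        rng = np.random.default_rng(0)
        X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
        y = np.array([[0], [1], [1], [0]], dtype=float)

        W1, b1 = rng.normal(size=(2, 2)), np.zeros((1, 2))   # 4 + 2 parameters
        W2, b2 = rng.normal(size=(2, 1)), np.zeros((1, 1))   # 2 + 1 parameters

        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

        lr = 1.0
        for step in range(20000):
            h = sigmoid(X @ W1 + b1)      # hidden layer
            out = sigmoid(h @ W2 + b2)    # output layer
            # Backpropagation of mean-squared error through both sigmoid layers.
            d_out = (out - y) * out * (1 - out)
            d_h = (d_out @ W2.T) * h * (1 - h)
            W2 -= lr * (h.T @ d_out)
            b2 -= lr * d_out.sum(axis=0, keepdims=True)
            W1 -= lr * (X.T @ d_h)
            b1 -= lr * d_h.sum(axis=0, keepdims=True)

        print(np.round(out, 2).ravel())  # often approaches [0, 1, 1, 0]; some seeds stall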

  16. arXiv:2406.11794  [pdf, other]

    cs.LG cs.CL

    DataComp-LM: In search of the next generation of training sets for language models

    Authors: Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner , et al. (34 additional authors not shown)

    Abstract: We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with dat…

    Submitted 21 April, 2025; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Project page: https://www.datacomp.ai/dclm/

  17. arXiv:2405.06640  [pdf, other]

    cs.CL

    Linearizing Large Language Models

    Authors: Jean Mercat, Igor Vasiljevic, Sedrick Keh, Kushal Arora, Achal Dave, Adrien Gaidon, Thomas Kollar

    Abstract: Linear transformers have emerged as a subquadratic-time alternative to softmax attention and have garnered significant interest due to their fixed-size recurrent state that lowers inference cost. However, their original formulation suffers from poor scaling and underperforms compute-matched transformers. Recent linear models such as RWKV and Mamba have attempted to address these shortcomings by pr… (See the illustrative sketch after this entry.)

    Submitted 10 May, 2024; originally announced May 2024.
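
    The fixed-size recurrent state mentioned in the abstract is easiest to see in the basic linear-attention recurrence S_t = S_{t-1} + phi(k_t) v_t^T with output o_t = phi(q_t) S_t / (phi(q_t) · z_t). The sketch below implements that generic recurrence in NumPy; it is not the specific linearization procedure studied in the paper, and the feature map phi used here is an arbitrary choice.

        # Illustrative sketch: causal linear attention with a fixed-size recurrent state.
        import numpy as np

        def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
            d_k, d_v = Q.shape[1], V.shape[1]
            S = np.zeros((d_k, d_v))   # running sum of phi(k) v^T  (fixed size)
            z = np.zeros(d_k)          # running sum of phi(k)      (normalizer)
            outputs = []
            for q, k, v in zip(Q, K, V):
                S += np.outer(phi(k), v)
                z += phi(k)
                outputs.append(phi(q) @ S / (phi(q) @ z))
            return np.stack(outputs)

        T, d = 16, 8
        rng = np.random.default_rng(0)
        out = linear_attention(rng.normal(size=(T, d)), rng.normal(size=(T, d)), rng.normal(size=(T, d)))
        print(out.shape)  # (16, 8)

    Because S and z keep the same shape regardless of sequence length, per-token cost stays constant, which is the inference advantage the abstract refers to.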

  18. arXiv:2405.04829  [pdf, other]

    cs.CL

    Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages

    Authors: Sankalp Bahad, Pruthwik Mishra, Karunesh Arora, Rakesh Chandra Balabantaray, Dipti Misra Sharma, Parameswari Krishnamurthy

    Abstract: Named Entity Recognition (NER) is a useful component in Natural Language Processing (NLP) applications. It is used in various tasks such as Machine Translation, Summarization, Information Retrieval, and Question-Answering systems. The research on NER is centered around English and some other major languages, whereas limited attention has been given to Indian languages. We analyze the challenges an…

    Submitted 10 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: 8 pages, accepted in NAACL-SRW, 2024

  19. arXiv:2404.07225  [pdf]

    q-fin.ST cs.AI cs.LG

    Unveiling the Impact of Macroeconomic Policies: A Double Machine Learning Approach to Analyzing Interest Rate Effects on Financial Markets

    Authors: Anoop Kumar, Suresh Dodda, Navin Kamuni, Rajeev Kumar Arora

    Abstract: This study examines the effects of macroeconomic policies on financial markets using a novel approach that combines Machine Learning (ML) techniques and causal inference. It focuses on the effect of interest rate changes made by the US Federal Reserve System (FRS) on the returns of fixed income and equity funds between January 1986 and December 2021. The analysis makes a distinction between active…

    Submitted 30 March, 2024; originally announced April 2024.

  20. arXiv:2402.12366  [pdf, other]

    cs.LG cs.AI cs.CL

    A Critical Evaluation of AI Feedback for Aligning Large Language Models

    Authors: Archit Sharma, Sedrick Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar

    Abstract: Reinforcement learning with AI feedback (RLAIF) is a popular paradigm for improving the instruction-following abilities of powerful pre-trained language models. RLAIF first performs supervised fine-tuning (SFT) using demonstrations from a teacher model and then further fine-tunes the model with reinforcement learning (RL), using feedback from a critic model. While recent popular open-source models…

    Submitted 19 February, 2024; originally announced February 2024.

  21. arXiv:2302.06784  [pdf, other]

    cs.CL

    The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis and Algorithm for Robust Natural Language Generation

    Authors: Kushal Arora, Timothy J. O'Donnell, Doina Precup, Jason Weston, Jackie C. K. Cheung

    Abstract: State-of-the-art language generation models can degenerate when applied to open-ended generation problems such as text completion, story generation, or dialog modeling. This degeneration usually shows up in the form of incoherence, lack of vocabulary diversity, and self-repetition or copying from the context. In this paper, we postulate that "human-like" generations usually lie in a narrow and n… (See the illustrative sketch after this entry.)

    Submitted 13 February, 2023; originally announced February 2023.
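
    The quantity underlying the analysis above is the entropy of the model's next-token distribution at each decoding step. The sketch below computes that per-step entropy and applies a hypothetical rule that samples instead of acting greedily when entropy falls below a floor; the paper's actual entropy-aware decoding algorithm and its entropy bounds are not reproduced here, and the threshold value is arbitrary.

        # Illustrative sketch: monitoring per-step next-token entropy during decoding.
        import numpy as np

        def softmax(logits):
            e = np.exp(logits - logits.max())
            return e / e.sum()

        def step_entropy(logits):
            p = softmax(logits)
            return float(-(p * np.log(p + 1e-12)).sum())

        def choose_token(logits, lower_bound=1.0, rng=np.random.default_rng(0)):
            # Hypothetical rule: greedy while entropy stays above a floor, otherwise
            # sample to move away from the low-entropy regime associated with repetition.
            p = softmax(logits)
            if step_entropy(logits) >= lower_bound:
                return int(p.argmax())
            return int(rng.choice(len(p), p=p))

        logits = np.random.default_rng(1).normal(size=50)
        print(step_entropy(logits), choose_token(logits))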

  22. arXiv:2302.06568  [pdf, other]

    cs.CV cs.AI

    Comp2Comp: Open-Source Body Composition Assessment on Computed Tomography

    Authors: Louis Blankemeier, Arjun Desai, Juan Manuel Zambrano Chaves, Andrew Wentland, Sally Yao, Eduardo Reis, Malte Jensen, Bhanushree Bahl, Khushboo Arora, Bhavik N. Patel, Leon Lenchik, Marc Willis, Robert D. Boutin, Akshay S. Chaudhari

    Abstract: Computed tomography (CT) is routinely used in clinical practice to evaluate a wide variety of medical conditions. While CT scans provide diagnoses, they also offer the ability to extract quantitative body composition metrics to analyze tissue volume and quality. Extracting quantitative body composition measures manually from CT scans is a cumbersome and time-consuming task. Proprietary software ha…

    Submitted 13 February, 2023; originally announced February 2023.

  23. arXiv:2301.10165  [pdf, other]

    cs.CL cs.AI

    Lexi: Self-Supervised Learning of the UI Language

    Authors: Pratyay Banerjee, Shweti Mahajan, Kushal Arora, Chitta Baral, Oriana Riva

    Abstract: Humans can learn to operate the user interface (UI) of an application by reading an instruction manual or how-to guide. Along with text, these resources include visual content such as UI screenshots and images of application icons referenced in the text. We explore how to leverage this data to learn generic visio-linguistic representations of UI screens and their components. These representations…

    Submitted 23 January, 2023; originally announced January 2023.

    Comments: EMNLP (Findings) 2022

  24. arXiv:2208.03270  [pdf, other]

    cs.CL cs.AI

    Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback

    Authors: Jing Xu, Megan Ung, Mojtaba Komeili, Kushal Arora, Y-Lan Boureau, Jason Weston

    Abstract: Frozen models trained to mimic static datasets can never improve their performance. Models that can employ internet-retrieval for up-to-date information and obtain feedback from humans during deployment provide the promise of both adapting to new information, and improving their performance. In this work we study how to improve internet-driven conversational skills in such a learning framework. We…

    Submitted 16 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  25. arXiv:2208.03188  [pdf, other]

    cs.CL cs.AI

    BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

    Authors: Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, William Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston

    Abstract: We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user defined tasks. We release both the model weights and code, and have also deployed the model on a public web page to interact with organic users. This technical report describes how the model was built (arc…

    Submitted 10 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  26. arXiv:2206.07694  [pdf, other]

    cs.CL

    DIRECTOR: Generator-Classifiers For Supervised Language Modeling

    Authors: Kushal Arora, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Current language models achieve low perplexity but their resulting generations still suffer from toxic responses, repetitiveness and contradictions. The standard language modeling setup fails to address these issues. In this paper, we introduce a new architecture, Director, that consists of a unified generator-classifier with both a language modeling and a classification head for each output… (See the illustrative sketch after this entry.)

    Submitted 25 November, 2022; v1 submitted 15 June, 2022; originally announced June 2022.
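
    The unified generator-classifier described above pairs a language-modeling head with a per-token classification head. The sketch below shows one plausible way such heads could be combined at decoding time, adding the classifier's per-token log-probability of a "desirable" continuation to the LM log-probabilities; the head shapes, mixing rule, and weight gamma are assumptions, not the paper's exact formulation.

        # Illustrative sketch: mixing an LM head and a per-token classifier head at decode time.
        import torch
        import torch.nn.functional as F

        def combined_next_token_scores(hidden, lm_head, cls_head, gamma=1.0):
            """hidden: (d_model,) decoder state at the current position.
            lm_head, cls_head: linear maps from d_model to vocabulary size.
            Returns log p_LM(token) + gamma * log p_cls(token is desirable)."""
            lm_logprobs = F.log_softmax(lm_head(hidden), dim=-1)
            cls_logprob_pos = F.logsigmoid(cls_head(hidden))   # per-token "desirable" score
            return lm_logprobs + gamma * cls_logprob_pos

        d_model, vocab = 64, 100
        hidden = torch.randn(d_model)
        lm_head = torch.nn.Linear(d_model, vocab)
        cls_head = torch.nn.Linear(d_model, vocab)
        scores = combined_next_token_scores(hidden, lm_head, cls_head)
        print(int(scores.argmax()))  # index of the highest-scoring next token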

  27. arXiv:2204.01171  [pdf, other]

    cs.CL cs.AI cs.LG

    Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation

    Authors: Kushal Arora, Layla El Asri, Hareesh Bahuleyan, Jackie Chi Kit Cheung

    Abstract: Current language generation models suffer from issues such as repetition, incoherence, and hallucinations. An often-repeated hypothesis is that this brittleness of generation models is caused by the training and the generation procedure mismatch, also referred to as exposure bias. In this paper, we verify this hypothesis by analyzing exposure bias from an imitation learning perspective. We show th…

    Submitted 9 January, 2023; v1 submitted 3 April, 2022; originally announced April 2022.

    Comments: Accepted in Findings of ACL 2022. v2: Equation 7 updated, typo fixes

  28. arXiv:2105.03826  [pdf]

    cs.CV

    A Hybrid Model for Combining Neural Image Caption and k-Nearest Neighbor Approach for Image Captioning

    Authors: Kartik Arora, Ajul Raj, Arun Goel, Seba Susan

    Abstract: A hybrid model is proposed that integrates two popular image captioning methods to generate a text-based summary describing the contents of the image. The two image captioning models are the Neural Image Caption (NIC) and the k-nearest neighbor approach. These are trained individually on the training set. We extract a set of five features, from the validation set, for evaluating the results of the…

    Submitted 8 May, 2021; originally announced May 2021.

    Comments: Included in Proceedings of 3rd ICSCSP 2020

  29. arXiv:1701.08329  [pdf]

    cs.CY

    An Exploratory Study on the Implementation and Adoption of ERP Solutions for Businesses

    Authors: Emre Erturk, Jitesh Kumar Arora

    Abstract: Enterprise Resource Planning (ERP) systems have been covered in both mainstream Information Technology (IT) periodicals, and in academic literature, as a result of extensive adoption by organisations in the last two decades. Some of the past studies have reported operational efficiency and other gains, while other studies have pointed out the challenges. ERP systems continue to evolve, moving into…

    Submitted 28 January, 2017; originally announced January 2017.

  30. arXiv:1604.00100  [pdf, other]

    cs.CL

    A Compositional Approach to Language Modeling

    Authors: Kushal Arora, Anand Rangarajan

    Abstract: Traditional language models treat language as a finite state automaton on a probability space over words. This is a very strong assumption when modeling something inherently complex such as language. In this paper, we challenge this by showing how the linear chain assumption inherent in previous work can be translated into a sequential composition tree. We then propose a new model that marginalize…

    Submitted 31 March, 2016; originally announced April 2016.

    Comments: submitted to ACL 2016

  31. arXiv:1601.00248  [pdf, other]

    cs.CL

    Contrastive Entropy: A new evaluation metric for unnormalized language models

    Authors: Kushal Arora, Anand Rangarajan

    Abstract: Perplexity (per word) is the most widely used metric for evaluating language models. Despite this, there has been no dearth of criticism for this metric. Most of these criticisms center around lack of correlation with extrinsic metrics like word error rate (WER), dependence upon shared vocabulary for model comparison and unsuitability for unnormalized language model evaluation. In this paper, we a… (See the illustrative sketch after this entry.)

    Submitted 31 March, 2016; v1 submitted 3 January, 2016; originally announced January 2016.

    Comments: submitted to INTERSPEECH 2016
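
    For context on the baseline being criticized: per-word perplexity is the exponential of the average negative log-probability the model assigns to the test words. The sketch below computes it from a list of log-probabilities; it shows only the standard definition, not the contrastive entropy metric proposed in the paper, and the function name is made up for the example.

        # Illustrative sketch: standard per-word perplexity from model log-probabilities.
        import math

        def perplexity(word_logprobs):
            """word_logprobs: natural-log probabilities the model assigns to each word
            of a test corpus. Perplexity = exp(-(1/N) * sum_i log p(w_i | context))."""
            n = len(word_logprobs)
            return math.exp(-sum(word_logprobs) / n)

        # Example: a 4-word test sequence where the model assigns each word probability 0.1
        print(perplexity([math.log(0.1)] * 4))  # approximately 10.0

    Because this definition requires probabilities that are normalized over the vocabulary, it cannot be applied directly to unnormalized models, which is the gap the abstract points to.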