Showing 1–13 of 13 results for author: Doshi, D

Searching in archive cs.
  1. arXiv:2601.20999  [pdf, ps, other]

    cs.CR cs.HC

    What Are Brands Telling You About Smishing? A Cross-Industry Evaluation of Customer Guidance

    Authors: Dev Vikesh Doshi, Mehjabeen Tasnim, Fernando Landeros, Chinthagumpala Muni Venkatesh, Daniel Timko, Muhammad Lutfor Rahman

    Abstract: Phishing attacks through text, also known as smishing, are a prevalent type of social engineering tactic in which attackers impersonate brands to deceive victims into providing personal information and/or money. While smishing awareness and cyber education are key methods by which organizations communicate these risks, the guidance itself varies widely. In this paper, we investigate the state…

    Submitted 28 January, 2026; originally announced January 2026.

  2. arXiv:2510.11255  [pdf, ps, other]

    cs.GT

    Sequential Solution Concepts in Cooperative Games with Generalized Characteristic Functions

    Authors: Ashwin Goyal, Drashthi Doshi, Swaprava Nath

    Abstract: Motivated by the fact that the worth of a coalition may depend on the order in which agents arrive, Nowak and Radzik (1994) (NR) introduced cooperative games with generalized characteristic functions. We study such temporal cooperative games (TCGs), where the worth function v is defined on sequences of agents π rather than sets S. This order sensitivity necessitates a re-examination of axioms for…

    Submitted 15 March, 2026; v1 submitted 13 October, 2025; originally announced October 2025.

    Comments: 22 pages, under review

  3. arXiv:2507.13575  [pdf, ps, other]

    cs.LG cs.AI

    Apple Intelligence Foundation Language Models: Tech Report 2025

    Authors: Ethan Li, Anders Boesen Lindbo Larsen, Chen Zhang, Xiyou Zhou, Jun Qin, Dian Ang Yap, Narendran Raghavan, Xuankai Chang, Margit Bowler, Eray Yildiz, John Peebles, Hannah Gillis Coleman, Matteo Ronchi, Peter Gray, Keen You, Anthony Spalvieri-Kruse, Ruoming Pang, Reed Li, Yuli Yang, Emad Soroush, Zhiyun Lu, Crystal Xiao, Rong Situ, Jordan Huffaker, David Griffiths , et al. (373 additional authors not shown)

    Abstract: We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: (i) a 3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and (ii) a scalable server model built on a novel Parallel-Track Mixture-of-Experts (PT-MoE) transform…

    Submitted 27 August, 2025; v1 submitted 17 July, 2025; originally announced July 2025.

  4. arXiv:2505.12010  [pdf, ps, other]

    cs.GT cs.LG cs.MA

    Incentivize Contribution and Learn Parameters Too: Federated Learning with Strategic Data Owners

    Authors: Drashthi Doshi, Aditya Vema Reddy Kesari, Avishek Ghosh, Swaprava Nath, Suhas S Kowshik

    Abstract: Classical federated learning (FL) assumes that the clients have a limited amount of noisy data with which they voluntarily participate and contribute towards learning a global, more accurate model in a principled manner. The learning happens in a distributed fashion without sharing the data with the center. However, these methods do not consider the incentive of an agent for participating and cont…

    Submitted 15 March, 2026; v1 submitted 17 May, 2025; originally announced May 2025.

    Comments: 27 pages, under review
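
    The classical FL baseline this abstract starts from aggregates client updates by data-size-weighted averaging (FedAvg). A minimal sketch of that baseline, not of the paper's incentive mechanism:

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Classical FedAvg: average client parameter vectors, weighted by local dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()  # each client's share of the total data
    return sum(c * np.asarray(w, dtype=float) for c, w in zip(coeffs, client_params))
```

    The paper's contribution is precisely what this sketch omits: here clients are assumed to contribute truthfully, with no strategic choice of how much data to supply.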

  5. arXiv:2502.10390  [pdf, ps, other]

    cs.LG cond-mat.dis-nn cs.CR stat.ML

    (How) Can Transformers Predict Pseudo-Random Numbers?

    Authors: Tao Tao, Darshil Doshi, Dayal Singh Kalra, Tianyu He, Maissam Barkeshli

    Abstract: Transformers excel at discovering patterns in sequential data, yet their fundamental limitations and learning mechanisms remain crucial topics of investigation. In this paper, we study the ability of Transformers to learn pseudo-random number sequences from linear congruential generators (LCGs), defined by the recurrence relation $x_{t+1} = a x_t + c \;\mathrm{mod}\; m$. We find that with sufficie…

    Submitted 8 July, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

    Comments: ICML 2025 (camera-ready version). 10+17 pages, 13+23 figures

    Journal ref: In Forty-second International Conference on Machine Learning (2025)
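
    The LCG recurrence in the abstract is easy to generate directly. A minimal sketch (the parameter values below are illustrative, not the ones studied in the paper):

```python
def lcg_sequence(a, c, m, x0, length):
    """Generate x_0, x_1, ... with x_{t+1} = (a * x_t + c) mod m."""
    xs = [x0 % m]
    for _ in range(length - 1):
        xs.append((a * xs[-1] + c) % m)
    return xs

# Example: lcg_sequence(5, 3, 16, 1, 5) -> [1, 8, 11, 10, 5]
```

    Sequences like these are what the Transformer is trained to continue, with the triple (a, c, m) hidden from the model.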

  6. arXiv:2406.03495  [pdf, other]

    cs.LG cond-mat.dis-nn hep-th math.NT stat.ML

    Grokking Modular Polynomials

    Authors: Darshil Doshi, Tianyu He, Aritra Das, Andrey Gromov

    Abstract: Neural networks readily learn a subset of the modular arithmetic tasks, while failing to generalize on the rest. This limitation persists regardless of the choice of architecture and training strategies. On the other hand, an analytical solution for the weights of Multi-layer Perceptron (MLP) networks that generalize on the modular addition task is known in the literature. In this work, we (i) extend…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 7+4 pages, 3 figures, 2 tables
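
    The modular addition task that the known analytical solution targets is simple to state as data. A minimal sketch of the dataset (the analytical MLP weights themselves are in the paper and are not reproduced here):

```python
import itertools

def modular_addition_dataset(p):
    """All p**2 examples (a, b, (a + b) mod p) of the modular addition task."""
    return [(a, b, (a + b) % p) for a, b in itertools.product(range(p), repeat=2)]
```

    Networks are typically trained on a random fraction of these p**2 examples and tested on the rest, which is where grokking-style delayed generalization shows up.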

  7. arXiv:2406.02550  [pdf, other]

    cs.LG cond-mat.dis-nn hep-th stat.ML

    Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks

    Authors: Tianyu He, Darshil Doshi, Aritra Das, Andrey Gromov

    Abstract: Large language models can solve tasks that were not present in the training set. This capability is believed to be due to in-context learning and skill composition. In this work, we study the emergence of in-context learning and skill composition in a collection of modular arithmetic tasks. Specifically, we consider a finite collection of linear modular functions…

    Submitted 4 November, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Camera-ready version, NeurIPS 2024 (Oral)

  8. arXiv:2404.09886  [pdf, other]

    cs.LG cs.CV

    ReffAKD: Resource-efficient Autoencoder-based Knowledge Distillation

    Authors: Divyang Doshi, Jung-Eun Kim

    Abstract: In this research, we propose an innovative method to boost Knowledge Distillation efficiency without the need for resource-heavy teacher models. Knowledge Distillation trains a smaller ``student'' model with guidance from a larger ``teacher'' model, which is computationally costly. However, the main benefit comes from the soft labels provided by the teacher, helping the student grasp nuanced class…

    Submitted 15 April, 2024; originally announced April 2024.
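
    The soft labels the abstract credits for the student's gains are temperature-softened teacher probabilities. A minimal sketch of the standard (Hinton-style) distillation target and KL loss term, not of ReffAKD's autoencoder-based substitute:

```python
import numpy as np

def soft_labels(logits, temperature=4.0):
    """Temperature-softened softmax: higher T spreads mass onto non-target classes."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_kl(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) between softened distributions -- the KD loss term."""
    p = soft_labels(teacher_logits, temperature)
    q = soft_labels(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

    ReffAKD's point is that the teacher in this loss is expensive; the paper replaces its soft labels with a cheaper autoencoder-derived source.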

  9. arXiv:2310.13061  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets

    Authors: Darshil Doshi, Aritra Das, Tianyu He, Andrey Gromov

    Abstract: Robust generalization is a major challenge in deep learning, particularly when the number of trainable parameters is very large. In general, it is very difficult to know if the network has memorized a particular set of examples or understood the underlying rule (or both). Motivated by this challenge, we study an interpretable model where generalizing representations are understood analytically, an…

    Submitted 4 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: 9+20 pages, 7+25 figures, 2 tables

  10. Parking Spot Classification based on surround view camera system

    Authors: Andy Xiao, Deep Doshi, Lihao Wang, Harsha Gorantla, Thomas Heitzmann, Peter Groth

    Abstract: Surround-view fisheye cameras are commonly used for near-field sensing in automated driving scenarios, including urban driving and auto valet parking. Four fisheye cameras, one on each side, are sufficient to cover 360° around the vehicle capturing the entire near-field region. Based on surround view cameras, there has been much research on parking slot detection with main focus on the occupancy s…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: SPIE Optical Engineering + Applications, 2023, San Diego, California, United States. Proc. SPIE 12675, Applications of Machine Learning 2023

  11. arXiv:2206.13568  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

    AutoInit: Automatic Initialization via Jacobian Tuning

    Authors: Tianyu He, Darshil Doshi, Andrey Gromov

    Abstract: Good initialization is essential for training Deep Neural Networks (DNNs). Oftentimes such initialization is found through a trial-and-error approach, which has to be applied anew every time an architecture is substantially modified, or inherited from smaller networks, leading to sub-optimal initialization. In this work we introduce a new and cheap algorithm that allows one to find a good ini…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: 22 pages, 5 figures
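
    The general idea, tuning the weight scale so a layer's input-output Jacobian neither amplifies nor attenuates signals, can be illustrated with a toy bisection. This is a hedged sketch of the principle for a single tanh layer, not the paper's algorithm:

```python
import numpy as np

def layer_jacobian_scale(sigma_w, width=512, seed=0):
    """Average squared singular value of the Jacobian diag(tanh'(Wx)) @ W at a random point."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
    x = rng.normal(size=width)
    J = np.diag(1.0 - np.tanh(W @ x) ** 2) @ W  # Jacobian of tanh(W x) w.r.t. x
    # ||J||_F^2 / width = mean squared singular value; criticality wants this ~ 1.
    return np.linalg.norm(J, "fro") ** 2 / width

def tune_sigma(lo=0.1, hi=3.0, iters=40):
    """Bisect the weight std toward a unit Jacobian scale (toy tuning loop)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if layer_jacobian_scale(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

    A scale below 1 means gradients shrink layer by layer; above 1, they explode. The bisection picks the std where the layer is (approximately) norm-preserving at this random point.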

  12. arXiv:2111.12143  [pdf, other]

    cs.LG cond-mat.dis-nn hep-th stat.ML

    Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications

    Authors: Darshil Doshi, Tianyu He, Andrey Gromov

    Abstract: Deep neural networks are notorious for defying theoretical treatment. However, when the number of parameters in each layer tends to infinity, the network function is a Gaussian process (GP) and a quantitatively predictive description is possible. Gaussian approximation allows one to formulate criteria for selecting hyperparameters, such as variances of weights and biases, as well as the learning rat…

    Submitted 5 October, 2023; v1 submitted 23 November, 2021; originally announced November 2021.

    Comments: Accepted (spotlight) at NeurIPS2023. Additional ResNet results. 42 pages, 12 figures

  13. arXiv:2007.10571  [pdf, other]

    cs.DC cs.PF

    AI Tax: The Hidden Cost of AI Data Center Applications

    Authors: Daniel Richins, Dharmisha Doshi, Matthew Blackmore, Aswathy Thulaseedharan Nair, Neha Pathapati, Ankit Patel, Brainard Daguman, Daniel Dobrijalowski, Ramesh Illikkal, Kevin Long, David Zimmerman, Vijay Janapa Reddi

    Abstract: Artificial intelligence and machine learning are experiencing widespread adoption in industry and academia. This has been driven by rapid advances in the applications and accuracy of AI through increasingly complex algorithms and models; this, in turn, has spurred research into specialized hardware AI accelerators. Given the rapid pace of advances, it is easy to forget that they are often develope…

    Submitted 20 July, 2020; originally announced July 2020.

    Comments: 32 pages. 16 figures. Submitted to ACM "Transactions on Computer Systems."

    ACM Class: I.2; C.4