Showing 1–12 of 12 results for author: Elmagarmid, A

  1. arXiv:2603.16397  [pdf, ps, other]

    cs.CL cs.AI

    Fanar 2.0: Arabic Generative AI Stack

    Authors: FANAR TEAM, Ummar Abbas, Mohammad Shahmeer Ahmad, Minhaj Ahmad, Abdulaziz Al-Homaid, Anas Al-Nuaimi, Enes Altinisik, Ehsaneddin Asgari, Sanjay Chawla, Shammur Chowdhury, Fahim Dalvi, Kareem Darwish, Nadir Durrani, Mohamed Elfeky, Ahmed Elmagarmid, Mohamed Eltabakh, Asim Ersoy, Masoomali Fatehkia, Mohammed Qusay Hashim, Majd Hawasly, Mohamed Hefeeda, Mus'ab Husaini, Keivin Isufaj, Soon-Gyo Jung, Houssam Lachemat, et al. (12 additional authors not shown)

    Abstract: We present Fanar 2.0, the second generation of Qatar's Arabic-centric Generative AI platform. Sovereignty is a first-class design principle: every component, from data pipelines to deployment infrastructure, was designed and operated entirely at QCRI, Hamad Bin Khalifa University. Fanar 2.0 is a story of resource-constrained excellence: the effort ran on 256 NVIDIA H100 GPUs, with Arabic having on…

    Submitted 17 March, 2026; originally announced March 2026.

  2. arXiv:2504.20047  [pdf, ps, other]

    cs.IR cs.AI cs.DB

    HCT-QA: A Benchmark for Question Answering on Human-Centric Tables

    Authors: Mohammad S. Ahmad, Zan A. Naeem, Michaël Aupetit, Ahmed Elmagarmid, Mohamed Eltabakh, Xiaosong Ma, Mourad Ouzzani, Chaoyi Ruan, Hani Al-Sayeh

    Abstract: Tabular data embedded in PDF files, web pages, and other types of documents is prevalent in various domains. These tables, which we call human-centric tables (HCTs for short), are dense in information but often exhibit complex structural and semantic layouts. To query these HCTs, some existing solutions focus on transforming them into relational formats. However, they fail to handle the diverse an…

    Submitted 5 March, 2026; v1 submitted 9 March, 2025; originally announced April 2025.

  3. arXiv:2501.13944  [pdf, other]

    cs.CL cs.AI

    Fanar: An Arabic-Centric Multimodal Generative AI Platform

    Authors: Fanar Team, Ummar Abbas, Mohammad Shahmeer Ahmad, Firoj Alam, Enes Altinisik, Ehsannedin Asgari, Yazan Boshmaf, Sabri Boughorbel, Sanjay Chawla, Shammur Chowdhury, Fahim Dalvi, Kareem Darwish, Nadir Durrani, Mohamed Elfeky, Ahmed Elmagarmid, Mohamed Eltabakh, Masoomali Fatehkia, Anastasios Fragkopoulos, Maram Hasanain, Majd Hawasly, Mus'ab Husaini, Soon-Gyo Jung, Ji Kim Lucas, Walid Magdy, Safa Messaoud, et al. (17 additional authors not shown)

    Abstract: We present Fanar, a platform for Arabic-centric multimodal generative AI systems, that supports language, speech and image generation tasks. At the heart of Fanar are Fanar Star and Fanar Prime, two highly capable Arabic Large Language Models (LLMs) that are best in the class on well established benchmarks for similar sized models. Fanar Star is a 7B (billion) parameter model that was trained from…

    Submitted 18 January, 2025; originally announced January 2025.

    ACM Class: I.2.0; D.2.0

  4. arXiv:2306.00932  [pdf]

    cs.AI cs.DB

    Cross Modal Data Discovery over Structured and Unstructured Data Lakes

    Authors: Mohamed Y. Eltabakh, Mayuresh Kunjir, Ahmed Elmagarmid, Mohammad Shahmeer Ahmad

    Abstract: Organizations are collecting increasingly large amounts of data for data driven decision making. These data are often dumped into a centralized repository, e.g., a data lake, consisting of thousands of structured and unstructured datasets. Perversely, such mixture of datasets makes the problem of discovering elements (e.g., tables or documents) that are relevant to a user's query or an analytical…

    Submitted 16 July, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Report number: 17

  5. arXiv:1712.09437  [pdf, other]

    cs.DB

    Pattern-Driven Data Cleaning

    Authors: El Kindi Rezig, Mourad Ouzzani, Walid G. Aref, Ahmed K. Elmagarmid, Ahmed R. Mahmood

    Abstract: Data is inherently dirty and there has been a sustained effort to come up with different approaches to clean it. A large class of data repair algorithms rely on data-quality rules and integrity constraints to detect and repair the data. A well-studied class of integrity constraints is Functional Dependencies (FDs, for short) that specify dependencies among attributes in a relation. In this paper,…

    Submitted 26 December, 2017; originally announced December 2017.

  6. arXiv:1712.08971  [pdf, other]

    cs.DB

    Human-Centric Data Cleaning [Vision]

    Authors: El Kindi Rezig, Mourad Ouzzani, Ahmed K. Elmagarmid, Walid G. Aref

    Abstract: Data Cleaning refers to the process of detecting and fixing errors in the data. Human involvement is instrumental at several stages of this process, e.g., to identify and repair errors, to validate computed repairs, etc. There is currently a plethora of data cleaning algorithms addressing a wide range of data errors (e.g., detecting duplicates, violations of integrity constraints, missing values,…

    Submitted 30 December, 2017; v1 submitted 24 December, 2017; originally announced December 2017.

  7. arXiv:1709.10436  [pdf, other]

    cs.DB

    Unsupervised String Transformation Learning for Entity Consolidation

    Authors: Dong Deng, Wenbo Tao, Ziawasch Abedjan, Ahmed Elmagarmid, Guoliang Li, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang

    Abstract: Data integration has been a long-standing challenge in data management with many applications. A key step in data integration is entity consolidation. It takes a collection of clusters of duplicate records as input and produces a single "golden record" for each cluster, which contains the canonical value for each attribute. Truth discovery and data fusion methods, as well as Master Data Management…

    Submitted 30 July, 2018; v1 submitted 29 September, 2017; originally announced September 2017.

  8. arXiv:1610.00192  [pdf, ps, other]

    cs.IR cs.LG

    A large scale study of SVM based methods for abstract screening in systematic reviews

    Authors: Tanay Kumar Saha, Mourad Ouzzani, Hossam M. Hammady, Ahmed K. Elmagarmid, Wajdi Dhifli, Mohammad Al Hasan

    Abstract: A major task in systematic reviews is abstract screening, i.e., excluding, often hundreds or thousands of, irrelevant citations returned from a database search based on titles and abstracts. Thus, a systematic review platform that can automate the abstract screening process is of huge importance. Several methods have been proposed for this task. However, it is very hard to clearly understand the ap…

    Submitted 15 January, 2018; v1 submitted 1 October, 2016; originally announced October 2016.

  9. Impact of Physical Activity on Sleep: A Deep Learning Based Exploration

    Authors: Aarti Sathyanarayana, Shafiq Joty, Luis Fernandez-Luque, Ferda Ofli, Jaideep Srivastava, Ahmed Elmagarmid, Shahrad Taheri, Teresa Arora

    Abstract: The importance of sleep is paramount for maintaining physical, emotional and mental wellbeing. Though the relationship between sleep and physical activity is known to be important, it is not yet fully understood. The explosion in popularity of actigraphy and wearable devices, provides a unique opportunity to understand this relationship. Leveraging this information source requires new tools to be…

    Submitted 24 July, 2016; originally announced July 2016.

    Journal ref: JMIR Mhealth Uhealth 2016;4(4):e125

  10. Robust Automated Human Activity Recognition and its Application to Sleep Research

    Authors: Aarti Sathyanarayana, Ferda Ofli, Luis Fernandes-Luque, Jaideep Srivastava, Ahmed Elmagarmid, Teresa Arora, Shahrad Taheri

    Abstract: Human Activity Recognition (HAR) is a powerful tool for understanding human behaviour. Applying HAR to wearable sensors can provide new insights by enriching the feature set in health studies, and enhance the personalisation and effectiveness of health, wellness, and fitness applications. Wearable devices provide an unobtrusive platform for user monitoring, and due to their increasing market penet…

    Submitted 19 July, 2016; v1 submitted 17 July, 2016; originally announced July 2016.

  11. arXiv:1508.00703  [pdf, other]

    cs.DB cs.LG

    Parameter Database : Data-centric Synchronization for Scalable Machine Learning

    Authors: Naman Goel, Divyakant Agrawal, Sanjay Chawla, Ahmed Elmagarmid

    Abstract: We propose a new data-centric synchronization framework for carrying out of machine learning (ML) tasks in a distributed environment. Our framework exploits the iterative nature of ML algorithms and relaxes the application agnostic bulk synchronization parallel (BSP) paradigm that has previously been used for distributed machine learning. Data-centric synchronization complements function-centric s…

    Submitted 4 August, 2015; originally announced August 2015.

    Report number: QCRI-TR-2015-003

  12. arXiv:1103.3103  [pdf]

    cs.DB

    Guided Data Repair

    Authors: Mohamed Yakout, Ahmed K. Elmagarmid, Jennifer Neville, Mourad Ouzzani, Ihab F. Ilyas

    Abstract: In this paper we present GDR, a Guided Data Repair framework that incorporates user feedback in the cleaning process to enhance and accelerate existing automatic repair techniques while minimizing user involvement. GDR consults the user on the updates that are most likely to be beneficial in improving data quality. GDR also uses machine learning methods to identify and apply the correct updates di…

    Submitted 16 March, 2011; originally announced March 2011.

    Comments: VLDB2011

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 4, No. 5, pp. 279-289 (2011)