
Showing 1–50 of 73 results for author: Erdem, A

  1. arXiv:2604.15210  [pdf, ps, other]

    cs.AI cs.CL

    Learning to Think Like a Cartoon Captionist: Incongruity-Resolution Supervision for Multimodal Humor Understanding

    Authors: Hatice Merve Vural, Doga Kukul, Ege Erdem Ozlu, Demir Ekin Arikan, Bob Mankoff, Erkut Erdem, Aykut Erdem

    Abstract: Humor is one of the few cognitive tasks where getting the reasoning right matters as much as getting the answer right. While recent work evaluates humor understanding on benchmarks such as the New Yorker Cartoon Caption Contest (NYCC), it largely treats it as black-box prediction, overlooking the structured reasoning processes underlying humor comprehension. We introduce IRS (Incongruity-Resolutio…

    Submitted 16 April, 2026; originally announced April 2026.

  2. arXiv:2602.21854  [pdf, ps, other]

    cs.CL

    FewMMBench: A Benchmark for Multimodal Few-Shot Learning

    Authors: Mustafa Dogan, Ilker Kesen, Iacer Calixto, Aykut Erdem, Erkut Erdem

    Abstract: As multimodal large language models (MLLMs) advance in handling interleaved image-text data, assessing their few-shot learning capabilities remains an open challenge. In this paper, we introduce FewMMBench, a comprehensive benchmark designed to evaluate MLLMs under few-shot conditions, with a focus on In-Context Learning (ICL) and Chain-of-Thought (CoT) prompting. Covering a diverse suite of multi…

    Submitted 25 February, 2026; originally announced February 2026.

    Comments: Preprint. 49 pages, 38 Figures, 5 Tables

  3. arXiv:2512.03619  [pdf, ps, other]

    cs.CV

    LAMP: Language-Assisted Motion Planning for Controllable Video Generation

    Authors: Muhammed Burak Kizil, Enes Sanli, Niloy J. Mitra, Erkut Erdem, Aykut Erdem, Duygu Ceylan

    Abstract: Video generation has achieved remarkable progress in visual fidelity and controllability, enabling conditioning on text, layout, or motion. Among these, motion control - specifying object dynamics and camera trajectories - is essential for composing complex, cinematic scenes, yet existing interfaces remain limited. We introduce LAMP that leverages large language models (LLMs) as motion planners to…

    Submitted 29 March, 2026; v1 submitted 3 December, 2025; originally announced December 2025.

    Comments: CVPR 2026. Project Page: https://cyberiada.github.io/LAMP/

  4. arXiv:2508.20221  [pdf, ps, other]

    cs.CV

    Spherical Vision Transformers for Audio-Visual Saliency Prediction in 360-Degree Videos

    Authors: Mert Cokelek, Halit Ozsoy, Nevrez Imamoglu, Cagri Ozcinar, Inci Ayhan, Erkut Erdem, Aykut Erdem

    Abstract: Omnidirectional videos (ODVs) are redefining viewer experiences in virtual reality (VR) by offering an unprecedented full field-of-view (FOV). This study extends the domain of saliency prediction to 360-degree environments, addressing the complexities of spherical distortion and the integration of spatial audio. Contextually, ODVs have transformed user experience by adding a spatial audio dimensio…

    Submitted 27 August, 2025; originally announced August 2025.

    Comments: Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI)

  5. arXiv:2508.16431  [pdf, ps, other]

    cs.CL cs.AI

    Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish

    Authors: Yakup Abrek Er, Ilker Kesen, Gözde Gül Şahin, Aykut Erdem

    Abstract: We introduce Cetvel, a comprehensive benchmark designed to evaluate large language models (LLMs) in Turkish. Existing Turkish benchmarks often lack either task diversity or culturally relevant content, or both. Cetvel addresses these gaps by combining a broad range of both discriminative and generative tasks ensuring content that reflects the linguistic and cultural richness of Turkish language. C…

    Submitted 22 August, 2025; originally announced August 2025.

    Comments: 31 pages, 2 figures, 10 tables

    ACM Class: I.2.7

  6. Absolute Parameters of Young Stars: NO Puppis

    Authors: Ahmet Erdem, Volkan Bakış, John Southworth, Michael D. Rhodes, Filiz Kahraman Aliçavuş, Edwin Budding, Mark Blackford, Timothy Banks, Murray Alexander

    Abstract: The southern early-type, young, eccentric-orbit eclipsing binary NO Puppis forms the A component of the multiple star Gaia DR3 5528147999779517568. The B component is an astrometric binary now at a separation of about 8.1 arcsec. There may be other fainter stars in this interesting but complex stellar system. We have combined several lines of evidence, including TESS data from 4 sectors, new gro…

    Submitted 7 August, 2025; originally announced August 2025.

    Comments: 22 pages, 21 figures, 13 tables, accepted for publication in PASA

    Journal ref: Publ. Astron. Soc. Aust. 42 (2025) e120

  7. arXiv:2507.15824  [pdf, ps, other]

    cs.CV

    Can Your Model Separate Yolks with a Water Bottle? Benchmarking Physical Commonsense Understanding in Video Generation Models

    Authors: Enes Sanli, Baris Sarper Tezcan, Aykut Erdem, Erkut Erdem

    Abstract: Recent progress in text-to-video (T2V) generation has enabled the synthesis of visually compelling and temporally coherent videos from natural language. However, these models often fall short in basic physical commonsense, producing outputs that violate intuitive expectations around causality, object behavior, and tool use. Addressing this gap, we present PhysVidBench, a benchmark designed to eval…

    Submitted 21 July, 2025; originally announced July 2025.

  8. arXiv:2506.21681  [pdf, ps, other]

    cs.CV cs.LG

    TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation

    Authors: Hakan Çapuk, Andrew Bond, Muhammed Burak Kızıl, Emir Göçen, Erkut Erdem, Aykut Erdem

    Abstract: Recent advances in image generation have led to remarkable improvements in synthesizing perspective images. However, these models still struggle with panoramic image generation due to unique challenges, including varying levels of geometric distortion and the requirement for seamless loop-consistency. To address these issues while leveraging the strengths of the existing models, we introduce TanDi…

    Submitted 26 June, 2025; originally announced June 2025.

  9. arXiv:2506.15339  [pdf, ps, other]

    cs.CL

    DeVisE: Behavioral Testing of Medical Large Language Models

    Authors: Camila Zurdo Tagliabue, Heloisa Oss Boll, Aykut Erdem, Erkut Erdem, Iacer Calixto

    Abstract: Large language models (LLMs) are increasingly applied in clinical decision support, yet current evaluations rarely reveal whether their outputs reflect genuine medical reasoning or superficial correlations. We introduce DeVisE (Demographics and Vital signs Evaluation), a behavioral testing framework that probes fine-grained clinical understanding through controlled counterfactuals. Using intensive…

    Submitted 26 February, 2026; v1 submitted 18 June, 2025; originally announced June 2025.

    Comments: Camera-ready version published at Findings of EACL 2026

  10. arXiv:2501.10144  [pdf, other]

    cs.CV

    A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features

    Authors: Enes Karanfil, Nevrez Imamoglu, Erkut Erdem, Aykut Erdem

    Abstract: Scene understanding in remote sensing often faces challenges in generating accurate representations for complex environments such as various land use areas or coastal regions, which may also include snow, clouds, or haze. To address this, we present a vision-language framework named Spectral LLaVA, which integrates multispectral data with vision-language alignment techniques to enhance scene repre…

    Submitted 17 January, 2025; originally announced January 2025.

  11. arXiv:2501.04782  [pdf, other]

    cs.CV

    GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting

    Authors: Andrew Bond, Jui-Hsien Wang, Long Mai, Erkut Erdem, Aykut Erdem

    Abstract: Efficient neural representations for dynamic video scenes are critical for applications ranging from video compression to interactive simulations. Yet, existing methods often face challenges related to high memory usage, lengthy training times, and temporal consistency. To address these issues, we introduce a novel neural video representation that combines 3D Gaussian splatting with continuous cam…

    Submitted 8 January, 2025; originally announced January 2025.

    Comments: 10 pages, 10 figures

  12. arXiv:2411.12832  [pdf, other]

    cs.CV

    HyperGAN-CLIP: A Unified Framework for Domain Adaptation, Image Synthesis and Manipulation

    Authors: Abdul Basit Anees, Ahmet Canberk Baykal, Muhammed Burak Kizil, Duygu Ceylan, Erkut Erdem, Aykut Erdem

    Abstract: Generative Adversarial Networks (GANs), particularly StyleGAN and its variants, have demonstrated remarkable capabilities in generating highly realistic images. Despite their success, adapting these models to diverse tasks such as domain adaptation, reference-guided synthesis, and text-guided manipulation with limited training data remains challenging. Towards this end, in this study, we present a…

    Submitted 19 November, 2024; originally announced November 2024.

    Comments: Accepted for publication in SIGGRAPH Asia 2024. Project Website: https://cyberiada.github.io/HyperGAN-CLIP/

  13. arXiv:2410.19164  [pdf, other]

    cs.CV

    HUE Dataset: High-Resolution Event and Frame Sequences for Low-Light Vision

    Authors: Burak Ercan, Onur Eker, Aykut Erdem, Erkut Erdem

    Abstract: Low-light environments pose significant challenges for image enhancement methods. To address these challenges, in this work, we introduce the HUE dataset, a comprehensive collection of high-resolution event and frame sequences captured in diverse and challenging low-light conditions. Our dataset includes 106 sequences, encompassing indoor, cityscape, twilight, night, driving, and controlled scenar…

    Submitted 24 October, 2024; originally announced October 2024.

    Comments: 18 pages, 4 figures. Has been accepted for publication at the European Conference on Computer Vision Workshops (ECCVW), Milano, 2024. The project page can be found at https://ercanburak.github.io/HUE.html

  14. arXiv:2409.17303  [pdf, other]

    astro-ph.SR

    Comparative study of the W UMa type binaries S Ant and Epsilon CrA

    Authors: Volkan Bakis, Edwin Budding, Ahmet Erdem, Tom Love, Mark G. Blackford, Wu Zihao, Adam Tang, Michael D. Rhodes, Timothy S. Banks

    Abstract: Contact binaries challenge contemporary stellar astrophysics with respect to their incidence, structure and evolution. We explore these issues through a detailed study of two bright examples: S Ant and Eps CrA, that permit high-resolution spectroscopy at a relatively good S/N ratio. The availability of high-quality photometry, including data from the TESS satellite as well as Gaia parallaxes, allo…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: This paper has been accepted for publication in PASA. 28 pages, 13 figures, 16 tables

  15. Modelling of eclipsing binary systems with pulsating components and tertiary companions: BF Vel and RR Lep

    Authors: Alexios Liakos, David J. W. Moriarty, Ahmet Erdem, Julian F. West, Phil Evans

    Abstract: This paper presents a comprehensive analysis of RR Lep and BF Vel, two short-period semi-detached oscillating Algols (oEA stars), which are shown to be triple systems. Spectral types of their primaries were determined and radial velocities calculated from spectra observed with the Australian National University's 2.3 m telescope and Wide Field Spectrograph. Spectra of the Na I D doublet confirmed…

    Submitted 16 September, 2024; v1 submitted 6 September, 2024; originally announced September 2024.

    Comments: 22 pages, 21 figures, 8 tables, 3 appendices, Accepted for publication in A&A

    Journal ref: A&A 691, A260 (2024)

  16. arXiv:2408.04658  [pdf, other]

    cs.CL cs.AI cs.LG

    Winning Amazon KDD Cup'24

    Authors: Chris Deotte, Ivan Sorokin, Ahmet Erdem, Benedikt Schifferer, Gilberto Titericz Jr, Simon Jegou

    Abstract: This paper describes the winning solution of all 5 tasks for the Amazon KDD Cup 2024 Multi Task Online Shopping Challenge for LLMs. The challenge was to build a useful assistant, answering questions in the domain of online shopping. The competition contained 57 diverse tasks, covering 5 different task types (e.g. multiple choice) and across 4 different tracks (e.g. multi-lingual). Our solution is…

    Submitted 5 August, 2024; originally announced August 2024.

  17. arXiv:2407.12498  [pdf, other]

    cs.CL cs.CV

    Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning

    Authors: Mustafa Dogan, Ilker Kesen, Iacer Calixto, Aykut Erdem, Erkut Erdem

    Abstract: The linguistic capabilities of Multimodal Large Language Models (MLLMs) are critical for their effective application across diverse tasks. This study aims to evaluate the performance of MLLMs on the VALSE benchmark, focusing on the efficacy of few-shot In-Context Learning (ICL), and Chain-of-Thought (CoT) prompting. We conducted a comprehensive assessment of state-of-the-art MLLMs, varying in mode…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Preprint. 33 pages, 17 Figures, 3 Tables

  18. arXiv:2406.09368  [pdf, other]

    cs.CV

    CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models

    Authors: Yigit Ekin, Ahmet Burak Yildirim, Erdem Eren Caglar, Aykut Erdem, Erkut Erdem, Aysegul Dundar

    Abstract: Advanced image editing techniques, particularly inpainting, are essential for seamlessly removing unwanted elements while preserving visual integrity. Traditional GAN-based methods have achieved notable success, but recent advancements in diffusion models have produced superior results due to their training on large-scale datasets, enabling the generation of remarkably realistic inpainted images.…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://yigitekin.github.io/CLIPAway/

  19. arXiv:2405.00878  [pdf, other]

    cs.CV

    SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models

    Authors: Burak Can Biner, Farrin Marouf Sofian, Umur Berkay Karakaş, Duygu Ceylan, Erkut Erdem, Aykut Erdem

    Abstract: We are witnessing a revolution in conditional image synthesis with the recent success of large scale text-to-image generation methods. This success also opens up new opportunities in controlling the generation and editing process using multi-modal input. While spatial control using cues such as depth, sketch, and other images has attracted a lot of research, we argue that another equally effective…

    Submitted 1 May, 2024; originally announced May 2024.

  20. arXiv:2404.16621  [pdf, other]

    cs.LG cs.AI cs.CL

    Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare

    Authors: Emre Can Acikgoz, Osman Batur İnce, Rayene Bench, Arda Anıl Boz, İlker Kesen, Aykut Erdem, Erkut Erdem

    Abstract: The integration of Large Language Models (LLMs) into healthcare promises to transform medical diagnostics, research, and patient care. Yet, the progression of medical LLMs faces obstacles such as complex training requirements, rigorous evaluation demands, and the dominance of proprietary models that restrict academic exploration. Transparent, comprehensive access to LLM resources is essential for…

    Submitted 25 April, 2024; originally announced April 2024.

  21. arXiv:2404.12013  [pdf, other]

    cs.CL

    Sequential Compositional Generalization in Multimodal Models

    Authors: Semih Yagcioglu, Osman Batur İnce, Aykut Erdem, Erkut Erdem, Desmond Elliott, Deniz Yuret

    Abstract: The rise of large-scale multimodal models has paved the pathway for groundbreaking advances in generative modeling and reasoning, unlocking transformative applications in a variety of complex tasks. However, a pressing question that remains is their genuine capability for stronger forms of generalization, which has been largely underexplored in the multimodal setting. Our study aims to address thi…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted to the main conference of NAACL (2024) as a long paper

  22. arXiv:2311.07022  [pdf, other]

    cs.CL cs.AI cs.CV

    ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models

    Authors: Ilker Kesen, Andrea Pedrotti, Mustafa Dogan, Michele Cafagna, Emre Can Acikgoz, Letitia Parcalabescu, Iacer Calixto, Anette Frank, Albert Gatt, Aykut Erdem, Erkut Erdem

    Abstract: With the ever-increasing popularity of pretrained Video-Language Models (VidLMs), there is a pressing need to develop robust evaluation methodologies that delve deeper into their visio-linguistic capabilities. To address this challenge, we present ViLMA (Video Language Model Assessment), a task-agnostic benchmark that places the assessment of fine-grained capabilities of these models on a firm foo…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: Preprint. 48 pages, 22 figures, 10 tables

  23. arXiv:2310.12118  [pdf, other]

    cs.CL

    Harnessing Dataset Cartography for Improved Compositional Generalization in Transformers

    Authors: Osman Batur İnce, Tanin Zeraati, Semih Yagcioglu, Yadollah Yaghoobzadeh, Erkut Erdem, Aykut Erdem

    Abstract: Neural networks have revolutionized language modeling and excelled in various downstream tasks. However, the extent to which these models achieve compositional generalization comparable to human cognitive abilities remains a topic of debate. While existing approaches in the field have mainly focused on novel architectures and alternative learning paradigms, we introduce a pioneering method harness…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted to Findings of EMNLP 2023

  24. Hyperspectral Image Denoising via Self-Modulating Convolutional Neural Networks

    Authors: Orhan Torun, Seniha Esen Yuksel, Erkut Erdem, Nevrez Imamoglu, Aykut Erdem

    Abstract: Compared to natural images, hyperspectral images (HSIs) consist of a large number of bands, with each band capturing different spectral information from a certain wavelength, even some beyond the visible spectrum. These characteristics of HSIs make them highly effective for remote sensing applications. That said, the existing hyperspectral imaging devices introduce severe degradation in HSIs. Henc…

    Submitted 15 September, 2023; originally announced September 2023.

    Journal ref: Signal Processing, Volume 214, January 2024, 109248

  25. arXiv:2308.13004  [pdf, other]

    cs.CV cs.AI cs.MM

    Spherical Vision Transformer for 360-degree Video Saliency Prediction

    Authors: Mert Cokelek, Nevrez Imamoglu, Cagri Ozcinar, Erkut Erdem, Aykut Erdem

    Abstract: The growing interest in omnidirectional videos (ODVs) that capture the full field-of-view (FOV) has increased the importance of 360-degree saliency prediction in computer vision. However, predicting where humans look in 360-degree scenes presents unique challenges, including spherical distortion, high resolution, and limited labelled data. We propose a novel vision-transformer-based model for omnidirectiona…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: 12 pages, 4 figures, accepted to BMVC 2023

  26. arXiv:2307.08397  [pdf, other]

    cs.CV

    CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing

    Authors: Ahmet Canberk Baykal, Abdul Basit Anees, Duygu Ceylan, Erkut Erdem, Aykut Erdem, Deniz Yuret

    Abstract: Researchers have recently begun exploring the use of StyleGAN-based models for real image editing. One particularly interesting application is using natural language descriptions to guide the editing process. Existing approaches for editing images using language either resort to instance-level latent code optimization or map predefined text prompts to some editing directions in the latent space. H…

    Submitted 18 July, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted for publication in ACM Transactions on Graphics

  27. HyperE2VID: Improving Event-Based Video Reconstruction via Hypernetworks

    Authors: Burak Ercan, Onur Eker, Canberk Saglam, Aykut Erdem, Erkut Erdem

    Abstract: Event-based cameras are becoming increasingly popular for their ability to capture high-speed motion with low latency and high dynamic range. However, generating videos from events remains challenging due to the highly sparse and varying nature of event data. To address this, in this study, we propose HyperE2VID, a dynamic neural network architecture for event-based video reconstruction. Our appro…

    Submitted 20 February, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: 20 pages, 11 figures. Accepted by IEEE Transactions on Image Processing. The project page can be found at https://ercanburak.github.io/HyperE2VID.html

    Journal ref: IEEE Trans. Image Process., 33 (2024), 1826-1837

  28. EVREAL: Towards a Comprehensive Benchmark and Analysis Suite for Event-based Video Reconstruction

    Authors: Burak Ercan, Onur Eker, Aykut Erdem, Erkut Erdem

    Abstract: Event cameras are a new type of vision sensor that incorporates asynchronous and independent pixels, offering advantages over traditional frame-based cameras such as high dynamic range and minimal motion blur. However, their output is not easily understandable by humans, making the reconstruction of intensity images from event streams a fundamental task in event-based vision. While recent deep lea…

    Submitted 5 April, 2024; v1 submitted 30 April, 2023; originally announced May 2023.

    Comments: 19 pages, 9 figures. Has been accepted for publication at the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, 2023. The project page can be found at https://ercanburak.github.io/evreal.html

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 3942-3951. 2023

  29. VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs

    Authors: Moayed Haji Ali, Andrew Bond, Tolga Birdal, Duygu Ceylan, Levent Karacan, Erkut Erdem, Aykut Erdem

    Abstract: We propose $\textbf{VidStyleODE}$, a spatiotemporally continuous disentangled $\textbf{Vid}$eo representation based upon $\textbf{Style}$GAN and Neural-$\textbf{ODE}$s. Effective traversal of the latent space learned by Generative Adversarial Networks (GANs) has been the basis for recent breakthroughs in image editing. However, the applicability of such advancements to the video domain has been hi…

    Submitted 10 March, 2025; v1 submitted 12 April, 2023; originally announced April 2023.

    Comments: Project website: https://cyberiada.github.io/VidStyleODE

    Journal ref: 2023 IEEE/CVF International Conference on Computer Vision (ICCV)

  30. arXiv:2304.03246  [pdf, other]

    cs.CV

    Inst-Inpaint: Instructing to Remove Objects with Diffusion Models

    Authors: Ahmet Burak Yildirim, Vedat Baday, Erkut Erdem, Aykut Erdem, Aysegul Dundar

    Abstract: Image inpainting task refers to erasing unwanted pixels from images and filling them in a semantically consistent and realistic way. Traditionally, the pixels that are wished to be erased are defined with binary masks. From the application point of view, a user needs to generate the masks for the objects they would like to remove which can be time-consuming and prone to errors. In this work, we ar…

    Submitted 9 August, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

  31. arXiv:2303.06907  [pdf, other]

    cs.CV eess.IV

    ST360IQ: No-Reference Omnidirectional Image Quality Assessment with Spherical Vision Transformers

    Authors: Nafiseh Jabbari Tofighi, Mohamed Hedi Elfkir, Nevrez Imamoglu, Cagri Ozcinar, Erkut Erdem, Aykut Erdem

    Abstract: Omnidirectional images, aka 360 images, can deliver immersive and interactive visual experiences. As their popularity has increased dramatically in recent years, evaluating the quality of 360 images has become a problem of interest since it provides insights for capturing, transmitting, and consuming this new media. However, directly adapting quality assessment methods proposed for standard natura…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  32. First detailed study of two eccentric eclipsing binaries: TYC 5378-1590-1 and TYC 8378-252-1

    Authors: P. Zasche, D. Sürgit, A. Erdem, C. A. Engelbrecht, F. Marang

    Abstract: Aims: The analysis of combined photometry and spectroscopy of eccentric eclipsing binary systems facilitates the derivation of precise values for the parameters of the component stars and their orbits, thereby providing stringent tests of theories of stellar structure and evolution. In this paper two eccentric eclipsing binary systems, TYC 5378-1590-1 and TYC 8378-252-1, are studied in detail for…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: 11 pages, 11 figures, 4 tables + 2 appendix tables, published in: 2023A&A...670A..39Z

    Journal ref: 2023A&A...670A..39Z

  33. General Terms of All Almost Balancing Numbers of First and Second Type

    Authors: Ahmet Tekcan, Alper Erdem

    Abstract: In this work, we determined the general terms of all almost balancing numbers of first and second type in terms of balancing numbers and conversely we determined the general terms of all balancing numbers in terms of all almost balancing numbers of first and second type. We also set a correspondence between all almost balancing numbers of first and second type and Pell numbers.

    Submitted 19 November, 2022; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: 12 pages

    Report number: 11

    MSC Class: 11B37; 11B39; 11D25

    Journal ref: Communications in Mathematics, Volume 31 (2023), Issue 1 (November 22, 2022) cm:10318

  34. arXiv:2211.04576  [pdf, other]

    cs.CL cs.AI

    Detecting Euphemisms with Literal Descriptions and Visual Imagery

    Authors: İlker Kesen, Aykut Erdem, Erkut Erdem, Iacer Calixto

    Abstract: This paper describes our two-stage system for the Euphemism Detection shared task hosted by the 3rd Workshop on Figurative Language Processing in conjunction with EMNLP 2022. Euphemisms tone down expressions about sensitive or unpleasant issues like addiction and death. The ambiguous nature of euphemistic words or expressions makes it challenging to detect their actual meaning within a context. In…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 7 pages, 1 table, 1 figure. Accepted to the 3rd Workshop on Figurative Language Processing at EMNLP 2022. https://github.com/ilkerkesen/euphemism

  35. arXiv:2211.02980  [pdf, other]

    cs.CV

    Disentangling Content and Motion for Text-Based Neural Video Manipulation

    Authors: Levent Karacan, Tolga Kerimoğlu, İsmail İnan, Tolga Birdal, Erkut Erdem, Aykut Erdem

    Abstract: Giving machines the ability to imagine possible new objects or scenes from linguistic descriptions and produce their realistic renderings is arguably one of the most challenging problems in computer vision. Recent advances in deep generative models have led to new approaches that give promising results towards this goal. In this paper, we introduce a new method called DiCoMoGAN for manipulating vi…

    Submitted 5 November, 2022; originally announced November 2022.

  36. arXiv:2209.08564  [pdf, other]

    cs.CV cs.LG eess.IV eess.SP

    Perception-Distortion Trade-off in the SR Space Spanned by Flow Models

    Authors: Cansu Korkmaz, A. Murat Tekalp, Zafer Dogan, Erkut Erdem, Aykut Erdem

    Abstract: Flow-based generative super-resolution (SR) models learn to produce a diverse set of feasible SR solutions, called the SR space. Diversity of SR solutions increases with the temperature ($τ$) of latent variables, which introduces random variations of texture among sample solutions, resulting in visual artifacts and low fidelity. In this paper, we present a simple but effective image ensembling/fus…

    Submitted 18 September, 2022; originally announced September 2022.

    Comments: 5 pages, 4 figures, accepted for publication in IEEE ICIP 2022 Conference

  37. V410 Puppis: A useful laboratory for early stellar evolution

    Authors: Ahmet Erdem, Derya Surgit, Burcu Ozkardes, Petr Hadrava, Micheal D. Rhodes, Tom Love, Mark G. Blackford, Timothy S. Banks, Edwin Budding

    Abstract: New spectrometric (HERCULES) and ground-based multi-colour photometric data on the multiple star V410 Puppis are combined with satellite photometry (HIPPARCOS and TESS), as well as historic astrometric observations. Absolute parameters for V410 Pup Aab are derived: $M_{Aa}$ = $3.15 \pm 0.10$, $M_{Ab}$ = $1.83 \pm 0.08$ (M$_{\odot}$); $R_{Aa}$ = $2.12 \pm 0.10$, $R_{Ab}$ = $1.52 \pm 0.08$ (R…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: 14 pages, 15 figures, 12 tables. MNRAS (accepted)

  38. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  39. arXiv:2108.02760  [pdf, other]

    cs.CV

    SLAMP: Stochastic Latent Appearance and Motion Prediction

    Authors: Adil Kaan Akan, Erkut Erdem, Aykut Erdem, Fatma Güney

    Abstract: Motion is an important cue for video prediction and often utilized by separating video content into static and dynamic components. Most of the previous work utilizing motion is deterministic but there are stochastic methods that can model the inherent uncertainty of the future. Existing stochastic models either do not reason about motion explicitly or make limiting assumptions about the static par…

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: ICCV 2021

  40. Absolute Parameters of Young Stars: PU Pup

    Authors: A. Erdem, D. Surgit, T. S. Banks, B. Ozkardes, E. Budding

    Abstract: We present combined photometric and spectroscopic analyses of the southern binary star PU Pup. High-resolution spectra of this system were taken at the University of Canterbury Mt. John Observatory in the years 2008 and again in 2014-15. We find the light contribution of the secondary component to be only $\sim$2\% of the total light of the system in optical wavelengths, resulting in a single-line…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: Accepted by Research in Astronomy and Astrophysics

  41. The ultra-hot-Jupiter KELT-16 b: Dynamical Evolution and Atmospheric Properties

    Authors: L. Mancini, J. Southworth, L. Naponiello, O. Basturk, D. Barbato, F. Biagiotti, I. Bruni, L. Cabona, G. D'Ago, M. Damasso, A. Erdem, D. Evans, Th. Henning, O. Ozturk, D. Ricci, A. Sozzetti, J. Tregloan-Reed, S. Yalcinkayaz

    Abstract: We present broad-band photometry of 30 planetary transits of the ultra-hot Jupiter KELT-16b, using five medium-class telescopes. The transits were monitored through standard B, V, R, I filters and four were simultaneously observed from different places, for a total of 36 new light curves. We used these new photometric data and those from the TESS space telescope to review the main physical propert…

    Submitted 5 November, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: 17 pages, 16 figures, Accepted for publication in Monthly Notices of the Royal Astronomical Society

    Journal ref: MNRAS 509, 1447-1464 (2022)

  42. arXiv:2102.07682  [pdf, other]

    cs.CV

    A Gated Fusion Network for Dynamic Saliency Prediction

    Authors: Aysun Kocak, Erkut Erdem, Aykut Erdem

    Abstract: Predicting saliency in videos is a challenging problem due to complex modeling of interactions between spatial and temporal information, especially when the ever-changing, dynamic nature of videos is considered. Recently, researchers have proposed large-scale datasets and models that take advantage of deep learning as a way to understand what's important for video saliency. These approaches, however,…

    Submitted 15 February, 2021; originally announced February 2021.

    Comments: Project page: https://hucvl.github.io/GFSalNet/

  43. Object and Relation Centric Representations for Push Effect Prediction

    Authors: Ahmet E. Tekden, Aykut Erdem, Erkut Erdem, Tamim Asfour, Emre Ugur

    Abstract: Pushing is an essential non-prehensile manipulation skill used for tasks ranging from pre-grasp manipulation to scene rearrangement, reasoning about object relations in the scene, and thus pushing actions have been widely studied in robotics. The effective use of pushing actions often requires an understanding of the dynamics of the manipulated objects and adaptation to the discrepancies between p…

    Submitted 22 February, 2023; v1 submitted 3 February, 2021; originally announced February 2021.

    Comments: Project Page: https://fzaero.github.io/push_learning/

  44. Physical parameters of close binary systems: VIII

    Authors: K. Gazeas, S. Zola, A. Liakos, B. Zakrzewski, S. M. Rucinski, J. M. Kreiner, W. Ogloza, M. Drozdz, D. Koziel-Wierzbowska, G. Stachowski, M. Siwak, A. Baran, D. Kjurkchieva, D. Marchev, A. Erdem, S. Szalankiewicz

    Abstract: This paper presents the results of a combined spectroscopic and photometric study of 20 contact binary systems: HV Aqr, OO Aql, FI Boo, TX Cnc, OT Cnc, EE Cet, RW Com, KR Com, V401 Cyg, V345 Gem, AK Her, V502 Oph, V566 Oph, V2612 Oph, V1363 Ori, V351 Peg, V357 Peg, Y Sex, V1123 Tau and W UMa, which was conducted in the frame of the W UMa Project. Together with 51 already covered by the project and…

    Submitted 26 January, 2021; originally announced January 2021.

    Comments: 25 pages, 6 figures, 7 tables

    Journal ref: Monthly Notices of the Royal Astronomical Society, 2021, Volume 501, Issue 2, pp.2897-2919

  45. arXiv:2101.10044  [pdf, other]

    cs.CL cs.CV

    Cross-lingual Visual Pre-training for Multimodal Machine Translation

    Authors: Ozan Caglayan, Menekse Kuyu, Mustafa Sercan Amac, Pranava Madhyastha, Erkut Erdem, Aykut Erdem, Lucia Specia

    Abstract: Pre-trained language models have been shown to improve performance in many natural language tasks substantially. Although the early focus of such models was single language pre-training, recent advances have resulted in cross-lingual and visual pre-training methods. In this paper, we combine these two approaches to learn visually-grounded cross-lingual representations. Specifically, we extend the…

    Submitted 20 April, 2021; v1 submitted 25 January, 2021; originally announced January 2021.

    Comments: Accepted to EACL 2021 (Camera-ready version)

  46. arXiv:2012.07098  [pdf, other]

    cs.CV

    MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish

    Authors: Begum Citamak, Ozan Caglayan, Menekse Kuyu, Erkut Erdem, Aykut Erdem, Pranava Madhyastha, Lucia Specia

    Abstract: Automatic generation of video descriptions in natural language, also called video captioning, aims to understand the visual content of the video and produce a natural language sentence depicting the objects and actions in the scene. This challenging integrated vision and language problem, however, has been predominantly addressed for English. The lack of data and the linguistic properties of other…

    Submitted 13 December, 2020; originally announced December 2020.

  47. arXiv:2012.04293  [pdf, other]

    cs.AI cs.CL cs.CV

    CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions

    Authors: Tayfun Ates, M. Samil Atesoglu, Cagatay Yigit, Ilker Kesen, Mert Kobas, Erkut Erdem, Aykut Erdem, Tilbe Goksun, Deniz Yuret

    Abstract: Humans are able to perceive, understand and reason about causal events. Developing models with similar physical and causal understanding capabilities is a long-standing goal of artificial intelligence. As a step towards this direction, we introduce CRAFT, a new video question answering dataset that requires causal reasoning about physical forces and object interactions. It contains 58K video and q…

    Submitted 1 March, 2022; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: Accepted to Findings of ACL 2022

  48. Burst Photography for Learning to Enhance Extremely Dark Images

    Authors: Ahmet Serdar Karadeniz, Erkut Erdem, Aykut Erdem

    Abstract: Capturing images under extremely low-light conditions poses significant challenges for the standard camera pipeline. Images become too dark and too noisy, which makes traditional enhancement techniques almost impossible to apply. Recently, learning-based approaches have shown very promising results for this task since they have substantially more expressive capabilities to allow for improved quali…

    Submitted 19 November, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: Published in IEEE Transactions on Image Processing

  49. arXiv:2003.12739  [pdf, other]

    cs.CV cs.CL cs.LG

    Modulating Bottom-Up and Top-Down Visual Processing via Language-Conditional Filters

    Authors: İlker Kesen, Ozan Arkan Can, Erkut Erdem, Aykut Erdem, Deniz Yuret

    Abstract: How to best integrate linguistic and perceptual processing in multi-modal tasks that involve language and vision is an important open problem. In this work, we argue that the common practice of using language in a top-down manner, to direct visual attention over high-level visual features, may not be optimal. We hypothesize that the use of language to also condition the bottom-up processing from p…

    Submitted 23 June, 2022; v1 submitted 28 March, 2020; originally announced March 2020.

    Comments: 13 pages, 6 figures, 6 tables. Appeared in MULA Workshop at CVPR 2022

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 4610-4620

  50. arXiv:2003.07823

    cs.CV

    Burst Denoising of Dark Images

    Authors: Ahmet Serdar Karadeniz, Erkut Erdem, Aykut Erdem

    Abstract: Capturing images under extremely low-light conditions poses significant challenges for the standard camera pipeline. Images become too dark and too noisy, which makes traditional image enhancement techniques almost impossible to apply. Very recently, researchers have shown promising results using learning based approaches. Motivated by these ideas, in this paper, we propose a deep learning framewo…

    Submitted 18 June, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

    Comments: This paper has been withdrawn by the authors to be replaced by a new version available at arXiv:2006.09845