
Showing 1–8 of 8 results for author: Heiter, E

Searching in archive cs.
  1. arXiv:2504.11089 [pdf, other]

    cs.LG cs.CV

    InfoClus: Informative Clustering of High-dimensional Data Embeddings

    Authors: Fuyin Lai, Edith Heiter, Guillaume Bied, Jefrey Lijffijt

    Abstract: Developing an understanding of high-dimensional data can be facilitated by visualizing that data using dimensionality reduction. However, the low-dimensional embeddings are often difficult to interpret. To facilitate the exploration and interpretation of low-dimensional embeddings, we introduce a new concept named partitioning with explanations. The idea is to partition the data shown through the…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 17 pages, 9 figures

  2. arXiv:2410.18417 [pdf, other]

    cs.CL cs.LG

    Large Language Models Reflect the Ideology of their Creators

    Authors: Maarten Buyl, Alexander Rogiers, Sander Noels, Guillaume Bied, Iris Dominguez-Catena, Edith Heiter, Iman Johary, Alexandru-Cristian Mara, Raphaël Romero, Jefrey Lijffijt, Tijl De Bie

    Abstract: Large language models (LLMs) are trained on vast amounts of data to generate natural language, enabling them to perform tasks like text summarization and question answering. These models have become popular in artificial intelligence (AI) assistants like ChatGPT and already play an influential role in how humans access information. However, the behavior of LLMs varies depending on their design, tr…

    Submitted 30 January, 2025; v1 submitted 24 October, 2024; originally announced October 2024.

  3. arXiv:2406.12953 [pdf, other]

    cs.GR cs.HC cs.LG

    Pattern or Artifact? Interactively Exploring Embedding Quality with TRACE

    Authors: Edith Heiter, Liesbet Martens, Ruth Seurinck, Martin Guilliams, Tijl De Bie, Yvan Saeys, Jefrey Lijffijt

    Abstract: This paper presents TRACE, a tool to analyze the quality of 2D embeddings generated through dimensionality reduction techniques. Dimensionality reduction methods often prioritize preserving either local neighborhoods or global distances, but insights from visual structures can be misleading if the objective has not been achieved uniformly. TRACE addresses this challenge by providing a scalable and…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 4 pages, 3 figures. Accepted at ECML-PKDD 2024. For a demo video, see https://youtu.be/mtyFzXt51Jw. Code is available at https://github.com/aida-ugent/TRACE

  4. Revised Conditional t-SNE: Looking Beyond the Nearest Neighbors

    Authors: Edith Heiter, Bo Kang, Ruth Seurinck, Jefrey Lijffijt

    Abstract: Conditional t-SNE (ct-SNE) is a recent extension to t-SNE that allows removal of known cluster information from the embedding, to obtain a visualization revealing structure beyond label information. This is useful, for example, when one wants to factor out unwanted differences between a set of classes. We show that ct-SNE fails in many realistic settings, namely if the data is well clustered over…

    Submitted 11 April, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: 20 pages, including supplement

    Journal ref: Advances in Intelligent Data Analysis XXI. IDA 2023. Lecture Notes in Computer Science, vol 13876. Springer, Cham

  5. arXiv:2301.03338 [pdf, other]

    cs.LG

    Topologically Regularized Data Embeddings

    Authors: Edith Heiter, Robin Vandaele, Tijl De Bie, Yvan Saeys, Jefrey Lijffijt

    Abstract: Unsupervised representation learning methods are widely used for gaining insight into high-dimensional, unstructured, or structured data. In some cases, users may have prior topological knowledge about the data, such as a known cluster structure or the fact that the data is known to lie along a tree- or graph-structured topology. However, generic methods to ensure such structure is salient in the…

    Submitted 7 November, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

    Comments: 52 pages, preprint, under review

  6. arXiv:2111.03168 [pdf, other]

    cs.LG

    ExClus: Explainable Clustering on Low-dimensional Data Representations

    Authors: Xander Vankwikelberge, Bo Kang, Edith Heiter, Jefrey Lijffijt

    Abstract: Dimensionality reduction and clustering techniques are frequently used to analyze complex data sets, but their results are often not easy to interpret. We consider how to support users in interpreting apparent cluster structure on scatter plots where the axes are not directly interpretable, such as when the data is projected onto a two-dimensional space using a dimensionality-reduction method. Spe…

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: 15 pages, 7 figures

  7. arXiv:2103.01828 [pdf, other]

    cs.LG stat.ML

    Factoring out prior knowledge from low-dimensional embeddings

    Authors: Edith Heiter, Jonas Fischer, Jilles Vreeken

    Abstract: Low-dimensional embedding techniques such as tSNE and UMAP allow visualizing high-dimensional data and thereby facilitate the discovery of interesting structure. Although they are widely used, they visualize data as is, rather than in light of the background knowledge we have about the data. What we already know, however, strongly determines what is novel and hence interesting. In this paper we…

    Submitted 2 March, 2021; originally announced March 2021.

    Comments: 27 pages, 17 figures
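
    The abstract above describes visualizing high-dimensional data with tSNE before factoring out prior knowledge. As a minimal illustrative sketch only (plain t-SNE via scikit-learn on toy data, not the paper's method), a 2D embedding can be computed like this:

    ```python
    # Minimal sketch: a 2D t-SNE embedding with scikit-learn on toy data.
    # Illustrative only; this is vanilla t-SNE, not the background-knowledge
    # factoring approach proposed in the paper.
    import numpy as np
    from sklearn.manifold import TSNE

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))  # 200 samples of toy 50-dimensional data

    # perplexity must be smaller than the number of samples; 30 is a common default
    embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
    print(embedding.shape)  # (200, 2): one 2D point per input sample
    ```

    Such an embedding visualizes the data "as is"; the paper's contribution is to additionally condition the view on what the analyst already knows.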

  8. Verification of PCP-Related Computational Reductions in Coq

    Authors: Yannick Forster, Edith Heiter, Gert Smolka

    Abstract: We formally verify several computational reductions concerning the Post correspondence problem (PCP) using the proof assistant Coq. Our verifications include a reduction of a string rewriting problem generalising the halting problem for Turing machines to PCP, and reductions of PCP to the intersection problem and the palindrome problem for context-free grammars. Interestingly, rigorous correctness…

    Submitted 18 July, 2018; v1 submitted 19 November, 2017; originally announced November 2017.