doi 10.1038/s41538-026-00764-0

Mapping Regional Disparities in Discounted Grocery Products

Authors: Antonio Desiderio, Alessia Galdeman, Franziska Bauerlein, Sune Lehmann

Abstract: Food waste represents a major challenge to global climate resilience, accounting for almost 10% of annual greenhouse gas emissions. The retail sector is a critical player, mediating product flows between producers and consumers, where supply chain inefficiencies can shape which items are put on sale. Yet how these dynamics vary across geographic contexts remains largely unexplored. Here, we analyze data from Denmark's largest retail group on near-expiry products put on sale. We uncover the geospatial variations using a dual-clustering approach. We characterize multi-scale spatial relationships in retail organization by correlating store clustering -- measured using shortest-path distances along the street network -- with product clustering based on promotion co-occurrence patterns. Using a bipartite network approach, we identify three regional store clusters, and use percolation thresholds to corroborate the scale of their spatial separation. We find that stores in rural communities put meat and dairy products on sale up to 2.2 times more frequently than metropolitan areas. In contrast, metropolitan and capital regions lean toward convenience products, which have more balanced nutritional profiles but less favorable environmental impacts. By linking geographic context to retail inventory, we provide evidence that reducing food waste requires interventions tailored to local retail dynamics, highlighting the importance of region-specific sustainability strategies.

Submitted 7 January, 2026; v1 submitted 31 October, 2025; originally announced October 2025.

Comments: 21 pages, 3 figures

arXiv:2510.23639 [pdf, ps, other]

Integrating Genomics into Multimodal EHR Foundation Models

Authors: Jonathan Amar, Edward Liu, Alessandra Breschi, Liangliang Zhang, Pouya Kheradpour, Sylvia Li, Lisa Soleymani Lehmann, Alessandro Giulianelli, Matt Edwards, Yugang Jia, David Nola, Raghav Mani, Pankaj Vats, Jesse Tetreault, T. J. Chen, Cory Y. McLean

Abstract: This paper introduces an innovative Electronic Health Record (EHR) foundation model that integrates Polygenic Risk Scores (PRS) as a foundational data modality, moving beyond traditional EHR-only approaches to build more holistic health profiles. Leveraging the extensive and diverse data from the All of Us (AoU) Research Program, this multimodal framework aims to learn complex relationships between clinical data and genetic predispositions. The methodology extends advancements in generative AI to the EHR foundation model space, enhancing predictive capabilities and interpretability. Evaluation on AoU data demonstrates the model's predictive value for the onset of various conditions, particularly Type 2 Diabetes (T2D), and illustrates the interplay between PRS and EHR data. The work also explores transfer learning for custom classification tasks, showcasing the architecture's versatility and efficiency. This approach is pivotal for unlocking new insights into disease prediction, proactive health management, risk stratification, and personalized treatment strategies, laying the groundwork for more personalized, equitable, and actionable real-world evidence generation in healthcare.

Submitted 14 November, 2025; v1 submitted 24 October, 2025; originally announced October 2025.

arXiv:2508.16519 [pdf]

The Community Index: A More Comprehensive Approach to Assessing Scholarly Impact

Authors: Arav Kumar, Cameron Sabet, Alessandro Hammond, Amelia Fiske, Bhav Jain, Deirdre Goode, Dharaa Suresha, Leo Anthony Celi, Lisa Soleymani Lehmann, Ned Mccague, Rawan Abulibdeh, Sameer Pradhan

Abstract: The h index is a widely recognized metric for assessing the research impact of scholars, defined as the maximum value h such that the scholar has published h papers each cited at least h times. While it has proven useful for measuring individual scholarly productivity and citation impact, the h index has limitations, such as an inability to account for interdisciplinary collaboration or demographic differences in citation patterns. Moreover, it is sometimes mistakenly treated as a measure of research quality, even though it only reflects how often work has been cited. While metric-based evaluations of research have grown in importance in some areas of academia, such as medicine, these evaluations fail to consider other important aspects of intellectual work, such as representational and epistemic diversity in research. In this article, we propose a new metric called the c index, or the community index, which combines multiple dimensions of scholarly impact. This is important because a plurality of perspectives and lived experiences within author teams can promote epistemological reflection and humility as part of the creation and validation of scientific knowledge. The c index is a means of accounting for the often global and increasingly interdisciplinary nature of contemporary research, in particular, the data that is collected, curated and analyzed in the process of scientific inquiry.
While the c index provides a means of quantifying diversity within research teams, diversity is integral to the advancement of scientific excellence and should be actively fostered through formal recognition and valuation. We herein describe the mathematical foundation of the c index and demonstrate its potential to provide a more comprehensive representation and more multidimensional assessment of scientific contributions of research impact as compared to the h index.

Submitted 22 August, 2025; originally announced August 2025.

Comments: 22 pages, 49 references
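
Note: the h index definition quoted in this abstract (the maximum h such that h papers are each cited at least h times) can be illustrated with a minimal Python sketch. The function name `h_index` and the sample citation counts are illustrative assumptions, not taken from the paper, and the sketch does not attempt the proposed c index, whose formula the abstract does not state.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    # Rank papers from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            # Counts only decrease from here, so no larger h is possible.
            break
    return h


# Hypothetical citation counts for one scholar's five papers.
print(h_index([10, 8, 5, 4, 3]))
```

Sorting first keeps the check simple: once a paper's citation count falls below its rank, no later paper can satisfy the condition.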

arXiv:2508.05488 [pdf, ps, other]

Modeling roles and trade-offs in multiplex networks

Authors: Nikolaos Nakis, Sune Lehmann, Nicholas A. Christakis, Morten Mørup

Abstract: A multiplex social network captures multiple types of social relations among the same set of people, with each layer representing a distinct type of relationship. Understanding the structure of such systems allows us to identify how social exchanges may be driven by a person's own attributes and actions (independence), the status or resources of others (dependence), and mutual influence between entities (interdependence). Characterizing structure in multiplex networks is challenging, as the distinct layers can reflect different yet complementary roles, with interdependence emerging across multiple scales. Here, we introduce the Multiplex Latent Trade-off Model (MLT), a framework for extracting roles in multiplex social networks that accounts for independence, dependence, and interdependence. MLT defines roles as trade-offs, requiring each node to distribute its source and target roles across layers while simultaneously distributing community memberships within hierarchical, multi-scale structures. Applying the MLT approach to 176 real-world multiplex networks, composed of social, health, and economic layers, from villages in western Honduras, we see core social exchange principles emerging, while also revealing local, layer-specific, and multi-scale communities. Link prediction analyses reveal that modeling interdependence yields the greatest performance gains in the social layer, with subtler effects in health and economic layers.
This suggests that social ties are structurally embedded, whereas health and economic ties are primarily shaped by individual status and behavioral engagement. Our findings offer new insights into the structure of human social systems.

Submitted 7 August, 2025; originally announced August 2025.

Comments: Preprint

arXiv:2505.08902 [pdf]

Performance Gains of LLMs With Humans in a World of LLMs Versus Humans

Authors: Lucas McCullum, Pelagie Ami Agassi, Leo Anthony Celi, Daniel K. Ebner, Chrystinne Oliveira Fernandes, Rachel S. Hicklen, Mkliwa Koumbia, Lisa Soleymani Lehmann, David Restrepo

Abstract: Currently, a considerable research effort is devoted to comparing LLMs to a group of human experts, where the term "expert" is often ill-defined or variable, at best, in a state of constantly updating LLM releases. Without proper safeguards in place, LLMs will threaten to cause harm to the established structure of safe delivery of patient care which has been carefully developed throughout history to keep the safety of the patient at the forefront. A key driver of LLM innovation is founded on community research efforts which, if continuing to operate under "humans versus LLMs" principles, will expedite this trend. Therefore, research efforts moving forward must focus on effectively characterizing the safe use of LLMs in clinical settings that persist across the rapid development of novel LLM models. In this communication, we demonstrate that rather than comparing LLMs to humans, there is a need to develop strategies enabling efficient work of humans with LLMs in an almost symbiotic manner.

Submitted 13 May, 2025; originally announced May 2025.

arXiv:2412.01955 [pdf]

The use of large language models to enhance cancer clinical trial educational materials

Authors: Mingye Gao, Aman Varshney, Shan Chen, Vikram Goddla, Jack Gallifant, Patrick Doyle, Claire Novack, Maeve Dillon-Martin, Teresia Perkins, Xinrong Correia, Erik Duhaime, Howard Isenstein, Elad Sharon, Lisa Soleymani Lehmann, David Kozono, Brian Anthony, Dmitriy Dligach, Danielle S. Bitterman

Abstract: Cancer clinical trials often face challenges in recruitment and engagement due to a lack of participant-facing informational and educational resources. This study investigated the potential of Large Language Models (LLMs), specifically GPT4, in generating patient-friendly educational content from clinical trial informed consent forms. Using data from ClinicalTrials.gov, we employed zero-shot learning for creating trial summaries and one-shot learning for developing multiple-choice questions, evaluating their effectiveness through patient surveys and crowdsourced annotation. Results showed that GPT4-generated summaries were both readable and comprehensive, and may improve patients' understanding and interest in clinical trials. The multiple-choice questions demonstrated high accuracy and agreement with crowdsourced annotators. For both resource types, hallucinations were identified that require ongoing human oversight. The findings demonstrate the potential of LLMs "out-of-the-box" to support the generation of clinical trial education materials with minimal trial-specific engineering, but implementation with a human-in-the-loop is still needed to avoid misinformation risks.

Submitted 3 December, 2024; v1 submitted 2 December, 2024; originally announced December 2024.

arXiv:2409.11099 [pdf, other]

Unveiling the Social Fabric: A Temporal, Nation-Scale Social Network and its Characteristics

Authors: Jolien Cremers, Benjamin Kohler, Benjamin Frank Maier, Stine Nymann Eriksen, Johanna Einsiedler, Frederik Kølby Christensen, Sune Lehmann, David Dreyer Lassen, Laust Hvas Mortensen, Andreas Bjerre-Nielsen

Abstract: Social networks shape individuals' lives, influencing everything from career paths to health. This paper presents a registry-based, multi-layer and temporal network of the entire Danish population in the years 2008-2021 (roughly 7.2 mill. individuals). Our network maps the relationships formed through family, households, neighborhoods, colleagues and classmates. We outline key properties of this multiplex network, introducing both an individual-focused perspective as well as a bipartite representation. We show how to aggregate and combine the layers, and how to efficiently compute network measures such as shortest paths in large administrative networks. Our analysis reveals how past connections reappear later in other layers, that the number of relationships aggregated over time reflects the position in the income distribution, and that we can recover canonical shortest path length distributions when appropriately weighting connections. Along with the network data, we release a Python package that uses the bipartite network representation for efficient analysis.

Submitted 17 September, 2024; originally announced September 2024.

arXiv:2405.07574 [pdf, other]

Is it getting harder to make a hit? Evidence from 65 years of US music chart history

Authors: Marta Ewa Lech, Sune Lehmann, Jonas L. Juul

Abstract: Since the creation of the Billboard Hot 100 music chart in 1958, the chart has been a window into the music consumption of Americans. Which songs succeed on the chart is decided by consumption volumes, which can be affected by consumer music taste, and other factors such as advertisement budgets, airplay time, the specifics of ranking algorithms, and more. Since its introduction, the chart has documented music consumerism through eras of globalization, economic growth, and the emergence of new technologies for music listening. In recent years, musicians and other hitmakers have voiced their worry that the music world is changing: Many claim that it is getting harder to make a hit but until now, the claims have not been backed using chart data. Here we show that the dynamics of the Billboard Hot 100 chart have changed significantly since the chart's founding in 1958, and in particular in the past 15 years. Whereas most songs spend less time on the chart now than songs did in the past, we show that top-1 songs have tripled their chart lifetime since the 1960s, the highest-ranked songs maintain their positions for far longer than previously, and the lowest-ranked songs are replaced more frequently than ever. At the same time, who occupies the chart has also changed over the years: In recent years, fewer new artists make it into the chart and more positions are occupied by established hit makers.
Finally, investigating how song chart trajectories have changed over time, we show that historical song trajectories cluster into clear trajectory archetypes characteristic of the time period they were part of. The results are interesting in the context of collective attention: Whereas recent studies have documented that other cultural products such as books, news, and movies fade in popularity quicker in recent years, music hits seem to last longer now than in the past.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 17 pages, 4 figures

arXiv:2403.03143 [pdf, other]

Using Smartphones to Study Vaccination Decisions in the Wild

Authors: Nicolò Alessandro Girardini, Arkadiusz Stopczynski, Olga Baranov, Cornelia Betsch, Dirk Brockmann, Sune Lehmann, Robert Böhm

Abstract: One of the most important tools available to limit the spread and impact of infectious diseases is vaccination. It is therefore important to understand what factors determine people's vaccination decisions. To this end, previous behavioural research made use of, (i) controlled but often abstract or hypothetical studies (e.g., vignettes) or, (ii) realistic but typically less flexible studies that make it difficult to understand individual decision processes (e.g., clinical trials). Combining the best of these approaches, we propose integrating real-world Bluetooth contacts via smartphones in several rounds of a game scenario, as a novel methodology to study vaccination decisions and disease spread. In our 12-week proof-of-concept study conducted with $N$ = 494 students, we found that participants strongly responded to some of the information provided to them during or after each decision round, particularly those related to their individual health outcomes. In contrast, information related to others' decisions and outcomes (e.g., the number of vaccinated or infected individuals) appeared to be less important. We discuss the potential of this novel method and point to fruitful areas for future research.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.00032 [pdf, other]

Time to Cite: Modeling Citation Networks using the Dynamic Impact Single-Event Embedding Model

Authors: Nikolaos Nakis, Abdulkadir Celikkanat, Louis Boucherie, Sune Lehmann, Morten Mørup

Abstract: Understanding the structure and dynamics of scientific research, i.e., the science of science (SciSci), has become an important area of research in order to address imminent questions including how scholars interact to advance science, how disciplines are related and evolve, and how research impact can be quantified and predicted. Central to the study of SciSci has been the analysis of citation networks. Here, two prominent modeling methodologies have been employed: one is to assess the citation impact dynamics of papers using parametric distributions, and the other is to embed the citation networks in a latent space optimal for characterizing the static relations between papers in terms of their citations. Interestingly, citation networks are a prominent example of single-event dynamic networks, i.e., networks for which each dyad only has a single event (i.e., the point in time of citation). We presently propose a novel likelihood function for the characterization of such single-event networks. Using this likelihood, we propose the Dynamic Impact Single-Event Embedding model (DISEE). The DISEE model characterizes the scientific interactions in terms of a latent distance model in which random effects account for citation heterogeneity while the time-varying impact is characterized using existing parametric representations for assessment of dynamic impact.
We highlight the proposed approach on several real citation networks finding that the DISEE well reconciles static latent distance network embedding approaches with classical dynamic impact assessments.

Submitted 28 February, 2024; originally announced March 2024.

Comments: Accepted for AISTATS 2024

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1326 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 9 May, 2025; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2306.03009 [pdf, other]

doi 10.1038/s43588-023-00573-5

Using Sequences of Life-events to Predict Human Lives

Authors: Germans Savcisens, Tina Eliassi-Rad, Lars Kai Hansen, Laust Mortensen, Lau Lilleholt, Anna Rogers, Ingo Zettler, Sune Lehmann

Abstract: Over the past decade, machine learning has revolutionized computers' ability to analyze text through flexible computational models. Due to their structural similarity to written language, transformer-based architectures have also shown promise as tools to make sense of a range of multi-variate sequences from protein-structures, music, electronic health records to weather-forecasts. We can also represent human lives in a way that shares this structural similarity to language. From one perspective, lives are simply sequences of events: People are born, visit the pediatrician, start school, move to a new location, get married, and so on. Here, we exploit this similarity to adapt innovations from natural language processing to examine the evolution and predictability of human lives based on detailed event sequences. We do this by drawing on arguably the most comprehensive registry data in existence, available for an entire nation of more than six million individuals across decades. Our data include information about life-events related to health, education, occupation, income, address, and working hours, recorded with day-to-day resolution. We create embeddings of life-events in a single vector space showing that this embedding space is robust and highly structured. Our models allow us to predict diverse outcomes ranging from early mortality to personality nuances, outperforming state-of-the-art models by a wide margin. Using methods for interpreting deep learning models, we probe the algorithm to understand the factors that enable our predictions.
Our framework allows researchers to identify new potential mechanisms that impact life outcomes and associated possibilities for personalized interventions.

Submitted 5 June, 2023; originally announced June 2023.

Journal ref: Nature Computational Science 4 (2024) 43-56

arXiv:2306.01930 [pdf, other]

Structural Similarities Between Language Models and Neural Response Measurements

Authors: Jiaang Li, Antonia Karamolegkou, Yova Kementchedjhieva, Mostafa Abdou, Sune Lehmann, Anders Søgaard

Abstract: Large language models (LLMs) have complicated internal dynamics, but induce representations of words and phrases whose geometry we can study. Human language processing is also opaque, but neural response measurements can provide (noisy) recordings of activation during listening or reading, from which we can extract similar representations of words and phrases. Here we study the extent to which the geometries induced by these representations share similarities in the context of brain decoding. We find that the larger neural language models get, the more their representations are structurally similar to neural response measurements from brain imaging. Code is available at https://github.com/coastalcph/brainlm.

Submitted 31 October, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: NeurReps@NeurIPS 2023

arXiv:2303.08107 [pdf]

doi 10.1093/beheco/araf132

Far-reaching consequences of trait preferences for animal social network structure and function

Authors: Josefine Bohr Brask, Andreas Koher, Darren P. Croft, Sune Lehmann

Abstract: Social network structures play an important role in the lives of animals by affecting individual fitness and the spread of disease and information. Nevertheless, we still lack a good understanding of how these structures emerge from the behavior of individuals. Generative network models provide a powerful approach that can help close this gap. Empirical research has shown that trait-based social preferences (preferences for social partners with certain trait values, such as sex, body size, relatedness etc.) play a key role in the formation of social networks across species. Currently, however, we lack a good understanding of how such preferences affect network properties. In this study: 1) we develop a general and flexible generative network model that can create artificial (simulated) networks where social connection is affected by trait-based social preferences; 2) we use this model to investigate how different trait-based social preferences affect social network structure and function. We find that the preferences can affect the efficiency of the networks in terms of transmitting disease and information, and their robustness against fragmentation when individuals disappear, with the effects often - but not always - going in the direction of slower transmission and lower robustness. Furthermore, the extent and form of the effects depend on both the type of preference and the type of trait it is used with.
The findings lead to new insights about the potential mechanisms driving the structural diversity of animal social networks, the importance of trait value distributions for social structure, the degree distributions of social networks, and the detectability of trait effects from network data. Overall, the study shows that trait-based social preferences can have far-reaching consequences for populations.

Submitted 13 February, 2026; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: 27 pages, 5 figures

Journal ref: Behavioral Ecology, 37(1), araf132 (2026)

arXiv:2302.05657 [pdf]

Dialectograms: Machine Learning Differences between Discursive Communities

Authors: Thyge Enggaard, August Lohse, Morten Axel Pedersen, Sune Lehmann

Abstract: Word embeddings provide an unsupervised way to understand differences in word usage between discursive communities. A number of recent papers have focused on identifying words that are used differently by two or more communities. But word embeddings are complex, high-dimensional spaces and a focus on identifying differences only captures a fraction of their richness. Here, we take a step towards leveraging the richness of the full embedding space, by using word embeddings to map out how words are used differently. Specifically, we describe the construction of dialectograms, an unsupervised way to visually explore the characteristic ways in which each community uses a focal word. Based on these dialectograms, we provide a new measure of the degree to which words are used differently that overcomes the tendency for existing measures to pick out low-frequency or polysemous words. We apply our methods to explore the discourses of two US political subreddits and show how our methods identify stark affective polarisation of politicians and political entities, differences in the assessment of proper political action as well as disagreement about whether certain issues require political intervention at all.

Submitted 11 February, 2023; originally announced February 2023.

arXiv:2205.08820 [pdf, other]

Generating fine-grained surrogate temporal networks

Authors: Antonio Longa, Giulia Cencetti, Sune Lehmann, Andrea Passerini, Bruno Lepri

Abstract: Temporal networks are essential for modeling and understanding systems whose behavior varies in time, from social interactions to biological systems. Often, however, real-world data are prohibitively expensive to collect in a large scale or unshareable due to privacy concerns. A promising way to bypass the problem consists in generating arbitrarily large and anonymized synthetic graphs with the properties of real-world networks, namely `surrogate networks'. Until now, the generation of realistic surrogate temporal networks has remained an open problem, due to the difficulty of capturing both the temporal and topological properties of the input network, as well as their correlations, in a scalable model. Here, we propose a novel and simple method for generating surrogate temporal networks. Our method decomposes the input network into star-like structures evolving in time. Then those structures are used as building blocks to generate a surrogate temporal network. Our model vastly outperforms current methods across multiple examples of temporal networks in terms of both topological and dynamical similarity. We further show that beyond generating realistic interaction patterns, our method is able to capture intrinsic temporal periodicity of temporal networks, all with an execution time lower than competing methods by multiple orders of magnitude. The simplicity of our algorithm makes it easily interpretable, extendable and algorithmically scalable.

Submitted 22 August, 2023; v1 submitted 18 May, 2022; originally announced May 2022.

arXiv:2203.09884 [pdf, other]

doi 10.4204/EPTCS.355.4

Modeling R$^3$ Needle Steering in Uppaal

Authors: Sascha Lehmann, Antje Rogalla, Maximilian Neidhardt, Anton Reinecke, Alexander Schlaefer, Sibylle Schupp

Abstract: Medical cyber-physical systems are safety-critical, and as such, require ongoing verification of their correct behavior, as system failure during run time may cause severe (or even fatal) personal damage. However, creating a verifiable model often conflicts with other application requirements, most notably regarding data precision and model accuracy, as efficient model checking promotes discrete data (over continuous) and abstract models to reduce the state space. In this paper, we approach the task of medical needle steering in soft tissue around potential obstacles. We design a verifiable model of needle motion (implemented in Uppaal Stratego) and a framework embedding the model for online needle steering. We mitigate the conflict by imposing boundedness on both the data types, reducing from R^3 to Z^3 when needed, and the motion and environment models, reducing the set of allowed local actions and global paths. In experiments, we successfully apply the static model alone, as well as the dynamic framework in scenarios with varying environment complexity and both a virtual and real needle setting, where up to 100% of targets were reached depending on the scenario and needle.

Submitted 18 March, 2022; originally announced March 2022.

Comments: In Proceedings MARS 2022, arXiv:2203.09299

Journal ref: EPTCS 355, 2022, pp. 40-59

arXiv:2110.12590 [pdf, other]

doi 10.4204/EPTCS.348.9

Online Strategy Synthesis for Safe and Optimized Control of Steerable Needles

Authors: Sascha Lehmann, Antje Rogalla, Maximilian Neidhardt, Alexander Schlaefer, Sibylle Schupp

Abstract: Autonomous systems are often applied in uncertain environments, which require prospective action planning and retrospective data evaluation for future planning to ensure safe operation. Formal approaches may support these systems with safety guarantees, but are usually expensive and do not scale well with growing system complexity. In this paper, we introduce online strategy synthesis based on classical strategy synthesis to derive formal safety guarantees while reacting and adapting to environment changes. To guarantee safety online, we split the environment into region types which determine the acceptance of action plans and trigger local correcting actions. Using model checking on a frequently updated model, we can then derive locally safe action plans (prospectively), and match the current model against new observations via reachability checks (retrospectively). As a use case, we successfully apply online strategy synthesis to medical needle steering, i.e., navigating a (flexible and beveled) needle through tissue towards a target without damaging its surroundings.

Submitted 24 October, 2021; originally announced October 2021.

Comments: In Proceedings FMAS 2021, arXiv:2110.11527

Journal ref: EPTCS 348, 2021, pp. 128-135

arXiv:2108.08641 [pdf, other]

Successive cohorts of Twitter users show increasing activity and shrinking content horizons

Authors: Frederik Wolf, Philipp Lorenz-Spreen, Sune Lehmann

Abstract: The global public sphere has changed dramatically over the past decades: a significant part of public discourse now takes place on algorithmically driven platforms owned by a handful of private companies. Despite its growing importance, there is scant large-scale academic research on the long-term evolution of user behaviour on these platforms, because the data are often proprietary to the platforms. Here, we evaluate the individual behaviour of 600,000 Twitter users between 2012 and 2019 and find empirical evidence for an acceleration of the way Twitter is used on an individual level. This manifests itself in the fact that cohorts of Twitter users behave differently depending on when they joined the platform. Behaviour within a cohort is relatively consistent over time and characterised by strong internal interactions, but over time behaviour from cohort to cohort shifts towards increased activity. Specifically, we measure this in terms of more tweets per user over time, denser interactions with others via retweets, and shorter content horizons, expressed as an individual's decaying autocorrelation of topics over time. Our observations are explained by a growing proportion of active users who not only tweet more actively but also elicit more retweets. These behaviours suggest a collective contribution to an increased flow of information through each cohort's news feed -- an increase that potentially depletes available collective attention over time.
Our findings complement recent, empirical work on social acceleration, which has been largely agnostic about individual user activity.

Submitted 19 August, 2021; originally announced August 2021.

arXiv:2107.10835 [pdf, other]

Recovering lost and absent information in temporal networks

Authors: James P. Bagrow, Sune Lehmann

Abstract: The full range of activity in a temporal network is captured in its edge activity data -- time series encoding the tie strengths or on-off dynamics of each edge in the network. However, in many practical applications, edge-level data are unavailable, and the network analyses must rely instead on node activity data which aggregates the edge-activity data and thus is less informative. This raises the question: Is it possible to use the static network to recover the richer edge activities from the node activities? Here we show that recovery is possible, often with a surprising degree of accuracy given how much information is lost, and that the recovered data are useful for subsequent network analysis tasks. Recovery is more difficult when network density increases, either topologically or dynamically, but exploiting dynamical and topological sparsity enables effective solutions to the recovery problem. We formally characterize the difficulty of the recovery problem both theoretically and empirically, proving the conditions under which recovery errors can be bounded and showing that, even when these conditions are not met, good quality solutions can still be derived. Effective recovery carries both promise and peril, as it enables deeper scientific study of complex systems but in the context of social systems also raises privacy concerns when social information can be aggregated across multiple data sources.

Submitted 22 July, 2021; originally announced July 2021.

Comments: 19 pages, 5 figures, 1 table, plus supporting information

arXiv:2011.07161 [pdf]

Ambient heat and human sleep

Authors: Kelton Minor, Andreas Bjerre-Nielsen, Sigga Svala Jonasdottir, Sune Lehmann, Nick Obradovich

Abstract: Ambient temperatures are rising globally, with the greatest increases recorded at night. Concurrently, the prevalence of insufficient sleep is increasing in many populations, with substantial costs to human health and well-being. Even though nearly a third of the human lifespan is spent asleep, it remains unknown whether temperature and weather impact objective measures of sleep in real-world settings, globally. Here we link billions of sleep measurements from wearable devices comprising over 7 million nighttime sleep records across 68 countries to local daily meteorological data from 2015 to 2017. Rising nighttime temperatures shorten within-person sleep duration primarily through delayed onset, increasing the probability of insufficient sleep. The effect of temperature on sleep loss is substantially larger for residents from lower income countries and older adults, and females are affected more than are males. Nighttime temperature increases inflict the greatest sleep loss during summer and fall months, and we do not find evidence of short-term acclimatization. Coupling historical behavioral measurements with output from climate models, we project that climate change will further erode human sleep, producing substantial geographic inequalities. Our findings have significant implications for adaptation planning and illuminate a pathway through which rising temperatures may globally impact public health.

Submitted 13 November, 2020; originally announced November 2020.

Comments: 29 pages, 10 figures

ACM Class: J.3; J.4

arXiv:2009.09973 [pdf, other]

Privacy and Uniqueness of Neighborhoods in Social Networks

Authors: Daniele Romanini, Sune Lehmann, Mikko Kivelä

Abstract: The ability to share social network data at the level of individual connections is beneficial to science: not only for reproducing results, but also for researchers who may wish to use it for purposes not foreseen by the data releaser. Sharing such data, however, can lead to serious privacy issues, because individuals could be re-identified, not only based on possible nodes' attributes, but also from the structure of the network around them. The risk associated with re-identification can be measured and it is more serious in some networks than in others. Various optimization algorithms have been proposed to anonymize the network while keeping the number of changes minimal. However, existing algorithms do not provide guarantees on where the changes will be made, making it difficult to quantify their effect on various measures. Using network models and real data, we show that the average degree of networks is a crucial parameter for the severity of re-identification risk from nodes' neighborhoods. Dense networks are more at risk, and, apart from a small band of average degree values, either almost all nodes are re-identifiable or they are all safe. Our results allow researchers to assess the privacy risk based on a small number of network statistics which are available even before the data is collected. As a rule-of-thumb, the privacy risks are high if the average degree is above 10. Guided by these results we propose a simple method based on edge sampling to mitigate the re-identification risk of nodes.
Our method can be implemented already at the data collection phase. Its effect on various network measures can be estimated and corrected using sampling theory. These properties are in contrast with previous methods arbitrarily biasing the data. In this sense, our work could help in sharing network data in a statistically tractable way.

Submitted 21 September, 2020; originally announced September 2020.

Comments: 19 pages, 6 figures, 1 table

arXiv:2009.09914 [pdf, other]

A Non-negative Matrix Factorization Based Method for Quantifying Rhythms of Activity and Sleep and Chronotypes Using Mobile Phone Data

Authors: Talayeh Aledavood, Ilkka Kivimäki, Sune Lehmann, Jari Saramäki

Abstract: Human activities follow daily, weekly, and seasonal rhythms. The emergence of these rhythms is related to physiology and natural cycles as well as social constructs. The human body and biological functions undergo near 24-hour rhythms (circadian rhythms). The frequency of these rhythms is more or less similar across people, but its phase is different. In the chronobiology literature, based on the propensity to sleep at different hours of the day, people are categorized into morning-type, evening-type, and intermediate-type groups called \textit{chronotypes}. This typology is typically based on carefully designed questionnaires or manually crafted features drawing on data on timings of people's activity. Here we develop a fully data-driven (unsupervised) method to decompose individual temporal activity patterns into components. This has the advantage of not including any predetermined assumptions about sleep and activity hours, but the results are fully context-dependent and determined by the most prominent features of the activity data. Using a year-long dataset from mobile phone screen usage logs of 400 people, we find four emergent temporal components: morning activity, night activity, evening activity and activity at noon. Individual behavior can be reduced to weights on these four components. We do not observe any clear emergent categories of people based on the weights, but individuals are rather placed on a continuous spectrum according to the timings of their activities.
High loads on morning and night components highly correlate with going to bed and waking up times. Our work points towards a data-driven way of categorizing people based on their full daily and weekly rhythms of activity and behavior, rather than focusing mainly on the timing of their sleeping periods.

Submitted 21 September, 2020; originally announced September 2020.

arXiv:2008.01884 [pdf, other]

Self-interested behaviour as a social norm

Authors: Kamilla Haworth Buchter, Bjarke Mønsted, Sune Lehmann

Abstract: Language can exert a strong influence on human behaviour. In experimental studies, it is for example well-known that the framing of an experiment or priming at the beginning of an experiment can alter participants' behaviour. However, few studies have been conducted to determine why framing or priming specific words can alter people's behaviour. Here, we show that the behaviour of participants in a game-theoretical experiment is driven mainly by social norms, and that participants' adherence to different social norms is influenced by the exposure to economic terminology. To explore how these terminology-driven changes impact behavior at the system level, we use established frameworks for modeling collective cooperative behaviour. We find that economic terminology induces a behavioural difference which is larger than that caused by financial incentives in the magnitude usually employed in experiments and simulation. These findings place an increased responsibility on scientists and science communicators, as scientific terminology is increasingly communicated to the general population.

Submitted 4 August, 2020; originally announced August 2020.

Comments: 19 pages + 17 pages SI. 15 figures total

arXiv:2004.13292 [pdf, other]

doi 10.4204/EPTCS.316.10

Synthesizing Strategies for Needle Steering in Gelatin Phantoms

Authors: Antje Rogalla, Sascha Lehmann, Maximilian Neidhardt, Johanna Sprenger, Marcel Bengs, Alexander Schlaefer, Sibylle Schupp

Abstract: In medicine, needles are frequently used to deliver treatments to subsurface targets or to take tissue samples from the inside of an organ. Current clinical practice is to insert needles under image guidance or haptic feedback, although that may involve reinsertions and adjustments since the needle and its interaction with the tissue during insertion cannot be completely controlled. (Automated) needle steering could in theory improve the accuracy with which a target is reached and thus reduce surgical traumata especially for minimally invasive procedures, e.g., brachytherapy or biopsy. Yet, flexible needles and needle-tissue interaction are both complex and expensive to model and can often be computed approximatively only. In this paper we propose to employ timed games to navigate flexible needles with a bevel tip to reach a fixed target in tissue. We use a simple non-holonomic model of needle-tissue interaction, which abstracts in particular from the various physical forces involved and appears to be simplistic compared to related models from medical robotics. Based on the model, we synthesize strategies from which we can derive sufficiently precise motion plans to steer the needle in soft tissue. However, applying those strategies in practice, one is faced with the problem of an unpredictable behavior of the needle at the initial insertion point. Our proposal is to implement a preprocessing step to initialize the model based on data from the real system, once the needle is inserted.
Taking into account the actual needle tip angle and position, we generate strategies to reach the desired target. We have implemented the model in Uppaal Stratego and evaluated it on steering a flexible needle in gelatin phantoms; gelatin phantoms are commonly used in medical technology to simulate the behavior of soft tissue. The experiments show that strategies can be synthesized for both generated and measured needle motions with a maximum deviation of 1.84mm.

Submitted 28 April, 2020; originally announced April 2020.

Comments: In Proceedings MARS 2020, arXiv:2004.12403

Journal ref: EPTCS 316, 2020, pp. 261-274

arXiv:2004.05222 [pdf]

Give more data, awareness and control to individual citizens, and they will help COVID-19 containment

Authors: Mirco Nanni, Gennady Andrienko, Albert-László Barabási, Chiara Boldrini, Francesco Bonchi, Ciro Cattuto, Francesca Chiaromonte, Giovanni Comandé, Marco Conti, Mark Coté, Frank Dignum, Virginia Dignum, Josep Domingo-Ferrer, Paolo Ferragina, Fosca Giannotti, Riccardo Guidotti, Dirk Helbing, Kimmo Kaski, Janos Kertesz, Sune Lehmann, Bruno Lepri, Paul Lukowicz, Stan Matwin, David Megías Jiménez, Anna Monreale, et al. (14 additional authors not shown)

Abstract: The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the phase 2 of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens' privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoiding location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens' "personal data stores", to be shared separately and selectively, voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion; and, in turn this enables both contact tracing, and, the early detection of outbreak hotspots on more finely-granulated geographic scale. Our recommendation is two-fold.
First to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates - if and when they want, for specific aims - with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society.

Submitted 16 April, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

Comments: Revised text. Additional authors

Journal ref: Transactions on Data Privacy 13(1): 61-66 (2020), http://www.tdp.cat/issues16/abs.a389a20.php

arXiv:2004.02957 [pdf, other]

Gender-specific behavior change following terror attacks

Authors: Jonas S. Juul, Laura Alessandretti, Jesper Dammeyer, Ingo Zettler, Sune Lehmann, Joachim Mathiesen

Abstract: Terrorists use violence in pursuit of political goals. While terror often has severe consequences for victims, it remains an open question how terror attacks affect the general population. We study the behavioral response of citizens of cities affected by $7$ different terror attacks. We compare real-time mobile communication patterns in the first $24$ hours following a terror attack to the corresponding patterns on days with no terror attack. On ordinary days, the group of female and male participants have different activity patterns. Following a terror attack, however, we observe a significant increase of the gender differences. Knowledge about citizens' behavior response patterns following terror attacks may have important implications for the public response during and after an attack.

Submitted 6 April, 2020; originally announced April 2020.

arXiv:2003.12347 [pdf]

Mobile phone data and COVID-19: Missing an opportunity?

Authors: Nuria Oliver, Emmanuel Letouzé, Harald Sterly, Sébastien Delataille, Marco De Nadai, Bruno Lepri, Renaud Lambiotte, Richard Benjamins, Ciro Cattuto, Vittoria Colizza, Nicolas de Cordes, Samuel P. Fraiberger, Till Koebe, Sune Lehmann, Juan Murillo, Alex Pentland, Phuong N Pham, Frédéric Pivetta, Albert Ali Salah, Jari Saramäki, Samuel V. Scarpino, Michele Tizzoni, Stefaan Verhulst, Patrick Vinck

Abstract: This paper describes how mobile phone data can guide government and public health authorities in determining the best course of action to control the COVID-19 pandemic and in assessing the effectiveness of control measures such as physical distancing. It identifies key gaps and reasons why this kind of data is only scarcely used, although their value in similar epidemics has been proven in a number of use cases. It presents ways to overcome these gaps and key recommendations for urgent action, most notably the establishment of mixed expert groups on national and regional level, and the inclusion and support of governments and public authorities early on. It is authored by a group of experienced data scientists, epidemiologists, demographers and representatives of mobile network operators who jointly put their work at the service of the global effort to combat the COVID-19 pandemic.

Submitted 27 March, 2020; originally announced March 2020.

arXiv:1907.09966 [pdf, other]

Fundamental Structures in Dynamic Communication Networks

Authors: Sune Lehmann

Abstract: In this paper I introduce a framework for modeling temporal communication networks and dynamical processes unfolding on such networks. The framework originates from the realization that there is a meaningful division of temporal communication networks into six dynamic classes, where the class of a network is determined by its generating process. In particular, each class is characterized by a fundamental structure: a temporal-topological network motif, which corresponds to the network representation of communication events in that class of network. These fundamental structures constrain network configurations: only certain configurations are possible within a dynamic class. In this way the framework presented here highlights strong constraints on network structures, which simplify analyses and shape network flows. Therefore the fundamental structures hold the potential to impact how we model temporal networks overall. I argue below that networks within the same class can be meaningfully compared, and modeled using similar techniques, but that integrating statistics across networks belonging to separate classes is not meaningful in general. This paper presents a framework for how to analyze networks in general, rather than a particular result of analyzing a particular dataset. I hope, however, that readers interested in modeling temporal networks will find the ideas and discussion useful in spite of the paper's more conceptual nature.

Submitted 23 July, 2019; originally announced July 2019.

Comments: To appear in Holme and Saramaki (Editors). "Temporal Network Theory". Springer-Nature, New York. 2019

arXiv:1905.12908 [pdf, other]

Algorithmic Detection and Analysis of Vaccine-Denialist Sentiment Clusters in Social Networks

Authors: Bjarke Mønsted, Sune Lehmann

Abstract: Vaccination rates are decreasing in many areas of the world, and outbreaks of preventable diseases tend to follow in areas with particularly low rates. Although much research has been devoted to improving our understanding of the motivations behind vaccination decisions and the effects of various types of information offered to skeptics, no large-scale study of the structure of online vaccination discourse has been conducted. Here, we offer an approach to quantitatively study the vaccine discourse in an online system, exemplified by Twitter. We train a deep neural network to predict tweet vaccine sentiments, surpassing state-of-the-art performance, attaining two-class accuracy of $90.4\%$, and a three-class F1 of $0.762$. We identify profiles which consistently produce strongly anti- and pro-vaccine content. We find that strongly anti-vaccine profiles primarily post links to YouTube, and commercial sites that make money on selling alternative health products, representing a conflict of interest. We also visualize the network of repeated mutual interactions of actors in the vaccine discourse and find that it is highly stratified, with an assortativity coefficient of $r = .813$.

Submitted 30 May, 2019; originally announced May 2019.

Comments: 14 pages, 7 figures

arXiv:1812.10181 [pdf, other]

doi 10.1073/pnas.1800471115

The Chaperone Effect in Scientific Publishing

Authors: Vedran Sekara, Pierre Deville, Sebastian Ahnert, Albert-László Barabási, Roberta Sinatra, Sune Lehmann

Abstract: Experience plays a critical role in crafting high impact scientific work. This is particularly evident in top multidisciplinary journals, where a scientist is unlikely to appear as senior author if they have not previously published within the same journal. Here, we develop a quantitative understanding of author order by quantifying this 'Chaperone Effect', capturing how scientists transition into senior status within a particular publication venue. We illustrate that the chaperone effect has different magnitude for journals in different branches of science, being more pronounced in medical and biological sciences and weaker in natural sciences. Finally, we show that in the case of high-impact venues, the chaperone effect has significant implications, specifically resulting in a higher average impact relative to papers authored by new PIs. Our findings shed light on the role played by experience in publishing within specific scientific journals, on the paths towards acquiring the necessary experience and expertise, and on the skills required to publish in prestigious venues.

Submitted 25 December, 2018; originally announced December 2018.

Comments: 5 pages, 3 figures

Journal ref: PNAS December 11, 2018 115 (50) 12603-12607

arXiv:1811.03153 [pdf, other]

Offline Behaviors of Online Friends

Authors: Piotr Sapiezynski, Arkadiusz Stopczynski, David Kofoed Wind, Jure Leskovec, Sune Lehmann

Abstract: In this work we analyze traces of mobility and co-location among a group of nearly 1000 closely interacting individuals. We attempt to reconstruct the Facebook friendship graph, Facebook interaction network, as well as call and SMS networks from longitudinal records of person-to-person offline proximity. We find subtle, yet observable behavioral differences between pairs of people who communicate using each of the different channels and we show that the signal of friendship is strong enough to stand out from the noise of random and schedule-driven offline interactions between familiar strangers. Our study also provides an overview of methods for link inference based on offline behavior and proposes new features to improve the performance of the prediction task.

Submitted 8 November, 2018; v1 submitted 7 November, 2018; originally announced November 2018.

arXiv:1801.03962 [pdf, other]

doi 10.1140/epjds/s13688-018-0164-6

Understanding the interplay between social and spatial behaviour

Authors: Laura Alessandretti, Sune Lehmann, Andrea Baronchelli

Abstract: According to personality psychology, personality traits determine many aspects of human behaviour. However, validating this insight in large groups has been challenging so far, due to the scarcity of multi-channel data. Here, we focus on the relationship between mobility and social behaviour by analysing trajectories and mobile phone interactions of $\sim 1,000$ individuals from two high-resolution longitudinal datasets. We identify a connection between the way in which individuals explore new resources and exploit known assets in the social and spatial spheres. We show that different individuals balance the exploration-exploitation trade-off in different ways and we explain part of the variability in the data by the big five personality traits. We point out that, in both realms, extraversion correlates with the attitude towards exploration and routine diversity, while neuroticism and openness account for the tendency to evolve routine over long time-scales. We find no evidence for the existence of classes of individuals across the spatio-social domains. Our results bridge the fields of human geography, sociology and personality psychology and can help improve current models of mobility and tie formation.

Submitted 9 November, 2018; v1 submitted 11 January, 2018; originally announced January 2018.

Journal ref: Alessandretti, L., Lehmann, S. & Baronchelli, A. EPJ Data Sci. (2018) 7: 36

arXiv:1801.02236 [pdf, ps, other]

Spreading in Social Systems: Reflections

Authors: Sune Lehmann, Yong-Yeol Ahn

Abstract: In this final chapter, we consider the state-of-the-art for spreading in social systems and discuss the future of the field. As part of this reflection, we identify a set of key challenges ahead. The challenges include the following questions: how can we improve the quality, quantity, extent, and accessibility of datasets? How can we extract more information from limited datasets? How can we take individual cognition and decision making processes into account? How can we incorporate the other complexities of real contagion processes? Finally, how can we translate research into positive real-world impact? In the following, we provide more context for each of these open questions.

Submitted 7 January, 2018; originally announced January 2018.

Comments: 7 pages, chapter to appear in "Spreading Dynamics in Social Systems"; Eds. Sune Lehmann and Yong-Yeol Ahn, Springer Nature

arXiv:1711.07649 [pdf, other]

doi 10.1103/PhysRevE.97.062312

Constrained information flows in temporal networks reveal intermittent communities

Authors: Ulf Aslak, Martin Rosvall, Sune Lehmann

Abstract: Many real-world networks represent dynamic systems with interactions that change over time, often in uncoordinated ways and at irregular intervals. For example, university students connect in intermittent groups that repeatedly form and dissolve based on multiple factors, including their lectures, interests, and friends. Such dynamic systems can be represented as multilayer networks where each layer represents a snapshot of the temporal network. In this representation, it is crucial that the links between layers accurately capture real dependencies between those layers. Often, however, these dependencies are unknown. Therefore, current methods connect layers based on simplistic assumptions that do not capture node-level layer dependencies. For example, connecting every node to itself in other layers with the same weight can wipe out dependencies between intermittent groups, making it difficult or even impossible to identify them. In this paper, we present a principled approach to estimating node-level layer dependencies based on the network structure within each layer. We implement our node-level coupling method in the community detection framework Infomap and demonstrate its performance compared to current methods on synthetic and real temporal networks. We show that our approach more effectively constrains information inside multilayer communities so that Infomap can better recover planted groups in multilayer benchmark networks that represent multiple modes with different groups and better identify intermittent communities in real temporal contact networks. These results suggest that node-level layer coupling can improve the modeling of information spreading in temporal networks and better capture intermittent community structure.

Submitted 24 June, 2018; v1 submitted 21 November, 2017; originally announced November 2017.

Comments: 10 pages, 10 figures, published in PRE

Journal ref: Phys. Rev. E 97, 062312 (2018)

arXiv:1709.06690 [pdf, other]

Social Network Differences of Chronotypes Identified from Mobile Phone Data

Authors: Talayeh Aledavood, Sune Lehmann, Jari Saramäki

Abstract: Human activity follows an approximately 24-hour day-night cycle, but there is significant individual variation in awake and sleep times. Individuals with circadian rhythms at the extremes can be categorized into two chronotypes: "larks", those who wake up and go to sleep early, and "owls", those who stay up and wake up late. It is well established that a person's chronotype can affect their activities and health. However, less is known about the effects of chronotypes on social behavior, even though it is evident that social interactions require coordinated timings. To study how chronotypes relate to social behavior, we use data collected using a smartphone app on a population of more than seven hundred volunteer students to simultaneously determine their chronotypes and social network structure. We find that owls maintain larger personal networks, albeit with less time spent per contact. On average, owls are more central in the social network of students than larks, frequently occupying the dense core of the network. Owls also display strong homophily, as seen in an unexpectedly large number of social ties connecting owls to owls.

Submitted 19 September, 2017; originally announced September 2017.

arXiv:1708.00524 [pdf, other]

doi 10.18653/v1/D17-1169

Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm

Authors: Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, Sune Lehmann

Abstract: NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis we obtain state-of-the-art performance on 8 benchmark datasets within sentiment, emotion and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant supervision approaches.

Submitted 7 October, 2017; v1 submitted 1 August, 2017; originally announced August 2017.

Comments: Accepted at EMNLP 2017. Please include EMNLP in any citations. Minor changes from the EMNLP camera-ready version. 9 pages + references and supplementary material

Journal ref: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

arXiv:1706.09245 [pdf, other]

Academic Performance and Behavioral Patterns

Authors: Valentin Kassarnig, Enys Mones, Andreas Bjerre-Nielsen, Piotr Sapiezynski, David Dreyer Lassen, Sune Lehmann

Abstract: Identifying the factors that influence academic performance is an essential part of educational research. Previous studies have documented the importance of personality traits, class attendance, and social network structure. Because most of these analyses were based on a single behavioral aspect and/or small sample sizes, there is currently no quantification of the interplay of these factors. Here, we study the academic performance among a cohort of 538 undergraduate students forming a single, densely connected social network. Our work is based on data collected using smartphones, which the students used as their primary phones for two years. The availability of multi-channel data from a single population allows us to directly compare the explanatory power of individual and social characteristics. We find that the most informative indicators of performance are based on social ties and that network indicators result in better model performance than individual characteristics (including both personality and class attendance). We confirm earlier findings that class attendance is the most important predictor among individual characteristics. Finally, our results suggest the presence of strong homophily and/or peer effects among university students.

Submitted 9 April, 2018; v1 submitted 21 June, 2017; originally announced June 2017.

arXiv:1706.05100 [pdf, other]

doi 10.1371/journal.pone.0189873

The Role of Gender in Social Network Organization

Authors: Ioanna Psylla, Piotr Sapiezynski, Enys Mones, Sune Lehmann

Abstract: The digital traces we leave behind when engaging with the modern world offer an interesting lens through which we study behavioral patterns as expression of gender. Although gender differentiation has been observed in a number of settings, the majority of studies focus on a single data stream in isolation. Here we use a dataset of high resolution data collected using mobile phones, as well as detailed questionnaires, to study gender differences in a large cohort. We consider mobility behavior and individual personality traits among a group of more than $800$ university students. We also investigate interactions among them expressed via person-to-person contacts, interactions on online social networks, and telecommunication. Thus, we are able to study the differences between male and female behavior captured through a multitude of channels for a single cohort. We find that while the two genders are similar in a number of aspects, there are robust deviations that include multiple facets of social interactions, suggesting the existence of inherent behavioral differences. Finally, we quantify how aspects of an individual's characteristics and social behavior reveal their gender by posing it as a classification problem. We ask: How well can we distinguish between male and female study participants based on behavior alone? Which behavioral features are most predictive?

Submitted 15 June, 2017; originally announced June 2017.

arXiv:1705.01723 [pdf, other]

Exact VC-dimension for $L_1$-visibility of points in simple polygons

Authors: Elmar Langetepe, Simone Lehmann

Abstract: The VC-dimension plays an important role for the algorithmic problem of guarding art galleries efficiently. We prove that inside a simple polygon at most $5$ points can be shattered by $L_1$-visibility polygons and give an example where 5 points are shattered. The VC-dimension is exactly $5$. The proof idea for the upper bound is different from previous approaches. Keywords: Art gallery, VC-dimension, $L_1$-visibility, polygons

Submitted 4 May, 2017; originally announced May 2017.

arXiv:1703.06027 [pdf, other]

doi 10.1371/journal.pone.0184148

Evidence of Complex Contagion of Information in Social Media: An Experiment Using Twitter Bots

Authors: Bjarke Mønsted, Piotr Sapieżyński, Emilio Ferrara, Sune Lehmann

Abstract: It has recently become possible to study the dynamics of information diffusion in techno-social systems at scale, due to the emergence of online platforms, such as Twitter, with millions of users. One question that systematically recurs is whether information spreads according to simple or complex dynamics: does each exposure to a piece of information have an independent probability of a user adopting it (simple contagion), or does this probability depend instead on the number of sources of exposure, increasing above some threshold (complex contagion)? Most studies to date are observational and, therefore, unable to disentangle the effects of confounding factors such as social reinforcement, homophily, limited attention, or network community structure. Here we describe a novel controlled experiment that we performed on Twitter using 'social bots' deployed to carry out coordinated attempts at spreading information. We propose two Bayesian statistical models describing simple and complex contagion dynamics, and test the competing hypotheses. We provide experimental evidence that the complex contagion model describes the observed information diffusion behavior more accurately than simple contagion. Future applications of our results include more effective defenses against malicious propaganda campaigns on social media, improved marketing and advertisement strategies, and design of effective network intervention techniques.

Submitted 17 March, 2017; originally announced March 2017.

Comments: 10 pages + 4 pages of supplementary information. 4+1 figures

arXiv:1702.01262 [pdf, other]

doi 10.1371/journal.pone.0187078

Class attendance, peer similarity, and academic performance in a large field study

Authors: Valentin Kassarnig, Andreas Bjerre-Nielsen, Enys Mones, Sune Lehmann, David Dreyer Lassen

Abstract: Identifying the factors that determine academic performance is an essential part of educational research. Existing research indicates that class attendance is a useful predictor of subsequent course achievements. The majority of the literature is, however, based on surveys and self-reports, methods which have well-known systematic biases that lead to limitations on conclusions and generalizability as well as being costly to implement. Here we propose a novel method for measuring class attendance that overcomes these limitations by using location and Bluetooth data collected from smartphone sensors. Based on measured attendance data of nearly 1,000 undergraduate students, we demonstrate that early and consistent class attendance strongly correlates with academic performance. In addition, our novel dataset allows us to determine that attendance among social peers was substantially correlated ($>$0.5), suggesting either an important peer effect or homophily with respect to attendance.

Submitted 9 April, 2018; v1 submitted 4 February, 2017; originally announced February 2017.

arXiv:1611.08262 [pdf, other]

doi 10.1371/journal.pone.0188973

Correlations Between Human Mobility and Social Interaction Reveal General Activity Patterns

Authors: Anders Mollgaard, Sune Lehmann, Joachim Mathiesen

Abstract: A day in the life of a person involves a broad range of activities which are common across many people. Going beyond diurnal cycles, a central question is: to what extent do individuals act according to patterns shared across an entire population? Here we investigate the interplay between different activity types, namely communication, motion, and physical proximity by analyzing data collected from smartphones distributed among 638 individuals. We explore two central questions: Which underlying principles govern the formation of the activity patterns? Are the patterns specific to each individual or shared across the entire population? We find that statistics of the entire population allow us to successfully predict 71% of the activity and 85% of the inactivity involved in communication, mobility, and physical proximity. Surprisingly, individual level statistics only result in marginally better predictions, indicating that a majority of activity patterns are shared across our sample population. Finally, we predict short-term activity patterns using a generalized linear model, which suggests that a simple linear description might be sufficient to explain a wide range of actions, whether they be of social or of physical character.

Submitted 29 November, 2017; v1 submitted 24 November, 2016; originally announced November 2016.

Journal ref: PLoS ONE 12(12): e0188973 (2017)

arXiv:1611.04061 [pdf, other]

Contact activity and dynamics of the online elite

Authors: Enys Mones, Arkadiusz Stopczynski, Sune Lehmann

Abstract: Humans interact through numerous channels to build and maintain social connections: they meet face-to-face, initiate phone calls or send text messages, and interact via social media. Although it is known that the network of physical contacts, for example, is distinct from the network arising from communication events via phone calls and instant messages, the extent to which these networks differ is not clear. In fact, the network structure of these channels shows large structural variations. Each network of interactions, however, contains both central and peripheral individuals: central members are characterized by higher connectivity and can reach a high fraction of the network within a low number of connections, contrary to the nodes on the periphery. Here we show that the various channels account for diverse relationships between pairs of individuals and the corresponding interaction patterns across channels differ to an extent that hinders the simple reduction of social ties to a single layer. Furthermore, the origin and purpose of each network also determine the role of their respective central members: highly connected individuals in the person-to-person networks interact with their environment in a regular manner, while members central in the social communication networks display irregular behavior with respect to their physical contacts and are more active through rare, social events. These results suggest that due to the inherently different functions of communication channels, each one favors different social behaviors and different strategies for interacting with the environment. Our findings can facilitate the understanding of the varying roles and impact individuals have on the population, which can further shed light on the prediction and prevention of epidemic outbreaks, or information propagation.

Submitted 12 November, 2016; originally announced November 2016.

arXiv:1610.04730 [pdf, other]

Inferring Person-to-person Proximity Using WiFi Signals

Authors: Piotr Sapiezynski, Arkadiusz Stopczynski, David Kofoed Wind, Jure Leskovec, Sune Lehmann

Abstract: Today's societies are enveloped in an ever-growing telecommunication infrastructure. This infrastructure offers important opportunities for sensing and recording a multitude of human behaviors. Human mobility patterns are a prominent example of such a behavior which has been studied based on cell phone towers, Bluetooth beacons, and WiFi networks as proxies for location. However, while mobility is an important aspect of human behavior, understanding complex social systems requires studying not only the movement of individuals, but also their interactions. Sensing social interactions on a large scale is a technical challenge and many commonly used approaches, including RFID badges or Bluetooth scanning, offer only limited scalability. Here we show that it is possible, in a scalable and robust way, to accurately infer person-to-person physical proximity from the lists of WiFi access points measured by smartphones carried by the two individuals. Based on a longitudinal dataset of approximately 800 participants with ground-truth interactions collected over a year, we show that our model performs better than the current state-of-the-art. Our results demonstrate the value of WiFi signals in social sensing as well as potential threats to privacy that they imply.

Submitted 15 October, 2016; originally announced October 2016.

arXiv:1609.03526 [pdf, other]

Evidence for a Conserved Quantity in Human Mobility

Authors: Laura Alessandretti, Piotr Sapiezynski, Vedran Sekara, Sune Lehmann, Andrea Baronchelli

Abstract: Recent seminal works on human mobility have shown that individuals constantly exploit a small set of repeatedly visited locations. A concurrent literature has emphasized the explorative nature of human behavior, showing that the number of visited places grows steadily over time. How to reconcile these seemingly contradicting facts remains an open question. Here, we analyze high-resolution multi-year traces of $\sim$40,000 individuals from 4 datasets and show that this tension vanishes when the long-term evolution of mobility patterns is considered. We reveal that mobility patterns evolve significantly yet smoothly, and that the number of familiar locations an individual visits at any point is a conserved quantity with a typical size of $\sim$25 locations. We use this finding to improve state-of-the-art modeling of human mobility. Furthermore, shifting the attention from aggregated quantities to individual behavior, we show that the size of an individual's set of preferred locations correlates with the number of her social interactions. This result suggests a connection between the conserved quantity we identify, which as we show cannot be understood purely on the basis of time constraints, and the 'Dunbar number' describing a cognitive upper limit to an individual's number of social relations. We anticipate that our work will spark further research linking the study of Human Mobility and the Cognitive and Behavioral Sciences.

Submitted 19 June, 2018; v1 submitted 12 September, 2016; originally announced September 2016.

arXiv:1608.06108 [pdf, ps, other]

doi 10.1371/journal.pone.0169901

SensibleSleep: A Bayesian Model for Learning Sleep Patterns from Smartphone Events

Authors: Andrea Cuttone, Per Bækgaard, Vedran Sekara, Håkan Jonsson, Jakob Eg Larsen, Sune Lehmann

Abstract: We propose a Bayesian model for extracting sleep patterns from smartphone events. Our method is able to identify individuals' daily sleep periods and their evolution over time, and provides an estimation of the probability of sleep and wake transitions. The model is fitted to more than 400 participants from two different datasets, and we verify the results against ground truth from dedicated armband sleep trackers. We show that the model is able to produce reliable sleep estimates with an accuracy of 0.89, both at the individual and at the collective level. Moreover the Bayesian model is able to quantify uncertainty and encode prior knowledge about sleep patterns. Compared with existing smartphone-based systems, our method requires only screen on/off events, and is therefore much less intrusive in terms of privacy and more battery-efficient.

Submitted 22 August, 2016; originally announced August 2016.

arXiv:1608.01939 [pdf, other]

Understanding Predictability and Exploration in Human Mobility

Authors: Andrea Cuttone, Sune Lehmann, Marta C. González

Abstract: Predictive models for human mobility have important applications in many fields such as traffic control, ubiquitous computing and contextual advertisement. The predictive performance of models in literature varies quite broadly, from as high as 93% to as low as under 40%. In this work we investigate which factors influence the accuracy of next-place prediction, using a high-precision location dataset of more than 400 users for periods between 3 months and one year. We show that it is easier to achieve high accuracy when predicting the time-bin location than when predicting the next place. Moreover we demonstrate how the temporal and spatial resolution of the data can have strong influence on the accuracy of prediction. Finally we uncover that the exploration of new locations is an important factor in human mobility, and we measure that on average 20-25% of transitions are to new places, and approx. 70% of locations are visited only once. We discuss how these mechanisms are important factors limiting our ability to predict human mobility.

Submitted 5 August, 2016; originally announced August 2016.

arXiv:1608.01933 [pdf, other]

Geoplotlib: a Python Toolbox for Visualizing Geographical Data

Authors: Andrea Cuttone, Sune Lehmann, Jakob Eg Larsen

Abstract: We introduce geoplotlib, an open-source python toolbox for visualizing geographical data. geoplotlib supports the development of hardware-accelerated interactive visualizations in pure python, and provides implementations of dot maps, kernel density estimation, spatial graphs, Voronoi tessellation, shapefiles and many more common spatial visualizations. We describe geoplotlib design, functionalities and use cases.

Submitted 5 August, 2016; originally announced August 2016.

arXiv:1608.01870 [pdf, ps, other]

Who Wants to Self-Track Anyway? Measuring the Relation between Self-Tracking Behavior and Personality Traits

Authors: Georgios Chatzigeorgakidis, Andrea Cuttone, Sune Lehmann, Jakob Eg Larsen

Abstract: We describe an empirical study of the usage of a mobility self-tracking app, SensibleJournal 2014, which provides personal mobility information to N=796 participants as part of a large mobile sensing study. Specifically, we report on the app design, as well as deployment, uptake and usage of the app. The latter analysis is based on logging of user interactions as well as answers gathered from a questionnaire provided to the participants. During the study enrollment process, participants were asked to fill out a questionnaire including a Big Five inventory and Narcissism NAR-Q personality tests. A comparison of personality traits was conducted to understand potential differences among the users and non-users of the app. We found a relation between self-tracking and conscientiousness, but contrary to the view in popular media, we found no relation between self-tracking behavior and narcissism.

Submitted 5 August, 2016; originally announced August 2016.

Comments: 14 pages, 5 figures, submitted to PLoS ONE
