-
A New Data Integration Framework for Covid-19 Social Media Information
Authors:
Lauren Ansell,
Luciana Dalla Valle
Abstract:
The Covid-19 pandemic presents a serious threat to people's health, resulting in over 250 million confirmed cases and over 5 million deaths globally. To reduce the burden on national health care systems and to mitigate the effects of the outbreak, accurate modelling and forecasting methods for short- and long-term health demand are needed to inform government interventions aimed at curbing the pandemic. Current research on Covid-19 is typically based on a single source of information, specifically on structured historical pandemic data. Other studies focus exclusively on unstructured insights retrieved online, such as data available from social media. However, the combined use of structured and unstructured information is still uncharted. This paper aims to fill this gap by leveraging historical and social media information with a novel data integration methodology. The proposed approach is based on vine copulas, which allow us to exploit the dependencies between different sources of information. We apply the methodology to combine structured datasets retrieved from official sources and a large unstructured dataset of information collected from social media. The results show that the combined use of official and online-generated information yields a more accurate assessment of the evolution of the Covid-19 pandemic than the sole use of official data.
Submitted 12 April, 2023; v1 submitted 7 October, 2021;
originally announced October 2021.
-
Bayesian Nonparametric Modelling of Conditional Multidimensional Dependence Structures
Authors:
Rosario Barone,
Luciana Dalla Valle
Abstract:
In recent years, conditional copulas, which allow dependence between variables to vary according to the values of one or more covariates, have attracted increasing attention. In high dimensions, vine copulas offer greater flexibility compared to multivariate copulas, since they are constructed using bivariate copulas as building blocks. In this paper we present a novel inferential approach for multivariate distributions, which combines the flexibility of vine constructions with the advantages of Bayesian nonparametrics, not requiring the specification of parametric families for each pair copula. Expressing multivariate copulas using vines allows us to easily account for covariate specifications driving the dependence between response variables. More precisely, we specify the vine copula density as an infinite mixture of Gaussian copulas, defining a Dirichlet process (DP) prior on the mixing measure, and we perform posterior inference via Markov chain Monte Carlo (MCMC) sampling. Our approach is successful for clustering as well as for density estimation. We carry out intensive simulation studies and apply the proposed approach to investigate the impact of natural disasters on financial development. Our results show that the methodology is able to capture the heterogeneity in the dataset and to reveal different behaviours of different country clusters in relation to natural disasters.
Submitted 22 September, 2021;
originally announced September 2021.
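The Gaussian copula named in the abstract above as the mixture component can be illustrated with a minimal sketch. This is not the authors' implementation: it only draws pairs from a single bivariate Gaussian copula and checks the known relation between its correlation parameter and Kendall's tau. The function names `normal_cdf`, `sample_gaussian_copula`, and `kendall_tau` are hypothetical and chosen for illustration; only the Python standard library is assumed.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_gaussian_copula(n, rho, seed=0):
    """Draw n pairs (u, v) from a bivariate Gaussian copula with correlation rho.

    Correlated standard normals are generated first, then each coordinate is
    mapped to (0, 1) through the normal CDF, which leaves only the dependence.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((normal_cdf(z1), normal_cdf(z2)))
    return pairs


def kendall_tau(pairs):
    """Empirical Kendall's tau: (concordant - discordant) / number of pairs."""
    n = len(pairs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            score += (prod > 0) - (prod < 0)
    return 2.0 * score / (n * (n - 1))


if __name__ == "__main__":
    rho = 0.7
    draws = sample_gaussian_copula(500, rho, seed=1)
    # For a Gaussian copula, Kendall's tau equals (2 / pi) * arcsin(rho).
    print("empirical tau:  ", round(kendall_tau(draws), 3))
    print("theoretical tau:", round(2.0 / math.pi * math.asin(rho), 3))
```

The empirical value should land near the theoretical one for a few hundred draws. A vine copula, as described in the abstract, chains many such bivariate building blocks together; that construction is not shown here.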
-
Social Media Integration of Flood Data: A Vine Copula-Based Approach
Authors:
Lauren Ansell,
Luciana Dalla Valle
Abstract:
Floods are the most common and among the most severe natural disasters in many countries around the world. As global warming continues to exacerbate sea level rise and extreme weather, governmental authorities and environmental agencies face a pressing need for timely and accurate evaluations and predictions of flood risks. Current flood forecasts are generally based on historical measurements of environmental variables at monitoring stations. In recent years, in addition to traditional data sources, large amounts of information related to floods have been made available via social media. Members of the public are constantly and promptly posting information and updates on local environmental phenomena on social media platforms. Despite the growing interest of scholars in the use of online data during natural disasters, the majority of studies focus exclusively on social media as a stand-alone data source, while its joint use with other types of information is still unexplored. In this paper we propose to fill this gap by integrating traditional historical information on floods with data extracted from Twitter and Google Trends. Our methodology is based on vine copulas, which allow us to capture, in a very flexible way, the dependence structure among the marginals, which are modelled via appropriate time series methods. We apply our methodology to data related to three different coastal locations on the South coast of the United Kingdom (UK). The results show that our approach, based on the integration of social media data, outperforms traditional methods in terms of evaluation and prediction of flood events.
Submitted 5 October, 2021; v1 submitted 5 April, 2021;
originally announced April 2021.
-
Approximate Bayesian Conditional Copulas
Authors:
Clara Grazian,
Luciana Dalla Valle,
Brunero Liseo
Abstract:
Copula models are flexible tools to represent complex structures of dependence for multivariate random variables. According to Sklar's theorem (Sklar, 1959), any d-dimensional absolutely continuous density can be uniquely represented as the product of the marginal distributions and a copula function which captures the dependence structure among the vector components. In real data applications, the interest of the analyses often lies in specific functionals of the dependence, which quantify aspects of it in a few numerical values. A broad literature exists on such functionals; however, extensions to include covariates are still limited. This is mainly due to the lack of unbiased estimators of the copula function, especially when one does not have enough information to select the copula model. Recent advances in computational methodologies and algorithms have allowed inference in the presence of complicated likelihood functions, especially in the Bayesian approach, whose methods, despite being computationally intensive, allow us to better evaluate the uncertainty of the estimates. In this work, we present several Bayesian methods to approximate the posterior distribution of functionals of the dependence, using nonparametric models which avoid the selection of the copula function. These methods are compared in simulation studies and in two realistic applications, from civil engineering and astrophysics.
Submitted 4 March, 2021;
originally announced March 2021.
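For reference, the decomposition from Sklar's theorem cited in the abstract above can be written out as follows. The notation is an assumption made here for illustration and is not taken from the paper: $F$ is the joint distribution function with marginals $F_1, \dots, F_d$ and marginal densities $f_1, \dots, f_d$, $C$ is the copula, and $c$ is its density.

```latex
% Sklar's theorem: the joint distribution function is the copula evaluated at the marginals
F(x_1, \dots, x_d) = C\bigl(F_1(x_1), \dots, F_d(x_d)\bigr)

% Density form (absolutely continuous case): copula density times the marginal densities
f(x_1, \dots, x_d) = c\bigl(F_1(x_1), \dots, F_d(x_d)\bigr) \prod_{j=1}^{d} f_j(x_j)
```

When the marginals are continuous, the copula $C$ is unique, which is what allows the dependence structure to be studied separately from the marginal behaviour.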
-
Bayesian Multivariate Nonlinear State Space Copula Models
Authors:
Alexander Kreuzer,
Luciana Dalla Valle,
Claudia Czado
Abstract:
In this paper we propose a flexible class of multivariate nonlinear non-Gaussian state space models, based on copulas. More precisely, we assume that the observation equation and the state equation are defined by copula families that are not necessarily equal. For each time point, the resulting model can be described by a C-vine copula truncated after the first tree, where the root node is represented by the latent state. Inference is performed within the Bayesian framework, using the Hamiltonian Monte Carlo method, where a further D-vine truncated after the first tree is used as prior distribution to capture the temporal dependence in the latent states. Simulation studies show that the proposed copula-based approach is extremely flexible, since it is able to describe a wide range of dependence structures and, at the same time, allows us to deal with missing data. The application to atmospheric pollutant measurement data shows that our approach is suitable for accurate modeling and prediction of data dynamics in the presence of missing values. Comparison to a Gaussian linear state space model and to Bayesian additive regression trees shows the superior performance of the proposed model with respect to predictive accuracy.
Submitted 1 November, 2019;
originally announced November 2019.
-
A Pólya-Gamma Sampler for a Generalized Logistic Regression
Authors:
Luciana Dalla Valle,
Fabrizio Leisen,
Luca Rossini,
Weixuan Zhu
Abstract:
In this paper we introduce a novel Bayesian data augmentation approach for estimating the parameters of the generalised logistic regression model. We propose a Pólya-Gamma sampler algorithm that allows us to sample from the exact posterior distribution, rather than relying on approximations. A simulation study illustrates the flexibility and accuracy of the proposed approach to capture heavy and light tails in binary response data of different dimensions. The methodology is applied to two different real datasets, where we demonstrate that the Pólya-Gamma sampler provides more precise estimates than the empirical likelihood method, outperforming approximate approaches.
Submitted 21 December, 2020; v1 submitted 6 September, 2019;
originally announced September 2019.
-
A Bayesian Non-linear State Space Copula Model to Predict Air Pollution in Beijing
Authors:
Alexander Kreuzer,
Luciana Dalla Valle,
Claudia Czado
Abstract:
Air pollution is a serious issue that currently affects many industrial cities in the world and can cause severe illness to the population. In particular, it has been proven that extremely high levels of airborne contaminants have dangerous short-term effects on human health, in terms of increased hospital admissions for cardiovascular and respiratory diseases and increased mortality risk. For these reasons, accurate estimation and prediction of airborne pollutant concentration is crucial. In this paper, we propose a flexible novel approach to model hourly measurements of fine particulate matter and meteorological data collected in Beijing in 2014. We show that the standard state space model, based on Gaussian assumptions, does not correctly capture the time dynamics of the observations. Therefore, we propose a non-linear non-Gaussian state space model where both the observation and the state equations are defined by copula specifications, and we perform Bayesian inference using the Hamiltonian Monte Carlo method. The proposed copula state space approach is very flexible, since it allows us to separately model the marginals and to accommodate a wide variety of dependence structures in the data dynamics. We show that the proposed approach allows us not only to predict particulate matter measurements, but also to investigate the effects of user-specified climate scenarios.
Submitted 11 November, 2019; v1 submitted 20 March, 2019;
originally announced March 2019.
-
Bayesian Nonparametric Conditional Copula Estimation of Twin Data
Authors:
Luciana Dalla Valle,
Fabrizio Leisen,
Luca Rossini
Abstract:
Several studies on heritability in twins aim at understanding the different contributions of environmental and genetic factors to specific traits. Considering the National Merit Twin Study, our purpose is to correctly analyse the influence of socioeconomic status on the relationship between twins' cognitive abilities. Our methodology is based on conditional copulas, which allow us to model the effect of a covariate driving the strength of dependence between the main variables. We propose a flexible Bayesian nonparametric approach for the estimation of conditional copulas, which can model any conditional copula density. Our methodology extends the work of Wu et al. (2015) by introducing dependence on a covariate in an infinite mixture model. Our results suggest that environmental factors are more influential in families with lower socioeconomic position.
Submitted 3 July, 2017; v1 submitted 10 March, 2016;
originally announced March 2016.
-
Bayesian Model Selection for Beta Autoregressive Processes
Authors:
R. Casarin,
L. Dalla Valle,
F. Leisen
Abstract:
We deal with Bayesian inference for Beta autoregressive processes. We restrict our attention to the class of conditionally linear processes. These processes are particularly suitable for forecasting purposes, but are difficult to estimate due to the constraints on the parameter space. We provide a full Bayesian approach to the estimation and include the parameter restrictions in the inference problem by a suitable specification of the prior distributions. Moreover, in a Bayesian framework, parameter estimation and model choice can be solved simultaneously. In particular, we suggest a Markov chain Monte Carlo (MCMC) procedure based on a Metropolis-Hastings within Gibbs algorithm and solve the model selection problem following a reversible jump MCMC approach.
Submitted 31 July, 2010;
originally announced August 2010.