Canonical Correlation Analysis - An Application to Weather Conditions Data

Gyasi-Agyei AK, Mintah AE, Abraham AY and Mustapha AM

Published on: 2025-05-01

Abstract

This paper evaluates the degree of correlation between Ghana's heating and cooling weather factors over the period from January 2000 to December 2024. By identifying the linear combinations of the two sets of variables that have the highest correlation, the study determines the correlations that exist between the two sets. It uses canonical correlation analysis, a widely used multivariate method for analyzing the covariance between two subsets of variables, a set of response variables and a set of predictor variables. The first canonical correlation coefficient was 0.658, which is statistically significant with F(9, 716) = 25.246 at the 0.05 significance level. It was followed by a second canonical correlation coefficient of 0.277 with F(4, 590) = 7.926 and a third of 0.157 with F(1, 296) = 7.473. The results reject the null hypothesis and show that all three canonical correlation coefficients differ significantly from zero. The first canonical root indicates a high positive correlation between the heating variables and the cooling variables of weather conditions. Policy recommendations include adopting climate-resilient agriculture, expanding renewable energy, integrating green infrastructure in urban planning, enhancing meteorological systems, and promoting public awareness to support climate adaptation and sustainable development in Ghana.
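The sequential significance tests quoted in the abstract appear to follow Wilks' lambda with Rao's F approximation, where the k-th test uses Lambda_k = product over i >= k of (1 - r_i^2) and asks whether canonical correlations k, k+1, ... are all zero. The sketch below recomputes these tests from the published coefficients. It assumes N = 300 monthly observations (January 2000 to December 2024) and three variables in each set, inferred from the numerator degrees of freedom (9 = 3 x 3); neither value is stated in this section, and the function name `rao_f_tests` is hypothetical. Under these assumptions it gives the published degrees of freedom and F values close to the published ones; small differences are expected because the coefficients are rounded to three decimals.

```python
import math

def rao_f_tests(canon_corrs, n_obs, p, q):
    """Sequential Wilks' lambda tests with Rao's F approximation (a sketch).

    Test k checks whether canonical correlations k, k+1, ... are all zero.
    Returns one (wilks_lambda, F, df1, df2) tuple per test.
    """
    # Bartlett-type multiplier computed from the full set sizes p and q;
    # this choice is an assumption that matches the published df2 values.
    m = n_obs - 1 - (p + q + 1) / 2
    results = []
    for k in range(len(canon_corrs)):
        # Wilks' lambda for the canonical correlations not yet removed.
        lam = math.prod(1 - r ** 2 for r in canon_corrs[k:])
        pk, qk = p - k, q - k  # set dimensions left after removing k roots
        df1 = pk * qk
        denom = pk ** 2 + qk ** 2 - 5
        # Rao's s; by convention s = 1 when the denominator is not positive.
        s = math.sqrt((pk ** 2 * qk ** 2 - 4) / denom) if denom > 0 else 1.0
        df2 = m * s - df1 / 2 + 1
        root = lam ** (1 / s)
        f_stat = (1 - root) / root * df2 / df1
        results.append((lam, f_stat, df1, df2))
    return results

# Published canonical correlations; N = 300 and p = q = 3 are assumptions.
published = [0.658, 0.277, 0.157]
for lam, f_stat, df1, df2 in rao_f_tests(published, n_obs=300, p=3, q=3):
    print(f"Wilks' lambda = {lam:.4f}, F({df1}, {df2:.0f}) = {f_stat:.3f}")
```

The denominator degrees of freedom come out non-integer in the first test (about 715.7) and are rounded for display, which is how the value 716 in the abstract would arise under these assumptions.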

Keywords

Canonical correlation coefficient; Cooling variables; Eigenvalues; Heating variables; Statistical significance

Introduction

Canonical correlation analysis (CCA) examines the relationship between a set of predictor variables, X, and a set of response variables, Y, in a dataset, and it has been applied in a wide range of domains and industries [1,2]. Since its introduction by Hotelling, the development of CCA as a multivariate method for identifying the correlations within and between two groups of variables has been led by [3]. The goal of CCA is to ascertain whether the predictor set of variables has an impact on the criterion set when the two sets are specified in advance on theoretical grounds [3]. [4] described the following four situations in which CCA is relevant. (a) An insurance provider wants to know whether there is a correlation between the types of insurance policies purchased and individual characteristics. (b) A health department wants to determine whether housing-quality factors, namely the availability of running water, heating and cooling conditions, kitchen and restroom facilities, and type of housing, are correlated with the incidence of minor and serious illness and with the number of disability days. (c) A medical researcher wants to know whether people's dietary and lifestyle choices affect several health-related outcomes, including weight, stress levels, hypertension and anxiety. (d) A consumer goods company wants to know whether there is a correlation between the types of products purchased and consumers' personalities and lifestyles. Each of these scenarios seeks to establish whether there is a connection between two specific groups of variables, which is precisely the problem CCA is designed to address.
Meteorologists use CCA to investigate the association between response variables, such as solar radiation and maximum and minimum temperatures, and other weather-related factors, such as wind, precipitation and relative humidity. Because these variables interact in complex ways, predicting weather conditions requires modeling them jointly, and CCA provides a natural framework for this kind of study. CCA finds basis vectors for two sets of variables such that the correlation between the projections of the variables onto these basis vectors is mutually maximized [5]. This paper gives an account of some aspects of applying CCA to multivariate weather-conditions data, an area of statistics that remains almost entirely unexplored. Although CCA is part of mainstream multivariate statistics, relatively little work has been done on canonical correlation for less common kinds of data, such as weather conditions. While many researchers have tackled practical problems analytically and scientifically, the use of CCA to study the relationship between the heating and cooling variables of weather conditions remains a promising and largely untouched field of research. Worsening weather conditions in Ghana are also of great concern: statistical evidence cited in the literature [6-9] indicates that they have intensified in the country's cities. Using CCA to gain insight into the relationship between temperature variables and other weather variables is therefore of considerable importance. The primary goal of this paper is to present an overview of basic CCA methods applied to weather data. Focusing on Ghana's weather data is important because of the country's vulnerability to climate change, its agriculture-dependent economy, and the challenges posed by rapid urbanization, including increased energy demand and land-use change.
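The idea of choosing basis vectors whose projections are maximally correlated can be made concrete. One standard way to compute CCA, not necessarily the procedure used in this paper, is to whiten each set: the canonical correlations are the singular values of S_xx^(-1/2) S_xy S_yy^(-1/2), and mapping the singular vectors back through S_xx^(-1/2) and S_yy^(-1/2) gives the canonical weights. The sketch below assumes NumPy and uses synthetic data with three variables per set; the names `heating` and `cooling` are hypothetical stand-ins, not the paper's data. It also checks the defining property: no randomly chosen pair of projection directions correlates more strongly than the first canonical pair.

```python
import numpy as np

def inv_sqrt(s):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(s)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca(x, y):
    """Canonical correlations rho and weight matrices a, b (one column per pair)."""
    x = x - x.mean(axis=0)  # CCA works with mean-corrected data
    y = y - y.mean(axis=0)
    n = x.shape[0]
    s_xx = x.T @ x / (n - 1)
    s_yy = y.T @ y / (n - 1)
    s_xy = x.T @ y / (n - 1)
    w_x, w_y = inv_sqrt(s_xx), inv_sqrt(s_yy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, rho, vt = np.linalg.svd(w_x @ s_xy @ w_y)
    k = min(x.shape[1], y.shape[1])
    return rho[:k], w_x @ u[:, :k], w_y @ vt.T[:, :k]

def corr(u, v):
    return np.corrcoef(u, v)[0, 1]

# Synthetic stand-ins for three "heating" and three "cooling" variables.
rng = np.random.default_rng(0)
n = 300
shared = rng.normal(size=n)  # common driver linking the two sets
heating = np.column_stack([shared + rng.normal(size=n) for _ in range(3)])
cooling = np.column_stack([0.8 * shared + rng.normal(size=n) for _ in range(3)])

rho, a, b = cca(heating, cooling)
first_pair = corr(heating @ a[:, 0], cooling @ b[:, 0])

# No randomly chosen pair of projection directions should beat the first pair.
best_random = max(
    abs(corr(heating @ rng.normal(size=3), cooling @ rng.normal(size=3)))
    for _ in range(2000)
)
print("canonical correlations:", np.round(rho, 3))
print("corr of first canonical pair:", round(first_pair, 3))
print("best of 2000 random direction pairs:", round(best_random, 3))
```

The correlation of the first canonical pair equals the largest singular value, and the random search only approaches it from below, which is the "mutual optimization" described above.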
Understanding weather dynamics in this context can help mitigate climate risks, enhance food security and support sustainable urban planning [5]. With this objective in mind, the paper addresses three main research questions: (i) Is there a meaningful correlation between the heating variables and the cooling variables of weather conditions in Ghana? (ii) Within the set of response variables, which heating variable contributes most, and which least, to a meaningful correlation between the response and predictor variables? (iii) Within the set of predictor variables, which cooling variable contributes most, and which least, to that correlation? In the existing literature, the results of CCA on raw, mean-corrected and standardized data are not clearly distinguished, and because the treatment has been largely theoretical, the practical steps for analyzing data have not been stated explicitly. Although researchers have proposed numerous CCA models, some conclusions are merely stated without derivation because of the mathematical complexity of the concept, and the literature lacks a systematic presentation of the subject [8]. This paper provides a method for assessing the effectiveness of CCA applied to weather conditions, contributes to research on canonical correlation in a way that can support further work, and presents applications of CCA at a level accessible to a wide variety of students and researchers. Like regression analysis, CCA aims to measure the strength of a relationship, in this case between two sets of variables. It is comparable to factor analysis in that it constructs composites of variables.
It is also similar to discriminant analysis in that it generates independent (mutually uncorrelated) dimensions for each set of variables, chosen so that the correlation between paired dimensions is as high as possible. To maximize the relationship between the response and predictor variable sets [10], CCA determines the optimal structure, or dimensionality, of each variable set.
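Research questions (ii) and (iii) ask how much each variable contributes to the canonical relationship. One common way to judge this, assumed here rather than taken from the paper, is through canonical loadings (structure coefficients): the correlation between each original variable and its own set's canonical variate. The sketch below assumes NumPy and uses synthetic data whose column labels (`max_temp`, `rel_humidity` and so on) are hypothetical. It also compares raw, mean-corrected and standardized inputs: the canonical correlations and loadings do not change, whereas the raw canonical weights rescale with the units of measurement, which is one reason loadings are often preferred for interpretation.

```python
import numpy as np

def cca(x, y):
    """CCA via whitening + SVD; returns (rho, a, b, load_x, load_y).

    Inputs are mean-corrected internally, so raw and mean-corrected data
    give identical results. Loadings are correlations between each original
    variable and the canonical variates of its own set.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    s_xx, s_yy = x.T @ x / (n - 1), y.T @ y / (n - 1)
    s_xy = x.T @ y / (n - 1)

    def inv_sqrt(s):
        vals, vecs = np.linalg.eigh(s)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    w_x, w_y = inv_sqrt(s_xx), inv_sqrt(s_yy)
    u, rho, vt = np.linalg.svd(w_x @ s_xy @ w_y)
    k = min(x.shape[1], y.shape[1])
    a, b = w_x @ u[:, :k], w_y @ vt.T[:, :k]
    # Canonical variates have unit variance, so corr = cov / sd(variable).
    load_x = (s_xx @ a) / np.sqrt(np.diag(s_xx))[:, None]
    load_y = (s_yy @ b) / np.sqrt(np.diag(s_yy))[:, None]
    # Sign convention: flip each pair so its X-loadings sum to a positive value.
    sign = np.where(load_x.sum(axis=0) < 0, -1.0, 1.0)
    return rho[:k], a * sign, b * sign, load_x * sign, load_y * sign

def standardize(z):
    return (z - z.mean(axis=0)) / z.std(axis=0, ddof=1)

rng = np.random.default_rng(1)
n = 300
signal = rng.normal(size=n)  # shared synthetic driver of both sets
# Hypothetical "heating" set in raw units (labels are illustrative only).
heating = np.column_stack([
    32 + 3.0 * (signal + rng.normal(size=n)),        # max_temp, deg C
    23 + 2.0 * (0.5 * signal + rng.normal(size=n)),  # min_temp, deg C
    200 + 40.0 * rng.normal(size=n),                 # solar_rad, unrelated here
])
# Hypothetical "cooling" set in raw units.
cooling = np.column_stack([
    75 + 10.0 * (-signal + rng.normal(size=n)),      # rel_humidity, %
    3 + 1.0 * (0.3 * signal + rng.normal(size=n)),   # wind_speed, m/s
    5 + 4.0 * (-0.8 * signal + rng.normal(size=n)),  # rainfall, mm
])
names_x = ["max_temp", "min_temp", "solar_rad"]

res_raw = cca(heating, cooling)
res_centered = cca(heating - heating.mean(axis=0), cooling - cooling.mean(axis=0))
res_std = cca(standardize(heating), standardize(cooling))

print("canonical correlations:", np.round(res_raw[0], 3))
print("same rho for raw/centered/standardized:",
      np.allclose(res_raw[0], res_centered[0]) and np.allclose(res_raw[0], res_std[0]))
print("first-variate weights, raw:", np.round(res_raw[1][:, 0], 4))
print("first-variate weights, standardized:", np.round(res_std[1][:, 0], 4))
# Rank heating variables by the size of their loading on the first variate.
order = np.argsort(-np.abs(res_raw[3][:, 0]))
print("heating variables by |loading|:", [names_x[i] for i in order])
```

The same ranking applied to `res_raw[4]` would address the cooling-variable question; on real data, the cut-off for calling a loading "large" is a judgment the analyst has to make.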
