Systematic Statistical Analysis to Ascertain the Missing Data Patterns in Energy Consumption Data of Smart Homes

K. Purna Prakash, Y. V. Pavan Kumar

Abstract


Smart homes are evolving rapidly, and their benefits, comfort, and flexibility in controlling energy consumption are driving the adoption of smart home culture across the globe. The energy consumption data collected from these homes play a major role in functions such as energy pricing, understanding consumer behavior, and demand-side management. However, the collected data may suffer from anomalies such as missing data, redundancy, and outliers, which affect energy data analytics. Among these anomalies, missing data deserves particular attention because it leaves the data incomplete and significantly hinders further analysis. Data can be missing in three patterns: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Identifying which pattern applies is therefore important for handling the missing values appropriately. Although a few studies address missing data, they focus on its occurrence, behavior, impacts, recovery, and imputation rather than on identifying its pattern. To address this gap, this paper proposes a statistical approach to ascertain the pattern of missing data in smart home energy consumption data. The approach is implemented on the energy consumption database 'Tracebase', and the analysis reveals that the data in this database are missing at random.
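The abstract names the three missingness patterns but does not spell out how one is told apart from another. One widely used diagnostic, shown here only as a minimal sketch and not necessarily the authors' procedure, builds a missingness indicator for one variable and checks whether an observed variable differs between the rows where that variable is missing and the rows where it is present, using a separate-variance (Welch) t-test. The column names `hour` and `power`, the simulated dropout rates, the helper names, and the 3.29 cutoff (a two-sided 0.001 level under a large-sample normal approximation) are all hypothetical assumptions. A significant difference is evidence against MCAR and consistent with MAR; no test on observed data alone can rule out MNAR.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's (separate-variance) t statistic and Welch-Satterthwaite df for samples a, b."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def missingness_t_test(rows, target, covariate, critical=3.29):
    """Compare `covariate` between rows where `target` is missing (None) and observed.

    A large |t| means missingness in `target` depends on an observed variable,
    which is evidence against MCAR and consistent with MAR. The fixed cutoff is an
    assumption that only suits large samples (normal approximation to t).
    """
    missing = [r[covariate] for r in rows if r[target] is None]
    observed = [r[covariate] for r in rows if r[target] is not None]
    t, df = welch_t(missing, observed)
    if abs(t) > critical:
        return t, df, "reject MCAR (consistent with MAR)"
    return t, df, "no evidence against MCAR"


def simulate(n, mechanism, seed=0):
    """Hypothetical smart-meter rows: hour of day plus a power reading that may be None."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hour = rng.randrange(24)
        power = 100 + 2 * hour + rng.gauss(0, 5)  # synthetic watts, not Tracebase data
        if mechanism == "MCAR":
            p_miss = 0.1  # dropout independent of every variable
        elif mechanism == "MAR":
            p_miss = 0.4 if hour < 6 else 0.05  # dropout depends on the observed hour
        else:
            raise ValueError("mechanism must be 'MCAR' or 'MAR'")
        rows.append({"hour": hour, "power": None if rng.random() < p_miss else power})
    return rows


if __name__ == "__main__":
    for mech in ("MCAR", "MAR"):
        t, df, verdict = missingness_t_test(simulate(5000, mech), "power", "hour")
        print(f"{mech}: t = {t:.2f}, df = {df:.0f} -> {verdict}")
```

Run as a script, it prints one line per simulated mechanism. In the MAR case the extra night-time dropout makes the missing rows concentrate at early hours, giving a large negative t; in the MCAR case the statistic should usually stay well below the cutoff. With several candidate covariates, a multivariate check such as Little's MCAR test avoids running many separate t-tests.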

Keywords


Energy consumption data; MAR; MCAR; Missing data pattern; MNAR; Smart home data; Statistical analysis



DOI (PDF): https://doi.org/10.20508/ijrer.v12i3.13029.g8543


Online ISSN: 1309-0127

Publisher: Gazi University

IJRER is indexed in Scopus, EBSCO, and Web of Science (Clarivate Analytics).

IJRER has been indexed in the Emerging Sources Citation Index (Web of Science) since 2016.

Web of Science metrics, 2020-2022:

h-index = 30

Average citations per item = 5.73

Impact factor = (1638 + 1731 + 1808)/(189 + 170 + 221) = 5177/580 ≈ 8.93

Category quartile: Q4