Treating class imbalance in non-technical loss detection : an exploratory analysis of a real dataset
- Ghori, Khawaja, Awais, Muhammad, Khattak, Akmal, Imran, Muhammad, Amin, Fazal, Szathmary, Laszlo
- Authors: Ghori, Khawaja , Awais, Muhammad , Khattak, Akmal , Imran, Muhammad , Amin, Fazal , Szathmary, Laszlo
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Access Vol. 9, no. (2021), p. 98928-98938
- Full Text:
- Reviewed:
- Description: Non-Technical Loss (NTL) is a significant concern for many electric supply companies due to the financial impact caused as a result of suspect consumption activities. A range of machine learning classifiers have been tested across multiple synthesized and real datasets to combat NTL. An important characteristic that exists in these datasets is the imbalance distribution of the classes. When the focus is on predicting the minority class of suspect activities, the classifiers' sensitivity to the class imbalance becomes more important. In this paper, we evaluate the performance of a range of classifiers with under-sampling and over-sampling techniques. The results are compared with the untreated imbalanced dataset. In addition, we compare the performance of the classifiers using penalized classification model. Lastly, the paper presents an exploratory analysis of using different sampling techniques on NTL detection in a real dataset and identify the best performing classifiers. We conclude that logistic regression is the most sensitive to the sampling techniques as the change of its recall is measured around 50% for all sampling techniques. While the random forest is the least sensitive to the sampling technique, the difference in its precision is observed between 1% - 6% for all sampling techniques. © 2013 IEEE.
Treating class imbalance in non-technical loss detection : an exploratory analysis of a real dataset
- Authors: Ghori, Khawaja , Awais, Muhammad , Khattak, Akmal , Imran, Muhammad , Amin, Fazal , Szathmary, Laszlo
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Access Vol. 9, no. (2021), p. 98928-98938
- Full Text:
- Reviewed:
- Description: Non-Technical Loss (NTL) is a significant concern for many electric supply companies due to the financial impact caused as a result of suspect consumption activities. A range of machine learning classifiers have been tested across multiple synthesized and real datasets to combat NTL. An important characteristic that exists in these datasets is the imbalance distribution of the classes. When the focus is on predicting the minority class of suspect activities, the classifiers' sensitivity to the class imbalance becomes more important. In this paper, we evaluate the performance of a range of classifiers with under-sampling and over-sampling techniques. The results are compared with the untreated imbalanced dataset. In addition, we compare the performance of the classifiers using penalized classification model. Lastly, the paper presents an exploratory analysis of using different sampling techniques on NTL detection in a real dataset and identify the best performing classifiers. We conclude that logistic regression is the most sensitive to the sampling techniques as the change of its recall is measured around 50% for all sampling techniques. While the random forest is the least sensitive to the sampling technique, the difference in its precision is observed between 1% - 6% for all sampling techniques. © 2013 IEEE.
Performance analysis of different types of machine learning classifiers for non-technical loss detection
- Ghori, Khawaja, Abbasi, Rabeeh, Awais, Muhammad, Imran, Muhammad, Ullah, Ata, Szathmary, Laszlo
- Authors: Ghori, Khawaja , Abbasi, Rabeeh , Awais, Muhammad , Imran, Muhammad , Ullah, Ata , Szathmary, Laszlo
- Date: 2020
- Type: Text , Journal article
- Relation: IEEE Access Vol. 8, no. (2020), p. 16033-16048
- Full Text:
- Reviewed:
- Description: With the ever-growing demand of electric power, it is quite challenging to detect and prevent Non-Technical Loss (NTL) in power industries. NTL is committed by meter bypassing, hooking from the main lines, reversing and tampering the meters. Manual on-site checking and reporting of NTL remains an unattractive strategy due to the required manpower and associated cost. The use of machine learning classifiers has been an attractive option for NTL detection. It enhances data-oriented analysis and high hit ratio along with less cost and manpower requirements. However, there is still a need to explore the results across multiple types of classifiers on a real-world dataset. This paper considers a real dataset from a power supply company in Pakistan to identify NTL. We have evaluated 15 existing machine learning classifiers across 9 types which also include the recently developed CatBoost, LGBoost and XGBoost classifiers. Our work is validated using extensive simulations. Results elucidate that ensemble methods and Artificial Neural Network (ANN) outperform the other types of classifiers for NTL detection in our real dataset. Moreover, we have also derived a procedure to identify the top-14 features out of a total of 71 features, which are contributing 77% in predicting NTL. We conclude that including more features beyond this threshold does not improve performance and thus limiting to the selected feature set reduces the computation time required by the classifiers. Last but not least, the paper also analyzes the results of the classifiers with respect to their types, which has opened a new area of research in NTL detection. © 2013 IEEE.
- Authors: Ghori, Khawaja , Abbasi, Rabeeh , Awais, Muhammad , Imran, Muhammad , Ullah, Ata , Szathmary, Laszlo
- Date: 2020
- Type: Text , Journal article
- Relation: IEEE Access Vol. 8, no. (2020), p. 16033-16048
- Full Text:
- Reviewed:
- Description: With the ever-growing demand of electric power, it is quite challenging to detect and prevent Non-Technical Loss (NTL) in power industries. NTL is committed by meter bypassing, hooking from the main lines, reversing and tampering the meters. Manual on-site checking and reporting of NTL remains an unattractive strategy due to the required manpower and associated cost. The use of machine learning classifiers has been an attractive option for NTL detection. It enhances data-oriented analysis and high hit ratio along with less cost and manpower requirements. However, there is still a need to explore the results across multiple types of classifiers on a real-world dataset. This paper considers a real dataset from a power supply company in Pakistan to identify NTL. We have evaluated 15 existing machine learning classifiers across 9 types which also include the recently developed CatBoost, LGBoost and XGBoost classifiers. Our work is validated using extensive simulations. Results elucidate that ensemble methods and Artificial Neural Network (ANN) outperform the other types of classifiers for NTL detection in our real dataset. Moreover, we have also derived a procedure to identify the top-14 features out of a total of 71 features, which are contributing 77% in predicting NTL. We conclude that including more features beyond this threshold does not improve performance and thus limiting to the selected feature set reduces the computation time required by the classifiers. Last but not least, the paper also analyzes the results of the classifiers with respect to their types, which has opened a new area of research in NTL detection. © 2013 IEEE.
- «
- ‹
- 1
- ›
- »