Improving Transfer Learning for Software Cross-Project Defect Prediction

aut.relation.endpage: 24
aut.relation.journal: Applied Intelligence
aut.relation.startpage: 1
dc.contributor.author: Omondiagbe, OP
dc.contributor.author: Licorish, SA
dc.contributor.author: MacDonell, SG
dc.date.accessioned: 2024-05-02T01:44:54Z
dc.date.available: 2024-05-02T01:44:54Z
dc.date.issued: 2024-04-24
dc.description.abstract: Software cross-project defect prediction (CPDP) makes use of cross-project (CP) data to overcome the lack of data necessary to train well-performing software defect prediction (SDP) classifiers in the early stage of new software projects. Since the CP data (known as the source) may be different from the new project’s data (known as the target), this makes it difficult for CPDP classifiers to perform well. In particular, it is a mismatch of data distributions between source and target that creates this difficulty. Transfer learning-based CPDP classifiers are designed to minimize these distribution differences. The first transfer learning-based CPDP classifiers treated these differences equally, thereby degrading prediction performance. To this end, recent research has proposed the Weighted Balanced Distribution Adaptation (W-BDA) method to leverage the importance of both distribution differences to improve classification performance. Although W-BDA has been shown to improve model performance in CPDP and tackle the class imbalance by balancing the class proportion of each domain, research to date has failed to consider model performance in light of increasing target data. We provide the first investigation studying the effects of increasing the target data when leveraging the importance of both distribution differences. We extend the initial W-BDA method and call this extension the W-BDA+ method. To evaluate the effectiveness of W-BDA+ for improving CPDP performance, we conduct eight experiments on 18 projects from four datasets, where data sampling was performed with different sampling methods. Data sampling was only performed on the baseline methods and not on our proposed W-BDA+ and the original W-BDA because data sampling issues do not exist for these two methods. We evaluate our method using four complementary indicators (i.e., Balanced Accuracy, AUC, F-measure and G-Measure). 
Our findings reveal an average improvement of 6%, 7.5%, 10% and 12% for these four indicators when W-BDA+ is compared to the original W-BDA and five other baseline methods (for all four of the sampling methods used). Also, as the target to source ratio is increased with different sampling methods, we observe a decrease in performance for the original W-BDA, with our W-BDA+ approach outperforming the original W-BDA in most cases. Our results highlight the importance of having an awareness of the effect of the increasing availability of target data in CPDP scenarios when using a method that can handle the class imbalance problem.
dc.identifier.citation: Applied Intelligence, ISSN: 0924-669X (Print); 1573-7497 (Online), Springer Science and Business Media LLC, 1-24. doi: 10.1007/s10489-024-05459-1
dc.identifier.doi: 10.1007/s10489-024-05459-1
dc.identifier.issn: 0924-669X
dc.identifier.issn: 1573-7497
dc.identifier.uri: http://hdl.handle.net/10292/17504
dc.language: en
dc.publisher: Springer Science and Business Media LLC
dc.relation.uri: https://link.springer.com/article/10.1007/s10489-024-05459-1
dc.rights: Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
dc.rights.accessrights: OpenAccess
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: 46 Information and Computing Sciences
dc.subject: 4612 Software Engineering
dc.subject: 0801 Artificial Intelligence and Image Processing
dc.subject: Artificial Intelligence & Image Processing
dc.title: Improving Transfer Learning for Software Cross-Project Defect Prediction
dc.type: Journal Article
pubs.elements-id: 546570
Files
Original bundle
Name: Improving transfer learning for software.pdf
Size: 938.11 KB
Format: Adobe Portable Document Format
Description: Journal article
Name: Omondiagbe et al._2024_Improving transfer learning for software cross-project defect prediction.pdf
Size: 927.46 KB
Format: Adobe Portable Document Format
Description: Evidence for verification