Log-related Coding Patterns to Conduct Postmortems of Attacks in Supervised Learning-based Projects

Farzana Ahamed Bhuiyan and Akond Rahman in Journal of ACM Transactions on Privacy and Security (TOPS), 2022 Pre-print

Adversarial attacks against supervised learning algorithms, such as deep neural networks and decision trees can be consequential, which necessitates the application of logging while using supervised learning algorithms in software projects. Logging enables practitioners to conduct postmortem analysis, which can be helpful to diagnose any conducted attacks. We conduct an empirical study to identify and characterize log-related coding patterns, i.e., recurring coding patterns that can be leveraged to conduct adversarial attacks and needs to be logged. A list of log-related coding patterns can guide practitioners on what to log while using supervised learning algorithms in software projects. We apply qualitative analysis on 3,004 Python files used to implement 103 supervised learning-based software projects. We identify a list of 54 log-related coding patterns that map to 6 attacks related to supervised learning algorithms. Using LOPSUL, we quantify the frequency of the identified log-related coding patterns with 278 open source software projects that use supervised learning. We observe log-related coding patterns to appear for 22\% of the analyzed files, where training data forensics is the most frequently occurring category.