We present a fast machine-learning approach to static code analysis and fingerprinting for weaknesses related to security, software engineering, and others using the open-source MARF framework and its MARFCAT application. We used the NIST’s SATE IV static analysis tool exposition workshop’s data sets that included popular open-source projects and large synthetic sets as test cases.
To aid detection of weak or vulnerable code, including source or binary on different platforms the machine learning approach proved to be fast and accurate to for such tasks where other tools are either much slower or have much smaller recall of known vulnerabilities. We use signal processing techniques in our approach to accomplish the classification tasks. MARFCAT’s design is independent of the language being analyzed, source code, bytecode, or binary.