Source Code Static Analysis

Static code analysis strives to analyse programs without executing them at all (i.e., purely on the syntactic level) or at least without executing them under the original semantics, meaning that some abstract semantics is used. There exist many different forms of static analysis based, e.g., on syntactic error patterns, data-flow analysis, extended type and effect analysis, abstract interpretation, or symbolic execution.
Deriving various runtime properties and finding various kinds of errors in programs without executing them at all or at least not under their original semantics.

The domain of static analysis is rather broad and even the understanding of the term “static analysis” varies in different sources. For the purpose of this document, we will understand it as an approach to analysing properties of code (primarily source code, but one can use static analysis over executable binary code too) without executing it at all or at least without executing it under its original semantics. Note that static analysis may produce results applicable for verification but also many other purposes – e.g., optimisation, generation of target code, automated parallelization, code understanding, worst-case execution time analysis, performance (or, more generally, resource consumption) analysis, etc. Given the broad meaning of static analysis, it covers approaches that range from purely syntactic ones (e.g., even grep can be used as a static analyser in an extreme case) to much more semantic approaches that execute the given system under some alternative (typically more abstract) semantics. The approaches of model checking and deductive verification can be viewed as forms of static analysis too though they are often considered standalone approaches (which is adopted in this document too).

If we do not count possible extreme interpretations of static analysis, the probably most common approaches to static analysis include data-flow analysis, abstract interpretation, symbolic execution, type and effect analysis, constraint-based static analysis, error-pattern-driven analysis, as well as various pointer analyses. Describing all these approaches in detail is of course beyond the scope of this document, and so we provide their very basic characterization only (inspired by the overview paper [SAN1]).

Given a pre-defined set of properties of program states, which are of interest for some particular reason and which can be denoted as the so-called data-flow facts, a data-flow analysis tracks how the data-flow facts propagate through a program, usually encoded as a control-flow graph (CFG). If the CFG of a program is not explicit in the program code, it may be derived by some preceding static control-flow analysis. The propagation of data-flow facts is tracked in a way consistent with all feasible paths through the CFG, but without directly executing the program. The most common approach to data-flow analysis is the lattice-theoretic iterative data-flow analysis, which essentially solves a system of data-flow equations derived from the CFG of the program being analysed [SAN2, SAN3, SAN4, SAN5, SAN6]. These equations describe how data-flow facts propagate forward or backward through program statements and how they meet at program junctions.

Abstract interpretation [SAN7, SAN8, SAN9] is a theory of sound approximation of the semantics of computer programs that consists in giving a class of programs a concrete and abstract semantics defined on suitable concrete and abstract lattice-based domains. These domains are usually linked by a pair of monotone functions – the so-called abstraction and concretisation that form a Galois connection (though this requirement can be lifted, and the analysis defined in terms of one of these functions only). Program statements are modelled as monotone functions, often called as concrete and abstract transformers, on the concrete and abstract domains, respectively.

Symbolic execution [SAN10, SAN11, SAN12, SAN13, SAN14, SAN15] encodes runs through a program over systematically enumerated program paths as formulae over a suitable logic theory. When doing so, program inputs are represented symbolically as variables whose value is gradually restricted according to the conditions that the chosen program path contains. Automatic decision procedures, implemented, e.g., in SMT solvers, are used to check satisfiability of the obtained formulae.

Type and effect systems [SAN16, SAN17] extend the basic type systems of programming languages to take into account various semantic effects of the supported programming constructions. One can, for instance, track how the memory is accessed (reading, writing, allocation, de-allocation), whether and how synchronization is used (locking and unlocking of mutexes), whether and how various other resources are used (such as reading and writing of files), whether some exceptions can be generated, etc.

In constraint-based static analysis, a set of constraints is derived from the analysed program such that when the constraints are solved, the solution provides the needed information about the program [SAN18, SAN19]. Various kinds of constraints can be used such as conditional set-constraints, linear arithmetic constraints, polynomial arithmetic constraints, etc.

Static analyses based on error patterns use some syntactic characterisation of various classes of errors. First tools based on this approach appeared already in the late 70’s in the Lint tool. Modern versions of “linters” are still in use and development for numerous languages. However, in more advanced tools, usage of error patterns is often combined with various other static analyses: An error pattern provides some basic characterisation of an undesirable control flow in a program, but then results of, e.g., data-flow analysis, abstract interpretation, various pointer analyses, and/or symbolic execution are used to rule out infeasible control-flow paths in order to reduce the number of false alarms [SAN20, SAN21]. Moreover, all these techniques can also be complemented by using further techniques, such as code slicing to remove parts of code irrelevant when verifying a certain property of the code [SAN22].

Pointer analyses may have various forms such as alias analysis or shape analysis. These analyses are often used as auxiliary analyses for other kinds of analyses. They may be implemented as specific forms of data-flow analysis or abstract interpretation, but specialised algorithms for them have been proposed too [SAN23, SAN24, SAN25].

As one can see, the field of static analysis is indeed broad. Moreover, as already indicated above, it is often the case that several approaches to static analysis are combined together. Since static analyses often over-approximate the behaviour of programs (though they can also under-approximate it or combine over- and under-approximation), they can produce a number of false alarms. To keep this number low, the described techniques or their combinations are often combined with further heuristics (such as statistical reasoning about which warnings are likely to be real errors, baselining – i.e., ignoring results of the first analysis run and reporting warnings that appear in later analysis runs only, or combinations of static and dynamic approaches).

  • Lightweight (more syntactic) static analysis is highly scalable. Heavier-weight (more semantic) static analysis may scale less but often still significantly better than other approaches.
  • Some forms of static analysis are applicable even on code fragments without any need for the fragment to even compile, not speaking about a need to run it.
  • Some forms of static analysis are sound (i.e., provide formal guarantees of the correctness of the provided answers) though soundness is often sacrificed to scalability and reducing the number of false alarms.
  • Highly scalable static analysis approaches often come with a number of false alarms (or the number of such alarms is artificially reduced by accepting unsoundness of the approach, i.e., allowing the approach to miss real errors).
  • Some heavier-weight approaches may require the system under analysis to compile and even to come with a test harness.
  • Efficient and precise static analysers are often fine-tuned for a specific class of programs and properties. Their event slight change may require the analysis to be reworked.
  • Some forms of static analysis are not applicable for some classes of errors or they are applicable in a limited way only – e.g., it is difficult to cover errors such as data races in a satisfactorily exhaustive way by purely syntactic error patterns.

[SAN1] Krena, B., Vojnar, T.: Automated Formal Analysis and Verification: An Overview. International Journal of General Systems, 42(4), 2013.

[SAN2] Kildall, G.A.: A Unified Approach to Global Program Optimization. ACM Press, 1973.

[SAN3] Kam, J.B., Ullman, J.D.: Global Data Flow Analysis and Iterative Algorithms. Journal of the ACM, 23, 1976.

[SAN4] Kam, J.B., Ullman, J.D.: Monotone Data Flow Analysis Frameworks. Acta Informatica, 7, 1977.

[SAN5] Khedker, U.P., Dhamdhere, D.M.: A Generalized Theory of Bit Vector Data Flow Analysis. ACM Transactions on Programming Languages and Systems (TOPLAS), 16(5), 1994.

[SAN6] Khedker, U., Sanyal, A., Sathe, B.: Data Flow Analysis: Theory and Practice, CRC Press, 2009.

[SAN7] Cousot, P., Cousot, R.: Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. Proc. of POPL'77, ACM, 1977.

[SAN8] Cousot, P., Cousot, R.: Abstract Interpretation Frameworks. Journal of Logic and Computation, 2(4), 1992.

[SAN9] Rival, X., Yi, K.: Introduction to Static Analysis. An Abstract Interpretation Perspective. MIT Press, 2020.

[SAN10] Boyer, R.S., Elspas, B., Levitt, K.N.: SELECT – A Formal System for Testing and Debugging Programs by Symbolic Execution. Proc. of Int. Conf. on Reliable Software, ACM, 1975.

[SAN11] Howden, W.E.: Symbolic Testing and the DISSECT Symbolic Evaluation System. IEEE Trans. on Software Engineering, 3(4), 1977.

[SAN12] King, J.C.: A New Approach to Program Testing. Proc. Int. Conf. on Reliable Software, ACM, 1975.

[SAN13] King, J.C.: Symbolic Execution and Program Testing. CACM, 19(7), ACM, 1976.

[SAN14] Cadar, C., Dunbar, D., Engler, D.R.: KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. Proc. of OSDI’08, USENIX, 2008.

[SAN15] Baldoni, R., Coppa, E., Cono D’elia, D., Demetrescu, C., Finocchi, I.: A Survey of Symbolic Execution Techniques. ACM Computing Surveys, 50, ACM, 2018.

[SAN16] Nielson, F., Nielson, H.R., 1999. Type and Effect Systems. Correct System Design, Recent Insight and Advances, LNCS 1710, Springer, 1999.

[SAN17] Palsberg, J., Millstein, T.D.: Type Systems: Advances and Applications. The Compiler Design Handbook, 2007.

[SAN18] Aiken, A.: Introduction to Set Constraint-based Program Analysis. Science of Computer Programming, 35(2-3), 1999.

[SAN19] Nielson, F., Nielson, H.R., Hankin, C.: Principles of Program Analysis, Springer-Verlag, 2005.

[SAN20] Hallem, S., Chelf, B., Xie, Y., Engler, D.R.: A System and Language for Building System-Specific, Static Analyses. Proc. of PLDI’02, ACM, 2002.

[SAN21] Bessey, A., Block, K., Chelf, B., Chou, A., Fulton, B., Hallem, S., Gros, C.-H., Kamsky, A., McPeak, S., Engler, D.R.: A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World. CACM 53(2), ACM, 2010.

[SAN22] Slaby, J., Strejcek, J., Trtik, M.: Symbiotic: Synergy of Instrumentation, Slicing, and Symbolic Execution. Proc. of TACAS’13, LNCS 7795, Springer, 2013.

[SAN23] Andersen, L.O.: Program Analysis and Specialization for the C Programming Language. PhD thesis, University of Copenhage, 1994.

[SAN24] Steensgaard, B.: Points-to Analysis in Almost Linear Time. Proc. of POPL'96, ACM, 1996.

[SAN25] Smaragdakis, Y., Balatsouras, G.: Pointer Analysis. Foundations and Trends in Programming Languages, 2(1), 2015.

Method Dimensions
In-the-lab environment
Analytical - Formal, Analytical - Semi-Formal
Software
Implementation
Thinking, Acting
Non-Functional - Safety, Functional, Non-Functional - Privacy, Non-Functional - Other, Non-Functional - Security
V&V process criteria, SCP criteria
Relations
Contents

There are currently no items in this folder.