Msr19 Assignment2

Technology Usage Mining

ANTLR is one of the popular technologies for implementing a parser for a custom language. Even complex technologies such as Xtext depend on it. Many open-source projects maintained on GitHub use it. Many technical details are explained in the documentation that is part of the official GitHub repository.

Lists of Repositories (Preliminary)

In the following, two lists of repositories are provided that contain links to repositories with traces of ANTLR usage.

  1. Pre-final repository list with metadata (CSV)
  2. Final repository list, filtered by stargazer count and grammar file count (CSV)

The two lists still need to be merged and filtered. We will discuss this as a showcase for Pandas-based data analysis.
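
As a starting point for that discussion, the merge-and-filter step could look roughly like the sketch below. The column names (`repo_url`, `stars`, `grammar_file_count`), the thresholds, and the inline sample rows are all assumptions made for illustration; the real CSV files may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two CSV files (column names are assumed).
PRE_FINAL_CSV = """repo_url,stars,language
https://github.com/example/a,120,Java
https://github.com/example/b,3,Java
https://github.com/example/c,45,Python
"""
FINAL_CSV = """repo_url,grammar_file_count
https://github.com/example/a,4
https://github.com/example/c,0
https://github.com/example/d,2
"""


def merge_and_filter(pre_final, final, min_stars=10, min_grammars=1):
    """Join both lists on the repository URL and keep sufficiently relevant repos."""
    # An inner join keeps only repositories that appear in both lists.
    merged = pre_final.merge(final, on="repo_url", how="inner")
    merged = merged.drop_duplicates(subset="repo_url")
    keep = (merged["stars"] >= min_stars) & (
        merged["grammar_file_count"] >= min_grammars
    )
    return merged[keep].reset_index(drop=True)


pre_final = pd.read_csv(io.StringIO(PRE_FINAL_CSV))
final = pd.read_csv(io.StringIO(FINAL_CSV))
result = merge_and_filter(pre_final, final)
print(result)
```

With the real data, the `io.StringIO(...)` arguments would be replaced by the paths of the two downloaded CSV files. Using `how="outer"` together with `indicator=True` instead would show which repositories occur in only one of the lists.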


In this mining task, we are interested in gaining first insights into which components of ANTLR are frequently used and how. Hence, the MSR participants collaboratively collect different kinds of statistics on ANTLR usage for the provided set of repositories.

Example tasks include:

  • Grammar file focused analysis:
      • How often do people actually copy grammars written by Terence Parr (e.g., here)?
      • What is the typical complexity of grammars (.g4 files)?
          • Lines of Code
          • Recursion
          • Various Complexity Metrics
      • How many repositories make use of semantic actions (embedded Java code)?
      • Advanced: Can we confirm the existence of grammar smells in ANTLR grammars?
  • Java-focused analysis:
      • Do projects actually use the listener pattern for a parsing task?
          • How often do people implement an enterX method vs. an exitX method, where the placeholder X is replaced by a rule name?
      • Do projects actually use the visitor pattern for a parsing task?
  • Build-focused analysis:
      • Do projects make use of Maven/Gradle plugins for starting the code generation?
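
Several of the tasks above can be approximated with simple text heuristics before investing in a real parser. The following is a minimal sketch and not a definitive implementation: the sample grammar, the sample listener class, and all function names are hypothetical, and the regular expressions will misjudge unusual inputs (for example, a `;` inside a lexer character set, or an apostrophe inside embedded Java code). The brace-block count is only an upper bound on semantic actions, because `options` and `tokens` blocks are counted as well. The build check assumes the plugin is referenced as `antlr4-maven-plugin` in a Maven POM or as the `antlr` plugin in a Gradle build file.

```python
import re

# Hypothetical sample inputs; a real analysis would read files from cloned repositories.
SAMPLE_GRAMMAR = """\
grammar Expr;

// a tiny expression grammar
expr : expr '+' term { System.out.println("add"); }
     | term
     ;
term : INT ;
INT  : [0-9]+ ;
"""

SAMPLE_LISTENER = """\
public class MyListener extends ExprBaseListener {
    @Override public void enterExpr(ExprParser.ExprContext ctx) { }
    @Override public void exitExpr(ExprParser.ExprContext ctx) { }
    @Override public void exitTerm(ExprParser.TermContext ctx) { }
}
"""

COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
LITERAL = re.compile(r"'(?:\\.|[^'\\])*'")
RULE = re.compile(r"(\w+)\s*:(.*?);", re.DOTALL)
LISTENER_METHOD = re.compile(r"\bvoid\s+(enter|exit)([A-Z]\w*)\s*\(")
GRADLE_PLUGIN = re.compile(r"""(?:id\s*\(?\s*|apply\s+plugin:\s*)['"]antlr['"]""")


def strip_brace_blocks(text):
    """Remove {...} blocks (nesting-aware) and count the top-level ones."""
    kept, depth, blocks = [], 0, 0
    for ch in text:
        if ch == "{":
            if depth == 0:
                blocks += 1
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        elif depth == 0:
            kept.append(ch)
    return "".join(kept), blocks


def grammar_metrics(source):
    """Heuristic size, recursion, and action metrics for one .g4 file."""
    text = COMMENT.sub("", source)
    loc = sum(1 for line in text.splitlines() if line.strip())
    # Drop string literals and brace blocks so that ';' only terminates rules.
    text, brace_blocks = strip_brace_blocks(LITERAL.sub("", text))
    rules = {m.group(1): m.group(2) for m in RULE.finditer(text)}
    parser_rules = [name for name in rules if name[0].islower()]
    # Direct recursion only: the rule name occurs in its own body.
    recursive = [name for name in parser_rules
                 if re.search(r"\b" + re.escape(name) + r"\b", rules[name])]
    return {
        "loc": loc,
        "parser_rules": len(parser_rules),
        "lexer_rules": len(rules) - len(parser_rules),
        "recursive_rules": recursive,
        "brace_blocks": brace_blocks,
    }


def count_listener_methods(java_source):
    """Count enterX and exitX method declarations, ignoring the generic EveryRule hooks."""
    counts = {"enter": 0, "exit": 0}
    for kind, rule in LISTENER_METHOD.findall(java_source):
        if rule != "EveryRule":
            counts[kind] += 1
    return counts


def uses_antlr_build_plugin(build_file_text):
    """Check a pom.xml or Gradle build file for a reference to an ANTLR plugin."""
    if "antlr4-maven-plugin" in build_file_text:
        return True
    return GRADLE_PLUGIN.search(build_file_text) is not None


metrics = grammar_metrics(SAMPLE_GRAMMAR)
listener_counts = count_listener_methods(SAMPLE_LISTENER)
print(metrics)
print(listener_counts)
```

Applied per repository, these functions yield one row of metrics per file, which can then be collected into a Pandas DataFrame for aggregation. For questions such as indirect recursion or grammar smells, parsing the grammars with ANTLR's own grammar for ANTLR would be more reliable than regular expressions.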