Automatic Layout and Topic Annotation for Code Maps using A Taxonomy of Computing Concepts


Principal Investigator

Kenny Zhu

Shanghai Jiao Tong University


Code Map is a recent research effort of Oracle Labs Australia that aims to provide a scalable, spatial visualization of large code bases using a world map metaphor. Code entities at different levels of abstraction, such as packages, classes and methods, are mapped to corresponding entities on a world map, such as continents, countries, states and cities. Users navigate the map using a pan-and-zoom interaction model in which more detail is shown at higher zoom levels. This multi-layer map is constructed automatically from a dependency graph of the code (e.g. a call graph) and an abstraction hierarchy (e.g. the directory structure).

The goal of this project is to derive a standard taxonomy of computing and programming topics from various sources such as Wikipedia, text corpora and code repositories. This taxonomy is useful in two ways: 1) It acts as a knowledge base that specifies is-a and part-of relations among a large number of computing concepts. With this knowledge base, we can infer the topics of code segments by extracting concepts from program comments, variable and function names, check-in messages, and bug reports. These topics can then be used to annotate the semantics of different regions in the code map. 2) It can serve as a standard abstraction hierarchy for code maps. This hierarchy provides a standard vocabulary for labeling code entities or groups at different levels of abstraction. Unlike the directory structure, such a hierarchy would be consistent across code bases. Code maps derived from it may be laid out with a consistent orientation (e.g. security code to the north, storage code to the east) so that different code bases can be viewed from the same perspective and compared more effectively.
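The two uses of the taxonomy can be illustrated with a minimal Python sketch. It is not the project's implementation: the toy taxonomy, the term-to-concept table, the orientation table and all function names are hypothetical, chosen only to show how concepts extracted from identifiers and comments could be rolled up to a top-level topic and mapped to a fixed compass direction.

```python
import re
from collections import Counter

# Hypothetical toy taxonomy: child concept -> parent concept (is-a / part-of).
TAXONOMY = {
    "encryption": "security",
    "authentication": "security",
    "database": "storage",
    "cache": "storage",
    "security": "computing",
    "storage": "computing",
}

# Hypothetical mapping from surface terms found in code to taxonomy concepts.
TERM_TO_CONCEPT = {
    "encrypt": "encryption",
    "cipher": "encryption",
    "login": "authentication",
    "password": "authentication",
    "db": "database",
    "sql": "database",
    "cache": "cache",
}

# Assumed fixed orientation for top-level topics, following the example in the text.
ORIENTATION = {"security": "north", "storage": "east"}


def split_identifier(text):
    """Split text into lowercase word tokens, breaking camelCase and snake_case."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [token.lower() for token in re.findall(r"[A-Za-z]+", spaced)]


def ancestors(concept):
    """Return the concept followed by its ancestors, ending at the root."""
    chain = [concept]
    while chain[-1] in TAXONOMY:
        chain.append(TAXONOMY[chain[-1]])
    return chain


def infer_topics(texts):
    """Count top-level topics (children of the root) mentioned in the texts."""
    counts = Counter()
    for text in texts:
        for token in split_identifier(text):
            concept = TERM_TO_CONCEPT.get(token)
            if concept is None:
                continue
            chain = ancestors(concept)
            if len(chain) >= 2:  # skip the root itself
                counts[chain[-2]] += 1
    return counts


def annotate(texts):
    """Return (dominant topic, map orientation) for a code region, or (None, None)."""
    counts = infer_topics(texts)
    if not counts:
        return None, None
    topic, _ = counts.most_common(1)[0]
    return topic, ORIENTATION.get(topic)


if __name__ == "__main__":
    region = ["encryptPassword", "// verify login before opening db"]
    print(annotate(region))
```

A real system would replace exact term lookup with concept extraction against a large, automatically derived taxonomy, and would need a better tokenizer (this one does not split runs of capitals such as acronyms), but the roll-up along is-a edges and the fixed topic-to-direction table capture the idea of a labeling and layout that stay consistent across code bases.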