The basic problem the Routine Analyzer is trying to solve is how to assimilate a large software product that may consist of dozens, even hundreds of source files, and hundreds of thousands of lines of source code. Where to begin? Where to go? How to identify the relationship between routines, such that an engineer can enhance or correct the product without introducing new problems?
The Routine Analyzer is just one of the tools an engineer can use in working on a large software project. Several other tools provide similar capabilities, such as Source Code Analyzer under VMS, and cscope under Unix. Tools such as cross-reference utilities and compiler and linker listings provide additional information. Each tool has its own benefits and limitations. The Routine Analyzer is intended to supplement other tools, not replace them.
The Routine Analyzer runs under VMS, Unix, and MS-DOS.
Reports
The Routine Analyzer produces the following reports from a software product's
source code:
How It Works
The analysis process is analogous to a software build, in which a build
utility identifies the source files to compile, compiles them, and links them to
form the product image. The analyzer reads a list of source file locations from
a product definition file, analyzes them, and produces the reports. It
works in two phases: an analysis phase and a report generation phase.
During the analysis phase, the analyzer identifies the source language of a file based on the file type or extension, matching the type to known file types or to special types listed in a language definition file. It then selects a parser for that language and parses the contents of the source file, similar to the way a compiler parses a source file (note, however, that the parsers used here are much simpler than true compiler front ends, and ignore many details that are important to a compiler). The parser notes each routine definition (i.e. the actual routine contents) and each call to a routine that it encounters in the source code, building cross-reference information and maintaining various counters.
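The file-type matching described above can be sketched in C as a simple table lookup. This is only an illustration under stated assumptions: the names `language`, `ext_mapping`, and `language_for_file` are hypothetical, the table holds only the default file types listed in this document (no language definition file is consulted), and the case-insensitive comparison is an assumption rather than documented behavior.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical language codes; the analyzer's own names may differ. */
typedef enum { LANG_UNKNOWN, LANG_C, LANG_BLISS, LANG_TEXT } language;

typedef struct {
    const char *ext;    /* File type, without the leading dot. */
    language    lang;   /* Language used to pick a parser.     */
} ext_mapping;

/* Default file types as listed in this document. */
static const ext_mapping default_types[] = {
    {"c", LANG_C},       {"h", LANG_C},
    {"bli", LANG_BLISS}, {"req", LANG_BLISS}, {"r32", LANG_BLISS},
    {"dat", LANG_TEXT},  {"txt", LANG_TEXT},
};

/* Case-insensitive string equality (an assumption of this sketch). */
static int same_ext(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* Equal only if both strings ended together. */
}

/* Return the language for a source file path, based on the text after */
/* the last '.' in the path, or LANG_UNKNOWN if nothing matches.       */
language language_for_file(const char *path)
{
    const char *dot;
    size_t i;

    assert(path != NULL);
    dot = strrchr(path, '.');
    if (dot == NULL || dot[1] == '\0')
        return LANG_UNKNOWN;
    for (i = 0; i < sizeof default_types / sizeof default_types[0]; i++) {
        if (same_ext(dot + 1, default_types[i].ext))
            return default_types[i].lang;
    }
    return LANG_UNKNOWN;
}
```

With this sketch, both `send/finduser.c` and `[.send]finduser.c` map to `LANG_C`, while a file with no file type maps to `LANG_UNKNOWN`.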
When it has parsed the contents of all the source files, the analyzer enters the report generation phase. First, it formats the lists of files and routines it has built. Then it trims duplicate references and identifies recursion, and formats the cross references and call trees. Finally, it formats the annotated source listings.
The analyzer supports mixed-language programming since it selects the appropriate parser on a file-by-file basis (though it does not recognize mixed languages within a single file). It permits multiple definitions of the same routine, for example to provide alternate implementations for different operating systems. It can also be used on partial code sets, where a compile or link might fail due to missing code.
Supported Languages
Currently, the analyzer supports three source languages: C, BLISS, and text. C
and BLISS are the primary high-level languages used in current and existing
Digital products. The text language is actually just a dummy parser that accepts
all input without looking for routines, and can be used to capture files
containing text or unsupported source languages in the analysis reports.
Future language support under consideration is for C++, PERL, and several assembly languages. Assembly languages present a special challenge in that the bounds of a routine definition are not as clear as in high-level languages.
Report Formats
Currently, the analyzer produces reports in two formats: SDML and HTML. SDML
(Standard Digital Markup Language) is the input format for VAX Document, which
can produce ASCII text, PostScript, or Bookreader output. This format is
especially suited to hardcopy documentation when producing PostScript output.
HTML (HyperText Markup Language) is the input format for World-Wide Web
browsers such as Mosaic. This format is especially suited to online browsing,
since it includes hypertext links between related items in the reports for quick
navigation.
SDML reports are created in a small set of files that can be used standalone or included in other documents, either together or individually. HTML reports are created in a somewhat larger set of files that are optimized for quick browsing, and are intended to be used as an integrated set.
Future report formats under consideration are for ASCII text, Windows Rich Text Format (RTF), Windows Help, and VMS Help.
The Product Definition File
The Product Definition File is a required text file listing the source files to be
analyzed. The first line of the file is the product identification line. It can
contain any text, which will be used to identify the product in the reports.
Each remaining line contains one source file pathname. The pathnames may be
complete, absolute pathnames, or they may be relative pathnames. Relative
pathnames are relative to the current working directory when the analyzer is
run. Wildcards are not permitted, and Unix shell abbreviations, such as "~" for
the user's home directory, are not recognized. Remember that Unix filenames are
case-sensitive, while VMS and MS-DOS filenames are not.
For example, the following are product definition files for Unix and VMS. They are for a hypothetical mail utility that uses an X Windows interface. In both cases, the source files are in several subdirectories under a master source directory, from which the analyzer will be run:
    Unix:                        VMS:

    % cat myxmail.prd            $ type myxmail.prd
    X Windows Mail System        X Windows Mail System
    send/finduser.c              [.send]finduser.c
    send/sendmail.c              [.send]sendmail.c
    rcv/getmail.c                [.rcv]getmail.c
    rcv/showmail.c               [.rcv]showmail.c
    file/savemail.c              [.file]savemail.c
    file/importmail.c            [.file]importmail.c
    file/movemail.c              [.file]movemail.c
    gui/mainwindow.c             [.gui]mainwindow.c
    gui/sendwindow.c             [.gui]sendwindow.c
    gui/rcvwindow.c              [.gui]rcvwindow.c
For example, the following is a language definition file valid for both Unix and VMS. It shows the default file types for the supported languages, plus one definition for an unsupported assembly language. However, note that it is not necessary to include the default language definitions, and if all files listed in the product definition file are types known to the analyzer, a language definition file is not necessary.
    % cat myxmail.lng
    ! The following are the default file type/language definitions.
    !
    c=c             ! .c file type is C code
    h=c             ! .h file type is C code
    bli=bliss       ! .bli file type is BLISS code
    req=bliss       ! .req file type is BLISS code
    r32=bliss       ! .r32 file type is BLISS code
    dat=text        ! .dat file type is plain text
    txt=text        ! .txt file type is plain text
    !
    ! The following is a custom C language file type.
    !
    myc=c           ! .myc file type is C code
    !
    ! The following tell the analyzer to treat assembly language and
    ! options files as plain text so that they will be included in
    ! the annotated source output, even though the analyzer is unable
    ! to parse them.
    !
    asm=text        ! Treat assembly language as plain text
    opt=text        ! Treat options file as plain text
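The line format used in the language definition file (a `type=language` pair, optionally followed by a `!` comment) is simple enough to parse with a few string calls. The following C sketch shows one way to do it; `parse_lang_line` and `copy_trimmed` are hypothetical names, and the return-code convention and whitespace handling are assumptions of this example, not the analyzer's actual implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Copy [begin, end) into out with surrounding whitespace removed.    */
/* Returns 1 on success, 0 if the result is empty or does not fit.    */
static int copy_trimmed(const char *begin, const char *end,
                        char *out, size_t outsize)
{
    size_t len;

    while (begin < end && isspace((unsigned char)*begin))
        begin++;
    while (end > begin && isspace((unsigned char)end[-1]))
        end--;
    len = (size_t)(end - begin);
    if (len == 0 || len >= outsize)
        return 0;
    memcpy(out, begin, len);
    out[len] = '\0';
    return 1;
}

/* Parse one definition file line.  Returns 1 and fills ext and lang  */
/* for a "type=language" line, 0 for a blank or comment-only line,    */
/* and -1 for a malformed line.                                       */
int parse_lang_line(const char *line, char *ext, size_t extsize,
                    char *lang, size_t langsize)
{
    const char *end, *eq, *p;

    assert(line != NULL && ext != NULL && lang != NULL);

    end = strchr(line, '!');            /* Comment starts at '!'.      */
    if (end == NULL)
        end = line + strlen(line);

    for (p = line; p < end && isspace((unsigned char)*p); p++)
        ;
    if (p == end)
        return 0;                       /* Nothing before the comment. */

    eq = memchr(line, '=', (size_t)(end - line));
    if (eq == NULL)
        return -1;                      /* No '=' separator.           */

    if (!copy_trimmed(line, eq, ext, extsize) ||
        !copy_trimmed(eq + 1, end, lang, langsize))
        return -1;
    return 1;
}
```

For the line `myc=c ! .myc file type is C code`, this sketch yields the file type `myc` and the language `c`; a line holding only a `!` comment is skipped.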
For example, the following are options files for Unix and VMS. Note that the option switch character for Unix and MS-DOS is "-", and for VMS is "/".
    Unix:                          VMS:

    % cat myxmail.opt              $ type myxmail.opt
    -format=html                   /format=html
    -outprefix=/devdocs/xmail_     /outprefix=dev$docs:xmail_
    -lang=myxmail.lng              /lang=myxmail.lng
    ranalyzer product_definition_file [options]

where product_definition_file is the product definition file, and [options] are any desired command options (see Options Reference for descriptions of the command line options).
All relative file pathnames in the product definition file are assumed to be relative to the current directory from which the analyzer is run. All reports are created in the current directory, unless the "outprefix" option is used with a pathname. The reports use fixed filenames, as shown in the following table; the "outprefix" option may be used to specify an additional prefix to these names:
    Report                  File name       Continuation files
    ----------------------  --------------  ------------------
    Source file list:       files
    Routines by file list:  byfile          byfxxxxx
    Defined routines:       defined
    External routines:      undefind
    Cross references:       xref            xrfxxxxx
    Annotated source code:  srcxxxxx

where xxxxx is a decimal sequence number. The file type is ".sdml" or ".html" for Unix and VMS, and ".sdm" or ".htm" for MS-DOS. Annotated source code is reported one report file per source file. Continuation files are used only for HTML output. These are used to split large reports into smaller, more manageable HTML files to ensure responsiveness while browsing (see the "htmlbyfile" and "htmlxref" options for changing the size of continuation files).
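The naming scheme above (optional prefix, fixed name, optional sequence number, file type) can be sketched in C as follows. The function name `report_filename` and its parameters are hypothetical, and the five-digit zero-padded sequence number is inferred from sample file names such as src00006 in this document; this is an illustration, not the analyzer's own code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a report file name from its parts:                            */
/*   prefix  value of the "outprefix" option, or "" if not used        */
/*   base    fixed report name, e.g. "files", "xref", "src", "byf"     */
/*   seq     sequence number, or 0 for reports that have none          */
/*   ext     file type including the dot, e.g. ".html" or ".sdml"      */
/* Returns 1 on success, 0 if the name does not fit in the buffer.     */
int report_filename(char *out, size_t outsize, const char *prefix,
                    const char *base, int seq, const char *ext)
{
    int n;

    assert(out != NULL && prefix != NULL && base != NULL && ext != NULL);

    if (seq > 0)
        n = snprintf(out, outsize, "%s%s%05d%s", prefix, base, seq, ext);
    else
        n = snprintf(out, outsize, "%s%s%s", prefix, base, ext);

    /* snprintf reports the length it needed; reject truncated names.  */
    return n >= 0 && (size_t)n < outsize;
}
```

Called with the prefix `/devdoc/myxmail_`, the base `src`, sequence number 6, and the file type `.html`, this sketch produces `/devdoc/myxmail_src00006.html`.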
Progress messages are written to standard output. The "brief" or "silent" options may be used to reduce the volume of messages. The "log" option may be used to redirect messages. For example, the following commands run the analyzer under Unix and VMS. They will create HTML format reports in the directory /devdoc (Unix) or dev$doc: (VMS); the additional prefix "myxmail_" will be prepended to the fixed file names. The analyzer program file is assumed to be /tools/bin/ranalyzer (Unix) or sys$tools:ranalyzer.exe (VMS).
    Unix:

    % setenv PATH ${PATH}:/tools/bin
    % ranalyzer myxmail.prd -format=html -outprefix=/devdoc/myxmail_
       .
       .
       .
    % ls /devdoc/myxmail_*
    /devdoc/myxmail_byfile.html     /devdoc/myxmail_src00006.html
    /devdoc/myxmail_defined.html    /devdoc/myxmail_src00007.html
    /devdoc/myxmail_files.html      /devdoc/myxmail_src00008.html
    /devdoc/myxmail_src00001.html   /devdoc/myxmail_src00009.html
    /devdoc/myxmail_src00002.html   /devdoc/myxmail_src00010.html
    /devdoc/myxmail_src00003.html   /devdoc/myxmail_undefind.html
    /devdoc/myxmail_src00004.html   /devdoc/myxmail_xref.html
    /devdoc/myxmail_src00005.html

    VMS:

    $ ranalyzer :== $ sys$tools:ranalyzer.exe
    $ ranalyzer myxmail.prd /format=html /outprefix=dev$doc:myxmail_
       .
       .
       .
    $ dir dev$doc:myxmail_*

    Directory DEV$DSK:[DOC]

    MYXMAIL_BYFILE.HTML;1       MYXMAIL_DEFINED.HTML;1      MYXMAIL_FILES.HTML;1
    MYXMAIL_SRC00001.HTML;1     MYXMAIL_SRC00002.HTML;1     MYXMAIL_SRC00003.HTML;1
    MYXMAIL_SRC00004.HTML;1     MYXMAIL_SRC00005.HTML;1     MYXMAIL_SRC00006.HTML;1
    MYXMAIL_SRC00007.HTML;1     MYXMAIL_SRC00008.HTML;1     MYXMAIL_SRC00009.HTML;1
    MYXMAIL_SRC00010.HTML;1     MYXMAIL_UNDEFIND.HTML;1     MYXMAIL_XREF.HTML;1

    Total of 15 files.
The best way to use the reports is to start with either the alphabetical list of source files or the alphabetical list of defined routines. In HTML format, the list of source files acts as the home page for the report set. From these you can go to the source code or the cross reference information, rapidly traversing the code paths of interest.
The reports have the same general appearance regardless of actual output format,
using the formatting directives available. For instance, SDML has formatting
directives that directly support multi-column tables. HTML does not; instead,
tables are formed from pre-formatted text. All the samples below are taken from
HTML-formatted reports, based on analysis of the analyzer's own source code;
hypertext links are not shown, but are described in the text.
Source Files Alphabetical
The alphabetical list of source files report shows high level statistics about
the source code. The following is a sample source file list:
-------------------------------------------------------------------------------
   +- Link to annotated source code for the file.
  /
 /         +- Link to routines by          Source Files
/         /   file for the file.
v        v
===============================================================================
                            Com-  State-    Rou-             Avg       #
File               Lines  mented    ment   tines  Length     Len   Calls
-------------------------------------------------------------------------------
BLIPARSE.C           739     154     577       5     520     104     102
CMDOPT.C             525     290     222       9     473      53      45
CMDOPT.H              78      47      33       9       9       1       0
CPARSER.C            703     161     526       7     568      81     101
GLOBDB.H             259      96     179      99      99       1      18
LIST.C               316     168     140       5     281      56      45
LIST.H               108      57      39      26      26       1      15
LISTFILE.C           330     114     175      11     286      26     125
OBJALLOC.C           232     102      98       6     195      33      43
OBJALLOC.H            37      26      10       0       0       0       0
OBJECTS.C            767     323     372      21     697      33     156
OBJECTS.H            283     143     138      63      63       1      26
PARSER.H              44      21      20      13      13       1      19
RANALYZER.C         1199     389     720      23    1124      49     237
RANALYZER.H          139      62      69       1       1       1       3
REPORTS.C           1741     563    1175     112    1417      13     391
REPORTS.H             75      33      45      19      19       1       8
RPTHTML.C           1747     539    1024      45    1600      36     507
RPTSDML.C            593     204     306      21     528      25     158
RPTTEXT.C            404     152     190      15     352      23      82
-------------------------------------------------------------------------------
TOTAL: 20 files    10319    3644    6058     510    8271      16    2081
===============================================================================
-------------------------------------------------------------------------------

The columns in this table are:
Defined Routines By File
The list of defined routines by files report shows, for each source file,
statistics for the routines in that file. The routines are listed in the order
in which they occur in the file. The following is a sample for the file
containing the analyzer's C language parser:
----------------------------------------------------------------------
   +- Link to annotated source for the
  /   routine.
 /                            +- Link to cross
/     CPARSER.C Routines     /   reference for
v                           v    the routine.
=======================================================
                                           #     Times
Routine                  Line  Length   Calls   Called
-------------------------------------------------------
trace_parser              124      17       4        3
trace_parser_int          143      22       4        2
trace_parser_state        167      22       4        1
new_source_line           191      37       5        1
iskeyword                 230      24       1        1
get_token                 256     260      10        1
c_parser                  518     186      13        0
-------------------------------------------------------
TOTAL: 7 ROUTINES              81 AVG
=======================================================
----------------------------------------------------------------------

The columns in this table are:
------------------------------------------------------------------------
Link to cross reference -+         Defined Routines Alphabetical
for the routine           \
                           v
========================================================================
                                                        #     Times
Routine                               Line  Length   Calls   Called
------------------------------------------------------------------------
add_caller - OBJECTS.H                 231       1       1        2
add_def - GLOBDB.H                     244       1       1        1
add_file - GLOBDB.H                    243       1       1        1
add_lang - OBJECTS.H                    99       1       2        6
add_ref - OBJECTS.H                    228       1       1        2
add_srcref - OBJECTS.H                 163       1       1        1
analyze_file - RANALYZER.C             101     144      41        1
analyze_product - RANALYZER.C          247     114      57        1
append_list_entry - LIST.H              53       1       2        4
assign_byfilefiles - RPTHTML.C          97      25       6        1
assign_xreffiles - RPTHTML.C            70      25       7        1
bliss_parser - BLIPARSE.C              561     178      38        0
block_level_dec - PARSER.H              41       1       1        4
block_level_inc - PARSER.H              40       1       1        2
block_level_zero - PARSER.H             39       1       1        3
byfile_link_prefix - RPTHTML.C         413      29       3        1
byfile_link_suffix - RPTHTML.C          28       1       1        1
c_parser - CPARSER.C                   518     186      43        0
   .
   .
   .
tree_link - RPTHTML.C                  152       1       1        0
tree_size - REPORTS.C                  364      39      11        0
url_prefix - GLOBDB.H                  140       1       0        2
ustrcpy - BLIPARSE.C                   282      25       1        5
ustrncmp - CMDOPT.C                     36      36       4        8
xref_link - RPTHTML.C                  382      29       4        4
xref_link_prefix - RPTHTML.C           340      40       7        8
xref_link_suffix - RPTHTML.C            27       1       1        8
------------------------------------------------------------------------
TOTAL: 510 routines
========================================================================
------------------------------------------------------------------------
--------------------------------------------------------------
                                     +- Link to cross
  Undefined Routines Alphabetical   /   reference for
                                   v    the routine.
================================================
                                          Times
Routine                                  Called
------------------------------------------------
atoi                                          3
calloc                                        1
def_treefile                                  4
fclose                                       22
fgetc                                         9
fgets                                         9
fopen                                        17
fprintf                                     161
fputc                                         3
fputs                                       147
free                                          1
freopen                                       1
fseek                                         1
isalnum                                       6
isalpha                                       3
isdigit                                       4
isspace                                       6
malloc                                        2
printf                                      129
puts                                         16
sprintf                                      20
strcat                                        2
strcmp                                        9
strcpy                                       20
strlen                                       29
strncpy                                       1
toupper                                       5
ungetc                                       15
------------------------------------------------
TOTAL: 28 routines
================================================
--------------------------------------------------------------

The columns in this table are:
The following are example cross reference entries for three routines illustrating three different situations. Routine product_name is a simple routine (actually a macro) that does not call any others. Routine puts is a standard C library routine. Routine remove_lang is a routine with a call tree three routines deep.
-----------------------------------------------------------------
  +- Link to annotated source for the routine.
 /
v
product_name - GLOBDB.H

Callers                           Link to cross reference -+
                                  for the routine           \
3 callers                                                    v
+ list_product_begin - LISTFILE.C                           Goto
+ rpt_html_section_hdr - RPTHTML.C                          Goto
+ rpt_html_section_title - RPTHTML.C                        Goto

Call Tree: No calls
-----------------------------------------------------------------
puts - External

Callers

7 callers
+ bliss_parser - BLIPARSE.C                                 Goto
+ cmdopt_fmt_kwhandler - RANALYZER.C                        Goto
+ main - RANALYZER.C                                        Goto
+ new_def - OBJECTS.C                                       Goto
+ report_source - REPORTS.C                                 Goto
+ report_tree - REPORTS.C                                   Goto
+ show_help - RANALYZER.C                                   Goto
-----------------------------------------------------------------
remove_lang - OBJECTS.H

Callers

1 caller
+ get_parser - RANALYZER.C                                  Goto

Call Tree

remove_lang
| dequeue_entry                                             Goto
| | remove_list_entry                                       Goto
| | | isfirst_entry                                         Goto
| | | + entry_blink                                         Goto
| | | set_list_first                                        Goto
| | | entry_flink                                           Goto
| | | set_entry_flink                                       Goto
| | | entry_blink (Duplicate)                               Goto
| | | islast_entry                                          Goto
| | | + entry_flink (Duplicate)                             Goto
| | | set_list_last                                         Goto
| | | set_entry_blink                                       Goto
| | + dec_list_entries                                      Goto
| + list_first                                              Goto
+ global_langlist                                           Goto
END OF TREE
-----------------------------------------------------------------

The items in these figures are:
--------------------------------------------------------------------------------
RANALYZER.C Source Code

Routines In This File (Alphabetical)

 Line Name
----- ----
  101 analyze_file
  247 analyze_product
 1037 cmdopt_author
    .
    .
    .
   31 get_parser
 1121 main
 1052 show_help

BEGINNING OF FILE

    1: /****************************************************************************/
    2: /*                                                                          */
    3: /*  FACILITY:  Routine Analyzer                                             */
    4: /*                                                                          */
    5: /*  MODULE:  Main Module                                                    */
    6: /*                                                                          */
    7: /*  AUTHOR:  Steve Branam, Network Product Support Group, Digital           */
    8: /*           Equipment Corporation, Littleton, MA, USA.                     */
    9: /*                                                                          */
   10: /*  DESCRIPTION:  This is the main module for Routine Analyzer. It contains */
   11: /*  the main routine, command option handlers, and the main application     */
   12: /*  routines for processing product and source files.                       */
   13: /*                                                                          */
   14: /*  REVISION HISTORY:                                                       */
   15: /*                                                                          */
   16: /*  V0.1-00 24-AUG-1994 Steve Branam                                        */
   17: /*                                                                          */
   18: /*  Original version.                                                       */
   19: /*                                                                          */
   20: /****************************************************************************/
   21:
   22: #define MAIN_MODULE                     /* This is the main module.         */
   23: #include
   24: #include "ranalyzer.h"
   25:
   26:
   27: extern language_element c_parser();
   28: extern language_element bliss_parser();
   29:
   30: /*************************************************************************++*/

ROUTINE get_parser.

   31: PARSER get_parser(
   32: /* Returns the parser function appropriate for the source language, based   */
   33: /* on the file name extension.                                              */
   34:
   35:     char    *aSourceName,
   36:             /* (READ, BY ADDR):                                             */
   37:             /* Source file name string.                                     */
   38:
   39:     char    **aParserName
   40:             /* (WRITE, BY ADDR):                                            */
   41:             /* Parser name string ptr, set to parser name string.           */
   42:
   43: )   /* Returns ptr to parser function.                                      */
   44: /*****************************************************************--*/
   45:
   46: {
   47:     KEYWORD_DEFINITION                  /* Current keyword definition.      */
   48:             *curkwdef;
   49:     LANGUAGE_TRANSLATION                /* Current language trans.          */
   50:             *curtrans;
   51:     char    *extstr;                    /* File extension ptr.              */
   52:
   53:     if (global_langtable() == NULL) {
   54:         set_lang_table(list_entries(global_langlist()));
   55:         for (curkwdef = global_langtable();
   56:             (curtrans = remove_lang()) != NULL;
   57:             curkwdef++) {
   58:             set_kwdef_keyword(curkwdef, lang_fext(curtrans));
   59:             set_kwdef_minlen(curkwdef, strlen(lang_fext(curtrans)));
   60:             set_kwdef_code(curkwdef, lang_code(curtrans));
   61:             free_lang(curtrans);
   62:         }
   63:     }
   64:
   65:
   66:     /*+                                                                     */
   67:     /* Scan back from end of file name string for file extension. If not    */
   68:     /* found, can't identify parser. Otherwise, locate end of extension     */
   69:     /* string and compare it to known file extensions.                      */
   70:     /*-                                                                     */
   71:
   72:     for (extstr = &aSourceName[strlen(aSourceName)];
   73:         extstr >= aSourceName && *extstr != FILE_EXT_SEPARATOR;
   74:         extstr--);
   75:     if (extstr < aSourceName) {
   76:         printf("ERROR: No file extension specified for file %s\n", aSourceName);
   77:         return NULL;
   78:     }
   79:     else {
   80:         extstr++;
   81:         switch (translate_keyword(extstr, global_langtable())) {
   82:         case LANGUAGE_UNKNOWN:          /* No matches on file type.         */
   83:             printf(
   84:                 "ERROR: Unable to identify source language for file %s\n",
   85:                 aSourceName);
   86:             return NULL;
   87:             break;
   88:         case LANGUAGE_C:
   89:             *aParserName = "C";
   90:             return c_parser;
   91:             break;
   92:         case LANGUAGE_BLISS:
   93:             *aParserName = "BLISS";
   94:             return bliss_parser;
   95:             break;
   96:         }
   97:     }
   98: }

END get_parser.

    .
    .
    .

ROUTINE main.

 1121: main(
 1122: /* Program main routine.                                                    */
 1123:
 1124:     int     vArgc,
 1125:             /* (READ, BY VAL):                                              */
 1126:             /* Number of program argument strings in aArgv.                 */
 1127:
 1128:     char    *aArgv[]
 1129:             /* (READ, BY ADDR):                                             */
 1130:             /* List of program argument strings.                            */
 1131:
 1132: )   /* Returns system success code.                                         */
 1133: /*****************************************************************--*/
 1134:
 1135: {
 1136:                                         /* Main program command line        */
 1137:                                         /* argument options dispatch        */
 1138:                                         /* table.                           */
 1139:     static KEYWORD_DEFINITION options[] = {
 1140:         {"options",     3, process_options_file},
 1141:         {"trace",       3, cmdopt_trace},
 1142:         {"log",         3, cmdopt_log},
 1143:         {"list",        3, cmdopt_list},
 1144:         {"silent",      3, cmdopt_set, LOG_SILENT_ENABLE},
 1145:         {"brief",       3, cmdopt_set, LOG_BRIEF_ENABLE},
 1146:         {"outprefix",   3, cmdopt_outprefix},
 1147:         {"format",      3, cmdopt_format},
 1148:         {"description", 3, cmdopt_description},
 1149:         {"definition",  3, cmdopt_set, LOG_DEF_ENABLE},
 1150:         {"reference",   3, cmdopt_set, LOG_REF_ENABLE},
 1151:         {"separate",    3, cmdopt_separate},
 1152:         {"language",    3, cmdopt_language},
 1153:         {"noinline",    3, cmdopt_set, TREE_INLINE_DISABLE},
 1154:         {"urlprefix",   3, cmdopt_urlprefix},
 1155:         {"callers",     3, cmdopt_callers},
 1156:         {"report",      3, cmdopt_report},
 1157:         {"noreport",    3, cmdopt_noreport},
 1158:         {"htmlbyfile",  5, cmdopt_htmlbyfile},
 1159:         {"htmlxref",    5, cmdopt_htmlxref},
 1160:         {"author",      3, cmdopt_author},
 1161:         {NULL,          0, NULL}        /* End of table.                    */
 1162:     };
 1163:
 1164:     /*+                                                                     */
 1165:     /* Make sure enough reqired arguments were specified, then process the  */
 1166:     /* optional arguments and analyze the product files.                    */
 1167:     /*-                                                                     */
 1168:
 1169:     if (vArgc < 3) {
 1170:         if (vArgc > 1 && *aArgv[1] == CMDLINE_HELP_SWITCH) {
 1171:             show_help();
 1172:         }
 1173:         else {
 1174:             puts(PROGRAM_PARAMS);
 1175:             puts(PROGRAM_HELP);
 1176:         }
 1177:     }
 1178:     else {
 1179:                                         /* Disable these reports by         */
 1180:                                         /* default.                         */
 1181:         set_option(RPT_CALLS_DISABLE | RPT_TREES_DISABLE);
 1182:
 1183:         set_max_callers(DEF_MAX_CALLERS);
 1184:         set_max_html_byfile(DEF_MAX_HTML_BYFILE);
 1185:         set_max_html_xref(DEF_MAX_HTML_XREF);
 1186:         if (process_options(vArgc, aArgv, 2, options)) {
 1187:             add_lang(new_lang("C", LANGUAGE_C));
 1188:             add_lang(new_lang("H", LANGUAGE_C));
 1189:             add_lang(new_lang("BLI", LANGUAGE_BLISS));
 1190:             add_lang(new_lang("REQ", LANGUAGE_BLISS));
 1191:             add_lang(new_lang("R32", LANGUAGE_BLISS));
 1192:             analyze_product(aArgv[1]);
 1193:             if (list_enabled()) {
 1194:                 fclose(list_file());
 1195:             }
 1196:         }
 1197:     }
 1198: }

END main.

 1199:

END OF FILE
TOTAL: 23 routines, 49 Avg Length
--------------------------------------------------------------------------------