SynDiff - Generalized Syntactic Differ Tool

Wouldn't it be nice to have a generalized syntactic differ?

Definition

Well, let's go in reverse order: differ, syntactic, generalized:

  • 'diff' is what you expect it to be. It is a tool that can highlight differences between at least two documents.
  • 'syntactic' means that the input is parseable according to an LL(1) grammar.
  • 'generalized' means that for any language which is defined by an LL(1) grammar we can diff two sentences that are in the language defined by the grammar.

Back Story

The idea for this tool arose from a problem I encountered during an internship in the summer of 2007. I was working on checked-out code that looked like the following:

... various code ...
public funcOne(arguments) {
  ... lots of complicated code ...
}
public funcTwo(arguments) {
  ... lots of complicated code ...
}
public funcThree(arguments) {
  ... lots of complicated code ...
}
... various code ...

I was working hard to understand the code and so made the following trivial change:

... various code ...
public funcTwo(arguments) {
  ... lots of complicated code ...
}
public funcThree(arguments) {
  ... lots of complicated code ...
}
public funcOne(arguments) {
  ... lots of complicated code ...
}
... various code ...

Later, I made three simple changes and tried to check-in my code. The check-in failed! Why? Because I had made too many changes according to the company's diff-ing tool. I objected because I knew I hadn't changed anything substantial and determined the problem to be the rearranging of functions! I decided that a diff-ing tool which was aware of the syntax and knew which elements did not depend on order would be a better tool.

Simple Example

Consider the following grammars in EBNF form:

Grammar 1:
S -> Line S | Line
Line -> [a string of zero or more characters followed by a newline character]
Grammar 2:
S -> ( Line )*
Line -> [a string of zero or more characters followed by a newline character]

Now consider the following fictional text files:

file1.txt:
one
two
three
four
file2.txt:
three
two
one
four

Then the SynDiff tool would report the following differences in the following two files:

According to grammar 1:
Difference in token 'Line' at line 1, column 1:
file1.txt: one
file2.txt: three
Difference in token 'Line' at line 3, column 1:
file1.txt: three
file2.txt: one
According to grammar 2:
No differences to report.

Notice that when the Kleene star operator is used, ordering is assumed to be unimportant. Since ordering of the lines is unimportant in grammar 2, there are no differences between the files according to SynDiff.

Uses

  • Version Systems - A version system may be be able to keep a better record of meaningful changes.
  • Regression Tests - A syntactic differ may yield a better understanding of what has actually changed.

User Interface

Command line.

Usage: syndiff -[options] file1 file2

Options:

  • g - guess the syntax of the files based on their extensions
  • j - use the java syntax parser
  • t - use the text syntax parser
  • p - use the python syntax parser

Drawbacks

  • More memory.
  • More processing power.

Why do the drawbacks not matter? Because we are constantly building faster and faster computers with more and more memory. Check out Moore's Law for more information.

Feasibility Research

Is it possible? YES! For the following reasons:

Tools Used

For development:

For code creation:

For code versioning:

projects/syndiff/details.txt · Last modified: 2008/03/31 15:43 by grant