A Couple of Meta-interpreters in Prolog
    
    
    
      Motivation
    
    Informally, an interpreter is a program that evaluates programs.
    Interpretation is pervasive in computer science both from a
    theoretical and practical perspective, and many programs are
    interpreters for domain-specific languages.  For example, a
    program reading settings from a configuration file and adjusting
    itself accordingly is interpreting this "configuration language".
    
    An interpreter for a language similar or identical to its own
    implementation language is
    called meta-interpreter (MI). An interpreter that can
    interpret itself is called meta-circular. Prolog is
    exceptionally well-suited for writing MIs: First and most
    importantly, Prolog programs can be naturally represented as
    Prolog terms
    and are easily inspected and manipulated
    using built-in
    mechanisms. Second, Prolog's implicit computation strategy and
    all-solutions predicates can be used in interpreters, allowing for
    concise specifications. Third, variables from the object-level
    (the program to be interpreted) can be treated as variables on the
    meta-level (the interpreter); therefore, an interpreter can
    delegate handling of the interpreted program's binding environment
    to the underlying Prolog engine.
    
    
      
        | Video: |   | 
    
    
    The following simple Prolog program will serve as a running
    example throughout this text. I state it here using default Prolog
    syntax, and we will subsequently consider various different
    representations for it:
    
natnum(0).
natnum(s(X)) :-
        natnum(X).
    
    Prolog can evaluate Prolog code dynamically:
    
?- Goal = natnum(X), Goal.
   Goal = natnum(0), X = 0
;  Goal = natnum(s(0)), X = s(0)
;  ... .
    
    You can make the call explicit using the built-in call/1
    predicate, yielding the equivalent query "?- Goal = natnum(X),
    call(Goal).". The call/n (n > 1) family of
    predicates lets you append additional arguments to Goal
    before it is called.
    
    This mechanism ("meta-call") is used in predicates
    like maplist/3, include/3, if/3 etc., and can
    be seen as interpretation in its coarsest sense. Implicitly using
    features of the underlying engine is called absorption.
    Making features explicit is called reification. By
    using call/1, we absorb backtracking, unification,
    handling of conjunctions, the call stack etc. We can make all
    these features explicit and subsequently adjust and extend them at
    will.
    
    Vanilla MIs
 
    Accessing Prolog code
    The definition of natnum/1 consists of two clauses, the
    first of which degenerated into a fact that is implicitly
    treated as if it were stated as
    
natnum(0) :- true.
    
    The Prolog built-in clause/2 allows inspection of the
    predicate's definition:
    
?- clause(natnum(Z), Body).
   Z = 0, Body = true
;  Z = s(_A), Body = natnum(_A).
    
    Note: Clause inspection requires the predicate to
    be public. A portable way to declare predicates public is
    to use the dynamic/1 directive, because dynamic
    predicates are automatically public.
    
    The two solutions found on backtracking correspond to the two
    clauses of the predicate's definition.
    
    Another example:
    
complicated_clause(A) :-
        goal1(A),
        goal2(A),
        goal3(A).
?- clause(complicated_clause(Z), Body).
Body = (goal1(Z), goal2(Z), goal3(Z)).
    
    The body of complicated_clause/1 is represented by a
    compound term with functor ',' and arity 2, whose
    arguments are either goals or compound terms of the same
    structure. Here is a Prolog "type" definition of a clause body in
    the default representation:
    
body(true).
body((A,B)) :-
        body(A),
        body(B).
body(G) :-          % ambiguous, also matches "true" and "(_,_)"
        goal(G).
goal(_ = _).
goal(call(_)).
% ... etc.
    
    
    We can thus define an interpreter for pure Prolog programs:
    
mi1(true).
mi1((A,B)) :-
        mi1(A),
        mi1(B).
mi1(Goal) :-
        Goal \= true,
        Goal \= (_,_),
        clause(Goal, Body),
        mi1(Body).
    
    This is often called vanilla MI because there's nothing
    special to it. All MIs that add no ability to pure Prolog are
    sometimes called vanilla MIs. They serve as skeletons for
    extensions.
    
    We can use this MI to run our example program:
    
?- mi1(natnum(X)).
   X = 0
;  X = s(0)
;  X = s(s(0))
;  ... .
    
    Using a clean representation
    In the preceding MI, we had to guard the third clause from matching
    true/0 or conjunctions to prevent run-time errors resulting
    from calling clause/2 with these arguments. This is ugly,
    and we can make it uglier still by using cuts (!/0) in
    the first two clauses. That would at least remove unnecessary
    choice-points, though typically not prevent their creation.
    Instead of recognising goals by indication of what they are, we
    had to introduce a "default" case and specify what they aren't.
    Such representations are
    called "defaulty". To
    fix this, we equip goals with the (arbitrary)
    functor g:
    
    
body(true).
body((A,B)) :-
        body(A),
        body(B).
body(g(G)) :-
        goal(G).
    
    Such a representation is called clean.
    
    We rewrite our example according to this specification:
    
natnum_clean(0).
natnum_clean(s(X)) :-
        g(natnum_clean(X)).
    
    While a query like "?- natnum_clean(X)." would yield a runtime
    error (there is no predicate g/1), we can use an MI to
    interpret the program:
    
mi2(true).
mi2((A,B)) :-
        mi2(A),
        mi2(B).
mi2(g(G)) :-
        clause(G, Body),
        mi2(Body).
    
    In addition to being shorter and cleaner, this version is
    typically more efficient (no choice-points due to first-argument
    indexing, and fewer inferences due to fewer tests):
    
?- integer_natnum(10^6, T), time(mi1(natnum(T))).
   % CPU time: 1.950s, 16_000_015 inferences
   T = s(s(s(s(s(s(s(s(s(s(s(s(s(s(s(s(s(s(s(s(s(...)))))))))))))))))))))
;  ... .
?- integer_natnum(10^6, T), time(mi2(g(natnum_clean(T)))).
   % CPU time: 1.442s, 11_000_010 inferences
   T = s(s(s(s(s(s(s(s(s(s(s(s(s(s(s(s(s(s(s(s(s(...)))))))))))))))))))))
;  ... .
    
    
    In the following, mi_clause/2 denotes a predicate that is
    similar to clause/2, except that it supplies us with an
    "appropriate" (MI-dependent) representation of clause bodies.  For
    the MI at hand:
    
mi_clause(G, Body) :-
        clause(G, B),
        defaulty_better(B, Body).
defaulty_better(true, true).
defaulty_better((A,B), (BA,BB)) :-
        defaulty_better(A, BA),
        defaulty_better(B, BB).
defaulty_better(G, g(G)) :-
        G \= true,
        G \= (_,_).
    
    This predicate lets us interpret a subset of normal Prolog code
    with the second MI, and it can also be used for static
    conversion. Alternatively, mi_clause/2 can be defined by
    facts. For this MI:
    
mi_clause(natnum(0), true).
mi_clause(natnum(s(X)), g(natnum(X)).
    
    Variants and modifications
    The 2 MIs presented so far reify
    conjunction. Through clause/2, they absorb
    unification and backtracking.  Having made handling of
    conjunctions explicit, we can reverse the default execution order
    of goals:
    
mi3(true).
mi3((A,B)) :-
        mi3(B),        % first B, then A
        mi3(A).
mi3(g(G)) :-
        mi_clause(G, Body),
        mi3(Body).
    
    From a logical point of view, this changes nothing (conjunction is
    commutative). Procedurally, there's a difference: With minor
    adjustments to mi_clause/2 (adding the fact
    "defaulty_better(false, false)." and guarding G against the
    atom false/0), this MI can be used to prove that the query
    "?- declarative_false."  cannot succeed given the predicate's
    definition:
    
declarative_false :-
        declarative_false,
        false.
    
    This can't be derived with the standard execution order.
    
    We can ask the user before we execute goals (tracing),
    print out the execution history or restrict access to safe goals
    by adjusting the third clause:
    
mi2_safe(g(G)) :-
        (   safe_goal(G) ->
            mi_clause(G, Body),
            mi2_safe(Body)
        ;   throw(cannot_execute(G))
        ).
    
    
    We can obtain a simpler MI by using lists of goals to represent
    conjunctions. In this representation, true/0 becomes the
    empty list:
    
natnum_list(0, []).
natnum_list(s(X), [natnum_list(X)]).
    
    Again, the conversion between runnable Prolog code and this
    representation can be performed automatically:
    
mi_clause(G, Ls) :-
        clause(G, Body),
        phrase(body_list(Body), Ls).
body_list(true)  --> [].
body_list((A,B)) -->
        body_list(A),
        body_list(B).
body_list(G) -->
        { G \= true },
        { G \= (_,_) },
        [G].
    
    An obvious MI for this representation consists of only 2 clauses:
    
mi_list1([]).
mi_list1([G|Gs]) :-
        mi_clause(G, Body),
        mi_list1(Body),
        mi_list1(Gs).
    
    We can make it tail-recursive:
    
mi_list2([]).
mi_list2([G0|Gs0]) :-
        mi_clause(G0, Body),
        append(Body, Gs0, Gs),
        mi_list2(Gs).
    
    The difference:
    
always_infinite :-
        always_infinite.
?- mi_list1([always_infinite]).
ERROR: Out of local stack
?- mi_list2([always_infinite]).  % loops, constant stack space
    
    Using list differences in our clause representation, appending the
    yet-to-be-proved goals can be fused with expanding a goal:
    
mi_ldclause(natnum(0), Rest, Rest).
mi_ldclause(natnum(s(X)), [natnum(X)|Rest], Rest).
mi_list3([]).
mi_list3([G0|Gs0]) :-
        mi_ldclause(G0, Remaining, Gs0),
        mi_list3(Remaining).
    
    Example query:
    
?- mi_list3([natnum(X)]).
   X = 0
;  X = s(0)
;  X = s(s(0))
;  ... .
    
    A meta-circular MI
    Here is an MI that can handle the built-in
    predicates clause/2 and (\=)/2:
    
mi_circ(true).
mi_circ((A,B)) :-
        mi_circ(A),
        mi_circ(B).
mi_circ(clause(A,B)) :-
        clause(A,B).
mi_circ(A \= B) :-
        A \= B.
mi_circ(G) :-
        G \= true,
        G \= (_,_),
        G \= (_\=_),
        G \= clause(_,_),
        clause(G, Body),
        mi_circ(Body).
    
    It can interpret its own source code and is thus
    a meta-circular interpreter:
    
?- mi_circ(mi_circ(natnum(X))).
   X = 0
;  X = s(0)
;  X = s(s(0))
;  ... .
    
    Reasoning about programs
    Let us define a predicate provable/2 as
    follows: provable(Goal, Defs) is true if Goal is
    provable with respect to the definitions Defs, which
    is a list of clauses represented as terms of the
    form Head-Body. In comparison to the other MIs shown
    so far, the main difference is thus that the clauses are made
    available explicitly as an argument of the predicate.
    
    In addition, we generalize the handling of built-in predicates by
    using predicate_property/2 to identify them as such for
    calling them directly:
    
provable(true, _).
provable((A,B), Defs) :-
        provable(A, Defs),
        provable(B, Defs).
provable(g(Goal), Defs) :-
        (   predicate_property(Goal, built_in) ->
            call(Goal)
        ;   member(Def, Defs),
            copy_term(Def, Goal-Body),
            provable(Body, Defs)
        ).
    
    Note that we need to use copy_term/2 to replace variables
    by fresh copies. Also, how the built-in
    predicate !/0 is interpreted by this MI does not match
    its intended meaning, and building an MI that handles cuts
    correctly requires more work.
    
    With the following additional definitions, we can use this MI to
    identify redundant facts in some predicate
    definitions:
    
redundant(Functor/Arity, Reds) :-
        functor(Term, Functor, Arity),
        findall(Term-Body, mi_clause(Term, Body), Defs),
        setof(Red, Defs^redundant_(Defs, Red), Reds).
redundant_(Defs, Fact) :-
        select(Fact-true, Defs, Rest),
        once(provable(g(Fact), Rest)).
    
    Given the definitions
    
as([]).
as([a]).   % redundant
as([a,a]). % redundant
as([A|As]) :-
        A = a,           % test built-in (=)/2
        true,            % test built-in true/0
        as(As).
    
    we can ask for facts which are deducible from all (respective)
    remaining clauses and hence redundant:
    
?- redundant(as/1, Rs).
Rs = [as([a]), as([a, a])].
    
    Thus, MIs allow us to determine non-trivial properties of programs
    in some cases.
    
    Importantly, MIs give us the freedom to interpret each language
    construct differently than regular Prolog would do it.
    For example, using abstract interpretation, we can derive type
    and mode information of predicates.
    
    An example taken from Codish and
    Søndergaard 2002, Meta-Circular Abstract
    Interpretation in Prolog is in the source
    file. By fixpoint computation, we derive non-trivial facts
    about the Ackermann function over an abstract parity domain:
    
?- ack_fixpoint(As).
As = [ack(odd, odd, odd), ack(odd, even, odd), ack(odd, one, odd), ack(even, odd, odd),
      ack(odd, zero, odd), ack(even, even, odd), ack(even, one, odd), ack(one, odd, odd),
      ack(even, zero, odd), ack(one, even, even), ack(one, one, odd), ack(one, zero, even),
      ack(zero, even, odd), ack(zero, odd, even), ack(zero, zero, one), ack(zero, one, even)].
    
    Now consider the following query:
    
?- dif(X, one), dif(X, zero), dif(Z, odd), ack_fixpoint(As), member(ack(X,Y,Z), As).
false.
    
    This shows that Ackermann(i, j) is odd and greater
    than 1 for all i > 1.
    
    Reifying backtracking
    Let us reify backtracking now. We need to make explicit all
    alternative clause bodies that are normally found on backtracking,
    collecting them deterministically using findall/3. A single
    alternative is represented as a list of goals, and the branches of
    computation that have yet to be explored as a list of
    alternatives. The interface predicate, mi_backtrack/1,
    takes a goal as its argument and creates the representation of the
    initial state of computation: A list, consisting of a single
    alternative, [G0]. Actually, the representation is
    [G0]-G0, and G0 is also passed as the second
    argument to the worker predicate for reasons that will become
    clear shortly.
    
mi_backtrack(G0) :- mi_backtrack_([[G0]-G0], G0).
    
    To perform a single step of the computation, we first collect all
    clause bodies whose heads unify with the first goal of the first
    alternative. To all found clause bodies, the remaining goals of
    that first alternative are appended, thus obtaining new
    alternatives that we prepend to the other alternatives to give the
    new state of computation. Using the clause representation that
    makes use of list differences, and findall/4 to append
    existing alternatives, this becomes:
    
resstep_([A|As0], As) :-
        findall(Gs-G, (A = [G0|Rest]-G,mi_ldclause(G0,Gs,Rest)), As, As0).
    
    
    
    Leaves of the computation, i.e., alternatives that we are done
    with, automatically vanish from the list of alternatives as there
    is no goal to be proved for them any more. The unification A =
    [G0|Rest]-G thus fails and only the other alternatives
    remain.
    
    The worker predicate:
    
mi_backtrack_([[]-G|_], G).
mi_backtrack_(Alts0, G) :-
        resstep_(Alts0, Alts1),
        mi_backtrack_(Alts1, G).
    
    If no goals remain to be proved for an alternative, a solution for
    the initial query is found and we report the bindings to the user.
    This is why we needed to pass around the user's query. The second
    clause describes how the computation is carried on: The list of
    alternatives is transformed as described above, and the process
    continues.
    
    Representing all alternatives explicitly allows us to guide the
    inference process by reordering alternatives, implement fair
    execution strategies (by appending new alternatives) and so
    on. Also, the MI shows what is needed to implement Prolog
    in languages that lack built-in backtracking.
    
    
      Extending Prolog
    
    Showing proofs
    If you want a feature that plain Prolog does not provide, you can
    add it to a vanilla MI. Here is an MI that behaves like standard
    pure Prolog and builds a proof-tree in parallel that makes
    explicit the inferences that lead to the proof:
    
:- op(750, xfy, =>).
mi_tree(true, true).
mi_tree((A,B), (TA,TB)) :-
        mi_tree(A, TA),
        mi_tree(B, TB).
mi_tree(g(G), TBody => G) :-
        mi_clause(G, Body),
        mi_tree(Body, TBody).
    
    Example query:
    
?- mi_tree(g(natnum(X)), T).
   T = true=>natnum(0), X = 0
;  T = (true=>natnum(0))=>natnum(s(0)), X = s(0)
;  T = ((true=>natnum(0))=>natnum(s(0)))=>natnum(s(s(0))), X = s(s(0))
;  ... .
    
    
    Changing the search strategy
    Another group of extensions aims to improve the incomplete default
    computation strategy. We start with an MI that limits the depth of
    the search tree,
    using integer arithmetic:
    
mi_limit(Goal, Max) :-
        mi_limit(Goal, Max, _).
mi_limit(true, N, N).
mi_limit((A,B), N0, N) :-
        mi_limit(A, N0, N1),
        mi_limit(B, N1, N).
mi_limit(g(G), N0, N) :-
        N0 #> 0,
        N1 #= N0 - 1,
        mi_clause(G, Body),
        mi_limit(Body, N1, N).
    
    Example query:
    
?- mi_limit(g(natnum(X)), 3).
   X = 0
;  X = s(0)
;  X = s(s(0))
;  false.
    
    As expected, the number of solutions coincides with the maximal
    depth of the search tree in the case of natnum/1.  Based
    on this MI, we can implement a complete search
    strategy, iterative deepening:
    
mi_id(Goal) :-
        length(_, N),
        mi_limit(Goal, N).
    
    Consider the program:
    
edge(a, b).
edge(b, a).
edge(b, c).
path(A, A, []).
path(A, C, [e(A,B)|Es]) :-
        edge(A, B),
        path(B, C, Es).
    
    And the queries:
    
?- path(a, c, Es).
ERROR: Out of local stack
?- mi_id(g(path(a, c, Es))).
   Es = [e(a,b),e(b,c)]
;  ... .
    
    In contrast to depth-first search, iterative deepening finds a
    solution. At first glance, iterative deepening may seem a wasteful
    search technique because many nodes of the search tree are
    explored repeatedly. However, one can show that iterative
    deepening is in fact an optimal search strategy under very
    general assumptions. In a general search tree, the number of nodes
    at each level grows exponentially with the depth of the tree, and
    the time it takes to explore all nodes at the final depth is
    enough to cover the previously expended effort with
    a constant factor.
    
    Sound unification
    Omission of the occurs check in the default unification
    algorithms of common Prolog implementations can lead to unsound
    inference:
    
occ(X, f(X)).
    
    Without occurs check, the query
    
?- occ(A, A).
    
    succeeds. We can use occurs check for unification of clause heads
    in an MI:
    
mi_occ(true).
mi_occ((A,B)) :-
        mi_occ(A),
        mi_occ(B).
mi_occ(g(G)) :-
        functor(G, F, Arity),
        functor(H, F, Arity),
        mi_clause(H, Body),
        unify_with_occurs_check(G, H),
        mi_occ(Body).
    
    We get:
    
?- mi_occ(g(occ(A,A))).
   false.
    
    You can use an MI similar to this one to reify the binding
    environment of variables: Thread the bindings through and add
    a term of the form unify(G,H) to the set of bindings in the
    third clause. Use numbervars/3 to get rid of variables if
    you want to reify unification.
    
    Parsing with left-recursive grammars
    As a consequence of Prolog's computation strategy, parsing with
    left-recursive grammars is problematic. Let us now define an MI
    that interprets Definite Clause
    Grammars in such a way that they can handle
    left-recursion. Consider a simple grammar:
    
dcgnumber(0).
dcgnumber(1).
expr(N)   --> [N], { dcgnumber(N) }.
expr(A+B) --> expr(A), [(+)], expr(B).
    
    This grammar can be used to (unfairly) enumerate an arbitrary
    number of strings it describes:
    
?- phrase(expr(E), Ss).
   E = 0, Ss = [0]
;  E = 1, Ss = [1]
;  E = 0+0, Ss = [0,+,0]
;  ... .
    
    However, parsing ground strings leads to problems:
    
?- phrase(expr(E), [1,+,1]).
   E = 1+1
;  ERROR: Out of local stack
    
    We first convert the grammar into a more suitable representation:
    
dcg_clause(expr(N),   [t(N),{dcgnumber(N)}]).
dcg_clause(expr(A+B), [l,nt(expr(A)),t(+),nt(expr(B))]).
    
    The atom l in the body of the second clause is used to
    capture that for this clause to apply, there must be at least one
    (therefore, one l) token left in the string
    to be parsed. This is used in the MI to prune the search tree if
    we run out of tokens. The other terms are: t/1 for
    terminals, nt/1 for non-terminals and {}/1 for
    goals.
    
    The interface to the MI:
    
mi_dcg(NT, String) :-
        length(String, L),
        length(Rest0, L),
        mi_dcg_([nt(NT)], Rest0, _, String, []).
    
    The worker predicates:
    
mi_dcg(t(T), Rest, Rest, [T|Ts], Ts).
mi_dcg({Goal}, Rest, Rest, Ts, Ts) :-
        call(Goal).
mi_dcg(nt(NT), Rest0, Rest, Ts0, Ts) :-
        dcg_clause(NT, Body),
        mi_dcg_(Body, Rest0, Rest, Ts0, Ts).
mi_dcg(l, [_|Rest], Rest, Ts, Ts).
mi_dcg_([], Rest, Rest, Ts, Ts).
mi_dcg_([G|Gs], Rest0, Rest, Ts0, Ts) :-
        mi_dcg(G, Rest0, Rest1, Ts0, Ts1),
        mi_dcg_(Gs, Rest1, Rest, Ts1, Ts).
    
    We can now use the left-recursive grammar also for parsing:
    
?- mi_dcg(expr(E), [1,+,1,+,1]).
   E = 1+(1+1)
;  E = 1+1+1
;  false.
    
    
    Further extensions
    Other possible extensions are module systems, delayed goals,
    checking for various kinds of infinite loops, profiling,
    debugging, type systems, constraint solving etc. The overhead
    incurred by implementing these things using MIs can be compiled
    away using partial evaluation techniques. For instance, we can
    (mechanically) derive the following partially evaluated version of
    the DCG example:
    
pe_expr(Expr, String) :-
        length(String, L),
        length(Rest0, L),
        pe_expr(Expr, Rest0, _, String, []).
pe_expr(N, Rest, Rest, Ts0, Ts) :-
        Ts0 = [N|Ts],
        dcgnumber(N).
pe_expr(A+B, [_|Rest0], Rest, Ts0, Ts) :-
        pe_expr(A, Rest0, Rest1, Ts0, Ts1),
        Ts1 = [+|Ts2],
        pe_expr(B, Rest1, Rest, Ts2, Ts).
    
    Speed comparison:
    
?- sum_of_ones(10^4, Ss), time(mi_dcg(expr(Sum), Ss)).
   % CPU time: 7.010s, 50_245_019 inferences
   ... .
?- sum_of_ones(10^4, Ss), time(pe_expr(Sum, Ss)).
   % CPU time: 0.016s, 70_012 inferences
   ... .
    
    
    Source file containing all examples: acomip.pl
    
    Written Sept. 14th 2005
    
    More about Prolog
    
    Main page