Prolog Data Structures
In Prolog, all data is represented by Prolog terms.
Terms are defined inductively:
All data can be represented in this way. Terms come into existence
by simply writing them down.
- an atom is a term. Examples of atoms
are: x, test and 'quotes and space'.
- a variable is a term. Variables start with an
uppercase letter or underscore _. A single
underscore denotes an anonymous variables and can
be read as "any term".
- integers and floating point numbers are terms.
Examples: 42 and 42.42.
- a compound term is a term, defined inductively as
follows: If T1, T2,
..., Tn are
terms, then F(T1, T2, ..., Tn)
is also a term, where F is called the functor of the
compound term, and n is called the arity. Examples: f(a), g(f(X)) and +(a, f(X)).
Prolog is dynamically typed and allows us great freedom for
representing data. For example, we could represent natural
numbers as follows:
In this representation, the term s(s(s(zero))) represents
the number 3. Such a representation is of course highly
impractical due to its enormous size for large numbers. For
this reason, Prolog provides integer
arithmetic with a much more efficient number representation.
- we could use the atom zero to represent 0
- we could use the compound term s(X) to
represent the successor of X.
Prolog terms naturally correspond
to trees. There is a
standard order on terms.
Prolog lists are a special case of terms.
Lists are defined inductively:
There is a special syntax for denoting lists conveniently in Prolog:
- the atom  is a list, denoting the empty list
- if Ls is a list, then the term '.'(L,
Ls) is also a list.
These notations can be combined in any way. For example, the term
[a,b|Ls] is a list iff Ls is a list.
- The list '.'(a, '.'(b, '.'(c, ))) can also be
written as [a,b,c].
- The term '.'(L, Ls) can also be written as [L|Ls].
Lists naturally represent collections of elements and arise
in almost all Prolog programs. When choosing between lists
and any other way to represent collections of elements, ask
yourself the following questions:
If your answer is "yes" to most of these questions, lists are
often a good fit.
- Can there be arbitrarily many elements?
- Can there be zero elements?
- Is the order of elements significant?
- Are the elements of the same kind?
In contrast, consider for example the case where each collection
comprises exactly three elements. You can use lists of the
form [A,B,C] to represent such collections, and thus
benefit from meta-predicates that
reason over lists. On the other hand, using compound terms of
the form f(A,B,C) is more memory efficient: This is
because the list [A,B,C] is the compound
term '.'(A,'.'(B,'.'(C,))) and thus takes
roughly twice the space of the term f(A,B,C)
DCG notation is frequently used for
By convention, the functor (-)/2 is often used to denote
pairs of elements in Prolog. For example, the term -(A,
B) denotes the pair of elements A and B. In
Prolog, (-)/2 is defined as an infix operator.
Therefore, the term can be written equivalently
Many commonly available predicates also use this syntax to denote
pairs. Examples of this
In many Prolog systems, association lists are available to
allow faster than linear access to a collection of elements. These
association lists are typically based on balanced trees
like AVL trees.
There is a public domain library called library(assoc)
that ships with many Prolog systems and provides operations for
inserting, fetching and changing elements of a collection.
Here is one way to represent an AVL tree in Prolog:
Every operation on AVL trees, such as adding or changing an
association, can be described as a relation between
different trees, one tree before the operation and
one after it. See Thinking
in States for more information.
- the empty tree is represented by the
- an inner node is represented as
a compound term of the form t(Key, Value,
Balance, Left, Right) where:
- Key and Value represent
the association of Key
- Balance is an atom that denotes a
balance criterion, such as a relation between the number of
children of the left and right subtrees. For example, we can
use the atoms <, = and > to
denote different states between the subtrees.
- Left and Right are AVL trees.
AVL trees let us perform many important operations
in O(log(N)) time, where N is the number
of associations. This is acceptably efficient in many cases.
Almost everything that needs to be said about strings has
been brilliantly said by Richard O'Keefe in
Library Proposal. In particular, strings
are wrong, and:
For almost any use case that involves some kind of processing,
the only sensible thing to do with string data is to turn it
into some kind of tree early. Prolog is brilliant with
In Prolog, a convenient representation of strings is available as
lists of characters, which are atoms. This
representation is available if you add the following directive to
:- set_prolog_flag(double_quotes, chars).
With this setting, we obtain for example:
?- Cs = "test".
Cs = [t, e, s, t].
Thus, working with strings is reduced to working
with lists, which can be easily handled
What about arrays? Is there any data structure in Prolog to
represent a collection of terms, allowing
for O(1) access to individual elements?
First of all, the concept of destructive modifications is
alien to logic programming. In Prolog, we
describe relations between entities, not destructive
effects. To express a change in a Prolog data structure, we define
a predicate that relates the state of the structure before
the change to a different structure after the change. For
this reason, pure modifications
often entail some copying of data and typically lead to at
least logarithmic overhead (for example, to copy a subtree of a
That being said, we can access the arguments of a term
in O(1) with the built-in predicate arg/3. There
even is an impure predicate called setarg/3 which allows
destructive modifications to a term. Use it at your own peril: If
you use setarg/3, you can no longer reason about your
code in the way you expect from pure relations. If you really need
efficient destructive modifications of terms, it is better to
use attributed variables.
Clean vs. defaulty representations
When representing data with Prolog terms, ask yourself the following question:
Can I distinguish the kind of each component from
its outermost functor and arity?
If this holds, your representation is called clean. If
you cannot distinguish the elements by their outermost
functor and arity, your representation is
called defaulty, a wordplay combining "default"
and "faulty". This is because reasoning about your
data will need a "default case", which is applied if
everything else fails. In addition, such a representation prevents
argument indexing, and is considered
faulty due to this shortcoming. Always aim to avoid
defaulty representations! Aim for cleaner representations instead.
For example, suppose you represent a full binary tree in
Prolog. There are two kinds of elements in a full binary
We can represent such trees with Prolog terms as follows:
- leaves, which are concrete elements of the tree
- nodes, which have two children that are again full
The above is a clean representation: It lets us distinguish
the kinds of elements by their outermost functor. We can reason
about such trees in a way that keeps the code very general and
usable in all directions. Such a representation is also amenable
to argument indexing.
- leaf(L) represents the leaf L
- node(Left, Right) represents a node and its
You can also recognize a clean representation if it lets you
describe the general structure of your data while retaining
enough flexibility to keep the concrete elements unspecified. For
example, in the case of full binary trees, we can represent the
general outline of all trees with exactly two leaves
as node(leaf(_),leaf(_)). If you had chosen
to omit the leaf/1 wrapper for representing
leaves, this would become node(_,_): a defaulty
representation that no longer represents precisely such
Sometimes, it will appear to you that there is no clean way out.
For instance, you may be faced with a representation
that requires you to distinguish different cases in an
impure and non-monotonic way, such as by testing
the instantiation of certain arguments
via var/1. In such cases, it is good practice to
restrict the impure parts of your program to small fragments, and
to convert any defaulty representation to a clean one
by introducing suitable wrappers that let you distinguish the
cases by pattern matching. This helps to ensure that you can
use your core predicates in multiple directions.
In Prolog, the most natural way to reason about terms is to
rely on unification. In addition, there are several
predicates that let you decompose and analyze terms. The most
important of these term inspection predicates
are functor/3, arg/3 and (=..)/2.
Here are examples that illustrate their usage:
?- functor(f(a,g(X)), Functor, Arity).
Functor = f,
Arity = 2.
?- functor(Term, f, 2).
Term = f(_25002, _25004).
?- arg(2, f(a,g(X)), Arg).
Arg = g(X).
?- f(a,g(X)) =.. [Functor|Args].
Functor = f,
Args = [a, g(X)].
?- Term =.. [f,a,g(X)].
Term = f(a, g(X)).
These predicates cannot be defined by a finite set of clauses, and
can therefore be considered higher-order
predicates. Do not get carried away with these predicates!
Everything that can be expressed by
pattern matching should be expressed by
pattern matching. For example, instead of functor(Term,
f, 2), you can simply write Term = f(_,_).
Every Prolog term has a canonical representation. If you
are ever unsure about the structure of a term,
use write_canonical/1 to obtain the canonical
representation. For example:
The canonical representation shows the structure of the term
in such a way that it can also be easily parsed by
More about Prolog