Big reorganization of the whole manual to give it a simpler structure.
parent
b1cb24ff0a
commit
3229f68b5a
48 changed files with 2837 additions and 13116 deletions
379
doc/ref/libguile-concepts.texi
Normal file
@ -0,0 +1,379 @@
@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004
@c Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.

@page
@node General Libguile Concepts
@section General concepts for using libguile

When you want to embed the Guile Scheme interpreter into your program,
you need to link it against the @file{libguile} library (@pxref{Linking
Programs With Guile}). Once you have done this, your C code has access
to a number of data types and functions that can be used to invoke the
interpreter, or make new functions that you have written in C available
to be called from Scheme code, among other things.

Scheme is different from C in a number of significant ways, and Guile
tries to make the advantages of Scheme available to C as well. Thus, in
addition to a Scheme interpreter, libguile also offers dynamic types,
garbage collection, continuations, arithmetic on arbitrarily sized
numbers, and other things.

The two fundamental concepts are dynamic types and garbage collection.
You need to understand how libguile offers them to C programs in order
to use the rest of libguile. Also, the more general control flow of
Scheme caused by continuations needs to be dealt with.

@menu
* Dynamic Types::               Dynamic Types.
* Garbage Collection::          Garbage Collection.
* Control Flow::                Control Flow.
@end menu

@node Dynamic Types
@subsection Dynamic Types

Scheme is a dynamically-typed language; this means that the system
cannot, in general, determine the type of a given expression at compile
time. Types only become apparent at run time. Variables do not have
fixed types; a variable may hold a pair at one point, an integer at the
next, and a thousand-element vector later. Instead, values, not
variables, have fixed types.

In order to implement standard Scheme functions like @code{pair?} and
@code{string?} and provide garbage collection, the representation of
every value must contain enough information to accurately determine its
type at run time. Often, Scheme systems also use this information to
determine whether a program has attempted to apply an operation to an
inappropriately typed value (such as taking the @code{car} of a string).

Because variables, pairs, and vectors may hold values of any type,
Scheme implementations use a uniform representation for values: a
single type large enough to hold either a complete value or a pointer
to a complete value, along with the necessary typing information.

In Guile, this uniform representation of all Scheme values is the C type
@code{SCM}. This is an opaque type and its size is typically equivalent
to that of a pointer to @code{void}. Thus, @code{SCM} values can be
passed around efficiently and they take up reasonably little storage on
their own.

The most important rule is: You never access a @code{SCM} value
directly; you only pass it to functions or macros defined in libguile.

As an obvious example, although a @code{SCM} variable can contain
integers, you cannot compute the sum of two @code{SCM} values
by adding them with the C @code{+} operator. You must use the libguile
function @code{scm_sum}.

Less obvious and therefore more important to keep in mind is that you
also cannot directly test @code{SCM} values for trueness. In Scheme,
the value @code{#f} is considered false and of course a @code{SCM}
variable can represent that value. But there is no guarantee that the
@code{SCM} representation of @code{#f} looks false to C code as well.
You need to use @code{scm_is_true} or @code{scm_is_false} to test a
@code{SCM} value for trueness or falseness, respectively.

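Why a plain C truth test goes wrong can be illustrated with a toy model.
The sketch below does not use libguile and does not reflect Guile's real
bit patterns; the type @code{toy_value}, the constants @code{TOY_BOOL_F}
and @code{TOY_BOOL_T}, and all function names are assumptions made up
for this illustration. It only assumes that @code{#f} is represented by
some nonzero word, which is enough to fool @code{if (x)}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for SCM: one opaque machine word.  */
typedef uintptr_t toy_value;

/* Made-up bit patterns; note that the one for #f is NOT zero.  */
#define TOY_BOOL_F ((toy_value) 0x004)
#define TOY_BOOL_T ((toy_value) 0x104)

/* The only reliable tests go through dedicated predicates.  */
bool toy_is_false (toy_value v) { return v == TOY_BOOL_F; }
bool toy_is_true (toy_value v) { return !toy_is_false (v); }

/* What a direct C test such as `if (v)' would conclude.  */
bool naive_c_truth (toy_value v) { return v != 0; }

/* Checks that the naive test and the predicate disagree on #f.  */
void toy_truth_demo (void)
{
  assert (naive_c_truth (TOY_BOOL_F));  /* C sees a nonzero word */
  assert (!toy_is_true (TOY_BOOL_F));   /* the predicate is right */
  assert (toy_is_true (TOY_BOOL_T));
}
```

In real code the same role is played by @code{scm_is_true} and
@code{scm_is_false}, which hide whatever representation libguile
actually uses.
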
You also cannot directly compare two @code{SCM} values to find out
whether they are identical (that is, whether they are @code{eq?} in
Scheme terms). You need to use @code{scm_is_eq} for this.

The one exception is that you can directly assign a @code{SCM} value to
a @code{SCM} variable by using the C @code{=} operator.

The following (contrived) example shows how to do it right. It
implements a function of two arguments (@var{a} and @var{flag}) that
returns @var{a}+1 if @var{flag} is true, else it returns @var{a}
unchanged.

@example
SCM
my_incrementing_function (SCM a, SCM flag)
@{
  SCM result;

  if (scm_is_true (flag))
    result = scm_sum (a, scm_from_int (1));
  else
    result = a;

  return result;
@}
@end example

Often, you need to convert between @code{SCM} values and appropriate C
values. For example, we needed to convert the integer @code{1} to its
@code{SCM} representation in order to add it to @var{a}. Libguile
provides many functions to do these conversions, both from C to
@code{SCM} and from @code{SCM} to C.

The conversion functions follow a common naming pattern: those that make
a @code{SCM} value from a C value have names of the form
@code{scm_from_@var{type} (@dots{})} and those that convert a @code{SCM}
value to a C value use the form @code{scm_to_@var{type} (@dots{})}.

However, it is best to avoid converting values when you can. When you
must combine C values and @code{SCM} values in a computation, it is
often better to convert the C values to @code{SCM} values and do the
computation by using libguile functions than to do it the other way
around (converting @code{SCM} to C and doing the computation some other
way).

As a simple example, consider this version of
@code{my_incrementing_function} from above:

@example
SCM
my_other_incrementing_function (SCM a, SCM flag)
@{
  int result;

  if (scm_is_true (flag))
    result = scm_to_int (a) + 1;
  else
    result = scm_to_int (a);

  return scm_from_int (result);
@}
@end example

This version is much less general than the original one: it will only
work for values @var{a} that can fit into an @code{int}. The original
function will work for all values that Guile can represent and that
@code{scm_sum} can understand, including integers bigger than @code{long
long}, floating point numbers, complex numbers, and new numerical types
that have been added to Guile by third-party libraries.

Also, computing with @code{SCM} is not necessarily inefficient. Small
integers will be encoded directly in the @code{SCM} value, for example,
and do not need any additional memory on the heap. See @ref{Data
Representation} to find out the details.

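How a small integer can live directly inside a word-sized value, with no
heap allocation, can be sketched in plain C. This is only a toy model
under stated assumptions, not Guile's actual encoding: the type
@code{toy_value}, the choice of the low bit as a tag, and every function
name below are hypothetical. Overflow into heap-allocated big numbers,
which libguile handles, is not modeled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for SCM: one machine word.  */
typedef uintptr_t toy_value;

/* Toy convention: a low bit of 1 marks a small integer stored in the
   remaining bits; a low bit of 0 would mark an aligned heap pointer.  */
bool toy_is_small_int (toy_value v) { return (v & 1u) != 0; }

/* Encode: shift the payload up and set the tag bit.  */
toy_value toy_from_int (intptr_t n)
{
  return ((uintptr_t) n << 1) | 1u;
}

/* Decode: check the tag at run time, then shift the payload down.
   Relies on two's complement conversion and arithmetic right shift,
   which mainstream compilers provide.  */
intptr_t toy_to_int (toy_value v)
{
  assert (toy_is_small_int (v));
  return (intptr_t) v >> 1;
}

/* Addition stays entirely within the word; no memory is allocated.
   Results outside the small-integer range are not handled here.  */
toy_value toy_sum (toy_value a, toy_value b)
{
  return toy_from_int (toy_to_int (a) + toy_to_int (b));
}
```

The run-time tag check in @code{toy_to_int} is the kind of type
information that a dynamically typed system has to carry with every
value.
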
Some special @code{SCM} values are available to C code without needing
to convert them from C values:

@multitable {Scheme value} {C representation}
@item Scheme value @tab C representation
@item @nicode{#f} @tab @nicode{SCM_BOOL_F}
@item @nicode{#t} @tab @nicode{SCM_BOOL_T}
@item @nicode{()} @tab @nicode{SCM_EOL}
@end multitable

In addition to @code{SCM}, Guile also defines the related type
@code{scm_t_bits}. This is an unsigned integral type of sufficient
size to hold all information that is directly contained in a
@code{SCM} value. The @code{scm_t_bits} type is used internally by
Guile to do all the bit twiddling explained in @ref{Data
Representation}, but you will encounter it occasionally in low-level
user code as well.

@node Garbage Collection
@subsection Garbage Collection

As explained above, the @code{SCM} type can represent all Scheme values.
Some values fit entirely into a @code{SCM} value (such as small
integers), but other values require additional storage in the heap (such
as strings and vectors). This additional storage is managed
automatically by Guile. You don't need to explicitly deallocate it
when a @code{SCM} value is no longer used.

Two things must be guaranteed so that Guile is able to manage the
storage automatically: it must know about all blocks of memory that have
ever been allocated for Scheme values, and it must know about all Scheme
values that are still being used. Given this knowledge, Guile can
periodically free all blocks that have been allocated but are not used
by any active Scheme values. This activity is called @dfn{garbage
collection}.

It is easy for Guile to remember all blocks of memory that it has
allocated for use by Scheme values, but you need to help it with finding
all Scheme values that are in use by C code.

You do this when writing a SMOB mark function, for example
(@pxref{Garbage Collecting Smobs}). When the garbage collector calls
this function, it learns about all references that your SMOB has to
other @code{SCM} values.

Other references to @code{SCM} objects, such as global variables of type
@code{SCM} or other random data structures in the heap that contain
fields of type @code{SCM}, can be made visible to the garbage collector
by calling the functions @code{scm_gc_protect_object} or
@code{scm_permanent_object}. You normally use these functions for
long-lived objects such as a hash table that is stored in a global
variable. For temporary references in local variables or function
arguments, using these functions would be too expensive.

These references are handled differently: Local variables (and function
arguments) of type @code{SCM} are automatically visible to the garbage
collector. This works because the collector scans the stack for
potential references to @code{SCM} objects and considers all referenced
objects to be alive. The scanning considers each and every word of the
stack, regardless of what it is actually used for, and then decides
whether it could possibly be a reference to a @code{SCM} object. Thus,
the scanning is guaranteed to find all actual references, but it might
also find words that only accidentally look like references. These
`false positives' might keep @code{SCM} objects alive that would
otherwise be considered dead. While this might waste memory, keeping an
object around longer than it strictly needs to is harmless. This is why
this technique is called ``conservative garbage collection''. In
practice, the wasted memory seems to be no problem.

The stack of every thread is scanned in this way and the registers of
the CPU and all other memory locations where local variables or function
parameters might show up are included in this scan as well.

The consequence of the conservative scanning is that you can just
declare local variables and function parameters of type @code{SCM} and
be sure that the garbage collector will not free the corresponding
objects.

However, a local variable or function parameter is only protected as
long as it is really on the stack (or in some register). As an
optimization, the C compiler might reuse its location for some other
value and the @code{SCM} object would no longer be protected. Normally,
this leads to exactly the right behavior: the compiler will only
overwrite a reference when it is no longer needed and thus the object
becomes unprotected precisely when the reference disappears, just as
wanted.

There are situations, however, where a @code{SCM} object needs to be
around longer than its reference from a local variable or function
parameter. This happens, for example, when you retrieve the array of
characters from a Scheme string and work on that array directly. The
reference to the @code{SCM} string object might be dead after the
character array has been retrieved, but the array itself is still in use
and thus the string object must be protected. The compiler does not
know about this connection and might overwrite the @code{SCM} reference
too early.

To get around this problem, you can use @code{scm_remember_upto_here_1}
and its cousins. It will keep the compiler from overwriting the
reference. For a typical example of its use, see @ref{Remembering
During Operations}.

@node Control Flow
@subsection Control Flow

Scheme has a more general view of program flow than C, both locally and
non-locally.

Controlling the local flow of control involves things like gotos, loops,
calling functions and returning from them. Non-local control flow
refers to situations where the program jumps across one or more levels
of function activations without using the normal call or return
operations.

The primitive means of C for local control flow is the @code{goto}
statement, together with @code{if}. Loops done with @code{for},
@code{while} or @code{do} could in principle be rewritten with just
@code{goto} and @code{if}. In Scheme, the primitive means for local
control flow is the @emph{function call} (together with @code{if}).
Thus, the repetition of some computation in a loop is ultimately
implemented by a function that calls itself, that is, by recursion.

This approach is theoretically very powerful since it is easier to
reason formally about recursion than about gotos. In C, using
recursion exclusively would not be practical, though, since it would eat
up the stack very quickly. In Scheme, however, it is practical:
function calls that appear in a @dfn{tail position} do not use any
additional stack space.

A function call is in a tail position when it is the last thing the
calling function does. The value returned by the called function is
immediately returned from the calling function. In the following
example, the call to @code{bar-1} is in a tail position, while the
call to @code{bar-2} is not. (The call to @code{1-} in @code{foo-2}
is in a tail position, though.)

@lisp
(define (foo-1 x)
  (bar-1 (1- x)))

(define (foo-2 x)
  (1- (bar-2 x)))
@end lisp

Thus, when you take care to recurse only in tail positions, the
recursion will only use constant stack space and will be as good as a
loop constructed from gotos.

Scheme offers a few syntactic abstractions (@code{do} and @dfn{named}
@code{let}) that make writing loops slightly easier.

But only Scheme functions can call other functions in a tail position:
C functions cannot. This matters when you have, say, two functions
that call each other recursively to form a common loop. The following
(unrealistic) example shows how one might go about determining whether a
non-negative integer @var{n} is even or odd.

@lisp
(define (my-even? n)
  (cond ((zero? n) #t)
        (else (my-odd? (1- n)))))

(define (my-odd? n)
  (cond ((zero? n) #f)
        (else (my-even? (1- n)))))
@end lisp

Because the calls to @code{my-even?} and @code{my-odd?} are in tail
positions, these two procedures can be applied to arbitrarily large
integers without overflowing the stack. (They will still take a lot
of time, of course.)

However, if one or both of the two procedures were rewritten in C, the
C version could no longer call its companion in a tail position (since
C does not have this concept). You might need to take this
consideration into account when deciding which parts of your program
to write in Scheme and which in C.

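One way to sidestep the missing tail calls, sketched here under the
assumption that both procedures move to C together, is to fold the
mutual recursion into a single loop. The names @code{my_even_p} and
@code{my_even_p_demo} are made up for this illustration and are not part
of libguile; the sketch works on a plain @code{unsigned long} rather
than on @code{SCM} values.

```c
#include <assert.h>
#include <stdbool.h>

/* Loop version of the my-even?/my-odd? pair.  The flag records which
   of the two procedures the computation is currently "inside".  */
bool my_even_p (unsigned long n)
{
  bool in_even = true;          /* we start in my-even? */

  while (n != 0)
    {
      in_even = !in_even;       /* each call hands over to the companion */
      n--;                      /* ...with the argument (1- n) */
    }

  /* At zero, my-even? answers #t and my-odd? answers #f.  */
  return in_even;
}

/* A few checks; uses constant stack space even for large inputs.  */
void my_even_p_demo (void)
{
  assert (my_even_p (0));
  assert (!my_even_p (7));
  assert (my_even_p (1000000));
}
```

The loop uses constant stack space, just like the tail-recursive Scheme
version, but the two procedures are no longer separate functions.
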
In addition to calling functions and returning from them, a Scheme
program can also exit non-locally from a function so that the control
flow returns directly to an outer level. This means that some functions
might not return at all.

Moreover, a Scheme program can not only jump to some outer level of
control; it can also jump back into the middle of a function that has
already exited. This might cause some functions to return more than
once.

In general, these non-local jumps are done by invoking
@dfn{continuations} that have previously been captured using
@code{call-with-current-continuation}. Guile also offers a slightly
restricted set of functions, @code{catch} and @code{throw}, that can
only be used for non-local exits. This restriction makes them more
efficient. Error reporting (with the function @code{error}) is
implemented by invoking @code{throw}, for example. The functions
@code{catch} and @code{throw} belong to the topic of @dfn{exceptions}.

Since Scheme functions can call C functions and vice versa, C code can
experience the more general control flow of Scheme as well. It is
possible that a C function will not return at all, or will return more
than once. While C does offer @code{setjmp} and @code{longjmp} for
non-local exits, it is still an unusual thing for C code. In
contrast, non-local exits are very common in Scheme, mostly to report
errors.

You need to be prepared for the non-local jumps in the control flow
whenever you use a function from @code{libguile}: it is best to assume
that any @code{libguile} function might signal an error or run a pending
signal handler (which in turn can do arbitrary things).

It is often necessary to take cleanup actions when the control leaves a
function non-locally. Also, when the control returns non-locally, some
setup actions might be called for. For example, the Scheme function
@code{with-output-to-port} needs to modify the global state so that
@code{current-output-port} returns the port passed to
@code{with-output-to-port}. The global output port needs to be reset to
its previous value when @code{with-output-to-port} returns normally or
when it is exited non-locally. Likewise, the port needs to be set again
when control enters non-locally.

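The need for cleanup can be seen even in plain C, without libguile. The
following sketch uses @code{setjmp} and @code{longjmp} to model a
non-local exit; all function and variable names in it are hypothetical.
The cleanup line in @code{worker} is skipped whenever control leaves
through @code{longjmp}, which is exactly the situation that needs
special arrangements.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf escape_point;    /* where a non-local exit lands */
int cleanups_run = 0;           /* counts completed cleanup actions */

/* Models a function that signals an error: it never returns.  */
static void signal_error (void)
{
  longjmp (escape_point, 1);
}

static void worker (bool fail)
{
  /* Imagine some global state being changed here.  */
  if (fail)
    signal_error ();
  cleanups_run++;               /* reached only on a normal return */
}

/* Returns true if worker returned normally, false if it exited
   non-locally.  */
bool call_worker (bool fail)
{
  if (setjmp (escape_point) != 0)
    return false;               /* arrived here via longjmp */
  worker (fail);
  return true;
}

void non_local_exit_demo (void)
{
  int before = cleanups_run;

  assert (call_worker (false));
  assert (cleanups_run == before + 1);  /* cleanup ran */
  assert (!call_worker (true));
  assert (cleanups_run == before + 1);  /* cleanup was skipped */
}
```

After the second call the global state would be left modified, because
nothing on the exit path had a chance to restore it.
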
Scheme code can use the @code{dynamic-wind} function to arrange for the
setting and resetting of the global state. C code could use the
corresponding @code{scm_internal_dynamic_wind} function, but it might
prefer to use the @dfn{frames} concept, which is more natural for C code
(@pxref{Frames}).