* libguile/strports.c (scm_mkstrport): Use UTF-8; ignore
%default-port-encoding. Rename 'str_len' and 'c_pos' to
'num_bytes' and 'c_byte_pos'. Interpret 'pos' argument
as a character index instead of a byte index.
* module/ice-9/boot-9.scm (%cond-expand-features): Add srfi-6 to the
list of core features.
* module/srfi/srfi-6.scm (open-input-string, open-output-string): Simply
re-export these, since the core versions are now compliant.
* doc/ref/api-io.texi (String Ports): Remove text that describes
non-compliant behavior of string ports with regard to encoding.
* doc/ref/srfi-modules.texi (SRFI-0): Add srfi-6 to the list of
core features.
(SRFI-6): Remove text that mentions non-compliant behavior of
core string ports.
* module/ice-9/format.scm (format):
* module/ice-9/pretty-print.scm (truncated-print):
* module/rnrs/io/ports.scm (open-string-input-port,
open-string-output-port):
* test-suite/test-suite/lib.scm (format-test-name):
* test-suite/tests/chars.test ("combining accent is pretty-printed",
"combining X is pretty-printed"):
* test-suite/tests/ecmascript.test (eread, eread/1):
* test-suite/tests/rdelim.test:
* test-suite/tests/reader.test (read-string):
* test-suite/tests/regexp.test:
* test-suite/tests/srfi-105.test (read-string): Don't set
%default-port-encoding before creating string ports.
* benchmark-suite/benchmarks/ports.bm (%latin1-port): Use
'set-port-encoding!' to set the string port encoding.
(%utf8/ascii-port, %utf8/wide-port, "rdelim"): Don't set
%default-port-encoding before creating string ports.
* test-suite/tests/r6rs-ports.test ("lookahead-u8 non-ASCII"): Don't set
%default-port-encoding before creating string ports.
("put-bytevector with UTF-16 string port", "put-bytevector with
wrong-encoding string port"): Use 'set-port-encoding!' to set the
string port encoding.
* test-suite/tests/print.test (tprint): Use 'set-port-encoding!' to set
the string port encoding.
("truncated-print"): Use 'pass-if-equal'.
* test-suite/tests/ports.test ("encoding failure leads to exception",
"%default-port-encoding is honored", "peek-char [latin-1]", "peek-char
[utf-8]", "peek-char [utf-16]"): Remove tests.
("%default-port-encoding is ignored", "peek-char"): Add tests.
("suitable encoding [latin-1]", "suitable encoding [latin-3]",
"wrong encoding, error", "wrong encoding, substitute",
"wrong encoding, escape"): Use 'set-port-encoding!' to set the
string port encoding.
("%default-port-encoding, wrong encoding"): Rewrite to use
a file port instead of a string port.
* libguile/r6rs-ports.c (scm_get_bytevector_some): Rewrite to
efficiently take the contents of the read/putback buffers. In the
docstring, clarify that it might not return all available bytes.
* doc/ref/api-io.texi (R6RS Binary Input): Clarify that
'get-bytevector-some' might not return all available bytes.
* test-suite/tests/r6rs-ports.test ("get-bytevector-some [only-some]"):
Remove bogus test, which requires more than the R6RS requires.
* module/rnrs/io/ports.scm (binary-port?): All ports are binary _and_
textual. Bytevectors and strings may be written to or read from
either.
(port-transcoder): All textual ports (all ports) have transcoders of
some sort.
* test-suite/tests/r6rs-ports.test ("8.2.6 Input and output ports"):
Remove test that binary ports don't have transcoders, because binary
ports are also textual.
* module/rnrs/io/port.scm (r6rs-open): New internal helper procedure for
opening files.
(open-file-input-port, open-file-output-port): Make use of
`r6rs-open'.
(open-file-input/output-port): Implement in terms of `r6rs-open',
add to exported identifiers list.
* module/rnrs.scm (open-file-input/output-port): Add to exported
identifiers.
* test-suite/tests/r6rs-ports.test (test-input-file-opener): New
procedure, collects several tests for opening file input ports.
("7.2.7 Input Ports"): Use `test-input-file-opener' for checking
`open-file-input-port'.
(test-output-file-opener): New procedure, collects several tests for
opening file output ports.
("8.2.10 Output ports"): Use `test-output-file-opener' for checking
`open-file-output-port'.
("8.2.13 Input/output ports"): New test prefix, making use of both
`test-input-file-opener' and `test-output-file-opener' to check
`open-file-input/output-port'.
This is a followup to 9f6e3f5a99 ("Have
string ports honor `%default-port-conversion-strategy'.").
* test-suite/tests/r6rs-ports.test ("7.2.11 Binary
Output")["put-bytevector with wrong-encoding string port"]: Set
%DEFAULT-PORT-CONVERSION-STRATEGY to 'error. Return #f when no
exception is raised.
("8.2.6 Input and output ports")["transcoded-port [error handling
mode = raise]"]: Return #f when no exception is raised.
* module/rnrs/io/ports.scm (open-file-input-port)
(open-file-output-port): Ensure the resulting ports are binary when no
transcoder is specified.
* test-suite/tests/r6rs-ports.test: Remove superfluous global change of
`%default-port-encoding' and accompanying comment.
("7.2.7 Input Ports"): Add test ensuring `open-file-input-port' opens
a binary port when no transcoder is specified.
("8.2.10 Output ports"): Strengthen existing `open-file-output-port'
binary-ness test by setting `%default-port-encoding' to "UTF-8".
* test-suite/tests/r6rs-ports.test
(call-with-bytevector-output-port/transcoded): New helper procedure.
("8.2.6 Input and output ports"): Use that helper procedure.
(encoding-error-predicate): New helper procedure.
("8.2.12 Textual Output"): Add tests for `put-char' and `put-string'
exception behavior on encoding errors.
* module/rnrs/io/ports.scm (open-file-output-port): Handle `no-fail'
file option.
(with-i/o-filename-conditions): Use `with-throw-handler' instead of `catch'.
(with-i/o-port-error,
with-textual-output-conditions. with-textual-input-conditions): New
exception-conversion helpers.
(put-char, put-datum, put-string, display): Use
`with-textual-output-conditions' instead of `with-i/o-encoding-error'
to get proper conditions in case of write errors.
(get-char, get-datum, get-line, get-string-all, lookahead-char):
Likewise, for the input case.
* test-suite/tests/r6rs-ports.test (pass-if-condition, test-file,
make-failing-port): New helpers.
("8.2.10 Output ports"): Add some tests for `open-file-output-port'.
("8.2.9 Textual Input"): Add tests read error conditions.
("8.2.12 Textual Output"): Add tests for write error conditions.
("8.3 Simple I/O"): Add tests for conditions, `call-with-input-file'
and `call-with-output-file'.
* libguile/r6rs-ports.c (bip_seek): Fix to allow seeking to end of port.
* test-suite/tests/r6rs-ports.test ("bytevector input port can seek to
very end"): Add tests.
* module/rnrs/io/ports.scm (transcoder-eol-style)
(transcoder-error-handling-mode): Export these.
(textual-port?): Implement this procedure and export it.
* module/rnrs.scm: Export these here as well.
* module/rnrs/io/ports.scm (port-transcoder): Implement this procedure.
(binary-port?): Treat only ports without an encoding as binary ports,
add docstring.
(standard-input-port, standard-output-port, standard-error-port):
Ensure these are created without an encoding.
(eol-style): Add `none' as enumeration member.
(native-eol-style): Switch to `none' from `lf'.
* test-suite/tests/r6rs-ports.test (7.2.7 Input ports)
(8.2.10 Output ports): Test binary-ness of `standard-input-port',
`standard-output-port' and `standard-error-port'.
(8.2.6 Input and output ports): Add test for `port-transcoder'.
Signed-off-by: Ludovic Courtès <ludo@gnu.org>
* libguile/r6rs-ports.c (scm_get_string_n_x): Implement `get-string-n!'
in C for efficiency.
* libguile/r6rs-ports.h: Add prototype for this function.
* module/ice-9/binary-ports.scm: Export `get-string-n!'.
* module/rnrs/io/ports.scm (get-string-n): Implement based on
`get-string-n!'.
Export both `get-string-n!' and `get-string-n'.
* module/rnrs.scm: Also export these.
* test-suite/tests/r6rs-ports.test (8.2.9 Textual input): Add a few
tests for `get-string-n' and `get-string-n!'.
Signed-off-by: Ludovic Courtès <ludo@gnu.org>
* module/rnrs/io/ports.scm (&i/o-encoding): New error condition type.
(with-i/o-encoding-error): New macro.
(put-char, put-datum, put-string): Use it.
* test-suite/tests/r6rs-ports.test ("8.2.6 Input and output
ports")["transcoded-port, output [error handling mode = raise]"]: New
test.
* module/rnrs/io/ports.scm (&i/o-decoding): New error condition type.
(with-i/o-decoding-error): New macro.
(get-char, get-datum, get-line, get-string-all, lookahead-char): Use
it.
* test-suite/tests/r6rs-ports.test ("8.2.6 Input and output
ports")["transcoded-port [error handling mode = raise]"]: Use `guard'
and `i/o-decoding-error?'.
* libguile/ports.c (scm_read_char): Mention `decoding-error' in the
docstring.
(get_codepoint): Change to return an error code; add `codepoint'
output parameter. Don't raise an error from here.
(scm_getc): Raise an error with `scm_decoding_error' if
`get_codepoint' returns an error.
(scm_peek_char): Likewise. Update docstring.
* libguile/strings.c (scm_decoding_error_key): New variable.
(scm_decoding_error): New function.
(scm_from_stringn): Use `scm_decoding_error' instead of
`scm_encoding_error'.
* libguile/strings.h (scm_decoding_error): New declaration.
* test-suite/tests/ports.test ("string ports")["read-char, wrong
encoding, error"]: Change to expect `decoding-error'. Make sure PORT
points past the error.
["read-char, wrong encoding, escape"]: Likewise.
["peek-char, wrong encoding, error"]: New test.
* test-suite/tests/r6rs-ports.test ("7.2.11 Binary
Output")["put-bytevector with wrong-encoding string port"]: Change to
expect `decoding-error'.
("8.2.6 Input and output ports")["transcoded-port [error handling
mode = raise]"]: Likewise.
* test-suite/tests/rdelim.test ("read-line")["decoding error", "decoding
error, substitute"]: New tests.
* doc/ref/api-io.texi (Reading): Update documentation of `read-char' and
`peek-char'.
(Line/Delimited): Update documentation of `read-line'.
* libguile/r6rs-ports.c (make_bip, make_cbip, make_bop, make_cbop): Lock
the port table.
* libguile/r6rs-ports.c (make_bop): Let the returned extraction
procedure refer to the port's buffer instead of the port itself. This
fixes a segfault if the port is closed before the extraction procedure
is called.
(bop_proc_apply): Adapt accordingly.
* test-suite/tests/r6rs-ports.test (8.2.10 Output ports): Add testcase
for extraction after close.
Signed-off-by: Ludovic Courtès <ludo@gnu.org>
Reported by Göran Weinholt <goran@weinholt.se>.
* libguile/r6rs-ports.c (scm_lookahead_u8): Return an unsigned byte.
* test-suite/tests/r6rs-ports.test ("7.2.8 Binary Input")["lookahead-u8:
result is unsigned"]: New test.
* libguile/strings.c (scm_encoding_error): Change arguments to convey
more information. Raise the error with `scm_throw ()', passing all
the information to the handler.
(scm_from_stringn, scm_to_stringn): Update accordingly.
* test-suite/tests/ports.test ("string ports")["wrong encoding"]: Check
the arguments passed to the `throw' handler.
* test-suite/tests/r6rs-ports.test ("7.2.11 Binary
Output")["put-bytevector with wrong-encoding string port"]: Likewise.
* libguile/strports.c (scm_i_mkstrport): Remove.
(scm_mkstrport): Don't change the port's encoding to UTF-8; convert
STR to the default port encoding.
(scm_strport_to_string): Fix documentation & indentation.
* libguile/strports.h (scm_i_mkstrport): Remove.
* test-suite/lib.scm (exception:encoding-error): New variable.
(format-test-name): Set `%default-port-encoding' to "UTF-8".
* test-suite/tests/ports.test ("string ports")["%default-port-encoding
is honored", "suitable encoding [latin-1]", "suitable encoding
[latin-3]", "wrong encoding"]: New tests.
* test-suite/tests/r6rs-ports.test ("7.2.11 Binary
Output")["put-bytevector with UTF-16 string port", "put-bytevector
with wrong-encoding string port"]: New tests.
* test-suite/tests/reader.test (read-string): Set
`%default-port-encoding' to `#f'.
("reading")["unprintable symbol"]: Use a string that doesn't contain
zeros.
* doc/ref/api-io.texi (String Ports): Document encoding issues with
`call-with-output-string' and `with-output-to-string'.
Ports are given two additional properties: a character encoding and
a conversion failure strategy. These properties have getters and setters.
The new properties are used to convert any locale text to/from the
internal representation of strings.
If unspecified, ports use a default value. The default value of these
properties is held in a fluid. The default character encoding can be
modified by calling setlocale.
ISO-8859-1 is treated specially. Since it is a native encoding of
strings, it can be processed more quickly. Source code is assumed to be
ISO-8859-1 unless otherwise specified. The encoding of a source code
file can be given as 'coding: XXXXX' in a magic comment at the top of a
file.
The C functions that deal with encoding often use a null pointer
as shorthand for the native Latin-1 encoding, for efficiency's sake.
* test-suite/tests/encoding-iso88591.test: new tests
* test-suite/tests/encoding-iso88597.test: new tests
* test-suite/tests/encoding-utf8.test: new tests
* test-suite/tests/encoding-escapes.test: new tests
* test-suite/tests/numbers.test: declare 'binary' encoding
* test-suite/tests/ports.test: declare 'binary' encoding
* test-suite/tests/r6rs-ports.test: declare 'binary' encoding
* module/system/base/compile.scm (compile-file): use source-code
file's self-declared encoding when compiling files
* libguile/strports.c: store string ports in locale encoding
(scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
(scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
new functions
* libguile/strings.h: new declaration for scm_i_string_contains_char
* libguile/strings.c (scm_i_string_contains_char): new function
(scm_from_stringn, scm_to_stringn): use NULL for Latin-1
(scm_from_locale_stringn, scm_to_locale_stringn): respect character
encoding of input and output ports
* libguile/read.h: declaration for scm_scan_for_encoding
* libguile/read.c:
(read_token): now takes scheme string instead of C string/length
(read_complete_token): new function
(scm_read_sexp, scm_read_number, scm_read_mixed_case_symbol)
(scm_read_number_and_radix, scm_read_quote, scm_read_semicolon_comment)
(scm_read_srfi4_vector, scm_read_bytevector, scm_read_guile_bit_vector)
(scm_read_scsh_block_comment, scm_read_commented_expression)
(scm_read_extended_symbol, scm_read_sharp_extension, scm_read_shart)
(scm_read_expression): use scm_t_wchar for char type, use read_complete_token
(scm_scan_for_encoding): new function to find a file's character encoding
(scm_file_encoding): new function to find a port's character encoding
* libguile/rdelim.c: don't unpack strings
* libguile/print.h: declaration for modified function
scm_i_charprint
* libguile/print.c: use locale when printing characters and
strings
(scm_i_charprint): input parameter is now scm_t_wchar
(scm_simple_format): don't unpack strings
* libguile/posix.h: new declaration for scm_setbinary.
* libguile/posix.c (scm_setlocale): set default and stdio port
encodings based on the locale's character encoding
(scm_setbinary): new function
* libguile/ports.h (scm_t_port): add encoding and failed
conversion handler to port type. Declarations for new or modified
functions scm_getc, scm_unget_byte, scm_ungetc,
scm_i_get_port_encoding, scm_i_set_port_encoding_x,
scm_port_encoding, scm_set_port_encoding_x,
scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x,
scm_port_conversion_strategy, scm_set_port_conversion_strategy_x.
* libguile/ports.c: assign the current ports to zero on startup so
we can see if they've been set.
(scm_current_input_port, scm_current_output_port,
scm_current_error_port): return #f if the port is not yet
initialized
(scm_new_port_table_entry): set up a new port's encoding and
illegal sequence handler based on the thread's current defaults
(scm_i_remove_port): free port encoding name when port is removed
(scm_i_mode_bits_n): now takes a scheme string instead of a c
string and length. All callers changed.
(SCM_MBCHAR_BUF_SIZE): new const
(scm_getc): new function, since the scm_getc in inline.h is now
scm_get_byte_or_eof. This pulls one codepoint from a port.
(scm_lfwrite_substr, scm_lfwrite_str): now uses port's encoding
(scm_unget_byte): new function, incorportaing the low-level functionality
of scm_ungetc
(scm_ungetc): uses scm_unget_byte
* libguile/numbers.h (scm_t_wchar): compilation order problem with
scm_t_wchar being use in functions in multiple headers. Forward
declare scm_t_wchar.
* libguile/load.c (scm_primitive_load): scan for file encoding at
top of file and use it to set the load port's encoding
* libguile/inline.h (scm_get_byte_or_eof): new function
incorporating most of the functionality of scm_getc.
* libguile/fports.c (fport_fill_input): now returns scm_t_wchar
* libguile/chars.h (scm_t_wchar): avoid compilation order problem
with declaration of scm_t_wchar