diff --git a/data/tutorials/examples/amodule.ml b/data/tutorials/examples/athens.ml similarity index 100% rename from data/tutorials/examples/amodule.ml rename to data/tutorials/examples/athens.ml diff --git a/data/tutorials/examples/bmodule.ml b/data/tutorials/examples/berlin.ml similarity index 100% rename from data/tutorials/examples/bmodule.ml rename to data/tutorials/examples/berlin.ml diff --git a/data/tutorials/examples/amodule2.ml b/data/tutorials/examples/cairo.ml similarity index 100% rename from data/tutorials/examples/amodule2.ml rename to data/tutorials/examples/cairo.ml diff --git a/data/tutorials/examples/bmodule2.ml b/data/tutorials/examples/delhi.ml similarity index 100% rename from data/tutorials/examples/bmodule2.ml rename to data/tutorials/examples/delhi.ml diff --git a/data/tutorials/language/1ms_00_modules.md b/data/tutorials/language/1ms_00_modules.md index 4273ab1f8a..4eaa124346 100644 --- a/data/tutorials/language/1ms_00_modules.md +++ b/data/tutorials/language/1ms_00_modules.md @@ -1,544 +1,440 @@ --- -id : modules +id: modules title: Modules short_title: Modules description: > - Learn about OCaml modules and how they can be used to cleanly separate distinct parts of your program + Modules are collections of definitions. This is the basic means to organise OCaml software. category: "Module System" --- +## Introduction + +In this tutorial, we look at how to use and define modules. + +Modules are collections of definitions grouped together. This is the basic means to organise OCaml software. Separate concerns can and should be isolated into separate modules. + +**Prerequisites**: [Values and Functions](/docs/values-and-functions) and [Basic Data Types and Pattern Matching](/docs/basic-data-types) + ## Basic Usage ### File-Based Modules In OCaml, every piece of code is wrapped into a module. Optionally, a module -itself can be a submodule of another module, pretty much like directories in a -file system - but we don't do this very often. - -When you write a program, let's say using two files `amodule.ml` and -`bmodule.ml`, each of these files automatically defines a module named -`Amodule` and a module named `Bmodule` that provide whatever you put into the -files. +itself can be a [submodule](#submodules) of another module, pretty much like +directories in a file system. -Here is the code that we have in our file `amodule.ml`: +Here is a program using two files: `athens.ml` and `berlin.ml`. Each file +defines a module named `Athens` and `Berlin`, respectively. - +Here is the file `athens.ml`: + ```ocaml -let hello () = print_endline "Hello" +let hello () = print_endline "Hello from Athens" ``` -And here is what we have in `bmodule.ml`: - - +Here is the file `berlin.ml`: + ```ocaml -let () = Amodule.hello () +let () = Athens.hello () ``` -### Automatised Compilation - -In order to compile them using the [Dune](https://dune.build/) build system, -which is now the standard on OCaml, at least two configuration files are -required: - -* The `dune-project` file, which contains project-wide configuration data. - Here's a very minimal one: - ``` - (lang dune 3.4) +To compile them using [Dune](https://dune.build/), at least two +configuration files are required: +* The `dune-project` file contains project-wide configuration. + ```lisp + (lang dune 3.7) ``` -* The `dune` file, which contains actual build directives. A project may have several - of them, depending on the organisation of the sources. This is sufficient for - our example: - ``` - (executable (name bmodule)) +* The `dune` file contains actual build directives. A project may have several + `dune` files, one per folder containing things to build. This single line is + sufficient in this example: + ```lisp + (executable (name berlin)) ``` -Here is how to create the configuration files, build the source, and run the -executable. +After you create those files, build and run them: ```bash -$ echo "(lang dune 3.4)" > dune-project -$ echo "(executable (name bmodule))" > dune $ opam exec -- dune build -$ opam exec -- dune exec ./bmodule.exe -Hello -``` -Actually, `dune build` is optional. Simply running `dune exec` would have -triggered the compilation. Note that in the `dune exec` command the argument -`./bmodule.exe` is not a file path. This command means “execute the content of -the file `./bmodule.ml`.” However, the actual executable file is stored and -named differently. - -In a real-world project, it is preferable to start by creating the `dune` -configuration files and directory structure using the `dune init project` -command. - -### Manual Compilation - -Alternatively, it is possible, but not recommended, to compile the files by -directly calling the compiler, either by using a single command: - - -```sh -$ ocamlopt -o hello amodule.ml bmodule.ml +$ opam exec -- dune exec ./berlin.exe +Hello from Athens ``` -Or, as a build system does, one by one: - - -```sh -$ ocamlopt -c amodule.ml -$ ocamlopt -c bmodule.ml -$ ocamlopt -o hello amodule.cmx bmodule.cmx -``` - -In both cases, a standalone executable is created - -```sh -$ ./hello -Hello -``` +Actually, `opam exec -- dune build` is optional. Running `opam exec -- dune exec ./berlin.exe` would have +triggered the compilation. Note that in the `opam exec -- dune exec` command, the parameter +`./berlin.exe` is not a file path. This command means “execute the content of +the file `./berlin.ml`.” However, the executable file is stored and named +differently. -Note: It's necessary to place the source files in the correct order. The dependencies must come before -the dependent. In the first example above, putting `bmodule.ml` before `amodule.ml` -will result in an `Unbound module` error. +In a project, it is preferable to create the `dune` configuration files and +directory structure using the `dune init project` command. Refer to the Dune +documentation for more on this matter. ### Naming and Scoping -Now we have an executable that prints `Hello`. As you can see, if you want to -access anything from a given module, use the name of the module (always -starting with a capital letter) followed by a dot and the thing that you want to use. -It may be a value, a type constructor, or anything else that a given module can -provide. - -Libraries, starting with the standard library, provide collections of modules. -for example, `List.iter` designates the `iter` function from the `List` module. - -If you are using a given module heavily, you may want to make its contents -directly accessible. For this, we use the `open` directive. In our example, -`bmodule.ml` could have been written: +In `berlin.ml`, we used `Athens.hello` to refer to `hello` from `athens.ml`. +Generally, to access something from a module, use the module's name (which +always starts with a capital letter: `Athens`) followed by a dot and the +thing you want to use (`hello`). It may be a value, a type constructor, or +anything the module provides. +If you are using a module heavily, you might want to `open` it. This brings the +module's definitions into scope. In our example, `berlin.ml` could have been +written: ```ocaml -open Amodule +open Athens let () = hello () ``` -Using `open` or not is a matter of personal taste. Some modules provide names -that are used in many other modules. This is the case of the `List` module for -instance. Usually, we don't do `open List`. Other modules like `Printf` provide -names that normally aren't subject to conflicts, such as `printf`. In order to -avoid writing `Printf.printf` all over the place, it often makes sense to place -one `open Printf` at the beginning of the file: - +Using `open` is optional. Usually, we don't open a module like `List` because it +provides names other modules also provide, such as `Array` or `Option`. Modules +like `Printf` provide names that aren't subject to conflicts, such as `printf`. +Placing `open Printf` at the top of a file avoids writing `Printf.printf` repeatedly. ```ocaml open Printf let data = ["a"; "beautiful"; "day"] let () = List.iter (printf "%s\n") data ``` -There are also local `open`s: + The standard library is a module called `Stdlib`. It contains + [submodules](#submodules) `List`, `Option`, `Either`, and more. By default, the + OCaml compiler opens the standard library, as if you had written `open Stdlib` + at the top of every file. Refer to Dune documentation if you need to opt-out. + +You can open a module inside a definition, using the `let open ... in` construct: +```ocaml +# let list_sum_sq m = + let open List in + init m Fun.id |> map (fun i -> i * i) |> fold_left ( + ) 0;; +val list_sum_sq : int -> int = +``` +The module access notation can be applied to an entire expression: ```ocaml -# let map_3d_matrix f m = - let open Array in - map (map (map f)) m;; -val map_3d_matrix : - ('a -> 'b) -> 'a array array array -> 'b array array array = -# let map_3d_matrix' f = - Array.(map (map (map f)));; -val map_3d_matrix' : - ('a -> 'b) -> 'a array array array -> 'b array array array = +# let array_sum_sq m = + Array.(init m Fun.id |> map (fun i -> i * i) |> fold_left ( + ) 0);; +val array_sum_sq : int -> int = ``` -## Interfaces and Signatures +## Interfaces and Implementations -A module can provide a certain number of things (functions, types, submodules, -etc.) to the rest of the program that is using it. If nothing special is done, -everything that's defined in a module will be accessible from the outside. That's -often fine in small personal programs, but there are many situations where it -is better that a module only provides what it is meant to provide, not any of -the auxiliary functions and types that are used internally. +By default, anything defined in a module is accessible from other modules. +Values, functions, types, or submodules, everything is public. This can be +restricted to avoid exposing definitions that are not relevant from the outside. -For this, we have to define a module interface, which will act as a mask over -the module's implementation. Just like a module derives from an `.ml` file, the -corresponding module interface or signature derives from an `.mli` file. It -contains a list of values with their type. Let's rewrite our `amodule.ml` file -to something called `amodule2.ml`: +For this, we must distinguish: +- The definitions inside a module (the module implementation) +- The public declarations of a module (the module interface) - +An `.ml` file contains a module implementation; an `.mli` file contains a module +interface. By default, when no corresponding `.mli` file is provided, an +implementation has a default interface where everything is public. + +Copy the `athens.ml` file into `cairo.ml` and change its contents: + ```ocaml -let message = "Hello 2" +let message = "Hello from Cairo" let hello () = print_endline message ``` -As it is, `Amodule2` has the following interface: - +As it is, `Cairo` has the following interface: ```ocaml val message : string val hello : unit -> unit ``` -Let's assume that accessing the `message` value directly is none of the other -modules' business; we want it to be a private definition. We can hide it by -defining a restricted interface. This is our `amodule2.mli` file: +Explicitly defining a module interface allows restricting the default one. It +acts as a mask over the module's implementation. The `cairo.ml` file defines +`Cairo`'s implementation. Adding a `cairo.mli` file defines `Cairo`'s interface. +Filenames without extensions must be the same. - +To turn `message` into a private definition, don't list it in the `cairo.mli` file: + ```ocaml val hello : unit -> unit -(** Displays a greeting message. *) +(** [hello ()] displays a greeting message. *) ``` -(note the double asterisk at the beginning of the comment. It is a good habit -to document `.mli` files using the format supported by -[ocamldoc](/releases/4.14/htmlman/ocamldoc.html)) +**Note**: The double asterisk at the beginning indicates a +comment meant for API documentation tools, such as +[`odoc`](https://github.com/ocaml/odoc). It is a good habit to document `.mli` +files using the format supported by this tool. -The corresponding module `Bmodule2` is defined in file `bmodule2.ml`: +The file `delhi.ml` defines the program calling `Cairo`: - + ```ocaml -let () = Amodule2.hello () +let () = Cairo.hello () ``` -The .`mli` files must be compiled before the matching `.ml` files. This is done -automatically by Dune. We update the `dune` file to allow the compilation -of this example aside of the previous one. - - -```bash -$ echo "(executables (names bmodule bmodule2))" > dune -$ opam exec -- dune build -$ opam exec -- dune exec ./bmodule.exe -Hello -$ opam exec -- dune exec ./bmodule2.exe -Hello 2 +Update the `dune` file to allow this example's compilation aside from the +previous one. +```lisp +(executables (names berlin delhi)) ``` -Here is how the same result can be achieved by calling the compiler manually. -Notice the `.mli` file is compiled using bytecode compiler `ocamlc`, while -`.ml` files are compiled to native code using `ocamlopt`: +Compile and execute both programs: +```shell +$ opam exec -- dune exec ./berlin.exe +Hello from Athens - -```sh -$ ocamlc -c amodule2.mli -$ ocamlopt -c amodule2.ml -$ ocamlopt -c bmodule2.ml -$ ocamlopt -o hello2 amodule2.cmx bmodule2.cmx -$ ./hello -Hello -$ ./hello2 -Hello 2 +$ opam exec -- dune exec ./delhi.exe +Hello from Cairo +``` + +You can check that `Cairo.message` is not public by attempting to compile a `delhi.ml` file containing: +```ocaml +let () = print_endline Cairo.message ``` -## Abstract Types +This triggers a compilation error. -What about type definitions? We saw that values such as functions can be -exported by placing their name and their type in an `.mli` file, e.g., +## Abstract and Read-Only Types + +Function and value definitions are either public or private. That also applies +to type definitions, but there are two more cases. + +Create files named `exeter.mli` and `exeter.ml` with the following contents: + +**Interface: `exeter.mli`** - ```ocaml -val hello : unit -> unit + +type aleph = Ada | Alan | Alonzo + +type gimel +val gimel_of_bool : bool -> gimel +val gimel_flip : gimel -> gimel +val gimel_to_string : gimel -> string + +type dalet = private Dennis of int | Donald of string | Dorothy +val dalet_of : (int, string) Either.t option -> dalet ``` -But modules often define new types. Let's define a simple record type that -would represent a date: +**Implementation: `exeter.ml`** ```ocaml -type date = {day : int; month : int; year : int} +type aleph = Ada | Alan | Alonzo + +type bet = bool + +type gimel = Christos | Christine +let gimel_of_bool b = if (b : bet) then Christos else Christine +let gimel_flip = function Christos -> Christine | Christine -> Christos +let gimel_to_string x = "Christ" ^ match x with Christos -> "os" | _ -> "ine" + +type dalet = Dennis of int | Donald of string | Dorothy +let dalet_of = function + | None -> Dorothy + | Some (Either.Left x) -> Dennis x + | Some (Either.Right x) -> Donald x ``` -There are four options when it comes to writing the `.mli` file: +Update file `dune`: +```lisp +(executables (names berlin delhi) (modules berlin delhi)) +(library (name exeter) (modules exeter) (modes byte)) +``` -1. The type is completely omitted from the signature. -2. The type definition is copy-pasted into the signature. -3. The type is made abstract: only its name is given. -4. The record fields are made read-only: `type date = private { ... }` +Run the `opam exec -- dune utop` command. This triggers `Exeter`'s compilation, launches `utop`, and loads `Exeter`. +```ocaml +# open Exeter;; -Case 3 would look like this: +# #show aleph;; +type aleph = Ada | Alan | Alonzo +``` +Type `aleph` is public. Values can be created or accessed. ```ocaml -type date +# #show bet;; +Unknown element. ``` -Now, users of the module can manipulate objects of type `date`, but they can't -access the record fields directly. They must use the functions that the module -provides. Let's assume the module provides three functions: one for creating a -date, one for computing the difference between two dates, and one that returns -the date in years: +Type `bet` is private. It is not available outside of the implementation where it is defined, here `Exeter`. +```ocaml +# #show gimel;; +type gimel - +# Christos;; +Error: Unbound constructor Christos + +# #show_val gimel_of_bool;; +val gimel_of_bool : bool -> gimel + +# true |> gimel_of_bool |> gimel_to_string;; +- : string = "Christos" + +# true |> gimel_of_bool |> gimel_flip |> gimel_to_string;; +- : string = "Christine" +``` + +Type `gimel` is _abstract_. Values can be created or manipulated, but only as function results or arguments. Just the provided functions `gimel_of_bool`, `gimel_flip`, and `gimel_to_string` or polymorphic functions can receive or return `gimel` values. ```ocaml -type date +# #show dalet;; +type dalet = private Dennis of int | Donald of string | Dorothy -val create : ?days:int -> ?months:int -> ?years:int -> unit -> date +# Donald 42;; +Error: Cannot create values of the private type Exeter.dalet -val sub : date -> date -> date +# dalet_of (Some (Either.Left 10));; +- : dalet = Dennis 10 -val years : date -> float +# let dalet_to_string = function + | Dorothy -> "Dorothy" + | Dennis _ -> "Dennis" + | Donald _ -> "Donald";; +val dalet_to_string : dalet -> string = ``` -The point is that only `create` and `sub` can be used to create `date` records. -Therefore, it is not possible for the user to create ill-formed -records. Actually, our implementation uses a record, but we could change it and -be sure that it will not break any code that relies on this module! This makes -a lot of sense in a library since subsequent versions of the same library can -continue to expose the same interface while internally changing the -implementation, including data structures. +The type `dalet` is _read-only_. Pattern matching is possible, but values can only be constructed by the provided functions, here `dalet_of`. + +Abstract and read-only types can be either variants, as shown in this section, records, or aliases. It is possible to access a read-only record field's value, but creating such a record requires using a provided function. ## Submodules ### Submodule Implementation -We saw that one `example.ml` file results automatically in one module -implementation named `Example`. Its module signature is automatically derived -and is the broadest possible, or can be restricted by writing an `example.mli` -file. - -That said, a given module can also be defined explicitly from within a file. -That makes it a submodule of the current module. Let's consider this -`example.ml` file: +A module can be defined inside another module. That makes it a _submodule_. +Let's consider the files `florence.ml` and `glasgow.ml` +**`florence.ml`** ```ocaml module Hello = struct - let message = "Hello" - let hello () = print_endline message + let message = "Hello from Florence" + let print () = print_endline message end -let goodbye () = print_endline "Goodbye" - -let hello_goodbye () = - Hello.hello (); - goodbye () +let print_goodbye () = print_endline "Goodbye" ``` -From another file, it is clear that we now have two levels of modules. We can -write: - - +**`glasgow.ml`** ```ocaml let () = - Example.Hello.hello (); - Example.goodbye () + Florence.Hello.print (); + Florence.print_goodbye () ``` -### Submodule Interface +Definitions from a submodule are accessed by chaining module names, here +`Florence.Hello.print`. -We can also restrict the interface of a given submodule. It is called a module -type. Let's do it in our `example.ml` file: +### Submodule With Signatures +To define a submodule's interface, we can provide a _module signature_. This +is done in this second version of the `florence.ml` file: ```ocaml module Hello : sig - val hello : unit -> unit -end -= -struct + val print : unit -> unit +end = struct let message = "Hello" - let hello () = print_endline message + let print () = print_endline message end -(* At this point, Hello.message is not accessible anymore. *) - -let goodbye () = print_endline "Goodbye" - -let hello_goodbye () = - Hello.hello (); - goodbye () +let print_goodbye () = print_endline "Goodbye" ``` -The definition of the `Hello` module above is the equivalent of a -`hello.mli`/`hello.ml` pair of files. Writing all of that in one block of code -is not elegant so, in general, we prefer to define the module signature -separately: +The first version made `Florence.Hello.message` public. In this version it can't be accessed from `glasgow.ml`. - +### Module Signatures are Types + +The role played by module signatures to implementations is akin to the role played by types to values. Here is a third possible way to write file `florence.ml`: ```ocaml -module type Hello_type = sig - val hello : unit -> unit +module type HelloType = sig + val hello : unit -> unit end -module Hello : Hello_type = struct - ... +module Hello : HelloType = struct + let message = "Hello" + let print () = print_endline message end + +let print_goodbye () = print_endline "Goodbye" ``` -`Hello_type` is a named module type and can be reused to define other module -interfaces. +First, we define a `module type` called `HelloType`, which defines the same module interface as before. Instead of providing the signature when defining the `Hello` module, we use the `HelloType` module type. -## Practical Manipulation of Modules +This allows writing interfaces shared by several modules. An implementation satisfies any module type listing some of its contents. This implies a module may have several types and that there is a subtyping relationship between module types. -### Displaying the Interface of a Module +## Module Manipulation -You can use the OCaml toplevel to visualise the contents of an existing -module, such as `List`: +### Displaying a Module's Interface + +You can use the OCaml toplevel to see the contents of an existing +module, such as `Unit`: ```ocaml -# #show List;; -module List : +# #show Unit;; +module Unit : sig - type 'a t = 'a list = [] | (::) of 'a * 'a list - val length : 'a t -> int - val compare_lengths : 'a t -> 'b t -> int - val compare_length_with : 'a t -> int -> int - val cons : 'a -> 'a t -> 'a t - val hd : 'a t -> 'a - val tl : 'a t -> 'a t - val nth : 'a t -> int -> 'a - val nth_opt : 'a t -> int -> 'a option - val rev : 'a t -> 'a t - val init : int -> (int -> 'a) -> 'a t - val append : 'a t -> 'a t -> 'a t - val rev_append : 'a t -> 'a t -> 'a t - val concat : 'a t t -> 'a t - val flatten : 'a t t -> 'a t - val equal : ('a -> 'a -> bool) -> 'a t -> 'a t -> bool - val compare : ('a -> 'a -> int) -> 'a t -> 'a t -> int - val iter : ('a -> unit) -> 'a t -> unit - val iteri : (int -> 'a -> unit) -> 'a t -> unit - val map : ('a -> 'b) -> 'a t -> 'b t - val mapi : (int -> 'a -> 'b) -> 'a t -> 'b t - val rev_map : ('a -> 'b) -> 'a t -> 'b t - val filter_map : ('a -> 'b option) -> 'a t -> 'b t - val concat_map : ('a -> 'b t) -> 'a t -> 'b t - val fold_left_map : ('a -> 'b -> 'a * 'c) -> 'a -> 'b t -> 'a * 'c t - val fold_left : ('a -> 'b -> 'a) -> 'a -> 'b t -> 'a - val fold_right : ('a -> 'b -> 'b) -> 'a t -> 'b -> 'b - val iter2 : ('a -> 'b -> unit) -> 'a t -> 'b t -> unit - val map2 : ('a -> 'b -> 'c) -> 'a t -> 'b t -> 'c t - val rev_map2 : ('a -> 'b -> 'c) -> 'a t -> 'b t -> 'c t - val fold_left2 : ('a -> 'b -> 'c -> 'a) -> 'a -> 'b t -> 'c t -> 'a - val fold_right2 : ('a -> 'b -> 'c -> 'c) -> 'a t -> 'b t -> 'c -> 'c - val for_all : ('a -> bool) -> 'a t -> bool - val exists : ('a -> bool) -> 'a t -> bool - val for_all2 : ('a -> 'b -> bool) -> 'a t -> 'b t -> bool - val exists2 : ('a -> 'b -> bool) -> 'a t -> 'b t -> bool - val mem : 'a -> 'a t -> bool - val memq : 'a -> 'a t -> bool - val find : ('a -> bool) -> 'a t -> 'a - val find_opt : ('a -> bool) -> 'a t -> 'a option - val find_map : ('a -> 'b option) -> 'a t -> 'b option - val filter : ('a -> bool) -> 'a t -> 'a t - val find_all : ('a -> bool) -> 'a t -> 'a t - val filteri : (int -> 'a -> bool) -> 'a t -> 'a t - val partition : ('a -> bool) -> 'a t -> 'a t * 'a t - val partition_map : ('a -> ('b, 'c) Either.t) -> 'a t -> 'b t * 'c t - val assoc : 'a -> ('a * 'b) t -> 'b - val assoc_opt : 'a -> ('a * 'b) t -> 'b option - val assq : 'a -> ('a * 'b) t -> 'b - val assq_opt : 'a -> ('a * 'b) t -> 'b option - val mem_assoc : 'a -> ('a * 'b) t -> bool - val mem_assq : 'a -> ('a * 'b) t -> bool - val remove_assoc : 'a -> ('a * 'b) t -> ('a * 'b) t - val remove_assq : 'a -> ('a * 'b) t -> ('a * 'b) t - val split : ('a * 'b) t -> 'a t * 'b t - val combine : 'a t -> 'b t -> ('a * 'b) t - val sort : ('a -> 'a -> int) -> 'a t -> 'a t - val stable_sort : ('a -> 'a -> int) -> 'a t -> 'a t - val fast_sort : ('a -> 'a -> int) -> 'a t -> 'a t - val sort_uniq : ('a -> 'a -> int) -> 'a t -> 'a t - val merge : ('a -> 'a -> int) -> 'a t -> 'a t -> 'a t - val to_seq : 'a t -> 'a Seq.t - val of_seq : 'a Seq.t -> 'a t + type t = unit = () + val equal : t -> t -> bool + val compare : t -> t -> int + val to_string : t -> string end ``` -There is online documentation for each library. +The OCaml compiler tool chain can be used to dump an `.ml` file's default interface. +```shell +$ ocamlc -c -i cairo.ml +val message : string +val hello : unit -> unit +``` ### Module Inclusion -Let's say we feel that a function is missing from the standard `List` module, -but we really want it as if it were part of it. In an `extensions.ml` file, we +Let's say we feel that a function is missing from the `List` module, +but we really want it as if it were part of it. In an `extlib.ml` file, we can achieve this effect by using the `include` directive: ```ocaml -# module List = struct - include List - let rec optmap f = function - | [] -> [] - | hd :: tl -> - match f hd with - | None -> optmap f tl - | Some x -> x :: optmap f tl - end;; -module List : - sig - type 'a t = 'a list = [] | (::) of 'a * 'a list - val length : 'a t -> int - val compare_lengths : 'a t -> 'b t -> int - val compare_length_with : 'a t -> int -> int - val cons : 'a -> 'a t -> 'a t - val hd : 'a t -> 'a - val tl : 'a t -> 'a t - val nth : 'a t -> int -> 'a - val nth_opt : 'a t -> int -> 'a option - val rev : 'a t -> 'a t - val init : int -> (int -> 'a) -> 'a t - val append : 'a t -> 'a t -> 'a t - val rev_append : 'a t -> 'a t -> 'a t - val concat : 'a t t -> 'a t - val flatten : 'a t t -> 'a t - val equal : ('a -> 'a -> bool) -> 'a t -> 'a t -> bool - val compare : ('a -> 'a -> int) -> 'a t -> 'a t -> int - val iter : ('a -> unit) -> 'a t -> unit - val iteri : (int -> 'a -> unit) -> 'a t -> unit - val map : ('a -> 'b) -> 'a t -> 'b t - val mapi : (int -> 'a -> 'b) -> 'a t -> 'b t - val rev_map : ('a -> 'b) -> 'a t -> 'b t - val filter_map : ('a -> 'b option) -> 'a t -> 'b t - val concat_map : ('a -> 'b t) -> 'a t -> 'b t - val fold_left_map : ('a -> 'b -> 'a * 'c) -> 'a -> 'b t -> 'a * 'c t - val fold_left : ('a -> 'b -> 'a) -> 'a -> 'b t -> 'a - val fold_right : ('a -> 'b -> 'b) -> 'a t -> 'b -> 'b - val iter2 : ('a -> 'b -> unit) -> 'a t -> 'b t -> unit - val map2 : ('a -> 'b -> 'c) -> 'a t -> 'b t -> 'c t - val rev_map2 : ('a -> 'b -> 'c) -> 'a t -> 'b t -> 'c t - val fold_left2 : ('a -> 'b -> 'c -> 'a) -> 'a -> 'b t -> 'c t -> 'a - val fold_right2 : ('a -> 'b -> 'c -> 'c) -> 'a t -> 'b t -> 'c -> 'c - val for_all : ('a -> bool) -> 'a t -> bool - val exists : ('a -> bool) -> 'a t -> bool - val for_all2 : ('a -> 'b -> bool) -> 'a t -> 'b t -> bool - val exists2 : ('a -> 'b -> bool) -> 'a t -> 'b t -> bool - val mem : 'a -> 'a t -> bool - val memq : 'a -> 'a t -> bool - val find : ('a -> bool) -> 'a t -> 'a - val find_opt : ('a -> bool) -> 'a t -> 'a option - val find_map : ('a -> 'b option) -> 'a t -> 'b option - val filter : ('a -> bool) -> 'a t -> 'a t - val find_all : ('a -> bool) -> 'a t -> 'a t - val filteri : (int -> 'a -> bool) -> 'a t -> 'a t - val partition : ('a -> bool) -> 'a t -> 'a t * 'a t - val partition_map : ('a -> ('b, 'c) Either.t) -> 'a t -> 'b t * 'c t - val assoc : 'a -> ('a * 'b) t -> 'b - val assoc_opt : 'a -> ('a * 'b) t -> 'b option - val assq : 'a -> ('a * 'b) t -> 'b - val assq_opt : 'a -> ('a * 'b) t -> 'b option - val mem_assoc : 'a -> ('a * 'b) t -> bool - val mem_assq : 'a -> ('a * 'b) t -> bool - val remove_assoc : 'a -> ('a * 'b) t -> ('a * 'b) t - val remove_assq : 'a -> ('a * 'b) t -> ('a * 'b) t - val split : ('a * 'b) t -> 'a t * 'b t - val combine : 'a t -> 'b t -> ('a * 'b) t - val sort : ('a -> 'a -> int) -> 'a t -> 'a t - val stable_sort : ('a -> 'a -> int) -> 'a t -> 'a t - val fast_sort : ('a -> 'a -> int) -> 'a t -> 'a t - val sort_uniq : ('a -> 'a -> int) -> 'a t -> 'a t - val merge : ('a -> 'a -> int) -> 'a t -> 'a t -> 'a t - val to_seq : 'a t -> 'a Seq.t - val of_seq : 'a Seq.t -> 'a t - val optmap : ('a -> 'b option) -> 'a t -> 'b t - end +module List = struct + include Stdlib.List + let uncons = function + | [] -> None + | hd :: tl -> Some (hd, tl) +end ``` -It creates a module `Extensions.List` that has everything the standard `List` -module has, plus a new `optmap` function. From another file, all we have to do -to override the default `List` module is `open Extensions` at the beginning of -the `.ml` file: +It creates a module `Extlib.List` that has everything the standard `List` module +has, plus a new `uncons` function. In order to override the default `List` +module from another `.ml` file, we need to add `open Extlib` at the beginning. - +## Stateful Modules + +A module may have an internal state. This is the case for the `Random` module from the standard library. The functions `Random.get_state` and `Random.set_state` provide read and write access to the internal state, which is nameless and has an abstract type. ```ocaml -open Extensions +# let s = Random.get_state ();; +val s : Random.State.t = + +# Random.bits ();; +- : int = 89809344 -... +# Random.bits ();; +- : int = 994326685 -List.optmap ... +# Random.set_state s;; +- : unit = () + +# Random.bits ();; +- : int = 89809344 ``` + +Values returned by `Random.bits` will differ when you run this code. The first +and third calls return the same results, showing that the internal state was +reset. + +## Conclusion + +OCaml, modules are the basic means of organising software. To sum up, a +module is a collection of definitions wrapped under a name. These definitions +can be submodules, which allows the creation of hierarchies of modules. +Top-level modules must be files and are the units of compilation. Every module +has an interface, which is the list of definitions a module exposes. By default, +a module's interface exposes all its definitions, but this can be restricted +using the interface syntax. + +Going further, here are the other means to handle OCaml software components: +- Functors, which act like functions from modules to modules +- Libraries, which are compiled modules bundled together +- Packages, which are installation and distribution units diff --git a/data/tutorials/language/1ms_01_functors.md b/data/tutorials/language/1ms_01_functors.md index 05f968fbae..22678b38d6 100644 --- a/data/tutorials/language/1ms_01_functors.md +++ b/data/tutorials/language/1ms_01_functors.md @@ -3,187 +3,518 @@ id: functors title: Functors short_title: Functors description: > - Learn about functors, modules parameterised by other modules + In OCaml, a functor is a function at the module-level. Functors take modules as arguments and return a new module. category: "Module System" --- -Functors are probably one of the most complex features of OCaml, but you don't -have to use them extensively to be a successful OCaml programmer. Actually, -you may never have to define a functor yourself, but you will surely encounter -them in the standard library. They are the only way of using the Set and Map -modules, but using them is not so difficult. - -## What Are Functors and Why Do We Need Them? - -A functor is a module that is parametrised by another module, just like a -function is a value which is parametrised by other values, the arguments. - -It allows one to parametrise a type by a value, which is not possible directly -in OCaml without functors. For example, we can define a functor that takes an -`int n` and returns a collection of array operations that work exclusively on -arrays of length `n`. If by mistake the programmer passes a regular array to one -of those functions, it will result in a compilation error. If we were not using -this functor but the standard array type, the compiler would not be able to -detect the error, and we would get a runtime error at some undetermined date in -the future, which is much worse. - -## Using an Existing Functor - -The standard library defines a `Set` module, which provides a `Make` functor. -This functor takes one argument, which is a module that provides (at least) two -things: the type of elements, given as `t` and the comparison function given as -`compare`. The point of the functor is to ensure that the same comparison -function will always be used, even if the programmer makes a mistake. - -For example, if we want to use sets of `ints`, we would do this: - -```ocaml -# module Int_set = - Set.Make (struct - type t = int - let compare = compare - end);; -module Int_set : - sig - type elt = int - type t - val empty : t - val is_empty : t -> bool - val mem : elt -> t -> bool - val add : elt -> t -> t - val singleton : elt -> t - val remove : elt -> t -> t - val union : t -> t -> t - val inter : t -> t -> t - val disjoint : t -> t -> bool - val diff : t -> t -> t - val compare : t -> t -> elt - val equal : t -> t -> bool - val subset : t -> t -> bool - val iter : (elt -> unit) -> t -> unit - val map : (elt -> elt) -> t -> t - val fold : (elt -> 'a -> 'a) -> t -> 'a -> 'a - val for_all : (elt -> bool) -> t -> bool - val exists : (elt -> bool) -> t -> bool - val filter : (elt -> bool) -> t -> t - val filter_map : (elt -> elt option) -> t -> t - val partition : (elt -> bool) -> t -> t * t - val cardinal : t -> elt - val elements : t -> elt list - val min_elt : t -> elt - val min_elt_opt : t -> elt option - val max_elt : t -> elt - val max_elt_opt : t -> elt option - val choose : t -> elt - val choose_opt : t -> elt option - val split : elt -> t -> t * bool * t - val find : elt -> t -> elt - val find_opt : elt -> t -> elt option - val find_first : (elt -> bool) -> t -> elt - val find_first_opt : (elt -> bool) -> t -> elt option - val find_last : (elt -> bool) -> t -> elt - val find_last_opt : (elt -> bool) -> t -> elt option - val of_list : elt list -> t - val to_seq_from : elt -> t -> elt Seq.t - val to_seq : t -> elt Seq.t - val to_rev_seq : t -> elt Seq.t - val add_seq : elt Seq.t -> t -> t - val of_seq : elt Seq.t -> t - end -``` - -For sets of strings, it is even easier because the standard library provides a -`String` module with a type `t` and a function `compare`. If you were following -carefully, by now you must have guessed how to create a module to -manipulate string sets: - -```ocaml -# module String_set = Set.Make (String);; -module String_set : - sig - type elt = string - type t = Set.Make(String).t - val empty : t - val is_empty : t -> bool - val mem : elt -> t -> bool - val add : elt -> t -> t - val singleton : elt -> t - val remove : elt -> t -> t - val union : t -> t -> t - val inter : t -> t -> t - val disjoint : t -> t -> bool - val diff : t -> t -> t - val compare : t -> t -> int - val equal : t -> t -> bool - val subset : t -> t -> bool - val iter : (elt -> unit) -> t -> unit - val map : (elt -> elt) -> t -> t - val fold : (elt -> 'a -> 'a) -> t -> 'a -> 'a - val for_all : (elt -> bool) -> t -> bool - val exists : (elt -> bool) -> t -> bool - val filter : (elt -> bool) -> t -> t - val filter_map : (elt -> elt option) -> t -> t - val partition : (elt -> bool) -> t -> t * t - val cardinal : t -> int - val elements : t -> elt list - val min_elt : t -> elt - val min_elt_opt : t -> elt option - val max_elt : t -> elt - val max_elt_opt : t -> elt option - val choose : t -> elt - val choose_opt : t -> elt option - val split : elt -> t -> t * bool * t - val find : elt -> t -> elt - val find_opt : elt -> t -> elt option - val find_first : (elt -> bool) -> t -> elt - val find_first_opt : (elt -> bool) -> t -> elt option - val find_last : (elt -> bool) -> t -> elt - val find_last_opt : (elt -> bool) -> t -> elt option - val of_list : elt list -> t - val to_seq_from : elt -> t -> elt Seq.t - val to_seq : t -> elt Seq.t - val to_rev_seq : t -> elt Seq.t - val add_seq : elt Seq.t -> t -> t - val of_seq : elt Seq.t -> t - end -``` - -(the parentheses are necessary) - -## Defining Functors - -A functor with one argument can be defined like this: - - -```ocaml -module F (X : X_type) = struct - ... -end -``` - -where `X` is the module that will be passed as argument, and `X_type` is its -signature, which is mandatory. - -The signature of the returned module itself can be constrained, using this -syntax: - - -```ocaml -module F (X : X_type) : Y_type = -struct - ... -end -``` - -or by specifying this in the `.mli` file: - - -```ocaml -module F (X : X_type) : Y_type -``` - -Overall, the syntax of functors is hard to grasp. The best may be to look at -the source files -[`set.ml`](https://github.com/ocaml/ocaml/blob/trunk/stdlib/set.ml) or -[`map.ml`](https://github.com/ocaml/ocaml/blob/trunk/stdlib/map.ml) of the -standard library. +## Introduction + +In this tutorial, we look at how to apply functors and how to write functors. We also show some use cases involving functors. + +As suggested by the name, a _functor_ is almost like a function. However, while functions are between values, functors are between modules. A functor has a module as a parameter and returns a module as a result. A functor in OCaml is a parametrised module, not to be confused with a [functor in mathematics](https://en.wikipedia.org/wiki/Functor). + +**Prerequisites**: [Modules](/docs/modules). + +## Project Setup + +This tutorial uses the [Dune](https://dune.build) build tool. Make sure you have installed version 3.7 or later. We start by creating a fresh project. We need a folder named `funkt` with files `dune-project`, `dune`, and `funkt.ml`. + +```shell +$ mkdir funkt; cd funkt +``` + +Place the following in the file **`dune-project`**: +```lisp +(lang dune 3.7) +(package (name funkt)) +``` + +The content of the file **`dune`** should be this: +```lisp +(executable + (name funkt) + (public_name funkt) + (libraries str)) +``` + +Create an empty file `funkt.ml`. + +Check that this works using the `opam exec -- dune exec funkt` command. It shouldn't do anything (the empty file is valid OCaml syntax), but it shouldn't fail either. The stanza `libraries str` makes the `Str` module (which we will use later) available. + +## Using an Existing Functor: `Set.Make` + +The standard library contains a [`Set`](/api/Set.html) module which is designed to handle sets. This module enables you to perform operations such as union, intersection, and difference on sets. You may check the [Set](/docs/sets) tutorial to learn more about this module, but it is not required to follow the present tutorial. + +To create a set module for a given element type (which allows you to use the provided type and its associated [functions](/api/Set.S.html)), it's necessary to use the functor `Set.Make` provided by the `Set` module. Here is a simplified version of `Set`'s interface: +```ocaml +module type OrderedType = sig + type t + val compare : t -> t -> int +end + +module Make : functor (Ord : OrderedType) -> Set.S +``` + +Here is how this reads (starting from the bottom, then going up): +* Like a function (indicated by the arrow `->`), the functor `Set.Make` + - takes a module with signature `OrderedType` and + - returns a module with signature [`Set.S`](/api/Set.S.html) +* The module type `OrderedType` requires a type `t` and a function `compare`, which are used to perform the comparisons between elements of the set. + +**Note**: Most set operations need to compare elements to check if they are the same. To allow using a user-defined comparison algorithm, the `Set.Make` functor takes a module the specifies both the element type `t` and the `compare` function. Passing the comparison function as a higher-order parameter, as done in `Array.sort`, for example, would add a lot of boilerplate code. Providing set operations as a functor allows specifying the comparison function only once. + +Here is an example how to use `Set.Make`: + +**`funkt.ml`** + +```ocaml +module StringCompare = struct + type t = string + let compare = String.compare +end + +module StringSet = Set.Make(StringCompare) +``` + +This defines a module `Funkt.StringSet`. What `Set.Make` needs are: +- Type `t`, here `string` +- Function allowing to compare two values of type `t`, here `String.compare` + +However, since the module `String` defines +- Type name `t`, which is an alias for `string` +- Function `compare` of type `t -> t -> bool` compares two strings + +This can be simplified using an _anonymous module_ expression: +```ocaml +module StringSet = Set.Make(struct + type t = string + let compare = String.compare +end) +``` + +The module expression `struct ... end` is inlined in the `Set.Make` call. + +This can be simplified even further into this: +```ocaml +module StringSet = Set.Make(String) +``` + +In all versions, the module resulting from the functor application `Set.Make` is bound to the name `StringSet`, and it has the signature `Set.S`. The module `StringSet` provides the operations on sets of strings. The function `String.compare` is used internally by `StringSet`. + +When you run `opam exec -- dune exec funkt`, it doesn't do anything, but it shouldn't fail either. + +Let's add some code to `funkt.ml` so that it does something. + +**`funkt.ml`** +```ocaml +module StringSet = Set.Make(String) + +let _ = + In_channel.input_lines stdin + |> List.concat_map Str.(split (regexp "[ \t.,;:()]+")) + |> StringSet.of_list + |> StringSet.iter print_endline +``` + +Here is how this code works: +- `In_channel.input_lines` : reads lines of text from standard input +- `List.concat_map` : splits lines into words and produces a word list +- `StringSet.of_list : string list -> StringSet.t` : converts the word list into a set +- `StringSet.iter : StringSet.t -> unit` : displays the set's elements + +The functions `StringSet.of_list` and `StringSet.iter` are available in the functor's application result. + +```shell +$ opam exec -- dune exec funkt < dune +executable +libraries +name +public_name +str +funkt +``` + +There are no duplicates in a `Set`. Therefore, the string `"funkt"` is only displayed once, although it appears twice in the `dune` file. + +## Extending a Module with a Standard Library Functor + +Using the `include` statement, here is an alternate way to expose the module created by `Set.Make(String)`: + +**`funkt.ml`** +```ocaml +module String = struct + include String + module Set = Set.Make(String) +end + +let _ = + stdin + |> In_channel.input_lines + |> List.concat_map Str.(split (regexp "[ \t.,;:()]+")) + |> String.Set.of_list + |> String.Set.iter print_endline +``` + +This allows the user to seemingly extend the module `String` with a submodule `Set`. Check the behaviour using `opam exec -- dune exec funkt < dune`. + +## Functors Allows Parametrising Modules + +### Functors From the Standard Library + +A functor is almost a module, except it needs to be applied to a module. This turns it into a module. In that sense, a functor allows module parametrisation. + +That's the case for the sets, maps, and hash tables provided by the standard library. It works like a contract between the functor and the developer. +* If you provide a module that implements what is expected, as described the parameter interface +* The functor returns a module that implements what is promised, as described by the result interface + +Here is the module's signature that the functors `Set.Make` and `Map.Make` expect: +```ocaml +module type OrderedType = sig + type t + val compare : t -> t -> int +end +``` + +Here is the module's signature that the functor `Hashtbl.Make` expects: +```ocaml +module type HashedType = sig + type t + val equal : t -> t -> bool + val hash : t -> int +end +``` + +The functors `Set.Make`, `Map.Make`, and `Hashtbl.Make` return modules satisfying the interfaces `Set.S`, `Map.S`, and `Hashtbl.S` (respectively), which all contain an abstract type `t` and associated functions. Refer to the documentation for the details about what they provide: +* [`Set.S`](/api/Set.S.html) +* [`Map.S`](/api/Map.S.html) +* [`Hashtbl.S`](/api/Hashtbl.S.html) + +### Writing Your Own Functors + +One reason to write a functor is to provide a data structure that is parametrised. This is the same as `Set` and `Map` on other data structures. In this section, we take heaps as an example. + +There are many kinds of [heap](https://en.wikipedia.org/wiki/Heap_(data_structure)) data structures. Examples include binary heaps, leftist heaps, binomial heaps, or Fibonacci heaps. + +The kind of data structures and algorithms used to implement a heap is not discussed in this document. + +The common prerequisite to implement any heap is a means to compare the elements they contain. That's the same signature as the parameter of `Set.Make` and `Map.Make`: +```ocaml +module type OrderedType = sig + type t + val compare : t -> t -> int +end +``` + +Using such a parameter, a heap implementation must provide at least this interface: +```ocaml +module type HeapType = sig + type elt + type t + val empty : t + val is_empty : t -> bool + val insert : t -> elt -> t + val merge : t -> t -> t + val find : t -> elt + val delete : t -> t +end +``` + +Heap implementations can be represented as functors from `OrderedType` into `HeapType`. Each kind of heap would be a different functor. + +Here is the skeleton of a possible implementation: + +**heap.ml** +```ocaml +module type OrderedType = sig + type t + val compare : t -> t -> int +end + +module type S = sig + type elt + type t + val empty : t + val is_empty : t -> bool + val insert : t -> elt -> t + val merge : t -> t -> t + val find : t -> elt + val delete : t -> t +end + +module Binary(Elt: OrderedType) : S = struct + type elt = | (* Replace by your own *) + type t = | (* Replace by your own *) + (* Add private functions here *) + let is_empty h = failwith "Not yet implemented" + let insert h e = failwith "Not yet implemented" + let merge h1 h2 = failwith "Not yet implemented" + let find h = failwith "Not yet implemented" + let delete h = failwith "Not yet implemented" +end +``` + +Here, binary heaps is the only implementation suggested. This can be extended to other implementations by adding one functor per each, e.g., `Heap.Leftist`, `Heap.Binomial`, `Heap.Fibonacci`, etc. + + + +## Injecting Dependencies Using Functors + +**Dependencies Between Modules** + +Here is a new version of the `funkt` program: + +**`funkt.ml`** +```ocaml +module StringSet = Set.Make(String) + +module IterPrint : sig + val f : string list -> unit +end = struct + let f = List.iter (fun s -> Out_channel.output_string stdout (s ^ "\n")) +end + +let _ = + stdin + |> In_channel.input_lines + |> List.concat_map Str.(split (regexp "[ \t.,;:()]+")) + |> StringSet.of_list + |> StringSet.elements + |> IterPrint.f +``` + +It embeds an additional `IterPrint` module that exposes a single function `f` of type `string list -> unit` and has two dependencies: + - Module `List` through `List.iter` and `f`'s type + - Module `Out_channel` through `Out_channel.output_string` + +Check the program's behaviour using `opam exec -- dune exec funkt < dune`. + +**Dependency Injection** + +[Dependency injection](https://en.wikipedia.org/wiki/Dependency_injection) is a way to parametrise over a dependency. + +Here is a refactoring of the module `IterPrint` to use this technique: + +**`iterPrint.ml`** +```ocaml +module type Iterable = sig + type 'a t + val iter : ('a -> unit) -> 'a t -> unit +end + +module type S = sig + type 'a t + val f : string t -> unit +end + +module Make(Dep: Iterable) : S with type 'a t := 'a Dep.t = struct + let f = Dep.iter (fun s -> Out_channel.output_string stdout (s ^ "\n")) +end +``` + +The module `IterPrint` is refactored into a functor that takes a module providing the function `iter` as a parameter. The `with type 'a t := 'a Dep.t` constraint means the type `t` from the parameter `Dep` replaces the type `t` in the result module. This allows `f`'s type to use `Dep`'s `t` type. With this refactoring, `IterPrint` only has one dependency. At its compilation time, no implementation of function `iter` is available yet. + +**Note**: An OCaml interface file (`.mli`) must be a module, not a functor. Functors must be embedded inside modules. Therefore, it is customary to call them `Make`. + +**`funkt.ml`** + +```ocaml +module StringSet = Set.Make(String) +module IterPrint = IterPrint.Make(List) + +let _ = + stdin + |> In_channel.input_lines + |> List.concat_map Str.(split (regexp "[ \t.,;:()]+")) + |> StringSet.of_list + |> StringSet.elements + |> IterPrint.f +``` + +The dependency `List` is _injected_ when compiling the module `Funkt`. Observe that the code using `IterPrint` is unchanged. Check the program's behaviour using `opam exec -- dune exec funkt < dune`. + +**Replacing a Dependency** + +Now, replacing the implementation of `iter` inside `IterListPrint` is no longer a refactoring; it is another functor application with another dependency. Here, `Array` replaces `List`: + +**`funkt.ml`** +```ocaml +module StringSet = Set.Make(String) +module IterPrint = IterPrint.Make(Array) + +let _ = + stdin + |> In_channel.input_lines + |> List.concat_map Str.(split (regexp "[ \t.,;:()]+")) + |> StringSet.of_list + |> StringSet.elements + |> Array.of_list + |> IterPrint.f +``` + +Check the program's behaviour using `opam exec -- dune exec funkt < dune`. + +**Note**: The functor `IterPrint.Make` returns a module that exposes the type from the injected dependency (here first `List.t` then `Array.t`). That's why a `with type` constraint is needed. When parametrising other something not exposed by the module (and _implementation detail_), the `with type` constraint is not needed. + +## Write a Functor to Extend Modules + +In this section, we define a functor to extend several modules in the same way. This is the same idea as in the [Extending a Module with a Standard Library Functor](#extending-a-module-with-a-standard-library-functor), except we write the functor ourselves. + +In this example, we extend `List` and `Array` modules with a function `scan_left`. It does almost the same as `fold_left`, except it returns all the intermediate values, not the last one as `fold_left` does. + +Create a fresh directory with the following files: + +**`dune-project`** +```lisp +(lang dune 3.7) +``` +**`dune`** +```lisp +(library (name scanLeft)) +``` + +**`scanLeft.ml`** +```ocaml +module type LeftFoldable = sig + type 'a t + val fold_left : ('b -> 'a -> 'b) -> 'b -> 'a t -> 'b + val of_list : 'a list -> 'a t +end + +module type S = sig + type 'a t + val scan_left : ('b -> 'a -> 'b) -> 'b -> 'a t -> 'b t +end + +module Make(F: LeftFoldable) : S with type 'a t := 'a F.t = struct + let scan_left f b u = + let f (b, u) a = + let b' = f b a in + (b', b' :: u) in + u |> F.fold_left f (b, []) |> snd |> List.rev |> F.of_list +end +``` + +Run the `dune utop` command. Once inside the toplevel, enter the following commands. +```ocaml +# module Array = struct + include Stdlib.Array + include ScanLeft.Make(Stdlib.Array) + end;; + +# module List = struct + include List + include ScanLeft.Make(struct + include List + let of_list = Fun.id + end) + end;; + +# Array.init 10 Fun.id |> Array.scan_left ( + ) 0;; +- : int array = [|0; 1; 3; 6; 10; 15; 21; 28; 36; 45|] + +# List.init 10 Fun.id |> List.scan_left ( + ) 0;; +- : int list = [0; 1; 3; 6; 10; 15; 21; 28; 36; 45] +``` + +Modules `Array` and `List` appear augmented with `Array.scan_left` and `List.scan_left`. For brevity, the output of the first two toplevel commands is not shown here. + +## Initialisation of Stateful Modules + +Modules can hold a state. Functors can provide a means to initialise stateful modules. As an example of such, here is a possible way to handle random number generation seeds as a state: + +**`random.ml`** +```ocaml +module type SeedType : sig + val v : int array +end + +module type S : sig + val reset_state : unit -> unit + + val bits : unit -> int + val bits32 : unit -> int32 + val bits64 : unit -> int64 + val nativebits : unit -> nativeint + val int : int -> int + val int32 : int32 -> int32 + val int64 : int64 -> int64 + val nativeint : nativeint -> nativeint + val full_int : int -> int + val float : float -> float + val bool : unit -> bool +end + +module Make(Seed: SeedType) : S = struct + let state = Seed.v |> Random.State.make |> ref + let reset_state () = state := Random.State.make Seed.v + + let bits () = Random.State.bits !state + let bits32 () = Random.State.bits32 !state + let bits64 () = Random.State.bits64 !state + let nativebits () = Random.State.nativebits !state + let int = Random.State.int !state + let int32 = Random.State.int32 !state + let int64 = Random.State.int64 !state + let nativeint = Random.State.nativeint !state + let full_int = Random.State.full_int !state + let float = Random.State.float !state + let bool () = Random.State.bool !state +end +``` + +Create this file and launch `utop`. +```ocaml +# #mod_use "random.ml";; + +# module R1 = Random.Make(struct let v = [|0; 1; 2; 3|] end);; + +# module R2 = Random.Make(struct let v = [|0; 1; 2; 3|] end);; + +# R1.bits ();; +- : int = 75783189 + +# R2.bits ();; +- : int = 75783189 + +# R1.bits ();; +- : int = 774473149 + +# R1.reset_state ();; +- : unit = () + +# R2.bits ();; +- : int = 774473149 + +# R1.bits ();; +- : int = 75783189 +``` + +Modules `R1` and `R2` are created with the same state; therefore, the first calls to `R1.bits` and `R2.bits` return the same value. + +The second call to `R1.bits` moves `R1`'s state one step and returns the corresponding bits. The call to `R1.reset_state` sets the `R1`'s state to its initial value. + +Calling `R2.bits` a second time shows the modules aren't sharing states. Otherwise, the value from the first calls to `bits` would have been returned. + +Calling `R1.bits` a third time returns the same result as the first call, which demonstrates the state has indeed been reset. + +## Conclusion + +Functor application essentially works the same way as function application: passing parameters and getting results. The difference is that we are passing modules instead of values. Beyond comfort, it enables a design approach where concerns are not only separated in silos, which is enabled by modules, but also in stages stacked upon each other. diff --git a/data/tutorials/language/1ms_02_dune.md b/data/tutorials/language/1ms_02_dune.md new file mode 100644 index 0000000000..24cfda949e --- /dev/null +++ b/data/tutorials/language/1ms_02_dune.md @@ -0,0 +1,268 @@ +--- +id: libraries-dune +title: Libraries With Dune +short_title: Libraries With Dune +description: > + Dune provides several means to arrange modules into libraries. We look at Dune's mechanisms for structuring projects with libraries that contain modules. +category: "Module System" +--- + +## Introduction + +Dune provides several means to arrange modules into libraries. We look at Dune's mechanisms for structuring projects with libraries that contain modules. + +This tutorial uses the [Dune](https://dune.build) build tool. Make sure you have version 3.7 or later installed. + +**Requirements**: [Modules](/docs/modules) and [Functors](/docs/modules). + +## Minimum Project Setup + +This section details the structure of an almost-minimum Dune project setup. Check [Your First OCaml Program](/docs/your-first-program) for automatic setup using the `dune init proj` command. +```shell +$ mkdir mixtli; cd mixtli +``` + +In this directory, create four more files: `dune-project`, `dune`, `cloud.ml`, and `wmo.ml`: + +**`dune-project`** +```lisp +(lang dune 3.7) +(package (name wmo-clouds)) +``` + +This file contains the global project configuration. It's kept almost to the minimum, including the `lang dune` stanza that specifies the required Dune version and the `package` stanza that makes this tutorial simpler. + +**`dune`** +```lisp +(executable + (name cloud) + (public_name nube)) +``` + +Each folder that requires some sort of build must contain a `dune` file. The `executable` stanza means an executable program is built. +- The `name cloud` stanza means the file `cloud.ml` contains the executable. +- The `public_name nube` stanza means the executable is made available using the name `nube`. + +**`wmo.ml`** +```ocaml +module Stratus = struct + let nimbus = "Nimbostratus (Ns)" +end + +module Cumulus = struct + let nimbus = "Cumulonimbus (Cb)" +end +``` + +**`cloud.ml`** +```ocaml +let () = + Wmo.Stratus.nimbus |> print_endline; + Wmo.Cumulus.nimbus |> print_endline +``` + +Here is the resulting output: +```shell +$ opam exec -- dune exec nube +Nimbostratus (Ns) +Cumulonimbus (Cb) +``` + + +Here is the folder contents: +```shell +$ tree +. +├── dune +├── dune-project +├── cloud.ml +└── wmo.ml +``` + +Dune stores the files it creates in a folder named `_build`. In a project managed using Git, the `_build` folder should be ignored +```shell +$ echo _build >> .gitignore +``` + +In OCaml, each source file is compiled into a module. In the `mixtli` project, the file `cloud.ml` creates a module named `Cloud`, the file `wmo.ml` creates a module `Wmo` that contains two submodules: `Stratus` and `Cumulus`. + +Here are the different names: +* `mixtli` is the project's name (it means *cloud* in Nahuatl). +* `cloud.ml` is the OCaml source file's name, referred as `cloud` in the `dune` file. +* `nube` is the executable command's name (it means *cloud* in Spanish). +* `Cloud` is the name of the module associated with the file `cloud.ml`. +* `Wmo` is the name of the module associated with the file `wmo.ml`. +* `wmo-clouds` is the name of the package built by this project. + +The `dune describe` command allows having a look at the project's module structure. Here is its output: +```lisp +((root /home/cuihtlauac/caml/mixtli-dune) + (build_context _build/default) + (executables + ((names (cloud)) + (requires ()) + (modules + (((name Wmo) + (impl (_build/default/wmo.ml)) + (intf ()) + (cmt (_build/default/.cloud.eobjs/byte/wmo.cmt)) + (cmti ())) + ((name Cloud) + (impl (_build/default/cloud.ml)) + (intf ()) + (cmt (_build/default/.cloud.eobjs/byte/cloud.cmt)) + (cmti ())))) + (include_dirs (_build/default/.cloud.eobjs/byte))))) +``` + + +## Libraries + + +In OCaml, a library is a collection of modules. By default, when Dune builds a library, it wraps the bundled modules into a module. This allows having several modules with the same name, inside different libraries, in the same project. That feature is known as [_namespaces_](https://en.wikipedia.org/wiki/Namespace) for module names. This is similar to what module do for definitions; they avoid name clashes. + +Dune creates libraries from folders. Let's look at an example. Here the folder is `lib`: +```shell +$ mkdir lib +``` + +The `lib` folder is populated with the following files. + +**`lib/dune`** +```lisp +(library (name wmo)) +``` + +**`lib/cumulus.mli`** +```ocaml +val stratus : string +``` + +**`lib/cumulus.ml`** +```ocaml +let nimbus = "Cumulonimbus (Cb)" +``` + +**`lib/stratus.mli`** +```ocaml +val cumulus : string +``` + +**`lib/stratus.ml`** +```ocaml +let nimbus = "Nimbostratus (Ns)" +``` + +All the modules found in the `lib` folder are bundled into the `Wmo` module. This module is the same as what we had in the `wmo.ml` file. To avoid redundancy, we delete it: +```shell +$ rm wmo.ml +``` + +We update the `dune` file building the executable to use the library as a dependency. + +**`dune`** +```lisp +(executable + (name cloud) + (public_name nube) + (libraries wmo)) +``` + +**Observations**: +* Dune creates a module `Wmo` from the contents of folder `lib`. +* The folder's name (here `lib`) is irrelevant. +* The library name appears uncapitalised (`wmo`) in `dune` files: + - In its definition, in `lib/dune` + - When used as a dependency in `dune` + +## Library Wrapper Modules + +By default, when Dune bundles modules into a library, they are automatically wrapped into a module. It is possible to manually write the wrapper file. The wrapper file must have the same name as the library. + +Here, we are creating a wrapper file for the `wmo` library from the previous section. + +**`lib/wmo.ml`** +```ocaml +module Cumulus = Cumulus +module Stratus = Stratus +``` + +Here is how to make sense of these module definitions: +- On the left-hand side, `module Cumulus` means module `Wmo` contains a submodule named `Cumulus`. +- On the right-hand side, `Cumulus` refers to the module defined in the file `lib/cumulus.ml`. + + +Run `dune exec nube` to see that the behaviour of the program is the same as in the previous section. + +When a library folder contains a wrapper module (here `wmo.ml`), it is the only one exposed. All other file-based modules from that folder that do not appear in the wrapper module are private. + +Using a wrapper file makes several things possible: +- Have different public and internal names, `module CumulusCloud = Cumulus` +- Define values in the wrapper module, `let ... = ` +- Expose module resulting from functor application, `module StringSet = Set.Make(String)` +- Apply the same interface type to several modules without duplicating files +- Hide modules by not listing them + +## Include Subdirectories + +By default, Dune builds a library from the modules found in the same folder as the `dune` file, but it doesn't look into subfolders. It is possible to change this behaviour. + +In this example, we create subdirectories and move files there. +```shell +$ mkdir lib/cumulus lib/stratus +$ mv lib/cumulus.ml lib/cumulus/m.ml +$ mv lib/cumulus.mli lib/cumulus/m.mli +$ mv lib/stratus.ml lib/stratus/m.ml +$ mv lib/stratus.mli lib/stratus/m.mli +``` + +Change from the default behaviour with the `include_subdirs` stanza. + +**`lib/dune`** +```lisp +(include_subdirs qualified) +(library (name wmo)) +``` + +Update the library wrapper to expose the modules created from the subdirectories. + +**`wmo.ml`** +```ocaml +module Cumulus = Cumulus.M +module Stratus = Stratus.M +``` + +Run `dune exec nube` to see that the behaviour of the program is the same as in the two previous sections. + +The `include_subdirs qualified` stanza works recursively, except on subfolders containing a `dune` file. See the [Dune](https://dune.readthedocs.io/en/stable/dune-files.html#include-subdirs) [documentation](https://github.com/ocaml/dune/issues/1084) for [more](https://discuss.ocaml.org/t/upcoming-dune-feature-include-subdirs-qualified) on this [topic](https://github.com/ocaml/dune/tree/main/test/blackbox-tests/test-cases/include-qualified). + + + +## Conclusion + +The OCaml module system allows organising a project in many ways. Dune provides several means to arrange modules into libraries. +