Understanding Rust

The Rust programming language has been hyped as a systems programming language. This implies that it has to give explicit “access” to hardware-level abstractions, such as the Stack, the Heap and the Procedures, and to operating-system-level abstractions. It has to follow the particular calling conventions (a set of standardized interfaces) for a given “architecture and OS” pair, and to be “aware of” ABIs in general.

Overview

Rust is, in principle, an imperative language. The order of statements matters (a lot, given the changes of ownership), expressions are just an addition, and there is not (and never will be) referential transparency.

Everyone who talks about Rust as “almost a functional language” or as a “language much like Standard ML” is an unqualified, clueless bullshitter. Imperative code (flows of statements) and the referential transparency property are mutually exclusive: it is either one or the other.

In an imperative language one has to trace, or simulate in one's mind, the flow of execution of statements in order to understand the program. Only with this mental tracing can one “see” the current state of the system as a “snapshot” of all its memory locations, which is what variables are in imperative languages.

Concurrent multi-threaded access and destructive mutation together result in non-determinism, so such a “snapshot” of all the variables cannot be obtained in principle (only a possibly infinite set of all possible such snapshots).

In the Classic languages we have immutable bindings, and all the “state” is wrapped and “hidden” inside closures and is still described declaratively (what eventually has to be done with it in all possible cases).

The principles

The Rust compiler imposes (and enforces) a strict discipline on what can be done with variables and references and, given that discipline, does the “tracing” for you to make sure that certain conditions (situations) do not arise. When they would arise, it signals an error at compile time (instead of producing runtime errors as a result of runtime checks).
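To make the contrast between compile-time and runtime checking concrete, here is a minimal sketch. It assumes std::cell::RefCell as the runtime counterpart of the same "many readers or one writer" rule: RefCell performs at run time the check that the compiler performs statically for plain references, and reports a violation as an error value. The function names are hypothetical.

```rust
use std::cell::RefCell;

/// Returns true if a mutable borrow is refused while a shared borrow is alive,
/// and allowed once that shared borrow has ended (checked at run time).
fn runtime_check_rejects_conflict() -> bool {
    let cell = RefCell::new(vec![1, 2, 3]);
    let reader = cell.borrow(); // shared borrow, tracked at run time
    let refused = cell.try_borrow_mut().is_err(); // conflicting borrow is refused
    drop(reader); // end the shared borrow
    let allowed = cell.try_borrow_mut().is_ok(); // now a mutable borrow succeeds
    refused && allowed
}

/// With plain references the compiler does the same "tracing" statically.
fn compile_time_version() -> usize {
    let mut v = vec![1, 2, 3];
    let first = &v[0]; // shared borrow
    let copy = *first; // last use of `first`: the borrow ends here
    // Using `first` after the push below would be rejected at compile time (E0502).
    v.push(copy);
    v.len()
}

fn main() {
    println!("{}", runtime_check_rejects_conflict());
    println!("{}", compile_time_version());
}
```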

The real major innovation is to treat all references (a specialization of pointers, which are just offsets) “explicitly”, by restricting and formalizing their possible “behavior”. References become “typed”, that is, lifted into (recognized by) the type system.

Making the imperative notion of “ownership” (which variable holds the data in memory right now, in time) explicit, and tracked by the compiler, is another innovation.
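As a rough sketch of what “typed” references mean in practice: &T and &mut T are distinct types with distinct rules. Shared references implement Copy and can be duplicated freely, while mutable references do not, which lets the compiler keep them unique. The helper `assert_copy` and the other function names are hypothetical, introduced only for illustration.

```rust
// Hypothetical helper: a call compiles only if T implements Copy.
fn assert_copy<T: Copy>(_value: T) {}

fn read(r: &i32) -> i32 {
    *r
}

fn bump(r: &mut i32) {
    *r += 1;
}

fn demo() -> i32 {
    let mut x = 41;
    let shared = &x;
    assert_copy(shared); // OK: &i32 is Copy
    let copy_of_shared = shared; // duplicating a shared reference
    let total = read(shared) + read(copy_of_shared);
    // assert_copy(&mut x); // rejected: &mut i32 is not Copy
    bump(&mut x); // the shared references are no longer used, so this is allowed
    total + x // 41 + 41 + 42
}

fn main() {
    println!("{}", demo());
}
```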

In the Classic languages we have no notion of time, which is, in principle, the right understanding.

The rather amateur and awkward notions of “borrows”, “permissions” and “lifetimes” have been introduced to explain and implement the typing rules.

The borrows from FP

A few fundamental concepts have been borrowed from the Classic World, most notably refs: from Standard ML, where they are part of the core language, and from Haskell, where they are just an Abstract Data Type defined in the standard library.

In these languages refs are first-class values, which are themselves immutable. Semantically they are similar to syntactic closures, which capture (and carry along) the values of their bindings.

The fundamental referential transparency property still holds for refs as immutable bindings, so the imperative issues of mutation combined with aliasing do not arise.

Refs are passed along explicitly and, being an ADT (or even wrapped in a Monad), they protect the “state” (the actual values) from being leaked (or even observed).

The rules

Again, since Rust is an imperative language, the notions of “time” and flow of control are central to simulating, and thus understanding, the current state of a program.

Rust formalizes and restricts the possible behaviors (in the code) by enforcing a set of informally stated (but actually implemented inside the compiler) rules.

Variables are locations in memory (either on the stack or the heap).

Variables or function arguments “own” the values in memory.

A single owner at any time. Ownership “moves” between owners.

References do not “own” the values they refer to. Variables do.

Any number of immutable references to a variable are allowed.

At most one mutable reference to a variable (the owner) at a time, and only while no immutable references are in use.

Taking a mutable reference (at a moment in time) invalidates all the other references.

A change (a move) of ownership invalidates the prior owner variable from then on (in time).

When the most recent borrow ends, access “returns” to the previous borrower or to the owner; when the owner itself is dropped, the value is dropped.
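A minimal sketch of these rules in action. The lines the compiler would reject are left as comments (with the error codes rustc reports for them), so the block itself compiles; the function name is hypothetical.

```rust
fn rules_demo() -> (usize, String) {
    let a = String::from("data"); // `a` owns the heap buffer
    let b = a; // ownership moves from `a` to `b`
    // println!("{a}"); // rejected: `a` was moved (E0382)

    let r1 = &b; // any number of shared references...
    let r2 = &b;
    let len = r1.len() + r2.len();

    let mut c = b; // move again, into a mutable binding
    let m = &mut c; // exactly one mutable reference
    // let m2 = &mut c; // rejected while `m` is still in use (E0499)
    m.push('!');
    // `m` is no longer used, so the owner `c` is fully accessible again.
    (len, c)
}

fn main() {
    println!("{:?}", rules_demo());
}
```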

Nothing to see there

Nothing is “profound” there. It is just a systematic attempt to formalize and restrict the possible behavior of an imperative code, by borrowing some fundamental ideas from the Classic Languages.

An advanced imperative language

Aside from the restrictions and rules described above, Rust is just an ordinary imperative language, with all the usual imperative issues, just like assembly language, which is the “ultimate location-based imperative code”.

Everything is a location in memory (on the stack, on the heap or in registers). The content is “moved” (actually, copied) between locations (which include registers).

There is a reason why the most used instruction is called mov. It implicitly signals that “the source” is invalidated and that “the destination” is now the right (current) location of the content.

Using the “old locations” is thus a fundamental imperative problem (among others).

The procedure-calling issues are still there:

passing by value (a copy)
passing by a reference (a copy of a pointer)
returning by a value (a copy)
returning a reference (a pointer to a location)
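In Rust the four cases above map onto function signatures. A sketch with hypothetical function names: passing by value moves ownership into the callee, passing by reference copies only a pointer, returning by value moves ownership out to the caller, and a returned reference must point into something the caller already owns, which is why returning a reference to a local variable is rejected.

```rust
// Pass by value: the parameter takes ownership of the String.
fn by_value(s: String) -> usize {
    s.len()
} // `s` is dropped here

// Pass by reference: only a pointer is copied; the caller keeps ownership.
fn by_reference(s: &str) -> usize {
    s.len()
}

// Return by value: ownership of the new String moves to the caller.
fn make_greeting(name: &str) -> String {
    format!("hello, {name}")
}

// Return a reference: it must point into the argument, not into a local.
fn first_word(s: &str) -> &str {
    s.split(' ').next().unwrap_or("")
}

// fn dangling() -> &String { let s = String::new(); &s } // rejected: nothing to borrow from

fn main() {
    let owned = make_greeting("world");
    let n = by_reference(&owned); // `owned` is still usable afterwards
    let word = first_word(&owned);
    println!("{n} {word}");
    let m = by_value(owned); // `owned` is moved and can no longer be used
    println!("{m}");
}
```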

The Move semantics by default

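Rust “moves” the contents of a variable in assignment statements and in pass-by-value procedure calls, implicitly invalidating (making inaccessible) the “source” location for any code that follows (in time and in the same scope). Types that implement the Copy trait are the exception: their values are copied, and the source stays valid.

C and C++ copy the contents of variables in assignments and procedure calls. C++ implements move semantics explicitly, via rvalue references and std::move (which is a cast, not a wrapper), and the moved-from object remains accessible afterwards.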
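A small sketch of the three behaviors side by side, with a hypothetical function name: a Copy type is copied implicitly, a non-Copy type such as String is moved, and a deep copy has to be requested explicitly with clone(). Unlike a moved-from object in C++, which for standard library types stays accessible in a valid but unspecified state, using a moved-from variable in Rust is a compile-time error.

```rust
fn move_copy_clone() -> (i32, i32, String, String) {
    let x = 5; // i32 implements Copy
    let y = x; // the bits are copied; `x` stays valid
    let s = String::from("heap"); // String does not implement Copy
    let t = s.clone(); // explicit deep copy: a new heap buffer
    let u = s; // move: `s` is invalidated from here on
    // println!("{s}"); // rejected: borrow of moved value `s` (E0382)
    (x, y, t, u)
}

fn main() {
    println!("{:?}", move_copy_clone());
}
```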

The concepts

Less wrong with less bullshit.

Variable

Semantically equivalent to a named memory location (on the stack or on the heap) which has an implicit address (an offset) and an explicit name (a symbolic reference) associated with it.

A variable is dropped when control reaches the end of its scope, unless its contents have been moved out. For types that implement the Drop trait, “destructors” are called automatically (a value can still be leaked deliberately, for example with std::mem::forget).

This is similar to RAII in C++ (an idiom built on destructors that run automatically when an object goes out of scope).

Unlike C++, however, this is not optional, is implicit and is uniformly enforced by the compiler (for every variable of any type).
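A minimal sketch of when destructors run, using a hypothetical `Noisy` type that records its name in a shared log (an Rc<RefCell<Vec<_>>>, chosen here only to make the order observable). A value moved into drop() is destroyed immediately; the remaining variables are dropped at the end of their scope, in reverse order of declaration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type that records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _first = Noisy { name: "first", log: Rc::clone(&log) };
        let _second = Noisy { name: "second", log: Rc::clone(&log) };
        let moved = Noisy { name: "moved", log: Rc::clone(&log) };
        drop(moved); // ownership moves into `drop`, so it is destroyed right now
        log.borrow_mut().push("end of scope");
    } // `_second`, then `_first`: reverse order of declaration
    let result = log.borrow().clone();
    result
}

fn main() {
    println!("{:?}", drop_order());
}
```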

Reference

A “numeric reference” (an offset in bytes), also known as the address of a location in memory (on the stack or on the heap) that is used to store the contents of some variable.
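A quick sketch showing that a reference is, at run time, just an address. It relies on std::ptr::eq to compare addresses and on the documented layout rule that pointers to sized types have the same size as usize; the function name is hypothetical.

```rust
use std::mem::size_of;
use std::ptr;

fn reference_is_address() -> bool {
    let x: u64 = 7;
    let r1 = &x;
    let r2 = &x;
    // Two references to the same variable hold the same address.
    let same_address = ptr::eq(r1, r2);
    // A reference to a sized type is exactly as large as a machine address.
    let pointer_sized = size_of::<&u64>() == size_of::<usize>();
    println!("x lives at {:p}", r1);
    same_address && pointer_sized && *r1 == 7
}

fn main() {
    println!("{}", reference_is_address());
}
```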

Ownership

Which particular memory location (an imperative variable) currently (right now, in time) holds the current (most recent in time) copy of the contents (the data).

This information is systematically collected, maintained and used by the type system to prevent so-called “undefined behavior”, or an “unsafe use” of the older (in time) contents of variables (memory locations).

Moving of the ownership

A variable becomes the “owner” of the contents of the “source” as a result of an imperative assignment (from the source to the target), which is technically a copy (the contents at the source location are not erased) but conceptually a “move”, since at the CPU level the mov instruction is used.

The abstract “ownership” thus “moves” (inside the compiler) together with the “concrete” contents (between particular memory locations tracked by the compiler).
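The “technically a copy, conceptually a move” point can be observed directly. In this sketch (hypothetical function name), moving a String copies only its small header (the standard library documents a String as a pointer, a length and a capacity); the heap buffer stays where it was, and only the compiler's bookkeeping of the owner changes.

```rust
fn move_keeps_heap_buffer() -> bool {
    let source = String::from("contents");
    let heap_address = source.as_ptr(); // address of the heap buffer
    let target = source; // move: only the (pointer, length, capacity) header is copied
    // `source` can no longer be used, but the buffer was neither copied nor erased.
    target.as_ptr() == heap_address && target == "contents"
}

fn main() {
    println!("{}", move_keeps_heap_buffer());
}
```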

A function parameter becomes the owner of the contents when a procedure is called with pass-by-value semantics.

When a procedure is called with pass-by-reference semantics (its parameter explicitly has a reference type), the ownership of the contents does not change.

Borrowing

Taking a reference without taking “ownership”. The “owner” (the variable) still owns what you “took”.

The concept is logically flawed because a reference (a specialization of a pointer) is by definition not the same as the contents of a variable (in a particular location in memory). A “reference” is a pointer to (an offset or an address of) that very location.

The borrow-checker

The part of the implementation of the type system that checks and enforces the rules for references, both mutable and immutable (which are implicitly “typed”).

Lifetime

The principle: no reference (or “borrow”) can ever outlive the “owner”.

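Once a borrow ends (the reference is no longer used), full access “returns” to the owner.

Once the “owner” goes out of scope (is dropped), the contents are dropped and deallocated; with shared-ownership types such as Rc, this happens only when the last owner is dropped.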
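The “no borrow outlives its owner” principle can be sketched with the familiar `longest` function, a hypothetical helper used here only for illustration: the lifetime parameter `'a` ties the returned reference to both arguments, so the compiler rejects any use of the result after either owner has been dropped.

```rust
// The result borrows from the arguments, so it may not outlive either of them.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() {
        a
    } else {
        b
    }
}

fn main() {
    let outer = String::from("a long string");
    let result;
    {
        let inner = String::from("short");
        result = longest(&outer, &inner);
        println!("{result}"); // fine: `inner` is still alive here
    }
    // println!("{result}"); // rejected: `inner` does not live long enough (E0597)
}
```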

Wrapping in ADTs

Just like many other languages, Rust uses Abstract Data Types (ADTs, popularized by Barbara Liskov) to wrap and hide the “primitives”, and to provide “richer” abstractions on top of them.

The notion of an ADT is more general than any particular programming language, and can be traced back to how mathematicians define their abstractions: in terms of a set of all possible operations and an implicit set of relations, or “laws”, among them.

Thus a (growable) array is not just an offset to a contiguous block of memory with syntactic sugar around the offset calculation and dereferencing, but becomes an ADT, which wraps additional values, such as the length and the capacity, together with the actual pointer into a structured value.

These implementation details, however, are hidden behind an ADT (such as Vec), and only the abstract interface is exported.

Slices are just an ADT (a pointer plus a length) with corresponding syntactic sugar.

In Rust an ADT can be defined as a struct (the set of all values that have a particular inner structure), as a trait (which defines a subset of values that implement a particular set of required interfaces, along with implicit “laws” and invariants), or as combinations of traits.
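A sketch of the Vec and slice abstractions described above, with a hypothetical function name. The Vec documentation guarantees that a Vec is a (pointer, capacity, length) triplet, exposed only through methods, and a slice is a view (a pointer and a length) into the same buffer, produced here by the range-indexing sugar without any copying.

```rust
fn vec_and_slice() -> (usize, bool, Vec<i32>) {
    let mut v: Vec<i32> = Vec::with_capacity(8); // room for at least 8 elements
    v.extend([10, 20, 30, 40]);
    // The ADT hides its pointer, length and capacity behind methods.
    let capacity_ok = v.capacity() >= 8 && v.len() == 4;
    // A slice is a view into the same buffer: no elements are copied.
    let middle: &[i32] = &v[1..3];
    let shares_buffer = middle.as_ptr() == v[1..].as_ptr();
    (middle.len(), capacity_ok && shares_buffer, middle.to_vec())
}

fn main() {
    println!("{:?}", vec_and_slice());
}
```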
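A minimal sketch of both flavors, with hypothetical names throughout. `Stack` is a struct whose representation (a Vec) is private to its module, so only the exported operations are visible. `Shape` is a trait, and every type implementing `area` belongs to the “set of values” it describes.

```rust
mod stack {
    /// A hypothetical ADT: the representation is hidden inside this module.
    pub struct Stack<T> {
        items: Vec<T>, // private implementation detail
    }

    impl<T> Stack<T> {
        pub fn new() -> Self {
            Stack { items: Vec::new() }
        }
        pub fn push(&mut self, item: T) {
            self.items.push(item);
        }
        pub fn pop(&mut self) -> Option<T> {
            self.items.pop()
        }
    }
}

/// A trait: any type that provides `area` belongs to this set of values.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

fn total_area(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let mut s = stack::Stack::new();
    s.push(1);
    s.push(2);
    println!("{:?}", s.pop());
    let (a, b) = (Square(2.0), Square(3.0));
    let shapes: [&dyn Shape; 2] = [&a, &b];
    println!("{}", total_area(&shapes));
}
```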

The “smart pointers” which “own” the data allocated on the heap are just ADTs.

Everything “high-level” in Rust is defined as ADTs, from the most basic smart pointers, such as Box and Rc, to Vec and the other collection types.
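A short sketch of the two most basic smart pointers as ADTs, with hypothetical function names. Box is the unique owner of a heap allocation. Rc keeps a reference count behind its interface: cloning an Rc adds an owner without copying the data, and the data is freed only when the last owner is dropped.

```rust
use std::rc::Rc;

fn shared_ownership() -> (usize, usize) {
    let first = Rc::new(String::from("shared")); // one heap allocation, count = 1
    let second = Rc::clone(&first); // a second owner; only the count changes
    let while_shared = Rc::strong_count(&first); // 2
    drop(second); // one owner gone; the String is still alive
    let after_drop = Rc::strong_count(&first); // 1
    (while_shared, after_drop)
} // the last owner is dropped here, so the String is deallocated

fn boxed() -> i32 {
    let b: Box<i32> = Box::new(41); // unique owner of a heap-allocated i32
    *b + 1
}

fn main() {
    println!("{:?} {}", shared_ownership(), boxed());
}
```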

The ADTs and the Duck-Typing are the universal notions of all programming.

Copying

Copy-able ADTs (whose “values” can be copied implicitly, bit for bit) have to implement the Copy trait, which in turn requires Clone.
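A minimal sketch, with a hypothetical `Point` type: when every field is Copy, both Clone and Copy can be derived, and assignment then copies instead of moving. A type that owns a resource and implements Drop, such as String, cannot be Copy.

```rust
// Copy requires Clone; both can be derived because every field is Copy.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn copy_demo() -> (Point, Point) {
    let a = Point { x: 1, y: 2 };
    let mut b = a; // a bitwise copy: `a` remains valid
    b.x = 10; // changing the copy does not affect the original
    (a, b)
}

fn main() {
    println!("{:?}", copy_demo());
}
```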
