DConf'24 London - September 17 2024

Avoid the Garbage Collector in 80 lines

Dennis Korpel

https://github.com/dkorpel/dconf

Garbage Collection

Memory is automatically managed by occasionally pausing all threads and scanning for memory still in use, and freeing the rest.*

*we'll get back to this
https://github.com/dkorpel/dconf

Phobos API

import std.stdio, std.process;

void printPath()
{
    string s = environment.get("PATH");
    writeln(s);
}
https://github.com/dkorpel/dconf

Windows API

import std.stdio, core.sys.windows.windows;

void printPath()
{
    const lengthZ = GetEnvironmentVariableW("PATH", null, 0);
    wchar[] buf = new wchar[lengthZ];
    const len = GetEnvironmentVariableW("PATH", buf.ptr, buf.length);
    writeln(buf[0 .. len]);
}
https://github.com/dkorpel/dconf

Conclusion

Good thing D has a Garbage Collector πŸ‘

https://github.com/dkorpel/dconf

Except...

  • Some people don't like the GC
  • I tried @nogc approaches so you don’t have to!
  • Often awkward
    • until epiphany:
  • @safe @nogc allocator in just 80 lines
  • Built on top of malloc and free
https://github.com/dkorpel/dconf

Image

https://github.com/dkorpel/dconf

BLUF (bottom line up front)

https://github.com/dkorpel/dconf
string environmentGet(string key, return scope Allocator alloc = gc)
{
    // return new char[length];
    return alloc.array!char(length);
}

void main()
{
    Arena a;

    string s = environmentGet("PATH", a.alloc);
    writeln(s);
    writeln(environmentGet("TMP", a.alloc));

    // a.~this();
}
https://github.com/dkorpel/dconf

Whoami

  • Msc. Computer Science TU Delft
  • Part time Issue Manager for D Language Foundation
  • Part time D programmer at SARC
https://github.com/dkorpel/dconf

Coming up

  • On GC avoidance
  • On simplicity
  • 6 suboptimal @nogc approaches
  • The 80 line solution

♻️ On GC avoidance

https://github.com/dkorpel/dconf

The GC is controversial

  • I find myself on neither side of the debate
  • "GC makes D bad for real-time apps"
  • "But there's @nogc"
  • "But then you lose most of Phobos"
  • Perhaps use Reference Counting in Phobos?
https://github.com/dkorpel/dconf

(Automatic) Reference counting

struct RefCountedString {
    string* payload;
    int* count;

    this(string s) {
        payload = malloc(s.length);
        count = new int(1);
    }

    this(this) { ++*count; }

    ~this() {
        if (--*count == 0) free();
    }
}
https://github.com/dkorpel/dconf

Example: 🎹 Audio programming

float phase = 0;

void audioCallback(float[] buffer)
{
    foreach (i; 0 .. buffer.length)
    {
        buffer[i] = sin(phase);
        phase += 0.0576;
    }
}

48 Khz sample rate, 10 ms latency β‡’ 480 samples

https://github.com/dkorpel/dconf

Garbage collector comes!

Image

https://github.com/dkorpel/dconf

Deadline missed?

  • No, audio thread is 'detached' from GC
  • What if we want to load a sample in audioCallback?
  • std.stdio uses GC 😲
  • But Reference Counting wouldn't have helped
https://github.com/dkorpel/dconf

Audio guidelines

  • No locks
  • No malloc
  • No file I/O
https://github.com/dkorpel/dconf

@nogc should have a reason

Image ⚠️

https://github.com/dkorpel/dconf

@nogc should have a reason

Image βœ…

🧢 On simplicity

https://github.com/dkorpel/dconf

1960s: Linear Congruential Generator

int seed = 1;
int RANDU()
{
    seed = seed * 65539 + 0;
    return seed;
}

"Truly horrible" - Donald Knuth

MERSENNE TWISTER

https://github.com/dkorpel/dconf

1997: MERSENNE TWISTER

  • Rectifies flaws of older PRNGs
  • Used by Excel, Matlab, GNU octave
  • And Phobos (std.random: MersenneTwisterEngine)
  • Fails TestU01 Big Crush test (2007)
https://github.com/dkorpel/dconf

2014: PCG Random

  • Passes TestU01 suite
  • More complex than the twister?
  • Nope, just LCG with good constants and a tweak
https://github.com/dkorpel/dconf

Just permute the LCG

1011101000000010100100100010100010010011010000100010111111011001
10111              |
  | [01000000010100100100010100010010]
  |                |
  +----------(rotate_bits)
                   |
    10100100100010100010010][010000000 ---> output
https://github.com/dkorpel/dconf

Full implementation:

uint randomPcg32(ref ulong seed)
{
    const ulong x = seed;
    seed = x * 0x5851F42D4C957F2D + 0x14057B7EF767814F;
    uint xorshifted = cast(uint)(((x >> 18UL) ^ x) >> 27UL);
    uint rot = cast(uint)(x >> 59UL);
    return (xorshifted >> rot) | (xorshifted << ((-rot) & 0b11111));
}
https://github.com/dkorpel/dconf

Anybody can come up with a a complex solution. A simple one takes genius. You know it's genius when others say: "phui, anyone could have done that!" Except that nobody did.

-Walter Bright

https://github.com/dkorpel/dconf

Reference Counting is complex

  • Truly horrible - Donald Knuth
  • Spawns lots of language features
    • __mutable / __metadata storage class (DIP1xxx)
    • Argument Ownership and Function Calls (DIP1021)
  • Complicates types
    • string vs RefCountedString
    • str vs String
https://github.com/dkorpel/dconf

GC is... difficult

  • Simple for user
  • Complex to implement in systems language
    • Requires program-wide knowledge
    • False pointers
    • Non-portable (No WASM implementation yet)

6 suboptimal @nogc solutions

https://github.com/dkorpel/dconf

0. Manually free

void main()
{
    string s = environmentGet("PATH");
    writeln(s);
    free(s.ptr);
}
https://github.com/dkorpel/dconf

0. Manually free

void main()
{
    string s = environmentGet("PATH");
    scope(exit)
        free(s.ptr);
    writeln(s);
}
https://github.com/dkorpel/dconf

0. Manually free

  • malloc ⟺ free
  • COM programming with ITypeInfo and IMoniker:
  • GetFuncDesc ⟺ ReleaseFuncDesc
  • GetVarDesc ⟺ ReleaseVarDesc
  • GetNames ⟺ SysFreeString
  • GetDisplayName ⟺ CoTaskMemFree
https://github.com/dkorpel/dconf

0. Manually free

Documentation suggests IMalloc::Free

void getString(IMoniker moniker, IBindCtx ctx)
{
    BSTR displayName;Β  Β  Β  Β 
    moniker.GetDisplayName(ctx, null, &displayName);

    writeln(displayName.fromStringz);

    IMalloc allocator;
    CoGetMalloc(1, &allocator);
    allocator.Free(displayName);
    allocator.release();
}
https://github.com/dkorpel/dconf

0. Manually free

  • Simple (if you don't go nuts)
  • Risky (memory leaks, double free)
  • @live functions offer some protection
    • But doesn't distinguish GC/malloc pointers
https://github.com/dkorpel/dconf

The borrow checker makes it safe, right?

void main() @live
{
    int* x = cast(int*) malloc(4);
    free(x);
}
https://github.com/dkorpel/dconf

Right?
...

void main() @live
{
    int* x = new int;
    free(x); // No error, by design
}
https://github.com/dkorpel/dconf

1. Don’t allocate

void main()
{
    string paths = "C:/dmd;C:/ldc2";
    foreach (string path; paths.splitter(';'))
    {
        writeln(path);
    }
}
  • Return lazy ranges instead of arrays
  • Annoying to write for recursive algorithms

Image

https://github.com/dkorpel/dconf

1. Don’t allocate

Voldemort Types can be annoying

import std.stdio, std.path;

void main()
{
    File f = File(withExtension("basilisk", ".txt"));
    // Error: none of the overloads of `this` are
    // callable using argument types `(Result)`

    import std.array;
    File g = File(withExtension("basilisk", ".txt").array);
}
https://github.com/dkorpel/dconf

2. Stack memory

  • Automatically cleaned up
  • Can’t return it
char[] environmentGet(string var)
{
    char[32768] buf = void;
    // GetEnvironmentVariable(var, buf[]);
    return buf[]; // Error
}
https://github.com/dkorpel/dconf

2. Stack memory

  • Annoying to call
  • Small, fixed sizes only
void main()
{
    char[32768] buf;
    const str = environmentGet("PATH", buf[]);
}
https://github.com/dkorpel/dconf

3. OutputRanges / Appenders

void environmentGet(O)(string name, ref O sink)
{
    import std.range: put;
    put(sink, "...");
}

void main()
{
    import std.array : Appender;

    Appender!string appender;
    environmentGet("PATH", appender);
    string result = appender.data;
}
https://github.com/dkorpel/dconf

3. OutputRanges / Appenders

  • Annoying to write
  • Annoying to call (doesn't compose)
    • Can't do environmentGet("PATH").splitter(';')
  • Still need a @nogc Appender
    • Hard to make @safe
https://github.com/dkorpel/dconf

4. Null garbage collection

Memory is automatically managed by occasionally pausing all threads and scanning for memory still in use, and freeing the rest.

https://github.com/dkorpel/dconf

4. Null garbage collection

  • "Everybody thinks about garbage collection the wrong way" - Raymond Chen
  • Simulating a computer with infinite memory
  • Null garbage collector: never deallocate
  • Works if enough RAM
https://github.com/dkorpel/dconf

4. Null garbage collection

Amusing story from Kent Mitchell
Image

https://github.com/dkorpel/dconf

4. Null garbage collection

  • DMD does this (unless dmd -lowmem)
  • ctod does this for WebAssembly
  • "Out Of Memory" risk
https://github.com/dkorpel/dconf

5. Scope Array

  • Extension of stack memory
  • Examples:
    • std.internal.string: tempCString
    • dmd.common.string: SmallBuffer
https://github.com/dkorpel/dconf

5. Scope Array

struct ScopeArray(T)
{
    T[32] stackMem;
    T[] big;

    this(size_t length)
    {
        if (length > stackMem.length)
            big = malloc(T.sizeof * length);
    }

    T[] opIndex() => big ? big[] : stackMem[];

    ~this() { if (big.ptr) free(big.ptr); }
}
https://github.com/dkorpel/dconf

5. Scope Array

Length must be given upfront

void main()
{
    auto a = ScopeArray!char(length: 1024);

    char[] path = environmentGet("PATH", a[]);

    writeln(path);

    // a.~this();
}
https://github.com/dkorpel/dconf

5. Scope Array

Unless... πŸ€”

void main()
{
    auto a = Arena();

    char[] path = environmentGet("PATH", &a);

    writeln(path);

    // a.~this();
}

🏟️ The 80 line solution

https://github.com/dkorpel/dconf

Arenas

struct Arena
{
    ubyte[] buffer;
    ArenaPage* page = null;
}

struct ArenaPage
{
    ArenaPage* previous;
    // variable number of bytes follow
}
https://github.com/dkorpel/dconf
Stack buffer D3CCA32C 5DCA0528 0B382238 7D906630 BD79AE22 malloc() null 7D906630 BD79AE22 62B4D675 8D475B4C 2F4890E8 E33D4B24 malloc() 0x7820A8 < 5B4C8D47 90E82F48 4B24E33D 2F4890E8 malloc() 0x741BB0 < 475B4C8D 4890E82F 3D4B24E3 E82F4890 F4E82890 0E82F489 82F4890E Arena page buffer <
https://github.com/dkorpel/dconf

Works with small buffer optimization

void heap()
{
    Arena a;
    ubyte[] res = a.allocate(100); // heap allocates
}

void stack()
{
    ubyte[512] buf = void;
    Arena a = Arena(buf[]);
    ubyte[] res = a.allocate(100); // uses stack buffer
}
https://github.com/dkorpel/dconf

Allocator interface

  • Arena could be passed around by ref or pointer
  • But we want something extensible
abstract class Allocator
{
    ubyte[] allocate(size_t size, size_t alignment);
}

class Arena : Allocator;
class GcAllocator : Allocator;
class FailAllocator : Allocator;
https://github.com/dkorpel/dconf
struct Allocator
{
    AllocatorBase* x;
}

struct AllocatorBase
{
    immutable AllocFunc allocate;
}

alias AllocFunc = ubyte[] function(size_t size, size_t alignment, void* this_);

struct Arena
{
    AllocatorBase base; // Old school struct inheritance
    ubyte[] buffer;
    ArenaPage* page;
}
https://github.com/dkorpel/dconf

Why are you re-inventing classes?

  • No druntime dependency
  • Reduce redundant pointers
  • C compatibility
  • Implementing allocators is low-level anyway
https://github.com/dkorpel/dconf
struct Arena
{
    Allocator alloc() return => Allocator(&this);
}

struct Allocator
{
    AllocatorBase* base;

    T[] array(T)(size_t length) =>
        cast(T[]) base.allocate(T.sizeof * length);
}

void main()
{
    Arena a;
    char[] str = a.alloc.array!char(128);
}
https://github.com/dkorpel/dconf

Allocator should have GC default

string environmentGet(string name, Allocator alloc = gc)
{
    return alloc.array!char(n);
}
https://github.com/dkorpel/dconf

"Hannah Montana functions"

void main()
{
    string s = environmentGet("PATH");

    Arena a;
    string s = environmentGet("PATH", a.alloc);
}

🎀 Best of both worlds!

https://github.com/dkorpel/dconf

It can be @safe

// Requires `-preview=dip1000`
string environmentGet(string name, return scope Allocator alloc = gc);

string global;

void main() @safe
{
    global = environmentGet("PATH", gc); // Fine

    Arena a;
    global = environmentGet("PATH", a.alloc); // Error
}
https://github.com/dkorpel/dconf

But what about @nogc

  • There's return scope, but no @inout_nogc
  • DIPs for callback attributes still pending
  • Cheat: pretend it is @nogc
  • Hot take: @nogc should not be part of function type
  • Linting tool instead
https://github.com/dkorpel/dconf

Allocator can be stored

struct Array(T)
{
    T[] slice;
    size_t capacity;
    Allocator alloc;
}
https://github.com/dkorpel/dconf

No more invalidation problem

void main() @safe
{
    import automem.vector;
    auto vec1 = vector(1, 2, 3);
    int[] slice1 = vec1[];
    vec1.reserve(4096);
    int[] slice2 = vec1[];
}

Just don't free when growing the array πŸ™ˆ

https://github.com/dkorpel/dconf

Overhead can be reduced

  • Memory mapping instead of linked list
  • Non-portable
https://github.com/dkorpel/dconf

Resources

https://github.com/dkorpel/dconf

It cleaned up my code

  • Deleted tons of destructors and free() calls
  • Less @trusted annotations
  • Deleted ScopeArray, Stack, and NogcAppender
    • Array is all you need πŸ₯°
  • Found more uses for the pattern
https://github.com/dkorpel/dconf

GPU Memory mapping

void copying()
{
    float[] data;
    data ~= 3.0;
    data ~= 4.0;
    glBufferSubData(buffer, data);
}

void memoryMapping()
{
    float[] data = glMapBufferRange(buffer, 2 * float.sizeof);
    size_t i = 0;
    data[i++] = 3.0;
    data[i++] = 4.0;
    glUnmapBuffer(buffer);
}
https://github.com/dkorpel/dconf

Map ⋍ alloc, unmap ⋍ free

{
    auto mapper = Mapper(buffer, 2); // Arena
    scope float[] mappedBuffer = mapper[];

    auto data = Array!float(storage: mappedBuffer, failAllocator);
    data ~= 3.0;
    data ~= 4.0;

    // mapper.~this() unmaps
}
https://github.com/dkorpel/dconf

Ugly signatures

Clutter from long parameter declaration

string environmentGet(string name);

// vs

string environmentGet(string name, return scope Allocator alloc = gc);
https://github.com/dkorpel/dconf

Context struct could help

Language feature of Jai and Odin

main :: proc()
{
    context.user_index = 456
    {
        context.allocator = my_custom_allocator()
        context.user_index = 123
        supertramp() // `context` is implicitly passed
    }
    assert(context.user_index == 456)
}
https://github.com/dkorpel/dconf
struct Context
{
    Allocator allocator;
    Allocator temp_allocator;
    Assertion_Failure_Proc assertion_failure_proc;
    Logger logger;
    Random_Generator random_generator;

    void* user_ptr;
    ptrdiff_t user_index;

    // Internal use only
    void* _internal;
}

Wrapping up

https://github.com/dkorpel/dconf

Suggested GC strategy

  • Write code as if you have infinite memory
    • (Optimization for a known future is okay)
  • If you need to avoid the GC
    • Replace new T[] with allocator.array!T
    • Place Arena / Allocator where needed
    • Use -preview=dip1000 for @safe
    • Otherwise it's @system/@trusted
https://github.com/dkorpel/dconf

Takeaways

  • Look for simple solutions
  • Calling free() is not @safe
  • End-of-scope cleanup can be @safe with scope
  • Give it a try!
https://github.com/dkorpel/dconf

Avoid the Garbage Collector in 80 slides

Dennis Korpel

https://github.com/marp-team/marpit/tree/main/docs

https://github.com/marp-team/marpit/blob/main/docs/image-syntax.md

Callback needs to compute 480 samples with a strict deadline

### Image credits: - [Missile](https://commons.wikimedia.org/w/index.php?curid=120217981) by Marcus Burns, CC BY 3.0 - ----------------------------------------------------------- ### Quite Okay Formats - Dominic Szablewski - Similar to PNG / MP3 - Encoder / decoder are 400 lines <!- -_footer: https://qoiformat.org/- -> ----------------------------------------------------------- ## Partial / transitive scope ```D struct Context { string source; HashMap!(string, Declaration) symbolTable; Token[] tokens; // ... } ``` * Inference can be programmed * But how much `scope[...]` syntax do we want? ----------------------------------------------------------- ## Syntax woes * `return scope` vs. `scope return` still here * Inference by default would help * `ret&` and `retscope`? ----------------------------------------------------------- ### Scoped pointers status update * Refactored `dmd/escape.d` this year * Number of `if` statements 310 β‡’ 240 * `scope` inference still needs work * Nested functions + `scope` still broken * Want partial / transitive scope for structs ----------------------------------------------------------- ### 4. Null garbage collection ![Image height:500](img/walter-null-gc.png) <!- -_footer: https://youtu.be/bNJhtKPugSQ?si=z-USHbQyKkhrlH9c&t=579 - -> ----------------------------------------------------------- # Unpredictable lifetimes * What if allocation depends on user input? * Pre-allocate a pool * Roll your own free-list, bitmapped block, GC... <!- -_footer: https://bitbashing.io/gc-for-systems-programmers.html- -> <!- -https://forum.dlang.org/post/tnajfjmvvyqardwhxegi@forum.dlang.org- -> ----------------------------------------------------------- ## Concatenation ```D struct Person { string name, surname; string toString() const => name ~ " " ~ surname; } void main() { char[] s = "Hello " ~ Person("Dennis", "Korpel").toString() ~ "!"; } // "Dennis " // "Dennis Korpel" // "Hello Dennis Korpel" // "Hello Dennis Korpel!" ``` ----------------------------------------------------------- ``` string environmentGet(string key, return scope Allocator alloc = gc) { //int n = GetEnvironmentVariableA(key.ptr, null, 0); char[] buf = alloc.array!char(n); //GetEnvironmentVariableA(key.ptr, buf.ptr, buf.length); return buf[0 .. n]; } ``` ----------------------------------------------------------- ### GC complexity ![Image height:480](img/gc-connections.png) <!- -_footer: https://github.com/dlang/dmd/tree/master/druntime/src/core/internal/gc- -> ----------------------------------------------------------- # GC Phobia ![Image height:250](img/newsgroup-gc-phobia.png) <!- -_footer: https://forum.dlang.org/post/mailman.147.1702920162.3719.digitalmars-d-learn@puremagic.com- -> ----------------------------------------------------------- ### | Do 😎 | Don't ☹️ | |--------------------------|------------------------------| | Free big chunks | Pair each malloc ⟺ free | | Free at end of scope | Manually call free | | Return simple values | Be annoying on the call site | ----------------------------------------------------------- ## Context struct could help ![Image height:500](img/jblow-context.png) <!--_footer: https://youtu.be/uZgbKrDEzAs?si=clQT4OAd4j66KJ8i&t=2303- -> ----------------------------------------------------------- ### std.experimental.allocator is BIG * Interface + composable building blocks * 20 000 lines of code 😲 * Overkill for my purposes <!- -_footer: https://dlang.org/phobos/std_experimental_allocator.html- ->