Exploring MMAP In Go
Jul 24, 2022
6 minutes read

During the week I stumbled upon an experimental implementation of memory mapping maintained by the Go team, the Mmap package (golang.org/x/exp/mmap, which lives in the experimental repository rather than the standard library itself). I decided to explore the package and dig deeper into the concept of memory mapping, and this article is the product of that exploration. In this post, I’ll attempt to answer some questions I initially had, and that I assume the reader will have, about MMAP.

What is Memory Mapping?

Memory mapping is a mechanism that maps a region of a file from disk to memory, allowing read and write operations on the file contents by simply accessing the address of the mapped memory. When a file is memory-mapped, it is placed in a segment of the process's virtual memory. That segment holds a reference to the file on disk, so a direct relationship is established between the segment of virtual memory and the file. The running process can then access the memory-mapped file through pointers as if the entire file was in memory.

It is important to note that memory mapping isn’t exclusive to files; it can be carried out on other resources such as shared memory or any other file-like resource. Memory-mapped files also allow memory to be shared between processes: separate processes can map and access the same file.

What is mmap?

mmap is a Unix syscall that maps files into memory. The mmap call implements the memory mapping mechanism we discussed above. The mmap man page gives a great description of what mmap does. This is the signature of the mmap syscall:

    void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

It takes in six arguments, and we shall quickly discuss each of these parameters to get a better picture of what is required for the memory-mapping mechanism to take place.

addr - This parameter is the starting address for the new mapping. It may be NULL, in which case the kernel chooses the address at which to create the mapping.

length - The length parameter specifies the number of bytes to map into memory, counted from the offset, which is another parameter described below.

offset - This specifies where in the file the mapping should start. Mentally, I think of this the same way I think of offset in the context of pagination. Typically, this parameter is set to 0, allowing the mapping to start at the beginning of the file. It is important to note that the offset is required to be a multiple of the page size.

prot - I think of the prot parameter as a way to specify the permissions of the mapping. The man page describes it more clearly: “The prot argument describes the desired memory protection of the mapping”. Possible values include PROT_READ, PROT_WRITE, PROT_EXEC and PROT_NONE (pages may not be accessed at all); the full list can be found on the man page.

fd - This is the file descriptor of the file to be mapped into the address space. It is required for file-backed mappings.

flags - This is the last required parameter. It determines whether updates to the mapping are visible to other processes that map the same region, and whether they are carried through to the underlying resource or file. The value can be MAP_SHARED, indicating that updates to the mapping are visible to other processes and affect the underlying resource; MAP_PRIVATE, which tells the kernel to create a private copy-on-write mapping whose updates are not carried through to the file; or MAP_SHARED_VALIDATE, a Linux-specific option that behaves like MAP_SHARED but makes the call fail if unknown flags are passed.
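
To make the parameters concrete, here is a minimal sketch that uses the syscall package's Mmap function on a Unix-like system. The helper names mapWindow and demoWindow and the temp-file pattern are my own, made up for illustration. Note that Go's syscall.Mmap has no addr parameter, so the kernel always picks the address, and that the sketch rounds the offset down to a page boundary to satisfy the alignment rule described above.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// mapWindow maps length bytes of f starting at offset, read-only.
// mmap needs a page-aligned offset, so the offset is rounded down to a page
// boundary and the extra leading bytes are skipped in the returned window.
// The full mapping is returned too, because that is the slice to unmap.
func mapWindow(f *os.File, offset int64, length int) (mapping, window []byte, err error) {
	pageSize := int64(os.Getpagesize())
	aligned := offset - offset%pageSize // largest page multiple <= offset
	skip := int(offset - aligned)

	mapping, err = syscall.Mmap(
		int(f.Fd()),        // fd: the file to map
		aligned,            // offset: where in the file the mapping starts
		skip+length,        // length: number of bytes to map
		syscall.PROT_READ,  // prot: pages may only be read
		syscall.MAP_SHARED, // flags: the mapping is backed by the file itself
	)
	if err != nil {
		return nil, nil, err
	}
	return mapping, mapping[skip : skip+length], nil
}

// demoWindow writes a file spanning two pages and maps the five bytes that
// start ten bytes into the second page.
func demoWindow() (string, error) {
	f, err := os.CreateTemp("", "mmap-window-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	pageSize := os.Getpagesize()
	content := make([]byte, 2*pageSize)
	copy(content[pageSize+10:], "hello")
	if _, err := f.Write(content); err != nil {
		return "", err
	}

	mapping, window, err := mapWindow(f, int64(pageSize+10), 5)
	if err != nil {
		return "", err
	}
	defer syscall.Munmap(mapping)

	return string(window), nil
}

func main() {
	s, err := demoWindow()
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // hello
}
```

Returning both slices is a deliberate choice in this sketch: syscall.Munmap expects the slice that Mmap returned, not a sub-slice of it.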

The syscall package, which provides an interface to low-level operating system primitives, has included an Mmap function since the earliest Go versions. More recently, the Go team has been developing a wrapper around the low-level Unix mmap primitive. The Mmap package is still experimental and under development, but it is exciting. Before it, memory-mapping files with Go was painstaking and scary because of how unsafe it was. The Mmap package provides some safe functions to perform memory-mapping operations.
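
To give a feel for the raw approach, here is a rough sketch that maps a file read-write with syscall.Mmap and changes it through the mapping. The function names and the temp-file pattern are hypothetical, and the sketch assumes a Unix-like system. The caller has to size the mapping, guard against empty files, and remember to unmap, which is the bookkeeping a wrapper package can hide.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// upperFirstByte maps the file read-write with MAP_SHARED, changes one byte
// through the mapping, and unmaps it. No Write call is made on the file.
func upperFirstByte(path string) error {
	f, err := os.OpenFile(path, os.O_RDWR, 0)
	if err != nil {
		return err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return err
	}
	if info.Size() == 0 {
		return fmt.Errorf("%s is empty: nothing to map", path)
	}

	data, err := syscall.Mmap(int(f.Fd()), 0, int(info.Size()),
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	if err != nil {
		return err
	}
	if data[0] >= 'a' && data[0] <= 'z' {
		data[0] -= 'a' - 'A' // a plain memory store that updates the file
	}
	// Forgetting this call leaks the mapping; touching data after it faults.
	return syscall.Munmap(data)
}

// demoSharedWrite creates a small file, modifies it through a mapping, and
// reads it back with an ordinary file read.
func demoSharedWrite() (string, error) {
	f, err := os.CreateTemp("", "mmap-shared-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	if _, err := f.WriteString("hello mmap"); err != nil {
		f.Close()
		return "", err
	}
	if err := f.Close(); err != nil {
		return "", err
	}
	if err := upperFirstByte(f.Name()); err != nil {
		return "", err
	}
	b, err := os.ReadFile(f.Name())
	return string(b), err
}

func main() {
	s, err := demoSharedWrite()
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // Hello mmap
}
```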

The Mmap package has a ReaderAt struct which implements the io.ReaderAt interface. The ReaderAt struct has an unexported field of type []byte which holds the memory-mapped file. The []byte field is unexported for safety: if it were publicly accessible, nothing would guarantee that it isn't overwritten. The bytes of the memory-mapped file can be accessed safely through the ReadAt method, which takes a byte slice and an offset. Internally, ReadAt copies the portion of the mapped file that starts at the offset into the byte slice it was given. Below is an excerpt of the code:

    n := copy(p, r.data[off:])
    if n < len(p) {
        return n, io.EOF
    }
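
To see what that excerpt means for callers, here is a small self-contained sketch. The sliceReader type is a hypothetical stand-in backed by an ordinary byte slice, not the package's actual type, and the offset check is my own assumption about what a safe implementation would do. It shows that a read which runs past the end of the data returns the bytes that were available together with io.EOF.

```go
package main

import (
	"fmt"
	"io"
)

// sliceReader is a hypothetical stand-in for a reader backed by a mapped
// []byte; it is not the Mmap package's actual type.
type sliceReader struct {
	data []byte
}

// ReadAt copies from data[off:] into p and reports io.EOF on a short read,
// which is the contract io.ReaderAt asks for.
func (r *sliceReader) ReadAt(p []byte, off int64) (int, error) {
	if off < 0 || off > int64(len(r.data)) {
		return 0, fmt.Errorf("sliceReader: invalid offset %d", off)
	}
	n := copy(p, r.data[off:])
	if n < len(p) {
		return n, io.EOF
	}
	return n, nil
}

// Compile-time check that sliceReader satisfies io.ReaderAt.
var _ io.ReaderAt = (*sliceReader)(nil)

func main() {
	r := &sliceReader{data: []byte("memory-mapped")}

	full := make([]byte, 6)
	n, err := r.ReadAt(full, 0)
	fmt.Println(n, string(full[:n]), err) // 6 memory <nil>

	tail := make([]byte, 10)
	n, err = r.ReadAt(tail, 7)
	fmt.Println(n, string(tail[:n]), err) // 6 mapped EOF
}
```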

An important function the Mmap package provides is Open. This function takes one parameter, the filename, and returns a *ReaderAt and an error. Internally, Open opens the file, makes the mmap syscall to map that file into memory, and initialises the ReaderAt struct. It does all the work under the hood and returns a ReaderAt which can be used to interact with the memory-mapped file. Putting it all together, here is a simple example of how a file can be mapped to memory and read with the Mmap package.

	package main

	import (
		"fmt"

		"golang.org/x/exp/mmap"
	)

	func main() {
		// Provide the filename & let mmap do the magic.
		r, err := mmap.Open("text.txt")
		if err != nil {
			panic(err)
		}
		// Release the mapping once we are done with it.
		defer r.Close()

		// Read the entire file: a buffer of r.Len() bytes, filled from offset 0.
		p := make([]byte, r.Len())
		if _, err := r.ReadAt(p, 0); err != nil {
			panic(err)
		}

		// Print the file.
		fmt.Println(string(p))
	}

Why Memory-Mapping?

The main benefit of the memory mapping mechanism is improved performance. Typically a sequential read of a file happens using standard calls such as Open(), Read(), Write() and Seek(). Each read or write copies data between a kernel buffer and a buffer owned by the process. With memory mapping, the file’s contents are mapped into the process's address space, so the process can address the file directly without that extra copy. The performance benefit of memory mapping is mainly seen when a file will be accessed several times.
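
As a rough illustration of addressing file contents directly, the sketch below maps a file and counts its newlines by scanning the mapped bytes in place, with no intermediate read buffer. It uses syscall.Mmap on a Unix-like system; countLines and demoCountLines are names I made up, and the sketch makes no claim about how much faster this is for any given workload.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"syscall"
)

// countLines maps the file read-only and counts '\n' bytes directly in the
// mapping, without copying the contents into a separate buffer first.
func countLines(path string) (int, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return 0, err
	}
	if info.Size() == 0 {
		return 0, nil // mmap rejects zero-length mappings
	}

	data, err := syscall.Mmap(int(f.Fd()), 0, int(info.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return 0, err
	}
	defer syscall.Munmap(data)

	return bytes.Count(data, []byte{'\n'}), nil
}

// demoCountLines writes a three-line temp file and counts its lines.
func demoCountLines() (int, error) {
	f, err := os.CreateTemp("", "mmap-lines-*")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	if _, err := f.WriteString("a\nb\nc\n"); err != nil {
		f.Close()
		return 0, err
	}
	if err := f.Close(); err != nil {
		return 0, err
	}
	return countLines(f.Name())
}

func main() {
	n, err := demoCountLines()
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 3
}
```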

Things to note about MMAP in Go

  • Aside from the official experimental package, there are a bunch of other Go packages that wrap the mmap syscall; feel free to explore them. When choosing a Mmap package, check that it is safe, or plan properly to deal with panics.
  • This is a piece of open-source Go code that uses a page-fault handler to deal with any I/O error that may arise from memory mapping. It uses the SetPanicOnFault function from the runtime/debug package.
  • Russ Cox has a project where he used Mmap in Go; I strongly recommend checking it out here.
  • I’m still reading this paper by Andy Pavlo & other researchers on reasons to avoid MMAP in your DBMS. Check it out if this type of stuff tickles your fancy.
  • The Prometheus TSDB library makes use of MMAP; check it out here.
  • In this post, I use mmap when referring to the Unix syscall & Mmap when referring to the Go package.
  • Feel free to reach out if you have any thoughts, questions & comments.
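
The SetPanicOnFault approach mentioned in the list above can be sketched roughly as follows. This is a minimal illustration of the idea, not the code from the linked project: readByte is a hypothetical helper, and an ordinary slice stands in for a mapping so the sketch runs anywhere.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// readByte returns data[i], converting a fault or panic raised during the
// access into an error. With a memory-mapped slice, such a fault can occur
// if the underlying file is truncated or paging it in hits an I/O error.
// Note that an out-of-range index is reported the same way.
func readByte(data []byte, i int) (b byte, err error) {
	// SetPanicOnFault applies to the current goroutine only and returns the
	// previous setting, which is restored on the way out.
	old := debug.SetPanicOnFault(true)
	defer debug.SetPanicOnFault(old)

	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("reading mapped memory: %v", r)
		}
	}()
	return data[i], nil
}

func main() {
	// An ordinary slice stands in for a mapping here.
	data := []byte("mmap")

	b, err := readByte(data, 2)
	fmt.Printf("%c %v\n", b, err) // a <nil>

	_, err = readByte(data, 10)
	fmt.Println(err != nil) // true
}
```

Without SetPanicOnFault, a fault at a non-nil address crashes the program outright instead of raising a recoverable panic, which is why the setting matters for code that touches mapped memory.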
