The Mandelbrot-Set with Google Go (Part II)

>> Part I: The Mandelbrot-Set with Google Go (Part I)

So here’s the second part of my little field trip to Google’s new programming language “Go“.
This time I tried to use “channels” and “goroutines” to parallelize the pixel calculations across all CPU cores.

The main changes are that the parameters that are needed for the pixel calculation are now wrapped in a PointParams struct so they can easily be put into a channel.
Done that we’re instantiating 2 buffered channels so they can hold the PointParams for all pixels and the goroutines don’t block and possibly cause a deadlock.
In the next step the PointIteration() function is invoked as a goroutine. One time for each core. To do that we just have to prefix the function call with the keyword “go”. Easy, isn’t it?
Instead of directly calculating the pixels in the nested for loop we’re now just creating new PointParams and shove them into the “in” channel.
Meanwhile the running PointIteration() goroutines are fetching the PointParams from the “in” channel and when the calculation is done they’re sending a signal to the “out” channel.
All we have to do now is to wait for all calculations to finish. This is done by looping over the “out” channel and pulling all values out. The value itself does not matter so it is discarded.

As the documentation states the current implementation of the compiler does not automatically parallelize the code. Thus, if we want CPU parallelization we have to add a call to runtime.GOMAXPROCS(NCPU) in our main() function at first.

package main

import (

const NCPU = 4		// number of CPU cores

var pointX = flag.Float64("x", -2.0, "X coordinate of starting point of Mandelbrot or fix point for Julia (range: 2.0 to 2.0)")
var pointY = flag.Float64("y", -2.0, "Y coordinate of starting point of Mandelbrot or fix point for Julia (range: 2.0 to 2.0)")
var zoom = flag.Float64("z", 1.0, "Zoom level")
var julia = flag.Bool("julia", false, "Turn on Julia calculation")
var maxIter = flag.Int("maxIter", 51, "Max number of point iterations")

type PointParams struct {
  cx float64
  cy float64
  px int
  py int

func main() {

  fmt.Printf("X: %f\n", *pointX)
  fmt.Printf("Y: %f\n", *pointY)
  fmt.Printf("Zoom: %f\n", *zoom)
  fmt.Printf("Julia: %t\n", *julia)
  fmt.Printf("MaxIter: %d\n", *maxIter)

  start := time.Nanoseconds()
  img := CalculateImage(1000, 1000)
  end := time.Nanoseconds()
  fmt.Printf("Time: %d ms\n", (end - start) / 1000 / 1000) // ms

func CalculateImage(imgWidth int, imgHeight int) *image.NRGBA {
  img := image.NewNRGBA(imgWidth, imgHeight)
  minCx := -2.0
  minCy := -2.0
  if !*julia {
    minCx = *pointX
    minCy = *pointY
  maxSquAbs := 4.0	// maximum square of the absolute value
  // calculate step widths
  stepX := math.Fabs(minCx - 2.0) / float64(imgWidth) / *zoom
  stepY := math.Fabs(minCy - 2.0) / float64(imgHeight) / *zoom
  cx := 0.0
  cy := 0.0

  // Create buffered in and out channels
  in := make(chan *PointParams, imgWidth*imgHeight)
  out := make(chan int, imgWidth*imgHeight)

  // start goroutine for each CPU core
  for i := 0; i < NCPU; i++ {
    go PointIteration(in, out, maxSquAbs, *maxIter, img)

  for px := 0; px < imgWidth; px++ {
    cx = minCx + float64(px) * stepX

    for py := 0; py < imgHeight; py++ {
      cy = minCy + float64(py) * stepY

      // put params in channel
      in <- &PointParams {cx, cy, px, py}

  // drain channel and wait for all calculations to complete
  for i := 0; i < (imgWidth*imgHeight); i++ {
    <- out

  return img

func PointIteration(in chan *PointParams, out chan int, maxSquAbs float64, maxIter int, img *image.NRGBA) {
  for {
    // get params from in channel
    pointParams := <- in

    // init vars
    squAbs := 0.0
    iter := 0
    x := 0.0
    y := 0.0
    if *julia {
      x =
      y = = *pointX = *pointY

    for squAbs <= maxSquAbs && iter < maxIter {
      xt := (x * x) - (y * y) +	// z2
      yt := (2.0 * x * y) +		// z2
      //xt := x * (x*x - 3*y*y) + cx	// z3
      //yt := y * (3*x*x - y*y) + cy	// z3
      //xt := x * (x*x*x*x - 10*x*x*y*y + 5*y*y*y*y) + cx	// z5
      //yt := y * (5*x*x*x*x - 10*x*x*y*y + y*y*y*y) + cy	// z5
      x = xt
      y = yt
      squAbs = (x * x) + (y * y)

    color := ChooseColor(iter, maxIter)
    img.Set(pointParams.px,, color)

    out <- 1;	// signal that calculation is done

func ChooseColor(iterValue int, maxIter int) *image.NRGBAColor {
  val := uint8(iterValue)
  if iterValue == maxIter {
    return &image.NRGBAColor {0, 0, 0, 255}
  multi := uint8(255 / maxIter)
  return &image.NRGBAColor {0, val*multi, 0, 255}
  //return &image.NRGBAColor{^(val*5), ^(val*5), ^(val*5), 255}

func WriteImage(img *image.NRGBA) {
  file, err := os.Create("mandelbrot.png")
  if err != nil {
    fmt.Printf("Could not create file %s", file.Name())
  writer := bufio.NewWriter(file)
  png.Encode(writer, img)

So far so good. But when I ran this code I was a bit irritated because it almost took twice as much time as the single threaded version.
After a while of wondering and pondering I decided to raise the max iterations so that the point calculations consume much more time compared to the default settings. And behold, it works. The parallelized version is much faster than the normal one.
With maxIter set to 40000 the single theaded version takes up to 46 seconds and the parallelized version takes about 11,5 seconds. That’s exactly the excpected ratio of 4:1. You can also see very nicely how all cores are being used.

So it seems that the overhead the channels and goroutines produce to provide thread-safety etc. is just too big when you just have fast calculations.

The produced image with that much iterations is of course pretty useless. In fact it’s entirely black due to my poor color choosing implementation.

Next up: Rust (well…maybe… :))

>> Part I: The Mandelbrot-Set with Google Go (Part I)

%d bloggers like this: