Marco Peereboom

Epitome2

Dedup for the masses

Agenda

What is dedup?

Better example

Better example continued

Cool savings

Deduplication types

Why do we want dedup?

Why don't we want dedup?

Industry state

What is Epitome?

Pieces and tools

Architectural Overview

Why wire protocol?

Dedup algorithm

  1. Client calculates digest of chunk
  2. Client sends digest to server
  3. Server replies whether the digest exists or not
  4. If the digest exists then the client moves on to the next chunk
  5. If the digest does not exist then the chunk is compressed and the smallest result is sent to the server
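
The five steps above can be sketched roughly as follows. This is an illustrative sketch only, not Epitome's actual code: the `InMemoryServer` class, the function names, the use of SHA-256 as the digest, and zlib as the compressor are all assumptions made for the example. "Smallest result" is interpreted here as "whichever of the raw and compressed forms is smaller", which is also an assumption.

```python
import hashlib
import zlib


class InMemoryServer:
    """Hypothetical stand-in for the server: maps digest -> stored payload."""

    def __init__(self):
        self.store = {}

    def exists(self, digest):
        # Step 3: tell the client whether this digest is already known.
        return digest in self.store

    def write(self, digest, payload, compressed):
        self.store[digest] = (compressed, payload)

    def read(self, digest):
        compressed, payload = self.store[digest]
        return zlib.decompress(payload) if compressed else payload


def smallest_encoding(chunk):
    """Return (payload, compressed_flag), keeping whichever form is smaller."""
    packed = zlib.compress(chunk)
    if len(packed) < len(chunk):
        return packed, True
    return chunk, False


def dedup_chunk(server, chunk):
    """Run steps 1-5 for one chunk; return (digest, was_sent)."""
    digest = hashlib.sha256(chunk).digest()   # step 1 (digest choice assumed)
    if server.exists(digest):                 # steps 2 and 3
        return digest, False                  # step 4: nothing to send
    payload, compressed = smallest_encoding(chunk)
    server.write(digest, payload, compressed) # step 5
    return digest, True
```

Sending the same chunk twice through `dedup_chunk` transfers the payload only once; the second call costs one digest lookup.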

Dedup algorithm details

Hash collisions

* See section 3.1 of http://doc.cat-v.org/plan_9/4th_edition/papers/venti/

Future hash considerations

epitomed

epitomize

eprepare

Protocol basics

Protocol primitives

NEG

NOP

EXISTS

READ

WRITE

WRITE_MD

READ_MD

How does epitomize work?

  1. Negotiate session with epitomed server
  2. Create metadata header
  3. Read files and chunk them with the negotiated block size
  4. Call write_exists to save each chunk if it doesn't already exist on the server
  5. Save chunk metadata
  6. Repeat until all files are processed
  7. Write metadata trailer
  8. Send metadata to epitomed server
  9. Display backup token
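
The nine steps above might look like this in outline. Everything here is a hypothetical sketch: `FakeEpitomed`, `backup`, `restore`, the JSON metadata layout, and deriving the backup token from a digest of the metadata are assumptions for illustration, not the real epitomize implementation or wire format. Files are passed as an in-memory mapping to keep the example self-contained.

```python
import hashlib
import json


class FakeEpitomed:
    """Hypothetical in-memory server; not the real epitomed."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.chunks = {}
        self.metadata = {}

    def negotiate(self):
        # Step 1: the only negotiated parameter in this sketch is block size.
        return self.block_size

    def write_exists(self, digest, chunk):
        """Store the chunk only if the digest is unknown; True if stored."""
        if digest in self.chunks:
            return False
        self.chunks[digest] = chunk
        return True

    def write_md(self, blob):
        # Assumed token scheme: hex digest of the metadata blob.
        token = hashlib.sha256(blob).hexdigest()
        self.metadata[token] = blob
        return token


def backup(server, files):
    """files maps name -> bytes; returns the backup token."""
    block_size = server.negotiate()                        # step 1
    md = {"header": {"block_size": block_size}, "files": []}  # step 2
    for name, data in files.items():                       # step 6 (loop)
        digests = []
        for off in range(0, len(data), block_size):        # step 3
            chunk = data[off:off + block_size]
            digest = hashlib.sha256(chunk).hexdigest()
            server.write_exists(digest, chunk)             # step 4
            digests.append(digest)                         # step 5
        md["files"].append({"name": name, "chunks": digests})
    md["trailer"] = {"file_count": len(files)}             # step 7
    blob = json.dumps(md, sort_keys=True).encode()
    return server.write_md(blob)                           # step 8


def restore(server, token):
    """Rebuild name -> bytes from the stored metadata and chunks."""
    md = json.loads(server.metadata[token])
    return {f["name"]: b"".join(server.chunks[d] for d in f["chunks"])
            for f in md["files"]}


if __name__ == "__main__":
    srv = FakeEpitomed()
    print(backup(srv, {"a.txt": b"AAAABBBBAAAA"}))         # step 9
```

In this sketch the token alone is enough to find the metadata, and the metadata alone is enough to find every chunk, so a restore needs nothing else from the client.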

Future development

Conclusion

Thanks!

Questions?

Shoot!