How Google Built a Consistent, Global Authorization System with Zanzibar, and You Can Too

Challenge: You send an email via Gmail that has a Google Drive attachment -> These are two separate apps, but a central authorization check has to take place to grant the recipient access.

Access control types

  • ACL (access control list): Pretty basic
  • RBAC (role-based access control): The de facto standard for a long time
  • ABAC (attribute-based access control): Check attributes (user ID, IP address, …) at access time to make a decision
  • ReBAC (relationship-based access control)

ReBAC

Baseline

graph LR
document-->|Is part of|folder-->|was created by|user

Relation Tuple

  • document:123#owner@user:3 -> User 3 is the owner of document 123
  • group:engineering#member@group:security -> Group security is a member of the group engineering
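
A minimal sketch of how such a tuple string could be split into its parts. The names RelationTuple and parse_tuple are made up for illustration and are not part of Zanzibar or any of its implementations; the only assumption is the object#relation@subject layout shown above.

```python
from typing import NamedTuple


class RelationTuple(NamedTuple):
    """Hypothetical container for one 'object#relation@subject' tuple."""
    obj: str       # e.g. "document:123"
    relation: str  # e.g. "owner"
    subject: str   # e.g. "user:3"


def parse_tuple(text: str) -> RelationTuple:
    """Split 'object#relation@subject' at the first '#' and the first '@' after it."""
    obj, _, rest = text.partition("#")
    relation, _, subject = rest.partition("@")
    if not (obj and relation and subject):
        raise ValueError(f"not a relation tuple: {text!r}")
    return RelationTuple(obj, relation, subject)


owner = parse_tuple("document:123#owner@user:3")
nested = parse_tuple("group:engineering#member@group:security")
```

Here owner.subject is "user:3" and nested.subject is "group:security", so a subject can itself be a group, which is what makes nested memberships possible.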

Graph representation (DAG)

graph LR
somedocument-->reader
somedocument-->writer
reader-.->|is also available via|writer
reader-->UserA
reader-->UserB
writer-->UserC
writer-->UserD

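To check access, look for a directed path from somedocument to the user that passes through the requested relation. Example: is there a path from somedocument to UserA via writer? -> No = no write access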
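
The path check on the graph above can be sketched as a plain depth-first search. This is only an illustration under simplifying assumptions: the graph is a hard-coded dictionary mirroring the diagram, and the names GRAPH and check are hypothetical, not an API of any real system.

```python
# Adjacency list mirroring the diagram: each relation points to its users,
# and the dotted "is also available via" edge lets reader continue into writer.
GRAPH = {
    "somedocument": ["reader", "writer"],
    "reader": ["UserA", "UserB", "writer"],
    "writer": ["UserC", "UserD"],
}


def check(graph: dict, relation: str, user: str) -> bool:
    """Return True if a directed path leads from the relation node to the user."""
    stack = [relation]
    seen = set()
    while stack:
        node = stack.pop()
        if node == user:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False


can_a_write = check(GRAPH, "writer", "UserA")  # False: no path via writer
can_c_read = check(GRAPH, "reader", "UserC")   # True: reader -> writer -> UserC
```

Starting the search at the relation node (rather than at somedocument) is what enforces the "over writer" part of the check.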

Zanzibar

  • Globally distributed
  • ReBAC-based
  • Central API

Hotspots

  • Problem: Some checks need to happen often
  • Solution: Distributed caching
  • Cache validity: Timestamp optimization by rounding evaluation timestamps to a coarse granularity (a second or 50 ms), so that more requests share the same cache entries
  • Improvement: Internal use of gRPC
  • Lock table: If the same query gets executed multiple times at once, calculate the query once and return the cached response to all waiting queries
  • Improve cache population: Don't cancel sub-checks instantly but with a delay, so their results can still populate the cache
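
Two of the ideas above, timestamp rounding and the lock table, can be sketched in a few lines. This is a toy, single-process illustration and not how Zanzibar implements them: the names quantize and LockTable are hypothetical, and rounding down (rather than up) is an assumption made here for simplicity.

```python
import threading
from concurrent.futures import Future


def quantize(timestamp_ms: int, step_ms: int = 1000) -> int:
    """Round a timestamp down to a multiple of step_ms so nearby requests share a cache key."""
    return timestamp_ms - (timestamp_ms % step_ms)


class LockTable:
    """Runs each distinct in-flight key once; concurrent callers wait for that one result."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: dict = {}

    def do(self, key, compute):
        # Decide under the lock whether this caller computes or just waits.
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._in_flight[key] = future
        if leader:
            try:
                future.set_result(compute())
            except Exception as exc:
                future.set_exception(exc)
            finally:
                with self._lock:
                    del self._in_flight[key]
        # Leader and followers all read the same result (or re-raise the same error).
        return future.result()
```

A caller would use something like table.do((query, quantize(now_ms)), run_check), where run_check is the expensive evaluation. With a one-second step, quantize(1234) gives 1000; with a 50 ms step, quantize(1234, 50) gives 1200.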

Zookies

  • Specify a point in time for a check (e.g. to bypass the cache with “give me the latest”)
  • Allows control over the latency vs real-time trade-off
  • Solves the new enemy problem: You lose access at the same time the content gets changed -> may result in phantom access to the new version if cached data gets used
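
The new enemy problem and the effect of a zookie can be illustrated with a toy versioned store. Everything here is an assumption made for the sketch: TupleStore, its methods, and the use of a plain integer revision as the zookie are hypothetical, whereas a real zookie is an opaque token handed out by the service.

```python
class TupleStore:
    """Toy store where every write gets a monotonically increasing revision."""

    def __init__(self) -> None:
        self._revision = 0
        self._log = []  # entries of (revision, op, relation_tuple)

    def write(self, op: str, relation_tuple: str) -> int:
        """Apply 'add' or 'remove' and return the new revision, used here as the zookie."""
        self._revision += 1
        self._log.append((self._revision, op, relation_tuple))
        return self._revision

    def snapshot(self, revision: int) -> set:
        """Replay the log up to and including the given revision."""
        state = set()
        for rev, op, relation_tuple in self._log:
            if rev > revision:
                break
            if op == "add":
                state.add(relation_tuple)
            else:
                state.discard(relation_tuple)
        return state

    def check(self, relation_tuple: str, cached_revision: int, zookie=None) -> bool:
        """Evaluate at the cached revision unless a zookie demands a fresher one."""
        revision = cached_revision if zookie is None else max(cached_revision, zookie)
        return relation_tuple in self.snapshot(revision)


store = TupleStore()
reader = "document:123#reader@user:bob"
cached = store.write("add", reader)     # a cache was filled at this revision
zookie = store.write("remove", reader)  # bob loses access; the new content version stores this zookie

stale = store.check(reader, cached)                 # True: phantom access from stale data
fresh = store.check(reader, cached, zookie=zookie)  # False: zookie forces a fresh enough snapshot
```

Checks without a zookie can be answered from older, cheaper snapshots, while checks that carry the zookie of the new content version must be at least that fresh. That is the latency vs real-time trade-off from the list above.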

Implementations

Some popular open source implementations, for later reference

  • SpiceDB
  • Ory (Keto)
  • Permify

Pro

  • Low latency with high throughput
  • Global consistency
  • Composable and hierarchical permission models