logo

500+ Billion Points

Organizing Point Clouds as Infrastructure

Connor Manning

The problem

Can we put Iowa's lidar in a web browser?

iowa

Iowa lidar collection

  • ≈ 37,000 files
  • ≈ 170 billion points
  • > 4.5 TB uncompressed
  • > 400 GB compressed

iowa-s3

Lots of tiles

  • Difficult to access
  • Difficult to manage
  • Visualization only in pieces
tiles
Yes, Iowa is flat

A point cloud service

  • Client-controlled access
    • Hierarchical
    • Dynamic resolution
    • Random-access
    • Flexible
  • Fragmented dataset → single logical unit

Need for reorganization

  • A meta-index over the existing tiles cannot meet these needs
  • Level-of-detail queries against the raw tiles would be unreasonable

Constraints

  • Memory
  • Losslessness
  • Modifiability
  • Visualization
free

Assumptions

  • Availability of scalable cloud computing
  • A parallelizable problem
  • Distributed filesystem
geyser
Lone Star Geyser
Data source: RS/GIS CRREL USACE

Goal: a massive octree

 

  • Increase depth → increase resolution
nepal-layers
Quadtree depth layering
Data source: Vanuatu village, Nepal. Global DIRT

 

  • Spatially distinct → trivially parallelizable
quadtree-splitting
Quadtree splitting
By David Eppstein, Public Domain, 🔗

 

  • Stable & flexible
  • Insertion order doesn't matter
  • No need to rebalance the tree
kd-tree
By contrast: a KD-tree - where order matters
By KiwiSunset at the English language Wikipedia, CC BY-SA 3.0, 🔗

Octrees

  • The basic implementation is pointer-based
  • > 10 TB just for the (64-bit) pointers for Iowa
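    • Rough estimate, assuming ~one node per point: 170 × 10⁹ nodes × 8 child pointers × 8 bytes ≈ 10.9 TB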
octree-splitting
A classical octree
By WhiteTimberwolf - Own work, CC BY-SA 3.0, 🔗

Linearizing the tree

 

  • For a single depth: Z-order curve
  • Entirely positionally based - zero theoretical waste
morton-ordering
Z-order curve for depth 3
By user Jace, 🔗
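
A minimal sketch (not Entwine's code) of a single-depth Z-order index: interleave the bits of a cell's column and row. The function name and the 16-bit-per-axis width are illustrative assumptions.

    #include <cstdint>

    // Interleave the low 16 bits of x and y into one Z-order (Morton) index:
    // bit i of x lands at bit 2i, bit i of y at bit 2i + 1.
    uint32_t mortonIndex(uint16_t x, uint16_t y)
    {
        uint32_t index = 0;
        for (int i = 0; i < 16; ++i)
        {
            index |= (static_cast<uint32_t>(x >> i) & 1u) << (2 * i);
            index |= (static_cast<uint32_t>(y >> i) & 1u) << (2 * i + 1);
        }
        return index;
    }

    // At a fixed depth d the cell coordinates are just the point's position
    // scaled into a 2^d x 2^d grid, so the index is purely positional.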

 

  • Typically, multiple depths → string-encoding
    • ∴ "0" ≠ "00"
  • However, it's possible to globally linearize
  • Doing this has some interesting properties
z-order-four-levels
Z-order curve over depths 1, 2, 3, 4
By David Eppstein, based on an image by Hesperian. Own work, CC BY-SA 3.0, 🔗

 

linearization
Global linearization - assign numbering

 

hops
Traversal

 

Hopping down the array

index_d = (index_{d-1} << 2) + 1 + direction
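
A small sketch of walking the linearized array with the hop formula above; the function names and the 0-3 direction encoding are illustrative assumptions.

    #include <cstdint>
    #include <vector>

    // One hop: from a node's index at depth d-1 to one of its four children
    // at depth d, per index_d = (index_{d-1} << 2) + 1 + direction.
    uint64_t hop(uint64_t parent, uint64_t direction)   // direction in [0, 3]
    {
        return (parent << 2) + 1 + direction;
    }

    // Walk from the root (index 0) down a sequence of quadrant choices.  With
    // an assumed encoding where 3 = southeast and 0 = northwest, {3, 0} gives
    // hop(hop(0, 3), 0) = hop(4, 0) = 17.
    uint64_t descend(const std::vector<uint64_t>& directions)
    {
        uint64_t index = 0;
        for (uint64_t d : directions) index = hop(index, d);
        return index;
    }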

Properties

 

  • Concurrency-friendly
  • Greatly simplified data structures
  • Ensuring point spacing is trivial
  • Lots of things are calculable
quadtree-pyramid
Quadtree layering
From Towards Building Deep Networks with Bayesian Factor Graphs by Buonanno & Palmieri 🔗

Point spacing

one two
Lack of spacing guarantee - effect on visualization

Math stuff

 

query

How do I query this tile at depth n?

  • Bisect southeast, then bisect northwest
  • These 2 traversals give us an ID of 17 at depth 2
  • Call this the nominal depth for this tile
    • "How many times did we split?"

 

Depth    Range              Max points
2        17-18              1
3        69-73              4
5        309-373            64
...      ...                ...
10       316757-382293      65536
  • maxPoints_depth = 4^(depth - nominalDepth) (see the sketch below)
  • SELECT chunkdata FROM entwine WHERE id >= 316757 AND id < 382293
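
A hedged sketch of how the ranges above could be derived under the hop formula: a tile's minimum ID at a deeper depth follows direction 0 at every hop, and the span is 4^(depth - nominalDepth) wide. Names are illustrative, and any chunk-level offsets used in production are ignored.

    #include <cstdint>
    #include <iostream>

    // Smallest descendant ID of `nominal` (a node at `nominalDepth`) at `depth`:
    // take the direction-0 hop, index -> (index << 2) + 1, repeatedly.
    uint64_t rangeBegin(uint64_t nominal, uint32_t nominalDepth, uint32_t depth)
    {
        uint64_t index = nominal;
        for (uint32_t d = nominalDepth; d < depth; ++d) index = (index << 2) + 1;
        return index;
    }

    // The tile covers 4^(depth - nominalDepth) cells at `depth`, which is also
    // the maximum number of points such a query can return.
    uint64_t maxPoints(uint32_t nominalDepth, uint32_t depth)
    {
        return 1ull << (2 * (depth - nominalDepth));
    }

    int main()
    {
        const uint64_t begin = rangeBegin(17, 2, 3);
        const uint64_t end = begin + maxPoints(2, 3);
        std::cout << "[" << begin << ", " << end << ")\n";   // [69, 73)
    }

Because a tile's descendants occupy one contiguous span per depth, a level-of-detail query reduces to a range scan like the SELECT above.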

Chunk splitting


 // TODO Doesn't seem to work.

 char* myTree = malloc(∞);
        

 

  • Trivial to bound chunks spatially
    • Parallelizable
    • Queryable
  • Sparse-chunking optimization
    • Density multiplier per depth decreases after depth ≈ log4(numPoints)
    • ...so stop splitting spatially near that depth (see the sketch below)
      • Order-of-magnitude reduction in keys
      • → order-of-magnitude reduction in I/O
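
A back-of-the-envelope illustration of that crossover: depth d of a quadtree holds at most 4^d occupied cells, so the per-depth density multiplier fades once d passes roughly log4(numPoints). This is only a rough calculation, not Entwine's tuned heuristic.

    #include <cmath>
    #include <cstdint>
    #include <iostream>

    // Depth beyond which a quadtree over numPoints points can no longer fill
    // every cell: 4^d >= numPoints  <=>  d >= log4(numPoints).
    double sparseDepth(uint64_t numPoints)
    {
        return std::log(static_cast<double>(numPoints)) / std::log(4.0);
    }

    int main()
    {
        // Iowa, ~170 billion points: the tree goes sparse around depth 19,
        // so spatial chunk splitting can stop near there.
        std::cout << sparseDepth(170000000000ull) << '\n';   // ~18.7
    }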

More tricks

  • Aggressive over-optimization
  • Heuristic tuning - over lots of data
  • Custom big-integer library: little-big-int
  • Custom memory pool: splice-pool

Results

A larger set - the Netherlands

  • 41,000 files
  • ≈ 640 billion points
  • ≈ 1.5 TB on disk
  • > 7.5 TB uncompressed
  • XYZ only

 

first-worker-done
Netherlands indexing pace
  • AWS EC2 & S3
  • Reprojected to EPSG:3857
  • 28 instances
    • Each instance: 30 cores, 60 GB
    • Per-instance pace: ≈ 2.65B points/hour
  • Total cost: ≈ $400
  • Total time: < 9.5 hours

What can we do with it?

Rijksmuseum

Using the output

 

Greyhound

  • A simple RESTful HTTP server (request sketch below)
  • http://data.greyhound.io/resource/iowa/info
  • http://data.greyhound.io/resource/iowa/read?
    • bounds=[-10758084,4793192,-361921,-10034124,5517152,362039]&
    • depth=10&
    • compress=true&
    • schema=[{name:"X",type:"floating",size:"8"}, ...]
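
For illustration only: a sketch that assembles the read request above as a string. Issuing the GET and decoding the (optionally LAZ-compressed) response is left to whatever HTTP client is used; only the X dimension of the schema is spelled out, with the remaining dimensions elided as on the slide.

    #include <iostream>
    #include <string>

    int main()
    {
        const std::string resource = "http://data.greyhound.io/resource/iowa";

        // Query parameters as shown above.
        const std::string read =
            resource + "/read?"
            "bounds=[-10758084,4793192,-361921,-10034124,5517152,362039]&"
            "depth=10&"
            "compress=true&"
            "schema=[{name:\"X\",type:\"floating\",size:\"8\"}]";

        std::cout << resource + "/info" << '\n' << read << '\n';
    }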

 

Related projects

  • Speck.ly - visualization 🔗
  • Potree fork - visualization 🔗
  • PDAL - point data abstraction 🔗
  • LAZ-perf - LAZ in the browser 🔗

Building via CLI

  • entwine build
    • -i ~/data/iowa
    • -o ~/entwine/iowa
    • -r EPSG:3857
    • -t 12
logo

Links