Getting Started with Topological Data Analysis (TDA)

There are too many resources out there on TDA. And often people come to me and say “I’m overwhelmed and don’t know how to get started.” Well, do I have a surprise for you: here’s the list of stuff I give to people who ask me. And before you send the hate mail… I have literally never taken algebra, I have literally never taken topology, and I CERTAINLY have never taken algebraic topology. Other courses I have never taken include:

  • Analysis
  • Commutative Algebra
  • Measure Theory
  • Ergodic Theory
  • Functional Analysis
  • Graph Theory
  • Combinatorics
  • Probability
  • Geometry
  • Differential Geometry
  • Algebraic Geometry
  • Arithmetic Geometry
  • Analytic Number Theory
  • Algebraic Number Theory
  • Logic
  • Game Theory
  • (this goes on for a while)

So this list of resources is aimed at people whose (lack of) background is similar to mine. Also, this list is super-obnoxiously self-serving because it includes a bunch of things I co-wrote, but my excuse is that I co-wrote those things as a way to teach myself what the heck is going on with TDA.

I focus on the computation of persistent homology for point cloud data, but there’s lots of other cool and relevant TDA stuff, like sub/super-level set filtrations, cubical homology, and so forth. If you have suggestions for very introductory level resources on such topics, please let me know.

Finally, before I give you my list, let me say that if you think you have cool stuff that I should include here, ask yourself “would this be accessible to someone who has never taken topology and has only ever read the stuff on Chad’s list so far?” and then ask yourself “and is the stuff I want to point Chad to really at a just-getting-started level?” If you answered yes twice, email me.

Feel free to spread this around. I’m putting this up here as a public good, and also for the selfish reason that I am tired of writing the same email to people over and over again.


TDA for kids… but still a great place to start for anyone:

Connecting the Dots: Discovering the Shape of Data

A totally nontechnical article about TDA:

Topological data analysis: One applied mathematician’s heartwarming story of struggle, triumph, and ultimately, more struggle

Also an expository article, albeit slightly more technical:

Topological data analysis of collective motion

My first paper on TDA, which uses persistent homology as a tool for exploratory data analysis:

Topological data analysis of biological aggregation models

A set of tutorial notes I wrote that explain how the computation of homology of simplicial complexes is really just a linear algebra problem:

Self-help homology tutorial for the simple(x)-minded

I am giving you the version without solutions to exercises but you can find a version with solutions on my publications page.
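
To make the "really just a linear algebra problem" claim concrete, here is a minimal Python sketch. It is not taken from the tutorial; the function names `boundary_matrix` and `betti_numbers` and the input layout are made up for illustration. It assumes each simplex is a sorted tuple of vertex labels, that every face of every simplex is listed, and that counting holes over the rationals (via a floating-point matrix rank, which is fine for tiny examples) is good enough. The k-th Betti number is then (number of k-simplices) minus (rank of the boundary map out of dimension k) minus (rank of the boundary map into dimension k).

```python
import numpy as np

def boundary_matrix(faces, simplices):
    """Matrix of the boundary map sending k-simplices to (k-1)-simplices."""
    row = {face: i for i, face in enumerate(faces)}
    D = np.zeros((len(faces), len(simplices)))
    for j, simplex in enumerate(simplices):
        for omit in range(len(simplex)):
            # Drop one vertex at a time; the sign alternates with position.
            face = simplex[:omit] + simplex[omit + 1:]
            D[row[face], j] = (-1) ** omit
    return D

def betti_numbers(complex_by_dim):
    """complex_by_dim[k] lists the k-simplices as sorted vertex tuples."""
    ranks = [0]  # the boundary map out of dimension 0 is the zero map
    for k in range(1, len(complex_by_dim)):
        D = boundary_matrix(complex_by_dim[k - 1], complex_by_dim[k])
        ranks.append(int(np.linalg.matrix_rank(D)) if D.size else 0)
    ranks.append(0)  # nothing lives above the top dimension
    return [len(complex_by_dim[k]) - ranks[k] - ranks[k + 1]
            for k in range(len(complex_by_dim))]

vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]

hollow = [vertices, edges]               # triangle outline: one loop
filled = [vertices, edges, [(0, 1, 2)]]  # filled triangle: the loop is gone

print(betti_numbers(hollow))  # [1, 1]
print(betti_numbers(filled))  # [1, 0, 0]
```

The hollow triangle has one connected component and one loop; filling it in kills the loop. Serious software uses smarter matrix reductions and other coefficient fields, so treat this as a toy for building intuition only.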

A review paper from a few years back that gives a great bird’s-eye look at computational tools:

A roadmap for the computation of persistent homology

Another useful review paper:

Topological Data Analysis

A review paper on the uses of TDA in biology:

The shape of things to come: Topological data analysis and biology, from molecules to organisms

My second paper on TDA, which uses the topological signature of data to decide which of two models is a better fit for some experimental data:

A topological approach to selecting models of biological experiments

My third paper on TDA, which demonstrates how applying machine learning algorithms to the topological signatures of data sets from simulations of complex systems can be used to estimate parameters:

Analyzing collective motion with machine learning and topology

A tutorial to work through:

Introduction to the R Package TDA

To reinforce, another tutorial to work through:

R-TDA package tutorial