Internet Archive Book Scanning with Davide Semenzin

The Internet Archive collects historical records of the Internet. The Wayback Machine is one Internet Archive tool you may already know. A less familiar project is book scanning: the Internet Archive scans books in high volume to digitize them.

In today’s episode, Davide Semenzin joins the show to discuss the history of the Internet Archive and the engineering behind book digitization, including OCR, storage, architecture, and scalability.


Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.


Epsagon, a platform that specializes in monitoring distributed environments, delivers auto-instrumented observability that scales at the speed of your microservices. Epsagon helps teams understand complex microservice architectures and provides the automation and correlation needed to quickly identify, troubleshoot, and resolve issues. A special offer for SE Daily listeners: start a trial of Epsagon, connect your first trace, and they’ll send you one of their drones!

Teleport is open source, written in Go, and a drop-in replacement for OpenSSH. It also has native support for Kubernetes. Gravitational’s Teleport provides identity-aware access using short-lived certificates with SSO, session recording, and other features that help meet compliance and audit requirements. Give it a try: downloads, documentation, and, of course, the GitHub repository are all available.

JetBrains, a global software vendor that has built intelligent development tools since 2000, has published the 2020 State of Developer Ecosystem report, the fourth in its series of annual reports. The report draws on feedback from roughly 20,000 developers, whose responses helped JetBrains identify the latest trends in tools, technologies, programming languages, and other facets of the development world.


The Octopus platform can execute approved steps, bridging the gap between dev and ops and removing operations bottlenecks. Octopus delivers self-service options for dev teams without sacrificing control over production. By automating the processes that form a bottleneck, developers can stop waiting on others and use self-service automation instead. You can also browse the most frequently requested runbook templates.

Software Daily

Subscribe to Software Daily, a curated newsletter featuring the best and newest from the software engineering community.