Best Deep Learning Books (Pragmatic)

Ironically, abstract and formal tasks that are among the most difficult mental undertakings for a human being are among the easiest for a computer… A person’s everyday life requires an immense amount of knowledge about the world… Computers need to capture this same knowledge in order to behave in an intelligent way. One of the key challenges in artificial intelligence is how to get this informal knowledge into a computer.
~ Ian Goodfellow, Yoshua Bengio and Aaron Courville—Deep Learning (The MIT Press)

Deep Learning (Pragmatic) Books: The List 👕

And here we have the two stellar books at which we’ll soon be taking an opinionated look, in turn:
  1. Deep Learning (The MIT Press) by Ian Goodfellow, Yoshua Bengio and Aaron Courville 🐳
  2. Deep Learning: A Practitioner’s Approach (O’Reilly Media) by Josh Patterson and Adam Gibson 🐋

Preamble ⛱

While we won’t be exploring either one of the two fantastic books in the pic above—or the inimitably brilliant magazine MIT Technology Review—in this essay, it may be good to know that both are classics in the field of AI and you can’t go wrong with either one of those two books! Coincidentally, Ian Goodfellow is featured as one of MIT TR’s 35 innovators under 35 (in the MIT TR issue that’s on the right-hand side in the pic above). And you can read details in there about how “Goodfellow, now a staff research scientist with Google Brain, wondered if two neural networks could work in tandem…. Goodfellow coded the first example of what he named a generative adversarial network, or GAN. The dueling-neural-network approach has vastly improved learning from unlabeled data.” Great stuff 🏄

Introduction 🔎

The field of AI is broad and has been around for a long time. Deep learning (👶) is a subset of the field of machine learning (👦) which is a subfield of AI (👨)
~ Josh Patterson and Adam Gibson—Deep Learning: A Practitioner’s Approach (O’Reilly Media)

The Nuts and Bolts of Deep Learning: A First Definition

Much as I mentioned in the first installment in this series of essays on deep learning, I have more than a passing interest in deep learning. Whereas the first installment focused on the foundational aspect of deep learning, the focus of the second installment is squarely on its pragmatic aspects 🐎

To set the stage, let’s first hear what Goodfellow, Bengio and Courville have to say—in Deep Learning (The MIT Press), which is perhaps the definitive book on the subject—by way of background to the crucial relevance of deep learning to solving pressing problems in machine learning (ML). Thus, in the introductory section of Deep Learning, the authors remind us how 🎯

Many artificial intelligence tasks can be solved by designing the right set of features to extract for that task, then providing these features to a simple machine learning algorithm… For many tasks, however, it is difficult to know what features should be extracted…  

One solution to this problem is to use machine learning [via an] approach [that] is known as representation learning. A major source of difficulty in many real-world artificial intelligence applications is that many of the factors of variation influence every single piece of data we are able to observe. 

Deep learning solves this central problem in representation learning by introducing representations that are expressed in terms of other, simpler representations. Deep learning enables the computer to build complex concepts out of simpler concepts. The quintessential example of a deep learning model is the feedforward deep network, or multilayer perceptron (MLP)…

Was the quote above a model of clarity or what? 😎 And so it should not surprise anyone that I selected the stellar book Deep Learning (The MIT Press) for the top spot 🏆 Much more on it follows soon…
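To make the multilayer perceptron mentioned in the quote a little more tangible, here is a minimal sketch of a forward pass in plain Python. The function names (dense, relu, mlp_forward) and the hand-picked weights are my own illustrative assumptions, not code from the book. The tiny network builds the XOR function (a "complex concept") out of two simpler hidden features.

```python
def dense(weights, biases, inputs):
    """One fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]

def mlp_forward(layers, inputs):
    """Apply each (weights, biases) layer in turn; ReLU on all but the last layer."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(weights, biases, activations)
        if index < len(layers) - 1:
            activations = relu(activations)
    return activations

# Hand-picked (hypothetical) weights that make a two-layer network compute XOR.
XOR_LAYERS = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]),  # hidden: h1 = x1 + x2, h2 = x1 + x2 - 1
    ([[1.0, -2.0]], [0.0]),                   # output: y = h1 - 2 * h2
]

for x1 in (0.0, 1.0):
    for x2 in (0.0, 1.0):
        print((x1, x2), mlp_forward(XOR_LAYERS, [x1, x2]))
```

In a real deep learning workflow the weights would of course be learned from data rather than typed in by hand; the sketch only shows how simple layers compose into something a single layer cannot express.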

The Nuts and Bolts of Deep Learning: A Second Definition

Meanwhile, it just so happens that the book that comes in at the second spot has also got sparkling wisdom to shed on the essence of how the deep learning paradigm came to be. Thus, in a marvelously apt section in that book—and here we’re talking about Deep Learning: A Practitioner’s Approach (O’Reilly Media) by Patterson and Gibson—that goes under the heading What Is Deep Learning?, the authors clue us in to why 🎳

Deep learning has been a challenge to define for many because it has changed forms slowly over the past decade. One useful definition specifies that deep learning deals with a “neural network with more than two layers.” The problematic aspect to this definition is that it makes deep learning sound as if it has been around since the 1980s. We feel that neural networks had to transcend architecturally from the earlier network styles (in conjunction with a lot more processing power) before showing the spectacular results seen in more recent years. Following are some of the facets in this evolution of neural networks:

  • More neurons than previous networks
  • More complex ways of connecting layers/ neurons in NNs
  • Explosion in the amount of computing power available to train
  • Automatic feature extraction

Patterson and Gibson go on to add that 🐙

For the purposes of this book, we’ll define deep learning as neural networks with a large number of parameters and layers in one of four fundamental network architectures:

  • Unsupervised pre-trained networks
  • Convolutional neural networks
  • Recurrent neural networks
  • Recursive neural networks

With the definitional machinery in place, let’s get our hands on the fun stuff, shall we? 🍟 🍩 🍭

What Does “Pragmatic” Mean, Exactly?

Casting a glance back at my personal journey, we’ll soon dive deep into two deep learning books that have proved immensely helpful to me in grokking this intriguing field at the pragmatic level. Okay, so what I’ve got in mind when I use the word pragmatic is the following composite thinking, all globbed together after I put some deliberate thought into it 🍔
In the context of deep learning, a pragmatic understanding is what you’ve gained after achieving competency and fluency in the use of the programmatic toolbox that powers the practice of deep learning. Whereas what we looked at previously (in the first installment of this deep learning series) were the foundational aspects of deep learning—think linear algebra and all that other fun mathematical jazz—the focus of this second installment is squarely on its pragmatic aspects: I’m assuming that you’re approaching deep learning as a practitioner—you may be an engineer, statistician, analyst, artist—entering or perhaps already practicing the art and science of machine learning (ML) on a daily basis and now want to find out the best resources available on this planet to get your hands dirty with the practice of deep learning.
You are, in other words, a denizen of the land of technology and the allied arts. The awesomeness of the pragmatic aspects of deep learning lies, at least partially, in its ever-evolving nature; the world of machine learning (ML) has become inextricably enmeshed in pretty much every discipline under the sun. And as those other disciplines evolve, so, too, does deep learning. Talk about exciting times ☕

Personal Background

The computing field is always in need of new cliches: Banality soothes our nerves 🍟
~ Alan Perlis

Many of you are longtime readers of this blog, and therefore already familiar with the background—professional, intellectual, technical, and artistic—which I bring to this blog. And hey you, yes, you the know-it-all right there in the back row who said that they’re familiar with my, um, unprofessional background, heh, I need to have a word with you in private 😉

Seriously, though, for new readers of this blog, it’s only fair that I share with you an ever-so-brief backgrounder. That way, you’ll know my strengths, weaknesses, propensities—and oh yes—my biases as well 😇

We’ll soon dive deep into the ocean of deep learning, but first, keeping in mind what I said above, here then is a backgrounder by way of a handful of pointers. I present them in no particular order, so please don’t ascribe any significance to how high—or how low, for that matter—any pointer may appear in the following list, starting with the fact that

  • Directly relevant to this essay, I should point out that while I’ve read large swathes of the two books reviewed in this essay, I haven’t yet read the two in their entirety. I’m getting there, slowly, but surely, one neural network at a time ⏳
  • At the moment, the fields of deep learning, AI, and machine learning (ML)—AI of course being the proud parent of ML—continue to intrigue me, as they have for many years 👒
  • In fact, the dissertation I did for my Masters was on the then-esoteric-sounding topic of Pattern Recognition Methods using Neural Network Algorithms. So for example, I know the back-propagation algorithm like, um, the back of my hand 🌿
  • Speaking of that evocative phrase—back of my hand—look for the hauntingly beautiful lyrics (“I knew the pathway like the back of my hand”) later, from the Lily Allen song Somewhere Only We Know 🐇 🐻
  • Please know that the pointer immediately above was not random; it was my attempt at subtly preparing you for what my regular readers know all too well as, ahem, “digressions”—the exploration of themes related to the one under discussion🚶
  • Please know, too, that my propensity for digressing at the drop of a hat is not quite random either; I’ve gone to some lengths to explain exactly that, elsewhere 🚩
  • My full time professional focus currently is on designing and crafting scalable enterprise software though the fields of deep learning, ML, and AI continue to intrigue me 🚀
  • Finally, to get a flavor for your blogger’s pursuits in the areas—as are not related, at least not directly, to AI, ML, and deep learning—I invite you to check out some fun stuff by way of the handful of links that follow now, closing out this randomly-ordered list of pointers ⛵
  • My irreverent take on the role of eventual consistency in distributed systems 🏧
  • A decidedly feisty dive into the best that there is in the universe of algorithms ⛷
  • And ah yes, I took the reactive programming bull by its horns, and not too long ago, either 🚅
  • In rounding out these revelations, I might as well mention the fun my readers and I had with a digression through a seemingly staid subject—a deep dive into blending object orientation (OOP) and functional programming (FP) 🔧
Okay, so now we’re standing on the edge of the diving-board that’s perched at some height above the swimming pool of deep learning knowledge, ready to take a dive into the placid—and sometimes not-too-placid—waters of deep learning 💧 🏊 🌊
And since we’ve already had a quick, drive-by look at the background (professional, intellectual, technical, and artistic) that I bring to this blog, let’s bring closure to that thought by noting that—and I went into some detail on precisely this point in my response to a recent reader comment—with thousands of readers coming to this blog every month to read these essays, the responsibility of honoring the time of so many readers has irrevocably altered my view of how an essay ought to be written. I really can’t go into more detail than that here; I invite anyone interested, though, to check out, by all means, musings on exactly this theme in the aforementioned response 🔦

1. Deep Learning (The MIT Press) by Goodfellow, Bengio and Courville 🐳


A Phenomenally Readable Introductory Book

Let’s please have you squint at the pic above 🔍 You’ll notice some decidedly pragmatic books flanking this modern classic on each side: (1) On its left-hand side is another classic, this one by Martin Fowler, entitled Refactoring: Improving the Design of Existing Code (Addison-Wesley Professional) and (2) on its right-hand side is the chock-full-of-practicalities book entitled Scalable Internet Architectures (Sams Publishing) by Theo Schlossnagle—I mean, books don’t get any more practical than that, unless I’ve been living in a hole 🚧

And so it is that this modern classic (Deep Learning) is right at home with its pragmatic brethren—and sisters, to be sure à la manière de Rosie the Riveter—on the bookshelf rack in the pic above.

In my mind, Deep Learning is to the world of deep learning what the Gang Of Four (GoF) book is to the world of software design patterns. Yes, the former is that good, and let me tell you why. For starters, Deep Learning is an incredibly polished work, much in the tradition of the highly refined GoF book; I’ll boldly venture out to say that Deep Learning has set an even higher standard for what a technical book—or non-technical book for that matter—can offer to enhance the reading experience in an endearing and pleasing way. In a similar vein, you may wish to also take a peek at another awesome and incredibly well done book, another fav of mine, entitled Refactoring to Patterns (Addison-Wesley) by Joshua Kerievsky 🏀

The Book to Read After Getting Comfortable with Linear Algebra

Some of you may recall what I had said, in connection with Deep Learning, in the first installment of this series of deep learning essays—that it serenely stands guard in the background, while the nuts-and-bolts Linear Algebra: A Modern Introduction (Brooks Cole) by David Poole basks in the limelight with brazen effrontery 🎬

Okay, so once you’re ready to take on the world of deep learning, you simply can’t go wrong with this modern classic: Deep Learning (The MIT Press) by Ian Goodfellow, Yoshua Bengio and Aaron Courville. For one thing, its editing has clearly been finessed to near-perfection. Look, I’m a big fan of books from The MIT Press—a notable exception being Introduction to Algorithms 3rd Edition (MIT Press) by Cormen, Leiserson, Rivest, and Stein—and Deep Learning did not disappoint, not one bit. The long wait for its publication was well worth it!

Oh, and lest anyone cry heresy regarding my judgement of CLRS (those being the initials of that other book’s authors: Cormen, Leiserson, Rivest, and Stein), please know that CLRS does make for a good research book, for finding citations and that sort of thing—for that it’s probably unbeatable—since it happens to be replete, to overflowing, with copious cross-references to the vast literature on algorithms. Given its intimidating style, though, I’m still not convinced that it’s a good introductory book on the subject. I also suggest that you look up a much more detailed discussion, should you be interested, in yet another irreverent essay elsewhere on the joy of algorithms ⛑

Oh goodness, don’t anyone even get me started on my high esteem of books from The MIT Press; that could easily take up the remainder of this essay. Here, I’ll mention in passing merely two of my favs from those esteemed publishers:

As for Deep Learning, it simply can’t be beat for comprehensiveness, engaging style of presentation, and clarity. At this time, I’m not aware of any other book that introduces and covers this fascinating field better than Deep Learning 🏆


It’s all in here, folks. Rejoice 🎶 Your search for the resource from which to best learn deeply about deep learning is now officially over. All we need to do now is set up camp and start reading ⛺ And hey, you do know what I mean, don’t you? I’m not offering any learning camps—at least not at this time anyway—just mentioning the camping metaphor FWIW 🎪

Let’s Get Ourselves Acquainted

To acquaint you better with the profoundly valuable pragmatic aspects of Deep Learning, let’s have ourselves a peek at its table of contents 📂

1. Introduction

I Applied Math and Machine Learning Basics
2. Linear Algebra
3. Probability and Information Theory
4. Numerical Computation
5. Machine Learning Basics

II Deep Networks: Modern Practices
6. Deep Feedforward Networks
7. Regularization for Deep Learning
8. Optimization for Training Deep Models
9. Convolutional Networks
10. Sequence Modeling: Recurrent and Recursive Nets
11. Practical Methodology
12. Applications 

III Deep Learning Research
13. Linear Factor Models
14. Autoencoders
15. Representation Learning
16. Structured Probabilistic Models for Deep Learning
17. Monte Carlo Methods
18. Confronting the Partition Function
19. Approximate Inference
20. Deep Generative Models 



A Learning Experience Imbued With Joy

Frankly, I’m not aware of a kinder, gentler, and more intelligent approach to introducing yourself to the field than through a study of Deep Learning. This book is pretty much all you need to create for yourself a swimmingly good experience, all at your own pace, all on your own time ⏰

By the way, the other deep learning (pragmatic) book that we’ll be talking about later in this essay—Deep Learning: A Practitioner’s Approach (O’Reilly Media)—would make a terrific complement to your reading of Deep Learning (The MIT Press), but more on that in just a bit 🍒

The book’s three co-authors (Goodfellow, Bengio and Courville)—maybe one day they’ll come to be known as the “three amigos” of the deep learning world, much like Rumbaugh, Booch, and Jacobson came to be affectionately known as the “three amigos” of the object-oriented world—share some terrific points to help you get situated. To help you decide whether this is indeed the book for you, they’ve included a useful introductory section (Section 1.1) entitled Who Should Read This Book? where they tell the reader how 🎓 🏄

This book can be useful for a variety of readers, but we wrote it with two target audiences in mind. One of these target audiences is university students (undergraduate or graduate) learning about machine learning, including those who are beginning a career in deep learning and artificial intelligence research. The other target audience is software engineers who do not have a machine learning or statistics background but want to rapidly acquire one and begin using deep learning in their product or platform. Deep learning has already proved useful in many software disciplines, including computer vision, speech and audio processing, natural language processing, robotics, bioinformatics and chemistry, video games, search engines, online advertising and finance (italics mine).


Engaging Style, Comprehensive Coverage, and a Model of Clarity

Let’s try to get a gist of the comprehensiveness, engaging style, and clarity which this fine book can bring to your study of deep learning by looking up, as an example, the authors’ take on that sturdy stalwart of this territory, eigendecomposition. In Section 2.7, to be precise, they introduce eigendecomposition by remarking on how 🎳

Many mathematical objects can be understood better by breaking them into constituent parts, or finding some properties of them that are universal, not caused by the way we choose to represent them… 

Much as we can discover something about the true nature of an integer by decomposing it into prime factors, we can also decompose matrices in ways that show us information about their functional properties that is not obvious from the representation of the matrix as an array of elements. One of the most widely used kinds of matrix decomposition is called eigendecomposition, in which we decompose a matrix into a set of eigenvectors and eigenvalues (italics mine).

I thought that the passage above beautifully explains the guts of a typical deep learning topic—and that topic just happened to be eigendecomposition—as simply as it can be done, not any simpler, and without in the least diluting the substance and rigor of the topic. Nobody wants that, right? To that I’ll simply add that pretty much the entire book carries through with it the same clarity and rigor ✂
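For the hands-on among us, here is a minimal sketch of what "decomposing a matrix into a set of eigenvectors and eigenvalues" looks like for the smallest interesting case, a symmetric 2x2 matrix, using only the Python standard library. The function names (eig_symmetric_2x2, reconstruct) are my own, hypothetical ones, and the closed-form approach shown here only works for this special case; a real project would reach for a numerical library instead.

```python
import math

def eig_symmetric_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]] via the closed form."""
    if b == 0:
        # Already diagonal: the standard basis vectors are the eigenvectors.
        return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        vx, vy = lam - d, b  # solves (A - lam * I) v = 0 when b != 0
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs

def reconstruct(pairs):
    """Rebuild the matrix as the sum of lam * v v^T over the eigenpairs."""
    matrix = [[0.0, 0.0], [0.0, 0.0]]
    for lam, v in pairs:
        for i in range(2):
            for j in range(2):
                matrix[i][j] += lam * v[i] * v[j]
    return matrix

pairs = eig_symmetric_2x2(2.0, 1.0, 2.0)
for lam, vec in pairs:
    print(lam, vec)
print(reconstruct(pairs))
```

The last line is the point of the exercise: multiplying the pieces back together recovers the original matrix (up to floating-point rounding), much as multiplying prime factors recovers the original integer.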

And let’s be fair, lest the authors’ take above struck you as a tad too conceptual: The field of deep learning is intrinsically conceptual to its core, deeply interwoven as it is with the sublime beauty of math. And I hasten to add, too, lest your blogger’s claim—that this essay is all about the pragmatic aspects of deep learning—struck you as a tad too bold: allow me to quote Paul Graham’s sparkling-with-wit opinion on this precise topic when he noted, in his stellar book entitled Hackers & Painters: Big Ideas from the Computer Age (O’Reilly Media), how the landmark programming language Lisp embodies mathematical ideas to perfection in that 🎻

Suddenly, in a matter of weeks, McCarthy found his theoretical exercise transformed into an actual programming language—and a more powerful one than he had intended. So the short explanation of why this 1950s language is not obsolete is that it was not technology but math, and math doesn’t get stale (italics mine).


Peering Into the Future

I owe you all—perhaps in a future essay—an exploration of some of the best that the glory of math has to offer to practitioners like us. But that will have to wait for another time; should I forget, please remind me, won’t you? ⏰

Meanwhile, and albeit on a different topic, though one that’s imbued (perhaps as equally and as deeply as the field of deep learning) with the timeless strands of mathematical rigor and determinism, I recommend that you look up what math has been doing to the world of programming lately 🚀


Crystal-clear Descriptions of Algorithms

Nearly every chapter of Deep Learning is replete with succinct and crystal-clear descriptions of algorithms. It’s hard to convey the remarkable effectiveness and finesse with which the “three amigos” of the deep learning world—the book’s three co-authors (Goodfellow, Bengio and Courville)—capture the essence of a ton of crucial deep learning algorithms. Ah, good old algorithms, a subject that remains close to the heart of this computer scientist and software practitioner 💝

But hard as it surely is to convey that remarkable effectiveness, let’s give it a try, shall we? 🏈

For example, we have in Chapter 18—a chapter with the rather bracing and evocative title of Confronting the Partition Function—an algorithm for those trusty Markov chain Monte Carlo (MCMC) methods gaping at the reader in its full glory. I refer you to (Algorithm 18.1)—“A naive MCMC algorithm for maximizing the log-likelihood with an intractable partition function using gradient ascent”.

Earlier on in the book, you’ll find the goods on the early stopping meta-algorithm: (Algorithm 7.1)—“The early stopping meta-algorithm for determining the best amount of time to train. This meta-algorithm is a general strategy that works well with a variety of training algorithms and ways of quantifying error on the validation set”.
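To give a flavor of the early stopping idea, here is a minimal sketch in plain Python. It is my own paraphrase of the general strategy (train a bit, check the validation error, keep the best parameters seen so far, give up after a few checks without improvement), not a transcription of Algorithm 7.1. Every name in it (early_stopping, eval_every, patience, toy_train, toy_validation_error) is a hypothetical one chosen for illustration, and the "training" is a toy that simply nudges one number upward.

```python
import copy

def early_stopping(params, train_steps, validation_error, eval_every=1, patience=3):
    """Train until the validation error fails to improve `patience` checks in a row."""
    best_params = copy.deepcopy(params)
    best_error = float("inf")
    best_step = 0
    step = 0
    bad_evals = 0
    while bad_evals < patience:
        params = train_steps(params, eval_every)   # run a few training steps
        step += eval_every
        error = validation_error(params)           # measure on held-out data
        if error < best_error:
            best_params, best_error, best_step = copy.deepcopy(params), error, step
            bad_evals = 0
        else:
            bad_evals += 1
    return best_params, best_step, best_error

# Toy stand-ins (assumptions): each step moves w by +1, and validation error
# is lowest at w = 3, so training past that point mimics overfitting.
def toy_train(params, n):
    return {"w": params["w"] + 1.0 * n}

def toy_validation_error(params):
    return (params["w"] - 3.0) ** 2

best, step, err = early_stopping({"w": 0.0}, toy_train, toy_validation_error)
print(best, step, err)
```

Because the strategy only needs "a way to train some more" and "a way to score the validation set", it wraps around pretty much any training procedure, which is exactly why the book calls it a meta-algorithm.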

And earlier still in Deep Learning, you will run into the back-propagation algorithm: (Algorithm 6.5)—“The outermost skeleton of the back-propagation algorithm. This portion does simple setup and cleanup work. Most of the important work happens in the build_grad subroutine of algorithm 6.6”.
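The book’s version of back-propagation works on general computational graphs, which is more than I can do justice to here. As a much humbler illustration of the underlying chain-rule bookkeeping, here is a sketch for one fixed, tiny network (a single tanh hidden unit with a squared-error loss). The network, the parameter values, and the function names are all my own assumptions for illustration; the finite-difference check is there only to confirm that the hand-derived gradients are right.

```python
import math

def forward(params, x):
    """Tiny one-hidden-unit network; returns the output and the hidden activation."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    return w2 * h + b2, h

def loss(params, x, target):
    """Squared-error loss for a single example."""
    y, _ = forward(params, x)
    return 0.5 * (y - target) ** 2

def backward(params, x, target):
    """Gradient of the loss w.r.t. (w1, b1, w2, b2), working from output to input."""
    w1, b1, w2, b2 = params
    y, h = forward(params, x)
    dy = y - target              # dL/dy
    dw2, db2 = dy * h, dy        # output layer
    dh = dy * w2                 # propagate back into the hidden unit
    dz = dh * (1.0 - h * h)      # tanh'(z) = 1 - tanh(z)^2
    dw1, db1 = dz * x, dz        # hidden layer
    return (dw1, db1, dw2, db2)

def numeric_gradient(params, x, target, eps=1e-6):
    """Central finite differences, used only to sanity-check backward()."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grads.append((loss(up, x, target) - loss(down, x, target)) / (2 * eps))
    return tuple(grads)

params = (0.5, -0.1, 0.8, 0.2)
print(backward(params, x=1.5, target=1.0))
print(numeric_gradient(params, x=1.5, target=1.0))
```

The two printed tuples should agree to several decimal places. The appeal of back-propagation is that the analytic version reuses intermediate results (dy, dh, dz) instead of re-running the network once or twice per parameter, as the numeric check has to.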

Now if only that other book from The MIT Press, the one which I mentioned earlier, the looks-fashionable-to-have-a-copy-of CLRS—those being the initials of the authors of that other book, Introduction to Algorithms: Cormen, Leiserson, Rivest, and Stein—had taken a page from this book (Deep Learning), also of course from The MIT Press, I would have gladly covered it in the two installments (Part 1 and Part 2) thus far, in the algorithms series of essays elsewhere on this blog site. But, um, tough luck so far 🎱

And frankly, with the publication of Deep Learning, the “three amigos” of the deep learning world have proved that there’s absolutely no need whatsoever to resort to a cryptic style of presenting algorithms. Indeed, as the “three amigos” have demonstrated, all that can be accomplished with natural language, and with great clarity. Period 🏀

Oh, and did I mention that Deep Learning is lavishly illustrated, with many of those illustrations in the splendor of full color 🎃


I love this book 💕 Don’t miss it. It will help build, strengthen, and elevate your pragmatic deep learning skills from the ground up. It’s hard to imagine anyone else having the guts to take on such an ambitious goal—cover the field of deep learning in its entirety with comprehensiveness, engaging style, and clarity—and then pulling it off as successfully as have the three co-authors (Goodfellow, Bengio and Courville) of Deep Learning, the “three amigos” of the deep learning world. For its brilliance and polish, I stand by my verdict that Deep Learning is to the world of deep learning what the Gang Of Four (GoF) book is to the world of software design patterns 🏆

2. Deep Learning: A Practitioner’s Approach (O’Reilly Media) by Josh Patterson and Adam Gibson 🐋

Appreciating the Pragmatics of Deep Learning

In the second spot we have a near-namesake of the book that landed in the first spot, and which was of course simply entitled Deep Learning. So the book landing in the second spot has a subtitle, and is entitled Deep Learning: A Practitioner’s Approach. In several ways, Deep Learning: A Practitioner’s Approach goes one step further than its near-namesake. For one thing, it is closer to the metal of the application space and goes into the guts of high-quality accompanying implementations—in the Java programming language—of many deep learning algorithms 🏊 🎯

Should the acronym DL4J already mean anything to you, please know that one of the book’s two co-authors (Josh Patterson) is active in the open source community, contributing to projects such as DL4J, Apache Mahout, and IterativeReduce. So the acronym DL4J is shorthand for Deeplearning4j 🎻 Be prepared to get up close and personal with DL4J over the course of poring over the pages of this amazing book 🌹

An Invitation

But first, and much as I had invited you earlier in this essay—while exploring the book in the top spot—I invite you again to please peer into the pic above 🔍 Did you notice some decidedly pragmatic books flanking Deep Learning: A Practitioner’s Approach on each side? Among other books on the shelf in the pic, (1) on its left-hand side is the classic entitled Refactoring: Improving the Design of Existing Code (Addison-Wesley Professional) by Martin Fowler, and (2) on its right-hand side is Scalable Internet Architectures (Sams Publishing) by Theo Schlossnagle—books don’t get any more pragmatic than that, do they really?

And so it is that this aptly titled book is right at home with its pragmatic brethren—and sisters, to be sure à la manière de Rosie the Riveter—on the bookshelf rack in the pic above.

Let’s Get Ourselves Acquainted

To acquaint you better with the solidly practical aspects of Deep Learning: A Practitioner’s Approach, let’s take a peek at the table of contents 📖

1. A Review of Machine Learning
2. Foundations of Neural Networks and Deep Learning
3. Fundamentals of Deep Networks
4. Major Architectures of Deep Networks
5. Building Deep Networks
6. Tuning Deep Networks
7. Tuning Specific Deep Network Architectures
8. Vectorization
9. Using Deep Learning and DL4J on Spark 

Appendix A. What Is Artificial Intelligence?
Appendix B. RL4J and Reinforcement Learning
Appendix C. Numbers Everyone Should Know
Appendix D. Neural Networks and Backpropagation: A Mathematical Approach
Appendix E. Using the ND4J API
Appendix F. Using DataVec
Appendix G. Working with DL4J from Source
Appendix H. Setting Up DL4J Projects
Appendix I. Setting Up GPUs for DL4J Projects
Appendix J. Troubleshooting DL4J Installations 


In a helpful introductory section entitled What’s in This Book?, the authors give a nice roundup of what to expect going forward, starting with the point that

The first four chapters of this book are focused on enough theory and fundamentals to give you, the practitioner, a working foundation for the rest of the book. The last five chapters then work from these concepts to lead you through a series of practical paths in deep learning using DL4J:

  • Building deep networks
  • Advanced tuning techniques
  • Vectorization for different data types 
  • Running deep learning workflows on Spark
My impression of this fabulously clean and remarkably effective division between theory and practice is something like the following: the 🎧 first four chapters reveal the conceptual machinery of deep learning, while the even more valuable (IMHO) 🎾 last five chapters go on to divulge the equally crucial practical aspects of deep learning 🎧 🎾


No Royal Road to the Practice of Deep Learning

Reminding ourselves of the adage that there is no royal road to geometry—or to deep learning, for that matter—a sustained study of Deep Learning: A Practitioner’s Approach will help you immensely in the long run. The authors share the essence of this book’s gravitas by telling the reader how

We designed the book in this manner because we felt there was a need for a book covering “enough theory” while being practical enough to build production-class deep learning workflows. We feel that this hybrid approach to the book’s coverage fits this space well (italics mine).

Okay, so here’s the deal—anytime someone makes mention of that million dollar word above, “hybrid”, I reflexively start thinking of the closely allied notion of “blended” 📬 I strongly feel that conceptual “blending” is an often-overlooked notion that has much to offer to us practitioners. And the reason for that is simple: We need powerful tools in our toolbox if we are to have a good shot at taming complexity in software, and elsewhere. There’s actually a ton that I’ve already written, elsewhere, on precisely that notion through what I recall was a deliberate—and hopefully scientific—analysis of blending object orientation (OOP) 🎁 with functional programming (FP) ⛩

Should the reader’s interest be piqued, I’ll mention in passing that the tour de force book entitled The Way We Think: Conceptual Blending And The Mind’s Hidden Complexities by Gilles Fauconnier and Mark Turner (Basic Books) is likely the final word on (conceptual) blending, taken to its logical conclusion. Yes, we need all the tools—conceptual, algorithmic, infrastructural, thematic, AI, probabilistic, you name it—that we can get our hands on 🍴

A Bit on Coursera for Context

To put my thoughts above in context, let’s segue a bit. So I earned a certificate online last year from Coursera—specifically a certificate for the Machine Learning (ML) course taught by Andrew Ng, then with Stanford University—which will make you do deep dives (pun was totally unintentional) through the ocean of linear algebra, so I have a pretty good idea of what I’m talking about in the preceding section 🐘

For the most part, I happily pursue the craft of software development these days, primarily using the Java and Scala programming languages. At the same time, I’ve remained intrigued enough over the years by ML and deep learning to pick up the Octave programming language for scientific computing; learning the Octave programming language, in fact, came about especially while I was taking the Andrew Ng Machine Learning (ML) course I mentioned above.

And while we’re talking about Coursera, I’ll mention in passing that they’ve got some of the most well-thought-out and well-designed courses available online; it was actually my passion for the Scala programming language which, by the way, brought me to Coursera in the first place. And that’s how I discovered their offering on ML and other cool stuff. Other courses I took—and for which I also earned a certificate each—include the following, which I can highly recommend, should anyone have an interest in this sort of thing 🎓

  • Functional Programming Principles in Scala 
  • Functional Program Design in Scala 
  • Parallel Programming

Standout Feature

Okay, so we won’t be talking about those kinds of features—such as what you might have had in mind, along the lines of automatic feature extraction—here, since we do happen to be deep into deep learning territory 😉 The feature I have in mind here is the copious use of super-helpful “fact boxes” throughout the pages of Deep Learning: A Practitioner’s Approach. Let’s look at an example to get a better sense for what I’m talking about.

Machine Learning Versus Data Mining
Data mining has been around for many decades, and like many terms in machine learning, it is misunderstood or used poorly. For the context of this book, we consider the practice of “data mining” to be “extracting information from data.” Machine learning differs in that it refers to the algorithms used during data mining for acquiring the structural descriptions from the raw data. Here’s a simple way to think of data mining:  

To learn concepts
    we need examples of raw data
Examples are made of rows or instances of the data
    Which show specific patterns in the data
The machine learns concepts from these patterns in the data
    Through algorithms in machine learning 

Overall, this process can be considered “data mining.”

Much like the one above, similarly helpful “fact boxes” are liberally sprinkled throughout the book, pixie dust style. I found them immensely helpful, and I think that you will, too. Ah yes, since we began this section with a preamble on the concept of “features”, I’m happy to report that Deep Learning: A Practitioner’s Approach even has a “fact box” entitled What Is a Feature? 💪
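To make the fact box’s chain (raw examples, then patterns, then concepts) a bit more concrete, here’s a minimal sketch in Python. To be clear, it is not from the book (whose examples are in Java); the toy rows, the labels, and the helper names `learn_concepts` and `classify` are all made up for illustration, and I’ve picked a nearest-centroid rule only because it is about the simplest learning algorithm I can think of.

```python
from collections import defaultdict
from math import dist

# Hypothetical raw data: each row (instance) is (features, label).
# Features here are (height_cm, weight_kg); the values are invented.
ROWS = [
    ((20.0, 4.0), "cat"),
    ((25.0, 5.0), "cat"),
    ((60.0, 30.0), "dog"),
    ((70.0, 34.0), "dog"),
]


def learn_concepts(rows):
    """Summarize the pattern for each label as the mean of its rows."""
    grouped = defaultdict(list)
    for features, label in rows:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*examples))
        for label, examples in grouped.items()
    }


def classify(concepts, features):
    """Assign the label whose learned centroid is closest."""
    return min(concepts, key=lambda label: dist(concepts[label], features))


concepts = learn_concepts(ROWS)
print(concepts)
print(classify(concepts, (22.0, 4.5)))
```

The rows are the examples, the per-label averages are the patterns, and the dictionary that `learn_concepts` returns is a (very humble) learned concept that `classify` can apply to a row it has never seen.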

The Joy of Learning

Look, I’m a sucker for adorning my essays—some might say embellishing, though I’ll politely beg to differ—with topical quotes, images, and excerpts to create a pleasurable reading experience. So I was pleased and right at home with the format I found in Deep Learning: A Practitioner’s Approach. Here are the authors, introducing the reader to Chapter 4 (Major Architectures of Deep Networks) with a quote that’s as profound as it is thought provoking:

The mother art is architecture. Without an architecture of our own we have no soul of our own civilization 🏰
~ Frank Lloyd Wright (legendary American architect, designer of the masterpiece Fallingwater)

Okay, let’s take a peek at just one more example to get a richer flavor of just how marvelously well wrought is the authors’ use of lead-in quotes to every single chapter of the book—here, they’re introducing Chapter 8 (Vectorization) with a quote that’s as apt as it is witty for the subject matter being introduced 🐚 😉

New York City, Old St. Joe, Albuquerque, New Mexico
This old rig is humming and rolling and she’s doing fine
If somebody wants to know what’s become of this so and so
Tell ’em I’m somewhere looking for the end of that long white line
~ Sturgill Simpson, Long White Line

Maybe it’s just me… But as I had emphatically pointed out in an essay elsewhere, along with learning and stuff, “I mean, we got to have ourselves some fun along the ride, don’t we?” And oh yes, a sucker for meaningfully adorning essays I’ll ever remain 🎈

At the same time, the responsibility that comes with honoring the precious time of so many readers has irrevocably altered my view of how an essay ought to be written. It has not altered my writing style, and nothing ever will, as I’ve tried to explain in response to a reader comment in an essay elsewhere 😎



Dig in to the pages of Deep Learning: A Practitioner’s Approach and you’ll see what I’m talking about. This is a book for someone who is looking for gentle, inspiring, and eminently practical guidance on mastering the pragmatics of deep learning—and don’t we all? If that’s you, don’t miss this stellar book 🎪

An Invitation 📣

In the end, I invite your comments, now that you’ve read my brief take on each of the books above 💤

  • Do you find that your experience of reading any of these pragmatic books was different? 🐢
  • Did I perhaps leave out some qualities, in particular the ones that you actually found the most helpful as you learned the practicalities of deep learning? 🌎
  • Did I leave out any of your favorite pragmatic deep learning books? 🚛

My hope is that the brief vignettes will help you in your journey to grokking deep learning pragmatics 🏀 As promised in the first installment in this series on deep learning, I leave you with a more elaborate pic collage this time. Do please post your comments to let other readers—and me—know how you like the pic collage, won’t you? 🍒

Bon Voyage 🚢

And to you, as you embark on the journey of a lifetime, setting sail on the sea of deep learning, I say with much warmth and friendship, Bon voyage ☕

Did any of the books in the pic above—they are among my favs—grab your attention? If so, do please post your comments to let other readers—and me—know about which one of these gems you would like to hear more about, won’t you? We all stand to benefit from ongoing dialogs 🍒

AI 👨 is the Proud Parent of Machine Learning 👦 Which is, in turn, the Proud Parent of Deep Learning 👶

Just to drive home an important point—at the risk of sounding like a broken record—let’s not forget the helpful relationship that is indicated in the heading above! 📀

Collage of Pics and Lyrics 🎸

I walked across an empty land
I knew the pathway like the back of my hand
I felt the earth beneath my feet
Sat by the river and it made me complete 

Oh simple thing, where have you gone?

~ Lily Allen (Lyrics from Somewhere Only We Know) 🐇 🐻


Now why would the emerald-green and haunting Tetris-style pic above, tied as it is to that awesome movie The Matrix (yup, that’s Neo, a.k.a. Thomas Anderson, peering back at us), turn up in an essay on deep learning?! Anyone? Hint, hint: Much as we talked about in the first installment in this series on deep learning, linear algebra remains the bedrock on which the edifice of deep learning is built. And to get anywhere in linear algebra, make matrices your best friends forever 👱👰
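For a taste of why matrices matter so much, here’s a minimal sketch, in plain Python, of the computation at the heart of a single fully connected layer: a matrix-vector product, plus a bias, passed through a ReLU. The weights, bias, and input below are invented numbers, and `matvec`, `relu`, and `dense_layer` are hypothetical helper names of my own; real frameworks do the same arithmetic with heavily optimized matrix routines.

```python
def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in vector]


def dense_layer(weights, bias, inputs):
    """One fully connected layer: relu(W x + b)."""
    pre_activation = [z + b for z, b in zip(matvec(weights, inputs), bias)]
    return relu(pre_activation)


# Invented example: 3 inputs feeding 2 neurons.
W = [[1.0, -2.0, 0.5],
     [0.0, 1.0, -1.0]]
b = [0.5, -1.0]
x = [2.0, 1.0, 4.0]

print(dense_layer(W, b, x))  # prints [2.5, 0.0]
```

Stack a few of these layers, one feeding the next, and you have the skeleton of a deep network; everything else is (lots of) detail.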

Ah yes, so here’s what I was going to say about the pic above (of a tiny corner of one of my bookshelves), which has at its center square the gem of a book entitled If A, Then B: How the World Discovered Logic (Columbia University Press) by Michael Shenefelt and Heidi White 💎

So it goes something like the following—relying as it does on the machinery of standard categorical syllogisms—where you can’t help but notice the airtight logic of my argument 👻

I love my wife.
My wife loves logic.
Therefore, I love logic.

I noticed some of you cringing at potential leaks in the, um, airtight logic of the argument I present above 😱 so let’s quickly wrap this all up with some endearing lyrics that were sung inimitably and mellifluously by the one-and-only Rocket Man when he crooned

I hope you don’t mind that I put down in words
How wonderful life is while you’re in the world.
~ Elton John (Lyrics from Your Song) 🎧

And what an awesome movie Dune was, that mind-blowing, off the wall classic! Imagine, if you will, the grains of sand in the desert as the data points that your enterprise needs you to get a grip on 🐫 Your mission, should you choose to accept it, is to find interesting patterns in those data points 📶 Ladies and gentlemen! Dare I say, with our confidence bolstered by this newfound deep learning savvy, that we’re ready to take it on? Yay, glad to hear that—So bring it on! 🏈

Feeling giddy as we surely are at this point in time, our spirits buoyed by what the craft of software can help humanity accomplish, let’s not forget the momentous and gloriously successful mission that Cassini has recently accomplished 🚀 Time to celebrate 🎉 💝

A lovestruck Romeo sings a street-suss serenade
Laying everybody low with a love song that he made
Finds a streetlight, steps out of the shade
Says something like, “You and me babe, how ’bout it?”

I can’t do the talks like they talk on the TV
And I can’t do a love song like the way it’s meant to be
I can’t do everything but I’ll do anything for you
I can’t do anything except be in love with you
~ Dire Straits (Lyrics from Romeo And Juliet)


  1. Is "Deep Learning: A Practitioner's Approach" easily translatable to Scala? I barely know Java but I'm very comfortable with Scala. I do most of my work in Python but would like to use a statically typed language in production.

  2. – Thanks for your comment and question, Francisco. To your question, "Is 'Deep Learning: A Practitioner's Approach' easily translatable to Scala?": Yes, the book by Patterson and Gibson, Deep Learning: A Practitioner's Approach (O'Reilly Media), is eminently suited for translation to Scala, among other (target) programming languages, primarily because the example (Java) code is of high quality, especially in the area of readability! (YMMV)

    – It was very helpful that you shared how, "I barely know Java but I'm very comfortable with Scala. I do most of my work in Python but would like to use a statically typed language in production". That helps me answer your question better because I also know a thing or two about Python 🙂

    – In fact, I answer Scala-related questions from readers every now and then (on the main international forum for Scala). It just so happens that, a little while ago, I answered a question from a reader (with a background quite similar to yours) who wanted to become more proficient at Scala. FWIW, here's part of my answer on that forum, because I think it'll be helpful to you as well:

    "It just so happens that I've spent a fair amount of time writing Python code, and found it to be an incredibly productive language – For most of my development work, however, I use Java and Scala since these languages are easier to maintain and scale well as your code base grows."

    "Having said that, I applaud your enthusiasm for wanting to embrace the reactive programming paradigm 😎 Both of the books you mention are superb. To those two, I would definitely add a third one, which you should read along with the fine book by Martin Odersky; the book FP in Scala is awesome, though a bit advanced. The book I'm suggesting is entitled Programming Scala: Scalability = Functional Programming + Objects (O'Reilly), by Dean Wampler and Alex Payne. As I had noted in my review of that book, many, many moons ago, "If you're going to read only one book on Scala, make it this one…" and I still stand by my words – Feel free to check out my musings on this very subject, to get a better sense of the resources available to you for your Scala programming needs…"

    "You're doing a very sensible thing by becoming proficient in Scala: Keep on reading!"

    – To that I'll quickly add that—and it was probably plenty clear already from the copious references to both Scala and Java in this essay—I'm also a big fan of using a statically typed language for most all of my production code 🙂

  3. – Finally, a question for you: Which IDE do you use for your daily Scala programming?

    – I'm open, receptive, and in fact agnostic when it comes to languages, tool choices, etc. At the same time, I strongly recommend that you give IntelliJ IDEA a good look. I'm a long-time Eclipse user, and still use Eclipse from time to time. But when it comes time to tell anyone which is the best IDE in the world, my answer is unambiguous!

    – In fact, feel free to also check out my response, elsewhere, to a reader who had a very cool comment on my essay on the subject of software tools and IDE(s)…

    – Oh, and finally, should you wish to stay on top of the nitty gritty of the Scala programming language, I highly recommend that you keep an eye on the responses of a fellow Scala expert who is, in fact, far more active than even I am in answering questions (on the main international forum for Scala).

    – In the context of deep learning, I had something to add on picking up the Octave programming language (for scientific computing); I think I mentioned Octave in this essay (in the context of when I took the Andrew Ng Machine Learning Stanford / Coursera course)…

    – Meanwhile, I am running out of that precious commodity: "Time"…

    – But my readers mean the world to me, and I do my best to respond comprehensively, time permitting, to all of your comments and questions – So as I said in a response, elsewhere, to a reader, in the immortal catchphrase associated with Arnold Schwarzenegger, "I'll be back" 🙂
