In God we trust. All others must bring data.
~ W. Edwards Deming (statistician)
The goal is to turn data into information, and information into insight.
~ Carly Fiorina (former president, and chair of Hewlett-Packard)
All in all it's just another brick in the wall
All in all you're just another brick in the wall
~ Pink Floyd (lyrics from Another Brick in the Wall, Part 2)
My prior post was on Scala, which (along with Java and Clojure) is a language that I find highly expressive and helpful for my programming needs. This weekend, let's move on to another topic and see what can be done to help you in your journey to grokking the Big Data solution space.
I do believe that the two key questions fueling the torrent that this age of Big Data has evolved into are these:
- How best to handle and work with data at super-mega scale?
- How can one best decipher and understand that high-volume data and, in turn, convert it into a competitive advantage?
Living as we do today, well into the age of Big Data, it sure helps to have some guidance from those at the frontline of the endeavors that revolve around these two questions. Online resources are indispensable and fantastic in their own right, especially for cutting-edge updates. But what about times when you simply want to sit down and really absorb the wisdom of our Big Data sages (the underlying conceptual infrastructure that powers the Big Data machinery) in a more sustained and methodical way?
- Advanced Analytics with Spark: Patterns for Learning from Data at Scale (O'Reilly), by Josh Wills, Sandy Ryza, et al.
- Learning Spark: Lightning-Fast Big Data Analysis (O'Reilly), by Holden Karau, et al.
- Hadoop: The Definitive Guide, 4th Edition (O'Reilly), by Tom White
- Hadoop in Practice, 2nd Edition (Manning), by Alex Holmes
- Professional Hadoop Solutions (Wrox), by Boris Lublinsky et al.
- Data Scientists at Work (Apress), by Sebastian Gutierrez
- MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems (O'Reilly), by Donald Miner and Adam Shook
- Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data (Wiley), by Byron Ellis
- Big Data: Principles and Best Practices of Scalable Realtime Data Systems (Manning), by Nathan Marz
And lest anyone be dying with curiosity about the origins of the phantasmagorical rendition of "The Wall" in the pic above... Insights emerging from the brick-in-the-wall metaphor... And I never metaphor I didn't like.
1. Advanced Analytics with Spark: Patterns for Learning from Data at Scale (O'Reilly), by Josh Wills, Sandy Ryza, et al.
If you're looking for the best-written and most exciting Big Data book of the year, look no further than this one: Advanced Analytics with Spark: Patterns for Learning from Data at Scale.
You get to understand how this open source project makes distributed programming eminently accessible to data scientists. It goes on to show how Spark, while maintaining MapReduce's linear scalability and fault tolerance, extends it in three important ways:
- Its engine can execute a more general directed acyclic graph (DAG) of operators.
- It complements this capability with a rich set of transformations.
- It extends its predecessors with in-memory processing. Its Resilient Distributed Dataset (RDD) abstraction enables developers to materialize any point in a processing pipeline into memory across the cluster.
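To get a feel for the lazy-pipeline and in-memory ideas in the three points above, here is a toy sketch in plain Python. It is emphatically not Spark's API: the `LazyDataset` class and its `map`, `filter`, `cache`, and `collect` methods are hypothetical names I picked to echo the concepts, and everything runs on a single machine with no fault tolerance. The point is only that transformations build up a chain of operators, nothing runs until a result is requested, and a cached intermediate step is computed once and then reused.

```python
from typing import Callable, Iterable, List, Optional


class LazyDataset:
    """Toy single-machine stand-in for the idea behind an RDD (hypothetical class)."""

    def __init__(self, source: Callable[[], Iterable], name: str = "source"):
        self._source = source                    # function that (re)computes this step
        self._cached: Optional[List] = None      # materialized data, if cached
        self._use_cache = False
        self.name = name
        self.compute_count = 0                   # how many times this step was computed

    def map(self, fn: Callable) -> "LazyDataset":
        # Nothing is computed here; we only record a new step in the chain.
        return LazyDataset(lambda: (fn(x) for x in self._iterate()), f"map({self.name})")

    def filter(self, pred: Callable) -> "LazyDataset":
        return LazyDataset(lambda: (x for x in self._iterate() if pred(x)), f"filter({self.name})")

    def cache(self) -> "LazyDataset":
        # Ask for this step to be kept in memory the first time it is computed.
        self._use_cache = True
        return self

    def _iterate(self):
        if self._use_cache:
            if self._cached is None:
                self.compute_count += 1
                self._cached = list(self._source())
            return iter(self._cached)
        self.compute_count += 1
        return iter(self._source())

    def collect(self) -> List:
        # The "action" that finally triggers the computation.
        return list(self._iterate())


numbers = LazyDataset(lambda: range(10))
evens = numbers.filter(lambda n: n % 2 == 0).cache()

print(evens.map(lambda n: n * n).collect())   # [0, 4, 16, 36, 64]
print(evens.map(lambda n: n * 2).collect())   # [0, 4, 8, 12, 16]
print(evens.compute_count)                    # 1: the filter ran once and was reused
```

Without the call to `cache()`, the second `collect()` would walk the whole chain again from the source, which is roughly the cost that materializing a pipeline stage in memory is meant to avoid.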
One particularly telling remark that the authors make is this: "...With respect to the pertinence of munging and ETL, Spark strives to be something closer to the Python of big data than the Matlab of big data". Spark's in-memory caching makes it equally ideal for programming in the large and in the small. And what's possibly most exciting is how Spark bridges the gap between exploratory analytics and production (i.e. operational) analytics! Spark's tight integration with the Hadoop ecosystem also makes it an eminently accessible and attractive framework.
If the preceding themes strike a chord with you, and if you're looking for deep dives to get a feel for using Spark to do complex analytics on massive data sets, look no further than this book. It covers the entire pipeline in an exceptionally clear and engaging style. A diverse set of domains is covered in no fewer than nine case studies, each with a chapter of its own. These chapters make up the bulk of this stellar book.
IMHO, Advanced Analytics with Spark: Patterns for Learning from Data at Scale
Oh, and the most fun and standout chapters in this altogether stellar book are those on
- Geospatial and Temporal Data Analysis on the New York City Taxi Trip Data
- Understanding Wikipedia with Latent Semantic Analysis
- Analyzing Co-occurrence Networks with GraphX
- Chapter 1. Analyzing Big Data
- Chapter 2. Introduction to Data Analysis with Scala and Spark
- Chapter 3. Recommending Music and the Audioscrobbler Data Set
- Chapter 4. Predicting Forest Cover with Decision Trees
- Chapter 5. Anomaly Detection in Network Traffic with K-means Clustering
- Chapter 6. Understanding Wikipedia with Latent Semantic Analysis
- Chapter 7. Analyzing Co-occurrence Networks with GraphX
- Chapter 8. Geospatial and Temporal Data Analysis on the New York City Taxi Trip Data
- Chapter 9. Estimating Financial Risk through Monte Carlo Simulation
- Chapter 10. Analyzing Genomics Data and the BDG Project
- Chapter 11. Analyzing Neuroimaging Data with PySpark and Thunder
2. Learning Spark: Lightning-Fast Big Data Analysis (O'Reilly), by Holden Karau, et al.
The book Learning Spark: Lightning-Fast Big Data Analysis
- Spark brings value by its ease-of-use (fire up Spark on your laptop, and start using its high-level API, which enables you to focus on your domain-specific computations).
- Spark enables interactive use for tackling complex algorithms.
- And you get in Spark a general-purpose computation engine (think here of combining multiple types of computations, such as ML, text processing, SQL querying, etc.) for work that would previously have necessitated a bunch of different engines.
For us software types, the following observations by the authors are worth bringing out so you can best decide whether the targeted value that this book offers is for you:
This book targets data scientists and engineers. We chose these two groups because they have the most to gain from using Spark to expand the scope of problems they can solve. Spark's rich collection of data-focused libraries (like MLlib) makes it easy for data scientists to go beyond problems that fit on a single machine while using their statistical background. Engineers, meanwhile, will learn how to write general-purpose distributed programs in Spark and operate production applications. Engineers and data scientists will both learn different details from this book, but will both be able to apply Spark to solve large distributed problems in their respective fields.
The second group this book targets is software engineers who have some experience with Java, Python, or another programming language. If you are an engineer, we hope that this book will show you how to set up a Spark cluster, use the Spark shell, and write Spark applications to solve parallel processing problems (italicized by me for emphasis). If you are familiar with Hadoop, you have a bit of a head start on figuring out how to interact with HDFS and how to manage a cluster, but either way, we will cover basic distributed execution concepts.
The full chapter devoted to Spark's core abstraction for doing data-intensive computations, the resilient distributed dataset (aka RDD), is a standout. The other standout chapter is the one that gets into the nitty-gritty of configuring a Spark application, and which also provides an overview of tuning and debugging Spark workloads in production.
Learning Spark: Lightning-Fast Big Data Analysis
3. Hadoop: The Definitive Guide, 4th Edition (O'Reilly), by Tom White
Let's segue from Spark to Hadoop land now, beginning with a remarkable book: Hadoop: The Definitive Guide, 4th Edition.
When reading books, we've all gotten used to doing the inevitable Google searches periodically (to compensate for the equally inevitable gaps in the narratives of any given technology book), but this book is mercifully free of the aforesaid read-some, search-online-some, resume-reading syndrome, yay!
So if you're ready to drink deep at the Hadoop pool, you simply can't go wrong with this book. Allow me to elaborate: In the Preface, the author elegantly traces the genesis of this very point (sparkling clear prose and unambiguous readability) to the works of the renowned mathematics writer, Martin Gardner, and adds:
Its inner workings are complex, resting as they do on a mixture of distributed systems theory, practical engineering, and common sense. And to the uninitiated, Hadoop can appear alien.
But it doesn't need to be like this. Stripped to its core, the tools that Hadoop provides for working with big data are simple. If there's a common theme, it is about raising the level of abstraction, to create building blocks for programmers who have lots of data to store and analyze, and who don't have the time, the skill, or the inclination to become distributed systems experts to build the infrastructure to handle it.
You immediately get the sense that this book is a no-nonsense, friendly, and engaging guide to Hadoop and its ecosystem; rest assured that you'll finish this book without the author letting you down one bit. In fact, elaborating on this very theme, the first chapter gives a pleasant tour (a lay of the land, if you will) of the entirety of Hadoop: The Definitive Guide, 4th Edition:
The book is divided into five main parts: Parts I to III are about core Hadoop, Part IV covers related projects in the Hadoop ecosystem, and Part V contains Hadoop case studies. You can read the book from cover to cover, but there are alternative pathways through the book that allow you to skip chapters that aren't needed to read later ones.
Further along, a bird's-eye view is provided for each of the chapters in the five main parts that make up this book. This summary is accompanied by a lovely flowchart of the paths that can be taken through the contents. Thoughtful design, with the reader in mind, is the hallmark of the entire book. As a reader, I felt secure in the knowledge of learning Hadoop from a master of the art. In this regard, the following remarks (in the Foreword) by Doug Cutting (who, along with Mike Cafarella, created Hadoop in 2005) are quite telling, and reflect just how friendly and engaging a guide this book is to all things Hadoop:
Tom is now a respected senior member of the Hadoop developer community. Though he's an expert in many technical corners of the project, his specialty is making Hadoop easier to use and understand.
Given this, I was very pleased when I learned that Tom intended to write a book about Hadoop. Who could be better qualified? Now you have the opportunity to learn about Hadoop from a master, not only of the technology, but also of common sense and plain talk.
4. Hadoop in Practice, 2nd Edition (Manning), by Alex Holmes
This next title is an excellent second book on Hadoop: Hadoop in Practice, 2nd Edition.
In the About this Book section, after mentioning how, with its distributed storage and compute capabilities, Hadoop is fundamentally an enabling technology for working with huge datasets, the author goes on to identify the target audience of this book:
This hands-on book targets users who have some practical experience with Hadoop and understand the basic concepts of MapReduce and HDFS. Manning's Hadoop in Action by Chuck Lam contains the necessary prerequisites to understand and apply the techniques covered in this book.
Many techniques in this book are Java-based, which means readers are expected to possess an intermediate-level knowledge of Java. An excellent text for all levels of Java users is Effective Java, Second Edition by Joshua Bloch (Addison-Wesley).
One thing I really, really like about this book is the abundance of useful diagrams and code snippets, all of which are profusely annotated with thoughtful comments! I would say that the barrier to entry for this book is not all that high (hastening to add that this is most emphatically not the same as saying that the contents are trifling), so if you're determined, don't shy away from tackling this book (along with, importantly, having an introductory book by your side, such as the fine Hadoop: The Definitive Guide, by Tom White, which is also reviewed above).
Very briefly, here is a rundown of the topics covered in this book:
1. Background and fundamentals: Chapter 1. Hadoop in a heartbeat, Chapter 2. Introduction to YARN.
2. Data logistics: Chapter 3. Data serialization: working with text and beyond, Chapter 4. Organizing and optimizing data in HDFS, Chapter 5. Moving data into and out of Hadoop.
3. Big data patterns: Chapter 6. Applying MapReduce patterns to big data, Chapter 7. Utilizing data structures and algorithms at scale, Chapter 8. Tuning, debugging, and testing.
4. Beyond MapReduce: Chapter 9. SQL on Hadoop, Chapter 10. Writing a YARN application.
5. Professional Hadoop Solutions (Wrox), by Boris Lublinsky et al.
Once comfortable with the Hadoop paradigm, you'll be able to appreciate the gem of a book we've got in this next title: Professional Hadoop Solutions.
In my mind, the key to understanding the value in this book lies in appreciating the following observation, which the authors make in the introductory chapter
Although many publications emphasize the fact that Hadoop hides infrastructure complexity from business developers, you should understand that Hadoop extensibility is not publicized enough... Hadoop's implementation was designed in a way that enables developers to easily and seamlessly incorporate new functionality into Hadoop's execution.
A significant portion of this book is dedicated to describing approaches to such customizations, as well as practical implementations. These are all based on the results of work performed by the authors.
They go on to explain cogently why great emphasis is placed on MapReduce code throughout the book. So if you approach this book with the mindset that the narratives will directly revolve around MapReduce, you'll glean quite a bit of value out of it. Their explanation of the MapReduce paradigm, as well as of its nuts-and-bolts mechanisms, really is top notch.
The standout chapters are the following:
- Processing Your Data with MapReduce
- Customizing MapReduce Execution
- Hadoop Security
- Building Enterprise Security Solutions for Hadoop Implementations
The Appendix toward the end of Professional Hadoop Solutions is especially rich and useful. Overall, I'm glad to have found this book!
6. Data Scientists at Work (Apress), by Sebastian Gutierrez
And now let's segue from Hadoop to a foray into Data Science kingdom proper.
But first a fair warning is in order about this next book: Once you start reading it, you're going to have a terribly hard time putting it down or, for that matter, doing anything else before you've read it all! Such was my experience of reading (and re-reading) this page-turner of a book: Data Scientists at Work.
Consider this... We have these marvelous frameworks (Spark, Hadoop, Storm, and others), but surely they were not created in some ethereal vacuum. Right, these frameworks were of course created in the service of genuine business needs, and to solve pressing problems that folks were facing. So if you're looking for the scoop on this nexus (i.e. the potent symbiosis between the aims of Data Science and what Big Data has to offer), this is the book for you.
The corpus of this book is made up of in-depth interviews of 16 gifted data scientists. What makes these interviews incredibly engaging is the spectacularly good job done by the interviewer (the author of this book), Sebastian Gutierrez. His academic training is from MIT, where he earned a BS in Mathematics, and he is a data entrepreneur who has founded three data-related companies.
The pointed and evocative questions asked throughout the book could only have come from someone who knows the pragmatics of the Data Science field inside-out! And therein lies the immense value of this book: Detailed answers by 16 top data scientists as they shed light on the human side of data science, their thoughts on how this field is evolving, where it's headed, plus plenty of straight-from-the-trenches stories about their work.
While the quality of the interviews is uniformly excellent, the standout interviews in my mind are the ones with these data scientists who are doing stellar work
To give you a flavor of the interviews, each of which is given its own chapter, here is, ever so briefly, something from Claudia, who is the Chief Scientist at Dstillery. She teaches a high-level overview course on data mining for the NYU Stern MBA program to, in her own words, "...give people a good understanding of what the opportunities are and how to manage them instead of really teaching them how to do it". She has taught at NYU, MIT, Wharton, and Columbia. In response to the interview question in the book ("What about this work is interesting and exciting for you?"), Claudia noted:
I have always been fascinated by math puzzles and puzzles in general. The work that I do is a real-world version of puzzles that life just presents. Data is the footprint of real life in some form, and so it is always interesting. It is like a detective game to figure out what is really going on. Most of my time I am debugging data with a sense of finding out what is wrong with it or where it disagrees with my assumption of what it was supposed to have meant. So these are games that I am just inherently getting really excited about.
My interviewing method was designed to ask open-ended questions so that the personalities and spontaneous thought processes of each interviewee would shine through clearly and accurately. My aim was to get at the heart of how they came to be data scientists, what they love about the field, what their daily work lives entail, how they built their careers, how they developed their skills, what advice they have for people looking to become data scientists, and what they think the future of the field holds.
Some 20 years ago, when I was finishing grad school (at that time, I earned an MS degree in electrical engineering from Texas A&M University), we didn't call work such as my dissertation (Noise-tolerant Software Method for Traffic Sign Recognition) Data Science. But in several ways, while I was reading the fine interviews in this book, I sure was reminded of the algorithms I worked out back then: Various AI programming techniques (neural networks primarily, such as the Back-propagation Neural Network and the Adaptive Resonance Theory model, aka ART2). Good stuff, and enough reminiscing, for that matter.
So Data Scientists at Work
7. MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems (O'Reilly), by Donald Miner and Adam Shook
Segueing right back to Hadoop now, the title of the next book is decidedly open-ended: MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems.
The authors are clearly experts in the Hadoop ecosystem, and what they've put together is more than what you'll find in the endearing O'Reilly "cookbook" series. Thus, they don't call out specific problems and accompanying solutions. Instead, they share the lessons that they have learned along the way to becoming experts in the Hadoop ecosystem. Note, too, that this book is mostly about the analytics side of Hadoop and MapReduce.
And they assume that you're already familiar with how Hadoop and MapReduce work, so they don't dive into the details of the APIs which they use in this book. Those topics have already been covered thoroughly in other books, and they focus on analytics. In their own words:
The motivation for us to write this book was to fill a missing gap we saw in a lot of new MapReduce developers. They had learned how to use the system, got comfortable with writing MapReduce, but were lacking the experience to understand how to do things right or well. The intent of this book is to prevent you from having to make some of your own mistakes by educating you on how experts have figured out how to solve problems with MapReduce. So, in some ways, this book can be viewed as an intermediate or advanced MapReduce developer resource, but we think early beginners and gurus will find use out of it.
One thing I appreciated a lot was the way the authors answer the question, "So why should we use Java MapReduce in Hadoop at all when we have options like Pig and Hive?" They point out two core reasons for spending time explaining how to implement something in hundreds of lines of code when the same can be accomplished in a couple of lines with, say, Pig and Hive. In their own words:
First, there is conceptual value in understanding the lower-level workings of a system like MapReduce. The developer that understands how Pig actually performs a reduce-side join will make smarter decisions. Using Pig or Hive without understanding MapReduce can lead to some dangerous situations....
Second, Pig and Hive aren't there yet in terms of full functionality and maturity (as of 2012). It is obvious that they haven't reached their full potential yet. Right now, they simply can't tackle all of the problems in the ways that Java MapReduce can.
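Since the authors single out the reduce-side join, here is a minimal single-process sketch of the pattern in Python rather than Java MapReduce. It is not taken from the book, nor from how Pig implements it; the function names and the sample `users` and `orders` records are made up for illustration. The mappers tag each record with its source and emit it under the join key, a stand-in for the shuffle groups values by key, and the reducer pairs up the two sides (an inner join).

```python
from collections import defaultdict
from itertools import product


def map_users(record):
    # Emit (join key, tagged value); the "U" tag marks the users side.
    user_id, name = record
    yield user_id, ("U", name)


def map_orders(record):
    # The "O" tag marks the orders side.
    order_id, user_id, amount = record
    yield user_id, ("O", (order_id, amount))


def shuffle(pairs):
    # Stand-in for the framework's shuffle: group all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    # Separate the two sides by tag, then emit every matching combination.
    users = [v for tag, v in values if tag == "U"]
    orders = [v for tag, v in values if tag == "O"]
    for name, (order_id, amount) in product(users, orders):
        yield key, name, order_id, amount


def reduce_side_join(users, orders):
    emitted = []
    for u in users:
        emitted.extend(map_users(u))
    for o in orders:
        emitted.extend(map_orders(o))
    joined = []
    for key, values in sorted(shuffle(emitted).items()):
        joined.extend(reduce_join(key, values))
    return joined


users = [(1, "ada"), (2, "grace"), (3, "linus")]
orders = [(100, 1, 25.0), (101, 2, 40.0), (102, 1, 15.0)]
print(reduce_side_join(users, orders))
# [(1, 'ada', 100, 25.0), (1, 'ada', 102, 15.0), (2, 'grace', 101, 40.0)]
```

Notice that every record from both inputs passes through the shuffle before anything is joined. On a real cluster that means moving all of the data over the network, which is the sort of cost that is easy to overlook when a one-line join in a higher-level tool hides it.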
Remaining mindful of the fact that the title of this book is admittedly open-ended, I mention here the table of contents to give you a flavor of the topics covered
- Chapter 1. Design Patterns and MapReduce
- Chapter 2. Summarization Patterns
- Chapter 3. Filtering Patterns
- Chapter 4. Data Organization Patterns
- Chapter 5. Join Patterns
- Chapter 6. Metapatterns
- Chapter 7. Input and Output Patterns
- Chapter 8. Final Thoughts and the Future of Design Patterns
8. Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data (Wiley), by Byron Ellis
Finally, let's segue to the land of real-time, streaming data.
This next book is impeccably written in an eminently thoughtful style: Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data.
No doubt, with enough determination and time, one can do online searches and cobble together a solution to handle real-time, high-volume mega data. But that begs the question, and I'm not questioning anyone's tenacity here: Is that really the ideal strategy? And that's where the book shines. What makes it stand out is the care and thought that have clearly been poured into making this book a one-stop resource for crafting end-to-end solutions for effectively grappling with real-time, high-volume mega data.
Much as I alluded to above, this book is impeccably written. The author has clearly honed his writing skills, quite likely while preparing his dissertation for the Ph.D. that he earned from Harvard University.
Clearly written books are a heaven-send, and this superb book is one. In that vein, the author notes with razor-sharp precision the aim of this book
The goal of this book is to allow a fairly broad range of potential users and implementers in an organization to gain comfort with the complete stack of applications. When real-time projects reach a certain point, they should be agile and adaptable systems that can be easily modified, which requires that the users have a fair understanding of the stack as a whole in addition to their own areas of focus. "Real time" applies as much to the development of new analyses as it does to the data itself. Any number of well-meaning projects have failed because they took so long to implement that the people who requested the project have either moved on to other things or simply forgotten why they wanted the data in the first place. By making the projects agile and incremental, this can be avoided as much as possible.
The author weaves into the narratives a lot of pragmatic advice; he has clearly been in the development trenches and done it all. As with the prior book, I mention here the table of contents to give you a flavor of the topics covered in Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data
Introduction (Overview and Organization of This Book; Who Should Read This Book; Tools You Will Need; What's on the Website; Time to Dive In)
Part I: Streaming Analytics Architecture
Chapter 1: Introduction to Streaming Data (Sources of Streaming Data; Why Streaming Data Is Different; Infrastructures and Algorithms; Conclusion)
Chapter 2: Designing Real-Time Streaming Architectures (Real-Time Architecture Components; Features of a Real-Time Architecture; Languages for Real-Time Programming; A Real-Time Architecture Checklist; Conclusion)
Chapter 3: Service Configuration and Coordination (Motivation for Configuration and Coordination Systems; Maintaining Distributed State; Apache ZooKeeper; Conclusion)
Chapter 4: Data-Flow Management in Streaming Analysis (Distributed Data Flows; Apache Kafka: High-Throughput Distributed Messaging; Apache Flume: Distributed Log Collection; Conclusion)
Chapter 5: Processing Streaming Data (Distributed Streaming Data Processing; Processing Data with Storm; Processing Data with Samza; Conclusion)
Chapter 6: Storing Streaming Data (Consistent Hashing; "NoSQL" Storage Systems; Other Storage Technologies; Choosing a Technology; Warehousing; Conclusion)
Part II: Analysis and Visualization
Chapter 7: Delivering Streaming Metrics (Streaming Web Applications; Visualizing Data; Mobile Streaming Applications; Conclusion)
Chapter 8: Exact Aggregation and Delivery (Timed Counting and Summation; Multi-Resolution Time-Series Aggregation; Stochastic Optimization; Delivering Time-Series Data; Conclusion)
Chapter 9: Statistical Approximation of Streaming Data (Numerical Libraries; Probabilities and Distributions; Working with Distributions; Random Number Generation; Sampling Procedures; Conclusion)
Chapter 10: Approximating Streaming Data with Sketching (Registers and Hash Functions; Working with Sets; The Bloom Filter; Distinct Value Sketches; The Count-Min Sketch; Other Applications; Conclusion)
Chapter 11: Beyond Aggregation (Models for Real-Time Data; Forecasting with Models; Monitoring; Real-Time Optimization; Conclusion)
The hope is that the reader of this book would feel confident taking a proof-of-concept streaming data project in their organization from start to finish with the intent to release it into a production environment.
All this makes Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data
9. Big Data: Principles and Best Practices of Scalable Realtime Data Systems (Manning), by Nathan Marz
Last, but certainly not the least (continuing now in the spirit of frameworks that enable us developers to tackle real-time, streaming data) is a book by Nathan Marz: Big Data: Principles and Best Practices of Scalable Realtime Data Systems.
This book dives deep into the concepts underlying the Lambda Architecture (which is what the author dubbed the approach that he formalized during his years working at the startup BackType) along with, importantly, many illustrative examples, which are nicely supplemented by code snippets. The author puts it succinctly when he notes:
This book is the result of my desire to spread the knowledge of the Lambda Architecture and how it avoids the complexities of traditional architectures. It is the book I wish I had when I started working with Big Data. I hope you treat this book as a journey, a journey to challenge what you thought you knew about data systems, and to discover that working with Big Data can be elegant, simple, and fun.
As an aside, and confessing here my fondness for Clojure, the Lisp that runs on the JVM, I couldn't help but resonate with the following sentiments echoed by Nathan Marz in the Acknowledgments section of Big Data: Principles and Best Practices of Scalable Realtime Data Systems:
Rich Hickey has been one of my biggest inspirations during my programming career. Clojure is the best language I have ever used, and I've become a better programmer having learned it. I appreciate its practicality and focus on simplicity. Rich's philosophy on state and complexity in programming has influenced me deeply.
In sum, this is a worthwhile book, nicely structured into theory and illustration chapters.
In the end, and as I mentioned at the outset, I invite your comments. Having now read my brief take on each of the books above...
- Do you find that your experience of reading any of these books was different?
- Perhaps some qualities that I did not cover are the ones that you found the most helpful as you learned Big Data and its ecosystem.
- Did I omit any of your favorite Big Data book(s)?
- I've covered only a partial list of the Big Data books that I've read, limited as you can imagine I am by the time available...
As with my prior post, which contains a set of book vignettes (those pertaining to the finest and most useful books on Scala in print), my aim here, too, in sharing these brief reviews remains the same, albeit on a different subject (Big Data) this time: I hope these vignettes will help you in selecting your resources well, and help you in your journey to grokking the Big Data solution space!
Bon voyage, and I leave you with an obligatory photo of a section of one of my bookshelves, one that's, um, rather biased toward Big Data material in a statistically significant way, eh.