Part I. Gentle Overview of Big Data and Spark
1. What Is Apache Spark?
    Apache Spark's Philosophy
    Context: The Big Data Problem
    History of Spark
    The Present and Future of Spark
    Running Spark
    Downloading Spark Locally
    Launching Spark's Interactive Consoles
    Running Spark in the Cloud
    Data Used in This Book
2. A Gentle Introduction to Spark
    Spark's Basic Architecture
    Spark Applications
    Spark's Language APIs
    Spark's APIs
    Starting Spark
    The SparkSession
    DataFrames
    Partitions
    Transformations
    Lazy Evaluation
    Actions
    Spark UI
    An End-to-End Example
    DataFrames and SQL
    Conclusion
3. A Tour of Spark's Toolset
    Running Production Applications
    Datasets: Type-Safe Structured APIs
    Structured Streaming
    Machine Learning and Advanced Analytics
    Lower-Level APIs
    SparkR
    Spark's Ecosystem and Packages
    Conclusion
Part II. Structured APIs: DataFrames, SQL, and Datasets
4. Structured API Overview
    DataFrames and Datasets
    Schemas
    Overview of Structured Spark Types
    DataFrames Versus Datasets
    Columns
    Rows
    Spark Types
    Overview of Structured API Execution
    Logical Planning
    Physical Planning
    Execution
    Conclusion
5. Basic Structured Operations
    Schemas
    Columns and Expressions
    Columns
    Expressions
    Records and Rows
    Creating Rows
    DataFrame Transformations
    Creating DataFrames
    select and selectExpr
    Converting to Spark Types (Literals)
    Adding Columns
    ……
6. Working with Different Types of Data
7. Aggregations
8. Joins
9. Data Sources
10. Spark SQL
11. Datasets
Part IV. Production Applications
15. How Spark Runs on a Cluster
16. Developing Spark Applications
17. Deploying Spark
18. Monitoring and Debugging
19. Performance Tuning
Part V. Streaming
20. Stream Processing Fundamentals
21. Structured Streaming Basics
22. Event-Time and Stateful Processing
23. Structured Streaming in Production
Part VI. Advanced Analytics and Machine Learning
24. Advanced Analytics and Machine Learning Overview
25. Preprocessing and Feature Engineering
26. Classification
27. Regression
28. Recommendation
29. Unsupervised Learning
30. Graph Analytics
31. Deep Learning
Part VII. Ecosystem
32. Language Specifics: Python (PySpark) and R (SparkR and sparklyr)
33. Ecosystem and Community