Book details
Streaming Systems (流式系统)

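About the book

Basic information

Title: Streaming Systems (流式系统)

List price: 128 yuan

Authors: Tyler Akidau, Slava Chernyak, Reuven Lax

Publisher: Southeast University Press (东南大学出版社)

Publication date: 2019-06-01

ISBN: 9787564183677

Word count:

Pages:

Edition:

Binding: Paperback

Format: 16开 (16mo)

Item weight: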

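Summary

In a traditional data processing pipeline, data is always collected first and then loaded into a database. When people need it, they query the database to get answers or to do related processing. This seems reasonable, but the results are restrictive, especially for certain concrete problems in real-time search applications, where MapReduce-style offline processing does not solve the problem well. This leads to a new data computing architecture: stream computing. It can analyze large-scale flowing data in real time as the data continuously changes, capture potentially useful information, and send the results on to the next compute node. Streaming Systems explains the principles of stream computing.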

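About the authors

Tyler Akidau is a software engineer at Google, where he is the technical lead for the Data Processing Languages & Systems group. He is also a founding member of the Apache Beam PMC.
Slava Chernyak is a software engineer at Google. He spent six years working on Google's internal large-scale streaming data processing systems.
Reuven Lax is a software engineer at Google who has spent the past ten years helping to shape Google's data processing and analysis strategy. He is also a member of the Apache Beam PMC.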

Table of contents

Preface Or: What Are You Getting Yourself Into Here?
Part Ⅰ.The Beam Model
1.Streaming 101
Terminology: What Is Streaming?
On the Greatly Exaggerated Limitations of Streaming
Event Time Versus Processing Time
Data Processing Patterns
Bounded Data
Unbounded Data: Batch
Unbounded Data: Streaming
Summary
2.The What, Where, When, and How of Data Processing
Roadmap
Batch Foundations: What and Where
What: Transformations
Where: Windowing
Going Streaming: When and How
When: The Wonderful Thing About Triggers Is Triggers Are Wonderful Things!
When: Watermarks
When: Early/On-Time/Late Triggers FTW!
When: Allowed Lateness (i.e., Garbage Collection)
How: Accumulation
Summary
3.Watermarks
Definition
Source Watermark Creation
Perfect Watermark Creation
Heuristic Watermark Creation
Watermark Propagation
Understanding Watermark Propagation
Watermark Propagation and Output Timestamps
The Tricky Case of Overlapping Windows
Percentile Watermarks
Processing-Time Watermarks
Case Studies
Case Study: Watermarks in Google Cloud Dataflow
Case Study: Watermarks in Apache Flink
Case Study: Source Watermarks for Google Cloud Pub/Sub
Summary
4.Advanced Windowing
When/Where: Processing-Time Windows
Event-Time Windowing
Processing-Time Windowing via Triggers
Processing-Time Windowing via Ingress Time
Where: Session Windows
Where: Custom Windowing
Variations on Fixed Windows
Variations on Session Windows
One Size Does Not Fit All
Summary
5.Exactly-Once and Side Effects
Why Exactly Once Matters
Accuracy Versus Completeness
Side Effects
Problem Definition
Ensuring Exactly Once in Shuffle
Addressing Determinism
Performance
Graph Optimization
Bloom Filters
Garbage Collection
Exactly Once in Sources
Exactly Once in Sinks
Use Cases
Example Source: Cloud Pub/Sub
Example Sink: Files
Example Sink: Google BigQuery
Other Systems
Apache Spark Streaming
Apache Flink
Summary
Part Ⅱ.Streams and Tables
6.Streams and Tables
Stream-and-Table Basics Or: a Special Theory of Stream and Table Relativity
Toward a General Theory of Stream and Table Relativity
Batch Processing Versus Streams and Tables
A Streams and Tables Analysis of MapReduce
Reconciling with Batch Processing
What, Where, When, and How in a Streams and Tables World
What: Transformations
Where: Windowing
When: Triggers
How: Accumulation
A Holistic View of Streams and Tables in the Beam Model
A General Theory of Stream and Table Relativity
Summary
7.The Practicalities of Persistent State
Motivation
The Inevitability of Failure
Correctness and Efficiency
Implicit State
Raw Grouping
Incremental Combining
Generalized State
Case Study: Conversion Attribution
Conversion Attribution with Apache Beam
Summary
8.Streaming SQL
What Is Streaming SQL?
Relational Algebra
Time-Varying Relations
Streams and Tables
Looking Backward: Stream and Table Biases
The Beam Model: A Stream-Biased Approach
The SQL Model: A Table-Biased Approach
Looking Forward: Toward Robust Streaming SQL
Stream and Table Selection
Temporal Operators
Summary
9.Streaming Joins
All Your Joins Are Belong to Streaming
Unwindowed Joins
FULL OUTER
LEFT OUTER
RIGHT OUTER
INNER
ANTI
SEMI
Windowed Joins
Fixed Windows
Temporal Validity
Summary
10.The Evolution of Large-Scale Data Processing
MapReduce
Hadoop
Flume
Storm
Spark
MillWheel
Kafka
Cloud Dataflow
Flink
Beam
Summary
Index

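Editor's recommendation

Streaming data is a big deal in big data these days. As more and more businesses try to tame the massive unbounded datasets that are now everywhere, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way.
Expanded from Tyler Akidau's popular blog posts "Streaming 101" and "Streaming 102", this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You will also dive deep into watermarks and exactly-once processing with co-authors Slava Chernyak and Reuven Lax.
You will learn:
How streaming and batch data processing patterns compare
The core principles and concepts behind robust out-of-order data processing
How watermarks track progress and completeness in unbounded datasets
How exactly-once data processing techniques ensure correctness
How the concepts of streams and tables form the foundations of both batch and streaming data processing
The practical motivations behind a powerful persistent state mechanism, demonstrated with a real-world example
How time-varying relations connect stream processing with the familiar world of SQL and relational algebra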
