Airflow

Description

Airflow terms and concepts.
Zdeněk Šimůnek
Flashcards by Zdeněk Šimůnek, updated more than 1 year ago
Created by Zdeněk Šimůnek about 4 years ago

Resource summary

Questions and answers
DAG: Directed Acyclic Graph. A collection of tasks, their dependencies, and settings, defined as code in a .py script.
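The "directed acyclic" part of the definition can be illustrated without Airflow itself. The following is a minimal sketch, assuming a hypothetical plain-dict representation (task id mapped to its upstream task ids) rather than Airflow's DAG class; it uses the standard library's graphlib (Python 3.9+) to derive an execution order and to reject a graph that contains a cycle.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical toy representation (not Airflow's DAG class):
# each task id maps to the set of upstream task ids it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def run_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raise CycleError if the graph has a cycle."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)  # ['extract', 'transform', 'load', 'notify']

# A cycle breaks the "acyclic" requirement, so no execution order exists.
cycle_rejected = False
try:
    run_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    cycle_rejected = True
print(cycle_rejected)  # True
```

Because every task must wait for its upstream tasks, a cycle would mean a task waits on itself, which is why the graph has to be acyclic.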
XCom: Short for "cross-communication"; a feature that lets tasks exchange small pieces of data with each other.
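To make the push/pull idea concrete, here is a minimal sketch using a hypothetical in-memory store. Airflow itself keeps XComs in its metadata database; the class and function names below are illustrative assumptions, and only the "return_value" key name mirrors the key Airflow uses for a task's return value.

```python
from typing import Any, Optional

# Hypothetical in-memory stand-in for XCom. Airflow itself stores XComs in its
# metadata database; only the "return_value" key name mirrors Airflow.
RETURN_KEY = "return_value"


class ToyXComStore:
    def __init__(self) -> None:
        self._values: dict[tuple[str, str, str], Any] = {}

    def push(self, run_id: str, task_id: str, key: str, value: Any) -> None:
        self._values[(run_id, task_id, key)] = value

    def pull(self, run_id: str, task_id: str, key: str = RETURN_KEY) -> Optional[Any]:
        return self._values.get((run_id, task_id, key))


store = ToyXComStore()


def extract(run_id: str) -> None:
    # Upstream task publishes a small value for downstream tasks.
    store.push(run_id, "extract", RETURN_KEY, [1, 2, 3])


def transform(run_id: str) -> int:
    # Downstream task reads it back by (run, task, key).
    rows = store.pull(run_id, "extract")
    return sum(rows)


extract("run_1")
total = transform("run_1")
print(total)  # 6
print(store.pull("run_2", "extract"))  # None: values are scoped to a run
```

Keying by run as well as task keeps values from different runs of the same DAG apart.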
dags_folder: The folder where Airflow pipelines live. This path must be absolute. Airflow looks in your DAGS_FOLDER for modules that contain DAG objects in their global namespace and adds the objects it finds to the DagBag.
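The "global namespace" rule has a practical consequence: a DAG object created only inside a function is not picked up. A minimal sketch, assuming a hypothetical placeholder DAG class and a toy collector that executes a source string (the real DagBag parses files from disk):

```python
import types


class DAG:
    """Hypothetical placeholder for a DAG object (not Airflow's class)."""

    def __init__(self, dag_id: str) -> None:
        self.dag_id = dag_id


def collect_dags(module_source: str, module_name: str) -> dict[str, DAG]:
    """Toy DagBag: run a module and keep DAG objects found in its global namespace."""
    module = types.ModuleType(module_name)
    module.DAG = DAG  # make the placeholder class available to the module code
    exec(module_source, module.__dict__)
    return {obj.dag_id: obj for obj in vars(module).values() if isinstance(obj, DAG)}


source = '''
visible = DAG("visible_dag")

def build():
    hidden = DAG("hidden_dag")  # local variable: not in the module's global namespace
    return hidden

build()
'''

found = collect_dags(source, "example_dag_file")
print(sorted(found))  # ['visible_dag']
```

Assigning the result of build() to a module-level name would make "hidden_dag" discoverable as well.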
DAG Run: An instance of a DAG, containing task instances that run for a specific execution_date. DAG runs are created by the Airflow scheduler or by an external trigger.
Task: A unit of work within a DAG, represented as a node in the DAG graph and written in Python. Each task is an instance of an Operator.
Operator: An operator describes a single task in a workflow.
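The relationship between Operator and Task can be sketched in plain Python: the operator is a class describing a kind of work, and each instantiation of it is a task. The class names below are hypothetical; the execute(context) method and the python_callable parameter only loosely echo Airflow's BaseOperator and PythonOperator.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class ToyOperator(ABC):
    """Hypothetical operator base class: a template for one kind of work."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id

    @abstractmethod
    def execute(self, context: dict) -> Any:
        """Do the actual work; subclasses decide what that is."""


class ToyPythonOperator(ToyOperator):
    """Runs a Python callable (loosely echoing Airflow's PythonOperator)."""

    def __init__(self, task_id: str, python_callable: Callable[[], Any]) -> None:
        super().__init__(task_id)
        self.python_callable = python_callable

    def execute(self, context: dict) -> Any:
        return self.python_callable()


# Instantiating an operator produces a task; one operator class can back many tasks.
say_hi = ToyPythonOperator(task_id="say_hi", python_callable=lambda: "hi")
say_bye = ToyPythonOperator(task_id="say_bye", python_callable=lambda: "bye")
print(say_hi.task_id, say_hi.execute({}))    # say_hi hi
print(say_bye.task_id, say_bye.execute({}))  # say_bye bye
```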
Sensor: An Operator that waits for something by polling for it, such as a certain time, a file, a database row, or an S3 key.
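The polling behaviour can be sketched as a loop that checks a condition, sleeps, and gives up after a timeout. This is a minimal sketch under stated assumptions: toy_sensor and file_has_arrived are hypothetical, and the parameter names poke_interval and timeout merely echo common sensor settings. On timeout this toy returns False, whereas a real Airflow sensor would fail the task.

```python
import time
from typing import Callable


def toy_sensor(poke: Callable[[], bool], poke_interval: float, timeout: float) -> bool:
    """Hypothetical sensor loop: call poke() until it returns True or the timeout passes.

    Returns False on timeout; a real Airflow sensor would fail the task instead.
    """
    deadline = time.monotonic() + timeout
    while True:
        if poke():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)  # wait before checking again


# Simulated condition, e.g. "the file has landed", that becomes true on the 3rd check.
checks = {"count": 0}


def file_has_arrived() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


arrived = toy_sensor(file_has_arrived, poke_interval=0.01, timeout=1.0)
never = toy_sensor(lambda: False, poke_interval=0.01, timeout=0.05)
print(arrived, checks["count"], never)  # True 3 False
```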
chain(op1, [op2, op3], [op4, op5], op6): Equivalent to
    op1 >> [op2, op3]
    op2 >> op4
    op3 >> op5
    [op4, op5] >> op6
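The expansion can be checked mechanically. The sketch below is a hypothetical re-implementation of chain()'s wiring rules as described on this card: a single task fans out to or in from a list, and two adjacent lists are paired element-wise. It returns (upstream, downstream) pairs instead of wiring real tasks, and it raises ValueError for unequal adjacent lists, which is an assumption of this toy rather than Airflow's exact exception.

```python
def toy_chain(*elements):
    """Hypothetical re-implementation of chain()'s wiring rules.

    Returns the set of (upstream, downstream) edges instead of mutating tasks.
    """
    edges = set()
    for up, down in zip(elements, elements[1:]):
        if isinstance(up, list) and isinstance(down, list):
            # Two adjacent lists are paired element-wise and must be equally long.
            if len(up) != len(down):
                raise ValueError("adjacent lists must have the same length")
            edges.update(zip(up, down))
        else:
            # At least one side is a single task: connect every upstream to every downstream.
            ups = up if isinstance(up, list) else [up]
            downs = down if isinstance(down, list) else [down]
            edges.update((u, d) for u in ups for d in downs)
    return edges


expanded = toy_chain("op1", ["op2", "op3"], ["op4", "op5"], "op6")
explicit = {
    ("op1", "op2"), ("op1", "op3"),  # op1 >> [op2, op3]
    ("op2", "op4"),                  # op2 >> op4
    ("op3", "op5"),                  # op3 >> op5
    ("op4", "op6"), ("op5", "op6"),  # [op4, op5] >> op6
}
print(expanded == explicit)  # True
```

Note that op2 is not wired to op5: the element-wise pairing of adjacent lists is what distinguishes chain() from a plain [op2, op3] >> [op4, op5], which Airflow's bitshift operators do not allow between two lists anyway.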
Task Instance: An instance of a task that has been assigned to a DAG and has a state associated with a specific DAG run (i.e., with a specific execution_date).
execution_date: The logical date and time of a DAG Run and its Task Instances.
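The last three cards (DAG Run, Task Instance, execution_date) fit together as follows: each run is tied to one logical date, and each task gets its own instance, with its own state, inside every run. A minimal sketch with hypothetical dataclasses (not Airflow's DagRun or TaskInstance models); the state strings are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ToyTaskInstance:
    """Hypothetical: one task's state within one DAG run."""
    task_id: str
    execution_date: datetime
    state: str = "scheduled"


@dataclass
class ToyDagRun:
    """Hypothetical: one run of a DAG for a specific logical date."""
    dag_id: str
    execution_date: datetime
    task_instances: dict[str, ToyTaskInstance] = field(default_factory=dict)


def create_dag_run(dag_id: str, task_ids: list[str], execution_date: datetime) -> ToyDagRun:
    run = ToyDagRun(dag_id, execution_date)
    for task_id in task_ids:
        # Every task in the DAG gets its own instance for this run.
        run.task_instances[task_id] = ToyTaskInstance(task_id, execution_date)
    return run


# Three daily runs of the same DAG, one per logical date.
start = datetime(2021, 1, 1)
runs = [create_dag_run("etl", ["extract", "load"], start + timedelta(days=i)) for i in range(3)]

runs[0].task_instances["extract"].state = "success"
for run in runs:
    print(run.execution_date.date(), run.task_instances["extract"].state)
# 2021-01-01 success
# 2021-01-02 scheduled
# 2021-01-03 scheduled
```

Updating the "extract" instance in the first run leaves the other runs untouched, because each (task, execution_date) pair is a separate task instance.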
Jinja: Jinja is a modern and designer-friendly templating language for Python, modelled after Django’s templates.
Hooks: Interfaces to external platforms and databases such as Hive, S3, MySQL, Postgres, HDFS, and Pig. Hooks implement a common interface when possible and act as building blocks for operators.
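The "common interface" idea can be sketched with an abstract base class: each hook only knows how to open a connection, shared helpers are built on top of that, and operator logic talks to the hook rather than to the database directly. All names here are hypothetical; get_conn and get_records are chosen to echo the pattern of Airflow's database hooks, and SQLite stands in for an external system so the sketch runs with the standard library only.

```python
import sqlite3
from abc import ABC, abstractmethod


class ToyDbHook(ABC):
    """Hypothetical common hook interface: subclasses only know how to connect."""

    @abstractmethod
    def get_conn(self):
        """Open a connection to the external system."""

    def get_records(self, sql: str, parameters: tuple = ()) -> list:
        # Shared behaviour built on top of get_conn(), reused by every subclass.
        conn = self.get_conn()
        try:
            return conn.execute(sql, parameters).fetchall()
        finally:
            conn.close()


class ToySqliteHook(ToyDbHook):
    def __init__(self, database: str) -> None:
        self.database = database

    def get_conn(self):
        return sqlite3.connect(self.database)


def toy_value_check(hook: ToyDbHook, sql: str):
    """Hypothetical operator logic: it delegates all I/O to whichever hook it is given."""
    records = hook.get_records(sql)
    return records[0][0]


hook = ToySqliteHook(":memory:")
print(toy_value_check(hook, "SELECT 1 + 1"))  # 2
```

Supporting another database in this toy would only require a new subclass with its own get_conn; the operator logic stays the same.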
Pools: Airflow pools can be used to limit the execution parallelism on arbitrary sets of tasks.
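A pool behaves like a fixed number of slots shared by the tasks assigned to it. The sketch below models that with a bounded semaphore; ToyPool and query are hypothetical, and Airflow's real pools are enforced by the scheduler rather than by threads. It records the highest number of tasks that ran at once to show the cap holds even when more workers are available.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ToyPool:
    """Hypothetical pool: at most `slots` tasks assigned to it run at the same time."""

    def __init__(self, slots: int) -> None:
        self._slots = threading.BoundedSemaphore(slots)
        self._lock = threading.Lock()
        self.running = 0
        self.max_running = 0

    def run(self, fn, *args):
        with self._slots:  # blocks while every slot is occupied
            with self._lock:
                self.running += 1
                self.max_running = max(self.max_running, self.running)
            try:
                return fn(*args)
            finally:
                with self._lock:
                    self.running -= 1


def query(i: int) -> int:
    time.sleep(0.05)  # simulate work against a shared, fragile resource
    return i


db_pool = ToyPool(slots=2)
# Eight workers are available, but the pool caps these tasks at two at a time.
with ThreadPoolExecutor(max_workers=8) as executor:
    results = list(executor.map(lambda i: db_pool.run(query, i), range(6)))

print(results)  # [0, 1, 2, 3, 4, 5]
print(db_pool.max_running <= 2)  # True
```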
Connections: The information needed to connect to external systems is stored in the Airflow metastore database. A conn_id is defined there, with hostname, login, password, and schema information attached to it. Airflow pipelines retrieve this centrally managed connection information by specifying the relevant conn_id.
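The point of a conn_id is that pipeline code never contains credentials; it only names the connection it needs. A minimal sketch, assuming a hypothetical dictionary as a stand-in for the metastore and storing each connection as a URI; the dataclass, function, and example values are illustrative, with field names chosen to match the attributes listed on this card.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote, urlsplit

# Hypothetical stand-in for the metastore's connection table: conn_id -> connection URI.
# Credentials live here, centrally, rather than in pipeline code.
TOY_METASTORE = {
    "analytics_db": "postgres://etl_user:s3cret@db.example.com:5432/analytics",
}


@dataclass
class ToyConnection:
    conn_id: str
    conn_type: str
    host: Optional[str]
    login: Optional[str]
    password: Optional[str]
    schema: str
    port: Optional[int]


def get_toy_connection(conn_id: str) -> ToyConnection:
    """Hypothetical lookup: pipelines pass only a conn_id and get the details back."""
    uri = TOY_METASTORE.get(conn_id)
    if uri is None:
        raise KeyError(f"connection {conn_id!r} is not defined")
    parts = urlsplit(uri)
    return ToyConnection(
        conn_id=conn_id,
        conn_type=parts.scheme,
        host=parts.hostname,
        login=unquote(parts.username) if parts.username else None,
        password=unquote(parts.password) if parts.password else None,
        schema=parts.path.lstrip("/"),
        port=parts.port,
    )


conn = get_toy_connection("analytics_db")
print(conn.host, conn.port, conn.login, conn.schema)  # db.example.com 5432 etl_user analytics
```

Rotating the password then means editing one central entry, not every pipeline that uses "analytics_db".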