Prerequisites
- Install Python.
- Install Airflow; you can check this post for help.
Code
# import libraries
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# define the default arguments (applied to every task in the DAG)
default_args = {
    'owner': 'ivy',
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
    #'depends_on_past': False,
    #'email': ['airflow@example.com'],
    #'email_on_failure': True,
    #'email_on_retry': False,
}

# define the DAG
with DAG(
    dag_id='Simple_dag_illustration_v1',
    default_args=default_args,
    description='this is simple dag illustration',
    start_date=datetime(2024, 3, 1),
    schedule_interval='@daily',
    #catchup=False,  # DAG-level argument, so it goes here rather than in default_args
    #tags=['example','from_DAG']
) as dag:
    # define your task(s)
    task1 = BashOperator(
        task_id='First_task_I_buy_grocery',
        bash_command="echo ---the first step I buy food---",
    )
    task2 = BashOperator(
        task_id='Second_task_I_cook',
        bash_command="echo ---the second step I cook---",
    )
    task3 = BashOperator(
        task_id='Third_task_I_eat',
        bash_command="echo ---the third step I eat---",
    )
    task4 = BashOperator(
        task_id='Final_task_I_clean',
        bash_command="echo ---the last step I wash dishes",
    )

    # manage the logic order of your tasks
    task1 >> task2 >> task3 >> task4
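To get a feel for why the `>>` chain on the last line reads left to right, here is a toy model in plain Python that runs without Airflow. `ToyTask` and `run_order` are hypothetical names made up for this illustration, not Airflow classes; the sketch only mimics the idea that `a >> b` records `b` as downstream of `a` and hands back `b` so the chain can continue.

```python
class ToyTask:
    """Hypothetical stand-in for an Airflow task; not part of Airflow."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []  # tasks that should run after this one

    def __rshift__(self, other):
        # a >> b: record b as downstream of a, then return b
        # so that a >> b >> c keeps chaining from left to right.
        self.downstream.append(other)
        return other


def run_order(first):
    """Walk a simple linear chain and return task ids in execution order."""
    order = []
    current = first
    while current is not None:
        order.append(current.task_id)
        current = current.downstream[0] if current.downstream else None
    return order


buy = ToyTask("First_task_I_buy_grocery")
cook = ToyTask("Second_task_I_cook")
eat = ToyTask("Third_task_I_eat")
clean = ToyTask("Final_task_I_clean")

# Same shape as task1 >> task2 >> task3 >> task4 in the DAG above
buy >> cook >> eat >> clean

print(run_order(buy))
```

The printed list follows the order in which the tasks were chained, which is the order Airflow would respect when scheduling the real tasks.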
The full set of arguments in default_args typically includes the following attributes:
- owner: The owner of the DAG, usually the username or email address of the person responsible for maintaining it.
- depends_on_past: A boolean value indicating whether a task instance runs only if the same task succeeded in the previous DAG run.
- start_date: The date from which the DAG (or task) is first scheduled. Use a fixed datetime; a dynamic value computed relative to the current time, such as datetime.now(), is discouraged.
- email: An email address, or a list of addresses, to receive notifications related to the DAG.
- email_on_failure: A boolean value indicating whether to send email notifications on task failures.
- email_on_retry: A boolean value indicating whether to send email notifications on task retries.
- retries: The number of times a failed task is retried before it is marked as failed.
- retry_delay: The delay between retries for failed tasks, given as a timedelta.
- catchup: A boolean value indicating whether to backfill the runs scheduled between start_date and the present. Note that catchup is an argument of the DAG itself rather than a task-level default, so it is passed to DAG(...) and not placed inside default_args.
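As a rough sketch, the task-level attributes above could be collected into one dictionary like this. The snippet uses only the standard library, so it runs without Airflow. `max_retry_wait` is a hypothetical helper written for this post, and it assumes a fixed delay between retries and ignores how long each attempt takes. start_date is left out because the example DAG passes it to DAG(...) directly, and catchup is left out because it is set on the DAG.

```python
from datetime import timedelta

# Task-level defaults; the email address is the placeholder
# from the commented-out lines in the DAG code.
default_args = {
    "owner": "ivy",
    "depends_on_past": False,
    "email": ["airflow@example.com"],
    "email_on_failure": True,
    "email_on_retry": False,
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
}


def max_retry_wait(args):
    """Total time spent waiting between attempts if every retry fails.

    Hypothetical helper: assumes a fixed delay between retries
    and ignores the run time of each attempt.
    """
    return args["retries"] * args["retry_delay"]


print(max_retry_wait(default_args))  # 0:15:00
```

With 3 retries spaced 5 minutes apart, a task that keeps failing spends 15 minutes waiting between attempts before it is finally marked as failed.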