Getting Started With Apache Airflow

Posted By : Piyush Khandelwal | 26-Jun-2020

Apache Airflow is a workflow management system (WMS) developed by Airbnb. It is a platform to programmatically create, schedule, analyze, and monitor workflows. A workflow can be anything from a straightforward Linux or Bash command to a complex Hive query, a Python script, or a Docker container. A workflow comprises one or more tasks that are connected as a Directed Acyclic Graph. A workflow, termed a DAG in Airflow, may be executed manually or automated with cron-style schedules. Successes and failures during the execution of DAGs can be monitored, controlled, and re-triggered, and DAG state changes can trigger alerts via SMTP, Slack, and other systems. Airflow also offers an excellent UI that displays the states of all tasks, diagnostic information about task execution, and more.

In Airflow, every workflow is a DAG. Each node in a DAG represents a task, and edges define the dependencies between tasks (The graph is enforced to be acyclic to prevent circular dependencies that may cause infinite execution loops).

A DAG consists of operators. An operator defines a single task in the DAG that needs to be performed. Several types of operators are available in Airflow:

  • BashOperator - executes a bash command
  • PythonOperator - calls a Python function
  • EmailOperator - sends an email
  • SimpleHttpOperator - sends an HTTP request
  • MySqlOperator, SqliteOperator, PostgresOperator, MsSqlOperator, OracleOperator, JdbcOperator, etc. - database operators that execute SQL commands
  • Sensor - keeps running until a certain criterion is met

You can also create your own custom operator to fit your requirements. A minimal example using two of these operators is sketched below.
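By way of illustration, here is a minimal sketch of a DAG definition that chains a BashOperator and a PythonOperator. The dag_id hello_world, the task ids, and the daily schedule are arbitrary example values, and the import paths follow the Airflow 1.x layout that was current when this post was written:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator


def print_hello():
    # Plain Python function wrapped by the PythonOperator below.
    print("Hello from Airflow!")


# Arguments applied to every task in this DAG.
default_args = {
    "owner": "airflow",
    "start_date": datetime(2020, 6, 1),
}

with DAG(
    dag_id="hello_world",            # example DAG name
    default_args=default_args,
    schedule_interval="@daily",      # cron expressions also work here
) as dag:
    print_date = BashOperator(
        task_id="print_date",
        bash_command="date",
    )
    hello = PythonOperator(
        task_id="print_hello",
        python_callable=print_hello,
    )

    # The >> operator draws the edge: print_date must succeed before print_hello runs.
    print_date >> hello

Saved as a .py file inside your dags folder (created during the installation steps below), this workflow is picked up by Airflow automatically.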

Installation

As Airflow is developed in Python, the easiest way to install it is via the Python package manager, pip.

pip install apache-airflow

You may also install Airflow with support for extra features like AWS or Azure.

AWS - pip install 'apache-airflow[aws]'

Azure - pip install 'apache-airflow[azure]'

The full list of extras can be found in the Airflow documentation.

Now, before you start anything, make sure you have created a folder and set it as AIRFLOW_HOME. You can name the folder whatever you want, say, airflow_home. Once it is created, make sure you are in the parent folder of airflow_home and run this command (note there must be no spaces around the =):

export AIRFLOW_HOME=<folder name>

Now, within airflow_home, we will create another folder named dags to keep our DAGs.
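Putting these steps together, assuming you name the folder airflow_home and run the commands from its parent directory:

mkdir airflow_home
export AIRFLOW_HOME=$(pwd)/airflow_home
mkdir airflow_home/dags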

Next, run this command to initialize Airflow's metadata database:

airflow initdb

This command should create airflow.db, airflow.cfg, and unittests.cfg in the airflow_home folder.

airflow.db is an SQLite database file, and airflow.cfg keeps all the initial settings of Airflow.

You can also use any other database of your preference; you just have to change the value of sql_alchemy_conn in airflow.cfg.
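For example, pointing Airflow at a local PostgreSQL database would look like the line below (user, password, and the airflow database name are placeholders for your own credentials, and the postgres extra must be installed):

sql_alchemy_conn = postgresql+psycopg2://user:password@localhost:5432/airflow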

Now you can start the web server with this command.

airflow webserver
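By default, the web server listens on port 8080, so the UI should be reachable at http://localhost:8080. Keep in mind that the web server only serves the UI; for your DAGs to actually run on schedule, the scheduler must also be running, typically in a separate terminal:

airflow scheduler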

Conclusion

You should now understand how to install Apache Airflow on your system and have a basic idea of how Apache Airflow works.

Apache Airflow is a workflow management platform that allows you to programmatically author and schedule workflows and monitor them via the built-in Airflow user interface. At Oodles, we are an ERP development company with the goal of providing enterprises with futuristic, enterprise-wide solutions for all their business management needs. We offer development services for workforce scheduling, strategic workforce planning, and all-round workforce management. Get in touch with our experts to implement software like Apache Airflow into your workforce management systems.