What are the differences/advantages/disadvantages of Azkaban vs. Oozie?

(I work on and with Oozie; I have only played with Azkaban for evaluation purposes.)

This evaluation is based on trying out Azkaban 0.6 and Oozie 2.2.1 and on reading their documentation:

What do Azkaban and Oozie do?

  • Both let you run a series of MapReduce, Pig, Java and script actions as a single workflow job
  • Both allow regular scheduling of workflow jobs


On the Functional Side

Writing workflows

  • Azkaban uses a series of Properties files
  • Oozie uses an XML file


Expressing workflows

  • Azkaban uses topological sort (similar to Make/Ant)
  • Oozie uses a Directed Acyclic Graph (DAG) (PDL style)
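
To make the dependency-ordering idea concrete, here is a minimal Python sketch using the standard-library graphlib module (Python 3.9+). The job names are made up and this is not Azkaban's actual code; it only shows how a run order can be derived from declared dependencies, the way Make or Ant do.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each key maps to the jobs it depends on.
dependencies = {
    "load": [],
    "clean": ["load"],
    "aggregate": ["clean"],
    "report": ["aggregate", "clean"],
}


def run_order(deps):
    """Return an execution order in which every job follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)  # ['load', 'clean', 'aggregate', 'report']
```

With an explicit DAG, by contrast, the workflow definition itself names the transitions between nodes instead of leaving the order to be derived.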


Supported types of actions out of the box

  • Azkaban supports: java, javaprocess and pig
  • Oozie supports: mapreduce (java, streaming, pipes), pig, java, filesystem, ssh, sub-workflow


[In addition, Cloudera's Oozie version supports Hive & Sqoop actions]

Parameterization of workflows

  • Azkaban supports variables, e.g. ${input}
  • Oozie supports variables and functions, e.g. ${fs:dirSize(myInputDir)}
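
The difference between plain variables and variables plus functions can be sketched in a few lines of Python. This is a toy resolver written for illustration only, not how either tool implements parameterization; the `resolve` helper, the regular expressions and the fixed value returned for `fs:dirSize` are all assumptions.

```python
import re

TOKEN = re.compile(r"\$\{([^}]+)\}")          # matches ${...}
CALL = re.compile(r"^(\w+:\w+)\((\w+)\)$")    # matches ns:func(variableName)


def resolve(text, variables, functions=None):
    """Replace ${name} with a variable and ${ns:func(name)} with a function result."""
    functions = functions or {}

    def replace(match):
        expr = match.group(1)
        call = CALL.match(expr)
        if call:
            name, arg = call.groups()
            # The argument is itself a variable name, so look it up first.
            return str(functions[name](variables[arg]))
        return str(variables[expr])

    return TOKEN.sub(replace, text)


variables = {"input": "/data/in", "myInputDir": "/data/in"}
# Stand-in for a real directory-size lookup; it just returns a fixed number.
functions = {"fs:dirSize": lambda path: 1024}

print(resolve("in=${input}", variables))                                # in=/data/in
print(resolve("size=${fs:dirSize(myInputDir)}", variables, functions))  # size=1024
```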


Alternate Execution Paths

  • Azkaban fixes execution path at workflow start time
  • Oozie supports decision nodes, which let the workflow choose its path at run time
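
As a rough illustration of what a decision node buys you, the Python sketch below walks a small workflow whose next step depends on a value known only at run time. The node names, the dictionary layout and the `run` helper are hypothetical; Oozie itself expresses this in its XML workflow definition.

```python
def run(workflow, start, context):
    """Walk the workflow from `start` and return the names of the visited nodes."""
    visited = []
    node = start
    while node is not None:
        visited.append(node)
        spec = workflow[node]
        if spec["type"] == "decision":
            # Take the first case whose predicate holds, otherwise the default.
            node = next(
                (target for predicate, target in spec["cases"] if predicate(context)),
                spec["default"],
            )
        else:
            node = spec.get("next")
    return visited


workflow = {
    "check-size": {
        "type": "decision",
        "cases": [(lambda ctx: ctx["input_bytes"] > 10**9, "big-job")],
        "default": "small-job",
    },
    "big-job": {"type": "action", "next": "end"},
    "small-job": {"type": "action", "next": "end"},
    "end": {"type": "action", "next": None},
}

print(run(workflow, "check-size", {"input_bytes": 5}))
print(run(workflow, "check-size", {"input_bytes": 2 * 10**9}))
```

With a path fixed at start time, the choice between `big-job` and `small-job` would have to be made before the workflow is launched.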


Regular Scheduling

  • Azkaban interval job scheduling is time-based
  • Oozie interval job scheduling is based on both time and input-data availability
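
The two trigger styles can be contrasted with a small Python sketch. The function names, paths and dates are invented for the example, and the `exists` callback stands in for whatever check a real scheduler would make against the file system.

```python
from datetime import datetime


def time_ready(now, scheduled_at):
    """Time-based trigger: fire once the scheduled time has passed."""
    return now >= scheduled_at


def time_and_data_ready(now, scheduled_at, input_paths, exists):
    """Time- and data-based trigger: also require every input to be present."""
    return time_ready(now, scheduled_at) and all(exists(p) for p in input_paths)


scheduled_at = datetime(2011, 1, 1, 2, 0)
now = datetime(2011, 1, 1, 2, 5)
available = {"/logs/2011-01-01/part-0"}  # hypothetical inputs that have arrived
inputs = ["/logs/2011-01-01/part-0", "/logs/2011-01-01/part-1"]

print(time_ready(now, scheduled_at))                                           # True
print(time_and_data_ready(now, scheduled_at, inputs, available.__contains__))  # False
```

In the second case the job keeps waiting until the missing input shows up, even though its scheduled time has passed.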


Resource Control

  • Azkaban supports resource locks (read/write/counter)
  • Oozie does not have explicit support for resource control


On the Implementation Side

Runtime

  • Azkaban runs standalone (one workflow) or as a server (one user, multiple workflows)
  • Oozie runs as a server (multiple users, multiple workflows)


Action Execution

  • Azkaban, actions run in the Azkaban server as the user running Azkaban
  • Oozie, actions run in the Hadoop cluster as the user that submitted the workflow


Workflows Submission, Management & Monitoring (server)

  • Azkaban, browser/HTML only
  • Oozie, command-line, HTTP REST, Java API, Browser/HTML (monitoring)


State of Running Workflows

  • Azkaban keeps state of all running workflows in memory
  • Oozie keeps state in a SQL database; a workflow's state is in memory only while it is doing a state transition
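
The database-backed approach can be sketched with Python's built-in sqlite3 module. The table layout and helper names are assumptions made for the example, not Oozie's schema; the point is only that state written on each transition survives a process restart, while state held purely in memory would not.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open (or create) the state database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workflows (id TEXT PRIMARY KEY, node TEXT NOT NULL)"
    )
    return conn


def transition(conn, workflow_id, next_node):
    """Persist the workflow's new current node; nothing stays in memory afterwards."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO workflows (id, node) VALUES (?, ?)",
            (workflow_id, next_node),
        )


def current_node(conn, workflow_id):
    """Load the current node from the database, or None if the workflow is unknown."""
    row = conn.execute(
        "SELECT node FROM workflows WHERE id = ?", (workflow_id,)
    ).fetchone()
    return row[0] if row else None


path = os.path.join(tempfile.mkdtemp(), "state.db")
conn = open_store(path)
transition(conn, "wf-1", "pig-step")
conn.close()               # simulate the server going down

conn = open_store(path)    # simulate a restart
resumed = current_node(conn, "wf-1")
conn.close()
print(resumed)  # pig-step
```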


Resource Consumption

  • Azkaban holds at least one thread per running workflow
  • Oozie only uses a thread while a workflow is doing a state transition


Failover

  • Azkaban, on failure all running workflows are lost
  • Oozie, running workflows continue running from their current state

De Leeke