Amazon MWAA: Enabling data engineers to easily execute data processing workflows in the cloud

Amazon Web Services announced the general availability of Amazon Managed Workflows for Apache Airflow (MWAA), a new managed service that makes it easy for data engineers to execute data processing workflows in the cloud.

Apache Airflow is a popular open-source tool that helps customers author, schedule, and monitor workflows. With Amazon MWAA, customers can use the same familiar Airflow platform as they do today to manage their workflows, and enjoy improved scalability, availability, and security without the burden of having to build, scale, and manage the underlying infrastructure.

Amazon MWAA scales workflow execution capacity based on customer needs, and integrates with AWS security services to provide secure access to customers’ data. There are no up-front investments required to use Amazon MWAA and customers only pay for what they use.

Today, customers are using analytics and machine learning to derive insights from massive amounts of data. To effectively use this data, customers often need to first build a workflow that defines a series of sequential tasks to prepare and process the data.
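As a rough illustration of what "a series of sequential tasks" means, the sketch below runs three dependent steps in dependency order using only the Python standard library. It does not use the Apache Airflow API; the task names (`extract`, `transform`, `load`) and the `run_workflow` helper are hypothetical and exist only for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def extract():
    # Hypothetical source data for the illustration.
    return [3, 1, 2]


def transform(rows):
    # Prepare the data: here, simply sort it.
    return sorted(rows)


def load(rows):
    # Stand-in for writing to a data store.
    return {"loaded": len(rows), "rows": rows}


# Each task maps to (callable, list of upstream task names).
TASKS = {
    "extract": (extract, []),
    "transform": (transform, ["extract"]),
    "load": (load, ["transform"]),
}


def run_workflow(tasks):
    """Run every task after the tasks it depends on; return order and results."""
    graph = {name: set(upstream) for name, (_, upstream) in tasks.items()}
    order = list(TopologicalSorter(graph).static_order())
    results = {}
    for name in order:
        func, upstream = tasks[name]
        # Pass each upstream result as a positional argument.
        results[name] = func(*(results[u] for u in upstream))
    return order, results


if __name__ == "__main__":
    order, results = run_workflow(TASKS)
    print(order)
    print(results["load"])
```

An orchestrator such as Apache Airflow adds scheduling, retries, and monitoring on top of this basic idea of tasks with declared dependencies.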

Tens of thousands of customers use AWS Step Functions to visually build and run cost-effective and scalable event-driven workflows that execute tasks across multiple AWS services.

There are also customers who want to orchestrate workflows with Apache Airflow, which has an active open-source community, a large library of pre-built integrations with third-party data processing tools like Apache Spark and Hadoop, and the ability to use Python scripts to create workflows.

However, using Apache Airflow requires data engineers to install, maintain, scale, and secure the Apache Airflow environments, which adds cost and operational complexity.

Furthermore, to support role-based authentication for secure access, Apache Airflow often requires a manual, iterative, and error-prone combination of configuration changes, command-line interface (CLI) commands, and, in some cases, edits to the Apache Airflow code.

Customers also must integrate and configure additional tools to alert on issues such as system downtime, workflow errors, and task execution delays. While customers value the pre-built integrations and familiar Python programming language of Apache Airflow, they want them without the added operational cost and complexity.

Amazon MWAA makes it easy for customers to build and execute Apache Airflow workflows in AWS. Amazon MWAA manages the provisioning and ongoing maintenance of Apache Airflow so customers no longer need to worry about patching, scaling, or securing self-managed Apache Airflow implementations.

With Amazon MWAA, compute resources that execute tasks are scaled on demand, providing consistent performance for users. Customer data is secure by default as workloads run in customers’ own isolated and secure cloud environments using Amazon Virtual Private Cloud (Amazon VPC), with stored data encrypted using AWS Key Management Service (AWS KMS).

Amazon MWAA makes it easy for customers to combine data using any of Apache Airflow’s integrations, including AWS services and popular third-party tools like Apache Hadoop, Presto, Hive, and Spark, to automate data processing, machine learning pipelines, and software development and operations.

Customers can provide role-based access to Apache Airflow’s user interface easily and securely via AWS Identity and Access Management (IAM), providing users Single Sign-On (SSO) access for scheduling and viewing their workflow executions.

Amazon MWAA automatically sends Apache Airflow system metrics and logs to AWS’s monitoring service, Amazon CloudWatch, making it easy for customers to view task execution delays and workflow errors across one or more environments without third-party tools.

With Amazon MWAA, data engineers get the extensibility of Apache Airflow with the scalability, availability, and security of AWS.

“Customers have told us they really like Apache Airflow because it speeds the development of their data processing and machine learning workflows, but they want it without the burden of scaling, operating, and securing servers,” said Jesse Dougherty, Vice President, Application Integration, AWS.

“With Amazon MWAA, customers can use the same Apache Airflow platform as they do today with the scalability, availability, and security of AWS.”

Customers can launch a new Amazon MWAA environment from the AWS Management Console, CLI, AWS CloudFormation, or AWS SDK, and start running in minutes. Amazon MWAA is available in US East (Northern Virginia), US West (Oregon), US East (Ohio), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney), Europe (Ireland), Europe (Frankfurt), and Europe (Stockholm), with more regions to come.
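For readers who want to see what launching an environment through the AWS SDK might look like, here is a minimal sketch in Python. It assumes the third-party `boto3` package and its `mwaa` client; the helper names, the environment name, and every ARN and ID shown are hypothetical placeholders, and a real environment needs an existing execution role, S3 bucket, subnets, and security group.

```python
def build_environment_request(name, execution_role_arn, source_bucket_arn,
                              subnet_ids, security_group_ids,
                              dag_s3_path="dags", max_workers=None):
    """Assemble keyword arguments for the MWAA CreateEnvironment call."""
    request = {
        "Name": name,
        "ExecutionRoleArn": execution_role_arn,
        "SourceBucketArn": source_bucket_arn,
        "DagS3Path": dag_s3_path,  # folder in the bucket that holds DAG files
        "NetworkConfiguration": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
    }
    if max_workers is not None:
        request["MaxWorkers"] = max_workers
    return request


def create_environment(request, region_name="us-east-1"):
    """Send the request to AWS. Requires boto3 and valid AWS credentials."""
    import boto3  # third-party; imported here so the builder works without it

    client = boto3.client("mwaa", region_name=region_name)
    return client.create_environment(**request)


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    request = build_environment_request(
        name="example-environment",
        execution_role_arn="arn:aws:iam::123456789012:role/example-mwaa-role",
        source_bucket_arn="arn:aws:s3:::example-mwaa-bucket",
        subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
        security_group_ids=["sg-cccc3333"],
        max_workers=5,
    )
    print(request)
    # create_environment(request)  # uncomment only with real resources
```

Keeping the request builder separate from the network call makes it possible to inspect or test the payload without AWS credentials.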

The Pokémon Company International, a subsidiary of The Pokémon Company in Japan, manages the property outside of Asia and is responsible for brand management, licensing, marketing, the Pokémon Trading Card Game, the animated TV series, home entertainment, and the official Pokémon website.

“Amazon Managed Workflows for Apache Airflow meshes with our security policy by providing single sign-on controlled access through IAM roles and the ability to restrict access to our Amazon Virtual Private Cloud,” said Eric Smith, Data Platform Engineer at The Pokémon Company International.

“With Amazon MWAA, we can focus on building reliable data pipelines that achieve business goals rather than patching and securing instances.”

Detroit-based Rocket Mortgage, the nation’s largest mortgage lender, enables the American Dream of homeownership and financial freedom through an industry-leading, digital-driven client experience.

“Amazon Managed Workflows for Apache Airflow has helped us grow and scale our data science and machine learning workflows with significantly less infrastructure overhead,” said Dan Jones, Senior Vice President of Data Intelligence for Rocket Mortgage.

“With this new service, our technology teams are able to deliver best-in-class, data-driven solutions faster than ever before.”

GoDaddy is the company that empowers everyday entrepreneurs. With more than 20 million customers worldwide, GoDaddy is the place people come to name their ideas, build a professional website, attract customers, and manage their work.

“Amazon Managed Workflows for Apache Airflow solves one of the biggest operational overheads with orchestration,” said Jeremy Zogg, Senior Director of Engineering at GoDaddy.

“We have spent a lot of hours setting up, configuring, scaling, and monitoring our on-premises Apache Airflow instances. This was our top challenge for our workflow deployments and we’re excited to migrate and concentrate on what we do best: harnessing the power of data to drive great outcomes for our customers and business.”


from Help Net Security https://ift.tt/365mDfz
