What is AWS Glue?

AWS Glue is a cloud-optimized extract, transform, and load service, or ETL for short. It allows you to organize, locate, move, and transform all your data sets across your business, so you can put them to use.

Glue is different from other ETL products in three important ways. First, Glue is serverless. You simply point Glue to all your ETL jobs and hit run. You don’t need to provision, configure, or spin up servers, and you certainly don’t need to manage their lifecycle. Second, Glue provides crawlers with automatic schema inference for your semi-structured and structured data sets. Crawlers automatically discover all your data sets, discover your file types, extract the schema, and store all this information in a centralized metadata catalog for later querying and analysis. Third, Glue automatically generates the scripts that you need to extract, transform, and load your data from source to target, so you don’t have to start from scratch.

Let’s see how all this works with an example. Imagine that you’re an app developer, and you’ve embarked on an ad campaign to increase adoption. You want to know where to invest your dollars. Suppose that all your ad-click logs are stored in an S3 bucket in a semi-structured format like JSON. Suppose also that your user profile data is sitting inside a database in RDS, stored in structured relations. What you want to do is move all this data into a Redshift data warehouse so you can analyze and understand which demographics are actually contributing to your adoption.

Well, you can point crawlers to all your databases and your S3 buckets, and they’ll automatically discover your datasets, infer the data structures inside your files, extract the schema, and store this information in tables inside the data catalog. All these table definitions will refer to your source data and will have the schema information that’s necessary to read, parse, and query your source data.
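To make the idea of schema inference more concrete, here is a minimal sketch in plain Python of what inferring a schema from JSON click logs could look like. It does not call the Glue API and is not how Glue crawlers are actually implemented; the function names, the type names, and the sample records are all assumptions made up for illustration.

```python
import json


def infer_type(value):
    """Map a parsed JSON value to a simple, hypothetical type name."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "struct"
    return "null"


def infer_schema(json_lines):
    """Collect, per field, the set of types seen across all records."""
    schema = {}
    for line in json_lines:
        record = json.loads(line)
        for field, value in record.items():
            schema.setdefault(field, set()).add(infer_type(value))
    return schema


# Made-up ad-click records, one JSON document per line.
sample_clicks = [
    '{"user_id": 1, "ad": "banner", "clicked": true}',
    '{"user_id": 2, "ad": "video", "geo": {"country": "US"}}',
]

schema = infer_schema(sample_clicks)
for field in sorted(schema):
    print(field, sorted(schema[field]))
```

Note that the second record has a field (`geo`) that the first one lacks. Semi-structured data often looks like this, which is why the sketch unions the fields it sees instead of trusting the first record.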
You can then point Glue to these tables, and it will automatically generate the scripts that are needed to extract and transform that data into tables in Redshift. These scripts will flatten all semi-structured data, no matter how complex the data is. They will transform the input into target data types and throw away unneeded columns. These scripts are actually quite forgiving, and they’ll adapt to changes in the structure of the input and the output. And finally, you can customize these scripts using an intuitive, graph-based user interface in the console, or you can just edit the scripts directly yourself.

Now remember, Glue is serverless, so it will actually execute these scripts on your behalf. You don’t need to spin up servers; it will do it behind the scenes. It will also access all of the data sources that it needs, automatically process that data, and then load it into the data warehouse for later analysis.

AWS Glue can help you do in three simple steps what used to be a month-long development process. You can organize, move, and transform your data sets and put them to use for your business. We invite you to try out Glue for yourself.
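For readers who want a feel for what the generated scripts do, the following is a rough sketch in plain Python of the three transformations described above: flattening nested data, casting to target types, and dropping unneeded columns. It is not a Glue-generated script and does not use the Glue libraries; `flatten`, `apply_mapping`, the column names, and the sample record are hypothetical.

```python
def flatten(record, prefix=""):
    """Turn nested dicts into a single level with dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def apply_mapping(flat_record, mapping):
    """Rename and cast the mapped columns; anything unmapped is dropped.

    mapping: source column -> (target column, cast function)
    """
    row = {}
    for source, (target, cast) in mapping.items():
        # Skip fields missing from this record instead of failing.
        if source in flat_record:
            row[target] = cast(flat_record[source])
    return row


# Made-up click record with a nested "ad" object and an unneeded field.
click = {
    "user_id": "42",
    "ad": {"id": "a-7", "format": "banner"},
    "debug": "x",
}

# Hypothetical mapping to the warehouse columns we want to keep.
mapping = {
    "user_id": ("user_id", int),
    "ad.id": ("ad_id", str),
}

row = apply_mapping(flatten(click), mapping)
print(row)
```

Skipping fields that are absent from a record is one simple way to stay tolerant of changes in the input structure, in the spirit of the forgiving behavior described above.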
