16 Dec 24

Obtaining an AWS data engineering certification

I’ve been working with data systems for many years, and most of those systems had at least some fairly substantial elements in AWS, whether Data Pipeline, Redshift, Dynamo, or Lambdas. I saw a post from someone who had obtained their data engineering cert, describing their experience, and figured I was familiar enough with the exam subject matter to cover it relatively easily.

I learned there was more to the exam than just knowing a few services. The biggest theme for me was learning how AWS expects customers to use its services for common data tasks, such as data movement, both into AWS and within it.

I used a combination of external and AWS provided learning materials. The AWS Skill Builder material was quite good. It framed everything in the context of how AWS thinks. This is important.

For those not familiar, the cert exams don’t ask specific questions about a given service. The majority of the exam questions provide a scenario and ask for the best solution. Some questions offer multiple perfectly valid solutions but look for the one with the best cost savings. Some questions are looking for a combination of services that could achieve the given task. Some questions specify a “real-time” solution versus solutions with latency. You have to pay attention to the keywords in the question. For example, conditional data processing: there are multiple ways to process data with AWS services, but “conditional” almost always meant using Step Functions. Little details matter in the question wording.
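The Step Functions pattern behind that “conditional” keyword is the Choice state, which routes execution down different branches based on the input. Here is a minimal sketch of a state-machine definition in Amazon States Language, built as a Python dict, with a tiny local evaluator to show how the routing works (the state names and the `recordType`/`sizeBytes` fields are hypothetical examples, not anything from the exam):

```python
# Minimal Amazon States Language definition with a Choice state.
# State names and input fields ("recordType", "sizeBytes") are made up.
definition = {
    "StartAt": "RouteRecord",
    "States": {
        "RouteRecord": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.recordType",
                    "StringEquals": "clickstream",
                    "Next": "ProcessClickstream",
                },
                {
                    "Variable": "$.sizeBytes",
                    "NumericGreaterThan": 1_000_000,
                    "Next": "ProcessLargeBatch",
                },
            ],
            "Default": "ProcessGeneric",
        },
        "ProcessClickstream": {"Type": "Pass", "End": True},
        "ProcessLargeBatch": {"Type": "Pass", "End": True},
        "ProcessGeneric": {"Type": "Pass", "End": True},
    },
}


def route(event: dict) -> str:
    """Locally evaluate the Choice rules against an input event,
    the way Step Functions would when picking the next state."""
    for rule in definition["States"]["RouteRecord"]["Choices"]:
        field = rule["Variable"].lstrip("$.")  # "$.recordType" -> "recordType"
        value = event.get(field)
        if "StringEquals" in rule and value == rule["StringEquals"]:
            return rule["Next"]
        if ("NumericGreaterThan" in rule
                and isinstance(value, (int, float))
                and value > rule["NumericGreaterThan"]):
            return rule["Next"]
    return definition["States"]["RouteRecord"]["Default"]


print(route({"recordType": "clickstream"}))                    # ProcessClickstream
print(route({"recordType": "other", "sizeBytes": 5_000_000}))  # ProcessLargeBatch
print(route({"recordType": "other", "sizeBytes": 10}))         # ProcessGeneric
```

In a real deployment the `definition` dict would be serialized to JSON and passed to Step Functions; the point here is just that the branching logic lives in the state machine, not in Lambda code.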

It’s a given that you have to learn as much as possible about Redshift; it’s the data warehouse. You have to understand the overall patterns of Kinesis. Glue is a central piece of data movement inside AWS, specifically for ETL. Lambdas are such a general-purpose tool that knowing how to use them with other services like S3 is important. S3 and IAM are givens: learn how to use them, how to secure objects, and how to achieve cost optimization with S3 storage tiering.
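The S3 cost-optimization questions usually come down to lifecycle rules that transition objects into cheaper storage classes as they age. A sketch of such a rule as a lifecycle configuration dict, in the shape boto3 expects (the bucket name and `raw/` prefix are made up; the storage class names are real S3 classes):

```python
# Lifecycle configuration that tiers objects into cheaper storage over time.
# Bucket name and prefix are hypothetical; the day thresholds are illustrative.
lifecycle_config = {
    "Rules": [
        {
            "ID": "tier-down-raw-data",
            "Status": "Enabled",
            "Filter": {"Prefix": "raw/"},
            "Transitions": [
                # Infrequently accessed after a month: Standard-IA.
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                # Cold after three months: Glacier.
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Expire objects after a year.
            "Expiration": {"Days": 365},
        }
    ]
}

# With boto3 this would be applied like so (not executed here):
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="example-data-lake",
#     LifecycleConfiguration=lifecycle_config,
# )

# Sanity check: transitions must be in increasing day order.
days = [t["Days"] for t in lifecycle_config["Rules"][0]["Transitions"]]
assert days == sorted(days)
```

The exam variants of this tend to hinge on picking the right class for the access pattern, e.g. Intelligent-Tiering when the access pattern is unknown versus explicit transitions like these when it’s predictable.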

Some services I was less familiar with but that were covered pretty extensively were Lake Formation and EventBridge. I spent a fair amount of time learning how they worked and interacted with the greater ecosystem.

The exam covers moving data into AWS a fair amount. Services like Database Migration Service and Schema Conversion Tool are mentioned.

I didn’t spend a lot of time on services like the time-series database (Timestream) or the graph database (Neptune). But they were covered on the exam, and luckily I knew enough about them in general to answer the questions correctly.

One area of the exam that I thought was out of scope was general networking: VPCs, security groups, and the like. It isn’t; it comes up in some of the security questions.

Another area that was touched on quite a bit was the differences between similar services like Aurora and RDS, and to a certain extent Dynamo. Dynamo seemed like a wild card in a lot of the prep material and on the exam itself. It’s very cool technology, but I wonder if customers have a hard time understanding it or knowing when best to use it. Maybe the read/write capacity provisioning plays a part. I’m not sure, but in the context of the exam it didn’t seem to have as clear a use case as the other services.

Overall, I learned a lot in preparing to take the exam. I can relate to Dan Luu’s post about getting to the 95%-ile. I put in a moderate amount of intentional studying for the exam, on top of my years of experience. According to the post, the distance between a novice and an apparent expert isn’t the mythical 10,000 hours, but a few hours a week of effort and an intention to improve. I was already fairly knowledgeable about some AWS services that I had used in the past, but my goal was to improve through intentional studying. The exam itself was not easy. I am glad I put in the time to study. It was rewarding learning.

According to the Jonathan Fields Sparked assessment, I am a Maven: I just like to learn. The journey of obtaining the AWS data engineering cert aligns with that.