Who is a Data Engineer? A Modern Peek in 2024

Updated: Oct 25

Maybe it’s the six-figure salaries or the opportunity to work with cool technology, or maybe people are finally learning that data engineering is where everything starts in the data field.


Whatever the reason, people are noticing.


VCs are investing in data storage and ingestion platforms, and companies are interviewing more data engineers than in previous years.


But how does one become a data engineer? If you Google “data engineering roadmap”, you will find a very large, overwhelming roadmap image that has been going around LinkedIn for the past few weeks, one that lays out over a decade of learning.


It’s too much.


So in this article, we will walk through the steps to go from zero to data engineer, using a combination of free and paid courses that can help you gain the skills you need.


But before diving into that, let’s make sure you know what a data engineer is.


What Is a Data Engineer?


Data engineers move, remodel, and manage data sets from tens if not hundreds of internal company applications so analysts and data scientists don’t need to spend their time constantly pulling data sets.


They may also create a core layer of data that lets different data sources connect to it to get more information or context.


These specialists are usually the first people to handle data. They process the data so it’s useful for everyone, not just the systems that store it.


There are obvious reasons to become a data engineer, like a high salary and numerous opportunities due to limited competition in the job market, but we’re not focusing on those today. Instead, consider what the job itself requires.


The following are the tools and areas of knowledge a data engineer should be proficient in.


Building Your Base (SQL, Coding, Linux)


Before getting deep into data engineering specifics you need a solid base.


It can be tempting to start with concepts and skills that come further along, such as distributed computing or streaming. But that’s like learning words and sentences before you learn what letters are.


That’s why you need to start with SQL, programming, and some form of server/Linux basics.

You need to be able to speak to computers in their language and these three skills will help you understand how to communicate with computers from various layers.


Building this solid foundation will ensure that you reduce your future learning curves because to interact with many of the other technical components, you will need to understand some form of programming language or command line basics.


Also, learning the basics in terms of servers such as SFTP, firewalls, PGP, and other technical components will go a long way.
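
To make the SQL-plus-programming part of that base concrete, here is a minimal sketch that runs a SQL aggregation from Python. It uses the standard library's sqlite3 module with an in-memory database; the orders table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3


def top_customers(rows, limit=2):
    """Load (customer, amount) rows into a temporary table and rank customers by total spend."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        query = """
            SELECT customer, SUM(amount) AS total
            FROM orders
            GROUP BY customer
            ORDER BY total DESC
            LIMIT ?
        """
        return conn.execute(query, (limit,)).fetchall()
    finally:
        conn.close()


# Hypothetical sample data.
sample_orders = [("ana", 30.0), ("ben", 12.5), ("ana", 20.0), ("cy", 45.0)]
result = top_customers(sample_orders)
print(result)
```

The same GROUP BY and ORDER BY pattern carries over to most SQL databases you will meet later, which is why it is worth practicing early.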


Building A Flask or FastAPI API


You will need to interact with APIs on a daily basis if you become a data engineer, either to automate processes or to pull data.


In that way, building an API is a great first project because it will force you to use several layers of technology.


You will need to understand concepts like ports, HTTP requests, coding, command line, and if you really want to make it interesting, maybe even play around with the cloud by spinning up a VM to run your API off of.

But that’s a stretch goal. Let’s start easy.


Flask is a great Python library that lets you spin up an API in no time. But I don’t expect you to just know how to build your first API.
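
As a starting point, a first Flask API might look something like the sketch below. It assumes Flask is installed (pip install flask). The /health and /trips routes and the TRIPS data are hypothetical names invented for this example.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory data standing in for a real database.
TRIPS = [
    {"id": 1, "city": "Seattle", "distance_km": 12.4},
    {"id": 2, "city": "Denver", "distance_km": 3.1},
]


@app.route("/health", methods=["GET"])
def health():
    """Simple endpoint to confirm the API is up."""
    return jsonify(status="ok")


@app.route("/trips", methods=["GET"])
def list_trips():
    """Return all trips, optionally filtered with ?city=<name>."""
    city = request.args.get("city")
    rows = [trip for trip in TRIPS if city is None or trip["city"] == city]
    return jsonify(rows)
```

Saved as app.py, this could be served locally with a recent Flask release using `flask --app app run` and then queried with `curl "http://127.0.0.1:5000/trips?city=Denver"`. That small loop already touches ports, HTTP requests, and the command line.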


Data Warehousing And Data Pipelines (ETLs, ELTs, and ELs)


When you look at the skill sets of data engineers, software engineers, and data scientists, there is a lot of cross-over.


All three tend to use Python, data scientists and data engineers both tend to use SQL pretty heavily, and all three rely to some degree on an understanding of Linux.


So what differentiates data engineers?


One of the big differentiators is the focus on data warehouses and data pipelines.

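But what are these?


A data warehouse is a central store where data from many source systems is modeled for analysis. A data pipeline is the automated process that moves data from those sources into the warehouse, whether it transforms the data before loading (ETL), after loading (ELT), or simply extracts and loads it (EL).


Both are concepts that data engineers need to understand. They are the bread and butter of any good DE.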
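
To give a feel for the extract, transform, load pattern, here is a minimal sketch using only the Python standard library. The CSV content, the orders table, and the cleaning rules are all assumptions made up for the example, and a real pipeline would read from a source system and write to an actual warehouse instead of an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source application.
RAW_CSV = """order_id,customer,amount,currency
1,Ana,30.00,usd
2,ben,,usd
3,Cy,45.50,USD
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, fix types, normalize casing."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        clean.append((
            int(row["order_id"]),
            row["customer"].strip().title(),
            float(row["amount"]),
            row["currency"].upper(),
        ))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

In an ELT setup, the load step would run first on the raw rows and the cleaning would happen afterwards inside the warehouse, usually in SQL.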


Applying Coding And Data Warehousing


Now that you have learned about data pipelines and data warehouses, it would be a great idea to apply this knowledge.


So let’s build your second project to solidify that knowledge. Let’s aim to implement the four steps below.

  • Scrape an online data source

  • Encrypt the data and store it on an SFTP server

  • Create a dimensional model

  • Pull the data from SFTP and load it into a data warehouse (don’t worry too much about workflows just yet)

This project will bring together many of the skills you have learned, whether that is PGP encryption, SFTP, or data modeling.
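
For the dimensional model step, one common shape is a star schema: a central fact table of measurements surrounded by dimension tables that describe them. The sketch below builds a tiny one in SQLite. The table names, columns, and rows are hypothetical and would depend on whatever data source you scrape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two dimension tables (the "who/what/when") and one fact table (the measurements).
conn.executescript("""
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        category    TEXT NOT NULL
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,  -- e.g. 20240115
        year     INTEGER NOT NULL,
        month    INTEGER NOT NULL
    );
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        quantity    INTEGER NOT NULL,
        revenue     REAL NOT NULL
    );
""")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Widget", "Hardware"), (2, "Manual", "Books")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20240115, 2024, 1), (20240210, 2024, 2)],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 20240115, 2, 40.0), (1, 20240210, 1, 20.0), (2, 20240210, 3, 45.0)],
)

# A typical analytical query: join facts to dimensions, then aggregate.
revenue_by_category = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
print(revenue_by_category)
```

Keeping descriptive attributes in the dimensions and numbers in the fact table is what lets analysts slice the same measurements by product, date, or any other dimension you add later.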


Airflow And Docker


We are getting to the point where order matters a little less. The remaining steps could get a little jumbled and you would be fine.


At this point, you should have a solid enough base that any new technology that comes your way shouldn’t have the same learning curve.


Cloud and NoSQL


At this point, you have probably already done a little work in the cloud and maybe even played around with a NoSQL database.


But, let’s round out that knowledge.


How? Well, there are a few great options when it comes to rounding out knowledge. For example, now would be a good time to take a certificate program.


Streaming And Distributed Systems


There are so many ways to process data in the modern world. More importantly, using more complex systems such as streaming or distributed systems is so much easier than it’s ever been.


You can spin up a fully managed service on AWS or GCP and you’re off to the races. No need to spin up 5 other services just to try to wrangle and manage your streaming system.
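
Before reaching for a managed service, it can help to see what a streaming job does differently from a batch job. The sketch below is a toy illustration in plain Python, not how any particular service works: it counts events per fixed (tumbling) time window and emits each window as soon as it closes, instead of waiting for the full data set. The function name, the event format, and the assumption that events arrive in timestamp order are all simplifications made for the example.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, Counter) each time a window closes.

    Assumes events are (timestamp_seconds, key) pairs arriving in timestamp order.
    """
    current_start = None
    counts = Counter()
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # The previous window is complete, so emit it and start a new one.
            yield current_start, counts
            current_start, counts = window_start, Counter()
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts  # flush the final, still-open window


# Hypothetical event stream: (seconds since start, event type).
events = [(0, "click"), (3, "view"), (9, "click"), (10, "click"), (25, "view")]
windows = [(start, dict(c)) for start, c in tumbling_window_counts(iter(events), 10)]
print(windows)
```

Real streaming systems add the hard parts on top of this idea, such as late or out-of-order events, state that survives restarts, and spreading the work across machines, which is exactly what the managed services take off your hands.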


Enough UI/UX and Dashboarding


You may need to build basic dashboards with Streamlit, Dash, or any other framework. Your initial role may be just to pipe information to a screen and display it.


Cloud Services and Platforms


You should be well versed in the major cloud platforms and their offerings. Critical thinking is required to frame solutions and design scalable architectures. The major cloud platforms include AWS, Azure, and GCP.


We hope you loved this insight into the data engineer’s stack, and remember: don’t be overwhelmed. We’ve got you covered. If you are looking to upskill, here is an opportunity to join one of the most decorated curriculums.


Certified Data Engineer (CDE)


A course that enables you to understand and build scalable data pipelines on premises and in the cloud. This 3.5-month, industry-accredited certificate program enables computer science professionals to upgrade themselves into the billion-dollar data engineering field. Taught by professionals working in production, the course covers architecting data ingestion, modelling data warehouses in both SQL and NoSQL flavors, and finally deploying end-to-end structures on the cloud as APIs and applications. The course also acts as preparation for any major cloud certification (GCP and AWS).
