
AWS Data Engineer Resume


PROFESSIONAL SUMMARY:

  • Data Engineer with 6 years of IT experience and exceptional expertise in the Big Data/Hadoop ecosystem and data analytics techniques.
  • Hands-on experience working with the Big Data/Hadoop ecosystem, including Apache Spark, MapReduce, Spark Streaming, PySpark, Hive, HDFS, AWS Kinesis, Airflow DAGs, and Oozie.
  • Proficient in Python scripting; experienced with statistical functions in NumPy, visualization using Matplotlib, and organizing data with Pandas.
  • Experience working with NoSQL databases, including DynamoDB and HBase.
  • Experience in tuning and debugging Spark applications and applying Spark optimization techniques.
  • Experience in building PySpark and Spark-Scala applications for interactive analysis, batch processing, and stream processing.
  • Hands-on experience in creating real-time data streaming solutions using Apache Spark Core, Spark SQL, and DataFrames.
  • Extensive knowledge in implementing, configuring, and maintaining Amazon Web Services (AWS) such as EC2, S3, Redshift, Glue, and Athena, with attention to high availability, fault tolerance, and scalability.
  • Expertise in developing Spark applications for interactive analysis, batch processing, and stream processing using PySpark, Scala, and Java.
  • Advanced knowledge of Hadoop-based data warehousing (Hive) and database connectivity (Sqoop).
  • Ample experience using Sqoop to ingest data from RDBMSs such as Oracle, MS SQL Server, Teradata, PostgreSQL, and MySQL.
  • Experience working with streaming ingestion services for batch and real-time processing using Spark Streaming and Kafka.
  • Proficient in using Spark API for streaming real-time data, staging, cleaning, applying transformations, and preparing data for machine learning needs.
  • Extensive knowledge in working with Amazon EC2 to provide a solution for computing, query processing, and storage across a wide range of applications.
  • Expertise in using AWS S3 to stage data and to support data transfer and archival. Experience in using AWS Redshift for large-scale data migrations using AWS DMS and implementing CDC (change data capture).
  • Strong experience in developing AWS Lambda functions in Python to automate data ingestion and routine tasks (a minimal sketch follows this summary).
  • Working knowledge of Azure cloud components (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Storage Explorer, SQL DB, SQL DWH, Cosmos DB).
  • Experienced in building data pipelines using Azure Data Factory and Azure Databricks, loading data to Azure Data Lake, Azure SQL Database, and Azure SQL Data Warehouse, and controlling database access.
  • Extensive experience with Azure services like HDInsight, Stream Analytics, Active Directory, Blob Storage, Cosmos DB, and Storage Explorer.
  • Good knowledge of security requirements and their implementation using Azure Active Directory, Sentry, Ranger, and Kerberos for authentication and authorization of resources.
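
A minimal sketch of the kind of Python Lambda ingestion automation described above, assuming an S3 put event as the trigger; the bucket name and key prefix are placeholders, not project values:

    import json
    import urllib.parse

    import boto3

    s3 = boto3.client("s3")

    # Hypothetical target bucket for the raw zone of the data lake.
    RAW_ZONE_BUCKET = "example-raw-zone"

    def lambda_handler(event, context):
        """Copy each newly landed S3 object into the raw zone."""
        records = event.get("Records", [])
        for record in records:
            source_bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            # Server-side copy keeps the function lightweight (no payload download).
            s3.copy_object(
                Bucket=RAW_ZONE_BUCKET,
                Key=f"ingested/{key}",
                CopySource={"Bucket": source_bucket, "Key": key},
            )
        return {"statusCode": 200, "body": json.dumps({"copied": len(records)})}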

TECHNICAL SKILLS:

Big Data ecosystem: HDFS, MapReduce, Spark, Yarn, Kafka, Hive, Airflow, Sqoop, HBase, Flume, Pig, Ambari, Oozie, Zookeeper, NiFi, Cassandra, Scala, Impala, Storm, Splunk, Tez, Flink, Stream Sets, Sentry, Ranger, Kibana.

Hadoop Distributions: Apache Hadoop 2.x/1.x, Cloudera CDP, Hortonworks HDP

Cloud Platforms (AWS/Azure): Amazon AWS - EMR, EC2, EBS, RDS, S3, Athena, Glue, Elasticsearch, Lambda, SQS, DynamoDB, Redshift, Kinesis; Microsoft Azure - Databricks, Data Lake, Blob Storage, Azure Data Factory, SQL Database, SQL Data Warehouse, Cosmos DB, Active Directory

Scripting Languages: Python, Java, Scala, R, Shell Scripting, HiveQL, Pig Latin

NoSQL Database: Cassandra, Redis, MongoDB, Neo4j

Database: MySQL, Oracle, Teradata, MS SQL SERVER, PostgreSQL, DB2

ETL/BI Tools: Tableau, Power BI, Snowflake, Informatica, Talend, SSIS, SSRS, SSAS, ER Studio

Operating Systems: Linux (Ubuntu, Centos, RedHat), Unix, Macintosh, Windows (XP/7/8/10/11)

Methodologies: Agile/Scrum, Waterfall.

Version Control: Git, SVN, Bitbucket

Others: Docker, Kubernetes, Jenkins, Chef, Ansible, Jira, Machine learning, NLP, Spring Boot, Jupyter Notebook, Terraform.

PROFESSIONAL EXPERIENCE:

Confidential

AWS Data Engineer

Responsibilities:

  • Participated with the technical team, business managers, and practitioners in the business unit to determine the requirements and functionality needed for each project.
  • Performed wide and narrow transformations and actions such as filter, lookup, join, and count on Spark DataFrames.
  • Worked with Parquet and ORC files using PySpark, and with Spark Streaming on DataFrames.
  • Developed batch and streaming processing apps using Spark APIs for functional pipeline requirements.
  • Automated data storage from streaming sources to AWS data stores such as S3, Redshift, and RDS by configuring AWS Kinesis Data Firehose.
  • Performed analytics on streamed data using the real-time integration capabilities of AWS Kinesis Data Streams.
  • Created PySpark code that uses Spark SQL to generate DataFrames from the Avro-formatted raw layer and writes them to data service layer internal tables in Parquet format (see the sketch after this list).
  • Generated workflows through Apache Airflow and Apache Oozie for scheduling the Hadoop jobs that control large data transformations.
  • Imported and exported data between HDFS/Hive and relational databases, including Teradata, using Sqoop.
  • Involved in creating Hive tables, loading data, and analyzing it using Hive queries.
  • Created and configured EC2 instances on AWS (Amazon Web Services) to establish clusters in the cloud.
  • Worked on a CI/CD solution using Git, Jenkins, and Docker to set up and configure the big data architecture on the AWS cloud platform.
  • Analyzed the SQL scripts and designed the solution to implement using PySpark.
  • Experienced in writing Spark Applications in Scala and Python (PySpark).
  • Implemented Spark applications in Python to perform advanced procedures such as text analytics and processing, utilizing DataFrames and the Spark SQL API with Spark's in-memory computing for faster data processing.
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the streaming sketch below).
  • Used the Spark API over the MapReduce framework to perform analytics on data in Hive and HBase tables.
  • Worked on AWS Lambda to run code without managing servers, triggered by S3 and SNS events.
  • Worked on integrating a Kafka publisher into the Spark job to capture errors from the Spark application and push them to a database.

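A minimal sketch of the Avro-to-Parquet PySpark job described in the list above, assuming the spark-avro package is on the classpath; the S3 paths, view name, and column names are illustrative placeholders rather than the project's actual schema:

    from pyspark.sql import SparkSession

    RAW_PATH = "s3a://example-raw/events/"          # placeholder raw-layer location
    SERVICE_PATH = "s3a://example-service/events/"  # placeholder service-layer location

    spark = SparkSession.builder.appName("avro-to-parquet").getOrCreate()

    # Read the Avro-formatted raw layer (requires the spark-avro package,
    # e.g. --packages org.apache.spark:spark-avro_2.12:<spark version>).
    raw_df = spark.read.format("avro").load(RAW_PATH)
    raw_df.createOrReplaceTempView("raw_events")

    # Spark SQL transformation into the service-layer shape.
    curated_df = spark.sql("""
        SELECT event_id,
               user_id,
               CAST(event_ts AS timestamp) AS event_ts,
               to_date(event_ts)           AS event_date
        FROM raw_events
        WHERE event_id IS NOT NULL
    """)

    # Write partitioned Parquet for the data service layer tables.
    curated_df.write.mode("overwrite").partitionBy("event_date").parquet(SERVICE_PATH)
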
Environment: Hadoop, Spark, Hive, HDFS, Kafka, UNIX, Shell, AWS services, Python, Scala, Glue, Oozie, SQL.
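
A minimal sketch of the streaming path referenced in the same role (consuming Kafka topics with Spark Structured Streaming), assuming the spark-sql-kafka package is available; the brokers, topic, schema, and paths are placeholders, and a Parquet file sink stands in for the HBase write to keep the sketch self-contained:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

    # Illustrative message schema; the real topics carried different payloads.
    schema = StructType([
        StructField("order_id", StringType()),
        StructField("status", StringType()),
        StructField("updated_at", TimestampType()),
    ])

    stream = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
        .option("subscribe", "orders")                       # placeholder topic
        .option("startingOffsets", "latest")
        .load()
    )

    # Kafka delivers bytes; cast the value to string and parse the JSON payload.
    parsed = (
        stream.select(from_json(col("value").cast("string"), schema).alias("payload"))
        .select("payload.*")
    )

    # Checkpointing tracks Kafka offsets so the query can recover after failure.
    query = (
        parsed.writeStream
        .format("parquet")
        .option("path", "s3a://example-stream/orders/")
        .option("checkpointLocation", "s3a://example-stream/_chk/orders/")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()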

Confidential

Big Data Engineer

Responsibilities:

  • Worked closely with stakeholders to understand business requirements and design quality technical solutions that align with business and IT strategies and comply with the organization's architectural standards.
  • Developed multiple applications for transforming data across multiple layers of the Enterprise Analytics Platform and implemented Big Data solutions to support distributed processing.
  • Responsible for data identification and extraction using third-party ETL and data-transformation tools or scripts (e.g., SQL, Python).
  • Worked on migration of data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
  • Developed and managed Azure Data Factory pipelines that extracted data from various data sources and transformed it according to business rules, using Python scripts that utilized PySpark and consumed APIs to move data into an Azure SQL database.
  • Created a new data quality check framework project in Python that utilized pandas (a minimal sketch follows this list).
  • Implemented source control and development environments for Azure Data Factory pipelines utilizing Azure Repos.
  • Created Hive/Spark external tables for each source table in the Data Lake and wrote Hive SQL and Spark SQL to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Designed and developed ETL frameworks using Azure Data Factory and Azure Databricks.
  • Created generic Databricks notebooks for performing data cleansing.
  • Refactored on-premises SSIS packages into Azure Data Factory pipelines.
  • Worked with Azure Blob and Data Lake Storage for loading data into Azure Synapse (SQL DW).
  • Ingested and transformed source data using Azure Data flows and Azure HDInsight.
  • Created Azure Functions to ingest data at regular intervals.
  • Created Databricks notebooks for performing complex transformations and integrated them as activities in ADF pipelines.
  • Wrote complex SQL queries for data analysis and extraction of data in the required format.
  • Created Power BI datamarts and reports for various stakeholders in the business.
  • Created CI/CD pipelines using Azure DevOps.
  • Enhanced the functionality of existing ADF pipelines by adding new logic to transform the data.
  • Worked on Spark jobs for data preprocessing, validation, normalization, and transmission.
  • Optimized code and configurations for performance tuning of Spark jobs.
  • Worked with unstructured and semi-structured data sets to aggregate and build analytics on the data.
  • Worked independently with business stakeholders, with a strong emphasis on influencing and collaboration.
  • Participated daily in an Agile/Scrum team with tight deadlines.
  • Created complex data transformations and manipulations using ADF and Scala.
  • Worked on cloud deployments using Maven, Docker, and Jenkins.
  • Experienced in using Avro, Parquet, ORC, and JSON file formats; developed UDFs in Hive.

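A minimal sketch of the pandas-based data quality check framework mentioned in the list above; the rules and column names are illustrative placeholders, not the project's actual checks:

    import pandas as pd

    def check_not_null(df: pd.DataFrame, column: str) -> dict:
        """Flag rows where a required column is missing."""
        failed = int(df[column].isna().sum())
        return {"check": f"{column}_not_null", "failed_rows": failed, "passed": failed == 0}

    def check_unique(df: pd.DataFrame, column: str) -> dict:
        """Flag duplicate values in a column that should be a key."""
        failed = int(df[column].duplicated().sum())
        return {"check": f"{column}_unique", "failed_rows": failed, "passed": failed == 0}

    def run_checks(df: pd.DataFrame) -> pd.DataFrame:
        """Run the configured checks and return one result row per check."""
        results = [
            check_not_null(df, "customer_id"),
            check_unique(df, "customer_id"),
            check_not_null(df, "order_date"),
        ]
        return pd.DataFrame(results)

    if __name__ == "__main__":
        sample = pd.DataFrame({
            "customer_id": ["C1", "C2", None, "C2"],
            "order_date": ["2023-01-01", "2023-01-02", "2023-01-03", None],
        })
        print(run_checks(sample))
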
Environment: Spark, Spark SQL, PySpark, Hive, Azure Data Factory, Azure Databricks, Azure Data Lake, Blob Storage, Azure Synapse Analytics, Azure SQL DB, HDInsight, Azure Functions, Azure DevOps, Power BI, SSIS, Python, Scala, SQL, Maven, Docker, Jenkins.

Confidential

Hadoop Developer

Responsibilities:

  • Involved in importing data from Microsoft SQL Server, MySQL, Teradata into HDFS using Sqoop.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS.
  • Used Hive to analyze partitioned and bucketed data to compute various reporting metrics.
  • Involved in creating Hive tables, loading data, and writing queries that run internally as MapReduce jobs.
  • Involved in creating Hive external tables for HDFS data.
  • Solved performance issues in Hive and PySpark scripts with an understanding of joins, grouping, and aggregation and how they execute as MapReduce jobs.
  • Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
  • Implemented end-to-end ETL pipelines using Python and SQL for high-volume analytics. Reviewed use cases before onboarding to HDFS.
  • Automated deployments and routine tasks using UNIX shell scripting.
  • Used Spark for transformations, event joins and some aggregations before storing the data into HDFS.
  • Troubleshot and resolved data quality issues and maintained a high level of accuracy in the data being reported.
  • Analyzed large data sets to determine the optimal way to aggregate them.
  • Worked on the Oozie workflow to run multiple Hive jobs.
  • Worked on creating Custom Hive UDF's.
  • Developed automated shell script to execute Hive Queries.
  • Involved in processing ingested raw data using Python.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Worked with different file formats such as JSON, Avro, ORC, and Parquet, and compression codecs such as Snappy, zlib, and LZ4.
  • Involved in converting Hive/SQL queries into Spark transformations using DataFrames (see the sketch after this list).
  • Gained knowledge in creating Tableau dashboards for reporting on analyzed data.
  • Expertise with NoSQL databases like HBase.
  • Experienced in managing and reviewing the Hadoop log files.
  • Used GitHub as repository for committing code and retrieving it and Jenkins for continuous integration.

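A minimal sketch of converting a HiveQL join-and-aggregate query into Spark DataFrame transformations, as referenced in the list above; the database, table, and column names are illustrative, and Hive support assumes a configured metastore:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("hive-to-spark-metrics")
        .enableHiveSupport()          # assumes a reachable Hive metastore
        .getOrCreate()
    )

    clicks = spark.table("web_logs.clicks")  # placeholder partitioned/bucketed Hive table
    users = spark.table("web_logs.users")    # placeholder dimension table

    # DataFrame equivalent of a HiveQL join + group-by aggregation.
    daily_metrics = (
        clicks.join(users, "user_id")
        .where(F.col("event_date") >= "2021-01-01")   # prunes partitions on event_date
        .groupBy("event_date", "country")
        .agg(
            F.countDistinct("user_id").alias("unique_users"),
            F.count("*").alias("page_views"),
        )
    )

    # Assumes the target database exists; saveAsTable creates the table if needed.
    daily_metrics.write.mode("overwrite").saveAsTable("reporting.daily_click_metrics")
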
Environment: HDFS, MapReduce, Sqoop, Hive, Spark, Oozie, MySQL, Eclipse, Git, GitHub, Jenkins.

Confidential

Application Developer

Responsibilities:

  • Involved in various stages of Enhancements in the Application by doing the required analysis, development, and testing.
  • Prepared the high- and low-level design documents and generated digital signatures.
  • Created use case, class, and sequence diagrams for the analysis and design of the application.
  • Developed the logic and code for registration and validation of enrolling customers.
  • Developed web-based user interfaces using the Struts framework.
  • Handled client-side validations using JavaScript.
  • Wrote SQL queries, stored procedures and enhanced performance by running explain plans.
  • Involved in integration of various Struts actions in the framework.
  • Used the Validation Framework for server-side validations.
  • Created test cases for the Unit and Integration testing.
  • The front end was integrated with the Oracle database using the JDBC API through the JDBC-ODBC bridge driver on the server side.
  • Designed project-related documents using MS Visio, including use case, class, and sequence diagrams.
  • Wrote the end-to-end flow (controller classes, service classes, and DAO classes) per the Spring MVC design and wrote business logic using the core Java API and data structures.
  • Used Spring JMS message-driven beans (MDBs) to receive messages from other teams, with IBM MQ for queuing.
  • Developed presentation-layer code using JSP, HTML, AJAX, and jQuery.
  • Developed the business layer using Spring (IoC, AOP), DTOs, and JTA.
  • Developed application service components and configured beans using Spring IoC. Implemented the persistence layer and configured EhCache to load static tables into a secondary storage area.
  • Involved in the development of the User Interfaces using HTML, JSP, JS, CSS and AJAX
  • Created tables, triggers, stored procedures, SQL queries, joins, integrity constraints, and views for multiple Oracle 11g databases using the Toad tool.
  • Developed the project using industry-standard design patterns such as Singleton, Business Delegate, and Factory for better code maintainability and reusability.

Environment: Java, J2EE, Spring, Spring Batch, Spring JMS, MyBatis, HTML, CSS, AJAX, jQuery, JavaScript, JSP, XML, UML, JUNIT, IBM WebSphere, Maven, Clear Case, SoapUI, Oracle 11g, IBM MQ
