
Data Engineer Resume


SUMMARY

  • Around 6 years of IT work experience spanning Data Analytics Engineering and Programmer Analyst roles.
  • Experienced with cloud platforms including Amazon Web Services, Azure, and Databricks (on both Azure and AWS).
  • Proficient with complex workflow orchestration tools, namely Oozie, Airflow, data pipelines, and Azure Data Factory, as well as CloudFormation and Terraform.
  • Implemented data warehouse solutions consisting of ETL workloads and on-premise to cloud migrations, with strong expertise in building and deploying batch and streaming data pipelines in cloud environments.
  • Worked on Airflow 1.8 (Python 2) and Airflow 1.9 (Python 3) for orchestration; familiar with building custom Airflow operators and orchestrating workflows with dependencies spanning multiple clouds (a brief custom-operator sketch follows this summary).
  • Leveraged Spark as an ETL tool for building data pipelines on various cloud platforms such as AWS EMR, Azure HDInsight, and MapR CLDB architectures.
  • Career interests and future aspirations include, but are not limited to, ML, AI, RPA, and broad automation initiatives.
  • Follower of Spark for ETL, Databricks enthusiast, and advocate of cloud adoption and data engineering in the open-source community.
  • Proven expertise in deploying major software solutions for various high-end clients, meeting business requirements such as big data processing, ingestion, analytics, and on-premise to cloud migration.
  • Proficient with Azure Data Lake Storage (ADLS), Databricks and IPython notebook formats, Databricks Delta Lake, and Amazon Web Services (AWS).
  • Orchestration experience using Azure Data Factory, Airflow 1.8, and Airflow 1.10 on multiple cloud platforms, with a solid understanding of how to leverage Airflow operators.
  • Developed and deployed various AWS Lambda functions using built-in AWS Lambda libraries, and also deployed Lambda functions in Scala with custom libraries.
  • Expert understanding of AWS DNS services through Route 53, including Simple, Weighted, Latency, Failover, and Geolocation routing policies.
  • Expert understanding of AWS networking and content delivery services through Virtual Private Cloud (VPC).
  • Hands-on expertise and functional knowledge of IPs, access control lists, subnets, NAT instances and gateways, VPC peering, custom VPCs, and bastion hosts.
  • Hands-on experience with data analytics services such as Athena, Glue Data Catalog, and QuickSight.
  • Worked on ETL migration by developing and deploying AWS Lambda functions that form a serverless data pipeline whose output is registered in the Glue Data Catalog and queried from Athena (see the Lambda sketch after this summary).
  • Wrote CloudFormation templates in JSON for networking and content delivery in the AWS cloud environment.
  • Addressed complex POCs from the technical end according to business requirements.
  • Wrote test cases to achieve unit test coverage.
  • Active Agile team player in production support, hotfix deployment, code reviews, system design and review, test cases, sprint planning, and demos.
  • Effectively communicate with business units and stakeholders and provide strategic solutions according to the client's requirements.
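
A minimal sketch of the custom Airflow operator pattern referenced above, in the Airflow 1.x style. The operator name, bucket, and prefix are hypothetical placeholders, not details from any specific engagement.

    # Minimal custom operator sketch in the Airflow 1.x style (names are assumed).
    from airflow.models import BaseOperator
    from airflow.hooks.S3_hook import S3Hook          # Airflow 1.x import path
    from airflow.utils.decorators import apply_defaults


    class S3KeyCountOperator(BaseOperator):
        """Hypothetical operator: counts keys under an S3 prefix for downstream checks."""

        @apply_defaults
        def __init__(self, bucket, prefix, aws_conn_id="aws_default", *args, **kwargs):
            super(S3KeyCountOperator, self).__init__(*args, **kwargs)
            self.bucket = bucket
            self.prefix = prefix
            self.aws_conn_id = aws_conn_id

        def execute(self, context):
            hook = S3Hook(aws_conn_id=self.aws_conn_id)
            keys = hook.list_keys(bucket_name=self.bucket, prefix=self.prefix) or []
            self.log.info("Found %d keys under s3://%s/%s", len(keys), self.bucket, self.prefix)
            return len(keys)  # returned value is pushed to XCom for downstream tasks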
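A hedged sketch of the serverless Lambda-to-Athena pipeline step mentioned above: a handler lands records in S3 under a location already registered in the Glue Data Catalog and refreshes partitions so Athena sees the new data (MSCK REPAIR is one simple option). Bucket, database, and table names are hypothetical placeholders.

    # Hedged serverless pipeline sketch: land data in S3 behind a Glue-cataloged table,
    # then refresh partitions so the table is queryable from Athena.
    import json
    import boto3

    s3 = boto3.client("s3")
    athena = boto3.client("athena")

    def handler(event, context):
        # Write the incoming records to the S3 prefix that backs the Glue table.
        body = "\n".join(json.dumps(r) for r in event.get("records", []))
        s3.put_object(
            Bucket="example-data-lake",                        # assumed bucket
            Key="curated/sales/part-{}.json".format(context.aws_request_id),
            Body=body.encode("utf-8"),
        )
        # Refresh partitions so Athena picks up the newly landed data.
        athena.start_query_execution(
            QueryString="MSCK REPAIR TABLE sales",             # assumed table name
            QueryExecutionContext={"Database": "example_db"},  # assumed Glue database
            ResultConfiguration={"OutputLocation": "s3://example-data-lake/athena-results/"},
        )
        return {"status": "ok", "records": len(event.get("records", []))}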

PROFESSIONAL EXPERIENCE

Confidential

Data Engineer

Responsibilities:

  • Added value to Digital Manufacturing data products by contributing as an Insights Data Analyst, building business-driven sourcing solutions based on business users' requirements.
  • Data stack typically includes: AWS, Snowflake, DynamoDB, S3, RDS, AI & ML data exploration, RPA correlation and causation analysis, Spark SQL, SQL, data modeling, Tableau, and Excel.
  • Communicated data analytics findings for Digital Manufacturing, audit data, and distribution center analyses.
  • Contributed to design reviews with cross-functional technical teams and communicated technical findings to visualization developers.
  • Investigated data quality issues and generated presentable narratives around biases that could arise from incomplete data.
  • Good understanding of data ingestion, Airflow operators for data orchestration, and related Python libraries.
  • Worked on Python API calls and landed data from external sources in S3 (see the ingestion sketch after this list).
  • Analyzed machine data and created Excel visualization plots as a story narrative for business users and product owners.
  • Used Tableau and Excel for visualization charts and regularly communicated findings with product owners.
  • Worked with data engineers and data scientists to understand the gaps between Product Integrity datasets and Digital Manufacturing data by leveraging advanced analytics.
  • Worked on Tableau visualization charts and daily status dashboards.
  • Worked in an Agile environment, participated in design reviews and end-to-end UATs, and assisted QA in automating test cases.
  • Performed unit testing, UAT testing, and end-to-end automation design reviews.
  • Demonstrated good communication skills and story narratives during sprint demos to leadership and stakeholders.
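
A hedged sketch of the API-to-S3 landing pattern noted above: pull JSON from an external endpoint and write it to a raw S3 prefix. The URL, bucket, and key prefix are hypothetical placeholders, not details from the actual engagement.

    # Hedged sketch: land JSON from an external API into a raw S3 zone.
    from datetime import datetime

    import boto3
    import requests


    def land_api_data(url="https://api.example.com/machines", bucket="example-raw-zone"):
        response = requests.get(url, timeout=30)
        response.raise_for_status()                   # fail fast on HTTP errors

        key = "landing/machines/{}.json".format(datetime.utcnow().strftime("%Y%m%d%H%M%S"))
        boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=response.text.encode("utf-8"))
        return key


    if __name__ == "__main__":
        print("Landed object:", land_api_data())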

Environment: PySpark, AWS, S3, Snowflake, Elastic MapReduce (EMR), Tableau, Airflow, SQL, SSIS, Excel, DynamoDB, Python 3, Spark SQL, NumPy, scikit-learn, Pandas, Boto3, S3cmd.

Confidential

Data Engineer

Responsibilities:

  • Created, developed, and provided production support for AAS models on retail sales, POS, and corporate forecast data by building and deploying Spark pipelines that ensure continuous delivery of data from cross-functional teams such as Omni-channel, Demand-Supply, and Global Logistics, and delivered enterprise-cleansed datasets to BI Engineering and Data Science teams.
  • Collaborated with solution architects, principal engineers, data scientists, the DevOps team, data governance, and business analysts to understand the precise business needs behind acceptance criteria and to ensure delivery of data products leveraged by business and product owners for corporate forecasts and forecast accuracy analysis.
  • Data stack: Azure Databricks, ADLS, ADF, AAS, DAX, Azure Automation Accounts, Azure Active Directory (AD), Azure IAM security groups, PySpark, Spark SQL, Azure Data Warehouse (ADW), Power BI, MSBI, SSAS, CI/CD, and production support.
  • Performed end-to-end delivery of PySpark ETL pipelines on Azure Databricks to transform data, orchestrated via Azure Data Factory (ADF), scheduled through Azure Automation accounts, and triggered using the Tidal Scheduler (a brief transformation sketch follows this list).
  • Created Azure AAS models from scratch by designing dimensional tables derived from fact tables and normalizing data to build a tabular model, on top of which Power BI reports are generated by business users.
  • Created security groups through the CI/CD process and associated object IDs with user groups based on business domain filters.
  • Used GitHub Enterprise and Azure DevOps Repos for version control, with a good understanding of branching strategies while collaborating with peer groups and other teams on shared repositories.
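
A hedged PySpark sketch of the Databricks transformation step described above: read raw retail sales data from ADLS, cleanse it, and write a Delta table that downstream AAS/Power BI layers can consume. Storage paths and column names are hypothetical placeholders.

    # Hedged sketch: cleanse raw POS/retail sales data and write a Delta table (assumed paths/columns).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("retail-sales-cleanse").getOrCreate()

    raw = spark.read.json("abfss://raw@exampleadls.dfs.core.windows.net/pos/sales/")  # assumed path

    cleansed = (
        raw.dropDuplicates(["transaction_id"])                     # assumed business key
           .withColumn("sale_date", F.to_date("sale_timestamp"))
           .filter(F.col("quantity") > 0)                          # drop invalid rows
    )

    (cleansed.write
        .format("delta")
        .mode("overwrite")
        .partitionBy("sale_date")
        .save("abfss://curated@exampleadls.dfs.core.windows.net/pos/sales_cleansed/"))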

Environment: Azure Databricks, Azure databases, Azure DevOps, Azure Repos, PySpark, Delta Lake, Azure Data Warehouse, Tidal Scheduler, Azure Data Factory (ADF), Data Lake Storage (ADLS), Analysis Services (AAS), Databricks (DBRX), Power BI, SQL Server Management Studio (SSMS), Azure Automation Accounts, Runbooks, Webhooks, Spark SQL.
