Data Engineer Resume
SUMMARY
- Around 6 years of IT work experience spanning Data Analytics Engineering and Programmer Analyst roles.
- Experienced with cloud platforms including Amazon Web Services, Azure, and Databricks (on both Azure and AWS).
- Proficient with workflow orchestration and infrastructure tools including Oozie, Airflow, AWS Data Pipeline, Azure Data Factory, CloudFormation, and Terraform.
- Implemented data warehouse solutions consisting of ETL workloads and on-premises-to-cloud migrations, with strong expertise building and deploying batch and streaming data pipelines in cloud environments.
- Worked on Airflow 1.8 (Python 2) and Airflow 1.9 (Python 3) for orchestration; familiar with building custom Airflow operators and orchestrating workflows with cross-cloud dependencies (a minimal operator sketch follows this summary).
- Leveraged Spark as an ETL tool for building data pipelines on cloud platforms such as AWS EMR, Azure HDInsight, and MapR CLDB architectures (see the PySpark ETL sketch after this summary).
- Career interests and future aspirations include, but are not limited to, ML, AI, RPA, and automation.
- Enthusiast of Spark for ETL, Databricks, cloud adoption, and data engineering in the open-source community.
- Proven expertise in deploying major software solutions for high-end clients, meeting business requirements such as big data processing, ingestion, analytics, and on-premises-to-cloud migration.
- Proficient with Azure Data Lake Storage (ADLS), Databricks and IPython notebook formats, Databricks Delta Lake, and Amazon Web Services (AWS).
- Orchestration experience using Azure Data Factory, Airflow 1.8, and Airflow 1.10 on multiple cloud platforms, including leveraging Airflow operators.
- Developed and deployed various AWS Lambda functions using the built-in AWS Lambda libraries, and deployed Lambda functions in Scala with custom libraries.
- Expert understanding of AWS DNS services through Route 53, including Simple, Weighted, Latency, Failover, and Geolocation routing policies.
- Expert understanding of AWS networking and content delivery services through Virtual Private Cloud (VPC).
- Hands-on expertise with IP addressing, access control lists, subnets, NAT instances and gateways, VPC peering, custom VPCs, and bastion hosts.
- Hands-on experience with data analytics services such as Athena, Glue Data Catalog, and QuickSight.
- Worked on ETL migration by developing and deploying AWS Lambda functions to build a serverless data pipeline whose output is registered in the Glue Data Catalog and queried from Athena (a Lambda sketch follows this summary).
- Wrote CloudFormation templates in JSON for networking and content delivery in the AWS cloud environment (a template-deployment sketch follows this summary).
- Addressed complex POCs from the technical end according to business requirements.
- Wrote test cases to achieve unit-test coverage.
- Active Agile team player in production support, hotfix deployment, code reviews, system design and review, test cases, sprint planning, and demos.
- Effectively communicate with business units and stakeholders and provide strategic solutions according to clients' requirements.
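The custom-operator work mentioned above might look roughly like this minimal sketch, assuming Airflow 1.10-style imports; the operator name, connection ID, bucket, and key are hypothetical placeholders rather than details from an actual project.

```python
# Minimal custom-operator sketch (Airflow 1.10-style imports); all names are hypothetical.
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults
from airflow.hooks.S3_hook import S3Hook


class S3KeyToLocalOperator(BaseOperator):
    """Download a single S3 key to a local path so a downstream task can consume it."""

    template_fields = ("s3_key", "local_path")

    @apply_defaults
    def __init__(self, s3_key, bucket_name, local_path,
                 aws_conn_id="aws_default", *args, **kwargs):
        super(S3KeyToLocalOperator, self).__init__(*args, **kwargs)
        self.s3_key = s3_key
        self.bucket_name = bucket_name
        self.local_path = local_path
        self.aws_conn_id = aws_conn_id

    def execute(self, context):
        hook = S3Hook(aws_conn_id=self.aws_conn_id)
        # In Airflow 1.10, get_key returns a boto3 S3.Object.
        obj = hook.get_key(self.s3_key, bucket_name=self.bucket_name)
        obj.download_file(self.local_path)
        self.log.info("Downloaded s3://%s/%s to %s",
                      self.bucket_name, self.s3_key, self.local_path)
```

In a DAG the operator is instantiated like any built-in operator, with the S3 key and local path passed as (templatable) arguments.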
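A minimal sketch of the Spark-as-ETL pattern referenced above, assuming a CSV landing prefix in S3 and a curated Parquet zone; the bucket names and columns are illustrative only.

```python
# Minimal PySpark ETL sketch; buckets, paths, and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read raw CSV files from the S3 landing prefix.
raw = spark.read.option("header", "true").csv("s3://example-landing/orders/")

# Transform: cast types, derive a date partition column, and drop incomplete rows.
clean = (
    raw.withColumn("order_amount", F.col("order_amount").cast("double"))
       .withColumn("order_date", F.to_date("order_ts"))
       .dropna(subset=["order_id", "order_amount"])
)

# Load: write partitioned Parquet to the curated zone.
clean.write.mode("overwrite").partitionBy("order_date").parquet("s3://example-curated/orders/")
```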
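The serverless Glue/Athena pipeline mentioned above could be sketched as a Lambda handler along these lines; the crawler, database, table, and result-bucket names are hypothetical, and the crawler/query sequencing is deliberately simplified.

```python
# Sketch of a serverless-pipeline Lambda handler; all resource names are hypothetical.
import boto3

glue = boto3.client("glue")
athena = boto3.client("athena")


def handler(event, context):
    # Register newly landed S3 data in the Glue Data Catalog by running a crawler.
    # (The crawler runs asynchronously; a real pipeline would wait for completion
    # or trigger the query from a separate event.)
    glue.start_crawler(Name="example-orders-crawler")

    # Kick off an Athena query against the cataloged table; results land in S3.
    response = athena.start_query_execution(
        QueryString="SELECT count(*) FROM orders",
        QueryExecutionContext={"Database": "example_db"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    return {"query_execution_id": response["QueryExecutionId"]}
```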
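A CloudFormation template of the kind described above, reduced here to a single VPC resource and deployed with boto3; the stack name, CIDR block, and tag values are illustrative assumptions.

```python
# Sketch: a minimal JSON CloudFormation template for a VPC, deployed via boto3.
import json

import boto3

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "ExampleVPC": {
            "Type": "AWS::EC2::VPC",
            "Properties": {
                "CidrBlock": "10.0.0.0/16",
                "EnableDnsSupport": True,
                "EnableDnsHostnames": True,
                "Tags": [{"Key": "Name", "Value": "example-vpc"}],
            },
        }
    },
}

cloudformation = boto3.client("cloudformation")
cloudformation.create_stack(
    StackName="example-network-stack",
    TemplateBody=json.dumps(template),
)
```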
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer
Responsibilities:
- Added value to Digital Manufacturing data products by contributing as an Insights Data Analyst, building business-driven sourcing solutions based on business users' requirements.
- Data stack typically includes AWS, Snowflake, DynamoDB, S3, RDS, AI and ML data exploration, RPA correlation and causation analysis, Spark SQL, SQL, data modeling, Tableau, and Excel.
- Communicated data analytics findings for Digital Manufacturing, audit data, and distribution center analyses.
- Contributed to design reviews with cross-functional technical teams and communicated technical findings to visualization developers.
- Investigated data quality issues and generated presentable narratives about possible biases due to incomplete data.
- Good understanding of data ingestion, Airflow operators for data orchestration, and related Python libraries.
- Made Python API calls and landed data from external sources into S3 (see the sketch at the end of this role).
- Analyzed machine data and created Excel visualization plots as a story narrative for business users and product owners.
- Used Tableau and Excel for visualization charts and regularly communicated findings with product owners.
- Worked with data engineers and data scientists to understand gaps between the Product Integrity datasets and Digital Manufacturing by leveraging advanced analytics.
- Worked on Tableau visualization charts and daily status dashboards.
- Worked in an Agile environment, participated in design reviews and end-to-end UATs, and assisted QA in automating test cases.
- Performed unit testing, UAT, and end-to-end automation design reviews.
- Demonstrated strong communication skills and story narratives during sprint demos to leadership and stakeholders.
Environment: PySpark, AWS, S3, Snowflake, Elastic MapReduce (EMR), Tableau, Airflow, SQL, SSIS, Excel, DynamoDB, Python 3, Spark SQL, NumPy, scikit-learn, Pandas, Boto3, S3cmd.
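A rough sketch of the Python API-to-S3 landing step mentioned in this role; the endpoint URL, bucket, and key are hypothetical.

```python
# Sketch: pull data from an external REST API and land it in the S3 raw zone.
# The URL, bucket, and key below are hypothetical.
import json

import boto3
import requests

s3 = boto3.client("s3")

# Pull a page of records from the external endpoint.
resp = requests.get("https://api.example.com/v1/machines", timeout=30)
resp.raise_for_status()

# Land the raw payload in S3 for downstream processing.
s3.put_object(
    Bucket="example-raw-zone",
    Key="machines/2020-01-01/machines.json",
    Body=json.dumps(resp.json()).encode("utf-8"),
)
```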
Confidential
Data Engineer
Responsibilities:
- Created, developed, and provided production support for AAS models on retail sales, POS, and corporate forecast data by developing and deploying Spark pipelines, ensuring continuous delivery of data from cross-functional teams such as Omni-channel, Demand-Supply, and Global Logistics, and delivering enterprise-cleansed datasets to BI Engineering and Data Science teams.
- Collaborated with solution architects, principal engineers, data scientists, DevOps, data governance, and business analysts to understand the precise business needs behind acceptance criteria, ensuring delivery of the data products that business and product owners leverage for corporate forecasts and forecast-accuracy bias analysis.
- Data stack: Azure Databricks, ADLS, ADF, AAS, DAX, Azure Automation Accounts, Azure Active Directory (AD), Azure IAM security groups, PySpark, Spark SQL, Azure Data Warehouse (ADW), Power BI, MSBI, SSAS, CI/CD, and production support.
- Performed end-to-end delivery of PySpark ETL pipelines on Azure Databricks, transforming data orchestrated via Azure Data Factory (ADF), scheduled through Azure Automation accounts, and triggered using the Tidal scheduler (see the sketch at the end of this role).
- Created Azure AAS models from scratch by designing dimension tables derived from the fact table and normalizing data to create a tabular model, on top of which Power BI reports are generated by business users.
- Created security groups through the CI/CD process and associated object IDs with user groups based on business-domain filters.
- Used GitHub Enterprise and Azure DevOps Repos for version control, with a good understanding of branching strategies while collaborating with peer groups and other teams on shared repositories.
Environment: Azure Databricks, Azure Databases, Azure DevOps, Azure Repos, PySpark, Delta Lake, Azure Data Warehouse, Tidal scheduler, Azure Data Factory (ADF), Data Lake Storage (ADLS), Analysis Services (AAS), Power BI, SQL Server Management Studio (SSMS), Azure Automation Accounts, Runbooks, Webhooks, Spark SQL.
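A minimal sketch of the kind of Databricks PySpark transformation described in this role, writing a curated Delta table for downstream AAS/Power BI models; the ADLS mount paths, table names, and columns are hypothetical.

```python
# Sketch of a Databricks PySpark job writing curated Delta output; paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

# On Databricks, getOrCreate() returns the cluster-provided session.
spark = SparkSession.builder.getOrCreate()

# Read the raw retail-sales feed from an ADLS mount.
sales = spark.read.format("delta").load("/mnt/raw/retail_sales")

# Cleanse and conform the feed before it is modeled in AAS.
curated = (
    sales.dropDuplicates(["order_id"])
         .withColumn("sale_date", F.to_date("sale_ts"))
         .withColumn("net_amount", F.col("gross_amount") - F.col("discount_amount"))
)

# Write a partitioned Delta table to the curated zone for downstream AAS / Power BI models.
(curated.write.format("delta")
        .mode("overwrite")
        .partitionBy("sale_date")
        .save("/mnt/curated/retail_sales"))
```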