Senior Data Engineer Resume
Atlanta, GA
SUMMARY
- Over 7 years of experience in Business Intelligence, Data Engineering, and Data Warehousing
- Hands-on experience in infrastructure development and operations. Designed and deployed applications using AWS services such as EC2, S3, Glue, Lambda, EMR, VPC, RDS, Auto Scaling, CloudFormation, CloudWatch, Redshift, Athena, Kinesis Data Firehose, and Kinesis Data Streams
- Experienced in extract, transform, and load (ETL) processing of large datasets in different forms, including structured, semi-structured, and unstructured data
- Experience in understanding business requirements for analysis, database design, and application development
- Strong SQL development skills, including writing stored procedures, triggers, views, and user-defined functions
- Experience developing real-time processes
- Extensive Shell/Python scripting experience for scheduling and process automation
- Versatile in deploying content to the AWS cloud platform using S3
- Experience configuring AWS IAM and Security Groups in public and private subnets within a VPC
- Expertise in converting existing AWS infrastructure to serverless architecture (AWS Lambda, Kinesis) and deploying it with AWS CloudFormation (an illustrative sketch follows this list)
- Extensively worked with Jenkins: installed, configured, and maintained it for Continuous Integration (CI) and end-to-end automation of all builds and deployments, including implementing CI/CD for databases
- Strong work ethic with a desire to succeed and make significant contributions to the organization
- Experience working both independently and collaboratively to solve problems and deliver high-quality results in a fast-paced, unstructured environment.
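As an illustration of the serverless pattern noted above, a minimal Python sketch of a Lambda handler consuming a Kinesis stream; only the standard Kinesis event layout is assumed, and the downstream processing step is a hypothetical placeholder.

import base64
import json

def lambda_handler(event, context):
    # Minimal sketch of a Lambda handler consuming a Kinesis stream
    processed = 0
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside the event record
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)
        # Hypothetical processing step: a real pipeline might write the
        # message to S3, Redshift, or another downstream store
        print(message)
        processed += 1
    return {"processed_records": processed}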
TECHNICAL SKILLS
Big Data Technologies: AWS, S3, Lambda, Triggers, Glue, EMR, Kinesis, Redshift, Hadoop, HDFS, Hive, MapReduce, Pig, Flume, Oozie, HBase, Spark
Programming Languages: Python, Java, Scala
Databases: MySQL, SQL/PL-SQL, MS-SQL Server 2005, Oracle 9i/10g/11g
Scripting Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell
NoSQL Databases: Cassandra, HBase
Operating Systems: Linux, Windows XP/7/8/10
Software Life Cycle: SDLC, Waterfall, Agile
Office Tools: MS-Office, MS-Project, Risk Analysis tools, Visio
PROFESSIONAL EXPERIENCE
Confidential, Atlanta, GA
Senior Data Engineer
Responsibilities:
- Designed and built a full end-to-end data warehouse infrastructure from the ground up on Redshift, handling thousands of records every day
- Implemented and managed ETL solutions and automated operational processes
- Designed and developed ETL integration patterns using Python on Spark
- Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs (an illustrative sketch follows this list)
- Wrote various data normalization jobs for new data ingested into Redshift
- Worked on optimizing volumes and EC2 instances, created multiple VPCs, and used IAM to create new accounts, roles, and groups
- Implemented Spark RDD transformations to map business analysis and applied actions on top of those transformations
- Built S3 buckets, managed bucket policies, and used S3 and Glacier for storage and backup on AWS
- Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics
- Integrated services like GitHub, AWS CodePipeline, Jenkins and AWS Elastic Beanstalk to create a deployment pipeline
- Created monitors, alarms and notifications for EC2 hosts using CloudWatch
- Implemented the build framework for new projects using Jenkins as the build tool.
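A minimal PySpark sketch of the kind of job the PowerCenter mappings above were converted to: a simple normalization pass staged to S3 for loading into Redshift. The bucket paths and column names are hypothetical placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal sketch only; paths and column names are hypothetical placeholders.
spark = SparkSession.builder.appName("redshift-normalization").getOrCreate()

raw = spark.read.json("s3://example-bucket/incoming/")  # hypothetical source

normalized = (
    raw.withColumn("event_date", F.to_date("event_ts"))       # hypothetical columns
       .withColumn("amount", F.col("amount").cast("double"))
       .dropDuplicates(["event_id"])
)

# Stage the cleaned data back to S3; a Redshift COPY (or the spark-redshift
# connector) would typically load it from there.
normalized.write.mode("overwrite").parquet("s3://example-bucket/normalized/")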
Environment: AWS, S3, Redshift, Kinesis Data Firehose, Kinesis Data Streams, CloudWatch, Git, Apache Spark, Python, PySpark, MySQL, Shell scripts, Lambda, CloudFormation, CloudTrail, CloudFront, Docker
Confidential
Data Engineer
Responsibilities:
- Designed and implemented scalable, secure cloud architecture based on Amazon Web Services. Leveraged AWS services such as EC2, Auto Scaling, and VPC (Virtual Private Cloud) to build secure, highly scalable, and flexible systems that handled expected and unexpected load bursts and could quickly evolve during development iterations
- Designed and deployed AWS solutions using EC2, S3, EBS, Elastic Load Balancer (ELB), Auto Scaling groups, and OpsWorks
- Worked on optimizing volumes and EC2 instances, created multiple VPCs, and worked with IAM to create new accounts, roles, and groups
- Created alarms and notifications for EC2 instances using CloudWatch
- Configured S3 versioning and lifecycle policies to back up files and archive them in Glacier; created a Lambda function to automate snapshot backups on AWS and set up the scheduled backup (an illustrative sketch follows this list)
- Extracted data and sourced it to the data science team.
- Worked closely with the data science team to help them understand the data
- Used the AWS CLI to suspend an AWS Lambda function processing an Amazon Kinesis stream and then resume it
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework
- Designed, installed, and implemented the Ansible configuration management system, wrote Ansible playbooks in YAML, and deployed applications
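A minimal Python sketch of the snapshot-automation Lambda described above, assuming volumes are selected by a hypothetical Backup=true tag; the boto3 calls shown (describe_volumes, create_snapshot) are standard, and a CloudWatch Events schedule would invoke the handler.

import boto3

ec2 = boto3.client("ec2")

def lambda_handler(event, context):
    # Select volumes by a hypothetical Backup=true tag convention
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "tag:Backup", "Values": ["true"]}]
    )["Volumes"]

    snapshot_ids = []
    for volume in volumes:
        # Create a point-in-time snapshot of each tagged volume
        snapshot = ec2.create_snapshot(
            VolumeId=volume["VolumeId"],
            Description="Scheduled backup via Lambda",
        )
        snapshot_ids.append(snapshot["SnapshotId"])

    return {"snapshots_created": snapshot_ids}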
Environment: Git, Jenkins, AWS, EC2, VPC, S3, EBS, ELB, OpsWorks, IAM, CloudWatch, Lambda, AWS CLI, Kinesis, SQL
Confidential
Big Data Engineer
Responsibilities:
- Developed simple to complex MapReduce streaming jobs for processing and validating the data
- Developed a data pipeline using MapReduce, Flume, Sqoop, and Pig to ingest customer behavioral data into HDFS for analysis
- Developed MapReduce and Spark jobs to discover trends in data usage by users
- Implemented Spark using Python and Spark SQL for faster processing of data
- Developed Pig Latin scripts to perform MapReduce jobs
- Imported data from different sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the transformed data into HDFS
- Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis
- Wrote automated HBase test cases for data quality checks using HBase command-line tools
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager
- Used the Tez framework to build high-performance jobs in Pig and Hive
- Developed end-to-end data processing pipelines that receive data through the Kafka distributed messaging system and persist it into HBase (an illustrative sketch follows this list)
- Configured Kafka to read and write messages from external programs and to handle real-time data
- Wrote a Storm topology that accepts data from a Kafka producer, processes it, and emits it into Cassandra
- Developed interactive shell scripts for scheduling various data cleansing and data loading processes
- Performed data validation on the ingested data using MapReduce, building a custom model to filter out invalid records and cleanse the data
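A minimal Python sketch of the Kafka-to-HBase pipeline described above, using the kafka-python and happybase client libraries; the topic, broker, table, and column-family names are hypothetical placeholders, and error handling is omitted.

import json

import happybase                 # HBase client (Thrift)
from kafka import KafkaConsumer  # kafka-python

# Consume JSON events from a hypothetical Kafka topic
consumer = KafkaConsumer(
    "customer-events",                      # hypothetical topic
    bootstrap_servers="kafka-broker:9092",  # hypothetical broker
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

connection = happybase.Connection("hbase-thrift-host")  # hypothetical host
table = connection.table("customer_events")             # hypothetical table

for message in consumer:
    event = message.value
    # Row key and column-family layout are illustrative only
    table.put(
        event["event_id"].encode("utf-8"),
        {b"cf:payload": json.dumps(event).encode("utf-8")},
    )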
Environment: Hadoop, MapReduce, Spark, Pig, Hive, Sqoop, Oozie, HBase, Kafka, Spark Streaming, Flume, Storm, Tez, Impala, Mahout, Cassandra, Cloudera Manager, MySQL, Windows, Unix
Confidential
Java Developer
Responsibilities:
- Worked on front-end applications using HTML, CSS, and JavaScript
- Responsible for developing various modules, front-end and back-end components using several design patterns based on client's business requirements
- Designed and developed application modules using the Spring and Hibernate frameworks
- Designed and developed the front end with Swing and the Spring MVC framework, using tag libraries and custom tag libraries, and developed the presentation tier with JSP pages integrating AJAX, custom tags, JSP tag lists, HTML, JavaScript, and jQuery
- Used Hibernate to develop persistent classes following ORM principles
- Deployed Spring configuration files such as application context, application resources, and application files
- Used Java/J2EE patterns such as Model View Controller (MVC), Business Delegate, Session Facade, Service Locator, Data Transfer Object, Data Access Object, Singleton, and Factory
- Worked with Maven for build scripts and set up the Log4j logging framework
- Managed version control for deliverables by streamlining and rebasing development streams in SVN
Environment: Java/JDK, J2EE, Spring MVC, Hibernate, Eclipse, XML, JavaScript, Maven 2, Web Services, jQuery, SVN, JUnit, Windows, Oracle