Sr. AWS Data Engineer Resume
Rosemont, IL
SUMMARY
- 10+ years of experience in the information technology industry, with a strong background in System Analysis, Design, and Development in AWS, Big Data, and ETL as part of Decision Support Systems (DSS) initiatives.
- Involved in implementing large and complex projects.
- 4+ years of implementing large-scale AWS solutions in the E-commerce, Banking, and Financial sectors
- Proven expertise in Dimensional Modeling, including Star schema and Snowflake schema.
- Expertise in the AWS stack: EC2, S3, IAM, Lambda, Data Pipeline, EMR, SNS, CloudWatch, Redshift, DMS, Athena, and Snowflake
- Proficient in AWS and Big Data technologies such as HDFS, Hive, Sqoop, EMR, Spark, Redshift, EC2, and Data Pipeline
- Expert in the Hadoop stack (Spark, HDFS, Sqoop, Hive)
- Worked on QuickSight for omni data
- Solid experience in managing and developing Ab Initio applications for Extraction, Transformation, Cleansing, and Loading into Data Warehouses/Data Marts. Used Ab Initio with Very Large Database (VLDB) systems that are Massively Parallel Processing (MPP) platforms
- Proven expertise in performing analytics on AWS Redshift and on Big Data using Hive.
- Hands-on experience importing and exporting data between relational databases and HDFS/Hive using Sqoop (a minimal sketch of this import pattern follows this summary)
- Extensively worked on moving data from Snowflake to RDS and from Snowflake to S3
- Worked on migration from Oracle to Snowflake
- Used Spark Streaming and Spark SQL to build low-latency applications.
- Strong understanding of Hadoop internals, compression codecs, and file formats such as Avro and JSON
- Experience with configuration of Hadoop Ecosystem components: HDFS, Hive, Sqoop.
- Expertise in troubleshooting and resolving Data Pipeline, S3, and Redshift issues.
- Expertise in creating Ab Initio graphs to read and write to HDFS, generic graphs, EME, dependency analysis, Conduct>It, and continuous flows, utilizing Rollup, Join, Sort, Normalize, Scan, and Partition components to speed up the ETL process
- Worked on L1/L2/L3 Production Support Issues.
- Expertise in resolving ServiceNow tasks and incidents
- Created ad-hoc reports using SAP BO Web Intelligence to meet end users' requirements.
- Integrated the Business Intelligence reporting solution (Tableau) with various data sources such as RDBMS and Hadoop.
- Involved in effort estimation, design, development, review, implementation, and maintenance of Ab Initio graphs.
- Understood and contributed to projects' technical design along with the requirement specifications; involved in preparing HLD and LLD documents
- Worked on AutoSys and Control-M scheduling tools
- Played a crucial lead role in handling L3 production support issues, which require strong analytical skills and quick response.
- Sound skills in Structured Query Language (Oracle SQL). Experienced in all phases of the SDLC, including design, development, review, and maintenance.
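The Sqoop-based import pattern referenced above (relational databases into HDFS/Hive) can be sketched compactly. Below is a minimal PySpark sketch of that import (Spark's JDBC reader stands in for Sqoop here); the JDBC URL, schema, table, and credentials are hypothetical placeholders, not details from any project described in this resume.

```python
# Minimal sketch of an RDBMS-to-Hive import, the same pattern a Sqoop import
# covers; the JDBC URL, table, and credentials below are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("rdbms_to_hive_import")   # hypothetical job name
    .enableHiveSupport()               # allow writing to the Hive metastore
    .getOrCreate()
)

# Read the source table over JDBC (an Oracle-style URL is shown as an example).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  # hypothetical
    .option("dbtable", "SALES.ORDERS")                      # hypothetical
    .option("user", "etl_user")                             # hypothetical
    .option("password", "***")
    .option("fetchsize", "10000")
    .load()
)

# Land the data as a Hive table, the end state a Sqoop --hive-import produces.
orders.write.mode("overwrite").saveAsTable("staging.orders")
```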
PROFESSIONAL EXPERIENCE
Confidential, Rosemont, IL
Sr. AWS Data Engineer
Responsibilities:
- Data Solutions is an ongoing incremental data load from various data sources.
- Data Solutions feeds are transferred to Snowflake, MySQL, and S3 through an EMR-based ETL process
- Extensively worked on moving data from Snowflake to S3 for the TMCOMP/ESD feeds
- Extensively worked on moving data from Snowflake to S3 for the LMA/LMA Search feeds
- Troubleshot and resolved Data Pipeline-related issues.
- Used enqueue/init Lambda functions to trigger the EMR process (a minimal sketch follows this list)
- EMR is used for all SQL and Python script transformations and loads in the ETL process; Informatica is used to pick up source files from S3
- Worked on the CDMR migration project from Oracle to Snowflake
- Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Python on EMR
- Processed data from different sources into Snowflake and MySQL.
- Implemented Spark jobs using PySpark and Spark SQL for faster testing and processing of data.
- Wrote and loaded data into the data lake environment (Snowflake) from AWS EMR, which was accessed by business users and data scientists using Tableau/OBIEE
- Copied data from S3 to Snowflake and connected with SQL Workbench for seamless importing and movement of data via S3.
- Worked on DMS to process data into SQL Server
- Met with business/user groups to understand business processes and fixed high-priority production support issues
- The Data Solutions team was involved in and supported all production support activities
- Served as a Subject Matter Expert on assigned projects.
- POC for the LMA Search team's end-to-end production process
- Delivered quarterly MTTR analysis reports and weekly WSR reports
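A minimal sketch of the Lambda-to-EMR trigger pattern mentioned above, assuming boto3; the cluster ID, bucket, and script path are hypothetical placeholders, and the actual enqueue/init Lambda wiring is not reproduced here.

```python
# Minimal sketch of a Lambda handler that submits one Spark step to a running
# EMR cluster via boto3; the cluster ID and S3 script path are hypothetical.
import boto3

emr = boto3.client("emr")

def lambda_handler(event, context):
    """Invoked by the upstream trigger (e.g., an SQS or S3 event)."""
    response = emr.add_job_flow_steps(
        JobFlowId="j-XXXXXXXXXXXXX",  # hypothetical EMR cluster ID
        Steps=[
            {
                "Name": "incremental-etl-load",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "spark-submit",
                        "--deploy-mode", "cluster",
                        "s3://my-etl-bucket/scripts/transform_load.py",  # hypothetical
                    ],
                },
            }
        ],
    )
    return {"StepIds": response["StepIds"]}
```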
Environment: AWS, EC2, S3, IAM, SQS, SNS, Snowflake, EMR (Spark), RDS, JSON, MySQL Workbench, Informatica (ETL), Oracle, Red Hat Linux, Tableau.
Confidential
AWS Lead Data Engineer
Responsibilities:
- Rivet-Know Me is an ongoing incremental data load from various data sources.
- Rivet-Know Me feeds from three types of data sources (Hybris, Omniture, and Responsys) are transferred to AWS S3 through the ETL process
- Used DMS to load source data into Type 1 and Type 2 Redshift tables.
- Extensively worked on moving data from Snowflake to RDS and from Snowflake to S3 for the ESD and CLM projects
- Troubleshot and resolved Data Pipeline-related issues.
- Worked on L1/L2/L3 Production Support Issues.
- Expertise in resolving ServiceNow tasks and incidents
- Used Lambda to trigger file processing via Data Pipeline
- Data Pipeline is used for all SQL and Python script transformations and incremental loads into Redshift; Talend (ETL) is used to pick up files from S3 and deliver them to targets
- Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Python on EMR
- Processed data from different sources into AWS Redshift using EMR (Spark) and Python programming.
- Implemented Spark jobs using Python, Scala, and Spark SQL for faster testing and processing of data.
- Wrote and loaded data into the data lake environment (AWS Redshift) from AWS EMR and Data Pipeline
- AWS Redshift was accessed by business users and data scientists using Tableau
- Copied data from S3 to Redshift and connected with SQL Workbench for seamless importing and movement of data via S3 (a minimal COPY sketch follows this list).
- Worked on DMS to process data into Redshift for the Business/Analytics team
- Used SQL Workbench for Redshift for faster access to the data
- Worked in the AWS environment on the development and deployment of custom Hadoop applications.
- Met with business/user groups to understand business processes and fixed high-priority production support issues
- Led the KNOWME team and was involved in and supported all production support activities
- Served as a Subject Matter Expert on assigned projects.
- Worked on a QuickSight POC for omni data
- POC for the KNOWME/GMI team's end-to-end production process
- Delivered quarterly MTTR analysis reports and weekly WSR reports
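A minimal sketch of the S3-to-Redshift COPY referenced above, assuming a psycopg2 connection; the cluster endpoint, table, bucket, and IAM role are hypothetical placeholders.

```python
# Minimal sketch of loading S3 files into Redshift with a COPY command over a
# psycopg2 connection; every identifier below is a hypothetical placeholder.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com",  # hypothetical
    port=5439,
    dbname="analytics",
    user="etl_user",
    password="***",
)

copy_sql = """
    COPY knowme.web_events
    FROM 's3://my-feed-bucket/omniture/2020/01/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS JSON 'auto'
"""

# The connection context manager commits the transaction on success; Redshift
# pulls the files directly from S3, so no data flows through this client.
with conn, conn.cursor() as cur:
    cur.execute(copy_sql)
conn.close()
```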
Environment: AWS, EC2, S3, IAM, Data Pipeline, EMR (Spark), RDS, Redshift, Hadoop, Avro, JSON, SQL Workbench, Talend (ETL), Oracle, Red Hat Linux, Tableau.
Confidential, Lewisville, TX
Sr. Data Analyst
Responsibilities:
- As part of the CCB data ecosystem, Chase Data Services (CDS) is a suite of applications and a set of reusable engines and services deployed to enable automation, data processing, and Big Data components and platforms
- Served as a Subject Matter Expert on assigned projects.
- Handled data coming from different sources and was involved in HDFS maintenance and loading of structured and unstructured data.
- Imported data using Sqoop from RDBMS to HDFS on a regular basis
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Streamed data in real time using Spark with Kafka (a minimal sketch follows this list)
- Processed data from different sources into Hive targets using PySpark programming
- Wrote data into HBase and Hive targets from the Kafka consumer
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark code using Spark-SQL/Streaming for faster testing and processing of data.
- Leveraged big data to solve strategic, tactical, structured, and unstructured problems.
- Assisted in technical specifications and other deliverable documents.
- Supported Tableau dashboards sourced from different data sources such as RDBMS and Hadoop.
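A minimal PySpark sketch of the Spark-with-Kafka real-time streaming mentioned above, using Structured Streaming as one way to implement it; the broker addresses, topic, and HDFS paths are hypothetical placeholders, and the spark-sql-kafka package is assumed to be on the classpath.

```python
# Minimal sketch of real-time ingestion from Kafka with Spark Structured
# Streaming; brokers, topic, and output paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("kafka_to_hdfs_stream")  # hypothetical job name
    .getOrCreate()
)

# Subscribe to a Kafka topic; key/value arrive as bytes and are cast to strings.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # hypothetical
    .option("subscribe", "card-transactions")                        # hypothetical
    .option("startingOffsets", "latest")
    .load()
    .select(col("key").cast("string"), col("value").cast("string"))
)

# Continuously append raw events to HDFS as Parquet, checkpointing progress.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/raw/card_transactions")        # hypothetical
    .option("checkpointLocation", "hdfs:///checkpoints/card_tx")  # hypothetical
    .outputMode("append")
    .start()
)

query.awaitTermination()
```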
Environment: Hadoop, HDFS, Python, MapReduce, Sqoop, Informatica, Ab Initio, Hive, Oracle, Java, RDBMS, Kafka, Spark, DB2, Tableau
Confidential, NY
Sr. Technology Lead
Responsibilities:
- Developed the code for Importing and exporting data into HDFS and Hive using Sqoop.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using HQL.
- Involved in defining job flows using Oozie to schedule and manage Apache Hadoop jobs as directed acyclic graphs
- Developed Hive User-Defined Functions in Java, compiled them into JARs, added them to HDFS, and executed them from Hive queries.
- Experienced in managing and reviewing Hadoop log files. Tested and reported defects in an Agile Methodology perspective.
- Involved in installing Hadoop ecosystem components (Hive, Sqoop, HBase, Oozie) on top of the Hadoop cluster
- Imported data from SQL databases to HDFS and Hive for analytical purposes.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend (a minimal sketch follows this list).
- Worked on clean dependency analysis.
- Created Ab Initio graphs to read and write to HDFS, utilizing Rollup, Join, Sort, Replicate, and Partition components to speed up the ETL process.
- Understood and contributed to projects' technical design along with the requirement specifications.
- Testing the solution to validate project objectives.
- Managed end-to-end application delivery, owned quarterly application release development, and ensured smooth User Acceptance Testing and issue resolution
- Prepared and reviewed test plans, scenarios, and test cases at the development, IST, UAT, and production stages.
- Participated closely in all SDLC stages to create Ab Initio development work products conforming to the stated business requirements and high-level design documents.
- Performed appropriate unit-level testing of work products and managed the review process for Ab Initio deliverables.
- Tracked and reported on issues and risks, escalating as needed
- Expertly handled last-minute requests and stressful situations
- Developed test strategies based on design/architectural documents, requirements, specifications, and other documented sources
- Developed test cases and test scripts based on documented sources
- Participated closely in all SDLC stages using Agile methodology
- Organized events and conducted presentations, trainings, effective meetings, and project status reporting to senior management
- Coordinated with other team members to ensure that all work products integrate as a complete solution, and supported other team members to resolve issues or complete tasks sooner
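A minimal sketch of the create-table / load-data / query pattern described in the bullets above, assuming a PyHive connection to HiveServer2 (a library choice not stated in this resume); the host, database, table, and HDFS path are hypothetical placeholders, and the final SELECT is the kind of aggregation Hive compiles into MapReduce jobs.

```python
# Minimal sketch of creating a Hive table, loading data into it, and running an
# aggregation, assuming PyHive/HiveServer2; all identifiers are hypothetical.
from pyhive import hive

conn = hive.connect(host="hiveserver2.example.com", port=10000,
                    username="etl_user", database="analytics")  # hypothetical
cur = conn.cursor()

# Managed table over delimited text files.
cur.execute("""
    CREATE TABLE IF NOT EXISTS customer_txns (
        customer_id STRING,
        txn_amount  DOUBLE,
        txn_date    STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
""")

# Move files already landed in HDFS into the table's warehouse location.
cur.execute("LOAD DATA INPATH '/data/landing/customer_txns' INTO TABLE customer_txns")

# Aggregation query; Hive executes this as MapReduce jobs on the cluster.
cur.execute("""
    SELECT customer_id, SUM(txn_amount) AS total_spend
    FROM customer_txns
    GROUP BY customer_id
""")
for customer_id, total_spend in cur.fetchall():
    print(customer_id, total_spend)

cur.close()
conn.close()
```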
Environment: Hadoop, HDFS, Sqoop, Hive, RDBMS, Ab Initio (ETL), Oracle, Teradata, UNIX, AutoSys.