Big Data Engineer Resume
VA
SUMMARY
- SDLC: 10 years of experience analyzing clients' business needs, developing effective and efficient solutions, and ensuring client deliverables are met within committed deadlines.
- Teradata: 6 years of experience in Teradata development and performance tuning.
- Informatica: 10 years of experience building and managing data warehouses with Informatica.
- Databases: 10 years of experience using Teradata, Oracle, DB2, SQL Server, SQL, PL/SQL.
- AWS: 2 years of experience using EC2, S3, EBS, EMR, Redshift, RDS, Athena, Kinesis, Glue, Lambda, Step Functions, VPC, CloudWatch, IAM and CloudFormation.
TECHNICAL SKILLS
Data Warehousing: Informatica Power Center, ETL, OLAP, OLTP, Tidal.
Dimensional Data Modeling: Dimensional Data Modeling, Star Schema Modeling, Snowflake Schema Modeling, Fact and Dimension Tables, Physical and Logical Data Modeling, ERwin 4.5/4.0.
Teradata: Teradata, BTEQ, FastLoad, MultiLoad, FastExport
Programming: SQL, PL/SQL, Python, PySpark, Unix Shell Scripting.
Environment: HP-UX, IBM AIX, Windows XP, MS-DOS.
Databases: Teradata, Oracle, IBM DB2 UDB, MS SQL Server, MS Access.
AWS: EC2, S3, EBS, EMR, Redshift, RDS, Athena, Kinesis, Glue, Lambda, Step Functions, VPC, CloudWatch, IAM, CloudFormation.
DevOps tools: Git, Jenkins, Bamboo, ArgoCD, Docker, OpenShift Container Platform, Terraform
PROFESSIONAL EXPERIENCE
Confidential, VA
Big Data Engineer
Responsibilities:
- Designed, analyzed, architected and tested various application models and integrated them based on different business rules for decision processing.
- Supported post-release big data validation and worked with the project team and internal/external stakeholders to improve existing database applications.
- Implemented ETL using AWS Data Pipeline and Glue.
- Worked with Spark Streaming and Spark SQL; tuned and debugged Spark clusters running on Mesos.
- Devised procedures to source data from APIs into Excel models and automated data processing and transformation using Python.
- Built web scrapers in Python to streamline data collection from several sources in support of business needs (a minimal scraper sketch follows this role).
- Designed and developed ETL processes in AWS Glue using PySpark to migrate campaign data from external sources such as ORC, Parquet and text files in S3 into AWS Redshift (a Glue job sketch follows this role).
- Developed and executed a migration strategy to move the data warehouse from an Oracle platform to AWS Redshift.
- Used Jupyter notebooks to develop PySpark scripts that automate the workflow.
- Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining and advanced data processing.
- Designed, built and delivered reusable foundational frameworks for Enterprise Data Lake workloads, and drove adoption of the frameworks to enable delivery teams.
- Partnered with Data Lake architects to identify and resolve functional gaps in the cloud data architecture and defined new cloud architecture to meet emerging Data Lake needs.
- Utilized Informatica PowerCenter, Developer Studio and PowerExchange for ETL solutions.
- Extensively worked on BTEQ scripts to load large volumes of data into the EDW.
- Performance tuning of Teradata, Oracle and SQL queries
- Used utilities such as FastLoad and MultiLoad to insert data into Teradata tables.
- Created Stored Procedures, Functions, Triggers, Packages and Macros
- Performed Unit Testing and Code Deployment.
- Created Unix shell scripts for filename validation and data validation.
- Understood the existing business application, reviewed and analyzed the project requirements
- Assigned project deliverables to onshore and offshore developers and reviewed their code after development was complete.
- Used OpenShift Container Platform to orchestrate the deployment, scaling and management of Docker Containers.
- Used Jenkins as a continuous integration tool to automate daily processes.
- Implemented a Continuous Delivery framework by using Jenkins.
- Worked in Agile methodologies and used JIRA for sprint tracking.
Environment: AWS, Python, PySpark, Databricks, Informatica, Teradata, Oracle, SQL Server, DynamoDB, Unix, Tidal scheduling tool, Tableau, SSRS.
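
A minimal sketch of the Glue job pattern referenced above, reading campaign files from S3 and loading them into Redshift with PySpark. The job parameter names, the campaign_dw database and the catalog connection are illustrative placeholders, not the project's actual objects:

    import sys
    from pyspark.context import SparkContext
    from awsglue.transforms import DropNullFields
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job

    # Job parameters (placeholders) passed in from the Glue job definition
    args = getResolvedOptions(
        sys.argv,
        ["JOB_NAME", "S3_SOURCE_PATH", "REDSHIFT_CONNECTION", "TARGET_TABLE", "TEMP_DIR"],
    )

    sc = SparkContext()
    glue_context = GlueContext(sc)
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the campaign files from S3 (Parquet shown; ORC/text would use format="orc"/"csv")
    source = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [args["S3_SOURCE_PATH"]]},
        format="parquet",
    )

    # Light cleanup before loading; the real transformations were project-specific
    cleaned = DropNullFields.apply(frame=source)

    # Write to Redshift through a catalogued JDBC connection, staging through S3
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=cleaned,
        catalog_connection=args["REDSHIFT_CONNECTION"],
        connection_options={"dbtable": args["TARGET_TABLE"], "database": "campaign_dw"},
        redshift_tmp_dir=args["TEMP_DIR"],
    )

    job.commit()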
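
A hedged sketch of the kind of Python scraper used for data collection; the URL, CSS selector and output file are hypothetical examples, not the actual business sources:

    import csv
    import requests
    from bs4 import BeautifulSoup

    def scrape_rows(url):
        # Fetch a page and yield the table rows of interest; selectors are illustrative
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        for row in soup.select("table.results tr")[1:]:  # skip the header row
            cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
            if cells:
                yield cells

    def main():
        # example.com and the output path stand in for the real sources
        with open("scraped_data.csv", "w", newline="") as handle:
            writer = csv.writer(handle)
            for record in scrape_rows("https://example.com/report"):
                writer.writerow(record)

    if __name__ == "__main__":
        main()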
Confidential, FL
Informatica Lead / Architect
Responsibilities:
- Defined development process methodology and best practices
- Reviewed the code developed by developers
- Participated in User meetings, gathering requirements and translating user inputs into Technical Specification documents
- Designed and created technical specification documentation
- Performance tuning of Teradata SQL queries
- Used utilities such as FastLoad and MultiLoad to insert data into Teradata tables.
- Developed unit test cases for different scenarios
- Developed Informatica mappings and tuned them when necessary.
Environment: Teradata, Informatica, Oracle, MS SQL Server, UNIX.
Confidential, OH
Senior Informatica/Teradata/BI Developer
Responsibilities:
- Mentored offshore team on daily activities and deliverables
- Reviewed the code developed by offshore team
- Followed work break down structure (WBS) for assigning tasks to offshore team
- Reviewed and translated BRD/BSD into technical specifications design (TSD)
- Developed technical specifications design (TSD) for Claims tracking
- Coordinated with SME for technical clarifications
- Extensively worked on BTEQ scripts to load large volumes of data into the EDW.
- Loaded data into some of the X-ref tables
- Loaded data into Landing Zone (LZ) Teradata tables, applied transformations and then loaded the data into the conformed staging area (CSA).
- Participated in User meetings, gathering requirements and translating user inputs into Technical Specification documents
- Performance Tuned Informatica Mappings and Teradata Components
- Used utilities such as FastLoad and MultiLoad to insert data into Teradata tables.
- Designed and developed reports using Cognos BI tool
- Developed scripts to parameterize the date values for the incremental extracts (a Python wrapper sketch follows this role).
- Extensively worked on Informatica 8.6.1 to extract the data and load it into the LZ.
Environment: Informatica Power Center 8.6, Teradata, Erwin, BTEQ, Enterprise Architect, Metadata Manager, ER/Studio, Cognos, Oracle 10g, Windows NT/2000, HP-UX, WLM.
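
A hedged Python sketch of how the date-parameterized incremental extracts could be driven through BTEQ; the logon details, the claims table and the file names are placeholders, not the project's actual objects:

    import subprocess
    import sys
    from datetime import date, timedelta
    from string import Template

    # Hypothetical BTEQ script template; the real extract SQL and logon handling differed
    BTEQ_TEMPLATE = Template("""
    .LOGON $tdpid/$user,$password
    .EXPORT FILE = $output_file
    SELECT *
    FROM edw.claims_fact
    WHERE load_date BETWEEN DATE '$start_date' AND DATE '$end_date';
    .EXPORT RESET
    .LOGOFF
    .QUIT
    """)

    def run_incremental_extract(start_date, end_date):
        script = BTEQ_TEMPLATE.substitute(
            tdpid="tdprod", user="etl_user", password="********",
            output_file=f"claims_{end_date}.dat",
            start_date=start_date, end_date=end_date,
        )
        # Pipe the generated script into the bteq command-line client
        result = subprocess.run(["bteq"], input=script, text=True, capture_output=True)
        if result.returncode != 0:
            sys.exit(f"BTEQ extract failed:\n{result.stderr}")

    if __name__ == "__main__":
        run_date = date.today()
        run_incremental_extract(
            start_date=(run_date - timedelta(days=1)).isoformat(),
            end_date=run_date.isoformat(),
        )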
Confidential, CT
Senior Informatica Developer
Responsibilities:
- Involved in the analysis, design, implementation, testing of applications using Informatica
- Developed Stored Procedures and Functions when necessary
- Used the TOAD developer tool for testing and scripting.
- Scheduled Informatica jobs maintaining dependency between steps
Environment: Informatica Power Center 8.1/7.x/6.x, Oracle 9i/10g, TOAD.