Big Data Engineer Resume

VA

SUMMARY

  • SDLC: 10 years of experience analyzing clients' business needs, developing effective and efficient solutions, and ensuring client deliverables are met within committed deadlines.
  • Teradata: 6 years of experience in Teradata development and performance tuning.
  • Informatica: 10 years of experience building and managing data warehouses with Informatica.
  • Databases: 10 years of experience using Teradata, Oracle, DB2, SQL Server, SQL, PL/SQL.
  • AWS: 2 years of experience using EC2, S3, EBS, EMR, Redshift, RDS, Athena, Kinesis, Glue, Lambda, Step Functions, VPC, CloudWatch, IAM and CloudFormation.

TECHNICAL SKILLS

Data Warehousing: Informatica PowerCenter, ETL, OLAP, OLTP, Tidal.

Dimensional Data Modeling: Dimensional Data Modeling, Star Schema Modeling, Snowflake Modeling, Fact and Dimension Tables, Physical and Logical Data Modeling, ERwin 4.5/4.0.

Teradata: Teradata, BTEQ, FastLoad, MultiLoad, FastExport

Programming: SQL, PL/SQL, Python, PySpark, Unix Shell Scripting.

Environment: HP-UX, IBM AIX, Windows XP, MS-DOS

Databases: Teradata, Oracle, IBM DB2 UDB, MS SQL Server, MS Access.

AWS: EC2, S3, EBS, EMR, Redshift, RDS, Athena, Kinesis, Glue, Lambda, Step Functions, VPC, CloudWatch, IAM, CloudFormation.

DevOps tools: Git, Jenkins, Bamboo, ArgoCD, Docker, OpenShift Container Platform, Terraform

PROFESSIONAL EXPERIENCE

Confidential, VA

Big Data Engineer

Responsibilities:

  • Designed, analyzed, architected, and tested various application models and integrated them based on different business rules for decision processing.
  • Supported post-release big data validation and worked with the project team and internal/external stakeholders to improve existing database applications.
  • Implemented ETL using AWS Data Pipeline and Glue.
  • Expertise in Spark Streaming, Spark SQL, and tuning and debugging Spark clusters on Mesos.
  • Devised procedures to source data from APIs into Excel models and automated data processing and transformation using Python (see the sourcing sketch after this list).
  • Built web scrapers in Python to streamline data collection from several sources in support of business needs.
  • Designed and developed ETL processes in AWS Glue using PySpark to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift (see the Glue sketch after this list).
  • Developed and executed a migration strategy to move Data Warehouse from an Oracle platform to AWS Redshift.
  • Used Jupyter to generate scripts in PySpark to automate the workflow.
  • Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, and advanced data processing.
  • Designed, built, and delivered reusable foundational frameworks for Enterprise Data Lake workloads, and drove adoption of the frameworks to enable the delivery teams.
  • Partnered with Data Lake Architects to identify and resolve functional gaps in the cloud data architecture and to define new cloud architecture for emerging Data Lake needs.
  • Utilized Informatica PowerCenter, Developer Studio, and PowerExchange for ETL solutions.
  • Extensively worked on BTEQ scripts to load large volumes of data into the EDW.
  • Performance tuning of Teradata, Oracle and SQL queries
  • Used utilities such as FastLoad and MultiLoad to insert data into Teradata tables.
  • Created Stored Procedures, Functions, Triggers, Packages and Macros
  • Performed Unit Testing and Code Deployment.
  • Created Unix shell scripts for filename validation and data validation (see the validation sketch after this list).
  • Studied the existing business application and reviewed and analyzed the project requirements.
  • Assigned project deliverables to onshore and offshore developers and reviewed their code once development was complete.
  • Used OpenShift Container Platform to orchestrate the deployment, scaling and management of Docker Containers.
  • Used Jenkins as a continuous integration tool to automate daily processes.
  • Implemented a Continuous Delivery framework by using Jenkins.
  • Experienced in Agile methodologies; used JIRA for sprint tracking.
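
A minimal sketch of a Glue job along the lines of the Redshift migration bullet above. The bucket paths, column names, and the redshift-conn catalog connection are hypothetical placeholders, not details from the project:

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read campaign files (Parquet here; ORC/text would swap the format string)
source = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/campaign/"]},  # hypothetical path
    format="parquet",
)

# Light cleanup: drop rows without a key, align a column name with the target
cleaned = source.filter(lambda row: row["campaign_id"] is not None)
cleaned = cleaned.rename_field("campaign_nm", "campaign_name")

# Land in Redshift through a Glue catalog connection (all names are placeholders)
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=cleaned,
    catalog_connection="redshift-conn",
    connection_options={"dbtable": "stg.campaign", "database": "dw"},
    redshift_tmp_dir="s3://example-bucket/tmp/",
)

job.commit()
```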
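
The API-sourcing and web-scraping bullets describe plain Python collection scripts. A small sketch, assuming a JSON API and an HTML table that both yield name/price records; every URL and field name here is invented for illustration:

```python
import csv
import requests
from bs4 import BeautifulSoup

API_URL = "https://api.example.com/v1/rates"     # hypothetical endpoint
PAGE_URL = "https://www.example.com/price-list"  # hypothetical page

# Structured source: a JSON API returning name/price records
rows = requests.get(API_URL, timeout=30).json()["results"]

# Unstructured source: scrape the same fields out of an HTML table
soup = BeautifulSoup(requests.get(PAGE_URL, timeout=30).text, "html.parser")
for tr in soup.select("table#prices tr")[1:]:  # [1:] skips the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) >= 2:
        rows.append({"name": cells[0], "price": cells[1]})

# Land the combined feed as CSV for the downstream Excel model
with open("rates.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(rows)
```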
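
The validation bullet names Unix shell scripts; rendered in Python to keep one language across these sketches, the same checks might look like the following, with the filename pattern and column count invented:

```python
import re
import sys
from pathlib import Path

# Expected feed name, e.g. CAMPAIGN_20240131.dat (pattern is hypothetical)
NAME_RE = re.compile(r"^CAMPAIGN_\d{8}\.dat$")
EXPECTED_COLS = 12  # hypothetical width of the pipe-delimited feed

def validate(path: Path) -> list:
    """Return a list of problems found in one file (empty list = clean)."""
    errors = []
    if not NAME_RE.match(path.name):
        errors.append(f"bad filename: {path.name}")
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if line.count("|") + 1 != EXPECTED_COLS:
            errors.append(f"{path.name}:{lineno}: expected {EXPECTED_COLS} columns")
    return errors

if __name__ == "__main__":
    problems = [e for arg in sys.argv[1:] for e in validate(Path(arg))]
    for p in problems:
        print(p, file=sys.stderr)
    sys.exit(1 if problems else 0)  # non-zero exit fails the calling job step
```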

Environment: AWS, Python, PySpark, Databricks, Informatica, Teradata, Oracle, SQL Server, DynamoDB, Unix, Tidal scheduling tool, Tableau, SSRS.

Confidential, FL

Informatica Lead / Architect

Responsibilities:

  • Defined development process methodology and best practices
  • Reviewed the code developed by developers
  • Participated in User meetings, gathering requirements and translating user inputs into Technical Specification documents
  • Designed and created technical specification documentation
  • Performance tuning of Teradata SQL queries
  • Used utilities such as FastLoad and MultiLoad to insert data into Teradata tables (a load sketch follows this list)
  • Developed unit test cases for different scenarios
  • Developed Informatica mappings and tuned them when necessary
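
The loads above used Teradata's FastLoad/MultiLoad utilities. As a rough Python illustration of the same insert step only, the teradatasql driver (an assumption, not a tool named in this resume) supports batched inserts; host, credentials, table, and file layout are all hypothetical:

```python
import csv
import teradatasql  # Teradata's Python DB-API driver (an assumption here)

with teradatasql.connect(host="tdprod", user="etl_user", password="...") as con:
    cur = con.cursor()
    with open("claims.csv", newline="") as f:
        batch = [(r["claim_id"], r["amount"]) for r in csv.DictReader(f)]
    # executemany sends the whole batch in one round trip
    cur.executemany(
        "INSERT INTO stg.claims (claim_id, amount) VALUES (?, ?)",
        batch,
    )
```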

Environment: Teradata, Informatica, Oracle, MS SQL Server, UNIX.

Confidential, OH

Senior Informatica/Teradata/BI Developer

Responsibilities:

  • Mentored offshore team on daily activities and deliverables
  • Reviewed the code developed by offshore team
  • Followed a work breakdown structure (WBS) for assigning tasks to the offshore team
  • Reviewed and translated BRD/BSD into technical specifications design (TSD)
  • Developed technical specifications design (TSD) for Claims tracking
  • Coordinated with SME for technical clarifications
  • Extensively worked on BTEQ scripts to load large volumes of data into the EDW
  • Loaded data into some of the X-ref tables
  • Loaded data into Landing Zone (LZ) Teradata tables, applied transformations, and then loaded the data into the conformed staging area (CSA)
  • Participated in User meetings, gathering requirements and translating user inputs into Technical Specification documents
  • Performance Tuned Informatica Mappings and Teradata Components
  • Used utilities such as FastLoad and MultiLoad to insert data into Teradata tables
  • Designed and developed reports using Cognos BI tool
  • Developed scripts to parameterize the date values for the incremental extracts (see the sketch after this list)
  • Extensively worked on Informatica 8.6.1 to extract the data and load it into the LZ
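
A minimal sketch of the date-parameterization script mentioned above, assuming the high-water mark lives in a control file and the output is a PowerCenter parameter file; the file names and parameter names are invented:

```python
from datetime import date, timedelta
from pathlib import Path

CTRL = Path("last_extract_date.txt")  # hypothetical high-water-mark file

# Extract window: the day after the last run, through yesterday
start = date.fromisoformat(CTRL.read_text().strip()) + timedelta(days=1)
end = date.today() - timedelta(days=1)

# Emit a parameter file for the Informatica session to pick up
Path("incr_extract.prm").write_text(
    f"[Global]\n$$START_DT={start.isoformat()}\n$$END_DT={end.isoformat()}\n"
)
CTRL.write_text(end.isoformat())  # advance the high-water mark
```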

Environment: Informatica PowerCenter 8.6, Teradata, ERwin, BTEQ, Enterprise Architect, Metadata Manager, ER/Studio, Cognos, Oracle 10g, Windows NT/2000, HP-UX, WLM.

Confidential, CT

Senior Informatica Developer

Responsibilities:

  • Involved in the analysis, design, implementation, testing of applications using Informatica
  • Developed Stored Procedures and Functions when necessary (a calling sketch follows this list)
  • Used the TOAD developer tool for testing and scripting
  • Scheduled Informatica jobs maintaining dependency between steps
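
The stored procedures above were exercised through TOAD; purely as an illustration, a procedure can also be called from Python with the cx_Oracle driver. The driver choice, connection details, and the pkg_claims.apply_adjustments procedure are all assumptions:

```python
from datetime import date
import cx_Oracle  # Oracle's Python driver (an assumption; the resume used TOAD)

# Credentials and DSN are placeholders
con = cx_Oracle.connect("scott", "tiger", "dbhost/orcl")
cur = con.cursor()

# OUT parameter reports how many rows the procedure touched
rows_updated = cur.var(int)
cur.callproc("pkg_claims.apply_adjustments", [date(2024, 1, 31), rows_updated])
print("rows updated:", rows_updated.getvalue())

con.commit()
con.close()
```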

Environment: Informatica PowerCenter 8.1/7.x/6.x, Oracle 9i/10g, TOAD.
