
Data Engineer Resume


Irving, Texas

SUMMARY

  • 7+ years of IT experience across all phases of software development under Agile methodology, including user interaction, business analysis/modeling, design and development, integration, planning, testing, migration, and documentation for applications built on ETL pipelines and distributed systems.
  • 2+ years of strong expertise with Hadoop and big data technologies: Apache Spark, Scala, Python, Kafka, Cassandra, Jenkins pipelines, Kubernetes, Kibana, Rancher, GitHub, Hadoop HDFS, Hive, IntelliJ, and SQL Server.
  • Designed and developed Spark jobs with DataFrames using different file formats such as Text, SequenceFile, XML, Parquet, and Avro (see the PySpark sketch after this list).
  • Used Python with OpenStack, OpenERP (now Odoo), SQLAlchemy, and Django CMS.
  • Excellent experience with the Requests, NumPy, Matplotlib, SciPy, PySpark, and pandas Python libraries across the development lifecycle, and experience developing application APIs using Python, Django, MongoDB, Express, ReactJS, and NodeJS.
  • Proficient with AWS services such as VPC, Glue pipelines, Glue Crawler, CloudFront, EC2, ECS, EKS, Elastic Beanstalk, Lambda, S3, Storage Gateway, RDS, DynamoDB, Redshift, ElastiCache, DMS, SMS, Data Pipeline, IAM, WAF, Artifact, API Gateway, SNS, SQS, SES, Auto Scaling, CloudFormation, CloudWatch, and CloudTrail.
  • Knowledge of setting up Python REST API frameworks using Django.
  • Worked with Google Cloud Platform (GCP) services such as BigQuery, Compute Engine, Cloud Functions, Cloud DNS, Cloud Storage, and Cloud Deployment Manager, and applied the SaaS, PaaS, and IaaS concepts of cloud computing in GCP implementations.
  • Worked with various Informatica transformations such as Normalizer, Expression, Rank, Filter, Group, Aggregator, Lookup, Joiner, Sequence Generator, Sorter, SQL, Stored Procedure, Update Strategy, Source Qualifier, Transaction Control, Java, Union, and CDC.
  • Designed star schemas in GCP BigQuery.
  • Used REST APIs and Python to ingest data from other sites into BigQuery.
  • Strong experience with Spark real-time streaming data using Kafka and Spring Boot APIs.
  • Loaded data into Oracle tables using SQL*Loader.
  • Very good data modeling knowledge, including dimensional data modeling, star schema, snowflake schema, and fact and dimension tables.
  • Validated data against files and performed technical data quality checks to certify sources and targets for business usage.
  • Worked on optimizing volumes and EC2 instances, created multiple VPC instances, deployed applications on AWS using Elastic Beanstalk, and implemented and set up Route 53 for AWS web instances.
  • Coordinated with business users, the functional design team, and the testing team during the different phases of project development and resolved issues.
  • Worked with different databases (Oracle, SQL Server, Teradata, Cassandra) and SQL programming.
  • 4+ years of strong expertise using the ETL tool Informatica PowerCenter 10.x/9.x/8.x (Designer, Workflow Manager, Repository Manager) for ETL and data warehousing.
  • Experienced in using advanced Informatica concepts such as pushdown optimization (PDO).
  • Handled operations and maintenance support for AWS cloud resources, including launching, maintaining, and troubleshooting EC2 instances, S3 buckets, Auto Scaling, DynamoDB, AWS IAM, Elastic Load Balancers (ELB), and Relational Database Service (RDS); also created snapshots of data to store in AWS S3.
  • Experience writing UNIX shell scripts for data warehouse jobs, file operations, and data analytics.
  • Extensive experience with data extraction, transformation, and loading (ETL) across disparate sources and targets, including multiple relational databases and cloud platforms (Apache Spark, Scala, HDFS, Hive, Cassandra, Teradata, Oracle, SQL Server, DB2, Salesforce), XML, and various file structures.
  • Excellent domain knowledge of healthcare, banking and financial services, manufacturing, entertainment, and insurance; experienced in manipulating and analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Experience in Performance Tuning and Debugging of existing ETL processes.
  • Very good at defining standards and methodologies and performing technical design reviews.
  • Excellent communication skills, interpersonal skills, self-motivated, quick learner, team player.
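
A minimal PySpark sketch of the kind of DataFrame job described above, reading a few of the listed file formats and persisting the result as Parquet; all paths and the event_id column are hypothetical, and Avro output would additionally require the external spark-avro package.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("file-format-demo").getOrCreate()

# Read a few of the formats mentioned above; the paths are placeholders.
text_df = spark.read.text("data/raw/events.txt")         # single string column named "value"
json_df = spark.read.json("data/raw/events.json")        # schema inferred from the JSON records
parquet_df = spark.read.parquet("data/curated/events.parquet")

print(f"text rows: {text_df.count()}, json rows: {json_df.count()}")

# Simple cleanup step before persisting: drop rows with a null event_id.
cleaned = parquet_df.filter(parquet_df["event_id"].isNotNull())

# Write the result back out as Parquet.
cleaned.write.mode("overwrite").parquet("data/output/events_clean.parquet")
```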

TECHNICAL SKILLS

Database: SQL (Oracle, SQL Server), NoSQL (Cassandra, DynamoDB)

Operating Systems: Windows, UNIX, LINUX.

Programming Languages: SQL, NoSQL, Python, Java (basic), Scala, PySpark, batch scripts

Cloud: AWS, GCP

CI/CD: Jenkins

PROFESSIONAL EXPERIENCE

Confidential, Irving, Texas

Data Engineer

Responsibilities:

  • Participated in requirement grooming meetings, which involved understanding functional requirements from a business perspective and providing estimates to convert those requirements into software solutions; designed, developed, and delivered code to IT/UAT/PROD and validated and managed data pipelines from multiple applications in a fast-paced Agile environment using sprints with the JIRA management tool.
  • Responsible for checking data in DynamoDB tables and verifying that EC2 instances are up and running for DEV, QA, CERT, and PROD in AWS.
  • Developed multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, and CSV.
  • Analyzed existing data flows and created high-level/low-level technical design documents for business stakeholders to confirm that the technical design aligns with business requirements.
  • Extensively worked on data modeling, including dimensional data modeling, star/snowflake schemas, fact and dimension tables, and physical and logical data modeling.
  • Created and deployed Spark jobs in different environments and loaded data to NoSQL databases (Cassandra), Hive, and HDFS; secured the data by implementing encryption-based authentication/authorization.
  • Worked with Google Cloud Platform (GCP) services such as BigQuery, Compute Engine, Cloud Functions, Cloud DNS, Cloud Storage, and Cloud Deployment Manager, and applied the SaaS, PaaS, and IaaS concepts of cloud computing in GCP implementations.
  • Designed the front end and back end of the application using Python on the Django web framework; developed customer-facing features and applications using Python and Django with test-driven development and pair programming.
  • Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, Glue pipelines, Glue Crawler, and Auto Scaling groups; optimized volumes and EC2 instances and created monitors, alarms, and notifications for EC2 hosts using CloudWatch.
  • Involved in designing snowflake schemas for the data warehouse and ODS architecture using data modeling tools such as Erwin.
  • Developed data models and data migration strategies utilizing concepts of snowflake schema.
  • Developed code using Apache Spark with Scala, IntelliJ, NoSQL databases (Cassandra), Jenkins, Docker pipelines, GitHub, Kubernetes, the HDFS file system, Hive, Kafka for real-time streaming data, and Kibana for log monitoring (see the streaming sketch after this list); responsible for deployments to DEV, QA, PRE-PROD (CERT), and PROD using AWS.
  • Scheduled Informatica Jobs through Autosys scheduling tool.
  • Created quick filters and customized calculations with SOQL for SFDC queries; used Data Loader for ad hoc data loads into Salesforce.
  • Extensively worked on Informatica power center Mappings, Mapping Parameters, Workflows, Variables and Session Parameters.
  • Responsible for facilitating data load pipelines and benchmarking the developed product against the set performance standards.
  • Used Debugger within the Mapping Designer to test the data flow between source and target and to troubleshoot the invalid mappings.
  • Worked on SQL tools like TOAD and SQL Developer to run SQL Queries and validate the data.
  • Studied the existing system and conducted reviews to provide a unified review of jobs.
  • Involved in Onsite & Offshore coordination to ensure the deliverables.
  • Involved in testing the database using complex SQL scripts and handling performance issues effectively.
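
A minimal Spark Structured Streaming sketch of the Kafka-based real-time ingestion described above; the broker address and topic name are placeholders, the spark-sql-kafka connector is assumed to be on the classpath, and a production job would write to Cassandra, Hive, or HDFS rather than the console.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream-demo").getOrCreate()

# Subscribe to a hypothetical Kafka topic; the broker address is a placeholder.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "order-events")
       .load())

# Kafka delivers key/value as binary; cast the payload to a string for downstream parsing.
parsed = raw.select(col("value").cast("string").alias("payload"))

# Write the stream to the console for illustration only.
query = (parsed.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination()
```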

Environment: Apache Spark 2.4.5, Scala 2.1.1, Cassandra, HDFS, Hive, GitHub, Jenkins, Kafka, Informatica PowerCenter 10.x, SQL Server 2008, Salesforce Cloud, Visio, TOAD, PuTTY, Autosys Scheduler, UNIX, AWS, Snowflake, GCP, CSV, WinSCP, Salesforce Data Loader, SFDC Developer Console, Python, VersionOne, ServiceNow.

Confidential, Los Angeles

Data Engineer

Responsibilities:

  • Worked on multiple modules, including HCM global integration with different regions and the ONECRM Salesforce Cloud.
  • Involved in gathering and analyzing requirements and preparing business requirements.
  • Analyze and develop Data Integration templates to extract, cleanse, transform, integrate and load to data marts for user consumption. Review the code against standards and checklists.
  • Served in a DevOps role converting existing AWS infrastructure to a serverless architecture (AWS Lambda, Kinesis) deployed via CloudFormation (see the Lambda sketch after this list).
  • Create High & Low level design documents for the various modules. Review the design to ensure adherence to standards, templates and corporate guidelines. Validate design specifications against the results from proof of concept and technical considerations.
  • Coordinated with the application support team and helped them understand the business and the components necessary for integration, extraction, transformation, and loading of data.
  • Used Python with OpenStack, OpenERP (now Odoo), SQLAlchemy, Django CMS, and related tools.
  • Worked with Google Cloud Platform (GCP) services such as BigQuery, Compute Engine, Cloud Functions, Cloud DNS, Cloud Storage, and Cloud Deployment Manager, and applied the SaaS, PaaS, and IaaS concepts of cloud computing in GCP implementations.
  • Performed analysis on the existing source systems, understood the Informatica/ETL/SQL/UNIX-based applications, and provided the services required for development and maintenance of the applications.
  • Deployed the application using Docker and AWS Console services.
  • Create a Deployment document for the developed code and provide support during the code migration phase.
  • Adept at working quickly and efficiently in close collaboration with stakeholders, including creating TDS (technical data sheets).
  • Created the initial unit test plan to demonstrate that the software, scripts, and databases developed conform to the design document.
  • Provided support during the integration testing and user acceptance phases of the project, as well as hypercare support post-deployment.
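
A minimal sketch of the serverless (Lambda + Kinesis) pattern referenced above: a Python Lambda handler that decodes a batch of Kinesis records; the customer_id field and the returned summary are hypothetical placeholders, not the project's actual logic.

```python
import base64
import json

def handler(event, context):
    """Process a batch of Kinesis records delivered to the Lambda function."""
    processed = 0
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)
        # Placeholder transformation: count records carrying a customer_id field.
        if message.get("customer_id"):
            processed += 1
    # Return a small summary; a real handler might write results to S3 or DynamoDB instead.
    return {"processed": processed}
```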

Environment: Informatica 10.1.1, GCP, SQL Server, AWS, UNIX, flat files, Autosys, web services, HCM Oracle Fusion, SoapUI, Salesforce Cloud, Python, Oracle MDM, ESB.

Confidential, San Francisco

Data Engineer

Responsibilities:

  • Create High & Low level design documents for the various modules. Review the design to ensure adherence to standards, templates and corporate guidelines. Validate design specifications against the results from proof of concept and technical considerations.
  • Worked on implementing pipelines and analytical workloads using big data technologies such as Hadoop, Spark, Hive, and HDFS (see the Hive-on-Spark sketch after this list).
  • Experienced in designing and deployment of Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, Oozie, Sqoop, Kafka, Spark, Impala.
  • Performed analysis on the existing source systems, understood the Informatica/Teradata-based applications, and provided the services required for development and maintenance of the applications.
  • Worked with Google Cloud Platform (GCP) services such as Compute Engine, Cloud Functions, Cloud DNS, Cloud Storage, and Cloud Deployment Manager, and applied the SaaS, PaaS, and IaaS concepts of cloud computing in GCP implementations.
  • Coordinated with the application support team and helped them understand the business and the components necessary for integration, extraction, transformation, and loading of data.
  • Analyze and develop Data Integration templates to extract, cleanse, transform, integrate and load to data marts for user consumption. Review the code against standards and checklists.
  • Create a Deployment document for the developed code and provide support during the code migration phase.
  • Created the initial unit test plan to demonstrate that the software, scripts, and databases developed conform to the design document.
  • Provided support during the integration testing and user acceptance phases of the project, as well as hypercare support post-deployment.
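
A minimal sketch of the Hadoop/Hive/Spark analytical workloads described above: querying a Hive-managed table from Spark and persisting an aggregate to HDFS; the database, table, columns, and output path are hypothetical.

```python
from pyspark.sql import SparkSession

# Hive support lets Spark read tables registered in the Hive metastore.
spark = (SparkSession.builder
         .appName("hive-query-demo")
         .enableHiveSupport()
         .getOrCreate())

# Aggregate daily record counts from a hypothetical Hive table.
daily_counts = spark.sql("""
    SELECT event_date, COUNT(*) AS event_count
    FROM analytics.raw_events
    GROUP BY event_date
""")

# Persist the aggregate to HDFS as Parquet for downstream marts.
daily_counts.write.mode("overwrite").parquet("/data/marts/daily_event_counts")
```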

Environment: Informatica PowerCenter 9.6.1, PowerExchange, Teradata database and utilities, Oracle, GCP, Python, Business Objects, Tableau, flat files, UC4, big data, HDFS, Maestro scheduler, UNIX.

Confidential

ETL Developer

Responsibilities:

  • Built new ETL designs to load data marts.
  • Involved in high-level requirements gathering and estimation, data modeling, data design, logistics and environment setup for the project, and code reviews based on customized target checklists.
  • Led offshore developers and helped them understand the business and the components necessary for data integration, extraction, transformation, and loading.
  • Developed ETL designs and used the ETL tool to extract data from DB2, Oracle, and XML sources (see the extraction sketch after this list).
  • Implemented the extract, transform, and load process with Informatica PowerCenter and PowerExchange, mainframes, and shell scripts that populate the database tables used for generating reports with Business Objects.
  • Expertise in Build, Unit testing, System testing and User acceptance testing.
  • Designed and documented report-processing logic and standardized the process of report interaction for non-technical business users.
  • Coordinated the offshore team on deliverables and made sure deliverables were not impacted.
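
The extraction itself was done in Informatica, which is GUI-driven, but a comparable Oracle-to-flat-file pull can be sketched in Python with the python-oracledb driver; the connection details, table, and columns below are hypothetical.

```python
import csv
import oracledb  # python-oracledb driver for Oracle Database

# Placeholder credentials and DSN; a real job would read these from a secure config.
connection = oracledb.connect(user="etl_user", password="secret",
                              dsn="dbhost.example.com/ORCLPDB1")

with connection.cursor() as cursor, open("customers_extract.csv", "w", newline="") as out:
    cursor.execute("SELECT customer_id, customer_name, created_dt FROM customers")
    writer = csv.writer(out)
    # Header row taken from the cursor metadata.
    writer.writerow([col[0] for col in cursor.description])
    for row in cursor:
        writer.writerow(row)

connection.close()
```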

Environment: Informatica PowerCenter 9.6.1, PowerExchange, Teradata database and utilities, Visio, Oracle, flat files, Autosys scheduler, UNIX.

Confidential

ETL Developer

Responsibilities:

  • Involved in the SDLC using Informatica PowerCenter; the DMExpress ETL tool was implemented with the help of Teradata load utilities.
  • Worked proficiently on different database versions across all platforms (Windows, Unix)
  • Involved in data scaling and data dicing.
  • Proficiency in Data Analysis, handling complex query building and performance tuning.
  • Developed data archiving, data loading, and performance test suites using ETL tools such as PowerCenter and DMExpress, along with Teradata, UNIX, and SSIS.
  • Expertise in extracting and analyzing data from existing data stores using PowerCenter and DMExpress tools and performing ad hoc queries against warehouse environments such as Teradata.
  • Extensively worked on ad hoc requests using different ETL tools to load data.

Environment: Informatica PowerCenter, Teradata 13.11, DB2, SQL Server, flat files, DMExpress, SSIS, SSRS, UNIX shell scripting, MicroStrategy.
