Data Engineer Resume

Charlotte, NC

PROFESSIONAL SUMMARY

  • Experienced Data Engineer with 8+ years of experience across various industries. Worked with Big Data technologies, Spark, AWS EMR, ETL tools like Talend and Informatica, PL/SQL, shell scripting, Autosys, Oracle Database, Netezza, Redshift and Postgres.
  • Data Engineer having end to end experience in developing data pipelines in Hadoop ecosystem and AWS.
  • Experience in developing data pipelines using AWS services including EC2, S3, Redshift, Glue, Lambda functions, Step functions, CloudWatch, SNS, DynamoDB, SQS.
  • Experience in creating data loading patterns to AWS S3 and Snowflake when migrating data from on-prem data warehouses.
  • Experience in developing Python ETL jobs run on AWS services and integrating with enterprise systems like Enterprise logging and alerting, enterprise configuration management and enterprise build and versioning infrastructure.
  • Experience in using Terraform for building AWS infrastructure services like EC2, Lambda and S3.
  • Strong understanding in cloud security and experience in working with IAM roles, policies and IAM trust policies.
  • Experience with Microsoft Azure cloud for implementation and migration of solutions
  • Experience in integrating applications with enterprise CI/CD pipelines.
  • Experience working with Snowflake, a cloud-based data warehouse.
  • Experience in analysing data using the Hadoop ecosystem including HDFS, Hive, Spark, Spark Streaming, MLlib, NiFi, Elasticsearch, Kibana, Kafka, HBase, ZooKeeper, Pig, Sqoop and Flume.
  • Experience in importing and exporting data using spark connectors for NoSQL databases like Cassandra and HBase.
  • Experience in developing Spark Streaming applications: ingesting data using NiFi, writing the stream data into Kafka and analysing the data through Spark (conducted ETL processes and connected to different SQL and NoSQL databases).
  • Experience in tuning Spark/Hadoop jobs for performance and parallel processing (maximize allocation of resources vs dynamic allocation of resources).
  • Experience with event-driven and scheduled AWS Custom Lambda (Python) functions to trigger various AWS resources.
  • Excellent understanding of Hadoop architecture and its various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and the MapReduce programming paradigm.
  • Experience in managing scalable Hadoop clusters including cluster design, provisioning, custom configuration, monitoring and maintenance using different Hadoop distributions: Cloudera CDH, Hortonworks HDP and Apache Hadoop.
  • Experience in analyzing data using HiveQL, Pig Latin, HBase and custom MapReduce programs in Python.
  • Experience working with various Python IDEs including IDLE, PyCharm, PyDev, Spyder, PyScripter and PyStudio.
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Experience in managing and reviewing Hadoop log files.
  • Hands-on experience in application development using Oracle (SQL, PL/SQL), Python and shell scripting.
  • Knowledge of various NoSQL storage technologies (Key-Value, Column-Family, Document)
  • Experience in managing the lifecycle of MongoDB databases including database sizing, deployment automation, monitoring and tuning.
  • Good experience integrating MongoDB with Hadoop.
  • Experience performing database backups, regularly testing recoverability and monitoring the overall performance of MongoDB.

TECHNICAL SKILLS:

ETL Technologies: Informatica, SSIS, DataStage, Data net

BI Technologies: OBIEE, MicroStrategy, Tableau, Cognos, QuickSight, SSAS, SSRS

Big data: Hadoop, MapReduce, Spark, HBase, Hive, Sqoop, Presto, Kafka, Airflow, Tez, Storm, ZooKeeper

RDBMS: Redshift, Oracle, SQL Server, Teradata, Netezza, etc

Scripting Languages: Python, Java, Shell

NoSQL: DynamoDB, Cassandra, MongoDB, HBase

AWS Cloud: S3, EMR, Redshift, EC2, Glue, Kinesis (streaming), SQS/SNS

PROFESSIONAL EXPERIENCE

Confidential, Charlotte, NC

Data Engineer

Responsibilities:

  • Designing, architecting and testing various application models and integrating them based on different business rules for decision processing.
  • Upgrading and automating database scripts in Hadoop PySpark to increase effectiveness and productivity.
  • Providing post release data validation and working with Project team, internal/external stakeholder to improve existing database applications in Snowflake
  • Research opportunities for data acquisition and new uses for existing data
  • Working with requirements and various Data Analysts to re-engineer applications and translate it into functional and/or technical specifications.
  • Extracting and Analyzing data from various sources. Data wrangling and cleanup using Python-pandas
  • Wrote Python modules to view and connect the Apache Cassandra instance
  • Helped with the migration from the old server to the Jira database, using Python scripts to transfer and verify the information
  • Migrated existing SQL Server workloads to Microsoft Azure Virtual Machines for scalability
  • Used visualization tools such as Tableau and Power BI for creating reports and dashboards
  • Create / update various databases and scripts in Databricks to support teams in their daily activities
  • Integrate new data management technologies and software engineering tools into existing structures.
  • Coordinate and guide support teams for daily activities in Databricks
  • Using Git to update existing versions of the Hadoop PySpark script to its new model.
  • Developing/Maintaining model data from various sources, debugging and increasing solution feasibility.
  • Conducting business meetings to gather functional and technical details of a requirement.
  • Exposure to Databricks, generating PySpark scripts to automate the reports.
  • Testing and pushing marketing campaigns to production through Jenkins for various credit card customers at Capital One.
  • Using those campaigns to update income information and increase credit lines for millions of people.
  • Testing different templates for various campaigns using Amazon S3 and checking production using seed emails.
  • Using software tools like Kibana and Gumbo to monitor campaigns and push them to production.

Confidential, Dallas, TX

Data Engineer

Responsibilities:

  • Create and maintain optimal data pipeline architecture.
  • Assemble large, complex data sets that meet functional and non-functional business requirements.
  • Identify, design, and implement internal process improvements automating manual processes. This includes optimizing data delivery, re-designing code and infrastructure for greater scalability.
  • Executed data analysis and data visualization on survey data using Tableau Desktop, and compared respondents' demographics data with univariate analysis using Python (Pandas, NumPy, Seaborn, Sklearn, and Matplotlib).
  • Build re-usable data pipelines to ingest, standardize, and shape data from various zones in Hadoop data lake.
  • Migrated on-premise data to Azure cloud Database using Azure Service
  • Build analytic tools that utilize the data pipeline to deliver actionable insights into customer acquisition, Revenue Management, Digital and Marketing areas for operational efficiency and critical metrics.
  • Design and build BI APIs on established enterprise Architecture patterns for data sharing from various sources.
  • Design and integrate data using big data tools - Spark, Scala, Hive etc.
  • Help manage the library of all deployed APIs.
  • Support API documentation of classes, methods scenarios, code, design rationales, and contracts.
  • Design, build and maintain a small set of highly flexible and scalable models linked to Hilton’s specific business needs.

Confidential, West Point, PA

Data Engineer

Responsibilities:

  • Expertise assisting application and development teams by providing design, configuration and operational support in the Big Data ecosystem. Engaged in multiple major initiatives to migrate and redesign older systems using cloud technologies.
  • Significant exposure to leading edge cloud and big data technologies on systems scaling to Petabytes of data.
  • Expertise in Performance tuning Big Data clusters: Hadoop, Cassandra and EMR clusters.
  • Proactively monitored systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Analysed system failures, identified root causes and recommended courses of action. Documented the system processes and procedures for future reference.
  • Cluster job performance and capacity planning. Collaborate with cross-functional teams to ensure that applications are properly tested, configured and deployed
  • Administering in-house and cloud infrastructure, 19 Hadoop clusters in total.
  • Major upgrade of DR and PROD clusters from CDH 4.6 to CDH 5.4.2
  • Analysed, optimized and performance-tuned the EMR clusters. Integrated all the Big Data logs (EC2, EMR) into Splunk.
  • Analysed EMR billing information and performance-tuned the EMR clusters to improve cost efficiency.
  • Experience architecting, configuring and migrating Oracle solutions to the cloud with AWS Redshift and EMR, providing split-second responses to queries.
  • Integrated Splunk on AWS and opened the visualization into EMR cluster which gives a sneak peek into various metrics.
  • Built a visualization on AWS which was later presented at AWS re:Invent 2015.
  • Provided a cloud solution for an app team migrating from Oracle; performed analysis with Presto on EMR, Redshift and Spark.

Confidential, Providence, RI

Database Developer

Responsibilities:

  • Develop ETLs using PL/SQL in Oracle 10g & 11g to extract, transform and load data from OLTP into Warehouse
  • Designing ETL processes using Informatica to load data from Flat Files, Oracle and Excel files to target Oracle Data Warehouse database.
  • Developed mappings and mapplets in Informatica to load the data from various sources into the Data Warehouse, using different transformations like Joiner, Aggregator, Update Strategy, Rank, Router, Lookup, Sequence Generator, Filter, Sorter, Source Qualifier
  • Used Workflow Manager to create sessions and scheduled them to run at specified times with the required frequency
  • Utilized PL/SQL nested tables for conditional trafficking of data within ETL process.
  • Massaging and filtering data by developing reusable PL/SQL functions
  • Performance tuning using Oracle Hints and Result caching where appropriate.
  • Utilized PL/SQL bulk collect feature to optimize the ETL performance.
  • Develop BASH shell scripts to set up batch jobs on Unix Solaris 10 server.
  • Document application from a technical maintenance point of view and host a knowledge transfer session.
  • Maintain and Enhance Oracle PL/SQL batch process for patient level data collected in a clinical trial and reporting system
  • Load the data from MS Excel to Oracle Table and Oracle table to MS Excel.
  • Loaded the data using SQL*Loader, imports and UTL_FILE, depending on the file format.
  • Debugging Production issues using Toad 9.7 debugger
  • Used SQL Trace and TKPROF for analysing performance issues

Confidential

DataStage Developer

Responsibilities:

  • Interacted with SMEs, gathered requirements, rated and assigned priorities to each of the requirements by building a requirements traceability matrix.
  • Acted as a liaison between the end users and the technical team
  • Used elicitation techniques such as JAD Sessions, questionnaires, brainstorming sessions, and focus groups.
  • Designed and deployed the Extract, Transform and Load process using DataStage by studying the business requirement from the business users.
  • Automated both Monthly and Weekly refresh (data load) using the UNIX shell scripts.
  • Responsible for analysing source and target data models and making necessary changes.
  • Involved in the requirements gathering, analysis to define business and functional specifications.
  • Responsible for Data Analysis, Functional and Technical Design for the Client.
  • Involved in designing, developing and documenting the ETL (Extract, Transform and Load) strategy to populate the Data Warehouse from various source system (Oracle) feeds using DataStage, PL/SQL and Unix shell scripts.
  • Extracted data from various sources like DB2, Oracle and flat files and loaded it into target tables.
  • Used DataStage Designer to design and develop jobs for extracting, transforming, integrating and loading data into targets.
  • Developed Shared Containers, which can be reused several times.
  • Automated the FTP Extraction Process by job control to use one job for several extractions.
  • Developed Hash files for performance improvement by reducing the reference lookup time.
  • Manage the negotiation and execution of investment trades on multiple exchanges covering foreign exchange, Commodities, stocks and fixed income. Implement corporate trading strategies and execute external client trading services. Trade a variety of investment vehicles including commodity futures, FOREX, equities, options and bonds.
  • Used Job Control routines and Transform functions in the process of developing the job.
  • Job Parameters were extensively used to parameterize the server jobs.
  • Used DataStage Director and the runtime engine to schedule the server jobs and to monitor and debug their components.
  • Developed star schemas and created custom star schema as per the business requirements.
  • Applied Performance Tuning logic to optimize session performance.
  • Developed PL/SQL stored procedures for source pre load and target pre load to verify the existence of tables.
  • Designed & developed the complete user manual for the newly built system.
  • Trained several team members.
