Senior Data Engineer Resume

New York

SUMMARY

  • Almost 7 years of IT experience in software development, including 5+ years as a Big Data Engineer/Hadoop Developer with strong knowledge of the Hadoop framework.
  • Expertise in Hadoop architecture and its components, such as HDFS, YARN, High Availability, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Competent in all aspects of development, from requirement discovery and initial implementation through release, enhancement, and support (SDLC and Agile techniques).
  • 3+ years of experience in design, development, data migration, testing, support, and maintenance using Amazon Redshift databases.
  • 5 years of experience and strong exposure to the Hadoop ecosystem and its frameworks: HDFS, YARN, MapReduce, Apache Pig, Python, Hive, Flume, Sqoop, ZooKeeper, Oozie, HBase, Impala, Spark, Scala, and Kafka.
  • 2+ years of experience in AWS cloud solution development using Lambda, SQS, SNS, DynamoDB, Athena, S3, EMR, EC2, Redshift, Glue, and CloudFormation.
  • Experienced with Microsoft Azure SQL Database, Data Lake, Azure ML, Azure Data Factory, Functions, Databricks, and HDInsight.
  • Working experience with big data in the cloud using AWS EC2 and Microsoft Azure; handled Redshift and DynamoDB databases holding roughly 300 TB of data.
  • Extensive experience in migrating on-premises Hadoop platforms to cloud solutions using AWS and Azure.
  • 4+ years of experience writing Python-based ETL frameworks and PySpark jobs to process large volumes of data daily (a minimal sketch follows this list).
  • Strong experience in implementing data models and loading unstructured data using HBase, DynamoDB, and Cassandra.
  • Proficient in creating report dashboards, visualizations, and heat maps using Tableau.
  • Extensive knowledge of extracting and loading data from different sources with complex business logic in Hive, and of building ETL pipelines that process terabytes of data daily.
  • Proficient in transporting and processing real-time event streams using Kafka and a real-time processing framework (Apache Spark).
  • Hands-on experience importing and exporting data between relational databases and HDFS, Hive, and HBase using Sqoop.
  • Skilled in processing real-time data with Kafka 0.10.1 producers and stream processors; implemented stream processing with Kinesis, landing data into an S3 data lake.
  • Designed and developed Spark pipelines to ingest real-time, event-based data from Kafka and other message queue systems, and processed large volumes with Spark batch jobs into a Hive data warehouse.
  • Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Document (FSD).
  • Excellent working experience in Scrum / Agile framework, Iterative and Waterfall project execution methodologies.
  • Capable of organizing, coordinating, and managing multiple tasks simultaneously.
  • Excellent communication and interpersonal skills, self-motivated, organized and detail-oriented, able to work well under deadlines in a changing environment and perform multiple tasks effectively and concurrently.
  • Strong analytical skills and the ability to quickly understand a client's business needs; participated in meetings to gather information and requirements from clients.
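
A minimal sketch of the Python/PySpark batch ETL referenced above; the S3 paths and column names are assumptions for illustration, not details from an actual project:

    from pyspark.sql import SparkSession, functions as F

    # Hypothetical example: read raw CSV events from S3, clean them,
    # and write them back to S3 as date-partitioned Parquet.
    spark = SparkSession.builder.appName("daily-etl-sketch").getOrCreate()

    raw = (spark.read
           .option("header", "true")
           .csv("s3://example-raw-bucket/events/"))        # assumed input path

    cleaned = (raw
               .dropDuplicates(["event_id"])                # assumed key column
               .withColumn("event_date", F.to_date("event_ts")))

    (cleaned.write
     .mode("overwrite")
     .partitionBy("event_date")
     .parquet("s3://example-curated-bucket/events/"))       # assumed output path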

TECHNICAL SKILLS

Hadoop: Hadoop, MapReduce, Hive, Pig, Impala, Sqoop, HDFS, HBase, Oozie, Spark, Scala, and MongoDB

Cloud Technologies: AWS Kinesis, Lambda, EMR, EC2, SNS, SQS, DynamoDB, Step Functions, Glue, Athena, CloudWatch, Azure Data Factory, Azure Data Lake, Functions, Azure SQL Data Warehouse, Databricks and HDInsight

DBMS: Amazon Redshift, PostgreSQL, Oracle 9i, SQL Server, IBM DB2, and Teradata

ETL Tools: Informatica and Pentaho

Reporting Tools: Power BI, Tableau, and Python dashboards

Deployment Tools: Git, Jenkins and CloudFormation

Programming Language: Python, PL/SQL, and Java

Scripting: Unix Shell and Bash scripting

PROFESSIONAL EXPERIENCE

Confidential, New York

Senior Data Engineer

Responsibilities:

  • Wrote ETL jobs using Spark data pipelines to process data from different sources and transform it for multiple targets.
  • Created Spark streams to process real-time data into RDDs and DataFrames, and built analytics using Spark SQL (see the sketch after this list).
  • Designed a Redshift-based data delivery layer for business intelligence tools to operate directly on AWS S3.
  • Implemented Kinesis Data Streams to read real-time data and loaded it into S3 for downstream processing.
  • Set up AWS infrastructure on EC2 and implemented the S3 API for accessing data files in S3 buckets.
  • Designed “Data Services” to intermediate data exchange between the Data Clearinghouse and the Data Hubs.
  • Wrote ETL flows and MapReduce jobs to process data from AWS S3 into DynamoDB and HBase.
  • Involved in the ETL phase of the project; designed and analyzed data in Oracle and migrated it to Redshift and Hive.
  • Created databases and tables in Redshift and DynamoDB, and wrote complex EMR scripts to process terabytes of data into AWS S3.
  • Performed real-time analytics on transactional data using Python to create statistical models for predictive and reverse product analysis.
  • Involved in client meetings, explaining proposed designs and supporting requirements gathering.
  • Worked in an Agile methodology to understand the requirements of user stories.
  • Prepared high-level design documentation for approval.
  • Used data visualization tools (Tableau, QuickSight, and Kibana) to bring new insights from extracted data and represent it more effectively.
  • Designed data models for dynamic and real-time data, intended for use by various applications with OLAP and OLTP needs.
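
A minimal sketch of the streaming flow described above: Spark Structured Streaming reads events from Kafka, parses them, and lands them on S3 as Parquet for downstream analytics. The broker, topic, schema, and paths are assumptions for illustration (the Kafka source also requires the spark-sql-kafka package on the classpath):

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("streaming-ingest-sketch").getOrCreate()

    # Assumed event schema, for illustration only.
    schema = StructType([
        StructField("order_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("region", StringType()),
    ])

    # Read the raw event stream from Kafka (broker and topic are placeholders).
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "orders")
              .load()
              .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Land the parsed events on S3 as Parquet for downstream batch analytics
    # (data and checkpoint paths are placeholders).
    query = (events.writeStream
             .format("parquet")
             .option("path", "s3://example-data-lake/orders/")
             .option("checkpointLocation", "s3://example-data-lake/_checkpoints/orders/")
             .outputMode("append")
             .start())

    query.awaitTermination()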

Confidential, Atlanta, Georgia

Big Data Engineer/Hadoop Developer

Responsibilities:

  • Covered the full project life cycle, from design, analysis, and logical and physical architecture to testing new requirements in the data pipeline using Perl, Bash, Pig, and Oozie in the Hadoop ecosystem.
  • Provide full operational support: analyze code to identify root causes of production issues, provide solutions or workarounds, and lead them to resolution.
  • Participate in full development life cycle including requirements analysis, design, development, deployment and operations support
  • Work with engineering team members to explore and create interesting solutions while sharing knowledge within the team
  • Work across product teams to help solve customer-facing issues
  • Demonstrable experience in design, modeling, development, implementation, and testing.
  • Conferring with data scientists and other developers to obtain information on limitations or capabilities for data processing projects.
  • Support, maintain, and document Hadoop and MySQL data warehouse
  • Iterate and improve existing features in the pipeline as well as add new ones
  • Design, develop, and document technological solutions to complex data problems, developing and testing modular, reusable, efficient, and scalable code to implement those solutions.
  • Designed and developed automation test scripts using Python.
  • Creating Data Pipelines using Azure Data Factory.
  • Automating the jobs using Python.
  • Creating tables and loading data in Azure Database for MySQL.
  • Creating Azure Functions and Logic Apps to automate data pipelines using Blob triggers.
  • Analyzing SQL scripts and designing solutions to implement them using PySpark.
  • Developed Spark code using Python (PySpark) for faster processing and testing of data.
  • Used the Spark API to perform analytics on data in Hive.
  • Optimizing and tuning Hive and Spark queries using data layout techniques such as partitioning and bucketing (see the sketch after this list).
  • Data cleansing, integration, and transformation using Pig.
  • Involved in exporting and importing data from local file systems and RDBMS to HDFS.
  • Designing and coding the pattern for inserting data into the data lake.
  • Moving data from on-premises HDP clusters to Azure.
  • Building, installing, upgrading, or migrating petabyte-scale big data systems.
  • Fixing data-related issues.
  • Loading data into the DB2 database using DataStage.
  • Monitoring big data and messaging systems such as Hadoop, Kafka, and Kafka MirrorMaker to ensure they always operate at peak performance.
  • Created Hive tables and loaded and analyzed data using Hive queries.
  • Communicating regularly with the business teams to ensure that any gaps between business requirements and technical requirements are resolved.
  • Reading and translating data models, querying data, identifying data anomalies, and providing root cause analysis.
  • Support Qlik Sense reporting to gauge the performance of various KPIs and facets, assisting top management in decision-making.
  • Engage in project planning and delivering to commitments.
  • Ran POCs on new technologies available in the market (e.g., Snowflake) to determine the best fit for the organization's needs.
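
A minimal sketch of the partitioning and bucketing layout mentioned above, written in PySpark; the database, table, and column names are illustrative assumptions:

    from pyspark.sql import SparkSession

    # Hive support is needed so the partitioned, bucketed table lands in the metastore.
    spark = (SparkSession.builder
             .appName("hive-layout-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Assumed staging table holding raw transaction rows.
    df = spark.table("staging.transactions")

    # Partition by date for coarse pruning; bucket by customer_id so joins and
    # point lookups on that key scan fewer files.
    (df.write
     .mode("overwrite")
     .partitionBy("txn_date")
     .bucketBy(32, "customer_id")
     .sortBy("customer_id")
     .saveAsTable("analytics.transactions_curated"))

    # Queries filtering on txn_date now prune whole partitions, e.g.:
    # spark.sql("SELECT COUNT(*) FROM analytics.transactions_curated "
    #           "WHERE txn_date = '2020-01-01'").show()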

Confidential, Washington DC

Hadoop Analyst

Responsibilities:

  • Participated in SDLC requirements gathering, analysis, design, development, and testing of applications developed using the Agile methodology.
  • Developed managed, external, and partitioned tables as per the requirements.
  • Ingested structured data into appropriate schemas and tables to support the rules and analytics.
  • Developed custom user-defined functions (UDFs) in Hive to transform large volumes of data according to business requirements (see the sketch after this list).
  • Developed Pig scripts and Pig UDFs, as well as Hive scripts and Hive UDFs, to load data files.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Implemented scripts for loading data from the UNIX file system to HDFS.
  • Load and transform large sets of structured, semi-structured, and unstructured data.
  • Analyzed large data sets to determine optimal ways to aggregate and report on them.
  • Actively participated in Object-Oriented Analysis and Design sessions of the project, which is based on MVC architecture using the Spring Framework.
  • Developed the presentation layer using HTML, CSS, JSPs, Bootstrap, and AngularJS.
  • Adopted J2EE design patterns like DTO, DAO, Command and Singleton.
  • Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring functionality.
  • Generated POJO classes to map to the database table.
  • Configured Hibernate's second-level cache using EhCache to reduce the number of hits to the configuration table data.
  • Used the ORM tool Hibernate to represent entities and define fetching strategies for optimization.
  • Implemented transaction management in the application by applying Spring Transaction and Spring AOP methodologies.
  • Wrote SQL queries and stored procedures for the application to communicate with the database.
  • Used the JUnit framework for unit testing of applications.
  • Used Maven to build and deploy the application.
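
A minimal sketch of one way to express a custom row-level transformation for Hive from Python, using Hive's TRANSFORM streaming mechanism rather than a compiled Java UDF; the three-column layout and the cleanup rules are assumptions for illustration:

    #!/usr/bin/env python
    # clean_rows.py -- a streaming "UDF" invoked through Hive's TRANSFORM clause.
    # Hive pipes each input row to stdin as tab-separated fields and reads the
    # transformed row back from stdout. Example HiveQL usage (script registered
    # first with ADD FILE):
    #   ADD FILE clean_rows.py;
    #   SELECT TRANSFORM (id, name, amount)
    #   USING 'python clean_rows.py'
    #   AS (id, name, amount)
    #   FROM raw_table;
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:            # assumed three-column layout
            continue                    # drop malformed rows
        row_id, name, amount = fields
        name = name.strip().upper()     # illustrative cleanup rule
        try:
            amount = "{:.2f}".format(float(amount))
        except ValueError:
            amount = "0.00"             # default for non-numeric amounts
        print("\t".join([row_id, name, amount]))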
