Senior Data Engineer Resume

New York

SUMMARY

  • Almost 7 years of IT experience in software development, including 5+ years as a Big Data Engineer/Hadoop Developer with strong knowledge of the Hadoop framework.
  • Expertise in Hadoop architecture and its components, such as HDFS, YARN, High Availability, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Competent in all aspects of development, from requirement discovery and initial implementation through release, enhancement, and support (SDLC and Agile techniques).
  • 3+ years of experience in design, development, data migration, testing, support, and maintenance using Amazon Redshift databases.
  • 5 years of experience and strong exposure to the Hadoop ecosystem and its frameworks: HDFS, YARN, MapReduce, Apache Pig, Python, Hive, Flume, Sqoop, ZooKeeper, Oozie, HBase, Impala, Spark, Scala, and Kafka.
  • 2+ years of experience in AWS cloud solution development using Lambda, SQS, SNS, DynamoDB, Athena, S3, EMR, EC2, Redshift, Glue, and CloudFormation.
  • Experienced with Microsoft Azure SQL Database, Data Lake, Azure ML, Azure Data Factory, Functions, Databricks, and HDInsight.
  • Working experience with big data in the cloud using AWS EC2 and Microsoft Azure; handled Redshift and DynamoDB databases holding roughly 300 TB of data.
  • Extensive experience in migrating on-premises Hadoop platforms to cloud solutions using AWS and Azure.
  • 4+ years of experience writing Python-based ETL frameworks and PySpark jobs to process large volumes of data daily (a minimal sketch follows this list).
  • Strong experience in implementing data models and loading unstructured data using HBase, DynamoDB, and Cassandra.
  • Proficient in creating report dashboards, visualizations, and heat maps using Tableau.
  • Extensive knowledge of extracting and loading data from different sources with complex business logic in Hive, and of building ETL pipelines that process terabytes of data daily.
  • Proficient in transporting and processing real-time event streams using Kafka and a real-time processing framework (Apache Spark).
  • Hands-on experience importing and exporting data between relational databases and HDFS, Hive, and HBase using Sqoop.
  • Skilled in processing real-time data with Kafka 0.10.1 producers and stream processors; implemented stream processing with Kinesis, landing data into an S3 data lake.
  • Designed and developed Spark pipelines to ingest real-time, event-based data from Kafka and other message queue systems, and processed large volumes with Spark batch jobs into a Hive data warehouse.
  • Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Document (FSD).
  • Excellent working experience in Scrum / Agile framework, Iterative and Waterfall project execution methodologies.
  • Capable of organizing, coordinating, and managing multiple tasks simultaneously.
  • Excellent communication and interpersonal skills, self-motivated, organized and detail-oriented, able to work well under deadlines in a changing environment and perform multiple tasks effectively and concurrently.
  • Strong analytical skills and the ability to quickly understand a client's business needs; participated in meetings to gather information and requirements from clients.
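
A minimal sketch of the Python/PySpark batch ETL referenced above; the S3 paths and column names are assumptions for illustration, not details from an actual project:

    from pyspark.sql import SparkSession, functions as F

    # Hypothetical example: read raw CSV events from S3, clean them,
    # and write them back to S3 as date-partitioned Parquet.
    spark = SparkSession.builder.appName("daily-etl-sketch").getOrCreate()

    raw = (spark.read
           .option("header", "true")
           .csv("s3://example-raw-bucket/events/"))        # assumed input path

    cleaned = (raw
               .dropDuplicates(["event_id"])                # assumed key column
               .withColumn("event_date", F.to_date("event_ts")))

    (cleaned.write
     .mode("overwrite")
     .partitionBy("event_date")
     .parquet("s3://example-curated-bucket/events/"))       # assumed output path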

TECHNICAL SKILLS

Hadoop: Hadoop, MapReduce, Hive, Pig, Impala, Sqoop, HDFS, HBase, Oozie, Spark, Scala, and MongoDB

Cloud Technologies: AWS Kinesis, Lambda, EMR, EC2, SNS, SQS, DynamoDB, Step Functions, Glue, Athena, CloudWatch, Azure Data Factory, Azure Data Lake, Functions, Azure SQL Data Warehouse, Databricks and HDInsight

DBMS: Amazon Redshift, PostgreSQL, Oracle 9i, SQL Server, IBM DB2, and Teradata

ETL Tools: Informatica and Pentaho

Reporting Tools: Power BI, Tableau, and Python dashboards

Deployment Tools: Git, Jenkins and CloudFormation

Programming Language: Python, PL/SQL, and Java

Scripting: Unix Shell and Bash scripting

PROFESSIONAL EXPERIENCE

Confidential, New York

Senior Data Engineer

Responsibilities:

  • Wrote ETL jobs using Spark data pipelines to process data from different sources and transform it for multiple targets.
  • Created Spark streams to process real-time data into RDDs and DataFrames, and built analytics using Spark SQL (see the sketch after this list).
  • Designed a Redshift-based data delivery layer for business intelligence tools to operate directly on AWS S3.
  • Implemented Kinesis Data Streams to read real-time data and loaded it into S3 for downstream processing.
  • Set up AWS infrastructure on EC2 and implemented the S3 API for accessing data files in S3 buckets.
  • Designed “Data Services” to intermediate data exchange between the Data Clearinghouse and the Data Hubs.
  • Wrote ETL flows and MapReduce jobs to process data from AWS S3 into DynamoDB and HBase.
  • Involved in the ETL phase of the project; designed and analyzed data in Oracle and migrated it to Redshift and Hive.
  • Created databases and tables in Redshift and DynamoDB, and wrote complex EMR scripts to process terabytes of data into AWS S3.
  • Performed real-time analytics on transactional data using Python to create statistical models for predictive and reverse product analysis.
  • Involved in client meetings, explaining proposed designs and supporting requirements gathering.
  • Worked in an Agile methodology to understand the requirements of user stories.
  • Prepared high-level design documentation for approval.
  • Used data visualization tools (Tableau, QuickSight, and Kibana) to bring new insights from extracted data and represent it more effectively.
  • Designed data models for dynamic and real-time data, intended for use by various applications with OLAP and OLTP needs.
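
A minimal sketch of the streaming flow described above: Spark Structured Streaming reads events from Kafka, parses them, and lands them on S3 as Parquet for downstream analytics. The broker, topic, schema, and paths are assumptions for illustration (the Kafka source also requires the spark-sql-kafka package on the classpath):

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("streaming-ingest-sketch").getOrCreate()

    # Assumed event schema, for illustration only.
    schema = StructType([
        StructField("order_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("region", StringType()),
    ])

    # Read the raw event stream from Kafka (broker and topic are placeholders).
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "orders")
              .load()
              .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Land the parsed events on S3 as Parquet for downstream batch analytics
    # (data and checkpoint paths are placeholders).
    query = (events.writeStream
             .format("parquet")
             .option("path", "s3://example-data-lake/orders/")
             .option("checkpointLocation", "s3://example-data-lake/_checkpoints/orders/")
             .outputMode("append")
             .start())

    query.awaitTermination()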

Confidential, Atlanta, Georgia

Big Data Engineer/Hadoop Developer

Responsibilities:

  • Covered the full project life cycle, from design, analysis, and logical and physical architecture to testing new requirements in the data pipeline using Perl, Bash, Pig, and Oozie in the Hadoop ecosystem.
  • Provide full operational support: analyze code to identify root causes of production issues, provide solutions or workarounds, and lead them to resolution.
  • Participate in full development life cycle including requirements analysis, design, development, deployment and operations support
  • Work with engineering team members to explore and create interesting solutions while sharing knowledge within the team
  • Work across product teams to help solve customer-facing issues
  • Demonstrable experience in design, modeling, development, implementation, and testing.
  • Conferring with data scientists and other developers to obtain information on limitations or capabilities for data processing projects.
  • Support, maintain, and document Hadoop and MySQL data warehouse
  • Iterate and improve existing features in the pipeline as well as add new ones
  • Design, develop, and document technological solutions to complex data problems, developing and testing modular, reusable, efficient, and scalable code to implement those solutions.
  • Designed and developed automation test scripts using Python.
  • Creating Data Pipelines using Azure Data Factory.
  • Automating the jobs using Python.
  • Creating tables and loading data in Azure Database for MySQL.
  • Creating Azure Functions and Logic Apps to automate data pipelines using Blob triggers.
  • Analyzing SQL scripts and designing solutions to implement them using PySpark.
  • Developed Spark code using Python (PySpark) for faster processing and testing of data.
  • Used the Spark API to perform analytics on data in Hive.
  • Optimizing and tuning Hive and Spark queries using data layout techniques such as partitioning and bucketing (see the sketch after this list).
  • Data cleansing, integration, and transformation using Pig.
  • Involved in exporting and importing data from local file systems and RDBMS to HDFS.
  • Designing and coding the pattern for inserting data into the data lake.
  • Moving data from on-premises HDP clusters to Azure.
  • Building, installing, upgrading, or migrating petabyte-scale big data systems.
  • Fixing data-related issues.
  • Loading data into the DB2 database using DataStage.
  • Monitoring big data and messaging systems such as Hadoop, Kafka, and Kafka MirrorMaker to ensure they always operate at peak performance.
  • Created Hive tables and loaded and analyzed data using Hive queries.
  • Communicating regularly with the business teams to ensure that any gaps between business requirements and technical requirements are resolved.
  • Reading and translating data models, querying data, identifying data anomalies, and providing root cause analysis.
  • Support Qlik Sense reporting to gauge the performance of various KPIs and facets, assisting top management in decision-making.
  • Engage in project planning and delivering to commitments.
  • Ran POCs on new technologies available in the market (e.g., Snowflake) to determine the best fit for the organization's needs.
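
A minimal sketch of the partitioning and bucketing layout mentioned above, written in PySpark; the database, table, and column names are illustrative assumptions:

    from pyspark.sql import SparkSession

    # Hive support is needed so the partitioned, bucketed table lands in the metastore.
    spark = (SparkSession.builder
             .appName("hive-layout-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Assumed staging table holding raw transaction rows.
    df = spark.table("staging.transactions")

    # Partition by date for coarse pruning; bucket by customer_id so joins and
    # point lookups on that key scan fewer files.
    (df.write
     .mode("overwrite")
     .partitionBy("txn_date")
     .bucketBy(32, "customer_id")
     .sortBy("customer_id")
     .saveAsTable("analytics.transactions_curated"))

    # Queries filtering on txn_date now prune whole partitions, e.g.:
    # spark.sql("SELECT COUNT(*) FROM analytics.transactions_curated "
    #           "WHERE txn_date = '2020-01-01'").show()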

Confidential, Washington DC

Hadoop Analyst

Responsibilities:

  • Participated in SDLC requirements gathering, analysis, design, development, and testing of applications developed using the Agile methodology.
  • Developed managed, external, and partitioned tables as per the requirements.
  • Ingested structured data into appropriate schemas and tables to support the rules and analytics.
  • Developed custom user-defined functions (UDFs) in Hive to transform large volumes of data according to business requirements (see the sketch after this list).
  • Developed Pig scripts and Pig UDFs, as well as Hive scripts and Hive UDFs, to load data files.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Implemented scripts for loading data from the UNIX file system to HDFS.
  • Load and transform large sets of structured, semi-structured, and unstructured data.
  • Analyzed large data sets to determine optimal ways to aggregate and report on them.
  • Actively participated in Object-Oriented Analysis and Design sessions of the project, which is based on MVC architecture using the Spring Framework.
  • Developed the presentation layer using HTML, CSS, JSPs, Bootstrap, and AngularJS.
  • Adopted J2EE design patterns like DTO, DAO, Command and Singleton.
  • Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring functionality.
  • Generated POJO classes to map to the database table.
  • Configured Hibernate's second-level cache using EhCache to reduce the number of hits to the configuration table data.
  • Used the ORM tool Hibernate to represent entities and define fetching strategies for optimization.
  • Implemented transaction management in the application by applying Spring Transaction and Spring AOP methodologies.
  • Wrote SQL queries and stored procedures for the application to communicate with the database.
  • Used the JUnit framework for unit testing of applications.
  • Used Maven to build and deploy the application.
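
A minimal sketch of one way to express a custom row-level transformation for Hive from Python, using Hive's TRANSFORM streaming mechanism rather than a compiled Java UDF; the three-column layout and the cleanup rules are assumptions for illustration:

    #!/usr/bin/env python
    # clean_rows.py -- a streaming "UDF" invoked through Hive's TRANSFORM clause.
    # Hive pipes each input row to stdin as tab-separated fields and reads the
    # transformed row back from stdout. Example HiveQL usage (script registered
    # first with ADD FILE):
    #   ADD FILE clean_rows.py;
    #   SELECT TRANSFORM (id, name, amount)
    #   USING 'python clean_rows.py'
    #   AS (id, name, amount)
    #   FROM raw_table;
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:            # assumed three-column layout
            continue                    # drop malformed rows
        row_id, name, amount = fields
        name = name.strip().upper()     # illustrative cleanup rule
        try:
            amount = "{:.2f}".format(float(amount))
        except ValueError:
            amount = "0.00"             # default for non-numeric amounts
        print("\t".join([row_id, name, amount]))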
