Big Data Engineer Resume McLean, VA - Hire IT People

SUMMARY:

Experience in all phases of the software development life cycle and Agile methodology.
Expertise in implementing, consulting on, and managing Hadoop clusters and ecosystem components such as HDFS, MapReduce, Pig, Hive, Flume, Oozie, and ZooKeeper.
Around 5 years of experience in building large-scale distributed data processing systems, in-depth knowledge of Hadoop architecture MR1 and MR2 (YARN), and 7+ years of experience in Core Java.
Expertise in Spark with Scala and NiFi.
Expertise in batch processing using Hadoop MapReduce, Pig, and Hive.
Good knowledge of real-time processing using Spark Streaming (Scala) with Kafka.
Hands-on experience in writing Pig/Hive scripts and custom UDFs.
Experience in partitioning, bucketing and joins in Hive.
Experience in query optimization and performance tuning with Hive.
Hands-on experience in importing and exporting data between RDBMS and HDFS/HBase/Hive through Sqoop, using both full-refresh and incremental loads.
Hands-on experience in loading log data from multiple sources into HDFS through Flume agents.
Experience in configuring and implementing Flume components such as Source, Channel, and Sink.
Experience in HBase, a NoSQL database.
Experience working with various Hadoop distributions, including open-source Apache, Cloudera, Hortonworks, and MapR.
Programming experience in UNIX shell scripting.
Experience with Agile practices: daily stand-up meetings, writing user stories, estimating story points, creating tasks, estimating task ETAs, tracking task progress with daily burn-down charts, and completing backlogs.

TECHNICAL SKILLS:

Programming Languages: Java, Scala

Big Data Technologies: HDFS, MapReduce, YARN, Hive, Hue, Pig, Sqoop, Flume, Oozie, ZooKeeper, NoSQL, HBase, NiFi

RDBMS: MySQL, Oracle, SQL Server, DB2

Data Ingestion Tools: Flume, Sqoop, Kafka

Real-time Streaming and Processing: Storm, Spark Streaming

Operating Systems: Windows 9x/2000/XP/7/8/10, Linux, UNIX, Mac

Development Tools: Eclipse

Build and Log Tools: Maven

Version Control: SVN, Git

PROFESSIONAL EXPERIENCE:

Confidential, McLean, VA

Big Data Engineer

Responsibilities:

Involved in technical discussions and responsible for the architecture design of the sources
Created tickets in JIRA for the tasks and created branches in Git
Worked with NiFi to create templates for processes and process groups
Created components to pull files from INFA
Built process groups and processes in NiFi to pull files from various servers and place them in HDFS, along with components to convert the files to JSON, evaluate them, and store the file information in the file tracker and in Kafka topics
Wrote Java components to create dynamic folders in HDFS for different sources
Configured source file type information, pattern, header information, and split type in XML files
Created DataFrames to ingest HDFS files into Hive internal/external tables with partitions
Unit tested the code, updated the JIRA tickets, committed and pushed the code to the remote branch, and raised pull requests for code merges
Updated the solution architecture document in Confluence for the sources
Created an SOP document for production support activities
Performance tuning of Spark applications by configuring driver memory and executor memory and by increasing the cores and queues for Spark jobs within the applicable limits
Worked with architects on the migration from Spark 1.6 to 2.1

Environment: Hadoop, HDFS, Hive, Spark 2.0, Scala, NiFi, HBase, Kafka, Knox, Atlas, Ranger, Kerberos, Atlassian Confluence, Confidential Bamboo, JIRA, BitBucket/Stash, Hortonworks Distribution in AWS

Confidential, Phoenix, AZ

Big Data Lead

Responsibilities:

Involved in technical discussions and responsible for architecture design
Mentored the team and provided technical solutions
Performance tuning of Spark applications: analysing dependencies, storage levels, resource tuning, and memory management
Created and processed RDDs and DataFrames using Spark SQL
Designed and developed shell scripts, Pig scripts, Hive scripts, and MapReduce jobs
Wrote Hive queries and partitions to store the data in internal tables
Implemented partitioning, dynamic partitions, and buckets in Hive.
Wrote UNIX shell and Pig scripts to pre-process the data stored in the Cornerstone platform

Environment: Hadoop, HDFS, Pig, Hive, Spark, Scala, MapReduce, MapR Distribution

Confidential, Eden Prairie, MN

Senior Big Data Lead Consultant

Responsibilities:

Designed and developed logical data models for the Legacy and Cornerstone databases
Created Sqoop scripts for tables using Linux scripts
Created, deleted, and executed Sqoop jobs in the Sqoop metastore
Designed HBase table row keys and mapped them to RDBMS table column names
Mapped HBase table columns to Hive external table columns
Historical and incremental import of RDBMS data into HBase tables using the metastore
Validated Sqoop, Hive, and HBase scripts
Created HBase tables and column families, altered column families, granted permissions on HBase tables, and defined region server space
Automated workflows through Oozie
Wrote transformations and actions in Scala to process complex data
Bug fixing and production support for running processes.
Participated in Scrum daily stand-ups, sprint planning, backlog grooming, and retrospective meetings.

Environment: Hadoop, HDFS, MapReduce, Pig, Hive, Oozie, HBase, Spark, Scala, Zookeeper, MapR Distribution

Confidential, Atlanta, GA

Senior Big Data Lead Consultant

Responsibilities:

Designed the technical architecture and developed various Big Data workflows using MapReduce, Hive, YARN, Kafka, Spark, and Scala
Built reusable Hive UDF libraries for business requirements, which enabled business analysts to use these UDFs in Hive queries.
Used Flume to load application server logs into HDFS.
Analysed the logs stored on HDFS and imported the cleaned data into the Hive warehouse, which enabled business analysts to write Hive queries.
Worked with the Elasticsearch search engine for real-time data analytics, integrated with Kibana dashboards.
Processed Kafka messages using Spark Streaming
Applied transformations on RDDs and DataFrames for filtering, mapping, joining, and aggregation
Migrated data from RDBMS and processed events from Spark Streaming into Cassandra
Stored the streaming events in Parquet format
Participated in Scrum daily stand-ups, sprint planning, backlog grooming, and retrospective meetings.

Environment: MapReduce, Pig, Hive, Flume, JDK 1.6, Linux, Kafka, Spark Streaming, Scala, Elasticsearch, YARN, Hue, HDFS, Git, Kibana, Linux Scripting

Confidential, Bloomington, IL

Senior Big Data Consultant

Responsibilities:

Wrote MapReduce jobs to process trip summaries, scheduled to execute hourly, daily, weekly, monthly, and quarterly.
Responsible for loading machine data from different sources into the Hadoop cluster using Flume
Used Flume to collect, aggregate, and store log data from different web servers.
Ingested data into HBase and retrieved it using Java APIs
Used Spark SQL to extract data from different data sources and place the processed data into NoSQL stores
Used Spark to analyse machine-emitted and sensor data and extract meaningful information such as location, driving speed, acceleration, braking speed, and driving patterns.
Used Git for version control to check files out and in.
Reviewed high-level design and code and mentored team members.
Participated in Scrum daily stand-ups, sprint planning, backlog grooming, and retrospective meetings.

Environment: Hadoop, MapReduce, OpenStack, Flume-NG, HBase 0.98.2, Spark SQL, Scala, Kafka, HDFS, ZooKeeper

Confidential

Big Data Engineer

Responsibilities:

Analysed the functional specifications
Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and semi-structured data.
Loaded data into external tables using Hive scripts
Performed aggregate joins and transformations using Hive queries
Implemented partitions, dynamic partitions, and buckets in Hive
Optimized Hive SQL queries, improving job performance
Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date
Performed Hadoop cluster administration, including adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting
Wrote unit test cases for Hive scripts

Environment: Java, Hadoop, HDFS, MapReduce, Pig, Hive, Flume, ZooKeeper, Chef

Confidential

Senior Software Engineer

Responsibilities:

Analysed the client's functional requirements to design the technical specifications, develop the system, and document the requirements
Responsible for developing class diagrams and sequence diagrams
Designed and implemented a separate middleware Java component on Fusion
Reviewed high-level design and code and mentored team members.
Participated in Scrum daily stand-ups, sprint planning, backlog grooming, and retrospective meetings.

Environment: Java 1.6, Oracle Fusion Middleware, Eclipse, WebSphere, Spring Framework

Confidential

Senior Software Engineer

Responsibilities:

Analysed the client's functional requirements to design the technical specifications, develop the system, and document the requirements.
Prepared LLD (class diagrams, sequence diagrams, activity diagrams) using the Enterprise Architect UML tool
Worked on Hibernate, Spring IoC, DAO, and JSON parsing
Prepared unit test cases for the developed UI.
Responsible for problem tracking, diagnosis, replications, troubleshooting, and resolution of client problems.

Environment: Java, ACG proprietary framework using Dojo, Hibernate, Spring, DB2, RSA, Rational ClearCase, RPM, RQM, Mantis
