Big Data Engineer Resume

McLean, VA

SUMMARY:

  • Experience in all phases of the software development life cycle & Agile methodology.
  • Expertise in implementing, consulting on and managing Hadoop clusters and ecosystem components like HDFS, MapReduce, Pig, Hive, Flume, Oozie & Zookeeper.
  • Around 5 years of experience in building large-scale distributed data processing, in-depth knowledge of Hadoop architecture MR1 & MR2 (YARN), and 7+ years of experience in Core Java.
  • Expertise in Spark with Scala and NiFi.
  • Expertise in batch processing using Hadoop MapReduce, Pig & Hive.
  • Good knowledge of real-time processing using Spark Streaming (Scala) with Kafka.
  • Hands-on experience in writing Pig/Hive scripts and custom UDFs.
  • Experience in partitioning, bucketing and joins in Hive.
  • Experience in query optimization and performance tuning with Hive.
  • Hands-on experience in importing and exporting data between RDBMS and HDFS/HBase/Hive through Sqoop, using both full-refresh and incremental loads.
  • Hands-on experience in loading log data from multiple sources into HDFS through Flume agents.
  • Experience in configuring and implementing Flume components such as Source, Channel and Sink.
  • Experience with the HBase NoSQL database.
  • Experience working with various Hadoop distributions, including open-source Apache, Cloudera, Hortonworks & MapR.
  • Programming experience in UNIX Shell Script.
  • Experience with Agile daily stand-up meetings, writing user stories, evaluating story points, creating and estimating tasks, tracking task progress with daily burn-down charts, and completing backlogs.

TECHNICAL SKILLS:

Programming Languages: Java, Scala

Big Data Technologies: HDFS, MapReduce, YARN, Hive, Hue, Pig, Sqoop, Flume, Oozie, Zookeeper, NoSQL, HBase, NiFi

RDBMS: MySQL, Oracle, SQL Server, DB2

Data Ingestion Tools: Flume, Sqoop, Kafka

Real-time Streaming and Processing: Storm, Spark Streaming

Operating Systems: Windows 9x/2000/XP/7/8/10, Linux, UNIX, Mac

Development Tools: Eclipse

Build and Log Tools: Maven

Version Control: SVN, Git

PROFESSIONAL EXPERIENCE:

Confidential, McLean, VA

Big Data Engineer

Responsibilities:
  • Involved in technical discussions and responsible for the architecture design of the sources
  • Created JIRA tickets for the tasks and created branches in Git
  • Worked on NiFi to create templates for the processes and process groups
  • Created components to pull files from INFA
  • Built process groups and processes in NiFi to pull files from various servers and place them in HDFS, along with components to convert the files to JSON, evaluate them, and store the file information in the file tracker and in Kafka topics
  • Wrote Java components to create dynamic folders in HDFS for different sources
  • Configured source file type information, pattern, header information and split type in XML files
  • Created DataFrames to ingest HDFS files into Hive internal/external tables with partitions
  • Unit tested the code and updated the JIRA tickets, committed and pushed the code to the remote branch, and raised pull requests for code merges
  • Updated the solution architecture document in Confluence for the sources
  • Created an SOP document for production support activities
  • Tuned the performance of Spark applications by configuring driver memory and executor memory and by increasing the cores and queues for Spark jobs within the available limits
  • Worked with architects on the migration from Spark 1.6 to 2.1

Environment: Hadoop, HDFS, Hive, Spark 2.0, Scala, NiFi, HBase, Kafka, Knox, Atlas, Ranger, Kerberos, Atlassian Confluence, Confidential Bamboo, JIRA, BitBucket/Stash, Hortonworks Distribution in AWS

Confidential, Phoenix, AZ

Big Data Lead

Responsibilities:
  • Involved in technical discussions and responsible for architecture design
  • Mentored the team and provided technical solutions
  • Performance tuning of Spark applications, analysing various dependencies, storage levels, resource tuning and memory management
  • Created and processed RDDs and DataFrames using Spark SQL
  • Designed and developed shell scripts, Pig scripts, Hive scripts and MapReduce jobs
  • Wrote Hive queries and partitions to store the data in internal tables
  • Implemented partitioning, dynamic partitions and buckets in Hive
  • Wrote Unix shell/Pig scripts to pre-process the data stored in the Cornerstone Platform

Environment: Hadoop, HDFS, Pig, Hive, Spark, Scala, MapReduce, MapR Distribution

Confidential, Eden Prairie, MN

Senior Big Data Lead Consultant

Responsibilities:
  • Designed and developed logical data models for the Legacy & Cornerstone databases
  • Created Sqoop scripts for tables using Linux scripts
  • Created, deleted & executed Sqoop jobs in the Sqoop metastore
  • Designed HBase table row keys and mapped them to RDBMS table column names
  • Mapped HBase table columns to Hive external table columns
  • Performed historical and incremental imports of RDBMS data into HBase tables using the metastore
  • Validated Sqoop scripts, Hive scripts and HBase scripts
  • Created HBase tables and column families, altered column families, granted permissions on HBase tables, and defined region server space
  • Automated workflows through Oozie
  • Wrote transformations and actions in Scala to process complex data
  • Fixed bugs and provided production support for running processes.
  • Participated in SCRUM Daily stand-up, sprint planning, and Backlog grooming & Retrospective meetings.

Environment: Hadoop, HDFS, MapReduce, Pig, Hive, Oozie, HBase, Spark, Scala, Zookeeper, MapR Distribution

Confidential, Atlanta, GA

Senior Big Data Lead Consultant

Responsibilities:
  • Designed the technical architecture and developed various Big Data workflows using MapReduce, Hive, YARN, Kafka, Spark and Scala
  • Built reusable Hive UDF libraries for business requirements, which enabled business analysts to use these UDFs in Hive queries.
  • Used Flume to dump the application server logs into HDFS.
  • Analysed the logs stored on HDFS and imported the cleaned data into the Hive warehouse, which enabled business analysts to write Hive queries.
  • Worked with the Elasticsearch search engine to obtain real-time data analytics, integrated with Kibana dashboards.
  • Processed Kafka messages using Spark Streaming
  • Applied transformations on RDDs and DataFrames for filtering, mapping, joining and aggregation
  • Migrated data from RDBMS and processed events from Spark Streaming to Cassandra
  • Stored the streaming events in Parquet format
  • Participated in SCRUM Daily stand-up, sprint planning, and Backlog grooming & Retrospective meetings.

Environment: MapReduce, Pig, Hive, Flume, JDK 1.6, Linux, Kafka, Spark Streaming, Scala, Elasticsearch, YARN, Hue, HDFS, Git, Kibana, Linux Scripting

Confidential, Bloomington, IL

Senior Big Data Consultant

Responsibilities:
  • Wrote MapReduce jobs to process trip summaries, scheduled to run hourly, daily, weekly, monthly & quarterly.
  • Responsible for loading machine data from different sources into the Hadoop cluster using Flume
  • Used Flume to collect, aggregate, and store the log data from different web servers.
  • Ingested data into HBase and retrieved it using Java APIs
  • Used Spark SQL to extract data from different data sources and place the processed data into NoSQL
  • Used Spark to analyse machine-emitted & sensor data, extracting data sets with meaningful information such as location, driving speed, acceleration, braking speed, driving pattern and so on.
  • Used Git for version control to check files out and in.
  • Reviewed high-level design & code and mentored team members.
  • Participated in SCRUM Daily stand-up, sprint planning, and Backlog grooming & Retrospective meetings.

Environment: Hadoop, MapReduce, OpenStack, Flume-NG, HBase 0.98.2, Spark SQL, Scala, Kafka, HDFS, Zookeeper

Confidential

Big Data Engineer

Responsibilities:
  • Analysed the functional specification
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and semi-structured data.
  • Loaded data into external tables using Hive scripts
  • Performed aggregate joins and transformations using Hive queries
  • Implemented partitions, dynamic partitions and buckets in Hive
  • Optimized Hive SQL queries, improving job performance
  • Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of the customer and transaction data by date
  • Performed Hadoop cluster environment administration, including adding & removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting
  • Wrote unit test cases for Hive scripts

Environment: Java, Hadoop, HDFS, MapReduce, Pig, Hive, Flume, Zookeeper, Chef

Confidential

Senior Software Engineer

Responsibilities:
  • Analysed the client's functional requirements to design the technical specifications, develop the system and document the requirements
  • Responsible for developing class diagrams and sequence diagrams
  • Designed and implemented a separate middleware Java component on Fusion
  • Reviewed high-level design & code and mentored team members.
  • Participated in SCRUM Daily stand-up, sprint planning, and Backlog grooming & Retrospective meetings.

Environment: Java 1.6, Oracle Fusion Middleware, Eclipse, WebSphere, Spring Framework

Confidential

Senior Software Engineer

Responsibilities:
  • Analysed the client's functional requirements to design the technical specifications, develop the system and document the requirements.
  • Prepared LLD (class diagrams, sequence diagrams, activity diagrams) using the Enterprise Architect UML tool
  • Worked on Hibernate, Spring IOC, DAO, JSON Parsing
  • Prepared Unit test cases for the developed UI.
  • Responsible for problem tracking, diagnosis, replications, troubleshooting, and resolution of client problems.

Environment: Java, ACG proprietary framework using Dojo, Hibernate, Spring, DB2, RSA, Rational ClearCase, RPM, RQM, Mantis