
Senior Hadoop/Spark Developer Resume


Atlanta

PROFESSIONAL SUMMARY:

  • Hadoop Developer with 7+ years of experience in Information Technology, including 4+ years in the Hadoop ecosystem.
  • Experienced in working with clusters of 100 to 200 nodes.
  • Expertise in Hadoop ecosystem components (HDFS, MapReduce, Hive, Pig, Sqoop, HBase, and Flume) for data analytics.
  • Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
  • Capable of processing large sets of structured, semi-structured, and unstructured data.
  • Experience with job workflow scheduling and coordination tools like Oozie and ZooKeeper.
  • Expertise in writing MapReduce jobs in Java for processing large sets of structured, semi-structured, and unstructured data and storing the results in HDFS.
  • Experience in developing custom UDFs for datasets in Pig and Hive (a minimal Java sketch appears after this list).
  • Proficient in designing and querying NoSQL databases like HBase.
  • Knowledge of integrating ecosystem components, such as HBase with Hive and HBase with Pig.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Experience streaming data using Apache Flume.
  • Good knowledge of Apache Spark and Spark SQL.
  • Experience in running Spark Streaming applications in cluster mode.
  • Experienced in Spark log debugging.
  • Skilled in migrating data from different databases to Hadoop HDFS and Hive using Sqoop.
  • Deep knowledge of the core concepts of the MapReduce framework and the Hadoop ecosystem.
  • Analyzed large structured datasets using Hive's data warehousing infrastructure.
  • Extensive knowledge of creating managed and external tables in the Hive ecosystem.
  • Worked extensively on the design and development of business processes using Sqoop, Pig, Hive, and HBase.
  • Knowledge of the Spark framework for batch and real-time data processing.
  • Knowledge of the Scala programming language.
  • Good experience with Talend Open Studio for designing ETL jobs for data processing.
  • Strong in core Java, data structures, algorithm design, object-oriented design (OOD), and Java features such as the Collections Framework, exception handling, I/O, and multithreading.
  • Hands-on experience with MVC architecture and Java EE frameworks like Struts 2, Spring MVC, and Hibernate.
  • Good knowledge in Software Development Life Cycle (SDLC) and Software Testing Life Cycle (STLC).
  • Proficient in unit testing applications using JUnit and MRUnit, and in logging applications using Log4j.
  • Excellent communication and interpersonal skills; a detail-oriented, analytical, responsible team player with a high degree of self-motivation, the ability to coordinate in a team environment, and a quick-learning attitude.
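
A minimal sketch of the kind of custom Hive UDF referenced above, written in Java against the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and normalization logic are illustrative assumptions, not details from any specific project:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: normalizes free-text fields before aggregation.
public final class NormalizeText extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;  // pass NULL values through unchanged
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Packaged into a JAR, a UDF like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.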

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: Hadoop 2.x, HDFS, MapReduce v1, MapReduce v2, HBase, Pig 0.14.0, Hive 1.2.4, Sqoop, YARN, Flume 1.4.0, ZooKeeper 3.4.6, Spark 2.1.0, Kafka 0.8.0, Oozie 4.0.1, Hue, Impala, Whirr, Kerberos, RabbitMQ

Scripting/Programming Languages: SQL, Pig Latin, HiveQL, Python, Perl, Java, Scala, shell scripting, Log4j

Web Technologies: HTML, XML, JSON, JavaScript 1.2/1.1, Ajax, CSS, SOAP and WSDL

Databases/NoSQL Databases: SQL Server 9.0, MySQL 5.0, Oracle 10g, PostgreSQL, MongoDB 3.2, Cassandra, HBase

Database Tools: TOAD, Chordiant CRM tool, Billing tool, Oracle Warehouse Builder (OWB).

Operating Systems: Linux, Unix, Windows, Mac, CentOS

Other Concepts: OOPS, Data Structures, Algorithms, Software Engineering, UML methodologies, ETL tools, Tableau

PROFESSIONAL EXPERIENCE:

Confidential, Atlanta

Senior Hadoop/Spark Developer

Roles & Responsibilities:

  • Implemented and configured a High Availability Hadoop cluster.
  • Hands-on experience with a 100-node cluster.
  • Handled the installation, configuration, and capacity planning of a Hadoop cluster.
  • Worked with Kerberos and its interaction with Hadoop, Active Directory/LDAP, and Unix-based file systems.
  • Implemented the Kerberos security authentication protocol for the production cluster.
  • Hands-on experience with Hadoop ecosystem components like YARN, MapReduce, HDFS, ZooKeeper, Oozie, Hive, Sqoop, Pig, and Flume.
  • Worked with Unix commands and shell scripting.
  • Worked on Spark REST APIs like Cluster API and Workspace API.
  • Experienced in working with RDDs and DStreams, performing transformations and actions on them.
  • Implemented Sentry for role-based authorization on Hive and HBase.
  • Configured ZooKeeper to coordinate the servers in the cluster and maintain data consistency; implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.
  • Used Flume to stream data into HDFS from various sources.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Monitored services through ZooKeeper.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Worked on analyzing data with Hive and Pig.
  • Deployed a Network File System (NFS) mount for NameNode metadata backup.
  • Moved data from HDFS to a MySQL database, and vice versa, using Sqoop.
  • Deployed Spark applications on YARN in cluster mode (see the sketch after this list).
  • Set the log level across all the executors in Apache Spark.
  • Good experience working with Amazon AWS for setting up Hadoop clusters.
  • Experienced in using Python for task-level job execution on the cluster.
  • Configured Tableau Server in development and production environments.
  • Updated and validated Tableau services with new licenses and patches to sync data with Hadoop.
  • Implemented the YARN Capacity Scheduler for long-running jobs in the YARN queue.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Designed the cluster so that only one Secondary NameNode daemon runs at any given time.
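
A minimal sketch of a Spark driver of the kind deployed on YARN in cluster mode, with the log level adjusted from the driver; the application name, input path, and chosen level are illustrative assumptions, and executor-side logging would normally also be tuned via a log4j.properties file shipped with the job:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LogLevelDemo {
    public static void main(String[] args) {
        // Submitted to YARN in cluster mode, e.g.:
        //   spark-submit --master yarn --deploy-mode cluster --class LogLevelDemo app.jar
        SparkConf conf = new SparkConf().setAppName("log-level-demo");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Reduce logging noise for this context.
            sc.setLogLevel("WARN");

            JavaRDD<String> lines = sc.textFile("hdfs:///data/input/sample.txt"); // illustrative path
            long nonEmpty = lines.filter(line -> !line.trim().isEmpty()).count();
            System.out.println("non-empty lines: " + nonEmpty);
        }
    }
}
```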

Environment: Hadoop, MapReduce, Oozie, Hive, Pig, Sqoop, HDFS, Cloudera, ZooKeeper, Metadata, Flume, YARN, Python, Tableau, Kerberos, Chef.

Confidential, Jersey City

Hadoop Developer

Roles & Responsibilities:

  • Installed and configured Hadoop monitoring and administration tools: Nagios and Ganglia.
  • Involved in the full life cycle of the project, from design, analysis, and logical and physical architecture modeling through development, implementation, and testing.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
  • Developed MapReduce programs to parse the raw data and store the refined data in tables (a sketch of this pattern appears after this list).
  • Designed and modified database tables and used HBase queries to insert and fetch data from tables.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume. 
  • Developed algorithms for identifying influencers within specified social network channels.
  • Developed and updated social media analytics dashboards on a regular basis.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Analyzed data with Hive, Pig, and Hadoop Streaming.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on the data.
  • Created volumes and snapshots through the MapR Control System.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
  • Used Oozie operational services for batch processing and for scheduling workflows dynamically.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka. 
  • Experienced in working with Apache Storm. 
  • Experienced in working with Amazon Web Services (AWS) using EC2 for computing and S3 as storage mechanism.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting. 
  • Involved in fetching brand data from social media applications like Facebook and Twitter.
  • Performed data mining investigations to find new insights related to customers. 
  • Developed a domain-specific sentiment analysis system using supervised machine learning.
  • Involved in collecting data and identifying data patterns to build a trained model using machine learning.
  • Created a complete processing engine, based on Cloudera's distribution, tuned for performance.
  • Managed and reviewed Hadoop log files.
  • Involved in identifying topics and trends and building context around the brand.
  • Developed different formulas for calculating engagement on social media posts. 
  • Involved in identifying and analyzing defects, questionable functional errors, and inconsistencies in output.
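
A minimal sketch of the parse-then-aggregate MapReduce pattern referenced above; the tab delimiter, field position, and paths are illustrative assumptions, not details from the original project:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical job: parse raw delimited records and count them per channel.
public class ChannelCountJob {

    public static class ParseMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text channel = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length > 2) {            // skip malformed rows
                channel.set(fields[2]);         // assumed: channel name in the third column
                context.write(channel, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "channel-count");
        job.setJarByClass(ChannelCountJob.class);
        job.setMapperClass(ParseMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```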

Environment: Hadoop (quorum-based), MapR, Oozie, Hive, Pig, Sqoop, MapReduce, HDFS, Cloudera, ZooKeeper, Nagios, Ganglia, Metadata, Flume, YARN, Amazon Web Services, EC2, Hortonworks

Confidential, New Jersey

Hadoop/Spark Developer

Roles & Responsibilities:

  • Imported large data sets from DB2 into Hive tables using Sqoop.
  • Created Hive managed and external tables as per the requirements.
  • Designed and developed tables in HBase and stored aggregated data from Hive.
  • Developed Hive scripts for data aggregation and processing as per the use case.
  • Wrote custom Java UDFs for processing data in Hive.
  • Developed and maintained workflow scheduling jobs in Oozie for importing data from RDBMS to Hive.
  • Defined the Hive tables, managed or external as required, with appropriate static and dynamic partitions for efficiency.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Optimized Hive queries for performance tuning. 
  • Worked with the team on fetching live stream data from DB2 into an HBase table using Spark Streaming and Apache Kafka (see the sketch after this list).
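
A minimal sketch of how such a Spark Streaming job might move Kafka-published DB2 change records into HBase; the topic, ZooKeeper quorum, table, column family, and record format are illustrative assumptions, and the receiver-based Kafka 0.8 integration plus the HBase 1.x client API are assumed rather than taken from the original project:

```java
import java.util.Collections;
import java.util.Map;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

import scala.Tuple2;

public class KafkaToHBaseJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-to-hbase");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Receiver-based Kafka 0.8 integration: (ZooKeeper quorum, consumer group, topic -> receiver threads).
        Map<String, Integer> topics = Collections.singletonMap("db2.changes", 1);   // illustrative topic
        JavaPairReceiverInputDStream<String, String> stream =
                KafkaUtils.createStream(jssc, "zk1:2181", "hbase-loader", topics);

        stream.foreachRDD(rdd -> rdd.foreachPartition(messages -> {
            // One HBase connection per partition; table and column names are illustrative.
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("customer_events"))) {
                while (messages.hasNext()) {
                    Tuple2<String, String> msg = messages.next();
                    Put put = new Put(Bytes.toBytes(msg._1()));             // Kafka key as row key
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("raw"),
                                  Bytes.toBytes(msg._2()));                 // raw payload as one cell
                    table.put(put);
                }
            }
        }));

        jssc.start();
        jssc.awaitTermination();
    }
}
```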

Environment: Hadoop v2.6.0, HDFS, CDH 5.3.x, MapReduce, HBase, Sqoop, Core Java, Hive, Oozie, DB2, Spark Streaming, Apache Kafka

Confidential, Overland Park, KS

Hadoop Developer

Roles & Responsibilities:

  • Worked with technology and business groups for Hadoop migration strategy.
  • Experienced in migrating huge volumes of data from the EDW to the IDW environment.
  • Wrote MapReduce code to process and parse data from various sources, storing the parsed data in HBase and Hive using HBase-Hive integration.
  • Experienced in migrating data from file sources, mount sources, and RDBMS systems to Hadoop using Sqoop.
  • Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Wrote Pig scripts to transform the data into a structured format.
  • Implemented Pig scripts using skewed, replicated, and merge joins for performance improvements.
  • Designed an ETL data pipeline to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, and MySQL.
  • Used Pig as an ETL tool to perform transformations, event joins, and pre-aggregations before storing the data in HDFS.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig. 
  • Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS. 
  • Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
  • Worked on transforming data from HBase to Hive as bulk operations.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations (see the sketch after this list).
  • Used Spark for real-time and batch processing.
  • Followed Agile methodology for the entire project.
  • Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
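
A minimal sketch of what the MapReduce-to-Spark POC might look like, expressing a typical parse-and-aggregate-by-key job as RDD transformations in Java; the paths and field positions are illustrative assumptions:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class ChannelCountSparkPoc {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("channel-count-poc");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<String, Integer> counts = sc.textFile(args[0])
                    .map(line -> line.split("\t"))
                    .filter(fields -> fields.length > 2)              // drop malformed rows
                    .mapToPair(fields -> new Tuple2<>(fields[2], 1))  // channel -> 1
                    .reduceByKey(Integer::sum);                       // replaces the reducer step

            counts.saveAsTextFile(args[1]);
        }
    }
}
```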

Environment: Hadoop, MapReduce, Hive, Pig, HBase, Cassandra, Flume, Spark, Storm, RabbitMQ, ActiveMQ, Sqoop, AccuRev, ZooKeeper, Oozie, Autosys, shell scripting.

Confidential

Java Developer

Roles & Responsibilities:

  • As a Java developer, maintained J2EE applications implemented with the Struts 2 MVC framework, Spring, FreeMarker, and Hibernate.
  • Salesforce integration: developed web services for bi-directional communication between Salesforce and J2EE applications using SOAP/REST.
  • Maintained legacy web applications implemented with ColdFusion script, HTML5, and AJAX; analyzed user requirements and developed new features.
  • Created complex SQL scripts to pull data from a SQL Server database, built relations among business entities, and executed migrations with the Apex Data Loader.
  • Worked in an Agile software development process using sprint planning and daily scrums to manage tasks; as a senior engineer, coordinated code reviews and created and maintained Git/SVN branches during the software release cycle.

Environment: HTML5, AJAX, SOAP/REST, MVC Framework, Spring, Hibernate

Confidential

Java Developer

Roles & Responsibilities:

  • Involved in design and development using UML with Rational Rose.
  • Played a significant role in performance tuning and optimizing the memory consumption of the application.
  • Developed various enhancements and features using Java 5.0.
  • Developed advanced server-side classes using networking, I/O, and multithreading.
  • Led the issue management team and brought significant stability to the product by reducing the bug count to single digits.
  • Designed and developed various complex and advanced user interfaces using Swing.
  • Used SAX/DOM XML parsers for parsing XML files (see the sketch after this list).
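
A minimal sketch of the SAX-parsing approach referenced above: a streaming handler that counts elements of interest; the element name and the command-line file path are illustrative assumptions:

```java
import java.io.File;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts <order> elements while streaming through the document.
public class OrderElementCounter extends DefaultHandler {
    private int orderCount;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("order".equals(qName)) {   // assumed element name
            orderCount++;
        }
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        OrderElementCounter handler = new OrderElementCounter();
        parser.parse(new File(args[0]), handler);
        System.out.println("orders found: " + handler.orderCount);
    }
}
```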

Environment: HTML, Java 5.0, JFC Swing, multithreading, I/O, networking, XML, JBuilder, UML, CVS, WinCVS, Ant, JUnit, Windows XP, Unix.
