
Senior Hadoop/Spark Developer Resume


Atlanta

SUMMARY:

  • Experience in Information Technology, including 4+ years in the Hadoop ecosystem.
  • Experienced in working with clusters of 100 to 200 nodes.
  • Expertise in Hadoop ecosystem components HDFS, MapReduce, Hive, Pig, Sqoop, HBase, and Flume for data analytics.
  • Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
  • Capable of processing large sets of structured, semi-structured, and unstructured data.
  • Experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
  • Expertise in writing MapReduce jobs in Java for processing large sets of structured, semi-structured, and unstructured data and storing the results in HDFS.
  • Experience in developing custom UDFs for Pig and Hive.
  • Proficient in designing and querying NoSQL databases such as HBase.
  • Knowledge of integrating ecosystem components, such as HBase with Hive and HBase with Pig.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Experience streaming data using Apache Flume.
  • Good knowledge of Apache Spark and Spark SQL (a brief sketch follows this list).
  • Experience running Spark Streaming applications in cluster mode.
  • Experienced in debugging Spark applications through their logs.
  • Skilled in migrating data from different databases to HDFS and Hive using Sqoop.
  • Deep knowledge of the core concepts of the MapReduce framework and the Hadoop ecosystem.
  • Analyzed large structured datasets using Hive's data warehousing infrastructure.
  • Extensive knowledge of creating managed and external tables in Hive.
  • Worked extensively on the design and development of business processes using Sqoop, Pig, Hive, and HBase.
  • Knowledge of the Spark framework for batch and real-time data processing.
  • Knowledge of the Scala programming language.
  • Good experience with Talend Open Studio for designing ETL jobs for data processing.
  • Strong in core Java, data structures, algorithm design, object-oriented design (OOD), and Java features such as the Collections Framework, exception handling, I/O, and multithreading.
  • Hands-on experience with MVC architecture and Java EE frameworks such as Struts 2, Spring MVC, and Hibernate.
  • Good knowledge of the Software Development Life Cycle (SDLC) and Software Testing Life Cycle (STLC).
  • Proficient in unit testing applications with JUnit and MRUnit and in application logging with Log4j.
  • Excellent communication and interpersonal skills; a detail-oriented, analytical, responsible team player with a high degree of self-motivation, a quick learner, and able to coordinate well in a team environment.
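
A minimal Spark SQL sketch of the kind of Hive-backed analysis referenced above; the database, table, and column names are illustrative placeholders, not details taken from any actual engagement.

import org.apache.spark.sql.SparkSession

// Minimal Spark SQL sketch against a Hive table; all names are illustrative only
object HiveQueryExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-analytics-sketch")
      .enableHiveSupport()               // lets Spark SQL read tables registered in the Hive metastore
      .getOrCreate()

    // Aggregate a partitioned Hive table with plain SQL
    val dailyCounts = spark.sql(
      """SELECT event_date, COUNT(*) AS events
        |FROM analytics.click_events
        |WHERE event_date >= '2017-01-01'
        |GROUP BY event_date""".stripMargin)

    dailyCounts.show(20)
    spark.stop()
  }
}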

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: Hadoop 2.x, HDFS, MapReduce v1, MapReduce v2, HBase, Pig 0.14.0, Hive 1.2.4, Sqoop, YARN, Flume 1.4.0, ZooKeeper 3.4.6, Spark 2.1.0, Kafka 0.8.0, Oozie 4.0.1, Hue, Impala, Whirr, Kerberos, RabbitMQ

Shell Scripting/Programming Languages: SQL, Pig Latin, HiveQL, Python, Perl, Java, Scala, Log4j

Web Technologies: HTML, XML, JSON, JavaScript 1.2/1.1, Ajax, CSS, SOAP and WSDL

Databases/NoSQL Databases: SQL Server 9.0, MySQL 5.0, Oracle 10g, PostgreSQL 3.0 / MongoDB 3.2, Cassandra, HBase

Database Tools: TOAD, Chordiant CRM tool, Billing tool, Oracle Warehouse Builder (OWB).

Operating Systems: Linux, Unix, Windows, Mac, CentOS

Other Concepts: OOP, Data Structures, Algorithms, Software Engineering, UML Methodologies, ETL Tools, Tableau

PROFESSIONAL EXPERIENCE:

Confidential, Atlanta

Senior Hadoop/Spark Developer

Responsibilities:

  • Implemented and configured a High Availability Hadoop cluster.
  • Hands-on experience with a 100-node cluster.
  • Handled installation, configuration, and capacity planning of the Hadoop cluster.
  • Worked with Kerberos and its interaction with Hadoop and LDAP.
  • Worked with Kerberos, Active Directory/LDAP, and Unix-based file systems.
  • Implemented the Kerberos security authentication protocol for the production cluster.
  • Hands-on experience with Hadoop ecosystem components such as YARN, MapReduce, HDFS, ZooKeeper, Oozie, Hive, Sqoop, Pig, and Flume.
  • Worked with Unix commands and shell scripting.
  • Worked with Spark REST APIs such as the Cluster API and Workspace API.
  • Experienced in working with RDDs and DStreams to perform transformations and actions on them (see the sketch after this section).
  • Implemented Sentry for role-based authorization on Hive and HBase.
  • Experience configuring ZooKeeper to coordinate the servers in the cluster and maintain data consistency; implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.
  • Experience using Flume to stream data into HDFS from various sources.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Monitored services through ZooKeeper.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Worked on analyzing data with Hive and Pig.
  • Deployed a network file system (NFS) mount for NameNode metadata backup.
  • Moved data between HDFS and a MySQL database in both directions using Sqoop.
  • Deployed Spark applications on YARN in cluster mode.
  • Worked on setting the log level across all executors in Apache Spark.
  • Good experience working with Amazon AWS to set up Hadoop clusters.
  • Experienced in using Python for task-level job execution on the cluster.
  • Implemented Tableau Server configuration in development and production environments.
  • Updated and validated Tableau services with new licenses and patches to sync data with Hadoop.
  • Implemented the YARN Capacity Scheduler for long-running jobs in the YARN queue.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Designed the cluster so that only one Secondary NameNode daemon runs at any given time.

Environment: Hadoop, MapReduce, Oozie, Hive, Pig, Sqoop, HDFS, Cloudera, ZooKeeper, Metadata, Flume, YARN, Python, Tableau, Kerberos, Chef
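
A minimal sketch of the RDD transformation/action pattern referenced in the responsibilities above, assuming a simple access-log layout; the input path, field positions, and application name are illustrative placeholders.

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: count HTTP status codes from access logs using RDD transformations and actions
object AccessLogCounts {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rdd-transformations-sketch")
    val sc   = new SparkContext(conf)
    sc.setLogLevel("WARN")   // driver-side log level; executor logging is typically tuned via log4j.properties

    val lines = sc.textFile("hdfs:///data/access_logs/")    // transformation: lazy read of the input
    val statusCounts = lines
      .map(_.split(" "))                                    // transformation: split each log line into fields
      .filter(_.length > 8)                                 // transformation: drop malformed lines
      .map(fields => (fields(8), 1L))                       // (HTTP status code, 1)
      .reduceByKey(_ + _)                                   // transformation: sum counts per status

    statusCounts.collect().foreach(println)                 // action: triggers the actual job
    sc.stop()
  }
}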

Confidential, Jersey City

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop monitoring and administration tools: Nagios and Ganglia.
  • Involved in the full project life cycle, from design, analysis, and logical and physical architecture modeling through development, implementation, and testing.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
  • Developed MapReduce programs to parse the raw data and store the refined data in tables.
  • Designed and modified database tables and used HBase queries to insert and fetch data from tables.
  • Involved in moving log files generated by various sources into HDFS for further processing through Flume.
  • Developed algorithms for identifying influencers within specified social network channels.
  • Developed and updated social media analytics dashboards on a regular basis.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Analyzed data with Hive, Pig, and Hadoop Streaming.
  • Responsible for analyzing and cleansing raw data by running Hive queries and Pig scripts.
  • Created volumes and snapshots through the MapR Control System.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Created Hive tables, loaded data, and wrote Hive queries that execute internally as MapReduce jobs.
  • Used Oozie operational services for batch processing and dynamic workflow scheduling.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka (a producer sketch follows this section).
  • Experienced in working with Apache Storm.
  • Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Involved in fetching brand data from social media applications such as Facebook and Twitter.
  • Performed data mining investigations to find new insights related to customers.
  • Developed a domain-specific sentiment analysis system using supervised machine learning.
  • Involved in collecting data and identifying data patterns to build a trained machine learning model.
  • Created a complete processing engine, based on Cloudera's distribution, tuned for performance.
  • Managed and reviewed Hadoop log files.
  • Involved in identifying topics and trends and building context around the brand.
  • Developed formulas for calculating engagement on social media posts.
  • Involved in identifying and analyzing defects, questionable functional errors, and inconsistencies in output.

Environment: Hadoop (quorum-based HA), MapR, Oozie, Hive, Pig, Sqoop, MapReduce, HDFS, Cloudera, ZooKeeper, Nagios, Ganglia, Metadata, Flume, YARN, Amazon Web Services, EC2, Hortonworks
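
A minimal sketch of publishing social-media events to Kafka for downstream consumers that land the data in HDFS and Cassandra, as referenced above. The broker list, topic name, and sample payloads are illustrative placeholders, and the sketch assumes the standard Kafka Java producer client used from Scala rather than any project-specific client.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Sketch: publish social-media events to a Kafka topic; names and payloads are illustrative
object SocialEventPublisher {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092,broker2:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)

    // Key each record by channel so posts from one channel stay in the same partition
    val events = Seq(
      ("twitter",  """{"brand":"acme","text":"great product","ts":1489000000}"""),
      ("facebook", """{"brand":"acme","text":"not impressed","ts":1489000060}""")
    )
    events.foreach { case (channel, json) =>
      producer.send(new ProducerRecord[String, String]("social_events", channel, json))
    }

    producer.flush()
    producer.close()
  }
}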

Confidential, New Jersey

Hadoop/Spark Developer

Responsibilities:

  • Imported large data sets from DB2 into Hive tables using Sqoop.
  • Created Hive managed and external tables as per the requirements.
  • Designed and developed tables in HBase for storing aggregated data from Hive.
  • Developed Hive scripts for data aggregation and processing as per the use case.
  • Wrote custom Java UDFs for processing data in Hive.
  • Developed and maintained workflow scheduling jobs in Oozie for importing data from RDBMS into Hive.
  • Defined the Hive tables, managed or external as required, with appropriate static and dynamic partitions for efficiency.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Optimized Hive queries for performance.
  • Worked with the team on fetching live stream data from DB2 into an HBase table using Spark Streaming and Apache Kafka (a sketch follows the Environment line below).

Environment: Hadoop 2.6.0, HDFS, CDH 5.3.x, MapReduce, HBase, Sqoop, Core Java, Hive, Oozie, DB2, Spark Streaming, Apache Kafka
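
A rough sketch of the live-stream path described above: DB2 change records arriving on a Kafka topic, consumed with Spark Streaming, and written to an HBase table. The broker list, topic, table, and column-family names are illustrative placeholders, and the Kafka 0.8-style direct stream plus the HBase 1.x client API are assumed; the actual project code is not reproduced here.

import kafka.serializer.StringDecoder
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Sketch: consume DB2 change records from Kafka and write them to an HBase table
object Db2StreamToHBase {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("db2-stream-to-hbase")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Direct Kafka stream (Kafka 0.8-style API); broker list and topic are placeholders
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("db2_changes"))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One HBase connection per partition to avoid per-record connection overhead
        val hbaseConf  = HBaseConfiguration.create()
        val connection = ConnectionFactory.createConnection(hbaseConf)
        val table      = connection.getTable(TableName.valueOf("db2_mirror"))
        records.foreach { case (key, value) =>
          // Assumes the DB2 record key is carried as the Kafka message key
          val put = new Put(Bytes.toBytes(key))
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), Bytes.toBytes(value))
          table.put(put)
        }
        table.close()
        connection.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}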
