
Senior Hadoop/Spark Developer Resume


PROFESSIONAL SUMMARY:

  • Hadoop Developer with 10+ years of experience in Information Technology, including 5+ years in the Hadoop ecosystem.
  • Experienced in working with clusters of 50 to 200 nodes.
  • Expertise in Hadoop ecosystem components HDFS, MapReduce, Hive, Pig, Sqoop, HBase, and Flume for data analytics.
  • Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming, Flume, and Apache Kafka.
  • Expert-level knowledge of Python programming.
  • Knowledge of the R language and machine learning algorithms.
  • Knowledge of regression techniques (logistic and linear).
  • Knowledge of Random Forest, KNN, K-Means, and MBA (Market Basket Analysis) algorithms.
  • Worked on end-to-end machine learning implementations using R and Python (see the sketch after this list).
  • Proficient knowledge of the pandas, NumPy, and scikit-learn packages.
  • Good knowledge of bs4 (BeautifulSoup), geocoding, and matplotlib.
  • Knowledge of ggplot2 in R.
  • Capable of processing large sets of structured, semi-structured, and unstructured data.
  • Experience with job workflow scheduling and monitoring tools like Autosys, Oozie, and ZooKeeper.
  • Experience in developing custom UDFs for datasets in Pig and Hive using Python.
  • Proficient in designing and querying NoSQL databases like HBase, MarkLogic, and MongoDB.
  • Knowledge of integrating different ecosystem components, such as HBase with Hive and HBase with Pig.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Experience streaming data using Apache Flume.
  • Good knowledge of Apache Spark and Spark SQL using PySpark.
  • Experience running Spark Streaming applications in cluster mode.
  • Experienced in debugging Spark logs.
  • Skilled in migrating data from different databases to Hadoop HDFS and Hive using Sqoop.
  • Deep knowledge of the core concepts of the MapReduce framework and the Hadoop ecosystem.
  • Analyzed large structured datasets using Hive's data warehousing infrastructure.
  • Extensive knowledge of creating managed and external tables in the Hive ecosystem.
  • Worked extensively on the design and development of business processes using Sqoop, Pig, Hive, and HBase.
  • Knowledge of the Spark framework for batch and real-time data processing.
  • Knowledge of the Scala programming language.
  • Good experience with Informatica BDE and BDM for designing ETL jobs for data processing.
  • Excellent communication and interpersonal skills; a detail-oriented, analytical, time-bound, and responsible team player with a high degree of self-motivation, the ability to coordinate in a team environment, and a capacity to learn quickly.
  • Active member on Stack Overflow in discussions related to Hadoop, Hive, Spark, PySpark, and Python.
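As an illustration of the end-to-end workflow mentioned above, here is a minimal, hypothetical sketch using pandas and scikit-learn; the file path, feature columns, and label are placeholders, not data from any actual engagement.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical input: a CSV of labeled customer records.
df = pd.read_csv("customers.csv")            # placeholder path
X = df[["age", "income", "visits"]]          # placeholder feature columns
y = df["churned"]                            # placeholder binary label

# Hold out a test split, fit a logistic regression, and evaluate it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```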

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: Hadoop 2.x, HDFS, HBase, Pig 0.14.0, Hive 1.2.4, Sqoop, YARN, Flume 1.4.0, ZooKeeper 3.4.6, Spark 2.1.0, Kafka 0.8.0, Oozie 4.0.1, Hue

Hadoop Distribution: Cloudera, Hortonworks

Programming Languages: SQL, Pig Latin, HiveQL, Python, Scala, R

Databases/NoSQL Databases: SQL Server 9.0, MySQL 5.0, Oracle 10g, PostgreSQL 3.0 / MongoDB 3.2, HBase, Greenplum (Pivotal)

Database Tools: TOAD, Aginity

Operating Systems: Linux, Unix, Windows, CentOS

Other Concepts: OOP, Data Structures, Algorithms, Software Engineering, UML methodologies, ETL tools, Tableau, D3.js, SVN, TFS, PuTTY, WinSvc, ALM, PyCharm, Informatica, Informatica BDE and BDM

PROFESSIONAL EXPERIENCE:

Senior Hadoop/Spark Developer

Confidential

Responsibilities:

  • Implemented and configured a High Availability Hadoop cluster.
  • Hands-on experience with a 56-node cluster.
  • Hands-on experience working with Hadoop ecosystem components like YARN, Hadoop MapReduce, HDFS, ZooKeeper, Oozie, Hive, Sqoop, Pig, and Flume.
  • Created a data-ingestion framework in Python called iEngine.
  • iEngine handles structured and semi-structured files and tables and loads them into Hive, HBase, HDFS, and Greenplum.
  • Worked on database ingestion using Sqoop, which is integrated into the iEngine framework (a hypothetical sketch of this pattern follows the Environment line below).
  • Worked with Unix commands and shell scripting.
  • Worked on Spark REST APIs like the Cluster API and Workspace API.
  • Experienced in working with RDDs and DStreams and performing transformations and actions on them (see the sketch after this list).
  • Worked on converting NCPDP, X12, 835, and Encounter (healthcare-specific) data to structured formats and loading it into Hive and HBase.
  • Experience configuring ZooKeeper to coordinate the servers in clusters and maintain data consistency; implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.
  • Experience using Flume to stream data into HDFS from various sources.
  • Used the Autosys engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Monitored services through ZooKeeper.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Worked on analyzing data with Hive and Pig.
  • Deployed a network file system mount for NameNode metadata backup.
  • Deployed Spark applications on YARN in cluster mode.
  • Configured Tableau Server in development and production environments.
  • Implemented the YARN Capacity Scheduler for long-running jobs in the YARN queue.
  • Worked on Informatica BDE and BDM for designing ETL jobs for data processing.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Designed the cluster so that only one Secondary NameNode daemon could run at any given time.
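The streaming work in this role is proprietary, so the following is only a minimal sketch of consuming a Kafka topic as a DStream and applying a transformation and an action, assuming Spark 2.1 with the spark-streaming-kafka-0-8 integration. The application name, broker address, and topic are hypothetical placeholders.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # spark-streaming-kafka-0-8 package

sc = SparkContext(appName="StreamIngestSketch")  # hypothetical app name
ssc = StreamingContext(sc, batchDuration=10)     # 10-second micro-batches

# Consume a Kafka topic as a DStream of (key, value) string pairs.
stream = KafkaUtils.createDirectStream(
    ssc,
    ["events"],                                  # placeholder topic
    {"metadata.broker.list": "broker1:9092"})    # placeholder broker

# Transformation: split each record's value into fields.
parsed = stream.map(lambda kv: kv[1].split(","))

# Action: print a sample of each micro-batch (a real job would write
# to HBase or HDFS here instead).
parsed.pprint()

ssc.start()
ssc.awaitTermination()
```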

Environment: Hadoop, MapReduce, Oozie, Hive, Pig, Sqoop, HDFS, Cloudera, ZooKeeper, Metadata, Flume, YARN, Python, Tableau
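iEngine itself is an internal framework, so no real interface is shown here; this is only a hypothetical sketch of how a Python ingestion framework might shell out to Sqoop for the database-ingestion step. The JDBC URL and table names are placeholders, and credential flags are omitted.

```python
import subprocess

def sqoop_import(jdbc_url, table, hive_table, mappers=4):
    """Hypothetical wrapper: import an RDBMS table into Hive via the Sqoop CLI."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,          # JDBC URL for the source database
        "--table", table,               # source table to import
        "--hive-import",                # load the result into Hive
        "--hive-table", hive_table,     # destination Hive table
        "--num-mappers", str(mappers),  # parallelism of the import
    ]
    subprocess.check_call(cmd)

# Placeholder connection details; real usage would also pass credentials
# (e.g. --username with --password-file).
sqoop_import("jdbc:mysql://dbhost/sales", "orders", "staging.orders")
```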

Confidential

Hadoop Developer

Responsibilities:

  • Responsible for managing data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
  • Developed MapReduce programs to parse raw data and store the refined data in tables.
  • Designed and modified database tables and used HBase queries to insert and fetch data from tables.
  • Involved in moving all log files generated from various sources (SAN servers) to HDFS for further processing through Flume.
  • Worked on K-Means and KNN algorithms to categorize customer spending using Python and R (see the sketch after this list).
  • Analyzed data from the home-grown Ad-Platform and built insights from it.
  • Developed algorithms for identifying influencers within specified social network channels.
  • Developed and updated social media analytics dashboards on a regular basis using D3.js and Tableau.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop and Flume.
  • Involved in data cleansing and data wrangling activities using Python pandas and NumPy.
  • Created web-scraping scripts using the Python bs4 module.
  • Analyzed data with Hive, Pig, and Hadoop Streaming.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on the data.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Involved in fetching brand data from social media applications like Facebook and Twitter.
  • Performed data mining investigations to find new insights related to customers.
  • Analyzed the customer feedback system using NLP and NLTK (Python).
  • Involved in collecting data and identifying data patterns to build a trained model using logistic regression techniques.
  • Created a complete processing engine, based on the Hortonworks distribution, tuned for performance.
  • Worked on call log patterns using MapReduce and Neo4j.
  • Involved in identifying topics and trends and building context around the brand.
  • Involved in identifying and analyzing defects, questionable function errors, and inconsistencies in output.
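A minimal sketch of the customer-spend categorization described in this role, using pandas for cleansing and scikit-learn's KMeans. The file path, column names, and cluster count are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical input: a raw transaction export that may contain nulls.
df = pd.read_csv("transactions.csv")                 # placeholder path
df = df.dropna(subset=["monthly_spend", "visits"])   # simple cleansing step

# Cluster customers into spend segments on two numeric features.
features = df[["monthly_spend", "visits"]].values
km = KMeans(n_clusters=4, random_state=42).fit(features)
df["segment"] = km.labels_

# Inspect the average spend per segment.
print(df.groupby("segment")["monthly_spend"].mean())
```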

Environment: Hadoop, Python, R, Oozie, Hive, Pig, Sqoop, MapReduce, HDFS, ZooKeeper, Metadata, Flume, YARN, Hortonworks, and Machine Learning.

MicroFocus COBOL and Python Developer

Confidential

Roles & Responsibilities:

  • Involved in design and development using COBOL and MicroFocus COBOL.
  • Worked on Autosys, Control-M, and TWS scheduling.
  • Worked with Python 2.4, 2.5, and 2.6 in Unix and Windows environments.
  • Worked on MicroFocus COBOL in Unix and Windows environments and COBOL on the mainframe.
  • Worked on XML formatting using Python and MicroFocus COBOL.
  • Played a significant role in performance tuning and in optimizing the memory consumption of the application.
  • Worked on IBM DB2 and IMS DB databases.
  • Worked on middleware systems like MQ.
  • Developed advanced server-side classes using networking, IO, and multi-threading.
  • Led the issue management team and brought significant stability to the product by reducing the bug count to single digits.
  • Worked on XML and JSON parsing using Python (see the sketch below).
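A minimal sketch of XML and JSON parsing with the Python standard library, in the spirit of the bullet above; the payload shape and field names are placeholders.

```python
import json
import xml.etree.ElementTree as ET

# Parse a hypothetical XML payload and extract record fields.
xml_payload = "<records><record id='1'><amount>42.50</amount></record></records>"
root = ET.fromstring(xml_payload)
records = [
    {"id": rec.get("id"), "amount": float(rec.findtext("amount"))}
    for rec in root.iter("record")
]

# Serialize the extracted records as JSON and round-trip them back.
as_json = json.dumps(records)
assert json.loads(as_json) == records
print(as_json)
```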

Environment: Unix, DB2, Python, COBOL, MicroFocus COBOL, Autosys, Control-M, z/OS, Mainframe
