Senior Hadoop/Spark Developer Resume
PROFESSIONAL SUMMARY:
- Hadoop Developer with 10+ years of experience in Information Technology and 5+ years in the Hadoop ecosystem.
- Experienced in working with clusters of 50 to 200 nodes.
- Expertise in Hadoop ecosystem components HDFS, MapReduce, Hive, Pig, Sqoop, HBase and Flume for data analytics.
- Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming, Flume and Apache Kafka.
- Expert-level knowledge of Python programming.
- Knowledge of R and machine learning algorithms.
- Knowledge of regression techniques (logistic and linear).
- Knowledge of Random Forest, KNN, K-Means and MBA algorithms.
- Worked on end-to-end machine learning implementations using R and Python.
- Proficient with the pandas, NumPy and scikit-learn packages.
- Good knowledge of bs4 (BeautifulSoup), geocoding and matplotlib.
- Knowledge of ggplot2 in R.
- Capable of processing large structured, semi-structured and unstructured data sets.
- Experience with job workflow scheduling and monitoring tools like Autosys, Oozie and ZooKeeper.
- Experience developing custom UDFs for Pig and Hive datasets using Python.
- Proficient in designing and querying NoSQL databases like HBase, MarkLogic and MongoDB.
- Knowledge of integrating ecosystem components such as HBase with Hive and HBase with Pig.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Experience streaming data using Apache Flume.
- Good knowledge of Apache Spark and Spark SQL using PySpark (a minimal sketch follows this list).
- Experience running Spark Streaming applications in cluster mode.
- Experienced in Spark log debugging.
- Skilled at migrating data from different databases to HDFS and Hive using Sqoop.
- Deep knowledge of the core concepts of the MapReduce framework and the Hadoop ecosystem.
- Analyzed large structured datasets using Hive's data warehousing infrastructure.
- Extensive knowledge of creating managed and external tables in Hive.
- Worked extensively on the design and development of business processes using Sqoop, Pig, Hive and HBase.
- Knowledge of the Spark framework for batch and real-time data processing.
- Knowledge of the Scala programming language.
- Good experience with Informatica BDE and BDM for designing ETL jobs for data processing.
- Excellent communication and interpersonal skills; detail-oriented, analytical, deadline-driven and a responsible team player with a high degree of self-motivation and the ability to learn quickly.
- Active member on Stack Overflow in discussions related to Hadoop, Hive, Spark, PySpark and Python.
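The following is a minimal PySpark / Spark SQL sketch of the kind of work referenced above; the database, table and column names (sales_db.transactions, txn_date, amount) are hypothetical placeholders, not taken from any specific project.

```python
from pyspark.sql import SparkSession

# Build a SparkSession with Hive support so Spark SQL can query Hive tables.
spark = (SparkSession.builder
         .appName("sales-aggregation-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Read an existing Hive table (hypothetical name) into a DataFrame.
transactions = spark.table("sales_db.transactions")

# Register it as a temporary view and aggregate with Spark SQL.
transactions.createOrReplaceTempView("transactions")
daily_totals = spark.sql("""
    SELECT txn_date, SUM(amount) AS total_amount
    FROM transactions
    GROUP BY txn_date
""")

# Write the result back to Hive as a managed table.
daily_totals.write.mode("overwrite").saveAsTable("sales_db.daily_totals")

spark.stop()
```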
TECHNICAL SKILLS:
Hadoop /Big Data Technologies: Hadoop 2.x, HDFS, HBase, Pig 0.14.0, Hive 1.2.4, Sqoop, Yarn, Flume 1.4.0, Zookeeper 3.4.6, Spark 2.1.0, Kafka 0.8.0 and Oozie 4.0.1, Hue
Hadoop Distribution: Cloudera, Hortonworks
Programming Languages: SQL, Pig Latin, HiveQL, Python, Scala, R
Databases/NoSQL Databases: SQL Server 9.0, MySQL 5.0, Oracle 10g, PostgreSQL 3.0 / MongoDB 3.2, HBase, Greenplum (Pivotal)
Database Tools: TOAD, Aginity
Operating Systems: Linux, Unix, Windows, CentOS
Other Concepts: OOP, Data Structures, Algorithms, Software Engineering, UML methodologies, ETL tools, Tableau, D3.js, SVN, TFS, PuTTY, WinSvc, ALM, PyCharm, Informatica, Informatica BDE and BDM
PROFESSIONAL EXPERIENCE:
Senior Hadoop/Spark Developer
Confidential
Responsibilities:
- Implemented and configured a High Availability Hadoop cluster.
- Hands-on experience with a 56-node cluster.
- Hands-on experience working with Hadoop ecosystem components like YARN, Hadoop MapReduce, HDFS, ZooKeeper, Oozie, Hive, Sqoop, Pig and Flume.
- Created a data-ingestion framework in Python called iEngine.
- iEngine handles structured and semi-structured files and tables and loads them into Hive, HBase, HDFS and Greenplum.
- Worked on database ingestion using Sqoop, which is integrated into the iEngine framework (a minimal sketch follows this section).
- Worked with Unix commands and shell scripting.
- Worked on Spark REST APIs like Cluster API and Workspace API.
- Experienced in working with RDDs and DStreams to perform transformations and actions on them.
- Worked on converting NCPDP, X12, 835 and Encounter (healthcare-specific) data to a structured format and loading it into Hive and HBase.
- Experience configuring ZooKeeper to coordinate the servers in the cluster and maintain data consistency; implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.
- Experience using Flume to stream data into HDFS from various sources.
- Used the Autosys engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
- Monitored services through ZooKeeper.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Worked on analyzing data with Hive and Pig.
- Deployed a network file system for NameNode metadata backup.
- Deployed Spark applications on YARN in cluster mode.
- Implemented Tableau Server configuration in development and production environments.
- Implemented the YARN Capacity Scheduler for long-running jobs in the YARN queue.
- Worked on Informatica BDE and BDM to design ETL jobs for data processing.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Designed the cluster so that only one Secondary NameNode daemon runs at any given time.
Environment: Hadoop, MapReduce, Oozie, Hive, Pig, Sqoop, HDFS, Cloudera, ZooKeeper, Metadata, Flume, YARN, Python, Tableau
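The sketch below shows, under stated assumptions, how a Sqoop ingestion step like the one described in this section might be wrapped in a Python framework such as iEngine; the JDBC URL, credentials, table and target names are hypothetical placeholders, and the wrapper simply shells out to the standard sqoop import command.

```python
import subprocess

def sqoop_import_to_hive(jdbc_url, username, password_file, table, hive_table, mappers=4):
    """Run a Sqoop import of one RDBMS table into a Hive table.

    All argument values are placeholders supplied by the caller; the flags
    used here (--connect, --table, --hive-import, etc.) are standard Sqoop
    import options.
    """
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,   # keep credentials off the command line
        "--table", table,
        "--hive-import",
        "--hive-table", hive_table,
        "--num-mappers", str(mappers),
    ]
    # Fail fast if Sqoop exits non-zero so the calling workflow can react.
    subprocess.run(cmd, check=True)

# Hypothetical usage:
# sqoop_import_to_hive(
#     jdbc_url="jdbc:db2://db2host:50000/SALES",
#     username="etl_user",
#     password_file="/user/etl_user/.sqoop.pwd",
#     table="CUSTOMER",
#     hive_table="staging.customer",
# )
```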
Confidential
Hadoop Developer
Responsibilities:
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Developed MapReduce programs to parse raw data and store the refined data in tables (a minimal Hadoop Streaming sketch follows this section).
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Involved in moving all log files generated from various sources (SAN servers) to HDFS for further processing through Flume.
- Worked on K-Means and KNN algorithms to categorize customer spending using Python and R (a scikit-learn sketch follows this section).
- Analyzed data from the home-grown Ad-Platform and created influence metrics from it.
- Developed algorithms for identifying influencers within specified social network channels.
- Developed and updated social media analytics dashboards on a regular basis using D3.js and Tableau.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop and Flume.
- Involved in data cleansing and data wrangling activities using Python pandas and NumPy.
- Created web-scraping scripts using the Python bs4 (BeautifulSoup) module.
- Analyzed data with Hive, Pig and Hadoop Streaming.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on the data.
- Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
- Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Involved in fetching brand data from social media applications like Facebook and Twitter.
- Performed data mining investigations to find new insights related to customers.
- Analyzed the customer feedback system using NLP and NLTK (Python).
- Involved in collecting data and identifying data patterns to build a trained model using logistic regression techniques.
- Created a complete processing engine based on the Hortonworks distribution, tuned for performance.
- Worked on call log patterns using MapReduce and Neo4j.
- Involved in identifying topics and trends and building context around the brand.
- Involved in identifying and analyzing defects, questionable function errors and inconsistencies in output.
Environment: Hadoop, Python, R, Oozie, Hive, Pig, Sqoop, MapReduce, HDFS, ZooKeeper, Metadata, Flume, YARN, Hortonworks and Machine Learning
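As a reference for the MapReduce / Hadoop Streaming work above, here is a minimal Python mapper/reducer sketch; the log layout (tab-separated fields with a status code in the third field) is a hypothetical example, not the actual data format.

```python
#!/usr/bin/env python
"""Minimal Hadoop Streaming sketch: count occurrences of a status code
assumed to be the third tab-separated field of each log line."""
import sys

def mapper():
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            # Emit: status_code <TAB> 1
            print("%s\t1" % fields[2])

def reducer():
    # Hadoop Streaming sorts mapper output by key before the reducer sees it.
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key, count))
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

if __name__ == "__main__":
    # Run as mapper by default; pass "reduce" to run as the reducer.
    reducer() if sys.argv[1:] == ["reduce"] else mapper()
```

A job of this shape would typically be submitted with the Hadoop Streaming jar, something like `hadoop jar hadoop-streaming.jar -files logcount.py -mapper "logcount.py" -reducer "logcount.py reduce" -input <in> -output <out>`; paths and names here are illustrative only.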
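The following is a minimal scikit-learn sketch of the K-Means customer-spending segmentation described above; the CSV path, column names and number of clusters are hypothetical placeholders.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical input: one row per customer with aggregate spending features.
customers = pd.read_csv("customer_spending.csv")        # placeholder path
features = customers[["monthly_spend", "txn_count"]]    # placeholder columns

# Scale the features so no single feature dominates the distance metric.
scaled = StandardScaler().fit_transform(features)

# Fit K-Means with a small, fixed number of segments for illustration.
kmeans = KMeans(n_clusters=4, random_state=42)
customers["segment"] = kmeans.fit_predict(scaled)

# Inspect average spend per segment.
print(customers.groupby("segment")["monthly_spend"].mean())
```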
Micro Focus COBOL and Python Developer
Confidential
Roles & Responsibilities:
- Involved in design and development using COBOL and Micro Focus COBOL.
- Worked on Autosys, Control-M and TWS scheduling.
- Worked with Python 2.4, 2.5 and 2.6 in Unix and Windows environments.
- Worked on Micro Focus COBOL in Unix and Windows environments and COBOL on the mainframe.
- Worked on XML formatting using Python and Micro Focus COBOL.
- Played a significant role in performance tuning and optimizing the memory consumption of the application.
- Worked on IBM DB2 and IMS DB databases.
- Worked on middleware systems like MQ.
- Developed advanced server-side classes using networking, I/O and multi-threading.
- Led the issue management team and brought significant stability to the product by reducing the bug count to single digits.
- Worked on XML and JSON parsing using Python (a minimal sketch follows this section).
Environment: Unix, DB2, Python, COBOL, Micro Focus COBOL, Autosys, Control-M, z/OS, Mainframe
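Below is a minimal sketch of the XML and JSON parsing mentioned above, using only the Python standard library; the payload, element names and keys are hypothetical examples.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML payload with a list of orders.
xml_payload = """
<orders>
    <order id="1001"><amount>250.00</amount></order>
    <order id="1002"><amount>75.50</amount></order>
</orders>
"""

# Parse the XML and convert each <order> element into a plain dict.
root = ET.fromstring(xml_payload)
orders = [
    {"id": order.get("id"), "amount": float(order.findtext("amount"))}
    for order in root.findall("order")
]

# Serialize the result as JSON, then read it back to show both directions.
as_json = json.dumps(orders, indent=2)
print(as_json)
print(json.loads(as_json)[0]["amount"])
```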