Senior Hadoop/Spark Developer Resume
PROFESSIONAL SUMMARY:
- Hadoop Developer with 10+ years of experience in Information Technology and 5+ years in the Hadoop ecosystem.
- Experienced in working with clusters of 50 to 200 nodes.
- Expertise in Hadoop ecosystem components HDFS, MapReduce, Hive, Pig, Sqoop, HBase and Flume for data analytics.
- Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming, Flume and Apache Kafka.
- Expert-level knowledge of Python programming.
- Knowledge of R and machine learning algorithms.
- Knowledge of regression techniques (logistic and linear).
- Knowledge of Random Forest, KNN, K-Means and MBA algorithms.
- Worked on end-to-end machine learning implementations using R and Python.
- Proficient with the pandas, NumPy and scikit-learn packages.
- Good knowledge of bs4 (BeautifulSoup), geocoding and matplotlib.
- Knowledge of ggplot2 in R.
- Capable of processing large structured, semi-structured and unstructured data sets.
- Experience with job workflow scheduling and monitoring tools like Autosys, Oozie and ZooKeeper.
- Experience developing custom UDFs for Pig and Hive datasets using Python.
- Proficient in designing and querying NoSQL databases like HBase, MarkLogic and MongoDB.
- Knowledge of integrating ecosystem components such as HBase with Hive and HBase with Pig.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Experience streaming data using Apache Flume.
- Good knowledge of Apache Spark and Spark SQL using PySpark (a minimal sketch follows this list).
- Experience running Spark Streaming applications in cluster mode.
- Experienced in Spark log debugging.
- Skilled at migrating data from different databases to HDFS and Hive using Sqoop.
- Deep knowledge of the core concepts of the MapReduce framework and the Hadoop ecosystem.
- Analyzed large structured datasets using Hive's data warehousing infrastructure.
- Extensive knowledge of creating managed and external tables in Hive.
- Worked extensively on the design and development of business processes using Sqoop, Pig, Hive and HBase.
- Knowledge of the Spark framework for batch and real-time data processing.
- Knowledge of the Scala programming language.
- Good experience with Informatica BDE and BDM for designing ETL jobs for data processing.
- Excellent communication and interpersonal skills; detail-oriented, analytical, deadline-driven and a responsible team player with a high degree of self-motivation and the ability to learn quickly.
- Active member on Stack Overflow in discussions related to Hadoop, Hive, Spark, PySpark and Python.
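The following is a minimal PySpark / Spark SQL sketch of the kind of work referenced above; the database, table and column names (sales_db.transactions, txn_date, amount) are hypothetical placeholders, not taken from any specific project.

```python
from pyspark.sql import SparkSession

# Build a SparkSession with Hive support so Spark SQL can query Hive tables.
spark = (SparkSession.builder
         .appName("sales-aggregation-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Read an existing Hive table (hypothetical name) into a DataFrame.
transactions = spark.table("sales_db.transactions")

# Register it as a temporary view and aggregate with Spark SQL.
transactions.createOrReplaceTempView("transactions")
daily_totals = spark.sql("""
    SELECT txn_date, SUM(amount) AS total_amount
    FROM transactions
    GROUP BY txn_date
""")

# Write the result back to Hive as a managed table.
daily_totals.write.mode("overwrite").saveAsTable("sales_db.daily_totals")

spark.stop()
```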
TECHNICAL SKILLS:
Hadoop /Big Data Technologies: Hadoop 2.x, HDFS, HBase, Pig 0.14.0, Hive 1.2.4, Sqoop, Yarn, Flume 1.4.0, Zookeeper 3.4.6, Spark 2.1.0, Kafka 0.8.0 and Oozie 4.0.1, Hue
Hadoop Distribution: Cloudera, Hortonworks
Programming Languages: SQL, Pig Latin, HiveQL, Python, Scala, R
Databases/NoSQL Databases: SQL Server 9.0, MySQL 5.0, Oracle 10g, PostgreSQL 3.0 / MongoDB 3.2, HBase, Greenplum (Pivotal)
Database Tools: TOAD, Aginity
Operating Systems: Linux, Unix, Windows, CentOS
Other Concepts: OOP, Data Structures, Algorithms, Software Engineering, UML methodologies, ETL tools, Tableau, D3.js, SVN, TFS, PuTTY, WinSvc, ALM, PyCharm, Informatica, Informatica BDE and BDM
PROFESSIONAL EXPERIENCE:
Senior Hadoop/Spark Developer
Confidential
Responsibilities:
- Implemented and configured a High Availability Hadoop cluster.
- Hands-on experience with a 56-node cluster.
- Hands-on experience working with Hadoop ecosystem components like YARN, Hadoop MapReduce, HDFS, ZooKeeper, Oozie, Hive, Sqoop, Pig and Flume.
- Created a data-ingestion framework in Python called iEngine.
- iEngine handles structured and semi-structured files and tables and loads them into Hive, HBase, HDFS and Greenplum.
- Worked on database ingestion using Sqoop, which is integrated into the iEngine framework (a minimal sketch follows this section).
- Worked with Unix commands and shell scripting.
- Worked on Spark REST APIs like Cluster API and Workspace API.
- Experienced in working with RDDs and DStreams to perform transformations and actions on them.
- Worked on converting NCPDP, X12, 835 and Encounter (healthcare-specific) data to a structured format and loading it into Hive and HBase.
- Experience configuring ZooKeeper to coordinate the servers in the cluster and maintain data consistency; implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.
- Experience using Flume to stream data into HDFS from various sources.
- Used the Autosys engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
- Monitored services through ZooKeeper.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Worked on analyzing data with Hive and Pig.
- Deployed a network file system for NameNode metadata backup.
- Deployed Spark applications on YARN in cluster mode.
- Implemented Tableau Server configuration in development and production environments.
- Implemented the YARN Capacity Scheduler for long-running jobs in the YARN queue.
- Worked on Informatica BDE and BDM to design ETL jobs for data processing.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Designed the cluster so that only one Secondary NameNode daemon runs at any given time.
Environment: Hadoop, MapReduce, Oozie, Hive, Pig, Sqoop, HDFS, Cloudera, ZooKeeper, Metadata, Flume, YARN, Python, Tableau
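The sketch below shows, under stated assumptions, how a Sqoop ingestion step like the one described in this section might be wrapped in a Python framework such as iEngine; the JDBC URL, credentials, table and target names are hypothetical placeholders, and the wrapper simply shells out to the standard sqoop import command.

```python
import subprocess

def sqoop_import_to_hive(jdbc_url, username, password_file, table, hive_table, mappers=4):
    """Run a Sqoop import of one RDBMS table into a Hive table.

    All argument values are placeholders supplied by the caller; the flags
    used here (--connect, --table, --hive-import, etc.) are standard Sqoop
    import options.
    """
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,   # keep credentials off the command line
        "--table", table,
        "--hive-import",
        "--hive-table", hive_table,
        "--num-mappers", str(mappers),
    ]
    # Fail fast if Sqoop exits non-zero so the calling workflow can react.
    subprocess.run(cmd, check=True)

# Hypothetical usage:
# sqoop_import_to_hive(
#     jdbc_url="jdbc:db2://db2host:50000/SALES",
#     username="etl_user",
#     password_file="/user/etl_user/.sqoop.pwd",
#     table="CUSTOMER",
#     hive_table="staging.customer",
# )
```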
Confidential
Hadoop Developer
Responsibilities:
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Developed MapReduce programs to parse raw data and store the refined data in tables (a minimal Hadoop Streaming sketch follows this section).
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Involved in moving all log files generated from various sources (SAN servers) to HDFS for further processing through Flume.
- Worked on K-Means and KNN algorithms to categorize customer spending using Python and R (a scikit-learn sketch follows this section).
- Analyzed data from the home-grown Ad-Platform and created influence metrics from it.
- Developed algorithms for identifying influencers within specified social network channels.
- Developed and updated social media analytics dashboards on a regular basis using D3.js and Tableau.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop and Flume.
- Involved in data cleansing and data wrangling activities using Python pandas and NumPy.
- Created web-scraping scripts using the Python bs4 (BeautifulSoup) module.
- Analyzed data with Hive, Pig and Hadoop Streaming.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on the data.
- Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
- Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Involved in fetching brand data from social media applications like Facebook and Twitter.
- Performed data mining investigations to find new insights related to customers.
- Analyzed the customer feedback system using NLP and NLTK (Python).
- Involved in collecting data and identifying data patterns to build a trained model using logistic regression techniques.
- Created a complete processing engine based on the Hortonworks distribution, tuned for performance.
- Worked on call log patterns using MapReduce and Neo4j.
- Involved in identifying topics and trends and building context around the brand.
- Involved in identifying and analyzing defects, questionable function errors and inconsistencies in output.
Environment: Hadoop, Python, R, Oozie, Hive, Pig, Sqoop, MapReduce, HDFS, ZooKeeper, Metadata, Flume, YARN, Hortonworks and Machine Learning
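As a reference for the MapReduce / Hadoop Streaming work above, here is a minimal Python mapper/reducer sketch; the log layout (tab-separated fields with a status code in the third field) is a hypothetical example, not the actual data format.

```python
#!/usr/bin/env python
"""Minimal Hadoop Streaming sketch: count occurrences of a status code
assumed to be the third tab-separated field of each log line."""
import sys

def mapper():
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            # Emit: status_code <TAB> 1
            print("%s\t1" % fields[2])

def reducer():
    # Hadoop Streaming sorts mapper output by key before the reducer sees it.
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key, count))
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

if __name__ == "__main__":
    # Run as mapper by default; pass "reduce" to run as the reducer.
    reducer() if sys.argv[1:] == ["reduce"] else mapper()
```

A job of this shape would typically be submitted with the Hadoop Streaming jar, something like `hadoop jar hadoop-streaming.jar -files logcount.py -mapper "logcount.py" -reducer "logcount.py reduce" -input <in> -output <out>`; paths and names here are illustrative only.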
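The following is a minimal scikit-learn sketch of the K-Means customer-spending segmentation described above; the CSV path, column names and number of clusters are hypothetical placeholders.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical input: one row per customer with aggregate spending features.
customers = pd.read_csv("customer_spending.csv")        # placeholder path
features = customers[["monthly_spend", "txn_count"]]    # placeholder columns

# Scale the features so no single feature dominates the distance metric.
scaled = StandardScaler().fit_transform(features)

# Fit K-Means with a small, fixed number of segments for illustration.
kmeans = KMeans(n_clusters=4, random_state=42)
customers["segment"] = kmeans.fit_predict(scaled)

# Inspect average spend per segment.
print(customers.groupby("segment")["monthly_spend"].mean())
```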
Micro Focus COBOL and Python Developer
Confidential
Roles & Responsibilities:
- Involved in design and development using COBOL and Micro Focus COBOL.
- Worked on Autosys, Control-M and TWS scheduling.
- Worked with Python 2.4, 2.5 and 2.6 in Unix and Windows environments.
- Worked on Micro Focus COBOL in Unix and Windows environments and COBOL on the mainframe.
- Worked on XML formatting using Python and Micro Focus COBOL.
- Played a significant role in performance tuning and optimizing the memory consumption of the application.
- Worked on IBM DB2 and IMS DB databases.
- Worked on middleware systems like MQ.
- Developed advanced server-side classes using networking, I/O and multi-threading.
- Led the issue management team and brought significant stability to the product by reducing the bug count to single digits.
- Worked on XML and JSON parsing using Python (a minimal sketch follows this section).
Environment: Unix, DB2, Python, COBOL, Micro Focus COBOL, Autosys, Control-M, z/OS, Mainframe
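Below is a minimal sketch of the XML and JSON parsing mentioned above, using only the Python standard library; the payload, element names and keys are hypothetical examples.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML payload with a list of orders.
xml_payload = """
<orders>
    <order id="1001"><amount>250.00</amount></order>
    <order id="1002"><amount>75.50</amount></order>
</orders>
"""

# Parse the XML and convert each <order> element into a plain dict.
root = ET.fromstring(xml_payload)
orders = [
    {"id": order.get("id"), "amount": float(order.findtext("amount"))}
    for order in root.findall("order")
]

# Serialize the result as JSON, then read it back to show both directions.
as_json = json.dumps(orders, indent=2)
print(as_json)
print(json.loads(as_json)[0]["amount"])
```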