Sr. Hadoop Developer Resume
Hudson, Ohio
PROFESSIONAL SUMMARY:
- 8 years of experience in developing, implementing, and configuring Hadoop ecosystem components and in building various web applications using Java and J2EE.
- Good experience in Big Data analytics using Apache Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, YARN, Mesos, Spark, Scala, Oozie, Kafka, and Flume.
- Work experience with the Hortonworks and Cloudera Hadoop distributions.
- Excellent understanding of the Hadoop Distributed File System and experienced in developing efficient MapReduce jobs to process large datasets.
- Good working knowledge in using Sqoop and Flume for data ingestion into HDFS.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Implemented Talend jobs to load data from different sources and integrated them with Kafka.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.
- Very good at loading data into Spark schema RDDs and querying them using Spark SQL.
- Good at writing custom RDDs in Scala and implementing design patterns to improve performance.
- Experienced in using Apache Hue and Ambari to manage and monitor Hadoop clusters.
- Experience in analyzing large amounts of data using Pig and Hive scripts.
- Mastery in writing custom UDFs in Java to extend Pig and Hive functionality (see the sketch after this list).
- Sound knowledge in using Apache Solr to search against structured and unstructured data.
- Worked with the Azkaban and Oozie workflow schedulers to run recurring Hadoop jobs.
- Experience in writing ad-hoc queries to move data from HDFS to Hive and analyzing the data using HiveQL.
- Experience in implementing Kerberos authentication protocol in Hadoop for data security.
- Experience in creating dashboards and generating reports using QlikSense.
- Experience in using SequenceFile, ORC, Parquet, and Avro file formats and compression techniques like LZO.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Worked on NoSQL databases like HBase, Cassandra and MongoDB to store the processed data.
- Good knowledge of cloud integration with Amazon Elastic MapReduce (EMR), Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), and Microsoft Azure.
- Hands on experience on UNIX environment and shell scripting.
- Experienced in using version control system GIT, build tool Maven and integration tool Jenkins.
- Expertise in development of web applications using J2EE technologies such as Servlets, JSP, Web Services, Spring, Hibernate, HTML, jQuery, and Ajax.
- Implemented design patterns to improve quality and performance of the applications.
- Used core Java concepts such as the Collections framework, algorithms, data structures, and multithreading.
- Used JUnit to test the functionality of Java methods and Python for automation.
- Good experience in using Relational databases Oracle, SQL Server, and PostgreSQL.
- Experience in developing the J2EE applications using technologies like Java, JDBC and Servlets.
- Worked with Waterfall and Agile (Scrum/sprint-based) software development frameworks for managing product development.
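
The Hive/Pig UDF experience above can be illustrated with a minimal sketch in Java; the class name and the normalization it performs are hypothetical examples, not code from a specific project.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Minimal Hive UDF sketch: trims and lower-cases a free-text column.
// Class name and behavior are hypothetical.
public final class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                      // pass NULLs through unchanged
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Packaged into a JAR, a UDF like this would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then invoked from HiveQL like any built-in function; a Pig UDF would extend EvalFunc instead.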
AREAS OF EXPERTISE:
Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, ZooKeeper, Oozie, Kafka, Storm, Talend, Spark, NiFi, Mesos, Avro, and Crunch.
Programming languages: Java, Python, Scala.
NoSQL Databases: HBase, Cassandra, MongoDB.
Databases: Oracle, SQL Server, PostgreSQL.
Web Technologies: HTML, jQuery, Ajax, CSS, JavaScript, JSON, XML.
Business Intelligence Tools: QlikSense, Jasper Reports.
Testing: Hadoop Testing, Hive Testing, MRUnit.
Operating Systems: Linux Red Hat/Ubuntu/CentOS, Windows 10/8.1/7/XP.
Hadoop Distributions: Cloudera Enterprise, Hortonworks.
Technologies and Tools: Servlets, JSP, Spring (Boot, MVC, Batch, Security), Web Services, Hibernate, Maven, GitHub.
Application Servers: Tomcat, JBoss.
IDEs: Eclipse, NetBeans, IntelliJ.
WORK EXPERIENCE:
Confidential, Hudson, Ohio
Sr. Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop cluster environment with Hortonworks distribution.
- Worked with Kafka and REST APIs to collect and load data into HDFS; also used Sqoop to load data from relational databases.
- Implemented Talend jobs to load data from Excel sheets.
- Used Spark Streaming APIs to perform the necessary transformations and actions on data received from Kafka and persisted the results into Cassandra (see the sketch after this section).
- Developed Spark scripts by writing custom RDDs in Scala and Python for data transformations and actions on RDDs.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala and Python.
- Worked on reading multiple data formats on HDFS using Scala.
- Worked with Python to develop analytical jobs using Spark's lightweight PySpark API.
- Worked with Avro and ORC file formats and compression techniques like LZO.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Used Hive to form an abstraction on top of structured data residing in HDFS and implemented partitions, dynamic partitions, and buckets on Hive tables.
- Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive.
- Worked on migrating MapReduce programs into Spark transformations using Scala.
- Designed, developed data integration programs in a Hadoop environment with NoSQL data store Cassandra for data access and analysis.
- Used the Apache Oozie job scheduler to execute workflows.
- Used Ambari to monitor node health and job status and to run analytics jobs on the Hadoop clusters.
- Worked on QlikSense to build customized interactive reports, worksheets, and dashboards.
- Implemented Kerberos for strong authentication to provide data security.
- Involved in performance tuning of Spark jobs by caching data and taking full advantage of the cluster environment.
Environment: Hadoop, HDP, Spark, Scala, Python, Kafka, Hive, Sqoop, Ambari, Talend, Oozie, Cassandra, QlikSense, Jenkins, Hortonworks.
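
A minimal sketch of the Kafka-to-Cassandra streaming path described in this section. The resume cites Scala for this work; the sketch below uses Spark's Java API (spark-streaming-kafka-0-10) and the DataStax Java driver for consistency with the other examples, and the broker, topic, keyspace, and table names are hypothetical.

```java
import java.util.*;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka010.*;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class KafkaToCassandraJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaToCassandra");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");        // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "events-consumer");
        kafkaParams.put("auto.offset.reset", "latest");

        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("events"), kafkaParams));  // hypothetical topic

        // Transform the raw messages, then persist each partition to Cassandra.
        stream.map(ConsumerRecord::value)
              .foreachRDD(rdd -> rdd.foreachPartition(records -> {
                  // For the sketch, one connection per partition; a real job would
                  // reuse a pooled session (e.g. via the spark-cassandra-connector).
                  try (Cluster cluster = Cluster.builder().addContactPoint("cassandra-host").build();
                       Session session = cluster.connect("events_ks")) {     // hypothetical keyspace
                      PreparedStatement insert = session.prepare(
                              "INSERT INTO raw_events (id, payload) VALUES (uuid(), ?)");
                      records.forEachRemaining(payload -> session.execute(insert.bind(payload)));
                  }
              }));

        jssc.start();
        jssc.awaitTermination();
    }
}
```

A production version of this job would also manage Kafka offsets (for example through checkpointing) so the pipeline can recover cleanly after a restart.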
Confidential, Bowie, Maryland
Hadoop Developer
Responsibilities:
- Designed and developed analytic systems to extract meaningful data from large-scale structured and unstructured health data.
- Created Sqoop jobs to populate data present in relational databases to hive tables.
- Developed UDFs in Java to enhance the functionality of Pig and Hive scripts.
- Solved performance issues in Pig and Hive scripts with a deep understanding of joins, groups, and aggregations and of how these operations translate into MapReduce jobs.
- Involved in creating Hive external tables, loading data, and writing Hive queries.
- Stored the processed data in HBase for faster querying and random access (see the sketch after this section).
- Defined job flows using the Azkaban scheduler to automate Hadoop jobs and installed ZooKeeper for automatic node failover.
- Managed and reviewed Hadoop log files to find the source of job failures and debugged the scripts for code optimization.
- Developed complex MapReduce programs to analyze data that exists on the cluster.
- Developed processes to load data from server logs into HDFS using Flume and to load data from the UNIX file system into HDFS.
- Built a platform to query and display the analysis results in dashboards using QlikSense.
- Used the Apache Hue web interface to monitor the Hadoop cluster and run jobs.
- Implemented Apache Sentry for role-based authorization to access the data.
- Developed shell scripts to automate routine DBA tasks (e.g., data refreshes, backups).
- Involved in the performance tuning for Pig Scripts and Hive Queries.
Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Azkaban, QlikSense, Java, Maven, Git, Cloudera, Eclipse, and Shell Scripting.
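
A minimal sketch of the HBase random-access pattern mentioned in this section, using the standard HBase Java client; the table name, column family, and row-key scheme are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ProcessedDataStore {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("processed_claims"))) { // hypothetical table

            // Write one processed record, keyed by member id + date for fast lookups.
            Put put = new Put(Bytes.toBytes("M1001#2016-03-01"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("visit_count"), Bytes.toBytes("4"));
            table.put(put);

            // Random read of the same row.
            Result result = table.get(new Get(Bytes.toBytes("M1001#2016-03-01")));
            byte[] value = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("visit_count"));
            System.out.println("visit_count = " + Bytes.toString(value));
        }
    }
}
```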
Confidential, Atlanta, Georgia
Hadoop Developer
Responsibilities:
- Worked on a Cloudera Hadoop cluster of 50 data nodes running Red Hat Enterprise Linux.
- Involved in loading data from UNIX file system to HDFS using Shell Scripting.
- Imported and exported data between an Oracle 10.2 database and HDFS using Sqoop.
- Developed ETL processes to load data from multiple data sources into HDFS using Sqoop and analyzed the data using MapReduce, Hive, and Pig Latin.
- Developed MapReduce jobs in Python for data cleaning and data processing.
- Developed custom UDFs for Pig scripts to clean unstructured data and used different joins and groups where required to optimize the Pig scripts.
- Created Hive external tables on top of the processed data to easily manage and query the data using HiveQL.
- Involved in performance tuning of Hive queries by implementing dynamic partitions and buckets in Hive to improve performance.
- Integrated MapReduce with HBase to bulk-import data using MR programs.
- Used Flume to collect and aggregate web log data from different sources such as web servers and push it to HDFS.
- Wrote MapReduce jobs in Java to parse the web logs stored in HDFS and used MRUnit to test and debug the MapReduce programs (see the sketch after this section).
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Coordinated with team in resolving the issues technically as well as functionally.
Environment: Cloudera, HDFS, MapReduce, Pig, Hive, Flume, Sqoop, HBase, Oozie, Maven, Git, Java, Python and Linux.
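
A minimal sketch of a web-log-parsing MapReduce job like the one described in this section: it counts requests per HTTP status code. The class names and the assumption that the status code is the ninth space-separated field (as in common Apache access-log formats) are illustrative, not taken from the actual project.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class StatusCodeCount {

    public static class LogMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text statusCode = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(" ");
            if (fields.length > 8) {                 // common log format: status is field 9
                statusCode.set(fields[8]);
                context.write(statusCode, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "status code count");
        job.setJarByClass(StatusCodeCount.class);
        job.setMapperClass(LogMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

MRUnit's MapDriver and ReduceDriver harnesses would then exercise LogMapper and SumReducer against small in-memory inputs, in line with the testing bullet above.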
Confidential
Java Developer
Responsibilities:
- Designed and implemented the training and reports modules of the application using Servlets, JSP and Ajax.
- Developed custom JSP tags for the application.
- Wrote queries for fetching and manipulating data using the iBATIS ORM framework.
- Used the Quartz scheduler to run jobs sequentially at given times (see the sketch after this section).
- Implemented design patterns such as Filter, Cache Manager, and Singleton to improve the performance of the application.
- Implemented the reports module of the application using Jasper Reports to display dynamically generated reports for business intelligence.
- Deployed the application at the client's location on a Tomcat server.
Environment: HTML, JavaScript, Ajax, Java, Servlets, JSP, iBATIS, Tomcat Server, SQL Server, Jasper Reports.
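
A minimal sketch of the Quartz scheduling mentioned in this section (Quartz 2.x API); the job class and cron expression are hypothetical.

```java
import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class ReportScheduling {

    // Hypothetical job: regenerates the nightly reports data.
    public static class NightlyReportJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            System.out.println("Generating nightly reports...");
        }
    }

    public static void main(String[] args) throws Exception {
        JobDetail job = JobBuilder.newJob(NightlyReportJob.class)
                .withIdentity("nightlyReport", "reports")
                .build();

        // Fire every day at 2:00 AM.
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("nightlyReportTrigger", "reports")
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0 2 * * ?"))
                .build();

        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.start();
        scheduler.scheduleJob(job, trigger);
    }
}
```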
Confidential
Java Developer
Responsibilities:
- Involved in the design and development of the project using Java and J2EE technologies, following an MVC architecture in which JSPs serve as views and Servlets as controllers.
- Designed network and use-case diagrams in StarUML to model the workflow.
- Wrote server-side programs using RESTful web services to handle requests coming from different types of devices, such as iOS clients.
- Implemented design patterns such as Cache Manager and Factory classes to improve the performance of the application (see the sketch after this section).
- Used the Hibernate ORM tool to store and retrieve data from a PostgreSQL database.
- Involved in writing test cases for the application using JUnit.
- Followed the Agile software development process for this project to achieve fast development.
Environment: JSP, Spring MVC, Spring Security, Servlets, Ajax, RESTful, Hibernate, Design Patterns, StarUML, Eclipse and PostgreSQL.
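
A minimal sketch of the Cache Manager pattern cited in this section: a thread-safe singleton wrapping a ConcurrentHashMap. The class name and cached types are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Singleton cache manager: one shared in-memory cache for the application.
public final class CacheManager {

    private static final CacheManager INSTANCE = new CacheManager();

    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    private CacheManager() {
        // private constructor prevents outside instantiation
    }

    public static CacheManager getInstance() {
        return INSTANCE;
    }

    public void put(String key, Object value) {
        cache.put(key, value);
    }

    @SuppressWarnings("unchecked")
    public <T> T get(String key) {
        return (T) cache.get(key);
    }

    public void evict(String key) {
        cache.remove(key);
    }
}
```

A caller would fetch a cached lookup table with CacheManager.getInstance().get("stateCodes") and fall back to the database on a miss.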
