Hadoop Developer Resume
Overland Park, KS
SUMMARY:
- 8 years of experience in developing, implementing, and configuring the Hadoop ecosystem and in developing various web applications using Java and J2EE.
- 4 years of experience in Big Data analytics using Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, YARN, Spark, Scala, Oozie, Kafka, and Flume.
- Work experience with different Hadoop distributions, including Hortonworks and Cloudera.
- Excellent understanding of the Hadoop Distributed File System and experience in developing efficient MapReduce and YARN programs to process large datasets.
- Good working knowledge of using Sqoop to transfer bulk data between relational databases and HDFS, and of Flume for ingesting streaming data into HDFS.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Implemented Talend jobs to load data from different sources and integrated them with Kafka.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.
- Very good at loading data into Spark schema RDDs and querying them using Spark SQL.
- Good at writing custom RDDs in Scala to apply data-specific transformations, and implemented design patterns to improve performance.
- Experienced in using Apache Hue and Ambari to manage and monitor Hadoop clusters.
- Experience in analyzing large amounts of data using Pig Latin scripts and Hive Query Language, and assisted with performance tuning.
- Adept at writing customized UDFs in Java to extend Pig and Hive functionality (see the sketch following this summary).
- Sound knowledge of using Apache Solr to search against structured and unstructured data.
- Worked with the Azkaban and Oozie workflow schedulers to run recurring Hadoop jobs.
- Experience in implementing the Kerberos authentication protocol in Hadoop for data security.
- Experience in creating dashboards and generating reports with Tableau by connecting to tables in Hive and HBase.
- Experience in using SequenceFile, RCFile, ORC, and Avro file formats and compression techniques.
- Worked on NoSQL databases such as HBase and Cassandra to store structured and unstructured data.
- Good knowledge of cloud integration with Amazon Elastic MapReduce (EMR), Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), and Microsoft Azure.
- Hands-on experience with UNIX environments and shell scripting.
- Experienced in using the version control systems SVN and Git, the build tool Maven, and the continuous integration tool Jenkins.
- Expertise in developing web applications using J2EE technologies such as Servlets, JSP, Web Services, Spring, Hibernate, HTML, jQuery, and Ajax.
- Implemented design patterns to improve the quality and performance of applications.
- Worked on JUnit to test the functionality of Java methods and used Python for automation.
- Good experience in using the relational databases Oracle, SQL Server, and PostgreSQL.
- Worked with the Agile, Scrum, and Confidential software development frameworks for managing product development.
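A minimal sketch of the kind of custom Hive UDF in Java referenced above; the class name, input semantics, and cleanup logic are illustrative assumptions, not the actual production code.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that normalizes free-text values before aggregation.
// Registered in Hive with: ADD JAR ...; CREATE TEMPORARY FUNCTION clean_text AS '...';
public class CleanTextUDF extends UDF {

    // Hive resolves evaluate() by reflection; null input yields null output.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String cleaned = input.toString()
                .trim()
                .toLowerCase()
                .replaceAll("\\s+", " "); // collapse repeated whitespace
        return new Text(cleaned);
    }
}
```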
TECHNICAL SKILLS:
Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, ZooKeeper, Oozie, Kafka, Storm, Talend, Spark, NiFi, Impala, Solr, Avro, and Crunch.
Programming languages: Java, Python, Scala.
NoSQL Databases: HBase, Cassandra, MongoDB
Databases: Oracle, SQL Server, PostgreSQL
Web Technologies: HTML, jQuery, Ajax, CSS, JavaScript, JSON, XML.
Business Intelligence Tools: Tableau, Jasper Reports.
Testing: Hadoop Testing, Hive Testing, MRUnit.
Operating Systems: Linux Red Hat/Ubuntu/CentOS, Windows 10/8.1/7/XP.
Hadoop Distributions: Cloudera Enterprise, Hortonworks.
Technologies and Tools: Servlets, JSP, Spring, Web Services, Hibernate, Maven, GitHub.
Application Servers: Tomcat, JBoss
IDEs: Eclipse, NetBeans, IntelliJ
PROFESSIONAL EXPERIENCE:
Confidential, Overland Park, KS
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions in a Hadoop cluster environment with the Hortonworks distribution on 110 data nodes.
- Worked on the Kafka REST API to collect and load data onto the Hadoop file system, and used Sqoop to load data from relational databases.
- Used Spark Streaming APIs to perform the necessary transformations and actions on data received from Kafka and persisted it into Cassandra (see the sketch following this section).
- Started using Apache NiFi to copy data from the local file system to HDFS.
- Developed Spark scripts by writing custom RDDs in Scala for data transformations and performing actions on RDDs.
- Worked with the Avro and ORC file formats and compression techniques such as LZO.
- Used Hive to form an abstraction on top of structured data residing in HDFS, and implemented partitions, dynamic partitions, and buckets on Hive tables.
- Used the Spark API over Hadoop YARN as the execution engine for data analytics using Hive.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Worked on migrating MapReduce programs into Spark transformations using Scala.
- Designed and developed data integration programs in a Hadoop environment with the NoSQL data store Cassandra for data access and analysis.
- Used the job management scheduler Apache Oozie to execute workflows.
- Used Ambari to monitor node health and the status of jobs in Hadoop clusters.
- Worked on Tableau to build customized interactive reports, worksheets and dashboards.
- Implemented Kerberos for strong authentication to provide data security.
- Worked on Apache Solr for indexing and load-balanced querying to search for specific data in larger datasets.
- Involved in performance tuning of Spark jobs by caching data and taking full advantage of the cluster environment.
Environment: Hadoop, HDFS, Spark, Scala, Kafka, Hive, Sqoop, Ambari, Solr, Oozie, Cassandra, Tableau, Jenkins, Bitbucket, Hortonworks and Red Hat Linux.
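A minimal sketch, written in Java for consistency with the other examples, of the Kafka-to-Spark-Streaming pattern described in this section; the broker address, topic name, and batch interval are hypothetical, and the Cassandra persistence step, which would typically go through the DataStax spark-cassandra-connector, is only indicated in a comment.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaToCassandraSketch {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("kafka-stream-sketch");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "stream-sketch");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Collections.singletonList("events"), kafkaParams)); // hypothetical topic

        // Transform each micro-batch; in the real pipeline the resulting RDD
        // would be written to Cassandra via the spark-cassandra-connector
        // inside foreachRDD (omitted here).
        stream.map(ConsumerRecord::value)
              .filter(v -> v != null && !v.isEmpty())
              .foreachRDD(rdd -> System.out.println("batch size: " + rdd.count()));

        jssc.start();
        jssc.awaitTermination();
    }
}
```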
Confidential, Kansas City, MO
Hadoop Developer
Responsibilities:
- Designed and developed analytic systems to extract meaningful data from large-scale structured and unstructured health data.
- Created Sqoop jobs to populate data from relational databases into Hive tables.
- Developed UDFs in Java to enhance the functionality of Pig and Hive scripts.
- Solved performance issues in Pig and Hive scripts with a deep understanding of joins, groups, and aggregations and of how these jobs translate into MapReduce jobs.
- Involved in creating Hive external tables, loading data, and writing Hive queries.
- Loaded the processed data into HBase for faster querying and random access (see the sketch following this section).
- Defined job flows using the Azkaban scheduler to automate Hadoop jobs and installed ZooKeeper for automatic node failover.
- Managed and reviewed Hadoop log files to find the source of job failures and debugged scripts for code optimization.
- Developed complex MapReduce programs to analyze data that exists on the cluster.
- Developed processes to load data from server logs into HDFS using Flume, as well as loading data from the UNIX file system to HDFS.
- Built a platform to query and display the analysis results in dashboards using Tableau.
- Used the Apache Hue web interface to monitor the Hadoop cluster and run jobs.
- Implemented Apache Sentry for role-based authorization of data access.
- Developed shell scripts to automate routine DBA tasks (e.g., data refreshes, backups).
- Involved in the performance tuning for Pig Scripts and Hive Queries.
Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Azkaban, Tableau, Java, Maven, Git, Cloudera, Eclipse and Shell Scripting.
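A minimal sketch of loading and reading processed records with the HBase Java client, as referenced above; the table name, row key, and column family are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("processed_records"))) { // hypothetical table

            // Row key chosen for direct lookup by record id; column family "d" holds the data.
            Put put = new Put(Bytes.toBytes("rec-00042"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("processed"));
            table.put(put);

            // Point lookup, the random-access pattern HBase is used for here.
            Get get = new Get(Bytes.toBytes("rec-00042"));
            Result result = table.get(get);
            byte[] status = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("status"));
            System.out.println(Bytes.toString(status));
        }
    }
}
```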
Confidential, Sunnyvale, CA
Hadoop Developer
Responsibilities:
- Worked on a Cloudera Hadoop cluster of 50 data nodes with Red Hat Enterprise Linux installed.
- Involved in loading data from UNIX file system to HDFS using Shell Scripting.
- Imported and exported data between HDFS and an Oracle 10.2 database using Sqoop.
- Developed ETL processes to load data from multiple data sources into HDFS using Sqoop, analyzing the data with MapReduce, Hive, and Pig Latin.
- Developed custom UDFs for Pig scripts to clean unstructured data, and used different joins and groups where required to optimize the Pig scripts.
- Created Hive external tables on top of processed data to easily manage and query the data using HiveQL.
- Involved in performance tuning of Hive queries by implementing dynamic partitions and buckets in Hive.
- Integrated MapReduce with HBase to import bulk data using MR programs.
- Used Flume to collect, aggregate, and store web log data from different sources such as web servers and pushed it to HDFS.
- Wrote MapReduce jobs in Java to parse the web logs stored in HDFS, and used MRUnit to test and debug the MapReduce programs (see the sketch following this section).
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Coordinated with team in resolving the issues technically as well as functionally.
Environment: Cloudera, HDFS, MapReduce, Pig, Hive, Flume, Sqoop, HBase, Oozie, Maven, Git, Java, Python and Linux.
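A minimal sketch of a web-log mapper with an MRUnit test, as referenced above; the log format and field positions are simplified assumptions, not the actual parsing logic.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class StatusCodeMapperTest {

    // Hypothetical mapper that emits the HTTP status code from each log line.
    public static class StatusCodeMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumes a simplified space-delimited line where the status code
            // is the second-to-last field, e.g. "... /index.html 200 1043".
            String[] fields = value.toString().split(" ");
            if (fields.length >= 2) {
                context.write(new Text(fields[fields.length - 2]), ONE);
            }
        }
    }

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new StatusCodeMapper());
    }

    @Test
    public void emitsStatusCode() throws IOException {
        mapDriver.withInput(new LongWritable(0),
                            new Text("10.0.0.1 - - GET /index.html 200 1043"))
                 .withOutput(new Text("200"), new IntWritable(1))
                 .runTest();
    }
}
```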
Confidential
Java Developer
Responsibilities:
- Involved in designing and developing the project using Java and J2EE technologies, following the MVC architecture in which JSPs are the views and Servlets are the controllers.
- Designed network and use-case diagrams in StarUML to track the workflow.
- Wrote server-side programs using RESTful web services to handle requests coming from different types of devices, such as iOS.
- Implemented design patterns such as CacheManager and Factory classes to improve the performance of the application (see the sketch following this section).
- Used the Hibernate ORM tool to store and retrieve data from a PostgreSQL database.
- Involved in writing test cases for the application using JUnit.
- Followed the Agile software development process for this project and achieved fast development.
Environment: JSP, Servlets, Ajax, RESTful Web Services, Hibernate, Design Patterns, StarUML, Eclipse and PostgreSQL.
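A minimal sketch of the singleton CacheManager pattern mentioned above; the class and method names are illustrative rather than the application's actual code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical application-level cache used to avoid repeated database lookups.
// The singleton is eagerly initialized, which keeps it simple and thread-safe.
public final class CacheManager {

    private static final CacheManager INSTANCE = new CacheManager();

    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    private CacheManager() {
        // private constructor prevents outside instantiation
    }

    public static CacheManager getInstance() {
        return INSTANCE;
    }

    public void put(String key, Object value) {
        cache.put(key, value);
    }

    public Object get(String key) {
        return cache.get(key);
    }

    public void evict(String key) {
        cache.remove(key);
    }
}
```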
Confidential
Java Developer
Responsibilities:
- Designed and implemented the training and reports modules of the application using Servlets, JSP and Ajax.
- Developed custom JSP tags for the application.
- Wrote queries to fetch and manipulate data using the ORM framework iBatis.
- Used the Quartz scheduler to run jobs sequentially at given times (see the sketch following this section).
- Implemented design patterns like Filter, Cache Manager and Singleton to improve the performance of the application.
- Implemented the reports module of the application using Jasper Reports to display dynamically generated reports for business intelligence.
- Deployed the application at the client's location on a Tomcat server.
Environment: HTML, JavaScript, Ajax, Servlets, JSP, iBatis, Tomcat Server, PostgreSQL, Jasper Reports.
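A minimal sketch of scheduling a recurring job with Quartz, as referenced above; the job class, group names, and cron expression are hypothetical.

```java
import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class ReportJob implements Job {

    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        // In the real application this step would assemble the Jasper report data.
        System.out.println("Running report job at " + context.getFireTime());
    }

    public static void main(String[] args) throws SchedulerException {
        JobDetail job = JobBuilder.newJob(ReportJob.class)
                .withIdentity("reportJob", "reports")
                .build();

        // Fire every day at 02:00; CronScheduleBuilder follows Quartz cron syntax.
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("reportTrigger", "reports")
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0 2 * * ?"))
                .build();

        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}
```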