Sr. Hadoop Developer Resume
SUMMARY
- Around 9 years of professional experience across IT sectors such as healthcare, finance, insurance and retail, including 4 years of experience with Big Data and the Hadoop ecosystem.
- Extensive development experience with Hadoop ecosystem components such as Spark, Hive, Kafka, Impala, HBase, MapReduce, Pig, Sqoop, YARN and Oozie.
- Strong programming experience using Java, Scala, Python and SQL.
- Strong fundamental understanding of Distributed Systems Architecture and parallel processing frameworks.
- Strong experience designing and implementing end-to-end data pipelines running on terabytes of data.
- Expertise in developing production-ready Spark applications using the Spark Core, DataFrame, Spark SQL, Spark ML and Spark Streaming APIs.
- Strong experience troubleshooting failures in Spark applications and fine-tuning them for better performance.
- Experience using DStreams in Spark Streaming, accumulators, broadcast variables, multiple caching levels and other Spark optimization techniques.
- Strong experience working with data ingestion tools Sqoop and Kafka.
- Good knowledge of and development experience with the MapReduce framework.
- Hands-on experience writing ad-hoc queries for moving data from HDFS to Hive and analyzing data using HiveQL.
- Proficient in creating Hive DDLs and writing custom Hive UDFs.
- Knowledge of job workflow management and monitoring tools such as Oozie and Rundeck.
- Experience designing, implementing and managing secure authentication to Hadoop clusters with Kerberos.
- Experience working with NoSQL databases such as HBase, Cassandra and MongoDB.
- Experience in ETL processes consisting of data sourcing, mapping, transformation, conversion and loading.
- Good knowledge of creating ETL jobs in Talend to load large volumes of data into the Hadoop ecosystem and relational databases.
- Experience working with Cloudera, Hortonworks and Amazon AWS EMR distributions.
- Good experience in developing applications using Java, J2EE, JSP, MVC, EJB, JMS, JSF, Hibernate, AJAX and web based development tools.
- Strong experience in RDBMS technologies like MySQL, Oracle and Teradata.
- Strong expertise in creating shell scripts, regular expressions and cron job automation.
- Good knowledge of web services, SOAP programming, WSDL, XML parsers such as SAX and DOM, as well as AngularJS and responsive design with Bootstrap.
- Experience with various version control systems such as CVS, TFS, SVN.
- Worked with geographically distributed and culturally diverse teams, including roles involving direct interaction with clients and team members.
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, HDFS, MapReduce, Hive, Pig, Impala, HBase, Sqoop, Spark, Spark Streaming, ZooKeeper, Oozie, Kafka, Flume, Hue, Cloudera Manager, Amazon AWS, Hortonworks
Java/J2EE & Web Technologies: J2EE, JMS, JSF, Servlets, HTML, CSS, XML, XHTML, AJAX, AngularJS, JSP, JSTL
Languages: C, C++, Core Java, Shell Scripting, PL/SQL, Python, Pig Latin, Scala
Scripting Languages: JavaScript and UNIX Shell Scripting, Python
Operating system: Windows, MacOS, Linux and Unix
Design: UML, Rational Rose, Microsoft Visio, E-R Modelling
DBMS / RDBMS: Oracle 11g/10g/9i, Microsoft SQL Server 2012/2008, MySQL, DB2, Teradata; NoSQL: MongoDB, Cassandra, HBase
IDE and Build Tools: Eclipse, NetBeans, Microsoft Visual Studio, Ant, Maven, JIRA, Confluence
Version Control: SVN, CVS, GitHub
Security: Kerberos
Web Services: SOAP, RESTful, JAX-RS
Web Servers: WebLogic, WebSphere, Apache Tomcat, Jetty
PROFESSIONAL EXPERIENCE
Confidential
Sr. Hadoop Developer
Responsibilities:
- Responsible for ingesting large volumes of user behavioral data and customer profile data into the analytics data store.
- Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting data from FTP servers and data warehouses.
- Developed Scala-based Spark applications for data cleansing, event enrichment, data aggregation, de-normalization and data preparation needed by the machine learning and reporting teams.
- Troubleshot Spark applications to make them more fault tolerant.
- Fine-tuned Spark applications to improve overall pipeline processing time.
- Wrote Kafka producers to stream data from external REST APIs into Kafka topics.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase, as shown in the sketch after this section.
- Handled large datasets using Spark's in-memory capabilities, broadcast variables, efficient joins, transformations and other optimizations.
- Worked extensively with Sqoop for importing data from Oracle.
- Worked with EMR clusters in the AWS cloud and with S3 storage.
- Involved in creating Hive tables, loading and analyzing data using Hive scripts.
- Implemented partitioning, dynamic partitions and bucketing in Hive.
- Good experience with continuous integration of applications using Bamboo.
- Used reporting tools such as Tableau connected to Impala to generate daily data reports.
- Collaborated with the infrastructure, network, database, application and BA teams to ensure data quality and availability.
- Documented and tracked operational problems following standards and procedures using JIRA.
Environment: Hadoop 2.x, Spark, Scala, Hive, Sqoop, Oozie, Kafka, Amazon EMR, ZooKeeper, Impala, YARN, JIRA, Kerberos, Amazon AWS, Shell Scripting, SBT, GitHub, Maven.
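A minimal sketch of the Kafka-to-HBase streaming path described above, using the Kafka 0.10 direct DStream API. The broker address, consumer group, topic name (user-events), HBase table (user_events) and column family (d) are illustrative assumptions, not the actual project configuration:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object EventStreamToHBase {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("EventStreamToHBase"), Seconds(30))

    // Kafka consumer settings; broker list and group id are placeholders.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "event-stream-consumer")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("user-events"), kafkaParams))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One HBase connection per partition, reused for all puts in the micro-batch.
        val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("user_events"))
        records.foreach { rec =>
          // Fall back to topic-partition-offset as the row key when the Kafka key is null.
          val rowKey = Option(rec.key()).getOrElse(rec.topic() + "-" + rec.partition() + "-" + rec.offset())
          val put = new Put(Bytes.toBytes(rowKey))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(rec.value()))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Opening one HBase connection per partition keeps connection overhead bounded while still writing each micro-batch as it arrives.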
Confidential
Hadoop Developer
Responsibilities:
- Involved in requirement analysis, design, coding and implementation phases of the project.
- Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
- Converted existing MapReduce jobs into Spark transformations and actions using the Spark RDD, DataFrame and Spark SQL APIs.
- Wrote new Spark jobs in Scala to analyze customer and sales-history data.
- Used Kafka to ingest data from multiple streaming sources into HDFS.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Good experience with Hive partitioning, bucketing and collection data types, and performing different types of joins on Hive tables.
- Created Hive external tables to perform ETL on data generated on a daily basis.
- Wrote HBase bulk-load jobs to load processed data into HBase tables by converting it to HFiles.
- Performed validation on the data ingested to filter and cleanse the data in Hive.
- Created Sqoop jobs to handle incremental loads from RDBMS sources into HDFS and applied Spark transformations on the ingested data.
- Loaded data into Hive tables from Spark using the Parquet columnar format, as shown in the sketch after this section.
- Developed Oozie workflows to automate and productionize the data pipelines.
- Developed Sqoop import Scripts for importing reference data from Netezza.
Environment: HDFS, Hadoop, Pig, Hive, HBase, Sqoop, Kafka, Teradata, MapReduce, Oozie, Java 6/7, Oracle 10g, YARN, UNIX Shell Scripting, Amazon Web Services, Maven, Agile Methodology, JIRA, Linux.
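A simplified sketch of the Spark-to-Hive load pattern referenced above: a DataFrame join and aggregation taking the place of an older MapReduce job, written out as a Parquet-backed, partitioned Hive table. The database, table and column names (staging.customers, staging.sales_history, analytics.sales_by_customer, customer_id, region, amount) are placeholder assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, sum}

object SalesHistoryLoad {
  def main(args: Array[String]): Unit = {
    // Hive support is needed so saveAsTable registers the result in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("SalesHistoryLoad")
      .enableHiveSupport()
      .getOrCreate()

    val customers = spark.table("staging.customers")
    val sales     = spark.table("staging.sales_history")

    // DataFrame equivalent of the join/aggregation previously expressed as MapReduce.
    val salesByCustomer = sales
      .join(customers, Seq("customer_id"))
      .groupBy("customer_id", "region")
      .agg(sum("amount").as("total_amount"), count("*").as("order_count"))

    // Persist as a Parquet-backed, region-partitioned Hive table for downstream Hive/Impala queries.
    salesByCustomer.write
      .mode("overwrite")
      .format("parquet")
      .partitionBy("region")
      .saveAsTable("analytics.sales_by_customer")
  }
}
```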
Confidential
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Developed custom MapReduce programs and custom user-defined functions (UDFs) in Hive to transform large volumes of data per business requirements; see the UDF sketch after this section.
- Wrote MapReduce jobs using Java API and Pig Latin.
- Extracted data from flat files and other RDBMS databases into a staging area and ingested it into Hadoop.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generating visualizations using Tableau.
- Developed numerous Pig batch programs for both implementation and optimization needs.
- Used HBase in accordance with Hive/Pig as per the requirement.
- Created various Pig scripts and wrapped them as shell commands to provide aliases for common operations in the project's business flow.
- Loaded data into HDFS from different data sources such as Oracle and DB2 using Sqoop and loaded it into Hive tables.
- Integrated the Hive warehouse with HBase for information sharing among teams.
- Developed complex Hive UDFs to work with sequence files.
- Designed and developed Pig Latin scripts and Pig command line transformations for data joins and custom processing of MapReduce outputs.
- Created dashboards in Tableau to create meaningful metrics for decision making.
- Performed rule checks on multiple file formats like XML, JSON, CSV and compressed file formats.
- Monitored system health and logs and responded to any warning or failure conditions.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Worked with the Avro data serialization system to handle JSON data formats.
- Implemented counters for diagnosing problems in queries, for quality control and for application-level statistics.
- Performed end-to-end performance tuning of Hadoop clusters and MapReduce routines against very large datasets.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
- Defined job flows in Oozie to schedule and manage Apache Hadoop jobs as directed acyclic graphs (DAGs) of actions with control flows.
- Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
Environment: HDFS, HBase, MapReduce, Cassandra, Hive, Pig, Sqoop, Tableau, NoSQL, Shell Scripting, Oozie, Avro, HDP Distribution, Eclipse, Log4j, JUnit, Linux.
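A minimal sketch of a custom Hive UDF of the kind described above, written against the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name, normalization rule and intended use are illustrative assumptions:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Simple scalar UDF: normalizes free-text codes (trim, upper-case, strip punctuation)
// so they can be joined consistently across tables. Hive locates evaluate() by reflection.
class NormalizeCode extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase.replaceAll("[^A-Z0-9]", ""))
  }
}
```

Once packaged into a JAR and added to the Hive session, such a function would be registered with CREATE TEMPORARY FUNCTION and called from HiveQL like any built-in.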
Confidential
Java Developer
Responsibilities:
- Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC).
- Used Rational Rose for developing Use case diagrams, Activity flow diagrams, Class diagrams and Object diagrams in the design phase.
- Used Spring for cross-cutting concerns and its IoC container for dependency injection.
- Implemented application-level persistence using Hibernate and Spring.
- Consumed and exposed various web services using JAX-RS for systems such as NPI validation and address validation.
- Implemented core Java logic for the inventory cost module.
- Developed complex web services and tailored the JAX-RS API to suit the requirements.
- Developed UI models using HTML, JSP, JavaScript, AJAX, Web Link and CSS.
- Wrote custom JavaScript and CSS to maintain user friendly look and feel.
- Wrote jQuery functions while implementing various UI screens across the whole web application.
- Wrote application-level code to perform client-side validation using jQuery and JavaScript.
- Primarily focused on Spring components such as Spring MVC, DispatcherServlet, controllers, model and view objects, and view resolvers.
- Wrote complex named SQL queries using Hibernate.
- Generated POJO classes with JPA Annotations using Reverse Engineering.
- Developed the application using IntelliJ IDE.
- Used LOG4J, JUnit for debugging, testing and maintaining the system state.
- Used SOAP-UI for testing the Web-Services.
- Used SVN to maintain source and version management.
- Used JIRA to manage issues and the project workflow.
- Implemented SOLID design principles throughout the development of the project.
- Unit tested all classes and methods using JUnit.
Environment: Java/Java EE5, JSP 2.1, Spring 2.5, Spring MVC, Hibernate 3.0, Web services, JAX-RS, Rational Rose, WADL, SoapUI, HTML, CSS, JavaScript, AJAX, JSON, jQuery, Maven, JMS, Log4j, Jenkins, JPA, Oracle, MySQL, SQL Developer, JIRA, SVN, PL/SQL, WebLogic 10.3, IntelliJ, UNIX.
Confidential
Java Developer
Responsibilities:
- Involved in Requirement Analysis, Design, Development and Testing of the risk workflow system.
- Gained an understanding of open-source frameworks and debugged them using the Eclipse tooling.
- Utilized the Spring Framework, encouraging application architectures based on the MVC design paradigm (J2EE design patterns).
- Implemented RESTful API Web Services.
- Implemented asynchronous server interactions using AJAX and jQuery.
- Configured the Hibernate files for the project's libraries.
- Used Bootstrap to implement the responsive design of the web pages.
- Created wireframes to design the structure of the project.
- Involved in system design and development in core Java using collections, multithreading and exception handling.
- Designed the user interface using HTML, CSS, Servlets and JSP.
- Implemented templates for different rules governing access to different applications.
- Performed client-side validations using JavaScript.
- Developed Web Pages using HTML, DHTML and CSS.
- Actively involved in the integration of different use cases, code reviews and refactoring.
- Used Log4j to maintain user-defined logs in the system.
- Created unit test cases using JUnit for end-to-end testing.
- Actively worked with the client to collect requirements for the project.
- Involved in the implementation of the Software development life cycle (SDLC) that includes Development, Testing, Implementation and Maintenance Support.
Environment: Spring, Core Java, HTML, DHTML, Log4j, UNIX OS, CSS, JavaScript, AJAX, jQuery, Eclipse IDE, RESTful Web Service, Maven, UML, Java Mail API, Hibernate, MVC, JSP, JUnit, wireframes.