Sr. Hadoop Developer Lead Resume
St. Louis, MO
SUMMARY
- Over 10 years of experience in software development, building scalable, high-performance Big Data applications, with specialization in the Apache Hadoop stack, NoSQL databases, distributed computing, and Java/J2EE technologies
- Expertise across all phases of the SDLC: requirements gathering, system design, development, enhancement, maintenance, testing, deployment, production support, and documentation
- Expertise with Hadoop architecture and its components, such as HDFS, YARN, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm
- Good experience using data processing tools such as MapReduce, Pig, and Hive for business transformations, data validations, and metadata-driven data parsing routines
- Hands on experience in installing, configuring, and using Hadoop ecosystem components like Pig, HBase, ZooKeeper, Oozie, Hive, Sqoop, and Flume
- Working experience in creating complex data ingestion pipelines, data transformations, data management and data governance in a centralized enterprise data hub
- Experience with cloud computing technologies such as Windows Azure HDInsight, AWS EMR, direct Hadoop on EC2 (non-EMR), and Cloudera Manager
- Working experience pushing data as delimited files into HDFS using the Talend Big Data tool
- Strong expertise in writing complex MapReduce jobs, Pig scripts, and Hive queries for data modeling
- Good experience in HDFS design, daemons, federation and HDFS high availability (HA)
- Familiarity with Hadoop Security projects like Apache Knox Gateway, Sentry, Ranger, and Project Rhino
- Very good working knowledge of Apache Cassandra, MongoDB, and Flume
- Experienced in Integrated Data Warehousing and MDM projects
- Experience in working with BI team in transforming big data requirements into Hadoop centric design and solutions
- Experience in performance tuning the Hadoop cluster by gathering information and analyzing the existing infrastructure
- Good working experience using Apache Sqoop to import data into HDFS from RDBMS and vice-versa
- Adept at creating real-time data streaming solutions using Apache Spark/Spark Streaming, Kafka, and Flume
- Adept at extending Hive and Pig core functionality by writing custom UDFs
- Good understanding of Data Mining and Machine Learning techniques
- Developed various MapReduce applications to perform ETL workloads on terabytes of data
- Strong work ethic with desire to succeed and make significant contributions to the organization
- Experienced in Java Application Development, Client/Server Applications, and Internet/Intranet-based applications using Core Java, J2EE patterns, Web Services, Oracle, SQL Server, and DB2
- Experience in building, deploying and integrating with Ant, Maven and Jenkins
- Extensive work experience with different SDLC approaches such as Waterfall and Agile methodologies
- Strong interpersonal and communication skills with an ability to grasp new concepts quickly
- Ability to successfully work under tight deadlines
- Experience in leading small teams with Onshore and Offshore model
- Ability to identify and resolve problems quickly and independently
- A strong team player with the ability to communicate effectively with people at all levels of the organization, including technical staff, management, and customers
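The MapReduce programming paradigm referenced above can be sketched in plain Java with no Hadoop dependencies: a map phase emits (key, value) pairs and a shuffle/reduce phase aggregates them by key. The class and method names here are illustrative only, not taken from any project above.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative word count: map each word to (word, 1), then group identical
// keys and sum their counts -- the same shape a Hadoop Mapper/Reducer pair
// implements at cluster scale over HDFS splits.
public class WordCountSketch {
    public static Map<String, Long> count(String text) {
        return Arrays.stream(text.toLowerCase().split("\\s+"))
                     .filter(w -> !w.isEmpty())
                     // "shuffle" + "reduce" in miniature
                     .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count("to be or not to be"));
    }
}
```

In a real Hadoop job the map and reduce halves run on separate cluster nodes, with the framework handling the shuffle between them.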
TECHNICAL SKILLS
Big Data Frameworks and Ecosystems: Hadoop, MapReduce, HBase, Hive, Pig, HDFS, ZooKeeper, Sqoop, Cassandra, MongoDB, Apache Kafka, Oozie, Flume, ElasticSearch 2.x, MRUnit, Spark on Scala
J2EE Technologies: Servlets, JSP, JDBC, JUnit
Languages: Java, Ruby, C, SQL, PL/SQL
ETL Tools: Talend Open Studio, Pentaho Data Integration (PDI /Kettle)
Middleware: Hibernate 3.x
Web Technologies: CSS, HTML, XHTML, AJAX, XML, XSLT
Databases: Oracle 8i/9i/10g, MySQL, MS Access
IDE: Eclipse 3.x, 4.x, Eclipse RCP, NetBeans 6, STS 2.0, EditPlus, Notepad++
Design Methodologies: UML, Rational Rose
Version Control Tools: CVS, SVN
Operating Systems: Windows XP/Vista/7, Linux, UNIX, CentOS
Tools: Ant, Maven, Putty
PROFESSIONAL EXPERIENCE
Confidential, St. Louis, MO
Sr. Hadoop Developer Lead
Responsibilities:
- Gathered business requirements from the business analysts and subject matter experts
- Involved in installing Hadoop ecosystem components in a 50-node production environment
- Installed, configured, and maintained Hortonworks Hadoop clusters for application development, along with Hadoop tools such as YARN, Hive, Pig, HBase, ZooKeeper, and Sqoop
- Installed and configured Hadoop security and access controls using Kerberos and Active Directory
- Responsible for managing data coming into HDFS from different sources using Sqoop and Flume
- Responsible for troubleshooting and monitoring Hadoop services using Cloudera Manager
- Monitored and fine-tuned MapReduce programs running on the cluster
- Involved in HDFS maintenance and loading of structured and unstructured data
- Developed several MapReduce programs for data pre-processing
- Loaded data from MySQL to HDFS and vice versa on a regular basis using Sqoop import and export commands
- Wrote Hive queries for data analysis to meet the business requirements
- Designed and implemented jobs and transformations. Loaded the data sequentially and in parallel for initial and incremental loads
- Implemented various Pentaho Data Integration steps in cleansing and loading the data as per the business needs
- Configured the Pentaho Data Integration server to run jobs in local, remote-server, and cluster modes
- Prepared System Design document with all functional implementations
- Involved in data modeling sessions to develop models for Hive tables
- Interpreted the existing enterprise data warehouse set up to understand the design and provided design and architecture suggestions on converting to Hadoop using MapReduce, Hive, Sqoop, Flume and Pig Latin
- Converted existing ETL logic to Hadoop mappings
- Used Hadoop file system commands extensively for file handling operations
- Worked on sequence files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement
- Parsed XML files using MapReduce to extract related attributes and stored them in HDFS
- Performed unit testing using MRUnit testing framework
- Involved in building tbuild scripts to import data from Teradata using Teradata Parallel Transporter (TPT) APIs
Environment: CDH 5, Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, XML, Cloudera Manager, Teradata and Pentaho (PDI / Kettle)
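The XML-parsing step above (extracting related attributes from XML files via MapReduce) can be sketched outside Hadoop with the JDK's built-in DOM parser; in the actual job this logic would sit inside a Mapper's map() method, one XML document per record. The `record` element and `id` attribute names below are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Extracts the "id" attribute of every <record> element from an XML string.
public class XmlAttributeExtractor {
    public static List<String> extractIds(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList records = doc.getElementsByTagName("record");
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < records.getLength(); i++) {
                ids.add(((Element) records.item(i)).getAttribute("id"));
            }
            return ids;
        } catch (Exception e) {
            // In a Mapper, a malformed record would typically be counted and skipped
            throw new IllegalStateException("malformed XML input", e);
        }
    }
}
```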
Confidential, Denver, CO
Hadoop Developer
Responsibilities:
- Built a scalable distributed data solution using Hadoop to perform analysis on 25+ terabytes of customer usage data using Cloudera Distribution
- Created Pig and Hive UDFs to analyze the complex data to find specific user behavior
- Configured periodic incremental imports of data from DB2 into HDFS using Sqoop
- Worked extensively with importing metadata into Hive using Sqoop and migrated existing tables and applications to work on Hive
- Used Oozie workflow engine to schedule multiple recurring Hive and Pig jobs
- Created HBase tables to store various formats of data coming from different portfolios
- Created Hive tables to store the processed results in a tabular format
- Utilized cluster co-ordination services through Zookeeper
- Extensively used HiveQL, Pig Latin and Spark on Scala
- Assisted in cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files
- Generated various marketing reports using Tableau with Hadoop as the source for data
- Created relationships, actions, data blending, filters, parameters, hierarchies, calculated fields, sorting, groupings, live connections, and in-memory extracts in Tableau
- Created customized reports using various chart types such as text tables, bar, pie, and donut charts, funnel charts, heat maps, line charts, area charts, scatter plots, and dual-combination charts in Tableau
- Blended data from multiple databases into one report by selecting primary keys from each database for data validation
- Created high-level dashboards and stories in Tableau for Business and Product owners
- Applied a thorough understanding of ETL tools and how they fit into a Big Data environment
- Performed unit testing using MRUnit testing framework
- Involved in troubleshooting, performance tuning of reports and resolving issues within Tableau Server and Reports
Environment: CDH 4, Hadoop, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, Zookeeper, PL/SQL, Tableau 8.0.x, Scala, Spark, DB2, UNIX Shell, YARN
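A Hive UDF of the kind described above is, at its core, a class exposing an evaluate() method that Hive calls once per row. The sketch below omits the org.apache.hadoop.hive.ql.exec.UDF base class and Hadoop's Text type so it stays dependency-free, and the masking logic is purely illustrative, not the actual user-behavior analysis.

```java
// Hive-style UDF sketch: a real UDF would extend
// org.apache.hadoop.hive.ql.exec.UDF (omitted here to avoid the Hadoop
// dependency) and typically accept/return Hadoop Text rather than String.
public class MaskEmailUdf {
    // Masks the local part of an e-mail address: "jane@x.com" -> "j***@x.com"
    public String evaluate(String email) {
        if (email == null || !email.contains("@")) {
            return email; // Hive UDFs conventionally pass null/bad input through
        }
        int at = email.indexOf('@');
        return email.charAt(0) + "***" + email.substring(at);
    }
}
```

Once compiled into a jar, such a class would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION, then used like any built-in function in HiveQL.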
Confidential, St. Louis, MO
Java Developer
Responsibilities:
- Worked with Business Analysts and helped represent the business domain details in technical specifications
- Actively involved in setting coding standards and writing related documentation
- Developed the Java Code using Eclipse as IDE
- Developed JSPs and Servlets to dynamically generate HTML and display the data on client side
- Developed applications on Struts MVC architecture utilizing Action Classes, Action Forms and validations
- Used Tiles as an implementation of the Composite View pattern
- Was responsible for implementing various J2EE Design Patterns like Service Locator, Business Delegate, Session Façade, and Factory Pattern
- Generated dynamic tag libraries and implemented the MVC design pattern using Java Struts
- Performed code review and debugging using Eclipse Debugger
- Was responsible for developing and deploying EJBs (Session Beans and Message-Driven Beans)
- Configured queues in WebLogic Server to which messages were published using the JMS API
- Consumed Web Services (WSDL, SOAP, UDDI) from third parties for authorizing payments to/from customers
- Performed unit testing using JUnit testing framework and used Log4j to monitor the error log
- Wrote complex database queries
- Built web applications using Maven as build tool
- Used CVS for version control
Environment: Java/J2EE, Eclipse, WebLogic Application Server, Oracle, JSP, HTML, JavaScript, JMS, Servlets, UML, XML, Struts, Web Services, WSDL, SOAP, UDDI
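Of the J2EE patterns listed in this role, Service Locator is the easiest to show in miniature: a central registry hides lookup and caching details from business code. The registry below caches services in a map; in a real J2EE application the lookup would wrap an InitialContext/JNDI call. All names here are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal Service Locator sketch: callers ask for a service by name instead
// of performing (and repeating) the JNDI lookup themselves. Registration
// stands in for the JNDI binding a container would normally provide.
public class ServiceLocator {
    private static final Map<String, Object> cache = new ConcurrentHashMap<>();

    public static void register(String name, Object service) {
        cache.put(name, service);
    }

    @SuppressWarnings("unchecked")
    public static <T> T lookup(String name) {
        Object service = cache.get(name);
        if (service == null) {
            throw new IllegalStateException("No service registered under: " + name);
        }
        return (T) service;
    }
}
```

Business Delegate and Session Façade layer on the same idea: the delegate uses the locator to find the façade, which in turn fronts the session EJBs.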
Confidential, St. Louis, MO
Java/ J2EE Application Developer
Responsibilities:
- Responsible for gathering and analyzing requirements and converting them into technical specifications
- Used Rational Rose for creating sequence and class diagrams
- Developed presentation layer using JSP, Java, HTML and JavaScript
- Used Spring Core Annotations for Dependency Injection
- Designed and developed ‘Convention Based Coding’ utilizing Hibernate’s persistence framework and O-R mapping capability to enable dynamic fetching and display of various table data with JSF tag libraries
- Designed and developed the Hibernate configuration and a session-per-request design pattern for database connectivity and transaction-scoped session access; used SQL for fetching and storing data in databases
- Participated in the design and development of database schema and entity-relationship diagrams of the backend Oracle database tables for the application
- Implemented web services using Apache Axis
- Designed and developed Stored Procedures and Triggers in Oracle to cater to the needs of the entire application
- Developed complex SQL queries for extracting data from the database
- Designed and implemented SOAP web service interfaces in Java
- Used Apache Ant for the build process
- Used ClearCase for version control and ClearQuest for bug tracking
Environment: Java, JDK 1.5, Servlets, Hibernate, Ajax, Oracle 10g, Eclipse, Apache Ant, Web Services (SOAP), Apache Axis, WebLogic Server, JavaScript, HTML, ClearCase, ClearQuest
Confidential
Junior Java Developer
Responsibilities:
- Developed the user interface screens using Swing for accepting various system inputs such as contractual terms, monthly data pertaining to production, inventory and transportation
- Involved in designing database connections using JDBC
- Involved in design and development of UI using HTML, JavaScript and CSS
- Involved in creating tables and stored procedures for data manipulation and retrieval using SQL Server 2005, and in database modification using SQL, PL/SQL, stored procedures, triggers, and views in Oracle
- Developed the business components (in core Java) for the calculation module (calculating various entitlement attributes)
- Involved in the logical and physical database design and implemented it by creating suitable tables, views and triggers
- Created the related procedures and functions used by JDBC calls in the above components
- Involved in fixing bugs and minor enhancements for the front-end modules
- Successfully migrated the model database from Oracle to DB2
- Created UNIX build script for Enterprise Data Translator
- Effectively used Log4j for logging, Bugzilla for bug tracking and JUnit for unit testing
Environment: Java, HTML, JavaScript, CSS, Oracle, JDBC, Swing, and Eclipse
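The Swing input screens described above (accepting contractual terms and monthly production data) amount to labeled form rows composed on panels. A minimal sketch of one such row follows; the field name and label are illustrative, not taken from the actual application.

```java
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JTextField;

// One input row of a Swing data-entry screen: a label paired with a text
// field on a panel. The real screens grouped many such rows for production,
// inventory, and transportation figures.
public class TermInputPanel extends JPanel {
    final JTextField termField = new JTextField(20);

    public TermInputPanel() {
        add(new JLabel("Contract term (months):"));
        add(termField);
    }
}
```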