Hadoop Developer Resume
New York City, NY
PROFESSIONAL SUMMARY:
- Over 7 years of extensive software development experience across the full development life cycle, including more than 3 years as a Hadoop Developer focusing on various Big Data technologies.
- Extensive experience in development of Big Data projects using Hadoop, MapReduce, Pig, Hive, Sqoop, Flume, Oozie, Kafka, and Cassandra.
- Experience in installation, configuration, supporting and managing Hadoop clusters.
- Good experience working with the Hortonworks and Cloudera distributions.
- Implemented standards and processes for Hadoop based application design and implementation.
- Responsible for writing MapReduce programs using Java.
- Worked on the logical implementation of and interaction with HBase.
- Developed MapReduce jobs to automate data transfer from HBase.
- Performed data analysis using Hive and Pig.
- Loaded streaming log data from various webservers into HDFS using Flume.
- Successfully loaded files to Hive and HDFS from Oracle and SQL Server using SQOOP.
- Assist with the addition of Hadoop processing to the IT infrastructure.
- Worked in multiple environments on installation and configuration.
- Documented and explained implemented processes and configurations during upgrades.
- Support development, testing, and operations teams during new system deployments.
- Evaluate and propose new tools and technologies to meet the needs of the organization.
- Experience in using Sqoop, Oozie and Cloudera Manager.
- Strong understanding of data warehouse concepts, ETL, Star and Snowflake schemas; data modeling experience using Normalization, Business Process Analysis, Reengineering, Dimensional Data Modeling, and physical and logical data modeling.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Implemented stand-alone installation, file system management, backups, process control, user administration and device management in a networked environment.
- Developed core modules in large cross-platform applications using Java, J2EE, Spring, Struts, Hibernate, JAX-WS Web Services, and JMS.
- Worked on debugging tools such as DTrace, truss and top. Expert in setting up SSH, SCP, SFTP connectivity between UNIX hosts.
- Comfortable working with different methodologies such as Agile, Waterfall and Scrum.
- An excellent team player and self-starter with good communication skills and proven abilities to finish tasks before target deadlines.
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, and ZooKeeper.
NoSQL Databases: HBase, Cassandra, MongoDB
Languages: C, C++, Java, J2EE, PL/SQL, Pig Latin, HiveQL, UNIX shell scripts
Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery
Frameworks: MVC, Struts, Spring, Hibernate
Operating Systems: Sun Solaris, HP-UX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata
Tools and IDE: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer
PROFESSIONAL EXPERIENCE:
Confidential, New York City, NY
Hadoop Developer
Responsibilities:
- Developed MapReduce programs in Java for Data Analysis.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Developed HQL for the analysis of semi-structured data.
- Handled the installation and configuration of a Hadoop cluster.
- Built and maintained scalable data pipelines using the Hadoop ecosystem and other open-source components such as Hive and Cassandra (used in place of HBase).
- Created Hive tables and was involved in data loading and in writing Hive UDFs.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Handled the data exchange between HDFS and different web sources using Flume and Sqoop.
- Installed Kafka on the Hadoop cluster and wrote producer and consumer code in Java to move tweets with popular hashtags from the Twitter source into HDFS (a producer sketch follows this section).
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
- Provided input to development on efficient utilization of resources such as memory and CPU, based on the running statistics of Map and Reduce tasks.
- Adjusted cluster configuration properties based on the volume of data being processed and the performance of the cluster.
- Set up automated processes to analyze the System and Hadoop log files for predefined errors and send alerts to appropriate groups.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of MapReduce jobs.
- Wrote Oozie workflow scripts to run multiple MapReduce, Hive and Pig jobs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Supported in setting up QA environment and updating configurations.
Environment: Hadoop (CDH), MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Oozie, Java, SQL, Eclipse, Kafka, Cassandra.
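A minimal sketch of the kind of Kafka producer described above, using the kafka-clients Java API; the broker address, topic name, and hard-coded tweet record are illustrative placeholders, not the actual project code:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TweetProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // hypothetical broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // In the real pipeline the tweet JSON would come from the Twitter source;
            // a single hard-coded record stands in for it here.
            String hashtag = "#bigdata";
            String tweetJson = "{\"tag\":\"#bigdata\",\"text\":\"sample tweet\"}";
            producer.send(new ProducerRecord<>("tweets", hashtag, tweetJson));
        }
        // A separate consumer (or an HDFS sink) would land the topic's records in HDFS.
    }
}
```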
Confidential, Bluebell, PA
Hadoop Developer
Responsibilities:
- Responsible for installing and configuring Hadoop MapReduce and HDFS; also developed various MapReduce jobs for data cleaning (a minimal mapper sketch follows this section)
- Installed and configured Hive to create tables for the unstructured data in HDFS
- Gained good expertise in major Hadoop ecosystem components, including Hive, Pig, HBase, HBase-Hive integration, Sqoop and Flume.
- Involved in loading data from UNIX file system to HDFS
- Responsible for managing and scheduling jobs on Hadoop Cluster
- Responsible for importing and exporting data into HDFS and Hive using Sqoop
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data
- Experienced in managing Hadoop log files
- Worked on managing data coming from different sources
- Wrote HQL queries to create tables and loaded data from HDFS to give it structure
- Loaded and transformed large sets of structured, semi-structured and unstructured data
- Extensively worked on Hive to transform files from different analytical formats into .txt (text) files, making the data viewable for further analysis
- Created Hive tables, loaded them with data and wrote Hive queries that internally run as MapReduce jobs
- Wrote and modified stored procedures to load and modify data according to the project requirements
- Responsible for developing Pig Latin scripts to extract data from the web server output files and load it into HDFS
- Extensively used Flume to collect the log files from the web servers and then integrated these files into HDFS
- Responsible for configuring schedulers on the JobTracker so that available cluster resources are used effectively for any given MapReduce job.
- Constantly worked on tuning the performance of Hive and Pig queries to speed up data processing and retrieval
- Supported MapReduce programs running on the cluster
- Created external tables in Hive and loaded the data into these tables
- Hands on experience in database performance tuning and data modeling
- Monitored the cluster coordination using ZooKeeper
Environment: Hadoop, HDFS, MapReduce, Hortonworks Hadoop distribution, Hive, Cloudera, MapR, Java (JDK 1.6), DataStax, flat files, UNIX shell scripting, Oracle 11g/10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, Pig, Sqoop, HBase, Ubuntu, Red Hat Linux.
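A minimal sketch of the sort of data-cleaning mapper described above, using the standard org.apache.hadoop.mapreduce API; the delimiter and expected field count are illustrative assumptions rather than the actual record layout:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/** Drops blank or malformed records and passes the rest through unchanged. */
public class CleaningMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    private static final int EXPECTED_FIELDS = 5;   // assumed record width
    private final Text cleaned = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString().trim();
        if (line.isEmpty()) {
            return;                                  // skip blank lines
        }
        String[] fields = line.split("\\t");
        if (fields.length != EXPECTED_FIELDS) {
            context.getCounter("cleaning", "malformed").increment(1);
            return;                                  // skip malformed records
        }
        cleaned.set(line);
        context.write(cleaned, NullWritable.get()); // map-only job, no reducer needed
    }
}
```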
Confidential, Cincinnati, OH
Hadoop Developer
Responsibilities:
- Involved in capacity planning for the Big Data platform
- Developed MapReduce programs in Java for data cleaning and pre-processing
- Created MapReduce jobs for data transformations and data parsing
- Created Hive scripts for extracting summarized information from Hive tables
- Created Hive UDFs to extract data from staging tables (a minimal UDF sketch follows this section)
- Involved in creating Hive tables, loading data and querying data
- Design and implementation of High Availability feature for Search Engine
- Performed volume testing to calculate the cluster's throughput
- Helped the team increase the cluster size from 22 to 30 nodes
- Provided solution architecture for proposals in web technologies and log file analysis of storage equipment data
- Maintained system integrity of all sub-components (primarily HDFS and MapReduce)
- Monitored system health and logs and responded to any warnings or failures
- Unit testing, volume testing and bug fixing
- Coordinated with the client and offshore counterparts
Environment: JDK 1.6, CentOS, Flume, HBase, Maven, MapReduce, Hadoop, Hive, Pig, Sqoop, Oozie, ZooKeeper
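A minimal sketch of a simple Hive UDF of the kind described above, using the classic org.apache.hadoop.hive.ql.exec.UDF base class; the extraction rule (pulling a key's value out of a delimited staging column) is an illustrative assumption:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/** Returns the value for a given key from a "k1=v1;k2=v2;..." staging column. */
public class ExtractField extends UDF {

    private final Text result = new Text();

    public Text evaluate(Text column, Text key) {
        if (column == null || key == null) {
            return null;                        // let Hive pass NULLs through
        }
        for (String pair : column.toString().split(";")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].equals(key.toString())) {
                result.set(kv[1]);
                return result;
            }
        }
        return null;                            // key not present in this row
    }
}
```

Such a UDF would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION pointing at the class; the jar and function names are illustrative.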
Confidential - Raleigh, NC
Java/J2ee Developer
Responsibilities:
- Involved in Presentation Tier Development using JSF Framework and ICE Faces tag Libraries.
- Involved in business requirement gathering and technical specifications.
- Implemented J2EE standards, MVC2 architecture using JSF Framework.
- Implemented Servlets, JSP and Ajax to design the user interface.
- Extensive experience in building GUI (Graphical User Interface) using JSF and ICE Faces.
- Developed Rich Enterprise Applications using ICE Faces and Portlets technologies.
- Experience using ICE Faces Tag Libraries to develop user interface components.
- Used JSF, JSP, JavaScript, HTML, and CSS for manipulating, validating, and customizing error messages in the user interface.
- Used EJBs (Session beans) to implement the business logic, JMS for communication for sending updates to various other applications and MDB for routing priority requests.
- All the Business logic in all the modules is written in core Java.
- Wrote web services using SOAP for sending data to and receiving data from the external interface.
- Developed a web-based reporting for monitoring system with HTML and Tiles using Struts framework.
- Middleware Services layer is implemented using EJB (Enterprise Java Bean - stateless) in WebSphere environment.
- Used Design patterns such as Business delegate, Service locator, Model View Controller, Session façade, DAO.
- Funds transfers are sent to another application asynchronously using JMS.
- Involved in implementing JMS (Java Messaging Service) for asynchronous communication.
- Involved in writing JMS publishers to post messages (a minimal publisher sketch follows this section).
- Involved in writing MDBs (Message Driven Beans) as subscribers.
- Created stored procedures using PL/SQL for data modification (DML insert, update, delete) in Oracle.
- Interaction with Oracle database is implemented using Hibernate.
Environment: J2EE, EJB, JSF, ICE Faces, Web Services, XML, XSD, Agile, Microsoft Visio, ClearCase, Oracle 9i/10g, WebLogic 8.1/10.3, RAD, Log4j, Servlets, JSP, UNIX.
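A minimal sketch of a JMS publisher along the lines described above, using the standard javax.jms 1.1 API; the JNDI names and XML payload are illustrative placeholders for the container-managed resources actually used:

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.jms.Topic;
import javax.naming.InitialContext;

public class FundsTransferPublisher {

    public void publish(String transferXml) throws Exception {
        InitialContext ctx = new InitialContext();
        // JNDI names are illustrative; the real ones come from the application server configuration.
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Topic topic = (Topic) ctx.lookup("jms/FundsTransferTopic");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer publisher = session.createProducer(topic);
            TextMessage message = session.createTextMessage(transferXml);
            publisher.send(message);   // asynchronous hand-off; MDB subscribers consume it
        } finally {
            connection.close();        // also closes the session and producer
        }
    }
}
```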
Confidential, Pittsburgh, PA
J2EE Developer
Responsibilities:
- Developed detail design document based on design discussions.
- Involved in designing the database tables and java classes used in the application.
- Involved in development, Unit testing and system integration testing of the travel network builder side of application.
- Involved in design, development and building the travel network file system to be stored in NAS drives.
- Set up the Linux environment to interact with the Route Smart library (.so) file and to perform NAS drive file operations using JNI (a minimal JNI bridge sketch follows this section).
- Implemented and configured Hudson as the continuous integration server and Sonar for maintaining code quality and removing redundant code.
- Worked with Route-smart C++ code to interact with Java application using SWIG and Java Native interfaces.
- Developed the user interface for requesting a travel network build using JSP and Servlets.
- Built business logic so that users can specify which version of the travel network files to use for the solve process.
- Used Spring DAO support to access data through the configured data source.
- Built an independent property sub-system to ensure that each request always picks up the latest set of properties.
- Implemented thread Monitor system to monitor threads. Used JUnit to do the Unit testing around the development modules.
- Wrote SQL queries and procedures for the application, interacted with third party ESRI functions to retrieve map data.
- Built and deployed JAR, WAR and EAR files on dev and QA servers.
- Bug fixing (Log4j for logging) and testing support after development.
- Prepared requirements and researched moving the map data to the Hadoop framework for future use.
Environment: Java 1.6.21, J2EE, Oracle 10g, Log4j 1.17, Windows 7 and Red Hat Linux, Subversion, ICEfaces 3, Spring 3.1.0, ESRI, WebLogic 10.3.5, Eclipse Juno, JUnit 4.8.2, Maven 3.0.3, Hudson 3.0.0 and Sonar 3.0.0
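A minimal sketch of the kind of JNI bridge used to call a native routing library from Java; the library name, native method signature, and paths are illustrative assumptions (the actual project used SWIG-generated wrappers around the Route Smart C++ code):

```java
/**
 * Illustrative JNI bridge to a native routing library.
 * Library name and native signature are assumptions, not the project's actual API.
 */
public class RouteSmartBridge {

    static {
        // Expects libroutesmart.so on java.library.path (e.g. a NAS-mounted directory).
        System.loadLibrary("routesmart");
    }

    /** Declared in Java, implemented in the native C++ library. */
    public native int buildTravelNetwork(String inputDir, String outputDir);

    public static void main(String[] args) {
        int status = new RouteSmartBridge().buildTravelNetwork("/mnt/nas/input", "/mnt/nas/output");
        System.out.println("native build returned " + status);
    }
}
```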
Confidential
Java Developer
Responsibilities:
- Technical responsibilities included high level architecture and rapid development.
- Designed the architecture following the J2EE MVC framework.
- Developed interfaces using HTML and JSP pages with Struts as the presentation view.
- Involved in designing & developing web-services using SOAP and WSDL.
- Developed and implemented Servlets running under JBoss.
- Used J2EE design patterns for the middle-tier development.
- Used J2EE design patterns and the Data Access Object (DAO) pattern for the business tier and integration tier of the project (a minimal DAO sketch follows this section).
- Created UML class diagrams that depict the code’s design and its compliance with the functional requirements.
- Developed various EJBs for handling business logic and data manipulations from database.
- Designed and developed the UI using Struts view components, JSP, HTML, CSS and JavaScript.
- Implemented CMP entity beans for persistence in the business logic implementation.
- Developed database interaction code against the JDBC API, making extensive use of SQL query statements and prepared statements.
- Involved in writing Spring configuration XML files containing bean declarations and declarations of other dependent objects.
- Inspection/Review of quality deliverables such as Design Documents.
- Involved in creating and running test cases for JUnit testing.
- Experience in implementing Web Services using SOAP, REST and XML/HTTP technologies.
- Used Log4j to print logging, debugging, warning and info messages on the server console.
- Wrote SQL Scripts, Stored procedures and SQL Loader to load reference data.
Environment: J2EE (Java Servlets, JSP, Struts), MVC Framework, Apache Tomcat, JBoss, Oracle 8i.
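A minimal sketch of the DAO pattern with JDBC prepared statements as described above; the table, columns, and connection handling are illustrative, not the project's actual schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Data Access Object isolating JDBC details from the business tier. */
public class CustomerDao {

    private final String url;
    private final String user;
    private final String password;

    public CustomerDao(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    /** Looks up a customer name by id using a prepared statement (parameters are bound, not concatenated). */
    public String findCustomerName(long customerId) throws SQLException {
        String sql = "SELECT name FROM customers WHERE id = ?";   // illustrative table/columns
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setLong(1, customerId);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}
```

In practice a container-managed DataSource would replace DriverManager; the direct connection here just keeps the sketch self-contained.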