Hadoop Developer Resume
Bethesda, MD
SUMMARY:
- Overall 9+ years of professional IT experience as a software developer with a background in analysis, development, integration and testing of applications.
- 3 years of experience as a Certified Hadoop Developer and Big Data analyst.
- Solid expertise in Hadoop internals and architecture, and in supporting ecosystem components such as Hive, Big SQL, Sqoop, Pig, Flume and Oozie.
- Experience with IBM InfoSphere BigInsights, InfoSphere Streams and Cloudera distributions.
- In addition to developing on the Hadoop ecosystem, good experience installing and configuring Cloudera's distribution and IBM BigInsights (2.X.X and 3.X.X).
- Good experience setting up and configuring Hadoop clusters on Amazon Web Services (EC2), with nodes running CentOS 5.4, 6.3 and RHEL.
- Adept at HiveQL, with good experience using time-based partitioning, dynamic partitioning and bucketing to optimize Hive queries; also used Hive's map join to speed up queries where possible.
- Used Hive to create tables in both delimited text storage format and binary storage format.
- Excellent working experience with the two popular Hadoop binary storage formats, Avro data files and SequenceFiles.
- Created Pig Latin scripts made up of a series of operations and transformations applied to the input data to produce the required output.
- Good experience with the range of Pig function types, including Eval, Filter, Load and Store functions.
- Good working experience using Sqoop to ingest data from RDBMSs into HDFS and vice versa; also experienced in using Sqoop direct mode with external tables to perform very fast data loads.
- Good working experience using DistCp to ingest data from the landing zone into HDFS.
- Good knowledge of ingesting log data into HDFS using Flume.
- Used the Oozie workflow engine to create workflow and coordinator jobs that schedule and execute Hadoop jobs such as MapReduce, Hive, Pig and Sqoop operations.
- Good knowledge of statistics, R and Python.
- Experienced in developing Java MapReduce programs on Apache Hadoop to analyze data as required (a minimal sketch follows this summary).
- Solid experience writing complex SQL queries; also experienced in working with NoSQL databases like Cassandra 2.1.
- Experienced in creative and effective front-end development using JSP, JavaScript, HTML5, DHTML, XHTML, Ajax and CSS.
- Expert level skills in programming with Struts Framework, Custom Tag Libraries and JSTL.
- Good experience with Hibernate and JPA for object-relational mapping; configured XML mapping files and integrated them with frameworks such as Spring and Struts.
- Working knowledge of databases such as Oracle 8i/9i/10g and MySQL.
- Extensive experience building and deploying applications on web/application servers such as WebLogic, WebSphere and Tomcat.
- Experience building, deploying and integrating applications with Ant and Maven.
- Experience developing logging standards and mechanisms based on Log4j.
- Strong work ethic with desire to succeed and make significant contributions to the organization
- Complementing my technical skills are my solid communication skills.
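A minimal sketch of the kind of Java MapReduce program described above; the StatusCountJob class, the tab-delimited input layout and the column positions are hypothetical, not taken from any specific project. It counts records per status code.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class StatusCountJob {

    // Emits (statusCode, 1) for each tab-delimited log line.
    public static class StatusMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text status = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length > 2) {        // skip malformed records
                status.set(fields[2]);      // assume the status code is the third column
                context.write(status, ONE);
            }
        }
    }

    // Sums the counts for each status code.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "status count");
        job.setJarByClass(StatusCountJob.class);
        job.setMapperClass(StatusMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```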
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Cassandra, Impala, Oozie, ZooKeeper, MapR, Amazon Web Services, EMR, MRUnit, Spark, Storm, Greenplum, Datameer, R, Ignite.
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans
IDEs: Eclipse, NetBeans
Frameworks: MVC, Struts, Hibernate, Spring
Programming languages: C, C++, Java, Python, Ant scripts, Linux shell scripts, R, Perl
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server, MongoDB, CouchDB, Graph DB
Web Servers: WebLogic, WebSphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
ETL Tools: Informatica, IBM InfoSphere, QlikView and Cognos
PROFESSIONAL EXPERIENCE:
Confidential, Bethesda, MD
Hadoop Developer
Responsibilities:
- Developed solutions to process data into HDFS, analyzed the data using MapReduce, Pig and Hive, and produced summary results from Hadoop for downstream systems.
- Involved in ETL, Data Integration and Migration.
- Hands-on experience with the Cognos 10.x/8.x suite (Framework Manager, Cognos Transformer, Cognos Connection, Report Studio, Query Studio, Business Insight, Business Insight Advanced, Analysis Studio and Event Studio), with expertise in metadata modeling: creating projects, preparing metadata, preparing the business view, creating and managing packages, setting security and publishing to the portal.
- Worked with Cognos Lifecycle Manager while migrating from Cognos 8.4 to Cognos 10.
- Responsible for managing data from multiple sources.
- Tested the Hadoop framework using MRUnit (a test sketch appears at the end of this section).
- Developed Pig and Hive queries as well as UDFs to pre-process the data for analysis.
- Ingested data into HDFS and Hive using Flume.
- Developed data-quality monitoring and systems software in Python with Flask, working on news content systems and infrastructure.
- Provided cluster coordination services through ZooKeeper.
- Responsible for architecting Hadoop clusters with CDH4 on CentOS, managing with Cloudera Manager.
- Used Sqoop extensively to import data from various systems and sources (such as MySQL) into HDFS.
- Applied Hive queries to analyze data stored in HBase through the Hive storage handler to meet the business requirements.
- Created components such as Hive UDFs to supply functionality missing from Hive for analytics.
- Hands on experience with NoSQL databases like HBase and Cassandra and Amazon Web Services.
- Used different file formats such as text files, SequenceFiles and Avro.
- Installed and configured Hadoop, MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
Environment: Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Oozie, Cassandra, Java, ZooKeeper, MySQL, Cognos.
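A minimal sketch of the kind of MRUnit test mentioned above; it exercises the hypothetical StatusMapper from the sketch after the summary, so the class names and sample input are illustrative only.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class StatusMapperTest {

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        // Wire the mapper under test into an MRUnit driver.
        mapDriver = MapDriver.newMapDriver(new StatusCountJob.StatusMapper());
    }

    @Test
    public void emitsStatusCodeForWellFormedRecord() throws Exception {
        mapDriver.withInput(new LongWritable(0), new Text("2015-01-01\tuser42\t200"))
                 .withOutput(new Text("200"), new IntWritable(1))
                 .runTest();
    }

    @Test
    public void ignoresMalformedRecord() throws Exception {
        // No output expected when the line has too few columns.
        mapDriver.withInput(new LongWritable(0), new Text("garbage-line"))
                 .runTest();
    }
}
```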
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using IBM BigInsights.
- Created and populated bucketed tables in Hive to allow for faster map-side joins, more efficient jobs and more efficient sampling; also partitioned data to optimize Hive queries.
- Experienced in loading data from landing zone to HDFS using DistCp.
- Experienced in working with various data sources such as DB2 and Oracle; successfully loaded files into HDFS from DB2 and subsequently loaded the data from HDFS into Hive tables.
- Handled regular imports of data from Oracle 10g into Hive tables using Sqoop, and later performed join operations on the Hive tables using Big SQL.
- Developed user-defined functions in Hive to work on multiple input rows and return an aggregated result based on the business requirement.
- Developed a MapReduce job to perform lookups of all entries based on a given key from a collection of MapFiles that were created from the data.
- Improved Hadoop job performance by using CombineFileInputFormat to ensure map tasks have sufficient data to process when there are large numbers of small files; also packaged collections of small files into a SequenceFile used as input to the MapReduce job (a packing sketch appears at the end of this section).
- Used the web console to gain insight into the MapReduce programs' CPU and heap usage.
- Implemented compression of map output using LZO fast compression to reduce the data written to disk and the data transferred across the network to the reducer nodes.
- Developed Pig Latin scripts to extract the data from the output files to load into HDFS.
- Continuously monitored and managed the Hadoop cluster using the web console.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Used the Oozie workflow engine to run multiple Hive and Pig jobs.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
Environment: IBM BigInsights 2.0 to 3.0, InfoSphere Streams, Sqoop, Big SQL, Flume, Oozie, Teradata, Oracle 10g, Java (JDK 1.6), Eclipse
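A minimal sketch of packing a directory of small files into a single SequenceFile before running MapReduce over it, as described above; the SmallFilePacker class name and the key/value layout (file name to raw bytes) are hypothetical.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path inputDir = new Path(args[0]);   // directory of small files
        Path packed = new Path(args[1]);     // output SequenceFile

        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(packed),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class));
        try {
            for (FileStatus status : fs.listStatus(inputDir)) {
                if (status.isFile()) {
                    // Read the whole small file into memory (they are small by assumption).
                    byte[] contents = new byte[(int) status.getLen()];
                    FSDataInputStream in = fs.open(status.getPath());
                    try {
                        in.readFully(0, contents);
                    } finally {
                        in.close();
                    }
                    // Key = file name, value = raw bytes of the file.
                    writer.append(new Text(status.getPath().getName()),
                                  new BytesWritable(contents));
                }
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}
```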
Confidential, Milwaukee, WI
Hadoop developer
Responsibilities:
- Collected the business requirement from the subject matter experts like data scientists and business partners.
- Installed and configured the Hadoop cluster and Hadoop ecosystem components.
- Installed and configured Cassandra; in-depth knowledge of Cassandra architecture, querying, and the read and write paths.
- Involved in NoSQL (DataStax Cassandra) database design, integration and implementation.
- Created alter, insert and delete statements involving lists, sets and maps in DataStax Cassandra (a CQL sketch appears at the end of this section).
- Hands-on experience loading data from the UNIX file system to HDFS; also performed parallel transfers of data from the landing zone to HDFS using DistCp.
- Experienced in loading and transforming large sets of structured and semi-structured data, brought in through Sqoop and placed in HDFS for further processing.
- Developed Pig scripts to arrange incoming data into a suitable, structured form before passing it on for analysis.
- Designed appropriate partitioning/bucketing schemas to allow faster data retrieval during analysis using Hive.
- Involved in processing data in Hive tables using Impala's high-performance, low-latency SQL queries.
- Transferred the analyzed data from HDFS to relational databases using Sqoop, enabling the BI team to visualize the analytics.
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Managing and scheduling Jobs on a Hadoop cluster using Oozie.
- Involved in creating Hive tables, loading data and running Hive queries on that data.
- Extensive working knowledge of partitioned tables, UDFs, performance tuning and compression-related properties in Hive.
- Involved in writing optimized Pig Latin scripts, as well as developing and testing them.
- Working knowledge of writing Pig Load and Store functions.
Environment: Hadoop, MapReduce, HDFS, CentOS 6.2, HBase, Hive, Pig, Oozie, Flume, Java (JDK 1.6), Eclipse
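A minimal sketch of the CQL collection statements described above, issued through the DataStax Java driver; the keyspace, table, column names and sample values are hypothetical.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class UserProfileCql {

    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        session.execute("CREATE KEYSPACE IF NOT EXISTS demo_ks WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 1}");

        // Table with list, set and map collection columns.
        session.execute("CREATE TABLE IF NOT EXISTS demo_ks.user_profile ("
                + " user_id text PRIMARY KEY,"
                + " emails set<text>,"
                + " recent_logins list<timestamp>,"
                + " preferences map<text, text>)");

        // Insert a row populating all three collection types.
        session.execute("INSERT INTO demo_ks.user_profile (user_id, emails, recent_logins, preferences) "
                + "VALUES ('u42', {'a@example.com'}, ['2015-06-01 12:00:00+0000'], {'theme': 'dark'})");

        // Alter the table, then update and delete individual collection elements.
        session.execute("ALTER TABLE demo_ks.user_profile ADD tags set<text>");
        session.execute("UPDATE demo_ks.user_profile SET emails = emails + {'b@example.com'} WHERE user_id = 'u42'");
        session.execute("DELETE preferences['theme'] FROM demo_ks.user_profile WHERE user_id = 'u42'");

        cluster.close();
    }
}
```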
Confidential, Burlington, MA
Java & J2EE developer
Responsibilities:
- Involved in development of business domain concepts into Use Cases, Sequence Diagrams, Class Diagrams, Component Diagrams and Implementation Diagrams.
- Responsible for analysis and design of the application based on MVC Architecture, using open source Struts Framework.
- Involved in configuring Struts, Tiles and developing the configuration files.
- Developed Struts Action classes and Validation classes using Struts controller component and Struts validation framework.
- Developed and deployed UI layer logic using JSP, XML, JavaScript and HTML/DHTML.
- Involved in Configuring web.xml and struts-config.xml according to the struts framework.
- Provided connections using JDBC to the database and developed SQL queries to manipulate the data.
- Developed DAOs using Spring's JdbcTemplate to run performance-intensive queries (a sketch appears at the end of this section).
- Developed Ant scripts for auto-generation and deployment of the web service.
- Wrote stored procedures and used Java APIs to call them.
- Developed various test cases such as unit tests, mock tests and integration tests using JUnit.
- Experience writing Stored Procedures, Functions and Packages.
- Used Log4j to perform logging in the applications.
Environment: Java, J2EE, Struts MVC, JDBC, JSP, JavaScript, HTML, Ant, WebSphere Application Server, Oracle, JUnit, Log4j, Eclipse
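A minimal sketch of the kind of DAO built on Spring's JdbcTemplate mentioned above; the ACCOUNT table, its columns and the Account value object are hypothetical.

```java
import java.math.BigDecimal;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;

public class AccountDao {

    // Simple value object used only for this sketch.
    public static class Account {
        private final long id;
        private final String owner;
        private final BigDecimal balance;

        public Account(long id, String owner, BigDecimal balance) {
            this.id = id;
            this.owner = owner;
            this.balance = balance;
        }

        public long getId() { return id; }
        public String getOwner() { return owner; }
        public BigDecimal getBalance() { return balance; }
    }

    private final JdbcTemplate jdbcTemplate;

    public AccountDao(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    // Maps one row of the ACCOUNT table to an Account object.
    private static final RowMapper<Account> ACCOUNT_MAPPER = new RowMapper<Account>() {
        public Account mapRow(ResultSet rs, int rowNum) throws SQLException {
            return new Account(rs.getLong("ACCOUNT_ID"),
                               rs.getString("OWNER_NAME"),
                               rs.getBigDecimal("BALANCE"));
        }
    };

    public Account findById(long accountId) {
        return jdbcTemplate.queryForObject(
                "SELECT ACCOUNT_ID, OWNER_NAME, BALANCE FROM ACCOUNT WHERE ACCOUNT_ID = ?",
                ACCOUNT_MAPPER, accountId);
    }

    public List<Account> findByOwner(String ownerName) {
        return jdbcTemplate.query(
                "SELECT ACCOUNT_ID, OWNER_NAME, BALANCE FROM ACCOUNT WHERE OWNER_NAME = ?",
                ACCOUNT_MAPPER, ownerName);
    }

    public int updateBalance(long accountId, BigDecimal newBalance) {
        return jdbcTemplate.update(
                "UPDATE ACCOUNT SET BALANCE = ? WHERE ACCOUNT_ID = ?",
                newBalance, accountId);
    }
}
```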
Confidential
Java & J2EE developer
Responsibilities:
- Implemented various J2EE design patterns for designing this application.
- Used the Business Delegate, Service Locator and DTO design patterns to design the web module of the application.
- Used the Factory and Singleton design patterns for implementing enterprise modules/DTOs.
- Developed the web interface using Struts, JavaScript, HTML and CSS.
- Extensively used the Struts controller component classes for developing the applications.
- Extensively used the Struts application resources properties file for error codes, view labels and product internationalization.
- Used RAD (Rational Application Developer 7.0) as the development platform.
- Used Struts 1.2, which provides its own controller component and integrates with other technologies to supply the model and the view; for the model, used Struts to interact with standard data access technologies such as JDBC and EJB.
- Used JavaBeans to store a number of different collections of attributes in the scopes defined by the JavaServer Pages (JSP) specification.
- Used the Struts framework to validate form data both in the user's browser and on the server side; Struts emits the JavaScript used for client-side validation (a form sketch appears after this section).
- Used JavaScript for web page validation and the Struts Validator for server-side validation of data.
- Consumed web services using Apache Axis.
- Used JDBC and Hibernate to connect to the Oracle database.
- Developed SQL stored procedures and prepared statements for updating and accessing data in the database.
- Involved in developing database specific data access objects (DAO) for Oracle.
- Used CVS for source code control and JUnit for unit testing.
- Deployed the entire application on WebSphere Application Server.
Environment: Java, J2EE, Struts MVC, JDBC, JSP, JavaScript, HTML, WebSphere Application Server, Oracle, JUnit and Log4j
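A minimal sketch of the kind of server-side Struts form validation described above; the RegistrationForm fields and message keys are hypothetical.

```java
import javax.servlet.http.HttpServletRequest;
import org.apache.struts.action.ActionErrors;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionMapping;
import org.apache.struts.action.ActionMessage;

public class RegistrationForm extends ActionForm {

    private String userName;
    private String email;

    public String getUserName() { return userName; }
    public void setUserName(String userName) { this.userName = userName; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }

    // Server-side validation; the error keys refer to entries in the
    // application resources properties file.
    @Override
    public ActionErrors validate(ActionMapping mapping, HttpServletRequest request) {
        ActionErrors errors = new ActionErrors();
        if (userName == null || userName.trim().length() == 0) {
            errors.add("userName", new ActionMessage("error.username.required"));
        }
        if (email == null || !email.contains("@")) {
            errors.add("email", new ActionMessage("error.email.invalid"));
        }
        return errors;
    }
}
```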