Hadoop Developer Resume
Hoffman Estates, IL
SUMMARY
- Around 7 years of professional experience in the field of Information Technology, including analysis, design, development, and testing of complex applications.
- Working knowledge of all phases of the Software Development Life Cycle (SDLC). Ability to track projects from inception to deployment.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience in Hadoop cluster performance tuning based on gathering and analyzing metrics from the existing infrastructure.
- Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle (see the sketch following this summary).
- Expert in Hadoop distributions such as Cloudera and Hortonworks.
- Expert in migrating data between HDFS and relational database systems using Sqoop.
- Experience in importing and exporting data using stream-processing platforms such as Flume and Kafka.
- Experience in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Worked on extending Hive and Pig core functionality by writing custom UDFs.
- Experience working with technologies such as Teradata, Apache Tomcat, Apache Solr, and Elasticsearch.
- Experience in writing MapReduce programs and using the Apache Hadoop API to analyze logs.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Expertise in creating Hadoop clusters on AWS using Amazon EMR, Amazon EC2, and Amazon S3 buckets.
- Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.
- Excellent understanding of Amazon Web Services (AWS) offerings such as EC2, S3, EBS, RDS, and VPC.
- Experience in working with Flume to load the log data from multiple sources directly into HDFS.
- Experience in writing SQL and PL/SQL scripts & stored procedures for databases like Oracle 9i.
- Ability to quickly ramp up and start producing results with any given tool or technology.
- An individual with excellent communication skills and strong business acumen.
- Team player with creative problem-solving skills, technical competency, and leadership skills.
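For illustration, a minimal Spark-with-Scala sketch of the Spark-versus-Hive comparison claimed above: the same aggregation is run through Spark SQL and through the DataFrame API against a Hive-managed table. The table and column names (sales, category, amount) are hypothetical, and the timing is a rough wall-clock measure, not a benchmark.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    object SparkVsHiveComparison {
      def main(args: Array[String]): Unit = {
        // Hive support so Spark reads the same metastore tables Hive queries.
        val spark = SparkSession.builder()
          .appName("spark-vs-hive-comparison")
          .enableHiveSupport()
          .getOrCreate()

        // The same aggregation expressed two ways against a Hive-managed table.
        val viaSql = spark.sql(
          "SELECT category, SUM(amount) AS total FROM sales GROUP BY category")
        val viaDataFrame = spark.table("sales")
          .groupBy("category")
          .agg(sum("amount").alias("total"))

        // Force execution of each plan so the wall-clock times are comparable.
        val t0 = System.nanoTime(); viaSql.count()
        val t1 = System.nanoTime(); viaDataFrame.count()
        val t2 = System.nanoTime()

        println(f"SQL path:       ${(t1 - t0) / 1e9}%.2f s")
        println(f"DataFrame path: ${(t2 - t1) / 1e9}%.2f s")
        spark.stop()
      }
    }
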
TECHNICAL SKILLS
Hadoop Ecosystem: MapReduce, HDFS, Hive, Pig, Sqoop, ZooKeeper, Oozie, Flume, HBase, Spark, Hue, Kafka
Language: Python, R, Scala, SQL, PL/SQL
Framework: Spring, Hibernate, Struts, MVC
Methodologies: Agile, Scrum, Waterfall
Databases: Oracle 9i, MS SQL Server, MySQL, HBase
Application/Web servers: Apache Tomcat, Apache Solr, Elasticsearch, AWS
ETL Tool: Pentaho
Version Controls: SVN, CVS, Visual SourceSafe (VSS)
Operating System: Windows 98/NT/2000/2003/XP/7, Linux
PROFESSIONAL EXPERIENCE
Confidential, Hoffman Estates, IL
Hadoop Developer
Responsibilities:
- Involved in the full development cycle of planning, analysis, design, development, testing, and implementation
- Launched and set up a Hadoop/HBase cluster, which included configuring the different Hadoop and HBase components on Linux
- Loaded data from the UNIX local file system into HDFS
- Developed MapReduce programs in Java to parse raw data and populate staging tables
- Created Hive queries to compare raw data with EDW reference tables and perform aggregations
- Loaded and transformed large sets of structured, semi-structured, and unstructured data
- Responsible for developing a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster
- Worked on job management using the Fair Scheduler and developed job-processing scripts using Oozie workflows
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files
- Consumed data from Kafka using Apache Spark (see the sketch following this section)
- Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation, queries, and writing data out through Sqoop
- Installed Hive, created Hive tables, and performed data manipulation using HiveQL
- Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive
- Installed and configured ZooKeeper to coordinate and monitor cluster resources
- Worked on POCs with Apache Spark using Scala to introduce Spark into the project
- Collected log data from web servers and integrated it into HDFS using Flume
Environment: Core Java, Hadoop, Linux, Unix, Hive, HBase, HDFS, Flume, Sqoop, MapReduce, Oozie, Spark, ZooKeeper, Kafka
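A minimal sketch of the Kafka consumption with Spark noted above. The resume does not state the Spark version, so this assumes Spark Structured Streaming with the spark-sql-kafka connector available; the broker address, topic name, and HDFS paths are hypothetical.

    import org.apache.spark.sql.SparkSession

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-to-hdfs").getOrCreate()

        // Subscribe to the (hypothetical) weblogs topic; Kafka records arrive
        // as binary key/value pairs, so cast them to strings.
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "weblogs")
          .load()
          .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

        // Land each micro-batch in HDFS as Parquet; the checkpoint directory
        // tracks the consumed Kafka offsets so the job can resume after failure.
        val query = events.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/weblogs")
          .option("checkpointLocation", "hdfs:///checkpoints/weblogs")
          .start()

        query.awaitTermination()
      }
    }
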
Confidential - Norwalk, CT
Hadoop/AWS Developer
Responsibilities:
- Responsible for coding MapReduce programs and Hive queries, and for testing and debugging the MapReduce programs
- Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment, using both RDBMS and NoSQL data stores for data access and analysis
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing
- Designed and developed Pentaho jobs and transformations to load data into dimension and fact tables
- Created Hive tables, loaded the data, and performed data manipulation using Hive queries in MapReduce execution mode
- Executed Hive queries on Parquet tables to perform data analysis and meet business requirements
- Implemented partitioning, dynamic partitions, and bucketing in Hive (see the sketch following this section)
- Consumed data from Kafka using Apache Spark
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters
- Handled data manipulation using Python scripts
- Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data
- Extended Hive and Pig core functionality with custom User-Defined Functions (UDFs), User-Defined Table-Generating Functions (UDTFs), and User-Defined Aggregate Functions (UDAFs) written in Python
- Responsible for developing a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS
- Worked with Apache Solr and Elasticsearch
- Implemented test scripts to support test driven development and continuous integration
- Used reporting tools such as Tableau, connected to Hive, to generate daily data reports
Environment: Hadoop, Spark, Kafka, Apache Solr, Elasticsearch, AWS, Python, MapReduce, HDFS, Hive, Java (JDK 1.6), Oozie, Pentaho.
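A minimal sketch of the Hive partitioning and dynamic-partition work noted above, issued here through Spark SQL with Hive support; table and column names are hypothetical. The bucketed (CLUSTERED BY ... INTO n BUCKETS) variant is omitted because Spark does not write Hive-compatible buckets, so bucketed tables would typically be loaded from Hive itself.

    import org.apache.spark.sql.SparkSession

    object HiveDynamicPartitionLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-dynamic-partition-load")
          .enableHiveSupport()
          .getOrCreate()

        // Partition by load date so queries can prune to only the days they need.
        spark.sql("""
          CREATE TABLE IF NOT EXISTS orders_part (
            order_id BIGINT,
            customer_id BIGINT,
            amount DOUBLE)
          PARTITIONED BY (load_date STRING)
          STORED AS PARQUET""")

        // Dynamic partitioning: Hive derives load_date from the last SELECT column.
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql("""
          INSERT OVERWRITE TABLE orders_part PARTITION (load_date)
          SELECT order_id, customer_id, amount, load_date
          FROM orders_staging""")

        spark.stop()
      }
    }
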
Confidential
SQL Developer
Responsibilities:
- Involved in the Software Development Life Cycle (SDLC) and created UML diagrams such as Use Case, Class, and Sequence diagrams to represent detail in the design phase.
- Created new tables, views, indexes and user defined functions.
- Performed daily database backups and restorations and monitored database server performance.
- Actively designed the database to speed up certain daily jobs and stored procedures.
- Optimized query performance by creating indexes.
- Developed stored procedures and views to supply data for all reports; used complex formulas to show derived fields and to format data based on specific conditions.
- Involved in SQL Server administration: created users and login IDs with appropriate roles and granted privileges to users and roles. Worked on authentication modules to provide controlled access to users across various modules.
- Created joins and sub-queries for complex queries involving multiple tables.
- Developed stored procedures and triggers using PL/SQL to calculate values and update tables implementing business logic.
- Responsible for report generation using SQL Server Reporting Services (SSRS) and Crystal Reports based on business requirements.
- Developed complex SQL queries, including stored procedures and triggers, to perform efficient data retrieval operations.
- Designed and Implemented tables and indexes using SQL Server.
Environment: Eclipse, Java/J2EE, Oracle, HTML, PL/SQL, XML, SQL
Confidential
Programmer Analyst
Responsibilities:
- Developed SQL scripts to perform various joins, sub-queries, and nested queries, and to insert, update, and delete data in MS SQL database tables.
- Applied modeling principles, database design, and programming, creating E-R diagrams and data relationships to design the database.
- Wrote PL/SQL and developed and implemented stored procedures, packages, and triggers.
- Responsible for designing advanced SQL queries, cursors, and triggers.
- Built data connections to the database using MS SQL Server.
- Worked on a project to extract data from XML files into SQL tables and generate file-based reports using SQL Server 2008.
- Utilized the Tomcat web server for development purposes.
- Involved in creation of test cases and performing unit testing.
Environment: PL/SQL, MySQL, SQL Server 2008 (SSRS & SSIS), Visual Studio 2000/2005, MS Excel
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in the requirements gathering, analysis, design, and testing phases.
- As part of the Design phase, designed state, class, and sequence diagrams using Astah Professional.
- Worked with Scrum methodologies.
- Coded Struts Action classes and Model classes.
- Developed DAO classes using JDBC API and wrote SQL queries to interact with Oracle Database.
- Handled all bug fixes and enhancements.
- Hands-on experience with the JUnit framework and EasyMock.
- Utilized Log4j for logging and PuTTY to check server logs.
- Used SoapUI tool to invoke the Web services.
- Used Apache Ant as the build tool, developed the build file, and used CVS as the repository.
Environment: Java 1.5, J2EE, Struts 1.2, JavaScript, JDBC, Log4j, SOAP, JUnit, WebSphere.