Hadoop Developer Resume
New York City
SUMMARY:
- Senior Software Engineer with 7+ years of professional IT experience, including 3+ years of Big Data ecosystem experience in the ingestion, storage, querying, processing, and analysis of big data.
- Over three years of experience in the design, development, maintenance, and support of Big Data analytics using Hadoop ecosystem components such as HDFS, Hive, Pig, HBase, Sqoop, Flume, ZooKeeper, Spark, MapReduce, and Oozie.
- Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.
- Strong analytical skills with the ability to quickly understand clients' business needs; participated in meetings to gather information and requirements, led the team, and coordinated onsite and offshore work.
- Experience includes requirements gathering/analysis, design, development, versioning, integration, documentation, testing, build, and deployment.
- Experience working with MapReduce programs, Pig scripts, and Hive commands to deliver the best results.
- Good knowledge of building event-processing data pipelines using Kafka and Storm (a minimal producer sketch follows this summary).
- Familiar with the NoSQL big-data database HBase, as well as MongoDB and Cassandra.
- Good knowledge of and experience with Hive query optimization and performance tuning.
- Hands-on experience writing Pig Latin scripts and custom implementations using UDFs.
- Experience supporting data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud; performed export and import of data into S3.
- Experience importing and exporting data into HDFS and Hive using Sqoop.
- Experience using Flume to load log files into HDFS and Oozie for data scrubbing and processing.
- Worked on developing, monitoring, and scheduling jobs using UNIX shell scripting.
- Experienced in installing, configuring, and administering Hadoop clusters.
- Hands-on experience with Hadoop administration, configuration management, monitoring, debugging, and performance tuning.
- Experience tuning Hadoop clusters to achieve good processing performance.
- Experience upgrading existing Hadoop clusters to the latest releases.
- Experience in data integration between Pentaho and Hadoop.
- Experienced in using NFS (Network File System) for NameNode metadata backup.
- Well trained in problem-solving techniques, operating system concepts, programming basics, structured programming, and RDBMS.
- Technical professional with management skills, excellent business understanding, and strong communication skills.
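Illustrative sketch for the Kafka/Storm bullet above: a minimal Java producer publishing one event. The broker address, topic name, key, and payload are hypothetical placeholders, not details from the engagements listed below.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class EventPublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Broker address, topic, and payload below are placeholders.
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            Producer<String, String> producer = new KafkaProducer<String, String>(props);
            // Publish one event; a Storm topology would consume it downstream.
            producer.send(new ProducerRecord<String, String>("clickstream", "user-42", "page_view"));
            producer.close();
        }
    }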
TECHNICAL SKILLS:
Language: C, C++, Java, J2EE, Python, UML
Hadoop Ecosystem: MapReduce, Sqoop, Hive, Pig, HBase, HDFS, ZooKeeper, Spark
Application/Web Servers: Apache Tomcat 5.x/6.x, WebSphere, JBoss 4.2.3, WebLogic
Event Processing: Kafka, Storm
NoSQL Databases: Cassandra, MongoDB.
Databases: Oracle 9i, MySQL
Cloud Computing: Amazon Web Services (EC2, EMR, S3, RDS)
Build and Management Tools: Maven, Ant, Jenkins
Database Tools: TOAD, SQL Developer, SQL Workbench
Web Technologies: jQuery, Applets, JavaScript, CSS, HTML, XHTML, AJAX, XML, XSLT
Development Tools: Eclipse, PuTTY, NetBeans
PROFESSIONAL EXPERIENCE:
Confidential - New York City
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytics tools, including MapReduce, Hive, and Spark.
- Prepared Linux shell scripts for automating the process.
- Implemented Impala for data analysis.
- Implemented Storm topologies to pre-process data before moving it into HDFS.
- Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables.
- Installed and configured a Hadoop cluster using Amazon Web Services (AWS) for POC purposes.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data with MapReduce, Hive, and Pig.
- Developed Sqoop commands to pull the data from Teradata.
- Implemented complex Hive UDFs to execute business logic within Hive queries (see the sketch after this list).
- Imported log files into HDFS using Flume and loaded them into Hive tables for querying.
- Developed Oozie workflows for job scheduling.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
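Illustrative sketch for the Hive UDF bullet above: a minimal Java UDF built on the classic org.apache.hadoop.hive.ql.exec.UDF base class. The class name and normalization rule are hypothetical, not the project's actual business logic.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: normalizes an id column before it is joined in HiveQL.
    public class NormalizeId extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;          // propagate NULLs the way built-in functions do
            }
            // Trim whitespace and upper-case so values match the reference data.
            return new Text(input.toString().trim().toUpperCase());
        }
    }

Such a UDF would be registered with ADD JAR and CREATE TEMPORARY FUNCTION before being called from a Hive query.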
Environment: Cloudera CDH5, HDFS, Hadoop 2.2.0 (YARN), Eclipse, Hive, Impala, Pig Latin, Spark, Sqoop, ZooKeeper, Apache Kafka, Apache Storm, MySQL.
Confidential - New York
Hadoop Developer
Responsibilities:
- Involved in extracting customers' big data from various data sources into Hadoop HDFS, including data from mainframes and databases as well as log data from servers.
- Along with the infrastructure team, designed and developed a Kafka- and Storm-based data pipeline that also made use of Amazon Web Services EMR, S3, and RDS (see the bolt sketch after this list).
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Created Hive tables as per requirements, as managed or external tables defined with appropriate static and dynamic partitions for efficiency.
- Implemented partitioning and bucketing in Hive for better organization of the data.
- Developed UDFs in Pig and Hive
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop as well as system specific jobs.
- Worked with BI teams to generate reports in Tableau.
- Installed and configured various components of Hadoop ecosystem and maintained their integrity.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Upgraded Hadoop Versions using automation tools.
- Deployed high availability on the Hadoop cluster using quorum journal nodes.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
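Illustrative sketch for the Kafka/Storm pipeline bullet above, assuming the backtype.storm (Storm 0.9.x) API of that era; the field layout, delimiter, and filtering rule are hypothetical placeholders rather than the project's actual logic.

    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // Hypothetical bolt: drops malformed log lines coming off a Kafka spout
    // before a downstream bolt writes the clean records toward HDFS.
    public class CleanseBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String line = tuple.getString(0);
            // Forward only well-formed, pipe-delimited records (field count is a placeholder).
            if (line != null && line.split("\\|").length == 5) {
                collector.emit(new Values(line));
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("line"));
        }
    }

In a full topology, a bolt like this would be wired between a Kafka spout and an HDFS-writing bolt using TopologyBuilder.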
Environment: Amazon Web Services, Hadoop, MapReduce, Kafka, Storm, Oozie, Sqoop, Hive, Pig, ZooKeeper, Tableau, and Oracle.
Confidential
Hadoop Developer
Responsibilities:
- Developed MapReduce jobs in Java for data cleansing and preprocessing (see the sketch after this list).
- Moved data from DB2 and Oracle Exadata to HDFS and vice versa using Sqoop.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked with different file formats and compression techniques to determine standards
- Developed data pipelines using Pig and Hive from Teradata and DB2 data sources; these pipelines used customized UDFs to extend the ETL functionality.
- Developed Hive queries and UDFs to analyze and transform the data in HDFS.
- Developed Hive scripts implementing control-table logic in HDFS.
- Designed and implemented static and dynamic partitioning and bucketing in Hive.
- Developed Pig scripts and user-defined functions (UDFs) as per the business logic.
- Analyzed and transformed data with Hive and Pig.
- Developed Oozie workflows, scheduled to run monthly through a scheduler.
- Designed and developed read lock capability in HDFS.
- Implemented a Hadoop float equivalent to the DB2 decimal type.
- Involved in end-to-end implementation of ETL logic.
- Coordinated effectively with the offshore team and managed project deliverables on time.
- Worked on QA support, test data creation, and unit testing activities.
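Illustrative sketch for the data-cleansing bullet above: a map-only Hadoop 2.x job that drops rows failing a simple validity check. The delimiter, column count, and class names are hypothetical placeholders.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CleanseJob {

        public static class CleanseMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] cols = value.toString().split(",", -1);
                // Keep only rows with the expected column count and a non-empty key column.
                if (cols.length == 8 && !cols[0].isEmpty()) {
                    context.write(NullWritable.get(), value);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "cleanse");
            job.setJarByClass(CleanseJob.class);
            job.setMapperClass(CleanseMapper.class);
            job.setNumReduceTasks(0);                 // map-only: cleansed rows go straight to HDFS
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

A typical invocation would be along the lines of: hadoop jar cleanse.jar CleanseJob <input path> <output path>.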
Environment: Hadoop, MapReduce, Sqoop, Hive, Pig Latin, ZooKeeper, Oozie, Teradata, and Eclipse.
Confidential
Sr. Java Developer
Responsibilities:
- Key responsibilities included requirements gathering and designing and developing the Java application.
- Implemented design patterns and object-oriented Java design concepts to build the code.
- Participated in planning and development of UML diagrams like Use Case Diagrams, Object Diagrams, Class Diagrams and Sequence Diagrams to represent the detail design phase.
- Identified and fixed transactional issues due to incorrect exception handling and concurrency issues due to unsynchronized blocks of code.
- Created Java application module for providing authentication to the users for using this application and to synchronize handset with the Exchange server.
- Performed unit testing, system testing, and user acceptance testing.
- Involved in Analysis, Design, Coding and Development of custom Interfaces.
- Gathered requirements from the client for designing the Web Pages.
- Gathered specifications for the Library site from different departments and users of the services.
- Assisted in proposing suitable UML class diagrams for the project.
- Wrote SQL scripts to create and maintain the database, roles, users, tables, views, procedures and triggers.
- Designed and implemented the UI using HTML and Java.
- Strong knowledge of the MVC design pattern.
- Worked on the database interaction layer for insert, update, and retrieval operations on data.
- Implemented multi-threading functionality using the Java threading API (see the sketch after this list).
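Illustrative sketch for the multi-threading bullet above, using an ExecutorService from java.util.concurrent; the squaring task is a placeholder for the real application logic.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelTasks {
        public static void main(String[] args) throws Exception {
            // Fixed pool: four worker threads process independent tasks in parallel.
            ExecutorService pool = Executors.newFixedThreadPool(4);
            List<Future<Integer>> results = new ArrayList<Future<Integer>>();
            for (int i = 1; i <= 10; i++) {
                final int n = i;
                results.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        return n * n;   // placeholder work item
                    }
                }));
            }
            for (Future<Integer> f : results) {
                System.out.println(f.get());   // get() blocks until that task finishes
            }
            pool.shutdown();
        }
    }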
Environment: Java, JDBC, HTML, SQL, Oracle, IBM Rational Rose, Eclipse IDE, LDAP
Confidential
Software Developer
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project.
- Implemented the presentation layer with HTML, XHTML and JavaScript.
- Developed web components using JSP, Servlets, and JDBC (see the JDBC sketch after this list).
- Implemented database using SQL Server.
- Designed tables and indexes.
- Wrote complex SQL and stored procedures.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Studied and analyzed 1404 UNIX scripts scheduled in crontab to determine the migration plan across the 46 database schemas; candidate applications were analyzed.
- Studied and analyzed the integration of the 1404 jobs in Control-M in the infodev environment.
- Studied and analyzed the integration of the SFTP implementation for these 1404 jobs.
- Defined an approach for identifying the migration path of existing UNIX jobs scheduled in crontab to the Control-M scheduler.
- Performed script migration of all jobs that are part of crontab and migrated selected scripts from FTP to SFTP (Secure File Transfer Protocol).
- Used a tool- and template-based approach to perform the migration.
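Illustrative sketch of the JDBC access pattern behind the web components above; the table, column names, and connection URL are hypothetical placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Hypothetical DAO: a parameterized query keeps user input out of the SQL string.
    public class OrderDao {
        private final String url;        // e.g. "jdbc:sqlserver://host:1433;databaseName=orders"
        private final String user;
        private final String password;

        public OrderDao(String url, String user, String password) {
            this.url = url;
            this.user = user;
            this.password = password;
        }

        public String findStatus(int orderId) throws SQLException {
            Connection con = DriverManager.getConnection(url, user, password);
            try {
                PreparedStatement ps = con.prepareStatement(
                        "SELECT status FROM orders WHERE order_id = ?");
                ps.setInt(1, orderId);
                ResultSet rs = ps.executeQuery();
                return rs.next() ? rs.getString("status") : null;
            } finally {
                con.close();   // always release the connection
            }
        }
    }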
Environment: Java 1.6, UNIX Shell Scripting, MS SQL Server, Eclipse, Putty and Win-SCP.