Hadoop Developer Resume
New York City, NY
PROFESSIONAL SUMMARY
- Around 7 years of experience in various IT sectors such as banking, healthcare, and financial services, including hands-on experience in Big Data technologies.
- 3 years of experience as a Hadoop Developer in all phases of Hadoop and HDFS development.
- Hands-on experience with HDFS, MapReduce, and the Hadoop ecosystem (Pig, Hive, Oozie, HBase, ZooKeeper, and Sqoop).
- Experience in data warehousing and in extracting, transforming, and loading (ETL) data from various sources such as Oracle, Teradata, DB2, Microsoft Excel, and flat files into data warehouses and data marts using Informatica PowerCenter.
- Well versed in developing and implementing MapReduce jobs using Hadoop to work with Big Data.
- Experience with the Spark processing framework, including Spark SQL.
- Experience with NoSQL databases such as HBase and MongoDB.
- Experience with microservices, Spring Boot, and Spring Cloud.
- Procedural knowledge in cleansing and analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experienced in writing custom UDFs and UDAFs for extending Hive and Pig core functionalities.
- Ability to develop Pig UDFs to pre-process the data for analysis.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems (RDBMS), including Teradata.
- Skilled in creating workflows using Oozie for cron jobs.
- Strong experience in Hadoop Administration and Linux.
- Experienced with Java API and REST to access HBase data.
- Worked extensively with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Hands-on experience in Perl scripting and Python.
- Experience with AWS EC2, EMR, Lambda, and CloudWatch.
- Experience working with Java, J2EE, JDBC, ODBC, JSP, and Eclipse.
- Extensive experience with SQL, PL/SQL and database concepts.
- Expertise in debugging and performance tuning of Oracle and Java applications, with strong knowledge of Oracle 11g and SQL.
- Good experience working with distributions such as MapR, Hortonworks, and Cloudera.
- Experience in all stages of the SDLC (Agile, Waterfall), including writing technical design documents and the development, testing, and implementation of enterprise-level data marts and data warehouses.
- Good knowledge of Hadoop administration, including cluster configuration, single-node configuration, multi-node configuration, DataNode commissioning and decommissioning, NameNode backup and recovery, HBase, HDFS, and Hive configuration, cluster monitoring, and access control lists.
- Good interpersonal skills and ability to work as part of a team. Exceptional ability to learn and master new technologies and to deliver outputs on short deadlines.
Delivery Assurance - Quality Focused & Process Oriented:
- Ability to work in high-pressure environments delivering to and managing stakeholder expectations.
- Application of structured methods to: Project Scoping and Planning, risks, issues, schedules and deliverables.
- Strong analytical and problem-solving skills.
TECHNICAL SKILLS:
Technology: Hadoop Ecosystem, Spring Boot, Microservices, AWS, J2SE, J2EE, Oracle
Operating Systems: Windows Vista/XP/NT/2000 Series, UNIX/Linux (Ubuntu, CentOS, Red Hat).
DBMS/Databases: DB2, MySQL, SQL, PL/SQL
Programming Languages: C, C++, JSE, XML, Spring, HTML, JavaScript, jQuery, Web services.
Big Data Ecosystem: HDFS, MapReduce, Oozie, Hive/Impala, Pig, Sqoop, ZooKeeper, HBase, Spark, Scala
Methodologies: Agile, Waterfall
NoSQL Databases: MongoDB, HBase
Version Control Tools: SVN, CVS, VSS, PVCS
WORK EXPERIENCE:
Confidential, New York City, NY
Hadoop Developer
Responsibilities:
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
- Migrated an existing Java application to microservices using Spring Boot and Spring Cloud.
- Working knowledge of different IDEs such as Eclipse and Spring Tool Suite.
- Collected the required data from various sources, manipulated it using SQL, and performed ad hoc data analysis.
- Hands-on experience in creating API proxies in Apigee Edge using Node.js and JavaScript.
- Working knowledge of Git and Ant/Maven for project dependency management, build, and deployment.
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Good knowledge of data modeling, OLAP/OLTP systems, and generation of surrogate keys.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Worked as a part of AWS build team.
- Implemented read preferences in MongoDB replica sets.
- Designed logical data warehouse models and specific data marts to house data from different applications and meet the reporting needs of the financial data warehouse.
- Created, configured, and managed S3 buckets (storage).
- Utilized R for modeling and visualization.
- Skilled in deploying, configuring, and administering Splunk clusters.
- Developed customized shell scripts to install, manage, and configure multiple instances of Splunk forwarders, indexers, search heads, and deployment servers.
- Developed customized application configurations in Splunk to parse and index multiple log formats across all application environments.
- Imported data from Excel, flat, and XML files into SQL Server using the SSIS ETL tool; imported clinical data from databases, flat files, and Excel files into SAS, Python, R, PowerShell, Tableau, and Power BI.
- Experience with AWS EC2, EMR, Lambda, and CloudWatch.
- Imported data from sources such as HDFS and HBase into Spark RDDs.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Implemented Spark RDD transformations and actions for business analysis.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on partitioning Hive tables and running the scripts in parallel to reduce their run time.
- Worked with data serialization formats, including Avro, Parquet, JSON, and CSV, to convert complex objects into bit sequences.
- Responsible for analyzing and cleansing raw data by performing Hive/Impala queries and running Pig scripts on data.
- Administered, installed, upgraded, and managed distributions of Hadoop, Hive, and HBase.
- Involved in performance troubleshooting and tuning of Hadoop clusters.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Implemented business logic by writing Hive UDFs in Java.
- Developed shell scripts and some Perl scripts based on user requirements.
- Wrote XML scripts to build Oozie functionality.
- Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
- Worked extensively on creating end-to-end data pipeline orchestration using Oozie.
- Built datasets, lenses, and visualization charts/graphs in the Platfora environment.
- Evaluated the suitability of Hadoop and its ecosystem for the project, implementing and validating various proof-of-concept (POC) applications for eventual adoption under the Big Data Hadoop initiative.
Environment: MapReduce, HDFS, Spring Boot, Microservices, AWS, Hive, Pig, SQL, Sqoop, Oozie, Shell scripting, Cron Jobs, Apache Kafka, J2EE.
Confidential, Boston
Hadoop Developer
Responsibilities:
- Responsible for installation and configuration of Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
- Built microservices for the delivery of software products across the enterprise.
- Developed a strategy for integrating the internal security model into new projects with Spring Security and Spring Boot.
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Imported data from sources such as HDFS and HBase into Spark RDDs.
- Wrote a Python script to provide data objects as required by the data analytics engine, pulling data from MongoDB and MySQL databases.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Implemented Spark RDD transformations and actions for business analysis.
- Created a process to run data quality checks on data processed in the ETL and loaded into the data warehouse prior to hand-off to the reporting process.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Installed, upgraded, and managed Hadoop clusters.
- Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
- Administered, installed, upgraded, and managed distributions of Hadoop, Hive, and HBase.
- Advanced knowledge of performance troubleshooting and tuning of Hadoop clusters.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on partitioning Hive tables and running the scripts in parallel to reduce their run time.
- Worked with data serialization formats, including Avro, Parquet, JSON, and CSV, to convert complex objects into bit sequences.
- Performed CRUD operations such as insert, update, and delete in MongoDB.
- Experience in deploying, managing, and developing MongoDB clusters in Linux and Windows environments.
- Worked on MongoDB database design and indexing techniques.
- Good knowledge of the MongoDB write concern "majority".
- Worked extensively on creating end-to-end data pipeline orchestration using Oozie.
- Experience with Amazon Web Services (AWS), including Elastic Compute Cloud (EC2), Simple Storage Service (S3), Elastic MapReduce (EMR), Amazon SimpleDB, Amazon CloudWatch, SNS, SQS, and Lambda.
- Evaluated the suitability of Hadoop and its ecosystem for the project, implementing and validating various proof-of-concept (POC) applications for eventual adoption under the Big Data Hadoop initiative.
Environment: Spring Boot, Microservices, AWS, MapReduce, HDFS, Hive, Pig, SQL, Sqoop, Oozie, Shell scripting, Cron Jobs, Perl scripting, Apache Kafka, J2EE.
Confidential
Java and Hadoop Developer
Responsibilities:
- Responsible for installation and configuration of Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
- Involved in designing the Cassandra data model; used CQL (Cassandra Query Language) to perform CRUD operations on the Cassandra file system.
- Designed logical data warehouse models and specific data marts to house data from different applications and meet the reporting needs of the data warehouse.
- Performed end-to-end ETL development and ETL to Hadoop for the PBM Data Conversion Project.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Developed Spark scripts using Scala shell commands as per requirements.
- Developed and implemented core API services using Scala and Spark.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on partitioning Hive tables and running the scripts in parallel to reduce their run time.
- Worked with data serialization formats, including Avro, Parquet, JSON, and CSV, to convert complex objects into bit sequences.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Installed, upgraded, and managed Hadoop clusters.
- Administered, installed, upgraded, and managed distributions of Hadoop, Hive, and HBase.
- Advanced knowledge in performance troubleshooting and tuning Hadoop clusters.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
- Worked extensively on creating end-to-end data pipeline orchestration using Oozie.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Processed source data into structured data and stored it in the Cassandra NoSQL database.
- Created alter, insert and delete queries involving lists, sets and maps in Cassandra.
- Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services.
- Responsible for continuously monitoring and managing the Elastic MapReduce cluster through the AWS console.
- Evaluated the suitability of Hadoop and its ecosystem for the project, implementing and validating various proof-of-concept (POC) applications for eventual adoption under the Big Data Hadoop initiative.
Environment: MapReduce, HDFS, Hive, Pig, HBase, SQL, Sqoop, Flume, Oozie, Apache Kafka, ZooKeeper, J2EE, Eclipse, Cassandra.
Confidential
Java / J2EE Developer
Responsibilities:
- Developed the application using the Struts framework, which leverages the classical Model-View-Controller (MVC) architecture. Used UML diagrams such as use case, class, interaction (sequence and collaboration), and activity diagrams.
- Gathered business requirements and wrote functional specifications and detailed design documents.
- Extensively used Core Java, Servlets, JSP, and XML.
- Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for the Oracle 9i database.
- Implemented Enterprise Logging service using JMS and Apache CXF.
- Developed unit test cases and used JUnit for unit testing of the application.
- Implemented Framework Component to consume ELS service.
- Implemented JMS producer and consumer using Mule ESB.
- Wrote SQL queries, stored procedures, and triggers to perform back-end database operations
- Prepared low-level design documents for the ELS service.
- Developed SQL stored procedures and prepared statements for updating and accessing data from database.
- Carried out development in the Eclipse Integrated Development Environment (IDE).
- Used JBoss for deploying various components of application.
- Involved in Unit testing, Integration testing and User Acceptance testing.
- Used Java and SQL day to day to debug and fix issues with client processes.
Environment: Java, Spring Core, JBoss, JUnit, JMS, JDK, SVN, Maven, Servlets, JSP and XML
