Sr. Hadoop Developer Resume
Charlotte, NC
PROFESSIONAL SUMMARY:
- Over 8 years of experience in the analysis, design, development, and implementation of web-based distributed applications.
- 4+ years of experience with Hadoop, HDFS, MapReduce, Sqoop, Pig, Hive, HBase, Linux, and UNIX.
- Experience working with Big Data Hadoop technologies including MapReduce, Hive, HBase, Pig, Spark, Scala, Kafka, Sqoop, Oozie, ZooKeeper, Storm, and HDFS.
- Worked on a Hadoop cluster with a current size of 56 nodes and 896 terabytes of capacity.
- Hands-on experience with job scheduling and coordination tools such as Oozie and ZooKeeper, as well as Kafka for messaging.
- Strong techno-functional skills for translating business scenarios into Big Data use cases.
- In-depth understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Capable of designing and architecting Hadoop applications and recommending the right solutions and technologies for each application.
- Developed many MapReduce programs in Java for data cleansing, data filtering, and data aggregation.
- Re-engineered many legacy mainframe applications onto Hadoop using the MapReduce API to reduce mainframe MIPS and storage costs.
- Developed UDFs in Java as needed for use in Pig and Hive queries (a minimal sketch follows this summary).
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Good knowledge of Hortonworks Data Platform 2.2.
- Hands-on experience with major components of the Hadoop ecosystem, including MapReduce, HDFS, Hive, Pig, Pentaho, HBase, Sqoop, Oozie, and Flume.
- Experience with the Oozie workflow engine, running workflow jobs with actions that launch Hadoop MapReduce and Pig jobs.
- Experience working with the NoSQL databases MongoDB and Apache Cassandra.
- Experience in managing and reviewing Hadoop Log files.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Good knowledge of installing, configuring, supporting, and managing Hadoop clusters and the underlying Big Data infrastructure using Apache and Hortonworks distributions.
- Good experience working with distributions such as MapR, Hortonworks, and Cloudera.
- Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
- Hands-on experience in designing and coding web applications using Core Java and J2EE technologies.
- Experienced in integrating various data sources such as RDBMS tables, spreadsheets, and text files using Java and shell scripting.
- Experience in working with Oracle and DB2.
- Experience in Web Services using XML, HTML and SOAP.
- Excellent Java development skills using J2EE, J2SE, Servlets, JUnit, JSP, JDBC.
- Familiarity with popular frameworks such as Struts, Hibernate, Spring MVC, and AJAX.
- Implemented proofs of concept on the Hadoop stack and various big data analytics tools.
- Experience in migrating data from different databases (e.g., VSAM, DB2, PL/SQL, and MySQL) to Hadoop.
- Experience in NoSQL databases like HBase, Cassandra, Redis and MongoDB.
- Experience in working with HP Profile & Project Management.
- Experience working with ClearCase, SVN, GitHub, and Perforce P4V.
- Committed to timely, quality work; a quick learner, able to adapt to new technologies and to work within a team as well as across teams.
- Defined and developed ETL processes to automate data conversions, catalog uploads, error handling, and auditing using Talend.
- Proven competencies: problem-solving and analytical skills, excellent presentation and documentation skills, application development, project management, and leadership.
- Highly motivated and a self-starter with effective communication and organizational skills, combined with attention to detail and business process improvements.
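To give a flavor of the Hive/Pig UDF work noted above, here is a minimal sketch of a Hive UDF in Java; the package, class name, and normalization rule are illustrative assumptions rather than code from any of the listed projects.

    // Minimal sketch of a Java UDF for Hive; the class and the normalization
    // rule are hypothetical, shown only to illustrate the approach.
    package com.example.hive.udf;

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Trims and normalizes free-text columns before they feed reporting tables.
    public final class NormalizeText extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            // Collapse runs of whitespace and upper-case the value.
            String cleaned = input.toString().trim().replaceAll("\\s+", " ");
            return new Text(cleaned.toUpperCase());
        }
    }

In Hive the packaged jar would be registered with ADD JAR and exposed through CREATE TEMPORARY FUNCTION before being used in queries.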
TECHNICAL SKILLS:
Technology: Hadoop Ecosystem / J2SE / J2EE / Oracle.
Operating Systems: Windows Vista/XP/NT/2000 Series, UNIX/Linux (Ubuntu, CentOS, Red Hat), AIX, Solaris.
DBMS/Databases: DB2, MySQL, SQL, PL/SQL.
Programming Languages: C, C++, Java SE, XML, JSP/Servlets, Struts, Spring, HTML, JavaScript, jQuery, Web services.
Big Data Ecosystem: HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, Flume, ZooKeeper, HBase, Storm, Kafka, Spark, Scala.
Methodologies: Agile, Waterfall.
NOSQL Databases: Cassandra, MongoDB, HBase.
Version Control Tools: SVN, CVS, VSS, PVCS.
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
Sr. Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Worked on analyzing the Hadoop stack and various big data analytics tools, including Kafka, Storm, Hive, Pig, HBase, and Sqoop.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Developed Sqoop scripts to import and export data from relational sources, and handled incremental loading of customer and transaction data by date.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Developed Spark scripts using the Scala shell as required.
- Developed and implemented core API services using Scala and Spark.
- Developed simple and complex MapReduce programs in Java for data analysis on different data formats (a representative cleansing mapper is sketched after this list).
- Experienced in managing and reviewing Hadoop log files.
- Extracted files from CouchDB through Sqoop, placed them in HDFS, and processed them.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Administered, installed, upgraded, and managed distributions of Hadoop, Hive, and HBase.
- Involved in troubleshooting and performance tuning of Hadoop clusters.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Implemented business logic by writing Hive UDFs in Java.
- Gained good experience with NoSQL databases.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Wrote XML workflow definitions to build Oozie functionality.
- Used Oozie operational services for batch processing and dynamic workflow scheduling.
- Worked extensively on creating end-to-end data pipeline orchestration using Oozie.
- Evaluated the suitability of Hadoop and its ecosystem for the project, implementing and validating various proof-of-concept (POC) applications before adopting them as part of the Big Data Hadoop initiative.
- Used Sqoop and mongodump to move data between MongoDB and HDFS.
- Worked closely with the business and analytics teams to gather system requirements.
- Ingested data into HDFS from various mainframe DB2 tables using Sqoop.
- Created an interface to convert mainframe (EBCDIC) data into ASCII.
- Developed workflows using custom MapReduce.
- Loaded data into HBase tables using Java MapReduce.
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MRUnit testing library (a sample test is sketched below).
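As referenced in the bullets above, the following is a minimal sketch of a data-cleansing mapper of the kind developed in this role; the pipe-delimited, five-column customer record layout is an assumption made only for illustration.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Drops malformed rows and re-emits valid records with trimmed fields.
    public class CleansingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

        private final Text outRecord = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");
            // Reject rows with the wrong column count or a missing customer id.
            if (fields.length != 5 || fields[0].trim().isEmpty()) {
                context.getCounter("cleansing", "bad_records").increment(1);
                return;
            }
            StringBuilder record = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) {
                    record.append('|');
                }
                record.append(fields[i].trim());
            }
            outRecord.set(record.toString());
            context.write(outRecord, NullWritable.get());
        }
    }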
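And a sample unit test for that mapper, assuming the "MR testing library" mentioned above refers to Apache MRUnit; the input records are made up for illustration.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Before;
    import org.junit.Test;

    public class CleansingMapperTest {

        private MapDriver<LongWritable, Text, Text, NullWritable> mapDriver;

        @Before
        public void setUp() {
            mapDriver = MapDriver.newMapDriver(new CleansingMapper());
        }

        @Test
        public void trimsFieldsOfValidRecord() throws IOException {
            mapDriver.withInput(new LongWritable(0), new Text(" 101 | John | Doe | NC | 2015-01-01 "))
                     .withOutput(new Text("101|John|Doe|NC|2015-01-01"), NullWritable.get())
                     .runTest();
        }

        @Test
        public void dropsMalformedRecord() throws IOException {
            // A row with too few columns should produce no output at all.
            mapDriver.withInput(new LongWritable(1), new Text("bad|row"))
                     .runTest();
        }
    }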
Environment: Java 6, Eclipse, Linux, Hadoop, HBase, Sqoop, Kafka, Pig, Hive, Cloudera Hadoop Distribution, HDFS, MapReduce, MongoDB, shell scripting, Flume.
Confidential, Frederick, MD
Sr. Hadoop Developer
Responsibilities:
- Responsible for requirements gathering and preparation of design documents.
- Involved in low-level design of MapReduce jobs, Hive queries, and shell scripts to process data.
- Worked on ETL scripts to pull data from DB2, Oracle, and MS SQL Server databases into HDFS.
- Worked extensively on creating MapReduce jobs to prepare data for search and aggregation.
- Designed a data warehouse using Hive.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Extensively used Pig for data cleansing.
- Imported real-time data into Hadoop using Kafka and implemented the corresponding Oozie jobs.
- Created partitioned tables in Hive.
- Developed Spark scripts using the Scala shell as required.
- Developed and implemented core API services using Scala and Spark.
- Performed complex data transformations in Spark using Scala.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Supported MapReduce programs running on the cluster.
- Worked with business teams and created Hive queries for ad hoc access.
- Evaluated usage of Oozie for Workflow Orchestration.
- Mentored analysts and the test team in writing Hive queries.
- Worked with data serialization formats (Avro, Parquet, JSON, CSV) for converting complex objects into byte sequences.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Administered, installed, upgraded, and managed Hadoop clusters and distributions of Hadoop, Hive, and HBase.
- Advanced knowledge in performance troubleshooting and tuning Hadoop clusters.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources (a representative UDF is sketched after this list).
- Used Oozie operational services for batch processing and dynamic workflow scheduling.
- Gained very good business knowledge on Cloud based environment.
- Worked extensively on Hive data stores with text, Avro, and RCFile storage formats.
- Worked on populating analytical data stores for data science team.
- Created Hive external tables on top of flat files stored in HDFS.
- Created tools using Java for performing balance tests.
- Worked with architects to build efficient OOZIE workflows with coordinators.
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MRUnit testing library.
- Integrated the Hive warehouse with HBase.
- Wrote custom Hive UDFs in Java where the required functionality was too complex for built-in functions.
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
- Generated final reporting data for testing using Tableau, connecting to the corresponding Hive tables through the Hive ODBC connector.
- Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, and Hive).
- Monitored system health and logs and responded to any warning or failure conditions.
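As referenced above, a minimal sketch of a Pig UDF in Java of the kind written in this role; the class name and masking rule are illustrative assumptions.

    import java.io.IOException;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Masks all but the last four characters of an account number.
    public class MaskAccountNumber extends EvalFunc<String> {

        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            String account = input.get(0).toString();
            if (account.length() <= 4) {
                return account;
            }
            StringBuilder masked = new StringBuilder();
            for (int i = 0; i < account.length() - 4; i++) {
                masked.append('*');
            }
            return masked.append(account.substring(account.length() - 4)).toString();
        }
    }

In a Pig script the packaged jar would be registered with REGISTER and the function invoked inside a FOREACH ... GENERATE statement.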
Environment: Hadoop, MapReduce, HDFS, Hive, Java (JDK 1.6), Hortonworks Hadoop distribution, Oozie, Core Java, Pig, Sqoop, shell scripting, Kafka, Linux, HBase, Oracle 11g/10g.
Confidential, Hillsboro, OR
Hadoop Developer
Responsibilities:
- Responsible for requirements gathering, analysis of data sources such as Omniture and Spotify, and preparation of design documents.
- Wrote MapReduce jobs using Java API.
- Imported and exported data into HDFS and Hive using Sqoop.
- Defined the logical architecture of the layers and components of the Apache Spark solution; selected the right products to implement the big data solution.
- Redesigned an existing Teradata application on Apache Spark; involved in performance tuning of Spark applications.
- Involved in initiating and successfully completing a proof of concept on Flume for pre-processing, achieving increased reliability and ease of scalability over traditional MSMQ.
- Converted all BTEQ scripts to Spark SQL (a representative sketch follows this list); migrated UC4 workflows to Oozie.
- Automated imports and exports from Teradata to Hadoop; created a framework to run all Spark jobs.
- Monitored and administered Spark applications; involved in writing PySpark scripts.
- Automated the Spark Streaming process using Kafka; performed cluster monitoring, user management, and log analysis for benchmarking technologies.
- Involved heavily in writing complex SQL queries based on the given requirements on Teradata platform.
- Extracted data from different source systems such as Oracle, MySQL, and SQL Server databases for application development and report maintenance.
- Worked on several BTEQ scripts to transform the data and load into the Teradata database.
- Performed Data analysis and prepared the physical database based on the requirements. Performed Data validations.
- Written several Teradata BTEQ scripts to implement the business logic.
- Involved in creating the UNIX shell/wrapper scripts used to run BTEQ and other Teradata jobs from the Control Panel tool.
- Deployed the reviewed scripts into Dev, SIT, UAT, PROD boxes using Tortoise Subversion.
- Involved in Tracking, managing, fixing bugs in SIT, UAT and PROD cycles using JIRA and HP Quality Center.
- Involved in active communication and interaction with offshore support team during the development, Testing and production implementation phases of the project.
- Worked on tuning and troubleshooting the Teradata system at various levels; performed unit, regression, and integration testing.
- Assisted the Testing team in developing SQL/PLSQL scripts for Automated Testing.
- Maintained system integrity of all Hadoop-related sub-components.
- Involved in orchestrating delta generation for time-series data and developed ETL from the ParAccel database.
- Involved in Cassandra database schema design.
- Pushed data to Cassandra databases using the bulk-load utility.
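As referenced above, a hedged sketch of the Spark SQL style used when converting BTEQ scripts; the paths, table name, and aggregation are illustrative assumptions, and the Java API is used here for consistency with the rest of the document (the project also used Scala and PySpark).

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class DailyTransactionSummary {

        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("daily-transaction-summary")
                    .getOrCreate();

            // Source data exported from Teradata, assumed here to have been
            // landed in HDFS as Parquet by the automated import jobs.
            Dataset<Row> transactions = spark.read().parquet("/data/landing/transactions");
            transactions.createOrReplaceTempView("transactions");

            // The same aggregation a BTEQ script would express in Teradata SQL.
            Dataset<Row> summary = spark.sql(
                    "SELECT txn_date, region, SUM(amount) AS total_amount, COUNT(*) AS txn_count "
                  + "FROM transactions GROUP BY txn_date, region");

            summary.write().mode("overwrite").parquet("/data/curated/daily_transaction_summary");

            spark.stop();
        }
    }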
Environment: Hadoop, Hive, Spark, Spark SQL, Sqoop, Kafka, Teradata Viewpoint, UNIX, UNIX shell scripting, TPT, FastExport, FastLoad, BTEQ, GitHub, Framework, Pig, Oracle, MySQL.
Confidential, Irving, TX
Java/Hadoop Developer
Responsibilities:
- Requirements gathering and preparation of low-level design.
- Creating class diagrams, sequence diagrams, Data Model and Object Model using Rational Rose and MS-Visio.
- Responsible for the design and development of the application
- Used JSF Framework to develop the application. Used DAO and DTO Design patterns.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented Spring Quartz jobs to generate feeds for various downstream applications.
- Used Rational Rose to draw UML diagrams and to develop the use cases, domain model, and design model.
- Implemented functionality using Java, J2EE, JSP, AJAX, and Servlets.
- Generated dynamic charts using the JFreeChart API in Java.
- Involved in developing a system comprising internally created trading desks (logical entities) to handle region-specific customers, and Broker business entities created to provide an interface for clients to place orders.
- Developed Java batch jobs for performance updates and implemented multithreading concepts.
- Involved in database programming in DB2.
- Created stored procedures, functions, and triggers using PL/SQL.
- Implemented the Struts MVC framework with Tiles and validators.
- Application UI development using AJAX, HTML, JSP, XML and CSS.
- Developed an automated mail notification system using the JavaMail API and Java FTP programming (a sketch follows this list).
- Involved in database programming in Oracle 10g.
- Worked as a module/tech lead for various modules of the application, such as GCSP and ORNIS.
- Responsible for the end-to-end design and development of the application.
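As referenced above, a hedged sketch of an automated mail notification using the JavaMail API; the SMTP host, addresses, and message content are illustrative assumptions.

    import java.util.Properties;

    import javax.mail.Message;
    import javax.mail.MessagingException;
    import javax.mail.Session;
    import javax.mail.Transport;
    import javax.mail.internet.InternetAddress;
    import javax.mail.internet.MimeMessage;

    public class FeedNotificationMailer {

        public void sendCompletionNotice(String recipients, String feedName) throws MessagingException {
            Properties props = new Properties();
            props.put("mail.smtp.host", "smtp.example.com"); // hypothetical relay host
            Session session = Session.getInstance(props);

            MimeMessage message = new MimeMessage(session);
            message.setFrom(new InternetAddress("batch-noreply@example.com"));
            message.setRecipients(Message.RecipientType.TO, InternetAddress.parse(recipients));
            message.setSubject("Feed generation complete: " + feedName);
            message.setText("The " + feedName + " feed was generated successfully.");

            Transport.send(message);
        }
    }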
Environment: Java, J2EE, JSP, MVC, JNDI, WAS 5.1, Eclipse, Ant, web services, SOAP, WSDL, UDDI, JavaScript, MTG, AJAX, JDBC, Oracle 10g, PL/SQL, HTML, DHTML, XML.
Confidential
Software Engineer
Responsibilities:
- Conducted requirements-gathering sessions with business users to collect business requirements (BRDs), data requirements, and user interface requirements.
- Responsible for the initiation, planning, execution, control, and completion of the project.
- Worked alongside the Development team in solving critical issues during the development.
- Responsible for developing management reporting using the Cognos reporting tool.
- Conducted user interviews and documented reconciliation workflows.
- Worked with business and system analysts to complete development on time.
- Implemented the presentation layer with HTML, CSS and JavaScript.
- Developed web components using JSP, Servlets and JDBC.
- Implemented secure cookies using Servlets.
- Wrote complex SQL queries and stored procedures.
- Implemented the persistence layer using the Hibernate API.
- Implemented transaction and session handling using Hibernate utilities.
- Implemented search queries using the Hibernate Criteria interface (a sketch follows this list).
- Conducted detailed analysis of current processes and developed new process-flow, data-flow, and workflow models and use cases using Rational Rose and MS Visio.
- Designed and developed JSPs and Servlets.
- Wrote build scripts for compiling the application.
- Developed stored procedures, triggers, and queries using PL/SQL.
- Deployed the application on the WebSphere Application Server.
- Maintained responsibility for database design, implementation and administration.
- Tested the functionality and behavioral aspects of the software.
- Responsible for customer interaction, analysis of the requirements and project scheduling.
- Responsible for designing the system based on UML concepts, which included data flow diagrams, class diagrams, sequence diagrams, state diagrams using Rational Rose Enterprise Edition.
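As referenced above, a hedged sketch of a search query built with the Hibernate Criteria interface; the Customer entity and its properties are illustrative assumptions rather than the project's actual model.

    import java.util.List;

    import org.hibernate.Criteria;
    import org.hibernate.Session;
    import org.hibernate.criterion.Order;
    import org.hibernate.criterion.Restrictions;

    public class CustomerSearchDao {

        private final Session session;

        public CustomerSearchDao(Session session) {
            this.session = session;
        }

        @SuppressWarnings("unchecked")
        public List<Customer> findByNameAndCity(String lastNamePrefix, String city) {
            Criteria criteria = session.createCriteria(Customer.class)
                    .add(Restrictions.ilike("lastName", lastNamePrefix + "%"))
                    .add(Restrictions.eq("city", city))
                    .addOrder(Order.asc("lastName"))
                    .setMaxResults(50);
            return criteria.list();
        }
    }

    // Minimal stand-in for the mapped entity; in the real application this
    // would be a Hibernate-mapped persistent class with getters and setters.
    class Customer {
        private String lastName;
        private String city;
    }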
Environment: UNIX, Windows, Core Java, SQL, JDBC, JavaScript, HTML, JSP, Servlets, Oracle, J2EE, JCL, DB2, CICS.